SAA-C03 Questions 676–750 | Page 10/14

676

MCQ | medium

A public API for a B2B file exchange site is deployed on API Gateway. Clients must authenticate with standards-based tokens issued by an external OpenID Connect provider. Which authorization mechanism should be used?

A. API keys only

B. IAM authorization for all internet users

C. JWT authorizer configured for the OpenID Connect issuer

D. A VPC endpoint policy

Answer: C

A JWT authorizer validates tokens from a trusted OIDC issuer with low operational overhead.

Why this answer

Option C is correct because API Gateway's JWT authorizer natively validates JSON Web Tokens issued by an external OpenID Connect (OIDC) provider. It verifies the token's signature, expiry, and issuer against the OIDC provider's JWKS endpoint without requiring custom Lambda code, making it the simplest and most secure choice for standards-based token authentication.

Exam trap

The trap here is that candidates often confuse API keys (which are for rate limiting and client identification) with authentication, or assume IAM authorization can be used for external users, but IAM requires AWS credentials and is not designed for third-party OIDC tokens.

How to eliminate wrong answers

Option A is wrong because API keys only provide client identification, not authentication; they do not validate the identity of the caller or support OIDC tokens. Option B is wrong because IAM authorization is designed for AWS principals (e.g., IAM users/roles) and requires AWS Signature V4 signing, which is not suitable for internet clients using external OIDC tokens. Option D is wrong because a VPC endpoint policy controls access to API Gateway via VPC endpoints, not authentication or token validation for public internet clients.
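
As a rough illustration of the issuer, audience, and expiry checks described above, the following sketch validates claims that are assumed to have already passed signature verification. The function name, issuer URL, and audience value are hypothetical, and signature verification against the provider's JWKS endpoint is deliberately left out; API Gateway performs all of this for you.

```python
import time


def check_claims(claims, issuer, audiences, now=None):
    """Return True when already-verified claims pass issuer, audience, and expiry checks."""
    now = time.time() if now is None else now
    # The token must come from the trusted OIDC issuer.
    if claims.get("iss") != issuer:
        return False
    # "aud" may be a single string or a list of strings.
    aud = claims.get("aud")
    token_audiences = {aud} if isinstance(aud, str) else set(aud or [])
    if not token_audiences & set(audiences):
        return False
    # The token must not be expired.
    return claims.get("exp", 0) > now
```

A token from the wrong issuer, for the wrong audience, or past its expiry is rejected before the backend is ever invoked.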

677

Multi-Select | hard

A product catalog system uses a relational database for orders and a simple key-value profile store for shopping carts. Traffic is unpredictable, and the company wants to avoid paying for large idle database instances. Which two choices are best? Select two.

Select 2 answers

A. Use Aurora Serverless v2 for the relational order system.

B. Use DynamoDB on-demand capacity for the shopping-cart profile store.

C. Keep both workloads on large provisioned RDS instances and add read replicas for the cart store.

D. Use DynamoDB provisioned capacity with a fixed minimum despite the unpredictable traffic.

E. Replace the relational order system with a wide-column table to reduce SQL licensing.

Answers: A, B

Correct. Aurora Serverless v2 is designed for variable relational workloads because capacity can scale without constantly paying for a large fixed instance. It preserves SQL features while reducing idle overprovisioning.

Why this answer

Aurora Serverless v2 automatically scales compute capacity up and down with demand, so the relational order system keeps its SQL features without being provisioned for peak traffic. DynamoDB on-demand capacity bills per request and needs no capacity planning, which suits the key-value shopping-cart store. Together they avoid paying for large idle database instances under unpredictable traffic.

Exam trap

The trap here is that candidates may think provisioned capacity with a minimum is acceptable for unpredictable traffic, but the question explicitly requires avoiding paying for idle capacity, so on-demand or serverless options are the only correct choices.

678

MCQ | medium

A DynamoDB-backed event processing system experiences throttling during a promotion. All events are written and read using the same partition key value (tenantId = "ACME"). The workload is time-ordered per tenant, and the application can tolerate slight reordering across partitions. Which design change will most directly increase throughput and reduce hot-partition throttling?

A. Increase the table's provisioned capacity (read/write units) to handle the promotion peak.

B. Change the partition key to include an additional sharding attribute derived from a hash of eventId.

C. Enable DAX caching for all reads but keep the same partition key and item layout.

D. Switch the table to eventually consistent reads for queries to lower read throttling.

Answer: B

When all traffic targets one partition key value, that partition becomes the bottleneck regardless of total table capacity. Adding a shard/salt attribute to the partition key (for example, tenantId + shardId where shardId = hash(eventId) mod N) spreads writes across multiple partition key values, increasing partition-level parallelism. Because the scenario allows slight reordering across partitions, losing strict single-partition time ordering is acceptable while improving throughput and reducing throttling.

Why this answer

Option B is correct because adding a sharding attribute derived from a hash of eventId allows writes and reads to be distributed across multiple partition keys, breaking the single hot partition caused by using tenantId='ACME' for all operations. DynamoDB's throughput is limited per partition, so distributing the load across many partitions directly reduces throttling without changing the application's tolerance for slight reordering.

Exam trap

The trap here is that candidates often assume increasing provisioned capacity (Option A) is the universal fix for throttling, but AWS specifically tests the understanding that DynamoDB's per-partition throughput limits require a sharding strategy to distribute load across partitions.

How to eliminate wrong answers

Option A is wrong because simply increasing provisioned capacity does not resolve the hot-partition issue; the single partition key (tenantId='ACME') still caps throughput at 3000 RCU/1000 WCU per partition, so throttling persists regardless of total table capacity. Option C is wrong because DAX caching only reduces read load on the table, but writes (which are the primary source of throttling during a promotion) still hit the same hot partition, and DAX does not help with write throttling. Option D is wrong because eventually consistent reads only reduce read costs and latency, but they do not address the root cause of throttling—the single partition bottleneck—and have no effect on write throttling.
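
The sharding scheme described above (tenantId plus hash(eventId) mod N) can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the shard count of 10, the `#` separator, and the function names are assumptions chosen for the example.

```python
import hashlib

NUM_SHARDS = 10  # hypothetical shard count; tune to the required write throughput


def sharded_partition_key(tenant_id: str, event_id: str, num_shards: int = NUM_SHARDS) -> str:
    """Derive a partition key such as 'ACME#7' from a stable hash of the event ID."""
    digest = hashlib.sha256(event_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:8], "big") % num_shards
    return f"{tenant_id}#{shard}"


def all_partition_keys(tenant_id: str, num_shards: int = NUM_SHARDS) -> list:
    """List every shard key a reader must query to see all of a tenant's events."""
    return [f"{tenant_id}#{shard}" for shard in range(num_shards)]
```

Writers call `sharded_partition_key` for each event; readers query every key from `all_partition_keys` and merge the results, which is where the tolerated cross-partition reordering comes from.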

679

MCQ | medium

Your order-processing system uses EventBridge rules to send events to a Lambda function that updates order status. Over the last week, some events fail with a transient database timeout, and the Lambda retries intermittently but then the events are lost (no alerts after failures). You want at-least-once processing, bounded retries, and a way to inspect unprocessable events for later reprocessing. Which architecture change best meets these requirements?

A. Send EventBridge events to an SQS queue, configure a redrive policy to move messages to a dead-letter queue (DLQ) after a defined receive count, and make the Lambda processing idempotent.

B. Invoke Lambda directly from EventBridge in asynchronous mode, and increase the Lambda timeout to reduce failures.

C. Use SNS topics with Lambda subscriptions, but remove all retry and DLQ configuration to minimize duplicate events.

D. Store failed events only in CloudWatch logs, and have operators manually copy log entries back into the database for reprocessing.

Answer: A

EventBridge-to-SQS provides buffering and decoupling; SQS redrive with a DLQ bounds retries and preserves failed events for analysis and replay.

Why this answer

Option A is correct because it introduces an SQS queue between EventBridge and Lambda, which provides at-least-once processing through message visibility timeouts and retries. The redrive policy moves messages to a dead-letter queue (DLQ) after a defined receive count, ensuring bounded retries and preserving unprocessable events for later inspection and reprocessing. Making the Lambda idempotent prevents duplicate side effects from at-least-once delivery.

Exam trap

The trap here is that candidates may think increasing Lambda timeout or relying on asynchronous invocation retries alone is sufficient, but they overlook the need for a DLQ to capture and inspect events that fail after all retries are exhausted.

How to eliminate wrong answers

Option B is wrong because increasing the Lambda timeout does not address transient database timeouts or prevent event loss; asynchronous invocation already retries twice by default, and events that exhaust those retries are discarded unless a DLQ or failure destination is configured. Option C is wrong because removing retry and DLQ configuration from the SNS subscriptions leaves nowhere to capture failed deliveries, so events are lost on failure, violating the requirement for bounded retries and inspectable failures. Option D is wrong because storing failed events only in CloudWatch logs does not provide a structured, automated way to reprocess them; manual copy-paste is error-prone and does not meet the requirement for at-least-once processing or bounded retries.
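
Because at-least-once delivery means the same event can arrive more than once, the consumer has to be idempotent. The sketch below shows one simple way to do that; the event field names (`id`, `orderId`, `status`) and the in-memory `orders` dictionary and `seen_event_ids` set are hypothetical stand-ins for a real database and deduplication store.

```python
def apply_status_update(event: dict, orders: dict, seen_event_ids: set) -> bool:
    """Apply an order-status event once; return False for a duplicate delivery."""
    event_id = event["id"]
    if event_id in seen_event_ids:
        # Redelivered message: skip the side effect so the update is not applied twice.
        return False
    orders[event["orderId"]] = event["status"]
    seen_event_ids.add(event_id)
    return True
```

In a real system the "seen" check and the update would need to be atomic, for example through a conditional write in the database.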

680

MCQ | medium

Account A hosts a role named AppReadRole. Account B needs to access it using STS AssumeRole. Account A’s role trust policy includes this condition:

    StringEquals: { "sts:ExternalId": "b-7f9a" }

When Account B runs:

    aws sts assume-role --role-arn arn:aws:iam::111111111111:role/AppReadRole --role-session-name test

the call fails with: "AccessDenied: ExternalId mismatch". What should Account B change?

A. Provide the correct --external-id value (b-7f9a) in the AssumeRole call.

B. Add kms:Decrypt permissions to Account B’s IAM user because trust policy failures are KMS related.

C. Remove the ExternalId condition from the trust policy so any caller can assume the role.

D. Use AssumeRoleWithSAML instead of AssumeRole so ExternalId is not required.

Answer: A

The trust policy requires sts:ExternalId to equal b-7f9a. If the caller does not supply the matching external ID, STS fails the trust-policy condition and denies AssumeRole. Supplying --external-id b-7f9a satisfies the condition.

Why this answer

The error 'AccessDenied: ExternalId mismatch' occurs because the trust policy on Account A's role requires an `sts:ExternalId` condition with the value `b-7f9a`, but Account B's `aws sts assume-role` command did not include the `--external-id` parameter. By providing the correct `--external-id b-7f9a` in the call, Account B satisfies the condition, allowing the role assumption to succeed. This is a standard security mechanism to prevent the confused deputy problem.

Exam trap

The trap here is that candidates may think the error is due to missing permissions (like KMS) or that changing the API method (SAML) avoids the condition, when in fact the fix is simply to include the required `--external-id` parameter in the AssumeRole call.

How to eliminate wrong answers

Option B is wrong because the error is explicitly about an ExternalId mismatch, not a KMS permissions issue; KMS is unrelated to STS AssumeRole trust policy conditions. Option C is wrong because while removing the condition would technically allow the call, it weakens security and is not the minimal change required to fix the mismatch error. Option D is wrong because AssumeRoleWithSAML is meant for federated users authenticated by a SAML identity provider and needs a trust policy written for that provider; it does not satisfy the sts:ExternalId condition on this role.
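
To make the condition concrete, here is a sketch of what such a trust policy could look like, together with a toy check that mimics the StringEquals comparison. The Account B ID `222222222222` and the helper function are hypothetical; real policy evaluation is performed by IAM, not by client code.

```python
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            # Hypothetical Account B ID; the question does not state it.
            "Principal": {"AWS": "arn:aws:iam::222222222222:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": "b-7f9a"}},
        }
    ],
}


def external_id_matches(policy: dict, supplied_external_id) -> bool:
    """Toy version of the StringEquals check; a missing external ID never matches."""
    for statement in policy["Statement"]:
        condition = statement.get("Condition", {}).get("StringEquals", {})
        expected = condition.get("sts:ExternalId")
        if expected is not None and supplied_external_id != expected:
            return False
    return True
```

Calling `external_id_matches(TRUST_POLICY, None)` models the failing command in the question, and passing `"b-7f9a"` models the corrected call with `--external-id b-7f9a`.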

681

MCQ | hard

Based on the exhibit, an EC2 application runs in private subnets with no NAT gateway and must retrieve a secret from AWS Secrets Manager. The secret uses a customer managed KMS key. Which change will allow the application to reach the service while keeping traffic off the internet?

A. Create an interface VPC endpoint for Secrets Manager and another interface VPC endpoint for KMS, and enable private DNS for both.

B. Create an S3 gateway endpoint for Secrets Manager and use the existing S3 gateway endpoint for both secret retrieval and KMS decryption.

C. Add a NAT gateway in a public subnet and route 0.0.0.0/0 from the private subnets to the NAT gateway.

D. Move the application into a public subnet so it can call the public Secrets Manager endpoint directly.

Answer: A

Secrets Manager is an interface endpoint service, and the customer managed KMS key means the application also needs private access to KMS for decrypt operations. Private DNS lets the SDK resolve standard service names to the VPC endpoints, keeping all traffic inside AWS private networking.

Why this answer

Option A is correct because it creates interface VPC endpoints for both Secrets Manager and KMS, which allows the EC2 instance in the private subnet to securely access these services over the AWS network without traversing the internet. Enabling private DNS ensures that the standard service endpoints resolve to the private IP addresses of the VPC endpoints, eliminating the need for a NAT gateway or internet gateway.

Exam trap

The trap here is that candidates often assume a single endpoint type (like a gateway endpoint) can serve all AWS services, but Secrets Manager and KMS specifically require interface endpoints, and forgetting the KMS endpoint is a common oversight.

How to eliminate wrong answers

Option B is wrong because gateway endpoints exist only for Amazon S3 and DynamoDB; Secrets Manager and KMS are reached privately through interface endpoints (powered by AWS PrivateLink). Option C is wrong because a NAT gateway sends traffic out through an internet gateway to the public service endpoints, which violates the requirement to keep traffic off the internet. Option D is wrong because moving the application to a public subnet would expose it to the internet, contradicting the requirement to keep traffic off the internet and potentially compromising security.
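
The two endpoints from Option A could be described with parameters along these lines. This is a sketch that only builds the request dictionaries; the function name and the example IDs are hypothetical, and the dictionaries are shaped after the EC2 CreateVpcEndpoint parameters rather than sent to any API.

```python
def interface_endpoint_requests(region, vpc_id, subnet_ids, security_group_id):
    """Build one interface-endpoint request each for Secrets Manager and KMS."""
    services = ("secretsmanager", "kms")
    return [
        {
            "VpcEndpointType": "Interface",
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.{service}",
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": [security_group_id],
            # Private DNS lets the SDK keep using the standard service hostnames.
            "PrivateDnsEnabled": True,
        }
        for service in services
    ]
```

The endpoint security group must allow inbound HTTPS (port 443) from the application instances, otherwise the calls time out even though the endpoints exist.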

682

MCQ | medium

A web application runs on an Amazon EC2 Auto Scaling group behind an Application Load Balancer (ALB). After each deployment, new instances take about 2 minutes to download artifacts and become ready to accept requests on the target port. In the last deployment, the ALB started marking targets unhealthy before the app was ready, and the Auto Scaling group then replaced those instances repeatedly, causing a prolonged outage. Which change best improves resilience during instance start-up without reducing actual availability once the application is healthy?

A. Increase the Auto Scaling group’s health check grace period so it exceeds the ~2-minute initialization time.

B. Add more subnets across additional Availability Zones to distribute the same instances more widely.

C. Switch the load balancer target type from instance targets to IP targets to avoid health check failures.

D. Reduce the ALB health check interval so unhealthy targets are removed faster.

Answer: A

A health check grace period prevents the Auto Scaling group from treating early health check failures as instance health problems. This avoids terminating instances before the application finishes initializing, which stops the restart/replace loop during deployments while still allowing normal health checks to apply once the app is ready.

Why this answer

The Auto Scaling group's health check grace period tells the group to ignore failed health checks, including ELB health checks, for a set time after an instance launches. The ALB may still report a starting target as unhealthy, but the group does not act on that status until the grace period ends. Setting the grace period above the ~2-minute initialization time stops the group from replacing instances that are still starting up, which breaks the terminate-and-relaunch loop that caused the outage. Normal health checks apply once the grace period ends, so availability is not reduced after the application is ready.

Exam trap

The trap here is that candidates confuse the ALB health check interval or target type with the Auto Scaling group's lifecycle management, mistakenly thinking that changing how the ALB checks health (interval or target type) will fix the premature replacement, when the correct solution is to adjust the ASG's grace period to align with the application's startup time.

How to eliminate wrong answers

Option B is wrong because adding more subnets across additional Availability Zones distributes instances more widely for fault tolerance but does not prevent the ALB from marking starting instances as unhealthy, so it does not solve the premature replacement issue. Option C is wrong because switching from instance targets to IP targets changes how the ALB routes traffic but does not alter the health check logic or timing; the ALB will still mark the target as unhealthy if the health check fails during the initialization window. Option D is wrong because reducing the ALB health check interval causes unhealthy targets to be detected and removed faster, which would worsen the problem by accelerating the replacement cycle, not improving resilience during start-up.
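
The effect of the grace period can be shown with a deliberately simplified model of the replacement decision. The function and its parameters are hypothetical and ignore details such as lifecycle states and warm-up settings; the point is only that an unhealthy report is ignored while the instance is still inside the grace period.

```python
def asg_should_replace(seconds_since_launch: int, elb_reports_healthy: bool, grace_period_s: int) -> bool:
    """Simplified model: act on an unhealthy report only after the grace period ends."""
    if seconds_since_launch < grace_period_s:
        return False  # still in the grace period: health check failures are ignored
    return not elb_reports_healthy
```

With a 120-second startup, a grace period of 0 replaces the instance at the first failed check, while a grace period of 180 seconds leaves it alone until the application has had time to come up.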

683

MCQ | medium

A partner company needs read-only access to reports in an S3 bucket for an e-learning platform. The partner has its own AWS account. What is the most secure, scalable access pattern? The design must avoid adding custom operational scripts.

A. Copy the objects to a public website bucket

B. Create an IAM user in the company account and share the access keys

C. Create a bucket policy that grants the partner role least-privilege access to the required prefix

D. Make the objects public and rely on difficult-to-guess object names

Answer: C

A resource policy can grant cross-account access to a specific external role and prefix.

Why this answer

Option C is correct because it uses a resource-based bucket policy that grants the partner's IAM role cross-account read-only access to a specific prefix, adhering to the principle of least privilege. This approach is secure (no public exposure), scalable (no manual key rotation), and requires no custom scripts: the bucket policy's Principal element names the partner's role ARN, and the partner account grants that role matching S3 permissions in its own identity policy.

Exam trap

AWS often tests the misconception that sharing IAM user access keys is acceptable for cross-account access. IAM user keys are long-term credentials tied to the owning account, while a bucket policy that trusts the partner's IAM role is the recommended pattern for least-privilege, script-free access.

How to eliminate wrong answers

Option A is wrong because copying objects to a public website bucket exposes them to the entire internet, violating security and requiring operational scripts for synchronization. Option B is wrong because creating an IAM user in the company account and sharing access keys introduces long-term static credentials that must be manually rotated, are insecure if leaked, and require custom scripts for key management. Option D is wrong because making objects public with difficult-to-guess names relies on security through obscurity, which is not a valid security control; objects can still be discovered via enumeration or leaks, and this violates AWS best practices for data protection.
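
A bucket policy of the kind Option C describes might look like the sketch below. The function name, bucket name, prefix, and role ARN are all hypothetical placeholders; the structure follows the standard IAM policy document format.

```python
def partner_read_policy(bucket: str, prefix: str, partner_role_arn: str) -> dict:
    """Build a bucket policy granting one external role read access to one prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PartnerGetReports",
                "Effect": "Allow",
                "Principal": {"AWS": partner_role_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
            {
                "Sid": "PartnerListReports",
                "Effect": "Allow",
                "Principal": {"AWS": partner_role_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                # Listing is limited to keys under the shared prefix.
                "Condition": {"StringLike": {"s3:prefix": f"{prefix}*"}},
            },
        ],
    }
```

The second statement is optional; it is included on the assumption that the partner needs to list report keys as well as download them.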

684

MCQ | easy

Account A hosts an IAM role that Account B developers must assume for a limited task. You want to require MFA for anyone assuming the role. Which trust policy condition most directly enforces that requirement for sts:AssumeRole?

A. Add a statement condition requiring "Bool": {"aws:MultiFactorAuthPresent": "true"} in the role trust policy.

B. Add a condition requiring "StringEquals": {"aws:PrincipalOrgID": "o-example"} without any MFA condition.

C. Add a statement that denies sts:AssumeRole when the requested role session name contains the text "dev".

D. Require HTTPS by setting a condition on "aws:SecureTransport": "true" in the trust policy.

Answer: A

aws:MultiFactorAuthPresent is a built-in IAM condition context key set when the caller authenticated with MFA. Requiring it to be true causes trust policy evaluation to fail for non-MFA sessions.

Why this answer

Option A is correct because the `aws:MultiFactorAuthPresent` condition key in the role trust policy directly checks whether the caller authenticated with a valid MFA device before calling `sts:AssumeRole`. When set to "true" with a Bool condition, it enforces that the session must have been established after MFA verification, which is the most direct and standard way to require MFA for role assumption.

Exam trap

The trap here is that candidates confuse transport-layer security (HTTPS) with authentication-layer MFA, thinking that requiring encrypted communication also enforces multi-factor authentication, but `aws:SecureTransport` only ensures the channel is encrypted, not that the caller proved possession of a second factor.

How to eliminate wrong answers

Option B is wrong because `aws:PrincipalOrgID` checks the AWS Organization ID of the principal, which has no relation to MFA enforcement; it only restricts which accounts in an organization can assume the role. Option C is wrong because denying `sts:AssumeRole` based on the role session name containing "dev" is a naming convention check, not an MFA requirement; it does not verify the caller's authentication method. Option D is wrong because `aws:SecureTransport` enforces HTTPS for the API call, which ensures encryption in transit but does not require the caller to have used MFA; a non-MFA session over HTTPS would still succeed.
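
For reference, a trust policy carrying the condition from Option A could be assembled as in this sketch. The function name and the Account B ID passed to it are hypothetical; the condition block itself is the one quoted in the question.

```python
def mfa_trust_policy(trusted_account_id: str) -> dict:
    """Build a role trust policy that only allows AssumeRole from MFA-authenticated callers."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}},
            }
        ],
    }
```

Because the statement is an Allow with a Bool condition, a request where the key is absent or false simply does not match, so the role assumption is denied.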

685

MCQ | medium

An order processing workflow uses Amazon SQS as the decoupling layer between a producer and a consumer Lambda function. The consumer intermittently fails due to a downstream dependency. The team has observed that certain “poison” messages keep being retried repeatedly and prevent other messages from being processed efficiently. Which SQS configuration most directly addresses this issue?

A. Set the SQS queue’s retention period to 10 years and rely on application retries to eventually succeed.

B. Increase visibility timeout to a very large value and avoid dead-letter queues to keep ordering stable.

C. Configure a redrive policy with a dead-letter queue (DLQ) and set an appropriate visibility timeout greater than the maximum processing time.

D. Switch the queue to FIFO and remove retries in the Lambda event source mapping entirely.

Answer: C

A DLQ isolates poison messages after a receive count threshold, and correct visibility timeout prevents premature retries.

Why this answer

Option C is correct because configuring a redrive policy with a dead-letter queue (DLQ) allows messages that exceed a specified maximum receive count to be moved to the DLQ, isolating poison messages. Setting the visibility timeout greater than the maximum processing time ensures the consumer has enough time to process each message before it becomes visible again, preventing premature retries. This directly addresses the issue of poison messages blocking the queue and degrading throughput.

Exam trap

The trap here is that candidates often confuse increasing the visibility timeout or switching to FIFO as solutions for poison messages, but neither addresses the root cause of isolating messages that repeatedly fail processing.

How to eliminate wrong answers

Option A is wrong because a longer retention period does not stop poison messages from being retried; it only keeps them in the queue longer, and SQS caps retention at 14 days, so 10 years is not a valid setting. Option B is wrong because increasing visibility timeout to a very large value without a DLQ means poison messages will still be retried indefinitely, and avoiding DLQs does not help with ordering or poison message handling. Option D is wrong because switching to a FIFO queue does not address poison messages; FIFO ensures strict ordering but still requires a DLQ for poison message handling, and removing retries entirely would cause message loss if processing fails.
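
The queue settings from Option C can be expressed as SQS attributes, sketched below. The function name and the example values are hypothetical. The six-times multiplier reflects AWS guidance for queues consumed by Lambda (visibility timeout of at least six times the function timeout) and is an assumption about this workload, not something stated in the question.

```python
import json


def source_queue_attributes(dlq_arn: str, max_receive_count: int, function_timeout_s: int) -> dict:
    """Build SQS attributes for a source queue with a DLQ redrive policy."""
    visibility_timeout_s = function_timeout_s * 6  # assumed multiplier for Lambda consumers
    return {
        # SQS attribute values are passed as strings.
        "VisibilityTimeout": str(visibility_timeout_s),
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count}
        ),
    }
```

After a message has been received `maxReceiveCount` times without being deleted, SQS moves it to the dead-letter queue, which takes the poison message out of the main processing path.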

686

Multi-Select | medium

A startup runs an API on Amazon EC2. The instance must read items from one DynamoDB table and upload logs to one S3 bucket. Platform engineers also need a way to create new application roles, but those roles must never exceed a predefined set of permissions. Which three actions should the architect take? Select three.

Select 3 answers

A. Attach an IAM role to the EC2 instance profile and remove long-lived access keys from the server.

B. Give the EC2 instance an IAM user with administrator access for simplicity.

C. Scope the application policy to the exact DynamoDB table ARN and S3 bucket prefix.

D. Store the access keys in the application configuration file and rotate them later.

E. Use a permissions boundary for any IAM roles the platform team is allowed to create.

Answers: A, C, E

This gives the workload temporary credentials through the instance metadata service and avoids storing secrets on the host. It is the standard least-privilege pattern for EC2-based applications.

Why this answer

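Option A is correct because attaching an IAM role to the EC2 instance profile allows the instance to obtain temporary credentials via the instance metadata service (IMDS), eliminating the need to store long-lived access keys on the server. Option C is correct because scoping the policy to the exact DynamoDB table ARN and S3 bucket prefix limits the role to the two resources the application actually uses. Option E is correct because a permissions boundary sets the maximum permissions any role created by the platform team can have, so new roles can never exceed the predefined set.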

Exam trap

The trap here is that candidates may think storing access keys in a config file is acceptable if rotated later, but AWS explicitly recommends using IAM roles for EC2 to avoid the security risks of long-lived credentials.
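
A scoped application policy in the spirit of Option C might look like this sketch. The function name, the choice of `dynamodb:GetItem` and `dynamodb:Query` as the read actions, and the example ARNs are assumptions; the question only says the instance reads items and uploads logs.

```python
def app_policy(table_arn: str, bucket: str, log_prefix: str) -> dict:
    """Build an identity policy limited to one DynamoDB table and one S3 log prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadItems",
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:Query"],
                "Resource": table_arn,
            },
            {
                "Sid": "UploadLogs",
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/{log_prefix}*",
            },
        ],
    }
```

The same document shape is used for a permissions boundary; the difference is that a boundary caps what a role may be granted rather than granting anything itself.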

687

MCQ | easy

A customer-facing application has a relational data model and needs frequent complex queries (joins and aggregations), but it also experiences a significant read-heavy workload. Which design choice best improves read performance while keeping relational features?

A. Use DynamoDB with a single partition key and avoid indexes to keep writes simple.

B. Add read replicas to an RDS or Aurora cluster and keep the primary for writes.

C. Store the data in S3 and query it directly from the application without a database.

D. Switch the database to DynamoDB but keep using the same relational SQL queries and joins.

Answer: B

Read replicas offload read operations from the primary database instance, improving read throughput and reducing contention with writes. RDS/Aurora preserve relational capabilities like joins and SQL queries. This is a common and practical way to scale performance for read-heavy workloads without completely changing the data model.

Why this answer

Adding read replicas to an RDS or Aurora cluster offloads read traffic from the primary instance, improving read performance for complex queries (joins and aggregations) while preserving the relational data model. Aurora replicas read from the same shared storage volume as the writer, and a cluster supports up to 15 of them, which makes this pattern efficient for read-heavy workloads.

Exam trap

The trap here is that candidates often assume NoSQL databases like DynamoDB can handle relational queries if they simply 'switch' the database, ignoring that DynamoDB lacks native support for joins and complex aggregations, which are core to the relational data model described in the question.

How to eliminate wrong answers

Option A is wrong because DynamoDB with a single partition key and no indexes cannot efficiently support complex relational queries (joins and aggregations), and it sacrifices relational features. Option C is wrong because S3 is an object store, not a relational database; querying it directly for complex joins and aggregations is extremely slow and lacks transactional consistency. Option D is wrong because DynamoDB does not support SQL joins or complex relational queries; attempting to use the same relational SQL queries would fail or require significant application-level workarounds.

688

Multi-Select | medium

An internal API is deployed in two AWS Regions behind separate Application Load Balancers. The company wants clients to use the primary Region when it is healthy and automatically switch to the secondary Region if the primary health check fails. Which two Route 53 record configurations are required? Select two.

Select 2 answers

A. Create a primary failover record that points to the primary ALB and associates a Route 53 health check.

B. Create a weighted record set that sends 50 percent of traffic to each Region.

C. Create a secondary failover record that points to the secondary ALB.

D. Create a latency-based record set so Route 53 always prefers the fastest Region.

E. Create a multivalue answer record to return both ALB addresses on each lookup.

Answers: A, C

A primary failover record is the active answer while the primary Region remains healthy. The associated health check tells Route 53 when the primary endpoint should stop being returned to clients.

Why this answer

Option A is correct because a primary failover record in Amazon Route 53 directs traffic to the primary ALB and is associated with a Route 53 health check. Option C is correct because the secondary failover record gives Route 53 the answer to return when that health check fails. Together the two records form an active-passive pair that fails over across Regions automatically.

Exam trap

The trap here is that candidates often confuse failover routing with weighted or latency routing, assuming any health-aware routing provides automatic primary/secondary failover, but only failover records enforce a strict active-passive pattern.
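
The primary/secondary pair can be sketched as two record-set changes in the shape used by the Route 53 ChangeResourceRecordSets API. The function name, the `zone_id`/`dns_name` dictionary keys, and all example values are hypothetical; nothing here calls AWS.

```python
def failover_records(name: str, primary_alb: dict, secondary_alb: dict, health_check_id: str) -> list:
    """Build UPSERT changes for a PRIMARY/SECONDARY failover alias pair."""

    def change(role: str, alb: dict, extra: dict) -> dict:
        record_set = {
            "Name": name,
            "Type": "A",
            "SetIdentifier": f"{role.lower()}-region",
            "Failover": role,
            "AliasTarget": {
                "HostedZoneId": alb["zone_id"],
                "DNSName": alb["dns_name"],
                "EvaluateTargetHealth": True,
            },
        }
        record_set.update(extra)
        return {"Action": "UPSERT", "ResourceRecordSet": record_set}

    return [
        # Only the primary record carries the explicit health check from Option A.
        change("PRIMARY", primary_alb, {"HealthCheckId": health_check_id}),
        change("SECONDARY", secondary_alb, {}),
    ]
```

Both records share the same name and type and differ only in their `Failover` role and `SetIdentifier`, which is what makes Route 53 treat them as one failover pair.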

689

MCQ | easy

A company runs Amazon RDS for MySQL in a Multi-AZ configuration. If the primary database instance fails, what is the expected behavior?

A. The database remains unavailable until an administrator manually creates a new instance.

B. RDS automatically fails over to the standby instance in the same Region and keeps the same endpoint.

C. Traffic is routed to a read replica in another Region for immediate continuity.

D. The failed primary continues serving traffic while the standby synchronizes in the background.

Answer: B

Multi-AZ RDS is built for high availability. If the primary instance becomes unavailable, AWS automatically promotes the standby in the same Region and updates the DNS behind the database endpoint. Applications keep using the same connection string, so failover is largely transparent. This reduces downtime without requiring manual intervention or application changes.

Why this answer

Amazon RDS for MySQL in a Multi-AZ configuration synchronously replicates data to a standby instance in a different Availability Zone of the same Region. When the primary database instance fails, RDS automatically promotes the standby and updates the DNS record behind the database endpoint, so applications reconnect with the same connection string and no manual intervention. This ensures high availability with minimal downtime.

Exam trap

The trap here is that candidates often confuse Multi-AZ failover with read replicas, mistakenly thinking a read replica in another Region can serve as the failover target, whereas Multi-AZ uses a synchronous standby in the same Region.

How to eliminate wrong answers

Option A is wrong because RDS Multi-AZ automatically handles failover without requiring an administrator to manually create a new instance; the standby instance is already provisioned and ready. Option C is wrong because read replicas are used for read scaling and disaster recovery, not for automatic failover; Multi-AZ failover uses a standby in the same Region, not a read replica in another Region. Option D is wrong because the failed primary cannot continue serving traffic; RDS detects the failure and redirects traffic to the standby instance, while the primary is taken out of service.

690

Multi-Select | hard

A latency-sensitive mobile game backend uploads large files to S3 from users around the world. Which two features can improve upload performance? The architecture review board prefers a managed AWS-native control.

Select 2 answers

A. S3 Object Lock

B. S3 multipart upload

C. S3 Inventory

D. S3 Transfer Acceleration

Answers: B, D

Multipart upload parallelizes large object upload parts and improves reliability.

Why this answer

B is correct because S3 multipart upload allows a large file to be broken into smaller parts and uploaded in parallel, which improves throughput and lets a failed part be retried without restarting the whole upload. D is correct because S3 Transfer Acceleration routes each upload through a nearby AWS edge location and then over the AWS network to the bucket, which helps users who are far from the bucket's Region.

Exam trap

The trap here is treating multipart upload and S3 Transfer Acceleration as alternatives and selecting only one. They address different bottlenecks (parallelism within a single upload versus the long-distance network path), so both B and D are correct, while S3 Object Lock and S3 Inventory have no effect on upload performance.
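
To make the multipart idea concrete, here is a minimal sketch of how a client might split a large object into byte ranges before uploading the parts in parallel. The function name `plan_parts` and the 64 MiB default part size are illustrative assumptions; the 5 MiB minimum part size and the 10,000-part ceiling are S3's documented multipart limits.

```python
import math

MIN_PART = 5 * 1024 * 1024   # S3 minimum part size (the last part may be smaller)
MAX_PARTS = 10_000           # S3 maximum number of parts in one multipart upload


def plan_parts(object_size, part_size=64 * 1024 * 1024):
    """Return (part_number, start, end_exclusive) tuples that cover the object."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if part_size < MIN_PART:
        raise ValueError("part_size is below the S3 minimum")
    # Grow the part size if the object would otherwise need too many parts.
    part_size = max(part_size, math.ceil(object_size / MAX_PARTS))
    parts = []
    start = 0
    number = 1
    while start < object_size:
        end = min(start + part_size, object_size)
        parts.append((number, start, end))
        start = end
        number += 1
    return parts
```

Each tuple could be handed to a separate worker, and a part that fails can be retried on its own without restarting the whole upload.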

691

Multi-Selecteasy

A production Amazon RDS database must continue serving the application if the primary DB instance fails. The application should reconnect automatically without hard-coding a new IP address. Which two actions should you take? Select two.

Select 2 answers

A.Create an RDS Multi-AZ deployment for the database.

B.Connect the application to the RDS endpoint instead of hard-coding the database IP address.

C.Disable automated backups to reduce the time needed for failover.

D.Use a single-AZ deployment so the standby is not split across Zones.

E.Replace the database with an Amazon S3 bucket and store rows as objects.

AnswersA, B

RDS Multi-AZ maintains a synchronous standby in another Availability Zone and automatically promotes it when the primary fails. This is the standard AWS high-availability pattern for managed relational databases.

Why this answer

A is correct because an RDS Multi-AZ deployment automatically provisions and maintains a synchronous standby replica in a different Availability Zone. If the primary DB instance fails, Amazon RDS automatically fails over to the standby, typically within 60–120 seconds, without requiring manual intervention. B is correct because the RDS endpoint is a DNS name that RDS repoints to the new primary during failover, so the application reconnects to the same hostname instead of chasing a changed IP address.

Exam trap

The trap here is that candidates may think disabling backups or using a single-AZ deployment could improve failover speed, but in reality, Multi-AZ and endpoint-based connections are the only correct combination for automatic failover and reconnection without hard-coded IP addresses.
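
The reconnect behavior can be sketched as a small retry loop that always dials the endpoint hostname. This is only an illustration: `connect_with_retry`, its parameters, and the `connect` callable are hypothetical names, and a real application would pass its database driver's connect function.

```python
import time


def connect_with_retry(connect, endpoint, attempts=5, delay_seconds=0.0):
    """Call connect(endpoint) until it succeeds or the attempts run out.

    The same DNS name is used on every attempt, so a driver that resolves
    the hostname per connection can reach the promoted standby after failover.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return connect(endpoint)
        except ConnectionError as exc:
            last_error = exc
            time.sleep(delay_seconds)
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

Because nothing in the loop stores an IP address, the application code does not change when the standby takes over.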

692

MCQmedium

An internal-facing application is available in two AWS regions (Region 1 and Region 2). Each region has its own Application Load Balancer (ALB) and target group. The company uses an AWS Route 53 private hosted zone to route clients to Region 1 by default, but it must automatically fail over to Region 2 when Region 1’s ALB is unhealthy. Which Route 53 design best meets this requirement?

A.Use latency-based routing with two alias records; Route 53 will automatically shift traffic away from the unhealthy region.

B.Use weighted routing with weights 100/0 and update weights manually after detecting failures.

C.Use failover routing with two alias A records for the same name: one PRIMARY and one SECONDARY, both pointing to each region’s ALB; attach the health check to the PRIMARY record.

D.Use geolocation routing with a single alias record for Region 1, and enable EDNS Client Subnet to detect unhealthy endpoints.

AnswerC

Failover routing uses health checks to determine which record Route 53 should return. By creating PRIMARY and SECONDARY alias records and associating a health check with the PRIMARY ALB endpoint, Route 53 can automatically stop routing to Region 1 when the health check fails and route to Region 2 until Region 1 recovers.

Why this answer

Option C is correct because Route 53 failover routing with a PRIMARY and SECONDARY alias record allows automatic failover when the health check attached to the PRIMARY record fails. The health check monitors Region 1's ALB, and upon failure, Route 53 returns the SECONDARY record's IP (Region 2's ALB) to clients. This design meets the requirement for automatic failover without manual intervention.

Exam trap

The trap here is that candidates often assume latency-based or geolocation routing provides a primary/standby arrangement. Those policies can honor health checks, but they choose a record by client latency or location, so they cannot guarantee that every client uses Region 1 until it fails. Failover routing is the policy built for an explicit PRIMARY and SECONDARY.

How to eliminate wrong answers

Option A is wrong because latency-based routing sends each client to whichever Region has the lowest latency, so it does not route clients to Region 1 by default. It can skip an unhealthy record when a health check or Evaluate Target Health is configured, but it never provides an explicit primary/standby order. Option B is wrong because weighted routing with manual weight updates requires human intervention to detect failures and change weights, which violates the 'automatically fail over' requirement. Option D is wrong because geolocation routing routes based on client location, not endpoint health, and EDNS Client Subnet only improves location accuracy; it does not trigger failover when an endpoint becomes unhealthy.
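
As a rough sketch, the PRIMARY/SECONDARY pair from option C could be expressed in the shape of a Route 53 `ChangeResourceRecordSets` change batch. The helper names, record name, ALB DNS names, zone IDs, and health check ID are all placeholders chosen for illustration.

```python
def failover_record(name, role, alb_dns, alb_zone_id, health_check_id=None):
    """Build one failover alias A record as a change-batch entry."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": role.lower(),
        "Failover": role,
        "AliasTarget": {
            "HostedZoneId": alb_zone_id,   # the ALB's hosted zone, not your own
            "DNSName": alb_dns,
            "EvaluateTargetHealth": True,
        },
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


def failover_change_batch(name, primary, secondary, health_check_id):
    """primary and secondary are (alb_dns, alb_zone_id) tuples."""
    return {"Changes": [
        failover_record(name, "PRIMARY", *primary, health_check_id=health_check_id),
        failover_record(name, "SECONDARY", *secondary),
    ]}
```

Both records share the same name and type; only the `Failover` role and the alias target differ, which is what lets Route 53 swap answers when the PRIMARY is unhealthy.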

693

MCQmedium

A static web application uses CloudFront with an S3 origin for assets (JavaScript, CSS, images). After deploying a new frontend build, the CloudFront cache hit ratio dropped significantly because the S3 origin receives many repeated requests for the same assets. The team notices that requests now include the Authorization header in asset requests. Which change is most likely to restore cache efficiency and reduce origin request costs?

A.Keep the Authorization header but increase the cache TTL to 1 year to reduce revalidation frequency.

B.Update the CloudFront cache policy so that Authorization is excluded from the cache key for static asset paths.

C.Remove CloudFront and serve assets directly from the S3 website endpoint to reduce CloudFront charges.

D.Switch the S3 origin from private access to public access so CloudFront can cache assets more effectively.

AnswerB

When Authorization is part of the cache key, each unique token can create separate cache entries, lowering the cache hit ratio and increasing origin requests. Excluding Authorization from the cache key (and typically from the origin request policy for static assets) allows caching to be based on the URL path/query string, improving hit ratio and reducing S3 origin load.

Why this answer

The drop in cache hit ratio is caused by the Authorization header being included in asset requests, which makes each request unique from CloudFront's perspective, preventing cache reuse. By updating the CloudFront cache policy to exclude the Authorization header from the cache key for static asset paths, CloudFront can treat identical asset requests as cache hits, restoring cache efficiency and reducing origin load.

Exam trap

The trap here is that candidates may assume increasing TTL or making the origin public solves caching issues, but the real problem is the cache key variation caused by the Authorization header, which must be explicitly excluded from the cache policy for static content.

How to eliminate wrong answers

Option A is wrong because increasing the TTL to 1 year does not address the root cause—the Authorization header still varies the cache key, so requests will continue to miss cache and revalidate unnecessarily. Option C is wrong because removing CloudFront and serving assets directly from the S3 website endpoint would eliminate caching entirely, increasing origin request costs and latency, not reducing them. Option D is wrong because switching the S3 origin from private to public access does not affect CloudFront's ability to cache; the cache key issue with the Authorization header remains, and public access introduces security risks without solving the problem.
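
The effect of the Authorization header on the cache key can be shown with a toy model. This is not CloudFront's implementation, only a simplified sketch in which the cache key is the path plus whichever headers the policy names; `cache_key` and `hit_ratio` are made-up helpers.

```python
def cache_key(path, headers, key_headers=()):
    """Cache key = path plus the values of only the headers in key_headers."""
    lowered = {k.lower(): v for k, v in headers.items()}
    parts = [path]
    for name in sorted(h.lower() for h in key_headers):
        parts.append(f"{name}={lowered.get(name, '')}")
    return "|".join(parts)


def hit_ratio(requests, key_headers=()):
    """Replay (path, headers) requests against an empty cache; return hits / total."""
    seen = set()
    hits = 0
    for path, headers in requests:
        key = cache_key(path, headers, key_headers)
        if key in seen:
            hits += 1
        else:
            seen.add(key)
    return hits / len(requests)
```

With ten requests for the same asset that each carry a different token, keying on Authorization yields no hits, while leaving it out of the key yields nine hits out of ten.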

694

MCQmedium

A dev sandbox has unpredictable DynamoDB traffic with long idle periods and occasional spikes. Which capacity mode should minimize operational overhead and avoid paying for idle provisioned capacity? The design must avoid adding custom operational scripts.

A.Reserved capacity for maximum daily traffic

B.Provisioned capacity set for peak traffic

C.DynamoDB on-demand capacity mode

D.Global tables in every Region

AnswerC

On-demand capacity is suitable for unpredictable workloads and charges per request without capacity planning.

Why this answer

DynamoDB on-demand capacity mode is ideal for unpredictable workloads with long idle periods and occasional spikes because it automatically scales to handle traffic without requiring any capacity planning or provisioning. You pay only for the reads and writes you actually perform, eliminating the cost of idle provisioned capacity and the operational overhead of managing scaling scripts or alarms.

Exam trap

The trap here is that candidates may confuse 'reserved capacity' with DynamoDB's reserved capacity pricing model (which is actually a commitment discount for provisioned mode) or assume that provisioned capacity with auto-scaling is sufficient, but auto-scaling still requires setting minimum and maximum values and can incur costs for idle provisioned capacity during low-traffic periods.

How to eliminate wrong answers

Option A is wrong because DynamoDB reserved capacity is a one- or three-year commitment discount on provisioned capacity; sizing it for maximum daily traffic means paying for that capacity through every idle period. Option B is wrong because Provisioned capacity set for peak traffic would incur costs for unused capacity during idle periods and would require manual scaling or custom scripts to adjust capacity, violating the requirement to avoid custom operational scripts. Option D is wrong because Global tables replicate data across multiple Regions for disaster recovery or low-latency global access, which adds complexity and cost without addressing the core issue of unpredictable traffic and idle capacity waste.
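
For reference, on-demand mode is selected with a single setting at table creation. The sketch below builds keyword arguments in the shape accepted by the DynamoDB `CreateTable` API; the helper name, table name, and key name are placeholders.

```python
def on_demand_table_params(table_name, partition_key):
    """Build CreateTable arguments for a table billed per request."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": partition_key, "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": partition_key, "KeyType": "HASH"},
        ],
        # On-demand mode: no ProvisionedThroughput block and no capacity planning.
        "BillingMode": "PAY_PER_REQUEST",
    }
```

With boto3, these arguments would typically be passed as `client.create_table(**params)`.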

695

MCQmedium

A SaaS platform plans to run in two AWS Regions for lower latency. The team wants to enable active-active writes (both regions accept updates) to avoid failover downtime. However, the business requires strong consistency for order status transitions (for example, only one transition from “Paid” to “Shipped” must be allowed). Which statement is the best architectural choice to meet the consistency requirement?

A.Use active-active writes only when the workload tolerates eventual consistency; for strongly consistent transitions, use a single-writer pattern with failover (active-passive/pilot light).

B.Active-active writes always provide strong consistency because AWS replicates data across Regions automatically and immediately.

C.Active-active writes can be used safely by simply enabling retries and expecting the application to resolve conflicts without coordination.

D.To ensure strong consistency, run both Regions with different IAM roles and block cross-Region writes at the API layer only.

AnswerA

Strong consistency requirements typically conflict with multi-master active-active replication semantics, so single-writer designs are safer.

Why this answer

Option A is correct because active-active writes across AWS Regions generally cannot guarantee strong consistency: cross-Region replication is asynchronous, so two Regions can accept conflicting updates before either sees the other's write. For order status transitions that must happen exactly once (e.g., only one transition from 'Paid' to 'Shipped'), a single-writer pattern (active-passive or pilot light) ensures that only one Region accepts writes at a time, avoiding conflicts and maintaining a single source of truth. In their default configuration, DynamoDB global tables replicate asynchronously and resolve conflicting writes with last-writer-wins, while Aurora Global Database keeps a single writer Region with read-only secondary Regions that can be promoted during failover.

Exam trap

The trap here is that candidates assume AWS's global services (like DynamoDB global tables or Aurora Global Database) inherently provide strong consistency for multi-Region writes, when in fact their cross-Region replication is asynchronous by default and strict ordering requirements call for careful trade-offs.

How to eliminate wrong answers

Option B is wrong because AWS does not replicate data across Regions automatically and immediately for strong consistency; cross-Region replication is asynchronous by default (e.g., DynamoDB global tables are eventually consistent across Regions in their default mode, and S3 Cross-Region Replication is asynchronous). Option C is wrong because retries alone cannot resolve conflicts in an active-active write scenario; without a distributed consensus protocol (like Paxos or Raft) or a conflict-resolution mechanism, concurrent writes can lead to inconsistent states (e.g., the same order being marked 'Shipped' in both Regions). Option D is wrong because using different IAM roles and blocking cross-Region writes at the API layer does not coordinate the two Regions; both could still accept updates to the same order independently, leading to conflicts, and IAM does not order writes across Regions.
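
The single-writer idea can be illustrated with an in-memory stand-in for a store that applies a status change only when the current status matches the expected one. `OrderStore` and its methods are invented for this sketch; a real system would rely on the database's own conditional-write or transaction feature in the one Region that accepts writes.

```python
import threading


class OrderStore:
    """In-memory stand-in for a single-writer store with conditional updates."""

    def __init__(self):
        self._status = {}
        self._lock = threading.Lock()

    def put(self, order_id, status):
        with self._lock:
            self._status[order_id] = status

    def transition(self, order_id, expected, new):
        """Apply the change only if the current status equals `expected`."""
        with self._lock:
            if self._status.get(order_id) != expected:
                return False
            self._status[order_id] = new
            return True
```

Because every write passes through one store, a second attempt to move the same order from 'Paid' to 'Shipped' is rejected instead of silently succeeding in another Region.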

696

MCQeasy

A consumer application reads from an Amazon SQS queue. Some messages have an invalid format and always fail processing. They are retried repeatedly and consume consumer capacity. What is the best way to prevent these "poison pill" messages from blocking normal processing?

A.Enable long polling and increase the maximum message retention to 30 days.

B.Configure a dead-letter queue (DLQ) with a redrive policy and a maxReceiveCount.

C.Switch the queue to FIFO and disable retries in the consumer code.

D.Delete the main queue and recreate it after every failure.

AnswerB

A DLQ with a redrive policy isolates poison-pill messages. After a message fails processing and is received more than maxReceiveCount times, SQS stops returning it to the main queue and moves it to the DLQ. Normal messages continue to be processed without repeatedly consuming consumer capacity.

Why this answer

Option B is correct because a dead-letter queue (DLQ) with a redrive policy and a maxReceiveCount allows messages that repeatedly fail processing to be moved to a separate queue after a specified number of receive attempts. This prevents poison pill messages from being retried indefinitely, freeing consumer capacity for valid messages. Amazon SQS automatically redirects messages to the DLQ once the maxReceiveCount threshold is exceeded, ensuring normal processing is not blocked.

Exam trap

The trap here is that candidates may think increasing retention or polling settings will solve the problem, but they fail to recognize that only a DLQ with a redrive policy isolates repeatedly failing messages from consuming consumer capacity.

How to eliminate wrong answers

Option A is wrong because enabling long polling and increasing maximum message retention does not address the root cause of invalid messages; it only reduces empty responses and keeps messages longer, but poison pills will still be retried. Option C is wrong because switching to a FIFO queue does not prevent poison pills; FIFO ensures exactly-once processing but still retries failed messages, and disabling retries in consumer code would cause message loss without moving them to a DLQ. Option D is wrong because deleting and recreating the main queue after every failure is disruptive, loses all messages, and does not provide a systematic way to isolate or inspect poison pills.
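
A redrive policy is a small JSON document attached to the source queue. The sketch below builds the queue attribute in the form SQS expects; the helper name, the DLQ ARN, and the default of 5 receives are illustrative assumptions.

```python
import json


def redrive_attributes(dlq_arn, max_receive_count=5):
    """Build the Attributes map that attaches a DLQ to a source queue."""
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": max_receive_count,
    }
    # SQS takes the redrive policy as a JSON string under the RedrivePolicy key.
    return {"RedrivePolicy": json.dumps(policy)}
```

With boto3, the result would typically be passed as the `Attributes` argument of `set_queue_attributes` on the main queue.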

697

Multi-Selecthard

An application stores user-uploaded binaries in S3. Access is unpredictable for the first month, then most objects become cold. The team wants the cheapest approach that avoids manually guessing access patterns. Which two actions are best? Select two.

Select 2 answers

A.Enable S3 Intelligent-Tiering on the bucket.

B.Keep all objects in S3 Standard because lifecycle transitions add too much management.

C.Add a lifecycle rule to move very old objects to S3 Glacier Deep Archive when minute-level retrieval is no longer required.

D.Copy all binaries to Amazon EFS so retrieval is faster.

E.Disable versioning because S3 Intelligent-Tiering needs it to work.

AnswersA, C

Correct. Intelligent-Tiering is designed for objects with uncertain or changing access patterns. It automatically moves data between access tiers, reducing the need for manual guessing and avoiding overpaying for standard storage.

Why this answer

A is correct because S3 Intelligent-Tiering automatically moves objects between access tiers based on changing access patterns, eliminating the need to manually guess or configure lifecycle rules. It charges a small monthly monitoring fee per object but avoids the higher cost of keeping cold data in S3 Standard, making it the cheapest hands-off approach for unpredictable access followed by cold storage.

Exam trap

The trap here is twofold: assuming that the only way to save money is to hand-tune lifecycle rules around guessed access patterns, when S3 Intelligent-Tiering moves objects between tiers automatically, and assuming that Intelligent-Tiering depends on a particular versioning setting, which it does not.
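
Option C's lifecycle rule could look roughly like the following, written in the shape of an S3 bucket lifecycle configuration. The helper name, rule ID, and the 365-day threshold are placeholders; the right age depends on when slow retrieval becomes acceptable.

```python
def deep_archive_rule(days, prefix=""):
    """Build a lifecycle configuration that moves old objects to Deep Archive."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                "ID": f"deep-archive-after-{days}-days",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},   # empty prefix = whole bucket
                "Transitions": [
                    {"Days": days, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }
```

New uploads would separately be written with the `INTELLIGENT_TIERING` storage class so that tiering during the unpredictable first month needs no manual rules.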

698

MCQmedium

A Lambda function in Account A must upload reports to an S3 bucket in Account B. Security does not want long-lived access keys anywhere, and the access should be easy to revoke from Account B. Which approach is best?

A.Create an IAM role in Account B that Account A can assume through STS, then grant the role S3 permissions.

B.Create an IAM user in Account B and store its access keys in Lambda environment variables.

C.Attach a security group to the Lambda function that allows outbound traffic to the bucket.

D.Use AWS Organizations SCPs to grant the Lambda function permission to write to the bucket.

AnswerA

Cross-account role assumption with AWS STS is the standard way to grant temporary access without sharing long-lived credentials. By placing the permissions on a role in Account B and controlling the trust policy there, the bucket-owning account keeps central control and can revoke access by changing the trust relationship or permissions. The Lambda execution role in Account A assumes the role when needed and receives short-lived credentials only.

Why this answer

Option A is correct because it uses a cross-account IAM role with AWS Security Token Service (STS) to give the Lambda function temporary credentials. This avoids long-lived access keys, and Account B can revoke access on its own by changing the role's trust policy or permissions, or by deleting the role, which meets the security requirements.

Exam trap

The trap here is that candidates may confuse security groups (network-layer controls) with IAM policies (identity-based access), or mistakenly think SCPs can grant cross-account permissions when they only act as guardrails.

How to eliminate wrong answers

Option B is wrong because storing IAM user access keys in Lambda environment variables creates long-lived credentials that violate the 'no long-lived access keys' requirement and are harder to revoke without deleting the user. Option C is wrong because security groups control network traffic at the instance/ENI level, not S3 bucket access; S3 uses IAM policies, not security groups, for authorization. Option D is wrong because AWS Organizations SCPs set the maximum permissions available to accounts in an organization; they act as guardrails and cannot grant a Lambda function access to an S3 bucket, only restrict it.
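
The two policy documents behind option A can be sketched as plain dictionaries: a trust policy on the Account B role naming who may assume it, and a permissions policy limiting what the role can do. The helper names, the role ARN, the bucket name, and the `reports/` prefix are illustrative assumptions.

```python
def trust_policy(lambda_role_arn):
    """Trust policy for the Account B role: only the given principal may assume it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": lambda_role_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def write_policy(bucket, prefix="reports/"):
    """Permissions policy for the role: upload objects under one prefix only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }
```

Both documents live in Account B, so removing the principal from the trust policy or detaching the permissions policy cuts off access without touching Account A.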

699

Multi-Selecthard

A private application in two private subnets must download objects from S3 and read parameters from Systems Manager Parameter Store without routing traffic through the public internet. Which two components should the architect use?

Select 2 answers

A.Interface VPC endpoint for Systems Manager

B.Internet gateway attached to the VPC

C.NAT gateway in each Availability Zone

D.Gateway VPC endpoint for Amazon S3

AnswersA, D

Systems Manager/Parameter Store access uses interface endpoints powered by AWS PrivateLink.

Why this answer

Interface VPC endpoints (AWS PrivateLink) enable private connectivity to Systems Manager Parameter Store by creating an elastic network interface in the subnet with a private IP, allowing the application to read parameters without traversing the internet. Gateway VPC endpoints for S3 provide private access to S3 objects via route table entries, using the S3 public IP space but staying within the AWS network, avoiding the need for an internet gateway or NAT gateway.

Exam trap

The trap here is that candidates often confuse gateway endpoints (used for S3 and DynamoDB) with interface endpoints (used for most other AWS services), and may incorrectly assume a NAT gateway or internet gateway is needed for private subnet outbound traffic, ignoring that gateway endpoints work via route tables without public IPs.
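
The difference between the two endpoint types shows up in the request parameters. The sketch below builds arguments in the shape of the EC2 `CreateVpcEndpoint` API for both endpoints; the helper name and all resource IDs are placeholders.

```python
def endpoint_requests(vpc_id, region, subnet_ids, security_group_ids, route_table_ids):
    """Return (ssm_interface_endpoint, s3_gateway_endpoint) request parameters."""
    ssm = {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.ssm",
        "SubnetIds": list(subnet_ids),              # one ENI per subnet
        "SecurityGroupIds": list(security_group_ids),
        "PrivateDnsEnabled": True,
    }
    s3 = {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),     # routes, not network interfaces
    }
    return ssm, s3
```

The interface endpoint is attached to subnets and security groups, while the gateway endpoint is attached to route tables, which mirrors the distinction the exam trap describes.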

700

MCQmedium

A company runs a stateful analytics workload on EC2 instances that use EBS volumes. The data must be restorable in another Region after a major outage, with frequent point-in-time recovery. Which approach provides the most suitable replication mechanism for the EBS-backed data?

A.Create scheduled EBS snapshots and copy them to another Region, then restore the volumes from those snapshots during recovery.

B.Enable EBS multi-attach to spread the workload across AZs and replicate snapshots automatically between Regions.

C.Use RDS read replicas in another Region and keep the analytics dataset in an RDS instance only.

D.Rely on instance store for durability and copy only AMIs across Regions.

AnswerA

Snapshotting and cross-Region copying gives point-in-time images of EBS volumes that can be restored in the target Region.

Why this answer

Option A is correct because scheduled EBS snapshots provide point-in-time recovery and can be copied to another Region for cross-region disaster recovery. When a major outage occurs, you can restore EBS volumes from those snapshots in the target Region, meeting the requirement for frequent restorable backups. This approach is native to AWS, cost-effective, and supports the stateful analytics workload without architectural changes.

Exam trap

The trap here is that candidates may confuse EBS multi-attach (which is for high availability within a single AZ) with cross-Region replication, or mistakenly think instance store can provide durable, restorable data across Regions.

How to eliminate wrong answers

Option B is wrong because EBS multi-attach allows a single EBS volume to be attached to multiple EC2 instances within the same Availability Zone, but it does not replicate snapshots across Regions or provide cross-region disaster recovery. Option C is wrong because RDS read replicas are for relational databases and cannot store or replicate arbitrary analytics datasets from EC2 instances with EBS volumes; this option misapplies RDS to a non-database workload. Option D is wrong because instance store volumes are ephemeral and lose data on instance stop or termination, making them unsuitable for durable data that must be restorable after an outage; copying AMIs across Regions does not preserve the analytics data stored on instance store.
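
During recovery, the team has to pick which copied snapshot to restore from. A minimal sketch of that selection, assuming each snapshot is represented as a hypothetical `(snapshot_id, completed_at)` pair:

```python
from datetime import datetime, timezone


def latest_snapshot_before(snapshots, recovery_point):
    """Return the ID of the newest snapshot completed at or before recovery_point.

    `snapshots` is an iterable of (snapshot_id, completed_at) pairs; None is
    returned when no snapshot is old enough.
    """
    candidates = [s for s in snapshots if s[1] <= recovery_point]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s[1])[0]
```

The gap between the chosen snapshot and the outage is the data that cannot be recovered, which is why the snapshot schedule has to match the required recovery point.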

701

MCQmedium

A company serves versioned images from S3 through CloudFront. After a release, CloudFront origin fetches increased sharply and the monthly CloudFront bill went up. They reviewed CloudFront logs and found that many requests include a query string parameter `reqId` that is unique per request (for example, `...?v=2026-04-01&reqId=...`). The team currently forwards all query strings to the cache key. What change is most likely to reduce origin fetches and cost while keeping the versioned images correct?

A.Update the CloudFront cache policy to ignore `reqId` and include only the stable `v` query string parameter in the cache key.

B.Lower the CloudFront minimum TTL to 0 seconds so cached objects revalidate more often, reducing origin fetch volume.

C.Set the S3 bucket to use compression and enable S3 Transfer Acceleration to reduce origin fetch charges.

D.Disable forwarding of the query string to the origin, but keep using the full query string (including `reqId`) in the cache key.

AnswerA

Because `reqId` is unique per request, including it in the cache key prevents cache reuse (each request maps to a different cache entry), resulting in frequent origin fetches. Excluding `reqId` and keeping only `v` allows many requests for the same version to share cached objects, reducing origin traffic and cost while preserving correct version behavior.

Why this answer

Option A is correct because the `reqId` query string parameter is unique per request, which forces CloudFront to treat each request as a distinct cache object when all query strings are forwarded to the cache key. By configuring the cache policy to include only the stable `v` parameter (the version identifier) and ignore `reqId`, CloudFront can serve cached responses for all requests with the same `v` value, drastically reducing origin fetches and lowering costs. This approach preserves correct versioned image delivery because the `v` parameter still differentiates between image versions.

Exam trap

The trap here is that candidates may think forwarding all query strings is harmless or that lowering TTL helps reduce origin fetches, but the real issue is cache key fragmentation caused by unique parameters like `reqId`.

How to eliminate wrong answers

Option B is wrong because lowering the minimum TTL to 0 seconds would cause CloudFront to revalidate cached objects more frequently, increasing origin fetches and costs, which is the opposite of the desired outcome. Option C is wrong because enabling S3 Transfer Acceleration and compression reduces data transfer latency and size but does not address the root cause of excessive origin fetches caused by unique query strings in the cache key. Option D is wrong because disabling forwarding of the query string to the origin while keeping the full query string (including `reqId`) in the cache key would still create unique cache objects for each `reqId`, failing to reduce origin fetches.
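
The fix in option A amounts to normalizing the query string before it becomes part of the cache key. The following is a simplified model rather than CloudFront's actual logic; `normalized_cache_key` and the example URL are made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def normalized_cache_key(url, allowed_params=("v",)):
    """Keep only allow-listed query parameters when forming the cache key."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key in allowed_params
    )
    query = urlencode(kept)
    return parts.path + ("?" + query if query else "")
```

Two requests that differ only in `reqId` now map to the same key, while a change to `v` still produces a new key and therefore a fresh object.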

702

MCQmedium

A mobile game backend uses Amazon Aurora. The workload has many short-lived database connections from Lambda functions, causing connection storms. What should be added?

A.An internet gateway

B.S3 Select

C.RDS Proxy

D.A larger Route 53 hosted zone

AnswerC

RDS Proxy pools and manages database connections, improving scalability for serverless and bursty workloads.

Why this answer

RDS Proxy is the correct solution because it sits between Lambda functions and the Aurora database, pooling and reusing database connections. This prevents connection storms by reducing the overhead of establishing new connections for each short-lived Lambda invocation, and it also helps manage IAM authentication for Lambda functions without storing database credentials.

Exam trap

The trap here is that candidates may think scaling the database (e.g., increasing instance size) is the answer, but the question specifically targets connection management, not compute or storage capacity, and RDS Proxy is the AWS-managed service designed exactly for this use case.

How to eliminate wrong answers

Option A is wrong because an internet gateway is used to enable VPC-to-internet communication, not to manage database connection pooling or reduce connection storms. Option B is wrong because S3 Select is a service for retrieving subsets of data from objects in S3 using SQL-like expressions, and it has no role in database connection management. Option D is wrong because a larger Route 53 hosted zone increases the number of DNS records you can host but does not affect database connection handling or reduce connection storms.
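
Connection pooling itself is a simple idea, shown here with a toy pool. This illustrates the general technique only; it is not how RDS Proxy is implemented, and `ConnectionPool` is an invented class.

```python
class ConnectionPool:
    """Toy pool: reuses idle connections instead of opening one per request."""

    def __init__(self, open_connection, max_size):
        self._open = open_connection
        self._max_size = max_size
        self._idle = []
        self.opened = 0   # how many real connections have been created

    def acquire(self):
        if self._idle:
            return self._idle.pop()
        if self.opened >= self._max_size:
            raise RuntimeError("pool exhausted")
        self.opened += 1
        return self._open()

    def release(self, conn):
        self._idle.append(conn)
```

A hundred short-lived requests that each acquire and release a connection end up sharing a single database connection, which is the load reduction a proxy provides to the database.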

703

Multi-Selecthard

Select 2 answers

A.Interface VPC endpoint for Systems Manager

B.Internet gateway attached to the VPC

C.NAT gateway in each Availability Zone

D.Gateway VPC endpoint for Amazon S3

AnswersA, D

Systems Manager/Parameter Store access uses interface endpoints powered by AWS PrivateLink.

Why this answer

An Interface VPC endpoint for Systems Manager (SSM) allows private subnets to communicate with AWS Systems Manager Parameter Store over the AWS network using private IP addresses, without traversing the internet. This endpoint uses AWS PrivateLink, enabling secure and private access to SSM APIs, which is required for reading parameters from Parameter Store.

Exam trap

The trap here is that candidates often confuse Gateway VPC endpoints (used for S3 and DynamoDB) with Interface VPC endpoints (used for most other AWS services like Systems Manager), and may incorrectly assume a NAT gateway or internet gateway is needed for private subnet access to AWS services.

704

Multi-Selecthard

A media company serves versioned JavaScript and CSS from an S3 origin through CloudFront. After a release, the cache hit ratio drops because the SPA sends an Authorization header and several tracking query strings on every request, even though the assets are public and identical for all users. Which changes would most improve cache efficiency without changing the content returned? Select three.

Select 3 answers

A.Create a CloudFront cache policy that excludes the Authorization header from the cache key when the assets do not require per-user authorization.

B.Use versioned object names for each release and apply long cache TTLs so viewers reuse the same objects until the content changes.

C.Use a cache policy that forwards only required query strings and ignores the tracking parameters that do not affect object content.

D.Place the S3 origin behind an Application Load Balancer so CloudFront can reuse more cached responses.

E.Enable S3 Transfer Acceleration to increase the cache hit ratio for repeated browser requests.

AnswersA, B, C

Correct because an unnecessary Authorization header fragments the cache into many unique variants. If the files are truly public and identical, CloudFront should not vary the cache key on that header.

Why this answer

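Option A is correct because when the cache policy in use includes the Authorization header in the cache key, CloudFront creates a separate cache entry for each distinct token even though the content is public. A cache policy that excludes the Authorization header lets CloudFront treat all requests for the same object as identical, which sharply improves the cache hit ratio without affecting the content served. B and C apply the same principle: versioned object names with long TTLs keep objects cached until a release changes them, and forwarding only the required query strings stops tracking parameters from fragmenting the cache.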

Exam trap

The trap here is that candidates may think adding an ALB or enabling Transfer Acceleration improves caching performance, but these services address availability and speed, not cache key efficiency, which is the root cause of low hit ratios.
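
Options A and C can be combined in one cache policy. The sketch below builds a configuration in the shape of CloudFront's `CachePolicyConfig`; the helper name, policy name, TTL values, and the list of required query strings are assumptions for illustration.

```python
def static_asset_cache_policy(name, required_query_strings, max_ttl=31536000):
    """Cache policy config: no headers or cookies in the key, allow-listed query strings."""
    items = list(required_query_strings)
    if items:
        query_strings = {
            "QueryStringBehavior": "whitelist",
            "QueryStrings": {"Quantity": len(items), "Items": items},
        }
    else:
        query_strings = {"QueryStringBehavior": "none"}
    return {
        "Name": name,
        "MinTTL": 0,
        "DefaultTTL": 86400,
        "MaxTTL": max_ttl,
        "ParametersInCacheKeyAndForwardedToOrigin": {
            "EnableAcceptEncodingGzip": True,
            "EnableAcceptEncodingBrotli": True,
            # Authorization is left out of the key because no header is listed.
            "HeadersConfig": {"HeaderBehavior": "none"},
            "CookiesConfig": {"CookieBehavior": "none"},
            "QueryStringsConfig": query_strings,
        },
    }
```

Tracking parameters that are not in the allow list no longer create separate cache entries, and the long maximum TTL pairs naturally with versioned object names from option B.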

705

MCQmedium

A media processing workflow uses CloudWatch Logs heavily. Retaining all debug logs forever is increasing costs. What should be configured? The design must avoid adding custom operational scripts.

A.Route 53 health checks

B.CloudWatch Logs retention policies per log group

C.CloudWatch detailed monitoring on all instances

D.AWS Config aggregation

AnswerB

Retention policies automatically delete older logs after the required period.

Why this answer

Option B is correct because CloudWatch Logs retention policies allow you to set a time-based expiration (e.g., 30 days) on log groups, automatically deleting old log events. This directly reduces storage costs without requiring custom scripts, as the retention policy is a native CloudWatch Logs feature configured per log group.

Exam trap

The trap here is that candidates may confuse CloudWatch Logs retention policies with CloudWatch metrics retention or detailed monitoring, thinking that reducing metric granularity will lower log storage costs, when in fact log retention is a separate, per-log-group setting.

How to eliminate wrong answers

Option A is wrong because Route 53 health checks monitor endpoint availability and DNS routing, not log retention or cost optimization. Option C is wrong because CloudWatch detailed monitoring increases metric frequency (1-minute intervals) and incurs additional costs, but does not manage log retention or deletion. Option D is wrong because AWS Config aggregation centralizes resource configuration snapshots and compliance rules, not log lifecycle management.
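
Applying a retention policy is one call per log group. A minimal sketch that prepares the parameters in the shape of the CloudWatch Logs `PutRetentionPolicy` API follows; the helper name and log group names are placeholders, and the tuple holds only a partial set of commonly used retention values, not the full list the service accepts.

```python
# Partial set of commonly used retention periods, in days (an assumption for
# this sketch; the service accepts additional values).
COMMON_RETENTION_DAYS = (1, 3, 5, 7, 14, 30, 60, 90, 180, 365)


def retention_requests(log_groups, days=30):
    """Build one PutRetentionPolicy parameter set per log group."""
    if days not in COMMON_RETENTION_DAYS:
        raise ValueError(f"unsupported retention in this sketch: {days}")
    return [
        {"logGroupName": group, "retentionInDays": days}
        for group in log_groups
    ]
```

Once set, the service expires old log events on its own, so no scheduled cleanup script is needed.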

706

MCQmedium

A risk simulation workload uses CloudWatch Logs heavily. Retaining all debug logs forever is increasing costs. What should be configured? The design must avoid adding custom operational scripts.

A.CloudWatch Logs retention policies per log group

B.AWS Config aggregation

C.CloudWatch detailed monitoring on all instances

D.Route 53 health checks

AnswerA

Retention policies automatically delete older logs after the required period.

Why this answer

CloudWatch Logs retention policies allow you to set per-log-group expiration rules (e.g., 30 days, 90 days) to automatically delete old log events, directly reducing storage costs without custom scripts. Since the workload uses CloudWatch Logs heavily and retains debug logs forever, configuring a retention policy on each log group is the simplest, most cost-effective solution that requires no operational overhead.

Exam trap

The trap here is that candidates may confuse cost optimization features (like retention policies) with monitoring or compliance tools (like AWS Config or detailed monitoring), assuming that any AWS service that 'monitors' can also reduce log storage costs.

How to eliminate wrong answers

Option B is wrong because AWS Config aggregation is used to consolidate configuration and compliance data from multiple accounts/regions, not to manage log retention or cost. Option C is wrong because CloudWatch detailed monitoring on EC2 instances collects metrics at 1-minute intervals (vs. 5-minute basic), which increases costs and does not affect log retention or deletion. Option D is wrong because Route 53 health checks monitor endpoint availability and DNS routing, not log storage or lifecycle management.

707

MCQmedium

A company is deploying a high-performance computing (HPC) cluster with 16 EC2 instances. The workload requires the lowest possible network latency and highest throughput between all nodes for tightly coupled parallel MPI computations. Which EC2 placement group type should a solutions architect recommend?

A.Cluster placement group

B.Partition placement group

C.Spread placement group

D.No placement group — use Auto Scaling across multiple AZs

AnswerA

Cluster PGs place instances physically close together in a single AZ for lowest latency and highest throughput. They support EFA for MPI-level performance — the standard HPC choice.

Why this answer

Cluster placement groups pack instances physically close together within a single Availability Zone, providing the lowest possible network latency and highest network throughput between instances. They support enhanced networking (SR-IOV) and Elastic Fabric Adapter (EFA) for inter-node MPI communication.

Tightly coupled parallel HPC workloads require all nodes to communicate frequently with minimal latency. Cluster placement groups are specifically designed for this use case. The trade-off is all instances are in one AZ — if the AZ fails, the entire cluster is affected.

Exam trap

Spread and Partition placement groups improve availability by distributing instances across racks or partitions. That distribution increases inter-node distance, which increases latency. For tightly coupled HPC that needs the lowest possible inter-node latency, performance takes priority over availability. Cluster PG = maximum performance in one AZ. Spread PG = maximum isolation across racks.

Why the other options are wrong

Partition PGs distribute instances across separate hardware racks to reduce rack-failure impact. Instances in different partitions have higher inter-node latency. Designed for distributed databases (Hadoop, Cassandra, Kafka), not tightly coupled HPC.

Spread PGs place each instance on a distinct hardware rack for maximum isolation. Instances are intentionally spread further apart, increasing latency — the opposite of what HPC requires.

Multi-AZ Auto Scaling distributes instances across AZs for availability. Cross-AZ networking has higher latency than traffic within a single AZ. For tightly coupled HPC that needs the lowest possible inter-node latency, all instances should be in the same AZ within a Cluster PG.

708

MCQeasy

Based on the exhibit, the web tier becomes unavailable if us-west-2a has an outage. What is the best change to improve resilience with the least redesign?

A.Increase the Auto Scaling group desired capacity from 2 to 3 in the same subnet.

B.Attach the Application Load Balancer and Auto Scaling group to subnets in a second Availability Zone.

C.Replace the Application Load Balancer with a Network Load Balancer.

D.Increase the health check grace period so instances stay registered longer.

AnswerB

Spanning the load balancer and Auto Scaling group across at least two Availability Zones removes the single-AZ dependency shown in the exhibit. If us-west-2a fails, the remaining AZ can continue serving traffic and Auto Scaling can replace unhealthy instances there. This is the smallest architectural change that directly improves availability.

Why this answer

The web tier is currently deployed in a single Availability Zone (us-west-2a), so an outage of that AZ makes the entire tier unavailable. By attaching the Application Load Balancer and Auto Scaling group to subnets in a second Availability Zone, the application can continue serving traffic from the healthy AZ, achieving high availability with minimal architectural changes. This is the standard AWS best practice for multi-AZ resilience.

Exam trap

The trap here is that candidates may think increasing instance count or changing load balancer type improves resilience, but the core issue is the single-AZ deployment, which only multi-AZ subnets can fix.

How to eliminate wrong answers

Option A is wrong because increasing the desired capacity to 3 in the same subnet still keeps all instances in a single Availability Zone; an AZ outage would still take down all instances. Option C is wrong because replacing the Application Load Balancer with a Network Load Balancer does not address the single-AZ failure; both ALB and NLB can operate across AZs, but the issue is the lack of multi-AZ subnets, not the load balancer type. Option D is wrong because the health check grace period only delays when Auto Scaling starts acting on health checks for newly launched instances; it does not prevent the loss of all instances when the entire AZ fails.

709

MCQmedium

A media company runs a 24/7 recommendation engine on EC2 in one AWS Region. The workload is interruption-intolerant, and the team expects steady usage but may change instance families and sizes during planned optimizations. Compared to the current On-Demand setup, they want the lowest cost while avoiding the rigidity of locking to a specific instance type. What should the solutions architect recommend?

A.Switch the instances to Spot Instances and use interruption handling because it is the largest discount.

B.Purchase a Compute Savings Plan for the expected steady hourly usage in that Region.

C.Purchase a Standard Reserved Instance tied to a single specific instance type for the next 3 years.

D.Keep On-Demand and rely on Auto Scaling to reduce capacity when utilization is low.

AnswerB

Compute Savings Plans discount the usage while allowing flexibility across instance families and sizes in the Region.

Why this answer

A Compute Savings Plan offers the lowest cost for steady-state usage without locking to a specific instance type. It provides up to a 66% discount over On-Demand and keeps applying when the team changes instance family, size, OS, tenancy, or even Region. This matches the requirement for cost savings with instance flexibility during planned optimizations.

Exam trap

The trap here is that candidates often confuse Reserved Instances with Savings Plans, assuming a Standard Reserved Instance is the only way to get significant discounts, but the question explicitly requires flexibility to change instance families, which a Compute Savings Plan provides while a Standard Reserved Instance does not.

How to eliminate wrong answers

Option A is wrong because Spot Instances can be interrupted with a 2-minute warning, making them unsuitable for an interruption-intolerant workload that runs 24/7. Option C is wrong because a Standard Reserved Instance is tied to a specific instance family (and, for a zonal reservation, a specific AZ) for the whole term, which contradicts the requirement to avoid rigidity and change instance families during optimizations. Option D is wrong because keeping On-Demand provides no discount, and Auto Scaling reduces capacity only when utilization is low, which does not lower the cost of steady usage.

710

MCQmedium

You serve private reports stored in an S3 bucket through CloudFront. After a recent change, users report that they can access the S3 object URLs directly (bypassing CloudFront), which violates your design. You want to ensure S3 objects are readable only through CloudFront using Origin Access Control (OAC), even if someone guesses the S3 URL. Which update best enforces this at the S3 bucket level?

A.Add a bucket policy Allow for s3:GetObject only when the principal is cloudfront.amazonaws.com and aws:SourceArn matches your CloudFront distribution ARN, while blocking public access.

B.Enable an S3 bucket lifecycle policy to transition objects to Glacier, so public S3 URLs become inaccessible.

C.Rely only on CloudFront signed URLs validation; do not change the S3 bucket policy.

D.Add a WAF rule on CloudFront to block requests that contain "amazonaws.com" in the URL path.

AnswerA

CloudFront OAC requests are authorized via a CloudFront principal plus a SourceArn restriction. Coupled with public access blocking, this prevents direct S3 URL reads.

Why this answer

Option A is correct because it uses an S3 bucket policy that grants s3:GetObject access only when the principal is cloudfront.amazonaws.com and the aws:SourceArn matches the CloudFront distribution ARN. This ensures that only CloudFront, using Origin Access Control (OAC), can retrieve objects, blocking direct S3 URL access even if the URL is guessed. Blocking public access at the bucket level further prevents any anonymous or public reads.
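
The policy described above can be sketched as a small Python helper that produces the bucket policy JSON. The bucket name, account ID, and distribution ID are placeholders, and the statement is a minimal illustration, not a complete production policy.

```python
import json


def oac_bucket_policy(bucket: str, account_id: str, distribution_id: str) -> dict:
    """Allow s3:GetObject only to CloudFront acting for one distribution."""
    distribution_arn = (
        f"arn:aws:cloudfront::{account_id}:distribution/{distribution_id}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontOACRead",
                "Effect": "Allow",
                # The CloudFront service principal, not an IAM user or role.
                "Principal": {"Service": "cloudfront.amazonaws.com"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # Restricts the grant to requests made for this distribution.
                "Condition": {"StringEquals": {"AWS:SourceArn": distribution_arn}},
            }
        ],
    }


# Placeholder identifiers, used only for illustration.
policy = oac_bucket_policy("example-reports-bucket", "111122223333", "EDFDVBD6EXAMPLE")
print(json.dumps(policy, indent=2))
```

Because the policy contains no statement for anonymous principals and Block Public Access stays enabled, a request sent straight to the S3 URL has no matching Allow and is rejected.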

Exam trap

The trap here is that candidates may think CloudFront signed URLs alone are sufficient for security, but without a restrictive bucket policy, the S3 bucket remains publicly accessible, allowing direct URL access to bypass CloudFront.

How to eliminate wrong answers

Option B is wrong because a lifecycle policy to transition objects to Glacier does not prevent direct S3 URL access; it only changes storage class, and objects in Glacier are still accessible via S3 APIs if permissions allow. Option C is wrong because relying solely on CloudFront signed URLs without updating the S3 bucket policy leaves the bucket publicly accessible, allowing direct S3 URL access to bypass CloudFront entirely. Option D is wrong because a WAF rule on CloudFront only inspects viewer requests that arrive at CloudFront; requests sent directly to the S3 URL never pass through CloudFront or its WAF, so the rule cannot block them.

711

Multi-Selecthard

A log archive has old unattached EBS volumes and many stale snapshots. Which two actions reduce storage cost without affecting running instances?

Select 2 answers

A.Stop all EC2 instances in the account

B.Disable CloudTrail logging

C.Delete unattached EBS volumes after verifying they are no longer needed

D.Apply snapshot lifecycle policies to expire obsolete snapshots

AnswersC, D

Unattached volumes continue to incur charges until deleted.

Why this answer

Unattached EBS volumes incur storage costs even when not in use, as EBS pricing is based on provisioned capacity per GB-month. Deleting them after verifying they are no longer needed eliminates this cost without affecting running instances, since attached volumes are untouched. This directly addresses the question's requirement to reduce storage costs without impacting running workloads.

Exam trap

AWS often tests the misconception that stopping instances or disabling services like CloudTrail reduces storage costs, but the trap here is that only direct actions on the storage resources themselves (deleting volumes and expiring snapshots) affect EBS and snapshot billing.

712

Multi-Selectmedium

A static site is hosted in Amazon S3 and delivered by CloudFront. After a frontend release, the same JavaScript bundles are fetched repeatedly from the origin. Logs show that requests include unneeded query strings and cookies, which prevent cache reuse. Which two changes should the team make to reduce origin traffic and cost? Select two.

Select 2 answers

A.Configure a CloudFront cache policy that forwards only the query strings, headers, and cookies the app actually needs.

B.Use versioned file names for static assets and set a long TTL for immutable objects.

C.Increase the size of the S3 bucket.

D.Place an Application Load Balancer in front of S3.

E.Disable caching so clients always get the latest files.

AnswersA, B

Reducing the cache key to only required values increases cache hit ratio and lowers origin fetches. CloudFront can reuse responses more effectively when unnecessary request data is not forwarded.

Why this answer

Option A is correct because CloudFront cache policies allow you to explicitly control which query strings, headers, and cookies are forwarded to the origin. By forwarding only the parameters the application actually needs, you prevent cache keys from being polluted by unneeded values, which increases the cache hit ratio and reduces requests to the S3 origin. This directly addresses the problem of repeated fetches caused by extraneous query strings and cookies.
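
To see why trimming the cache key matters, the following sketch models a cache key as a tuple of the path plus whichever query strings and cookies are allow-listed. It is a simplified model of the idea, not CloudFront's actual implementation, and the URLs and cookie names are made up.

```python
from urllib.parse import parse_qsl, urlsplit


def cache_key(url: str, cookies: dict, allowed_query: set, allowed_cookies: set) -> tuple:
    """Return a simplified cache key containing only allow-listed values."""
    parts = urlsplit(url)
    query = tuple(sorted(
        (k, v) for k, v in parse_qsl(parts.query) if k in allowed_query
    ))
    kept_cookies = tuple(sorted(
        (k, v) for k, v in cookies.items() if k in allowed_cookies
    ))
    return (parts.path, query, kept_cookies)


# Three requests for the same versioned bundle (hypothetical values).
requests = [
    ("/static/app.3f9c1a.js?utm_source=mail", {"session": "a1"}),
    ("/static/app.3f9c1a.js?utm_source=ad", {"session": "b2"}),
    ("/static/app.3f9c1a.js", {}),
]

# Everything in the key: each request becomes its own cache entry.
polluted = {cache_key(u, c, {"utm_source"}, {"session"}) for u, c in requests}
# Nothing extra in the key: all requests share one cache entry.
minimal = {cache_key(u, c, set(), set()) for u, c in requests}

print(len(polluted), len(minimal))
```

In this model the polluted key yields one entry per request, while the minimal key collapses them into a single entry, which is the behavior the cache policy change is meant to achieve.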

Exam trap

The trap here is that candidates may think disabling caching (Option E) or adding an ALB (Option D) would help, but these actions either increase origin load or add unnecessary cost, whereas the correct approach is to refine the cache key and use immutable asset versioning.

713

MCQmedium

A deployment engineer created an IAM role for an automation workflow (AppDeployRole). The role has an attached identity policy that allows iam:CreateRole for specific resource ARNs. However, the role is also created with a permission boundary named DeployBoundary. The DeployBoundary policy currently does not include the iam:CreateRole action. During execution, the automation fails with AccessDenied for iam:CreateRole, even though the attached identity policy allows it. What is the best fix?

A.Edit AppDeployRole’s attached identity policy to add iam:CreateRole again; permission boundaries only apply when permissions are missing.

B.Update DeployBoundary to allow iam:CreateRole for only the required resource ARNs, following least privilege.

C.Remove the permission boundary from the role because permission boundaries are not enforced at runtime.

D.Encrypt the deployment artifacts with KMS so IAM denies become KMS authorization failures.

AnswerB

IAM permission boundaries define the maximum set of permissions the role can use. To permit iam:CreateRole, the DeployBoundary must explicitly allow iam:CreateRole (and scope it to the required resources). The attached identity policy alone is not sufficient when the boundary is more restrictive.

Why this answer

B is correct because when an IAM role has a permission boundary, the boundary defines the maximum permissions the role can have. Even if the identity-based policy allows iam:CreateRole, the effective permissions are the intersection of the identity policy and the permission boundary. Since DeployBoundary does not include iam:CreateRole, the action is denied.

Updating the boundary to allow iam:CreateRole for the required resource ARNs, following least privilege, grants the necessary permission while still constraining the role.
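
The intersection rule can be illustrated with plain Python sets. This is a deliberately simplified model: it ignores resource ARNs, conditions, explicit Deny statements, and SCPs, and the action lists are invented for the example.

```python
def effective_actions(identity_allows: set, boundary_allows: set) -> set:
    """Actions permitted by BOTH the identity policy and the boundary."""
    return identity_allows & boundary_allows


# Hypothetical action lists for AppDeployRole and DeployBoundary.
identity_policy = {"iam:CreateRole", "s3:GetObject"}
boundary_before = {"s3:GetObject", "s3:PutObject"}
boundary_after = boundary_before | {"iam:CreateRole"}

before = effective_actions(identity_policy, boundary_before)
after = effective_actions(identity_policy, boundary_after)

print(sorted(before))  # iam:CreateRole is missing, so the call is denied
print(sorted(after))   # iam:CreateRole now falls inside the boundary
```

Note that the boundary never grants anything on its own: `s3:PutObject` appears in the boundary but not in the identity policy, so it stays out of the effective set in both cases.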

Exam trap

The trap here is that candidates often think permission boundaries are optional or only restrict when the identity policy is too permissive. In reality they are an absolute ceiling on effective permissions: an action is permitted only if both the identity policy and the boundary allow it.

How to eliminate wrong answers

Option A is wrong because permission boundaries are not a fallback that only apply when permissions are missing; they are an upper limit that always applies, and adding the action again to the identity policy does not override the boundary. Option C is wrong because permission boundaries are enforced at runtime; removing the boundary would bypass security controls and is not a best practice. Option D is wrong because encrypting deployment artifacts with KMS does not affect IAM authorization for iam:CreateRole; KMS handles encryption/decryption, not IAM policy evaluation.

714

Multi-Selectmedium

A company is migrating a legacy monolithic application to AWS and wants to improve its resilience by decoupling components. The application currently writes directly to a shared file system and uses synchronous HTTP calls between modules. Which three AWS services should the company use to achieve a more resilient, decoupled architecture? (Choose three.)

Select 3 answers

A.Amazon SQS for asynchronous message passing between application components.

B.Amazon EFS as a shared file system to replace the on-premises NAS.

C.Amazon SNS to fan-out notifications to multiple subscribers for event-driven processing.

D.AWS Lambda to run components as stateless functions with automatic scaling.

E.AWS Direct Connect to provide a dedicated network link for inter-component communication.

F.Amazon Elastic Block Store (EBS) with Multi-Attach for shared block storage between components.

AnswersA, C, D

Why this answer

Amazon SQS is correct because it enables asynchronous message passing between application components, decoupling them so that a failure in one component does not block others. This replaces the synchronous HTTP calls, improving resilience by allowing messages to be buffered and processed independently.

Exam trap

The trap here is that candidates often confuse shared storage solutions (EFS, EBS Multi-Attach) with decoupling mechanisms, but these still create tight coupling and single points of failure, whereas the correct services (SQS, SNS, Lambda) enable true asynchronous, stateless, and event-driven decoupling.

715

MCQmedium

An AWS Organizations setup uses an SCP to enforce that developers can read only non-production secrets. A developer role in a member account is correctly configured with an identity policy that allows secretsmanager:GetSecretValue on arn:aws:secretsmanager:us-east-1:222222222222:secret:app/*. However, the developer gets AccessDenied with an error message mentioning an organization policy (SCP). The SCP includes this Deny statement: "Deny secretsmanager:GetSecretValue on * unless secretsmanager:ResourceTag/environment equals 'dev'". Which change best restores access for secrets tagged environment=dev while still blocking prod secrets?

A.Update the SCP to match the correct tag key/format actually used on your Secrets Manager secret resources so the condition evaluates to true for environment=dev.

B.Remove the Deny statement from the SCP and rely only on the member account identity policy.

C.Add an IAM policy statement with Effect=Allow and "Condition: aws:PrincipalOrgID" in the member account to override the SCP.

D.Use a longer STS session duration so the SCP is evaluated less frequently.

AnswerA

SCP conditions that rely on resource tags must use the correct tag key and the correct Secrets Manager tag condition key (for example, secretsmanager:ResourceTag/<tag-key>). If the SCP references a tag key/format that doesn’t match how the secrets are actually tagged, the 'unless' condition won’t evaluate as intended, and the Deny will still apply.

Why this answer

The SCP Deny statement is conditioned on the key `secretsmanager:ResourceTag/environment`, and the Deny is skipped only when that tag exists on the secret with the value 'dev'. If the secrets are actually tagged differently (e.g., a key of `env` instead of `environment`, or a value of `Dev` instead of `dev`) or the tag is missing, the exemption never matches and the Deny applies. Updating the SCP to match the exact tag key and value used on the secrets restores GetSecretValue for dev-tagged secrets while still blocking prod secrets.
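
One way to picture the statement is as a Deny with a negated string condition. The sketch below shows a possible JSON shape for it, together with a toy evaluator. The evaluator is a simplified model of the "deny unless tagged dev" logic, not the real IAM policy engine, and the sample tags are hypothetical.

```python
# Possible shape of the SCP statement described in the question.
scp_statement = {
    "Effect": "Deny",
    "Action": "secretsmanager:GetSecretValue",
    "Resource": "*",
    "Condition": {
        "StringNotEquals": {"secretsmanager:ResourceTag/environment": "dev"}
    },
}


def scp_denies(resource_tags: dict) -> bool:
    """Simplified model: the Deny applies unless environment == 'dev'.

    A missing tag is treated as "not equal", so untagged secrets are denied.
    """
    return resource_tags.get("environment") != "dev"


print(scp_denies({"environment": "dev"}))   # exemption matches
print(scp_denies({"environment": "prod"}))  # prod stays blocked
print(scp_denies({"env": "dev"}))           # wrong tag key, still blocked
print(scp_denies({}))                       # untagged, still blocked
```

The third case is the situation in the question: the secret is logically a dev secret, but the tag the SCP looks for is not the tag that is present.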

Exam trap

The trap here is that candidates assume SCPs can be overridden by identity-based policies or that tag conditions are case-insensitive, leading them to choose Option C or B instead of correcting the SCP's tag key format.

How to eliminate wrong answers

Option B is wrong because removing the Deny statement would allow access to all secrets, including prod, which violates the requirement to block prod secrets. Option C is wrong because an explicit Deny in an SCP cannot be overridden by any Allow in an IAM policy in the member account; an `aws:PrincipalOrgID` condition does not bypass SCP Deny statements. Option D is wrong because STS session duration has no effect on SCP evaluation; SCPs are evaluated on every API call regardless of session length.

716

MCQmedium

A company hosts an application on EC2 instances in private subnets. The instances must (1) read objects from Amazon S3 and (2) retrieve secrets from AWS Secrets Manager. The team currently sends all outbound traffic through a NAT gateway to reach both services. They want to reduce monthly cost while keeping traffic private (no internet egress) and without changing application logic. Which change is the most cost-effective?

A.Create a Gateway VPC endpoint for S3 and an Interface VPC endpoint for Secrets Manager, and ensure the subnet route tables / endpoint routing directs those service calls to the endpoints instead of the NAT gateway.

B.Keep the NAT gateway, but add AWS WAF rules to block non-service outbound requests to reduce NAT usage.

C.Disable IPv4 on the VPC subnets and rely on IPv6-only egress to reduce NAT gateway costs.

D.Replace the NAT gateway with a VPC firewall appliance instance to proxy outbound calls and reduce NAT fees.

AnswerA

This is the most cost-effective change because it removes the need to traverse the NAT gateway for those AWS service calls. S3 uses a Gateway VPC endpoint (route-table-based) for traffic to the S3 prefix list, so requests to S3 stay on the AWS network. Secrets Manager uses an Interface VPC endpoint (ENIs with private DNS), so requests to Secrets Manager stay private within the VPC/VPC endpoint network path. Because the application still calls the same AWS APIs, there is no logic change, and NAT data-processing charges drop to near zero for S3/Secrets Manager traffic.

Why this answer

Option A is correct because a Gateway VPC endpoint for S3 and an Interface VPC endpoint for Secrets Manager provide private connectivity to these AWS services without traversing the internet or a NAT gateway. This removes NAT gateway data processing fees for that traffic while keeping it on the AWS network; the gateway endpoint itself has no charge, and the interface endpoint's hourly and per-GB charges are typically far smaller than the NAT fees they replace. The application logic remains unchanged because the services are reached through the same DNS names: route tables send S3 traffic to the gateway endpoint, and private DNS resolves the Secrets Manager name to the interface endpoint.
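
A back-of-the-envelope comparison makes the saving concrete. All prices and traffic volumes below are assumptions chosen for illustration (they resemble typical us-east-1 list prices but are not taken from the question), so substitute current pricing before drawing conclusions.

```python
# Assumed prices in USD; verify against the current AWS pricing pages.
NAT_PER_GB = 0.045              # NAT gateway data processing
IFACE_PER_HOUR_PER_AZ = 0.01    # interface endpoint, per AZ
IFACE_PER_GB = 0.01             # interface endpoint data processing
GATEWAY_ENDPOINT_PER_GB = 0.0   # S3 gateway endpoint has no charge
HOURS_PER_MONTH = 730


def nat_data_cost(s3_gb: float, secrets_gb: float) -> float:
    """Monthly NAT data processing cost when both services go via NAT."""
    return (s3_gb + secrets_gb) * NAT_PER_GB


def endpoint_cost(s3_gb: float, secrets_gb: float, az_count: int) -> float:
    """Monthly cost when S3 uses a gateway endpoint and Secrets Manager
    uses an interface endpoint deployed in az_count Availability Zones."""
    hourly = IFACE_PER_HOUR_PER_AZ * HOURS_PER_MONTH * az_count
    data = secrets_gb * IFACE_PER_GB + s3_gb * GATEWAY_ENDPOINT_PER_GB
    return hourly + data


# Hypothetical monthly traffic: 5,000 GB to S3 and 1 GB to Secrets Manager.
via_nat = nat_data_cost(5000, 1)
via_endpoints = endpoint_cost(5000, 1, az_count=2)
print(round(via_nat, 2), round(via_endpoints, 2))
```

The comparison covers only the traffic-dependent charges. The NAT gateway's own hourly charge continues for as long as the gateway is kept for other outbound traffic.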

Exam trap

The trap here is that candidates may assume NAT gateways are the only way to provide private subnet internet access, overlooking that VPC endpoints can provide private, cost-effective connectivity to specific AWS services without internet egress.

How to eliminate wrong answers

Option B is wrong because AWS WAF is a web application firewall for HTTP/HTTPS traffic, not a mechanism to reduce NAT gateway costs; it does not eliminate the NAT gateway's hourly and per-GB data processing fees. Option C is wrong because disabling IPv4 and relying on IPv6-only egress would require the application to use IPv6 addresses, which changes the application logic and may not be supported by all services; additionally, NAT gateways are not used for IPv6 traffic (egress-only internet gateways are used), so this does not address the cost of the NAT gateway for IPv4 traffic. Option D is wrong because replacing the NAT gateway with a VPC firewall appliance instance still incurs instance costs and management overhead, and it does not eliminate the need for internet egress to reach S3 and Secrets Manager unless endpoints are used; it is not more cost-effective than using VPC endpoints.

717

Multi-Selectmedium

A company stores customer invoices in an Amazon S3 bucket. The application must keep the bucket private, ACLs should not be used, and customers should receive temporary download links for individual invoices. Which three changes should the architect make? Select three.

Select 3 answers

A.Enable S3 Block Public Access on both the bucket and the AWS account.

B.Continue using object ACLs so each customer invoice can be made public briefly.

C.Configure Bucket owner enforced object ownership to disable ACLs.

D.Generate presigned URLs for customers to download specific invoices for a limited time.

E.Move the bucket to another AWS Region to isolate it from the internet.

AnswersA, C, D

Block Public Access prevents accidental public exposure through bucket policies, ACLs, and other public settings. It is a strong baseline control when the data must remain private.

Why this answer

Option A is correct because enabling S3 Block Public Access at both the bucket and account level ensures that no public access can be granted to the bucket or its objects, which aligns with the requirement to keep the bucket private. This setting overrides any other permissions that might inadvertently allow public access, providing a strong security baseline.

Exam trap

The trap here is that candidates might think moving the bucket to a different region or using ACLs can solve the temporary access requirement, but they overlook that S3 Block Public Access and presigned URLs are the correct mechanisms for private, time-limited access without ACLs.

718

Multi-Selectmedium

A workload in private subnets must upload logs to Amazon S3 and retrieve one secret from AWS Secrets Manager. The security team forbids internet egress and wants the lowest operational overhead. Which two VPC endpoints should be created? Select two.

Select 2 answers

A.An Amazon S3 gateway endpoint for private S3 access.

B.An AWS Secrets Manager interface endpoint for private secret retrieval.

C.A NAT Gateway in the public subnet.

D.An Internet Gateway attached to the VPC.

E.A DynamoDB gateway endpoint for the log upload path.

AnswersA, B

S3 supports gateway endpoints, which route traffic through the AWS network without requiring a NAT gateway or internet gateway. This is the lowest-overhead private access option for S3.

Why this answer

Amazon S3 gateway endpoints allow private subnet resources to access S3 without traversing the internet, using prefix lists and route table entries to direct traffic through AWS's internal network. This satisfies the security team's no-egress requirement and incurs no hourly cost, offering the lowest operational overhead for S3 access.

Exam trap

The trap here is that candidates often confuse gateway endpoints (for S3 and DynamoDB) with interface endpoints (for most other AWS services), and may incorrectly select a DynamoDB gateway endpoint for S3 or assume a NAT Gateway is required for private subnet access to AWS services.

719

MCQeasy

You must ensure that all requests to an S3 bucket use TLS (HTTPS). Which S3 bucket policy approach best enforces this requirement for S3 access?

A.Allow all principals to GetObject when aws:SecureTransport is true

B.Use a policy statement that explicitly Denies any action when aws:SecureTransport is false

C.Deny requests only when the bucket name is not matched exactly in the request

D.Require that the requester uses SSE-KMS and reject requests without SSE-KMS configuration

AnswerB

A bucket policy statement with Effect = Deny and a condition aws:SecureTransport = false blocks non-HTTPS requests. Because explicit Deny overrides Allow during policy evaluation, this prevents access for any request that does not use TLS, even if other statements grant permissions.

Why this answer

Option B is correct because the `aws:SecureTransport` condition key evaluates whether the request was sent using TLS. By explicitly denying all S3 actions when `aws:SecureTransport` is false, any HTTP request is rejected, ensuring only HTTPS requests succeed. This approach uses an explicit deny, which overrides any allow, making it the most secure and reliable method to enforce TLS.
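
A minimal sketch of such a policy, generated from Python, might look like the following. The bucket name is a placeholder, and the statement would normally sit alongside whatever Allow statements the bucket already needs.

```python
import json


def deny_insecure_transport(bucket: str) -> dict:
    """Bucket policy that explicitly denies every non-TLS request."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Both the bucket itself and every object in it.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                # Matches only requests that did NOT arrive over TLS.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


# Placeholder bucket name, used only for illustration.
tls_policy = deny_insecure_transport("example-secure-bucket")
print(json.dumps(tls_policy, indent=2))
```

Because the statement is a Deny, it does not grant anyone access; it only removes plain-HTTP requests from whatever access other statements allow.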

Exam trap

The trap here is that candidates confuse encryption in transit (TLS/HTTPS) with encryption at rest (SSE-KMS or SSE-S3), leading them to select Option D, which does not address the transport security requirement.

How to eliminate wrong answers

Option A is wrong because an Allow statement conditioned on `aws:SecureTransport` being true only grants access over HTTPS; it does not block HTTP requests that any other policy statement allows, and allowing all principals would also expose the objects publicly. Option C is wrong because matching the bucket name has no relation to TLS enforcement; it addresses routing or bucket identification, not transport security. Option D is wrong because requiring SSE-KMS enforces encryption at rest, not encryption in transit (TLS); requests could still be sent over HTTP, violating the requirement.

720

MCQmedium

A mobile banking backend stores audit logs in S3. The compliance team requires that logs cannot be overwritten or deleted for seven years. What should be configured? The design must avoid adding custom operational scripts.

A.S3 server access logging

B.S3 lifecycle expiration after seven years

C.S3 versioning only

D.S3 Object Lock in compliance mode with an appropriate retention period

AnswerD

Object Lock compliance mode enforces write-once-read-many retention that even privileged users cannot bypass during the retention period.

Why this answer

S3 Object Lock in compliance mode prevents any user, including the root user, from overwriting or deleting a protected object version until its retention period expires. This meets the compliance requirement of immutable audit logs for seven years without custom scripts. In compliance mode the retention period cannot be shortened and the mode cannot be changed by any user, so the logs are write-once-read-many (WORM) protected. (A legal hold is a separate Object Lock feature with no expiry date; it is not what enforces the seven-year period here.)
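
As an illustration, a default retention rule for the bucket could be expressed as the configuration document below. The helper name is hypothetical; the dictionary follows the shape used by the S3 PutObjectLockConfiguration operation, and the bucket must already have Object Lock enabled for it to apply.

```python
def compliance_lock_config(years: int) -> dict:
    """Default Object Lock rule: COMPLIANCE mode for the given number of years."""
    if years <= 0:
        raise ValueError("retention period must be a positive number of years")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {
            "DefaultRetention": {
                "Mode": "COMPLIANCE",  # GOVERNANCE could be bypassed by privileged users
                "Years": years,
            }
        },
    }


lock_config = compliance_lock_config(7)
print(lock_config)
```

With boto3 this document could be supplied as the `ObjectLockConfiguration` argument of `put_object_lock_configuration`. Once set, every new object version inherits the seven-year retention without any scheduled job.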

Exam trap

The trap here is that candidates often choose versioning (option C) thinking it prevents deletion, but versioning alone does not block overwrites or permanent deletion of the current version without additional safeguards like MFA delete or Object Lock.

How to eliminate wrong answers

Option A is wrong because S3 server access logging only records requests made to the bucket; it does not prevent deletion or overwriting of existing logs. Option B is wrong because S3 lifecycle expiration deletes objects after a set period, which would violate the requirement to prevent deletion for seven years. Option C is wrong because S3 versioning alone preserves previous versions but does not prevent deletion of the current version or overwrites; it requires additional controls like MFA delete or Object Lock to enforce immutability.

721

MCQeasy

A company keeps daily database backups in an S3 bucket. They may restore from backups during the first 30 days if there is an issue. After 30 days, backups are rarely restored, but must be retained for 2 years. Which lifecycle strategy most cost-effectively meets these requirements?

A.Delete backups after 30 days to avoid storage costs, since restores are rare.

B.Keep all backups in S3 Standard for the entire 2-year retention period.

C.Use an S3 lifecycle policy to keep backups in S3 Standard for 30 days, then transition them to S3 Glacier Deep Archive for the remainder of the 2-year retention period.

D.Move backups to S3 Glacier Deep Archive immediately after creation, even for the first 30 days.

AnswerC

A lifecycle transition after the initial restore window reduces cost while still meeting the 2-year retention requirement.

Why this answer

Option C is correct because it balances cost and compliance: backups stay in S3 Standard for the first 30 days, when restores are most likely and need low-latency access, then transition to S3 Glacier Deep Archive for the rest of the retention period. S3 Glacier Deep Archive offers the lowest S3 storage price (approximately $0.00099/GB/month; prices vary by Region) for long-term retention, and the lifecycle policy automates the transition without manual intervention. This approach minimizes storage costs while meeting the 2-year retention requirement.

Exam trap

The trap here is that candidates may assume immediate deletion (Option A) or immediate archiving (Option D) is acceptable, failing to recognize the dual requirement of quick restores in the first 30 days and long-term retention at minimal cost, both of which the lifecycle policy addresses.

How to eliminate wrong answers

Option A is wrong because deleting backups after 30 days violates the 2-year retention requirement, even if restores are rare after that period. Option B is wrong because storing all backups in S3 Standard for the full 2 years incurs unnecessarily high costs (approximately $0.023/GB/month) for data that is rarely accessed after 30 days. Option D is wrong because moving backups immediately to S3 Glacier Deep Archive ignores the need for quick restores during the first 30 days; Deep Archive retrievals complete within about 12 hours (standard) or 48 hours (bulk), and objects carry a 180-day minimum storage charge, making it unsuitable for immediate access needs.
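
The cost difference can be checked with quick arithmetic. The sketch below uses the two per-GB prices quoted above, treats the 30-day window as one month, and counts storage fees only (transition and retrieval request charges are ignored). The rule ID is a hypothetical name; the rule fields follow the S3 lifecycle configuration shape.

```python
# Per-GB monthly prices quoted in the explanation above (they vary by Region).
STANDARD_PER_GB_MONTH = 0.023
DEEP_ARCHIVE_PER_GB_MONTH = 0.00099
RETENTION_MONTHS = 24

# Lifecycle rule: transition every object to Deep Archive 30 days after creation.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-backups-after-30-days",  # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # apply to the whole bucket
            "Transitions": [{"Days": 30, "StorageClass": "DEEP_ARCHIVE"}],
        }
    ]
}


def cost_per_gb(months_in_standard: int) -> float:
    """Storage cost of one GB over the retention period (storage fees only)."""
    months_in_archive = RETENTION_MONTHS - months_in_standard
    return (months_in_standard * STANDARD_PER_GB_MONTH
            + months_in_archive * DEEP_ARCHIVE_PER_GB_MONTH)


all_standard = cost_per_gb(RETENTION_MONTHS)  # option B
with_lifecycle = cost_per_gb(1)               # option C, about 1 month in Standard
print(f"Standard only:  ${all_standard:.5f} per GB")
print(f"With lifecycle: ${with_lifecycle:.5f} per GB")
```

Under these assumptions the lifecycle approach costs roughly a twelfth of keeping everything in S3 Standard.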

722

Multi-Selecthard

A customer portal uses Amazon Aurora MySQL. The application currently sends all SELECT queries to the writer instance endpoint. During traffic spikes, read latency increases, and the team wants the cluster to survive a writer failover without manual endpoint changes for the application. Which changes should the team make? Select three.

Select 3 answers

A.Replace the hard-coded DB instance endpoint with the Aurora reader endpoint for read traffic.

B.Add additional Aurora Replicas and spread read queries across them.

C.Enable Aurora Auto Scaling for the replicas so the cluster can add readers during demand spikes.

D.Keep sending all traffic to the writer endpoint because it always has the freshest data.

E.Replace Aurora with a single larger RDS instance and continue using the same read pattern.

AnswersA, B, C

The reader endpoint is a cluster-level endpoint that automatically routes read connections to available Aurora Replicas. That removes the dependency on a specific DB instance endpoint and avoids application changes when the writer fails over or the writer instance identifier changes.

Why this answer

Option A is correct because the Aurora reader endpoint load-balances read-only connections across the available Aurora Replicas (it falls back to the primary instance only when the cluster has no replicas). The endpoint name stays constant while the underlying instances change, so no manual endpoint change is needed during a failover. Options B and C are correct because they add read capacity behind that endpoint: additional Aurora Replicas share the read load, and Aurora Auto Scaling adds or removes replicas as demand changes.

Exam trap

The trap here is that candidates may think the writer endpoint is the only way to get fresh data, but Aurora Replicas read from the same shared storage volume and typically lag the writer by well under 100 ms, and the reader endpoint survives failover without application changes, making replicas the correct choice for read scaling and high availability.
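
A minimal sketch of the read/write split follows. The endpoint host names are hypothetical placeholders (real names come from the cluster's API or console output), and the statement check is deliberately naive: statements such as `SELECT ... FOR UPDATE` would still need the writer.

```python
# Hypothetical endpoint names. The cluster endpoint always follows the
# current writer; the "cluster-ro" endpoint balances across replicas.
WRITER_ENDPOINT = "portal.cluster-abc123.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "portal.cluster-ro-abc123.us-east-1.rds.amazonaws.com"


def pick_endpoint(sql: str) -> str:
    """Send plain SELECT statements to the reader endpoint, the rest to the writer."""
    words = sql.split(None, 1)
    first_word = words[0].upper() if words else ""
    return READER_ENDPOINT if first_word == "SELECT" else WRITER_ENDPOINT


print(pick_endpoint("SELECT * FROM orders WHERE id = 42"))
print(pick_endpoint("UPDATE orders SET status = 'shipped' WHERE id = 42"))
```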

723

MCQeasy

Based on the exhibit, what should the architect recommend to reduce inter-node latency for this workload?

A.Use a spread placement group so each instance is placed on separate hardware.

B.Launch the instances in a cluster placement group within the same Availability Zone.

C.Move the instances into different Availability Zones to improve fault tolerance.

D.Use a partition placement group to balance traffic across partitions.

AnswerB

A cluster placement group places instances close together in a single Availability Zone, which is the best choice for workloads that exchange many small messages and need very low network latency. This design is common for tightly coupled compute, analytics, and HPC-style applications. Because the workload is not bandwidth-saturated but latency-sensitive, proximity matters more than broader distribution.

Why this answer

A cluster placement group provides the lowest possible latency and highest packet-per-second performance by ensuring instances are placed in close physical proximity within a single Availability Zone. This is ideal for tightly coupled, high-performance computing workloads that require low inter-node latency.

Exam trap

The trap here is that candidates confuse placement group types, assuming 'spread' or 'partition' can also reduce latency, when only cluster placement groups are designed for low-latency, high-throughput networking within a single AZ.

How to eliminate wrong answers

Option A is wrong because a spread placement group places instances on separate hardware to reduce correlated failures, not to minimize latency; the physical separation can increase inter-node latency. Option C is wrong because moving instances into different Availability Zones increases network distance and latency, which is the opposite of what is needed. Option D is wrong because a partition placement group is designed to isolate instances across logical partitions for fault tolerance, not to reduce latency, and it does not guarantee close physical proximity.

724

MCQhard

A warehouse integration service must use shared file storage across Linux EC2 instances in multiple Availability Zones. The storage must remain available during an AZ failure. Which service should be used? The design must avoid adding custom operational scripts.

A.Amazon EFS with mount targets in multiple Availability Zones

B.S3 mounted as a POSIX file system without a file gateway

C.Instance store volumes

D.An EBS volume attached to all instances

AnswerA

EFS is regional file storage and supports mount targets across AZs.

Why this answer

Amazon EFS provides a fully managed, POSIX-compliant NFSv4.1 shared file system that can be mounted concurrently across multiple Linux EC2 instances. By deploying mount targets in multiple Availability Zones, the file system remains accessible even if one AZ fails, satisfying the high-availability requirement without any custom scripts.

Exam trap

The trap here is that candidates may confuse EBS Multi-Attach (which has strict limitations and requires cluster-aware file systems) with a true shared file system, or assume that S3 with a FUSE mount is a viable POSIX alternative without considering the operational overhead and the lack of full POSIX semantics.

How to eliminate wrong answers

Option B is wrong because S3 is an object store, not a file system: mounting it through a FUSE client (e.g., s3fs-fuse) adds software the team must install and maintain, and it does not provide full POSIX semantics such as file locking or in-place edits, making it unsuitable for shared file storage. Option C is wrong because instance store volumes are ephemeral, tied to a single EC2 instance, and cannot be shared across instances or survive AZ failures. Option D is wrong because an EBS volume exists in a single Availability Zone and normally attaches to one instance at a time; EBS Multi-Attach is limited to Provisioned IOPS (io1/io2) volumes on Nitro-based instances in the same AZ and still needs a cluster-aware file system, so it cannot serve instances in multiple AZs.
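
EFS accepts one mount target per Availability Zone, so planning the multi-AZ layout amounts to picking one subnet in each AZ. The helper below is an illustrative sketch; the subnet IDs and AZ names are made up, and no AWS call is made.

```python
from typing import Dict, List, Tuple


def plan_mount_targets(subnets: List[Tuple[str, str]]) -> Dict[str, str]:
    """Pick one subnet per Availability Zone for an EFS mount target.

    `subnets` is a list of (subnet_id, availability_zone) pairs. EFS allows
    one mount target per AZ, so the first subnet seen in each AZ wins.
    """
    plan: Dict[str, str] = {}
    for subnet_id, az in subnets:
        plan.setdefault(az, subnet_id)
    return plan


# Hypothetical subnet IDs and AZ names for illustration.
subnets = [
    ("subnet-aaa", "us-east-1a"),
    ("subnet-bbb", "us-east-1b"),
    ("subnet-ccc", "us-east-1a"),
]
print(plan_mount_targets(subnets))
```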

725

MCQmedium

A caching layer uses Amazon ElastiCache for Redis in front of a stateless web service. The service must continue to read cached responses during maintenance events and should automatically fail over to another node if one AZ becomes impaired. Which design change best satisfies this requirement?

A.Deploy a single-node Redis cluster and rely on application-level retries when cache misses occur.

B.Configure an ElastiCache Redis replication group with automatic failover across multiple Availability Zones.

C.Move the cache into the VPC but keep it in one Availability Zone to reduce network latency.

D.Use a Memcached cluster and configure only client-side connection pooling without failover support.

AnswerB

Multi-AZ replication groups provide redundant nodes and automatic failover, improving cache resilience during AZ events.

Why this answer

Option B is correct because an ElastiCache Redis replication group with Multi-AZ and automatic failover keeps read replicas in other Availability Zones. If the primary node or its AZ becomes impaired, ElastiCache automatically promotes a replica to primary and repoints the primary endpoint, so the application needs no configuration change. The stateless web service can keep reading cached responses from the surviving nodes, with only a brief interruption while the promotion completes.

Exam trap

The trap here is that candidates often confuse Memcached with Redis, assuming that Memcached also supports replication and automatic failover, or they mistakenly think a single-node Redis cluster with retries is sufficient for high availability during AZ impairments.

How to eliminate wrong answers

Option A is wrong because a single-node Redis cluster provides no redundancy; if the node or its AZ fails, the cache is completely unavailable, forcing the web service to fall back to the origin server for all requests, which defeats the purpose of a caching layer. Option C is wrong because keeping the cache in one Availability Zone does not protect against AZ impairment; a single-AZ deployment cannot automatically fail over to another node in a different AZ, so the service would lose cached data during an AZ outage. Option D is wrong because Memcached does not support replication or automatic failover; it is a distributed cache with no built-in mechanism to promote a standby node, so any node failure results in cache misses and requires client-side reconfiguration.
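
The settings that distinguish option B can be sketched as the keyword arguments for an ElastiCache `CreateReplicationGroup` request. The group ID and description are hypothetical, node type and subnet group are omitted for brevity, and the function only builds the dictionary.

```python
def replication_group_params(group_id: str, replicas: int) -> dict:
    """Build arguments in the shape of ElastiCache CreateReplicationGroup."""
    if replicas < 1:
        # Automatic failover needs at least one replica to promote.
        raise ValueError("automatic failover requires at least one replica")
    return {
        "ReplicationGroupId": group_id,
        "ReplicationGroupDescription": "cache with Multi-AZ failover",
        "Engine": "redis",
        "NumCacheClusters": 1 + replicas,  # primary plus replicas
        "AutomaticFailoverEnabled": True,
        "MultiAZEnabled": True,
    }


params = replication_group_params("web-cache", replicas=2)
print(params)
```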

726

MCQmedium

A research team runs a latency-sensitive distributed training job on Amazon EC2. They deploy 80 identical nodes that exchange small messages frequently and need low network jitter. The job must run entirely within one Availability Zone. Which placement group strategy should a solutions architect use to maximize intra-cluster network performance?

A.Use a cluster placement group to keep all instances in close proximity within the same Availability Zone.

B.Use a spread placement group to distribute instances across distinct hardware to reduce jitter.

C.Use a partition placement group and place each node into its own partition for uniform latency.

D.Do not use a placement group; rely on the default EC2 scheduling to balance latency and availability.

AnswerA

A cluster placement group is optimized to place instances close together (for example, within the same rack/cluster) to reduce latency and jitter for traffic between the instances. Because the workload runs in a single Availability Zone, the cluster placement group aligns with the requirement for strong locality and low-jitter communication.

Why this answer

A cluster placement group is the correct choice because it places all 80 EC2 instances in close physical proximity within a single Availability Zone, ensuring low-latency, high-bandwidth network connections with minimal jitter. This placement group type is specifically designed for tightly coupled, latency-sensitive workloads like distributed training that require frequent, small message exchanges, as it leverages non-blocking, high-throughput networking between instances.

Exam trap

The trap here is that candidates often confuse spread placement groups (which limit correlated hardware failures by separating instances) with cluster placement groups (which reduce latency and jitter by minimizing physical distance), not realizing that jitter in this context comes from network distance, not hardware faults.

How to eliminate wrong answers

Option B is wrong because a spread placement group distributes instances across distinct hardware to maximize fault tolerance, which increases network distance and jitter; it also supports at most seven running instances per Availability Zone per group, so it cannot hold 80 nodes in one AZ. Option C is wrong because a partition placement group divides instances into logical partitions for fault isolation and does not guarantee the close proximity needed for frequent small message exchanges; with a limit of seven partitions per Availability Zone, giving each of 80 nodes its own partition is not possible either. Option D is wrong because relying on default EC2 scheduling does not ensure instances are placed in close physical proximity, leading to higher network latency and jitter compared to a cluster placement group.
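
The documented placement group limits alone rule out two of the options for 80 nodes in a single AZ. The check below is a simplified sketch: the strategy labels are illustrative names for the answer options, not EC2 API values, and it assumes rack-level spread groups.

```python
# Documented EC2 limits: a (rack-level) spread placement group allows at most
# 7 running instances per Availability Zone, and a partition placement group
# allows at most 7 partitions per Availability Zone.
SPREAD_MAX_INSTANCES_PER_AZ = 7
PARTITION_MAX_PARTITIONS_PER_AZ = 7


def option_is_feasible(strategy: str, nodes: int) -> bool:
    """Check whether `nodes` instances fit in ONE AZ the way each option describes."""
    if strategy == "cluster":
        return True  # bounded by available capacity, not a fixed count
    if strategy == "spread":
        return nodes <= SPREAD_MAX_INSTANCES_PER_AZ
    if strategy == "partition-one-node-each":
        return nodes <= PARTITION_MAX_PARTITIONS_PER_AZ
    raise ValueError(f"unknown strategy: {strategy}")


for name in ("cluster", "spread", "partition-one-node-each"):
    print(name, option_is_feasible(name, 80))
```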

727

MCQmedium

A Lambda function for a mobile banking backend needs to read a database password. The password must rotate automatically every 30 days and should not be stored in environment variables. Which service should be used?

A.An encrypted object in Amazon S3

B.AWS Secrets Manager with rotation enabled

C.AWS Systems Manager Parameter Store SecureString without automation

D.A KMS-encrypted Lambda environment variable

AnswerB

Secrets Manager stores secrets securely and supports automatic rotation using a rotation Lambda function.

Why this answer

AWS Secrets Manager is the correct choice because it provides built-in automatic rotation of secrets (e.g., database passwords) on a configurable schedule (e.g., every 30 days), carried out by a rotation Lambda function. The application Lambda function retrieves the current password at runtime through the AWS SDK, so nothing is stored in environment variables. Secrets Manager also encrypts secrets at rest using KMS, meeting both the security and rotation requirements.

Exam trap

The trap here is that candidates often confuse AWS Systems Manager Parameter Store SecureString with Secrets Manager, but Parameter Store does not support automatic rotation without custom automation, making it unsuitable for a 30-day rotation requirement.

How to eliminate wrong answers

Option A is wrong because storing an encrypted object in Amazon S3 does not provide automatic rotation of the password; you would need to manually manage rotation and versioning, and the Lambda function would require additional logic to decrypt and rotate the secret. Option C is wrong because AWS Systems Manager Parameter Store SecureString without automation lacks built-in automatic rotation; you would need to implement custom rotation logic, and Parameter Store does not natively support scheduled rotation like Secrets Manager. Option D is wrong because a KMS-encrypted Lambda environment variable is static and cannot be rotated automatically; the password would remain the same until the Lambda function is redeployed, and environment variables are visible in the function configuration, violating the requirement to not store the password in environment variables.
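
Runtime retrieval might look like the sketch below. It assumes the secret is stored as JSON with a `password` key, and it takes the client as a parameter so the example runs without AWS access; in a Lambda function the caller would pass a real Secrets Manager client (for example from boto3), whose `get_secret_value` call the fake class imitates. The secret name is hypothetical.

```python
import json
import time


class SecretCache:
    """Fetch a secret at runtime and cache it briefly between invocations."""

    def __init__(self, client, secret_id: str, ttl_seconds: float = 300.0):
        self._client = client  # e.g. a boto3 Secrets Manager client
        self._secret_id = secret_id
        self._ttl = ttl_seconds
        self._value = None
        self._fetched_at = 0.0

    def password(self) -> str:
        expired = time.monotonic() - self._fetched_at > self._ttl
        if self._value is None or expired:
            response = self._client.get_secret_value(SecretId=self._secret_id)
            # Assumes the secret is stored as JSON with a "password" key.
            self._value = json.loads(response["SecretString"])["password"]
            self._fetched_at = time.monotonic()
        return self._value


class FakeSecretsClient:
    """Stand-in for the real client so the sketch runs without AWS access."""

    def __init__(self):
        self.calls = 0

    def get_secret_value(self, SecretId):
        self.calls += 1
        return {"SecretString": json.dumps({"password": f"pw-v{self.calls}"})}


fake = FakeSecretsClient()
cache = SecretCache(fake, "prod/banking/db")  # hypothetical secret name
print(cache.password(), cache.password(), fake.calls)
```

A short cache lifetime keeps the function from calling Secrets Manager on every invocation while still picking up a rotated password soon after rotation.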

728

MCQmedium

A game streaming service must use UDP for real-time gameplay traffic. For external firewall allowlisting, the service requires stable, static IP addresses. The TLS handshake must be handled end-to-end by the application servers (the load balancer must not terminate TLS). Which AWS load balancing option best fits these requirements?

A.Use a Network Load Balancer (NLB) with a UDP listener, configure the NLB to use Elastic IP addresses for static IPs, and use TCP listeners for TLS passthrough to the application servers.

B.Use an Application Load Balancer (ALB) with UDP listeners and configure TLS passthrough.

C.Use Amazon API Gateway with a WebSocket API and keepalive pings to provide UDP-like low-latency delivery.

D.Use a Classic Load Balancer and multiplex UDP over TCP to meet the UDP and low-latency requirements.

AnswerA

NLB supports UDP listeners and is designed for low-latency, high-performance networking. Associating Elastic IP addresses with the NLB provides stable public IP addresses for firewall allowlisting. For TLS passthrough, using a TCP listener keeps the TLS handshake and encryption between the client and the targets (no load balancer TLS termination).

Why this answer

A Network Load Balancer (NLB) supports UDP listeners, which are required for real-time gameplay traffic, and can be assigned Elastic IP addresses to provide stable, static IPs for firewall allowlisting. Additionally, NLB supports TCP listeners with TLS passthrough, meaning it forwards the encrypted traffic without terminating the TLS handshake, allowing the application servers to handle end-to-end encryption as required.

Exam trap

The trap here is that candidates may assume an ALB can handle UDP traffic because it supports WebSocket or HTTP/2, but ALB is strictly Layer 7 and only supports TCP-based protocols, while NLB is the correct choice for UDP and TLS passthrough with static IPs.

How to eliminate wrong answers

Option B is wrong because an Application Load Balancer (ALB) does not support UDP listeners; it only handles HTTP/HTTPS and WebSocket traffic over TCP. Option C is wrong because Amazon API Gateway with WebSocket API operates over TCP (not UDP) and does not provide static IP addresses for allowlisting; it also terminates TLS at the API Gateway endpoint, not at the application servers. Option D is wrong because a Classic Load Balancer does not support UDP listeners and cannot multiplex UDP over TCP in a way that meets low-latency requirements; it is a legacy option that lacks the necessary protocol support and static IP capabilities.
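
The protocol argument can be summarized as a lookup of which listener protocols each load balancer type accepts. This is an illustrative sketch; the port numbers are hypothetical.

```python
# Listener protocols each Elastic Load Balancing type accepts.
LISTENER_PROTOCOLS = {
    "application": {"HTTP", "HTTPS"},
    "network": {"TCP", "UDP", "TCP_UDP", "TLS"},
}


def supports(lb_type: str, protocol: str) -> bool:
    """Return True if the load balancer type can host a listener of this protocol."""
    return protocol in LISTENER_PROTOCOLS[lb_type]


# Listeners for option A: UDP for gameplay, and TCP (not TLS) so the NLB
# forwards encrypted bytes untouched to the targets.
nlb_listeners = [
    {"Protocol": "UDP", "Port": 7777},
    {"Protocol": "TCP", "Port": 443},
]
assert all(supports("network", listener["Protocol"]) for listener in nlb_listeners)
print(supports("application", "UDP"))
```

Choosing a `TLS` listener instead of `TCP` would make the NLB terminate TLS, which the question rules out.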

729

MCQmedium

Company A (account 1111) hosts an IAM role (RoleInAccountA) that is assumed by a workload in Company B (account 2222) using sts:AssumeRole. Security requires that only Company B’s intended workload can assume the role, even if another principal in account 2222 tries to assume it. The trust policy already restricts who can assume the role to account 2222. What additional trust policy condition most directly satisfies this requirement?

A.Add a condition requiring sts:ExternalId to equal a specific value that Company B’s workload must provide in sts:AssumeRole.

B.Add a condition requiring aws:PrincipalArn to start with arn:aws:iam::2222:role/.

C.Add a condition requiring sts:RoleSessionName to match the string "integration".

D.Rely only on an SCP in account 1111 to block all sts:AssumeRole calls except from Company B’s OU.

AnswerA

An ExternalId acts as a shared secret known only to the intended workload (or integration). Adding a sts:ExternalId condition causes sts:AssumeRole to fail for any other principals in account 2222 that do not supply the correct ExternalId, directly mitigating confused-deputy scenarios.

Why this answer

Option A is correct because the sts:ExternalId condition key is specifically designed to prevent the confused deputy problem. By requiring a unique external ID that only Company B's intended workload knows and passes in the sts:AssumeRole call, the trust policy ensures that even if another principal in account 2222 attempts to assume the role, it cannot provide the correct external ID, thus blocking the request.

Exam trap

The trap here is that candidates often confuse the purpose of sts:RoleSessionName (which is for auditing, not security) or assume that restricting by principal ARN prefix is sufficient, but they overlook that any role in account 2222 could match that prefix, failing to isolate the specific workload.

How to eliminate wrong answers

Option B is wrong because the prefix arn:aws:iam::2222:role/ matches every role in account 2222, including unintended ones, so the condition does not single out the specific workload. Option C is wrong because sts:RoleSessionName is a caller-supplied string that any principal can set to any value; it is meant for auditing, not for proving workload identity. Option D is wrong because an SCP limits only the principals in the accounts it is attached to; an SCP in account 1111 does not govern callers from account 2222, so it cannot decide which of Company B's principals may assume the role.
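
A trust policy with the ExternalId condition could look like the sketch below. The 12-digit account ID and the ExternalId value are hypothetical stand-ins (the question abbreviates Company B's account as 2222); the snippet only builds and prints the JSON document.

```python
import json

# Hypothetical values for illustration only.
COMPANY_B_ACCOUNT = "222222222222"
EXTERNAL_ID = "example-external-id"

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{COMPANY_B_ACCOUNT}:root"},
            "Action": "sts:AssumeRole",
            # AssumeRole succeeds only when the caller passes this ExternalId.
            "Condition": {"StringEquals": {"sts:ExternalId": EXTERNAL_ID}},
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
```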

730

MCQeasy

A new feature stores user events in DynamoDB. Each event must be fetched by user_id and sorted by event_time. The team expects many different users and wants to avoid a single hot partition. Which partition key design is best?

A.Use a constant partition key value (for example, partition_key='events') and store user_id as an attribute.

B.Use user_id as the partition key and event_time as the sort key.

C.Use event_time as the partition key and user_id as an attribute to query later.

D.Use a randomly generated UUID as the partition key and query by user_id using a full table scan.

AnswerB

Using user_id as the partition key spreads data across many partitions based on user distribution. event_time as the sort key supports efficient range queries and retrieving events in time order per user. This design matches the stated access pattern and reduces hot partition likelihood.

Why this answer

Option B is correct because DynamoDB hashes the partition key to choose a partition, so a high-cardinality key such as `user_id` spreads items and traffic across many partitions while keeping each user's events together. Adding `event_time` as the sort key stores those events in time order, so a single Query operation returns a given user's events already sorted, which is both fast and cost-effective.

Exam trap

The trap here is that candidates may choose a constant partition key (Option A) thinking it simplifies queries, but they overlook that DynamoDB's scalability depends on partition key cardinality, and a single partition key creates a bottleneck that defeats the purpose of a NoSQL database.

How to eliminate wrong answers

Option A is wrong because using a constant partition key value (e.g., `'events'`) forces all data into a single partition, creating a hot partition that throttles performance and defeats the purpose of DynamoDB's distributed architecture. Option C is wrong because using `event_time` as the partition key scatters events across partitions without grouping by user, so fetching all events for a specific user would require a full table Scan with a filter, which is costly and returns results unsorted. Option D is wrong because a randomly generated UUID partition key distributes writes well but makes it impossible to fetch by `user_id` without a full table scan, since a Query requires the exact partition key value.
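
The access pattern can be mimicked with a toy in-memory model: one sorted list per `user_id`, searched by `event_time`. The key schema uses the `CreateTable` field names; everything else (function names, sample events) is illustrative and does not talk to DynamoDB.

```python
import bisect
from collections import defaultdict

# Key schema in the shape DynamoDB's CreateTable expects.
key_schema = [
    {"AttributeName": "user_id", "KeyType": "HASH"},      # partition key
    {"AttributeName": "event_time", "KeyType": "RANGE"},  # sort key
]

# Toy model: one sorted list of (event_time, payload) tuples per user_id.
table = defaultdict(list)


def put_event(user_id: str, event_time: str, payload: str) -> None:
    bisect.insort(table[user_id], (event_time, payload))


def query_events(user_id: str, since: str = "") -> list:
    """Like a Query on user_id with event_time >= since, in sort-key order."""
    items = table.get(user_id, [])
    start = bisect.bisect_left(items, (since,))
    return items[start:]


put_event("u1", "2024-05-02T10:00:00Z", "login")
put_event("u1", "2024-05-01T09:00:00Z", "signup")
put_event("u2", "2024-05-01T12:00:00Z", "login")
print(query_events("u1"))
```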

731

Matchinghard

Match each operational condition to the load balancing or Auto Scaling behavior that should occur.

Concepts

Matches

The target is marked unhealthy by the ALB health check and removed from routing until it passes again.

The Auto Scaling health check grace period prevents premature termination while startup work completes.

Instances that fail load balancer health checks are considered unhealthy by the group and are replaced automatically.

The health check verifies protocol and port reachability rather than an HTTP response body or status code.

Using EC2 health checks allows Auto Scaling to replace the instance even when the app itself has not reported an error.

Why these pairings

Auto Scaling adds instances when metrics like CPU or network exceed thresholds; load balancer scales based on request count; unhealthy hosts trigger replacement.

732

MCQeasy

A company runs the same public API in two regions (Region A and Region B), each fronted by an ALB. They want Route 53 to automatically route clients to the Region B API when Region A becomes unhealthy, with minimal configuration effort. Which Route 53 approach should they use?

A.Use a single Route 53 A record that points only to Region A’s ALB and manually update it after failures.

B.Use Route 53 latency-based routing with separate records for each region.

C.Use Route 53 failover routing with health checks for each region’s endpoint.

D.Use weighted routing and set the Region B weight to 0 to ensure it is only used when needed.

AnswerC

Failover routing works with health checks to move traffic from a primary endpoint to a secondary endpoint when the primary becomes unhealthy.

Why this answer

Route 53 failover routing with health checks is the correct choice because it automatically directs traffic to a secondary endpoint (Region B) when the primary endpoint (Region A) fails a health check. This provides active-passive failover with minimal configuration, as Route 53 monitors the health of each ALB and updates DNS responses accordingly without manual intervention.

Exam trap

The trap here is that candidates often treat latency-based routing as a failover mechanism. Latency routing is an active-active policy that optimizes for speed; it skips an endpoint only when health checks are attached to its records, and it does not express a primary/secondary relationship.

How to eliminate wrong answers

Option A is wrong because manually updating a single A record after a failure is not automated and introduces significant downtime during the manual update window. Option B is wrong because latency-based routing picks the Region with the lowest latency for each client rather than designating a primary and a standby; as described, with no health checks attached, it would keep sending clients to Region A even while Region A is down. Option D is wrong because the option configures no health checks, so Route 53 has no signal to stop returning Region A, and a record with weight 0 then never receives traffic; weighted routing is intended for splitting traffic proportionally, whereas failover routing expresses the active-passive intent directly.
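
The decision a failover policy makes can be sketched in a few lines. The record fields mirror Route 53 failover records, but the DNS names and health check IDs are hypothetical, and the behaviour when both endpoints are unhealthy (answering with the primary) is a simplifying assumption of this sketch.

```python
# Two records for the same name, in the shape of Route 53 failover records.
records = [
    {"Name": "api.example.com", "Failover": "PRIMARY",
     "Target": "alb-region-a.example.com", "HealthCheckId": "hc-a"},
    {"Name": "api.example.com", "Failover": "SECONDARY",
     "Target": "alb-region-b.example.com", "HealthCheckId": "hc-b"},
]


def resolve(health: dict) -> str:
    """Return the target a failover policy would answer with."""
    primary = next(r for r in records if r["Failover"] == "PRIMARY")
    secondary = next(r for r in records if r["Failover"] == "SECONDARY")
    if health.get(primary["HealthCheckId"], False):
        return primary["Target"]
    if health.get(secondary["HealthCheckId"], False):
        return secondary["Target"]
    # Assumption: with nothing healthy, fall back to the primary.
    return primary["Target"]


print(resolve({"hc-a": True, "hc-b": True}))
print(resolve({"hc-a": False, "hc-b": True}))
```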

733

MCQmedium

A company uses IAM permission boundaries to prevent developers from escalating privileges. The security team created a permission boundary that allows only read-only actions on most AWS services, but teams can still manage their own resources. A developer can create an IAM role with broad permissions, and the boundary does not appear to be restricting it. Which corrective action best aligns with how permission boundaries work?

A.Rely on an AWS-managed policy attached to the developer’s IAM user; permission boundaries only apply to users.

B.Ensure the role creation process sets the permission boundary on the new role, using the boundary’s ARN in the CreateRole call or role template.

C.Attach the permission boundary policy as an SCP in AWS Organizations so it automatically applies to all roles.

D.Grant the developer IAM permissions to add a “deny” statement to the boundary policy so the boundary blocks escalation.

AnswerB

Permission boundaries are evaluated based on the boundary attached to the principal/role being created or used. If a developer creates roles without specifying the boundary, the boundary won’t restrict the resulting permissions. Enforcing boundary attachment via role templates or required parameters ensures every created role is constrained.

Why this answer

A permissions boundary limits only the principal it is attached to; it is not inherited by roles that principal creates. If the developer's `CreateRole` call (or infrastructure-as-code template) omits the boundary ARN, the new role has no boundary and can carry broad permissions. Option B correctly identifies that the role creation process must include the boundary ARN. To make this mandatory rather than optional, the developer's own IAM policy can allow `iam:CreateRole` only when the `iam:PermissionsBoundary` condition key equals the required boundary ARN.

Exam trap

The trap here is that candidates assume permission boundaries are automatically inherited or enforced by default, when in fact they must be explicitly applied to each role during creation, and SCPs are often confused as a substitute for permission boundaries.

How to eliminate wrong answers

Option A is wrong because permission boundaries apply to both IAM roles and users, not just users; the developer's user policy does not restrict roles they create unless the boundary is applied to the role. Option C is wrong because SCPs (Service Control Policies) are attached to the organization root, OUs, or accounts and cannot be attached to individual roles; they set a maximum-permission guardrail for an account but do not replace a permission boundary on the role. Option D is wrong because letting developers edit the boundary policy defeats its purpose: anyone who can change the boundary can also weaken it, so the boundary document should remain under the security team's control.
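
One common way to enforce option B is a condition on the developer's own permissions, so that role creation succeeds only when the required boundary is attached. The sketch below builds such a statement; the account ID and policy name are hypothetical, and a complete setup would typically also prevent developers from removing the boundary afterwards.

```python
import json

# Hypothetical account ID and boundary policy name.
BOUNDARY_ARN = "arn:aws:iam::111122223333:policy/DeveloperBoundary"

developer_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CreateRoleOnlyWithBoundary",
            "Effect": "Allow",
            "Action": ["iam:CreateRole", "iam:PutRolePermissionsBoundary"],
            "Resource": "*",
            # The request is allowed only if it sets exactly this boundary.
            "Condition": {
                "StringEquals": {"iam:PermissionsBoundary": BOUNDARY_ARN}
            },
        }
    ],
}

print(json.dumps(developer_policy, indent=2))
```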

734

MCQhard

Based on the exhibit, the security team needs to detect and alert on both successful and failed attempts to change S3 bucket policies and KMS key policies across the organization. Which solution best meets that requirement?

A.Enable an organization trail for management events in all regions and create an EventBridge rule that matches PutBucketPolicy and PutKeyPolicy, then send alerts to SNS.

B.Enable AWS Config in all accounts and use only a periodic compliance evaluation to alert when bucket or key policies drift.

C.Use IAM Access Analyzer because it continuously blocks policy changes that would expose the resources publicly.

D.Turn on S3 server access logging and KMS key rotation, because both services will capture policy modifications automatically.

AnswerA

CloudTrail management events record API activity, including failed attempts, and an organization trail provides coverage across accounts and Regions. EventBridge can react to those API calls in near real time and route notifications to SNS. This is the clean detective-control pattern for policy-change auditing.

Why this answer

Option A is correct because AWS CloudTrail management events capture all API calls that modify S3 bucket policies (PutBucketPolicy) and KMS key policies (PutKeyPolicy). By enabling an organization trail for all regions, you centralize these events across the entire AWS Organization. An Amazon EventBridge rule can then filter for these specific API calls and send alerts via Amazon SNS, meeting the requirement to detect both successful and failed attempts.

Exam trap

The trap here is that candidates often confuse AWS Config's compliance checks or IAM Access Analyzer's policy analysis with real-time API call monitoring, failing to realize that only CloudTrail management events capture every attempt (including failures) to change policies.

How to eliminate wrong answers

Option B is wrong because AWS Config periodic compliance evaluations only check resource compliance at scheduled intervals, not in real time, and they do not capture or alert on every API call attempt (including failed ones) to change policies. Option C is wrong because IAM Access Analyzer analyzes resource-based policies for unintended public or cross-account access; it does not block changes, and it does not record failed attempts. Option D is wrong because S3 server access logs are delivered on a best-effort basis, cover only S3, and are not an alerting source, while KMS key rotation replaces key material and records nothing about policy changes.
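
An event pattern for the rule in option A might look like the sketch below. The pattern keys follow the CloudTrail-via-EventBridge event format; the `matches` function is a tiny stand-in for EventBridge's matching logic (nested objects and value lists only), written so the example runs locally. Because the pattern does not filter on `errorCode`, it matches failed attempts as well as successful ones.

```python
event_pattern = {
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["s3.amazonaws.com", "kms.amazonaws.com"],
        "eventName": ["PutBucketPolicy", "PutKeyPolicy"],
    },
}


def matches(pattern: dict, event: dict) -> bool:
    """Tiny subset of EventBridge matching: nested dicts and value lists."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


# Sample event for a denied call; field values are illustrative.
denied_attempt = {
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {
        "eventSource": "s3.amazonaws.com",
        "eventName": "PutBucketPolicy",
        "errorCode": "AccessDenied",  # present on failed calls
    },
}
print(matches(event_pattern, denied_attempt))
```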

735

MCQmedium

You have an S3 bucket that stores customer-specific private files. You want to serve these files through CloudFront, where clients must use signed cookies (or signed URLs) to access the content. In addition, you need to block common web exploits and rate-limit suspicious traffic at the edge. Which design best meets these requirements?

A.Keep the S3 bucket private, configure CloudFront with Origin Access Control so only CloudFront can access the origin, require signed cookies/URLs for viewers, and associate an AWS WAF web ACL with CloudFront for blocking and rate limiting.

B.Enable public read access on the S3 bucket and rely on WAF alone for authorization because WAF can validate signatures.

C.Configure CloudFront with signed URLs but do not change the S3 bucket access settings; leaving public access enabled is acceptable since CloudFront can filter traffic.

D.Use WAF at CloudFront but omit signed cookies/URLs because rate limiting and exploit blocking already provide access control for private files.

AnswerA

This ensures S3 remains non-public while CloudFront becomes the only origin access path using Origin Access Control. Signed cookies/URLs enforce authenticated authorization at the edge for each request. Attaching AWS WAF adds request inspection and protections like rate limiting and exploit blocking.

Why this answer

Option A is correct because it combines a private S3 bucket with Origin Access Control (OAC) to ensure only CloudFront can access the origin, enforces signed cookies/URLs for viewer authentication, and uses AWS WAF at the edge to block common web exploits and rate-limit suspicious traffic. This layered approach provides both authorization (via signed requests) and security filtering (via WAF) at the CloudFront edge, meeting all requirements.
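
A minimal sketch of the bucket policy that typically accompanies Origin Access Control: it grants read access to the CloudFront service principal only when the request comes from one specific distribution. The bucket name, account ID, and distribution ID used below are placeholders, not values from the question.

```python
import json


def oac_bucket_policy(bucket: str, account_id: str, distribution_id: str) -> dict:
    """Build a bucket policy that lets a single CloudFront distribution read objects."""
    distribution_arn = f"arn:aws:cloudfront::{account_id}:distribution/{distribution_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontServicePrincipalReadOnly",
                "Effect": "Allow",
                "Principal": {"Service": "cloudfront.amazonaws.com"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # Restrict access to requests made on behalf of this distribution.
                "Condition": {"StringEquals": {"AWS:SourceArn": distribution_arn}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder identifiers for illustration only.
    policy = oac_bucket_policy("example-private-files", "111122223333", "EDFDVBD6EXAMPLE")
    print(json.dumps(policy, indent=2))
```

With S3 Block Public Access left on and this as the only grant, direct requests to the bucket are refused, so the signed cookie/URL check at CloudFront cannot be bypassed.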

Exam trap

The trap here is that candidates often think WAF can handle authorization (like validating signed URLs) or that leaving the S3 bucket public is acceptable if CloudFront is used, but WAF cannot verify cryptographic signatures and a public bucket allows direct access bypassing CloudFront's authentication.

How to eliminate wrong answers

Option B is wrong because enabling public read access on the S3 bucket bypasses the need for signed cookies/URLs, and WAF cannot validate signatures—WAF inspects HTTP headers, URI paths, and IP addresses, but does not have the capability to verify CloudFront signed URL or signed cookie cryptographic signatures. Option C is wrong because leaving the S3 bucket publicly accessible defeats the purpose of using signed URLs; CloudFront does not filter traffic based on signed URLs at the origin level, so a public bucket would allow direct access to objects without authentication. Option D is wrong because omitting signed cookies/URLs means there is no mechanism to restrict access to authorized viewers only; WAF rate limiting and exploit blocking do not provide authentication or authorization for private content.

736

MCQhard

Based on the exhibit, the database is manually promoted during an Availability Zone failure and the application outage lasts longer than the target. What change best improves resilience with the least operational intervention?

A.Keep the read replica and automate promotion with a runbook after CloudWatch alarms fire.

B.Convert the database to an RDS Multi-AZ deployment so a synchronous standby can fail over automatically.

C.Use a cross-Region read replica so promotion happens faster during an AZ failure.

D.Increase the application retry count and keep the current database design.

AnswerB

Multi-AZ is designed for automatic failover within the same Region and maintains a synchronous standby for high availability. The exhibit shows that the current read replica requires manual promotion and produces an outage longer than the target. Switching to Multi-AZ removes the manual step and aligns the database layer with the desired recovery time.

Why this answer

B is correct because RDS Multi-AZ synchronously replicates data to a standby in a different Availability Zone and fails over to it automatically when the primary's AZ fails. This directly addresses the requirement to improve resilience while minimizing operational effort, as the failover is handled by AWS without any runbook execution or manual promotion.

Exam trap

The trap here is that candidates often confuse read replicas (designed for read scaling and manual promotion) with Multi-AZ deployments (designed for automatic failover), and incorrectly assume that automating a runbook for read replica promotion is equivalent to the native automatic failover of Multi-AZ.

How to eliminate wrong answers

Option A is wrong because it still requires manual or automated runbook execution to promote the read replica, which introduces operational intervention and delay, failing the 'least operational intervention' requirement. Option C is wrong because a cross-Region read replica involves asynchronous replication and manual promotion, which is slower and more complex than Multi-AZ failover, and does not meet the 'least operational intervention' goal. Option D is wrong because increasing the application retry count does not resolve the underlying database unavailability during an AZ failure; it only masks the symptom and does not improve resilience.

737

MCQhard

Based on the exhibit, which design change is the best way to reduce the observed read latency for this DynamoDB-backed service?

A.Add a DynamoDB Accelerator (DAX) cluster in front of the table and send repeated read traffic through it.

B.Increase the on-demand table limits so DynamoDB can automatically absorb more traffic.

C.Create a global secondary index on tenantId to distribute the load across more partitions.

D.Move the dashboard data into S3 and use Lambda functions to read it on demand.

AnswerA

DAX is designed to accelerate repeated eventually consistent reads from DynamoDB by caching hot items in memory. The exhibit shows one tenant driving most of the reads and the same dashboard items being requested repeatedly within a short window, which is an excellent fit for DAX. It reduces latency and offloads the hot key without requiring a schema redesign.

Why this answer

Adding a DynamoDB Accelerator (DAX) cluster in front of the table reduces read latency by providing an in-memory cache for repeated read traffic. DAX delivers microsecond response times for eventually consistent reads, which directly addresses the observed latency issue without requiring application-level caching or table redesign.
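
To illustrate why a cache helps when the same items are requested repeatedly, here is a small read-through cache with a time-to-live. It is not the DAX client; the class name, the loader callback, and the 300-second TTL in the usage are illustrative assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class ReadThroughCache:
    """Serve repeated reads from memory; fall back to the loader on a miss."""

    def __init__(self, loader: Callable[[str], dict], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader          # e.g. a function that reads the table
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[float, dict]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> dict:
        now = self._clock()
        cached = self._items.get(key)
        if cached is not None and now < cached[0]:
            self.hits += 1
            return cached[1]
        # Miss or expired entry: read from the backing store and cache the result.
        self.misses += 1
        value = self._loader(key)
        self._items[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    table_reads = []

    def read_from_table(key: str) -> dict:
        table_reads.append(key)        # stands in for a DynamoDB read
        return {"pk": key}

    cache = ReadThroughCache(read_from_table, ttl_seconds=300)
    for _ in range(100):
        cache.get("tenant-42#dashboard")
    print(len(table_reads), cache.hits, cache.misses)
```

In this toy model, a hot key requested many times within the TTL reaches the backing store once, which is the effect the explanation attributes to DAX for eventually consistent reads.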

Exam trap

The trap here is that candidates often assume increasing capacity limits (Option B) or adding indexes (Option C) will solve latency issues, but they fail to recognize that latency is a caching problem, not a throughput or partitioning problem, and that DAX is the AWS-native solution for DynamoDB read-heavy workloads with repeated access patterns.

How to eliminate wrong answers

Option B is wrong because increasing on-demand table limits does not reduce read latency; it only prevents throttling by allowing DynamoDB to absorb more traffic, but the underlying read latency from the storage layer remains unchanged. Option C is wrong because a global secondary index on tenantId provides no caching: queries against the index are still served from DynamoDB's storage layer, and using tenantId as the index partition key would concentrate the dominant tenant's reads on a single key value rather than spreading them. Option D is wrong because moving dashboard data to S3 and reading it on demand with Lambda adds Lambda cold starts and S3 request latency, which is typically higher than a DynamoDB item read, so it is more likely to increase read latency than reduce it, and it adds unnecessary complexity for a use case that benefits from DynamoDB's low-latency access.

738

MCQmedium

A batch-processing system runs only during business hours (08:00–18:00 UTC). The jobs are restartable, and the architecture can tolerate occasional interruptions. Which approach minimizes cost while meeting the business-hours constraint?

A.Use only On-Demand instances during business hours and scale to zero outside those hours.

B.Use Spot Instances for the batch workload during business hours and scale the capacity down to zero outside that window.

C.Purchase a 3-year Reserved Instance and keep the workload running 24/7 to use the commitment fully.

D.Purchase Reserved Instances and disable scaling so the fleet stays within the commitment regardless of job demand.

AnswerB

Spot is appropriate because the workload is restartable and can tolerate interruptions. Scaling to zero outside business hours prevents paying for unused capacity when jobs are not running.

Why this answer

Option B is correct because Spot Instances offer significant cost savings (up to 90% compared to On-Demand) and are ideal for fault-tolerant, restartable batch workloads. The architecture can tolerate interruptions, and scaling to zero outside business hours ensures no compute costs are incurred when the system is not needed.
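
A back-of-the-envelope comparison makes the two savings levers visible: running 10 hours instead of 24, and paying a Spot rate instead of On-Demand. The hourly rate and the 70% Spot discount below are hypothetical figures chosen only for the arithmetic; real Spot prices vary by instance type, Region, and time.

```python
def monthly_cost(hourly_rate: float, hours_per_day: float,
                 days: int = 30, instances: int = 1) -> float:
    """Compute cost for a fleet that runs a fixed number of hours per day."""
    return hourly_rate * hours_per_day * days * instances


ON_DEMAND_RATE = 0.10   # hypothetical USD per instance-hour
SPOT_DISCOUNT = 0.70    # hypothetical; the explanation cites "up to 90%"

always_on_on_demand = monthly_cost(ON_DEMAND_RATE, 24)
business_hours_on_demand = monthly_cost(ON_DEMAND_RATE, 10)
business_hours_spot = monthly_cost(ON_DEMAND_RATE * (1 - SPOT_DISCOUNT), 10)

if __name__ == "__main__":
    print(f"24/7 On-Demand:           {always_on_on_demand:.2f}")
    print(f"Business-hours On-Demand: {business_hours_on_demand:.2f}")
    print(f"Business-hours Spot:      {business_hours_spot:.2f}")
```

Under these assumed numbers, scaling to zero outside the 10-hour window removes more than half of the always-on cost, and Spot pricing reduces the remainder further.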

Exam trap

The trap here is that candidates often choose On-Demand instances (Option A) because they assume reliability is paramount, but the question explicitly states the workload is restartable and can tolerate interruptions, making Spot Instances the lower-cost choice.

How to eliminate wrong answers

Option A is wrong because using only On-Demand instances during business hours incurs higher costs than necessary; Spot Instances are more cost-effective for restartable batch jobs. Option C is wrong because purchasing a 3-year Reserved Instance and running 24/7 wastes money on compute time outside the 10-hour business window, and the commitment does not align with the workload's schedule. Option D is wrong because purchasing Reserved Instances and disabling scaling prevents the fleet from adapting to job demand, leading to either underutilization or inability to handle peak loads, and still incurs costs outside business hours.

739

MCQeasy

An application runs on an EC2 Auto Scaling group. Over the last month, CPU utilization averaged 8% with no sustained memory pressure, and response times are stable. The team wants to lower monthly cost without changing the application. What is the most appropriate next step for cost optimization?

A.Evaluate a smaller EC2 instance type (via the Auto Scaling launch template/configuration) for the group and validate performance metrics after the change.

B.Increase desired capacity to 2x so utilization increases and instances become “more efficient.”

C.Disable Auto Scaling so the group never scales down to preserve baseline performance.

D.Switch the workload to Spot instances immediately to avoid On-Demand charges, regardless of interruption risk.

AnswerA

If utilization is consistently low and performance is stable, the current instances are likely overprovisioned. Moving to a smaller instance type directly reduces compute cost while preserving capacity for normal load.

Why this answer

Option A is correct because the application is over-provisioned: CPU utilization averages only 8% with no memory pressure and stable response times. By selecting a smaller EC2 instance type in the Auto Scaling launch template or configuration, you directly reduce the per-instance cost while maintaining adequate performance. This is the most straightforward cost optimization step without modifying the application code or architecture.

Exam trap

The trap here is that candidates may think increasing capacity (Option B) improves efficiency, but in reality, adding more instances to an already underutilized workload only increases cost without any performance benefit.

How to eliminate wrong answers

Option B is wrong because increasing desired capacity to 2x would add more instances, increasing total cost while utilization per instance would drop even further, making the system less efficient, not more. Option C is wrong because disabling Auto Scaling removes the ability to scale down during low demand, which would lock in higher costs and prevent the group from right-sizing to actual load. Option D is wrong because switching to Spot instances immediately without testing or implementing interruption-handling mechanisms (e.g., graceful shutdown, checkpointing) risks application availability and stability, which is not acceptable when the goal is to lower cost without changing the application.

740

MCQmedium

A video platform uses Amazon Aurora. The workload has many short-lived database connections from Lambda functions, causing connection storms. What should be added?

A.S3 Select

B.An internet gateway

C.A larger Route 53 hosted zone

D.RDS Proxy

AnswerD

RDS Proxy pools and manages database connections, improving scalability for serverless and bursty workloads.

Why this answer

RDS Proxy sits between Lambda functions and the Aurora database, pooling and reusing database connections. This prevents the Lambda functions from overwhelming the database with many short-lived connections, which can cause connection storms and degrade performance. RDS Proxy also reduces the overhead of establishing new connections and improves scalability.
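
The pooling idea can be sketched in a few lines: many short-lived callers borrow from a small, fixed set of connections instead of opening their own. This is a conceptual toy, not how RDS Proxy is implemented; `FakeConnection`, `ConnectionPool`, and `handle_invocation` are hypothetical names and no real database is contacted.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Stand-in for a real database connection (no network I/O)."""

    def __init__(self, conn_id: int):
        self.conn_id = conn_id
        self.queries = 0

    def execute(self, sql: str) -> str:
        self.queries += 1
        return f"conn {self.conn_id} ran: {sql}"


class ConnectionPool:
    """Open a fixed number of connections once and lend them out."""

    def __init__(self, size: int):
        self.opened = 0
        self._idle = queue.Queue()
        for i in range(size):
            self._idle.put(FakeConnection(i))
            self.opened += 1

    @contextmanager
    def connection(self, timeout: float = 5.0):
        conn = self._idle.get(timeout=timeout)   # wait for a free connection
        try:
            yield conn
        finally:
            self._idle.put(conn)                 # return it for reuse


def handle_invocation(pool: ConnectionPool, sql: str = "SELECT 1") -> str:
    """Simulates one short-lived function invocation."""
    with pool.connection() as conn:
        return conn.execute(sql)


if __name__ == "__main__":
    pool = ConnectionPool(size=2)
    results = [handle_invocation(pool) for _ in range(500)]
    print(len(results), pool.opened)
```

However many invocations run, the database in this model only ever sees the pool's fixed number of connections, which is what keeps a burst of callers from turning into a connection storm.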

Exam trap

The trap here is that candidates may confuse connection pooling with network-level components (like internet gateways) or data retrieval services (like S3 Select), overlooking that RDS Proxy is the AWS-native solution for managing short-lived database connections from serverless or highly concurrent workloads.

How to eliminate wrong answers

Option A is wrong because S3 Select is used to retrieve subsets of data from objects in Amazon S3 using SQL expressions, not for managing database connections. Option B is wrong because an internet gateway enables VPC resources to communicate with the internet, not to manage or pool database connections. Option C is wrong because a larger Route 53 hosted zone increases the number of DNS records you can create, but does not address connection pooling or database connection storms.

741

MCQmedium

A company uses Amazon SQS and AWS Lambda to process orders. Lambda typically completes in 4 minutes, but complex orders can take up to 12 minutes. The team reports that some orders are being processed more than once. Which is the MOST likely cause and the recommended fix?

A.Enable SQS FIFO queue to prevent duplicate message delivery

B.Increase the SQS queue visibility timeout to exceed the maximum Lambda processing time

C.Reduce the Lambda function timeout to 4 minutes to match typical processing time

D.Enable SQS long polling to reduce the frequency of message retrieval

AnswerB

Setting visibility timeout above 12 minutes (the maximum processing time) prevents messages from reappearing while being processed. This eliminates the root cause of duplicate processing.

Why this answer

SQS visibility timeout defines how long a message is hidden from other consumers after it is received. If a Lambda function takes longer than the visibility timeout to process a message, the message becomes visible again and another Lambda invocation picks it up — causing duplicate processing.

The default SQS visibility timeout is 30 seconds. If processing takes 12 minutes but the visibility timeout is shorter than that, the message reappears while the first invocation is still working and is processed again. The fix is to raise the visibility timeout above the maximum processing time, so above 12 minutes here, with some headroom.
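
The mismatch can be illustrated with a simplified model in which a message becomes visible again every time the visibility timeout elapses, until the first consumer finishes and deletes it. The function name and the 5-minute and 15-minute timeouts are assumptions for illustration; real queues have more moving parts, such as retries and batching.

```python
import math


def deliveries_before_delete(processing_seconds: int, visibility_timeout_seconds: int) -> int:
    """How many times a message is handed out before the first consumer deletes it.

    Simplified model: the message reappears every `visibility_timeout_seconds`
    until `processing_seconds` have passed.
    """
    if processing_seconds <= 0 or visibility_timeout_seconds <= 0:
        raise ValueError("both durations must be positive")
    return math.ceil(processing_seconds / visibility_timeout_seconds)


if __name__ == "__main__":
    typical, complex_order = 4 * 60, 12 * 60
    for timeout in (5 * 60, 15 * 60):          # hypothetical queue settings
        print(timeout,
              deliveries_before_delete(typical, timeout),
              deliveries_before_delete(complex_order, timeout))
```

With an assumed 5-minute timeout, typical 4-minute orders are delivered once but 12-minute orders are delivered three times, matching the "some orders" symptom; with a timeout above 12 minutes both are delivered once. AWS's Lambda guidance for SQS event sources goes further and suggests a visibility timeout of at least six times the function timeout to leave room for retries.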

Exam trap

Many architects set up SQS/Lambda integrations without adjusting the visibility timeout from the default 30 seconds. When Lambda functions run longer than this, the message reappears and creates duplicates. The symptom is duplicate processing — a classic visibility timeout mismatch.

Fix the root cause (extend visibility timeout) rather than adding application-level deduplication logic.

Why the other options are wrong

SQS FIFO queues deduplicate messages that producers send more than once within the deduplication interval, but they have throughput limits and still redeliver a message whose visibility timeout expires before it is deleted. The root cause here is a visibility timeout mismatch, so switching to FIFO adds complexity without addressing it.

Reducing Lambda timeout to 4 minutes would cause 12-minute complex orders to fail before completing, sending them to the DLQ or causing retries. This makes the problem worse.

Long polling reduces API calls and costs for sparse queues by waiting up to 20 seconds for messages. It has no effect on visibility timeout or duplicate processing.

742

MCQmedium

A trading dashboard stores uploaded documents in S3. The business requires a copy in another AWS Region for disaster recovery. What should be configured? The team wants the control to be enforceable during normal operations.

A.An EBS snapshot schedule

B.S3 Cross-Region Replication with versioning enabled

C.S3 lifecycle transition to Glacier Flexible Retrieval

D.A CloudFront distribution

AnswerB

CRR asynchronously replicates objects to a bucket in another Region and requires versioning.

Why this answer

S3 Cross-Region Replication (CRR) automatically replicates objects to a destination bucket in a different AWS Region, providing a disaster recovery copy. Versioning must be enabled on both the source and destination buckets for CRR to function. Once the replication rule is in place, S3 applies it to newly written objects as part of normal operations, with no scheduled jobs or manual copying. This meets the business requirement for an enforceable, automated DR copy.
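
As a hedged sketch, a replication rule for this scenario might be shaped like the dictionary below, which follows the structure of the replication configuration accepted by the S3 PutBucketReplication operation. The role ARN, rule ID, and bucket name are placeholders, and versioning is assumed to be enabled on both buckets already.

```python
import json


def replication_configuration(role_arn: str, destination_bucket: str) -> dict:
    """Build a one-rule configuration that replicates every new object."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to copy objects
        "Rules": [
            {
                "ID": "dr-copy-all-objects",          # placeholder rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},             # empty prefix = whole bucket
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


if __name__ == "__main__":
    config = replication_configuration(
        "arn:aws:iam::111122223333:role/example-replication-role",  # placeholder
        "example-dr-bucket-us-west-2",                               # placeholder
    )
    print(json.dumps(config, indent=2))
```

The destination is given as a bucket ARN; the destination bucket's own Region is what makes the rule cross-Region.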

Exam trap

The trap here is that candidates confuse S3 Cross-Region Replication with S3 lifecycle policies or Glacier transitions, thinking that moving data to a cold storage class in the same region provides DR, when in fact DR requires a copy in a separate geographic region.

How to eliminate wrong answers

Option A is wrong because EBS snapshots are for Amazon EC2 block storage volumes, not for S3 objects; they cannot replicate data stored in S3 buckets. Option C is wrong because S3 lifecycle transition to Glacier Flexible Retrieval only moves objects to a lower-cost storage class within the same region, it does not create a copy in another AWS Region for disaster recovery. Option D is wrong because CloudFront is a content delivery network (CDN) that caches content at edge locations for low-latency access, not a replication mechanism for creating a regional DR copy.

743

MCQmedium

A production log archive runs continuously on EC2 with predictable usage for the next three years. The team wants a discount while retaining some instance-family flexibility. What should they buy? The design must avoid adding custom operational scripts.

A.S3 Intelligent-Tiering

B.Dedicated Instances

C.Compute Savings Plan

D.Spot Instances only

AnswerC

Compute Savings Plans provide discounts for a committed spend while allowing flexibility across instance families, sizes, Regions, and compute services.

Why this answer

The Compute Savings Plan (C) offers a significant discount (up to 66%) in exchange for a commitment to a consistent amount of compute usage (measured in $/hour) for a 1- or 3-year term, while still allowing flexibility across instance families, sizes, OS, tenancy, and Regions. This matches the predictable three-year workload and the requirement for instance-family flexibility without custom scripts. EC2 Instance Savings Plans discount more deeply (up to 72%) but tie the commitment to one instance family in one Region.

Exam trap

The trap here is that candidates confuse Savings Plans with Reserved Instances, assuming that any commitment requires locking into a specific instance family, but Compute Savings Plans explicitly provide family flexibility while still delivering a significant discount.

How to eliminate wrong answers

Option A is wrong because S3 Intelligent-Tiering is an object storage class for data with changing access patterns, not a compute pricing model for EC2 instances. Option B is wrong because Dedicated Instances provide physical isolation at a higher cost and do not offer a discount or instance-family flexibility; they are for compliance or licensing needs. Option D is wrong because Spot Instances offer deep discounts but can be interrupted with a 2-minute warning, making them unsuitable for a production log archive that must run continuously without disruption.

744

MCQmedium

An application writes to an Amazon Aurora DB cluster. After a planned Aurora failover, the application experiences several minutes of connection errors. The logs show the application continues connecting to the specific DB instance endpoint that was the primary before the failover. What change most directly improves resilience during Aurora failovers?

A.Update the application to use the Aurora cluster writer endpoint for write traffic so it always resolves to the current writer instance.

B.Increase Aurora storage autoscaling so failovers are unnecessary.

C.Point both reads and writes to the Aurora reader endpoint to keep the DNS name the same.

D.Disable Aurora failover capability so the cluster never switches writer instances.

AnswerA

During failover, Aurora changes which underlying DB instance is the writer. The cluster writer endpoint (for the cluster) always resolves to the current writer. Using the writer endpoint prevents the application from being pinned to an old instance endpoint that may stop accepting writes after failover.

Why this answer

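The Aurora cluster writer endpoint always resolves to the current primary DB instance, even after a failover. By using this endpoint instead of a specific instance endpoint, the application reaches the new writer as soon as it reconnects and re-resolves the DNS name, with no configuration change or manual intervention. Connections that are open at the moment of failover are still dropped, so the client should retry and avoid caching DNS results for long.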
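
A quick way to spot the misconfiguration described in the question is to check which kind of endpoint the application is using. The helper below relies on the documented Aurora hostname conventions (`.cluster-` for the writer, `.cluster-ro-` for the reader, `.cluster-custom-` for custom endpoints); the function name and the sample hostnames are illustrative.

```python
def aurora_endpoint_kind(hostname: str) -> str:
    """Classify an Aurora endpoint hostname as writer, reader, custom, or instance."""
    labels = hostname.lower().split(".")
    if len(labels) < 2:
        raise ValueError(f"not an Aurora endpoint hostname: {hostname!r}")
    second = labels[1]
    # Check the more specific prefixes before the generic "cluster-" prefix.
    if second.startswith("cluster-ro-"):
        return "reader"
    if second.startswith("cluster-custom-"):
        return "custom"
    if second.startswith("cluster-"):
        return "writer"
    return "instance"


if __name__ == "__main__":
    # Illustrative hostnames only.
    for host in (
        "mydbcluster.cluster-c7tj4example.us-east-1.rds.amazonaws.com",
        "mydbcluster.cluster-ro-c7tj4example.us-east-1.rds.amazonaws.com",
        "mydbinstance.c7tj4example.us-east-1.rds.amazonaws.com",
    ):
        print(aurora_endpoint_kind(host), host)
```

An application configured with a hostname that classifies as "instance" stays pinned to that instance after a failover, which is the failure mode in the logs.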

Exam trap

The trap here is that candidates may think using any Aurora endpoint (like the reader endpoint) is sufficient, but they must understand that only the cluster writer endpoint guarantees write availability after a failover, while the reader endpoint is strictly for read traffic.

How to eliminate wrong answers

Option B is wrong because increasing storage autoscaling does not prevent failovers; failovers occur due to instance health or AZ issues, not storage capacity. Option C is wrong because the reader endpoint load-balances connections across read-only Aurora Replicas, so write statements sent through it would fail. Option D is wrong because disabling failover capability would make the cluster unable to recover from primary instance failures, leading to prolonged downtime.

745

MCQhard

Based on the exhibit, a public API is behind CloudFront and is experiencing bursts of requests from the same client IP, causing upstream saturation. The team wants AWS to automatically block that IP when the request rate becomes excessive while keeping enforcement as close to the client as possible. Which control should they add?

A.Add an AWS WAF rate-based rule to the CloudFront distribution and configure it to block the source IP after the threshold is exceeded.

B.Add a network ACL rule that denies the source IP after five requests are observed.

C.Enable AWS Shield Advanced and create a custom protection group for the single IP address.

D.Place the API behind a security group rule that allows only the current client IP range.

AnswerA

AWS WAF rate-based rules are purpose-built for this use case. They evaluate the HTTP request rate from a source IP over a sliding window and can automatically block, CAPTCHA, or count when the threshold is exceeded. Attaching the Web ACL to CloudFront enforces the control at the edge, so abusive requests are stopped before they reach the origin and consume upstream capacity.

Why this answer

AWS WAF rate-based rules are designed to automatically block IP addresses that exceed a specified request rate within the rule's evaluation window (5 minutes by default). By attaching this rule to a CloudFront distribution, enforcement occurs at the edge location closest to the client, preventing excessive requests from reaching the upstream API and mitigating saturation.
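
For orientation, a rate-based rule in a WAFv2 web ACL might look roughly like the dictionary below. The rule name, metric name, and the limit of 1000 requests are illustrative assumptions; the threshold should be tuned to the API's normal traffic.

```python
import json


def rate_limit_rule(name: str, limit: int, priority: int = 0) -> dict:
    """Build a WAFv2 rule that blocks a source IP once it exceeds `limit` requests
    within the rule's evaluation window."""
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "RateBasedStatement": {
                "Limit": limit,
                "AggregateKeyType": "IP",   # count requests per source IP
            }
        },
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }


if __name__ == "__main__":
    print(json.dumps(rate_limit_rule("rate-limit-per-ip", 1000), indent=2))
```

A web ACL that protects a CloudFront distribution has to be created with the CLOUDFRONT scope, which is managed through the us-east-1 Region.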

Exam trap

The trap here is confusing stateless network ACLs or static security groups with the automatic, rate-aware blocking capability of AWS WAF, leading candidates to choose a manual or non-scalable solution.

How to eliminate wrong answers

Option B is wrong because network ACLs are stateless and require manual intervention to add or remove rules; they cannot automatically block an IP after a threshold of requests is observed. Option C is wrong because AWS Shield Advanced provides DDoS protection and custom protection groups for resource-level mitigation, not automatic per-IP rate limiting based on request count. Option D is wrong because security groups support only static allow rules; they cannot deny a specific IP or update themselves based on request rate, and they are not enforced at the CloudFront edge.

746

MCQhard

Based on the exhibit, a low-latency analytics platform runs 10 EC2 instances in the same Availability Zone. The nodes exchange a very high volume of east-west messages and must experience the lowest possible network latency and jitter. A separate operations team also wants to reduce the risk that all nodes land on the same physical hardware rack. Which placement strategy should the solutions architect use?

A.Cluster placement group

B.Spread placement group

C.Partition placement group

D.Auto Scaling group with a mixed instances policy

AnswerA

Cluster placement groups place instances physically close together to maximize bandwidth and minimize latency. That is the best fit for a chatty, east-west workload where network performance matters more than fault isolation at the rack level.

Why this answer

A cluster placement group is the correct choice because it provides the lowest possible network latency and jitter by ensuring all 10 EC2 instances are placed in close proximity within a single Availability Zone, enabling non-blocking, high-bandwidth communication. This is ideal for high-volume east-west traffic, as it maximizes network performance for tightly coupled workloads like analytics platforms.

Exam trap

The trap here is that candidates may choose a Spread placement group to reduce hardware rack risk, overlooking that the primary requirement is lowest latency and jitter, which cluster groups provide, while Spread groups sacrifice performance for fault isolation.

How to eliminate wrong answers

Option B (Spread placement group) is wrong because it spreads instances across distinct hardware racks to reduce risk of simultaneous failure, but it increases network latency and jitter due to physical separation, which conflicts with the requirement for lowest possible latency. A spread placement group also supports at most seven running instances per Availability Zone, so it could not hold all 10 nodes in a single AZ. Option C (Partition placement group) is wrong because it divides instances into logical partitions across racks to isolate failures, but it does not guarantee the lowest latency or jitter for east-west traffic, as instances in different partitions may be on separate racks. Option D (Auto Scaling group with a mixed instances policy) is wrong because it focuses on instance diversity and scaling, not on network placement; it does not control physical proximity or reduce latency/jitter, and it may even increase variability in network performance.

747

MCQhard

A media processing workflow generates analytics files that are accessed unpredictably. Some files become hot again months later. The team wants automatic storage cost optimisation without retrieval delays. What should be used? The architecture review board prefers a managed AWS-native control.

A.S3 Intelligent-Tiering

B.Manual monthly review and object copying

C.S3 Glacier Flexible Retrieval for all files

D.EFS One Zone for analytics files

AnswerA

Intelligent-Tiering automatically moves objects between access tiers based on usage while preserving low-latency access.

Why this answer

S3 Intelligent-Tiering is the correct choice because it automatically moves objects between its Frequent Access, Infrequent Access, and Archive Instant Access tiers based on changing access patterns, without any retrieval delays for hot objects. This matches the unpredictable access pattern where files may become hot again months later, and it is a fully managed AWS-native solution that optimizes storage costs automatically.
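
The automatic tier movement can be summarized as a function of how long an object has gone without access. This sketch covers only the three default tiers, using the documented 30-day and 90-day thresholds; it ignores the opt-in Archive Access and Deep Archive Access tiers (which do add retrieval delays) and the minimum object size for auto-tiering. The function name is illustrative.

```python
def intelligent_tiering_tier(days_since_last_access: int) -> str:
    """Return the default Intelligent-Tiering access tier for an object."""
    if days_since_last_access < 0:
        raise ValueError("days_since_last_access must be non-negative")
    if days_since_last_access >= 90:
        return "Archive Instant Access"
    if days_since_last_access >= 30:
        return "Infrequent Access"
    return "Frequent Access"


if __name__ == "__main__":
    for days in (0, 45, 200):
        print(days, intelligent_tiering_tier(days))
    # An access resets the clock, so a file that turns hot again after
    # months is treated like a freshly accessed object.
    print(intelligent_tiering_tier(0))
```

All three tiers in this model serve objects with the same low latency, which is why the "no retrieval delays" requirement still holds for files that go cold and later become hot.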

Exam trap

The trap here is that candidates may choose S3 Glacier Flexible Retrieval (Option C) thinking it is the cheapest for all files, but they overlook the retrieval delay requirement and the fact that files may become hot again, which Intelligent-Tiering handles seamlessly without any retrieval latency.

How to eliminate wrong answers

Option B is wrong because manual monthly review and object copying is not automated, introduces operational overhead, and risks human error or delays, failing the 'automatic' and 'managed AWS-native' requirements. Option C is wrong because S3 Glacier Flexible Retrieval has retrieval delays (minutes to hours) for all files, which violates the 'no retrieval delays' requirement for files that become hot again. Option D is wrong because EFS One Zone is a file system, not an object storage service, and it is not designed for cost optimization of unpredictable access patterns; it also lacks the automatic tiering capability and is not the right service for analytics files that are accessed via S3 APIs.

748

MCQeasy

A team runs a stateless web app on Amazon EC2 behind an Application Load Balancer. During traffic spikes, new EC2 instances take several minutes to finish bootstrapping before they can receive traffic. Which Auto Scaling configuration most directly reduces the time until additional capacity is available?

A.Increase the ALB target group deregistration delay.

B.Use an Auto Scaling warm pool so pre-initialized instances are ready to enter service.

C.Reduce the Auto Scaling group minimum size to one instance.

D.Replace the Application Load Balancer with a Network Load Balancer.

AnswerB

Warm pools keep instances pre-launched and initialized, which reduces the time needed to add capacity during spikes.

Why this answer

Option B is correct because an Auto Scaling warm pool allows you to maintain a pool of pre-initialized instances that are fully bootstrapped and ready to enter service. When a scale-out event occurs, instances from the warm pool can be moved into the Auto Scaling group and start receiving traffic almost immediately, bypassing the several-minute bootstrapping delay.

Exam trap

The trap here is that candidates may confuse the deregistration delay (which controls how gracefully existing connections are drained) with a mechanism that speeds up new instance readiness, or they may think that changing the load balancer type or reducing the minimum size will somehow accelerate instance bootstrapping.

How to eliminate wrong answers

Option A is wrong because increasing the ALB target group deregistration delay only affects how long the ALB waits before stopping traffic to instances that are being deregistered; it does not reduce the time for new instances to become ready. Option C is wrong because reducing the Auto Scaling group minimum size to one instance would actually reduce the baseline capacity, potentially making the application more vulnerable to traffic spikes and not addressing the bootstrapping delay. Option D is wrong because replacing the Application Load Balancer with a Network Load Balancer does not affect instance bootstrapping time; NLB operates at Layer 4 and lacks ALB's Layer 7 features such as path-based routing, and it does nothing to accelerate instance initialization.

749

MCQmedium

A.Switch the instances to Spot Instances and use interruption handling because it is the largest discount.

B.Purchase a Compute Savings Plan for the expected steady hourly usage in that Region.

C.Purchase a Standard Reserved Instance tied to a single specific instance type for the next 3 years.

D.Keep On-Demand and rely on Auto Scaling to reduce capacity when utilization is low.

AnswerB

Compute Savings Plans discount the usage while allowing flexibility across instance families and sizes in the Region.

Why this answer

B is correct because a Compute Savings Plan discounts steady-state usage by up to 66% compared to On-Demand without locking to a specific instance type, and the discount continues to apply when instance families, sizes, OS, tenancy, or Regions change. This matches the requirement for cost savings and flexibility during planned optimizations.

Exam trap

The trap here is that candidates often choose Spot Instances for cost savings without considering the interruption-intolerant requirement, or they select Standard Reserved Instances for the highest discount without recognizing the rigidity penalty for planned instance family changes.

How to eliminate wrong answers

Option A is wrong because Spot Instances are not suitable for interruption-intolerant workloads; they can be reclaimed with a 2-minute warning, causing service disruption. Option C is wrong because a Standard Reserved Instance is tied to a specific instance family and cannot be exchanged for another, which contradicts the requirement to avoid rigidity and change instance families during optimizations. Option D is wrong because staying on On-Demand with Auto Scaling only adjusts capacity and does not lower the hourly rate; since usage is expected to be steady, there is little idle capacity for Auto Scaling to remove.

750

MCQeasy

A website serves versioned JavaScript and CSS files through CloudFront, but origin fetches are still high and the CloudFront bill increased. Developers confirm that URLs include a version in the filename (for example, app.1.4.2.js). What CloudFront behavior/configuration is most likely to reduce origin fetches and associated costs?

A.Set long cache headers (for example, Cache-Control: max-age and immutable) on those versioned assets so CloudFront caches them longer.

B.Disable compression to reduce CPU time spent at the edge and therefore reduce total cost.

C.Lower the cache policy TTLs so clients always get the newest assets quickly.

D.Remove version identifiers from filenames so CloudFront caches fewer unique objects.

AnswerA

Because the filenames are versioned, each URL is effectively immutable. Longer TTL/max-age cache headers increase the cache hit ratio, so CloudFront serves subsequent requests from edge caches instead of re-fetching from the origin.

Why this answer

Setting long cache headers like `Cache-Control: max-age=31536000, immutable` on versioned assets lets CloudFront keep these objects at edge locations for an extended period, as long as the cache policy's maximum TTL permits it. Since the filename changes with each new version, every version is a distinct cache key: a new release is fetched once under its new URL, and already-cached versions can be served from the edge until they expire, which dramatically reduces origin fetches and associated costs.
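One way an origin or upload script might decide which assets get the long-lived header is sketched below. The filename pattern is an assumption based on the `app.1.4.2.js` example, and both the helper name `cache_control_for` and the `no-cache` fallback for unversioned files are hypothetical choices for illustration.

```python
import re

# Assumed convention: name.MAJOR.MINOR.PATCH.ext, as in app.1.4.2.js
VERSIONED_ASSET = re.compile(r"\.\d+\.\d+\.\d+\.(?:js|css)$")

ONE_YEAR_SECONDS = 31536000


def cache_control_for(path: str) -> str:
    """Return a Cache-Control value for the given asset path."""
    if VERSIONED_ASSET.search(path):
        # The URL changes on every release, so the content behind it never does.
        return f"max-age={ONE_YEAR_SECONDS}, immutable"
    # Hypothetical fallback: unversioned files must be revalidated before reuse.
    return "no-cache"


for asset in ("app.1.4.2.js", "/static/site.10.0.3.css", "index.html"):
    print(f"{asset}: {cache_control_for(asset)}")
```

The returned string would then be attached to the object as its `Cache-Control` header wherever the assets are published.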

Exam trap

The trap here is that candidates may think lowering TTLs or removing versioning helps with freshness or cost, but the key insight is that versioned filenames already solve cache invalidation, so extending cache duration is the cost-optimized approach.

How to eliminate wrong answers

Option B is wrong because CloudFront bills for data transfer and requests, not edge CPU time; disabling compression would make responses larger and increase data transfer costs. Option C is wrong because lowering cache policy TTLs would cause CloudFront to go back to the origin more frequently, increasing origin fetches and costs, which is the opposite of the desired outcome. Option D is wrong because removing version identifiers would put every release under the same URL, so updates would require short TTLs or invalidations to reach clients, which tends to increase origin fetches rather than reduce them.
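The effect of TTL on origin traffic can be estimated with a rough model. The sketch below assumes a single edge cache and one object that is requested continuously, so each expiry triggers exactly one trip to the origin; real distributions have many edge locations and may evict objects early, so treat the numbers as an illustration of the trend only.

```python
import math


def origin_fetches(window_seconds: int, ttl_seconds: int) -> int:
    """Origin fetches for one hot object at one edge cache (simplified model)."""
    # One fetch to populate the cache, then one more each time the TTL lapses.
    return math.ceil(window_seconds / ttl_seconds)


ONE_DAY = 86_400

for ttl in (60, 3_600, 31_536_000):
    fetches = origin_fetches(ONE_DAY, ttl)
    print(f"TTL {ttl:>10} s -> {fetches} origin fetch(es) per day")
```

Under these assumptions, shortening the TTL multiplies origin fetches, while a long TTL on an immutable URL reduces them to the initial fetch.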
