SAA-C03 Questions 451–525 | Page 7/14

451

MCQ | easy

A company serves a public API through a CloudFront distribution. They want to automatically block common web exploits (for example, OWASP Top 10–style threats) without building custom detection logic. Which AWS service configuration best meets the goal?

A. Enable AWS WAF with AWS Managed Rules and associate the web ACL with the CloudFront distribution.

B. Enable AWS Shield Advanced only; it fully replaces the need for WAF rule evaluation.

C. Attach a security group rule to the ALB to block malicious patterns based on HTTP request bodies.

D. Use Security Hub to block requests automatically when it detects suspicious activity.

Answer: A

AWS WAF inspects HTTP(S) requests and applies allow/block decisions based on rule matches. AWS Managed Rules provide prebuilt protections for common threat patterns, and attaching the WAF web ACL to CloudFront applies filtering at the edge.

Why this answer

AWS WAF with AWS Managed Rules provides pre-configured rule sets specifically designed to block common web exploits, including OWASP Top 10 threats, without requiring custom detection logic. By associating the web ACL with a CloudFront distribution, the filtering occurs at the edge, protecting the origin from malicious traffic before it reaches the application.
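The shape of such a web ACL can be sketched as the keyword arguments for the WAFv2 `create_web_acl` call. This is a minimal sketch, assuming two AWS managed rule groups (the core rule set and known bad inputs), a default action of Allow, and hypothetical names (`managed_rules_web_acl`, the ACL name, and the metric prefix). It is not a complete production configuration.

```python
def managed_rules_web_acl(name: str, metric_prefix: str) -> dict:
    """Build keyword arguments for the WAFv2 CreateWebACL call (CloudFront scope)."""

    def visibility(metric_name: str) -> dict:
        # Sampled requests and CloudWatch metrics help tune the managed rules later.
        return {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": metric_name,
        }

    # AWS-published managed rule groups, evaluated in priority order.
    managed_groups = ["AWSManagedRulesCommonRuleSet", "AWSManagedRulesKnownBadInputsRuleSet"]
    rules = []
    for priority, group in enumerate(managed_groups):
        rules.append(
            {
                "Name": group,
                "Priority": priority,
                "Statement": {
                    "ManagedRuleGroupStatement": {"VendorName": "AWS", "Name": group}
                },
                # "None" keeps the group's own rule actions, so matching requests are blocked.
                "OverrideAction": {"None": {}},
                "VisibilityConfig": visibility(f"{metric_prefix}-{group}"),
            }
        )

    return {
        "Name": name,
        # Web ACLs for CloudFront use the CLOUDFRONT scope (created via us-east-1).
        "Scope": "CLOUDFRONT",
        # Requests that no rule blocks are allowed through to the origin.
        "DefaultAction": {"Allow": {}},
        "Rules": rules,
        "VisibilityConfig": visibility(f"{metric_prefix}-web-acl"),
    }
```

With boto3, these arguments would go to `boto3.client("wafv2", region_name="us-east-1").create_web_acl(**params)`. The returned web ACL ARN is then set as the distribution's `WebACLId`, and that association is what moves the filtering to the edge.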

Exam trap

The trap here is confusing AWS Shield Advanced (which handles volumetric DDoS attacks) with AWS WAF (which handles application-layer threats like OWASP Top 10), leading candidates to believe Shield alone can replace WAF rule evaluation.

How to eliminate wrong answers

Option B is wrong because AWS Shield Advanced provides DDoS protection and DDoS cost protection, but it does not include application-layer rule evaluation for OWASP Top 10 threats; it is not a replacement for WAF. Option C is wrong because security groups operate at the network and transport layers (Layer 3/4) and cannot inspect HTTP request bodies or other application-layer payloads to block patterns like SQL injection or XSS. Option D is wrong because AWS Security Hub is a security posture management service that aggregates findings; it performs no inline traffic inspection, so it cannot block requests in real time.


452

Multi-Select | hard

Security responders suspect exfiltration from an Amazon S3 bucket that stores sensitive reports encrypted with a customer managed KMS key. They need to identify which IAM principal downloaded each object and whether any principals called KMS Decrypt on the key during the same time window. Which two detective controls should be enabled? Select two.


A. Enable CloudTrail data events for the S3 bucket.

B. Include CloudTrail management events for KMS API calls on the customer managed key.

C. Enable S3 Object Lock in compliance mode.

D. Turn on default bucket encryption with SSE-KMS.

E. Enable MFA Delete on the bucket.

Answers: A, B

S3 data events record object-level API activity such as GetObject, PutObject, and DeleteObject, along with the IAM principal or role session that made the call. That visibility is required to determine exactly who downloaded which object and when.

Why this answer

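Option A is correct because enabling CloudTrail data events for the S3 bucket captures object-level operations, including GetObject (download) requests. That lets you identify which IAM principal downloaded each object, together with the source IP address, user agent, and request time. Without data events, CloudTrail logs only management (control-plane) actions such as bucket creation and misses object-level access. Option B is also required: KMS Decrypt calls on the customer managed key are recorded as CloudTrail management events, which show which principals used the key during the same time window, provided the trail does not exclude KMS events.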
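The two detective controls can be sketched as one CloudTrail event selector, plus a small helper that correlates the resulting log records. This is a minimal sketch, assuming the selector is passed to CloudTrail `put_event_selectors` for an existing trail and that records follow the standard CloudTrail JSON fields (`eventSource`, `eventName`, `userIdentity.arn`, `requestParameters`, `resources`). The function names are hypothetical, and real investigations would also account for paging, assumed-role sessions, and time-window filtering.

```python
def investigation_event_selectors(bucket_name: str) -> list[dict]:
    """Event selectors for CloudTrail PutEventSelectors covering both evidence sources."""
    return [
        {
            "ReadWriteType": "All",
            # Management events include KMS calls such as Decrypt on the CMK.
            "IncludeManagementEvents": True,
            # An empty list means no management event source (e.g. kms.amazonaws.com) is excluded.
            "ExcludeManagementEventSources": [],
            # Data events: object-level calls (GetObject, PutObject, ...) for this bucket only.
            "DataResources": [
                {"Type": "AWS::S3::Object", "Values": [f"arn:aws:s3:::{bucket_name}/"]}
            ],
        }
    ]


def correlate(records: list[dict], bucket_name: str, key_arn: str) -> dict:
    """Split CloudTrail records into downloads from the bucket and Decrypt calls on the key."""
    downloads, decrypts = [], []
    for record in records:
        principal = record.get("userIdentity", {}).get("arn", "unknown")
        source, name = record.get("eventSource"), record.get("eventName")
        if source == "s3.amazonaws.com" and name == "GetObject":
            params = record.get("requestParameters") or {}
            if params.get("bucketName") == bucket_name:
                downloads.append((record["eventTime"], principal, params.get("key", "")))
        elif source == "kms.amazonaws.com" and name == "Decrypt":
            key_arns = {res.get("ARN") for res in record.get("resources", [])}
            if key_arn in key_arns:
                decrypts.append((record["eventTime"], principal))
    # Sorting by timestamp lines the two lists up for side-by-side review.
    return {"downloads": sorted(downloads), "decrypts": sorted(decrypts)}
```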

Exam trap

The trap here is expecting one event type to answer both questions. KMS Decrypt calls on a customer managed key are recorded as CloudTrail management events (unless the trail is configured to exclude KMS events). S3 object downloads (GetObject), by contrast, are recorded only as data events, which must be enabled explicitly. Enabling just one of the two leaves half of the investigation without evidence.


453

MCQ | medium

A trading dashboard runs on EC2 instances behind an Application Load Balancer. The design must tolerate the failure of one Availability Zone. What should the Auto Scaling group configuration include?

A. A single EC2 instance with detailed monitoring

B. Subnets in at least two Availability Zones with health checks enabled

C. All instances in one larger subnet

D. A Network Load Balancer in one subnet

Answer: B

An Auto Scaling group spanning multiple AZs can replace unhealthy instances and maintain capacity during an AZ failure.

Why this answer

Option B is correct because distributing EC2 instances across subnets in at least two Availability Zones ensures that if one AZ fails, the Auto Scaling group can maintain capacity using instances in the remaining AZ(s). Enabling health checks allows the group to detect and replace unhealthy instances, which is essential for fault tolerance. This configuration meets the requirement to tolerate the failure of one Availability Zone.
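This requirement can be expressed as arguments for the Auto Scaling `create_auto_scaling_group` call, with a guard that refuses a subnet set confined to one AZ. This is a minimal sketch: the helper name, the launch template ID, the subnet-to-AZ mapping, and the grace period are all hypothetical inputs.

```python
def multi_az_asg_params(name: str, launch_template_id: str, subnet_azs: dict[str, str],
                        target_group_arn: str, min_size: int = 2, max_size: int = 6) -> dict:
    """Build CreateAutoScalingGroup arguments, refusing single-AZ subnet sets.

    subnet_azs maps subnet ID -> Availability Zone name.
    """
    azs = set(subnet_azs.values())
    if len(azs) < 2:
        raise ValueError(f"subnets cover only {sorted(azs)}; at least two AZs are needed")
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {"LaunchTemplateId": launch_template_id, "Version": "$Latest"},
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": min_size,
        # Comma-separated subnet IDs; the group balances instances across their AZs.
        "VPCZoneIdentifier": ",".join(sorted(subnet_azs)),
        "TargetGroupARNs": [target_group_arn],
        # ELB health checks also replace instances the ALB reports as unhealthy,
        # not only instances whose EC2 status checks fail.
        "HealthCheckType": "ELB",
        "HealthCheckGracePeriod": 120,
    }
```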

Exam trap

The trap here is that candidates often confuse high availability with fault tolerance, thinking a single large subnet or a single instance with monitoring is sufficient, when in fact distributing across multiple Availability Zones is the key to surviving an AZ failure.

How to eliminate wrong answers

Option A is wrong because a single EC2 instance, even with detailed monitoring, cannot tolerate the failure of an entire Availability Zone; if that AZ goes down, the instance becomes unavailable. Option C is wrong because a subnet always resides in exactly one Availability Zone, so placing all instances in one larger subnet provides no redundancy if that AZ fails. Option D is wrong because a Network Load Balancer enabled in only one subnet runs in a single Availability Zone and does nothing about where the Auto Scaling group places instances. The load balancer must be enabled in subnets in multiple AZs, and the Auto Scaling group must span those AZs as well.


454

MCQ | medium

A trading dashboard stores uploaded documents in S3. The business requires a copy in another AWS Region for disaster recovery. What should be configured?

A. An EBS snapshot schedule

B. S3 Cross-Region Replication with versioning enabled

C. S3 lifecycle transition to Glacier Flexible Retrieval

D. A CloudFront distribution

Answer: B

CRR asynchronously replicates objects to a bucket in another Region and requires versioning.

Why this answer

S3 Cross-Region Replication (CRR) automatically and asynchronously copies objects to a destination bucket in a different AWS Region, providing a disaster recovery copy without manual intervention. Versioning must be enabled on both the source and destination buckets, and S3 needs an IAM role it can assume to perform the replication. By default, CRR copies only objects written after the replication rule is in place; existing objects require S3 Batch Replication.
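A replication rule of this kind can be sketched as the `ReplicationConfiguration` document for S3 `put_bucket_replication`. This is a minimal sketch, assuming a hypothetical helper name, role ARN, and destination bucket, and a single rule that copies every object and does not replicate delete markers.

```python
def crr_replication_config(role_arn: str, destination_bucket: str) -> dict:
    """ReplicationConfiguration for S3 PutBucketReplication: copy every new object."""
    return {
        # IAM role that S3 assumes to read from the source and write to the destination.
        "Role": role_arn,
        "Rules": [
            {
                "ID": "dr-copy-all-objects",
                "Status": "Enabled",
                "Priority": 1,
                # An empty prefix matches every object key in the bucket.
                "Filter": {"Prefix": ""},
                # Required when a rule uses Filter; here delete markers are not replicated.
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


# Body for PutBucketVersioning, applied to both the source and destination buckets first.
VERSIONING_ON = {"Status": "Enabled"}
```

Order matters here. Versioning goes on first, with `put_bucket_versioning(Bucket=..., VersioningConfiguration=VERSIONING_ON)` applied to both buckets (the destination through a client in its own Region), and only then `put_bucket_replication(Bucket=source, ReplicationConfiguration=config)`. S3 rejects a replication configuration on an unversioned bucket.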

Exam trap

The trap here is that candidates may confuse lifecycle transitions (which change storage class within the same region) with cross-region replication (which copies data to a different region), or assume EBS snapshots apply to S3 storage.

How to eliminate wrong answers

Option A is wrong because EBS snapshots are used for backing up EC2 block storage volumes, not for S3 objects, and they are region-specific unless manually copied. Option C is wrong because S3 lifecycle transition to Glacier Flexible Retrieval moves objects to a cold storage tier for cost savings, not to a different AWS Region for disaster recovery. Option D is wrong because CloudFront is a content delivery network that caches data at edge locations for low-latency access, not a mechanism for replicating data to another region for DR.


455

MCQ | easy

An internal service is hosted behind an Application Load Balancer (ALB) with targets spread across two Availability Zones. If the targets in one Availability Zone become unhealthy, the service must continue serving traffic from the healthy AZ. What change most directly improves resilience at the load-balancing layer?

A. Turn off health checks and rely only on instance CPU utilization to route traffic.

B. Configure ALB listener rules to route all traffic to a single target group in one Availability Zone.

C. Configure target group health checks so the ALB stops sending traffic to unhealthy targets and continues routing to healthy targets in the other Availability Zone.

D. Store requests in an SQS queue before routing them to the ALB.

Answer: C

With target group health checks enabled and configured correctly, the ALB evaluates each target's health and stops routing requests to targets marked unhealthy. As long as healthy targets exist in the other AZ, the ALB preserves reachability.

Why this answer

Option C is correct because configuring target group health checks allows the ALB to automatically detect unhealthy targets and stop sending traffic to them, while continuing to route requests to healthy targets in the other Availability Zone. This directly improves resilience at the load-balancing layer by ensuring traffic is only forwarded to healthy instances, maintaining service availability even when an entire AZ fails.
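Target group health-check settings can be sketched as arguments for the ELBv2 `create_target_group` call. This is a minimal sketch, assuming a hypothetical `/healthz` endpoint, port 8080, and illustrative thresholds. The helper that estimates detection time is a rough approximation (interval × unhealthy threshold), not an exact AWS formula.

```python
def health_checked_target_group(name: str, vpc_id: str, health_path: str = "/healthz") -> dict:
    """Arguments for ELBv2 CreateTargetGroup with explicit health-check settings."""
    return {
        "Name": name,
        "Protocol": "HTTP",
        "Port": 8080,
        "VpcId": vpc_id,
        "TargetType": "instance",
        "HealthCheckEnabled": True,
        "HealthCheckProtocol": "HTTP",
        "HealthCheckPath": health_path,     # endpoint the app must serve cheaply
        "HealthCheckIntervalSeconds": 10,
        "HealthCheckTimeoutSeconds": 5,     # must be shorter than the interval
        "HealthyThresholdCount": 3,         # consecutive passes to return to service
        "UnhealthyThresholdCount": 2,       # consecutive failures to stop routing
        "Matcher": {"HttpCode": "200"},
    }


def approx_seconds_until_unhealthy(params: dict) -> int:
    """Rough time before the ALB stops routing to a failed target."""
    return params["HealthCheckIntervalSeconds"] * params["UnhealthyThresholdCount"]
```

Lower thresholds and shorter intervals shift traffic to the healthy AZ sooner, at the cost of more health-check requests and a higher chance of reacting to brief blips.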

Exam trap

The trap here is that candidates may think SQS decoupling (Option D) improves resilience at the load-balancing layer, but SQS operates at the application layer and does not affect how the ALB routes traffic to unhealthy targets.

How to eliminate wrong answers

Option A is wrong because turning off health checks removes the ALB's ability to detect unhealthy targets, which would cause traffic to be sent to failed instances, breaking resilience. Option B is wrong because routing all traffic to a single target group in one AZ creates a single point of failure and defeats the purpose of multi-AZ redundancy. Option D is wrong because storing requests in an SQS queue before routing to the ALB adds unnecessary latency and complexity, and does not address the immediate need for the ALB to stop sending traffic to unhealthy targets.


456

MCQ | easy

A latency-sensitive trading workload runs on 6 EC2 instances. You must distribute the instances so they do NOT share the same underlying hardware rack, reducing the risk of correlated rack-level faults. Which EC2 placement group strategy best meets this requirement?

A. Cluster placement group

B. Spread placement group

C. Partition placement group

D. No placement group, rely on the default scheduler

Answer: B

Spread placement groups place each instance on distinct underlying hardware, so every instance in the group runs on a different rack with its own network and power source. This reduces the chance that a rack-level issue impacts multiple instances simultaneously and directly matches the requirement.

Why this answer

A Spread placement group is the correct choice because it ensures each EC2 instance is placed on distinct underlying hardware (different racks), eliminating shared fault domains. This directly meets the requirement to avoid correlated rack-level failures for latency-sensitive trading workloads.
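A spread group has a size limit that matters for planning: a spread placement group supports at most seven running instances per Availability Zone. The sketch below builds the EC2 `create_placement_group` arguments and checks that limit. It is a minimal sketch, with a hypothetical helper name and group name and a simple round-robin split across the given AZs. Instances would then be launched with `Placement={"GroupName": ...}`.

```python
MAX_RUNNING_PER_AZ = 7  # documented cap for a spread placement group, per AZ


def spread_group_plan(group_name: str, instance_count: int, azs: list[str]) -> tuple[dict, dict]:
    """Return CreatePlacementGroup args plus an even per-AZ instance count."""
    if not azs:
        raise ValueError("at least one Availability Zone is required")
    if instance_count > MAX_RUNNING_PER_AZ * len(azs):
        raise ValueError(
            f"{instance_count} instances exceed {MAX_RUNNING_PER_AZ} per AZ across {len(azs)} AZ(s)"
        )
    per_az = {az: 0 for az in azs}
    for i in range(instance_count):
        # Round-robin keeps every AZ at or under the per-AZ cap.
        per_az[azs[i % len(azs)]] += 1
    return {"GroupName": group_name, "Strategy": "spread"}, per_az
```

The six instances in this scenario fit in a single AZ, which keeps network latency low for the trading workload.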

Exam trap

The trap here is that candidates often confuse Partition placement groups with Spread groups, assuming partitions guarantee rack-level isolation, but partitions only separate instances into logical groups that may still share racks within a partition.

How to eliminate wrong answers

Option A is wrong because a Cluster placement group packs instances close together inside a single Availability Zone for low-latency networking, which increases the risk that one hardware fault affects several instances at once. Option C is wrong because a Partition placement group spreads instances across logical partitions within an Availability Zone, but instances in the same partition can share a rack, so it does not guarantee that every instance is isolated at the rack level. Option D is wrong because the default scheduler does not guarantee placement on separate hardware racks, leaving the workload vulnerable to correlated failures.


457

Multi-Select | hard

A regional web application for an inventory service must fail over automatically to a secondary Region if the primary endpoint becomes unhealthy. Which two services or features are required? The design must avoid adding custom operational scripts.


A. Route 53 failover routing with health checks

B. S3 Transfer Acceleration

C. A deployed standby application stack in the secondary Region

D. AWS Organizations service control policies

Answers: A, C

Route 53 can monitor endpoint health and return the standby endpoint when the primary is unhealthy.

Why this answer

Route 53 failover routing with health checks (Option A) is required because it automatically evaluates the health of the primary endpoint and, upon detecting failure, updates DNS resolution to direct traffic to the secondary Region. This is the native AWS mechanism for DNS-based failover without custom scripts, relying on Route 53 health checkers to assess endpoint health via HTTP/HTTPS/TCP or calculated health checks.
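The failover records can be sketched as the `ChangeBatch` for Route 53 `change_resource_record_sets`. This is a minimal sketch, assuming plain A records with hypothetical documentation-range IPs and an existing health check ID for the primary. Regional ALB endpoints would more typically use alias records with target health evaluation, which this sketch does not cover.

```python
from typing import Optional


def failover_change_batch(record_name: str, primary_ip: str, secondary_ip: str,
                          primary_health_check_id: str, ttl: int = 60) -> dict:
    """ChangeBatch for Route 53 ChangeResourceRecordSets: PRIMARY/SECONDARY A records."""

    def failover_record(role: str, ip: str, health_check_id: Optional[str]) -> dict:
        rrset = {
            "Name": record_name,
            "Type": "A",
            # Failover records need a unique SetIdentifier per record.
            "SetIdentifier": f"{role.lower()}-region",
            "Failover": role,
            "TTL": ttl,  # a short TTL lets resolvers pick up the failover quickly
            "ResourceRecords": [{"Value": ip}],
        }
        if health_check_id:
            # Route 53 answers with PRIMARY only while this health check passes.
            rrset["HealthCheckId"] = health_check_id
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {
        "Comment": "Active-passive failover between Regions",
        "Changes": [
            failover_record("PRIMARY", primary_ip, primary_health_check_id),
            failover_record("SECONDARY", secondary_ip, None),
        ],
    }
```

DNS failover only helps if the SECONDARY address points at a stack that is already deployed and able to serve traffic.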

Exam trap

The trap here is that candidates may think a single service like Route 53 alone can handle failover, but without a pre-deployed standby application stack in the secondary Region, there is no infrastructure to route traffic to, making both Route 53 failover routing and the standby stack required together.


458

MCQ | easy

A retail analytics app uses Amazon RDS for PostgreSQL. Read traffic is growing, and the database CPU spikes mainly due to SELECT-heavy workloads. Writes are less frequent, and the app can tolerate eventually consistent reads for the reports. What is the most appropriate AWS-native way to improve read performance with minimal application changes?

A. Create an RDS read replica and point the reporting queries to the replica endpoint.

B. Switch the cluster to DynamoDB without redesigning the data model.

C. Enable S3 event notifications to trigger a Lambda function after each write to the database.

D. Replace the RDS instance class with a smaller size to reduce cost and improve performance.

Answer: A

Read replicas offload reads from the primary and can speed up SELECT-heavy workloads with minimal changes.

Why this answer

Creating an RDS read replica is the most appropriate AWS-native solution because it offloads SELECT-heavy read traffic from the primary database instance to a separate read-only replica, reducing CPU spikes on the primary. RDS for PostgreSQL uses PostgreSQL's built-in streaming replication to update read replicas asynchronously, so a replica can lag behind the primary. That lag is acceptable here because the reports tolerate eventually consistent reads. The only application change is pointing the reporting queries at the replica endpoint.
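The "minimal application change" can be sketched as a small routing decision in the data-access layer, plus the arguments for RDS `create_db_instance_read_replica`. This is a minimal sketch, assuming hypothetical helper names, instance identifiers, and endpoint hostnames, and assuming the application already tags each query with a workload type such as `"report"`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DbEndpoints:
    primary: str  # writer endpoint of the source DB instance
    replica: str  # endpoint of the read replica


def replica_request(source_id: str, replica_id: str) -> dict:
    """Arguments for RDS CreateDBInstanceReadReplica."""
    return {"DBInstanceIdentifier": replica_id, "SourceDBInstanceIdentifier": source_id}


def endpoint_for(workload: str, endpoints: DbEndpoints) -> str:
    """Send only lag-tolerant report queries to the replica."""
    # Writes and read-after-write paths stay on the primary,
    # because the replica applies changes asynchronously.
    return endpoints.replica if workload == "report" else endpoints.primary
```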

Exam trap

The trap here is confusing read replicas with Multi-AZ deployments. A Multi-AZ DB instance deployment provides failover redundancy, but its standby cannot serve read traffic, so it does not relieve SELECT load on the primary.

How to eliminate wrong answers

Option B is wrong because switching to DynamoDB without redesigning the data model would require significant application changes (e.g., adapting from relational to NoSQL schema, handling partition keys, and losing SQL query capabilities), which contradicts the requirement for minimal application changes. Option C is wrong because enabling S3 event notifications to trigger a Lambda function after each write does not directly improve read performance on the database; it adds asynchronous processing overhead and does not offload SELECT queries from the RDS instance. Option D is wrong because replacing the RDS instance class with a smaller size would reduce CPU capacity, worsening performance under the existing SELECT-heavy workload, and does not address the root cause of CPU spikes.


459

MCQ | medium

A static website uses an Amazon S3 bucket as the origin for an Amazon CloudFront distribution. The team accidentally configured the S3 bucket policy to allow s3:GetObject to Principal "*", so objects are accessible via direct S3 URLs. They want to ensure objects are retrievable only through CloudFront. What is the best corrective action?

A. Remove public access from the bucket and update the bucket policy to allow GetObject only from CloudFront using the distribution’s SourceArn (and use CloudFront origin access control or origin access identity).

B. Enable S3 static website hosting and disable CloudFront, because website hosting blocks direct object URL access.

C. Add a WAF rule that rate-limits requests to the S3 bucket domain to make direct access impractical.

D. Turn on S3 object versioning so that attackers cannot read previous objects.

Answer: A

Restricting the bucket policy to CloudFront’s principal with a SourceArn condition prevents direct S3 access while enabling CloudFront.

Why this answer

Option A is correct because the S3 bucket policy currently allows s3:GetObject from any principal, making objects publicly accessible via direct S3 URLs. By removing public access and updating the policy to restrict GetObject to only requests that originate from the CloudFront distribution (using either Origin Access Control or Origin Access Identity), objects become retrievable exclusively through CloudFront, preventing direct S3 access.
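With origin access control (OAC), the corrected bucket policy grants `s3:GetObject` to the CloudFront service principal, scoped by `AWS:SourceArn` to a single distribution. This is a minimal sketch; the helper name, bucket, account ID, and distribution ID are hypothetical.

```python
import json


def oac_bucket_policy(bucket: str, account_id: str, distribution_id: str) -> str:
    """Bucket policy allowing reads only from one CloudFront distribution via OAC."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontServicePrincipalReadOnly",
                "Effect": "Allow",
                # With origin access control, CloudFront signs requests as this service principal.
                "Principal": {"Service": "cloudfront.amazonaws.com"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # Only this account's specific distribution qualifies.
                "Condition": {
                    "StringEquals": {
                        "AWS:SourceArn": f"arn:aws:cloudfront::{account_id}:distribution/{distribution_id}"
                    }
                },
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

`put_bucket_policy` replaces the whole policy, so applying this document also removes the `Principal "*"` statement. Enabling S3 Block Public Access on the bucket then guards against a public policy being reintroduced later.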

Exam trap

The trap here is that candidates may think enabling S3 static website hosting or versioning solves the access control issue, but neither changes the bucket policy—only explicitly restricting the policy to CloudFront’s identity prevents direct S3 URL access.

How to eliminate wrong answers

Option B is wrong because enabling S3 static website hosting does not block direct object access. The website endpoint is simply another endpoint governed by the same bucket policy, and disabling CloudFront contradicts the goal of serving content through CloudFront. Option C is wrong because AWS WAF web ACLs can be associated with resources such as CloudFront distributions, Application Load Balancers, and API Gateway stages, but not with an S3 bucket endpoint. Even where WAF applies, rate limiting only slows direct access rather than preventing it. Option D is wrong because enabling object versioning does not restrict access; it only preserves previous object versions, and without a restrictive bucket policy the current objects remain publicly readable via direct S3 URLs.


460

MCQ | medium

A public API for a customer analytics portal is deployed on API Gateway. Clients must authenticate with standards-based tokens issued by an external OpenID Connect provider. Which authorization mechanism should be used?

A. API keys only

B. JWT authorizer configured for the OpenID Connect issuer

C. IAM authorization for all internet users

D. A VPC endpoint policy

Answer: B

A JWT authorizer validates tokens from a trusted OIDC issuer with low operational overhead.

Why this answer

Option B is correct because the scenario requires standards-based token authentication from an external OpenID Connect (OIDC) provider. A JWT authorizer, available for API Gateway HTTP APIs, natively validates JSON Web Tokens (JWTs) issued by OIDC providers. It verifies the token's signature against the issuer's published keys (JWKS), checks the `iss` and `aud` claims, and rejects expired tokens. This meets the requirement without a custom Lambda authorizer or additional infrastructure.
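The authorizer can be sketched as arguments for the API Gateway v2 `create_authorizer` call. This is a minimal sketch, assuming an existing HTTP API and a hypothetical helper name, API ID, issuer URL, and audience. The HTTPS check is a local sanity check based on the OIDC rule that issuer identifiers use the https scheme.

```python
def jwt_authorizer_params(api_id: str, issuer_url: str, audience: str) -> dict:
    """Arguments for API Gateway v2 CreateAuthorizer (HTTP APIs only)."""
    # Local sanity check: OIDC issuer identifiers are https URLs.
    if not issuer_url.startswith("https://"):
        raise ValueError("the OIDC issuer URL must use https")
    return {
        "ApiId": api_id,
        "Name": "external-oidc",
        "AuthorizerType": "JWT",
        # Where API Gateway reads the bearer token from.
        "IdentitySource": ["$request.header.Authorization"],
        # Tokens must carry this issuer and an audience in the list.
        "JwtConfiguration": {"Issuer": issuer_url, "Audience": [audience]},
    }
```

Each protected route then references the returned authorizer ID with `AuthorizationType="JWT"`.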

Exam trap

The trap here is that candidates confuse API keys (which are static and not standards-based) with JWT tokens (which are cryptographically signed and verifiable), or assume IAM authorization can be used for external identities without understanding that IAM requires AWS credentials, not OIDC tokens.

How to eliminate wrong answers

Option A is wrong because API keys only provide simple identification and rate limiting, not authentication or authorization; they do not validate token signatures, claims, or issuer trust. Option C is wrong because IAM authorization is designed for AWS internal identities (IAM users/roles) and requires AWS Signature V4 signing, which is not compatible with external OIDC tokens or internet-based clients without custom signing logic. Option D is wrong because a VPC endpoint policy controls access to API Gateway via VPC endpoints, not authentication; it cannot validate OIDC tokens or handle client identity from the public internet.


461

Drag & Drop | medium

Order the steps for setting up a VPC with public and private subnets using a NAT gateway.


Why this order

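Create the VPC first, then the public and private subnets, then an internet gateway attached to the VPC. Next, create the NAT gateway in the public subnet with an Elastic IP address. Finally, update the route tables: the public subnet routes 0.0.0.0/0 to the internet gateway, and the private subnet routes 0.0.0.0/0 to the NAT gateway.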
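The order can be sketched with the boto3 EC2 client API, with the client passed in so that the sequence is explicit. This is a minimal sketch, assuming a single AZ, illustrative CIDR blocks (10.0.0.0/16, 10.0.1.0/24, 10.0.2.0/24), and a client such as `boto3.client("ec2")`. Tagging, error handling, and cleanup are omitted, and the helper name is hypothetical.

```python
def build_public_private_vpc(ec2, az: str) -> dict:
    """Create a VPC with one public and one private subnet, routed via IGW and NAT."""
    # 1. VPC.
    vpc_id = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]["VpcId"]

    # 2. Subnets: one public, one private.
    public_subnet = ec2.create_subnet(
        VpcId=vpc_id, CidrBlock="10.0.1.0/24", AvailabilityZone=az)["Subnet"]["SubnetId"]
    private_subnet = ec2.create_subnet(
        VpcId=vpc_id, CidrBlock="10.0.2.0/24", AvailabilityZone=az)["Subnet"]["SubnetId"]

    # 3. Internet gateway, attached to the VPC.
    igw_id = ec2.create_internet_gateway()["InternetGateway"]["InternetGatewayId"]
    ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)

    # 4. NAT gateway in the public subnet, using an Elastic IP; wait until it is available.
    allocation_id = ec2.allocate_address(Domain="vpc")["AllocationId"]
    nat_id = ec2.create_nat_gateway(
        SubnetId=public_subnet, AllocationId=allocation_id)["NatGateway"]["NatGatewayId"]
    ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_id])

    # 5. Route tables: public -> internet gateway, private -> NAT gateway.
    public_rt = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]
    ec2.create_route(RouteTableId=public_rt, DestinationCidrBlock="0.0.0.0/0", GatewayId=igw_id)
    ec2.associate_route_table(RouteTableId=public_rt, SubnetId=public_subnet)

    private_rt = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]
    ec2.create_route(RouteTableId=private_rt, DestinationCidrBlock="0.0.0.0/0", NatGatewayId=nat_id)
    ec2.associate_route_table(RouteTableId=private_rt, SubnetId=private_subnet)

    return {"vpc": vpc_id, "public_subnet": public_subnet,
            "private_subnet": private_subnet, "igw": igw_id, "nat": nat_id}
```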


462

MCQ | hard

A patient portal must process every event at least once, but duplicate processing is acceptable if the consumer handles idempotency. Which eventing approach is most suitable? The architecture review board prefers a managed AWS-native control.

A. Use an in-memory queue on one EC2 instance

B. Use UDP messages sent directly to workers

C. Use Amazon SQS standard queue and design consumers to be idempotent

D. Use CloudFront signed URLs

Answer: C

SQS standard queues provide at-least-once delivery and high throughput; consumers must handle occasional duplicates.

Why this answer

Amazon SQS standard queues provide at-least-once delivery, meaning each message is delivered at least once but may be delivered more than once. This aligns perfectly with the requirement that every event must be processed at least once and that duplicate processing is acceptable if consumers are idempotent. SQS is a fully managed, AWS-native service that meets the architecture review board's preference for a managed solution.
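The consumer side of at-least-once delivery can be sketched as a receive/process/delete loop: a message is deleted only after the handler succeeds, so a failure leads to redelivery instead of loss. This is a minimal sketch, assuming an SQS client such as `boto3.client("sqs")`, a hypothetical helper name, and a handler that is already idempotent.

```python
def drain_once(sqs, queue_url: str, handler) -> int:
    """Receive one batch and delete only the messages the handler finished.

    `handler` must be idempotent, because SQS standard queues may redeliver.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
        WaitTimeSeconds=20,      # long polling reduces empty responses
    )
    done = 0
    for message in response.get("Messages", []):
        try:
            handler(message["Body"])
        except Exception:
            # Not deleted: the message becomes visible again after the
            # visibility timeout, so a failed event is retried rather than lost.
            continue
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        done += 1
    return done
```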

Exam trap

The trap here is treating any transport as good enough. An in-memory queue on one instance (Option A) or raw UDP messages (Option B) may look simpler, but neither guarantees that events survive failures or reach a worker. An SQS standard queue, by contrast, provides managed at-least-once delivery, and idempotent consumers absorb the occasional duplicate.

How to eliminate wrong answers

Option A is wrong because an in-memory queue on a single EC2 instance is not managed, introduces a single point of failure, and cannot guarantee at-least-once delivery across failures. Option B is wrong because UDP is a connectionless, unreliable protocol that does not guarantee message delivery, so it cannot ensure every event is processed at least once. Option D is wrong because CloudFront signed URLs are used for securing content delivery, not for event processing or messaging between components.


463

MCQ | medium

A microservice running in ECS retrieves a secret from AWS Secrets Manager. The secret is encrypted with a customer-managed CMK. An administrator re-keyed the secret to a new CMK (the key ARN changed), but kept the same KMS alias name. After re-keying, the service fails with an error from KMS: AccessDenied for kms:Decrypt. The ECS task role’s IAM policy still grants kms:Decrypt but only for the old CMK ARN. What is the best remediation to restore access while maintaining least privilege?

A. Update the IAM policy to allow kms:Decrypt for all CMKs in the account using a wildcard resource (for example, arn:aws:kms:region:account-id:key/*).

B. Update the ECS task role IAM policy to grant kms:Decrypt on the CMK alias ARN (arn:aws:kms:region:account-id:alias/<alias-name>) or to include the new CMK ARN, so decrypt authorization matches the re-keyed CMK.

C. Change the application to decrypt the secret itself using SSE-C keys so Secrets Manager no longer needs KMS.

D. Enable KMS key rotation for the old CMK so the CMK ARN resolves to the new key.

Answer: B

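Because the IAM policy references only the old CMK ARN, it no longer matches the CMK used after re-keying. Adding the new CMK ARN to the policy restores access while keeping the permission scoped to a single key. KMS authorizes cryptographic operations against the key ARN, not the alias. An alias ARN in a policy's Resource element controls access to the alias itself, so authorizing by alias requires the kms:ResourceAliases condition key instead.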

Why this answer

Option B is correct because the ECS task role's IAM policy still references the old CMK ARN, but the secret is now encrypted with a new CMK. KMS authorizes Decrypt against the key ARN, so the policy must grant kms:Decrypt on the new CMK ARN. To follow the alias instead, the policy can allow keys whose kms:ResourceAliases condition matches the alias. Either approach restores access while maintaining least privilege by scoping the permission to the key actually used for decryption.
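The corrected identity policy can be sketched as a single statement on the new key ARN. The optional `kms:ViaService` condition further limits use of the key to calls made through Secrets Manager. This is a minimal sketch; the helper name, key ARN, and Region are hypothetical.

```python
import json


def secret_decrypt_policy(new_key_arn: str, region: str) -> str:
    """Identity policy for the ECS task role: Decrypt on one CMK, via Secrets Manager only."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DecryptRekeyedSecret",
                "Effect": "Allow",
                "Action": "kms:Decrypt",
                # The full key ARN of the new CMK; KMS authorizes against this, not the alias.
                "Resource": new_key_arn,
                # Optional tightening: only when the call comes through Secrets Manager.
                "Condition": {
                    "StringEquals": {"kms:ViaService": f"secretsmanager.{region}.amazonaws.com"}
                },
            }
        ],
    }
    return json.dumps(policy, indent=2)
```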

Exam trap

The trap here is assuming that existing permissions still apply because the alias name did not change. The alias now points to a different CMK, and KMS authorizes Decrypt against that CMK's key ARN, so a policy pinned to the old key ARN fails. Listing the alias ARN in the Resource element does not help either, because that controls access to the alias, not to the key it points to.

How to eliminate wrong answers

Option A is wrong because using a wildcard resource (arn:aws:kms:region:account-id:key/*) grants kms:Decrypt on all CMKs in the account, violating least privilege and potentially exposing other encrypted resources. Option C is wrong because SSE-C keys are used for Amazon S3 server-side encryption, not for Secrets Manager; Secrets Manager relies on KMS for envelope encryption, and changing to SSE-C would not integrate with Secrets Manager's native KMS-based encryption. Option D is wrong because enabling KMS key rotation for the old CMK does not change the key ARN; rotation creates new backing key material but retains the same key ID and ARN, so the old CMK ARN still does not authorize decryption of a secret encrypted with a different CMK.


464

MCQ | easy

An event consumer sometimes processes the same SQS message more than once due to timeouts and retries. The consumer must ensure the payment is not charged twice. What design choice best addresses this requirement?

A. Assume messages are processed exactly once because SQS uses durable storage.

B. Make the payment operation idempotent by using an idempotency key and skipping side effects when the key indicates the payment already succeeded.

C. Increase the consumer visibility timeout to several days so messages are not redelivered.

D. Delete the message immediately even if processing fails validation.

Answer: B

Idempotency ensures that repeated processing attempts produce the same result. The consumer should use a stable idempotency key (for example, a business transaction ID) and record completion in durable storage. If the key already indicates the payment succeeded, the consumer skips charging again.

Why this answer

Option B is correct because making the payment operation idempotent with an idempotency key ensures that the payment is charged only once, even if the same SQS message is processed several times due to timeouts and retries. The consumer checks the idempotency key before executing the payment; if the key shows the payment already succeeded, the consumer skips the side effect. This pattern meets the requirement because it never assumes that SQS will deliver a message only once.
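The check-then-skip logic can be sketched as a small handler class. This is a minimal sketch, assuming a hypothetical `charge(key, amount)` callable for the payment provider and a plain dictionary as a stand-in for durable storage. A real consumer would need a shared, durable store with an atomic "insert if absent" write, such as a conditional write, and would ideally pass the same key to the payment provider so a crash between charging and recording cannot cause a second charge.

```python
class IdempotentPaymentHandler:
    """Charge at most once per idempotency key (for example, a business transaction ID)."""

    def __init__(self, charge):
        self._charge = charge
        # Stand-in for a durable store: idempotency key -> charge ID.
        # Process memory is NOT durable or shared; this is for illustration only.
        self._completed: dict[str, str] = {}

    def handle(self, idempotency_key: str, amount_cents: int) -> str:
        done = self._completed.get(idempotency_key)
        if done is not None:
            return done  # duplicate delivery: skip the side effect
        charge_id = self._charge(idempotency_key, amount_cents)
        self._completed[idempotency_key] = charge_id
        return charge_id
```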

Exam trap

The trap here is that candidates assume SQS provides exactly-once delivery or that increasing the visibility timeout is a reliable solution, but the exam tests understanding that SQS is at-least-once and that idempotency is the correct architectural pattern to handle duplicates.

How to eliminate wrong answers

Option A is wrong because SQS standard queues guarantee at-least-once delivery, not exactly-once processing; a message can be delivered again after a consumer timeout or retry, so assuming exactly-once processing is incorrect. Option C is wrong because a longer visibility timeout only delays redelivery rather than preventing it: if the consumer crashes or fails to delete the message, the message becomes visible again once the timeout expires. The maximum visibility timeout is also 12 hours, so a timeout of several days cannot be configured. Option D is wrong because deleting a message even when processing fails validation loses it permanently, preventing any retry or dead-letter queue handling, which can lead to data loss or incomplete processing.


465

Multi-Select | medium

A media company stores application logs in S3. The logs must be kept for 400 days. They are read heavily for the first 30 days, occasionally for the next 90 days, and almost never after that. Retrieval after the first 3 months can wait a few hours. Which three lifecycle actions should they use to minimize storage cost? Select three.


A. Transition objects to S3 Standard-IA after 30 days.

B. Transition objects to S3 Glacier Flexible Retrieval after 90 days.

C. Expire objects after 400 days.

D. Keep all objects in S3 Standard for the full retention period.

E. Transition objects to S3 One Zone-IA after 30 days.

Answers: A, B, C

Standard-IA is appropriate once logs are no longer read frequently but still need fast retrieval. Moving after the heavy-access period lowers storage cost while keeping objects available for occasional reads.

Why this answer

Option A is correct because, after the initial 30-day period of heavy reads, transitioning objects to S3 Standard-IA reduces storage costs while still providing millisecond retrieval for the occasional reads that follow. Option B is correct because once retrievals can wait a few hours, S3 Glacier Flexible Retrieval stores the logs at a much lower price, with retrieval in minutes to hours. Option C is correct because expiring objects at 400 days deletes them as soon as the retention requirement is met, so they stop incurring storage charges.
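The three lifecycle actions fit in one rule for S3 `put_bucket_lifecycle_configuration`. This is a minimal sketch, assuming a hypothetical helper name and a `logs/` prefix. It relies on the fact that the API name for S3 Glacier Flexible Retrieval is `GLACIER`.

```python
def log_retention_lifecycle(prefix: str = "logs/") -> dict:
    """LifecycleConfiguration for S3 PutBucketLifecycleConfiguration."""
    return {
        "Rules": [
            {
                "ID": "app-logs-400-day-retention",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    # Heavy reads end at day 30: move to Standard-IA (Option A).
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    # Hours-long retrieval is acceptable after ~3 months (Option B).
                    # "GLACIER" is the API name for S3 Glacier Flexible Retrieval.
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                # Delete once the 400-day retention requirement is met (Option C).
                "Expiration": {"Days": 400},
            }
        ]
    }
```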

Exam trap

The trap here is that candidates may confuse S3 One Zone-IA with Standard-IA, not realizing that One Zone-IA lacks the multi-AZ resilience and is not appropriate for logs that must be retained for compliance or occasional retrieval.


466

MCQ | medium

An inventory service uses Lambda functions that call an unreliable third-party API. Failed events must be retained for later investigation after retries are exhausted. What should be configured? The architecture review board prefers a managed AWS-native control.

A. Lambda reserved concurrency set to zero

B. A Lambda dead-letter queue or failure destination

C. A larger deployment package

D. CloudFront error pages

Answer: B

A DLQ or asynchronous failure destination captures failed events after retry attempts.

Why this answer

Lambda dead-letter queues (DLQs) and on-failure destinations are the AWS-native mechanisms for retaining failed events after retries are exhausted. For asynchronous invocations, Lambda retries a failed event (two more attempts by default). If the function still fails, for example because the third-party API is unavailable, Lambda sends the event to the configured target: an SQS queue or SNS topic for a DLQ, or a target such as an SQS queue, SNS topic, another Lambda function, or an EventBridge event bus for an on-failure destination. The event is retained for later investigation, and the solution is fully managed.
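An on-failure destination can be sketched as arguments for Lambda `put_function_event_invoke_config`, which applies to asynchronous invocations. This is a minimal sketch, assuming a hypothetical helper name, function name, and SQS queue ARN. The range checks mirror the service limits of 0 to 2 retries and an event age of 60 to 21,600 seconds.

```python
def on_failure_destination_params(function_name: str, failure_target_arn: str,
                                  max_retries: int = 2, max_event_age_seconds: int = 3600) -> dict:
    """Arguments for Lambda PutFunctionEventInvokeConfig (asynchronous invocations)."""
    if not 0 <= max_retries <= 2:
        raise ValueError("MaximumRetryAttempts must be between 0 and 2")
    if not 60 <= max_event_age_seconds <= 21600:
        raise ValueError("MaximumEventAgeInSeconds must be between 60 and 21600")
    return {
        "FunctionName": function_name,
        "MaximumRetryAttempts": max_retries,
        "MaximumEventAgeInSeconds": max_event_age_seconds,
        # Events that still fail after the retries are sent here for investigation.
        "DestinationConfig": {"OnFailure": {"Destination": failure_target_arn}},
    }
```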

Exam trap

The trap here is assuming that unrelated settings such as reserved concurrency or deployment package size affect what happens to failed events, or confusing a Lambda DLQ with an SQS queue's own redrive DLQ. What matters is explicitly configuring a target that captures events after retries are exhausted.

How to eliminate wrong answers

Option A is wrong because setting Lambda reserved concurrency to zero would prevent the function from executing at all, not retain failed events. Option C is wrong because a larger deployment package has no impact on error handling or event retention; it only affects cold start times and deployment size. Option D is wrong because CloudFront error pages are for HTTP-level errors in front of web applications, not for Lambda function invocation failures or event retention.


467

MCQ | medium

A service reads encrypted data from Amazon S3. The S3 objects use a customer-managed CMK. The IAM role used by the service has kms:Decrypt in its identity policy, but decryption fails with a KMS error stating the role is not authorized to perform kms:CreateGrant. The CMK’s key policy allows kms:Decrypt for the role but does not include kms:CreateGrant. What is the most appropriate change to resolve the failure while preserving least privilege?

A. Add kms:CreateGrant permission to the CMK key policy for the role (scoped to the necessary CMK), keeping other KMS permissions minimal.

B. Enable key rotation because it makes grant creation unnecessary.

C. Add kms:DescribeKey to the key policy and remove kms:Decrypt to reduce permissions.

D. Update the IAM role to use kms:ScheduleKeyDeletion so future decrypt attempts succeed.

Answer: A

The error explicitly indicates missing authorization to create grants (kms:CreateGrant). Some AWS services require creating a grant to use a key on behalf of a principal. Adding only kms:CreateGrant to the key policy for the specific role resolves the failure with minimal additional access.

Why this answer

The error shows that the service uses a grant-based workflow. Some AWS services (for example, Amazon EBS and Amazon RDS) create a grant on a KMS key so they can use the key on behalf of a principal. The IAM role has kms:Decrypt, but the key policy does not allow kms:CreateGrant, so the grant cannot be created. Adding kms:CreateGrant to the key policy, scoped to the role, resolves the failure. It also adheres to least privilege, because it adds only the one permission that is missing.
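Below is a minimal sketch of the key policy statement that option A describes. It is a Python dictionary that serializes to key-policy JSON. The role ARN and the Sid are hypothetical placeholders. The kms:GrantIsForAWSResource condition goes beyond what the question asks: it is an optional extra restriction that allows a grant only when an integrated AWS service creates it for the role. You would add the statement to the key's existing policy rather than replace that policy.

```python
import json


def create_grant_statement(role_arn: str) -> dict:
    """Key policy statement that lets one role create grants on this KMS key."""
    return {
        "Sid": "AllowServiceGrantsForRole",
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": "kms:CreateGrant",
        # In a key policy, "*" means "the key this policy is attached to".
        "Resource": "*",
        # Optional tightening (assumption): only grants created by integrated AWS services.
        "Condition": {"Bool": {"kms:GrantIsForAWSResource": "true"}},
    }


# Hypothetical role ARN, for illustration only.
example_role = "arn:aws:iam::111122223333:role/example-reader-role"
print(json.dumps(create_grant_statement(example_role), indent=2))
```

In a key policy, "Resource": "*" refers only to the key the policy is attached to, so this statement cannot open up access to other keys.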

Exam trap

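The trap here is assuming that IAM identity policies alone govern every KMS operation. The key policy is the primary access control for a KMS key. IAM policies take effect only if the key policy delegates to the account, for example through a statement for the account root principal. In this scenario the key policy names the role and allows only kms:Decrypt, so it must also allow kms:CreateGrant.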

How to eliminate wrong answers

Option B is wrong because enabling key rotation does not eliminate the need for kms:CreateGrant; key rotation changes the underlying key material but does not affect grant-based authorization requirements. Option C is wrong because adding kms:DescribeKey does not resolve the missing kms:CreateGrant, and removing kms:Decrypt would break the decryption operation entirely. Option D is wrong because kms:ScheduleKeyDeletion is used to schedule key deletion, which would make the key unusable and is unrelated to grant creation or decryption authorization.


468

MCQeasy

A company’s private workload in a VPC uploads objects to an S3 bucket. Security requires that S3 requests are allowed only when they traverse a specific S3 Gateway VPC Endpoint (vpce-0abc123example). Which change best enforces this restriction at the S3 bucket level?

A.Add an S3 bucket policy Deny statement for s3:PutObject when aws:sourceVpce is not equal to vpce-0abc123example.

B.Add an S3 bucket policy Deny statement that blocks requests unless the principal uses MFA.

C.Enable Block Public Access and remove the public bucket policy statement.

D.Attach an IAM policy to the workload role that allows s3:PutObject only to the bucket ARN.

AnswerA

A bucket policy can use the request context key aws:sourceVpce to distinguish requests that came through a particular VPC endpoint. Using a Deny with a condition such as StringNotEquals on aws:sourceVpce blocks PutObject unless the request reached S3 via that specific Gateway Endpoint. Requests that arrive by other network paths will not match the required endpoint ID and will be denied.

Why this answer

Option A is correct because it uses an S3 bucket policy with a Deny statement that explicitly denies any s3:PutObject request unless the request originates from the specified VPC Endpoint (vpce-0abc123example). The aws:sourceVpce condition key evaluates the VPC endpoint ID from which the request is made, ensuring that only traffic through that specific Gateway VPC Endpoint is allowed. This enforces the security requirement at the bucket level, overriding any other policies that might allow access from other sources.
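Below is a sketch of the bucket policy from option A, built as a Python dictionary. The bucket name is a hypothetical placeholder; the endpoint ID comes from the question. The JSON string it produces is the kind of document that boto3's put_bucket_policy(Bucket=..., Policy=...) accepts.

```python
import json


def vpce_only_put_policy(bucket: str, vpce_id: str) -> dict:
    """Bucket policy that denies s3:PutObject unless the request arrives via vpce_id."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyPutOutsideEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # An explicit Deny overrides any Allow in IAM or bucket policies.
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }


policy = vpce_only_put_policy("example-upload-bucket", "vpce-0abc123example")
print(json.dumps(policy, indent=2))
```

The Deny covers only s3:PutObject, which matches option A. If you widen the action to s3:*, the policy also blocks administrators and the console whenever they reach the bucket by any path other than the endpoint. That broader form therefore needs an exception for admin principals.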

Exam trap

The trap here is choosing the IAM-based solution (Option D). An identity policy on the workload role limits what that role may do. As written, though, it does not restrict the network path, and it does not apply to other principals. A bucket policy Deny conditioned on aws:sourceVpce applies to every PutObject request against the bucket.

How to eliminate wrong answers

Option B is wrong because requiring MFA adds an authentication factor but does not restrict which network path a request takes. Option C is wrong because Block Public Access and removing the public policy statement stop anonymous public access but do not restrict requests to a specific VPC endpoint. Authenticated requests that reach S3 another way (for example, through a NAT gateway to the public S3 endpoint) would still be allowed. Option D is wrong because an IAM policy that only scopes s3:PutObject to the bucket ARN controls what the role can do, not how the request reaches S3. The workload could still send requests over any network path, not just the specified VPC endpoint.


469

MCQmedium

A SaaS platform serves an API using two regional deployments: us-east-1 (primary) and us-west-2 (secondary). Each region has its own ALB. The business requires automated DNS-based failover when the primary region becomes unhealthy, and they do not want manual DNS changes during incidents. Which Route 53 configuration is the best match?

A.Create a single Route 53 record using weighted routing across both ALBs with weights adjusted manually during an incident.

B.Use Route 53 failover routing with a primary record pointing to the us-east-1 ALB and a secondary record pointing to the us-west-2 ALB, each using health checks.

C.Use latency-based routing so Route 53 always selects the fastest region; health checks are unnecessary because client latency reflects availability.

D.Use a single A record with a static IP address that points to a NAT gateway, and update that IP during failure events.

AnswerB

Failover routing with health checks enables automatic switching of DNS responses when the primary endpoint fails health evaluation.

Why this answer

Route 53 failover routing is designed for active-passive configurations where traffic must automatically shift to a secondary endpoint when the primary fails. By attaching health checks to the primary record (us-east-1 ALB), Route 53 can detect regional unavailability and automatically route traffic to the secondary record (us-west-2 ALB) without manual intervention. This meets the requirement for DNS-based failover without manual DNS changes during incidents.
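Below is a sketch of the two failover records as a Route 53 change batch, in the shape that boto3's change_resource_record_sets(HostedZoneId=..., ChangeBatch=...) accepts. The domain name, ALB DNS names, ALB hosted zone IDs, and health check IDs are all hypothetical placeholders. Both records are aliases to the ALBs, and EvaluateTargetHealth lets Route 53 also take the ALB's own health into account.

```python
from typing import Dict


def failover_record(name: str, role: str, alb_dns: str, alb_zone_id: str, health_check_id: str) -> Dict:
    """One UPSERT change for a failover alias record; role is "PRIMARY" or "SECONDARY"."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            "SetIdentifier": f"{role.lower()}-alb",  # unique among records with this name
            "Failover": role,
            "HealthCheckId": health_check_id,
            "AliasTarget": {
                "HostedZoneId": alb_zone_id,  # the ALB's canonical hosted zone ID
                "DNSName": alb_dns,
                "EvaluateTargetHealth": True,
            },
        },
    }


change_batch = {
    "Comment": "Active-passive failover between the us-east-1 and us-west-2 ALBs",
    "Changes": [
        failover_record("api.example.com", "PRIMARY",
                        "primary-alb.us-east-1.elb.amazonaws.com", "Z_EAST_ALB_ZONE", "hc-primary"),
        failover_record("api.example.com", "SECONDARY",
                        "secondary-alb.us-west-2.elb.amazonaws.com", "Z_WEST_ALB_ZONE", "hc-secondary"),
    ],
}
```

Both records share the same name and type. The Failover value and a unique SetIdentifier are what distinguish them. Route 53 answers with the secondary record only while the primary is unhealthy.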

Exam trap

The trap here is that candidates often confuse latency-based routing with failover routing, assuming that low latency implies availability. Latency records answer with the lowest-latency Region. Option C explicitly drops health checks, so Route 53 would keep returning an unhealthy Region's endpoint.

How to eliminate wrong answers

Option A is wrong because weighted routing with manually adjusted weights requires DNS changes during an incident, which violates the requirement. Option C is wrong because latency-based routing optimizes for performance, not availability. Latency records can be associated with health checks, but this option declares them unnecessary, so traffic could still be sent to an unhealthy Region that has lower latency. Option D is wrong because a static A record pointing to a NAT gateway is not a valid failover strategy. NAT gateways provide outbound internet access for private subnets and do not serve inbound application traffic. Updating the IP during failures is also a manual DNS change, which contradicts the automated failover requirement.


470

Multi-Selecthard

A partner integration sends a custom binary TCP protocol to a service running on EC2 instances in private subnets. The partners require static endpoint IPs for allowlisting, and the application must see the original client source IP for rate limiting. Which two changes best fit the protocol and network requirements? Select two.

Select 2 answers

A.Replace the Application Load Balancer with a Network Load Balancer.

B.Use a TCP listener on the load balancer instead of an HTTP or HTTPS listener.

C.Put the service behind API Gateway REST API and use Lambda integration.

D.Use CloudFront to cache the binary packets at edge locations.

E.Terminate the traffic with an Amazon RDS proxy to stabilize the connections.

AnswersA, B

A Network Load Balancer is the right choice for TCP traffic and low-latency forwarding at layer 4. It also supports static IP behavior that is important for partner allowlisting. This directly matches the custom binary protocol and source-IP requirement.

Why this answer

A Network Load Balancer (NLB) is required because it operates at Layer 4 and forwards TCP traffic without interpreting it, which a custom binary protocol needs. An Application Load Balancer (ALB) works at Layer 7 and accepts only HTTP and HTTPS. A TCP listener (Option B) is what makes the NLB pass the byte stream through unchanged. The application needs the client IP for rate limiting. For targets registered by instance ID, an NLB preserves the original client source IP by default; for IP-type targets, client IP preservation is a target group attribute that can be enabled. Partners get static endpoint IPs because an internet-facing NLB can be assigned one Elastic IP per subnet.
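Below is a sketch of the request parameters for the Elastic Load Balancing v2 calls create_load_balancer, create_target_group, and create_listener. It assumes an internet-facing NLB in public subnets that forwards to the EC2 instances in private subnets. The names, subnet IDs, Elastic IP allocation IDs, VPC ID, and port 9000 are all hypothetical.

```python
from typing import Dict, List, Tuple


def nlb_request_params(
    name: str, subnet_eips: List[Tuple[str, str]], vpc_id: str, port: int
) -> Tuple[Dict, Dict, Dict]:
    """Parameters for elbv2 create_load_balancer, create_target_group and create_listener."""
    load_balancer = {
        "Name": name,
        "Type": "network",
        "Scheme": "internet-facing",
        # One Elastic IP per public subnet (one per AZ) gives partners fixed IPs to allowlist.
        "SubnetMappings": [
            {"SubnetId": subnet_id, "AllocationId": allocation_id}
            for subnet_id, allocation_id in subnet_eips
        ],
    }
    target_group = {
        "Name": f"{name}-tg",
        "Protocol": "TCP",
        "Port": port,
        "VpcId": vpc_id,
        # Instance targets keep the client's source IP by default.
        "TargetType": "instance",
    }
    listener = {
        # TCP passes the binary byte stream through without parsing it as HTTP.
        "Protocol": "TCP",
        "Port": port,
        # LoadBalancerArn and DefaultActions (forward to the target group ARN)
        # are added once the load balancer and target group exist.
    }
    return load_balancer, target_group, listener


lb, tg, listener = nlb_request_params(
    "partner-nlb",
    [("subnet-public-a", "eipalloc-aaaa"), ("subnet-public-b", "eipalloc-bbbb")],
    "vpc-example",
    9000,
)
print(lb["SubnetMappings"])
```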

Exam trap

The trap here is assuming that an Application Load Balancer can carry any TCP traffic. ALB listeners support only HTTP and HTTPS at Layer 7. A custom binary protocol therefore needs an NLB with a TCP listener, which also provides static IP addresses and client IP preservation.


471

MCQmedium

A company serves the same public content to many users through Amazon CloudFront. The origin is experiencing increased fetches because CloudFront cache hit rate is dropping. Most requests include an Authorization header and a custom header that changes per user. The response content is identical regardless of these headers. What change should the solutions architect make to restore a high cache hit rate?

A.Create a custom cache policy that excludes the Authorization header and the per-user changing custom header from the cache key.

B.Lower the TTL to a few seconds so cached objects expire sooner and origin fetches decrease.

C.Disable caching for the affected paths so CloudFront always forwards all headers to the origin.

D.Force all requests to use query-string based caching and include all headers in the cache policy for correctness.

AnswerA

CloudFront cache keys determine how requests map to cached objects. If the response is identical regardless of certain headers, including those headers in the cache key causes cache fragmentation (many unique cache keys for what is effectively the same content). Excluding the Authorization header and the varying custom header from the cache key allows CloudFront to reuse cached responses across users, restoring hit rate and reducing origin fetches.
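To see why per-user headers in the cache key fragment the cache, here is a toy model in Python. It is not CloudFront's implementation: it ignores TTLs and eviction, and it ignores the fact that each edge location caches separately. It only counts how many requests find an existing entry when the key is the path plus a chosen set of headers. X-User-Id is a hypothetical stand-in for the per-user custom header.

```python
from typing import Dict, Iterable, List, Tuple

Request = Tuple[str, Dict[str, str]]


def hit_rate(requests: List[Request], key_headers: Iterable[str]) -> float:
    """Share of requests served from a toy cache keyed on path plus the listed headers."""
    names = [h.lower() for h in key_headers]  # HTTP header names are case-insensitive
    cache = set()
    hits = 0
    for path, headers in requests:
        lowered = {k.lower(): v for k, v in headers.items()}
        key = (path,) + tuple(lowered.get(n, "") for n in names)
        if key in cache:
            hits += 1
        else:
            cache.add(key)  # miss: fetch from the origin, then store under this key
    return hits / len(requests)


# 100 users each request the same public object twice.
sample_requests = [
    ("/catalog.json", {"Authorization": f"Bearer token-{u}", "X-User-Id": str(u)})
    for u in range(100)
    for _ in range(2)
]
print(hit_rate(sample_requests, ["Authorization", "X-User-Id"]))  # 0.5
print(hit_rate(sample_requests, []))                              # 0.995
```

With both headers in the key, each user's first request is a miss, so only half of the requests hit. Without them, only the very first request misses.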

Why this answer

Option A is correct because the falling hit rate indicates that the Authorization header and the per-user custom header are part of the cache key. As a result, each user produces a separate cache entry even though the content is identical. CloudFront's managed CachingOptimized policy includes no headers in the cache key; headers end up there only when a cache policy or legacy cache setting adds them. A custom cache policy that leaves these headers out of the cache key lets CloudFront treat requests with different header values as the same cached object. That restores a high cache hit rate and reduces origin fetches.

Exam trap

The trap here is assuming that the Authorization header must always be part of the cache key for security. This content is public and identical for every user, so the header does not change the response. Excluding it from the cache key improves cache efficiency without giving any user content they would not otherwise receive.

How to eliminate wrong answers

Option B is wrong because lowering the TTL causes cached objects to expire sooner, which increases origin fetches and further reduces the cache hit rate, the opposite of what is needed. Option C is wrong because disabling caching for the affected paths forces CloudFront to forward every request to the origin, eliminating cache hits entirely and defeating the purpose of using CloudFront. Option D is wrong because forcing query-string based caching and including all headers in the cache key would still create unique cache entries per user (since headers vary per user), and query strings are not relevant to the issue described.


472

MCQmedium

A telemetry pipeline uses an Application Load Balancer in one Region. Global users need lower network latency to the application without caching dynamic responses. What should be considered?

A.AWS Global Accelerator

B.S3 Cross-Region Replication

C.CloudFront only with long TTLs

D.AWS Backup cross-Region copy

AnswerA

Global Accelerator routes traffic over the AWS global network to improve performance for TCP/UDP applications without relying on caching.

Why this answer

AWS Global Accelerator provides static anycast IP addresses. User traffic enters the AWS global network at a nearby edge location and travels over it to the application endpoint, which reduces latency and jitter for global users. Global Accelerator does not cache content, so it suits dynamic responses that must not be cached. It also supports an Application Load Balancer in a single Region as an endpoint.
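Below is a sketch of the parameters for the three Global Accelerator API calls that put an accelerator in front of the existing ALB: create_accelerator, create_listener, and create_endpoint_group. The name, ALB ARN, and Region are hypothetical. The ARNs that link the three resources are returned by the earlier calls, so they are left out here. The Global Accelerator API itself is called in the US West (Oregon) Region, whatever Region the endpoint is in.

```python
from typing import Dict, Tuple


def accelerator_params(
    name: str, alb_arn: str, alb_region: str, port: int = 443
) -> Tuple[Dict, Dict, Dict]:
    """Parameters for create_accelerator, create_listener and create_endpoint_group."""
    accelerator = {"Name": name, "IpAddressType": "IPV4", "Enabled": True}
    listener = {
        "Protocol": "TCP",
        "PortRanges": [{"FromPort": port, "ToPort": port}],
        # AcceleratorArn comes from the create_accelerator response.
    }
    endpoint_group = {
        "EndpointGroupRegion": alb_region,
        "EndpointConfigurations": [
            # Traffic enters at a nearby edge location and crosses the AWS network
            # to the ALB; nothing is cached along the way.
            {"EndpointId": alb_arn, "Weight": 128, "ClientIPPreservationEnabled": True}
        ],
        # ListenerArn comes from the create_listener response.
    }
    return accelerator, listener, endpoint_group


acc, lis, group = accelerator_params(
    "telemetry-accelerator",
    "arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/app/telemetry/abc123",
    "us-east-1",
)
print(group["EndpointConfigurations"])
```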

Exam trap

The trap here is that candidates often confuse CloudFront (a CDN with caching) with Global Accelerator (a non-caching network accelerator), assuming any edge service must cache content, but Global Accelerator is designed specifically for dynamic and uncacheable traffic.

How to eliminate wrong answers

Option B is wrong because S3 Cross-Region Replication is a storage feature for replicating objects across S3 buckets in different Regions; it does not reduce network latency for application traffic or handle dynamic HTTP responses. Option C is wrong because CloudFront with long TTLs caches responses at edge locations, which is unsuitable for dynamic content that must not be cached; long TTLs would serve stale data. Option D is wrong because AWS Backup cross-Region copy is a disaster recovery feature for backing up resources to another Region; it does not improve real-time network latency for users accessing the application.


473

MCQmedium

A SaaS company uses an S3 bucket for database backups created daily. Backups are rarely restored; the company’s documented RTO is 24 hours, and the compliance policy requires backups be kept for 90 days. The team currently stores all backups in S3 Standard, which is costly. Which single lifecycle policy change is most cost-optimized while still meeting the 24-hour RTO and 90-day retention?

A.Add a lifecycle rule to transition backups older than 1 day to S3 Glacier Flexible Retrieval, and keep them until day 90.

B.Add a lifecycle rule to transition backups older than 1 day to S3 Glacier Instant Retrieval, and keep them until day 90.

C.Add a lifecycle rule to transition backups older than 1 day to S3 Glacier Deep Archive, and keep them until day 90 with no restore configuration.

D.Add a lifecycle rule to transition backups older than 1 day to S3 One Zone-IA, and delete them after 7 days.

AnswerA

Glacier Flexible Retrieval is intended for backups with infrequent access and supports restores within an RTO measured in hours.

Why this answer

Option A is correct because S3 Glacier Flexible Retrieval provides retrieval times from minutes to hours, which meets the 24-hour RTO, and offers significant cost savings over S3 Standard for data that is rarely accessed. Transitioning backups older than 1 day to this storage class reduces costs while retaining them for the required 90-day compliance period.
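Below is a sketch of the lifecycle rule from option A, written as the LifecycleConfiguration document that boto3's put_bucket_lifecycle_configuration accepts. The backups/ prefix is a hypothetical placeholder. Expiring objects on day 90 assumes the backups are not needed once the retention period has passed.

```python
def backup_lifecycle(prefix: str) -> dict:
    """LifecycleConfiguration: move backups to Glacier Flexible Retrieval, delete at day 90."""
    return {
        "Rules": [
            {
                "ID": "backups-to-flexible-retrieval",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # "GLACIER" is the API name of S3 Glacier Flexible Retrieval.
                "Transitions": [{"Days": 1, "StorageClass": "GLACIER"}],
                # Delete once the 90-day retention requirement has been met.
                "Expiration": {"Days": 90},
            }
        ]
    }


print(backup_lifecycle("backups/"))
```

S3 Glacier Flexible Retrieval has a 90-day minimum storage duration. An object that moves in on day 1 and expires on day 90 is therefore billed for the one remaining day. That charge is small compared with the savings over S3 Standard.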

Exam trap

The trap here is that candidates may choose S3 Glacier Deep Archive for maximum cost savings without verifying that its retrieval time (12–48 hours) can exceed the 24-hour RTO, or they may overlook that S3 Glacier Instant Retrieval is not the most cost-effective option for data that is restored only rarely.

How to eliminate wrong answers

Option B is wrong because S3 Glacier Instant Retrieval is designed for data that is accessed about once a quarter and needs millisecond retrieval. It costs more to store than S3 Glacier Flexible Retrieval, so it is not the most cost-optimized choice for rarely restored backups with a 24-hour RTO. Option C is wrong because S3 Glacier Deep Archive retrievals take up to 12 hours with standard retrieval and up to 48 hours with bulk retrieval. Once restore and database recovery time are added, that leaves little or no margin within a 24-hour RTO, and the option defines no restore process to manage it. Option D is wrong because S3 One Zone-IA stores data in a single Availability Zone, so backups can be lost if that zone is destroyed. Deleting backups after 7 days also violates the 90-day retention policy.


474

MCQmedium

An inventory service uses Lambda functions that call an unreliable third-party API. Failed events must be retained for later investigation after retries are exhausted. What should be configured?

A.Lambda reserved concurrency set to zero

B.A Lambda dead-letter queue or failure destination

C.A larger deployment package

D.CloudFront error pages

AnswerB

A DLQ or asynchronous failure destination captures failed events after retry attempts.

Why this answer

Lambda dead-letter queues (DLQs) or failure destinations are the correct mechanism to retain failed events after all retries are exhausted. When a Lambda function fails to process an event (e.g., from an asynchronous invocation), the service automatically retries twice. If those retries fail, the event can be sent to an SQS queue or SNS topic (DLQ) or to a specified destination (failure destination) for later investigation.

This keeps failed events instead of discarding them and provides durable storage for post-mortem analysis.
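Below is a sketch of the asynchronous invocation settings, written as the parameters for Lambda's put_function_event_invoke_config API. The settings send exhausted events to an on-failure destination. The function name, queue ARN, and one-hour maximum event age are hypothetical choices.

```python
def event_invoke_config(function_name: str, failure_destination_arn: str) -> dict:
    """Parameters for Lambda put_function_event_invoke_config (asynchronous invocations)."""
    return {
        "FunctionName": function_name,
        # Lambda's default for asynchronous invocations is two retries; stated explicitly here.
        "MaximumRetryAttempts": 2,
        # Events that wait longer than one hour are also treated as failed.
        "MaximumEventAgeInSeconds": 3600,
        "DestinationConfig": {"OnFailure": {"Destination": failure_destination_arn}},
    }


config = event_invoke_config(
    "inventory-sync", "arn:aws:sqs:us-east-1:111122223333:inventory-failed-events"
)
print(config["DestinationConfig"])
```

The function's execution role also needs permission to send to the destination; for an SQS queue, that permission is sqs:SendMessage.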

Exam trap

The trap here is that candidates may confuse DLQs with retry mechanisms, or think that increasing function resources (such as memory or package size) will prevent failures. What durably captures events after retries are exhausted is a DLQ or a failure destination.

How to eliminate wrong answers

Option A is wrong because setting reserved concurrency to zero would prevent the Lambda function from executing at all, not retain failed events. Option C is wrong because a larger deployment package does not affect error handling or event retention; it only increases cold start latency and storage overhead. Option D is wrong because CloudFront error pages are for HTTP-level errors from a web distribution, not for capturing asynchronous Lambda invocation failures.


475

Matchinghard

Match each workload to the AWS pricing option that most directly minimizes cost while still meeting the stated flexibility requirements. Use each option once.


Concepts


Compute Savings Plan

Standard Reserved Instance

Spot Instances

On-Demand Instances

Why these pairings

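Spot Instances are cheapest for fault-tolerant, interruptible workloads. Standard Reserved Instances give a deep discount for steady usage of a specific instance type. On-Demand Instances suit short tests with no commitment. A Compute Savings Plan lowers rates in exchange for a committed hourly spend while still allowing changes of instance family, size, Region, and operating system, and it also covers Fargate and Lambda usage. Dedicated Hosts meet compliance or licensing needs for strict workloads.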


476

MCQeasy

A team runs a latency-sensitive service on EC2 and needs consistent, low-latency block storage for a database. The application requires predictable performance and should be fast for random reads/writes. Which EBS volume type is the best choice?

A.EBS st1 (throughput optimized HDD)

B.EBS gp3 (general purpose SSD)

C.EBS sc1 (cold HDD)

D.EBS magnetic (legacy magnetic)

AnswerB

gp3 is designed for a broad range of general-purpose workloads with solid low-latency performance. It supports random I/O patterns and offers predictable performance for many latency-sensitive applications. It is a common best-fit choice when you need balanced performance without specialized throughput-focused characteristics.

Why this answer

B is correct because gp3 is a general-purpose SSD volume that provides consistent, low-latency performance for random read/write operations, making it ideal for latency-sensitive database workloads. It offers a baseline of 3,000 IOPS and 125 MB/s throughput, with the ability to independently scale up to 16,000 IOPS and 1,000 MB/s, ensuring predictable performance without the burst-bucket limitations of gp2.
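The comparison with gp2 can be made concrete using the published baseline rules. gp2 baseline IOPS scale with volume size (3 IOPS per GiB, with a minimum of 100 and a maximum of 16,000), and small gp2 volumes rely on burst credits to reach 3,000 IOPS. gp3 provides a 3,000 IOPS baseline at any size. A short Python sketch:

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline: 3 IOPS per GiB, with a floor of 100 and a ceiling of 16,000."""
    return max(100, min(16_000, 3 * size_gib))


def gp3_baseline_iops(size_gib: int) -> int:
    """gp3 baseline: 3,000 IOPS regardless of size (more can be provisioned separately)."""
    return 3_000


for size in (20, 200, 1_000, 6_000):
    print(size, gp2_baseline_iops(size), gp3_baseline_iops(size))
```

For example, a 200 GiB gp2 volume has a 600 IOPS baseline and reaches 3,000 IOPS only while its burst credits last. A gp3 volume of the same size holds 3,000 IOPS continuously.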

Exam trap

The trap here is that candidates often confuse 'throughput optimized' (st1) with 'low-latency' because both sound performance-oriented, but st1 is designed for sequential throughput, not random I/O latency, making it a poor choice for databases.

How to eliminate wrong answers

Option A is wrong because st1 (throughput optimized HDD) is designed for large, sequential workloads such as big data and log processing, not for low-latency random reads and writes. Its performance drops sharply with small random I/O. Option C is wrong because sc1 (cold HDD) is the lowest-cost HDD volume, intended for infrequently accessed data. It has a maximum of 250 IOPS and a baseline throughput of only 12 MiB/s per TiB, which makes it unsuitable for a latency-sensitive database. Option D is wrong because magnetic volumes are a previous-generation type that averages around 100 IOPS with inconsistent, higher latency, and they are not recommended for production databases.


477

MCQmedium

Based on the exhibit, the application team wants the database to keep the same connection endpoint during failover and to reconnect automatically after the primary instance becomes unavailable. Which change best meets the requirement?

A.Keep the IP address and increase the JDBC connection timeout so the application waits longer during failover.

B.Replace the IP address with the RDS DNS endpoint and add client retry logic that re-resolves DNS after connection loss.

C.Create an additional read replica and point the application to it so failover is faster.

D.Place a Network Load Balancer in front of the database and use the load balancer target IP to avoid DNS changes.

AnswerB

RDS Multi-AZ failover preserves the database endpoint name, not the underlying IP address. When the standby is promoted, AWS updates the DNS record to point to the new primary. Using the RDS endpoint allows the application to follow that change, and retry logic helps the client recover from the short disconnect that occurs during failover.

Why this answer

Option B is correct because the RDS DNS endpoint always resolves to the current primary instance, even after a failover. When the primary becomes unavailable, RDS promotes the standby to primary and updates the DNS record to point to the new instance's IP address. Client retry logic that re-resolves DNS after a connection loss lets the application pick up the new IP and reconnect without manual intervention. That meets both requirements: a stable endpoint and automatic reconnection.
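Below is a minimal Python sketch of the retry logic that option B describes. connect is a hypothetical stand-in for a database driver's connect call. The function resolves the endpoint name again on every attempt and backs off exponentially between attempts. Resolver or runtime caching can still return a stale address (the JVM, for example, caches DNS lookups), so such caches need a short TTL for this approach to work.

```python
import socket
import time
from typing import Any, Callable, Tuple


def connect_with_retry(
    host: str,
    port: int,
    connect: Callable[[Tuple[Any, ...]], Any],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Resolve host on every attempt so a post-failover DNS change is picked up."""
    last_error = None
    for attempt in range(attempts):
        try:
            # Fresh lookup each time: never reuse the address that just failed.
            address = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0][4]
            return connect(address)
        except OSError as exc:  # DNS errors (socket.gaierror) and refused/reset connections
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # exponential backoff between attempts
    raise ConnectionError(f"could not connect to {host}:{port}") from last_error
```

In practice the host argument would be the RDS endpoint name, never an IP address.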

Exam trap

The trap here is assuming that a static IP address or a load balancer can provide a stable database endpoint. RDS does not offer a fixed IP address that survives a Multi-AZ failover. A Network Load Balancer in front of RDS would target an IP address that changes after failover. The reliable approach is the RDS DNS endpoint combined with retry logic that re-resolves DNS after a connection loss.

How to eliminate wrong answers

Option A is wrong because the IP address is not stable. After a failover the new primary has a different IP, so the application keeps connecting to a stale address and fails. Increasing the JDBC connection timeout only delays the failure; it does not fix the address mismatch. Option C is wrong because a read replica is a separate, read-only instance with its own endpoint. Pointing the application at it does not provide automatic failover for the primary and would break writes. Read replicas are for read scaling, not for providing a failover endpoint. Option D is wrong because a Network Load Balancer is not a standard way to front an RDS database. Its IP targets would have to be updated after every failover, because the primary's IP changes. That adds complexity without solving anything the RDS endpoint does not already solve.


478

MCQmedium

A distributed system needs extremely low network latency between a set of EC2 instances running the same workload. The team wants the instances to be placed as close together as AWS allows to reduce round-trip time. Which placement strategy should the architect use?

A.Use a Cluster placement group for the instances that must communicate frequently over low latency.

B.Use a Spread placement group across multiple Availability Zones to maximize fault tolerance.

C.Use the default placement strategy without specifying a placement group.

D.Use a placement group of type Partition to ensure independent failure of each instance.

AnswerA

Cluster placement groups are designed to place instances close together within a single Availability Zone to minimize network latency. They are the right choice when nodes require high intercommunication performance, such as distributed processing or tightly coupled systems. The scenario’s goal of minimizing round-trip time aligns with the Cluster placement group behavior. It’s also an EC2-native placement option focused on performance.

Why this answer

A Cluster placement group is the correct choice because it packs instances close together inside a single Availability Zone, on a low-latency, high-bisection-bandwidth network segment. This gives the lowest network latency and highest throughput between instances (up to 10 Gbps for single-flow traffic). It is ideal for tightly coupled, latency-sensitive workloads such as HPC or real-time distributed systems.
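Below is a sketch of the two EC2 API parameter sets involved: create_placement_group with the cluster strategy, and run_instances pointing at that group. The group name, AMI ID, instance type, and count are hypothetical.

```python
from typing import Dict, Tuple


def cluster_launch_params(
    group_name: str, ami_id: str, instance_type: str, count: int
) -> Tuple[Dict, Dict]:
    """Parameters for EC2 create_placement_group and run_instances."""
    placement_group = {"GroupName": group_name, "Strategy": "cluster"}
    run_instances = {
        "ImageId": ami_id,
        "InstanceType": instance_type,  # one instance type for the whole group
        "MinCount": count,  # MinCount == MaxCount: launch all instances or none
        "MaxCount": count,
        # Instances in the group are packed close together in a single AZ.
        "Placement": {"GroupName": group_name},
    }
    return placement_group, run_instances


pg, launch = cluster_launch_params("low-latency-group", "ami-0123456789example", "c5n.9xlarge", 4)
print(pg, launch["Placement"])
```

Setting MinCount equal to MaxCount makes the launch all-or-nothing. AWS recommends launching all the instances a cluster placement group needs in a single request, with the same instance type, to reduce insufficient-capacity errors.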

Exam trap

The trap here is that candidates often confuse the purpose of placement groups: Cluster is for low latency and high throughput, Spread is for fault tolerance across hardware, and Partition is for large distributed systems needing failure isolation, but only Cluster guarantees physical proximity.

How to eliminate wrong answers

Option B is wrong because a Spread placement group spreads instances across distinct hardware racks or Availability Zones, which increases latency and is designed for fault tolerance, not low latency. Option C is wrong because the default placement strategy does not guarantee proximity; instances may be placed on different racks or AZs, leading to higher latency. Option D is wrong because a Partition placement group spreads instances across multiple partitions (each with separate racks) to isolate failures, but does not minimize latency between instances within the same partition.


479

MCQmedium

A dev sandbox runs for several hours each night and can be interrupted and restarted. Which EC2 purchasing option should minimize cost? The architecture review board prefers a managed AWS-native control.

A.On-Demand Instances only

B.Spot Instances

C.Dedicated Hosts

D.Provisioned IOPS volumes

AnswerB

Spot Instances offer deep discounts for interruptible workloads.

Why this answer

Spot Instances are ideal for fault-tolerant, interruptible workloads like a nightly dev sandbox because they offer up to 90% cost savings over On-Demand. Since the job can be interrupted and restarted, Spot Instances provide the lowest cost while still being a fully managed AWS-native option.

Exam trap

The trap here is that candidates may choose On-Demand Instances assuming they are the simplest managed option, overlooking that Spot Instances are also fully managed and significantly cheaper for interruptible workloads.

How to eliminate wrong answers

Option A is wrong because On-Demand Instances are not the most cost-effective choice for an interruptible workload: they are billed at the standard rate with no discount, so they cost more than Spot Instances. Option C is wrong because a Dedicated Host is a physical server dedicated to your use. It carries significant additional cost and is intended for licensing or compliance needs, not for minimizing the cost of a transient dev sandbox. Option D is wrong because Provisioned IOPS volumes are an EBS storage type, not an EC2 purchasing option, and they do not address compute cost.


480

MCQmedium

A content publishing system uses Lambda functions that call an unreliable third-party API. Failed events must be retained for later investigation after retries are exhausted. What should be configured? The architecture review board prefers a managed AWS-native control.

A.Lambda reserved concurrency set to zero

B.A larger deployment package

C.CloudFront error pages

D.A Lambda dead-letter queue or failure destination

AnswerD

A DLQ or asynchronous failure destination captures failed events after retry attempts.

Why this answer

Option D is correct because Lambda dead-letter queues (DLQs) and failure destinations are the managed, AWS-native way to capture events from asynchronous invocations that have exhausted their retries. After the initial attempt and Lambda's retries (two by default) fail, the event is sent to the configured DLQ (an SQS queue or SNS topic). Alternatively, it goes to an on-failure destination (for example, SQS, SNS, EventBridge, or another Lambda function) for later investigation and reprocessing.
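For the dead-letter queue variant, below is a sketch of the parameters for Lambda's update_function_configuration, plus the execution-role permission the DLQ needs. The function name and queue ARN are hypothetical. A DLQ receives the original event payload. An on-failure destination, by contrast, receives an invocation record that also describes the failure.

```python
from typing import Dict, Tuple


def dlq_setup(function_name: str, queue_arn: str) -> Tuple[Dict, Dict]:
    """Parameters for update_function_configuration plus the execution-role statement."""
    function_update = {
        "FunctionName": function_name,
        # After the asynchronous retries are exhausted, Lambda sends the event here.
        "DeadLetterConfig": {"TargetArn": queue_arn},
    }
    # The function's execution role must be allowed to send messages to the queue.
    role_statement = {"Effect": "Allow", "Action": "sqs:SendMessage", "Resource": queue_arn}
    return function_update, role_statement


update, statement = dlq_setup(
    "publish-content", "arn:aws:sqs:us-east-1:111122223333:publish-dlq"
)
print(update, statement)
```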

Exam trap

The trap here is that candidates may confuse Lambda's synchronous invocation retry behavior (which is controlled by the caller) with asynchronous invocation retries (which are managed by Lambda itself and require a DLQ or failure destination for post-retry capture).

How to eliminate wrong answers

Option A is wrong because setting reserved concurrency to zero would prevent the Lambda function from executing at all, not handle failed events after retries. Option B is wrong because a larger deployment package does not affect retry or failure handling; it only increases the function's code size and cold start latency. Option C is wrong because CloudFront error pages are for HTTP-level errors from a web distribution, not for capturing failed asynchronous Lambda invocations from a third-party API call.


481

Multi-Selecthard

An internal reporting portal has old unattached EBS volumes and many stale snapshots. Which two actions reduce storage cost without affecting running instances? The design must avoid adding custom operational scripts.

Select 2 answers

A.Disable CloudTrail logging

B.Stop all EC2 instances in the account

C.Delete unattached EBS volumes after verifying they are no longer needed

D.Apply snapshot lifecycle policies to expire obsolete snapshots

AnswersC, D

Unattached volumes continue to incur charges until deleted.

Why this answer

Option C is correct because deleting unattached EBS volumes directly reduces storage costs without impacting running instances, as these volumes are not in use. Option D is correct because snapshot lifecycle policies automate the deletion of obsolete snapshots, eliminating manual cleanup and reducing storage costs without custom scripts.

Exam trap

The trap here is that candidates might think stopping instances or disabling CloudTrail saves costs, but these actions either disrupt operations or target unrelated services, while the real savings come from cleaning up orphaned storage resources.


482

MCQeasy

A team stores application logs in Amazon S3. They need access to the logs only occasionally for troubleshooting (infrequent access), and they want to reduce storage cost automatically over time without manually moving objects. What should they implement?

A.An S3 lifecycle policy that transitions objects to a lower-cost storage class after a set number of days

B.An S3 lifecycle policy that deletes objects after 1 day to eliminate storage costs

C.An S3 lifecycle policy that keeps all objects in S3 Standard and only applies compression at read time

D.A policy that changes bucket encryption from SSE-S3 to SSE-KMS to reduce storage cost

AnswerA

S3 lifecycle policies can automatically transition objects based on age to storage classes priced for infrequent access (for example, Standard-IA or Glacier-based classes). This preserves the data for later troubleshooting while lowering storage cost as objects become older.

Why this answer

Option A is correct because an S3 lifecycle policy can automatically transition objects from S3 Standard to lower-cost storage classes (e.g., S3 Standard-IA, S3 One Zone-IA, or S3 Glacier Instant Retrieval) after a specified number of days. This meets the requirement of reducing storage costs over time for infrequently accessed logs without manual intervention, as the policy automates the movement based on object age.
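Below is a sketch of such a lifecycle rule for logs. It assumes a hypothetical logs/ prefix and illustrative transition ages: Standard-IA at 30 days and Glacier Instant Retrieval at 90 days. The 30-day check mirrors S3's rule that lifecycle transitions to Standard-IA need objects that are at least 30 days old. The ordering check is only a sanity check in this sketch. No expiration is set because the question states no retention limit.

```python
MIN_DAYS_BEFORE_IA = 30  # S3 lifecycle won't transition objects to Standard-IA earlier than this


def log_lifecycle(prefix: str, ia_days: int = 30, glacier_ir_days: int = 90) -> dict:
    """LifecycleConfiguration that tiers log objects down as they age."""
    if ia_days < MIN_DAYS_BEFORE_IA:
        raise ValueError("objects must be at least 30 days old to move to Standard-IA")
    if glacier_ir_days <= ia_days:
        raise ValueError("the Glacier Instant Retrieval transition must come after Standard-IA")
    return {
        "Rules": [
            {
                "ID": "tier-down-logs",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_ir_days, "StorageClass": "GLACIER_IR"},
                ],
                # No Expiration: logs stay available for occasional troubleshooting.
            }
        ]
    }


print(log_lifecycle("logs/"))
```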

Exam trap

The trap here is treating the 1-day expiration rule in Option B as a cost-saving strategy. Lifecycle policies can both transition and expire objects, but the question explicitly requires retaining logs for occasional troubleshooting, so expiring them is not appropriate.

How to eliminate wrong answers

Option B is wrong because deleting objects after 1 day would permanently remove logs needed for occasional troubleshooting, eliminating the data entirely rather than reducing storage cost while retaining access. Option C is wrong because S3 does not apply compression at read time; compression must be applied before upload or via a separate process, and keeping all objects in S3 Standard would not reduce costs for infrequent access. Option D is wrong because changing bucket encryption from SSE-S3 to SSE-KMS does not reduce storage cost; SSE-KMS incurs additional per-request charges and does not affect storage class pricing.

483

MCQmedium

An internal reporting portal serves infrequently accessed user documents that must be available immediately when requested. Which S3 storage class is likely the best cost fit?

A.Instance store volumes

B.S3 Glacier Deep Archive

C.S3 Standard for all objects

D.S3 Standard-IA or S3 One Zone-IA depending on resilience requirements

AnswerD

Infrequent Access classes reduce storage cost while keeping millisecond retrieval.

Why this answer

S3 Standard-IA or S3 One Zone-IA is the best cost fit because the data is infrequently accessed but must be available immediately when requested. These storage classes offer low-latency retrieval (milliseconds) at a lower storage cost than S3 Standard, with the trade-off of a retrieval fee. The choice between Standard-IA and One Zone-IA depends on whether the application requires resilience against Availability Zone failures.
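The decision in this question can be written as a small, deliberately simplified helper. It maps the three requirements to S3 storage class API values. The function name is hypothetical, and it ignores other classes such as S3 Glacier Instant Retrieval:

```python
def pick_storage_class(infrequent: bool, needs_ms_retrieval: bool, survive_az_loss: bool) -> str:
    """Map access and resilience needs to an S3 storage class API value (simplified)."""
    if not infrequent:
        # Frequently read data: IA retrieval fees would eat the storage savings.
        return "STANDARD"
    if not needs_ms_retrieval:
        # Hours-long retrieval is acceptable; Deep Archive is one of several archive options.
        return "DEEP_ARCHIVE"
    # Infrequent access with millisecond retrieval: choose by Availability Zone resilience.
    return "STANDARD_IA" if survive_az_loss else "ONEZONE_IA"


print(pick_storage_class(infrequent=True, needs_ms_retrieval=True, survive_az_loss=True))  # STANDARD_IA
```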

Exam trap

The trap here is that candidates may choose S3 Standard for all objects because they assume 'immediately available' requires the highest performance tier, overlooking that Standard-IA and One Zone-IA offer identical retrieval latency at a lower storage cost for infrequently accessed data.

How to eliminate wrong answers

Option A is wrong because instance store volumes are ephemeral block storage attached to EC2 instances, not an S3 storage class, and data is lost if the instance is stopped or terminated. Option B is wrong because S3 Glacier Deep Archive has retrieval times of 12–48 hours, which does not meet the 'immediately available' requirement. Option C is wrong because S3 Standard is designed for frequently accessed data and would incur higher storage costs for infrequently accessed objects, making it less cost-optimal than Standard-IA or One Zone-IA.

484

MCQmedium

A backup process restores a 2 TB production database from an EBS snapshot onto a new volume. During the first hours after restore, the application sees slow reads whenever previously unused blocks are accessed. What is the best way to avoid this performance issue in future restores?

A.Increase the volume size to give the database more free space.

B.Enable Fast Snapshot Restore on the snapshots used for recovery.

C.Move the database files to Amazon EFS after the restore completes.

D.Use magnetic standard volumes because they avoid snapshot hydration delays.

AnswerB

Fast Snapshot Restore removes the initial performance penalty that occurs when a restored EBS volume reads blocks that have not yet been hydrated. By pre-warming the snapshot data in the target AZ, it helps ensure consistent read performance immediately after restore. This is especially valuable for databases and other workloads that must recover quickly without waiting for the background hydration process.

Why this answer

When an EBS volume is restored from a snapshot, it is lazily loaded from Amazon S3 in the background. Accessing data blocks that have not yet been loaded triggers a read penalty because the volume must fetch them from S3 before serving the I/O. Enabling Fast Snapshot Restore (FSR) pre-warms the snapshot data so that restored volumes have full performance immediately, eliminating the slow reads on first access.
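FSR is enabled per snapshot and per Availability Zone, so you must list the restore target AZs. The sketch below only builds the keyword arguments for EC2's EnableFastSnapshotRestores call (boto3 `enable_fast_snapshot_restores`); it does not call AWS. The snapshot ID and AZ names are hypothetical:

```python
def fsr_request(snapshot_ids: list[str], availability_zones: list[str]) -> dict:
    """Build keyword arguments for EC2 EnableFastSnapshotRestores.

    FSR is enabled per snapshot and per Availability Zone, so the restore target AZs
    must be listed explicitly.
    """
    if not snapshot_ids or not availability_zones:
        raise ValueError("need at least one snapshot ID and one Availability Zone")
    bad_ids = [s for s in snapshot_ids if not s.startswith("snap-")]
    if bad_ids:
        raise ValueError(f"not EBS snapshot IDs: {bad_ids}")
    return {"AvailabilityZones": list(availability_zones), "SourceSnapshotIds": list(snapshot_ids)}


params = fsr_request(["snap-0123456789abcdef0"], ["us-east-1a", "us-east-1b"])
# With boto3 this would be: boto3.client("ec2").enable_fast_snapshot_restores(**params)
```

FSR is billed for each snapshot in each AZ for as long as it is enabled. Keeping the AZ list limited to the recovery targets helps control cost.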

Exam trap

The trap here is that candidates may think increasing volume size or switching to a different volume type will fix the lazy hydration delay. Only Fast Snapshot Restore directly addresses the root cause, because it pre-initializes the data blocks.

How to eliminate wrong answers

Option A is wrong because increasing volume size does not affect the lazy hydration process; it only adds more uninitialized blocks that would suffer from the same slow-read penalty. Option C is wrong because moving database files to Amazon EFS after restore does not solve the initial slow-read problem: the files must still be read from the restored EBS volume in order to copy them, and EFS has its own latency characteristics. Option D is wrong because magnetic volumes (the previous-generation "standard" volume type) are slower than SSD-backed gp3 or io2 volumes and are still lazily loaded from the snapshot, so they do not avoid the hydration delay.

485

MCQeasy

Your global users access static images stored in S3. Origin bandwidth costs are higher than expected because CloudFront is not caching effectively. What change most directly reduces origin fetches (and typically lowers data transfer costs) without changing application logic?

A.Configure CloudFront caching by setting appropriate cache-control headers and/or CloudFront cache policy/TTL values for the static objects

B.Disable CloudFront caching so every request goes back to S3 for the latest image

C.Route users directly to the S3 website endpoint to bypass CloudFront

D.Turn on a NAT Gateway for the CloudFront origin to reduce bandwidth charges

AnswerA

CloudFront reduces origin fetches when responses are cacheable and allowed to remain in the edge cache for a meaningful duration. Ensuring the objects include correct cache-control headers (or configuring CloudFront cache policy TTLs) increases cache hit rate, so fewer requests require fetching from S3 origin. This directly reduces origin bandwidth and related data transfer costs.

Why this answer

The high origin bandwidth costs are caused by CloudFront not caching effectively, meaning too many requests reach the S3 origin. By configuring appropriate Cache-Control headers or a CloudFront cache policy with optimal TTL values, you ensure that CloudFront caches the static images at edge locations for longer periods. This directly reduces the number of origin fetches, lowering data transfer costs without any changes to the application logic.
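Here is one way to choose Cache-Control values when uploading the images. The sketch assumes image keys that embed a content hash are "versioned" and never change, so they can be cached for a long time. The function name, the TTLs, and the versioning convention are illustrative assumptions:

```python
import mimetypes

LONG_TTL_S = 31_536_000  # one year, in seconds
SHORT_TTL_S = 86_400     # one day, in seconds


def cache_control_for(key: str, versioned: bool) -> str:
    """Pick a Cache-Control header value for an object uploaded to the S3 origin."""
    content_type, _ = mimetypes.guess_type(key)
    if content_type and content_type.startswith("image/"):
        if versioned:
            # Content-hashed names never change, so edges can keep them for a long time.
            return f"public, max-age={LONG_TTL_S}, immutable"
        return f"public, max-age={SHORT_TTL_S}"
    # Anything else (e.g. HTML) is revalidated on each request in this sketch.
    return "no-cache"
```

With boto3, the value would be passed as the `CacheControl` argument of `put_object`. CloudFront honors the origin's max-age within the minimum and maximum TTLs set in the distribution's cache policy.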

Exam trap

The trap here is that candidates may think disabling caching or bypassing CloudFront entirely will reduce costs, when in fact the opposite is true—effective caching is the key to reducing origin fetches and lowering data transfer costs.

How to eliminate wrong answers

Option B is wrong because disabling CloudFront caching would force every request to go back to the S3 origin, increasing origin fetches and bandwidth costs, which is the opposite of the desired outcome. Option C is wrong because routing users directly to the S3 website endpoint bypasses CloudFront entirely, eliminating caching benefits and likely increasing costs due to direct S3 data transfer and request pricing. Option D is wrong because a NAT Gateway is used for outbound internet access from private subnets in a VPC, not for reducing bandwidth charges between CloudFront and an S3 origin; it would add unnecessary cost and complexity.

486

MCQhard

Based on the exhibit, the application tier is not replacing unhealthy instances even though the Auto Scaling group spans two Availability Zones. What change most directly improves automatic recovery when the application process fails?

A.Increase the ASG desired capacity so that extra instances absorb the failed ones.

B.Set the Auto Scaling group health check type to ELB so target group health determines replacement.

C.Replace the Application Load Balancer with a Network Load Balancer to improve failover speed.

D.Increase the HealthCheckGracePeriod to the maximum value so the instances have more time to stabilize.

AnswerB

This makes Auto Scaling replace instances that fail the load balancer health check even when EC2 status checks still pass. The exhibit shows the application health endpoint returns 500 while EC2 checks remain passing, so EC2-only health checks miss the failure. ELB-based health checks align replacement with real application availability.

Why this answer

Option B is correct because setting the Auto Scaling group health check type to ELB makes the ASG also use the target group's health checks, which monitor application-level health (e.g., an HTTP 200 response from a health endpoint). When the application process fails, the target group reports the instance as unhealthy, and the ASG marks it unhealthy and replaces it (newly launched instances are exempt until the health check grace period ends). This directly addresses the issue, because the default EC2 health check only considers the instance state and EC2 status checks, not application responsiveness.
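The change itself is one call. The sketch below builds the keyword arguments for Auto Scaling's UpdateAutoScalingGroup (boto3 `update_auto_scaling_group`) without calling AWS. The group name and the 300-second grace period are hypothetical values:

```python
def elb_health_check_update(asg_name: str, grace_period_s: int = 300) -> dict:
    """Keyword arguments for UpdateAutoScalingGroup that turn on ELB health checks."""
    if grace_period_s < 0:
        raise ValueError("grace period must be non-negative")
    return {
        "AutoScalingGroupName": asg_name,
        # With "ELB", an instance is replaced if it fails EC2 status checks OR target group checks.
        "HealthCheckType": "ELB",
        # Seconds a newly launched instance gets before failed health checks count against it.
        "HealthCheckGracePeriod": grace_period_s,
    }


update = elb_health_check_update("app-tier-asg")
# boto3.client("autoscaling").update_auto_scaling_group(**update)
```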

Exam trap

The trap here is that candidates assume the default EC2 health check is sufficient for application-level failures. It only considers instance state and EC2 status checks, not the application process, so the ASG never triggers replacement when only the application crashes.

How to eliminate wrong answers

Option A is wrong because increasing the desired capacity does not cause the ASG to replace unhealthy instances; it only adds more instances, which may mask the failure but does not fix the underlying health check mechanism. Option C is wrong because replacing the Application Load Balancer with a Network Load Balancer does not change how the ASG decides which instances to replace. NLB target groups can also run HTTP health checks, but as long as the ASG health check type remains EC2, target health is ignored for replacement. Option D is wrong because increasing the HealthCheckGracePeriod only delays when the ASG starts acting on health check results for newly launched instances. It does not change the health check type, so with EC2 health checks the ASG still won't detect application failures.

487

MCQmedium

A team runs an EC2-based service and ships logs to Amazon CloudWatch Logs. They enabled long log retention and turned on detailed monitoring to improve troubleshooting. Their monthly CloudWatch costs have grown unexpectedly. Compliance requires that the logs remain available in CloudWatch Logs (for querying and audits) for 90 days, and alerts/alarms do not require detailed EC2 monitoring. What change best reduces cost while meeting requirements?

A.Keep the current long retention and detailed monitoring; reduce the log volume by sampling 10% of events

B.Set the CloudWatch Logs retention to 90 days and disable detailed EC2 monitoring (use standard monitoring) for the instances

C.Move all logs to S3 immediately and delete the CloudWatch log groups to reduce costs

D.Increase CloudWatch alarm thresholds to reduce the number of metric datapoints

AnswerB

CloudWatch Logs storage cost grows with the retention period, because log data is kept (and billed) until it expires. Setting retention to exactly 90 days reduces storage cost while meeting compliance. Disabling detailed EC2 monitoring returns the instances to free 5-minute basic monitoring instead of billed 1-minute metrics. This lowers monitoring cost without affecting alarms, which do not need 1-minute metrics.

Why this answer

Option B reduces costs in two ways. It sets CloudWatch Logs retention to exactly 90 days, which meets compliance. It also disables detailed monitoring (billed per metric for 1-minute data) in favor of basic 5-minute monitoring. This directly addresses the two cost drivers named in the question, long retention and detailed EC2 monitoring, while preserving the required 90-day log availability for queries and audits.
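Both changes map to single API calls. The sketch below only assembles their keyword arguments, keyed by the boto3 method names (CloudWatch Logs `put_retention_policy`, EC2 `unmonitor_instances`); it does not call AWS. The log group name and instance ID are hypothetical:

```python
def cost_fix_requests(log_group: str, instance_ids: list[str], retention_days: int = 90) -> dict:
    """Keyword arguments for the two API calls behind Option B, keyed by boto3 method name."""
    return {
        # CloudWatch Logs: expire events after the compliance window instead of keeping them longer.
        "logs.put_retention_policy": {"logGroupName": log_group, "retentionInDays": retention_days},
        # EC2: turn off detailed (1-minute) monitoring; basic 5-minute metrics continue.
        "ec2.unmonitor_instances": {"InstanceIds": list(instance_ids)},
    }


calls = cost_fix_requests("/app/service", ["i-0123456789abcdef0"])
```

CloudWatch Logs accepts only a fixed set of retention values, and 90 days is one of them.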

Exam trap

The trap here is that candidates may think sampling logs or moving them to S3 is acceptable, but the requirement explicitly states logs must remain available in CloudWatch Logs for querying and audits, making those options non-compliant.

How to eliminate wrong answers

Option A is wrong because sampling only 10% of log events would lose critical data for troubleshooting and audits, violating the compliance requirement that logs remain available for 90 days. Option C is wrong because moving logs to S3 immediately and deleting CloudWatch log groups would remove the ability to query logs in CloudWatch Logs Insights, breaking the requirement that logs remain available in CloudWatch Logs for querying. Option D is wrong because increasing alarm thresholds does not reduce the number of metric datapoints collected; detailed monitoring still sends per-minute metrics, and thresholds only affect when alarms trigger, not the volume of data ingested or stored.

488

MCQmedium

A high-volume telemetry pipeline writes streaming click events that must be processed by multiple independent consumers. Which service is most appropriate? The architecture review board prefers a managed AWS-native control.

A.Amazon Kinesis Data Streams

B.AWS DataSync

C.Amazon EBS

D.Amazon Route 53

AnswerA

Kinesis Data Streams supports high-throughput event ingestion with multiple consumers reading from the stream.

Why this answer

Amazon Kinesis Data Streams is the correct choice because it is a fully managed, AWS-native service designed for real-time streaming data ingestion and processing. It supports multiple independent consumers via enhanced fan-out, which provides each consumer with a dedicated throughput of up to 2 MB/sec per shard, ensuring that high-volume click events can be processed concurrently without contention.
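The throughput difference between shared and enhanced fan-out consumers is simple arithmetic. Shared-throughput consumers split each shard's 2 MB/sec read limit, while each enhanced fan-out consumer gets its own 2 MB/sec per shard. The sketch below is an approximation that ignores per-call limits; the function name is hypothetical:

```python
SHARD_READ_MB_S = 2.0  # per-shard read limit (shared), and per-consumer limit with enhanced fan-out


def per_consumer_read_mb_s(shards: int, consumers: int, enhanced_fan_out: bool) -> float:
    """Approximate read throughput each consumer gets across the whole stream."""
    if shards < 1 or consumers < 1:
        raise ValueError("need at least one shard and one consumer")
    stream_total = SHARD_READ_MB_S * shards
    if enhanced_fan_out:
        return stream_total  # each registered consumer has its own 2 MB/s per shard
    return stream_total / consumers  # shared consumers split the 2 MB/s per shard


print(per_consumer_read_mb_s(shards=4, consumers=3, enhanced_fan_out=True))  # 8.0
```

With 4 shards and 3 consumers, each enhanced fan-out consumer can read about 8 MB/sec. If the same consumers share throughput, each gets about a third of that.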

Exam trap

The trap here is confusing batch data transfer services (DataSync) or storage services (EBS) with real-time streaming, leading candidates to overlook Kinesis Data Streams' native support for multiple independent consumers via enhanced fan-out.

How to eliminate wrong answers

Option B (AWS DataSync) is wrong because it is a data transfer service for moving large datasets between on-premises storage and AWS, not a real-time streaming pipeline. Option C (Amazon EBS) is wrong because it provides block-level storage volumes for EC2 instances, not a streaming data ingestion or processing capability. Option D (Amazon Route 53) is wrong because it is a DNS and domain name resolution service, completely unrelated to streaming telemetry data.

489

Multi-Selecthard

A payments API requires point-in-time recovery and accidental-delete protection for a DynamoDB table. Which two settings should the architect enable? The architecture review board prefers a managed AWS-native control.

Select 2 answers

A.Deletion protection or tightly controlled delete permissions

B.Point-in-time recovery

C.Global secondary indexes

D.DAX

AnswersA, B

Point-in-time recovery provides continuous backups, and deletion protection plus least-privilege delete permissions reduce the risk of accidental table removal.

Why this answer

Point-in-time recovery (PITR) enables continuous backups of the DynamoDB table, allowing restoration to any point within the recovery window (up to the last 35 days), which satisfies the requirement for point-in-time recovery. Deletion protection rejects DeleteTable requests while it is enabled, meeting the accidental-delete protection requirement. Both are managed AWS-native controls that require no custom scripting or external tooling.
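Each setting is one DynamoDB call. The sketch below builds keyword arguments for boto3's `update_continuous_backups` and `update_table` but does not call AWS. The table name "payments" is hypothetical:

```python
def protect_table_requests(table_name: str) -> dict:
    """Keyword arguments for the two DynamoDB calls that turn on PITR and deletion protection."""
    return {
        # Continuous backups: restore to any second inside the PITR recovery window.
        "update_continuous_backups": {
            "TableName": table_name,
            "PointInTimeRecoverySpecification": {"PointInTimeRecoveryEnabled": True},
        },
        # While enabled, DeleteTable requests against this table are rejected.
        "update_table": {"TableName": table_name, "DeletionProtectionEnabled": True},
    }


calls = protect_table_requests("payments")
```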

Exam trap

The trap here is that candidates often confuse operational features like DAX (caching) or GSIs (indexing) with data protection mechanisms, but neither provides backup/restore or deletion safeguards required for resilience and data durability.

490

MCQmedium

A trading dashboard uses Aurora MySQL. The company wants fast cross-Region disaster recovery with low RPO. Which architecture should be considered?

A.A single-AZ Aurora cluster

B.Aurora Global Database

C.Manual snapshots copied monthly

D.An ElastiCache Redis replica

AnswerB

Aurora Global Database replicates with low latency to secondary Regions and supports faster disaster recovery than snapshot-only approaches.

Why this answer

Aurora Global Database is designed for cross-Region disaster recovery, with a typical RPO of about 1 second and an RTO of under 1 minute. It uses storage-based replication that has little to no impact on the primary cluster's performance. This meets the low RPO requirement for a trading dashboard, where data loss must be minimized.
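Setting one up from an existing cluster takes two calls, each made in a different Region. The sketch below only builds keyword arguments for RDS `create_global_cluster` (primary Region) and `create_db_cluster` (secondary Region). All identifiers and the ARN are hypothetical. Engine version, networking, and the secondary's DB instances are omitted:

```python
def global_database_requests(global_id: str, primary_cluster_arn: str,
                             secondary_cluster_id: str) -> dict:
    """Keyword arguments for building an Aurora MySQL global database (two Regions)."""
    if not primary_cluster_arn.startswith("arn:aws:rds:"):
        raise ValueError("SourceDBClusterIdentifier must be the primary cluster's ARN")
    return {
        # Call with an RDS client in the primary Region: wraps the existing cluster.
        "primary.create_global_cluster": {
            "GlobalClusterIdentifier": global_id,
            "SourceDBClusterIdentifier": primary_cluster_arn,
        },
        # Call with an RDS client in the secondary Region: adds a read-only secondary cluster.
        "secondary.create_db_cluster": {
            "DBClusterIdentifier": secondary_cluster_id,
            "Engine": "aurora-mysql",
            "GlobalClusterIdentifier": global_id,
        },
    }


reqs = global_database_requests(
    "trading-global",
    "arn:aws:rds:us-east-1:111111111111:cluster:trading-primary",
    "trading-secondary",
)
```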

Exam trap

The trap here is that candidates might choose manual snapshots (Option C) thinking they are sufficient for DR, but they overlook the critical requirement of low RPO, which snapshots copied monthly cannot satisfy.

How to eliminate wrong answers

Option A is wrong because a single-AZ Aurora cluster provides no cross-Region replication and offers no disaster recovery across AWS Regions, resulting in potentially high RPO if the primary Region fails. Option C is wrong because manual snapshots copied monthly have an RPO of up to one month, which is far too high for a trading dashboard requiring low RPO. Option D is wrong because ElastiCache Redis is an in-memory cache, not a persistent database, and cannot serve as a cross-Region disaster recovery solution for Aurora MySQL data.

491

MCQmedium

Your company requires that all requests to an S3 bucket use HTTPS and that all objects uploaded to the bucket are encrypted at rest. You manage the S3 bucket policy and want enforcement that does not rely on application code compliance. Which bucket policy change best enforces both requirements?

A.Add a Deny statement for all S3 actions on the bucket and its objects when aws:SecureTransport is false, and add a Deny statement for s3:PutObject when the request does not specify server-side encryption with AES256 (s3:x-amz-server-side-encryption = "AES256").

B.Use S3 website hosting to redirect users to HTTPS and rely on bucket default encryption for all uploads.

C.Add a Deny statement for s3:GetObject when aws:SecureTransport is false, and enable default encryption on the bucket.

D.Allow only IAM principals from your account to access the bucket and require clients to configure HTTPS in their applications.

AnswerA

This enforces HTTPS for all S3 requests by denying any non-TLS access and enforces encryption at rest by denying uploads that do not request SSE-S3. Because the controls are in the bucket policy, compliance does not depend on application behavior.

Why this answer

Option A is correct because it uses a Deny statement with the `aws:SecureTransport` condition to block any request that does not use HTTPS, enforcing encryption in transit. It also adds a Deny statement for `s3:PutObject` when the request does not include the `x-amz-server-side-encryption` header set to `AES256`, ensuring that all uploaded objects are encrypted at rest with SSE-S3. This policy-based enforcement works regardless of application code, meeting the requirement for non-reliance on client-side compliance.
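The policy described above can be generated as follows. The bucket name is hypothetical. A negated operator such as StringNotEquals also matches when the header is missing from the request, so one statement rejects uploads that omit the header as well as uploads that ask for a different value:

```python
import json


def enforce_tls_and_sse_s3(bucket: str) -> str:
    """Return a bucket policy (JSON) that denies non-TLS requests and non-SSE-S3 uploads."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                "Sid": "DenyUploadsWithoutSSES3",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/*",
                # Matches when the header is absent or set to anything other than AES256.
                "Condition": {"StringNotEquals": {"s3:x-amz-server-side-encryption": "AES256"}},
            },
        ],
    }
    return json.dumps(policy, indent=2)
```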

Exam trap

The trap here is that candidates often confuse default encryption with policy-based enforcement. Default encryption only determines what S3 applies when an upload omits encryption headers; it cannot reject an upload that requests a different setting. A Deny statement in the bucket policy rejects any request that does not meet the stated requirement.

How to eliminate wrong answers

Option B is wrong because S3 website endpoints do not support HTTPS at all and serve only website requests. Their redirect rules have no effect on REST API requests (e.g., via SDK or CLI), and relying on default encryption gives no policy-level control over what uploads may request. Option C is wrong because it only denies `s3:GetObject` when `aws:SecureTransport` is false, so other actions such as `s3:PutObject` can still be sent over plain HTTP. Enabling default encryption also does not require uploads to specify encryption headers; it only applies when the request omits them. Option D is wrong because restricting access to IAM principals does not enforce HTTPS or encryption at rest; it only controls authorization, and requiring clients to configure HTTPS in their applications relies on application code compliance, which the question explicitly wants to avoid.

492

MCQmedium

An events service publishes critical notifications using Amazon SNS. Three independent downstream systems (A, B, and C) subscribe to the topic. Downstream system B sometimes fails to process certain messages (for example, it times out or returns an error while handling the message), and you want: 1) failures in B to be isolated so A and C keep processing unaffected, and 2) messages that B cannot successfully process after retries to be sent to a DLQ for B. Which design best meets these requirements?

A.Subscribe each downstream directly with HTTPS endpoints and configure a single SNS dead-letter queue (DLQ) for the topic.

B.For each downstream system, create its own SQS queue, subscribe each SQS queue to the SNS topic, and configure a redrive policy with a DLQ for each SQS queue.

C.Use one shared SQS queue for all three downstream systems and configure a single DLQ only when all three downstream systems fail.

D.Use EventBridge rules to invoke A, B, and C synchronously with retries enabled, and send failures to a common DLQ.

AnswerB

SNS delivers the message independently to each subscribed SQS queue. If downstream B fails to process a message, B can avoid deleting it from its own queue; after visibility timeout and retry attempts, SQS redrives messages to B’s DLQ. A and C are isolated because they have separate queues and DLQs, so B’s failures do not prevent deliveries to A and C.

Why this answer

Option B is correct because it creates a dedicated SQS queue for each downstream system, which isolates failures: if system B fails, its SQS queue will accumulate messages while systems A and C continue processing from their own queues. Each SQS queue can have a redrive policy that moves messages to a per-queue DLQ after the configured maximum retries are exhausted, satisfying the requirement for a B-specific DLQ without affecting the other subscribers.
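The per-system wiring for B can be expressed as two calls: SNS `subscribe` with protocol "sqs", and SQS `set_queue_attributes` with a RedrivePolicy. The sketch below only builds their keyword arguments. All ARNs and the queue URL are hypothetical, and the queue access policy that lets SNS send to the queue is not shown:

```python
import json


def subscriber_wiring(topic_arn: str, queue_url: str, queue_arn: str, dlq_arn: str,
                      max_receive_count: int = 5) -> dict:
    """Kwargs that give one downstream system its own queue, subscription, and DLQ."""
    if max_receive_count < 1:
        raise ValueError("maxReceiveCount must be at least 1")
    redrive = {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": str(max_receive_count)}
    return {
        # SNS delivers a copy of every message to this queue, independently of other subscribers.
        "sns.subscribe": {"TopicArn": topic_arn, "Protocol": "sqs", "Endpoint": queue_arn},
        # After maxReceiveCount unsuccessful receives, SQS moves the message to this system's DLQ.
        "sqs.set_queue_attributes": {
            "QueueUrl": queue_url,
            "Attributes": {"RedrivePolicy": json.dumps(redrive)},  # SQS attribute values are strings
        },
    }


wiring_b = subscriber_wiring(
    "arn:aws:sns:us-east-1:111111111111:critical-events",
    "https://sqs.us-east-1.amazonaws.com/111111111111/system-b",
    "arn:aws:sqs:us-east-1:111111111111:system-b",
    "arn:aws:sqs:us-east-1:111111111111:system-b-dlq",
)
```

Calling the same function once each for A and C gives every system its own independent queue and DLQ.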

Exam trap

The trap here is that candidates assume a single dead-letter queue at the SNS level is sufficient. SNS DLQs are attached to individual subscriptions and only capture messages SNS could not deliver (e.g., an HTTP endpoint is unreachable). They do not cover processing failures after a message has been delivered to an SQS queue.

How to eliminate wrong answers

Option A is wrong because SNS has no topic-wide DLQ; redrive policies are set per subscription, so a single DLQ shared by all subscribers cannot isolate B's failures. Direct HTTPS subscriptions also give B no queue in which to hold and retry messages it received but failed to process. Option C is wrong because a shared SQS queue for all three systems means a failure in B could block or delay messages for A and C, and a single DLQ would trigger only when all three fail, not when B alone fails. Option D is wrong because EventBridge does not invoke targets synchronously as the option describes, and a common DLQ shared by A, B, and C would mix B's failures with the others' instead of isolating them in a DLQ for B.

493

Multi-Selecthard

A product catalog system uses a relational database for orders and a simple key-value profile store for shopping carts. Traffic is unpredictable, and the company wants to avoid paying for large idle database instances. Which two choices are best? Select two.

Select 2 answers

A.Use Aurora Serverless v2 for the relational order system.

B.Use DynamoDB on-demand capacity for the shopping-cart profile store.

C.Keep both workloads on large provisioned RDS instances and add read replicas for the cart store.

D.Use DynamoDB provisioned capacity with a fixed minimum despite the unpredictable traffic.

E.Replace the relational order system with a wide-column table to reduce SQL licensing.

AnswersA, B

Correct. Aurora Serverless v2 is designed for variable relational workloads because capacity can scale without constantly paying for a large fixed instance. It preserves SQL features while reducing idle overprovisioning.

Why this answer

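Aurora Serverless v2 automatically scales compute and memory capacity in fine-grained increments based on application demand, which makes it a good fit for the unpredictable relational order workload without provisioning for peak load. It can scale down to a small minimum capacity, or to 0 ACUs with auto-pause on engine versions that support it. DynamoDB on-demand capacity bills per request, so the shopping-cart store pays nothing for idle read/write capacity. Together these meet the requirement to avoid paying for large idle database instances.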
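On the relational side, capacity is bounded by the cluster's Serverless v2 scaling configuration and applies to instances of class db.serverless. The sketch below builds keyword arguments for RDS `create_db_cluster` and `create_db_instance` but does not call AWS. The identifiers and the 0.5 to 16 ACU range are hypothetical, and credentials and networking parameters are omitted:

```python
def serverless_v2_requests(cluster_id: str, min_acu: float = 0.5, max_acu: float = 16.0) -> dict:
    """Keyword arguments for an Aurora MySQL Serverless v2 cluster and its writer instance."""
    for acu in (min_acu, max_acu):
        # Aurora capacity units (ACUs) are set in 0.5 increments.
        if not float(acu * 2).is_integer():
            raise ValueError(f"{acu} is not a multiple of 0.5 ACU")
    if min_acu > max_acu:
        raise ValueError("min_acu must not exceed max_acu")
    return {
        "create_db_cluster": {
            "DBClusterIdentifier": cluster_id,
            "Engine": "aurora-mysql",
            "ServerlessV2ScalingConfiguration": {"MinCapacity": min_acu, "MaxCapacity": max_acu},
        },
        # Serverless v2 capacity applies to instances created with the db.serverless class.
        "create_db_instance": {
            "DBInstanceIdentifier": f"{cluster_id}-writer",
            "DBClusterIdentifier": cluster_id,
            "DBInstanceClass": "db.serverless",
            "Engine": "aurora-mysql",
        },
    }
```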

Exam trap

The trap here is that candidates may choose provisioned capacity (Option D) thinking it is cheaper, but for unpredictable traffic, on-demand avoids over-provisioning costs, and Aurora Serverless v2 is the relational equivalent of this elastic model.

494

MCQmedium

A dev sandbox has unpredictable DynamoDB traffic with long idle periods and occasional spikes. Which capacity mode should minimize operational overhead and avoid paying for idle provisioned capacity? The architecture review board prefers a managed AWS-native control.

A.Reserved capacity for maximum daily traffic

B.Provisioned capacity set for peak traffic

C.DynamoDB on-demand capacity mode

D.Global tables in every Region

AnswerC

On-demand capacity is suitable for unpredictable workloads and charges per request without capacity planning.

Why this answer

DynamoDB on-demand capacity mode (Option C) is ideal for unpredictable traffic with long idle periods and spikes because it automatically scales to handle workload demands without requiring any capacity planning. You pay only for the reads and writes you perform, eliminating the cost of idle provisioned capacity and the operational overhead of managing scaling thresholds.
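In the API, on-demand mode is the billing mode "PAY_PER_REQUEST", and a table in this mode takes no ProvisionedThroughput settings. The sketch below builds CreateTable keyword arguments (boto3 `create_table`) without calling AWS. The table and key names are hypothetical:

```python
def on_demand_table_request(table_name: str, partition_key: str) -> dict:
    """Keyword arguments for DynamoDB CreateTable in on-demand mode."""
    return {
        "TableName": table_name,
        # PAY_PER_REQUEST = on-demand: billed per read/write, no ProvisionedThroughput block.
        "BillingMode": "PAY_PER_REQUEST",
        "AttributeDefinitions": [{"AttributeName": partition_key, "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": partition_key, "KeyType": "HASH"}],
    }


request = on_demand_table_request("sandbox-items", "pk")
```

To switch an existing provisioned table, call `update_table` with `BillingMode="PAY_PER_REQUEST"`.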

Exam trap

The trap here is assuming that Provisioned capacity, or DynamoDB reserved capacity (a term commitment that discounts provisioned capacity), is always cheaper. Both keep billing for capacity that sits idle in an unpredictable workload.

How to eliminate wrong answers

Option A is wrong because DynamoDB reserved capacity is a 1- or 3-year billing commitment that discounts provisioned capacity, not a capacity mode. Sizing it for maximum daily traffic locks in payment for capacity that sits idle most of the time. Option B is wrong because Provisioned capacity set for peak traffic would require you to pay for the peak capacity even during idle periods, leading to wasted cost and manual scaling adjustments. Option D is wrong because Global tables are a replication feature for multi-Region active-active setups, not a capacity mode; they add complexity and cost without addressing the need to avoid paying for idle provisioned capacity.

495

MCQmedium

Based on the exhibit, what is the most appropriate fix so the workload in Account A can access the S3 bucket in Account B without using long-lived access keys?

A.Create an IAM role in Account B, trust Account A's AppRole to assume it with STS, and then access the bucket using temporary credentials.

B.Attach AmazonS3FullAccess to the instance profile role in Account A and keep using the same direct access path.

C.Add an SCP to Account A that allows S3 actions against buckets in Account B.

D.Enable S3 versioning on the bucket so cross-account requests are automatically trusted.

AnswerA

Assuming a role in the target account is a clean cross-account pattern that uses temporary credentials instead of static keys. The trust policy in Account B controls who may assume the role, and the role in B can then be given the exact S3 permissions needed. This is easy to revoke centrally by changing the trust relationship or role policy.

Why this answer

Option A is correct because it uses AWS Security Token Service (STS) to allow the workload in Account A to assume an IAM role in Account B, obtaining temporary credentials that grant access to the S3 bucket. This eliminates the need for long-lived access keys and follows the principle of least privilege, as the role can be scoped to specific S3 actions and resources.
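The pattern has three pieces: a trust policy on the role in Account B, a permissions policy scoped to the bucket, and an `sts.assume_role` call from Account A. The sketch below only builds them. The account IDs, role names, and bucket name are hypothetical. Account A's AppRole also needs an identity policy that allows `sts:AssumeRole` on the target role:

```python
SOURCE_ROLE_ARN = "arn:aws:iam::111111111111:role/AppRole"              # Account A (hypothetical ID)
TARGET_ROLE_ARN = "arn:aws:iam::222222222222:role/ReportsBucketReader"  # Account B (hypothetical)
BUCKET = "example-reports-bucket"                                       # hypothetical bucket name


def trust_policy(source_role_arn: str) -> dict:
    """Trust policy on the Account B role: only Account A's AppRole may assume it."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": source_role_arn},
            "Action": "sts:AssumeRole",
        }],
    }


def bucket_read_policy(bucket: str) -> dict:
    """Permissions policy on the Account B role, scoped to one bucket."""
    arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "s3:ListBucket", "Resource": arn},
            {"Effect": "Allow", "Action": "s3:GetObject", "Resource": f"{arn}/*"},
        ],
    }


# Account A side: kwargs for sts.assume_role; the returned credentials expire automatically.
assume_role_kwargs = {"RoleArn": TARGET_ROLE_ARN, "RoleSessionName": "reports-reader"}
```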

Exam trap

The trap here is that candidates often confuse SCPs with resource-based policies or assume that attaching a managed policy to an instance profile automatically grants cross-account access, overlooking the need for explicit trust and bucket policies in the target account.

How to eliminate wrong answers

Option B is wrong because AmazonS3FullAccess in Account A grants only identity-side permissions. Cross-account access also requires Account B to grant access (through a bucket policy or a role in Account B), so the current direct access path still lacks authorization from Account B. Option C is wrong because Service Control Policies (SCPs) never grant permissions. They only set the maximum permissions available to accounts in an AWS Organization, so they cannot authorize access to a resource in another account. Option D is wrong because enabling S3 versioning on the bucket does not affect cross-account authentication or authorization; versioning is a data management feature that tracks object versions and has no impact on IAM permissions or trust relationships.

496

MCQmedium

A company hosts an image-sharing application on EC2. Administrators must connect without opening SSH or RDP ports to the internet. What should the architect use?

A.AWS Systems Manager Session Manager with the required instance role

B.An internet gateway attached to the private subnet

C.A public Elastic IP address on each instance

D.A bastion host with SSH open to 0.0.0.0/0

AnswerA

Session Manager provides audited shell access without inbound SSH/RDP exposure.

Why this answer

AWS Systems Manager Session Manager lets administrators open interactive shell sessions (for example, bash on Linux or PowerShell on Windows) on EC2 instances without opening any inbound ports. The SSM Agent on the instance initiates outbound HTTPS (port 443) connections to the Systems Manager service. The required instance role (for example, one with the AmazonSSMManagedInstanceCore managed policy) grants the agent permission to communicate with Systems Manager. The result is secure, auditable access without public IP addresses or bastion hosts.
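The setup has two parts: an instance role that EC2 can assume, with the AWS managed policy AmazonSSMManagedInstanceCore attached, and the `aws ssm start-session` CLI command. The sketch below only builds these pieces; the instance ID is hypothetical. The CLI also needs the Session Manager plugin installed. Instances in private subnets need a NAT path or VPC endpoints for ssm, ssmmessages, and ec2messages:

```python
SSM_CORE_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"


def session_manager_role_parts() -> dict:
    """Trust policy and managed policy for an EC2 instance role usable by Session Manager."""
    trust = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},  # EC2 assumes the role for the instance
            "Action": "sts:AssumeRole",
        }],
    }
    return {"AssumeRolePolicyDocument": trust, "ManagedPolicyArns": [SSM_CORE_POLICY_ARN]}


def start_session_argv(instance_id: str) -> list[str]:
    """AWS CLI command an administrator runs; no inbound port on the instance is involved."""
    return ["aws", "ssm", "start-session", "--target", instance_id]
```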

Exam trap

The trap here is that candidates often default to a bastion host (Option D) as a traditional solution, but fail to recognize that a bastion host still requires opening SSH/RDP to the internet (even if only to the bastion), which violates the 'without opening SSH or RDP ports to the internet' constraint.

How to eliminate wrong answers

Option B is wrong because internet gateways attach to a VPC, not to a subnet. Routing a subnet to an internet gateway makes it public, and inbound administrative access would then still require public IP addresses and open SSH or RDP ports. Option C is wrong because assigning a public Elastic IP address to each instance would expose them to the internet, requiring SSH or RDP ports to be open, which violates the requirement to not open those ports. Option D is wrong because a bastion host with SSH open to 0.0.0.0/0 exposes the bastion to the entire internet, creating a security risk and still requires opening SSH (port 22) to the internet, which directly contradicts the requirement.

497

MCQmedium

A stateless web API runs on EC2 instances behind an Application Load Balancer (ALB). The Auto Scaling group (ASG) currently uses subnets from only one Availability Zone, even though the ALB spans two Availability Zones. During maintenance of that single AZ, the ALB remains up but clients see timeouts because there are no healthy targets. Which change most directly improves resilience against an AZ failure?

A.Keep the ASG in one subnet/AZ, but enable ALB stickiness to reduce session interruption.

B.Update the ASG to launch instances across subnets in at least two Availability Zones and ensure ALB health checks target an application-ready path.

C.Add a NAT gateway in the public subnets so instances can reach the internet during maintenance events.

D.Create a second ALB in the same Availability Zone and route traffic using DNS failover.

AnswerB

Spreading instances across multiple AZs ensures the ALB can route to healthy targets even when one AZ fails.

Why this answer

The most direct fix for AZ failure resilience is to distribute the ASG across multiple Availability Zones. With the ALB already spanning two AZs, if the ASG only launches instances in one AZ, a failure of that AZ leaves the ALB with zero healthy targets, causing timeouts. By configuring the ASG to launch instances in at least two AZs and setting ALB health checks to an application-ready path, the ALB can route traffic to healthy instances in the surviving AZ, maintaining availability.
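
The sketch below builds keyword arguments in the shape of boto3's `autoscaling.create_auto_scaling_group` call and refuses to build a single-AZ group. All subnet IDs, AZ names, the target group ARN, and the launch template name are hypothetical. The application-ready health check path itself would be set on the ALB target group (for example through `HealthCheckPath` when creating the target group), which is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subnet:
    subnet_id: str
    availability_zone: str


def build_asg_params(name, subnets, target_group_arn, launch_template_name,
                     min_size=2, max_size=6):
    """Return kwargs shaped like boto3's autoscaling.create_auto_scaling_group call.

    Raises ValueError for a single-AZ group, the failure mode described above.
    """
    zones = {s.availability_zone for s in subnets}
    if len(zones) < 2:
        raise ValueError(f"subnets span {len(zones)} AZ(s); at least 2 are required")
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {"LaunchTemplateName": launch_template_name, "Version": "$Latest"},
        "MinSize": min_size,
        "MaxSize": max_size,
        # Comma-separated subnet IDs; the group balances instances across their AZs.
        "VPCZoneIdentifier": ",".join(s.subnet_id for s in subnets),
        "TargetGroupARNs": [target_group_arn],
        # Replace instances that fail the ALB target health check, not only EC2 status checks.
        "HealthCheckType": "ELB",
        "HealthCheckGracePeriod": 120,
    }


if __name__ == "__main__":
    subnets = [Subnet("subnet-aaa", "us-east-1a"), Subnet("subnet-bbb", "us-east-1b")]
    print(build_asg_params("api-asg", subnets, "arn:aws:example:targetgroup/api", "api-template"))
```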

Exam trap

The trap here is that candidates may think adding a second ALB or enabling stickiness solves the problem, when the real issue is that the ASG is not distributing instances across multiple Availability Zones, leaving the ALB with no healthy targets during an AZ outage.

How to eliminate wrong answers

Option A is wrong because enabling ALB stickiness (session affinity) does not solve the underlying problem of zero healthy targets; it only binds a client session to a specific target, which still fails if that target is in the failed AZ. Option C is wrong because a NAT gateway provides outbound internet access for instances in private subnets, but it does not affect the availability of targets for inbound traffic through the ALB during an AZ failure. Option D is wrong because creating a second ALB in the same AZ and using DNS failover adds complexity and cost without addressing the root cause—the ASG's single-AZ deployment—and DNS failover introduces propagation delays, not the immediate resilience needed.

498

MCQmedium

An orders service publishes payment instructions to an Amazon SQS Standard queue. A downstream consumer sometimes times out or crashes after it has partially completed processing, causing the same instruction to be processed more than once. You must keep the design resilient without attempting to guarantee exactly-once processing. Which approach best handles duplicates safely?

A.Set the SQS visibility timeout extremely long so the message cannot be retried even after processing failures.

B.Make the consumer idempotent by deriving a deterministic idempotency key from the payment instruction (for example, the instruction ID), persisting the result of successful processing, and skipping re-processing when that key is already marked successful.

C.Switch to an SQS FIFO queue but remove error handling in the consumer so duplicates never occur.

D.Send all failed messages to a DLQ and rely on it to deduplicate messages that were already successfully processed.

AnswerB

SQS Standard provides at-least-once delivery, so duplicates are expected. Idempotency ensures that re-processing the same instruction does not create incorrect side effects. Persisting a deterministic key/result allows the consumer to safely short-circuit duplicates after retries/timeouts.

Why this answer

Option B is correct because making the consumer idempotent ensures that even if the same payment instruction is processed multiple times due to timeouts or crashes, the system remains consistent. By deriving a deterministic idempotency key (e.g., the instruction ID) and persisting the result of successful processing, the consumer can skip re-processing when the key is already marked as successful. This approach aligns with the requirement to keep the design resilient without guaranteeing exactly-once processing, as it safely handles duplicates at the application level.
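
A minimal sketch of such a consumer follows. It assumes each message body is JSON with a hypothetical `instruction_id` field. An in-memory dict stands in for the durable store, which in practice would be something like a database table written with a conditional put. The sketch only records success after the side effect completes. A crash between the payment and the record would therefore still repeat the payment, unless the downstream payment API also accepts the idempotency key (if it supports one).

```python
import json
from typing import Callable, Dict


class IdempotentConsumer:
    """Applies each payment instruction's side effect at most once per idempotency key.

    `_completed` stands in for a durable store; an in-memory dict is used only so
    the sketch runs on its own.
    """

    def __init__(self, apply_payment: Callable[[dict], str]):
        self._apply_payment = apply_payment
        self._completed: Dict[str, str] = {}  # idempotency key -> recorded result

    def handle(self, message_body: str) -> str:
        instruction = json.loads(message_body)
        key = instruction["instruction_id"]  # deterministic key taken from the payload
        if key in self._completed:
            # Duplicate delivery: return the recorded result without a second side effect.
            return self._completed[key]
        result = self._apply_payment(instruction)
        # Record success only after the side effect completes; the caller then deletes the message.
        self._completed[key] = result
        return result
```

Delivering the same body twice calls `apply_payment` once and returns the stored result the second time.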

Exam trap

The trap here is that candidates often assume SQS FIFO queues or DLQs inherently solve duplicate processing, but the exam tests understanding that Standard queues require application-level idempotency for safe duplicate handling, and FIFO queues do not eliminate the need for idempotent consumers in crash scenarios.

How to eliminate wrong answers

Option A is wrong because an extremely long visibility timeout does not prevent duplicates; it only delays retries. If the consumer crashes after partially processing a message, the message eventually becomes visible again and is reprocessed. Option C is wrong because an SQS FIFO queue deduplicates messages sent within a five-minute deduplication interval, but a consumer that crashes or times out before deleting a message will still receive it again after the visibility timeout. Removing error handling makes the consumer more fragile, not duplicate-free. Option D is wrong because a dead-letter queue (DLQ) captures messages that fail after the configured number of receives. It does not track successful processing and cannot deduplicate messages.

499

MCQeasy

Your company uses an OIDC identity provider to let users assume an IAM role without long-term AWS credentials. In the IAM role trust policy, which STS action must be allowed to support this type of federation?

A.sts:AssumeRoleWithWebIdentity

B.sts:AssumeRole

C.sts:GetCallerIdentity

D.sts:TagSession

AnswerA

OIDC/web identity federation uses sts:AssumeRoleWithWebIdentity in the trust policy.

Why this answer

Option A is correct because sts:AssumeRoleWithWebIdentity is the STS API action for users authenticated by an OIDC (web identity) provider. It returns temporary security credentials without requiring long-term AWS access keys, enabling the OIDC-based federation described in the question. SAML federation uses a different action, sts:AssumeRoleWithSAML.
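
A trust policy for this kind of federation can be sketched as a Python dict. The account ID, provider host, audience, and subject values are hypothetical placeholders. The `<provider>:aud` and `<provider>:sub` condition keys pin which tokens can assume the role.

```python
import json


def oidc_trust_policy(account_id: str, provider_host: str, audience: str, subject: str) -> dict:
    """Role trust policy that allows web identity (OIDC) federation from one provider."""
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{provider_host}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                # Pin the token audience (client ID) and subject so only intended callers qualify.
                "Condition": {
                    "StringEquals": {
                        f"{provider_host}:aud": audience,
                        f"{provider_host}:sub": subject,
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # All values are hypothetical placeholders.
    policy = oidc_trust_policy("111122223333", "idp.example.com", "example-client-id", "user-123")
    print(json.dumps(policy, indent=2))
```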

Exam trap

The trap here is that candidates confuse sts:AssumeRoleWithWebIdentity with sts:AssumeRole. The former is required for OIDC federation. The latter is used when the caller is already an IAM principal, such as an IAM user, another role, or an AWS service.

How to eliminate wrong answers

Option B is wrong because sts:AssumeRole is used for cross-account access or for roles assumed by IAM users, other roles, or AWS services, not for web identity federation with an OIDC provider. Option C is wrong because sts:GetCallerIdentity simply returns details about the IAM identity whose credentials are used to make the call; it does not perform federation or issue credentials. Option D is wrong because sts:TagSession only permits session tags to be passed when a role is assumed. It can accompany sts:AssumeRoleWithWebIdentity in a trust policy, but it does not perform the federation itself.

500

Multi-Selecthard

A reporting application in Account B must read files from an S3 bucket in Account A. The bucket contains objects encrypted with a customer managed KMS key in Account A. The application role in Account B already has an identity policy allowing s3:GetObject on the bucket prefix, but requests still fail with AccessDenied. Which two changes are required for the application to read the objects? Select two.

Select 2 answers

A.Add a bucket policy in Account A that allows the Account B role to perform s3:GetObject on the required prefix.

B.Add the Account B role to the KMS key policy in Account A with permission to use kms:Decrypt.

C.Attach an IAM policy in Account B that grants s3:* on the bucket and its objects.

D.Create an S3 gateway endpoint in Account B so the application can reach the bucket privately.

E.Add an SCP in Account A that allows the Account B role to bypass KMS encryption checks.

AnswersA, B

Cross-account S3 access requires a resource-based permission on the bucket. The bucket policy must explicitly allow the external role to read the needed prefix, otherwise the bucket owner blocks the request even if the role's identity policy allows it.

Why this answer

Option A is correct because cross-account S3 access requires the bucket-owning account (Account A) to grant access through a bucket policy that allows the Account B role to perform s3:GetObject on the required prefix. Without that bucket policy, S3 denies the request even though the role's identity policy in Account B permits the action. Option B is correct because the objects are encrypted with a customer managed KMS key in Account A, so the key policy must allow the Account B role to call kms:Decrypt; otherwise the objects cannot be decrypted for that caller. In practice, the role's identity policy in Account B must also allow kms:Decrypt on that key's ARN, because cross-account KMS access needs permission on both sides.

Both the S3 bucket policy and the KMS key policy are required for cross-account encrypted access.
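
The two resource-side changes can be sketched as policy documents. The role ARN, bucket name, and prefix are hypothetical. The key policy fragment would be added to the existing key policy in Account A; inside a key policy, `"Resource": "*"` refers to the key the policy is attached to. The matching kms:Decrypt statement in the Account B identity policy is not shown.

```python
import json

# Hypothetical identifiers used only for illustration.
READER_ROLE_ARN = "arn:aws:iam::222222222222:role/example-reporting-role"
BUCKET = "example-reports-bucket"
PREFIX = "reports/"

# Bucket policy in Account A: let the Account B role read objects under the prefix.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowAccountBReaderGetObject",
            "Effect": "Allow",
            "Principal": {"AWS": READER_ROLE_ARN},
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/{PREFIX}*",
        }
    ],
}

# Statement to append to the KMS key policy in Account A.
key_policy_statement = {
    "Sid": "AllowAccountBReaderDecrypt",
    "Effect": "Allow",
    "Principal": {"AWS": READER_ROLE_ARN},
    "Action": "kms:Decrypt",
    "Resource": "*",
}

if __name__ == "__main__":
    print(json.dumps(bucket_policy, indent=2))
    print(json.dumps(key_policy_statement, indent=2))
```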

Exam trap

The trap here is that candidates often assume a cross-account IAM role with s3:GetObject permission is sufficient, overlooking that S3 bucket policies and KMS key policies are separate authorization layers that must explicitly allow the external principal, especially when objects are encrypted with a customer managed KMS key.

501

MCQmedium

A marketing site runs on x86 EC2 instances and uses open-source software with no architecture-specific licensing restriction. What should be evaluated to reduce compute cost? The design must avoid adding custom operational scripts.

A.Cross-Region data replication for all data

B.io2 Block Express volumes for all instances

C.AWS Graviton-based instances after performance testing

D.Dedicated Hosts by default

AnswerC

Graviton instances often provide better price performance for compatible workloads.

Why this answer

Option C is correct because AWS Graviton-based instances (Arm architecture) offer, according to AWS, up to 40% better price-performance than comparable x86 instances for many workloads. The marketing site uses open-source software with no architecture-specific licensing restrictions, and most open-source stacks and Linux distributions publish arm64 builds. Moving to Graviton after performance testing can therefore reduce compute costs without custom operational scripts, since the change is largely a matter of selecting arm64 AMIs and packages.
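
"After performance testing" comes down to simple arithmetic: compare cost per unit of work, not hourly price alone. The sketch below uses hypothetical prices and load-test throughput, not real AWS pricing.

```python
def cost_per_million_requests(hourly_price_usd: float, requests_per_second: float) -> float:
    """Cost of serving one million requests on one instance at a sustained request rate."""
    requests_per_hour = requests_per_second * 3600
    return hourly_price_usd / requests_per_hour * 1_000_000


if __name__ == "__main__":
    # Hypothetical prices and load-test results, not real AWS pricing.
    x86 = cost_per_million_requests(hourly_price_usd=0.10, requests_per_second=500)
    arm = cost_per_million_requests(hourly_price_usd=0.08, requests_per_second=500)
    print(f"x86: ${x86:.4f}  graviton: ${arm:.4f}  ratio: {arm / x86:.2f}")
```

If testing showed the Arm build serving fewer requests per second, the same formula would show how much of the hourly saving that erodes.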

Exam trap

The trap here is that candidates may assume Dedicated Hosts (Option D) save money. In fact, they typically increase cost unless specific licensing requirements justify them, and they add host capacity management that the design is trying to avoid.

How to eliminate wrong answers

Option A is wrong because Cross-Region data replication increases data transfer and storage costs, and it does not reduce compute costs; it is a disaster recovery or latency optimization strategy, not a cost-saving measure for compute. Option B is wrong because io2 Block Express volumes are high-performance, high-cost SSD volumes designed for latency-sensitive workloads like databases, not for reducing compute costs; they would increase storage costs without affecting compute efficiency. Option D is wrong because Dedicated Hosts are a licensing option that incurs additional per-host charges and are only cost-effective for specific scenarios like bring-your-own-license (BYOL) software with socket/core restrictions; they do not reduce compute costs for open-source software and would increase operational overhead.

502

MCQmedium

Your team runs a tightly coupled distributed workload (for example, synchronous training nodes) across many EC2 instances placed within a single cluster environment. The instances need low-latency networking to reduce delays at synchronization barriers. Which EC2 placement strategy should you use to improve inter-node latency?

A.Create a placement group with the 'spread' strategy to separate instances across underlying hardware for fault tolerance.

B.Create a placement group with the 'cluster' strategy to place instances close together and reduce network latency.

C.Use the default placement strategy and rely on Auto Scaling to keep instances from drifting to different locations.

D.Avoid placement groups and instead use Amazon S3 for inter-node messaging to minimize direct network traffic between instances.

AnswerB

Cluster placement groups are intended to place instances in close proximity to provide high-bandwidth, low-latency networking. For tightly coupled workloads, this improves the likelihood of reduced latency and faster completion of synchronization barriers.

Why this answer

A cluster placement group is the correct choice because it packs instances close together inside a single Availability Zone, on a low-latency, high-bandwidth segment of the network. That suits tightly coupled workloads such as synchronous training nodes, which need minimal delay at synchronization barriers. The result is shorter network round-trip times and higher throughput for inter-node communication.
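
The sketch below shows the request parameters in the shape of boto3's `ec2.create_placement_group` and `ec2.run_instances` calls. The group name, AMI ID, instance type, and count are hypothetical. It launches every node in one request with one instance type, which AWS recommends for cluster placement groups to improve the chance that capacity is available together.

```python
def cluster_placement_params(group_name, ami_id, instance_type, count):
    """Return (create_placement_group kwargs, run_instances kwargs) shaped like boto3's EC2 calls."""
    create_group = {"GroupName": group_name, "Strategy": "cluster"}
    run_instances = {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        # Request all nodes at once (MinCount == MaxCount) so they land in the group together.
        "MinCount": count,
        "MaxCount": count,
        "Placement": {"GroupName": group_name},
    }
    return create_group, run_instances


if __name__ == "__main__":
    group, launch = cluster_placement_params("train-pg", "ami-0123456789abcdef0", "c5n.18xlarge", 8)
    print(group)
    print(launch)
```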

Exam trap

The trap here is that candidates may confuse 'spread' with 'cluster' placement groups, assuming fault tolerance is always the priority, but for tightly coupled workloads requiring low latency, the cluster strategy is the correct choice despite its reduced fault tolerance.

How to eliminate wrong answers

Option A is wrong because the 'spread' strategy places instances on distinct hardware to maximize fault tolerance, which increases network latency due to physical separation, making it unsuitable for low-latency inter-node communication. Option C is wrong because the default placement strategy does not guarantee proximity; instances can be placed on different racks or hosts, leading to higher and unpredictable latency, and Auto Scaling does not control placement to reduce latency. Option D is wrong because using Amazon S3 for inter-node messaging introduces significant latency and throughput bottlenecks compared to direct network communication, and it is not designed for real-time, low-latency synchronization in tightly coupled workloads.

503

MCQmedium

A company runs an application in private subnets (no inbound internet). The application must access Amazon S3 and AWS Secrets Manager endpoints without routing through the public internet and without exposing the instances to NAT gateways due to cost. Security requirements also state that only the required VPC traffic should be allowed to reach AWS services. Which architecture best satisfies these requirements?

A.Place instances in private subnets but use NAT gateways so traffic to S3 and Secrets Manager goes through the internet; restrict security groups to instance-to-instance only.

B.Add a VPC gateway endpoint for S3 and an interface VPC endpoint for Secrets Manager; keep instances in private subnets and configure security group rules attached to the endpoints to allow inbound traffic only from the application subnets.

C.Use public subnets with instances that have no security group rules; rely on AWS services to reject unauthorized traffic.

D.Create an S3 bucket policy that allows requests from the application instances’ private IP addresses and enable public access to Secrets Manager via the default service endpoint.

AnswerB

Gateway endpoints provide private routing to S3, and interface endpoints provide private access to Secrets Manager without internet traversal. Security group controls on interface endpoints restrict traffic to only the application subnets, meeting segmentation and cost constraints.

Why this answer

Option B is correct because it uses a gateway VPC endpoint for Amazon S3 and an interface VPC endpoint for AWS Secrets Manager. Both let private subnet instances reach these services without traversing the public internet or needing a NAT gateway. The security group attached to the interface endpoint allows inbound HTTPS only from the application subnets, satisfying the requirement that only required VPC traffic reach AWS services. The S3 gateway endpoint has no security group; it is scoped through route table associations and, optionally, an endpoint policy. This architecture meets all constraints: no public internet path, no NAT gateway cost, and least-privilege access.
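
The interface endpoint and its security group rule can be sketched in the shapes of boto3's `ec2.create_vpc_endpoint` and `ec2.authorize_security_group_ingress` arguments. The Region, VPC, subnet, security group IDs, and CIDRs are hypothetical. Private DNS also assumes the VPC has DNS support and DNS hostnames enabled.

```python
def secrets_manager_endpoint_params(region, vpc_id, subnet_ids, endpoint_sg_id):
    """Kwargs shaped like boto3's ec2.create_vpc_endpoint for a Secrets Manager interface endpoint."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.secretsmanager",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": [endpoint_sg_id],
        # With private DNS, the default Secrets Manager hostname resolves to the endpoint's private IPs.
        "PrivateDnsEnabled": True,
    }


def endpoint_sg_ingress(app_subnet_cidrs):
    """IpPermissions for the endpoint's security group: HTTPS only from the application subnets."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": 443,
            "ToPort": 443,
            "IpRanges": [{"CidrIp": cidr, "Description": "application subnet"} for cidr in app_subnet_cidrs],
        }
    ]


if __name__ == "__main__":
    print(secrets_manager_endpoint_params("us-east-1", "vpc-0abc", ["subnet-aaa", "subnet-bbb"], "sg-0endpoint"))
    print(endpoint_sg_ingress(["10.0.1.0/24", "10.0.2.0/24"]))
```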

Exam trap

The trap here is that candidates often assume all AWS services require NAT gateways or internet gateways for private subnet access, overlooking the distinction between gateway endpoints (for S3 and DynamoDB) and interface endpoints (for most other services like Secrets Manager) that provide private connectivity without internet exposure.

How to eliminate wrong answers

Option A is wrong because NAT gateways incur the cost the company wants to avoid and send traffic to the services' public endpoints over an internet path. In addition, security groups on the instances alone do not limit which AWS service endpoints are reached. Option C is wrong because public subnets expose instances to inbound internet traffic, contradicting the 'no inbound internet' requirement. Relying on the AWS services to reject unauthorized traffic also provides no network-level segmentation. Option D is wrong because reaching Secrets Manager through its default public endpoint requires an internet path, which the requirements rule out. A bucket policy keyed on private IP addresses also does not work as intended: aws:SourceIp evaluates public source addresses, and private traffic arriving through a VPC endpoint should instead be scoped with conditions such as aws:SourceVpce or aws:SourceVpc.

504

Multi-Selectmedium

A containerized service on Amazon ECS connects to a database with a password that must never be stored in plaintext or hardcoded in the image. The application reads the password at startup and occasionally reconnects later, so it needs to retrieve the current secret when needed. Which three actions should the architect take? Select three.

Select 3 answers

A.Store the database password in AWS Secrets Manager.

B.Have the application retrieve the secret from Secrets Manager at runtime when it needs the password.

C.Grant the ECS task role least-privilege permission to read only that secret.

D.Store the password in a plain environment variable and update it manually during maintenance windows.

E.Use an IAM user access key inside the container so the database password can be embedded in code.

AnswersA, B, C

Secrets Manager is designed for sensitive credentials and integrates with IAM and rotation features. It is a better fit than putting passwords in code, images, or plain variables.

Why this answer

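AWS Secrets Manager is the right service for securely storing, and optionally rotating, database credentials, so the password never needs to be hardcoded in the image or kept in plaintext (A). Because the application reconnects later, it retrieves the secret at runtime through the AWS SDK rather than only at startup. That way it picks up the current value after a rotation (B). The ECS task role, which the application code runs as, should be allowed secretsmanager:GetSecretValue on that one secret's ARN and nothing more (C).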
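
The sketch below combines a least-privilege task role policy with a cached password that is re-read on demand, for example after a failed reconnect. The secret ARN is hypothetical. The secret is assumed to be JSON with a `password` field. The fetch function is injected so the sketch runs offline; in a task it could wrap boto3's `secretsmanager` `get_secret_value` call. If the secret uses a customer managed KMS key, the role would also need kms:Decrypt on that key.

```python
import json
from typing import Callable, Optional

# Hypothetical secret ARN; the task role policy grants read access to this secret only.
SECRET_ARN = "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-db-password-AbCdEf"

TASK_ROLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "secretsmanager:GetSecretValue",
            "Resource": SECRET_ARN,
        }
    ],
}


class DbPasswordProvider:
    """Caches the database password and re-reads it from Secrets Manager when asked.

    `fetch_secret_string` is injected; in a task it could be
    lambda: boto3.client("secretsmanager").get_secret_value(SecretId=SECRET_ARN)["SecretString"].
    """

    def __init__(self, fetch_secret_string: Callable[[], str]):
        self._fetch = fetch_secret_string
        self._password: Optional[str] = None

    def password(self, force_refresh: bool = False) -> str:
        # Fetch on first use, or when a reconnect failed and the value may have rotated.
        if self._password is None or force_refresh:
            secret = json.loads(self._fetch())  # assumes a JSON secret with a "password" field
            self._password = secret["password"]
        return self._password
```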

Exam trap

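The trap here is treating plain environment variables or embedded IAM access keys as acceptable secret storage. AWS best practice is to keep secrets in a dedicated secrets service and grant least-privilege IAM access to them. ECS can also inject a Secrets Manager value as an environment variable, but that value is resolved only when the task starts. It would not reflect a rotated password on a later reconnect.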

505

MCQmedium

A claims workflow uses an RDS MySQL database and must remain available during an Availability Zone failure with minimal application changes. What should the architect enable? The design must avoid adding custom operational scripts.

A.S3 Cross-Region Replication

B.Multi-AZ deployment for the RDS DB instance

C.EBS snapshots every hour

D.Read replicas only

AnswerB

Multi-AZ provides synchronous standby replication and automatic failover within a Region.

Why this answer

Multi-AZ deployment for RDS MySQL provides automatic failover to a standby replica in a different Availability Zone. This ensures high availability during an AZ failure with minimal application changes, as the DNS endpoint remains the same and failover is handled by AWS without custom scripts.
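
Enabling Multi-AZ on an existing instance is a single modification. The sketch shows it in the shape of boto3's `rds.modify_db_instance` arguments, with a hypothetical instance identifier. The application keeps connecting to the same endpoint hostname, so no code change is needed.

```python
def enable_multi_az_params(db_instance_id: str, apply_immediately: bool = False) -> dict:
    """Kwargs shaped like boto3's rds.modify_db_instance call to convert an instance to Multi-AZ."""
    return {
        "DBInstanceIdentifier": db_instance_id,
        "MultiAZ": True,
        # False defers the change to the next maintenance window; True applies it now.
        "ApplyImmediately": apply_immediately,
    }


if __name__ == "__main__":
    print(enable_multi_az_params("claims-db"))
```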

Exam trap

The trap here is that candidates often confuse read replicas with Multi-AZ deployments, assuming read replicas provide automatic failover, but they require manual promotion and do not maintain the same endpoint.

How to eliminate wrong answers

Option A is wrong because S3 Cross-Region Replication is for object storage replication across Regions, not for database availability within a Region, and it does not address RDS MySQL failover. Option C is wrong because hourly snapshots provide point-in-time backups but no automatic failover. Recovering from them requires a manual restore and can lose up to an hour of writes. Option D is wrong because read replicas are for read scaling and do not provide automatic failover for the primary instance. Promoting a read replica requires manual steps or custom scripts, violating the 'no custom operational scripts' constraint.

506

MCQeasy

A CI/CD pipeline needs to deploy to your production environment. Security requires that the pipeline uses temporary credentials (not long-lived access keys) and only has permissions to read a specific set of parameters from AWS Systems Manager Parameter Store and write application logs to CloudWatch Logs. What is the best AWS approach?

A.Create an IAM user for the pipeline and store access keys in the CI system.

B.Create an IAM role in the production account, grant least-privilege policies, and let the CI assume it using STS AssumeRole.

C.Attach the required permissions to an IAM group and add the pipeline’s principal to that group directly.

D.Use AWS KMS to encrypt the pipeline’s access keys and store the ciphertext in the CI system.

AnswerB

IAM roles with STS provide temporary credentials and allow least-privilege permissions via attached policies.

Why this answer

Option B is correct because it uses an IAM role with least-privilege policies that the CI/CD pipeline can assume via AWS STS AssumeRole, providing temporary credentials that automatically expire. This avoids long-lived access keys and meets the security requirement of using temporary credentials. The role can be scoped to allow only reading specific parameters from Systems Manager Parameter Store and writing logs to CloudWatch Logs, adhering to the principle of least privilege.
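
The role's least-privilege permissions policy can be sketched as a dict. The Region, account ID, parameter path, and log group name are all hypothetical. The trust policy that lets the CI system assume the role depends on how that system authenticates and is not shown.

```python
import json

# Hypothetical values used only for illustration.
REGION = "us-east-1"
ACCOUNT_ID = "111122223333"
PARAM_PATH = "/example-app/prod/"
LOG_GROUP = "/example-app/deploy"

ci_role_permissions = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Read only the parameters under one path.
            "Sid": "ReadDeployParameters",
            "Effect": "Allow",
            "Action": ["ssm:GetParameter", "ssm:GetParameters"],
            "Resource": f"arn:aws:ssm:{REGION}:{ACCOUNT_ID}:parameter{PARAM_PATH}*",
        },
        {
            # Write log events into streams of one log group.
            "Sid": "WriteDeployLogs",
            "Effect": "Allow",
            "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": f"arn:aws:logs:{REGION}:{ACCOUNT_ID}:log-group:{LOG_GROUP}:*",
        },
    ],
}

if __name__ == "__main__":
    print(json.dumps(ci_role_permissions, indent=2))
```

Parameters encrypted with a customer managed KMS key would additionally need kms:Decrypt on that key.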

Exam trap

The trap here is that candidates may think IAM users with access keys are acceptable for automation, but the question explicitly requires temporary credentials, making the IAM role with STS AssumeRole the only correct approach.

How to eliminate wrong answers

Option A is wrong because it creates an IAM user with long-lived access keys, which violates the requirement for temporary credentials and introduces a risk of key exposure. Option C is wrong because only IAM users can be members of IAM groups. An external CI/CD system could therefore only be added through an IAM user, which brings back long-lived credentials. Option D is wrong because encrypting access keys with KMS does not eliminate long-lived access keys; the pipeline would still need to decrypt and use them, which does not meet the requirement for temporary credentials.

507

MCQmedium

A solutions architect is designing an S3 bucket for a healthcare document service. The objects must never be publicly accessible, even if a developer later adds an overly broad bucket policy. What should the architect configure? The design must avoid adding custom operational scripts.

A.Enable server access logging on the bucket

B.Enable S3 Transfer Acceleration

C.Create an IAM policy that denies s3:GetObject to anonymous users

D.Enable S3 Block Public Access at the account or bucket level

AnswerD

S3 Block Public Access prevents public ACLs and public bucket policies from exposing the bucket.

Why this answer

S3 Block Public Access provides a definitive override that prevents any public access to S3 objects, regardless of bucket policies or ACLs. By enabling this setting at the account or bucket level, the architect ensures that even if a developer later attaches an overly permissive bucket policy, the objects remain inaccessible to anonymous users. This meets the requirement without requiring custom operational scripts.
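
The four Block Public Access settings can be sketched as the `PublicAccessBlockConfiguration` argument used by boto3's `s3.put_public_access_block` (bucket level) and `s3control.put_public_access_block` (account level, which also takes an `AccountId`). The sketch only builds the configuration.

```python
def block_public_access_config() -> dict:
    """All four S3 Block Public Access settings turned on."""
    return {
        "BlockPublicAcls": True,       # reject requests that add public ACLs
        "IgnorePublicAcls": True,      # ignore any public ACLs that already exist
        "BlockPublicPolicy": True,     # reject bucket policies that grant public access
        "RestrictPublicBuckets": True, # limit public-policy buckets to AWS services and owner-account users
    }


if __name__ == "__main__":
    print(block_public_access_config())
```

BlockPublicPolicy is the setting that stops the "overly broad bucket policy" in the question from being saved.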

Exam trap

The trap here is that candidates often think an IAM policy can block anonymous access, but IAM policies do not apply to unauthenticated principals; only bucket policies or S3 Block Public Access can effectively deny anonymous access.

How to eliminate wrong answers

Option A is wrong because server access logging records requests to the bucket but does not enforce any access restrictions; it only provides audit logs. Option B is wrong because S3 Transfer Acceleration speeds up uploads over long distances using edge locations but has no effect on access control or public accessibility. Option C is wrong because an IAM policy that denies s3:GetObject to anonymous users only applies to IAM principals; anonymous users are not IAM principals, so such a policy would not block access granted by a bucket policy that explicitly allows anonymous access.

508

Multi-Selecteasy

A web application runs on an Auto Scaling group behind an Application Load Balancer. The business wants the service to keep running if one Availability Zone goes down. Which two changes should you make? Select two.

Select 2 answers

A.Place the Auto Scaling group in subnets across at least two Availability Zones.

B.Attach the Application Load Balancer to subnets in at least two Availability Zones.

C.Increase the instance size so each server can handle more traffic alone.

D.Disable ALB health checks so instances stay registered longer.

E.Run the whole stack in one Availability Zone for simpler networking.

AnswersA, B

Spreading the Auto Scaling group across multiple Availability Zones lets EC2 capacity remain available if one Zone fails. The group can continue launching and serving instances in the remaining healthy Zone, which improves availability without changing the application itself.

Why this answer

Option A is correct because placing the Auto Scaling group in subnets across at least two Availability Zones lets it keep instances running, and launch replacements, in the remaining healthy AZ(s) if one AZ fails. Option B is correct because an ALB routes traffic only through the Availability Zones it is enabled in. An ALB requires subnets in at least two AZs, and that keeps the load balancer reachable and able to reach targets in the surviving AZ. Together, the two changes keep both the entry point and the compute capacity available during a single-AZ outage.

Exam trap

The trap here is that candidates often think increasing instance size or disabling health checks provides resilience, but AWS's high availability model relies on distributing resources across multiple Availability Zones, not on making individual instances more powerful or ignoring failures.

509

MCQmedium

An application runs on EC2 instances in private subnets in a VPC. There is no NAT gateway. The instances need to download objects from S3 over HTTPS and also call DynamoDB. The security group outbound rules allow TCP 443 to the VPC endpoint addresses. After deployment, the app times out when connecting to S3, but it can reach DynamoDB. Which single change is most likely to restore S3 connectivity?

A.Create a Gateway VPC endpoint for S3 and associate it with the private subnet route tables that contain the instances.

B.Replace the security group egress rule to allow all outbound traffic to 0.0.0.0/0 on TCP 443.

C.Add an Internet Gateway to the VPC and route the private subnet’s 0.0.0.0/0 to the IGW.

D.Switch from network ACLs to security groups by removing the existing NACL allow rules for ephemeral ports.

AnswerA

S3 connectivity without NAT typically requires a Gateway VPC endpoint. For a gateway endpoint, you must update the route tables to direct S3 traffic to the endpoint. If DynamoDB works but S3 times out, it often means DynamoDB has the required endpoint while S3 is missing or not routed via the correct route tables.

Why this answer

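The application is in private subnets without a NAT gateway, so it cannot reach public S3 endpoints. A gateway VPC endpoint for S3 lets these instances reach S3 over the AWS network. It only works for subnets whose route tables are associated with the endpoint, because the endpoint adds a route for the S3 prefix list to those tables. DynamoDB works, which suggests its endpoint and routes are already in place, so the missing piece is the S3 endpoint and its route table association. A gateway endpoint has no IP addresses of its own. If security group egress is restricted, it must allow TCP 443 to the S3 managed prefix list.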
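
The fix and the matching egress rule can be sketched in the shapes of boto3's `ec2.create_vpc_endpoint` and security group `IpPermissions` arguments. The Region, VPC ID, route table IDs, and prefix list ID are hypothetical. The real S3 prefix list ID for a Region can be looked up, for example with `describe_prefix_lists`.

```python
def s3_gateway_endpoint_params(region, vpc_id, private_route_table_ids):
    """Kwargs shaped like boto3's ec2.create_vpc_endpoint for an S3 gateway endpoint."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        # A route to the S3 prefix list is added to each listed route table.
        "RouteTableIds": list(private_route_table_ids),
    }


def s3_https_egress(s3_prefix_list_id):
    """Security group egress permission allowing HTTPS to the S3 prefix list."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": 443,
            "ToPort": 443,
            "PrefixListIds": [{"PrefixListId": s3_prefix_list_id}],
        }
    ]


if __name__ == "__main__":
    print(s3_gateway_endpoint_params("us-east-1", "vpc-0abc", ["rtb-private-a", "rtb-private-b"]))
    print(s3_https_egress("pl-0123456789abcdef0"))
```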

Exam trap

The trap here is that candidates often confuse Gateway VPC Endpoints with Interface Endpoints, or assume that allowing outbound HTTPS to 0.0.0.0/0 in a security group is sufficient, forgetting that private subnets have no internet route without a NAT Gateway or Internet Gateway.

How to eliminate wrong answers

Option B is wrong because allowing all outbound traffic to 0.0.0.0/0 on TCP 443 would not help, as the private subnets have no route to the internet (no NAT Gateway or Internet Gateway), so traffic would still be dropped. Option C is wrong because adding an Internet Gateway and routing 0.0.0.0/0 to it would require the private subnet instances to have public IPs or a NAT device, and it would break the security posture by exposing them to the internet. Option D is wrong because network ACLs are stateless and must allow ephemeral ports for return traffic, but the issue is not about NACLs; it is about the lack of a route to S3, and switching to security groups does not solve the routing problem.

510

MCQhard

Based on the exhibit, the team must restore an Amazon RDS for PostgreSQL database to the exact state just before a bad delete happened. What is the best recovery approach?

A.Restore the latest automated snapshot and accept data loss from the last backup window.

B.Perform a point-in-time restore to 2026-04-27 15:10 UTC into a new DB instance, then cut over after validation.

C.Promote a read replica because it will contain the deleted rows and can replace the primary immediately.

D.Enable Multi-AZ on the current database and wait for automatic failover to reverse the delete.

AnswerB

Point-in-time restore uses the automated backups and transaction logs to rebuild the database to an exact time before the bad change. The exhibit confirms the requested restore time is within the restorable window, and the business wants to validate the restored copy before switching traffic. Restoring to a new instance first is the safest way to recover without risking the current production database.

Why this answer

Point-in-time recovery (PITR) restores an Amazon RDS for PostgreSQL database to any second between the earliest and the latest restorable time, using automated backups and transaction logs. The latest restorable time is typically within the last five minutes. Restoring to 2026-04-27 15:10 UTC, just before the bad delete, recovers the deleted rows in a new DB instance that can be validated before cutover. Any legitimate writes made to the original instance after that time must be reconciled separately.
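
The sketch below checks the requested time against the restorable window, then builds arguments in the shape of boto3's `rds.restore_db_instance_to_point_in_time` call. The instance identifiers are hypothetical. Because the exhibit is not reproduced here, the window bounds are hypothetical stand-ins as well.

```python
from datetime import datetime, timezone


def pitr_params(source_id, target_id, restore_time, earliest, latest):
    """Kwargs shaped like boto3's rds.restore_db_instance_to_point_in_time call."""
    if restore_time.tzinfo is None:
        raise ValueError("restore_time must be timezone-aware (UTC)")
    if not (earliest <= restore_time <= latest):
        raise ValueError("restore time is outside the restorable window")
    return {
        "SourceDBInstanceIdentifier": source_id,
        # Restore into a new instance so production stays untouched until cutover.
        "TargetDBInstanceIdentifier": target_id,
        "RestoreTime": restore_time,
    }


if __name__ == "__main__":
    # Hypothetical window bounds standing in for the exhibit's values.
    earliest = datetime(2026, 4, 20, 0, 0, tzinfo=timezone.utc)
    latest = datetime(2026, 4, 27, 16, 0, tzinfo=timezone.utc)
    target = datetime(2026, 4, 27, 15, 10, tzinfo=timezone.utc)
    print(pitr_params("example-pg", "example-pg-restore", target, earliest, latest))
```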

Exam trap

The trap here is that candidates often confuse read replicas or Multi-AZ as solutions for logical data corruption, when in fact they only protect against infrastructure failures, not user errors like a bad delete.

How to eliminate wrong answers

Option A is wrong because restoring the latest automated snapshot would only recover data up to the last snapshot time, which could be hours before the delete, resulting in data loss from the backup window. Option C is wrong because a read replica in RDS for PostgreSQL does not contain deleted rows from the primary; it applies the same changes asynchronously, so the delete would also be replicated, and promoting it would not recover the lost data. Option D is wrong because enabling Multi-AZ provides high availability through synchronous replication to a standby in another Availability Zone, but it does not protect against logical errors like a bad delete; the delete would be replicated to the standby, and failover would not reverse it.

511

Matchinghard

Match each workload to the most cost-effective compute model or service choice. Focus on how often the workload runs, whether it is interruption-tolerant, and how much administration the team wants to avoid.

Concepts

Matches

AWS Lambda

Amazon ECS on AWS Fargate

EC2 Spot Instances

EC2 On-Demand Instances

Why these pairings

Lambda is ideal for short, frequent tasks; Batch on Fargate for nightly jobs; Spot Instances for fault-tolerant workloads; On-Demand for stateful workloads that cannot tolerate interruption; ECS on Fargate reduces server management; Auto Scaling for variable traffic.


512

MCQ (medium)

A company hosts an e-learning platform on EC2. Administrators must connect without opening SSH or RDP ports to the internet. What should the architect use?

A. A public Elastic IP address on each instance

B. A bastion host with SSH open to 0.0.0.0/0

C. An internet gateway attached to the private subnet

D. AWS Systems Manager Session Manager with the required instance role

Answer: D

Session Manager provides audited shell access without inbound SSH/RDP exposure.

Why this answer

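AWS Systems Manager Session Manager provides shell access to EC2 instances without opening inbound ports (SSH/RDP) or running a bastion host. The SSM Agent on the instance, using permissions from an IAM instance role, opens an outbound HTTPS connection to the Systems Manager service, and sessions are brokered over that channel. Instances therefore need no public IP address or inbound security group rules; instances in private subnets only need an outbound path to Systems Manager, either through a NAT gateway or through interface VPC endpoints (such as ssm and ssmmessages).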
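A minimal sketch of the "required instance role": an EC2 trust policy plus the AWS managed policy `AmazonSSMManagedInstanceCore`, attached to the instance through an instance profile. The instance ID is hypothetical, and running `aws ssm start-session` locally also requires the Session Manager plugin for the AWS CLI.

```python
import json

# Trust policy that lets EC2 instances assume the role.
EC2_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# AWS managed policy granting the core permissions the SSM Agent needs.
SSM_CORE_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"


def start_session_args(instance_id):
    """CLI args an administrator runs; no inbound port is opened on the instance."""
    return ["aws", "ssm", "start-session", "--target", instance_id]


print(json.dumps(EC2_TRUST_POLICY, indent=2))
print(" ".join(start_session_args("i-0123456789abcdef0")))  # hypothetical instance ID
```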

Exam trap

The trap here is that candidates often assume a bastion host (Option B) is the only secure way to manage instances, but they overlook that Session Manager provides a more secure, agent-based solution that eliminates the need for any open inbound ports or public IP addresses.

How to eliminate wrong answers

Option A is wrong because a public Elastic IP address on each instance exposes the instances directly to the internet, and administrators would still need SSH or RDP ports open to reach them, which violates the requirement. Option B is wrong because a bastion host with SSH open to 0.0.0.0/0 exposes the bastion to the entire internet and still requires an open SSH port, which contradicts the requirement. Option C is wrong because internet gateways attach to a VPC, not to a subnet; a subnet becomes public only when its route table has a route to the internet gateway, and making the subnet public would still require open inbound ports for SSH or RDP access.


513

Drag & Drop (medium)

Arrange the steps for a cross-region Amazon S3 replication configuration.


Why this order

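Versioning must be enabled first, on both the source and destination buckets. Then create an IAM role that Amazon S3 can assume to replicate objects, define the replication rules, configure the destination bucket in the other Region, and verify that new objects replicate.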
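As a sketch, the replication rule can be built as a dictionary in the shape accepted by S3's PutBucketReplication API. The role ARN and bucket name are hypothetical, and the versioning flags stand in for checking each bucket's real versioning status.

```python
def replication_config(role_arn, dest_bucket, source_versioned, dest_versioned):
    """Build a PutBucketReplication-style configuration for one rule."""
    if not (source_versioned and dest_versioned):
        # Replication requires versioning on both source and destination buckets.
        raise ValueError("enable versioning on both buckets first")
    return {
        "Role": role_arn,  # IAM role that S3 assumes to replicate objects
        "Rules": [{
            "ID": "replicate-all",
            "Priority": 1,
            "Status": "Enabled",
            "Filter": {},  # empty filter: rule applies to all objects
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": f"arn:aws:s3:::{dest_bucket}"},
        }],
    }


cfg = replication_config(
    "arn:aws:iam::111122223333:role/s3-crr-role",  # hypothetical role
    "media-replica-eu",                            # hypothetical destination bucket
    source_versioned=True,
    dest_versioned=True,
)
print(cfg["Rules"][0]["Destination"])
```

Putting the versioning check first mirrors the ordering in the answer: without it, the replication configuration cannot be applied.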


514

MCQ (hard)

A patient portal must process every event at least once, but duplicate processing is acceptable if the consumer handles idempotency. Which eventing approach is most suitable?

A. Use an in-memory queue on one EC2 instance

B. Use UDP messages sent directly to workers

C. Use an Amazon SQS standard queue and design consumers to be idempotent

D. Use CloudFront signed URLs

Answer: C

SQS standard queues provide at-least-once delivery and high throughput; consumers must handle occasional duplicates.

Why this answer

Amazon SQS standard queues provide at-least-once delivery, meaning each message is delivered at least once but may occasionally be delivered more than once. This aligns with the requirement that every event must be processed at least once, and since duplicate processing is acceptable when consumers are idempotent, the standard queue is the most suitable choice. SQS handles the decoupling and durability of messages without requiring custom infrastructure.
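A minimal sketch of an idempotent consumer: side effects run once per stable event ID, so a redelivered message is skipped. The in-memory set is for illustration only (an assumption); a real consumer would record processed IDs in a durable store with an atomic conditional write so concurrent workers cannot both process the same event.

```python
def make_idempotent_handler(process):
    """Wrap `process` so each event ID is handled at most once."""
    seen = set()  # illustration only; use a durable store in production

    def handle(event_id, body):
        if event_id in seen:
            return False    # duplicate delivery: skip side effects
        process(body)
        seen.add(event_id)  # record only after processing succeeds
        return True

    return handle


results = []
handle = make_idempotent_handler(results.append)
# A standard queue may deliver the same event more than once.
deliveries = [("e1", "event-a"), ("e2", "event-b"), ("e1", "event-a")]
outcomes = [handle(eid, body) for eid, body in deliveries]
print(results)   # ['event-a', 'event-b']
print(outcomes)  # [True, True, False]
```

Keying on an event ID carried in the message body, rather than on the SQS message ID, also catches duplicates introduced when a producer retries a send.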

Exam trap

The trap here is that candidates may confuse 'at-least-once' with 'exactly-once' and incorrectly choose a FIFO queue or another option, but the question explicitly accepts duplicates if the consumer handles idempotency, making the standard queue the correct choice.

How to eliminate wrong answers

Option A is wrong because an in-memory queue on a single EC2 instance is not durable, cannot survive instance failures, and does not provide at-least-once delivery guarantees across distributed consumers. Option B is wrong because UDP is a connectionless, unreliable protocol that does not guarantee message delivery, ordering, or duplicate detection, making it unsuitable for at-least-once processing. Option D is wrong because CloudFront signed URLs are used for secure content delivery and access control, not for event messaging or queue-based processing.


515

MCQ (medium)

A media platform stores originals in an S3 bucket. The application must: (1) prevent any public access to the bucket, (2) allow authenticated users to upload and download objects using presigned URLs, and (3) enforce that all requests use HTTPS and only touch objects under the user-specific prefix (for example, s3://media-originals/user-123/*). The bucket currently allows uploads but sometimes returns 403 AccessDenied for presigned URLs. Which change is the best fix while meeting the security requirements?

A. Disable S3 Block Public Access and add an ACL that grants READ and WRITE to the bucket owner only.

B. Keep Block Public Access enabled, remove any Allow statement to Principal="*", and use a bucket policy or access point policy that denies non-HTTPS requests and allows PutObject/GetObject only when the object key matches the authenticated user's session tag, such as arn:aws:s3:::media-originals/${aws:PrincipalTag/userId}/*.

C. Use bucket website hosting and allow public GET requests so presigned URLs are not needed for downloads.

D. Use ACLs to grant ObjectOwner full control and rely on the application to generate presigned URLs with longer expirations to avoid 403 errors.

Answer: B

Block Public Access ensures the bucket cannot become public. A policy that denies non-HTTPS traffic and scopes object ARNs to a session tag or equivalent identity attribute enforces user-specific access without relying on public principals.

Why this answer

Option B is correct because it keeps S3 Block Public Access enabled (preventing any public access), denies non-HTTPS requests with an `aws:SecureTransport` condition, and uses a bucket policy or access point policy that allows `PutObject`/`GetObject` only on the user-specific prefix, expressed with a policy variable such as `arn:aws:s3:::media-originals/${aws:PrincipalTag/userId}/*`. A presigned URL carries the permissions of the principal that signed it, so requests succeed only when that principal's session tag matches the object prefix and the request uses HTTPS. Scoping the policy this way resolves 403 errors caused by mis-scoped or missing Allow statements without granting any public access.
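A sketch of such a bucket policy, built in Python and serialized to JSON. The account ID and role name are hypothetical, and the `userId` session tag is assumed to be set when the application's role session is created. The Deny with `Principal: "*"` is not a public grant, so it is compatible with Block Public Access.

```python
import json

BUCKET = "media-originals"
APP_ROLE_ARN = "arn:aws:iam::111122223333:role/media-app-users"  # hypothetical

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        },
        {
            "Sid": "AllowOwnPrefixOnly",
            "Effect": "Allow",
            "Principal": {"AWS": APP_ROLE_ARN},
            "Action": ["s3:GetObject", "s3:PutObject"],
            # Policy variable resolved per request from the caller's session tag.
            "Resource": f"arn:aws:s3:::{BUCKET}/" + "${aws:PrincipalTag/userId}/*",
        },
    ],
}

print(json.dumps(bucket_policy, indent=2))
```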

Exam trap

The trap here is that candidates assume presigned URLs bypass bucket policies. In reality, a presigned URL request is evaluated against the bucket policy and the IAM permissions of the principal that signed it, so a missing Allow, or a prefix condition that does not match the signer's identity, produces 403 errors.

How to eliminate wrong answers

Option A is wrong because disabling S3 Block Public Access removes the primary safeguard against public exposure, and an ACL that grants READ and WRITE to the bucket owner only enforces neither the user-specific prefix nor HTTPS; ACLs are also a legacy mechanism. Option C is wrong because website hosting with public GET requests violates the requirement to prevent any public access: it makes presigned URLs unnecessary only by exposing objects to the internet, and S3 website endpoints do not support HTTPS. Option D is wrong because ACLs granting ObjectOwner full control do not enforce user-specific prefixes or HTTPS, and longer presigned URL expirations do not fix 403 errors caused by missing policy permissions or incorrect principal restrictions.


516

MCQ (hard)

Based on the exhibit, the team wants to minimize compute cost for a workload with a steady 24/7 baseline and a separate nightly batch job that can be interrupted and resumed from checkpoints. They also expect to change EC2 instance families during the year as performance needs evolve. Which approach is the best fit?

A. Buy EC2 Instance Savings Plans for the baseline and run the nightly batch on On-Demand instances.

B. Use a Compute Savings Plan to cover the steady baseline and run the nightly batch on Spot Instances.

C. Purchase Standard Reserved Instances for all 12 instances and keep the current families fixed.

D. Run both tiers entirely on Spot Instances and rely on automatic restarts for the baseline web tier.

Answer: B

A Compute Savings Plan provides discount coverage while preserving flexibility across EC2 families and even other compute services. That makes it ideal for the steady baseline when future family changes are expected. Spot Instances are the lowest-cost choice for the restartable batch tier because interruptions are acceptable and checkpointing is already in place.

Why this answer

Option B is correct because a Compute Savings Plan applies its discount to any EC2 instance family, size, or Region, and also to AWS Fargate and AWS Lambda usage, which makes it ideal for the steady 24/7 baseline when instance families are expected to change. The nightly batch job can be interrupted and resumed from checkpoints, a good fit for Spot Instances, which offer up to 90% savings compared with On-Demand. This combination minimizes compute cost while keeping the flexibility to change instance families during the year.
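A rough cost comparison, sketched with entirely hypothetical numbers: a 10-cent hourly On-Demand rate, 8 baseline instances running 24/7, 4 batch instances running 3 hours per night, a 30% Savings Plan discount, and a flat 70% Spot discount. Real Spot prices fluctuate, so the flat discount is a simplifying assumption.

```python
def monthly_cost_cents(rate_cents_per_hour, instances, hours_per_day, days=30, discount_pct=0):
    """Monthly cost in whole cents after a percentage discount (integer math avoids float drift)."""
    gross = rate_cents_per_hour * instances * hours_per_day * days
    return gross * (100 - discount_pct) // 100


all_on_demand = monthly_cost_cents(10, 8, 24) + monthly_cost_cents(10, 4, 3)
option_b = (monthly_cost_cents(10, 8, 24, discount_pct=30)    # baseline on Compute SP
            + monthly_cost_cents(10, 4, 3, discount_pct=70))  # batch on Spot
print(all_on_demand, option_b)  # 61200 41400
```

An EC2 Instance Savings Plan may show a deeper discount in such a model, but it ties the commitment to one instance family, which the scenario rules out.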

Exam trap

The trap here is that candidates often assume Reserved Instances or EC2 Instance Savings Plans are always the cheapest option. The requirement to change instance families during the year makes a Compute Savings Plan the best-fitting discount among the options, and Spot Instances are the natural choice for the interruptible batch job.

How to eliminate wrong answers

Option A is wrong because EC2 Instance Savings Plans lock you into a specific instance family within a region, which conflicts with the requirement to change instance families during the year; also, running the nightly batch on On-Demand instances is more expensive than using Spot Instances. Option C is wrong because Standard Reserved Instances require a 1- or 3-year commitment and lock you into a specific instance family, which prevents the flexibility to change families and does not leverage Spot Instances for the interruptible batch job. Option D is wrong because running the steady baseline entirely on Spot Instances risks interruption (Spot Instances can be reclaimed with a 2-minute warning), which is unsuitable for a 24/7 workload that must remain stable and available.


517

Drag & Drop (medium)

Arrange the steps to create an encrypted Amazon EBS volume from scratch in the correct order.


Why this order

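EBS encryption uses an AWS KMS key, so the key comes first: use the AWS managed key (aws/ebs) or create a customer managed key. Then create the volume with encryption enabled in the same Availability Zone as the target instance, attach it, format and mount it from the operating system, and verify that the volume reports as encrypted.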
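As a sketch, the volume-creation step can be expressed as parameters in the shape of the EC2 CreateVolume API. The Availability Zone, size, and key ARN are hypothetical. Omitting `KmsKeyId` while setting `Encrypted` uses the account's default EBS encryption key.

```python
def create_volume_request(az, size_gib, kms_key_id=None, volume_type="gp3"):
    """Build CreateVolume-style parameters for an encrypted EBS volume."""
    params = {
        "AvailabilityZone": az,  # must match the AZ of the instance it will attach to
        "Size": size_gib,
        "VolumeType": volume_type,
        "Encrypted": True,
    }
    if kms_key_id:  # omit to use the default EBS KMS key
        params["KmsKeyId"] = kms_key_id
    return params


default_key = create_volume_request("us-east-1a", 100)
custom_key = create_volume_request(
    "us-east-1a", 100,
    kms_key_id="arn:aws:kms:us-east-1:111122223333:key/example",  # hypothetical key
)
print(default_key)
print(custom_key)
```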


518

MCQ (easy)

A worker consumes messages from an Amazon SQS queue. Some messages consistently fail validation and are retried until the worker can no longer process them. What is the most appropriate AWS mechanism to handle these poison messages while keeping the queue usable?

A. Enable SQS long polling and increase the maximum message size for the queue.

B. Send failing messages to an SQS dead-letter queue (DLQ) using a redrive policy based on receive count.

C. Change the queue to a FIFO queue and handle duplicates in the worker code without DLQs.

D. Delete the queue and recreate it hourly to clear out any problematic messages.

Answer: B

A DLQ with a redrive policy isolates poison messages. After a message is received and fails processing more than the configured maxReceiveCount, SQS moves it to the DLQ, preventing it from continually blocking retries in the source queue.

Why this answer

Option B is correct because an SQS dead-letter queue (DLQ) with a redrive policy based on receive count allows messages that repeatedly fail processing (poison pills) to be moved out of the main queue after a specified number of retries. This keeps the main queue operational for valid messages and isolates problematic messages for later analysis or manual intervention.
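A minimal sketch of the redrive setup: the `RedrivePolicy` queue attribute is a JSON string naming the DLQ and `maxReceiveCount`. The DLQ ARN is hypothetical, and the simulation is a simplified model of how the receive count grows for a message that always fails.

```python
import json


def redrive_policy(dlq_arn, max_receive_count=5):
    """Value for the source queue's RedrivePolicy attribute (a JSON string)."""
    return json.dumps({"deadLetterTargetArn": dlq_arn,
                       "maxReceiveCount": str(max_receive_count)})


def simulate_receives(max_receive_count, always_fails):
    """Simplified model: each receive increments the count; a poison message
    leaves the source queue after max_receive_count failed receives."""
    receive_count = 0
    while True:
        receive_count += 1
        if not always_fails:
            return ("deleted", receive_count)       # worker succeeded and deleted it
        if receive_count >= max_receive_count:
            return ("moved_to_dlq", receive_count)  # isolated for later analysis


print(redrive_policy("arn:aws:sqs:us-east-1:111122223333:orders-dlq", 3))  # hypothetical ARN
print(simulate_receives(3, always_fails=True))   # ('moved_to_dlq', 3)
print(simulate_receives(3, always_fails=False))  # ('deleted', 1)
```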

Exam trap

The trap here is that candidates may think tuning the queue, such as enabling long polling or raising the maximum message size (Option A), solves the problem, but the exam is testing whether you isolate poison messages with a DLQ and a receive-count-based redrive policy so the queue stays usable.

How to eliminate wrong answers

Option A is wrong because enabling long polling and increasing maximum message size does not address the core issue of messages that consistently fail validation; long polling reduces empty responses and larger message size allows bigger payloads, but neither prevents poison messages from blocking processing. Option C is wrong because changing to a FIFO queue does not inherently handle poison messages; FIFO queues preserve order and deduplicate based on message deduplication ID, but they still require a DLQ or explicit error handling to remove failing messages, and the worker code alone cannot prevent retries from exhausting resources. Option D is wrong because deleting and recreating the queue hourly is a disruptive, non-scalable approach that loses all messages (including valid ones) and does not provide a mechanism to isolate or analyze poison messages; it also violates the requirement to keep the queue usable.


519

MCQ (easy)

A retail API uses EC2 instances behind an ALB. CPU is consistently high during peak traffic, and request latency rises. What should be configured?

A. Auto Scaling policy based on an appropriate CloudWatch metric

B. S3 Object Lock

C. A VPC endpoint for CloudWatch only

D. Disable health checks

Answer: A

Auto Scaling adds capacity when load increases and removes it when load falls.

Why this answer

An Auto Scaling policy based on an appropriate CloudWatch metric (such as CPUUtilization or ALBRequestCountPerTarget) dynamically adds or removes EC2 instances to match demand. This directly addresses the high CPU and rising latency by distributing the load across more instances, preventing performance degradation during peak traffic.
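A sketch of a target tracking policy in the shape used by the Auto Scaling PutScalingPolicy API, plus a proportional estimate of how capacity responds. The 50% target is hypothetical, and the estimate is an approximation for intuition, not the service's exact algorithm.

```python
import math

policy = {
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingConfiguration": {
        "PredefinedMetricSpecification": {"PredefinedMetricType": "ASGAverageCPUUtilization"},
        "TargetValue": 50.0,  # hypothetical: keep average CPU near 50%
    },
}


def approx_desired_capacity(current, metric, target, min_size, max_size):
    """Scale capacity in proportion to metric/target, clamped to group limits."""
    desired = math.ceil(current * metric / target)
    return max(min_size, min(max_size, desired))


print(approx_desired_capacity(4, 85.0, 50.0, min_size=2, max_size=10))  # 7
print(approx_desired_capacity(4, 20.0, 50.0, min_size=2, max_size=10))  # 2
```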

Exam trap

The trap here is that candidates may confuse operational features (like S3 Object Lock or VPC endpoints) with scaling mechanisms, or mistakenly think disabling health checks improves performance, when in fact it degrades reliability and latency.

How to eliminate wrong answers

Option B is wrong because S3 Object Lock is a data protection feature for Amazon S3 objects (preventing deletion or overwriting) and has no relevance to scaling compute resources or reducing request latency. Option C is wrong because a VPC endpoint for CloudWatch only enables private connectivity to CloudWatch APIs (e.g., for publishing metrics or logs) but does not scale EC2 capacity or reduce latency. Option D is wrong because disabling health checks would cause the ALB to continue routing traffic to unhealthy instances, worsening latency and potentially causing failures; health checks are essential for maintaining a reliable target group.


520

MCQ (medium)

A data engineering team runs a nightly ETL job on EC2. The job can be checkpointed every 5 minutes and can be retried from the last checkpoint if the instance terminates. The job runtime varies from 2 to 4 hours, and the team has no need for a specific instance type, as long as it completes before 7:00 AM local time. They currently run the job on On-Demand EC2, leading to high monthly compute cost. Which change best reduces cost while maintaining the business deadline?

A. Use Spot Instances for the ETL workload, and configure the job to checkpoint frequently and restart on interruption.

B. Use Reserved Instances with a 1-year term to lower costs, since reservations provide discounts for any usage.

C. Switch to On-Demand but enable Auto Scaling so the job finishes faster during peak hours.

D. Use Spot Instances but disable checkpointing to simplify the application.

Answer: A

Spot can significantly reduce costs, and checkpointing plus retries mitigate interruption risk.

Why this answer

Option A is correct because Spot Instances offer significant cost savings (up to 90% compared to On-Demand) and are ideal for fault-tolerant, checkpointable workloads. The job's ability to checkpoint every 5 minutes and retry from the last checkpoint means it can gracefully handle Spot Instance interruptions, ensuring it still completes before the 7:00 AM deadline without incurring the high cost of On-Demand instances.
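A simplified model of why checkpointing protects the deadline. All inputs are hypothetical: a worst-case 240-minute job starting at midnight, two Spot interruptions, and 10 minutes to obtain replacement capacity and resume each time.

```python
def wall_clock_minutes(work_minutes, interrupted_at, restart_overhead, checkpoint_every=None):
    """Estimate elapsed minutes for a job that is interrupted at the given progress points."""
    lost = 0
    for progress in interrupted_at:
        if checkpoint_every:
            lost += progress % checkpoint_every  # redo only work since the last checkpoint
        else:
            lost += progress                     # no checkpoint: start over from zero
    return work_minutes + lost + restart_overhead * len(interrupted_at)


WINDOW = 7 * 60  # hypothetical: job starts at midnight, deadline 7:00 AM
with_ckpt = wall_clock_minutes(240, [103, 187], restart_overhead=10, checkpoint_every=5)
without_ckpt = wall_clock_minutes(240, [103, 187], restart_overhead=10)
print(with_ckpt, with_ckpt <= WINDOW)        # 265 True
print(without_ckpt, without_ckpt <= WINDOW)  # 550 False
```

With 5-minute checkpoints, each interruption costs at most a few minutes of rework, which is also why Option D (Spot without checkpointing) puts the deadline at risk.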

Exam trap

The trap here is that candidates may think Reserved Instances are always cheaper, but they fail to consider the low utilization of a nightly job, making Spot Instances with checkpointing the true cost-optimized solution for fault-tolerant workloads.

How to eliminate wrong answers

Option B is wrong because Reserved Instances require a 1-year or 3-year commitment and are best suited for steady-state, predictable workloads, not for a nightly ETL job that runs only 2–4 hours per day, leading to underutilization and higher effective cost. Option C is wrong because enabling Auto Scaling on On-Demand instances does not reduce cost; it may increase cost by launching additional instances, and the job already completes within the required window without needing faster execution. Option D is wrong because disabling checkpointing on Spot Instances removes fault tolerance, making the job vulnerable to interruption failures and risking the 7:00 AM deadline, as the job would have to restart from scratch.


521

MCQ (medium)

A trading dashboard runs on EC2 instances behind an Application Load Balancer. The design must tolerate the failure of one Availability Zone. What should the Auto Scaling group configuration include? The team wants the control to be enforceable during normal operations.

A. A single EC2 instance with detailed monitoring

B. Subnets in at least two Availability Zones with health checks enabled

C. All instances in one larger subnet

D. A Network Load Balancer in one subnet

Answer: B

An Auto Scaling group spanning multiple AZs can replace unhealthy instances and maintain capacity during an AZ failure.

Why this answer

Option B is correct because an Auto Scaling group whose subnets span at least two Availability Zones keeps serving from the remaining AZ(s) if one AZ fails and launches replacement capacity there. With ELB health checks enabled on the group, instances that the load balancer reports as unhealthy are replaced automatically. This configuration tolerates a single-AZ failure and is enforced continuously by the Auto Scaling group during normal operations.
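A sketch of the relevant group settings (in the shape of CreateAutoScalingGroup parameters, with hypothetical subnet IDs) and a simple model of how much capacity survives the worst single-AZ loss, assuming the group spreads instances evenly across AZs.

```python
asg_settings = {
    "VPCZoneIdentifier": "subnet-aaaa1111,subnet-bbbb2222",  # hypothetical, one subnet per AZ
    "HealthCheckType": "ELB",        # replace instances the ALB marks unhealthy
    "HealthCheckGracePeriod": 300,
}


def capacity_after_worst_az_loss(num_azs, desired):
    """Instances left right after losing the fullest AZ, assuming even spreading."""
    base, extra = divmod(desired, num_azs)
    largest_az = base + (1 if extra else 0)
    return desired - largest_az


print(capacity_after_worst_az_loss(1, 4))  # 0: single AZ, total outage
print(capacity_after_worst_az_loss(2, 4))  # 2
print(capacity_after_worst_az_loss(3, 7))  # 4
```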

Exam trap

The trap here is that candidates often confuse high availability (spanning multiple AZs) with fault tolerance at the instance level, mistakenly thinking a single instance with monitoring or a single subnet can survive an AZ failure.

How to eliminate wrong answers

Option A is wrong because a single EC2 instance, even with detailed monitoring, cannot tolerate the failure of an entire Availability Zone; if that AZ goes down, the instance becomes unavailable. Option C is wrong because placing all instances in one larger subnet within a single AZ creates a single point of failure; an AZ failure would take down all instances. Option D is wrong because a Network Load Balancer in one subnet does not provide AZ-level fault tolerance; it still relies on that single AZ, and the Auto Scaling group must span multiple AZs for resilience.


522

Multi-Select (medium)

A service in private subnets downloads product images from Amazon S3 and stores job state in DynamoDB. A NAT Gateway is currently the only route to AWS services, and the monthly bill is dominated by NAT data processing charges. Which two changes will most directly reduce that cost? Select two.


A. Create a gateway VPC endpoint for Amazon S3.

B. Create a gateway VPC endpoint for Amazon DynamoDB.

C. Add an internet gateway and move the instances into public subnets.

D. Replace the NAT Gateway with a Site-to-Site VPN connection.

E. Create an interface endpoint for S3 instead of a gateway endpoint.

Answers: A, B

An S3 gateway endpoint routes S3 traffic over the AWS private network instead of through the NAT Gateway, and a DynamoDB gateway endpoint does the same for the job-state traffic. Together they remove NAT data processing charges for both services, one of the most direct cost optimizations for private-subnet workloads.

Why this answer

A and B are correct because gateway VPC endpoints for Amazon S3 and Amazon DynamoDB let instances in private subnets reach those services over the AWS network through route table entries, without traversing the internet or the NAT Gateway. This eliminates NAT data processing charges for S3 and DynamoDB traffic, the dominant cost driver in this scenario, and gateway endpoints themselves carry no additional charge.
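A back-of-the-envelope sketch of the savings. The traffic mix is hypothetical, and the per-GB NAT data processing price is an assumption that varies by Region; only S3 and DynamoDB offer gateway endpoints, so any other traffic still flows through the NAT Gateway, which also keeps its hourly charge.

```python
from decimal import Decimal

PRICE_PER_GB = Decimal("0.045")  # assumed NAT data processing price; check your Region

monthly_gb = {"s3": 8000, "dynamodb": 500, "other": 200}  # hypothetical traffic mix
GATEWAY_ENDPOINT_SERVICES = {"s3", "dynamodb"}


def nat_processing_cost(traffic, bypass):
    """NAT data processing cost for traffic not routed through gateway endpoints."""
    billable_gb = sum(gb for svc, gb in traffic.items() if svc not in bypass)
    return billable_gb * PRICE_PER_GB


before = nat_processing_cost(monthly_gb, bypass=set())
after = nat_processing_cost(monthly_gb, bypass=GATEWAY_ENDPOINT_SERVICES)
print(before, after)  # 391.500 9.000
```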

Exam trap

The trap here is that candidates may think interface endpoints are always better for security or performance, but for S3 and DynamoDB, gateway endpoints are free and more cost-effective, while interface endpoints incur additional charges.


523

MCQ (medium)

A payments API uses an RDS MySQL database and must remain available during an Availability Zone failure with minimal application changes. What should the architect enable? The design must avoid adding custom operational scripts.

A. S3 Cross-Region Replication

B. Multi-AZ deployment for the RDS DB instance

C. Read replicas only

D. EBS snapshots every hour

Answer: B

Multi-AZ provides synchronous standby replication and automatic failover within a Region.

Why this answer

Multi-AZ deployment for RDS MySQL automatically provisions and maintains a synchronous standby replica in a different Availability Zone. If the primary AZ fails, RDS performs an automatic failover to the standby, ensuring database availability with minimal application changes (the application only needs to reconnect using the same endpoint). This meets the requirement of avoiding custom operational scripts.
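During a Multi-AZ failover, RDS repoints the DB endpoint's DNS record to the new primary, so the application only has to retry connecting to the same endpoint. A minimal sketch of that retry with exponential backoff follows; `fake_connect` is a hypothetical stand-in for a real database driver's connect call.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Retry the same endpoint with exponential backoff until failover completes."""
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


calls = {"n": 0}


def fake_connect():
    """Hypothetical connect that fails twice, as if the primary were failing over."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("primary unavailable during failover")
    return "connected"


delays = []
conn = connect_with_retry(fake_connect, sleep=delays.append)
print(conn, delays)  # connected [0.5, 1.0]
```

Clients should also avoid caching DNS results for long periods, or they may keep resolving the old primary after failover.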

Exam trap

The trap here is that candidates often confuse read replicas with Multi-AZ, thinking read replicas provide high availability, but they lack automatic failover for write traffic and require manual promotion, which violates the 'no custom operational scripts' constraint.

How to eliminate wrong answers

Option A is wrong because S3 Cross-Region Replication is for object storage replication across AWS regions, not for RDS database availability during an AZ failure, and it would require significant application changes to redirect traffic. Option C is wrong because read replicas are designed for read scaling and do not provide automatic failover for write operations; promoting a read replica to a primary requires manual intervention or custom scripts. Option D is wrong because EBS snapshots every hour provide point-in-time backups but do not enable automatic failover; restoring from a snapshot would cause significant downtime and require custom scripts to automate recovery.


524

MCQ (hard)

A media processing workflow generates analytics files that are accessed unpredictably. Some files become hot again months later. The team wants automatic storage cost optimisation without retrieval delays. What should be used?

A. S3 Intelligent-Tiering

B. Manual monthly review and object copying

C. S3 Glacier Flexible Retrieval for all files

D. EFS One Zone for analytics files

Answer: A

Intelligent-Tiering automatically moves objects between access tiers based on usage while preserving low-latency access.

Why this answer

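S3 Intelligent-Tiering automatically moves objects between access tiers based on changing access patterns: objects not accessed for 30 consecutive days move to the Infrequent Access tier, objects not accessed for 90 consecutive days move to the Archive Instant Access tier, and any access moves an object back to the Frequent Access tier. All three of these tiers provide millisecond retrieval with no retrieval fees; only the optional Archive Access and Deep Archive Access tiers, which must be explicitly enabled, add retrieval delays. This fits unpredictable access where some files become hot again months later, optimizing storage cost without manual intervention or retrieval latency.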
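A small model of the automatic tiers, ignoring the opt-in archive tiers. It also reflects that objects smaller than 128 KB are not auto-tiered and stay in the Frequent Access tier. This is an illustration of the documented thresholds, not an S3 API.

```python
MIN_AUTO_TIER_BYTES = 128 * 1024  # smaller objects are not auto-tiered


def intelligent_tier(days_since_last_access, size_bytes):
    """Tier an object would sit in; all three tiers give millisecond access."""
    if size_bytes < MIN_AUTO_TIER_BYTES:
        return "Frequent Access"
    if days_since_last_access >= 90:
        return "Archive Instant Access"
    if days_since_last_access >= 30:
        return "Infrequent Access"
    return "Frequent Access"  # any access resets the clock back to this tier


for days in (10, 45, 120):
    print(days, intelligent_tier(days, 10 * 1024 * 1024))
```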

Exam trap

The trap here is that candidates may choose S3 Glacier Flexible Retrieval (Option C) thinking it is the cheapest archival option, but they overlook the requirement for 'no retrieval delays' and the unpredictable access pattern that makes Intelligent-Tiering's automatic tiering the correct choice.

How to eliminate wrong answers

Option B is wrong because manual monthly review and object copying is labor-intensive, error-prone, and cannot react to unpredictable access patterns in real time, leading to either higher costs or retrieval delays. Option C is wrong because S3 Glacier Flexible Retrieval has retrieval delays (minutes to hours) and is not suitable for files that may become hot again unpredictably, as it would introduce unacceptable latency. Option D is wrong because EFS One Zone is a file system, not an object storage service, and is designed for low-latency shared access within a single AZ, not for cost-optimized archival of analytics files with unpredictable retrieval.


525

MCQ (medium)

You run a web application on an EC2 Auto Scaling group behind an Application Load Balancer (ALB). During scheduled traffic spikes, new instances launch but customers occasionally see 5xx errors for the first few minutes after scale-out. Operational logs show instances need ~4 minutes to warm up (load caches and initialize dependencies). ALB target health becomes healthy only after this warm-up. Which change most directly improves performance during spikes by reducing the time to serve traffic after scaling?

A. Configure a larger ALB deregistration delay so that old targets remain longer before termination.

B. Use an Auto Scaling warm pool so instances are pre-initialized and ready to register quickly when the ASG scales out.

C. Increase the number of desired instances immediately without using scaling policies, and then rely on manual reconfiguration.

D. Switch from ALB to NLB so instances become reachable sooner without waiting for health checks.

Answer: B

With a warm pool, Auto Scaling launches a set of instances ahead of time, lets them complete their initialization (for example, user data or lifecycle hook actions that load caches), and keeps them in a Stopped, Running, or Hibernated state; Running or Hibernated preserves in-memory state such as caches, whereas Stopped does not. When scaling triggers, these instances move into service faster and begin registering with the ALB. Because the bottleneck is the ~4 minutes instances need to become truly ready, pre-initializing them most directly shrinks the gap between scale-out and customer-ready capacity, and therefore reduces 5xx errors while new targets wait to pass ALB health checks.

Why this answer

B is correct because a warm pool pre-initializes instances (for example, loading caches and dependencies) before they are put into service. When the ASG scales out, it draws these pre-initialized instances from the warm pool, largely avoiding the ~4-minute warm-up and shrinking the window for 5xx errors.
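A sketch of warm pool settings in the shape of the Auto Scaling PutWarmPool API, with a simple timing model. The group name, pool size, and all minute values are hypothetical; the 4-minute warm-up comes from the scenario.

```python
warm_pool = {
    "AutoScalingGroupName": "web-asg",  # hypothetical group name
    "MinSize": 4,                       # keep at least 4 pre-initialized instances
    "PoolState": "Hibernated",          # or "Stopped" / "Running"
    "InstanceReusePolicy": {"ReuseOnScaleIn": True},
}


def minutes_to_serve(start_min, warm_up_min, health_check_min, prewarmed):
    """Rough time from scale-out until the target passes ALB health checks."""
    return start_min + (0 if prewarmed else warm_up_min) + health_check_min


cold = minutes_to_serve(1, 4, 1, prewarmed=False)
warm = minutes_to_serve(1, 4, 1, prewarmed=True)
print(cold, warm)  # 6 2
```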

Exam trap

The trap here is that candidates may think NLB bypasses health checks entirely, but in reality NLB still requires health checks to mark targets as healthy, and the application warm-up delay remains the bottleneck.

How to eliminate wrong answers

Option A is wrong because increasing the deregistration delay keeps old targets alive longer, which does not help new instances serve traffic faster; it only delays termination of existing instances. Option C is wrong because manually setting desired instances without scaling policies is not automated and does not address the root cause of warm-up latency during spikes. Option D is wrong because switching to NLB does not eliminate the need for health checks or application warm-up; NLB health checks are still required and instances still need time to become healthy, so 5xx errors would persist.
