SAA-C03 Questions 901–975

901

MCQ · easy

A production Amazon RDS database has automated backups enabled with sufficient retention. At 10:30 UTC, a release corrupts specific rows. The issue is detected at 10:45 UTC. The team wants to restore the database state to before the corruption with minimal complexity. What should they do?

A. Perform a point-in-time restore (PITR) to a timestamp just before 10:30 UTC and create a restored DB instance/cluster.

B. Change the VPC route tables so the database restarts in a clean state.

C. Relaunch the same DB instance in the same Availability Zone and rely on caching to revert the changes.

D. Enable a DLQ on the database to store invalid SQL statements until the system is fixed.

Answer: A

PITR uses automated backups to restore the database to a specific point in time. Selecting a timestamp just before the corruption (for example, slightly before 10:30 UTC) restores the affected data state as it existed before the bad release.

Why this answer

Option A is correct because Amazon RDS point-in-time restore (PITR) lets you restore the database to any point within the backup retention period, up to the latest restorable time (typically within the last five minutes), by combining automated backups with transaction logs. Restoring to a timestamp just before 10:30 UTC creates a new DB instance/cluster that holds the data as it was before the corruption, with minimal complexity and with every change committed before that timestamp intact.

Exam trap

The trap here is that candidates may confuse database recovery methods with network or application-level fixes, or incorrectly assume that restarting or relaunching an instance will clear data changes, when in fact only a restore from backup or PITR can revert committed transactions.

How to eliminate wrong answers

Option B is wrong because changing VPC route tables affects network traffic routing, not database state or data integrity; it cannot revert corrupt rows or restart the database in a clean state. Option C is wrong because relaunching or restarting a DB instance does not revert committed changes: the instance comes back on the same storage, which still contains the corrupt rows, and no cache holds an authoritative earlier copy of the data. Option D is wrong because a dead-letter queue (DLQ) is a messaging concept (for example in Amazon SQS) for handling messages that fail processing, not a feature of Amazon RDS; it cannot store or revert SQL statements.
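
As a rough illustration of the restore in Option A, the sketch below only builds the request parameters and checks that the chosen timestamp falls inside the restorable window; it does not call AWS. The parameter names follow the boto3 RDS call `restore_db_instance_to_point_in_time`, while the instance identifiers, the calendar date, the one-minute safety margin, and the window bounds are assumptions made up for the example.

```python
from datetime import datetime, timedelta, timezone


def build_pitr_request(source_id, target_id, corruption_time,
                       earliest_restorable, latest_restorable,
                       safety_margin=timedelta(minutes=1)):
    """Return keyword arguments for a point-in-time restore to a NEW instance."""
    # Restore to just before the bad release (margin is an assumption).
    restore_time = corruption_time - safety_margin
    if not (earliest_restorable <= restore_time <= latest_restorable):
        raise ValueError("restore time is outside the restorable window")
    return {
        "SourceDBInstanceIdentifier": source_id,
        # PITR never overwrites the source; it always creates a new instance.
        "TargetDBInstanceIdentifier": target_id,
        "RestoreTime": restore_time,
    }


# Hypothetical date and identifiers; only the 10:30 UTC time comes from the question.
corruption = datetime(2024, 5, 1, 10, 30, tzinfo=timezone.utc)
request = build_pitr_request(
    "orders-prod", "orders-prod-restored", corruption,
    earliest_restorable=corruption - timedelta(days=7),
    latest_restorable=corruption + timedelta(minutes=12),
)
print(request["RestoreTime"].isoformat())  # prints 2024-05-01T10:29:00+00:00
```

The resulting dictionary could be passed as keyword arguments to the real API call; the application is then pointed at the restored instance once the data has been checked.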


902

MCQ · medium

A Multi-AZ Amazon RDS database experiences incorrect writes at 10:15 UTC due to a buggy release. The team detects the problem at 10:25 UTC. They want to restore the data to a known-good point around 10:15 UTC, and validate the recovered data, without taking the current production instance offline during the recovery process. What is the most appropriate AWS action?

A. Immediately reboot the RDS instance and rely on the reboot to roll back the bad writes.

B. Perform a point-in-time restore (PITR) to a new DB instance using a restore time around 10:15 UTC, then test the restored instance before cutting over.

C. Create a new Read Replica from the current primary and use it as the recovered database after applying reverse migrations.

D. Temporarily disable Multi-AZ to speed up storage rollback, then re-enable Multi-AZ.

Answer: B

PITR restores to a specific timestamp using backups and transaction logs. Importantly, it creates a recovered copy (typically a new DB instance), which allows validation and cutover decisions without stopping or directly impacting the existing production instance.

Why this answer

Option B is correct because Amazon RDS point-in-time recovery (PITR) allows you to restore a DB instance to any second within the backup retention period, creating a new, independent DB instance. This lets you validate the recovered data without affecting the current production instance, which remains online and serving traffic. The team can then cut over to the restored instance after confirming it is clean.

Exam trap

The trap here is that candidates may assume a reboot or Read Replica can undo bad writes, but neither provides a rollback mechanism. Only a point-in-time restore can recover to an arbitrary timestamp without affecting the live instance; a snapshot restore also creates a new instance, but it can only return to the moment the snapshot was taken.

How to eliminate wrong answers

Option A is wrong because rebooting an RDS instance does not roll back writes; it only restarts the database engine (and applies pending static parameter changes), leaving the bad data intact. Option C is wrong because a Read Replica is an asynchronous copy of the primary that replicates all writes, including the buggy ones, so it cannot serve as a point-in-time recovery target without manual, error-prone reverse migrations. Option D is wrong because disabling Multi-AZ does not provide a storage rollback mechanism; it only removes the standby replica, and the primary's storage still contains the incorrect writes.


903

MCQ · medium

Developers for a financial reporting platform need temporary elevated access to production resources for troubleshooting. The security team wants approvals, expiry, and audit logging. Which approach is best?

A. Use IAM Identity Center permission sets with time-bound access processes and CloudTrail auditing

B. Disable CloudTrail during troubleshooting

C. Create shared administrator access keys for the team

D. Attach AdministratorAccess permanently to every developer role

Answer: A

Federated access with permission sets and audited temporary assignments reduces standing privilege.

Why this answer

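IAM Identity Center (formerly AWS SSO) lets you grant access through permission sets, each with a configurable session duration, and assign them to a user only for the approved troubleshooting window. The approval and expiry steps come from the process wrapped around those assignments (for example, an access-request workflow that adds and later removes the assignment) rather than from the permission set alone. AWS CloudTrail logs the assignment changes and the API calls made during the session, which covers the security team's requirements for approvals, expiry, and audit logging.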

Exam trap

The trap here is that candidates may think IAM roles with a trust policy and `sts:AssumeRole` are sufficient on their own. Role sessions do expire, but a plain role setup supplies neither the approval step nor a way to remove the standing ability to assume the role, and the question explicitly requires both.

How to eliminate wrong answers

Option B is wrong because disabling CloudTrail would eliminate audit logging, directly violating the security team's requirement for audit logging. Option C is wrong because creating shared administrator access keys violates the principle of least privilege, provides no individual accountability, and cannot enforce time-bound access or approvals. Option D is wrong because permanently attaching AdministratorAccess to every developer role grants persistent elevated privileges with no expiry, which contradicts the requirement for temporary, time-bound access and increases the attack surface.


904

MCQ · medium

A production team accidentally deletes critical rows in an Amazon RDS for PostgreSQL database. The deletion occurred about 6 hours ago. The team wants to recover to a specific point in time with minimal disruption. Assuming automated backups are enabled, which approach provides the best resilience outcome?

A. Restore the current DB instance in place by overwriting it with only the latest automated backup.

B. Use point-in-time recovery (PITR) to restore a new DB instance to a timestamp shortly before the deletion, then switch application traffic to the restored instance.

C. Create a manual snapshot and restore from it only if the snapshot date exactly matches today.

D. Perform a database-level rollback using transaction logs from the application server without using RDS restore features.

Answer: B

With automated backups enabled, PITR allows restoring to a precise timestamp within the retention window. Creating a new DB instance (rather than overwriting production) enables verification of data correctness and then a controlled cutover, minimizing disruption while meeting the “specific point in time” requirement.

Why this answer

Point-in-time recovery (PITR) allows you to restore a new DB instance to any second within the automated backup retention period, which includes transaction logs. By restoring to a timestamp just before the deletion, you recover the lost rows without affecting the current production instance, then switch traffic to the new instance for minimal disruption.

Exam trap

The trap here is that candidates may think restoring in place (Option A) is faster or simpler. RDS has no in-place restore: every restore, whether from a snapshot or to a point in time, creates a new DB instance, and that separation is what allows verification before cutover and keeps disruption minimal.

How to eliminate wrong answers

Option A is wrong because RDS cannot overwrite an existing DB instance from a backup; restores always create a new instance. Relying only on the latest automated backup also misses the target, because a backup taken after the deletion 6 hours ago no longer contains the deleted rows. Option C is wrong because a manual snapshot captures the entire DB at a single moment and offers no point-in-time granularity; a snapshot created now, or at any time after the deletion, already reflects the missing rows. Option D is wrong because the application server does not hold the database's transaction logs, and RDS for PostgreSQL does not expose them for a direct database-level rollback; an application-level rollback cannot guarantee consistency with the RDS-managed storage engine.


905

Multi-Select · hard

A media company serves versioned JavaScript and CSS files from an Amazon S3 origin through CloudFront. After each release, origin requests spike even though the files are public. Browser requests include a tracking cookie, an Authorization header, and a cache-busting query string that the site no longer needs. Which three changes will most improve the CloudFront cache hit ratio without exposing private content? Select three.

Select 3 answers

A. Rename each static asset with a content hash or release version in the filename before publishing.

B. Create a CloudFront cache policy that excludes unnecessary query strings and cookies from the cache key.

C. Use an origin request policy that forwards only the headers and cookies the origin truly needs.

D. Enable CloudFront compression and configure the origin to return Cache-Control: no-store for all files.

E. Forward all viewer headers to the origin so CloudFront can personalize every request.

Answers: A, B, C

Versioned filenames let CloudFront cache each asset for a long time without worrying about stale content. When the file name changes on release, clients naturally fetch the new object, and old cached objects remain valid for older pages until they expire.

Why this answer

Option A is correct because renaming static assets with a content hash or version in the filename ensures that each new release creates a unique object key in S3. This allows CloudFront to treat the new file as a distinct object, avoiding cache invalidation issues and enabling long-term caching of the old version. Without this, even with cache-busting query strings, CloudFront might still serve stale content or require frequent invalidations, reducing the cache hit ratio.

Exam trap

The trap here is that candidates often confuse origin request policies (which control what is sent to the origin) with cache policies (which control the cache key), leading them to think forwarding headers or cookies to the origin will improve caching, when in fact it can harm the cache hit ratio if those values are included in the cache key.
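
The effect of the cache key can be shown with a small model. The sketch below is not CloudFront's implementation; it is a simplified stand-in in which a cache key is the URL path plus whichever query strings, cookies, and headers a policy allows. The function names, file contents, and request values are all hypothetical.

```python
import hashlib
from urllib.parse import parse_qsl, urlsplit


def versioned_name(filename, content):
    """Embed a short content hash in the filename, e.g. app.js -> app.<hash>.js."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, dot, ext = filename.rpartition(".")
    return f"{stem}.{digest}{dot}{ext}" if dot else f"{filename}.{digest}"


def cache_key(url, cookies, headers,
              allowed_query=(), allowed_cookies=(), allowed_headers=()):
    """Simplified model: the key is the path plus only the allowed request values."""
    parts = urlsplit(url)
    query = sorted((k, v) for k, v in parse_qsl(parts.query) if k in allowed_query)
    kept_cookies = sorted((k, v) for k, v in cookies.items() if k in allowed_cookies)
    kept_headers = sorted((k.lower(), v) for k, v in headers.items()
                          if k.lower() in allowed_headers)
    return (parts.path, tuple(query), tuple(kept_cookies), tuple(kept_headers))


asset = "/static/" + versioned_name("app.js", b"console.log('v2');")

# Two viewers request the same public file with different per-user values.
requests = [
    (asset + "?cb=1111", {"tracking": "u1"}, {"Authorization": "Bearer aaa"}),
    (asset + "?cb=2222", {"tracking": "u2"}, {"Authorization": "Bearer bbb"}),
]

# Everything in the key: each viewer produces a distinct cache entry.
keys_all = {cache_key(u, c, h, ("cb",), ("tracking",), ("authorization",))
            for u, c, h in requests}
# Nothing extra in the key: both viewers share one cache entry.
keys_minimal = {cache_key(u, c, h) for u, c, h in requests}

print(len(keys_all), len(keys_minimal))  # prints 2 1
```

With the per-user values removed from the key, the number of distinct cache entries for a file drops to one, and the hash in the filename is what makes a new release a new object.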


906

Multi-Select · hard

An application uses Amazon Aurora MySQL. CloudWatch shows the writer instance near 85% CPU while the only reader instance averages 15% CPU. Trace logs show that all SELECT statements still target the writer endpoint. The workload is read-heavy, and the application already tolerates eventual consistency for reads. Which two changes will best increase total read throughput without a schema redesign? Select two.

Select 2 answers

A. Point read-only queries to the Aurora reader endpoint instead of the writer endpoint.

B. Add one or more additional Aurora Replicas and distribute read traffic across them.

C. Convert the cluster to a single-AZ RDS MySQL instance to reduce replication overhead.

D. Replace the writer endpoint with the instance endpoint of the primary node to speed up SELECT queries.

E. Add Amazon ElastiCache and move all database writes into the cache layer.

Answers: A, B

The reader endpoint is intended for read-only traffic and automatically distributes connections across Aurora Replicas. Redirecting SELECT statements away from the writer immediately reduces CPU pressure on the writer and uses the unused read capacity already available in the cluster. This is the fastest, lowest-risk way to improve read throughput without changing the schema or the application data model.

Why this answer

Option A is correct because the Aurora reader endpoint is designed to distribute read-only connections across all available Aurora Replicas, offloading SELECT queries from the writer instance. Currently, all SELECT statements target the writer endpoint, causing the writer's CPU to be at 85% while the reader instance is underutilized at 15%. By redirecting read traffic to the reader endpoint, the writer's CPU load decreases, and the existing reader instance can handle more read throughput without any schema changes.

Exam trap

The trap here is that candidates may think adding more reader instances alone solves the problem, but they must first redirect read traffic away from the writer endpoint—otherwise, the new replicas remain idle and the writer remains overloaded.
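
The redirect described above usually lives in the application's data-access layer. The following is a minimal sketch of that routing decision, assuming the application can classify statements by their leading keyword; both endpoint hostnames are hypothetical, and a real application would more likely hold two connection pools than inspect SQL text.

```python
# Hypothetical cluster endpoints; Aurora reader endpoints carry "cluster-ro-".
WRITER_ENDPOINT = "shop.cluster-abc123example.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "shop.cluster-ro-abc123example.us-east-1.rds.amazonaws.com"

READ_PREFIXES = ("select", "show")


def endpoint_for(sql, in_transaction=False):
    """Send plain reads to the reader endpoint and everything else to the writer."""
    statement = sql.lstrip().lower()
    is_plain_read = (
        statement.startswith(READ_PREFIXES)
        and "for update" not in statement  # locking reads must go to the writer
    )
    if is_plain_read and not in_transaction:
        return READER_ENDPOINT
    return WRITER_ENDPOINT


print(endpoint_for("SELECT id, total FROM orders WHERE user_id = 7") == READER_ENDPOINT)
print(endpoint_for("UPDATE orders SET total = 10 WHERE id = 1") == WRITER_ENDPOINT)
```

Reads inside an open transaction stay on the writer in this sketch so that a transaction always sees its own writes; that is a design choice of the example, not a requirement stated in the question.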


907

MCQ · medium

A mobile game backend uses Amazon Aurora. The workload has many short-lived database connections from Lambda functions, causing connection storms. What should be added? The design must avoid adding custom operational scripts.

A. An internet gateway

B. S3 Select

C. RDS Proxy

D. A larger Route 53 hosted zone

Answer: C

RDS Proxy pools and manages database connections, improving scalability for serverless and bursty workloads.

Why this answer

RDS Proxy is the correct choice because it pools and shares database connections, reducing the overhead of establishing new connections for each Lambda invocation. This prevents connection storms by maintaining a persistent pool of connections to Aurora, which is ideal for short-lived, high-frequency connections from serverless functions like Lambda.

Exam trap

The trap here is that candidates might think adding more network resources (like an internet gateway or larger DNS zone) solves connection storms, when the real issue is connection management at the database layer, not network capacity.

How to eliminate wrong answers

Option A is wrong because an internet gateway provides internet access to a VPC and does not manage database connections or connection pooling. Option B is wrong because S3 Select is used to retrieve subsets of data from objects in S3 using SQL expressions, not for managing database connections. Option D is wrong because a larger Route 53 hosted zone increases the number of DNS records you can host but does not affect database connection management or pooling.


908

MCQ · hard

An application runs in private subnets and must download objects from Amazon S3 and read one secret from AWS Secrets Manager. NAT gateways are prohibited, and traffic must not traverse the public internet. The secret uses a customer managed KMS key. Which design is best?

A. Use a NAT gateway for outbound access and rely on security groups to block internet destinations.

B. Create interface VPC endpoints for both S3 and Secrets Manager and enable private DNS.

C. Create a gateway VPC endpoint for S3 and an interface VPC endpoint for Secrets Manager with private DNS enabled.

D. Use VPC peering to a public subnet that hosts a proxy for S3 and Secrets Manager access.

Answer: C

This combination keeps traffic on the AWS network without NAT. S3 is best accessed through a gateway endpoint, which is the native private connectivity option for S3. Secrets Manager requires an interface endpoint, and private DNS lets the application use standard service names while still resolving to the private endpoint. The KMS key is used by Secrets Manager service-side, not via a separate app network path.

Why this answer

Option C is correct because it uses a gateway VPC endpoint for S3, which adds a route for the S3 prefix list to the private subnets' route tables so that S3 traffic stays on the AWS network, and an interface VPC endpoint for Secrets Manager, which provides private access via AWS PrivateLink. Private DNS applies to the interface endpoint: it makes the standard Secrets Manager hostname resolve to the endpoint's private IP addresses. Together the two endpoints meet the requirement to avoid NAT gateways and the public internet.

Exam trap

The trap here is that candidates often assume all AWS services require interface VPC endpoints, but S3 and DynamoDB also support gateway endpoints, which carry no additional charge and are simpler to configure for private access.

How to eliminate wrong answers

Option A is wrong because NAT gateways are explicitly prohibited by the requirement, and relying on security groups to block internet destinations does not prevent traffic from traversing the public internet; security groups control inbound/outbound traffic but do not alter the routing path. Option B is wrong because, although S3 does support interface endpoints, they are billed per hour and per gigabyte processed and are mainly useful for access from on-premises networks or peered VPCs; for workloads inside the VPC the recommended approach is a gateway endpoint, which has no additional charge and works through route-table entries and prefix lists. Option D is wrong because a proxy in a public subnet would reach S3 and Secrets Manager through an internet gateway, and it introduces a single point of failure and additional latency, violating the 'no public internet' requirement.
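
To make the difference between the two endpoint types concrete, the sketch below builds the two request payloads side by side without calling AWS. The keys follow the parameters of the EC2 `create_vpc_endpoint` API as exposed by boto3; the Region and all resource IDs are placeholders invented for the example.

```python
REGION = "us-east-1"  # assumption: any Region would do


def endpoint_requests(vpc_id, route_table_ids, subnet_ids, security_group_ids):
    """Build one gateway endpoint (S3) and one interface endpoint (Secrets Manager)."""
    s3_gateway = {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{REGION}.s3",
        # Gateway endpoints attach to route tables, not to subnets or ENIs.
        "RouteTableIds": route_table_ids,
    }
    secrets_interface = {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{REGION}.secretsmanager",
        # Interface endpoints place network interfaces in the chosen subnets.
        "SubnetIds": subnet_ids,
        "SecurityGroupIds": security_group_ids,
        # Private DNS is an interface-endpoint setting only.
        "PrivateDnsEnabled": True,
    }
    return [s3_gateway, secrets_interface]


# Placeholder IDs for illustration only.
for req in endpoint_requests("vpc-0example", ["rtb-0example"],
                             ["subnet-0examplea", "subnet-0exampleb"],
                             ["sg-0example"]):
    print(req["VpcEndpointType"], req["ServiceName"])
```

The security group on the interface endpoint would need to allow HTTPS from the application's subnets; that rule is outside the scope of this sketch.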


909

MCQ · easy

A company wants a disaster recovery setup for a web application. They want to keep costs low but still recover within a couple of hours after a regional disruption. They are willing to run only minimal infrastructure in the secondary location and scale it up during the outage. Which DR approach best matches this requirement?

A. Active-active, where both Regions run full production at all times.

B. Pilot light, where the secondary Region keeps minimal core components ready and scales up during failover.

C. Cold standby, where no infrastructure is running in the secondary Region until an outage occurs.

D. Backups-only, where recovery relies solely on manually restoring snapshots during an outage.

Answer: B

Pilot light maintains a small baseline in the secondary Region to enable faster, cost-optimized recovery.

Why this answer

The pilot light approach is correct because it keeps only the core components live in the secondary Region, typically a replicated database plus pre-built AMIs and infrastructure templates, while application servers stay switched off or at zero capacity until failover. During an outage the team starts and scales those resources (for example, through Auto Scaling groups) rather than rebuilding from scratch, which keeps steady-state cost low and still allows recovery within a couple of hours.

Exam trap

The trap here is confusing Pilot light with Cold standby, as both involve minimal infrastructure, but Pilot light has core components already running (e.g., a small database instance) while Cold standby has nothing provisioned, leading to significantly longer recovery times.

How to eliminate wrong answers

Option A is wrong because Active-active runs full production in both Regions at all times, which incurs high costs and does not match the requirement to keep costs low. Option C is wrong because Cold standby has no infrastructure running in the secondary Region until an outage occurs, which would typically require more than a couple of hours to provision and configure resources (e.g., launching EC2 instances, restoring databases) and thus fails the recovery time objective. Option D is wrong because Backups-only relies on manually restoring snapshots (e.g., EBS snapshots, RDS snapshots) during an outage, which is slow and error-prone, often exceeding the couple-of-hours recovery window due to manual intervention and data transfer times.


910

MCQ · medium

A security analyst needs to let an external vendor (AWS account 555566667777) read data from a set of internal resources in your AWS account. You created an IAM role called VendorReadRole with a policy that allows the required API calls. However, when the vendor tries to access, CloudTrail shows the call fails at AssumeRole with: "Not authorized to perform: sts:AssumeRole". What is the most appropriate fix?

A. Add an allow statement for the vendor in the role’s trust policy to permit sts:AssumeRole from the vendor account (and include any required ExternalId condition).

B. Attach the same allow policy to the vendor account’s existing IAM user so the user can call sts:AssumeRole directly into your role.

C. Replace the AssumeRole call with GetCallerIdentity so the vendor can infer permissions without assuming the role.

D. Enable MFA on the vendor’s IAM user and require MFA for your role using condition keys in the permissions policy.

Answer: A

AssumeRole is blocked unless the role trust policy allows the vendor principal. The role’s permissions policy alone cannot permit assumption.

Why this answer

The error 'Not authorized to perform: sts:AssumeRole' indicates that the role's trust policy does not grant the external AWS account (555566667777) permission to assume the role. The trust policy must include an Allow statement with the sts:AssumeRole action, specifying the external account as the principal, and optionally an ExternalId condition to prevent the confused deputy problem. This is the required configuration for cross-account IAM role access.

Exam trap

The trap here is that candidates often confuse the role's permissions policy (which defines what the role can do after being assumed) with the trust policy (which defines who can assume the role), and mistakenly think attaching permissions to the external user or modifying the permissions policy will fix the AssumeRole authorization failure.

How to eliminate wrong answers

Option B is wrong because attaching the allow policy to the vendor account's IAM user does not grant the user permission to assume the role; the trust policy on the role must explicitly allow the external account (or its users/roles) to call sts:AssumeRole. Option C is wrong because GetCallerIdentity returns information about the caller's identity and does not grant or infer permissions to access resources in another account; it cannot replace the need for role assumption. Option D is wrong because enabling MFA on the vendor's IAM user and requiring MFA in the role's permissions policy does not address the missing trust policy authorization; the trust policy must first allow the sts:AssumeRole call, and MFA conditions are optional enhancements, not a fix for a missing trust relationship.
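
A trust policy of the kind Option A describes could look like the output of the sketch below. The account ID comes from the question; the external ID value is a made-up placeholder that the two parties would agree on in practice.

```python
import json


def vendor_trust_policy(vendor_account_id, external_id):
    """Trust policy letting one external account assume the role with an ExternalId."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            # The account root principal delegates the choice of user/role
            # to the vendor's own IAM policies.
            "Principal": {"AWS": f"arn:aws:iam::{vendor_account_id}:root"},
            "Action": "sts:AssumeRole",
            # ExternalId guards against the confused deputy problem.
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }


policy = vendor_trust_policy("555566667777", "example-external-id")  # placeholder ID
print(json.dumps(policy, indent=2))
```

This document is attached to VendorReadRole as its trust policy; the role's existing permissions policy stays as it is. The vendor's own identity still needs permission in its account to call `sts:AssumeRole` on the role.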


911

MCQ · medium

Your CI system assumes an IAM role RoleForDeploy using STS AssumeRole and includes a session tag called Project=blue. The role’s permissions policy uses an ABAC condition like aws:PrincipalTag/Project to allow access only to resources tagged with the same project. AssumeRole succeeds, but deployments fail with AccessDenied. CloudTrail shows the role was assumed, yet the effective session does not contain the Project tag. Which change most directly fixes this issue?

A. Add permissions for sts:TagSession to the IAM role so the CI pipeline is allowed to pass the Project session tag during AssumeRole.

B. Remove the ABAC condition using aws:PrincipalTag/Project so the policy ignores session tags.

C. Move the aws:PrincipalTag/Project condition into the trust policy so it applies during the AssumeRole call.

D. Add kms:Decrypt permission to the CI role because missing tags are typically caused by KMS authorization failures.

Answer: A

Passing session tags requires the sts:TagSession action, which the role must allow for the CI principal in its trust policy.

Why this answer

Option A is correct because when using AWS Security Token Service (STS) AssumeRole with session tags, the calling entity must be explicitly allowed to pass those tags via the `sts:TagSession` action. That permission belongs in the role's trust policy, alongside `sts:AssumeRole`, for the principal that passes the tags. Until it is granted, the `Project=blue` tag cannot be attached to the session, so the ABAC condition on `aws:PrincipalTag/Project` never matches and downstream actions are denied. Allowing `sts:TagSession` lets the CI pipeline include the tag in the assumed-role session.

Exam trap

The trap here is that candidates assume passing session tags needs nothing beyond `sts:AssumeRole`. The tags take effect only when `sts:TagSession` is also allowed, and a session without the expected tag produces a confusing AccessDenied on downstream actions rather than an obvious tagging error.

How to eliminate wrong answers

Option B is wrong because removing the ABAC condition would bypass the intended fine-grained access control, but it does not address the root cause—the session tags are missing due to lack of `sts:TagSession` permission. Option C is wrong because the `aws:PrincipalTag/Project` condition belongs in the resource-based policy or identity-based policy to enforce ABAC on downstream actions; moving it to the trust policy would only affect who can assume the role, not the presence of session tags in the assumed session. Option D is wrong because KMS authorization failures are unrelated to missing session tags; the issue is purely about STS tag propagation, not encryption key permissions.
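
The two halves of this setup can be sketched as follows: a trust policy that allows both `sts:AssumeRole` and `sts:TagSession`, and a tiny stand-in for the ABAC comparison that shows why a session without the tag is denied. The CI principal ARN and account ID are hypothetical, and `abac_matches` is a simplified illustration, not the IAM policy evaluation engine.

```python
def ci_trust_policy(ci_principal_arn, allowed_projects):
    """Trust policy that lets the CI principal assume the role AND tag the session."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": ci_principal_arn},
            "Action": ["sts:AssumeRole", "sts:TagSession"],
            # Optionally restrict which Project values may be passed.
            "Condition": {"StringEquals": {"aws:RequestTag/Project": allowed_projects}},
        }],
    }


def abac_matches(session_tags, resource_tags, key="Project"):
    """Mimic comparing aws:ResourceTag/<key> with aws:PrincipalTag/<key>."""
    return key in session_tags and session_tags[key] == resource_tags.get(key)


trust = ci_trust_policy("arn:aws:iam::111122223333:role/ci-runner", ["blue"])  # hypothetical ARN
resource = {"Project": "blue"}

print(abac_matches({}, resource))                   # prints False (tag missing)
print(abac_matches({"Project": "blue"}, resource))  # prints True
```

The first call models the failing deployments in the question: the role was assumed, but the session carries no Project tag, so the comparison cannot succeed.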


912

MCQ · hard

An EC2 instance in a private subnet must access an S3 bucket that contains regulated exports for a B2B file exchange site. The security team requires access to be allowed only when traffic comes through a specific VPC endpoint. What should the architect add to the bucket policy?

A. A condition that matches aws:sourceVpce to the endpoint ID

B. A deny statement for all IAM users except the EC2 role

C. A condition that matches aws:RequestedRegion to the bucket Region

D. A security group rule that allows HTTPS to S3

Answer: A

The aws:sourceVpce condition restricts S3 access to requests that arrive through the specified VPC endpoint.

Why this answer

The correct answer is A because the bucket policy must include a condition that matches `aws:sourceVpce` to the specific VPC endpoint ID (for example, `vpce-12345678`). With that condition in place, typically as a Deny statement for any request whose endpoint ID does not match, only requests that arrive through that endpoint can reach the bucket. Without it, a request with valid credentials would be allowed over any network path, including the public internet or a different endpoint.

Exam trap

The trap here is that candidates often confuse `aws:sourceVpce` with `aws:SourceIp` or think a security group rule can be applied to S3, when in fact S3 bucket policies use VPC endpoint conditions to enforce network-level restrictions.

How to eliminate wrong answers

Option B is wrong because denying all IAM users except the EC2 role does not restrict traffic to the VPC endpoint; it only controls which IAM identities can access the bucket, not the network path. Option C is wrong because `aws:RequestedRegion` checks the AWS Region of the request, not the VPC endpoint, and does not enforce that traffic comes through a specific endpoint. Option D is wrong because security group rules apply to EC2 instances, not to S3 bucket policies, and S3 does not support security group rules in bucket policies; S3 uses bucket policies and VPC endpoint policies for access control.
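
One common shape for such a bucket policy is a Deny statement that applies whenever the request did not come through the expected endpoint. The sketch below generates that statement; the bucket name is a placeholder, and the endpoint ID reuses the example value from the explanation above.

```python
import json


def vpce_only_bucket_policy(bucket, vpce_id):
    """Deny every S3 action on the bucket unless the request used the given endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyUnlessThroughEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            # Condition keys are case-insensitive; aws:SourceVpce is the documented casing.
            "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
        }],
    }


policy = vpce_only_bucket_policy("regulated-exports-example", "vpce-12345678")
print(json.dumps(policy, indent=2))
```

A Deny like this also blocks console and administrative access that does not pass through the endpoint, so it is usually tested carefully before being applied. The EC2 role still needs its own Allow permissions; the statement here only restricts the network path.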


913

Multi-Select · hard

A regional web application for a content publishing system must fail over automatically to a secondary Region if the primary endpoint becomes unhealthy. Which two services or features are required?

Select 2 answers

A. AWS Organizations service control policies

B. Route 53 failover routing with health checks

C. S3 Transfer Acceleration

D. A deployed standby application stack in the secondary Region

Answers: B, D

Route 53 can monitor endpoint health and return the standby endpoint when the primary is unhealthy.

Why this answer

Route 53 failover routing with health checks (B) is required because it continuously monitors the health of the primary endpoint and automatically reroutes traffic to a secondary Region when the primary becomes unhealthy. This is achieved by configuring a primary and secondary failover record set in Route 53, where the health check is associated with the primary record. When the health check fails, Route 53 returns the secondary record's IP address, enabling automatic failover at the DNS level.

Exam trap

The trap here is that candidates often think Route 53 alone is sufficient for failover, but they forget that a fully deployed standby application stack in the secondary Region is also required to actually serve traffic after the DNS switch.
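
The primary/secondary record pair described above can be sketched as a change batch in the shape accepted by the Route 53 `change_resource_record_sets` API, followed by a toy resolver that mimics the failover decision. The domain name, the IP addresses (taken from documentation ranges), and the health check ID are placeholders; `resolve` is an illustration, not how Route 53 is queried.

```python
def failover_records(name, primary_ip, secondary_ip, health_check_id, ttl=60):
    """Build a PRIMARY/SECONDARY failover record pair as a change batch."""
    def record(role, ip, extra):
        rrset = {
            "Name": name, "Type": "A", "SetIdentifier": role.lower(),
            "Failover": role, "TTL": ttl, "ResourceRecords": [{"Value": ip}],
        }
        rrset.update(extra)
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {"Changes": [
        # The health check is attached to the primary record.
        record("PRIMARY", primary_ip, {"HealthCheckId": health_check_id}),
        record("SECONDARY", secondary_ip, {}),
    ]}


def resolve(change_batch, primary_healthy):
    """Toy model of the answer returned for a query, given the primary's health."""
    records = {c["ResourceRecordSet"]["Failover"]: c["ResourceRecordSet"]
               for c in change_batch["Changes"]}
    chosen = records["PRIMARY"] if primary_healthy else records["SECONDARY"]
    return chosen["ResourceRecords"][0]["Value"]


batch = failover_records("app.example.com.", "192.0.2.10", "198.51.100.10",
                         "hc-placeholder-id")
print(resolve(batch, primary_healthy=True))   # prints 192.0.2.10
print(resolve(batch, primary_healthy=False))  # prints 198.51.100.10
```

The secondary address only helps if something is listening there, which is why the deployed standby stack (Option D) is the other required half.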


914

MCQ · easy

An inventory service exposes a static website from S3 and CloudFront. Users should still receive cached pages if the S3 origin has a short outage. Which feature helps most? The design must avoid adding custom operational scripts.

A. CloudFront caching with appropriate TTLs

B. AWS Backup Vault Lock

C. IAM Access Analyzer

D. S3 Select

Answer: A

CloudFront can serve cached content from edge locations when the origin is temporarily unavailable.

Why this answer

CloudFront caches responses at edge locations based on configured TTLs (Cache-Control or Expires headers). If the S3 origin becomes temporarily unavailable, CloudFront can still serve stale or cached content to users, maintaining availability without any custom scripts or failover logic. This directly addresses the requirement to serve cached pages during short S3 outages.

Exam trap

The trap here is that candidates might think AWS Backup Vault Lock (Option B) provides some form of data availability or failover, but it is purely a compliance and retention tool with no impact on serving cached web content during origin outages.

How to eliminate wrong answers

Option B is wrong because AWS Backup Vault Lock is a data protection feature for backup vaults, enforcing retention policies (WORM) to prevent deletion; it does not provide caching or origin failover for web content. Option C is wrong because IAM Access Analyzer helps identify unintended resource access policies, not caching or availability during origin outages. Option D is wrong because S3 Select is a query-in-place feature to retrieve subsets of object data using SQL expressions; it has no role in caching or serving cached pages during origin failures.


915

MCQ · hard

A DynamoDB table for a travel booking site has a partition key based only on the current date. Write throttling occurs during business hours. What is the best design change?

A. Create a global secondary index with the same date key

B. Move the table to S3 Glacier Instant Retrieval

C. Reduce the table's write capacity

D. Use a higher-cardinality partition key that distributes writes across partitions

Answer: D

A low-cardinality hot partition causes throttling; a better key spreads writes more evenly.

Why this answer

Option D is correct because using a low-cardinality partition key like the current date concentrates all writes into a single partition, causing throttling when write demand exceeds that partition's 1,000 WCU limit. A higher-cardinality key (e.g., combining date with user ID or session ID) distributes writes evenly across multiple partitions, allowing the table to use its full provisioned write capacity without throttling.

Exam trap

The trap here is that candidates read throttling as a total-capacity problem and reach for a capacity change (Option C), when the real issue is a hot partition caused by a low-cardinality partition key.

How to eliminate wrong answers

Option A is wrong because a global secondary index (GSI) does not change how writes to the base table are partitioned; a GSI keyed on the same date would concentrate its own writes on a single partition as well, and a throttled GSI can in turn throttle writes to the base table. Option B is wrong because S3 Glacier Instant Retrieval is an object storage class for archival data with retrieval latency in milliseconds, not a replacement for DynamoDB's low-latency read/write operations required by a travel booking site. Option C is wrong because reducing write capacity would lower the throttling threshold, making the problem worse; the issue is uneven distribution of writes, not insufficient total capacity.

916

Multi-Select | medium

A serverless checkout API has predictable traffic spikes every weekday at 09:00 UTC and low traffic the rest of the day. The team wants to reduce cost while keeping response times fast during the recurring spike. Which two actions should they take? Select two.

Select 2 answers

A.Use provisioned concurrency for the Lambda function during the expected spike window.

B.Use Application Auto Scaling or scheduled actions to reduce provisioned concurrency after the spike ends.

C.Replace the API with a single always-on EC2 instance.

D.Keep provisioned concurrency permanently high all day and all week.

E.Disable API Gateway and use direct public internet access to Lambda.

Answers: A, B

Provisioned concurrency keeps Lambda execution environments initialized before traffic arrives, which reduces cold starts during the predictable busy period. Because the spike is scheduled, the team can pay for the performance benefit only when it is actually needed.

Why this answer

Provisioned concurrency initializes a specified number of Lambda execution environments in advance, ensuring no cold starts during traffic spikes. By scheduling provisioned concurrency to activate only during the 09:00 UTC window, the team keeps response times fast without paying for idle capacity the rest of the day.
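
A minimal sketch of how the two scheduled actions could be described, assuming the function alias has already been registered as a scalable target with Application Auto Scaling. The function name, alias, capacities, and cron times are hypothetical values chosen for illustration. Each dictionary is shaped so it could be passed as keyword arguments to the `put_scheduled_action` call of boto3's `application-autoscaling` client, but the sketch only builds the parameters and makes no AWS calls.

```python
def provisioned_concurrency_schedule(function_name, alias, peak, off_peak,
                                     start_cron, end_cron):
    """Build parameters for a scale-up and a scale-down scheduled action."""
    common = {
        "ServiceNamespace": "lambda",
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
    }
    scale_up = {
        **common,
        "ScheduledActionName": "pre-warm-before-spike",
        "Schedule": f"cron({start_cron})",
        "ScalableTargetAction": {"MinCapacity": peak, "MaxCapacity": peak},
    }
    scale_down = {
        **common,
        "ScheduledActionName": "release-after-spike",
        "Schedule": f"cron({end_cron})",
        "ScalableTargetAction": {"MinCapacity": off_peak, "MaxCapacity": off_peak},
    }
    return [scale_up, scale_down]


# Hypothetical values: warm up at 08:45 UTC on weekdays, release at 10:00 UTC.
actions = provisioned_concurrency_schedule(
    "checkout-api", "live", peak=50, off_peak=0,
    start_cron="45 8 ? * MON-FRI *", end_cron="0 10 ? * MON-FRI *",
)
```

Starting the scale-up a few minutes before 09:00 is a design choice in this sketch, so the environments are initialized before the spike begins.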

Exam trap

The trap here is that candidates might think provisioned concurrency must be always-on to be effective, missing the cost-saving strategy of scheduling it only during predictable spikes.

917

MCQ | medium

A marketing team uses CloudFront with an S3 origin to serve a single-page web app. After a release, CloudFront cache hit ratio dropped sharply. The app requests the same static JS and CSS assets, but each request includes a unique tracking query parameter (for example, ?utm_source=campaign123, campaign456, etc.). You want CloudFront to cache those assets efficiently even when the tracking query parameter changes. What should you do?

A.Create a cache policy that forwards the query string to the origin and varies the cache key by all query parameters.

B.Update the CloudFront cache policy so the cache key ignores the tracking query parameter, while still using the path and other essential headers.

C.Enable S3 origin access control and keep the existing default cache policy, because origin access changes caching behavior automatically.

D.Set the CloudFront Time-to-Live (TTL) to 0 seconds to ensure the origin always serves the latest asset content.

Answer: B

CloudFront caching depends on the cache key (for example, path, selected headers, and selected query strings). If you configure a cache policy to exclude the tracking query parameter (or ignore specific query string parameters), CloudFront treats requests for the same asset as the same cached object. This prevents cache fragmentation caused by unique tracking values. Origin load decreases and cache hit ratio increases, while correctness is maintained because the excluded parameter does not affect the content of the static JS/CSS objects.

Why this answer

Option B is correct because CloudFront's cache key determines whether a request is served from the cache or forwarded to the origin. By configuring a cache policy that ignores the tracking query parameter (e.g., utm_source), CloudFront treats all requests for the same asset path as identical, regardless of the unique tracking parameter. This allows the same JS and CSS files to be cached once and served for all campaign variations, restoring the cache hit ratio.
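
The effect of excluding the tracking parameter can be sketched with a simplified model of a cache key. The `cache_key` function below is an illustration only, not CloudFront's actual implementation, and the example URLs and the `v` parameter are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(url: str, allowed_params=frozenset()) -> str:
    """Simplified cache key: the path plus only the allow-listed query parameters."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in allowed_params)
    query = urlencode(kept)
    return parts.path + ("?" + query if query else "")


# Two requests that differ only in the tracking parameter map to one cached object.
key_a = cache_key("https://app.example.com/static/app.js?utm_source=campaign123")
key_b = cache_key("https://app.example.com/static/app.js?utm_source=campaign456")

# A parameter that does change the content (a hypothetical version marker) is kept.
key_v = cache_key("https://app.example.com/static/app.js?v=3&utm_source=x", frozenset({"v"}))
```

Here `key_a` and `key_b` are identical, so the second request would be a cache hit in this model, while `key_v` still varies by the allow-listed parameter.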

Exam trap

The trap here is that candidates may think forwarding all query parameters (Option A) is necessary for dynamic content, but for static assets with irrelevant tracking parameters, ignoring them is the correct approach to maximize cache hits.

How to eliminate wrong answers

Option A is wrong because forwarding the query string and varying the cache key by all query parameters would create a separate cache entry for each unique utm_source value, which is exactly the problem causing the cache hit ratio to drop. Option C is wrong because enabling S3 origin access control (OAC) only secures the origin and does not affect CloudFront's caching behavior or cache key configuration. Option D is wrong because setting TTL to 0 seconds forces CloudFront to revalidate every request with the origin, eliminating caching entirely and worsening performance, not improving cache efficiency.

918

Multi-Select | medium

A company is designing a secure multi-tier web application on AWS. The application uses an Application Load Balancer (ALB) to distribute traffic to EC2 instances in private subnets, and the EC2 instances need to access an Amazon RDS database in a separate private subnet. The company must ensure that all traffic is encrypted in transit and that only necessary access is allowed. Which of the following steps should the company take to meet these requirements? (Choose four.)

Select 4 answers

A.Configure the ALB with an HTTPS listener and install a TLS certificate from AWS Certificate Manager (ACM).

B.Place the EC2 instances in a public subnet to allow direct access to the internet for security updates.

C.Configure the security group for the RDS database to allow inbound traffic on port 3306 (or the appropriate database port) only from the security group attached to the EC2 instances.

D.Enable encryption in transit for the RDS database by using SSL/TLS for connections between the EC2 instances and the database.

E.Assign a public IP address to the RDS database instance to simplify connectivity from the EC2 instances.

F.Use an AWS Network Load Balancer (NLB) with a TLS listener to terminate and re-encrypt traffic from the ALB to the EC2 instances.

Why this answer

Configuring the ALB with an HTTPS listener and a TLS certificate from ACM ensures that traffic between clients and the ALB is encrypted in transit. This is a fundamental requirement for securing a multi-tier web application, as it protects data from eavesdropping and tampering during transmission over the internet.

Exam trap

The trap here is that candidates often think an NLB is required for TLS termination or re-encryption between ALB and EC2, but the ALB itself can handle HTTPS termination, and the question's requirement for encryption in transit is already met by the ALB's HTTPS listener and the RDS SSL/TLS connection.

919

MCQ | medium

A team wants to remove a bastion host used for administrative access to EC2 instances in private subnets. The instances should be reachable only for occasional troubleshooting by engineers who authenticate with AWS SSO. What is the best secure alternative within AWS, assuming the instances already have an instance profile attached?

A.Use AWS Systems Manager Session Manager, enabling the required SSM permissions in the instance profile and restricting access to engineers via IAM.

B.Keep the bastion host but move it into a private subnet; engineers can connect by using a corporate VPN into the VPC.

C.Attach a public IP to each private instance so engineers can SSH directly and use security groups to restrict access.

D.Create a security group rule that allows engineers’ source IP addresses to reach instances over RDP on port 3389.

Answer: A

Session Manager avoids inbound SSH from the internet by initiating interactive sessions through Systems Manager. The instance profile must grant the SSM Agent its core permissions (for example, the AmazonSSMManagedInstanceCore managed policy), while engineers' IAM permissions, such as ssm:StartSession, restrict who can connect. This is a commonly recommended bastion-free alternative that improves security and reduces exposed network paths.

Why this answer

AWS Systems Manager Session Manager provides secure, auditable, agent-based access to EC2 instances without requiring a bastion host, open inbound ports, or SSH keys. By enabling the required SSM permissions (e.g., AmazonSSMManagedInstanceCore) in the instance profile and using IAM policies to restrict access to authenticated engineers via AWS SSO, you achieve a fully managed, secure, and compliant solution. This eliminates the need for a bastion host while maintaining the ability to troubleshoot instances in private subnets.

Exam trap

The trap here is that candidates often think a bastion host is required for private subnet access, or they mistakenly believe that opening inbound ports (even with IP restrictions) is an acceptable alternative, failing to recognize that AWS Systems Manager Session Manager provides a fully managed, agent-based, port-free solution that aligns with the principle of least privilege and removes the bastion host entirely.

How to eliminate wrong answers

Option B is wrong because moving the bastion host to a private subnet and requiring a corporate VPN still maintains an unnecessary bastion host, which adds complexity, cost, and a potential attack surface; it does not remove the bastion host as required. Option C is wrong because attaching a public IP to each private instance defeats the purpose of a private subnet, exposing instances directly to the internet and violating the principle of least privilege, even with security group restrictions. Option D is wrong because allowing RDP (port 3389) from engineers' source IPs still requires opening inbound ports and managing IP whitelists, which is less secure and more operationally burdensome than using Session Manager's agent-based, port-free access.

920

MCQ | medium

An administrator needs the ability to read and update infrastructure for a specific AWS account, but only when using MFA. The security team wants to eliminate long-lived administrator access keys and ensure that even if someone obtains temporary session credentials, actions are only allowed with MFA present. Which IAM design best meets these requirements?

A.Create an IAM user for administrators with AdministratorAccess and require MFA only at the IAM user login.

B.Create an IAM role for administration and use a permissions policy that allows only the required read/write actions. Add a condition to deny all allowed actions unless aws:MultiFactorAuthPresent is true.

C.Attach policies to an IAM user that allow read/write actions and enable MFA in the account, but do not use condition keys in IAM policies.

D.Use a role with the correct actions but enforce MFA only in the application by prompting users for an OTP before every API call.

Answer: B

A role-based approach removes long-lived keys and supports temporary credentials. Using a permissions-policy condition to require MFA presence enforces that the session must have MFA to perform actions, aligning with the “actions only allowed with MFA present” requirement.

Why this answer

Option B is correct because it uses an IAM role with a permissions policy that includes a condition key `aws:MultiFactorAuthPresent` set to `true`. This ensures that any API call made using temporary credentials from the role requires MFA to be present, eliminating long-lived access keys and enforcing MFA for every action. The role-based approach also aligns with the principle of least privilege by scoping actions to only required read/write operations.
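
A minimal sketch of what such a permissions policy could look like, built as a Python dictionary and serialized to JSON. The statement IDs and the example action list are hypothetical. The explicit Deny with `BoolIfExists` is one common way to express "deny unless MFA is present" and is an assumption of this sketch, not something the question prescribes.

```python
import json


def mfa_guard_statement() -> dict:
    """Deny every action when the request context does not show MFA."""
    return {
        "Sid": "DenyWithoutMFA",
        "Effect": "Deny",
        "Action": "*",
        "Resource": "*",
        # BoolIfExists also denies requests where the key is missing entirely.
        "Condition": {"BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}},
    }


def admin_policy(allowed_actions: list) -> dict:
    """Combine a scoped Allow statement with the MFA guard."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowScopedAdmin",
                "Effect": "Allow",
                "Action": allowed_actions,
                "Resource": "*",
            },
            mfa_guard_statement(),
        ],
    }


# Hypothetical read/write actions for illustration only.
policy_json = json.dumps(admin_policy(["ec2:Describe*", "ec2:StartInstances"]), indent=2)
```

Because an explicit Deny overrides any Allow, a session without MFA cannot use the allowed actions even though it holds valid temporary credentials.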

Exam trap

The trap here is that candidates assume requiring MFA at login (Option A) or enabling MFA in the account (Option C) is sufficient, but they overlook the need for a condition key in the IAM policy to enforce MFA for every API call, not just the initial authentication.

How to eliminate wrong answers

Option A is wrong because it creates an IAM user with AdministratorAccess and only requires MFA at login, which still allows the user to generate long-lived access keys and perform actions without MFA after the initial session. Option C is wrong because it attaches policies to an IAM user and enables MFA in the account but does not use condition keys, meaning the user can still use access keys without MFA for API calls. Option D is wrong because it enforces MFA only at the application layer via OTP prompts, which does not prevent API calls made directly to AWS using temporary credentials that bypass the application-level check.

921

MCQ | medium

Based on the exhibit, which Route 53 configuration should be used so traffic automatically returns to the secondary Region only when the primary Region becomes unhealthy?

A.Use latency-based routing with both ALB records enabled.

B.Use failover routing with a primary alias record, a secondary alias record, and a Route 53 health check on the primary target.

C.Use geolocation routing so users are always sent to the closest Region.

D.Use a CNAME record that points to both ALBs so DNS can round-robin between Regions.

Answer: B

Failover routing is designed for this pattern: Route 53 returns the primary alias while the primary endpoint is healthy, and switches to the secondary alias when the primary health check fails. Alias records integrate cleanly with ALB targets, and the health check provides the signal that drives the failover decision.

Why this answer

Failover routing in Amazon Route 53 is designed for active-passive configurations. By creating a primary alias record pointing to the ALB in the primary Region and a secondary alias record pointing to the ALB in the secondary Region, and attaching a Route 53 health check to the primary target, traffic automatically fails over to the secondary Region only when the health check detects the primary as unhealthy. This meets the requirement of sending traffic to the secondary Region only when the primary fails.
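
The record pair described above might be expressed as follows. This is a sketch that only builds the change entries; the record name, ALB DNS names, hosted zone IDs, and health check ID are placeholders, not real values. The entries follow the shape of the `Changes` list accepted by Route 53's `ChangeResourceRecordSets` API.

```python
def failover_record(name, role, alb_dns_name, alb_hosted_zone_id, health_check_id=None):
    """Build one UPSERT change for a failover alias record (role: PRIMARY or SECONDARY)."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": f"{role.lower()}-region",
        "Failover": role,
        "AliasTarget": {
            "HostedZoneId": alb_hosted_zone_id,
            "DNSName": alb_dns_name,
            "EvaluateTargetHealth": True,
        },
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


# Placeholder identifiers for illustration only.
changes = [
    failover_record("api.example.com.", "PRIMARY",
                    "primary-alb.example.com.", "ZPRIMARYZONEID", "hc-primary-placeholder"),
    failover_record("api.example.com.", "SECONDARY",
                    "secondary-alb.example.com.", "ZSECONDARYZONEID"),
]
```

Both records share the same name and type, and only the `Failover` role and target differ, which is what lets Route 53 choose between them based on health.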

Exam trap

The trap here is that candidates often confuse failover routing with latency-based or geolocation routing, assuming that 'closest' or 'fastest' automatically implies health awareness, but Route 53 health checks must be explicitly associated with failover records to trigger automatic traffic redirection.

How to eliminate wrong answers

Option A is wrong because latency-based routing directs users based on lowest latency, not health status, so it would not automatically fail over only when the primary is unhealthy; traffic could still be sent to an unhealthy primary if latency is low. Option C is wrong because geolocation routing sends users based on their geographic location, not the health of the endpoint, so it cannot automatically redirect traffic to the secondary Region when the primary becomes unhealthy. Option D is wrong because a CNAME record cannot point to multiple ALBs for round-robin; CNAME records can only point to a single DNS name, and DNS round-robin does not consider health checks, so traffic would still be sent to an unhealthy primary.

922

MCQ | easy

A travel booking site uses EC2 instances behind an ALB. CPU is consistently high during peak traffic, and request latency rises. What should be configured?

A.A VPC endpoint for CloudWatch only

B.Auto Scaling policy based on an appropriate CloudWatch metric

C.S3 Object Lock

D.Disable health checks

Answer: B

Auto Scaling adds capacity when load increases and removes it when load falls.

Why this answer

An Auto Scaling policy based on a CloudWatch metric like CPUUtilization or request latency directly addresses the root cause: rising CPU and latency under peak traffic. By automatically adding EC2 instances when the metric breaches a threshold, the ALB can distribute load across more resources, reducing CPU per instance and improving response times. This is the standard AWS solution for dynamic scaling to maintain performance.

Exam trap

The trap here is that candidates may confuse monitoring (VPC endpoints) or data protection (S3 Object Lock) with scaling solutions, or think disabling health checks reduces overhead, when the correct approach is to scale horizontally based on load metrics.

How to eliminate wrong answers

Option A is wrong because a VPC endpoint for CloudWatch only enables private connectivity to CloudWatch without an internet gateway; it does not add compute capacity or reduce CPU load or latency. Option C is wrong because S3 Object Lock prevents object deletion or overwrite for compliance, which is irrelevant to EC2 CPU and latency issues. Option D is wrong because disabling health checks would cause the ALB to route traffic to unhealthy instances, increasing failures and latency, not solving the performance problem.

923

MCQ | medium

A video processing pipeline runs batch jobs that are safe to interrupt and restart. The jobs checkpoint progress to durable storage every few minutes, and the team can automatically resubmit from the last checkpoint. They want to minimize compute cost while accepting that capacity can be interrupted. Which launch configuration for the processing workers is the best cost-optimized choice?

A.Launch the worker nodes as Spot Instances, and configure the job resubmission logic to restart from checkpoints upon interruption.

B.Launch the worker nodes as On-Demand Instances with no interruption handling so the pipeline never needs resubmission.

C.Launch the worker nodes as Reserved Instances to guarantee capacity and reduce cost, ignoring interruptions.

D.Use Savings Plans and also set the job scheduler to never start new jobs unless previous jobs finish without interruption.

Answer: A

Spot provides significantly lower pricing than On-Demand for EC2 capacity. Because the workload is designed to tolerate interruption (checkpointing + resubmission from the last checkpoint), the team can safely accept Spot interruptions. Resubmission from durable checkpoints preserves correctness while still capturing the cost advantage of Spot.

Why this answer

Spot Instances offer significant cost savings (up to 90% compared to On-Demand) and are ideal for fault-tolerant, interruptible workloads. Since the pipeline checkpoints progress to durable storage and can automatically resume from the last checkpoint, using Spot Instances minimizes compute cost while accepting interruptions.
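
The checkpoint-and-resubmit pattern can be sketched as a small simulation. The frame counts, checkpoint interval, and interruption points are made-up values, and the dictionary stands in for durable storage. Nothing here calls AWS.

```python
def process_frames(total_frames, checkpoint, interrupt_at=None, checkpoint_every=100):
    """Process frames from the last checkpoint; return True if the job finished."""
    frame = checkpoint.get("next_frame", 0)
    while frame < total_frames:
        if interrupt_at is not None and frame == interrupt_at:
            # Simulated Spot interruption: progress since the last checkpoint is lost.
            return False
        frame += 1
        if frame % checkpoint_every == 0 or frame == total_frames:
            checkpoint["next_frame"] = frame  # stands in for a write to durable storage
    return True


def run_with_resubmission(total_frames, interruptions):
    """Resubmit the job until it completes; return (attempts, frames_completed)."""
    checkpoint = {}
    pending = list(interruptions)
    attempts = 0
    while True:
        attempts += 1
        interrupt_at = pending.pop(0) if pending else None
        if process_frames(total_frames, checkpoint, interrupt_at):
            return attempts, checkpoint["next_frame"]


result = run_with_resubmission(1000, interruptions=[250, 730])
```

In this run the job is interrupted at frames 250 and 730, resumes from the checkpoints at 200 and 700, and completes on the third attempt, so only the work done since the last checkpoint is repeated.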

Exam trap

The trap here is that candidates may choose On-Demand or Reserved Instances because they assume interruptions are unacceptable, but the question explicitly states the workload is safe to interrupt and restart, making Spot Instances the correct cost-optimized choice.

How to eliminate wrong answers

Option B is wrong because On-Demand Instances are more expensive and provide no cost optimization benefit for a workload that can tolerate interruptions. Option C is wrong because Reserved Instances require a 1- or 3-year commitment and typically save less than Spot for a workload that can tolerate interruptions; they also do nothing for interruption recovery. Option D is wrong because a Savings Plan is an hourly spend commitment that is billed whether or not the workers are running, and refusing to start new jobs unless previous jobs finish without interruption contradicts the goal of minimizing cost by accepting interruptions.

924

MCQ | medium

A web application uses pooled JDBC connections to an Amazon Aurora cluster using the writer endpoint. During an Aurora planned failover, monitoring shows a short spike in failed requests. The Aurora cluster writer endpoint remains the same, but many existing pooled connections briefly fail. The application retries aggressively and overloads the new writer during the transition. Which design change will most improve application resilience during Aurora failovers without requiring application redeployment?

A.Add an RDS Proxy between the application and Aurora to manage database connections across failovers.

B.Change the Aurora cluster to Single-AZ to reduce failover events.

C.Increase the application thread count so more requests can be served while connections reconnect.

D.Pin all database traffic to a specific instance hostname instead of the writer cluster endpoint.

Answer: A

RDS Proxy terminates and manages client connections, while maintaining separate managed connections to the database. During a writer failover, the proxy can re-establish backend connections to the new writer, reducing failed pooled connections seen by the application and lowering retry pressure.

Why this answer

RDS Proxy sits between the application and the Aurora cluster, maintaining a warm connection pool to the database. During a failover, RDS Proxy reconnects to the new writer instance while keeping the application's idle client connections open, which reduces the spike in failed requests and dampens the aggressive retry storm that overloads the new writer. This requires no code changes, as the application's connection string simply points to the proxy endpoint instead of the cluster endpoint.

Exam trap

The trap here is that candidates assume the writer endpoint remains the same so connections should survive, but they miss that pooled JDBC connections hold TCP sockets and session state tied to the old writer instance, which are invalidated during failover. Among the options, only RDS Proxy keeps client connections open across the failover without application changes.

How to eliminate wrong answers

Option B is wrong because switching to Single-AZ eliminates the failover mechanism entirely, making the application less resilient and increasing downtime during any instance failure, which is the opposite of improving resilience. Option C is wrong because increasing the thread count only amplifies the retry storm, overwhelming the new writer even more and failing to address the root cause of connection drops during failover. Option D is wrong because pinning traffic to a specific instance hostname bypasses the writer endpoint's automatic failover routing, causing all traffic to fail if that instance becomes unavailable, and it requires application redeployment to change the hostname.

925

MCQ | hard

A batch analytics job currently uses two NAT gateways in each of three Availability Zones, but only one private subnet per AZ needs outbound internet access. What should the architect review first?

A.Replacing every NAT gateway with an internet gateway attached to private subnets

B.Whether one NAT gateway per AZ is sufficient for the required private subnets

C.Disabling route tables

D.Moving all workloads to public subnets

Answer: B

NAT gateways are normally deployed per AZ for resilience; duplicate NAT gateways in the same AZ may be unnecessary.

Why this answer

Option B is correct because the question asks what the architect should review first. Using two NAT gateways per Availability Zone (AZ) when only one private subnet per AZ needs outbound internet access is likely over-provisioned and costly. The architect should first verify whether a single NAT gateway per AZ can handle the traffic load, as each NAT gateway is redundant within its AZ and scales automatically up to 100 Gbps of bandwidth.

This review directly addresses cost optimization without sacrificing functionality.
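
The arithmetic behind the review can be sketched as below. The hourly rate is a placeholder parameter, not a quoted AWS price, and the sketch covers only the per-hour gateway charge. Per-GB data processing charges depend on traffic volume, not on the number of gateways, so they are left out.

```python
HOURS_PER_MONTH = 730  # common approximation for a month


def monthly_gateway_hours_cost(gateways_per_az, az_count, hourly_rate):
    """Hourly NAT gateway charge for a month, ignoring data processing fees."""
    return gateways_per_az * az_count * hourly_rate * HOURS_PER_MONTH


HYPOTHETICAL_HOURLY_RATE = 0.05  # placeholder value for illustration only

current = monthly_gateway_hours_cost(2, 3, HYPOTHETICAL_HOURLY_RATE)
proposed = monthly_gateway_hours_cost(1, 3, HYPOTHETICAL_HOURLY_RATE)
savings_fraction = (current - proposed) / current
```

Whatever the actual rate is, going from six gateways to three halves the hourly portion of the bill while keeping one gateway in every AZ.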

Exam trap

The trap here is that candidates may assume more NAT gateways always improve reliability, but the question emphasizes cost optimization, so the first review should be whether the existing number of gateways is necessary rather than immediately adding or removing resources.

How to eliminate wrong answers

Option A is wrong because replacing NAT gateways with an internet gateway attached to private subnets is technically invalid; internet gateways can only be attached to VPCs and provide outbound access only to resources with public IPs in public subnets, not private subnets. Option C is wrong because disabling route tables would break all network connectivity, not just outbound internet access, and is not a valid cost-optimization review step. Option D is wrong because moving all workloads to public subnets would expose them directly to the internet, violating security best practices and potentially incurring higher data transfer costs, and does not address the cost of NAT gateways.

926

Multi-Select | hard

A fleet of test servers is rebuilt every week from AMIs. EBS volumes are often left behind after termination, and the team creates daily snapshots of every volume even when nothing changes. Which three actions most reduce storage cost while preserving recovery options? Select three.

Select 3 answers

A.Use gp3 for new EBS volumes instead of gp2 when similar performance is enough.

B.Automate snapshot creation and deletion with Amazon Data Lifecycle Manager.

C.Move old snapshots to the EBS Snapshot Archive tier once they are rarely restored.

D.Keep unattached volumes around for troubleshooting after instance termination.

E.Raise provisioned IOPS on every volume so snapshot restore time feels faster.

Answers: A, B, C

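gp3 decouples performance from volume size: every gp3 volume gets a baseline of 3,000 IOPS and 125 MiB/s, and it is typically priced lower per GiB than gp2. This commonly lowers cost for workloads that would otherwise oversize gp2 volumes just to get more performance. It is a practical right-sizing move for many general-purpose volumes.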

Why this answer

Option A is correct because gp3 volumes offer a baseline performance that is often sufficient for test servers, and they are typically more cost-effective than gp2 volumes for the same amount of storage. On gp2, baseline IOPS scale with volume size, so extra performance means paying for extra capacity; on gp3, the baseline is independent of size. This directly addresses the goal of reducing storage costs while maintaining adequate performance for recovery purposes.
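
A short sketch of the size-to-performance coupling, using the published gp2 baseline of 3 IOPS per GiB (minimum 100, maximum 16,000) and the gp3 baseline of 3,000 IOPS. The function names are made up for this example, and gp2 burst credits are deliberately ignored.

```python
GP3_BASELINE_IOPS = 3000  # gp3 baseline, independent of volume size


def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline: 3 IOPS per GiB, floored at 100 and capped at 16,000."""
    return max(100, min(16000, 3 * size_gib))


def gp2_size_for_iops(target_iops: int) -> int:
    """Smallest gp2 volume (GiB) whose baseline reaches target_iops."""
    if target_iops > 16000:
        raise ValueError("gp2 baseline tops out at 16,000 IOPS")
    return max(1, -(-target_iops // 3))  # ceiling division


small_volume_iops = gp2_baseline_iops(100)
size_needed = gp2_size_for_iops(GP3_BASELINE_IOPS)
```

A 100 GiB gp2 volume has a 300 IOPS baseline, and matching gp3's 3,000 IOPS baseline on gp2 takes a 1,000 GiB volume. That gap is why right-sizing small test volumes onto gp3 tends to save money.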

Exam trap

The trap here is that candidates may think keeping unattached volumes is useful for troubleshooting, but snapshots provide the same data recovery capability at a lower storage cost. Candidates may also believe that raising IOPS speeds up snapshot restores, when in fact it does not affect restore performance.

927

MCQ | medium

A web application runs on an Auto Scaling group (ASG) behind an Application Load Balancer (ALB). After a new release, instances begin failing ALB health checks with errors like 502 while the application is still starting up. CloudWatch shows that the ASG replaces the instances before they finish initializing, so traffic never reaches healthy targets. Which change most directly prevents premature replacement during startup so traffic can resume as soon as the instances are actually healthy?

A.Reduce the ALB health check timeout to 1 second so failures are detected faster.

B.Increase the Auto Scaling group health check grace period to cover application startup and initialization time.

C.Enable connection draining on the ALB target group but set deregistration delay to 0 seconds.

D.Switch the ALB target group health checks from HTTP to TCP so the application does not need to return HTTP 200.

Answer: B

The ASG health check grace period tells Auto Scaling to ignore failing health checks for a period after instance launch. This prevents newly launched instances from being replaced before the application has finished booting and can pass ALB health checks.

Why this answer

B is correct because the Auto Scaling group health check grace period allows instances a specified amount of time to initialize before the ASG starts checking their health status. By increasing this grace period to cover the application startup time, the ASG will not prematurely replace instances that are still initializing, allowing them to pass the ALB health checks and begin receiving traffic once they are actually healthy.

Exam trap

The trap here is that candidates often confuse the ALB health check timeout or interval with the ASG health check grace period, thinking that adjusting ALB settings will fix the premature replacement issue, when in fact the ASG grace period is the direct control for delaying health check evaluation during startup.

How to eliminate wrong answers

Option A is wrong because reducing the ALB health check timeout to 1 second would cause health checks to fail even faster, exacerbating the problem of premature instance replacement. Option C is wrong because connection draining controls how existing connections are closed during deregistration, not how quickly instances are replaced during startup; setting deregistration delay to 0 seconds would abruptly terminate active connections, causing user disruption. Option D is wrong because switching to TCP health checks would bypass the application layer, allowing the ALB to consider an instance healthy even if the application is not fully initialized, which could lead to serving 502 errors to users.

928

MCQ | easy

A company runs a stateless web API on Amazon EC2 behind an Application Load Balancer. The team notices that during business hours, the ALB starts queueing requests and the average request latency rises. They want to scale out quickly and reliably based on demand, not CPU alone. Which Auto Scaling approach best matches this requirement?

A.Use a fixed-size Auto Scaling group and increase capacity manually once per hour.

B.Use target tracking scaling based on ALB request count per target.

C.Scale based only on EC2 instance memory utilization, regardless of load.

D.Use step scaling with a single threshold on average network-in bytes.

Answer: B

Target tracking can automatically adjust capacity using ALB load metrics and respond faster.

Why this answer

Target tracking scaling based on ALB request count per target directly aligns with the requirement to scale out based on demand (request queuing and latency) rather than CPU alone. This policy automatically adjusts the Auto Scaling group size to maintain a target value for the average number of requests per instance, which is a more reliable indicator of load for a stateless web API than CPU utilization.
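
A sketch of what the policy parameters could look like. The Auto Scaling group name, resource label, and target value are hypothetical. The dictionary follows the shape of the EC2 Auto Scaling `PutScalingPolicy` API (for example via boto3's `autoscaling` client), but the sketch only builds the parameters and does not call AWS.

```python
def request_count_tracking_policy(asg_name, alb_resource_label, requests_per_target):
    """Build parameters for a target tracking policy on ALB requests per target."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": "alb-request-count-per-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ALBRequestCountPerTarget",
                # Label format: app/<lb-name>/<lb-id>/targetgroup/<tg-name>/<tg-id>
                "ResourceLabel": alb_resource_label,
            },
            "TargetValue": float(requests_per_target),
        },
    }


# Placeholder names and IDs for illustration only.
policy = request_count_tracking_policy(
    "web-api-asg",
    "app/web-api-alb/0123456789abcdef/targetgroup/web-api-tg/fedcba9876543210",
    requests_per_target=500,
)
```

The target value is the knob to tune: a lower number of requests per target makes the group scale out earlier at the cost of running more instances.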

Exam trap

The trap here is that candidates often assume CPU utilization is the best metric for all scaling scenarios, but for a stateless web API behind an ALB, request count per target is a more direct and reliable indicator of demand and latency issues.

How to eliminate wrong answers

Option A is wrong because manual scaling once per hour cannot react quickly to sudden spikes in demand during business hours, leading to request queuing and increased latency. Option C is wrong because scaling based solely on memory utilization ignores the actual load (request count) and may not trigger scaling when the API is CPU-bound or I/O-bound, failing to address the queuing issue. Option D is wrong because step scaling with a single threshold on average network-in bytes is not a direct measure of application demand; network bytes can be influenced by packet size and protocol overhead, and a single threshold lacks the granularity to handle variable traffic patterns, potentially causing either under- or over-scaling.

929

MCQ | medium

A SaaS company runs a production API on an EC2 Auto Scaling group with steady demand 24/7. The team uses multiple instance types over time (they switch types during tuning) but the overall compute hours are stable. They want a cost reduction without committing to a specific instance type or size. Which AWS pricing option best meets the requirement?

A.Buy EC2 Spot Instances for the Auto Scaling group to maximize savings

B.Purchase a Compute Savings Plan for the region and commit to a dollar-per-hour amount

C.Purchase Reserved Instances that are limited to a single specific instance type in the Auto Scaling group

D.Use on-demand only, and rely on Auto Scaling to reduce cost during low utilization

AnswerB

A Compute Savings Plan reduces cost for steady compute usage and supports flexibility across instance families and sizes.

Why this answer

B is correct because a Compute Savings Plan keeps its discounted rates (up to 66% vs. On-Demand) when the team changes instance family, size, operating system, tenancy, or even AWS Region, and it also covers AWS Fargate and AWS Lambda usage. This matches the requirement of reducing costs without committing to a specific instance type or size, because the commitment is a dollar-per-hour amount rather than a particular instance configuration.

Exam trap

The trap here is that candidates often confuse Compute Savings Plans with Reserved Instances, assuming that any savings plan requires a specific instance type, but Compute Savings Plans offer full flexibility across instance families, sizes, and Regions.

How to eliminate wrong answers

Option A is wrong because Spot Instances can be interrupted with a 2-minute warning, making them unsuitable for a production API with steady demand 24/7 where availability and reliability are critical. Option C is wrong because Reserved Instances limited to a specific instance type (e.g., m5.large) contradict the requirement to avoid committing to a specific instance type or size. Option D is wrong because relying solely on On-Demand instances with Auto Scaling does not reduce cost here: demand is steady 24/7, so there is little idle capacity to scale in, and On-Demand is the highest-priced option, so no savings are achieved.
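To make the dollar-per-hour commitment concrete, here is a simplified sketch of how an hourly bill could be estimated under a Savings Plan. It assumes a single flat discount rate across all usage, which is an illustrative simplification (real discounts vary by instance family, Region, and term), and all figures are hypothetical.

```python
def hourly_cost(on_demand_usage: float, commitment: float, discount: float) -> float:
    """Estimate one hour's cost under a Savings Plan.

    on_demand_usage: what the hour's usage would cost at On-Demand rates ($).
    commitment:      committed spend per hour at Savings Plan rates ($).
    discount:        assumed flat discount vs. On-Demand (0.5 means 50%).
    """
    # The commitment buys this much On-Demand-equivalent usage.
    covered_on_demand = commitment / (1.0 - discount)
    # Usage beyond the covered amount is billed at On-Demand rates.
    uncovered = max(0.0, on_demand_usage - covered_on_demand)
    # The commitment is paid whether or not it is fully used.
    return commitment + uncovered
```

For example, `hourly_cost(10.0, 4.0, 0.5)` gives 6.0: the $4 commitment covers $8 of On-Demand-equivalent usage and the remaining $2 is billed at On-Demand rates. Because the commitment is paid even when usage drops, the option fits the steady 24/7 demand in the question.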


930

MCQmedium

You deploy a Web ACL with an AWS WAF rate-based rule intended to limit abusive traffic to your API. After the deployment, attackers still reach the backend service. ALB access logs show requests arrive at the ALB, but WAF logs indicate the Web ACL is not evaluating those requests. Which change most likely fixes the issue?

A.Associate the Web ACL with the Application Load Balancer resource ARN so WAF evaluates requests sent to that ALB.

B.Add a security group rule that drops inbound traffic from the attacker IP range at the instances' ENIs.

C.Create a target group stickiness policy so WAF can count requests consistently per client IP.

D.Enable AWS Shield Advanced but keep the Web ACL unattached because Shield automatically applies rate limiting.

AnswerA

For an ALB, the Web ACL must be associated with the load balancer resource itself. If it is not attached to the ALB, WAF will not inspect those requests.

Why this answer

The Web ACL must be explicitly associated with the ALB resource ARN for AWS WAF to evaluate incoming requests. Without this association, WAF does not inspect traffic, allowing attackers to bypass the rate-based rule and reach the backend service. Associating the Web ACL with the ALB ensures that all requests to the ALB are evaluated by the WAF rules before being forwarded to the target group.

Exam trap

The trap here is that candidates assume deploying a Web ACL automatically applies it to all associated resources, but in AWS WAF, you must explicitly associate the Web ACL with each resource (ALB, API Gateway, CloudFront) using the resource ARN.

How to eliminate wrong answers

Option B is wrong because security groups support only allow rules, so they cannot explicitly drop traffic from an IP range; they also operate at the network and transport layers (Layer 3/4) and cannot inspect application-layer requests or apply rate-based logic. In addition, instances behind an ALB receive connections from the load balancer's addresses rather than directly from the client. Option C is wrong because target group stickiness (sticky sessions) is a load-balancing feature that routes requests from the same client to the same target, but it does not integrate with AWS WAF or enable request counting for rate-based rules. Option D is wrong because AWS Shield Advanced is a DDoS protection service and does not apply rate-based rules on its own; the Web ACL must still be associated with the ALB for WAF rules to take effect.
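The association step can be sketched as follows. The function takes an already-constructed WAFV2 client (in real use, something like `boto3.client("wafv2")`, which is assumed here rather than imported) and relies on the `get_web_acl_for_resource` and `associate_web_acl` operations; the helper name and ARNs are hypothetical.

```python
def ensure_web_acl_associated(wafv2_client, web_acl_arn: str, resource_arn: str) -> bool:
    """Associate the Web ACL with the resource unless it is already attached.

    Returns True if an association call was made, False if nothing changed.
    """
    current = wafv2_client.get_web_acl_for_resource(ResourceArn=resource_arn)
    # When no Web ACL is attached, the response is assumed to carry no "WebACL" entry.
    attached_arn = current.get("WebACL", {}).get("ARN")
    if attached_arn == web_acl_arn:
        return False
    wafv2_client.associate_web_acl(WebACLArn=web_acl_arn, ResourceArn=resource_arn)
    return True
```

Checking the current association first makes the helper safe to run repeatedly from a deployment pipeline.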


931

MCQmedium

Developers for an e-learning platform need temporary elevated access to production resources for troubleshooting. The security team wants approvals, expiry, and audit logging. Which approach is best? The design must avoid adding custom operational scripts.

A.Disable CloudTrail during troubleshooting

B.Use IAM Identity Center permission sets with time-bound access processes and CloudTrail auditing

C.Attach AdministratorAccess permanently to every developer role

D.Create shared administrator access keys for the team

AnswerB

Federated access with permission sets and audited temporary assignments reduces standing privilege.

Why this answer

IAM Identity Center permission sets define the permissions and session duration granted to a user or group in an account. Assigning an elevated permission set only for the troubleshooting window, and removing the assignment afterward, gives time-bound access, and AWS CloudTrail logs the API calls made during the session, providing auditability. This approach meets the security team's requirements for approvals (via the permission set assignment process), expiry (via session duration settings and removal of the assignment), and audit logging without requiring custom scripts.

Exam trap

The trap here is that candidates may think IAM roles or temporary credentials require custom scripts to manage, but IAM Identity Center provides a fully managed, script-free solution for time-bound access with built-in audit logging via CloudTrail.

How to eliminate wrong answers

Option A is wrong because disabling CloudTrail during troubleshooting would remove all audit logging for that period, violating the security team's requirement for audit logging and making it impossible to track actions taken. Option C is wrong because permanently attaching AdministratorAccess to every developer role violates the principle of least privilege, grants excessive permissions at all times, and does not provide time-bound access or expiry. Option D is wrong because creating shared administrator access keys eliminates individual accountability, bypasses approval workflows, and prevents proper audit logging since actions cannot be attributed to a specific developer.


932

MCQmedium

A stateless web API runs on EC2 instances behind an Application Load Balancer (ALB). The Auto Scaling group (ASG) currently uses subnets from only one Availability Zone, even though the ALB spans two Availability Zones. During maintenance of that single AZ, the ALB remains up but clients see timeouts because there are no healthy targets. Which change most directly improves resilience against an AZ failure?

A.Keep the ASG in one subnet/AZ, but enable ALB stickiness to reduce session interruption.

B.Update the ASG to launch instances across subnets in at least two Availability Zones and ensure ALB health checks target an application-ready path.

C.Add a NAT gateway in the public subnets so instances can reach the internet during maintenance events.

D.Create a second ALB in the same Availability Zone and route traffic using DNS failover.

AnswerB

Spreading instances across multiple AZs ensures the ALB can route to healthy targets even when one AZ fails.

Why this answer

Option B is correct because it directly addresses the single point of failure: the ASG only launches instances in one AZ, so when that AZ fails, the ALB has no healthy targets to route traffic to, causing timeouts. By configuring the ASG to span at least two AZs, the ALB can distribute traffic to healthy instances in the remaining AZ during maintenance, ensuring high availability. The ALB health check must target an application-ready path (e.g., /health) to accurately detect instance health and avoid routing requests to impaired instances.

Exam trap

The trap here is that candidates may think ALB stickiness or DNS failover can compensate for a single-AZ deployment, but resilience to an AZ failure requires healthy targets in more than one AZ, and the ALB's health check must be application-aware to avoid routing to impaired instances.

How to eliminate wrong answers

Option A is wrong because enabling ALB stickiness (session affinity) does not solve the root cause; it only binds a client session to a specific target, but if all targets in the single AZ are unhealthy, stickiness cannot route traffic to a healthy instance and timeouts will still occur. Option C is wrong because adding a NAT gateway in public subnets provides outbound internet access for instances, which is irrelevant to the AZ failure scenario where the issue is the lack of healthy targets in the ALB's target group, not internet connectivity. Option D is wrong because creating a second ALB in the same single AZ does not eliminate the single point of failure; DNS failover would still route to an ALB that has no healthy targets if that AZ fails, and the architecture remains dependent on one AZ.


933

MCQmedium

A company hosts a financial reporting platform on EC2. Administrators must connect without opening SSH or RDP ports to the internet. What should the architect use?

A.A public Elastic IP address on each instance

B.A bastion host with SSH open to 0.0.0.0/0

C.AWS Systems Manager Session Manager with the required instance role

D.An internet gateway attached to the private subnet

AnswerC

Session Manager provides audited shell access without inbound SSH/RDP exposure.

Why this answer

AWS Systems Manager Session Manager allows secure shell access to EC2 instances without opening inbound ports (SSH 22 or RDP 3389) to the internet. It uses the AWS Systems Manager agent on the instance, combined with an IAM instance role that grants permissions to communicate with the Systems Manager API, establishing a bidirectional tunnel over HTTPS (port 443). This satisfies the requirement of no public-facing SSH or RDP ports while enabling administrative connectivity.

Exam trap

The trap here is that candidates often default to a bastion host (Option B) as the traditional solution, but the question explicitly prohibits opening SSH or RDP ports to the internet, and a bastion host still requires those ports open (even if restricted to a CIDR), which fails the requirement; Session Manager avoids any inbound port exposure entirely.

How to eliminate wrong answers

Option A is wrong because assigning a public Elastic IP address to each instance would expose them directly to the internet, requiring open SSH or RDP ports to connect, which violates the requirement. Option B is wrong because a bastion host with SSH open to 0.0.0.0/0 exposes the bastion itself to the entire internet, creating a single point of attack and still requiring open SSH ports, which does not meet the 'without opening SSH or RDP ports to the internet' constraint. Option D is wrong because an internet gateway attaches to a VPC rather than to a subnet, and routing a subnet to it makes that subnet public; it provides a path to and from the internet, not a way to administer instances without open inbound ports.
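A minimal sketch of the pieces involved, assuming the instance role trusts the EC2 service and carries the AWS managed policy `AmazonSSMManagedInstanceCore`. The `exposes_admin_port` helper and its rule format are hypothetical, included only to show the kind of inbound rule that Session Manager makes unnecessary.

```python
# Trust policy that lets EC2 assume the instance role.
EC2_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# AWS managed policy that grants the SSM Agent its core permissions.
SSM_CORE_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"

ADMIN_PORTS = {22, 3389}  # SSH and RDP

def exposes_admin_port(rule: dict) -> bool:
    """Return True if an inbound rule opens SSH or RDP to the whole internet.

    `rule` is a hypothetical simplified shape: from_port, to_port, cidr.
    """
    open_to_world = rule["cidr"] in ("0.0.0.0/0", "::/0")
    covers_admin = any(rule["from_port"] <= p <= rule["to_port"] for p in ADMIN_PORTS)
    return open_to_world and covers_admin
```

With Session Manager in place, a security group audit using a check like this should find no matching rules on the instances.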


934

Multi-Selectmedium

A company is designing a multi-Region disaster recovery (DR) strategy for a stateless web application running on Amazon EC2 instances behind an Application Load Balancer (ALB). The application uses an Amazon RDS for MySQL database as its data store. The architecture must provide rapid failover with the lowest possible Recovery Point Objective (RPO) and Recovery Time Objective (RTO). Which of the following design choices will help achieve these objectives? (Choose four.)

Select 4 answers

A.Configure an active-passive failover strategy by deploying the application stack in two AWS Regions and using Amazon Route 53 health checks with a failover routing policy.

B.Set up Amazon RDS Multi-AZ deployment to enable automatic failover to a standby replica in a different Availability Zone within the primary Region.

C.Use Amazon RDS cross-Region read replicas with automatic failover to promote a read replica to a primary instance in the secondary Region.

D.Deploy the application and ALB in an active-active configuration across two AWS Regions using Amazon Route 53 latency-based routing.

E.Store static assets and application state in Amazon S3 with cross-Region replication enabled, and serve them via Amazon CloudFront.

F.Use an Amazon RDS for MySQL single-AZ deployment in the primary Region and take daily snapshots copied to the secondary Region.

Why this answer

An active-passive failover strategy with a Route 53 failover routing policy is correct because it directs traffic to the secondary Region when health checks fail in the primary, which keeps RTO low. Cross-Region read replicas are correct because asynchronous replication keeps the secondary Region's copy close to current (low RPO, typically seconds of replica lag), and promoting the replica to a standalone primary is far faster than restoring from a snapshot; note that for RDS for MySQL the promotion is a step the team triggers or automates rather than a built-in automatic failover. An active-active configuration with latency-based routing is correct because both Regions already serve traffic, so losing one Region requires no standby to be brought up and RTO stays very low.

Storing static assets and application state in S3 with cross-Region replication and CloudFront is correct because it ensures data durability and low-latency access, supporting rapid recovery with minimal RPO.

Exam trap

The trap here is that candidates often confuse Multi-AZ (single-Region high availability) with cross-Region DR, or they assume daily snapshots provide adequate RPO for a DR strategy requiring the lowest possible RPO and RTO.


935

Multi-Selectmedium

A CPU-bound batch rendering service runs on EC2. The application is Linux-based, compatible with ARM64, and the team wants the best throughput per dollar without changing the workload's architecture. Which two instance-family choices should the team consider first? Select two.

Select 2 answers

A.A compute-optimized family, because it is designed for workloads that spend most of their time on CPU.

B.A Graviton-based family, because compatible ARM instances often provide better price performance for many compute workloads.

C.A memory-optimized family, because extra RAM always increases compute throughput.

D.A storage-optimized family, because local storage bandwidth is the main factor for rendering performance.

E.A burstable family, because CPU credits make sustained rendering faster during long runs.

AnswersA, B

Compute-optimized families are the first place to look for sustained CPU-heavy jobs. They allocate more of the instance's resources to processor performance rather than memory or storage.

Why this answer

Option A is correct because compute-optimized families (e.g., C5, C6g) are designed for workloads that spend most of their time on CPU, such as batch rendering. Option B is correct because Graviton-based instances (e.g., C6g, M6g) use ARM64 architecture, which is compatible with the workload and often delivers better price-performance for compute-intensive tasks, maximizing throughput per dollar without architectural changes.

Exam trap

The trap here is that candidates may confuse 'CPU-bound' with 'memory-bound' or 'storage-bound,' leading them to select memory-optimized or storage-optimized families, or they may mistakenly think burstable instances can sustain high CPU performance over long periods.


936

MCQhard

A media archive needs low-latency full-text search across product descriptions and filtered attributes. Which managed service is most suitable? The architecture review board prefers a managed AWS-native control.

A.AWS Config

B.Amazon OpenSearch Service

C.Amazon EFS

D.Amazon SQS

AnswerB

OpenSearch is designed for search and analytics over indexed text and structured fields.

Why this answer

Amazon OpenSearch Service is the correct choice because it is a managed, AWS-native service that provides low-latency full-text search and supports filtering on structured attributes. It is purpose-built for indexing and searching large volumes of text data, such as product descriptions, with sub-second response times. The architecture review board's preference for a managed AWS-native control is satisfied, as OpenSearch Service handles cluster management, scaling, and backups automatically.

Exam trap

The trap here is that candidates may confuse Amazon OpenSearch Service with a database or storage service, but it is specifically a search and analytics engine optimized for low-latency full-text queries, not for transactional storage or messaging.

How to eliminate wrong answers

Option A is wrong because AWS Config is a service for auditing and evaluating resource configurations against compliance rules, not for full-text search. Option C is wrong because Amazon EFS is a managed NFS file system for shared storage, not a search engine; it cannot perform full-text queries on file contents without additional software. Option D is wrong because Amazon SQS is a fully managed message queue for decoupling application components, not a search or indexing service.


937

MCQhard

A Lambda-based travel booking site has unpredictable traffic spikes and users see latency caused by cold starts. The function must respond consistently during expected campaign windows. What should be configured? The architecture review board prefers a managed AWS-native control.

A.Provisioned concurrency during campaign windows

B.A larger deployment package

C.CloudTrail data events

D.Reserved concurrency only

AnswerA

Provisioned concurrency keeps execution environments initialized and reduces cold-start latency.

Why this answer

Provisioned concurrency initializes a specified number of execution environments in advance, eliminating cold starts for those instances. During campaign windows, this ensures consistent latency by keeping the function warm and ready to handle spikes without the delay of initializing new environments. It is a managed AWS-native control that directly addresses the unpredictable traffic pattern described.

Exam trap

The trap here is confusing reserved concurrency (which limits scaling but does not prevent cold starts) with provisioned concurrency (which pre-warms instances to eliminate cold starts), leading candidates to choose reserved concurrency as a simpler but incorrect solution.

How to eliminate wrong answers

Option B is wrong because a larger deployment package increases cold start time due to longer download and initialization overhead, making latency worse, not better. Option C is wrong because CloudTrail data events record API activity for auditing and governance, not for managing function initialization or latency. Option D is wrong because reserved concurrency only sets aside, and caps, the number of concurrent executions available to a function; it does not pre-initialize execution environments, so cold starts still occur for new invocations.
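One way to apply provisioned concurrency only during campaign windows is through Application Auto Scaling scheduled actions. The sketch below only builds the request parameters (the kind passed to a `put_scheduled_action` call); the function name, action names, and cron expressions are hypothetical, it assumes the function alias has already been registered as a scalable target, and it assumes a capacity of 0 is acceptable outside the window.

```python
def provisioned_concurrency_schedule(function_name: str, alias: str,
                                     start_cron: str, end_cron: str,
                                     warm_count: int, idle_count: int = 0) -> list:
    """Build two scheduled actions: warm up at campaign start, release at the end."""
    common = {
        "ServiceNamespace": "lambda",
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
    }
    warm_up = {
        **common,
        "ScheduledActionName": "campaign-start",  # hypothetical name
        "Schedule": f"cron({start_cron})",
        "ScalableTargetAction": {"MinCapacity": warm_count, "MaxCapacity": warm_count},
    }
    release = {
        **common,
        "ScheduledActionName": "campaign-end",  # hypothetical name
        "Schedule": f"cron({end_cron})",
        "ScalableTargetAction": {"MinCapacity": idle_count, "MaxCapacity": idle_count},
    }
    return [warm_up, release]
```

Scheduling both actions keeps environments initialized only while the campaign runs, so the team does not pay for provisioned concurrency around the clock.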


938

Multi-Selectmedium

A company stores sensitive PDFs in Amazon S3 and serves them through CloudFront. Users must access PDFs only through CloudFront, and direct S3 URL requests must fail. Which three changes should be implemented? Select three.

Select 3 answers

A.Enable CloudFront Origin Access Control (OAC) for the S3 origin.

B.Turn on S3 Block Public Access for the bucket and account.

C.Add an S3 bucket policy that allows requests only from the CloudFront distribution using aws:SourceArn.

D.Enable S3 static website hosting on the bucket.

E.Make the object ACLs public so CloudFront can retrieve them.

AnswersA, B, C

OAC lets CloudFront sign origin requests so the S3 bucket can trust only that distribution.

Why this answer

Option A is correct because CloudFront Origin Access Control (OAC) is the modern, recommended method to restrict S3 bucket access exclusively to a CloudFront distribution. OAC has CloudFront sign its origin requests so that S3 can verify they come from CloudFront; together with a bucket policy that allows only that distribution and S3 Block Public Access, direct S3 URL requests are denied. OAC replaces the older Origin Access Identity (OAI) and adds support for server-side encryption with AWS KMS (SSE-KMS) and for S3 buckets in all AWS Regions, including opt-in Regions.

Exam trap

The trap here is that candidates often confuse OAC with OAI or think that enabling static website hosting is necessary for CloudFront integration, when in fact it creates an additional attack surface by exposing a direct S3 endpoint.
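The bucket policy from option C can be sketched as a small builder. The bucket name, account ID, and distribution ID passed in are placeholders, and the statement ID is an arbitrary label.

```python
import json

def oac_bucket_policy(bucket: str, account_id: str, distribution_id: str) -> dict:
    """Allow s3:GetObject only to CloudFront acting for one specific distribution."""
    distribution_arn = f"arn:aws:cloudfront::{account_id}:distribution/{distribution_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowCloudFrontServicePrincipalReadOnly",
            "Effect": "Allow",
            "Principal": {"Service": "cloudfront.amazonaws.com"},
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
            # Only requests made on behalf of this distribution match.
            "Condition": {"StringEquals": {"AWS:SourceArn": distribution_arn}},
        }],
    }

if __name__ == "__main__":
    print(json.dumps(oac_bucket_policy("example-pdfs", "111122223333", "EDFDVBD6EXAMPLE"), indent=2))
```

Because the policy contains no other Allow statement and Block Public Access stays on, a request sent straight to the S3 URL has nothing that grants it access.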


939

MCQhard

Based on the exhibit, a single EC2 instance hosts a latency-sensitive cache that performs sustained random reads and writes to persistent block storage. The current EBS volume is a general-purpose SSD, but BurstBalance is repeatedly depleted and p95 I/O latency has risen above 20 ms. The workload needs more than 16,000 sustained IOPS. Which change is the best fix?

A.Move the data to Amazon S3 so the instance can read and write objects directly.

B.Replace the volume with an io2 EBS volume and provision the required IOPS.

C.Keep gp2 and increase the instance size to a compute-optimized family.

D.Enable Amazon EFS with bursting throughput mode for the cache data.

AnswerB

io2 is designed for mission-critical workloads that need sustained, predictable, low-latency random I/O. Unlike gp2, it does not depend on burst credits for performance. Provisioning the required IOPS directly addresses the exhausted BurstBalance and the sustained throughput requirement above 16,000 IOPS.

Why this answer

The workload requires more than 16,000 sustained IOPS with low latency, and the gp2 volume's burst credits are exhausted, causing high latency. General Purpose SSD volumes (gp2 and gp3) are capped at 16,000 IOPS per volume, so no general-purpose volume can meet the requirement. An io2 (Block Express) volume can be provisioned with the exact IOPS needed (up to 256,000 IOPS on supported instance types) and delivers consistent low latency without relying on burst credits, making it the best fix for this latency-sensitive, sustained I/O workload.

Exam trap

The trap here is that candidates often assume increasing instance size (Option C) will improve EBS performance, but EBS IOPS and throughput are tied to the volume type and size, not the instance type (except for EBS-optimized bandwidth), so the gp2 burst credit exhaustion remains the root cause.

How to eliminate wrong answers

Option A is wrong because Amazon S3 is object storage accessed via HTTPS, not block storage, and introduces network latency and throughput limitations that are unsuitable for a latency-sensitive cache requiring sustained random reads/writes. Option C is wrong because increasing the instance size to a compute-optimized family does not change the gp2 volume's burst credit model; the volume will still deplete its burst balance and throttle to its baseline of 3 IOPS per GiB, which is itself capped at 16,000 IOPS per volume, failing to meet the >16,000 sustained IOPS requirement. Option D is wrong because Amazon EFS is a shared file system with NFS protocol overhead and its bursting throughput mode relies on burst credits that can be exhausted, leading to throttled throughput and higher latency, not suitable for sustained high IOPS block-level cache workloads.
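The gp2 limits can be checked with a few lines of arithmetic. This sketch uses the published gp2 baseline of 3 IOPS per GiB with a floor of 100 and a ceiling of 16,000 IOPS per volume; the function names are illustrative.

```python
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000

def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline (sustained) IOPS of a gp2 volume of the given size."""
    return min(GP2_MAX_IOPS, max(GP2_MIN_IOPS, GP2_IOPS_PER_GIB * size_gib))

def gp2_can_sustain(required_iops: int, size_gib: int) -> bool:
    """True if a gp2 volume of this size meets the requirement without burst credits."""
    return gp2_baseline_iops(size_gib) >= required_iops
```

Even at the maximum gp2 size of 16 TiB, `gp2_can_sustain(20_000, 16_384)` is False, which is why resizing the existing volume or the instance cannot satisfy a requirement above 16,000 sustained IOPS.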


940

Multi-Selecthard

A company is encrypting sensitive S3 data for an IoT ingestion API with AWS KMS. Which two controls help prevent accidental use of the KMS key by unauthorized principals? The design must avoid adding custom operational scripts.

Select 2 answers

A.IAM policies that grant kms:Decrypt only to required application roles

B.S3 Transfer Acceleration

C.A key policy that limits key administrators and key users

D.A larger KMS key rotation period

AnswersA, C

IAM permissions should grant least-privilege use of the KMS key to specific roles.

Why this answer

Option A is correct because IAM policies can explicitly grant kms:Decrypt only to specific application roles, ensuring that only authorized principals (e.g., the IoT ingestion service role) can use the KMS key for decryption. This prevents unauthorized principals from accidentally or maliciously decrypting S3 objects, as the policy restricts the action to required roles without needing custom scripts.

Exam trap

The trap here is that candidates often confuse operational features like Transfer Acceleration or key rotation settings with access control mechanisms, failing to recognize that only IAM and key policies directly govern who can use a KMS key.
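The two controls can be sketched as policy documents. The role ARNs and key ARN passed in are placeholders, and the list of administrative actions is an illustrative subset rather than a complete recommendation.

```python
def decrypt_only_iam_policy(key_arn: str) -> dict:
    """IAM policy for an application role: decrypt with one key and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": "kms:Decrypt", "Resource": key_arn}],
    }

def restricted_key_policy(admin_role_arn: str, user_role_arns: list) -> dict:
    """Key policy that separates key administrators from key users."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "KeyAdministrators",
                "Effect": "Allow",
                "Principal": {"AWS": admin_role_arn},
                # Management actions only; administrators cannot decrypt data.
                "Action": ["kms:Describe*", "kms:Enable*", "kms:Disable*",
                           "kms:Put*", "kms:Update*", "kms:ScheduleKeyDeletion"],
                "Resource": "*",
            },
            {
                "Sid": "KeyUsers",
                "Effect": "Allow",
                "Principal": {"AWS": list(user_role_arns)},
                "Action": ["kms:Decrypt", "kms:DescribeKey"],
                "Resource": "*",
            },
        ],
    }
```

Note that an IAM policy alone grants access to a key only when the key policy delegates to IAM (as the default key policy's account-root statement does); a key policy like the one sketched here grants use directly to the named roles.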


941

MCQmedium

Your organization uses IAM permission boundaries to prevent engineers from escalating privileges. An automated pipeline creates an IAM role for an application deployment and attaches a permission boundary. After deployment, the pipeline reports that the role could create a new KMS key. The permission boundary policy attached to the role allows only (for a specific KMS key ARN, prod-key):

- kms:Decrypt
- kms:DescribeKey

There is no Allow statement for:

- iam:CreateKey
- kms:CreateKey

What is the most likely reason the role was still able to create a KMS key?

A.The permission boundary was not actually attached to the role at creation time (for example, the pipeline bug attached a different boundary ARN or the attachment step failed).

B.Permission boundaries automatically grant all KMS permissions needed by applications, even when they are not listed in the boundary.

C.Because the boundary allows kms:DescribeKey for prod-key, kms:CreateKey must also be implicitly allowed.

D.SCPs always override permission boundaries, so the boundary is ignored in Organizations.

AnswerA

Permission boundaries only constrain the effective permissions when they are attached to the IAM principal. If the boundary attachment step fails (or attaches the wrong boundary/role), the role’s effective permissions come from its identity policy alone, which may include kms:CreateKey.

Why this answer

The most likely reason is that the permission boundary was not actually attached to the role at creation time. Permission boundaries define the maximum permissions a role can have; if the boundary is missing or not attached, the role's effective permissions are simply those granted by its attached IAM policies. Since the boundary does not allow kms:CreateKey, that action is implicitly denied whenever the boundary is in effect, so the only way the role could create a KMS key is if the boundary was not enforced, pointing to a pipeline bug or attachment failure.

Exam trap

The trap here is that candidates assume permission boundaries are always correctly attached and enforced, but the question tests the understanding that a missing or misattached boundary results in no restriction, allowing actions that the boundary was intended to block.

How to eliminate wrong answers

Option B is wrong because permission boundaries do not automatically grant any permissions; they only set a limit on what the role can do, and any action not explicitly allowed in the boundary is implicitly denied. Option C is wrong because allowing kms:DescribeKey for a specific key does not imply any other KMS actions; IAM permissions are explicit, not implicit, and kms:CreateKey is a separate action that must be explicitly allowed. Option D is wrong because SCPs (Service Control Policies) can further restrict permissions, but they do not override permission boundaries; both SCPs and permission boundaries are evaluated together, and the effective permission is the intersection of all applicable policies.
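The intersection logic can be illustrated with a deliberately simplified model that treats policies as plain sets of allowed actions. It ignores resources, conditions, explicit denies, and SCPs, and the action sets are hypothetical.

```python
from typing import Optional, Set

def effective_actions(identity_allowed: Set[str],
                      boundary_allowed: Optional[Set[str]]) -> Set[str]:
    """Actions a role can perform under this simplified model.

    With a boundary attached, an action must be allowed by both the identity
    policy and the boundary. With no boundary (None), the identity policy
    alone decides.
    """
    if boundary_allowed is None:
        return set(identity_allowed)
    return identity_allowed & boundary_allowed

identity = {"kms:Decrypt", "kms:DescribeKey", "kms:CreateKey"}  # hypothetical identity policy
boundary = {"kms:Decrypt", "kms:DescribeKey"}                   # boundary from the question
```

With the boundary in place, `kms:CreateKey` drops out of the effective set; with `None` (the failed-attachment case), it remains, which matches the behavior the pipeline reported.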


942

Multi-Selecthard

Multiple EC2 instances in different Availability Zones need concurrent read/write access to the same shared files. The files are actively modified by several application servers, and low-latency metadata operations matter more than extremely high aggregate throughput. Which two changes should the team make? Select two.

Select 2 answers

A.Use Amazon EFS instead of EBS or S3 for the shared file system.

B.Create EFS mount targets in every Availability Zone that hosts application instances.

C.Use a single EBS Multi-Attach volume mounted read/write by all instances across AZs.

D.Store the files in S3 and mount them directly through the console as a shared network filesystem.

E.Place the files on instance store volumes so each server has faster local access.

AnswersA, B

Amazon EFS is the managed AWS file service built for shared POSIX-style file access from multiple instances. It supports concurrent read/write access from many EC2 hosts and is a better fit than EBS, which is attached to a single instance, or S3, which provides object storage rather than a native shared filesystem. For an application that expects standard filesystem semantics, EFS is the correct storage layer.

Why this answer

Amazon EFS provides a fully managed, POSIX-compliant, shared file system that can be mounted concurrently by multiple EC2 instances across different Availability Zones (AZs). It supports concurrent read/write access with strong consistency, and its metadata operations are optimized for low latency, making it ideal for workloads where many application servers actively modify the same files. EBS cannot be shared across AZs, and S3 lacks POSIX semantics and low-latency metadata operations.

Exam trap

The trap here is that candidates often confuse EBS Multi-Attach with a cross-AZ shared storage solution, but Multi-Attach is strictly limited to a single AZ and a small number of instances, while EFS natively spans AZs and supports concurrent read/write access through a mount target in each AZ.


943

MCQhard

An EC2 instance in a private subnet must access an S3 bucket that contains regulated exports for a financial reporting platform. The security team requires access to be allowed only when traffic comes through a specific VPC endpoint. What should the architect add to the bucket policy? The design must avoid adding custom operational scripts.

A.A security group rule that allows HTTPS to S3

B.A condition that matches aws:RequestedRegion to the bucket Region

C.A deny statement for all IAM users except the EC2 role

D.A condition that matches aws:sourceVpce to the endpoint ID

AnswerD

The aws:sourceVpce condition restricts S3 access to requests that arrive through the specified VPC endpoint.

Why this answer

Option D is correct because the bucket policy can use the `aws:sourceVpce` condition key to restrict access to requests originating from a specific VPC endpoint. This ensures that only traffic routed through that endpoint can access the S3 bucket, meeting the security team's requirement without custom scripts.

Exam trap

The trap here is that candidates may confuse security group rules (which control instance-level traffic) with bucket policy conditions (which control access to the S3 service), leading them to pick Option A instead of the correct VPC endpoint condition.

How to eliminate wrong answers

Option A is wrong because security group rules are applied at the instance level, not the bucket policy level, and cannot restrict access based on the VPC endpoint used. Option B is wrong because `aws:RequestedRegion` checks the region of the request, not the network path or endpoint, so it does not enforce that traffic comes through a specific VPC endpoint. Option C is wrong because denying all IAM users except the EC2 role does not control the network path; the EC2 role could still access S3 via the internet or a different endpoint, violating the requirement.
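A common way to express the endpoint restriction is a Deny statement that applies whenever the request does not arrive through the expected endpoint. The sketch below builds such a statement; the bucket name and endpoint ID passed in are placeholders.

```python
def vpce_only_bucket_policy(bucket: str, vpce_id: str) -> dict:
    """Deny all S3 actions on the bucket unless the request uses the given VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyUnlessFromVpce",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            # Matches every request whose source endpoint is not the expected one.
            "Condition": {"StringNotEquals": {"aws:sourceVpce": vpce_id}},
        }],
    }
```

Because the statement is a Deny, it also blocks console and administrative access that does not come through the endpoint, so it is worth testing on a non-production bucket first.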


944

MCQhard

Based on the exhibit, a workload in private subnets must reach only Amazon S3 and AWS Secrets Manager. The team wants to eliminate internet exposure for those calls and reduce NAT gateway charges. What change should be made?

A.Move the instances into a public subnet and restrict inbound access with security groups.

B.Add a NAT instance and disable the managed NAT gateway to lower cost.

C.Create an S3 gateway endpoint and a Secrets Manager interface endpoint with private DNS, then remove NAT dependency for those service calls.

D.Use VPC peering to a shared services VPC and route all AWS service traffic through that VPC.

AnswerC

S3 is best reached through a gateway VPC endpoint, while Secrets Manager requires an interface endpoint. With private DNS enabled, the application can resolve and reach those services without leaving AWS private networking. This removes the need for NAT traffic for those calls, cuts cost, and keeps service access off the public internet.

Why this answer

Option C is correct because VPC Gateway Endpoints for S3 and VPC Interface Endpoints for Secrets Manager allow private subnet instances to access these services over the AWS network without traversing the internet or a NAT gateway. Enabling private DNS on the interface endpoint ensures that standard DNS names resolve to private IPs, eliminating the need for NAT and reducing costs.
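The difference between the two endpoint types can be sketched as the parameter sets one might pass to the EC2 CreateVpcEndpoint API. The helper names, the Region, and all resource IDs below are assumptions for illustration; nothing here calls AWS.

```python
REGION = "us-east-1"  # assumed Region


def s3_gateway_endpoint(vpc_id, route_table_ids, region=REGION):
    """Gateway endpoints are attached to route tables, not subnets."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
    }


def secrets_manager_interface_endpoint(vpc_id, subnet_ids, security_group_ids,
                                       region=REGION):
    """Interface endpoints place network interfaces in subnets."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.secretsmanager",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Lets the standard service DNS name resolve to private IPs.
        "PrivateDnsEnabled": True,
    }


# Hypothetical resource IDs.
gateway = s3_gateway_endpoint("vpc-0abc", ["rtb-0private"])
interface = secrets_manager_interface_endpoint(
    "vpc-0abc", ["subnet-0a", "subnet-0b"], ["sg-0endpoint"]
)
print(gateway)
print(interface)
```

With an SDK such as boto3, each dictionary could be passed as keyword arguments to `create_vpc_endpoint`.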

Exam trap

The trap here is that candidates may think NAT gateways are required for all AWS service access from private subnets, not realizing that VPC endpoints provide direct, private connectivity without internet exposure.

How to eliminate wrong answers

Option A is wrong because moving instances to a public subnet would expose them to the internet, violating the requirement to eliminate internet exposure. Option B is wrong because a NAT instance still requires internet access and incurs management overhead, failing to eliminate internet exposure and not reducing costs effectively compared to endpoints. Option D is wrong because VPC peering to a shared services VPC does not inherently provide private access to S3 or Secrets Manager without additional endpoints or NAT, and it adds complexity and potential routing issues.

945

MCQmedium

A batch analytics job runs for several hours each night and can be interrupted and restarted. Which EC2 purchasing option should minimize cost?

A.On-Demand Instances only

B.Dedicated Hosts

C.Spot Instances

D.Provisioned IOPS volumes

AnswerC

Spot Instances offer deep discounts for interruptible workloads.

Why this answer

Spot Instances are the correct choice because they offer significant cost savings (up to 90% compared to On-Demand) and are ideal for fault-tolerant, interruptible workloads like batch processing. Since the job can be interrupted and restarted, it can handle Spot Instance terminations gracefully, making this the most cost-effective option.

Exam trap

The trap here is that candidates may choose On-Demand Instances thinking they need guaranteed uptime, overlooking the fact that the workload is explicitly described as interruptible and restartable, which makes Spot Instances the optimal cost-saving choice.

How to eliminate wrong answers

Option A is wrong because On-Demand Instances provide no interruption but are priced higher, which is unnecessary for a workload that can tolerate interruptions. Option B is wrong because Dedicated Hosts are designed for licensing or compliance requirements and are billed per host, making them far more expensive and unsuitable for cost minimization. Option D is wrong because Provisioned IOPS volumes (EBS) relate to storage performance, not compute pricing, and do not address the cost of EC2 instances.

946

MCQeasy

A company serves mostly static images and JavaScript files from an origin in one AWS Region. They want to reduce origin load and improve global performance. Which change most directly increases cache-hit ratio for static assets while avoiding stale content?

A.Set Cache-Control headers on the origin to always be no-cache so clients revalidate frequently.

B.Use versioned file names (e.g., app.abc123.js) and configure a long TTL with appropriate revalidation behavior.

C.Disable query string forwarding so all URLs without query strings share one cached object even when content differs.

D.Forward all headers, including cookies, to maximize personalization in edge cached responses.

AnswerB

Versioned assets allow long caching with confidence, while new filenames trigger updates when code changes.

Why this answer

Option B is correct because using versioned file names (e.g., app.abc123.js) allows you to set a long Cache-Control max-age TTL (e.g., one year) without risking stale content. When the file changes, the new version gets a new URL, so clients and edge caches immediately fetch the fresh object, maximizing cache hits for unchanged assets while avoiding stale content.
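A minimal sketch of the versioned-filename pattern, assuming a content hash is used as the version token (the question's `app.abc123.js` does not say how the token is derived). The function names and the exact header value are illustrative choices.

```python
import hashlib
from pathlib import PurePosixPath

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def versioned_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the extension: app.js -> app.<hash>.js."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


def long_lived_cache_headers() -> dict:
    """Headers for versioned assets: safe to cache for a long time."""
    return {"Cache-Control": f"public, max-age={ONE_YEAR_SECONDS}, immutable"}


old_name = versioned_name("js/app.js", b"console.log('v1');")
new_name = versioned_name("js/app.js", b"console.log('v2');")
print(old_name, new_name, long_lived_cache_headers())
```

Because any change to the content changes the hash, a new release produces a new URL, so the long TTL never serves stale code.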

Exam trap

The trap here is that candidates often confuse 'no-cache' with 'no-store' or think that disabling query strings universally improves caching, but they fail to recognize that versioned filenames with long TTLs are the standard pattern for maximizing cache hits while ensuring content freshness.

How to eliminate wrong answers

Option A is wrong because setting Cache-Control: no-cache forces clients to revalidate with the origin on every request, which increases origin load and reduces cache-hit ratio, directly contradicting the goal. Option C is wrong because disabling query string forwarding can cause different content to be served from the same cached object if the URL path is identical but query parameters differentiate the content, leading to stale or incorrect responses. Option D is wrong because forwarding all headers, including cookies, reduces cache-hit ratio by creating many unique cache keys for the same asset, defeating the purpose of caching static content.

947

MCQmedium

A batch analytics job runs for several hours each night and can be interrupted and restarted. Which EC2 purchasing option should minimize cost? The design must avoid adding custom operational scripts.

A.On-Demand Instances only

B.Dedicated Hosts

C.Spot Instances

D.Provisioned IOPS volumes

AnswerC

Spot Instances offer deep discounts for interruptible workloads.

Why this answer

Spot Instances are ideal for fault-tolerant, interruptible batch workloads because they offer significant cost savings (up to 90% off On-Demand pricing) by using spare EC2 capacity. Since the job can be interrupted and restarted, it can handle Spot Instance reclamations without custom operational scripts: AWS issues the interruption notice and terminates the instance automatically, and the job's restart logic can live in the application or orchestration layer (e.g., AWS Batch).

Exam trap

The trap here is that candidates may confuse Spot Instances with On-Demand Instances for cost savings, or incorrectly assume that Spot Instances require custom scripting to handle interruptions, when in fact AWS provides built-in mechanisms (e.g., lifecycle hooks, rebalance notifications) that can be leveraged without custom scripts.

How to eliminate wrong answers

Option A is wrong because On-Demand Instances provide no cost savings for interruptible workloads; they are priced at the standard rate and are intended for steady-state or unpredictable workloads that cannot tolerate interruptions. Option B is wrong because Dedicated Hosts are a physical server dedicated to your use, which is significantly more expensive and unnecessary for a batch job that can tolerate interruptions; they are used for licensing or compliance requirements, not cost optimization. Option D is wrong because Provisioned IOPS volumes (EBS) are a storage type, not an EC2 purchasing option; they affect storage performance and cost but do not address compute cost optimization for interruptible workloads.

948

MCQeasy

A team wants to run containerized services with AWS-managed orchestration and autoscaling. They do NOT require Kubernetes compatibility. Which AWS service choice is most appropriate to meet these goals?

A.Amazon EKS

B.Amazon ECS

C.An EC2 Auto Scaling group only

D.Amazon SQS as the compute layer

AnswerB

Amazon ECS is an AWS-native container orchestration service. It runs containers without Kubernetes and integrates with AWS-native autoscaling (for example, ECS Service Auto Scaling driven by CPU, memory, or request-based metrics).

Why this answer

Amazon ECS is the most appropriate choice because it provides AWS-managed container orchestration and autoscaling without requiring Kubernetes compatibility. ECS integrates natively with AWS services like Application Auto Scaling and CloudWatch to automatically scale container tasks based on metrics such as CPU or memory utilization, meeting the team's requirements directly.

Exam trap

The trap here is that candidates often confuse Amazon ECS with Amazon EKS, assuming that Kubernetes compatibility is required for container orchestration, but ECS provides a simpler, AWS-native alternative without Kubernetes overhead.

How to eliminate wrong answers

Option A is wrong because Amazon EKS is a managed Kubernetes service that requires Kubernetes compatibility, which the team explicitly does not need, adding unnecessary complexity and overhead. Option C is wrong because an EC2 Auto Scaling group only manages EC2 instances, not container orchestration or scheduling, so it cannot run containerized services directly without additional container management software. Option D is wrong because Amazon SQS is a message queuing service, not a compute layer; it cannot run containers or provide orchestration or autoscaling for containerized workloads.

949

Multi-Selecthard

The web tier of an online scheduling app runs on an Auto Scaling group behind an ALB. Traffic spikes every weekday at 13:00 when a corporate newsletter is sent. CloudWatch shows CPU averages 18% outside that window, and the current fleet uses larger instances than the load test requires. The application is stateless and can scale out in a few minutes. Which two changes should the architect recommend? Select two.

Select 2 answers

A.Use scheduled scaling to raise desired capacity before the known newsletter window and lower it afterward.

B.Reduce the instance size to the smallest tested type that still meets peak load.

C.Keep the current oversized instances to avoid any scaling activity.

D.Replace the Auto Scaling group with Spot Instances only.

E.Disable ALB health checks to save a small amount of traffic.

AnswersA, B

Scheduled scaling eliminates unnecessary baseline capacity during predictable low-demand periods and ensures extra instances are ready before the spike.

Why this answer

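Option A is correct because the traffic spike is predictable (every weekday at 13:00), making scheduled scaling the most cost-effective and reliable approach. Scheduled scaling allows you to increase the desired capacity of the Auto Scaling group before the newsletter window and decrease it afterward, ensuring the application can handle the load without relying on dynamic scaling policies that might lag behind the sudden spike. This avoids over-provisioning during non-peak hours while guaranteeing capacity exactly when needed. Option B is correct because CPU averages only 18% outside the spike and the current instances are larger than the load test requires, so the fleet is over-provisioned. Moving to the smallest tested instance type that still meets peak load lowers the cost of every running instance, and because the application is stateless it can absorb demand by scaling out instead.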
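The scheduled actions could look roughly like the sketch below, which builds parameter sets shaped like the Auto Scaling PutScheduledUpdateGroupAction request. The group name, capacities, lead time (12:45), scale-in time (15:00), and time zone are all assumptions; the question states only that the spike begins at 13:00 on weekdays.

```python
def newsletter_scaling_actions(asg_name, peak_capacity, baseline_capacity,
                               time_zone="UTC"):
    """Return a scale-out action before the spike and a scale-in action after it."""
    if peak_capacity < baseline_capacity:
        raise ValueError("peak capacity must be at least the baseline")
    return [
        {
            "AutoScalingGroupName": asg_name,
            "ScheduledActionName": "newsletter-scale-out",
            # Cron fields: minute hour day-of-month month day-of-week.
            "Recurrence": "45 12 * * 1-5",  # 12:45 Monday to Friday (assumed lead time)
            "TimeZone": time_zone,
            "DesiredCapacity": peak_capacity,
        },
        {
            "AutoScalingGroupName": asg_name,
            "ScheduledActionName": "newsletter-scale-in",
            "Recurrence": "0 15 * * 1-5",  # 15:00 Monday to Friday (assumed window end)
            "TimeZone": time_zone,
            "DesiredCapacity": baseline_capacity,
        },
    ]


# Hypothetical group name and capacities.
actions = newsletter_scaling_actions("web-tier-asg", peak_capacity=8,
                                     baseline_capacity=2)
for action in actions:
    print(action)
```

The desired capacities must fall within the group's configured minimum and maximum sizes, which this sketch does not check.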

Exam trap

The trap here is that candidates often assume dynamic scaling (e.g., step scaling or target tracking) is always the best choice, but for predictable, recurring traffic patterns, scheduled scaling is more efficient because it proactively adds capacity before the load arrives, avoiding the latency of scaling in response to metrics.

950

MCQeasy

A retail API uses EC2 instances behind an ALB. CPU is consistently high during peak traffic, and request latency rises. What should be configured? The design must avoid adding custom operational scripts.

A.Auto Scaling policy based on an appropriate CloudWatch metric

B.S3 Object Lock

C.A VPC endpoint for CloudWatch only

D.Disable health checks

AnswerA

Auto Scaling adds capacity when load increases and removes it when load falls.

Why this answer

An Auto Scaling policy based on a CloudWatch metric like CPUUtilization or ALB TargetResponseTime can dynamically add or remove EC2 instances to match demand. This directly addresses the high CPU and rising latency during peak traffic without requiring custom scripts, as the scaling actions are fully managed by AWS. The ALB distributes traffic across the scaled instances, reducing per-instance load and improving response times.
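One common form of such a policy is target tracking on average CPU. The sketch below builds parameters shaped like the Auto Scaling PutScalingPolicy request; the group name, policy name, and 50% target are illustrative assumptions, not values from the question.

```python
def cpu_target_tracking_policy(asg_name: str, target_cpu_percent: float = 50.0) -> dict:
    """Build a target tracking policy that holds average CPU near the target."""
    if not 0 < target_cpu_percent <= 100:
        raise ValueError("target must be a percentage in (0, 100]")
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": "cpu-target-tracking",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": float(target_cpu_percent),
        },
    }


policy_params = cpu_target_tracking_policy("retail-api-asg", 50)
print(policy_params)
```

With target tracking, the service creates and manages the underlying CloudWatch alarms, which is why no custom scripts are needed.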

Exam trap

The trap here is that candidates may confuse VPC endpoints (which enable private connectivity) with actual scaling mechanisms, or assume that disabling health checks is a quick fix for latency, when in fact it degrades reliability and does not address the underlying capacity issue.

How to eliminate wrong answers

Option B is wrong because S3 Object Lock is a data protection feature for S3 objects (preventing deletion/overwrite) and has no role in scaling compute resources or reducing latency for an API behind an ALB. Option C is wrong because a VPC endpoint for CloudWatch only enables private connectivity to CloudWatch APIs (e.g., for publishing metrics or logs) but does not automatically trigger scaling or resolve CPU/latency issues; scaling still requires an Auto Scaling policy. Option D is wrong because disabling health checks would cause the ALB to route traffic to unhealthy instances, worsening latency and availability, and it does not address the root cause of high CPU.

951

MCQmedium

A telemetry pipeline uses RDS MySQL and receives many read-only reporting queries that slow down the primary database. What should the architect add? The design must avoid adding custom operational scripts.

A.Multi-AZ standby and route reads to the standby

B.RDS read replica and route reporting queries to it

C.S3 lifecycle policy

D.A larger NAT gateway

AnswerB

Read replicas offload read traffic from the primary instance.

Why this answer

RDS Read Replicas are designed specifically to offload read-heavy workloads from the primary database. By creating a read replica and routing reporting queries to it, the architect reduces load on the primary MySQL instance without custom scripts. This is a native, managed feature of RDS that supports asynchronous replication.
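Routing reporting queries to the replica is typically a matter of pointing the reporting client at the replica's endpoint. The sketch below shows one simplistic way an application might choose an endpoint per statement; both host names are hypothetical, and the keyword check is a deliberate simplification (it would misroute statements such as `SELECT ... FOR UPDATE`).

```python
# Hypothetical endpoints; real RDS endpoints are shown in the console or API.
PRIMARY_ENDPOINT = "telemetry-primary.example.internal"
REPLICA_ENDPOINT = "telemetry-replica.example.internal"

READ_ONLY_KEYWORDS = ("select", "show", "explain")


def endpoint_for(sql: str) -> str:
    """Send read-only statements to the replica and everything else to the primary."""
    words = sql.split(None, 1)
    first_word = words[0].lower() if words else ""
    if first_word in READ_ONLY_KEYWORDS:
        return REPLICA_ENDPOINT
    return PRIMARY_ENDPOINT


print(endpoint_for("SELECT device_id, AVG(temp) FROM readings GROUP BY device_id"))
print(endpoint_for("INSERT INTO readings (device_id, temp) VALUES (1, 20.5)"))
```

Because replication is asynchronous, a report served by the replica may lag slightly behind the primary, which is usually acceptable for reporting workloads.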

Exam trap

The trap here is confusing Multi-AZ standby (which is for failover only and cannot serve reads) with a read replica (which is for read scaling and can serve queries).

How to eliminate wrong answers

Option A is wrong because a Multi-AZ standby is a synchronous replica used for high availability and disaster recovery, not for read traffic; it cannot serve read queries directly. Option C is wrong because an S3 lifecycle policy manages object storage transitions and expiration, not database query routing or read offloading. Option D is wrong because a larger NAT gateway increases outbound internet capacity for private subnets but does not address database read performance or query distribution.

952

MCQmedium

A company stores application logs in an S3 bucket. They retain logs for 180 days. Compliance requires that the logs be immutable once written, but the business only reviews logs about once per month. Currently, the team stores everything in S3 Standard, and their monthly S3 bill is too high. They want to reduce storage cost without changing the requirement to keep logs for 180 days. Which lifecycle approach best meets the goal?

A.Use a lifecycle policy to transition objects older than 30 days to S3 Standard-IA, and keep them there until day 180.

B.Use a lifecycle policy to transition objects older than 30 days to S3 Glacier Deep Archive and delete after 30 days.

C.Use a lifecycle policy to transition objects older than 30 days to S3 Intelligent-Tiering with no minimum storage duration.

D.Disable lifecycle management and instead lower costs by deleting objects immediately after they are written.

AnswerA

Logs accessed about monthly match Standard-IA economics and still provide fast retrieval.

Why this answer

Option A is correct because it transitions logs older than 30 days to S3 Standard-IA, which offers lower storage costs than S3 Standard while still providing low-latency access for monthly reviews. The lifecycle policy keeps the objects in S3 Standard-IA until day 180, meeting the 180-day retention requirement without paying S3 Standard rates for the entire period. Objects must be at least 30 days old before a lifecycle rule can move them from S3 Standard to S3 Standard-IA, which the 30-day threshold satisfies, and the roughly 150 days they then spend in S3 Standard-IA exceed that class's 30-day minimum storage duration. Lifecycle transitions change only the storage class, so immutability enforced through S3 Object Lock is unaffected.
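A lifecycle configuration along these lines might look like the sketch below, shaped like the S3 PutBucketLifecycleConfiguration request. The rule ID and `logs/` prefix are hypothetical, and expiring objects at day 180 is an assumption: the question says only that logs are retained for 180 days.

```python
TRANSITION_DAY = 30
RETENTION_DAYS = 180


def log_lifecycle_configuration(prefix: str = "logs/") -> dict:
    """Move logs to Standard-IA at day 30 and expire them at day 180."""
    return {
        "Rules": [
            {
                "ID": "logs-to-standard-ia-then-expire",  # hypothetical rule ID
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": TRANSITION_DAY, "StorageClass": "STANDARD_IA"}
                ],
                # Assumption: logs are deleted once retention ends.
                "Expiration": {"Days": RETENTION_DAYS},
            }
        ]
    }


config = log_lifecycle_configuration()
days_in_standard = TRANSITION_DAY
days_in_standard_ia = RETENTION_DAYS - TRANSITION_DAY
print(config)
print(days_in_standard, days_in_standard_ia)
```

Each object spends 30 days in S3 Standard and the remaining 150 days in S3 Standard-IA, so most of the retention period is billed at the lower rate.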

Exam trap

The trap here is that candidates may choose S3 Intelligent-Tiering (Option C) thinking it automatically optimizes cost for all access patterns, but for logs accessed only once per month, S3 Standard-IA is more cost-effective because Intelligent-Tiering incurs monitoring and automation overhead and may not move objects to the cheapest tier quickly enough for this specific use case.

How to eliminate wrong answers

Option B is wrong because transitioning objects to S3 Glacier Deep Archive after 30 days and deleting them after 30 days would remove the logs long before day 180, violating the 180-day retention requirement. Option C is wrong because S3 Intelligent-Tiering adds a per-object monitoring and automation charge and only moves an object to a lower-cost tier after it has gone 30 consecutive days without access; for logs with a known, roughly monthly access pattern, a direct transition to S3 Standard-IA is simpler and more cost-effective. Option D is wrong because deleting objects immediately after they are written violates the 180-day retention requirement and eliminates the logs entirely, which fails compliance.

953

MCQhard

A Lambda-based travel booking site has unpredictable traffic spikes and users see latency caused by cold starts. The function must respond consistently during expected campaign windows. What should be configured? The design must avoid adding custom operational scripts.

A.Provisioned concurrency during campaign windows

B.A larger deployment package

C.CloudTrail data events

D.Reserved concurrency only

AnswerA

Provisioned concurrency keeps execution environments initialized and reduces cold-start latency.

Why this answer

Provisioned concurrency keeps a specified number of Lambda execution environments initialized and ready to respond immediately, which avoids cold starts for requests served by those environments. By enabling it only during campaign windows, you ensure consistent latency for the travel booking site during traffic spikes without paying for pre-initialized environments during off-peak periods. This also satisfies the requirement to avoid custom scripts, because it is a native AWS feature configured via the Lambda API or console.
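One way to limit provisioned concurrency to campaign windows is with Application Auto Scaling scheduled actions. The sketch below builds parameters shaped like the PutScheduledAction request; the function name, alias, dates, and concurrency value are hypothetical, and the function alias is assumed to be registered as a scalable target already.

```python
from datetime import datetime


def campaign_window_actions(function_name, alias, start, end, concurrency):
    """Return scheduled actions that raise provisioned concurrency at start
    and drop it back to zero at end."""
    if end <= start:
        raise ValueError("campaign end must be after its start")
    common = {
        "ServiceNamespace": "lambda",
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
    }

    def action(name, when, capacity):
        return {
            **common,
            "ScheduledActionName": name,
            # One-time schedule expressed as at(yyyy-mm-ddThh:mm:ss), in UTC.
            "Schedule": f"at({when.strftime('%Y-%m-%dT%H:%M:%S')})",
            "ScalableTargetAction": {
                "MinCapacity": capacity,
                "MaxCapacity": capacity,
            },
        }

    return [
        action("campaign-start", start, concurrency),
        action("campaign-end", end, 0),
    ]


# Hypothetical campaign window and capacity.
window = campaign_window_actions(
    "booking-api", "live",
    start=datetime(2030, 1, 15, 8, 0, 0),
    end=datetime(2030, 1, 15, 20, 0, 0),
    concurrency=50,
)
for item in window:
    print(item)
```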

Exam trap

The trap here is that candidates confuse reserved concurrency (which caps concurrent executions) with provisioned concurrency (which pre-warms instances), leading them to choose reserved concurrency alone, which does not address cold starts.

How to eliminate wrong answers

Option B is wrong because a larger deployment package increases the time to download and initialize the function code, which actually worsens cold start latency rather than solving it. Option C is wrong because CloudTrail data events record API activity for auditing and governance, not for managing Lambda concurrency or cold starts. Option D is wrong because reserved concurrency only sets a maximum number of concurrent executions for a function to prevent it from consuming all available concurrency in the account; it does not pre-warm instances or reduce cold starts.

954

MCQmedium

An inventory service uses Lambda functions that call an unreliable third-party API. Failed events must be retained for later investigation after retries are exhausted. What should be configured? The design must avoid adding custom operational scripts.

A.Lambda reserved concurrency set to zero

B.A Lambda dead-letter queue or failure destination

C.A larger deployment package

D.CloudFront error pages

AnswerB

A DLQ or asynchronous failure destination captures failed events after retry attempts.

Why this answer

A Lambda dead-letter queue (DLQ) or failure destination captures events from asynchronous invocations that have exhausted all retry attempts. When the function still fails after its retries (by default, two retries after the initial attempt), the event is sent to the configured target for later investigation, without custom scripts or manual polling. A DLQ can be an SQS queue or an SNS topic; a failure destination can also be another Lambda function or an EventBridge event bus.
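A failure destination is set through the function's asynchronous invocation configuration. The sketch below builds parameters shaped like the Lambda PutFunctionEventInvokeConfig request; the function name and queue ARN are hypothetical, and the limits checked are the ranges that API accepts.

```python
def failure_handling_config(function_name, failure_queue_arn,
                            max_retries=2, max_event_age_seconds=3600):
    """Build an async invoke config that sends exhausted events to a queue."""
    if max_retries not in (0, 1, 2):
        raise ValueError("asynchronous retries must be 0, 1, or 2")
    if not 60 <= max_event_age_seconds <= 21600:
        raise ValueError("event age must be between 60 and 21600 seconds")
    return {
        "FunctionName": function_name,
        "MaximumRetryAttempts": max_retries,
        "MaximumEventAgeInSeconds": max_event_age_seconds,
        "DestinationConfig": {
            "OnFailure": {"Destination": failure_queue_arn},
        },
    }


# Hypothetical function name and SQS queue ARN.
invoke_config = failure_handling_config(
    "inventory-sync",
    "arn:aws:sqs:us-east-1:123456789012:inventory-failures",
)
print(invoke_config)
```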

Exam trap

The trap here is that candidates may confuse Lambda's DLQ/failure destination with other error-handling mechanisms like SQS redrive policies or CloudFront custom error pages, which serve different purposes and operate at different layers of the architecture.

How to eliminate wrong answers

Option A is wrong because setting reserved concurrency to zero would completely disable the Lambda function, preventing any invocations and thus failing to process or retain any events. Option C is wrong because a larger deployment package does not affect error handling or event retention; it only increases cold start latency and deployment size. Option D is wrong because CloudFront error pages are for HTTP-level errors from a web distribution, not for capturing asynchronous Lambda invocation failures or dead-letter events.

955

MCQeasy

An order system receives events and uses a Lambda function to write each order into a database. During traffic spikes, the database sometimes throttles, and Lambda retries lead to occasional message loss in the event flow. The team wants buffering, automatic retries, and a way to isolate messages that repeatedly fail so they can be inspected later. What design change best meets this need?

A.Send events directly from EventBridge to Lambda without any queue to simplify the flow.

B.Use Amazon SQS as a buffer between the event source and Lambda, with an SQS dead-letter queue (DLQ).

C.Use SNS fan-out to multiple Lambda functions, but keep no retry logic and no DLQ.

D.Store events in an S3 bucket and trigger Lambda immediately after each upload, without using DLQs.

AnswerB

SQS buffers bursts, supports retries via visibility timeouts, and DLQs capture messages that fail repeatedly for later review.

Why this answer

B is correct because Amazon SQS acts as a durable buffer between the event source and Lambda, absorbing traffic spikes and providing automatic retries via its visibility timeout mechanism. By attaching a dead-letter queue (DLQ) to the SQS queue, messages that repeatedly fail processing can be isolated for later inspection, preventing data loss and enabling debugging.
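The buffering, retry, and isolation behavior can be illustrated with a small in-memory simulation. This is not SQS code: the queue, the `max_receive_count` parameter (modeled on the redrive policy's `maxReceiveCount`), and the order names are stand-ins chosen for the example.

```python
from collections import deque


def drain(messages, handler, max_receive_count=3):
    """Process messages; those that fail max_receive_count times go to the DLQ."""
    queue = deque((message, 0) for message in messages)
    processed, dead_letter = [], []
    while queue:
        message, receives = queue.popleft()
        receives += 1
        try:
            handler(message)
            processed.append(message)
        except Exception:
            if receives >= max_receive_count:
                dead_letter.append(message)  # isolated for later inspection
            else:
                queue.append((message, receives))  # becomes visible again
    return processed, dead_letter


attempts = {}


def write_order(order):
    """Fake database write: one transient throttle and one poison message."""
    attempts[order] = attempts.get(order, 0) + 1
    if order == "poison":
        raise ValueError("malformed order")
    if order == "order-2" and attempts[order] == 1:
        raise RuntimeError("database throttled")


processed, dead_letter = drain(["order-1", "order-2", "poison"], write_order)
print(processed, dead_letter)
```

The throttled order succeeds on its second delivery, while the message that always fails ends up in the dead-letter list instead of being lost or retried forever.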

Exam trap

The trap here is that candidates may think EventBridge or S3 triggers provide sufficient retry and isolation, but they lack the built-in DLQ and configurable retry mechanics that SQS offers for decoupling and resilience.

How to eliminate wrong answers

Option A is wrong because sending events directly from EventBridge to Lambda without a queue provides no buffering or retry isolation; EventBridge invokes Lambda asynchronously, and Lambda's built-in retries for asynchronous invocations are limited, so events can still be lost under throttling. Option C is wrong because SNS fan-out to multiple Lambda functions without retry logic and no DLQ means failed messages are dropped, with no mechanism for buffering or isolating problematic messages. Option D is wrong because S3 event notifications provide no queue to absorb bursts while the database is throttled, and the option explicitly rules out a DLQ, so events that still fail after Lambda's limited asynchronous retries would be lost unless custom handling is implemented.

956

MCQeasy

A company runs its customer-facing web app on EC2 behind an Application Load Balancer. The database is Amazon RDS for PostgreSQL. The requirement is that if a single Availability Zone fails, the database must automatically fail over within the same AWS Region with minimal application changes. Which database setup best meets this requirement?

A.Use an RDS single-AZ instance and periodically restore from automated backups if needed.

B.Deploy the RDS PostgreSQL instance as Multi-AZ with automatic failover enabled.

C.Create a read replica in a different AZ and use it only when the primary fails.

D.Use RDS with Multi-AZ disabled, but increase storage IOPS to prevent failover.

AnswerB

Multi-AZ RDS maintains a standby instance in a different AZ. If the primary fails, RDS performs automatic failover, preserving the same database endpoint behavior.

Why this answer

Option B is correct because RDS Multi-AZ with automatic failover provides synchronous replication to a standby instance in a different Availability Zone. If the primary AZ fails, RDS automatically flips the DNS CNAME to the standby, resulting in minimal application changes (only a brief connection interruption). This meets the requirement for automatic failover within the same region without manual intervention.

Exam trap

The trap here is that candidates often confuse a read replica (Option C) with a Multi-AZ standby, but a read replica is asynchronous and requires manual promotion, whereas Multi-AZ provides automatic synchronous failover with no application changes beyond reconnecting.

How to eliminate wrong answers

Option A is wrong because restoring from automated backups is a manual, time-consuming process that does not provide automatic failover; it can take hours and requires application changes to point to a new endpoint. Option C is wrong because a read replica is designed for read scaling, not automatic failover; promoting it to a primary requires manual intervention and does not provide synchronous replication, leading to potential data loss. Option D is wrong because disabling Multi-AZ and increasing IOPS does not provide any failover capability; it only improves performance, not availability, and a single-AZ failure will still cause an outage.

957

MCQmedium

A web application runs in private subnets with no NAT gateway. It needs to retrieve credentials from AWS Secrets Manager at runtime. After a recent network hardening change, the application logs timeout errors when calling Secrets Manager. Which change will most directly enable private connectivity to Secrets Manager while keeping the subnets NAT-free?

A.Create an interface VPC endpoint (AWS PrivateLink) for the Secrets Manager service and update the security group rules to allow HTTPS from the application subnets.

B.Add a public DNS entry in the instance /etc/hosts pointing Secrets Manager to the instance’s private IP so requests do not leave the VPC.

C.Attach an internet gateway to the private route table so that Secrets Manager traffic can reach public endpoints without NAT.

D.Enable S3 VPC endpoint and store the secrets in an S3 bucket instead of Secrets Manager, then retrieve them using S3 gateway endpoints.

AnswerA

An interface VPC endpoint provides private connectivity to Secrets Manager through elastic network interfaces in the VPC's subnets, without internet access or NAT. Security group rules on the endpoint control which subnets or instances can reach it.

Why this answer

An interface VPC endpoint (AWS PrivateLink) for Secrets Manager creates a private, direct connection to the service within the VPC, using Elastic Network Interfaces (ENIs) in the subnets. This allows the application to reach Secrets Manager over HTTPS without traversing the internet, a NAT gateway, or an internet gateway, directly resolving the timeout errors caused by the network hardening change that removed public internet access.

Exam trap

The trap here is that candidates might think a NAT gateway or internet gateway is required for any AWS service access, overlooking that AWS PrivateLink interface endpoints can provide private, direct connectivity to services like Secrets Manager without any public internet exposure.

How to eliminate wrong answers

Option B is wrong because modifying /etc/hosts to point the Secrets Manager DNS name to a private IP does not establish a valid network path to the service; the private IP would not be routable to the actual Secrets Manager endpoints, and the request would still fail or be misrouted. Option C is wrong because attaching an internet gateway to a private route table would expose the private subnets to the internet, defeating the purpose of keeping them private and NAT-free, and it would not provide a secure, private connection to Secrets Manager. Option D is wrong because while S3 VPC endpoints (gateway type) provide private connectivity to S3, migrating secrets to an S3 bucket introduces security risks (e.g., lack of automatic rotation, encryption at rest complexities) and does not leverage Secrets Manager's native secret management features; the question specifically asks for connectivity to Secrets Manager, not a workaround.

958

MCQeasy

A media company uses CloudFront in front of an S3 bucket origin for video thumbnails. They want to prevent users from bypassing CloudFront and accessing the S3 bucket directly, while still allowing CloudFront to fetch objects. What is the best option?

A.Keep the bucket public and rely on signed cookies for all thumbnail requests.

B.Use CloudFront Origin Access Control (OAC) or Origin Access Identity (OAI) and update the bucket policy to allow only CloudFront.

C.Enable S3 static website hosting so users access thumbnails directly from the S3 website endpoint.

D.Set S3 bucket permissions to allow all IAM users and block access only by using a WAF rule at CloudFront.

AnswerB

OAC/OAI ensures only CloudFront can access the bucket while keeping the bucket private.

Why this answer

Option B is correct because CloudFront Origin Access Control (OAC) or Origin Access Identity (OAI) allows you to create a special CloudFront identity and attach a bucket policy that grants that identity s3:GetObject permissions. This ensures that only CloudFront can fetch objects from the S3 bucket, while direct access via the S3 endpoint is denied, preventing users from bypassing CloudFront.
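For the OAC variant, the bucket policy grants read access to the CloudFront service principal and scopes it to one distribution. The sketch below builds that policy as a Python dictionary; the bucket name, account ID, and distribution ID are hypothetical placeholders. (An OAI-based policy uses a different principal and is not shown.)

```python
import json


def cloudfront_only_policy(bucket: str, account_id: str, distribution_id: str) -> dict:
    """Allow s3:GetObject only to CloudFront acting for one distribution."""
    distribution_arn = (
        f"arn:aws:cloudfront::{account_id}:distribution/{distribution_id}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontServicePrincipalReadOnly",
                "Effect": "Allow",
                "Principal": {"Service": "cloudfront.amazonaws.com"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {
                    "StringEquals": {"AWS:SourceArn": distribution_arn}
                },
            }
        ],
    }


# Hypothetical identifiers.
oac_policy = cloudfront_only_policy("thumbnail-bucket", "123456789012", "EDFDVBD6EXAMPLE")
print(json.dumps(oac_policy, indent=2))
```

With the bucket otherwise private (Block Public Access left on), this statement is the only path to the objects, so direct requests to the S3 endpoint are refused.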

Exam trap

The trap here is that candidates often think signed cookies or WAF rules can restrict direct S3 access, but they fail to realize that those mechanisms only apply at the CloudFront layer and do not affect the S3 bucket's own permissions.

How to eliminate wrong answers

Option A is wrong because keeping the bucket public and relying on signed cookies does not prevent direct access to the S3 bucket; signed cookies only restrict access through CloudFront, but the bucket itself remains publicly accessible. Option C is wrong because enabling S3 static website hosting exposes the bucket via the S3 website endpoint, which would allow users to bypass CloudFront and access thumbnails directly. Option D is wrong because setting S3 bucket permissions to allow all IAM users does not restrict direct access; a WAF rule at CloudFront cannot block direct S3 requests since WAF operates at the CloudFront edge, not on the S3 endpoint.

959

MCQmedium

A company runs a customer portal on an Amazon Aurora PostgreSQL cluster. The application currently connects directly to the writer instance endpoint and keeps long-lived connections open. During a maintenance failover, writes fail until clients are restarted. The team wants the application to reconnect to the correct Aurora endpoint automatically and reduce user-visible write interruptions. Which change is most likely to achieve this?

A.Use the Aurora cluster endpoint for write traffic, use the reader endpoint for read-only traffic, and implement connection retry or reconnect logic on failover.

B.Keep using the original writer instance endpoint so the database host name never changes during failover.

C.Convert the Aurora cluster to Single-AZ so there is only one database node to connect to.

D.Place Route 53 in front of the database and manually update DNS records whenever failover occurs.

AnswerA

The cluster endpoint always targets the current writer, and failover-aware reconnect logic helps the application recover from dropped connections after promotion.

Why this answer

The Aurora cluster endpoint automatically points to the current writer instance, so using it for write traffic ensures that after a failover, new writes are directed to the new writer without needing to change the connection string. Implementing connection retry or reconnect logic in the application is essential because the existing long-lived connections will be broken during failover; the application must detect the failure and re-establish connections to the cluster endpoint to resume writes seamlessly.
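The reconnect logic can be sketched independently of any database driver. In the example below, `connect` stands for whatever function the driver provides, the endpoint is a hypothetical cluster endpoint, and the attempt count and delays are illustrative defaults.

```python
import time

# Hypothetical Aurora cluster (writer) endpoint.
CLUSTER_ENDPOINT = "portal.cluster-example.us-east-1.rds.amazonaws.com"


def connect_with_retry(connect, endpoint, attempts=5, base_delay=0.5,
                       sleep=time.sleep):
    """Call connect(endpoint), retrying with exponential backoff on failure."""
    last_error = None
    for attempt in range(attempts):
        try:
            return connect(endpoint)
        except ConnectionError as error:
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise ConnectionError(f"could not connect to {endpoint}") from last_error


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_connect(endpoint):
        """Fake driver call that fails twice, as it might during a failover."""
        calls["count"] += 1
        if calls["count"] <= 2:
            raise ConnectionError("writer not available yet")
        return f"connected to {endpoint}"

    print(connect_with_retry(flaky_connect, CLUSTER_ENDPOINT, base_delay=0.01))
```

Always reconnecting through the cluster endpoint, and never caching the resolved address for long, lets each new attempt land on whichever instance is currently the writer.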

Exam trap

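The trap here is that candidates assume the writer instance endpoint follows the writer role during failover (Option B). In Aurora, an instance endpoint is tied to one specific DB instance, so after failover it still resolves to the old instance, which is no longer the writer; only the cluster endpoint is repointed to the newly promoted writer.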

How to eliminate wrong answers

Option B is wrong because the writer instance endpoint is tied to a specific database node; during a failover, a different instance is promoted to writer, so the original host name keeps pointing at an instance that is now a reader or unavailable, and writes sent there fail. Option C is wrong because converting to Single-AZ removes the high-availability failover capability entirely, which contradicts the goal of reducing write interruptions during a failover. Option D is wrong because manually updating Route 53 DNS records is error-prone, introduces latency due to DNS caching, and does not provide the automatic, low-latency failover behavior that the Aurora cluster endpoint offers natively.


960

MCQmedium

A media processing pipeline runs batch jobs on EC2. The jobs can tolerate interruptions because they checkpoint progress to durable storage and can restart. The total workload is variable week-to-week, and there is no need to guarantee capacity at specific times. To reduce compute cost while maintaining correctness, what EC2 purchase option and approach is the best fit?

A.Use EC2 Spot Instances with interruption handling and restart from checkpoints.

B.Use All Upfront Reserved Instances sized for the average weekly workload to minimize cost.

C.Use On-Demand Instances and scale only during business hours to reduce idle time.

D.Use Savings Plans with a fixed hourly commitment to ensure capacity for the entire year.

AnswerA

Spot capacity is typically the lowest-cost EC2 option and can be reclaimed by AWS with interruption notices. Because the workload is explicitly restartable and checkpoints to durable storage, interruptions do not break correctness. Since there is no requirement to reserve capacity, the variable workload aligns well with Spot’s spare-capacity model.

Why this answer

Spot Instances offer up to 90% cost savings compared to On-Demand and are ideal for fault-tolerant, stateless workloads that can checkpoint progress to durable storage. Since the batch jobs can tolerate interruptions and restart from checkpoints, Spot Instances provide the lowest compute cost while maintaining correctness. No other purchase option achieves the same level of cost reduction for this variable, interruption-tolerant workload.
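The checkpoint-and-restart pattern that makes this workload safe on Spot can be sketched as below. The sketch rests on assumptions: a local JSON file stands in for the durable storage, and `interrupted` is a hypothetical callback (for example, one that polls for a Spot interruption notice). Neither name comes from the question.

```python
import json
import os


def process_with_checkpoints(items, checkpoint_path, handle,
                             interrupted=lambda: False):
    """Process items in order, persisting the next index after each one.

    Returns True when every item is done, False when stopped early.
    A later run with the same checkpoint_path resumes where this one stopped.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]

    for i in range(start, len(items)):
        if interrupted():
            return False  # stop cleanly; the checkpoint already marks item i as next
        handle(items[i])
        with open(checkpoint_path, "w") as f:
            json.dump({"next_index": i + 1}, f)
    return True
```

An item that was being handled when the instance disappeared is simply handled again on restart, so `handle` should be idempotent.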

Exam trap

The trap here is that candidates often choose Reserved Instances or Savings Plans thinking they always provide the best cost savings, but they fail to recognize that Spot Instances are significantly cheaper and perfectly suited for fault-tolerant, checkpointed batch workloads that do not require guaranteed capacity.

How to eliminate wrong answers

Option B is wrong because All Upfront Reserved Instances require a 1- or 3-year commitment and are sized for a fixed capacity, which does not match the variable week-to-week workload and would lead to over-provisioning or under-utilization, increasing cost. Option C is wrong because On-Demand Instances are the most expensive per-hour option and scaling only during business hours ignores the fact that the workload can run at any time; this approach does not minimize cost compared to Spot. Option D is wrong because Savings Plans with a fixed hourly commitment lock in a baseline spend for 1 or 3 years and do not provide the deep discounts of Spot Instances; Savings Plans are a billing discount and do not reserve capacity at all, and this workload does not need guaranteed capacity anyway.


961

MCQhard

Based on the exhibit, a retail analytics service repeatedly reads the same DynamoDB items during an active campaign. The business can tolerate data that is a few seconds stale, but the application must minimize latency and reduce pressure on DynamoDB. A load test shows that 80% of reads target only 200 item keys. What should the solutions architect implement?

A.Add a DynamoDB Accelerator (DAX) cluster in front of the table and point the application to the DAX endpoint.

B.Switch the table to provisioned capacity with auto scaling so DynamoDB can handle the repeated reads more efficiently.

C.Create a global table in a second Region and read from the replica Region to lower latency.

D.Move the hot items into Amazon ElastiCache for Redis and keep the remaining data in DynamoDB.

AnswerA

DAX is purpose-built for DynamoDB read caching and can absorb repeated reads for the same keys with very low latency. Because the workload can tolerate slight staleness, DAX fits the requirement well and reduces pressure on the table during bursts.

Why this answer

DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache designed specifically for DynamoDB. It reduces read latency from single-digit milliseconds to microseconds and offloads repeated reads from the table, which directly addresses the requirement to minimize latency and reduce pressure on DynamoDB. Since the business can tolerate stale data (DAX default TTL is 5 minutes, but can be configured lower), DAX is ideal for the 80% of reads hitting only 200 hot keys.
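To illustrate why a TTL-based cache absorbs repeated reads of a small hot key set, here is a minimal read-through cache sketch. It is not the DAX client API; `ReadThroughCache`, `loader`, and the 5-second TTL are assumptions made for the example, with `loader` standing in for a DynamoDB read.

```python
import time


class ReadThroughCache:
    """Tiny read-through cache with a per-item TTL (illustration only)."""

    def __init__(self, loader, ttl_seconds=5.0, clock=time.monotonic):
        self._loader = loader      # called on a miss, e.g. a table read
        self._ttl = ttl_seconds    # how stale a cached value may become
        self._clock = clock
        self._items = {}           # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._items.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # hit: the backing table is not touched
        value = self._loader(key)  # miss or expired: read through
        self._items[key] = (now + self._ttl, value)
        return value
```

With 80% of reads hitting 200 keys, most requests would be served from memory, and each hot key reaches the table at most about once per TTL window per cache node.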

Exam trap

The trap here is that candidates may choose ElastiCache (Option D) because it is a general-purpose cache, but they overlook that DAX is purpose-built for DynamoDB and eliminates the need for custom cache invalidation and dual-write logic, making it the simpler and more efficient solution for this exact use case.

How to eliminate wrong answers

Option B is wrong because switching to provisioned capacity with auto scaling does not reduce latency or offload repeated reads; it only adjusts throughput capacity based on load, but the same read requests still hit the underlying DynamoDB storage, causing the same pressure. Option C is wrong because creating a global table in a second Region adds cross-Region replication latency and does not reduce read pressure on the primary table; it also does not solve the hot-key issue for repeated reads within the same Region. Option D is wrong because moving hot items into ElastiCache for Redis requires dual-write logic and data synchronization between DynamoDB and Redis, adding complexity and potential inconsistency; DAX is API-compatible with DynamoDB and acts as a managed read-through and write-through cache, so the application mainly needs to switch to the DAX client and endpoint rather than implement its own caching logic.


962

Multi-Selecteasy

A company hosts an internal API in two AWS Regions. Traffic must automatically switch to the secondary Region when the primary Region's endpoint is unhealthy. Which two Route 53 settings are required? Select two.

Select 2 answers

A.Use a failover routing policy for the DNS record.

B.Configure a health check for the primary endpoint.

C.Use geolocation routing so users are always sent to the closest Region.

D.Use a private hosted zone to expose the API to the internet.

E.Set the TTL to zero and skip health checks to make failover faster.

AnswersA, B

Failover routing is specifically designed for primary and secondary endpoints. Route 53 returns the secondary record when the primary record is considered unhealthy.

Why this answer

A failover routing policy is required because it allows Route 53 to automatically route traffic from a primary resource to a secondary resource when the primary is unhealthy. It is the routing policy designed for active-passive failover, such as between two AWS Regions. A health check on the primary endpoint (Option B) is also required, because it is how Route 53 determines that the primary is unhealthy and starts returning the secondary record. Without the failover policy, Route 53 would not know which endpoint to treat as primary or when to switch traffic.

Exam trap

The trap here is that candidates often confuse failover routing with geolocation routing, thinking geographic proximity is sufficient for disaster recovery, but failover routing is the only policy that provides automatic health-based switching between primary and secondary endpoints.


963

Multi-Selecthard

A company is encrypting sensitive S3 data for a healthcare document service with AWS KMS. Which two controls help prevent accidental use of the KMS key by unauthorized principals?

Select 2 answers

A.S3 Transfer Acceleration

B.A key policy that limits key administrators and key users

C.A larger KMS key rotation period

D.IAM policies that grant kms:Decrypt only to required application roles

AnswersB, D

The KMS key policy is the primary resource policy that controls who can administer or use the key.

Why this answer

Option B is correct because a key policy in AWS KMS explicitly defines which principals (users, roles, or AWS services) are allowed to use the key for cryptographic operations. By restricting key usage to specific key users, you prevent unauthorized principals—even those with broad IAM permissions—from accidentally invoking the key. This is a critical control for sensitive data like healthcare documents, where compliance requires strict access boundaries.
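The two correct controls can be sketched as policy fragments. This is an illustrative shape only: the function names and any role or key ARNs passed in are hypothetical, and a real key policy would also contain a key-administrator statement.

```python
def key_user_statement(role_arns):
    """Key policy statement (Option B): only the listed roles may use the key."""
    return {
        "Sid": "AllowUseOfTheKey",
        "Effect": "Allow",
        "Principal": {"AWS": sorted(role_arns)},
        "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
        "Resource": "*",  # in a key policy, "*" refers to the key itself
    }


def iam_decrypt_statement(key_arn):
    """IAM policy statement (Option D): grant kms:Decrypt on one key only."""
    return {
        "Effect": "Allow",
        "Action": "kms:Decrypt",
        "Resource": key_arn,
    }
```

Scoping the IAM statement to a single key ARN, rather than `*`, keeps an application role from decrypting with keys it was never meant to use.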

Exam trap

The trap here is that candidates often overlook that key rotation settings (Option C) are about key lifecycle management, not access control, and confuse S3 Transfer Acceleration (Option A) with a security feature when it is purely a performance optimization.


964

MCQmedium

A web application for an IoT ingestion API is behind an Application Load Balancer. The application must be protected from common SQL injection and cross-site scripting attacks with minimum operational overhead. What should the architect deploy?

A.AWS WAF associated with the Application Load Balancer

B.Network ACLs on the public subnets

C.Security groups on the application instances

D.AWS Shield Advanced only

AnswerA

AWS WAF can inspect HTTP requests and block common web exploits when associated with an ALB.

Why this answer

AWS WAF is a web application firewall that integrates directly with an Application Load Balancer to filter and monitor HTTP/HTTPS requests. It provides managed rules specifically designed to block common attack patterns like SQL injection and cross-site scripting (XSS) with minimal operational overhead, as AWS manages the rule updates and scaling. This makes it the ideal choice for protecting the IoT ingestion API without requiring custom code or manual configuration.
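As a rough sketch of what the managed-rule approach looks like, the snippet below builds the rule list that could be supplied as the `Rules` argument when creating a regional WAFv2 Web ACL. The helper name `managed_rule` and the metric names are assumptions; the two rule group names are AWS managed rule groups (the common rule set includes cross-site scripting checks).

```python
def managed_rule(name, priority):
    """Build one Web ACL rule that delegates to an AWS managed rule group."""
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "ManagedRuleGroupStatement": {"VendorName": "AWS", "Name": name}
        },
        "OverrideAction": {"None": {}},  # keep the rule group's own actions
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,  # metric name chosen for this example
        },
    }


RULES = [
    managed_rule("AWSManagedRulesCommonRuleSet", 0),  # includes XSS protections
    managed_rule("AWSManagedRulesSQLiRuleSet", 1),    # SQL injection patterns
]
```

The resulting Web ACL would then be associated with the ALB by its ARN; no rule logic has to be written or maintained by the team.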

Exam trap

The trap here is that candidates often confuse network-layer controls (like NACLs or security groups) with application-layer protection, assuming that blocking ports or IPs is sufficient to prevent SQL injection and XSS, when in fact detecting these attacks requires inspecting HTTP request content at the application layer (layer 7).

How to eliminate wrong answers

Option B is wrong because Network ACLs operate at the subnet level and provide stateless IP/port filtering; they cannot inspect application-layer payloads to detect SQL injection or XSS patterns. Option C is wrong because security groups act as stateful firewalls at the instance level, filtering traffic based on IP addresses and ports, but they lack the ability to parse HTTP request bodies or headers for malicious content. Option D is wrong because AWS Shield Advanced provides DDoS protection and enhanced monitoring, but it does not include web application firewall capabilities to block SQL injection or XSS attacks.


965

Multi-Selectmedium

A compliance archive writes one log file per day to Amazon S3. The logs are almost never accessed after day 30, but if they are needed they must still be retrievable in milliseconds. They must be deleted automatically after one year. Which two lifecycle settings should you apply? Select two.

Select 2 answers

A.Transition the objects to S3 Glacier Instant Retrieval after 30 days.

B.Expire the objects after 365 days.

C.Transition the objects to S3 Standard-IA after 30 days.

D.Keep the logs in S3 Standard indefinitely and delete them manually when needed.

E.Replicate the logs to another Region for cheaper archival storage.

AnswersA, B

Glacier Instant Retrieval is designed for data that is rarely accessed but still needs millisecond retrieval. Because the logs remain in the archive for 11 more months, the 90-day minimum storage duration is not a problem, and the storage cost is lower than keeping them in a hotter class.

Why this answer

Option A is correct because S3 Glacier Instant Retrieval provides millisecond retrieval times for archived data, meeting the requirement that logs must be retrievable in milliseconds after 30 days. This storage class is designed for long-lived, rarely accessed data that still needs immediate access, making it ideal for compliance archives that are almost never accessed but must be available instantly when needed.
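The two lifecycle settings (Options A and B) can be expressed as a single rule. The sketch below shows the shape of a lifecycle configuration as it could be passed to the S3 `PutBucketLifecycleConfiguration` API; the rule ID and the `logs/` prefix are assumptions for illustration.

```python
LIFECYCLE = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",  # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},     # hypothetical key prefix
            # Option A: move to Glacier Instant Retrieval 30 days after creation
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER_IR"}],
            # Option B: delete the object 365 days after creation
            "Expiration": {"Days": 365},
        }
    ]
}
```

Both `Days` values count from object creation, so each log spends about 335 days in Glacier Instant Retrieval, well past the 90-day minimum storage duration.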

Exam trap

The trap here is that candidates often confuse S3 Standard-IA with S3 Glacier Instant Retrieval, assuming Standard-IA is the cheapest option for infrequent access. They overlook that Glacier Instant Retrieval offers lower storage costs for data that is almost never accessed while still providing millisecond retrieval. They may also forget that lifecycle expiration must be explicitly set for automatic deletion.


966

MCQmedium

A marketing site stores logs in S3. Logs are queried for 30 days, rarely accessed for one year, and then retained for compliance. What should reduce storage cost? The design must avoid adding custom operational scripts.

A.S3 lifecycle policy that transitions objects to lower-cost storage classes over time

B.Keep all logs in S3 Standard indefinitely

C.Use EBS snapshots for the logs

D.Move all logs immediately to S3 Glacier Deep Archive

AnswerA

Lifecycle rules automate transitions based on age, matching storage cost to access patterns.

Why this answer

Option A is correct because S3 Lifecycle policies allow you to automatically transition objects from S3 Standard to lower-cost storage classes like S3 Standard-IA (Infrequent Access) after 30 days, then to S3 Glacier Deep Archive after one year, without custom scripts. This matches the access pattern: frequent queries for 30 days, rare access for a year, then long-term retention for compliance. The policy automates cost reduction by moving data to progressively cheaper storage as access frequency decreases.

Exam trap

The trap here is that candidates might choose Option D, thinking immediate archiving is cheapest, but they overlook the 30-day query requirement and the fact that S3 Glacier Deep Archive objects must be restored before they can be read, which takes up to 12 hours with standard retrieval (up to 48 hours with bulk retrieval), making it unsuitable for frequent access.

How to eliminate wrong answers

Option B is wrong because keeping all logs in S3 Standard indefinitely incurs the highest storage cost, ignoring the infrequent access and long-term retention requirements. Option C is wrong because EBS snapshots are designed for block-level backups of EC2 volumes, not for storing S3 log data, and would require custom scripts to move logs from S3 to EBS, violating the 'no custom operational scripts' constraint. Option D is wrong because moving all logs immediately to S3 Glacier Deep Archive would make them unsuitable for the first 30 days of frequent queries: each object must be restored before it can be read (up to 12 hours with standard retrieval), and the per-request retrieval charges would erode the storage savings.


967

MCQmedium

A developer accidentally deletes important rows in an RDS database. The mistake is discovered 45 minutes later. The database has automated backups enabled with a retention period of 7 days. What is the best way to restore the database to a point just before the deletion?

A.Restore the latest manual snapshot and then run SQL scripts to revert the deletion.

B.Use point-in-time restore (PITR) to restore the database to a specific timestamp before the deletion, based on automated backups.

C.Promote an existing read replica to be the primary and then copy the missing rows from logs.

D.Recreate the instance using the most recent CloudWatch metric alarm snapshot of storage metrics.

AnswerB

With automated backups enabled, RDS supports PITR within the retention window. PITR lets you restore to any second within that window, so you can select a timestamp just before the destructive deletion occurred. This avoids restoring a potentially stale snapshot and eliminates the need for risky manual compensating scripts.

Why this answer

Point-in-time restore (PITR) allows you to restore an RDS DB instance to any second within the automated backup retention period (here, 7 days). Since the deletion occurred 45 minutes ago, you can specify a timestamp just before the deletion, and RDS will replay the transaction logs to bring the database to that exact state. This is the most precise and efficient recovery method for accidental data modifications.
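A small sketch of how the restore request could be prepared is shown below. The helper `pitr_request`, the one-minute safety margin, and the instance identifiers are assumptions; the returned keys match the parameters of the RDS `RestoreDBInstanceToPointInTime` API, which always creates a new DB instance rather than rewinding the existing one.

```python
from datetime import datetime, timedelta, timezone


def pitr_request(source_id, target_id, incident_time,
                 margin=timedelta(minutes=1)):
    """Build restore parameters for a time shortly before the incident.

    incident_time must be timezone-aware so the restore point is unambiguous.
    """
    if incident_time.tzinfo is None:
        raise ValueError("incident_time must be timezone-aware")
    return {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,  # a NEW instance is created
        "RestoreTime": (incident_time - margin).astimezone(timezone.utc),
    }


# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("rds").restore_db_instance_to_point_in_time(**request)
```

After the restored instance is available, the team can either repoint the application to it or copy the deleted rows back into the original database.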

Exam trap

The trap here is that candidates may assume manual snapshots or read replicas can be used for granular point-in-time recovery, but only automated backups with transaction logs enable restoring to a specific second within the retention period.

How to eliminate wrong answers

Option A is wrong because manual snapshots capture the entire instance at a point in time, but they do not provide the granularity to restore to a specific moment just before the deletion; you would lose all changes made after the snapshot, and running SQL scripts to revert deletions is error-prone and not a built-in RDS feature. Option C is wrong because promoting a read replica makes it a new primary, but it does not revert data; it simply becomes a writable copy of the current state, which still contains the deletion. Option D is wrong because CloudWatch metric alarms monitor performance metrics, not database row-level data; they cannot be used to restore or recover deleted rows.


968

MCQhard

Based on the exhibit, a partner account uploads encrypted objects to a central S3 bucket and later reads them back. The S3 permissions are correct, but the requests still fail. What change is required so the partner workload can use the customer-managed KMS key safely?

A.Replace SSE-KMS with S3 object ACLs so the partner account can bypass KMS authorization.

B.Create a new bucket in the partner account and copy the objects there to avoid cross-account encryption.

C.Switch the bucket to SSE-S3 so the partner role no longer needs KMS permissions.

D.Update the CMK key policy, or add a tightly scoped grant, to allow the partner role the required KMS actions through S3.

AnswerD

Cross-account access to SSE-KMS encrypted objects requires KMS authorization in addition to S3 authorization. The key policy must trust the partner role, and the permissions should be limited to the needed KMS actions such as Decrypt, Encrypt, and GenerateDataKey with a service condition for S3. That is why the partner can have valid S3 permissions and still fail until the KMS policy is fixed.

Why this answer

The correct answer is D because when using a customer-managed KMS key (CMK) for SSE-KMS in a cross-account scenario, the key policy must explicitly grant the partner account's IAM role the necessary KMS actions (kms:Decrypt, kms:GenerateDataKey) to allow S3 to perform the encryption/decryption on behalf of the partner. Without this policy update or a tightly scoped grant, S3 cannot authorize the KMS operation even if the S3 bucket policy permits the upload/read.
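The key policy change in Option D might look like the statement built below. It is a sketch under assumptions: the function name, the partner role ARN, and the Region are placeholders, and the `kms:ViaService` condition is one way to implement the "service condition for S3" mentioned above.

```python
def partner_key_statement(partner_role_arn, region):
    """Key policy statement letting a partner role use the key only through S3."""
    return {
        "Sid": "AllowPartnerUseViaS3",
        "Effect": "Allow",
        "Principal": {"AWS": partner_role_arn},
        "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
        "Resource": "*",  # in a key policy, "*" refers to the key itself
        "Condition": {
            # Requests must arrive via S3 in this Region, not direct KMS calls
            "StringEquals": {"kms:ViaService": f"s3.{region}.amazonaws.com"}
        },
    }
```

Because the key lives in a different account, the partner role also needs an IAM policy in its own account that allows the same KMS actions on this key; the key policy alone is not enough for cross-account use.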

Exam trap

The trap here is that candidates assume the S3 bucket policy alone is sufficient for cross-account access with SSE-KMS, forgetting that KMS requires its own separate authorization via the key policy or a grant, which is a frequent point of failure in multi-account architectures.

How to eliminate wrong answers

Option A is wrong because S3 object ACLs do not bypass KMS authorization; ACLs only grant S3-level access to the object, and reading or writing an SSE-KMS object still requires permission on the KMS key, so ACLs cannot resolve the missing KMS permissions. Option B is wrong because copying objects to a new bucket in the partner account does not address the root cause: the partner still needs to read the encrypted objects from the central bucket, and the KMS key policy remains unchanged. Option C is wrong because switching to SSE-S3 would remove the need for KMS permissions, but this changes the encryption method and may violate security requirements for using a customer-managed key; the question implies the partner must use the existing CMK.


969

MCQhard

A claims portal uses Amazon RDS for PostgreSQL. Application credentials must not be stored on the EC2 instances, and authentication should use short-lived credentials. What should the architect recommend? The design must avoid adding custom operational scripts.

A.Store the database password in user data

B.Embed the database password in the AMI

C.IAM database authentication for RDS with an EC2 instance role

D.Use a security group rule that allows only application instances

AnswerC

IAM database authentication allows the application to use temporary AWS credentials instead of stored database passwords.

Why this answer

Option C is correct because IAM database authentication for RDS PostgreSQL allows EC2 instances to authenticate using short-lived credentials (tokens) obtained via the IAM instance profile role, eliminating the need to store long-term credentials on the instance. The EC2 instance assumes an IAM role, which grants permission to generate an authentication token (valid for 15 minutes) using the AWS CLI or SDK, and that token is used as the password for the database connection. This approach satisfies the requirements of no stored credentials, short-lived authentication, and no custom operational scripts.
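The permission the instance role needs can be sketched as follows. The helper name and the example identifiers are hypothetical; the `rds-db:connect` action and the `rds-db` ARN format (DB resource ID followed by the database user name) are what IAM database authentication uses.

```python
def rds_connect_policy(region, account_id, dbi_resource_id, db_user):
    """IAM policy allowing token-based login as one database user."""
    resource = (
        f"arn:aws:rds-db:{region}:{account_id}:dbuser:"
        f"{dbi_resource_id}/{db_user}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "rds-db:connect", "Resource": resource}
        ],
    }


# With this policy on the instance role, the application can request a token
# (valid for 15 minutes) and use it as the password, for example with boto3:
#   boto3.client("rds").generate_db_auth_token(
#       DBHostname=host, Port=5432, DBUsername=db_user)
```

The database user must also be set up for IAM authentication inside PostgreSQL, and the connection must use SSL/TLS.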

Exam trap

The trap here is that candidates often confuse network-layer controls (security groups) with application-layer authentication, or they assume that storing credentials in user data or an AMI is acceptable because it's 'not on the instance filesystem' — but both still persist the credential on the instance, violating the 'not stored on EC2' requirement.

How to eliminate wrong answers

Option A is wrong because instance user data is not encrypted and can be read from the instance metadata service by any process on the instance, so the password is effectively stored on the instance, violating the requirement that credentials must not be stored on EC2 instances. Option B is wrong because embedding the database password in the AMI hard-codes the credential into the image, which persists across instance launches and again stores credentials on the instance, failing the no-storage requirement. Option D is wrong because a security group rule controls network access at the transport layer (IP/port) and does not provide authentication; it cannot replace database credentials or enforce short-lived authentication.


970

MCQmedium

A payments API uses an RDS MySQL database and must remain available during an Availability Zone failure with minimal application changes. What should the architect enable?

A.S3 Cross-Region Replication

B.Multi-AZ deployment for the RDS DB instance

C.Read replicas only

D.EBS snapshots every hour

AnswerB

Multi-AZ provides synchronous standby replication and automatic failover within a Region.

Why this answer

Multi-AZ deployment for RDS MySQL automatically provisions and maintains a synchronous standby replica in a different Availability Zone. In the event of an AZ failure, Amazon RDS automatically fails over to the standby, providing high availability with minimal application changes (the application only needs to reconnect to the same endpoint). This meets the requirement for availability during an AZ outage without requiring code modifications.

Exam trap

The trap here is that candidates often confuse read replicas (which are for read scaling and manual promotion) with Multi-AZ (which provides automatic failover for high availability), leading them to select read replicas as a cheaper or simpler alternative.

How to eliminate wrong answers

Option A is wrong because S3 Cross-Region Replication is for object storage in S3, not for RDS MySQL databases, and it does not provide automatic failover for a relational database. Option C is wrong because read replicas are designed for read scaling, not for automatic failover during an AZ failure; they require manual promotion and application changes to redirect writes. Option D is wrong because EBS snapshots every hour provide point-in-time backup and recovery, not high availability; restoring from a snapshot would involve significant downtime and manual intervention, not minimal application changes.


971

MCQmedium

A payments API uses an RDS MySQL database and must remain available during an Availability Zone failure with minimal application changes. What should the architect enable? The team wants the control to be enforceable during normal operations.

A.S3 Cross-Region Replication

B.Multi-AZ deployment for the RDS DB instance

C.Read replicas only

D.EBS snapshots every hour

AnswerB

Multi-AZ provides synchronous standby replication and automatic failover within a Region.

Why this answer

Multi-AZ deployment for RDS MySQL automatically provisions and synchronously replicates a standby instance in a different Availability Zone. In the event of an AZ failure, Amazon RDS automatically fails over to the standby, providing high availability with minimal application changes (the application only needs to reconnect using the same endpoint). This meets the requirement of remaining available during an AZ failure while being enforceable during normal operations.

Exam trap

The trap here is that candidates often confuse read replicas (which are for read offloading and disaster recovery across regions) with Multi-AZ deployments (which provide synchronous replication and automatic failover for high availability within a region).

How to eliminate wrong answers

Option A is wrong because S3 Cross-Region Replication is for object-level replication in Amazon S3, not for RDS MySQL databases, and it does not provide automatic failover for a relational database. Option C is wrong because read replicas are designed for read scaling and asynchronous replication; they do not provide automatic failover for the primary database instance during an AZ failure, and promoting a read replica requires manual intervention and changes to the application connection string. Option D is wrong because EBS snapshots every hour provide point-in-time backups, not high availability; restoring from a snapshot would involve significant downtime and manual steps, failing the requirement for minimal application changes and continuous availability.


972

MCQmedium

You host a public API using Amazon API Gateway in two AWS Regions: us-east-1 (primary) and us-west-2 (secondary). You want Route 53 to send client traffic to the secondary region only when the primary API is unhealthy. Which Route 53 setup best meets this requirement?

A.Use latency-based routing with one routing policy per region, and use CloudWatch alarms to update traffic weights between regions.

B.Use Route 53 failover routing with two ALIAS records (same DNS name) pointing to the API Gateway regional endpoints: one record is configured as PRIMARY with an associated health check, and the other is configured as SECONDARY.

C.Use weighted routing across both regions and rely on Route 53 health checks to automatically set the secondary to 100% weight when the primary fails.

D.Use geolocation routing to map some client geographies to the secondary region and the rest to the primary region.

AnswerB

Failover routing is designed for active-passive regional resiliency. With a PRIMARY record tied to a health check, Route 53 automatically returns DNS answers to the SECONDARY endpoint when the PRIMARY fails health checks.

Why this answer

Route 53 failover routing is designed for active-passive setups where traffic is sent to a primary resource unless it is unhealthy, in which case traffic is routed to a secondary resource. By creating two ALIAS records with the same DNS name, one marked PRIMARY with an associated health check and the other marked SECONDARY, Route 53 will automatically fail over to the secondary region when the health check for the primary API Gateway endpoint fails. This directly meets the requirement of sending traffic to the secondary region only when the primary API is unhealthy.
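The PRIMARY/SECONDARY record pair described above can be sketched as a Route 53 change batch. The helper name, DNS names, hosted zone IDs, and health check ID below are placeholders; the field names follow the `ChangeResourceRecordSets` API.

```python
def failover_record(name, role, alias_dns, alias_zone_id, health_check_id=None):
    """Build an UPSERT for one failover ALIAS record ("PRIMARY" or "SECONDARY")."""
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": f"{role.lower()}-api",
        "Failover": role,
        "AliasTarget": {
            "HostedZoneId": alias_zone_id,
            "DNSName": alias_dns,
            "EvaluateTargetHealth": True,
        },
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


CHANGE_BATCH = {
    "Changes": [
        failover_record("api.example.com.", "PRIMARY",
                        "d-primary.execute-api.us-east-1.amazonaws.com.",
                        "ZPRIMARYEXAMPLE", health_check_id="hc-primary-example"),
        failover_record("api.example.com.", "SECONDARY",
                        "d-secondary.execute-api.us-west-2.amazonaws.com.",
                        "ZSECONDARYEXAMPLE"),
    ]
}
```

Both records share the same name and type; only the `Failover` role, the set identifier, and the target differ, which is what lets Route 53 answer with the secondary when the primary's health check fails.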

Exam trap

The trap here is that candidates often confuse weighted routing with failover routing, mistakenly believing that Route 53 health checks can automatically adjust weights to achieve active-passive failover, when in fact weighted routing does not support dynamic weight adjustment based on health.

How to eliminate wrong answers

Option A is wrong because latency-based routing directs each client to the Region with the lowest latency, so both Regions serve traffic during normal operation; latency records have no weights, and using CloudWatch alarms to change routing would require custom automation rather than native health-check-driven failover. Option C is wrong because Route 53 never changes the configured weights; health checks only remove unhealthy records from responses. If the secondary has a non-zero weight, it receives a share of traffic even while the primary is healthy, which is not the active-passive behavior required. Option D is wrong because geolocation routing directs traffic based on the geographic location of the client, not the health of the endpoint, and it cannot automatically fail over traffic from one region to another when the primary becomes unhealthy.


973

MCQmedium

A DynamoDB table uses this schema: partition key = customerId, sort key = timestamp. During a marketing campaign, one customer generates extremely high read traffic and the application sees ProvisionedThroughputExceeded errors even though the table’s total capacity is sufficient. What change most directly improves read distribution across partitions?

A.Increase the table’s provisioned read capacity units while keeping partition key = customerId.

B.Add a salt component to the partition key by changing it to customerId#salt, where salt is derived from a hash of requestId so a single customer’s requests are spread across many partitions; keep the sort key as timestamp.

C.Remove the sort key and use timestamp as the partition key to increase cardinality.

D.Switch to on-demand capacity and rely on DynamoDB to automatically distribute reads across partitions.

AnswerB

Hot partition throttling usually occurs when too many requests target a single partition key value. Salting transforms the partition key so that one high-traffic customerId maps to multiple distinct partition keys (e.g., customerId#0, customerId#1, etc.), which increases the number of partitions that can serve that customer’s workload concurrently and reduces the probability that a single partition becomes overloaded.

Why this answer

Option B is correct because adding a salt to the partition key (e.g., customerId#hash(requestId)) distributes the read-heavy customer's data across multiple physical partitions. This prevents a single hot partition from throttling requests, even when the table's total provisioned capacity is sufficient. DynamoDB's partition key determines the internal hash used for data placement, so increasing partition key cardinality directly improves read distribution.
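A minimal sketch of the salting scheme follows. The bucket count of 8 and the function names are assumptions; the question only specifies that the salt is derived from a hash of `requestId`.

```python
import hashlib

SALT_BUCKETS = 8  # assumed number of salt values per customer


def salted_key(customer_id, request_id, buckets=SALT_BUCKETS):
    """Derive the partition key customerId#salt from a hash of request_id."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    salt = int.from_bytes(digest[:4], "big") % buckets
    return f"{customer_id}#{salt}"


def all_salted_keys(customer_id, buckets=SALT_BUCKETS):
    """Every partition key that may hold this customer's items."""
    return [f"{customer_id}#{s}" for s in range(buckets)]
```

The trade-off is on the read side: fetching everything for one customer means querying each key from `all_salted_keys` and merging the results, so the bucket count should stay as small as the traffic allows.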

Exam trap

The trap here is that candidates confuse total table capacity with per-partition capacity, assuming that increasing RCUs or switching to on-demand will fix throttling caused by a hot key, when in reality the bottleneck is the single partition's throughput limit.

How to eliminate wrong answers

Option A is wrong because increasing provisioned read capacity units does not solve the hot partition problem; it only raises the total table capacity, but a single partition still has a hard limit of 3,000 RCUs and 1,000 WCUs and will continue to throttle requests from the same customer. Option C is wrong because removing the sort key and using timestamp as the partition key would cause all writes and reads for a given timestamp to land on one partition, creating a new hot partition and losing the ability to query by customerId. Option D is wrong because on-demand capacity does not automatically distribute reads across partitions; it only scales total table capacity up or down, but a single partition still has the same throughput limit, so a hot key will still cause throttling.

974

MCQmedium

A security requirement states: all uploads to an S3 bucket must (1) use TLS in transit and (2) use server-side encryption with AWS KMS (SSE-KMS) using the KMS key with key ID 'abcd-1234'; otherwise the upload should be rejected. A developer reports that uploads are succeeding even though clients sometimes send requests over plain HTTP or without the required encryption headers. Which bucket policy approach most directly enforces both controls?

A.Add an Allow statement granting s3:PutObject to the developer role; rely on IAM conditions in the developer role to enforce TLS and SSE-KMS.

B.Use Deny statements that reject PutObject when aws:SecureTransport is false and reject PutObject when s3:x-amz-server-side-encryption is not 'aws:kms' or when s3:x-amz-server-side-encryption-aws-kms-key-id does not equal 'abcd-1234'.

C.Enable S3 default encryption to SSE-KMS and remove any bucket policy enforcement, since default encryption automatically rejects all noncompliant uploads.

D.Attach a WAF rule to the S3 website endpoint to block non-TLS requests, because bucket policies cannot evaluate aws:SecureTransport.

AnswerB

These Deny conditions directly block noncompliant requests regardless of the caller’s IAM permissions because explicit Deny in a resource policy overrides any Allow. aws:SecureTransport identifies whether the request used TLS. The SSE-KMS condition keys (s3:x-amz-server-side-encryption and s3:x-amz-server-side-encryption-aws-kms-key-id) identify whether SSE-KMS was requested and which KMS key was specified.

Why this answer

Option B is correct because bucket policies can use the `aws:SecureTransport` condition key to enforce TLS and the `s3:x-amz-server-side-encryption` and `s3:x-amz-server-side-encryption-aws-kms-key-id` condition keys to enforce SSE-KMS with the specific CMK key ID. By using Deny statements, any request that does not meet both conditions is explicitly rejected, regardless of any Allow statements that might otherwise grant access. This directly enforces the security requirement at the bucket level.
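
A sketch of how such a policy could be assembled, written in Python so the structure is easy to inspect. The bucket name `example-uploads-bucket` and the helper `build_upload_policy` are hypothetical. The key value is the placeholder from the question; in a real policy it has to match what clients send in the header, which is often the full key ARN, so treat that as something to verify. The three checks are separate statements because conditions inside one statement are combined with AND, while the requirement is to deny if any one check fails.

```python
import json


def build_upload_policy(bucket: str, kms_key: str) -> dict:
    """Return a bucket policy dict with three explicit Deny statements (illustrative)."""
    resource = f"arn:aws:s3:::{bucket}/*"

    def deny(sid: str, condition: dict) -> dict:
        # Each statement denies s3:PutObject for every principal when its condition matches.
        return {
            "Sid": sid,
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": resource,
            "Condition": condition,
        }

    return {
        "Version": "2012-10-17",
        "Statement": [
            # Reject requests that did not arrive over TLS.
            deny("DenyInsecureTransport", {"Bool": {"aws:SecureTransport": "false"}}),
            # Reject uploads that do not request SSE-KMS.
            deny(
                "DenyNonKmsEncryption",
                {"StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}},
            ),
            # Reject uploads that name a different KMS key (placeholder value).
            deny(
                "DenyWrongKmsKey",
                {"StringNotEquals": {"s3:x-amz-server-side-encryption-aws-kms-key-id": kms_key}},
            ),
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_upload_policy("example-uploads-bucket", "abcd-1234"), indent=2))
```

Negated operators such as `StringNotEquals` also match when the header is absent, so an upload that omits the encryption headers is denied as well.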

Exam trap

The trap here is that candidates often confuse S3 default encryption with enforcement—default encryption only applies encryption to objects that lack it, but does not reject non-compliant uploads, so it cannot replace a bucket policy Deny statement for rejecting requests that violate encryption or TLS requirements.

How to eliminate wrong answers

Option A is wrong because conditions in the developer role's identity policy constrain only that role; any other principal that has access to the bucket, including anonymous or cross-account callers allowed by the bucket policy, is not subject to them. Option C is wrong because S3 default encryption only applies server-side encryption to objects that are uploaded without an encryption header; it does not reject noncompliant uploads but silently encrypts them, so requests without TLS or with a different KMS key ID would still succeed. Option D is wrong because AWS WAF cannot be attached directly to an S3 bucket endpoint; S3 does not support WAF integration, and bucket policies can indeed evaluate `aws:SecureTransport` to enforce TLS.

975

MCQhard

A media processing workflow in private subnets downloads large amounts of data from S3 through a NAT gateway. NAT data processing charges are high. What should the architect use to reduce cost?

A.S3 Object Lambda

B.AWS Shield Advanced

C.Gateway VPC endpoint for Amazon S3

D.A larger NAT gateway

AnswerC

A gateway endpoint routes S3 traffic privately without NAT gateway data processing charges.

Why this answer

A Gateway VPC endpoint for Amazon S3 allows instances in private subnets to access S3 directly via the AWS network without traversing a NAT gateway, eliminating NAT data processing charges. This is the most cost-effective solution because NAT gateway costs are incurred per GB of data processed, and using a gateway endpoint avoids those charges entirely.
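
A rough worked example of the cost argument above. The per-GB price of 0.045 USD and the 10,000 GB monthly volume are assumptions for illustration only; actual NAT gateway pricing varies by Region and should be taken from the current AWS pricing page. The function name is hypothetical.

```python
# Assumed NAT gateway data processing price in USD per GB (illustrative, not authoritative).
ASSUMED_NAT_PRICE_PER_GB = 0.045


def nat_processing_cost(gb_per_month: float, price_per_gb: float = ASSUMED_NAT_PRICE_PER_GB) -> float:
    """Monthly NAT data processing charge for traffic routed through the NAT gateway."""
    return round(gb_per_month * price_per_gb, 2)


if __name__ == "__main__":
    s3_traffic_gb = 10_000  # hypothetical monthly S3 download volume
    via_nat = nat_processing_cost(s3_traffic_gb)
    # S3 traffic sent through a gateway endpoint does not pass through the NAT
    # gateway, so it incurs no NAT data processing charge.
    via_gateway_endpoint = 0.0
    print(f"Via NAT gateway: ${via_nat:.2f}")
    print(f"Via gateway endpoint: ${via_gateway_endpoint:.2f}")
```

The NAT gateway's hourly charge remains as long as the gateway exists for other internet-bound traffic; only the per-GB portion attributable to S3 goes away.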

Exam trap

The trap here is that candidates may think a larger NAT gateway would improve throughput and lower costs. In reality, NAT gateways are not offered in selectable sizes (they scale bandwidth automatically), and the per-GB data processing charge applies to all traffic that passes through them, whereas a gateway VPC endpoint removes that charge for S3 traffic entirely.

How to eliminate wrong answers

Option A is wrong because S3 Object Lambda is used to transform data as it is retrieved from S3, not to reduce data transfer costs from private subnets. Option B is wrong because AWS Shield Advanced is a DDoS protection service that does not address NAT gateway data processing charges. Option D is wrong because NAT gateways have no selectable size, and more NAT capacity would not reduce cost anyway: the data processing fee is charged per GB, so the same S3 traffic incurs the same charge.
