SAA-C03 Questions 976–1040 | Page 14/14

976

MCQmedium

An ECS service runs on EC2 instances and is fronted by an ALB. The ALB spans two Availability Zones, and the ECS service desired count is 2 tasks. The underlying EC2 capacity comes from an Auto Scaling group (ASG) with a min size of 1 that, in practice, launches instances into only one subnet. Which change most effectively ensures that the service keeps running if the instances in a single AZ are lost?

A.Set the ECS deployment configuration to maximum percent 100 so tasks replace instances faster during rollouts.

B.Increase ASG min size to at least 2 and ensure the ASG uses subnets in at least two Availability Zones.

C.Enable ALB connection draining longer than expected so existing connections survive longer during an AZ event.

D.Reduce task memory reservations to pack both tasks onto a single EC2 instance.

AnswerB

Multi-AZ instance capacity ensures tasks have eligible compute in another AZ when one AZ loses instances.

Why this answer

The current architecture has a single point of failure because the Auto Scaling group (ASG) spans only one subnet (one Availability Zone). If that AZ fails, all EC2 instances are lost, and the ECS service cannot run any tasks. Increasing the ASG min size to at least 2 and configuring it to use subnets in at least two AZs ensures that EC2 instances are distributed across AZs, allowing the ECS service to maintain at least one task in the surviving AZ during a single-AZ failure.

Exam trap

The trap here is that candidates often focus on ECS-specific settings (like deployment configuration or task placement) rather than recognizing that the root cause is the ASG's single-AZ limitation, which is a fundamental infrastructure resilience issue.

How to eliminate wrong answers

Option A is wrong because setting the ECS deployment configuration to maximum percent 100 controls how tasks are replaced during a rolling update, not how the service survives an AZ failure; it does not address the underlying lack of EC2 capacity in multiple AZs. Option C is wrong because ALB connection draining only helps gracefully terminate existing connections during deregistration or health check failures; it does not provision new compute capacity or ensure tasks run in another AZ after an AZ loss. Option D is wrong because reducing task memory reservations to pack both tasks onto a single EC2 instance actually increases risk—if that single instance (or its AZ) fails, both tasks are lost, and the ASG min size of 1 cannot recover quickly enough to meet the requirement.
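
The fix in option B can be sketched as the parameters one might pass to boto3's `update_auto_scaling_group` call. This is a minimal sketch, not a definitive implementation: the group name and subnet IDs are hypothetical, the helper `build_asg_update` is introduced only for illustration, and it assumes the group's maximum size is already at least 2. The helper cannot verify that the subnets are in different Availability Zones; that remains the operator's responsibility.

```python
def build_asg_update(asg_name, subnet_ids, min_size=2):
    """Build keyword arguments for an Auto Scaling group update.

    Hypothetical helper: it only assembles the request, it does not call AWS.
    """
    if len(set(subnet_ids)) < 2:
        raise ValueError("provide subnets in at least two Availability Zones")
    return {
        "AutoScalingGroupName": asg_name,
        "MinSize": min_size,
        # VPCZoneIdentifier is a comma-separated list of subnet IDs.
        "VPCZoneIdentifier": ",".join(subnet_ids),
    }


# Hypothetical names and IDs, used only for illustration.
params = build_asg_update("ecs-capacity-asg", ["subnet-aaa111", "subnet-bbb222"])
print(params)

# With boto3 installed and credentials configured, this could be applied as:
# boto3.client("autoscaling").update_auto_scaling_group(**params)
```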

977

MCQeasy

A web service runs on an Auto Scaling group (ASG). The team updates configuration (AMIs, environment variables) in a Launch Template and wants new instances created during scale-out to use the latest Launch Template version. What should the architect do?

A.Leave the ASG attached to the previous Launch Template version so scale-out is stable.

B.Set the ASG to use the latest Launch Template version and optionally start an instance refresh for existing instances.

C.Manually SSH into each new instance and reconfigure it after it launches.

D.Move the configuration changes into a security group rule so the ASG updates them automatically.

AnswerB

ASG scale-out uses the configured Launch Template version at instance launch time. Switching the ASG to the latest version ensures new instances are consistent. An instance refresh helps apply changes to running instances safely and predictably.

Why this answer

Option B is correct because the ASG can be configured to use the latest version of a Launch Template by specifying the `$Latest` version alias. This ensures that any new instances launched during scale-out automatically use the most recent template configuration (e.g., updated AMI, environment variables). Additionally, an Instance Refresh can be triggered to roll the update across existing instances, aligning them with the same latest template version without manual intervention.

Exam trap

The trap here is that candidates may think the ASG automatically updates existing instances when the Launch Template version is changed, but without an Instance Refresh, only new scale-out instances receive the update, leaving existing instances on the old configuration.

How to eliminate wrong answers

Option A is wrong because leaving the ASG attached to a previous Launch Template version means new scale-out instances will use outdated configurations, defeating the purpose of updating the template. Option C is wrong because manually SSHing into each new instance is not scalable, violates infrastructure-as-code principles, and introduces human error; the ASG should automate configuration via the Launch Template. Option D is wrong because security group rules control network traffic, not instance configuration (AMIs, environment variables); they cannot propagate or apply Launch Template changes.
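
As a rough illustration of option B, the two steps (point the ASG at `$Latest`, then optionally roll existing instances) can be expressed as the parameters for boto3's `update_auto_scaling_group` and `start_instance_refresh` calls. The group and template names are hypothetical, both helper functions are introduced only for this sketch, and the 90 percent healthy threshold is an assumed value rather than a recommendation from the source.

```python
def build_launch_template_update(asg_name, template_name):
    """Parameters that make new scale-out instances use the newest template version."""
    return {
        "AutoScalingGroupName": asg_name,
        "LaunchTemplate": {
            "LaunchTemplateName": template_name,
            "Version": "$Latest",  # resolved at instance launch time
        },
    }


def build_instance_refresh(asg_name, min_healthy_percentage=90):
    """Parameters for a rolling replacement of instances that are already running."""
    return {
        "AutoScalingGroupName": asg_name,
        "Strategy": "Rolling",
        "Preferences": {"MinHealthyPercentage": min_healthy_percentage},
    }


# Hypothetical names, used only for illustration.
update = build_launch_template_update("web-asg", "web-template")
refresh = build_instance_refresh("web-asg")
print(update)
print(refresh)

# With boto3 installed and credentials configured:
# client = boto3.client("autoscaling")
# client.update_auto_scaling_group(**update)
# client.start_instance_refresh(**refresh)
```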

978

MCQmedium

A global mobile game backend serves mostly static images and JavaScript files from an S3 origin. Users in distant countries report slow load times. What should improve performance most?

A.RDS read replicas

B.Amazon CloudFront distribution with the S3 bucket as origin

C.A larger S3 bucket

D.An EC2 Auto Scaling group in one Region

AnswerB

CloudFront caches content at edge locations close to users, reducing latency.

Why this answer

Amazon CloudFront is a content delivery network (CDN) that caches static content (images, JavaScript files) at edge locations worldwide. By distributing content closer to users, it significantly reduces latency and improves load times for a global audience, making it the most effective solution for this use case.

Exam trap

The trap here is that candidates may confuse improving database read performance (RDS read replicas) with improving static content delivery, or assume that scaling compute resources (Auto Scaling) in a single Region can solve global latency issues.

How to eliminate wrong answers

Option A is wrong because RDS read replicas are designed to offload read traffic from a relational database, not to accelerate delivery of static files stored in S3. Option C is wrong because increasing the S3 bucket size does not affect network latency or data transfer speed; it only increases storage capacity. Option D is wrong because an EC2 Auto Scaling group in a single Region does not address global latency; it only provides scalability within that one Region, leaving distant users unaffected.

979

MCQmedium

A company runs a SaaS application with highly unpredictable database load — it may receive zero queries for hours, then spike to thousands of queries per second briefly. The company wants to minimize database costs while handling all load levels without manual scaling. Which solution is MOST cost-effective?

A.Amazon Aurora Serverless v2 — scales automatically from minimum capacity during idle to maximum during spikes

B.Amazon Aurora Provisioned with Auto Scaling to add and remove read replicas

C.Amazon RDS MySQL with scheduled stop/start to save costs during predictable off-hours

D.Amazon DynamoDB with on-demand capacity mode — scales to any traffic level with no minimum cost

AnswerA

Aurora Serverless v2 adjusts capacity in fine-grained ACU increments based on actual load, so you pay only for the ACUs consumed and idle periods cost close to the configured minimum. No manual scaling is required.

Why this answer

Amazon Aurora Serverless v2 automatically scales database capacity based on actual load. It scales from a minimum of 0.5 ACUs during quiet periods to up to 128 ACUs during spikes within seconds. You pay only for the ACUs consumed — idle periods cost near-minimum.

Aurora Provisioned requires pre-provisioning capacity for peak load, incurring full instance costs regardless of actual utilization. Even with Auto Scaling, the primary instance has a minimum provisioned capacity that runs at full cost during idle periods.

Exam trap

Aurora Serverless v1 and v2 are different products. v1 supports true pause-to-zero but has a cold-start delay (~25 seconds) and fewer supported features. v2 scales much faster (sub-second), supports more Aurora features including Global Database and Multi-AZ, and scales to 0.5 ACU minimum. For modern architectures, Aurora Serverless v2 is the correct recommendation.

Why the other options are wrong

Aurora Provisioned with Auto Scaling adds/removes read replicas based on CPU or connections but does not scale write capacity. The primary instance remains at a fixed provisioned size, incurring full cost during idle periods.

RDS scheduled stop/start stops the database entirely during off-hours — no queries can be served. SaaS applications may receive queries at any time. Manual scheduling cannot handle unpredictable spikes.

DynamoDB on-demand scales automatically and incurs no request charges when idle (storage is still billed). However, migrating a relational workload to DynamoDB requires significant application refactoring, whereas Aurora Serverless v2 provides cost savings without architectural changes.

980

Multi-Selecthard

A fleet of test servers is rebuilt every week from AMIs. EBS volumes are often left behind after termination, and the team creates daily snapshots of every volume even when nothing changes. Which three actions most reduce storage cost while preserving recovery options? Select three.

A.Use gp3 for new EBS volumes instead of gp2 when similar performance is enough.

B.Automate snapshot creation and deletion with Amazon Data Lifecycle Manager.

C.Move old snapshots to the EBS Snapshot Archive tier once they are rarely restored.

D.Keep unattached volumes around for troubleshooting after instance termination.

E.Raise provisioned IOPS on every volume so snapshot restore time feels faster.

AnswersA, B, C

gp3 decouples baseline performance from volume size, which commonly lowers cost for workloads that would otherwise over-size gp2 volumes just to get more IOPS. It is a practical right-sizing move for many general-purpose volumes.

Why this answer

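Option A is correct because gp3 volumes provide a baseline of 3,000 IOPS regardless of size, which is usually sufficient for test servers, and they cost less per GiB than gp2. With gp2, IOPS are tied to volume size, so teams often over-size volumes to get performance; gp3 removes that coupling without sacrificing recovery options. Option B is correct because Amazon Data Lifecycle Manager applies a schedule and a retention rule to snapshots, so snapshots that are no longer needed are deleted automatically instead of accumulating. Option C is correct because the EBS Snapshots Archive tier stores rarely restored snapshots at a lower price while keeping them available for recovery, at the cost of a longer restore time.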

Exam trap

The trap here is that candidates may think keeping unattached volumes is a valid recovery option, but it is more cost-effective to snapshot and delete them, and they may overlook that raising IOPS does not accelerate snapshot restore times.
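
The gp2 versus gp3 point can be made concrete with a small calculation. This sketch uses the published baseline rules (gp2: 3 IOPS per GiB, with a floor of 100 and a ceiling of 16,000; gp3: 3,000 IOPS regardless of size) and ignores gp2 burst credits and pricing. The function name `gp2_baseline_iops` and the sample sizes are chosen only for illustration.

```python
GP3_BASELINE_IOPS = 3000  # gp3 baseline, independent of volume size


def gp2_baseline_iops(size_gib):
    """gp2 baseline: 3 IOPS per GiB, clamped to the range 100..16,000."""
    return min(max(3 * size_gib, 100), 16000)


# Compare baselines for a few illustrative volume sizes.
for size in (20, 100, 500, 1000, 2000):
    gp2 = gp2_baseline_iops(size)
    print(f"{size:>5} GiB  gp2={gp2:>5} IOPS  gp3={GP3_BASELINE_IOPS} IOPS")
```

Under these rules a gp2 volume needs to be about 1,000 GiB before its baseline matches what gp3 provides at any size, which is why small test volumes are good candidates for gp3.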

981

Multi-Selectmedium

A retail API runs on Amazon EC2 instances behind an Application Load Balancer and stores orders in an Amazon RDS for PostgreSQL database. A test that stopped one Availability Zone caused the API to return errors because all application servers were in the same AZ and the database was single-AZ. Which two changes should the architect make to continue serving traffic during a single-AZ failure? Select two.

A.Increase the EC2 instance size and keep all application servers in the same subnet.

B.Configure the Auto Scaling group to launch instances across private subnets in at least two Availability Zones.

C.Replace the Application Load Balancer with a Network Load Balancer in a single Availability Zone.

D.Convert the RDS for PostgreSQL database to a Multi-AZ deployment.

E.Add an Amazon RDS read replica and point the application to the replica endpoint.

AnswersB, D

Spreading the application tier across multiple AZs preserves healthy capacity if one AZ fails and lets the load balancer keep serving requests.

Why this answer

Option B is correct because distributing EC2 instances across multiple Availability Zones via an Auto Scaling group ensures that if one AZ fails, the remaining AZs continue to serve traffic. Option D is correct because converting the RDS for PostgreSQL database to a Multi-AZ deployment provides a standby replica in a different AZ, enabling automatic failover and continued database availability during a single-AZ failure.

Exam trap

The trap here is that candidates often confuse a read replica (which is for read scaling and requires manual promotion) with a Multi-AZ standby (which provides automatic failover for high availability).

982

MCQmedium

A document portal requires consistent high IOPS for a transactional database on EC2. Which EBS volume type is most suitable?

A.sc1 Cold HDD

B.Instance store only

C.Provisioned IOPS SSD such as io2

D.st1 Throughput Optimized HDD

AnswerC

io2 is designed for business-critical workloads requiring consistent high IOPS and durability.

Why this answer

Provisioned IOPS SSD (io2) is the correct choice because it delivers the consistent, high IOPS performance that transactional databases on EC2 require. io2 volumes offer 99.999% durability and, with io2 Block Express, can sustain up to 256,000 IOPS per volume, making them well suited to latency-sensitive workloads such as OLTP databases.

Exam trap

The trap here is that candidates often confuse 'high IOPS' with 'high throughput' and select st1 or sc1, not realizing that transactional databases require low-latency random I/O, which only SSD-based volumes like io2 can consistently deliver.

How to eliminate wrong answers

Option A is wrong because sc1 Cold HDD is designed for infrequently accessed, throughput-oriented workloads with low cost, and cannot provide consistent high IOPS due to its burst-bucket model and high latency. Option B is wrong because instance store volumes are ephemeral and data is lost on instance stop/termination, making them unsuitable for persistent transactional databases that require durability and consistent IOPS. Option D is wrong because st1 Throughput Optimized HDD is optimized for large, sequential workloads like big data and log processing, not for random I/O patterns typical of transactional databases, and its performance is limited to a maximum of 500 IOPS per volume.

983

MCQmedium

A ticket booking system uses Aurora MySQL. The company wants fast cross-Region disaster recovery with low RPO. Which architecture should be considered? The design must avoid adding custom operational scripts.

A.Aurora Global Database

B.A single-AZ Aurora cluster

C.An ElastiCache Redis replica

D.Manual snapshots copied monthly

AnswerA

Aurora Global Database replicates with low latency to secondary Regions and supports faster disaster recovery than snapshot-only approaches.

Why this answer

Aurora Global Database is designed for cross-Region disaster recovery with a typical RPO of 1 second and RTO of 1 minute, using storage-based replication that does not require custom scripts. It replicates data from a primary Region to up to five secondary Regions with minimal impact on database performance, meeting the low RPO requirement without operational overhead.

Exam trap

The trap here is that candidates may confuse cross-Region read replicas (which require manual promotion and have higher RPO) with Aurora Global Database, which provides managed cross-Region failover and a lower RPO without custom scripts.

How to eliminate wrong answers

Option B is wrong because a single-AZ Aurora cluster lacks any cross-Region replication or failover capability, providing no disaster recovery across Regions. Option C is wrong because ElastiCache Redis is an in-memory cache, not a persistent database, and cannot serve as a primary data store for ticket bookings or provide cross-Region DR with low RPO. Option D is wrong because manual snapshots copied monthly result in an RPO of up to one month, which is far too high for fast disaster recovery, and the process requires custom scripting to automate cross-Region copy.

984

MCQeasy

An internal team runs a report-generation job once per day. It typically finishes in a few minutes, and even on its slowest days it still completes in under 15 minutes. The team wants to reduce operational overhead and pay primarily for actual runtime instead of keeping servers running 24/7. Which AWS approach best matches these goals?

A.Deploy the job on EC2 instances and keep them running continuously for the daily schedule.

B.Use AWS Lambda triggered by a schedule (for example, EventBridge) to run the report at the required time.

C.Run the job in an RDS database using stored procedures scheduled by the database engine.

D.Use an Auto Scaling group with a fixed minimum size of one instance and disable scaling.

AnswerB

Lambda runs on demand and charges for execution time, aligning spend with actual job runtime and reducing ops.

Why this answer

AWS Lambda is the ideal choice because it is a serverless compute service that runs code only when triggered, so the team pays primarily for actual runtime. Using an Amazon EventBridge (formerly CloudWatch Events) schedule to invoke the function daily removes the need to provision or manage servers, and the job's worst-case runtime of under 15 minutes stays within Lambda's maximum execution timeout of 15 minutes.

Exam trap

The trap here is that candidates may overlook Lambda's 15-minute timeout limit and assume any short-duration job is suitable, or they may mistakenly think that RDS stored procedures (Option C) are a cost-effective compute alternative, when in fact they are not designed for general-purpose application logic and still require a running database instance.

How to eliminate wrong answers

Option A is wrong because keeping EC2 instances running continuously incurs costs for idle time, which directly contradicts the goal of paying primarily for actual runtime and reducing operational overhead. Option C is wrong because RDS stored procedures are designed for database-level logic and are not a general-purpose compute solution for running report-generation jobs; they also incur costs for the RDS instance running 24/7 and lack the flexibility of a dedicated compute service. Option D is wrong because an Auto Scaling group with a fixed minimum size of one instance still keeps a server running 24/7, resulting in the same cost and operational overhead as Option A, and does not achieve the goal of paying only for runtime.

985

MCQhard

A mobile banking backend must ensure that only encrypted EBS volumes can be created in the account. What is the strongest preventive control?

A.Run a daily Lambda function to encrypt unencrypted volumes

B.Enable VPC Flow Logs

C.Tag encrypted volumes after creation

D.Use an SCP that denies ec2:CreateVolume when the encrypted condition is false

AnswerD

An SCP can prevent noncompliant volume creation across accounts in an organization.

Why this answer

Option D is correct because Service Control Policies (SCPs) are a preventive control that can deny the ec2:CreateVolume action when the encrypted condition is false. This ensures that only encrypted EBS volumes can be created, enforcing encryption at the point of request before any volume is provisioned. SCPs operate at the AWS Organizations level, making them the strongest preventive mechanism for account-wide enforcement.

Exam trap

The trap here is that candidates often confuse reactive controls (like Lambda remediation) with preventive controls (like SCPs), or they mistakenly think tagging or logging can enforce encryption requirements.

How to eliminate wrong answers

Option A is wrong because running a daily Lambda function to encrypt unencrypted volumes is a detective/reactive control, not preventive; it only remediates volumes after they have already been created unencrypted, violating the requirement to prevent creation in the first place. Option B is wrong because VPC Flow Logs capture network traffic metadata and have no ability to enforce or prevent EBS volume creation or encryption; they are a monitoring tool, not a preventive control. Option C is wrong because tagging encrypted volumes after creation is a labeling action that does not prevent unencrypted volumes from being created; it is a detective or organizational control, not a preventive one.
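
A policy of the kind option D describes might look like the following, built here as a Python dictionary and printed as JSON. This is a minimal sketch: the `Sid` value is arbitrary, and a production policy would likely also need to cover volumes created implicitly at instance launch (for example through `ec2:RunInstances`), which this sketch does not attempt.

```python
import json

# Deny ec2:CreateVolume whenever the request asks for an unencrypted volume.
scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnencryptedVolumes",  # arbitrary label for illustration
            "Effect": "Deny",
            "Action": "ec2:CreateVolume",
            "Resource": "*",
            # ec2:Encrypted is the condition key that reflects the Encrypted flag.
            "Condition": {"Bool": {"ec2:Encrypted": "false"}},
        }
    ],
}

print(json.dumps(scp, indent=2))
```

Because the statement is an explicit deny evaluated before the volume exists, the control is preventive rather than reactive, which is the distinction the question tests.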

986

MCQmedium

Based on the exhibit, an administrator accidentally deleted data from Amazon RDS for PostgreSQL about 90 minutes ago. Which recovery approach best restores the database to the exact required point in time?

A.Restore the latest automated snapshot back onto the existing DB instance.

B.Restore the database to the specified point in time into a new DB instance.

C.Create a read replica and promote it after the deletion is noticed.

D.Enable Multi-AZ so the database can automatically undo application mistakes.

AnswerB

Point-in-time restore uses automated backups plus transaction logs to recreate the database at a specific moment. For accidental deletion, this is the correct RDS recovery method because it can recover the database to just before the bad change while preserving all legitimate data up to that point.

Why this answer

Amazon RDS for PostgreSQL supports Point-in-Time Recovery (PITR), which allows you to restore a DB instance to any second within the backup retention period, up to the latest restorable time (typically within the last five minutes). Since the deletion occurred approximately 90 minutes ago, you can specify a timestamp just before it, and RDS will create a new DB instance from automated backups and transaction logs. This is the only option that recovers the state immediately before the accidental deletion.

Exam trap

The trap here is that candidates confuse automated snapshots with point-in-time recovery, assuming a snapshot restore can target a specific time, when in fact snapshots are point-in-time captures and cannot replay transaction logs to reach an arbitrary second.

How to eliminate wrong answers

Option A is wrong because restoring the latest automated snapshot would recover data only up to the snapshot creation time, which could be hours or days before the deletion, not the exact point 90 minutes ago. Option C is wrong because creating a read replica replicates data asynchronously from the source; by the time the replica is promoted, it will already contain the deletion, and it cannot roll back to a prior point in time. Option D is wrong because Multi-AZ provides high availability through synchronous standby replication, but it does not protect against logical data corruption or accidental deletions; it cannot undo application mistakes.
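
The restore in option B can be sketched as the parameters for boto3's `restore_db_instance_to_point_in_time` call. The instance identifiers, the deletion timestamp, and the one-minute safety margin are all assumptions made for illustration; the helper `build_pitr_request` is hypothetical and only assembles the request.

```python
from datetime import datetime, timedelta, timezone


def build_pitr_request(source_id, target_id, deletion_time,
                       margin=timedelta(minutes=1)):
    """Target a moment shortly before the bad change, restoring into a NEW instance."""
    return {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,  # must differ from the source
        "RestoreTime": deletion_time - margin,
    }


# Hypothetical identifiers and timestamp, used only for illustration.
deleted_at = datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc)
request = build_pitr_request("orders-db", "orders-db-restored", deleted_at)
print(request["RestoreTime"].isoformat())  # 2024-01-15T10:29:00+00:00

# With boto3 installed and credentials configured:
# boto3.client("rds").restore_db_instance_to_point_in_time(**request)
```

The application is then repointed at the restored instance, or the missing rows are copied back from it, since PITR never overwrites the existing instance.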

987

MCQmedium

A partner company needs read-only access to reports in an S3 bucket for a B2B file exchange site. The partner has its own AWS account. What is the most secure scalable access pattern? The design must avoid adding custom operational scripts.

A.Make the objects public and rely on difficult-to-guess object names

B.Create an IAM user in the company account and share the access keys

C.Create a bucket policy that grants the partner role least-privilege access to the required prefix

D.Copy the objects to a public website bucket

AnswerC

A resource policy can grant cross-account access to a specific external role and prefix.

Why this answer

Option C is correct because a bucket policy can name the partner's IAM role as the principal and grant read-only access to a specific prefix. This is secure (no public exposure and no shared long-term access keys), scalable (the partner manages its own identities), and needs no custom scripts because it relies on native IAM and S3 policy evaluation. The partner's role must also be allowed to call the relevant S3 actions by an identity policy in the partner's own account.

Exam trap

The trap here is that candidates may choose Option B (IAM user with shared keys) because it seems straightforward, but they overlook the security risk of long-term credentials and the operational burden of key rotation, which violates the 'most secure scalable' and 'avoid custom scripts' requirements.

How to eliminate wrong answers

Option A is wrong because making objects public relies on security through obscurity (difficult-to-guess names), which is not secure and violates the principle of least privilege; objects can be discovered via enumeration or leaks. Option B is wrong because creating an IAM user and sharing access keys introduces long-term credentials that must be securely rotated and managed, increasing operational overhead and risk of exposure, contradicting the 'avoid custom operational scripts' requirement. Option D is wrong because copying objects to a public website bucket makes them publicly accessible, losing all access control, and adds unnecessary data duplication and synchronization overhead.
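
A bucket policy along the lines of option C might look like this, generated as a Python dictionary. It is a sketch under stated assumptions: the bucket name, prefix, account ID, and role name are all hypothetical, and the second statement (listing within the prefix) is included only on the assumption that the partner needs to enumerate reports as well as download them.

```python
import json


def partner_read_policy(bucket, prefix, partner_role_arn):
    """Read-only, prefix-scoped access for a single external IAM role."""
    principal = {"AWS": partner_role_arn}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PartnerReadObjects",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                "Sid": "PartnerListPrefix",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                # Restrict listing to keys under the shared prefix.
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }


# Hypothetical bucket, prefix, and role ARN, used only for illustration.
policy = partner_read_policy(
    "example-reports-bucket",
    "partner-a",
    "arn:aws:iam::111122223333:role/PartnerReportReader",
)
print(json.dumps(policy, indent=2))
```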

988

MCQmedium

A global video platform serves mostly static images and JavaScript files from an S3 origin. Users in distant countries report slow load times. What should improve performance most?

A.A larger S3 bucket

B.Amazon CloudFront distribution with the S3 bucket as origin

C.RDS read replicas

D.An EC2 Auto Scaling group in one Region

AnswerB

CloudFront caches content at edge locations close to users, reducing latency.

Why this answer

Amazon CloudFront is a content delivery network (CDN) that caches static content (images, JavaScript) at edge locations worldwide. By distributing content closer to users, it reduces latency and improves load times significantly compared to serving directly from a single S3 origin. This is the most effective solution for a global user base accessing static assets.

Exam trap

The trap here is that candidates might confuse 'scaling' (Auto Scaling, larger buckets) with 'latency reduction' (CDN), or mistakenly think database read replicas can serve static web assets, when in fact they are only for relational database read offloading.

How to eliminate wrong answers

Option A is wrong because S3 bucket size has no impact on performance; S3 scales automatically to handle any amount of data, and a larger bucket does not reduce latency for distant users. Option C is wrong because RDS read replicas are designed to offload read traffic from a relational database, not to serve static files like images or JavaScript; they address database query performance, not content delivery. Option D is wrong because an EC2 Auto Scaling group in one Region only scales compute capacity within that single geographic area, failing to reduce latency for users in distant countries who still must traverse long network paths.

989

MCQhard

A batch analytics job currently uses two NAT gateways in each of three Availability Zones, but only one private subnet per AZ needs outbound internet access. What should the architect review first? The design must avoid adding custom operational scripts.

A.Replacing every NAT gateway with an internet gateway attached to private subnets

B.Whether one NAT gateway per AZ is sufficient for the required private subnets

C.Disabling route tables

D.Moving all workloads to public subnets

AnswerB

NAT gateways are normally deployed per AZ for resilience; duplicate NAT gateways in the same AZ may be unnecessary.

Why this answer

Option B is correct because the current setup uses two NAT gateways per AZ, which is likely over-provisioned and incurs unnecessary costs. Since only one private subnet per AZ requires outbound internet access, a single NAT gateway per AZ is typically sufficient to handle the traffic, and this is the first cost-optimization step to review before making other changes.

Exam trap

The trap here is that candidates may assume more NAT gateways are always better for high availability, but the question explicitly states only one private subnet per AZ needs outbound access, making a single NAT gateway per AZ the cost-optimized starting point.

How to eliminate wrong answers

Option A is wrong because internet gateways cannot be attached to private subnets; they are attached to VPCs and only work with public subnets that have a route to the IGW. Option C is wrong because disabling route tables would break all network connectivity, not just outbound internet access, and is not a valid optimization strategy. Option D is wrong because moving all workloads to public subnets would expose them directly to the internet, which violates security best practices and is a far larger change than right-sizing the NAT gateways.

990

MCQeasy

You have EC2 instances in private subnets with no NAT gateway. They must retrieve secrets from AWS Secrets Manager without sending traffic to the public internet. Which VPC endpoint type is the correct choice for connecting to AWS Secrets Manager?

A.Create a Gateway VPC endpoint for Secrets Manager.

B.Create an Interface VPC endpoint (AWS PrivateLink) for Secrets Manager and associate security groups for the endpoint.

C.Use a Transit Gateway attachment to route traffic to the public internet for Secrets Manager.

D.Deploy a NAT gateway and allow outbound HTTPS traffic to Secrets Manager.

AnswerB

Secrets Manager is reached via an Interface VPC endpoint. Interface endpoints create private network interfaces in your subnets and route traffic to the AWS service over the AWS network, avoiding public internet egress.

Why this answer

AWS Secrets Manager is accessed via an API endpoint that uses HTTPS. Interface VPC endpoints (AWS PrivateLink) are the correct choice for connecting to services like Secrets Manager because they use elastic network interfaces (ENIs) with private IPs in your VPC, allowing traffic to stay within the AWS network. Gateway endpoints only support S3 and DynamoDB, not Secrets Manager.

Exam trap

The trap here is that candidates often confuse Gateway endpoints (which are free and only for S3/DynamoDB) with Interface endpoints (which incur hourly charges but support many services like Secrets Manager, KMS, and CloudWatch).

How to eliminate wrong answers

Option A is wrong because Gateway VPC endpoints only support Amazon S3 and DynamoDB, not AWS Secrets Manager. Option C is wrong because Transit Gateway attachments route traffic between VPCs and on-premises networks, but they do not provide private connectivity to AWS public services without a NAT gateway or internet gateway. Option D is wrong because deploying a NAT gateway would send traffic to the public internet, violating the requirement to avoid public internet traffic.
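
The endpoint in option B can be sketched as the parameters for boto3's EC2 `create_vpc_endpoint` call. The Region, VPC ID, subnet IDs, and security group ID are hypothetical, and the helper `build_secrets_manager_endpoint` is introduced only for this sketch. Enabling private DNS is an assumption here; it lets the default Secrets Manager hostname resolve to the endpoint's private IPs.

```python
def build_secrets_manager_endpoint(region, vpc_id, subnet_ids, security_group_ids):
    """Assemble the request for an Interface endpoint (PrivateLink) to Secrets Manager."""
    return {
        "VpcEndpointType": "Interface",
        "ServiceName": f"com.amazonaws.{region}.secretsmanager",
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),                 # one ENI is created per subnet
        "SecurityGroupIds": list(security_group_ids),  # should allow inbound HTTPS (443)
        "PrivateDnsEnabled": True,
    }


# Hypothetical IDs, used only for illustration.
endpoint = build_secrets_manager_endpoint(
    "us-east-1",
    "vpc-0abc1234",
    ["subnet-aaa111", "subnet-bbb222"],
    ["sg-0def5678"],
)
print(endpoint)

# With boto3 installed and credentials configured:
# boto3.client("ec2", region_name="us-east-1").create_vpc_endpoint(**endpoint)
```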

991

MCQeasy

A company wants to protect a critical application from a full Region outage. The secondary Region should keep only a small amount of infrastructure running most of the time to control cost. Which disaster recovery strategy fits best?

A.Pilot light

B.Active-active

C.Single-AZ deployment

D.Blue/green deployment

AnswerA

Pilot light keeps a minimal version of the environment running in the backup Region, which helps reduce cost while still supporting recovery.

Why this answer

The pilot light strategy is correct because it keeps a minimal core of infrastructure (e.g., a small database, a few EC2 instances) running in the secondary Region, while the bulk of the application remains dormant. In a full Region outage, the pilot light can be rapidly scaled up to full production capacity, meeting the requirement of low ongoing cost with the ability to recover from a complete Region failure.

Exam trap

The trap here is that candidates confuse 'pilot light' with 'active-active' or 'warm standby,' mistakenly thinking that any multi-Region setup must run full capacity, when the pilot light specifically minimizes cost by keeping only a minimal footprint until failover is triggered.

How to eliminate wrong answers

Option B (Active-active) is wrong because it runs full production workloads in both Regions simultaneously, incurring high costs that contradict the requirement to keep only a small amount of infrastructure running most of the time. Option C (Single-AZ deployment) is wrong because it deploys resources within a single Availability Zone, which does not protect against a full Region outage and violates the requirement for cross-Region disaster recovery. Option D (Blue/green deployment) is wrong because it is a deployment strategy for minimizing downtime during application updates, not a disaster recovery strategy for Region-level failures; it typically operates within a single Region.

992

Multi-Selectmedium

A serverless order-ingestion API writes directly to a database. During traffic spikes, the database occasionally throttles, Lambda retries create duplicate order records, and some requests time out. Which two changes best improve buffering and safe retry behavior? Select two.

Select 2 answers

A.Increase the Lambda timeout and keep writing directly to the database.

B.Put an Amazon SQS queue between the API and the database-processing function.

C.Replace SQS with SNS so every request is delivered immediately to all subscribers.

D.Make the database write idempotent by using a unique request token or order ID.

E.Disable retries so failed writes are never duplicated.

AnswersB, D

SQS buffers bursts and decouples producers from consumers, so the database can be processed at a steadier rate.

Why this answer

Option B is correct because inserting an SQS queue between the API and the database-processing Lambda function decouples ingestion from the database write. During traffic spikes, SQS absorbs the burst and the function consumes messages at a controlled rate, which reduces database throttling. A dead-letter queue can capture messages that keep failing so they are not retried indefinitely. Option D is correct because retries and at-least-once delivery can present the same order more than once; keying the write on a unique request token or order ID turns a repeated write into a no-op instead of a duplicate record.

Exam trap

The trap here is that candidates often confuse SNS (push-based, no buffering) with SQS (pull-based, buffering), and they overlook that idempotency is a complementary pattern to handle retries without duplicates, not a replacement for decoupling.
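
The idempotent write in option D can be sketched in a few lines. This is a minimal illustration, not a production design: `OrderStore`, `process_batch`, and the message shape are hypothetical names, and an in-memory dictionary stands in for the database.

```python
from dataclasses import dataclass, field


@dataclass
class OrderStore:
    """In-memory stand-in for the orders table (hypothetical, for illustration)."""
    orders: dict = field(default_factory=dict)

    def put_if_absent(self, order_id: str, payload: dict) -> bool:
        """Write the order only if this order_id has not been stored yet.

        Returns True when a new record was written, False for a duplicate.
        """
        if order_id in self.orders:
            return False  # duplicate delivery or retry: do nothing
        self.orders[order_id] = payload
        return True


def process_batch(store: OrderStore, messages: list) -> int:
    """Consume queued messages and return how many new orders were written."""
    written = 0
    for message in messages:
        if store.put_if_absent(message["order_id"], message["body"]):
            written += 1
    return written


store = OrderStore()
# The second message simulates a retry that redelivers order "o-1001".
queue = [
    {"order_id": "o-1001", "body": {"sku": "A1", "qty": 2}},
    {"order_id": "o-1001", "body": {"sku": "A1", "qty": 2}},
    {"order_id": "o-1002", "body": {"sku": "B7", "qty": 1}},
]
new_orders = process_batch(store, queue)
```

With DynamoDB, the same effect is usually achieved with a conditional write that succeeds only when the key does not already exist.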

993

MCQmedium

A static marketing site is served through CloudFront from an S3 origin. After a product update, customers report a drop in CloudFront cache hit ratio and the CloudFront bill increases because the origin is receiving many more requests for the same JS/CSS assets. Asset URLs are versioned, but requests now include an Authorization header even though these assets are public. Which CloudFront change most directly improves the cache hit ratio for these assets?

A.Increase the origin's max connections to handle more origin fetches

B.Configure the CloudFront cache policy so Authorization is not included in the cache key, and use an origin request policy that does not forward Authorization to the S3 origin for this behavior

C.Set CloudFront minimum TTL to 0 seconds so caches expire faster and origin fetches start again

D.Disable CloudFront compression because Authorization headers are not cacheable when compression is enabled

AnswerB

For public assets, Authorization should not vary the cache key. Removing it from the cache key allows CloudFront to reuse cached objects across requests, and not forwarding it to the origin avoids unnecessary origin variation and request overhead.

Why this answer

The drop in cache hit ratio is caused by the Authorization header being included in the cache key, which makes CloudFront treat each request as unique even when the asset URL is the same. By configuring the cache policy to exclude Authorization from the cache key and using an origin request policy that does not forward it to S3, CloudFront can serve cached responses for all users regardless of their Authorization header, restoring the cache hit ratio.

Exam trap

The trap here is that candidates may think increasing origin capacity or adjusting TTLs solves the problem, but the real issue is that the Authorization header is unnecessarily varying the cache key, which is a common misconfiguration in CloudFront when public assets are served alongside authenticated content.

How to eliminate wrong answers

Option A is wrong because increasing origin max connections addresses origin load but does not fix the root cause of cache misses caused by the Authorization header in the cache key. Option C is wrong because a minimum TTL of 0 seconds only lets objects expire sooner when the origin's caching headers allow it; it cannot raise the cache hit ratio and may increase origin fetches and the bill. Option D is wrong because compression has no bearing on whether Authorization is part of the cache key; the header is excluded from the cache key unless explicitly included, and disabling compression would not resolve the cache key issue.
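
To see why the cache key matters, the toy simulation below replays the same asset request from four users. It is a simplified model, not CloudFront's implementation; `cache_key`, `hit_ratio`, and the asset path are hypothetical.

```python
def cache_key(url, headers, headers_in_key):
    """Build a cache key from the URL plus only the headers listed in headers_in_key."""
    selected = tuple(
        (name, headers.get(name, "")) for name in sorted(headers_in_key)
    )
    return (url, selected)


def hit_ratio(requests, headers_in_key):
    """Replay requests against an empty cache and return hits / total."""
    cache = set()
    hits = 0
    for url, headers in requests:
        key = cache_key(url, headers, headers_in_key)
        if key in cache:
            hits += 1
        else:
            cache.add(key)  # miss: fetch from origin, then store
    return hits / len(requests)


# Four users request the same versioned asset, each with a different token.
requests = [
    ("/static/app.v42.js", {"Authorization": f"Bearer token-{user}"})
    for user in range(4)
]

with_auth = hit_ratio(requests, headers_in_key={"Authorization"})
without_auth = hit_ratio(requests, headers_in_key=set())
```

With Authorization in the key every request is a miss; with it removed, only the first request reaches the origin.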

994

Multi-Selectmedium

A startup runs an API on Amazon EC2. The instance must read items from one DynamoDB table and upload logs to one S3 bucket. Platform engineers also need a way to create new application roles, but those roles must never exceed a predefined set of permissions. Which three actions should the architect take? Select three.

Select 3 answers

A.Attach an IAM role to the EC2 instance profile and remove long-lived access keys from the server.

B.Give the EC2 instance an IAM user with administrator access for simplicity.

C.Scope the application policy to the exact DynamoDB table ARN and S3 bucket prefix.

D.Store the access keys in the application configuration file and rotate them later.

E.Use a permissions boundary for any IAM roles the platform team is allowed to create.

AnswersA, C, E

This gives the workload temporary credentials through the instance metadata service and avoids storing secrets on the host. It is the standard least-privilege pattern for EC2-based applications.

Why this answer

Option A is correct because attaching an IAM role to the EC2 instance profile allows the instance to obtain temporary credentials via the instance metadata service (IMDS), eliminating the need to store long-lived access keys on the server. This follows the AWS security best practice of using roles for EC2 to securely access DynamoDB and S3 without managing static credentials.

Exam trap

The trap here is that candidates may think storing access keys in a config file with rotation is acceptable, but AWS explicitly recommends using IAM roles for EC2 to avoid the security risks of long-lived static credentials.
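
Options C and E can be illustrated together. The sketch below builds a scoped policy document as a Python dictionary and models a permissions boundary as a simple intersection of allowed actions. The account ID, table name, bucket name, and `effective_actions` helper are hypothetical, and real IAM evaluation also considers resources, conditions, and explicit denies.

```python
import json

# Hypothetical identifiers, for illustration only.
TABLE_ARN = "arn:aws:dynamodb:us-east-1:111122223333:table/Orders"
LOG_PREFIX_ARN = "arn:aws:s3:::example-app-logs/api/*"

app_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadOrdersTable",
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            "Resource": TABLE_ARN,
        },
        {
            "Sid": "UploadLogs",
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": LOG_PREFIX_ARN,
        },
    ],
}


def effective_actions(identity_actions, boundary_actions):
    """A permissions boundary caps a role: only actions allowed by both apply."""
    return set(identity_actions) & set(boundary_actions)


boundary = {"dynamodb:GetItem", "dynamodb:Query", "s3:PutObject"}
requested = {"dynamodb:GetItem", "s3:PutObject", "iam:CreateUser"}
allowed = effective_actions(requested, boundary)
policy_json = json.dumps(app_policy, indent=2)
```

Even if a platform engineer attaches a broader policy to a new role, actions outside the boundary (here `iam:CreateUser`) never take effect.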

995

MCQeasy

A public API is served through an Application Load Balancer and protected by AWS WAF. The team wants AWS to automatically block clients that send too many requests from the same IP address within a short time window. Which AWS WAF feature is the best fit?

A.Use a rate-based rule in AWS WAF to block when requests per IP exceed a configured threshold over the WAF rate-based evaluation window.

B.Use an AWS IAM policy on the ALB listener to deny requests when request count exceeds a threshold.

C.Enable S3 server access logs for the bucket that stores API responses and alert on high log volume.

D.Configure an AWS Lambda authorizer to reject requests after the Nth request from an IP address.

AnswerA

Rate-based rules are designed specifically to mitigate abusive traffic by limiting the number of requests from an identified source (typically by IP). When the threshold is exceeded, you can set the rule action to Block (or count first for tuning).

Why this answer

A rate-based rule in AWS WAF is specifically designed to automatically block clients when the number of requests from a single IP address exceeds a configured threshold within a rolling evaluation window (5 minutes by default). This feature directly addresses the requirement to mitigate high request rates from the same IP, making it the best fit for the described use case.

Exam trap

The trap here is that candidates may confuse AWS WAF rate-based rules with other AWS services like IAM or Lambda authorizers, mistakenly thinking those can handle network-level rate limiting, when in fact only WAF provides native, automatic IP-based rate blocking.

How to eliminate wrong answers

Option B is wrong because IAM policies control authentication and authorization for AWS API calls, not network-level request rates on an ALB listener; they cannot deny HTTP requests based on request count. Option C is wrong because S3 server access logs are for auditing object-level access, not for real-time rate limiting of API requests, and alerting on log volume does not automatically block clients. Option D is wrong because an AWS Lambda authorizer is used for custom authentication/authorization of requests, not for rate limiting based on IP address; it would require custom logic and does not natively support sliding window rate tracking.
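
The idea behind a rate-based rule can be sketched as a per-IP counter over a time window. This is a toy model only: AWS WAF's internal evaluation differs, and `RateLimiter`, the limit of 3, and the sample IP addresses are assumptions chosen to keep the example short.

```python
from collections import defaultdict, deque


class RateLimiter:
    """Toy per-IP limiter: block an IP once it exceeds `limit` requests in `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.seen = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now):
        recent = self.seen[ip]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        recent.append(now)
        return len(recent) <= self.limit


limiter = RateLimiter(limit=3, window=300)
decisions = [limiter.allow("203.0.113.10", now=t) for t in (0, 1, 2, 3)]
other_ip = limiter.allow("198.51.100.7", now=3)
after_window = limiter.allow("203.0.113.10", now=400)
```

The fourth request from the same IP is blocked, a different IP is unaffected, and the first IP is allowed again once its earlier requests age out of the window.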

996

MCQmedium

A warehouse integration service receives bursts of orders that sometimes overwhelm a downstream fulfilment service. The architecture must absorb spikes and retry processing without losing requests. Which service should be placed between the web tier and fulfilment workers?

A.AWS WAF

B.Amazon Route 53 weighted routing

C.Amazon SQS queue

D.Amazon CloudFront

AnswerC

SQS decouples producers and consumers, buffers bursts, and supports retries through visibility timeout and dead-letter queues.

Why this answer

Amazon SQS is the correct choice because it acts as a durable, fully managed message queue that decouples the web tier from the fulfilment workers. It can absorb bursts of orders by storing messages durably, and workers can poll the queue at their own pace, with built-in retry logic via visibility timeouts and dead-letter queues to ensure no requests are lost.

Exam trap

The trap here is that candidates may confuse load-balancing or caching services (like Route 53 or CloudFront) with message queuing, failing to recognize that only a durable queue like SQS provides the necessary buffering, decoupling, and retry semantics for asynchronous order processing.

How to eliminate wrong answers

Option A is wrong because AWS WAF is a web application firewall that filters HTTP/S traffic based on rules, not a queuing or buffering mechanism; it cannot absorb spikes or retry processing. Option B is wrong because Amazon Route 53 weighted routing distributes DNS traffic across multiple endpoints based on weights, but it does not provide durable storage or retry capabilities for individual requests. Option D is wrong because Amazon CloudFront is a content delivery network (CDN) that caches static and dynamic content at edge locations; it can reduce load on origins but cannot queue or retry individual order messages.
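
The retry behavior that SQS provides through visibility timeouts and dead-letter queues can be approximated with a small simulation. Everything here is hypothetical (`drain`, `fulfil`, the order names); it models the receive count and dead-letter hand-off, not the SQS API or its timing.

```python
from collections import deque


def drain(queue, handler, max_receive_count):
    """Process messages; failed ones return to the queue until they hit max_receive_count."""
    processed, dead_letter = [], []
    pending = deque({"body": body, "receives": 0} for body in queue)
    while pending:
        message = pending.popleft()
        message["receives"] += 1
        try:
            handler(message["body"])
            processed.append(message["body"])        # success: delete from queue
        except Exception:
            if message["receives"] >= max_receive_count:
                dead_letter.append(message["body"])  # give up: move to the DLQ
            else:
                pending.append(message)              # becomes visible again later
    return processed, dead_letter


attempts = {}


def fulfil(order):
    """Hypothetical worker: 'flaky' succeeds on its second try, 'bad' never succeeds."""
    attempts[order] = attempts.get(order, 0) + 1
    if order == "bad" or (order == "flaky" and attempts[order] < 2):
        raise RuntimeError("downstream unavailable")


processed, dead_letter = drain(["ok", "flaky", "bad"], fulfil, max_receive_count=3)
```

No order is lost: the transient failure is retried and succeeds, while the message that keeps failing ends up in the dead-letter list for inspection.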

997

MCQeasy

A workload runs in private subnets. It must access AWS services such as Amazon S3, but the company wants to avoid using a NAT Gateway to reduce outbound networking costs. What is the best solution?

A.Create VPC endpoints for the required AWS services and route traffic to them

B.Attach Elastic IP addresses to instances in private subnets

C.Install a NAT Gateway in every subnet to minimize routing hops

D.Open outbound internet access with a security group rule to reach service endpoints directly

AnswerA

VPC endpoints provide private connectivity from your VPC to supported AWS services without traversing the public internet or a NAT Gateway. For example, you can use a gateway endpoint for S3 (and interface endpoints for other services where supported), which avoids NAT Gateway hourly and data-processing charges.

Why this answer

VPC endpoints (Gateway Endpoints for S3 and DynamoDB, or Interface Endpoints for other services) allow instances in private subnets to access AWS services privately without traversing the internet or a NAT Gateway. This eliminates NAT Gateway data processing and hourly charges, directly reducing outbound networking costs while keeping traffic within the AWS network.

Exam trap

The trap here is that candidates often assume private subnets must use a NAT Gateway or internet gateway for any AWS service access, overlooking that VPC endpoints provide private connectivity to supported services within the AWS network. Gateway endpoints for S3 and DynamoDB carry no additional charge; interface endpoints are billed per hour and per GB processed.

How to eliminate wrong answers

Option B is wrong because attaching Elastic IP addresses to instances in private subnets does not enable outbound internet access; private subnets lack an internet gateway route, so EIPs alone cannot route traffic to AWS services. Option C is wrong because installing a NAT Gateway in every subnet increases costs unnecessarily (each NAT Gateway incurs hourly and data processing charges) and does not minimize routing hops compared to VPC endpoints. Option D is wrong because security group rules control inbound/outbound traffic based on IP addresses or security groups, but they cannot route traffic to service endpoints directly; instances still need a route to the internet or a VPC endpoint to reach AWS services.

998

MCQmedium

A healthcare document service stores audit logs in S3. The compliance team requires that logs cannot be overwritten or deleted for seven years. What should be configured?

A.S3 Object Lock in compliance mode with an appropriate retention period

B.S3 server access logging

C.S3 lifecycle expiration after seven years

D.S3 versioning only

AnswerA

Object Lock compliance mode enforces write-once-read-many retention that even privileged users cannot bypass during the retention period.

Why this answer

S3 Object Lock in compliance mode prevents any user, including the root user, from overwriting or deleting objects for the specified retention period. This meets the compliance requirement of immutable audit logs for seven years, as compliance mode enforces a strict write-once-read-many (WORM) model that cannot be bypassed.

Exam trap

The trap here is that candidates often confuse S3 versioning with immutability. Versioning keeps earlier versions when an object is overwritten or deleted, but anyone with sufficient permissions can still permanently delete specific versions, so versioning alone does not enforce retention.

How to eliminate wrong answers

Option B is wrong because S3 server access logging only records requests made to the bucket, it does not prevent deletion or overwriting of existing logs. Option C is wrong because S3 lifecycle expiration automatically deletes objects after seven years, which violates the requirement that logs cannot be deleted. Option D is wrong because S3 versioning alone preserves previous versions but does not prevent deletion of the current version or overwriting of objects; it must be combined with Object Lock to enforce immutability.
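
As a rough sketch, a default retention rule for the bucket could look like the dictionary below, shaped like the S3 Object Lock configuration document. It assumes Object Lock is already enabled on the bucket; how the configuration is applied (console, CLI, or SDK) is left out.

```python
import json

# Shaped like the S3 Object Lock configuration document; applies a default
# retention to new object versions in a bucket that has Object Lock enabled.
object_lock_configuration = {
    "ObjectLockEnabled": "Enabled",
    "Rule": {
        "DefaultRetention": {
            "Mode": "COMPLIANCE",  # GOVERNANCE would allow privileged bypass
            "Years": 7,
        }
    },
}
config_json = json.dumps(object_lock_configuration, indent=2)
```

Governance mode uses the same structure but lets users with a specific permission override the retention, which is why it does not satisfy this requirement.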

999

MCQhard

Based on the exhibit, a trading platform exposes a custom binary TCP protocol to partner systems. The service must preserve the original client source IP for rate limiting, support TLS pass-through to the application, and minimize network latency. The team also wants a simple architecture that can scale across multiple Availability Zones. What load balancing option should the solutions architect choose?

A.Application Load Balancer with path-based routing and HTTP/2 enabled.

B.Network Load Balancer with TCP listeners and target groups in the private subnets.

C.Amazon API Gateway REST API integrated directly with the EC2 instances.

D.CloudFront in front of the EC2 instances to cache and terminate the client connections.

AnswerB

NLB is designed for ultra-low-latency TCP/UDP workloads and preserves the client source IP to targets. It also supports multi-AZ scale-out and works well when the application is not HTTP-based.

Why this answer

A Network Load Balancer (NLB) with TCP listeners is the correct choice because it preserves the original client source IP address (via the Proxy Protocol header or direct preservation in the TCP flow), supports TLS pass-through (no decryption at the load balancer), and minimizes latency by operating at Layer 4. It also scales across multiple Availability Zones with simple target groups in private subnets, meeting all stated requirements.

Exam trap

The trap here is that candidates often choose ALB for its advanced routing features, forgetting that ALB accepts only HTTP and HTTPS listeners and terminates TLS at the load balancer, which rules it out for a custom binary TCP protocol that needs TLS pass-through.

How to eliminate wrong answers

Option A is wrong because an Application Load Balancer (ALB) operates at Layer 7 and accepts only HTTP and HTTPS listeners, so it cannot carry a custom binary TCP protocol; it also terminates TLS at the load balancer, which breaks the TLS pass-through requirement, and it exposes the client IP only through the X-Forwarded-For header. Option C is wrong because Amazon API Gateway is a RESTful HTTP/HTTPS service that cannot handle custom binary TCP protocols or provide TLS pass-through, and it introduces additional latency. Option D is wrong because CloudFront is an HTTP/HTTPS content delivery network that terminates client connections, cannot pass through raw TCP traffic, and would add latency and complexity without supporting the custom binary protocol.

1000

MCQhard

An EC2 instance in a private subnet must access an S3 bucket that contains regulated exports for an image sharing application. The security team requires access to be allowed only when traffic comes through a specific VPC endpoint. What should the architect add to the bucket policy?

A.A condition that matches aws:sourceVpce to the endpoint ID

B.A deny statement for all IAM users except the EC2 role

C.A security group rule that allows HTTPS to S3

D.A condition that matches aws:RequestedRegion to the bucket Region

AnswerA

The aws:sourceVpce condition restricts S3 access to requests that arrive through the specified VPC endpoint.

Why this answer

Option A is correct because the `aws:sourceVpce` condition key in an S3 bucket policy allows you to restrict access so that only traffic originating from a specific VPC endpoint (VPCe) is permitted. This enforces the security team's requirement that all S3 access must come through that endpoint, ensuring that requests from other paths (e.g., NAT gateway, internet gateway) are denied. The condition is evaluated at the S3 service side, not at the instance level, making it a direct and secure way to enforce the policy.

Exam trap

The trap here is that candidates often confuse `aws:sourceVpce` with `aws:SourceVpc` (which matches the VPC ID, not the endpoint ID) or incorrectly think a security group rule can enforce endpoint-specific routing, but security groups cannot control the network path taken by traffic.

How to eliminate wrong answers

Option B is wrong because denying all IAM users except the EC2 role does not enforce the requirement that traffic must come through a specific VPC endpoint; it only restricts which IAM identity can access the bucket, not the network path. Option C is wrong because a security group rule is not part of a bucket policy and only controls traffic at the instance level (allowing HTTPS to S3 from the instance); it cannot make S3 accept requests only from a specific VPC endpoint. Option D is wrong because `aws:RequestedRegion` matches the AWS Region that the request targets, not the network path or VPC endpoint; it does not ensure traffic flows through the required VPC endpoint.
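
A bucket policy with this condition might look like the sketch below, written as a Python dictionary. The bucket name and endpoint ID are hypothetical, and the statement uses a common deny-unless pattern; a real policy would usually be narrowed so that administrators are not locked out.

```python
import json

# Hypothetical bucket name and endpoint ID.
BUCKET = "example-regulated-exports"
ENDPOINT_ID = "vpce-0123456789abcdef0"

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnlessFromEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            # Deny every request that did not arrive through this endpoint.
            "Condition": {
                "StringNotEquals": {"aws:sourceVpce": ENDPOINT_ID}
            },
        }
    ],
}
policy_json = json.dumps(bucket_policy, indent=2)
```

Because the statement is an explicit deny, it applies regardless of which IAM identity makes the request; the EC2 role still needs its own allow for the S3 actions it uses.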

1001

MCQmedium

A telemetry pipeline uses an Application Load Balancer in one Region. Global users need lower network latency to the application without caching dynamic responses. What should be considered? The architecture review board prefers a managed AWS-native control.

A.AWS Global Accelerator

B.S3 Cross-Region Replication

C.CloudFront only with long TTLs

D.AWS Backup cross-Region copy

AnswerA

Global Accelerator routes traffic over the AWS global network to improve performance for TCP/UDP applications without relying on caching.

Why this answer

AWS Global Accelerator is the correct choice because it uses the AWS global network and Anycast IP addresses to route user traffic to the optimal Application Load Balancer endpoint, reducing latency for global users without caching dynamic responses. Unlike CloudFront, Global Accelerator does not cache content; it simply optimizes the network path, making it ideal for dynamic or real-time applications where caching is not acceptable. It is a managed AWS-native service that aligns with the architecture review board's preference.

Exam trap

The trap here is that candidates often confuse CloudFront with Global Accelerator, assuming that any CDN-like service is the answer for latency reduction, but CloudFront's caching behavior makes it unsuitable for dynamic content that must not be cached.

How to eliminate wrong answers

Option B (S3 Cross-Region Replication) is wrong because it is designed for replicating objects in S3 buckets across regions for data durability or compliance, not for reducing network latency to an ALB-based application. Option C (CloudFront only with long TTLs) is wrong because CloudFront caches content at edge locations, which would cache dynamic responses—contradicting the requirement to avoid caching—and long TTLs would further exacerbate stale data issues. Option D (AWS Backup cross-Region copy) is wrong because it is a backup and disaster recovery service for creating copies of resources across regions, not a solution for improving application latency.

1002

MCQhard

A media processing workflow in private subnets downloads large amounts of data from S3 through a NAT gateway. NAT data processing charges are high. What should the architect use to reduce cost? The architecture review board prefers a managed AWS-native control.

A.S3 Object Lambda

B.AWS Shield Advanced

C.Gateway VPC endpoint for Amazon S3

D.A larger NAT gateway

AnswerC

A gateway endpoint routes S3 traffic privately without NAT gateway data processing charges.

Why this answer

A Gateway VPC endpoint for Amazon S3 allows instances in private subnets to access S3 directly over the AWS network without traversing a NAT gateway, eliminating NAT data processing charges. This is a managed AWS-native control that meets the architecture review board's preference. Unlike interface endpoints, a gateway endpoint does not use AWS PrivateLink; it is added as a route table target and requires no changes to the S3 bucket or client configuration.

Exam trap

The trap here is that candidates may confuse Gateway VPC endpoints with Interface VPC endpoints, assuming both incur hourly charges, when in fact Gateway endpoints are free and only incur standard S3 data transfer costs, making them the optimal choice for reducing NAT-related expenses.

How to eliminate wrong answers

Option A is wrong because S3 Object Lambda is used to transform data on the fly during S3 GET requests, not to reduce data transfer costs from S3 to a VPC; it adds processing overhead and does not address NAT gateway charges. Option B is wrong because AWS Shield Advanced is a DDoS protection service that does not reduce data transfer costs or replace the need for a NAT gateway; it is unrelated to S3 access cost optimization. Option D is wrong because NAT gateways are not offered in sizes (they scale bandwidth automatically), and any NAT gateway still incurs per-GB data processing charges for all traffic through it, so the cost driver remains.
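
A quick back-of-the-envelope calculation shows the scale of the saving. The per-GB price and the monthly volume below are assumptions for illustration only; actual prices vary by Region and change over time.

```python
# Assumed prices for illustration only; check current pricing for your Region.
NAT_PROCESSING_PER_GB = 0.045   # USD per GB processed by the NAT gateway
GATEWAY_ENDPOINT_PER_GB = 0.0   # gateway endpoints add no per-GB charge


def monthly_processing_cost(gb_per_month, price_per_gb):
    """Return the data processing charge in USD for one month."""
    return round(gb_per_month * price_per_gb, 2)


monthly_gb = 50_000  # hypothetical: 50 TB of S3 downloads per month
via_nat = monthly_processing_cost(monthly_gb, NAT_PROCESSING_PER_GB)
via_endpoint = monthly_processing_cost(monthly_gb, GATEWAY_ENDPOINT_PER_GB)
savings = via_nat - via_endpoint
```

Under these assumptions the NAT path costs 2,250 USD per month in processing charges alone, all of which disappears once S3 traffic is routed through the gateway endpoint.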

1003

MCQmedium

A Lambda function for an order processing API needs to read a database password. The password must rotate automatically every 30 days and should not be stored in environment variables. Which service should be used?

A.AWS Secrets Manager with rotation enabled

B.An encrypted object in Amazon S3

C.AWS Systems Manager Parameter Store SecureString without automation

D.A KMS-encrypted Lambda environment variable

AnswerA

Secrets Manager stores secrets securely and supports automatic rotation using a rotation Lambda function.

Why this answer

AWS Secrets Manager is designed to securely store, retrieve, and automatically rotate database credentials on a schedule. It natively supports rotation every 30 days via a built-in Lambda rotation function, and it avoids storing the password in environment variables, meeting both security and compliance requirements.

Exam trap

The trap here is that candidates often confuse AWS Systems Manager Parameter Store SecureString (which can store encrypted values but lacks automatic rotation) with Secrets Manager, or they assume KMS-encrypted environment variables are sufficient despite their static nature and the explicit requirement to avoid environment variables.

How to eliminate wrong answers

Option B is wrong because storing an encrypted object in Amazon S3 requires manual retrieval and decryption logic in the Lambda function, and it does not provide automated rotation every 30 days. Option C is wrong because AWS Systems Manager Parameter Store SecureString without automation lacks built-in rotation capabilities; you would need to implement custom rotation logic, which is not automatic. Option D is wrong because a KMS-encrypted Lambda environment variable is static and cannot be rotated automatically; any rotation would require redeploying the function, and the password remains in the environment variable, which is explicitly prohibited.
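
In the function code, reading the secret at call time is what lets a rotated password take effect without a redeploy. The sketch below assumes the secret is stored as a JSON string with a `password` field; `get_db_password`, `FakeSecretsClient`, and the secret name are hypothetical, and the fake client only exists so the example runs without AWS credentials.

```python
import json


def get_db_password(secrets_client, secret_id):
    """Fetch the current secret value at call time instead of reading an env var.

    `secrets_client` is expected to behave like boto3.client("secretsmanager").
    The secret is assumed to be a JSON string with a "password" field.
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])["password"]


class FakeSecretsClient:
    """Local stand-in so the sketch runs without AWS credentials."""

    def __init__(self, password):
        self.password = password

    def get_secret_value(self, SecretId):
        return {
            "Name": SecretId,
            "SecretString": json.dumps({"password": self.password}),
        }


client = FakeSecretsClient("first-password")
before_rotation = get_db_password(client, "prod/orders/db")
client.password = "rotated-password"  # simulate a rotation
after_rotation = get_db_password(client, "prod/orders/db")
```

In practice the value is often cached for a short period to limit API calls, at the cost of a brief delay before a new password is picked up.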

1004

MCQmedium

Your ecommerce app runs behind an Application Load Balancer (ALB) and uses an RDS database for orders. During an AZ impairment in us-east-1, customers report that checkout takes several minutes to recover. The current design places EC2 instances only in private subnets of AZ-a, while the ALB spans multiple subnets. The RDS DB instance is Multi-AZ. Management wants automatic recovery within the same Region. Which change best addresses the issue with minimal operational overhead?

A.Move the EC2 instances into Auto Scaling Groups that span private subnets in at least two AZs, keeping the ALB spanning those subnets.

B.Switch from RDS Single-AZ to RDS Multi-AZ, keeping the EC2 instances in only AZ-a because failover will still reach them.

C.Terminate the ALB and use a Network Load Balancer (NLB) in front of the existing single-AZ EC2 instances.

D.Add more EC2 instances in AZ-a and increase the ALB health check thresholds to avoid unnecessary replacements during impairments.

AnswerA

An Auto Scaling Group across multiple AZs ensures healthy capacity exists when an AZ becomes impaired, and the ALB can route to instances in any available AZ.

Why this answer

The correct answer is A because the current design has a single point of failure: all EC2 instances are in one Availability Zone (AZ-a). During an AZ impairment, those instances become unreachable, causing the checkout process to fail until the impairment ends or manual intervention occurs. By placing EC2 instances in an Auto Scaling Group spanning at least two AZs, the application can automatically recover by launching new instances in a healthy AZ, while the ALB distributes traffic across the surviving AZs.

This minimizes operational overhead as Auto Scaling handles instance replacement automatically.

Exam trap

The trap here is that candidates may focus on the database layer (Multi-AZ) or load balancer type (NLB vs ALB) and overlook the critical single-AZ EC2 instance placement, which is the actual bottleneck causing the prolonged recovery during an AZ impairment.

How to eliminate wrong answers

Option B is wrong because the RDS DB instance is already Multi-AZ (as stated in the question), so switching from Single-AZ to Multi-AZ is not a change; moreover, keeping EC2 instances in only AZ-a still leaves them vulnerable to an AZ impairment, as the ALB cannot route traffic to a healthy AZ if no instances exist there. Option C is wrong because replacing the ALB with an NLB does not address the root cause: EC2 instances are still confined to a single AZ. An NLB also operates at Layer 4 and lacks the content-based routing that an ALB provides, so the swap adds risk without improving availability. Option D is wrong because adding more EC2 instances in AZ-a only increases capacity within the same failing AZ, and increasing health check thresholds delays the detection of unhealthy instances, prolonging recovery time rather than improving it.

1005

MCQhard

Based on the exhibit, an application repeatedly reads the same DynamoDB items with extremely low latency requirements. The business can tolerate data that is a few seconds stale. Which architecture change best improves read performance?

A.Add a DynamoDB Accelerator (DAX) cluster in front of the table.

B.Increase the table's sort key cardinality while keeping the same read pattern.

C.Switch the table to provisioned mode with auto scaling disabled.

D.Move the session data to Amazon EFS so the application can read it from shared files.

AnswerA

DAX is designed for repeated, read-heavy DynamoDB access patterns where a small amount of staleness is acceptable. It can dramatically reduce read latency and offload the table during peak demand.

Why this answer

Adding a DynamoDB Accelerator (DAX) cluster provides an in-memory cache that can reduce read latencies to microseconds for frequently accessed items. DAX serves eventually consistent reads from its cache, which fits a workload that tolerates a few seconds of staleness. Because DAX is API-compatible with DynamoDB, the application typically only needs to switch to the DAX client, and cache hits offload read traffic from the table.

Exam trap

The trap here is that candidates may overlook DAX as a specialized caching layer for DynamoDB and instead consider increasing table capacity or changing data models, which do not directly address the need for extremely low latency on repeated reads of the same items.

How to eliminate wrong answers

Option B is wrong because increasing sort key cardinality does not improve read performance for repeated reads of the same items; it primarily helps with write distribution and query flexibility, not latency for individual GetItem operations. Option C is wrong because switching to provisioned mode with auto scaling disabled does not inherently improve read performance; it may lead to throttling if capacity is insufficient, and it does not address the need for sub-millisecond latency. Option D is wrong because moving session data to Amazon EFS introduces file system overhead and network latency that is significantly higher than DynamoDB's single-digit millisecond latency, and EFS is not designed for the same low-latency, high-throughput access pattern required for repeated reads of individual items.
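
The trade-off between latency and staleness can be seen in a toy read-through cache with a time-to-live. This is only in the spirit of DAX, not its implementation: `ReadThroughCache`, the 5-second TTL, and the direct update to the backing dictionary are assumptions for illustration.

```python
class ReadThroughCache:
    """Toy read-through cache with a time-to-live (TTL), in the spirit of DAX."""

    def __init__(self, table, ttl_seconds):
        self.table = table          # backing store: dict of key -> item
        self.ttl = ttl_seconds
        self.entries = {}           # key -> (item, time cached)
        self.table_reads = 0

    def get(self, key, now):
        cached = self.entries.get(key)
        if cached is not None and now - cached[1] < self.ttl:
            return cached[0]        # hit: served from memory, may be stale
        self.table_reads += 1       # miss or expired: read the table
        item = self.table[key]
        self.entries[key] = (item, now)
        return item


table = {"session#42": "v1"}
cache = ReadThroughCache(table, ttl_seconds=5)
first = cache.get("session#42", now=0)
table["session#42"] = "v2"                 # table updated behind the cache
stale = cache.get("session#42", now=3)     # still within TTL
fresh = cache.get("session#42", now=6)     # TTL elapsed, reloaded
```

Three reads cost only two table reads, and the middle read returns the older value, which is acceptable here because the business tolerates a few seconds of staleness.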

1006

MCQeasy

An internal API is hosted in two AWS Regions behind Route 53. Under normal conditions, clients should use the primary region. If the primary endpoint becomes unhealthy, traffic must automatically switch to the secondary region. Which Route 53 setup best meets this requirement?

A.Use latency-based routing with one record per region and no health checks.

B.Use failover routing policy: create two alias records for the same name (primary and failover) and associate health checks with the primary record.

C.Use weighted routing and manually change the weights during incidents.

D.Create a single alias record only for the primary region and rely on client-side DNS retries.

AnswerB

Route 53 failover routing is designed for deterministic primary/secondary switching based on health check status. When the primary health check fails, Route 53 automatically returns the secondary region endpoint.

Why this answer

Route 53 failover routing policy is designed for active-passive failover scenarios. By creating two alias records (primary and secondary) for the same DNS name and associating a health check with the primary record, Route 53 automatically directs traffic to the secondary region if the primary health check fails. This meets the requirement of automatic failover without manual intervention.

Exam trap

The trap here is that candidates often confuse failover routing with latency-based routing, assuming latency routing inherently handles failover, but latency routing does not automatically switch traffic when an endpoint becomes unhealthy unless health checks are explicitly configured.

How to eliminate wrong answers

Option A is wrong because latency-based routing distributes traffic based on lowest latency, not active-passive failover, and without health checks it cannot detect endpoint failures. Option C is wrong because weighted routing requires manual weight changes during incidents, which violates the requirement for automatic failover. Option D is wrong because a single alias record with no secondary endpoint provides no failover capability; client-side DNS retries do not redirect to a different region.
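
The decision a failover routing policy makes can be reduced to a few lines. This is a simplified model with hypothetical endpoint names; it ignores DNS TTLs, health check intervals, and the case where the secondary is also unhealthy.

```python
def resolve(records, health):
    """Return the endpoint a failover policy would answer with.

    `records` maps a role ("PRIMARY" or "SECONDARY") to an endpoint name;
    `health` maps an endpoint name to True (healthy) or False.
    """
    primary = records["PRIMARY"]
    if health.get(primary, False):
        return primary
    return records["SECONDARY"]


records = {
    "PRIMARY": "api-use1.example.internal",
    "SECONDARY": "api-usw2.example.internal",
}
normal = resolve(records, {"api-use1.example.internal": True})
failed_over = resolve(records, {"api-use1.example.internal": False})
```

Clients keep using the same DNS name in both cases; only the answer changes when the primary health check fails.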

1007

MCQmedium

An order processing API stores audit logs in S3. The compliance team requires that logs cannot be overwritten or deleted for seven years. What should be configured?

A.S3 server access logging

B.S3 lifecycle expiration after seven years

C.S3 versioning only

D.S3 Object Lock in compliance mode with an appropriate retention period

AnswerD

Object Lock compliance mode enforces write-once-read-many retention that even privileged users cannot bypass during the retention period.

Why this answer

S3 Object Lock in compliance mode applies write-once-read-many (WORM) protection to object versions. With a seven-year retention period, a protected object version cannot be overwritten or deleted, and its retention cannot be shortened, by any user until the period expires.

Exam trap

The trap here is that candidates confuse versioning (which provides recovery but not immutability) with Object Lock (which enforces strict WORM compliance), leading them to select versioning alone as sufficient.

How to eliminate wrong answers

Option A is wrong because S3 server access logging only records requests made to the bucket; it does not prevent deletion or overwriting of existing logs. Option B is wrong because S3 lifecycle expiration after seven years would delete objects after that period, but it does not prevent deletion or overwriting before the expiration date. Option C is wrong because S3 versioning alone preserves previous versions of objects but does not prevent deletion of the current version or overwriting; it only allows recovery of deleted or overwritten objects, not immutability.
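
A minimal sketch of the default-retention setting this answer relies on, assuming the bucket already has Object Lock enabled. The helper name `compliance_lock_config` and the bucket name in the comment are hypothetical; the dict follows the shape of S3's Object Lock configuration.

```python
def compliance_lock_config(years):
    """Return an Object Lock configuration with a default COMPLIANCE retention."""
    if years < 1:
        raise ValueError("retention must be at least one year")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": years}},
    }


config = compliance_lock_config(7)
# With boto3 this dict would typically be passed as:
#   s3.put_object_lock_configuration(
#       Bucket="hypothetical-audit-logs",
#       ObjectLockConfiguration=config,
#   )
print(config)
```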

1008

Multi-Selectmedium

A company is designing a multi-tier web application on AWS that must be resilient to the failure of an entire AWS Region. The application uses Amazon Route 53, an Application Load Balancer, EC2 instances, and Amazon RDS. Which three design choices support a multi-Region resilient architecture? (Choose three.)

Select 3 answers

A.Use Route 53 latency-based routing to direct users to the closest healthy region.

B.Configure Route 53 with a failover routing policy and health checks on the application endpoints.

C.Deploy the application stack in two separate AWS Regions and use an active-passive setup.

D.Use Amazon RDS Cross-Region read replicas to keep the standby region database up-to-date.

E.Store all application state in a single Amazon ElastiCache cluster in the primary region.

F.Place the Application Load Balancer in a single region but use it across multiple Availability Zones.

AnswersB, C, D

Why this answer

Route 53 failover routing with health checks is correct because it allows DNS to automatically route traffic away from a failed primary region to a standby region, which is essential for multi-Region resilience. Deploying the application stack in two separate AWS Regions with an active-passive setup is correct because it ensures that if the primary region fails, the passive region can take over, providing regional fault isolation. Amazon RDS Cross-Region read replicas are correct because they keep the standby region's database synchronized with the primary, enabling promotion to a primary database in a disaster recovery scenario.

Exam trap

AWS often tests the misconception that latency-based routing provides failover capability, but it only routes to the lowest-latency endpoint and does not support health-check-driven failover to a specific standby region.

1009

MCQmedium

Your CI system assumes an IAM role RoleForDeploy using STS AssumeRole and includes a session tag called Project=blue. The role’s permissions policy uses an ABAC condition like aws:PrincipalTag/Project to allow access only to resources tagged with the same project. AssumeRole succeeds, but deployments fail with AccessDenied. CloudTrail shows the role was assumed, yet the effective session does not contain the Project tag. Which change most directly fixes this issue?

A.Add permissions for sts:TagSession to the IAM role so the CI pipeline is allowed to pass the Project session tag during AssumeRole.

B.Remove the ABAC condition using aws:PrincipalTag/Project so the policy ignores session tags.

C.Move the aws:PrincipalTag/Project condition into the trust policy so it applies during the AssumeRole call.

D.Add kms:Decrypt permission to the CI role because missing tags are typically caused by KMS authorization failures.

AnswerA

Session tags are not automatically granted; the role needs sts:TagSession permission to allow passing tags into the session.

Why this answer

Option A is correct because passing session tags with STS AssumeRole requires the `sts:TagSession` action to be allowed for the calling principal, typically in the role's trust policy alongside `sts:AssumeRole`. In this scenario the session does not carry the `Project` tag, so the ABAC condition on `aws:PrincipalTag/Project` has nothing to match. Allowing `sts:TagSession` lets the CI pipeline pass the `Project=blue` tag, so the condition evaluates correctly and access to tagged resources is granted.
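
The two policy pieces involved can be sketched as plain JSON documents. This is an illustration under assumptions: the CI principal ARN and the action list are hypothetical, and the helper names are made up for the example. It shows a trust policy that allows both `sts:AssumeRole` and `sts:TagSession`, and an ABAC statement that compares the resource tag with the session's principal tag.

```python
import json


def trust_policy(ci_principal_arn):
    """Trust policy that lets the CI principal assume the role and tag the session."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": ci_principal_arn},
                "Action": ["sts:AssumeRole", "sts:TagSession"],
            }
        ],
    }


def abac_statement(tag_key, actions):
    """Permissions statement allowing actions only when resource and session tags match."""
    return {
        "Effect": "Allow",
        "Action": actions,
        "Resource": "*",
        "Condition": {
            "StringEquals": {
                f"aws:ResourceTag/{tag_key}": f"${{aws:PrincipalTag/{tag_key}}}"
            }
        },
    }


# Hypothetical principal ARN and action list, for illustration only.
trust = trust_policy("arn:aws:iam::111122223333:role/HypotheticalCiRole")
abac = abac_statement("Project", ["ec2:StartInstances", "ec2:StopInstances"])
print(json.dumps({"trust": trust, "abac": abac}, indent=2))
```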

Exam trap

The trap here is that candidates assume session tags are automatically applied when passed in the AssumeRole call, but AWS requires explicit `sts:TagSession` permission for the tags to take effect, which is a subtle but critical detail tested in ABAC scenarios.

How to eliminate wrong answers

Option B is wrong because removing the ABAC condition would bypass the intended security control, but the root cause is that the session tag is not being applied, not that the condition is misconfigured. Option C is wrong because moving the condition to the trust policy would not add the missing tag; in a trust policy, `aws:PrincipalTag/Project` is evaluated against the principal calling AssumeRole rather than the resulting session, and the condition is correctly placed in the permissions policy to enforce ABAC. Option D is wrong because KMS authorization failures are unrelated to missing session tags; the issue is purely about STS tag propagation, not encryption key permissions.

1010

MCQmedium

A Lambda function for a mobile banking backend needs to read a database password. The password must rotate automatically every 30 days and should not be stored in environment variables. Which service should be used? The design must avoid adding custom operational scripts.

A.An encrypted object in Amazon S3

B.AWS Secrets Manager with rotation enabled

C.AWS Systems Manager Parameter Store SecureString without automation

D.A KMS-encrypted Lambda environment variable

AnswerB

Secrets Manager stores secrets securely and supports automatic rotation using a rotation Lambda function.

Why this answer

AWS Secrets Manager is the correct choice because it natively supports automatic rotation of secrets on a configurable schedule (e.g., every 30 days) without requiring custom scripts. It also provides fine-grained access control and integrates directly with Lambda via the AWS SDK, keeping the password out of environment variables and code.
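
A small sketch of how a function might read the password at runtime instead of from an environment variable. It assumes the secret is stored as a JSON string with a `password` field, and it uses an offline stand-in class in place of a real Secrets Manager client so the example runs without AWS access; the secret name is hypothetical.

```python
import json


def read_db_password(secrets_client, secret_id):
    """Fetch the current secret value and return its 'password' field."""
    response = secrets_client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])["password"]


class FakeSecretsClient:
    """Offline stand-in for boto3.client('secretsmanager'), for illustration only."""

    def get_secret_value(self, SecretId):
        payload = {"username": "app", "password": "example-only"}
        return {"SecretString": json.dumps(payload)}


# In a real function the client would come from boto3.client("secretsmanager").
password = read_db_password(FakeSecretsClient(), "hypothetical/db/credentials")
print(len(password))
```

Because the value is fetched on demand, a rotated password is picked up on the next read without redeploying the function.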

Exam trap

The trap here is that candidates often confuse AWS Systems Manager Parameter Store SecureString (which can store secrets but lacks automatic rotation) with Secrets Manager, or they assume that encrypting environment variables with KMS is sufficient for rotation, ignoring the need for automated lifecycle management.

How to eliminate wrong answers

Option A is wrong because storing an encrypted object in Amazon S3 requires custom code to retrieve, decrypt, and rotate the password, violating the 'no custom operational scripts' constraint. Option C is wrong because AWS Systems Manager Parameter Store SecureString without automation does not support automatic rotation; you would need to manually update the parameter or add a custom rotation solution. Option D is wrong because a KMS-encrypted Lambda environment variable is static and cannot be rotated automatically; you would need to redeploy the function to change the password, which adds operational overhead.

1011

MCQhard

Based on the exhibit, what change should the team make to achieve the lowest possible network latency for the distributed workload?

A.Place the instances in a spread placement group across multiple Availability Zones.

B.Move the workload into a cluster placement group in one Availability Zone.

C.Add an Application Load Balancer in front of the workers to reduce inter-node latency.

D.Increase the EC2 instance size while keeping the current multi-AZ layout.

AnswerB

Cluster placement groups place instances physically close together inside one Availability Zone, which is the best AWS option for workloads that need low-latency, high-bandwidth communication between many nodes. The exhibit explicitly says the workload can run in a single AZ if performance improves. That makes cluster placement groups the right fit.

Why this answer

A cluster placement group provides the lowest network latency and highest throughput available between instances by packing them close together in a single Availability Zone, on the same high-bisection-bandwidth segment of the network, with a higher per-flow throughput limit for TCP/IP traffic than instances outside the group. This is ideal for tightly coupled, distributed workloads that require frequent inter-node communication, such as HPC or data analytics jobs.

Exam trap

The trap here is that candidates often assume multi-AZ is always better for high availability, but for latency-sensitive distributed workloads, a single-AZ cluster placement group is the correct choice to minimize inter-node latency, even though it sacrifices fault tolerance.

How to eliminate wrong answers

Option A is wrong because a spread placement group spreads instances across distinct hardware racks or even Availability Zones, which increases network latency and reduces throughput compared to a cluster placement group. Option C is wrong because an Application Load Balancer operates at Layer 7 and is designed for distributing incoming traffic, not for reducing inter-node latency between worker instances; it would add overhead and increase latency. Option D is wrong because increasing instance size does not fundamentally change the network topology or reduce the physical distance between instances; inter-node latency remains constrained by multi-AZ network hops.

1012

MCQmedium

A company stores millions of objects in Amazon S3. Access patterns are completely unpredictable — some objects are frequently accessed, others rarely. Objects range from 4 KB to 50 MB. The company wants to minimize storage costs automatically without managing lifecycle rules. Which storage class should a solutions architect recommend?

A.S3 Standard — it is the default and handles all access patterns equally

B.S3 Standard-IA — it automatically detects infrequent access and reduces cost

C.S3 Intelligent-Tiering — it automatically moves objects between tiers based on access patterns

D.S3 One Zone-IA — it is the cheapest option with fast retrieval

AnswerC

Intelligent-Tiering monitors actual access and automatically moves objects between Frequent and Infrequent tiers with no retrieval fees. It eliminates lifecycle management complexity for unknown access patterns.

Why this answer

S3 Intelligent-Tiering monitors access patterns and automatically moves objects between access tiers — Frequent Access, Infrequent Access, and optional Archive tiers — based on actual usage. It requires no management or lifecycle rules.

Important: Intelligent-Tiering charges a small monitoring and automation fee per object per month. Objects smaller than 128 KB are not monitored or auto-tiered: they stay in the Frequent Access tier and incur no monitoring fee, so they gain nothing from this class but cost no more than in Standard. With objects ranging from 4 KB to 50 MB and unpredictable access patterns, Intelligent-Tiering is the recommended answer; AWS recommends it for data with unknown or changing access patterns.

Exam trap

For purely small objects (all < 128 KB), Intelligent-Tiering brings no savings, because such objects are never moved out of the Frequent Access tier; Standard costs the same and is simpler. For larger objects, the monitoring fee ($0.0025 per 1,000 objects per month) is usually small next to the storage savings. For mixed sizes with unpredictable access (as in this question), Intelligent-Tiering is the correct recommendation. The key phrase 'automatically without managing lifecycle rules' points to Intelligent-Tiering.
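
To put the monitoring fee in perspective, the arithmetic below estimates the object size at which the fee equals the storage saving from sitting in the Infrequent Access tier. The monitoring fee is the figure quoted above; the two per-GB storage prices are assumed example values, not quoted from this page, and real prices vary by Region.

```python
MONITORING_FEE_PER_OBJECT = 0.0025 / 1000  # USD per object-month, figure from the text
FREQUENT_PRICE_PER_GB = 0.023              # assumed USD per GB-month
INFREQUENT_PRICE_PER_GB = 0.0125           # assumed USD per GB-month


def monthly_net_saving(object_size_gb):
    """Net saving per object-month while the object sits in the Infrequent Access tier."""
    saving = object_size_gb * (FREQUENT_PRICE_PER_GB - INFREQUENT_PRICE_PER_GB)
    return saving - MONITORING_FEE_PER_OBJECT


def break_even_size_kb():
    """Object size (in KB, 1 GB = 1,000,000 KB) where the fee equals the saving."""
    size_gb = MONITORING_FEE_PER_OBJECT / (FREQUENT_PRICE_PER_GB - INFREQUENT_PRICE_PER_GB)
    return size_gb * 1_000_000


print(round(break_even_size_kb(), 1))
print(monthly_net_saving(0.05) > 0)  # a 50 MB object
```

Under these assumed prices the break-even size is a few hundred kilobytes, so a 50 MB object saves far more than the fee costs while it sits in the cheaper tier.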

Why the other options are wrong

S3 Standard has the highest per-GB storage price of the classes listed and does not automatically reduce cost based on access patterns. For unpredictable access, Intelligent-Tiering is more cost-effective for objects with average size above 128 KB.

S3 Standard-IA does NOT automatically detect access patterns. Objects placed in Standard-IA are statically in that class. It also charges a per-GB retrieval fee making it expensive for frequently accessed objects.

One Zone-IA stores data in a single AZ, so it has lower availability and the data is lost if that AZ is destroyed. It does not automatically adjust to access patterns and charges retrieval fees. It's inappropriate for data that must survive the loss of an Availability Zone.

1013

Multi-Selectmedium

A marketing site serves versioned JavaScript and CSS files from Amazon S3 through CloudFront. Origin bandwidth costs are rising because CloudFront keeps revalidating objects and fetching too much content from the bucket. Which two changes most directly improve cache hit ratio and reduce origin load? Select two.

Select 2 answers

A.Use versioned object names and long cache TTLs for immutable assets.

B.Forward all cookies and query strings so each request is treated as unique.

C.Configure a cache policy that excludes unnecessary cookies, query strings, and headers.

D.Switch the bucket to S3 Intelligent-Tiering to reduce CloudFront origin requests.

E.Add more NAT Gateways to improve the speed of CloudFront origin fetches.

AnswersA, C

Versioned file names let you cache content aggressively because each new build gets a new URL and does not overwrite the old one.

Why this answer

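Option A is correct because using versioned object names (e.g., app-v1.js, app-v2.js) combined with long cache TTLs (e.g., one year) lets CloudFront treat these assets as immutable. Once cached, an object is served from the edge without revalidation until its TTL expires or it is evicted, which removes most origin requests for unchanged files and directly reduces origin bandwidth costs. Option C is correct because a cache policy that leaves unnecessary cookies, query strings, and headers out of the cache key stops identical files from being cached as many separate variants, which raises the cache hit ratio.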
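
One common way to get versioned names is to embed a content hash in the file name at build time. The sketch below is an assumption-laden illustration of that idea: the helper names, bucket name, and the eight-character hash length are arbitrary choices, and the returned dict only mirrors the parameters an S3 upload call would take.

```python
import hashlib
import posixpath

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # 31536000


def versioned_key(path, content):
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, ext = posixpath.splitext(path)
    return f"{stem}.{digest}{ext}"


def upload_params(bucket, path, content):
    """Parameters for an S3 upload of an immutable, long-lived asset."""
    return {
        "Bucket": bucket,
        "Key": versioned_key(path, content),
        "Body": content,
        "CacheControl": f"public, max-age={ONE_YEAR_SECONDS}, immutable",
    }


old_build = upload_params("hypothetical-assets", "js/app.js", b"console.log(1);")
new_build = upload_params("hypothetical-assets", "js/app.js", b"console.log(2);")
print(old_build["Key"], new_build["Key"])
```

Each build produces a different key, so a new release never overwrites an object that is already cached under a long TTL.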

Exam trap

The trap here is that candidates confuse S3 storage classes (like Intelligent-Tiering) with caching performance, or think that increasing network throughput (NAT Gateways) can fix a cache miss problem, when the real solution lies in optimizing cache keys and TTLs.

1014

MCQmedium

A read-heavy document portal repeatedly queries the same product catalogue data from DynamoDB with millisecond latency requirements. Which service can reduce read latency and table load? The design must avoid adding custom operational scripts.

A.Amazon Kinesis Data Firehose

B.S3 Transfer Acceleration

C.DynamoDB Accelerator (DAX)

D.AWS Glue Data Catalog

AnswerC

DAX is an in-memory cache for DynamoDB that reduces read latency for suitable access patterns.

Why this answer

DynamoDB Accelerator (DAX) is a managed in-memory cache for DynamoDB that delivers microsecond read latency for cached items, reducing the number of read requests hitting the underlying table. It requires no custom operational scripts: the application uses the DAX client with the cluster endpoint, and DAX caches frequently accessed items automatically, making it ideal for a read-heavy document portal with millisecond latency requirements.

Exam trap

The trap here is that candidates often confuse caching services like ElastiCache with DAX, but DAX is purpose-built for DynamoDB and API-compatible with it, so the application needs only the DAX client pointed at the cluster endpoint, whereas ElastiCache would need custom caching and invalidation logic.

How to eliminate wrong answers

Option A is wrong because Amazon Kinesis Data Firehose is a streaming data ingestion service for loading data into data lakes or analytics tools, not a read cache for DynamoDB. Option B is wrong because S3 Transfer Acceleration speeds up uploads to S3 over long distances using edge locations, but does not reduce read latency or load on a DynamoDB table. Option D is wrong because AWS Glue Data Catalog is a metadata repository for ETL jobs and data discovery, not a caching layer for DynamoDB reads.

1015

Multi-Selectmedium

A global software company distributes large installation packages from an Amazon S3 bucket. During release week, many users in the same region download the same file repeatedly, and the origin bill is rising because the same objects are fetched over and over. The team wants to lower origin data transfer and improve delivery cost. Which two actions should it take? Select two.

Select 2 answers

A.Put Amazon CloudFront in front of the S3 origin.

B.Use versioned object names and long cache TTLs for the release artifacts.

C.Disable caching so every user always gets the newest file from S3.

D.Serve the downloads from a self-managed EC2 web server instead of S3.

E.Move the release packages to S3 Glacier Deep Archive for faster downloads.

AnswersA, B

CloudFront caches popular package files at edge locations, so repeated downloads can be served without repeatedly hitting S3. That reduces origin data transfer and improves user download performance, which is exactly what this scenario needs.

Why this answer

Amazon CloudFront acts as a content delivery network (CDN) that caches objects at edge locations close to users. By placing CloudFront in front of the S3 bucket, repeated downloads of the same file are served from the edge cache, drastically reducing the number of requests to the S3 origin and lowering data transfer costs from S3. Data transfer from an S3 origin to CloudFront is not charged, which further reduces delivery cost. Option B complements this: versioned object names with long cache TTLs let each release artifact stay cached at the edge for the whole release week, because a new build gets a new URL instead of overwriting a cached one.

Exam trap

The trap here is that candidates may think disabling caching ensures freshness (Option C) or that moving to a cheaper storage class like Glacier Deep Archive (Option E) reduces cost, without realizing that both actions increase origin data transfer or retrieval latency, contradicting the goal of lowering delivery cost for frequently accessed objects.

1016

MCQeasy

An orders service currently sends HTTP requests directly to two downstream services (inventory and shipping). During peak load, inventory slows down, causing the orders service to slow as well. The team wants the orders service to remain responsive even when a downstream service is temporarily slow or restarted. Which design change best achieves this resiliency goal?

A.Keep HTTP calls but add longer client timeouts so orders requests wait for slow downstream responses.

B.Introduce Amazon SQS as a buffer between orders and downstream services, with consumers processing from the queue.

C.Replace the downstream services with AWS Lambda functions that are invoked synchronously by the orders service.

D.Call the downstream services in parallel threads to reduce waiting time during peak load.

AnswerB

SQS decouples the producer (orders service) from the consumers (inventory/shipping processors). The orders service can quickly enqueue work and return to the caller, even if a downstream service is slow or restarted. Messages remain in the queue until consumers can process them, preventing cascading latency/backpressure from propagating to the orders API.

Why this answer

Option B is correct because introducing Amazon SQS as a buffer decouples the orders service from the downstream inventory and shipping services. The orders service can immediately enqueue messages and respond to the client, while downstream consumers process messages at their own pace. This prevents backpressure from a slow or restarting downstream service from blocking the orders service, achieving the desired resiliency.
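
The decoupling can be illustrated with a toy model in which Python's in-memory `queue.Queue` stands in for an SQS queue. This is only a sketch of the pattern, not SQS itself: the function names are hypothetical, and a real consumer would poll the queue and delete each message after processing it.

```python
import queue

order_queue = queue.Queue()  # in-memory stand-in for an SQS queue


def place_order(order_id):
    """Orders API: enqueue the work and return immediately."""
    order_queue.put({"order_id": order_id})
    return {"order_id": order_id, "status": "accepted"}


def drain(handler):
    """Consumer: process whatever is queued, at the consumer's own pace."""
    processed = []
    while True:
        try:
            message = order_queue.get_nowait()
        except queue.Empty:
            return processed
        processed.append(handler(message))


# Orders are accepted even if the inventory consumer is not running yet.
responses = [place_order(i) for i in range(3)]
# Later, the consumer catches up on the backlog.
reserved = drain(lambda m: f"reserved-{m['order_id']}")
print(responses, reserved)
```

The orders path never waits on the handler, which is the property the question asks for.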

Exam trap

The trap here is that candidates may think parallelizing calls (Option D) or increasing timeouts (Option A) solves the problem, but they fail to recognize that true resiliency requires decoupling via asynchronous messaging, not just concurrency or tolerance of delays.

How to eliminate wrong answers

Option A is wrong because adding longer client timeouts does not prevent the orders service from being blocked; it only increases the wait time before a timeout occurs, still causing the orders service to slow down during peak load. Option C is wrong because replacing downstream services with synchronously invoked Lambda functions does not decouple the services; the orders service would still block waiting for the Lambda invocation to complete, and Lambda has a 15-minute timeout limit, which does not solve the slowdown issue. Option D is wrong because calling downstream services in parallel threads reduces latency only if both services are responsive; if one service is slow or restarting, the orders service still waits for that slow response, and thread pool exhaustion can occur under peak load, leading to resource contention and slowdown.

1017

MCQmedium

Your media processing pipeline writes original uploads to an S3 bucket and later generates derivative files. An operator accidentally deletes a subset of original uploads in production. You need to (1) restore the deleted objects with minimal data loss and (2) protect against both regional disasters and future operator mistakes. The company requires recovery even if objects are deleted and later overwritten. What is the most effective change to meet these requirements?

A.Enable S3 versioning on the bucket and configure cross-Region replication so previous versions are available after regional loss and accidental deletion.

B.Move all objects to S3 Glacier Instant Retrieval and apply a lifecycle policy to keep only the latest object copy.

C.Use S3 server-side encryption with KMS keys and rely on access logs to manually recover the deleted objects.

D.Enable S3 bucket policies that deny DeleteObject, but do not enable versioning or replication.

AnswerA

Versioning retains prior object versions, and cross-Region replication provides redundancy across Regions for recovery after deletion or disaster.

Why this answer

Option A is correct because enabling S3 Versioning preserves all object versions, so a delete only adds a delete marker and an overwrite only adds a new version, allowing recovery from accidental deletions. Cross-Region Replication (CRR), which requires versioning on both buckets, copies object versions to a bucket in a secondary Region, providing protection against regional disasters. This combination ensures that even if objects are deleted and later overwritten, the original versions remain recoverable in both the source and destination buckets.
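
The recovery behaviour can be pictured with a toy in-memory model of a versioned bucket. This is a deliberately simplified sketch, not the S3 API: the class and method names are invented for the example, and replication is not modelled.

```python
class VersionedBucket:
    """Toy in-memory model of versioning semantics (not the S3 API)."""

    DELETE_MARKER = object()

    def __init__(self):
        self._versions = {}  # key -> list of versions, newest last

    def put(self, key, data):
        # An overwrite adds a new version; older versions are kept.
        self._versions.setdefault(key, []).append(data)

    def delete(self, key):
        # A simple delete adds a delete marker; older versions are kept.
        self._versions.setdefault(key, []).append(self.DELETE_MARKER)

    def get(self, key):
        versions = self._versions.get(key, [])
        if not versions or versions[-1] is self.DELETE_MARKER:
            raise KeyError(key)
        return versions[-1]

    def restore(self, key):
        # Removing the newest delete marker makes the previous version current.
        versions = self._versions.get(key, [])
        if versions and versions[-1] is self.DELETE_MARKER:
            versions.pop()


bucket = VersionedBucket()
bucket.put("uploads/raw.mp4", b"original")
bucket.delete("uploads/raw.mp4")   # operator mistake
bucket.restore("uploads/raw.mp4")  # remove the delete marker
restored = bucket.get("uploads/raw.mp4")
print(restored)
```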

Exam trap

The trap here is that candidates often assume S3 bucket policies or encryption alone can protect against deletion, but only versioning preserves object history, and only replication provides regional disaster recovery.

How to eliminate wrong answers

Option B is wrong because moving objects to S3 Glacier Instant Retrieval does not provide versioning or replication, so deleted objects cannot be restored and there is no protection against regional disasters. Option C is wrong because S3 server-side encryption with KMS keys does not preserve deleted or overwritten objects; access logs only record events, not the data itself, making manual recovery impossible. Option D is wrong because a bucket policy denying DeleteObject can be bypassed by authorized users or misconfigurations, and without versioning or replication, deleted objects are permanently lost and there is no regional disaster recovery.

1018

MCQhard

A Lambda-based retail API has unpredictable traffic spikes and users see latency caused by cold starts. The function must respond consistently during expected campaign windows. What should be configured?

A.A larger deployment package

B.Reserved concurrency only

C.Provisioned concurrency during campaign windows

D.CloudTrail data events

AnswerC

Provisioned concurrency keeps execution environments initialized and reduces cold-start latency.

Why this answer

Provisioned concurrency initializes a specified number of execution environments in advance, eliminating cold starts during campaign windows. This ensures consistent latency even under unpredictable traffic spikes, as the function is always warm and ready to handle requests immediately.

Exam trap

The trap here is confusing reserved concurrency (which limits scaling but does not prevent cold starts) with provisioned concurrency (which pre-warms environments to eliminate cold starts).

How to eliminate wrong answers

Option A is wrong because a larger deployment package increases cold start time, making latency worse. Option B is wrong because reserved concurrency only guarantees a maximum number of concurrent executions but does not pre-warm environments; cold starts still occur. Option D is wrong because CloudTrail data events record API activity for auditing, not for managing function initialization or latency.

1019

Multi-Selecthard

A customer analytics portal uses CloudFront in front of an S3 origin. Which two settings help keep users from bypassing CloudFront and accessing the bucket directly? The design must avoid adding custom operational scripts.

Select 2 answers

A.Enable CloudFront standard logging

B.Configure Origin Access Control for the S3 origin

C.Use an S3 bucket policy that allows access only from the CloudFront distribution

D.Enable S3 static website hosting

AnswersB, C

Origin Access Control allows CloudFront to securely access a private S3 bucket.

Why this answer

Option B is correct because Origin Access Control (OAC) is a CloudFront feature that lets the distribution sign its requests to a private S3 origin. Option C completes the restriction: a bucket policy that allows s3:GetObject only to the CloudFront service principal for this specific distribution means direct requests to the bucket are denied. Together they ensure users cannot bypass CloudFront and hit the bucket directly, without custom scripts.
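
A sketch of the kind of bucket policy that pairs with OAC, written as a Python dict. The bucket name, account ID, and distribution ID are hypothetical placeholders, and the helper name is invented for the example.

```python
import json


def oac_bucket_policy(bucket, account_id, distribution_id):
    """Bucket policy allowing reads only from one CloudFront distribution."""
    distribution_arn = (
        f"arn:aws:cloudfront::{account_id}:distribution/{distribution_id}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontServicePrincipalReadOnly",
                "Effect": "Allow",
                "Principal": {"Service": "cloudfront.amazonaws.com"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # Restrict the service principal to this one distribution.
                "Condition": {"StringEquals": {"AWS:SourceArn": distribution_arn}},
            }
        ],
    }


policy = oac_bucket_policy("hypothetical-portal-bucket", "111122223333", "EDFDVBD6EXAMPLE")
print(json.dumps(policy, indent=2))
```

With the bucket otherwise private, requests that do not come from the named distribution have no statement that allows them.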

Exam trap

The trap here is that candidates often think enabling logging (Option A) or static website hosting (Option D) can somehow restrict access, but these settings have no effect on access control and actually introduce new endpoints that could be exploited to bypass CloudFront.

1020

MCQeasy

Your team serves static JavaScript and CSS files from an S3 origin through CloudFront. After a release, the CloudFront cache hit ratio dropped because clients keep re-downloading the same assets. What is the best next change to improve caching performance?

A.Update origin responses to include long-lived Cache-Control headers (for example, max-age) so CloudFront can cache objects

B.Switch the S3 bucket to S3 Glacier so objects are not frequently accessed

C.Disable CloudFront compression to reduce CPU usage at the edge

D.Set CloudFront to forward all query strings to the origin to ensure the latest assets are returned

AnswerA

CloudFront will only reuse cached objects when the origin response is cacheable. Adding/adjusting Cache-Control (and related directives such as public and s-maxage where appropriate) to allow long-lived caching enables edge reuse and increases cache hit ratio.

Why this answer

Option A is correct because setting long-lived Cache-Control headers (e.g., max-age=31536000) on static assets tells CloudFront to cache them at edge locations for an extended period. This reduces the number of requests forwarded to the S3 origin, improving the cache hit ratio and preventing clients from re-downloading unchanged assets on every visit.

Exam trap

The trap here is that candidates may think forwarding query strings (Option D) ensures freshness, but it actually fragments the cache and reduces hit ratio, whereas the real solution is to use long-lived Cache-Control headers with versioned filenames to maximize caching.

How to eliminate wrong answers

Option B is wrong because moving the S3 bucket to Glacier would make objects inaccessible for real-time serving, breaking the static asset delivery entirely. Option C is wrong because disabling CloudFront compression does not affect caching behavior; it would only increase bandwidth and latency for clients, not improve cache hit ratio. Option D is wrong because forwarding all query strings to the origin forces CloudFront to treat each unique query string as a separate cache key, fragmenting the cache and reducing hit ratio, which is the opposite of what is needed.
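
The cache fragmentation described for Option D can be shown with a simplified model in which the cache key is just the URL path, optionally followed by the query string. This is an illustration only: the URLs are made up, and a real CloudFront cache key depends on the distribution's cache policy.

```python
from urllib.parse import urlsplit


def cache_key(url, include_query):
    """Simplified cache key: the path, optionally followed by the query string."""
    parts = urlsplit(url)
    if include_query and parts.query:
        return f"{parts.path}?{parts.query}"
    return parts.path


requests = [
    "https://cdn.example.com/app.js?session=1",
    "https://cdn.example.com/app.js?session=2",
    "https://cdn.example.com/app.js?utm_source=mail",
    "https://cdn.example.com/app.js",
]

keys_with_query = {cache_key(u, True) for u in requests}
keys_without_query = {cache_key(u, False) for u in requests}
print(len(keys_with_query), len(keys_without_query))
```

In this model, four requests for the same file produce four cache entries when query strings are part of the key and one entry when they are not.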

1021

MCQmedium

A content publishing system uses Lambda functions that call an unreliable third-party API. Failed events must be retained for later investigation after retries are exhausted. What should be configured? The design must avoid adding custom operational scripts.

A.Lambda reserved concurrency set to zero

B.A larger deployment package

C.CloudFront error pages

D.A Lambda dead-letter queue or failure destination

AnswerD

A DLQ or asynchronous failure destination captures failed events after retry attempts.

Why this answer

A Lambda dead-letter queue (DLQ) or failure destination is the correct solution because it captures events that have exhausted all retry attempts from an asynchronous Lambda invocation. This allows failed events to be retained in an Amazon SQS queue or SNS topic for later investigation, without requiring custom operational scripts. The DLQ or failure destination integrates directly with Lambda's built-in retry behavior, ensuring that only events that fail after the configured number of retries are sent to the destination.
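
As a sketch, the settings involved can be expressed as the arguments of Lambda's asynchronous invocation configuration. The function name and queue ARN are hypothetical, and the helper is an illustrative wrapper rather than an SDK function.

```python
def failure_destination_config(function_name, destination_arn, max_retries=2):
    """Arguments for an async invoke config that routes failed events to a destination."""
    if not 0 <= max_retries <= 2:
        raise ValueError("asynchronous retry attempts must be between 0 and 2")
    return {
        "FunctionName": function_name,
        "MaximumRetryAttempts": max_retries,
        "DestinationConfig": {"OnFailure": {"Destination": destination_arn}},
    }


# Hypothetical names, for illustration only.
invoke_config = failure_destination_config(
    "hypothetical-publisher",
    "arn:aws:sqs:us-east-1:111122223333:hypothetical-failed-events",
)
# With boto3 these would typically be passed as:
#   lambda_client.put_function_event_invoke_config(**invoke_config)
print(invoke_config)
```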

Exam trap

The trap here is that candidates may confuse a DLQ with other error-handling mechanisms like CloudFront error pages or reserved concurrency, but only a DLQ or failure destination directly captures failed asynchronous Lambda events without custom code.

How to eliminate wrong answers

Option A is wrong because setting reserved concurrency to zero would prevent the Lambda function from executing at all, which stops all invocations and does not retain failed events. Option B is wrong because a larger deployment package does not affect error handling or event retention; it only increases the function's storage size and cold start time. Option C is wrong because CloudFront error pages are used for HTTP error responses from a web distribution, not for capturing failed Lambda invocations from asynchronous event sources.

1022

MCQhard

A risk simulation workload in private subnets downloads large amounts of data from S3 through a NAT gateway. NAT data processing charges are high. What should the architect use to reduce cost? The design must avoid adding custom operational scripts.

A.A larger NAT gateway

B.Gateway VPC endpoint for Amazon S3

C.S3 Object Lambda

D.AWS Shield Advanced

AnswerB

A gateway endpoint routes S3 traffic privately without NAT gateway data processing charges.

Why this answer

A Gateway VPC Endpoint for Amazon S3 allows instances in private subnets to access S3 directly over the AWS network without traversing a NAT gateway, eliminating NAT data processing charges. This is the most cost-effective and operationally simple solution because it requires no custom scripts and no changes to routing beyond adding the endpoint.

Exam trap

The trap here is that candidates often confuse Gateway VPC Endpoints with Interface VPC Endpoints, assuming both incur hourly charges, or mistakenly think a larger NAT gateway is a cost-saving measure when it actually increases costs.

How to eliminate wrong answers

Option A is wrong because a larger NAT gateway would increase, not reduce, data processing costs (charged per GB processed) and does not address the root cause of traffic going through the NAT. Option C is wrong because S3 Object Lambda is used to transform data as it is retrieved from S3, not to reduce network egress costs or replace NAT gateway traffic. Option D is wrong because AWS Shield Advanced is a DDoS protection service that does not affect data transfer costs or routing between VPC and S3.

1023

MCQeasy

A service role has an IAM policy granting kms:Decrypt for a specific AWS KMS key. The application still fails to decrypt with an AccessDenied error. What change most directly fixes this when the KMS key policy is missing the role’s permissions?

A.Update the KMS key policy to allow kms:Decrypt for the service role principal (or the assumed-role principal identity that the KMS key evaluates).

B.Add an IAM policy statement allowing s3:GetObject for the bucket that stores the encrypted data.

C.Enable a CloudFront distribution for the KMS key alias.

D.Create a VPC gateway endpoint for KMS to route decryption requests privately.

AnswerA

KMS authorization is controlled by the KMS key policy in addition to (not instead of) IAM identity policies. If the key policy does not allow the principal, KMS will deny kms:Decrypt even if the IAM policy allows it.

Why this answer

The AccessDenied error occurs because the KMS key policy does not grant the service role permission to call kms:Decrypt. An IAM policy on the role is effective for a KMS key only if the key policy allows it, either by naming the principal directly or by delegating to IAM through a statement that allows the account (root) principal. Because this key policy does neither for the role, the IAM allow has no effect. Updating the key policy to allow kms:Decrypt for the service role's ARN directly resolves the missing permission.
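As a rough illustration, the key policy change amounts to appending one statement that names the role. The sketch below builds that statement in plain Python; the account ID, role name, and the shape of the existing policy are hypothetical, and a real key policy must keep its existing administrative statements.

```python
import copy
import json


def add_decrypt_statement(key_policy: dict, role_arn: str) -> dict:
    """Return a copy of key_policy with a kms:Decrypt allow for role_arn appended."""
    updated = copy.deepcopy(key_policy)  # leave the caller's policy untouched
    updated["Statement"].append(
        {
            "Sid": "AllowServiceRoleDecrypt",
            "Effect": "Allow",
            "Principal": {"AWS": role_arn},
            "Action": "kms:Decrypt",
            "Resource": "*",  # in a key policy, "*" refers to this key only
        }
    )
    return updated


# Hypothetical existing policy: only key administrators are allowed.
existing_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "KeyAdministration",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/key-admin"},
            "Action": ["kms:Describe*", "kms:Put*", "kms:Get*"],
            "Resource": "*",
        }
    ],
}

service_role_arn = "arn:aws:iam::111122223333:role/app-service-role"  # hypothetical
new_policy = add_decrypt_statement(existing_policy, service_role_arn)
print(json.dumps(new_policy, indent=2))
```

The resulting document is what would be supplied as the key's policy; the IAM policy on the role stays as it is.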

Exam trap

The trap here is that candidates assume IAM policies alone are sufficient for KMS authorization. KMS key policies are resource-based, and unless the key policy delegates access control to IAM, it must explicitly allow the principal.

How to eliminate wrong answers

Option B is wrong because s3:GetObject for the S3 bucket is unrelated to the KMS decryption failure; the error is specifically about KMS authorization, not S3 access. Option C is wrong because enabling a CloudFront distribution for the KMS key alias does not grant decryption permissions; CloudFront is a content delivery service and does not interact with KMS key policies. Option D is wrong because a VPC endpoint for KMS (KMS supports interface endpoints, not gateway endpoints) only affects network routing for KMS API calls, not IAM or key policy authorization; it does not grant the required kms:Decrypt permission.

1024

Multi-Selecthard

A startup has three sandbox accounts and one production account. The CTO wants lower cost and operational overhead while keeping central purchasing and spend visibility. Which two actions are best? Select two.

Select 2 answers

A.Enable consolidated billing under AWS Organizations so discounts and shared purchasing apply across accounts.

B.Move each sandbox to its own payer account to isolate spend from the rest.

C.Use managed services such as Amazon RDS or Amazon S3 instead of self-managed EC2-based databases and file servers where practical.

D.Buy Dedicated Hosts for sandbox workloads to get a lower blended rate.

E.Disable AWS Budgets because consolidated billing already solves visibility.

AnswersA, C

Correct. Consolidated billing centralizes purchasing and can improve discount usage across linked accounts. It also gives the company one payer view, which simplifies governance and visibility.

Why this answer

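Option A is correct because enabling consolidated billing under AWS Organizations aggregates usage across all accounts, allowing the startup to benefit from volume discounts and Reserved Instance sharing, which lowers overall costs. This also provides a single payer account for central purchasing and spend visibility through the consolidated billing console. Option C is correct because managed services such as Amazon RDS and Amazon S3 shift patching, backups, and scaling to AWS, which lowers operational overhead compared with self-managed EC2-based databases and file servers.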

Exam trap

The trap here is that candidates may think Dedicated Hosts (Option D) reduce costs due to 'blended rates,' but they actually increase costs for sandbox workloads and are intended for licensing compliance, not general cost optimization.

1025

MCQmedium

A trading dashboard runs on EC2 instances behind an Application Load Balancer. The design must tolerate the failure of one Availability Zone. What should the Auto Scaling group configuration include? The design must avoid adding custom operational scripts.

A.A single EC2 instance with detailed monitoring

B.Subnets in at least two Availability Zones with health checks enabled

C.All instances in one larger subnet

D.A Network Load Balancer in one subnet

AnswerB

An Auto Scaling group spanning multiple AZs can replace unhealthy instances and maintain capacity during an AZ failure.

Why this answer

Option B is correct because distributing EC2 instances across subnets in at least two Availability Zones (AZs) lets the Auto Scaling group maintain capacity even if one AZ fails. Enabling Elastic Load Balancing health checks on the group, in addition to the default EC2 status checks, lets it automatically replace instances that the Application Load Balancer (ALB) reports as unhealthy, without custom scripts.
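A minimal sketch of what such a configuration might look like as request parameters for the Auto Scaling `create_auto_scaling_group` API follows. The subnet IDs, AZ mapping, target group ARN, and launch template name are all hypothetical; the small helper only checks that the chosen subnets cover at least two AZs.

```python
# Hypothetical subnet-to-AZ mapping for illustration.
subnet_to_az = {
    "subnet-0aaa1111": "us-east-1a",
    "subnet-0bbb2222": "us-east-1b",
}


def spans_multiple_azs(subnet_ids, az_lookup) -> bool:
    """True when the subnets cover at least two distinct Availability Zones."""
    return len({az_lookup[s] for s in subnet_ids}) >= 2


subnet_ids = list(subnet_to_az)

asg_request = {
    "AutoScalingGroupName": "trading-dashboard-asg",          # hypothetical
    "LaunchTemplate": {
        "LaunchTemplateName": "trading-dashboard-template",   # hypothetical
        "Version": "$Latest",
    },
    "MinSize": 2,
    "MaxSize": 6,
    "DesiredCapacity": 2,
    # Comma-separated subnet IDs, one per Availability Zone.
    "VPCZoneIdentifier": ",".join(subnet_ids),
    # Hypothetical ALB target group ARN.
    "TargetGroupARNs": [
        "arn:aws:elasticloadbalancing:us-east-1:111122223333:"
        "targetgroup/dashboard/0123456789abcdef"
    ],
    # Use the load balancer's health checks in addition to EC2 status checks.
    "HealthCheckType": "ELB",
    "HealthCheckGracePeriod": 300,
}

print("Multi-AZ:", spans_multiple_azs(subnet_ids, subnet_to_az))
```

With boto3 installed and credentials configured, a dictionary like this could be passed as keyword arguments to `create_auto_scaling_group` on an `autoscaling` client.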

Exam trap

The trap here is that candidates often confuse using a load balancer (like an NLB) with achieving AZ redundancy, but without multi-AZ subnets in the Auto Scaling group, the architecture remains single-AZ and vulnerable to failure.

How to eliminate wrong answers

Option A is wrong because a single EC2 instance, even with detailed monitoring, cannot tolerate an AZ failure—if that AZ goes down, the instance is lost. Option C is wrong because placing all instances in one larger subnet confines them to a single AZ, providing no redundancy against AZ failure. Option D is wrong because a Network Load Balancer (NLB) in one subnet does not inherently distribute across AZs; the Auto Scaling group must span multiple AZs, and the NLB alone does not replace the need for multi-AZ instance placement.

1026

MCQmedium

A company uses AWS Organizations and has separate development, test, and production accounts. The security team wants to ensure that no one in the sandbox organizational unit can disable AWS CloudTrail or delete the central audit bucket, even if an account administrator creates permissive IAM policies later. Which control should they use?

A.Attach an identity-based policy in each account that denies CloudTrail changes.

B.Use a service control policy on the sandbox organizational unit to deny the prohibited actions.

C.Create an S3 bucket policy that allows only the audit team role to delete objects.

D.Apply a permission boundary to each IAM user in the sandbox accounts.

AnswerB

Service control policies are the correct governance mechanism for setting guardrails across multiple accounts in AWS Organizations. An SCP can explicitly deny sensitive actions such as disabling CloudTrail or deleting the audit bucket, and those denies apply even if administrators create local IAM policies that would otherwise allow the actions. SCPs do not grant permissions by themselves; they only constrain what account principals can ever do within the OU.

Why this answer

Service control policies (SCPs) are the correct mechanism because they act as a centralized guardrail at the AWS Organizations level, setting maximum permissions for all accounts in an organizational unit (OU). Even if an account administrator creates permissive IAM policies later, an SCP that explicitly denies disabling CloudTrail or deleting the central audit bucket will override those permissions, ensuring the security team's requirements are enforced across the sandbox OU.
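For illustration, an SCP of this kind could look roughly like the document built below. The bucket name is hypothetical, and the exact set of denied actions is an assumption about what the security team would consider "disabling CloudTrail"; a production policy would usually be reviewed against the full list of CloudTrail and S3 actions.

```python
import json

AUDIT_BUCKET_ARN = "arn:aws:s3:::central-audit-logs-example"  # hypothetical bucket

sandbox_guardrail_scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyCloudTrailTampering",
            "Effect": "Deny",
            "Action": [
                "cloudtrail:StopLogging",
                "cloudtrail:DeleteTrail",
                "cloudtrail:UpdateTrail",
            ],
            "Resource": "*",
        },
        {
            "Sid": "DenyAuditBucketDeletion",
            "Effect": "Deny",
            "Action": "s3:DeleteBucket",
            "Resource": AUDIT_BUCKET_ARN,
        },
    ],
}

# An SCP contains only guardrails here; it grants nothing by itself.
print(json.dumps(sandbox_guardrail_scp, indent=2))
```

Attached to the sandbox OU, explicit denies like these apply to every principal in the member accounts, whatever IAM policies an account administrator adds later.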

Exam trap

The trap here is that candidates often confuse service control policies with IAM permission boundaries or resource-based policies, thinking that a bucket policy or permission boundary can prevent service-level actions like disabling CloudTrail, when only an SCP can enforce such restrictions across all principals in an entire OU.

How to eliminate wrong answers

Option A is wrong because identity-based policies are attached to IAM users, groups, or roles within an account and can be overridden by a more permissive policy created by an account administrator, so they do not provide the centralized, unchangeable control needed. Option C is wrong because an S3 bucket policy can prevent deletion of objects in the bucket but cannot prevent an account administrator from disabling CloudTrail itself, which is the primary concern. Option D is wrong because a permission boundary limits the maximum permissions an IAM user can have but does not prevent an account administrator from creating new IAM users or roles without the boundary, nor does it block disabling CloudTrail at the service level.

1027

MCQmedium

Your mobile app writes events to a single DynamoDB table with partition key = customerId and sort key = eventTime. During a promotional campaign, one tenant ("ACME") generates far more traffic than others. CloudWatch shows sustained throttling (ProvisionedThroughputExceeded) and elevated p99 latency only for that tenant. The workload pattern cannot be changed to a completely different schema, but you can change how items are partitioned. Which design change is most likely to reduce the hot-partition throttling while keeping efficient reads for ACME?

A.Use the same partition key (customerId), but increase the table’s provisioned capacity for that tenant.

B.Change the partition key to a salted key such as customerId + shard number, and include the eventTime ordering using the sort key.

C.Switch to on-demand capacity mode and keep the partition key unchanged.

D.Enable Global Tables so that reads are served from a nearby replica for ACME.

AnswerB

Hot-partition throttling happens when a single logical partition (one partition key value) receives more requests than it can serve. By salting the partition key (for example, customerId#shardId), ACME’s writes are spread across multiple physical partitions, reducing request rate per partition and lowering throttling. Efficient reads for ACME can be preserved by querying only the shard partitions that belong to ACME (for example, using a small, deterministic set of shardIds and issuing parallel queries per shard, then merging results). This avoids scanning the whole table and keeps access patterns predictable while improving tail latency.

Why this answer

Option B is correct because salting the partition key by appending a shard number (e.g., customerId + random digit) distributes ACME's writes across multiple partitions, eliminating the hot partition. The sort key still preserves eventTime ordering, so queries for a specific customer can be parallelized across shards and merged client-side or via a composite sort key pattern, maintaining efficient reads.
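One way to sketch the salted key is shown below. The shard count, the `#` separator, and the idea of hashing an event ID to pick the shard are assumptions for illustration; the question only requires that the shard set be small and known so that reads can enumerate it.

```python
import hashlib

SHARD_COUNT = 8  # assumption: a small, fixed number of shards per customer


def shard_for(event_id: str, shard_count: int = SHARD_COUNT) -> int:
    """Map an event ID to a stable shard number in [0, shard_count)."""
    digest = hashlib.sha256(event_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % shard_count


def partition_key(customer_id: str, event_id: str) -> str:
    """Salted partition key, e.g. 'ACME#3'; eventTime stays the sort key."""
    return f"{customer_id}#{shard_for(event_id)}"


def all_partition_keys(customer_id: str) -> list:
    """Every partition key a reader must query to see all of a customer's events."""
    return [f"{customer_id}#{n}" for n in range(SHARD_COUNT)]


print(partition_key("ACME", "evt-0001"))
print(all_partition_keys("ACME"))
```

Writes for ACME now land on up to eight partition key values instead of one, and a reader issues one query per key from `all_partition_keys` and merges the results by eventTime.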

Exam trap

The trap here is that candidates assume increasing capacity or switching to on-demand alone solves hot partitions, but they overlook DynamoDB's fixed per-partition throughput limits that require key design changes to distribute load.

How to eliminate wrong answers

Option A is wrong because provisioned capacity is set at the table level, not per tenant, and raising it does not solve the hot-partition issue; DynamoDB spreads capacity across partitions, and a single partition's throughput is capped at 3,000 RCU and 1,000 WCU regardless of table-level settings. Option C is wrong because switching to on-demand capacity mode only handles traffic spikes at the table level, but a single hot partition still hits the same per-partition throughput limits (3,000 RCU and 1,000 WCU), causing throttling. Option D is wrong because Global Tables replicate data across Regions for low-latency reads and disaster recovery, but they do not redistribute write load within a single table; ACME's writes still target the same partition key in the source Region, so throttling persists.

1028

MCQmedium

An Aurora PostgreSQL cluster is experiencing high read latency because 85% of traffic consists of read-only queries. The write workload must stay on the writer instance, and the team wants to offload reads without changing the application’s core query patterns. What is the best architectural option?

A.Increase the writer instance size so it can handle more reads and writes simultaneously.

B.Add Aurora reader instances (read replicas) and route read queries to the reader endpoint while keeping writes on the writer endpoint.

C.Enable Multi-AZ failover only and rely on the standby to serve reads in normal operation.

D.Move the read workload to ElastiCache Redis while keeping DynamoDB as the SQL data source.

AnswerB

Aurora reader instances are designed for exactly this pattern: they provide dedicated compute capacity for read-only workloads. By sending read queries to the reader endpoint and keeping writes on the writer endpoint, the cluster can scale read performance without forcing reads to contend with write processing on the writer.

Why this answer

Adding Aurora reader instances (read replicas) and routing read queries to the reader endpoint offloads read traffic from the writer instance without altering application query patterns. The reader endpoint distributes read-only connections across the available replicas, reducing load on the writer while writes stay on the writer endpoint. This directly addresses the 85% read-heavy workload; the only application change is pointing read connections at the reader endpoint.
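A minimal sketch of the routing decision on the application side is shown below. The endpoint hostnames are hypothetical, and classifying statements by their first keyword is a deliberate simplification: it ignores cases such as SELECT ... FOR UPDATE or functions with side effects, and reads that cannot tolerate replica lag would still need the writer.

```python
# Hypothetical cluster endpoints; Aurora exposes one writer (cluster) endpoint
# and one reader endpoint per cluster.
WRITER_ENDPOINT = "reports.cluster-abc123example.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "reports.cluster-ro-abc123example.us-east-1.rds.amazonaws.com"

READ_ONLY_KEYWORDS = ("select", "show")  # simplification for illustration


def endpoint_for(sql: str) -> str:
    """Pick the reader endpoint for read-only statements, else the writer."""
    words = sql.split()
    first_keyword = words[0].lower() if words else ""
    if first_keyword in READ_ONLY_KEYWORDS:
        return READER_ENDPOINT
    return WRITER_ENDPOINT


print(endpoint_for("SELECT * FROM orders WHERE id = 42"))
print(endpoint_for("UPDATE orders SET status = 'shipped' WHERE id = 42"))
```

In practice many applications simply keep two connection pools, one per endpoint, and choose between them at the call site rather than parsing SQL.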

Exam trap

The trap here is that candidates often confuse Multi-AZ standby instances (which are passive and cannot serve reads) with Aurora reader replicas (which are active and can serve reads), leading them to incorrectly select Option C.

How to eliminate wrong answers

Option A is wrong because increasing the writer instance size does not offload reads; it only adds more resources to a single node, which still handles all read and write traffic, and does not scale read capacity independently. Option C is wrong because Multi-AZ failover provides a standby for high availability, but the standby does not serve reads in normal operation (it is passive until failover), so it does not offload read traffic. Option D is wrong because ElastiCache Redis is a caching layer, not a SQL data source, and DynamoDB is a NoSQL database, not a SQL data source; this would require significant application changes and does not preserve the existing Aurora PostgreSQL query patterns.

1029

MCQhard

Based on the exhibit, the current disaster recovery design misses the RTO target even though the database replica is current. Which deployment model best meets the requirements with the least always-on cost?

A.Pilot light, because only the database needs to be running in the secondary Region.

B.Warm standby, because a scaled-down application stack stays running in the secondary Region and can take over faster.

C.Active-active, because both Regions should always serve traffic to guarantee the RTO.

D.Backup and restore, because restoring from backups is the least expensive DR model available.

AnswerB

Warm standby is the best fit when you need faster recovery than pilot light but do not want the cost of full active-active capacity. The exhibit shows that starting the application stack from zero consumes most of the recovery time. Keeping a reduced but functional stack running in the secondary Region removes that startup delay and should bring the total recovery time within the 15-minute RTO while still keeping always-on cost below full production duplication.

Why this answer

Warm standby is the correct choice because it keeps a scaled-down application stack running in the secondary Region, which can be scaled up quickly to handle production traffic. This design meets the RTO target by reducing failover time compared to a pilot light, while avoiding the higher always-on cost of an active-active deployment.

Exam trap

The trap here is that candidates confuse pilot light with warm standby, assuming that only the database needs to be running to meet the RTO, but they overlook the time required to provision the application stack on failover.

How to eliminate wrong answers

Option A is wrong because a pilot light only keeps the database running and requires provisioning the full application stack on failover, which cannot meet the RTO target. Option C is wrong because active-active runs full application stacks in both Regions at all times, incurring higher always-on cost than necessary. Option D is wrong because backup and restore, although it has the lowest always-on cost, involves restoring from backups (e.g., RDS snapshots or S3), which takes too long to meet the RTO target.

1030

MCQeasy

A company runs the same public API in two regions (Region A and Region B), each fronted by an ALB. They want Route 53 to automatically route clients to the Region B API when Region A becomes unhealthy, with minimal configuration effort. Which Route 53 approach should they use?

A.Use a single Route 53 A record that points only to Region A’s ALB and manually update it after failures.

B.Use Route 53 latency-based routing with separate records for each region.

C.Use Route 53 failover routing with health checks for each region’s endpoint.

D.Use weighted routing and set the Region B weight to 0 to ensure it is only used when needed.

AnswerC

Failover routing works with health checks to move traffic from a primary endpoint to a secondary endpoint when the primary becomes unhealthy.

Why this answer

Route 53 failover routing with health checks is the correct choice because it automatically directs traffic to the secondary (Region B) endpoint when the primary (Region A) endpoint fails a health check. This provides automated DNS failover with minimal configuration effort, as Route 53 monitors the health of each ALB endpoint and updates DNS responses accordingly.
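As a rough sketch, the two failover records could be expressed as a change batch for the Route 53 `ChangeResourceRecordSets` API like the one below. The domain name, ALB DNS names, and hosted zone IDs are hypothetical placeholders, and using alias records with `EvaluateTargetHealth` instead of separately created health checks is an assumption about how the team would wire up health evaluation.

```python
def failover_record(name, role, alb_dns_name, alb_zone_id):
    """Build one alias A record for a failover pair (role is PRIMARY or SECONDARY)."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            "SetIdentifier": f"api-{role.lower()}",
            "Failover": role,
            "AliasTarget": {
                "HostedZoneId": alb_zone_id,
                "DNSName": alb_dns_name,
                # Let Route 53 use the ALB's own health to decide on failover.
                "EvaluateTargetHealth": True,
            },
        },
    }


change_batch = {
    "Comment": "Failover routing: Region A primary, Region B secondary",
    "Changes": [
        failover_record(
            "api.example.com.", "PRIMARY",
            "region-a-alb.example.elb.amazonaws.com.", "ZEXAMPLEREGIONA",
        ),
        failover_record(
            "api.example.com.", "SECONDARY",
            "region-b-alb.example.elb.amazonaws.com.", "ZEXAMPLEREGIONB",
        ),
    ],
}

for change in change_batch["Changes"]:
    record = change["ResourceRecordSet"]
    print(record["SetIdentifier"], record["Failover"])
```

Route 53 answers with the PRIMARY record while it is healthy and switches to the SECONDARY record when it is not, with no manual record updates.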

Exam trap

The trap here is that candidates often confuse latency-based routing with failover routing. Latency-based routing is an active-active policy that sends each client to the Region with the lowest latency, so Region B would receive traffic even while Region A is healthy; failover routing is the policy built for a primary and a secondary endpoint.

How to eliminate wrong answers

Option A is wrong because manually updating a single A record after a failure is not automated and contradicts the requirement for minimal configuration effort; it also introduces significant downtime during the manual update window. Option B is wrong because latency-based routing selects the Region with the lowest latency for each client rather than treating Region A as the primary; it can skip an unhealthy Region only if health checks are also attached to the records, and it would send some clients to Region B during normal operation. Option D is wrong because weighted routing is designed to split traffic by proportion, not to express a primary and secondary pair; a weight of 0 keeps Region B out of rotation, and Route 53 falls back to zero-weighted records only when health checks are attached and every nonzero-weight record is unhealthy, which is more configuration to get right than failover routing.

1031

MCQmedium

A team wants to remove a bastion host used for administrative access to EC2 instances in private subnets. The instances should be reachable only for occasional troubleshooting by engineers who authenticate with AWS SSO. What is the best secure alternative within AWS, assuming the instances already have an instance profile attached?

A.Use AWS Systems Manager Session Manager, enabling the required SSM permissions in the instance profile and restricting access to engineers via IAM.

B.Keep the bastion host but move it into a private subnet; engineers can connect by using a corporate VPN into the VPC.

C.Attach a public IP to each private instance so engineers can SSH directly and use security groups to restrict access.

D.Create a security group rule that allows engineers’ source IP addresses to reach instances over RDP on port 3389.

AnswerA

Session Manager avoids inbound SSH from the internet by initiating interactive sessions through Systems Manager. The instance profile must let the SSM Agent communicate with Systems Manager (for example, through the AmazonSSMManagedInstanceCore managed policy), and engineers’ IAM permissions for ssm:StartSession restrict who can connect. This is a commonly recommended bastion-free alternative that improves security and reduces exposed network paths.

Why this answer

AWS Systems Manager Session Manager provides secure, auditable, agent-based access to EC2 instances without requiring a bastion host, public IPs, or open inbound ports. Since the instances already have an instance profile, you only need to add the required SSM permissions (e.g., AmazonSSMManagedInstanceCore) to that profile and use IAM policies to restrict Session Manager access to engineers authenticated via AWS SSO. This eliminates the bastion host while maintaining secure, on-demand troubleshooting access.
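The engineer-side restriction could be sketched as an IAM policy like the one below, which would sit in the permission set the engineers receive. The tag key and value are hypothetical, and this is only the StartSession fragment; a complete policy would normally also cover terminating and resuming the engineer's own sessions.

```python
import json

# Hypothetical tag that marks instances engineers may troubleshoot.
TROUBLESHOOT_TAG_KEY = "SessionAccess"
TROUBLESHOOT_TAG_VALUE = "engineers"

engineer_session_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "StartSessionOnTaggedInstances",
            "Effect": "Allow",
            "Action": "ssm:StartSession",
            "Resource": "arn:aws:ec2:*:*:instance/*",
            "Condition": {
                "StringEquals": {
                    f"ssm:resourceTag/{TROUBLESHOOT_TAG_KEY}": TROUBLESHOOT_TAG_VALUE
                }
            },
        }
    ],
}

print(json.dumps(engineer_session_policy, indent=2))
```

The instance profile needs no StartSession permission of its own; it only needs what the SSM Agent requires to register with Systems Manager.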

Exam trap

The trap here is that candidates often think a bastion host is the only way to access private instances, overlooking that AWS Systems Manager Session Manager provides a fully managed, agent-based alternative that eliminates the need for any bastion host or open inbound ports.

How to eliminate wrong answers

Option B is wrong because moving the bastion host to a private subnet and using a corporate VPN still leaves a persistent bastion host that must be patched and managed, and it does not eliminate the attack surface or the need for SSH/RDP key management. Option C is wrong because attaching a public IP to each private instance directly exposes them to the internet, violating the principle of least privilege and increasing the attack surface, even with security group restrictions. Option D is wrong because allowing engineers’ source IPs over RDP port 3389 requires opening inbound ports and managing IP allow lists, which is less secure than Session Manager (which needs no inbound ports) and does not integrate with AWS SSO for authentication.

1032

MCQeasy

A retail platform needs disaster recovery across AWS Regions. The business requirement is: RTO up to 6 hours, RPO up to 1 hour, and they want the ability to start serving quickly during a Region outage but do not want to run full production capacity continuously. Which DR strategy best fits these requirements?

A.Backup and restore only, with no continuously running infrastructure in the secondary Region.

B.Pilot light, keeping only the minimum resources needed to bootstrap the environment.

C.Warm standby, keeping a reduced but ready-to-scale environment in the secondary Region.

D.Multi-site active-active, serving production traffic from both Regions at all times.

AnswerC

Warm standby maintains enough infrastructure to reduce recovery time, while not fully running production capacity continuously.

Why this answer

Warm standby is the correct strategy because it maintains a scaled-down but fully functional copy of the production environment in the secondary Region, which can be scaled up within the 6-hour RTO. The RPO of 1 hour is met by continuous replication (e.g., Amazon RDS cross-Region read replicas or DynamoDB global tables), and the reduced footprint avoids the cost of full production capacity while still enabling rapid failover.

Exam trap

The trap here is that candidates confuse pilot light with warm standby, assuming that any minimal running infrastructure qualifies as pilot light, but warm standby specifically requires a scaled-down but fully functional environment that can serve traffic immediately after scaling, whereas pilot light requires significant provisioning before it can serve traffic.

How to eliminate wrong answers

Option A is wrong because backup and restore keeps no running infrastructure in the secondary Region, so everything must be provisioned and restored from backups after the outage; that conflicts with the requirement to start serving quickly, and the 1-hour RPO would additionally depend on backups being taken at least hourly. Option B is wrong because pilot light keeps only the minimal core resources (e.g., a small database and a single EC2 instance) that must be fully provisioned and scaled after failover, which cannot meet the 6-hour RTO if scaling takes too long, and the RPO may be missed if replication is not continuous. Option D is wrong because multi-site active-active runs full production capacity in both Regions at all times, which violates the requirement to not run full production capacity continuously and incurs unnecessary cost.

1033

MCQmedium

An event ingestion service writes to a DynamoDB table where the partition key is tenantId and the sort key is eventTime. During a campaign, one tenant generates a disproportionate share of traffic, causing write throttling and increased latency for that tenant’s writes. You can change the data model and application queries, but you must still efficiently retrieve events for a tenant for the last 10 minutes. Which change best improves write throughput by reducing hot partitions?

A.Keep tenantId as the partition key and rely on DynamoDB adaptive capacity to automatically remove all throttling.

B.Add a shard attribute to the partition key (partition key = tenantId#shard, where shard is randomly selected from a fixed range). Query all shards for the tenant for eventTime values in the last 10 minutes, then merge results in the application.

C.Change the sort key to eventTimeBucket (for example, eventTime rounded to 1-minute buckets) while keeping the partition key as tenantId.

D.Enable DAX and use it for write operations so throttled writes are served from cache instead of reaching DynamoDB.

AnswerB

This “write sharding” spreads a tenant’s traffic across multiple partition key values, which distributes the write load across multiple DynamoDB partitions (and thus multiple throughput slices). Reads for the last 10 minutes remain efficient because each shard still supports a sort-key range query on eventTime; the application merges results across shards.

Why this answer

Option B is correct because it distributes writes for a hot tenant across multiple partitions by appending a random shard suffix to the tenantId partition key. This eliminates a single hot partition, allowing DynamoDB to scale write capacity horizontally. The application can then query all shards for the last 10 minutes and merge results, satisfying the retrieval requirement.
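The read side of this pattern can be sketched with an in-memory stand-in for the table. Everything below is illustrative: the shard count, the `tenantId#shard` format, and the `query_shard` helper (which stands in for a DynamoDB Query with a sort-key range condition) are assumptions, and eventTime is represented as epoch seconds.

```python
import heapq
import random

SHARD_COUNT = 4  # assumption: fixed shard range known to writers and readers


def write_key(tenant_id: str) -> str:
    """Pick a random shard for a write, e.g. 'tenant-1#2'."""
    return f"{tenant_id}#{random.randrange(SHARD_COUNT)}"


def query_shard(table: dict, pk: str, start: int, end: int) -> list:
    """Stand-in for a Query on one partition key with eventTime BETWEEN start AND end."""
    return [item for item in table.get(pk, []) if start <= item[0] <= end]


def recent_events(table: dict, tenant_id: str, now: int, window_seconds: int = 600) -> list:
    """Query every shard for the window, then merge the sorted results by eventTime."""
    start = now - window_seconds
    per_shard = [
        query_shard(table, f"{tenant_id}#{n}", start, now) for n in range(SHARD_COUNT)
    ]
    return list(heapq.merge(*per_shard))


# Each partition holds (eventTime, payload) tuples already sorted by eventTime,
# as items within a DynamoDB partition key are ordered by sort key.
table = {
    "tenant-1#0": [(100, "old"), (900, "c")],
    "tenant-1#2": [(500, "b")],
    "tenant-2#0": [(950, "other tenant")],
}

print(recent_events(table, "tenant-1", now=1000))
```

The merge cost grows with the shard count, so the range is usually kept just large enough to stay under the per-partition write limit.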

Exam trap

The trap here is that candidates assume adaptive capacity or caching (DAX) can solve write throttling, but neither addresses the root cause—a single partition exceeding its write capacity—which requires redistributing the partition key across multiple physical partitions.

How to eliminate wrong answers

Option A is wrong because DynamoDB adaptive capacity can only mitigate moderate hot spots by temporarily allocating extra capacity, but it cannot eliminate throttling when a single partition exceeds its limit of 1,000 WCU and 3,000 RCU; sustained high traffic from one tenant will still cause throttling. Option C is wrong because changing the sort key to eventTimeBucket does nothing to distribute writes across partitions: the partition key remains tenantId, so all writes for that tenant still target the same partition, leaving the hot partition problem unsolved. Option D is wrong because DAX is a read-through and write-through cache; writes issued through DAX are still sent to DynamoDB and consume the same write capacity, so throttled writes are not served from cache, and DAX cannot increase write throughput or reduce hot partition contention.

1034

MCQeasy

A containerized service needs to read exactly one secret value from AWS Secrets Manager. The secret’s ARN is already known, and the secret is encrypted with the AWS-managed KMS key for Secrets Manager, so no separate KMS permissions are needed for this question. The service does not need to list secrets, create secrets, rotate them, or write updates. What is the least-privilege IAM permission statement to grant the service role?

A.Allow secretsmanager:GetSecretValue on the specific secret ARN only.

B.Allow secretsmanager:* on all resources in the account.

C.Allow secretsmanager:ListSecrets so the service can discover the secret ARN at runtime.

D.Allow secretsmanager:PutSecretValue so the service can retrieve and update the secret value.

AnswerA

For a read-only use case where the secret ARN is already known, the minimum required Secrets Manager action is secretsmanager:GetSecretValue. Scoping the resource to only that secret ARN minimizes blast radius if the role is compromised.

Why this answer

Option A is correct because the service only needs to read a single secret value, and the least-privilege permission is to allow only the `secretsmanager:GetSecretValue` action on that specific secret's ARN. This grants exactly the required read access without any additional capabilities, adhering to the principle of least privilege. Since the secret is encrypted with the AWS-managed KMS key for Secrets Manager, no separate KMS permissions are needed, as the key policy automatically grants access to the Secrets Manager service.
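A minimal sketch of such a policy, built as a Python dictionary, is shown below. The secret ARN (account ID, Region, and secret name) is hypothetical; only the action and the single-resource scoping come from the question.

```python
import json

# Hypothetical secret ARN; replace with the known ARN of the one secret.
SECRET_ARN = (
    "arn:aws:secretsmanager:us-east-1:111122223333:"
    "secret:prod/orders/db-password-AbCdEf"
)

read_one_secret_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadSingleSecret",
            "Effect": "Allow",
            "Action": "secretsmanager:GetSecretValue",  # read only, no list/put/rotate
            "Resource": SECRET_ARN,                     # exactly one secret, no wildcard
        }
    ],
}

print(json.dumps(read_one_secret_policy, indent=2))
```

Attached to the service role, this allows one action on one resource, which keeps the blast radius small if the role's credentials leak.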

Exam trap

The trap here is that candidates often choose a broader permission like `secretsmanager:*` or `secretsmanager:ListSecrets` because they confuse the need to discover the secret with the need to read it, or they overlook that the ARN is already known, making list actions unnecessary.

How to eliminate wrong answers

Option B is wrong because `secretsmanager:*` on all resources grants full administrative access to all secrets in the account, which is far more permissive than needed and violates least privilege. Option C is wrong because `secretsmanager:ListSecrets` allows listing all secret names and ARNs in the account, which is unnecessary since the secret ARN is already known, and it provides no ability to read the secret value itself. Option D is wrong because `secretsmanager:PutSecretValue` allows updating the secret value, which is not required and introduces unnecessary write permissions that could lead to accidental or malicious modification.

1035

MCQmedium

Developers for a financial reporting platform need temporary elevated access to production resources for troubleshooting. The security team wants approvals, expiry, and audit logging. Which approach is best? The design must avoid adding custom operational scripts.

A.Use IAM Identity Center permission sets with time-bound access processes and CloudTrail auditing

B.Disable CloudTrail during troubleshooting

C.Create shared administrator access keys for the team

D.Attach AdministratorAccess permanently to every developer role

AnswerA

Federated access with permission sets and audited temporary assignments reduces standing privilege.

Why this answer

IAM Identity Center permission sets allow granting time-bound, least-privilege access to production resources. Combined with CloudTrail auditing, this provides full logging of all actions taken during the elevated access period. The solution addresses the security team's requirements for approvals (through the access request process that gates each temporary assignment), expiry (via session duration and time-bound assignments), and audit logging (via CloudTrail), without requiring custom operational scripts.

Exam trap

The trap here is that candidates may think shared keys (Option C) are acceptable for 'temporary' access, but AWS explicitly discourages shared credentials because they break audit trails and accountability, which is a core security requirement in the SAA-C03 exam.

How to eliminate wrong answers

Option B is wrong because disabling CloudTrail during troubleshooting removes all audit logging, directly violating the security team's requirement for audit logging and making it impossible to track actions taken. Option C is wrong because creating shared administrator access keys violates the principle of least privilege, eliminates individual accountability, and prevents proper audit logging of who performed which action. Option D is wrong because permanently attaching AdministratorAccess to every developer role grants excessive, always-on privileges with no time-bound expiry, violating the requirement for temporary elevated access and approvals.

1036

MCQeasy

A security team needs an audit trail to investigate suspicious API activity across multiple AWS accounts. Which AWS approach best provides centralized visibility into who did what, when, for service API calls?

A.Create an AWS CloudTrail organization trail that delivers logs to a centralized, access-controlled S3 bucket.

B.Enable AWS Config only for EC2 security groups and rely on it for API call auditing.

C.Turn on S3 server access logging for every bucket and assume it covers all AWS services.

D.Use only Amazon CloudWatch alarms with no logging destination to reduce storage costs.

AnswerA

An AWS Organizations organization trail centralizes management and API activity logs across accounts. CloudTrail provides detailed event records including the requesting principal, source information, event time, and the specific API action, which supports forensic investigation.

Why this answer

An AWS CloudTrail organization trail is the correct approach because it logs events from every account in an AWS Organizations organization and delivers them to a single, centralized S3 bucket. Management events are logged by default; data events are recorded only if the trail is configured to include them. This provides a unified audit trail of who performed which API call, when, and from which source IP, enabling the security team to investigate suspicious activity across accounts. The centralized bucket can be access-controlled with S3 bucket policies and IAM so that only authorized personnel can view the logs, and log file integrity validation can be enabled to detect tampering.
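As a rough illustration of the "who did what, when" detail in a CloudTrail record, the sketch below pulls those fields out of a log file payload. The field names (`userIdentity`, `eventSource`, `eventName`, `eventTime`, `sourceIPAddress`, `recipientAccountId`) follow the CloudTrail record format; the sample record, account ID, and the `summarize_event` helper are hypothetical and exist only for this example.

```python
import json


def summarize_event(record: dict) -> dict:
    """Extract who / what / when / where from one CloudTrail event record."""
    identity = record.get("userIdentity", {})
    return {
        # Prefer the principal ARN; fall back to principalId if no ARN is present.
        "who": identity.get("arn") or identity.get("principalId", "unknown"),
        "what": f"{record.get('eventSource', '?')}:{record.get('eventName', '?')}",
        "when": record.get("eventTime"),
        "source_ip": record.get("sourceIPAddress"),
        "account": record.get("recipientAccountId"),
    }


# Hypothetical log payload; real CloudTrail log files wrap events in "Records".
sample_log = json.dumps({
    "Records": [{
        "eventTime": "2024-01-15T10:30:00Z",
        "eventSource": "ec2.amazonaws.com",
        "eventName": "AuthorizeSecurityGroupIngress",
        "sourceIPAddress": "203.0.113.10",
        "recipientAccountId": "111122223333",
        "userIdentity": {
            "type": "IAMUser",
            "arn": "arn:aws:iam::111122223333:user/alice",
        },
    }]
})

summaries = [summarize_event(r) for r in json.loads(sample_log)["Records"]]
for summary in summaries:
    print(summary)
```

Because an organization trail writes every member account's events to the same bucket, a summary like this can be grouped by `account` to see activity per account.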

Exam trap

The trap here is that candidates often confuse AWS Config (which tracks resource configuration changes) with CloudTrail (which records API calls), leading them to pick Option B, but Config does not provide the who, what, when details needed for an API audit trail.

How to eliminate wrong answers

Option B is wrong because AWS Config is a resource inventory and configuration change tracking service, not an API call auditor; it does not capture who made the API call or the full request/response details. Option C is wrong because S3 server access logging only records requests made to S3 buckets, not API calls for other AWS services like EC2, IAM, or Lambda, leaving a massive gap in the audit trail. Option D is wrong because CloudWatch alarms only trigger on metric thresholds and do not store or provide any log data for forensic investigation; without a logging destination, there is no audit trail at all.

1037

MCQmedium

Based on the exhibit, what should the security team implement so developers can create AWS Lambda execution roles, but no developer-created role can ever exceed the approved permission set?

A.Place the developers in an IAM group with a deny-only managed policy attached.

B.Require a permissions boundary on every developer-created role and set the boundary to the approved maximum permissions.

C.Use an AWS Organizations SCP to grant only the approved Lambda permissions directly to the developer roles.

D.Create the roles with inline policies only, because inline policies are always safer than managed policies.

AnswerB

A permissions boundary limits the highest permissions a role can ever have, even if someone attaches broader policies later. This is the right guardrail when developers are allowed to create roles but must stay within a security-approved ceiling. It still lets them work independently while preventing privilege escalation through policy attachment.

Why this answer

Option B is correct because a permissions boundary explicitly defines the maximum permissions that an IAM role can have, and when attached to developer-created roles, it prevents any role from exceeding the approved set of permissions, even if the developer attaches a more permissive policy. This directly addresses the requirement that no developer-created role can ever exceed the approved permission set, as the boundary acts as a hard cap.
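The "hard cap" behaviour can be sketched in two parts, under simplifying assumptions. The first function builds an IAM statement that lets developers call `iam:CreateRole` only when a given boundary is attached, using the `iam:PermissionsBoundary` condition key. The second models effective permissions as the intersection of the attached policy and the boundary; it ignores wildcards, explicit denies, SCPs, and resource policies, so it is a teaching model and not how IAM evaluation is implemented. The action sets and the boundary ARN are hypothetical.

```python
def require_boundary_statement(boundary_arn: str) -> dict:
    """Allow role creation only when the approved boundary is attached."""
    return {
        "Sid": "CreateRoleOnlyWithBoundary",
        "Effect": "Allow",
        "Action": "iam:CreateRole",
        "Resource": "*",
        "Condition": {
            "StringEquals": {"iam:PermissionsBoundary": boundary_arn}
        },
    }


def effective_actions(identity_allowed: set, boundary_allowed: set) -> set:
    """Simplified model: a role can use only actions allowed by BOTH policies."""
    return identity_allowed & boundary_allowed


# Hypothetical approved ceiling for Lambda execution roles.
boundary = {
    "logs:CreateLogGroup",
    "logs:CreateLogStream",
    "logs:PutLogEvents",
    "dynamodb:GetItem",
}
# A developer attaches a broader policy than the boundary allows.
attached = {"logs:PutLogEvents", "dynamodb:GetItem", "iam:CreateUser", "s3:DeleteBucket"}

result = effective_actions(attached, boundary)
statement = require_boundary_statement(
    "arn:aws:iam::111122223333:policy/LambdaRoleBoundary"  # hypothetical ARN
)
print(sorted(result))
```

In this model the extra `iam:CreateUser` and `s3:DeleteBucket` grants have no effect, which is the behaviour the explanation describes.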

Exam trap

The trap here is that candidates often confuse permissions boundaries with SCPs or assume that inline policies are more restrictive, but the key is that a permissions boundary is the only mechanism that directly caps the maximum permissions of a specific role without affecting other principals.

How to eliminate wrong answers

Option A is wrong because a deny-only managed policy attached to an IAM group restricts only what the developers themselves can do; it does not cap the permissions of the roles they create, so a developer could still attach a broader policy to a new role. Option C is wrong because an SCP never grants permissions; it only sets the maximum permissions available to principals in the accounts or OUs it is attached to, so it cannot "grant" the approved Lambda permissions to developer roles, and it is not designed as a per-role ceiling for developer-created roles. Option D is wrong because inline policies are not inherently safer than managed policies; the security concern is about permission scope, not policy type, and inline policies can still grant excessive permissions without a boundary.

1038

MCQmedium

Your team hosts a private web app on an S3 bucket and serves it through CloudFront using a modern Origin Access Control (OAC). After deployment, users receive HTTP 403 from CloudFront with the S3 origin error "AccessDenied". Which S3 bucket policy change best aligns with CloudFront OAC so the distribution can fetch objects privately?

A.Allow the CloudFront service principal cloudfront.amazonaws.com to perform s3:GetObject, and scope access with a condition on AWS:SourceArn matching your CloudFront distribution ARN.

B.Allow only the S3 bucket owner account to perform s3:GetObject without any condition, so CloudFront can inherit access automatically.

C.Add a policy statement that denies s3:GetObject when the request does not include the header CloudFront-Viewer-Country.

D.Grant s3:GetObject permission to an Origin Access Identity (OAI) canonical user ID even though you are using Origin Access Control (OAC).

AnswerA

With CloudFront OAC, the request to S3 is authorized using the CloudFront service principal. Granting s3:GetObject to cloudfront.amazonaws.com and constraining it with AWS:SourceArn to the specific distribution is the standard secure pattern for private S3 origins.

Why this answer

Option A is correct because CloudFront Origin Access Control (OAC) requires an explicit S3 bucket policy that allows the CloudFront service principal (`cloudfront.amazonaws.com`) to perform `s3:GetObject`, and the recommended best practice is to scope the permission using a condition on `AWS:SourceArn` matching the specific CloudFront distribution ARN. This ensures that only requests originating from that distribution can access the bucket objects, preventing unauthorized access from other sources.
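A minimal sketch of the bucket policy described above, built as a Python dictionary so it can be serialized to JSON. The bucket name, account ID, and distribution ID are placeholders, and `oac_bucket_policy` is a hypothetical helper written for this example.

```python
import json


def oac_bucket_policy(bucket: str, account_id: str, distribution_id: str) -> dict:
    """Build an S3 bucket policy that lets one CloudFront distribution read objects."""
    distribution_arn = (
        f"arn:aws:cloudfront::{account_id}:distribution/{distribution_id}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowCloudFrontServicePrincipalReadOnly",
            "Effect": "Allow",
            # OAC requests are authorized as the CloudFront service principal.
            "Principal": {"Service": "cloudfront.amazonaws.com"},
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
            # Restrict access to requests made on behalf of this distribution only.
            "Condition": {"StringEquals": {"AWS:SourceArn": distribution_arn}},
        }],
    }


# Placeholder values for illustration only.
policy = oac_bucket_policy("example-private-site", "111122223333", "EDFDVBD6EXAMPLE")
print(json.dumps(policy, indent=2))
```

Without the `AWS:SourceArn` condition, any CloudFront distribution in any account could be pointed at the bucket and read from it, which is why the condition is part of the recommended pattern.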

Exam trap

The trap here is that candidates often confuse Origin Access Control (OAC) with the older Origin Access Identity (OAI) and incorrectly select an OAI-based policy (Option D), or they assume that bucket owner permissions automatically extend to CloudFront (Option B), failing to recognize that OAC requires an explicit service principal-based policy with a source ARN condition.

How to eliminate wrong answers

Option B is wrong because simply allowing the S3 bucket owner account to perform `s3:GetObject` does not grant CloudFront any permissions; CloudFront operates under its own service principal, not the bucket owner's account, so the distribution would still receive 403 errors. Option C is wrong because requiring the `CloudFront-Viewer-Country` header does not address the underlying access control issue; CloudFront OAC requires a policy that explicitly allows the service principal to read objects, and this header condition is unrelated to authentication or authorization. Option D is wrong because Origin Access Control (OAC) is a newer mechanism that replaces Origin Access Identity (OAI); using an OAI canonical user ID is incompatible with OAC, and the policy must reference the CloudFront service principal, not the OAI canonical user.

1039

MCQeasy

A mobile app reads the same product details many times per minute from Amazon DynamoDB. The table design is already correct, but repeated reads are still causing noticeable latency. Which service should the team add to improve read performance?

A.Amazon DAX

B.Amazon EFS

C.AWS Lambda

D.Amazon SNS

AnswerA

DAX adds an in-memory cache in front of DynamoDB to reduce read latency for repeated accesses.

Why this answer

Amazon DAX (DynamoDB Accelerator) is an in-memory cache specifically designed for DynamoDB. It reduces read latency from single-digit milliseconds to microseconds by caching frequently accessed items, which directly addresses the repeated read pattern described in the question.
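The idea behind an in-memory read cache can be sketched in plain Python. This is a generic read-through cache with a time-to-live, not the DAX client API; the `ReadThroughCache` class, the `load_product` loader, and the 300-second TTL are assumptions made for illustration.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ReadThroughCache:
    """Serve repeated reads from memory; call the backing store only on a miss."""

    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expiry)

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._items.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit: no call to the backing store
        value = self._loader(key)  # cache miss or expired entry
        self._items[key] = (value, now + self._ttl)
        return value


backend_calls = []


def load_product(product_id: str) -> dict:
    """Hypothetical stand-in for a DynamoDB GetItem call."""
    backend_calls.append(product_id)
    return {"id": product_id, "name": "example product"}


cache = ReadThroughCache(load_product)
for _ in range(3):
    item = cache.get("p-100")
print(len(backend_calls))
```

Three reads of the same product reach the backing store once; the other two are served from memory, which is the effect DAX provides for repeated DynamoDB reads.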

Exam trap

The trap here is that candidates may confuse DAX with ElastiCache (which is generic and not DynamoDB-native) or assume that Lambda or SNS can somehow accelerate reads, but DAX is the only service purpose-built for DynamoDB read acceleration.

How to eliminate wrong answers

Option B (Amazon EFS) is wrong because it is a file-level storage service for EC2 instances, not a cache for DynamoDB reads, and it would introduce network latency rather than reducing DynamoDB latency. Option C (AWS Lambda) is wrong because it is a serverless compute service that executes code in response to events; it does not cache data or improve DynamoDB read performance on its own. Option D (Amazon SNS) is wrong because it is a pub/sub messaging service for sending notifications, not a caching layer for database reads.

1040

MCQmedium

A Lambda function for an IoT ingestion API needs to read a database password. The password must rotate automatically every 30 days and should not be stored in environment variables. Which service should be used?

A.AWS Secrets Manager with rotation enabled

B.A KMS-encrypted Lambda environment variable

C.AWS Systems Manager Parameter Store SecureString without automation

D.An encrypted object in Amazon S3

AnswerA

Secrets Manager stores secrets securely and supports automatic rotation using a rotation Lambda function.

Why this answer

AWS Secrets Manager is the correct choice because it is purpose-built for securely storing, automatically rotating, and managing secrets such as database passwords. With rotation enabled, Secrets Manager rotates the password on a schedule (here, every 30 days) by invoking a rotation Lambda function, and AWS provides rotation function templates for supported databases. The ingestion function retrieves the secret at runtime through the AWS SDK, so the password never needs to be stored in environment variables.
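A minimal sketch of the runtime retrieval step, assuming the secret is stored as a JSON string with a `password` key (the layout used for database secrets) and a hypothetical secret name `prod/iot/db`. The function takes the client as a parameter; in a real Lambda function it would be `boto3.client("secretsmanager")`, and `get_secret_value` with `SecretId` is the SDK call used. The `FakeSecretsClient` class is a stand-in so the sketch runs without AWS access.

```python
import json


def get_db_password(client, secret_id: str) -> str:
    """Fetch the current password at call time instead of reading an env var."""
    response = client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return secret["password"]  # assumes a JSON secret with a "password" key


class FakeSecretsClient:
    """Stand-in for boto3.client("secretsmanager"); returns a canned secret."""

    def __init__(self, secrets: dict) -> None:
        self._secrets = secrets

    def get_secret_value(self, SecretId: str) -> dict:
        return {"Name": SecretId, "SecretString": json.dumps(self._secrets[SecretId])}


# Hypothetical secret name and contents for illustration.
fake_client = FakeSecretsClient(
    {"prod/iot/db": {"username": "ingest", "password": "example-only"}}
)
password = get_db_password(fake_client, "prod/iot/db")
print(len(password))
```

Because the value is fetched on demand, a rotation that happens between invocations is picked up the next time the function reads the secret; caching the value for a short period is a common refinement to reduce API calls.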

Exam trap

The trap here is that candidates often confuse AWS Systems Manager Parameter Store SecureString with Secrets Manager, but Parameter Store lacks native automatic rotation, making it unsuitable for the 30-day rotation requirement without additional custom automation.

How to eliminate wrong answers

Option B is wrong because the requirement explicitly rules out environment variables, and a KMS-encrypted environment variable does not rotate automatically; changing the password would require a manual update to the function configuration. Option C is wrong because AWS Systems Manager Parameter Store SecureString without automation does not provide automatic rotation; it only stores the secret securely, requiring custom logic to rotate the password every 30 days. Option D is wrong because an encrypted object in Amazon S3 lacks native rotation capabilities and adds unnecessary complexity to secret retrieval, as the team would have to build and operate rotation themselves.
