Amazon Web Services · Free Practice Questions · Last reviewed May 2026
24 real exam-style questions organised by domain, each with the correct answer highlighted and a plain-English explanation of why it's right — and why the others are wrong.
Design Secure Architectures · 30% of exam · 6 sample questions below

Q1. A Lambda function needs to read the current value of exactly one AWS Secrets Manager secret at startup. Which least-privilege IAM permission (action and resource scope) should you grant to the Lambda execution role?
- secretsmanager:ListSecrets on all secrets (resource set to "*")
- secretsmanager:GetSecretValue on only the secret’s full ARN [correct]
  Explanation: GetSecretValue is the specific action required to retrieve the secret value. Scoping the permission to the secret’s full ARN ensures the Lambda role can read only that secret and cannot access other secrets.
- secretsmanager:UpdateSecret on the specific secret ARN
- secretsmanager:DescribeSecret on all secrets (resource set to "*")

Q2. A security team requires that every object uploaded to s3://secure-bucket/uploads/ must be encrypted using SSE-KMS with a specific customer-managed KMS key. Which S3 bucket policy condition approach best enforces this requirement for PutObject requests?
- Deny PutObject unless s3:x-amz-server-side-encryption equals "aws:kms" and s3:x-amz-server-side-encryption-aws-kms-key-id equals the required CMK ARN [correct]
  Explanation: This enforces the encryption choice at upload time by validating the request headers that specify SSE-KMS and the exact KMS key ID/ARN. Using a Deny condition ensures uploads that do not include the correct SSE-KMS headers (for example, unencrypted uploads or uploads using a different KMS key) are rejected immediately.
- Allow PutObject only when aws:SecureTransport is true; encryption is then guaranteed automatically
- Deny PutObject if the request includes Content-Type other than "application/octet-stream"
- Deny PutObject when the caller’s role is not allowed to kms:Decrypt in their IAM policy

Q3. An application in Account B (IAM role arn:aws:iam::account-b:role/app-read) reads objects from an S3 bucket in Account A. The bucket uses SSE-KMS with a customer-managed KMS key in Account A. Object reads consistently fail with an error that includes "AccessDenied" and "kms:Decrypt". The IAM permissions in Account B for kms:Decrypt are correct, but the requests still fail. Which change will most directly fix the failure?
- Add kms:Decrypt to the KMS key policy in Account A for the Account B role arn:aws:iam::account-b:role/app-read, and remove kms:Decrypt from the role policy in Account B.
- Update the IAM role in Account B to use the s3:GetObject permission only, and rely on S3 to authorize KMS decrypt automatically.
- Modify the KMS key policy in Account A to allow kms:Decrypt for the Account B role arn:aws:iam::account-b:role/app-read, using the appropriate cross-account conditions (for example, allowing the use via S3 and the expected encryption context for the bucket). [correct]
  Explanation: For SSE-KMS, S3 must call KMS Decrypt when serving objects. KMS authorization is evaluated against the KMS key policy in Account A in addition to the identity policy in Account B. If the error includes kms:Decrypt AccessDenied in a cross-account scenario, the most direct fix is to update the KMS key policy to allow the Account B role to use the key for decrypt (often with conditions tied to S3 usage and the specific bucket/object encryption context).
- Switch the S3 bucket encryption from SSE-KMS to SSE-S3, keeping all existing IAM and KMS configuration unchanged.

Q4. A server assumes an IAM role and must read export objects only from this prefix in an S3 bucket: s3://customer-data/exports/acme/. The application also needs to list the objects under that exact prefix so it can discover which export folders exist. The application performs ListBucket requests with Prefix set to exactly "exports/acme/". The current role policy allows s3:ListBucket on the bucket ARN without a prefix condition, and security reports the role can list other tenants’ export object keys. Which IAM policy change best enforces least privilege for both ListBucket and GetObject?
- Keep s3:ListBucket allowed on arn:aws:s3:::customer-data, but restrict s3:GetObject to arn:aws:s3:::customer-data/exports/acme/*.
- Allow s3:ListBucket on arn:aws:s3:::customer-data only when s3:prefix equals "exports/acme/" (for example, using a StringEquals condition on s3:prefix). Also allow s3:GetObject only on arn:aws:s3:::customer-data/exports/acme/*. [correct]
  Explanation: ListBucket must be authorized at the bucket ARN level, then scoped using a Condition on the request prefix (so only the approved listing prefix is allowed). GetObject is authorized at the object ARN level and is restricted to exports/acme/*, preventing reads outside the prefix.
- Allow s3:ListBucket only on arn:aws:s3:::customer-data/exports/acme/* and allow s3:GetObject on arn:aws:s3:::customer-data/*.
- Add a Deny statement for s3:GetObject outside arn:aws:s3:::customer-data/exports/acme/*, but keep s3:ListBucket unrestricted on arn:aws:s3:::customer-data.

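The two policies discussed above (the SSE-KMS Deny conditions from the secure-bucket question and the s3:prefix-scoped role policy from this question), written out as an added Python example that is not part of the original page. A small constructed evaluator applies IAM's rules for the operators these policies use (explicit Deny wins, StringEquals fails on a missing key, StringNotEquals holds on one) to sample requests; the KMS key ARN, the uploader's identity policy and the object keys are placeholders.

```python
import fnmatch

# Placeholder ARN for the required customer-managed key (not from the page).
REQUIRED_KMS_KEY = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"

# Bucket policy for s3://secure-bucket/uploads/: two Deny statements.
# StringNotEquals holds when the header is absent, so a PutObject with no
# encryption header is denied as well, not only one carrying the wrong value.
SSE_KMS_BUCKET_POLICY = {"Version": "2012-10-17", "Statement": [
    {"Effect": "Deny", "Principal": "*", "Action": "s3:PutObject",
     "Resource": "arn:aws:s3:::secure-bucket/uploads/*",
     "Condition": {"StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}}},
    {"Effect": "Deny", "Principal": "*", "Action": "s3:PutObject",
     "Resource": "arn:aws:s3:::secure-bucket/uploads/*",
     "Condition": {"StringNotEquals": {
         "s3:x-amz-server-side-encryption-aws-kms-key-id": REQUIRED_KMS_KEY}}},
]}
# Constructed identity policy of the uploader: it grants the baseline PutObject,
# and the bucket policy above only ever takes rights away from it.
UPLOADER_ROLE_POLICY = {"Version": "2012-10-17", "Statement": [
    {"Effect": "Allow", "Action": "s3:PutObject",
     "Resource": "arn:aws:s3:::secure-bucket/uploads/*"}]}

# Role policy for the acme tenant: ListBucket is authorised on the bucket ARN
# and narrowed by the request prefix; GetObject is scoped at the object level.
ACME_ROLE_POLICY = {"Version": "2012-10-17", "Statement": [
    {"Effect": "Allow", "Action": "s3:ListBucket",
     "Resource": "arn:aws:s3:::customer-data",
     "Condition": {"StringEquals": {"s3:prefix": "exports/acme/"}}},
    {"Effect": "Allow", "Action": "s3:GetObject",
     "Resource": "arn:aws:s3:::customer-data/exports/acme/*"},
]}


def _condition_holds(condition, ctx):
    """Every operator/key pair in a Condition block must hold (logical AND)."""
    for operator, pairs in condition.items():
        for key, wanted in pairs.items():
            present = key in ctx
            if operator == "StringEquals":
                if not present or ctx[key] != wanted:   # missing key -> false
                    return False
            elif operator == "StringNotEquals":
                if present and ctx[key] == wanted:      # missing key -> true
                    return False
            else:
                raise ValueError(f"operator {operator} is not modelled here")
    return True


def evaluate(policies, action, resource, ctx):
    """Explicit Deny beats any Allow; no matching Allow is an implicit deny."""
    allowed = False
    for policy in policies:
        for stmt in policy["Statement"]:
            if stmt["Action"] != action or not fnmatch.fnmatchcase(resource, stmt["Resource"]):
                continue
            if not _condition_holds(stmt.get("Condition", {}), ctx):
                continue
            if stmt["Effect"] == "Deny":
                return "DENY"
            allowed = True
    return "ALLOW" if allowed else "DENY"


def can_put(headers):
    """PutObject under uploads/ with the given x-amz-* headers as condition keys."""
    return evaluate([UPLOADER_ROLE_POLICY, SSE_KMS_BUCKET_POLICY], "s3:PutObject",
                    "arn:aws:s3:::secure-bucket/uploads/report.csv", headers)


def can_list(prefix=None):
    """ListBucket on customer-data; prefix=None means the request sent no Prefix."""
    ctx = {} if prefix is None else {"s3:prefix": prefix}
    return evaluate([ACME_ROLE_POLICY], "s3:ListBucket", "arn:aws:s3:::customer-data", ctx)


def can_get(key):
    """GetObject on one key in customer-data."""
    return evaluate([ACME_ROLE_POLICY], "s3:GetObject",
                    f"arn:aws:s3:::customer-data/{key}", {})
```

A listing that omits Prefix, or that uses a longer prefix such as "exports/acme/2024", is denied too under StringEquals, so the application must send exactly the approved prefix.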
Q5. A platform team lets project administrators create IAM roles for workloads in their own AWS accounts, but every role must stay inside a fixed security baseline. The organization also wants to block all member accounts from using AWS Regions outside us-east-1 and us-west-2. Which three controls should be used? Select three.
- Attach a permissions boundary to each role created through the delegation process. [correct]
  Explanation: A permissions boundary caps the maximum permissions a created role can ever receive, even if an administrator later attaches broader policies. This is the right mechanism for a fixed security baseline on delegated role creation.
- Require iam:PermissionsBoundary in the role creation policy so every new role must include the approved boundary. [correct]
  Explanation: The creation policy should enforce that the boundary is present at creation time. This prevents a delegated admin from simply omitting the boundary and creating a role that exceeds the approved limit.
- Use an SCP to deny actions in all AWS Regions except us-east-1 and us-west-2. [correct]
  Explanation: An SCP is the correct organizational guardrail for region restrictions across member accounts. It applies broadly and consistently, which is ideal for blocking unapproved Regions regardless of the local IAM configuration.
- Grant AdministratorAccess to the project administrators and rely on later audits for enforcement.
- Use an AWS Config rule alone to stop role creation if the permissions are too broad.

Q6. A company serves private images stored in S3 through Amazon CloudFront. Only authenticated users should be able to access each image, and access should expire after 1 hour. Which CloudFront feature best meets this requirement?
- Signed URLs or signed cookies with an expiration time of 1 hour [correct]
  Explanation: Signed URLs/cookies provide cryptographic, edge-enforced authorization for specific CloudFront resources and include an expiration timestamp. After expiry, CloudFront rejects requests (for example, with 403) without needing the origin to handle time-based authorization.
- A WAF rule that blocks requests without valid JWTs, without using signed URLs
- Turning on S3 bucket public access block, without any CloudFront viewer authentication
- Enabling CloudFront geo restriction to allow only one country

Design Resilient Architectures · 26% of exam · 6 sample questions below

Q7. An order-processing service consumes messages from an Amazon SQS Standard queue using a custom worker. During traffic spikes, the worker occasionally times out after performing some work but before acknowledging the message, so SQS redelivers it and it may be processed again. You also observe that a small set of “poison” messages always fail validation. What change most directly improves resilience by (1) preventing poison messages from retrying indefinitely and (2) avoiding duplicate side effects caused by legitimate retries?
- Increase the SQS visibility timeout and, when validation fails, call DeleteMessage in the consumer to remove the message immediately.
- Move to SNS topics with subscriptions and rely on SNS to provide exactly-once delivery to eliminate duplicates automatically.
- Configure a dead-letter queue (DLQ) with a redrive policy that moves messages after maxReceiveCount, and implement idempotent processing in the consumer using an idempotency key. [correct]
  Explanation: SQS Standard is at-least-once delivery, so timeouts can cause redelivery and duplicates. A DLQ with a redrive policy prevents poison messages from retrying forever by moving them after repeated failures. Idempotent processing (for example, storing a processed marker in a database with conditional logic keyed by an idempotency key) prevents duplicate side effects when retries occur for valid messages.
- Change the queue to FIFO and enable content-based deduplication, leaving the consumer logic unchanged.

Q8. Based on the exhibit, the application sees several minutes of connection errors during an Aurora failover. What is the best change to reduce failover impact?
- Change the application to use the Aurora cluster writer endpoint and retry transient connections. [correct]
  Explanation: The current configuration targets a specific instance endpoint, which becomes stale after failover. The Aurora cluster writer endpoint always resolves to the current writer, so the application can reconnect without manual endpoint changes. Adding retries with backoff helps the application survive the short DNS and connection transition during failover.
- Add an Aurora read replica and keep using the same JDBC URL.
- Increase the EC2 instance size of the application servers.
- Switch to a single-AZ RDS PostgreSQL instance for simpler connectivity.

Q9. A payments service receives payment orders by consuming messages from an Amazon SQS Standard queue. The downstream processor occasionally exceeds its processing timeout. As a result, some messages reappear in the queue and may be processed more than once. The team wants to prevent duplicate side effects (for example, double-charging) and also ensure poison messages do not repeatedly consume processing capacity. What approach best satisfies both goals?
- Implement idempotent processing (for example, store processed payment IDs in DynamoDB) and configure an SQS dead-letter queue (DLQ) using a redrive policy with an appropriate maxReceiveCount. [correct]
  Explanation: With SQS Standard’s at-least-once delivery, duplicates can occur. Idempotency ensures repeated processing of the same payment ID does not create duplicate side effects. A DLQ with redrive policy isolates poison messages: after a message is received and fails processing more than maxReceiveCount times, SQS moves it to the DLQ instead of cycling it back to the main queue indefinitely.
- Rely only on increasing the SQS visibility timeout so duplicates rarely occur, without adding idempotency checks or a DLQ.
- Switch to a FIFO queue and delete messages immediately upon receipt to avoid duplicates.
- Move the workload to SNS and use synchronous HTTP endpoints so the sender retries until the receiver confirms success.

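The DLQ-plus-idempotency design from this question and from the order-processing question above, as an added Python simulation that is not part of the original page. The queue, its redrive policy and the DynamoDB-style conditional put are in-memory stand-ins with the same semantics (no boto3 calls), and every message body is constructed.

```python
from dataclasses import dataclass


@dataclass
class Message:
    body: dict
    receive_count: int = 0


class StandardQueue:
    """Stand-in for an SQS Standard queue with a redrive policy. A message that
    is received but never deleted is handed out again (at-least-once); once it
    has been received maxReceiveCount times it is moved to the DLQ instead."""

    def __init__(self, max_receive_count):
        self.max_receive_count = max_receive_count
        self.pending = []
        self.dlq = []

    def send(self, body):
        self.pending.append(Message(body))

    def receive(self):
        while self.pending:
            msg = self.pending[0]
            if msg.receive_count >= self.max_receive_count:
                self.dlq.append(self.pending.pop(0))   # redrive to the DLQ
                continue
            msg.receive_count += 1
            return msg
        return None

    def delete(self, msg):
        self.pending.remove(msg)   # DeleteMessage is the acknowledgement


class IdempotencyStore:
    """Stand-in for a DynamoDB marker table written with a conditional put
    (ConditionExpression="attribute_not_exists(pk)"): one writer per key wins."""

    def __init__(self):
        self._seen = set()

    def put_if_absent(self, key):
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


def run_worker(queue, store, charge, is_valid, timeout_once=frozenset()):
    """Drain the queue. `charge` is the side effect to protect, `is_valid` the
    validation that poison messages fail; keys in `timeout_once` simulate a worker
    that does its work but times out before DeleteMessage on the first attempt."""
    timed_out = set()
    while (msg := queue.receive()) is not None:
        key = msg.body["idempotency_key"]
        if not is_valid(msg.body):
            continue   # poison: leave it; the redrive policy moves it to the DLQ
        # The marker is written before the side effect, so a redelivery of the
        # same key never charges twice: only the first writer of the key works.
        if store.put_if_absent(key):
            charge(msg.body)
        if key in timeout_once and key not in timed_out:
            timed_out.add(key)
            continue   # no DeleteMessage, so SQS redelivers this message
        queue.delete(msg)


def simulate(max_receive_count=3):
    """Constructed scenario: one legitimate retry, one poison message and one
    ordinary message. Returns what was charged and what reached the DLQ."""
    queue = StandardQueue(max_receive_count)
    store = IdempotencyStore()
    charges = []
    queue.send({"idempotency_key": "order-1", "amount": 40})
    queue.send({"idempotency_key": "bad-1", "amount": -5})   # fails validation
    queue.send({"idempotency_key": "order-2", "amount": 15})
    run_worker(queue, store, charges.append, lambda b: b["amount"] > 0,
               timeout_once={"order-1"})
    return {"charged": [c["idempotency_key"] for c in charges],
            "dlq": [m.body["idempotency_key"] for m in queue.dlq],
            "poison_receives": queue.dlq[0].receive_count if queue.dlq else 0}
```

If the charge itself can fail after the marker is written, the marker needs a status (in progress, then done) or the marker and the charge must be committed together; otherwise a retry would skip a payment that never went through.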
Q10. A company runs an application behind an Application Load Balancer (ALB). An Auto Scaling group (ASG) is configured with desired capacity 2, but it is attached only to subnets in a single Availability Zone. The ALB is healthy because it is configured across multiple Availability Zones. When the Availability Zone that contains the ASG subnets experiences an outage, what change most directly improves resilience and allows capacity to be restored automatically?
- Update the ASG to use subnet IDs that span at least two Availability Zones so it can launch replacement instances after an AZ outage. [correct]
  Explanation: If the ASG is attached to subnets in multiple Availability Zones, when instances in the failed AZ become unhealthy/terminate, Auto Scaling can launch new instances in the remaining AZs to restore the desired capacity. This directly addresses the root cause: the ASG cannot create capacity outside the AZs it is configured for.
- Reduce the ALB health check interval to speed up detection of unhealthy targets.
- Enable connection draining on the ALB so existing requests complete before targets are terminated.
- Increase the ASG desired capacity from 2 to 6 to compensate for the missing subnets.

Q11. Based on the exhibit, DNS still sends traffic to the primary Region even though Route 53 health checks show the primary endpoint is unhealthy. What is the best change to make failover work as intended?
- Change both records to weighted routing with a 50/50 split so Route 53 can shift traffic gradually.
- Use a failover routing policy with a primary record and a secondary record, and attach the health check to the primary record. [correct]
  Explanation: Failover routing is designed for active-passive DNS behavior. With a primary and secondary record, Route 53 answers with the primary record when it is healthy and returns the secondary record when the primary health check fails. The exhibit shows simple routing, which does not express the failover intent. Switching to failover routing aligns the DNS policy with the stated requirement.
- Switch to latency-based routing so users are always directed to the lowest-latency Region.
- Use geolocation routing so clients in one Region are sent to the healthier endpoint.

Q12. Based on the exhibit, the web application must remain available even if one Availability Zone fails. What is the best change to improve resilience with the least redesign?
- Increase DesiredCapacity to 4 while keeping all instances in subnet-a1.
- Add subnet-b1 in a different Availability Zone to the Auto Scaling group. [correct]
  Explanation: This spreads EC2 instances across two Availability Zones, so the Auto Scaling group can continue serving traffic if one AZ becomes unavailable. Because the ALB is already deployed in both subnets, this is the smallest change that adds true zonal resilience to the compute tier.
- Replace the Application Load Balancer with a Network Load Balancer.
- Enable EBS encryption on the launch template volumes.

Design High-Performing Architectures · 24% of exam · 6 sample questions below

Q13. A Lambda function behind API Gateway has predictable traffic spikes every hour. The function does not need access to resources in a VPC, and p95 latency spikes are caused by cold starts during scale-out. Which two actions are most effective? Select two.
- Enable provisioned concurrency for the function. [correct]
  Explanation: Provisioned concurrency keeps a pool of initialized execution environments ready to handle requests. That removes most cold-start delay and is the most direct way to stabilize p95 latency during predictable bursts.
- Remove the function from a VPC because it has no VPC dependencies. [correct]
  Explanation: If the function does not need private network access, keeping it out of a VPC avoids the extra networking setup associated with VPC-enabled Lambdas. That reduces startup overhead and helps new execution environments become available faster.
- Set reserved concurrency to a low fixed number.
- Increase the Lambda timeout to 15 minutes.
- Add an SQS dead-letter queue to reduce startup latency.

Q14. An Aurora PostgreSQL application has an OLTP writer and a reporting dashboard that issues many read-only queries. The writer is healthy, but read latency rises noticeably during reporting windows. Which two changes should you make? Select two.
- Add Aurora Replicas to scale out the read workload. [correct]
  Explanation: Aurora Replicas provide additional read capacity, which lets you spread read-only traffic away from the writer instance.
- Send read-only application traffic to the reader endpoint. [correct]
  Explanation: The reader endpoint automatically distributes reads across available replicas, reducing load on the writer and improving throughput.
- Scale up only the writer instance and keep all queries on it.
- Replace the cluster with a single-AZ RDS instance to reduce replication overhead.
- Move the dashboard to DynamoDB without changing the query model.

Q15. A production application writes to an Amazon Aurora PostgreSQL cluster. Users report that during business-hour reporting runs, write latency increases. The application team wants to keep the writer focused on OLTP writes while still providing low-latency reads for reporting queries. What architectural approach should the solutions architect recommend?
- Create Aurora read replicas and direct reporting read-only connections to the cluster reader endpoint. [correct]
  Explanation: Read replicas offload read workloads from the writer. Using the reader endpoint lets reporting queries use replicas, improving write responsiveness.
- Resize the writer instance to a larger class so it can handle both writes and reads with fewer slowdowns.
- Enable cross-region replication for the entire cluster so reporting always runs in the secondary Region.
- Disable read replicas and use caching only in the application layer, keeping all queries connected to the writer endpoint.

Q16. A DynamoDB table stores device status items. The partition key is deviceId, and the partition distribution is healthy (no single partition dominates). However, during peak periods the application experiences high read latency because many clients repeatedly request the latest status for the same devices. Which action best improves read latency without changing the DynamoDB partitioning model?
- Add Amazon DAX as a caching layer in front of DynamoDB and route repeated read operations through DAX. [correct]
  Explanation: Amazon DAX is an in-memory caching layer for DynamoDB that accelerates repeated reads. When many clients request the same items (for example, “latest status” point reads by deviceId), DAX can serve cached responses directly, reducing round trips to DynamoDB and lowering read latency during peak periods.
- Change the partition key to a random value for each request to eliminate hot partitions.
- Increase write capacity only, because writes generally determine read latency in DynamoDB.
- Create an additional Global Secondary Index (GSI) and read exclusively from the index to accelerate reads.

Q17. A team is splitting a new workload into two fronts. The first front serves HTTPS microservices that need host- and path-based routing plus health checks. The second front must handle TCP and UDP traffic for a real-time service and preserve static IP addresses for firewall allowlisting. Which two AWS load balancer choices best match these requirements? Select two.
- Application Load Balancer [correct]
  Explanation: Application Load Balancer supports HTTP and HTTPS routing with host- and path-based rules, making it ideal for microservices.
- Network Load Balancer [correct]
  Explanation: Network Load Balancer handles TCP and UDP traffic and can preserve stable IP addresses for allowlisting.
- Amazon API Gateway
- Amazon CloudFront
- Gateway Load Balancer

Q18. An API team runs an AWS Lambda function behind an Application Load Balancer (ALB). During predictable hourly traffic spikes, p95 response latency increases due to occasional cold starts. The team wants stable latency during those spikes without permanently overprovisioning resources for all functions. Which configuration is the most appropriate way to reduce cold starts for this Lambda function?
- Publish a version of the function and configure provisioned concurrency on an alias, using autoscaling for the alias. [correct]
  Explanation: Provisioned concurrency pre-initializes execution environments for a specific published function version. By attaching provisioned concurrency to an alias, you can control warm capacity and (with the right settings) autoscale the provisioned capacity for predictable spike patterns, reducing cold-start-driven latency increases.
- Increase the function memory size and rely on faster initialization to reduce cold starts.
- Set reserved concurrency equal to the expected peak requests per second for the function.
- Use an event source mapping with a higher batch size so Lambda triggers earlier and keeps the runtime warm.

Design Cost-Optimized Architectures · 20% of exam · 6 sample questions below

Q19. You store application logs in an S3 bucket. After 30 days, the logs are rarely accessed, but you must retain them for 1 year for compliance. Which S3 feature is the best way to reduce storage cost while meeting the retention requirement?
- Create an S3 lifecycle rule to transition older objects to a colder storage class after 30 days, then expire after 1 year [correct]
  Explanation: S3 lifecycle policies can automatically transition objects to lower-cost storage classes based on age. Transitioning after 30 days reduces ongoing storage costs because the logs are rarely accessed, while expiring after 1 year ensures you still meet the compliance retention window.
- Keep all logs in S3 Standard and rely on lower request rates to reduce cost
- Copy logs to EBS snapshots each week and delete the original files
- Use S3 replication to a second bucket in another Region to reduce costs

Q20. CloudWatch metrics show your EC2 instances have average CPU utilization around 10% with stable performance over several weeks. The application does not require additional headroom right now. What is the most effective cost-optimization action?
- Right-size the instances to a smaller size that matches the observed utilization [correct]
  Explanation: Right-sizing reduces cost by matching instance capacity to actual demand. If average CPU is consistently low (around 10%) and performance is stable, it strongly indicates overprovisioning. Moving to a smaller instance (or a smaller capability within the same family) typically lowers hourly cost while maintaining sufficient capacity for the workload.
- Increase the Auto Scaling desired capacity to add more instances
- Switch to Spot Instances immediately even though interruptions would impact users
- Disable detailed monitoring to reduce CPU usage from the monitoring agent

Q21. A marketing site serves versioned JavaScript and CSS files from Amazon S3 through CloudFront. The origin bill is rising because CloudFront keeps fetching the same files too often, and the application never changes a file at the same URL once it is published. Which two changes should you make? Select two.
- Set long-lived Cache-Control headers, such as a high max-age and immutable policy, on the versioned assets. [correct]
  Explanation: Versioned assets are ideal for long cache lifetimes because their URLs change when the content changes. Strong Cache-Control headers let CloudFront serve more requests from edge locations instead of repeatedly fetching the same files from S3.
- Configure the CloudFront cache policy to avoid forwarding unnecessary query strings, headers, and cookies. [correct]
  Explanation: A smaller cache key improves the cache hit rate because more viewer requests map to the same cached object. Avoiding unnecessary request attributes also reduces origin fetches and lowers the bandwidth sent to the origin.
- Move the static assets to an EC2 web server behind an Application Load Balancer.
- Disable CloudFront caching so every request always reaches the origin.
- Add more viewer-facing headers to the cache key so each browser variation gets a unique cached object.

Q22. An application serves static images through Amazon CloudFront. The team observes higher-than-expected origin fetches, which increases origin bandwidth costs. Which change most directly improves CloudFront cache reuse to reduce origin requests for the static content?
- Set appropriate Cache-Control headers (or origin cache settings) so CloudFront caches responses longer [correct]
  Explanation: Cache headers and TTL determine how long objects are kept in CloudFront’s edge caches. Longer caching for static assets increases the cache hit ratio, reducing how often requests must go back to the origin.
- Disable caching for the distribution so every request goes back to the origin
- Configure CloudFront to forward all request headers and query strings to the origin
- Move the S3 bucket to a different AWS Region, without changing CloudFront caching behavior

Q23. Your team runs a batch processing workload on EC2 that can tolerate interruptions. If an instance is terminated, the job can restart from checkpoints. To reduce compute costs, what is the most cost-optimized approach?
- Use EC2 Spot Instances for the batch workers [correct]
  Explanation: Spot provides significantly lower pricing than On-Demand for interruptible workloads. Because the workload can restart from checkpoints, termination interruptions are acceptable and the application can recover efficiently, meeting both correctness and throughput requirements at a lower cost.
- Use Dedicated Hosts to ensure capacity for the cheapest instance
- Use On-Demand instances and schedule extra runs to offset interruptions
- Use Reserved Instances only, because they eliminate instance termination events

Q24. An internal rendering job runs on EC2 workers in an Auto Scaling group. Each job writes checkpoints every few minutes to S3 and can resume from the latest checkpoint after an interruption. The queue depth varies sharply, and the team wants the lowest possible compute cost. Which two changes should they make? Select two.
- Run the worker fleet on EC2 Spot Instances. [correct]
  Explanation: Spot Instances usually provide the lowest EC2 compute price and fit workloads that can tolerate interruption. Because the job checkpoints to S3, the application can resume after Spot interruptions without losing all progress.
- Purchase Dedicated Hosts so the fleet keeps physical servers reserved for the workload.
- Use a Mixed Instances Policy with several compatible instance types and Spot capacity-optimized allocation. [correct]
  Explanation: Diversifying instance types improves the chance that Auto Scaling can obtain cheap Spot capacity. A mixed policy also reduces the risk of a single instance type shortage stopping the job fleet.
- Run the entire fleet on On-Demand Instances to avoid any interruption risk.
- Move the workers to AWS Outposts to keep compute close to the data.

The SAA-C03 exam has up to 0 questions and must be completed in 130 minutes. The passing score is 720/1000.
The SAA-C03 exam uses multiple-choice, multiple-select, drag-and-drop, and exhibit-based questions. Exhibit questions show CLI output, network diagrams, or routing tables and ask you to interpret them — exactly the format Courseiva uses.
The exam covers 4 domains: Design Secure Architectures, Design Resilient Architectures, Design High-Performing Architectures, Design Cost-Optimized Architectures. Questions are weighted by domain — higher-weight domains appear more on your actual exam.
No. These are original exam-style practice questions written against the official Amazon Web Services SAA-C03 exam objectives. They are not copied from the real exam. Courseiva focuses on genuine understanding, not memorisation of braindumps.
Courseiva tracks your accuracy per domain and routes you toward weak areas automatically. Free, no account required.