Knowledge + Practice

CCNA Design High-Performing Architectures Questions

75 of 238 questions · Page 1/4 · Design High-Performing Architectures · Answers revealed


1

MCQ · medium

A read-heavy document portal repeatedly queries the same product catalogue data from DynamoDB with millisecond latency requirements. Which service can reduce read latency and table load? The architecture review board prefers a managed AWS-native control.

A. Amazon Kinesis Data Firehose

B. S3 Transfer Acceleration

C. DynamoDB Accelerator (DAX)

D. AWS Glue Data Catalog

Answer: C

DAX is an in-memory cache for DynamoDB that reduces read latency for suitable access patterns.

Why this answer

DynamoDB Accelerator (DAX) is an in-memory cache for DynamoDB that delivers up to 10x read performance improvement, reducing read latency to microseconds for repeated queries. It offloads read traffic from the DynamoDB table, lowering consumed read capacity units and table load, making it ideal for read-heavy workloads with millisecond latency requirements. As a fully managed, AWS-native service, DAX aligns with the architecture review board's preference for managed controls.

Exam trap

The trap here is that candidates may confuse DAX with ElastiCache (also a caching service, but not DynamoDB-specific) or assume that any AWS caching service works interchangeably. DAX is the managed cache purpose-built for DynamoDB: it is API-compatible with DynamoDB, so adopting it typically means pointing the application at the DAX client rather than writing cache-management logic.

How to eliminate wrong answers

Option A is wrong because Amazon Kinesis Data Firehose is a streaming data ingestion service for loading data into data stores and analytics tools, not a caching layer for DynamoDB reads; it cannot reduce read latency or table load for repeated queries. Option B is wrong because S3 Transfer Acceleration speeds up uploads and downloads to/from S3 over long distances using AWS edge locations, but it does not cache DynamoDB data or reduce read latency for DynamoDB queries. Option D is wrong because AWS Glue Data Catalog is a metadata repository for ETL jobs and data lake schemas, not a caching service for DynamoDB reads; it has no impact on DynamoDB read latency or table load.
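The effect of a read-through cache on table load can be sketched with a toy model. This is not the DAX client or any AWS API; the `Table` and `ReadThroughCache` classes and the `sku-…` keys are hypothetical names invented for illustration, and the model ignores TTLs, eviction, and writes.

```python
class Table:
    """Stand-in for a DynamoDB table that counts how many reads reach it."""

    def __init__(self, items):
        self.items = items
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.items[key]


class ReadThroughCache:
    """Toy read-through cache: serve hits from memory, fetch misses from the table."""

    def __init__(self, table):
        self.table = table
        self.store = {}

    def get(self, key):
        if key not in self.store:
            self.store[key] = self.table.get(key)
        return self.store[key]


table = Table({"sku-1": "kettle", "sku-2": "toaster"})
cache = ReadThroughCache(table)

# 1,000 requests that alternate between the same two catalogue items.
results = [cache.get(f"sku-{1 + i % 2}") for i in range(1000)]
print(table.reads)  # 2
```

Only the first request for each item reaches the table; the other 998 are served from memory, which is the pattern the question describes as reducing both latency and table load.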


2

MCQ · easy

A team runs a stateless web app on Amazon EC2 behind an Application Load Balancer. During traffic spikes, new EC2 instances take several minutes to finish bootstrapping before they can receive traffic. Which Auto Scaling configuration most directly reduces the time until additional capacity is available?

A. Increase the ALB target group deregistration delay.

B. Use an Auto Scaling warm pool so pre-initialized instances are ready to enter service.

C. Reduce the Auto Scaling group minimum size to one instance.

D. Replace the Application Load Balancer with a Network Load Balancer.

Answer: B

Warm pools keep instances pre-launched and initialized, which reduces the time needed to add capacity during spikes.

Why this answer

Option B is correct because an Auto Scaling warm pool allows you to maintain a pool of pre-initialized instances that are ready to quickly enter the target group and start serving traffic. Instead of waiting for new instances to boot and configure during a scale-out event, the warm pool provides instances that have already completed bootstrapping, drastically reducing the time to additional capacity.

Exam trap

The trap here is that candidates may confuse the deregistration delay (which handles graceful connection draining) with a mechanism to speed up instance readiness, or they may incorrectly assume that reducing the minimum size or switching to a Network Load Balancer will improve scaling speed, when neither addresses the root cause of slow bootstrapping.

How to eliminate wrong answers

Option A is wrong because the ALB target group deregistration delay only controls how long the load balancer lets in-flight requests complete before it finishes deregistering a target; it does not speed up the provisioning of new instances. Option C is wrong because reducing the Auto Scaling group minimum size to one instance lowers the baseline capacity, leaving the system more exposed to traffic spikes and dependent on even more scale-out. Option D is wrong because replacing the Application Load Balancer with a Network Load Balancer does not address the bootstrapping delay; an NLB operates at Layer 4, does nothing to shorten instance initialization, and gives up the Layer 7 routing features that an ALB provides for HTTP-based workloads.


3

MCQ · medium

An analytics dashboard uses RDS MySQL and receives many read-only reporting queries that slow down the primary database. What should the architect add?

A. S3 lifecycle policy

B. RDS read replica and route reporting queries to it

C. Multi-AZ standby and route reads to the standby

D. A larger NAT gateway

Answer: B

Read replicas offload read traffic from the primary instance.

Why this answer

Adding an RDS read replica offloads read-heavy reporting queries from the primary MySQL instance, preserving write performance. The read replica asynchronously replicates data using MySQL's native binlog replication, and routing reporting queries to its endpoint reduces contention on the primary.

Exam trap

The trap here is confusing a Multi-AZ standby (which, in a Multi-AZ DB instance deployment, exists for high availability only and cannot serve reads) with a read replica (which is explicitly designed to offload read traffic).

How to eliminate wrong answers

Option A is wrong because S3 lifecycle policies manage object transitions and expirations in S3, not database query offloading. Option C is wrong because the standby in a Multi-AZ DB instance deployment is a synchronous replica used only for failover; RDS does not allow direct reads from it. Option D is wrong because NAT gateways give private subnets outbound internet access and scale their bandwidth automatically rather than being sized by the customer; they have no bearing on database read query performance.


4

MCQ · medium

A mobile game backend uses Amazon Aurora. The workload has many short-lived database connections from Lambda functions, causing connection storms. What should be added? The architecture review board prefers a managed AWS-native control.

A. An internet gateway

B. S3 Select

C. RDS Proxy

D. A larger Route 53 hosted zone

Answer: C

RDS Proxy pools and manages database connections, improving scalability for serverless and bursty workloads.

Why this answer

RDS Proxy is a fully managed, AWS-native service that sits between Lambda functions and Aurora, pooling and reusing database connections. This prevents connection storms by reducing the overhead of establishing new connections for each short-lived Lambda invocation, and it also improves scalability and resilience by handling failover transparently.

Exam trap

The trap here is that candidates might confuse network-level components (like an internet gateway) or data retrieval services (like S3 Select) with database connection management, overlooking the purpose-built RDS Proxy service for handling short-lived, high-frequency connections from serverless workloads.

How to eliminate wrong answers

Option A is wrong because an internet gateway is used to enable VPC-to-internet connectivity, not to manage database connections or mitigate connection storms. Option B is wrong because S3 Select is a feature for retrieving subsets of data from objects in S3, not for managing database connections or connection pooling. Option D is wrong because a Route 53 hosted zone is a container for DNS records; changing it does nothing for database connection management or connection storms.
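Why pooling tames connection storms can be illustrated with a small model. This is a sketch of the general pooling idea, not of how RDS Proxy is implemented; `Database`, `Pool`, and `run_invocations` are hypothetical names, and the invocations here run one after another rather than concurrently.

```python
class Database:
    """Stand-in for a database that counts connections opened against it."""

    def __init__(self):
        self.connections_opened = 0

    def connect(self):
        self.connections_opened += 1
        return object()


class Pool:
    """Toy pool: hand out an idle connection if one exists, else open a new one."""

    def __init__(self, db):
        self.db = db
        self.idle = []

    def acquire(self):
        return self.idle.pop() if self.idle else self.db.connect()

    def release(self, conn):
        self.idle.append(conn)


def run_invocations(count, db, pool=None):
    """Run `count` sequential short-lived invocations, each needing one connection."""
    for _ in range(count):
        if pool is None:
            db.connect()  # opened for this invocation, then discarded
        else:
            conn = pool.acquire()
            pool.release(conn)


direct_db, pooled_db = Database(), Database()
run_invocations(500, direct_db)
run_invocations(500, pooled_db, Pool(pooled_db))
print(direct_db.connections_opened, pooled_db.connections_opened)  # 500 1
```

With concurrent invocations the pooled count would rise to roughly the peak concurrency instead of 1, but it would still stay far below one new connection per invocation.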


5

MCQ · easy

Based on the exhibit, what change best reduces Lambda cold-start impact for a predictable user-upload workflow?

A. Set a reserved concurrency limit for the function to protect it from throttling.

B. Enable provisioned concurrency for the function.

C. Increase the function timeout to give more time for initialization.

D. Move the function to a larger memory setting only to eliminate all initialization time.

Answer: B

Provisioned concurrency keeps a pre-initialized pool of Lambda execution environments ready to respond immediately. The exhibit shows long init duration after inactivity, which is the classic symptom of cold starts affecting user experience. Because the traffic pattern is predictable during launches, provisioned concurrency is the most direct way to reduce startup latency and smooth response times.

Why this answer

Provisioned concurrency pre-warms a specified number of execution environments so that when a user upload triggers the Lambda function, there is no cold-start latency. This is the most direct way to eliminate initialization time for a predictable workload, as it keeps instances ready to handle requests immediately.

Exam trap

The trap here is that candidates often confuse reserved concurrency (which limits concurrency) with provisioned concurrency (which pre-warms instances), or they assume that increasing memory or timeout will solve cold starts, when in fact only provisioned concurrency directly addresses initialization latency for predictable workloads.

How to eliminate wrong answers

Option A is wrong because reserved concurrency sets aside a share of the account's concurrency for the function and caps the function at that number; it does not pre-initialize execution environments or reduce cold-start impact. Option C is wrong because increasing the function timeout does not affect initialization time; it only extends the maximum duration a function can run, which does not address cold starts. Option D is wrong because moving to a larger memory setting can reduce initialization time by providing more CPU and resources, but it does not eliminate all initialization time, and it is not as targeted or effective as provisioned concurrency for predictable workloads.
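The difference provisioned concurrency makes during a burst can be sketched with simple arithmetic. This is a deliberately crude model, not Lambda's actual scheduler: it assumes no environments are already warm from earlier traffic, and `cold_starts` is a hypothetical helper.

```python
def cold_starts(concurrent_requests, provisioned):
    """Requests in a simultaneous burst that would need a newly initialized environment.

    Toy model: each pre-initialized environment absorbs one concurrent request.
    """
    return max(0, concurrent_requests - provisioned)


for provisioned in (0, 20, 50):
    print(provisioned, cold_starts(50, provisioned))
# 0 50
# 20 30
# 50 0
```

Reserved concurrency does not appear in this model at all, because it changes how many requests may run at once rather than how many environments are initialized ahead of time.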


6

Multi-Select · medium

A Lambda function behind API Gateway has predictable traffic spikes every hour. The function does not need access to resources in a VPC, and p95 latency spikes are caused by cold starts during scale-out. Which two actions are most effective? Select two.

Select 2 answers

A. Enable provisioned concurrency for the function.

B. Remove the function from a VPC because it has no VPC dependencies.

C. Set reserved concurrency to a low fixed number.

D. Increase the Lambda timeout to 15 minutes.

E. Add an SQS dead-letter queue to reduce startup latency.

Answers: A, B

Provisioned concurrency keeps a pool of initialized execution environments ready to handle requests. That removes most cold-start delay and is the most direct way to stabilize p95 latency during predictable bursts.

Why this answer

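Option A is correct because provisioned concurrency pre-warms a specified number of Lambda execution environments, eliminating cold starts for those instances. This directly addresses the p95 latency spikes caused by cold starts during predictable traffic spikes, as the function will have warm containers ready to handle incoming requests without the initialization delay. Option B is correct within the question's framing: the function has no VPC dependencies, so detaching it removes VPC networking from the initialization path. The gain is usually smaller than that from provisioned concurrency, because Lambda's VPC networking was reworked in 2019 to set up network interfaces when the function is configured rather than on each cold start.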

Exam trap

The trap here is that candidates often confuse reserved concurrency (which limits concurrency and can cause throttling) with provisioned concurrency (which pre-warms environments), or they mistakenly believe that increasing timeout or adding a DLQ can mitigate cold start latency.


7

Multi-Select · medium

A company is designing a high-performance database architecture for an e-commerce platform that experiences rapid spikes in read traffic during flash sales. The database must handle millions of reads per second with sub-millisecond latency. The data is key-value in nature, with a small number of attributes per item. Which three options should be included in the architecture? (Choose three.)

Select 3 answers

A. Amazon DynamoDB as the primary database.

B. Amazon RDS for MySQL with Multi-AZ and Read Replicas.

C. DynamoDB Accelerator (DAX) as an in-memory cache.

D. Amazon ElastiCache for Redis with cluster mode enabled.

E. Amazon S3 as a primary data store accessed via Select and Range queries.

F. Amazon Redshift with auto-scaling for real-time reads.

Why this answer

Amazon DynamoDB is a fully managed NoSQL key-value database that delivers single-digit millisecond latency at any scale, making it ideal for high-traffic e-commerce platforms with key-value data. DynamoDB Accelerator (DAX) is an in-memory cache that sits in front of DynamoDB, reducing read latency to microseconds for millions of reads per second. Amazon ElastiCache for Redis with cluster mode enabled provides a distributed in-memory cache that can offload read traffic from the primary database, further reducing latency and handling spikes during flash sales.

Exam trap

The trap here is that candidates often choose Amazon RDS with Read Replicas for read scaling, but a relational database with replicas is a poor fit for sub-millisecond latency at millions of key-value reads per second; DynamoDB with in-memory caching layers is the high-performance key-value solution the scenario calls for.


8

Matching · hard

A company runs a stateless application tier behind an Application Load Balancer. Match each observed scaling pattern on the left to the best Auto Scaling strategy or metric on the right.


Concepts

Matches

Scale the Auto Scaling group on ALB RequestCountPerTarget.

Scale on SQS queue depth using a custom CloudWatch metric.

Use scheduled scaling to add capacity before the recurring surge.

Use target tracking on EC2 CPUUtilization.

Why these pairings

Steady increase uses step scaling for gradual adjustments; sudden spikes use simple scaling; cyclical patterns use scheduled scaling; consistent low traffic may need no scaling; unpredictable bursts use target tracking; gradual decrease uses simple scaling.


9

MCQ · hard

A Lambda-based retail API has unpredictable traffic spikes and users see latency caused by cold starts. The function must respond consistently during expected campaign windows. What should be configured? The design must avoid adding custom operational scripts.

A. A larger deployment package

B. Reserved concurrency only

C. Provisioned concurrency during campaign windows

D. CloudTrail data events

Answer: C

Provisioned concurrency keeps execution environments initialized and reduces cold-start latency.

Why this answer

Provisioned concurrency initializes a specified number of execution environments in advance, eliminating cold starts during campaign windows. This ensures consistent response times for the Lambda-based retail API under unpredictable traffic spikes without requiring custom scripts.

Exam trap

The trap here is confusing reserved concurrency (which reserves and caps a function's concurrency but does not address cold starts) with provisioned concurrency (which avoids cold starts by keeping environments initialized).

How to eliminate wrong answers

Option A is wrong because a larger deployment package increases cold start duration, making latency worse. Option B is wrong because reserved concurrency sets aside a share of the account's concurrency for the function and caps the function at that number, but does not pre-warm environments to avoid cold starts. Option D is wrong because CloudTrail data events record API activity for auditing, not performance optimization.


10

MCQ · medium

A team serves static assets from an S3 origin through CloudFront. Cache hit ratio is low. Analytics show that requests include an Authorization header (even though the assets are public) and the cache key currently varies on that header, causing CloudFront to treat the same asset as different cache entries. What is the best change to improve cache hit ratio without breaking access controls?

A. Keep Authorization in the CloudFront cache key, but increase the origin response minimum TTL to 1 day.

B. Modify the CloudFront cache policy so the cache key does not include the Authorization header.

C. Switch the S3 origin from the current bucket to a website endpoint to enable automatic caching headers.

D. Enable CloudFront to forward all headers to S3 so origin can decide caching behavior per request.

Answer: B

CloudFront cache hit ratio depends on what constitutes a unique cache key. If Authorization is included, identical public assets requested with different Authorization values will map to different cache objects and reduce reuse. Removing Authorization from the cache key makes those requests share the same edge cache entry, improving hit ratio and reducing origin traffic. Because the scenario states the assets are public, removing Authorization from the cache key does not break access controls (access is not controlled by Authorization at the origin).

Why this answer

The low cache hit ratio is caused by the Authorization header being included in the CloudFront cache key, which creates separate cache entries for the same object even though the assets are public. By modifying the cache policy to exclude the Authorization header, CloudFront will treat all requests for the same asset as identical, dramatically improving the cache hit ratio without affecting access controls because the assets are already public.

Exam trap

The trap here is that candidates may think increasing TTL or changing the origin type will fix caching, when the real issue is the cache key composition—specifically, the Authorization header fragmenting the cache.

How to eliminate wrong answers

Option A is wrong because increasing the minimum TTL does not address the root cause—the cache key still varies on the Authorization header, so separate cache entries will persist and the cache hit ratio will remain low. Option C is wrong because switching to an S3 website endpoint does not change how CloudFront caches based on headers; the cache key is still controlled by the CloudFront cache policy, not the origin type. Option D is wrong because forwarding all headers to S3 would include the Authorization header in the cache key, making the problem worse by further fragmenting the cache.
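How a header in the cache key fragments the cache can be shown with a toy edge cache. This models only the idea of a cache key, not CloudFront itself; `hit_ratio`, the asset path, and the token values are hypothetical, and the model ignores TTLs and multiple edge locations.

```python
def hit_ratio(requests, key_headers):
    """Return the fraction of requests served from a toy edge cache.

    The cache key is the path plus the values of the headers named in key_headers.
    """
    cache = set()
    hits = 0
    for path, headers in requests:
        key = (path,) + tuple(headers.get(h, "") for h in key_headers)
        if key in cache:
            hits += 1
        else:
            cache.add(key)
    return hits / len(requests)


# 100 users, each with a different token, request the same public asset.
requests = [
    ("/img/logo.png", {"Authorization": f"Bearer token-{u}"}) for u in range(100)
]

with_auth = hit_ratio(requests, key_headers=["Authorization"])
without_auth = hit_ratio(requests, key_headers=[])
print(with_auth, without_auth)  # 0.0 0.99
```

With the header in the key, every request is a miss no matter how long the TTL is, which is why raising the TTL alone (Option A) does not help.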


11

MCQ · medium

A read-heavy media archive repeatedly queries the same product catalogue data from DynamoDB with millisecond latency requirements. Which service can reduce read latency and table load? The architecture review board prefers a managed AWS-native control.

A. DynamoDB Accelerator (DAX)

B. Amazon Kinesis Data Firehose

C. AWS Glue Data Catalog

D. S3 Transfer Acceleration

Answer: A

DAX is an in-memory cache for DynamoDB that reduces read latency for suitable access patterns.

Why this answer

DynamoDB Accelerator (DAX) is an in-memory cache for DynamoDB that delivers microsecond read latency, directly addressing the millisecond requirement. By caching frequently accessed product catalogue data, DAX offloads read requests from the DynamoDB table, reducing table load and read capacity unit consumption. As a fully managed, AWS-native service, it aligns with the architecture review board's preference for managed controls.

Exam trap

The trap here is that candidates may confuse S3 Transfer Acceleration (which optimizes uploads to S3) with a caching solution for DynamoDB, or mistakenly think Glue Data Catalog or Kinesis Firehose can cache database queries, when only DAX provides in-memory acceleration for DynamoDB reads.

How to eliminate wrong answers

Option B is wrong because Amazon Kinesis Data Firehose is a streaming data ingestion service for loading data into data lakes or analytics tools, not a caching or read-latency reduction solution for DynamoDB. Option C is wrong because AWS Glue Data Catalog is a metadata repository for ETL and data discovery, not a cache that accelerates DynamoDB read queries. Option D is wrong because S3 Transfer Acceleration uses AWS edge locations to speed up uploads to S3 over long distances, but it does not cache DynamoDB data or reduce read latency for repeated queries.


12

MCQ · easy

A company runs a stateless web API on Amazon EC2 behind an Application Load Balancer. The team notices that during business hours, the ALB starts queueing requests and the average request latency rises. They want to scale out quickly and reliably based on demand, not CPU alone. Which Auto Scaling approach best matches this requirement?

A. Use a fixed-size Auto Scaling group and increase capacity manually once per hour.

B. Use target tracking scaling based on ALB request count per target.

C. Scale based only on EC2 instance memory utilization, regardless of load.

D. Use step scaling with a single threshold on average network-in bytes.

Answer: B

Target tracking can automatically adjust capacity using ALB load metrics and respond faster.

Why this answer

Option B is correct because target tracking scaling based on ALB request count per target directly measures the load on each instance, allowing the Auto Scaling group to add or remove instances to maintain a target value. This approach scales on actual demand (the request rate each target receives) rather than CPU alone, so capacity is added as the per-target request rate climbs toward the level at which queueing and latency appear during business hours.

Exam trap

The trap here is that candidates often default to CPU-based scaling (a common but incomplete metric) or memory-based scaling, overlooking that for a stateless web API behind an ALB, request count per target is the most direct indicator of demand and latency issues.

How to eliminate wrong answers

Option A is wrong because manual scaling once per hour cannot react quickly to sudden demand spikes during business hours, leading to continued queueing and latency. Option C is wrong because scaling based solely on memory utilization ignores the actual request load and latency, and a stateless web API may not show memory pressure even when request queueing is high. Option D is wrong because step scaling with a single threshold on average network-in bytes is not directly correlated with request queueing or latency, and network-in can be influenced by factors other than application demand (e.g., large payloads), making it unreliable for scaling based on request count.
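The arithmetic behind scaling on request count per target can be sketched as follows. This is a simplified proportional model for intuition, not the exact algorithm AWS uses; `desired_capacity` and the example numbers are hypothetical.

```python
import math


def desired_capacity(current_instances, requests_per_target, target_value):
    """Estimate the instance count that brings the per-target metric back to target.

    Simplified proportional model; the real service also applies cooldowns,
    warm-up periods, and conservative scale-in behavior.
    """
    total_load = current_instances * requests_per_target
    return max(1, math.ceil(total_load / target_value))


print(desired_capacity(4, 1500, 1000))  # 6
print(desired_capacity(6, 400, 1000))   # 3
```

In the first call, four instances each receiving 1,500 requests carry 6,000 requests in total, so six instances are needed to bring each one back to the 1,000-request target.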


13

MCQ · medium

A global video platform serves mostly static images and JavaScript files from an S3 origin. Users in distant countries report slow load times. What should improve performance most? The team wants the control to be enforceable during normal operations.

A. A larger S3 bucket

B. Amazon CloudFront distribution with the S3 bucket as origin

C. RDS read replicas

D. An EC2 Auto Scaling group in one Region

Answer: B

CloudFront caches content at edge locations close to users, reducing latency.

Why this answer

Amazon CloudFront is a content delivery network (CDN) that caches static content (images, JavaScript) at edge locations worldwide, reducing latency for users in distant countries. By using the S3 bucket as an origin, CloudFront serves cached copies from the nearest edge, drastically improving load times. This solution is enforceable during normal operations because CloudFront honors Cache-Control headers from the origin and offers invalidation APIs to manage content freshness.

Exam trap

The trap here is that candidates may confuse scaling compute (EC2 Auto Scaling) or database (RDS read replicas) with content delivery, failing to recognize that static content performance is solved by a CDN like CloudFront, not by scaling backend resources.

How to eliminate wrong answers

Option A is wrong because S3 buckets have no configurable size, and the amount of data in a bucket does not affect latency; S3 is a regional service and does not cache content globally. Option C is wrong because RDS read replicas are for database read scaling, not for serving static files from S3. Option D is wrong because an EC2 Auto Scaling group in one Region does not address global latency; it only scales compute capacity in a single geographic area, so distant users see no improvement.


14

MCQ · easy

Based on the exhibit, the team wants to improve application performance without changing the code. Which EC2 instance family should they choose next?

A. Choose a compute-optimized instance family such as C6i to increase CPU performance.

B. Choose a memory-optimized instance family such as R6i to provide more RAM.

C. Choose a storage-optimized instance family such as I4i to improve block storage throughput.

D. Choose a burstable instance family such as T3 to reduce cost and improve performance.

Answer: B

Memory-optimized instances are the best fit when memory pressure is causing slowdowns. The exhibit shows CPU is low while memory is consistently near saturation, which strongly suggests the application needs more RAM rather than more compute. Moving to an R6i family should reduce paging and improve response times without changing the application design.

Why this answer

The exhibit shows that the application is experiencing high memory utilization (e.g., memory pressure or swapping), which degrades performance. Choosing a memory-optimized instance family such as R6i provides more RAM per vCPU, directly addressing the bottleneck without requiring code changes. This improves application performance by reducing or eliminating swap usage and allowing more data to be cached in memory.

Exam trap

The trap here is that candidates often assume 'improving performance' always means faster CPU or storage, but the exhibit's memory utilization metric points directly to a memory bottleneck, making the memory-optimized family the correct choice; changing the instance family also satisfies the constraint that the code stays unchanged.

How to eliminate wrong answers

Option A is wrong because compute-optimized instances (C6i) increase CPU performance, but the exhibit indicates the bottleneck is memory, not CPU; thus, more CPU would not resolve high memory utilization. Option C is wrong because storage-optimized instances (I4i) improve block storage throughput and IOPS, which is irrelevant if the performance issue stems from insufficient RAM rather than disk I/O. Option D is wrong because burstable instances (T3) are designed for workloads with low average CPU usage; under sustained load they either throttle to baseline once CPU credits run out (standard mode) or accrue surplus-credit charges (unlimited mode), and they do not address memory constraints at all.


15

MCQ · hard

A Lambda-based retail API has unpredictable traffic spikes and users see latency caused by cold starts. The function must respond consistently during expected campaign windows. What should be configured? The architecture review board prefers a managed AWS-native control.

A. A larger deployment package

B. Reserved concurrency only

C. Provisioned concurrency during campaign windows

D. CloudTrail data events

Answer: C

Provisioned concurrency keeps execution environments initialized and reduces cold-start latency.

Why this answer

Provisioned concurrency pre-warms a specified number of Lambda execution environments, so requests served by those environments skip the cold start during the campaign windows. This is a managed AWS-native feature that keeps response times consistent even under unpredictable traffic spikes, directly addressing the latency issue.

Exam trap

The trap here is that candidates confuse reserved concurrency (which only limits scaling) with provisioned concurrency (which pre-warms environments), leading them to pick option B as a cost-saving measure without realizing it does not solve cold starts.

How to eliminate wrong answers

Option A is wrong because a larger deployment package increases cold start latency (as more code must be loaded and initialized), making the problem worse. Option B is wrong because reserved concurrency only caps the maximum number of concurrent executions to prevent runaway scaling; it does not pre-warm environments and thus does not eliminate cold starts. Option D is wrong because CloudTrail data events capture API activity for auditing and governance, not for performance optimization or cold start mitigation.


16

MCQ · medium

A global video platform serves mostly static images and JavaScript files from an S3 origin. Users in distant countries report slow load times. What should improve performance most? The design must avoid adding custom operational scripts.

A. A larger S3 bucket

B. Amazon CloudFront distribution with the S3 bucket as origin

C. RDS read replicas

D. An EC2 Auto Scaling group in one Region

Answer: B

CloudFront caches content at edge locations close to users, reducing latency.

Why this answer

Amazon CloudFront is a content delivery network (CDN) that caches static content (images, JavaScript files) at edge locations worldwide. By distributing content closer to users, it reduces latency and improves load times for distant countries without requiring any custom operational scripts or changes to the S3 bucket.

Exam trap

The trap here is that candidates may think increasing S3 bucket size or adding compute resources (EC2, RDS) can solve latency issues, but the correct solution is a CDN like CloudFront that brings content physically closer to users.

How to eliminate wrong answers

Option A is wrong because increasing the size of an S3 bucket does not improve performance; S3 bucket size has no impact on latency or throughput for static content delivery. Option C is wrong because RDS read replicas are designed to offload read traffic from a relational database, not to accelerate delivery of static files stored in S3. Option D is wrong because an EC2 Auto Scaling group in a single Region does not reduce latency for users in distant countries; it only provides compute scaling within one geographic area, not global edge caching.


17

MCQ · hard

Based on the exhibit, a batch-processing service runs on Amazon EC2. The workload is Linux-based, can run on ARM64, and is CPU-bound during its nightly processing window. The team wants the best throughput per dollar without changing the application logic. Which EC2 instance family should the solutions architect recommend?

A. C7g instances based on AWS Graviton processors

B. R7i instances because more memory will improve CPU-bound job throughput.

C. M7a instances because general-purpose families are always the safest performance choice.

D. T3 instances because burstable instances can handle occasional nighttime spikes at lower cost.

Answer: A

C7g instances are compute optimized and use Graviton processors, which often deliver strong price-performance for CPU-bound Linux workloads that can run on ARM64. The exhibit shows the application is compatible and even benchmarks faster on ARM.

Why this answer

The C7g instances are compute optimized and based on AWS Graviton processors (ARM64 architecture), which AWS positions as offering better price-performance than comparable x86-based instances for many CPU-bound workloads. Since the workload is Linux-based, can run on ARM64, and is CPU-bound, the C7g family is the best fit for throughput per dollar without requiring any application logic changes.

Exam trap

The trap here is that candidates may choose memory-optimized or general-purpose instances (like R7i or M7a) thinking they are safer, or burstable instances (T3) assuming they handle spikes cheaply, without recognizing that compute-optimized ARM64 instances (C7g) provide the best throughput per dollar for CPU-bound, ARM64-compatible workloads.

How to eliminate wrong answers

Option B is wrong because R7i instances are memory-optimized, designed for workloads that require large amounts of memory, not for CPU-bound jobs where additional memory does not improve throughput. Option C is wrong because M7a instances are general-purpose and balance compute, memory, and networking, but they are not optimized for CPU-bound workloads and use x86 architecture, which typically offers lower performance per dollar compared to ARM64-based instances for this specific scenario. Option D is wrong because T3 instances are burstable and designed for workloads with low baseline CPU usage and occasional spikes, but they are not suitable for sustained CPU-bound processing during a nightly window, as they would exhaust CPU credits and incur performance throttling or additional costs.

18

Multi-Selectmedium

An Aurora PostgreSQL application has an OLTP writer and a reporting dashboard that issues many read-only queries. The writer is healthy, but read latency rises noticeably during reporting windows. Which two changes should you make? Select two.

Select 2 answers

A.Add Aurora Replicas to scale out the read workload.

B.Send read-only application traffic to the reader endpoint.

C.Scale up only the writer instance and keep all queries on it.

D.Replace the cluster with a single-AZ RDS instance to reduce replication overhead.

E.Move the dashboard to DynamoDB without changing the query model.

AnswersA, B

Aurora Replicas provide additional read capacity, which lets you spread read-only traffic away from the writer instance.

Why this answer

Adding Aurora Replicas (Option A) is correct because Aurora Replicas are read-only instances that share the same underlying storage volume as the writer, so each replica adds read capacity without adding query load to the writer instance. Sending read-only traffic to the reader endpoint (Option B) is correct because the reader endpoint load-balances connections across the available Aurora Replicas, so dashboard queries are spread out and do not overload a single instance.

Exam trap

The trap here is that candidates may think scaling up the writer instance (Option C) is sufficient, but they overlook that read-heavy workloads require horizontal read scaling via replicas, not just vertical scaling of the writer.

19

MCQhard

Based on the exhibit, a media company serves versioned JavaScript and CSS files from an Amazon S3 origin through CloudFront. After a frontend release, the cache hit ratio dropped sharply even though the file names are versioned. The application team says the browser requests include the same Authorization header on every asset request because the frontend and API share one domain. What should the solutions architect do to improve CloudFront cache hit ratio without changing the application authentication model for the API?

A.Enable S3 Transfer Acceleration on the bucket so CloudFront fetches objects faster from the origin.

B.Create a CloudFront cache policy that excludes Authorization, cookies, and unnecessary query strings from the cache key.

C.Switch the origin from S3 to an Application Load Balancer so CloudFront can cache dynamic responses more effectively.

D.Configure CloudFront to forward every viewer header to the origin so the origin can decide whether the content is cacheable.

AnswerB

Excluding Authorization, cookies, and unnecessary query strings from the cache key reduces cache fragmentation because CloudFront can reuse the same cached object for many viewers. Since the assets are immutable and versioned, the Authorization header is not needed to vary the cache for these files. Leaving API authentication untouched preserves the application model while improving the hit ratio.

Why this answer

The sharp drop in cache hit ratio is caused by the Authorization header being included in the cache key, which makes each request unique even though the file names are versioned. By creating a CloudFront cache policy that excludes the Authorization header (and unnecessary cookies/query strings) from the cache key, CloudFront can serve cached responses to requests with different Authorization headers, restoring the cache hit ratio without altering the application's authentication model for the API.
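
The effect of the cache key can be sketched with a toy model in Python. This is not CloudFront's implementation; the `cache_key` function, the asset path, and the token values are hypothetical and exist only to show how a per-user header multiplies cache entries for one immutable file.

```python
def cache_key(path, headers, key_headers=()):
    """Build a toy cache key from the path plus the headers a policy includes."""
    normalized = {name.lower(): value for name, value in headers.items()}
    selected = tuple(
        (name.lower(), normalized.get(name.lower(), ""))
        for name in sorted(key_headers)
    )
    return (path, selected)


# 100 viewers request the same versioned asset, each with a different token.
requests = [
    ("/static/app.3f9a1c.js", {"Authorization": f"Bearer token-{i}"})
    for i in range(100)
]

# Authorization included in the key: every viewer gets a separate cache entry.
keys_with_auth = {cache_key(p, h, key_headers=("Authorization",)) for p, h in requests}

# Authorization excluded from the key: all viewers share one cache entry.
keys_without_auth = {cache_key(p, h) for p, h in requests}

print(len(keys_with_auth), len(keys_without_auth))
```

In this model the first set holds one entry per viewer and the second holds a single entry, which is the fragmentation the cache policy removes.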

Exam trap

The trap here is that candidates may think the Authorization header is required for caching or that forwarding all headers is safe, but in reality, including it in the cache key destroys cache efficiency for static assets, and the correct solution is to exclude it via a cache policy.

How to eliminate wrong answers

Option A is wrong because S3 Transfer Acceleration improves upload/download speed over long distances but does not affect CloudFront's cache key or hit ratio. Option C is wrong because switching to an Application Load Balancer would not solve the cache key issue; ALB is for dynamic content and would not improve caching for static versioned files served from S3. Option D is wrong because forwarding every viewer header to the origin would include the Authorization header in the cache key, making each request unique and further reducing the cache hit ratio, which is the opposite of what is needed.

20

MCQeasy

A DynamoDB-backed multi-tenant app experiences throttling during a promotion. Most writes and reads target tenant "ACME" and use the same partition key value, causing a hot partition. Which design change most directly improves performance?

A.Add a "shard" component to the partition key (for example, tenantId + hashed bucket) to spread traffic across partitions

B.Increase the table’s read capacity without changing the partition key

C.Switch all reads to strongly consistent reads to guarantee faster results

D.Store ACME data in S3 and query it directly to avoid DynamoDB throttling

AnswerA

DynamoDB throughput is distributed across physical partitions. If one partition key value receives most traffic, that partition throttles. Adding a shard component to the partition key increases the number of partition key values being used, spreading requests across more partitions and reducing hot-partition throttling.

Why this answer

Option A is correct because adding a shard component to the partition key (e.g., appending a random or hash-based suffix to the tenant ID) distributes writes and reads for the same tenant across multiple physical partitions. This directly alleviates the hot partition caused by all ACME traffic hitting a single partition key value, allowing DynamoDB to utilize its full provisioned throughput across partitions.

Exam trap

The trap here is that candidates may think increasing total table capacity (Option B) solves throttling, but they overlook that DynamoDB also enforces throughput limits per partition, so a single hot partition remains constrained regardless of total table capacity.

How to eliminate wrong answers

Option B is wrong because increasing the table’s read capacity does not fix the hot partition: each physical partition has its own throughput ceiling (3,000 read units and 1,000 write units per second), so a single partition key value can still throttle even if total table capacity is high, and the change does nothing for the hot writes. Option C is wrong because strongly consistent reads do not improve performance; they consume twice the read capacity of eventually consistent reads and do not spread traffic across partitions. Option D is wrong because storing ACME data in S3 and querying it directly gives up DynamoDB’s low-latency key-based access and introduces additional complexity (for example, higher per-request latency and no native item-level querying), making it an inefficient and indirect solution for a hot partition problem.
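
The sharded key from Option A can be sketched as follows. This is a minimal illustration, assuming a fixed bucket count of 10 and a `tenantId#bucket` key format; `SHARD_COUNT`, `sharded_partition_key`, and `all_partition_keys` are hypothetical names, and the right bucket count depends on the tenant's traffic.

```python
import hashlib

SHARD_COUNT = 10  # assumed bucket count, chosen only for illustration


def sharded_partition_key(tenant_id, item_id, shard_count=SHARD_COUNT):
    """Derive a stable bucket from the item ID and append it to the tenant ID."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % shard_count
    return f"{tenant_id}#{bucket}"


def all_partition_keys(tenant_id, shard_count=SHARD_COUNT):
    """List every key a tenant-wide read must query and merge."""
    return [f"{tenant_id}#{bucket}" for bucket in range(shard_count)]


# 1,000 ACME items now land on several partition key values instead of one.
keys_used = {sharded_partition_key("ACME", f"order-{i}") for i in range(1000)}
print(sorted(keys_used))
```

Hashing the item ID (rather than picking a random suffix) keeps point reads cheap, because the reader can recompute the bucket. The trade-off is that a tenant-wide query has to fan out over every key returned by `all_partition_keys`.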

21

MCQmedium

A production application writes to an Amazon Aurora PostgreSQL cluster. Users report that during business-hour reporting runs, write latency increases. The application team wants to keep the writer focused on OLTP writes while still providing low-latency reads for reporting queries. What architectural approach should the solutions architect recommend?

A.Create Aurora read replicas and direct reporting read-only connections to the cluster reader endpoint.

B.Resize the writer instance to a larger class so it can handle both writes and reads with fewer slowdowns.

C.Enable cross-region replication for the entire cluster so reporting always runs in the secondary Region.

D.Disable read replicas and use caching only in the application layer, keeping all queries connected to the writer endpoint.

AnswerA

Read replicas offload read workloads from the writer. Using the reader endpoint lets reporting queries use replicas, improving write responsiveness.

Why this answer

A is correct because creating Aurora read replicas and directing reporting read-only connections to the cluster reader endpoint offloads read traffic from the writer instance. This allows the writer to focus on OLTP writes, while the reader endpoint load-balances read-only queries across replicas, providing low-latency reads for reporting without impacting write performance.

Exam trap

The trap here is that candidates may think resizing the writer instance (Option B) is sufficient, but the exam tests the architectural principle of separating read and write workloads to avoid resource contention, not just scaling vertically.

How to eliminate wrong answers

Option B is wrong because resizing the writer instance to a larger class increases capacity for both writes and reads, but does not isolate reporting queries from the writer, so write latency can still spike during heavy read loads. Option C is wrong because cross-region replication adds significant latency for reads and does not address the immediate need for low-latency reads within the same region during business hours. Option D is wrong because disabling read replicas and relying solely on application-layer caching forces all queries to the writer endpoint, which increases contention and does not scale for reporting workloads that require fresh data.
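
One way to picture the split is a small lookup that chooses the endpoint by workload type. This is only a sketch: both hostnames are made-up placeholders in the general shape of Aurora cluster endpoints, and `endpoint_for` is a hypothetical helper, not part of any AWS SDK.

```python
# Hypothetical endpoints; substitute the values shown for the real cluster.
WRITER_ENDPOINT = "example.cluster-abc123.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "example.cluster-ro-abc123.us-east-1.rds.amazonaws.com"

ENDPOINTS = {
    "oltp": WRITER_ENDPOINT,       # transactional reads and writes
    "reporting": READER_ENDPOINT,  # read-only reporting queries
}


def endpoint_for(workload):
    """Return the cluster endpoint a connection pool should use."""
    try:
        return ENDPOINTS[workload]
    except KeyError:
        raise ValueError(f"unknown workload: {workload!r}") from None


print(endpoint_for("reporting"))
```

Routing by workload rather than by inspecting each SQL statement keeps the decision in configuration: the reporting tool gets its own connection string, and the OLTP path is untouched.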

22

MCQmedium

A telemetry pipeline uses RDS MySQL and receives many read-only reporting queries that slow down the primary database. What should the architect add? The architecture review board prefers a managed AWS-native control.

A.Multi-AZ standby and route reads to the standby

B.RDS read replica and route reporting queries to it

C.S3 lifecycle policy

D.A larger NAT gateway

AnswerB

Read replicas offload read traffic from the primary instance.

Why this answer

The correct answer is B because RDS read replicas are designed specifically to offload read-heavy workloads like reporting queries from the primary database. They provide an asynchronous read-only copy of the database that can handle SELECT statements without impacting the primary's write performance. This is a fully managed AWS-native solution that aligns with the architecture review board's preference.

Exam trap

The trap here is confusing Multi-AZ standby (which is for failover only) with read replicas (which are for read scaling), leading candidates to incorrectly choose Option A.

How to eliminate wrong answers

Option A is wrong because the standby in a Multi-AZ DB instance deployment is a synchronous replica used for high availability and failover, not for read traffic; it is not accessible to applications and cannot serve read queries. Option C is wrong because an S3 lifecycle policy manages object storage transitions and expiration, which is unrelated to offloading database read queries. Option D is wrong because NAT gateways are not sized by the customer and only carry outbound traffic from private subnets, which does not address read query load on an RDS database.

23

MCQmedium

An analytics dashboard uses RDS MySQL and receives many read-only reporting queries that slow down the primary database. What should the architect add? The design must avoid adding custom operational scripts.

A.S3 lifecycle policy

B.RDS read replica and route reporting queries to it

C.Multi-AZ standby and route reads to the standby

D.A larger NAT gateway

AnswerB

Read replicas offload read traffic from the primary instance.

Why this answer

RDS read replicas are designed specifically to offload read-heavy workloads from the primary DB instance. By routing reporting queries to a read replica, the primary database is freed from processing those queries, reducing contention and improving overall performance. This approach requires no custom scripts—AWS handles replication automatically.

Exam trap

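The trap here is confusing Multi-AZ standby with read replica functionality. Candidates often assume the standby can serve reads, but in a Multi-AZ DB instance deployment the standby is not accessible for reads; only the separate Multi-AZ DB cluster deployment type provides readable standbys.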

How to eliminate wrong answers

Option A is wrong because S3 lifecycle policies manage object transitions and deletions in S3 buckets, not database query offloading. Option C is wrong because a Multi-AZ standby is for high availability and failover, not for serving read traffic; the standby in a Multi-AZ DB instance deployment exposes no endpoint that reporting queries could connect to. Option D is wrong because NAT gateways only carry outbound traffic from private subnets, which has no effect on database query performance.

24

Multi-Selectmedium

A company is designing a high-performance architecture for a real-time analytics platform that ingests millions of events per second. The events must be processed with minimal latency and then stored for long-term analysis. Which three services should be combined to build this architecture? (Choose three.)

Select 3 answers

A.Amazon Kinesis Data Streams for real-time data ingestion

B.Amazon SQS (Simple Queue Service) for buffering events

C.AWS Lambda for real-time event processing

D.Amazon Redshift for long-term data storage and analytics

E.Amazon RDS for MySQL for storing processed results

F.Amazon CloudWatch Logs for event storage

Why this answer

Amazon Kinesis Data Streams is correct because it is designed for real-time data ingestion of millions of events per second with low latency, providing durable storage and ordered processing. AWS Lambda is correct because it can process events from Kinesis in near real-time with automatic scaling, making it ideal for minimal-latency processing. Amazon Redshift is correct because it is a petabyte-scale data warehouse optimized for long-term storage and complex analytics on large datasets, supporting high-performance queries.
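
To get a feel for what "millions of events per second" means for a provisioned stream, the sketch below estimates a shard count from the per-shard write limits of 1,000 records per second and 1 MiB per second. The `required_shards` function and the sample traffic figures are hypothetical; on-demand capacity mode and read-side limits are not modelled.

```python
import math

RECORDS_PER_SHARD_PER_SECOND = 1_000
BYTES_PER_SHARD_PER_SECOND = 1024 * 1024  # 1 MiB per second per shard


def required_shards(records_per_second, avg_record_bytes):
    """Estimate shards needed to absorb a steady write rate."""
    by_records = math.ceil(records_per_second / RECORDS_PER_SHARD_PER_SECOND)
    by_bytes = math.ceil(
        records_per_second * avg_record_bytes / BYTES_PER_SHARD_PER_SECOND
    )
    # Whichever limit is hit first determines the shard count.
    return max(1, by_records, by_bytes)


# Hypothetical load: 2 million small events per second, 512 bytes each.
print(required_shards(2_000_000, 512))
```

Small records are usually bound by the record-count limit and large records by the byte limit, which is why the estimate takes the larger of the two.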

Exam trap

The trap here is that candidates often confuse Amazon SQS with Kinesis for streaming ingestion, or assume Amazon RDS can handle long-term analytics storage, but the exam tests the specific use case of high-throughput, low-latency streaming and petabyte-scale analytics.

25

MCQeasy

Based on the exhibit, which EBS volume type should the team use to meet the performance need at lower cost than overprovisioning capacity?

A.Use gp3 and provision the needed IOPS independently of volume size.

B.Use sc1 because it is optimized for infrequent access and large objects.

C.Use st1 because it provides high throughput for streaming data.

D.Use standard magnetic storage because it is compatible with all EC2 instances.

AnswerA

gp3 is the best fit because it lets you provision IOPS and throughput separately from volume size. The exhibit shows the workload needs around 10,000 IOPS and experiences queue buildup on gp2. With gp3, the team can raise performance without unnecessarily increasing storage capacity, which is usually more cost-effective for this kind of database workload.

Why this answer

The team needs to meet its performance requirement without paying for storage capacity it does not need. gp3 includes a baseline of 3,000 IOPS and 125 MiB/s of throughput at any volume size, and you can independently provision more IOPS (up to 16,000) and throughput (up to 1,000 MiB/s) without increasing the volume size. This avoids the cost of overprovisioning large gp2 volumes to reach higher IOPS, because gp2 performance is tied to volume size (3 IOPS per GiB).
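
The gp2-versus-gp3 arithmetic can be worked through in a few lines of Python. The 3 IOPS per GiB ratio, the 3,000 IOPS baseline, and the 10,000 IOPS target come from the explanation above; the function names are hypothetical, and pricing is deliberately left out.

```python
import math

GP2_IOPS_PER_GIB = 3       # gp2 baseline performance scales with size
GP3_BASELINE_IOPS = 3_000  # included with every gp3 volume


def gp2_size_needed_gib(target_iops):
    """Smallest gp2 volume whose size-based baseline reaches the target."""
    return math.ceil(target_iops / GP2_IOPS_PER_GIB)


def gp3_extra_iops(target_iops):
    """IOPS to provision on gp3 beyond the included baseline."""
    return max(0, target_iops - GP3_BASELINE_IOPS)


target = 10_000  # IOPS figure cited from the exhibit
print(gp2_size_needed_gib(target))  # 3334 GiB of gp2 just to reach the target
print(gp3_extra_iops(target))       # 7000 extra IOPS on gp3, at any size
```

On gp2 the only lever is size, so a database that needs far less than 3,334 GiB of data would still pay for that much storage; on gp3 the same target is reached by provisioning 7,000 additional IOPS on a volume sized for the data.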

Exam trap

The trap here is that candidates assume all EBS volume types require overprovisioning capacity to achieve higher IOPS, forgetting that gp3 decouples performance from size, making it the most cost-effective choice for workloads needing specific IOPS without large storage.

How to eliminate wrong answers

Option B is wrong because sc1 (Cold HDD) is designed for infrequently accessed, large sequential workloads with low cost per GB; its performance is measured in throughput rather than IOPS, with a burst model and a baseline of only 12 MiB/s per TiB, so it cannot meet a consistent IOPS requirement. Option C is wrong because st1 (Throughput Optimized HDD) is optimized for high sequential throughput for streaming, big data, and log processing, but it does not support independent IOPS provisioning and its baseline is 40 MiB/s per TiB, making it unsuitable for workloads needing predictable random IOPS. Option D is wrong because standard magnetic storage is a previous-generation volume type, offers very low performance (about 100 IOPS on average), and is not cost-effective compared to gp3 for this performance requirement.

26

MCQmedium

Your company needs a high-throughput, low-latency TCP service using a custom binary protocol. Requirements: preserve the original client source IP for rate limiting, keep latency minimal, and use TCP health checks. The current setup uses an Application Load Balancer and performance is inconsistent. Which load balancer choice best meets these requirements?

A.Keep the Application Load Balancer (ALB), because ALBs also preserve client source IP for TCP protocols.

B.Use a Network Load Balancer (NLB) with TCP listeners so traffic stays at Layer 4 and the original source IP is preserved.

C.Use Amazon API Gateway because it preserves client source IP and provides TCP health checks for all protocols.

D.Use Amazon CloudFront with an S3 origin, because CloudFront reduces latency for TCP-based protocols.

AnswerB

NLB is designed for Layer 4 TCP/UDP traffic with very low latency and high throughput. It supports TCP health checks and preserves the original client source IP by default, which enables accurate client-IP-based rate limiting for a custom TCP protocol.

Why this answer

A Network Load Balancer (NLB) operates at Layer 4 and preserves the original client source IP by default, which is essential for accurate rate limiting. Its TCP listeners provide low-latency, high-throughput handling of custom binary protocols, and it supports TCP health checks natively. This directly addresses the performance inconsistency seen with the Application Load Balancer, which operates at Layer 7 and introduces additional processing overhead.

Exam trap

The trap here is that candidates often assume Application Load Balancers can front any protocol and preserve the client source IP, but ALBs only handle HTTP/HTTPS traffic, expose the client IP through the X-Forwarded-For header rather than at the network layer, and add latency through Layer 7 processing.

How to eliminate wrong answers

Option A is wrong because an Application Load Balancer operates at Layer 7 (HTTP/HTTPS) and has no TCP listeners for a custom binary protocol; it also terminates the client connection and opens a new one to the target, so the source IP seen by the backend is the ALB's private IP. Option C is wrong because Amazon API Gateway is a fully managed service for creating REST, HTTP, and WebSocket APIs, not a load balancer; it does not support TCP listeners or TCP health checks, and it operates at Layer 7. Option D is wrong because Amazon CloudFront is a content delivery network (CDN) that caches content at edge locations; it works with HTTP/HTTPS and WebSocket rather than custom TCP protocols, an S3 origin cannot host a TCP service, and it does not preserve the original client source IP for TCP traffic.

27

Matchinghard

A company runs a stateless application tier behind an Application Load Balancer. Match each observed scaling pattern on the left to the best Auto Scaling strategy or metric on the right.

Concepts

Matches

Scale the Auto Scaling group on ALB RequestCountPerTarget.

Scale on SQS queue depth using a custom CloudWatch metric.

Use scheduled scaling to add capacity before the recurring surge.

Use target tracking on EC2 CPUUtilization.

Why these pairings

Each strategy fits a different load signal. Scaling on ALB RequestCountPerTarget suits load that rises with the number of requests each instance receives. Scaling on SQS queue depth, published as a custom CloudWatch metric, suits worker fleets whose backlog grows while CPU stays modest. Scheduled scaling suits a recurring, predictable surge because capacity can be added before the surge begins. Target tracking on EC2 CPUUtilization suits workloads where CPU is the bottleneck and follows demand closely.

28

MCQmedium

A high-volume telemetry pipeline writes streaming click events that must be processed by multiple independent consumers. Which service is most appropriate?

A.Amazon Kinesis Data Streams

B.AWS DataSync

C.Amazon EBS

D.Amazon Route 53

AnswerA

Kinesis Data Streams supports high-throughput event ingestion with multiple consumers reading from the stream.

Why this answer

Amazon Kinesis Data Streams is the most appropriate service because it is designed for real-time streaming data ingestion and processing. It can capture and store terabytes of data per hour from hundreds of thousands of sources, such as click events, and allows multiple independent consumers to read and process the same stream concurrently using the Kinesis Client Library (KCL) or enhanced fan-out with dedicated throughput.
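
The "multiple independent consumers" property can be illustrated with a toy in-memory stream. This is a conceptual sketch, not the Kinesis API: `ToyStream`, the consumer names, and the sample events are hypothetical, and shards, retention, and throughput limits are not modelled.

```python
class ToyStream:
    """Append-only log in which each consumer tracks its own read position."""

    def __init__(self):
        self._records = []
        self._positions = {}

    def put(self, record):
        self._records.append(record)

    def read(self, consumer, limit=100):
        """Return the next batch for this consumer without affecting others."""
        start = self._positions.get(consumer, 0)
        batch = self._records[start:start + limit]
        self._positions[consumer] = start + len(batch)
        return batch

    def rewind(self, consumer, position=0):
        """Replay from an earlier position; records are retained, not deleted."""
        self._positions[consumer] = position


stream = ToyStream()
for n in range(5):
    stream.put({"event": "click", "seq": n})

analytics_batch = stream.read("analytics")
fraud_batch = stream.read("fraud-detection")
print(len(analytics_batch), len(fraud_batch))
```

Both consumers receive every record because reading does not remove anything from the log, and `rewind` models replay. A queue, by contrast, hands each message to a single consumer and removes it once it has been processed.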

Exam trap

The trap here is that candidates may confuse Kinesis Data Streams with simpler messaging services like SQS or SNS, but the key differentiator is that Kinesis supports multiple independent consumers processing the same stream in real-time with replay capability, whereas SQS is designed for point-to-point message delivery and SNS for pub/sub with push-based fan-out.

How to eliminate wrong answers

Option B (AWS DataSync) is wrong because it is a data transfer service for moving large datasets between on-premises storage and AWS services, not for real-time streaming or multiple consumer processing. Option C (Amazon EBS) is wrong because it provides block-level storage volumes for EC2 instances, not a streaming data ingestion or processing capability. Option D (Amazon Route 53) is wrong because it is a DNS web service for domain name resolution and routing, not for handling streaming telemetry data.

29

MCQeasy

A system uses multiple AWS Lambda functions behind different event sources. One Lambda occasionally spikes and causes other Lambdas to be throttled due to shared concurrency limits. Which setting best helps ensure the important Lambda keeps capacity during spikes?

A.Increase the function timeout so throttling is less likely.

B.Set Reserved Concurrency for the important Lambda function.

C.Enable Provisioned Concurrency for every Lambda in the account.

D.Reduce the number of IAM policies attached to the Lambda roles.

AnswerB

Reserved concurrency allocates a guaranteed amount of concurrent execution capacity to a specific Lambda. This prevents other functions from consuming all concurrency and throttling the important one. If the reserved limit is reached, only that function is throttled, isolating impact.

Why this answer

Reserved Concurrency guarantees that the important Lambda function always has a set number of concurrent executions available, preventing other functions from consuming the account-level concurrency pool and throttling it during spikes. This setting isolates the function's capacity from shared contention, ensuring its performance remains stable.
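
The isolation effect can be modelled with a small admission check. This is a simplified sketch, assuming an account concurrency limit of 1,000; `can_start` and the function names are hypothetical, and real Lambda behaviour such as the minimum unreserved pool or burst limits is not modelled.

```python
ACCOUNT_LIMIT = 1_000  # assumed account-level concurrency limit


def can_start(function, in_flight, reserved, account_limit=ACCOUNT_LIMIT):
    """Return True if one more invocation of `function` would be admitted."""
    if function in reserved:
        # A function with reserved concurrency draws only from its own allocation.
        return in_flight.get(function, 0) < reserved[function]
    unreserved_pool = account_limit - sum(reserved.values())
    unreserved_in_use = sum(
        count for name, count in in_flight.items() if name not in reserved
    )
    return unreserved_in_use < unreserved_pool


# Without a reservation, a spike in "noisy" starves "checkout".
print(can_start("checkout", {"checkout": 50, "noisy": 950}, reserved={}))

# With 200 reserved for "checkout", the same spike cannot take its capacity.
print(can_start("checkout", {"checkout": 50, "noisy": 800}, reserved={"checkout": 200}))
```

The reservation works in both directions in this model: it guarantees capacity for the important function and also caps that function at the reserved amount.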

Exam trap

The trap here is confusing Provisioned Concurrency (which reduces cold starts) with Reserved Concurrency (which guarantees capacity), leading candidates to pick Option C even though it does not solve the throttling issue.

How to eliminate wrong answers

Option A is wrong because increasing the function timeout does not affect concurrency limits; it only extends the maximum execution duration, which could actually increase the chance of throttling by holding concurrency slots longer. Option C is wrong because Provisioned Concurrency is primarily a cold-start control that keeps execution environments initialized; enabling it for every function adds cost, still counts toward the account's total concurrency, and does not cap the spiking function, which can continue to draw on the shared pool. Option D is wrong because reducing IAM policies affects permissions, not concurrency limits; it has no impact on Lambda's throttling behavior.

30

MCQeasy

A retail API uses EC2 instances behind an ALB. CPU is consistently high during peak traffic, and request latency rises. What should be configured? The architecture review board prefers a managed AWS-native control.

A.Auto Scaling policy based on an appropriate CloudWatch metric

B.S3 Object Lock

C.A VPC endpoint for CloudWatch only

D.Disable health checks

AnswerA

Auto Scaling adds capacity when load increases and removes it when load falls.

Why this answer

The correct answer is A because an Auto Scaling policy based on an appropriate CloudWatch metric (e.g., CPUUtilization or ALB RequestCountPerTarget) dynamically adds or removes EC2 instances to match demand, preventing sustained high CPU and rising latency. This is a managed, AWS-native control that aligns with the architecture review board's preference.

Exam trap

The trap here is that candidates may confuse operational controls (like health checks or VPC endpoints) with scaling mechanisms, or assume that disabling health checks somehow improves performance, when in fact it worsens reliability and latency.

How to eliminate wrong answers

Option B is wrong because S3 Object Lock is a data protection feature for Amazon S3 objects (write-once-read-many, WORM) and has no role in scaling compute resources or reducing request latency. Option C is wrong because a VPC endpoint for CloudWatch only provides private connectivity to CloudWatch APIs, not scaling or performance improvement for EC2 instances behind an ALB. Option D is wrong because disabling health checks would cause the ALB to continue routing traffic to unhealthy instances, increasing latency and potentially causing failures, which is the opposite of a high-performance architecture.

31

Multi-Selecthard

A latency-sensitive mobile game backend uploads large files to S3 from users around the world. Which two features can improve upload performance? The team wants the control to be enforceable during normal operations.

Select 2 answers

A.S3 Object Lock

B.S3 multipart upload

C.S3 Inventory

D.S3 Transfer Acceleration

AnswersB, D

Multipart upload parallelizes large object upload parts and improves reliability.

Why this answer

S3 multipart upload is correct because it allows a large file to be split into parts that are uploaded in parallel, which improves throughput and limits the cost of a network failure to retrying a single part. This is ideal for a latency-sensitive mobile game backend where users worldwide upload large files. S3 Transfer Acceleration is correct because it routes uploads through the nearest AWS edge location and then over the AWS network to the bucket, shortening the long-haul public internet path for distant users.
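
How a large object breaks into parts can be sketched without touching S3. The limits used are the S3 minimum part size of 5 MiB (except for the last part) and the maximum of 10,000 parts; `plan_parts`, the 8 MiB default, and the 20 MiB example object are hypothetical.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB  # S3 minimum for every part except the last
MAX_PARTS = 10_000       # S3 maximum number of parts per upload


def plan_parts(object_size, part_size=8 * MIB):
    """Return (part_number, offset, length) tuples covering the whole object."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 minimum")
    parts = []
    offset = 0
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((len(parts) + 1, offset, length))
        offset += length
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    return parts


# A 20 MiB object becomes two full 8 MiB parts and one final 4 MiB part.
for part in plan_parts(20 * MIB):
    print(part)
```

Each tuple describes a byte range that a client could upload on its own connection, which is where the parallelism comes from. In practice the AWS SDKs' managed transfer utilities perform this planning automatically.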

Exam trap

The trap here is that candidates may be drawn to S3 Object Lock because its name sounds like an access or performance feature when it is actually a retention control, or may dismiss S3 Transfer Acceleration as too invasive when it only requires enabling the bucket setting and sending uploads to the accelerate endpoint.

32

MCQmedium

A read-heavy document portal repeatedly queries the same product catalogue data from DynamoDB with millisecond latency requirements. Which service can reduce read latency and table load?

A.Amazon Kinesis Data Firehose

B.S3 Transfer Acceleration

C.DynamoDB Accelerator (DAX)

D.AWS Glue Data Catalog

AnswerC

DAX is an in-memory cache for DynamoDB that reduces read latency for suitable access patterns.

Why this answer

DynamoDB Accelerator (DAX) is a fully managed, in-memory cache for Amazon DynamoDB that delivers up to 10x read performance improvement by caching frequently accessed data. For a read-heavy workload querying the same product catalogue data, DAX reduces read latency to microseconds and offloads read requests from the DynamoDB table, lowering consumed read capacity units and table load.

Exam trap

The trap here is that candidates often confuse caching services (DAX) with data transfer acceleration (S3 Transfer Acceleration) or data ingestion (Kinesis Data Firehose), failing to recognize that the core requirement is to reduce DynamoDB read latency and table load, which only a dedicated in-memory cache like DAX can achieve.

How to eliminate wrong answers

Option A is wrong because Amazon Kinesis Data Firehose is a streaming data ingestion service for loading data into data stores or analytics tools, not a caching layer for DynamoDB read operations. Option B is wrong because S3 Transfer Acceleration speeds up uploads and downloads to Amazon S3 over long distances using AWS edge locations, but it does not cache DynamoDB data or reduce read latency for DynamoDB queries. Option D is wrong because AWS Glue Data Catalog is a metadata repository for data assets used in ETL and analytics, not a cache for DynamoDB read requests.
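
The way a read-through cache lowers table load can be sketched in plain Python. This is a conceptual model rather than the DAX client: `ReadThroughCache`, the `load_from_table` stand-in, and the 300-second TTL are assumptions chosen for illustration.

```python
import time


class ReadThroughCache:
    """Serve repeated reads from memory and fall back to a loader on a miss."""

    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expiry time)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Miss or expired entry: read the backing table once and remember it.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (value, now + self._ttl)
        return value


table_reads = []


def load_from_table(key):
    """Stand-in for a DynamoDB read; records each call that reaches the table."""
    table_reads.append(key)
    return {"sku": key, "name": f"Product {key}"}


cache = ReadThroughCache(load_from_table)
for _ in range(1000):
    cache.get("sku-42")
print(len(table_reads), cache.hits, cache.misses)
```

In this run only the first request reaches the table and the remaining 999 are served from memory, which is the reduction in consumed read capacity the explanation describes. The cost is staleness: an item can be out of date for up to the TTL.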

33

MCQhard

Based on the exhibit, which change best reduces latency during peak traffic without overprovisioning the fleet?

A.Replace the instances with a larger instance family so each server has more headroom.

B.Change the Auto Scaling policy to target tracking on ALB RequestCountPerTarget.

C.Use scheduled scaling to add instances only during the business hours peak window.

D.Replace the ALB with a Network Load Balancer to reduce request latency.

AnswerB

RequestCountPerTarget matches the actual demand reaching each instance and scales capacity before the thread pool saturates. Because CPU is still low, CPU-based scaling would react too late or not at all. Target tracking on request count helps keep queue depth and latency down while avoiding unnecessary overprovisioning during quieter periods.

Why this answer

Option B is correct because using a target tracking scaling policy on ALB RequestCountPerTarget dynamically adjusts the fleet size based on the actual load per instance, ensuring that capacity scales with demand during peak traffic without manual intervention or overprovisioning. This approach directly addresses latency caused by high request rates per instance by maintaining a target request count, which reduces response time without adding unnecessary instances during off-peak periods.
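
The idea behind tracking a per-target request count can be approximated with a simple proportional calculation. This is a rough sketch and not the algorithm Auto Scaling actually runs, which works through CloudWatch alarms, cooldowns, and instance warm-up; `desired_capacity`, the group bounds, and the traffic figures are hypothetical.

```python
import math


def desired_capacity(total_requests_per_minute, target_per_instance,
                     min_size=1, max_size=20):
    """Instances needed so each one handles roughly the target request count."""
    if target_per_instance <= 0:
        raise ValueError("target_per_instance must be positive")
    needed = math.ceil(total_requests_per_minute / target_per_instance)
    # Keep the result within the Auto Scaling group's configured bounds.
    return max(min_size, min(max_size, needed))


# Hypothetical traffic levels against a target of 1,000 requests per instance.
for load in (500, 12_000, 50_000):
    print(load, desired_capacity(load, 1_000))
```

Because the input is request volume rather than CPU, capacity follows demand even when CPU stays low, and the fleet shrinks again as traffic falls.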

Exam trap

The trap here is that candidates confuse 'reducing latency' with 'improving network throughput' (Option D) or 'static capacity increases' (Option A), missing that dynamic scaling based on per-target request count directly addresses the latency caused by overloaded instances during peak traffic.

How to eliminate wrong answers

Option A is wrong because simply replacing instances with a larger family increases per-instance capacity but does not automatically scale the fleet; it leads to overprovisioning during low traffic and fails to adapt to variable peak loads, wasting cost and not reducing latency efficiently. Option C is wrong because scheduled scaling adds instances only during a fixed business hours window, which cannot handle unpredictable peak traffic spikes outside that window, leaving the fleet either under-provisioned or over-provisioned. Option D is wrong because replacing the ALB with a Network Load Balancer (NLB) reduces transport-layer latency but does not address the root cause of latency—high request load per target—and NLB lacks the application-layer metrics (like RequestCountPerTarget) needed for intelligent Auto Scaling based on request volume.

34

MCQmedium

A global video platform serves mostly static images and JavaScript files from an S3 origin. Users in distant countries report slow load times. What should improve performance most? The architecture review board prefers a managed AWS-native control.

A.A larger S3 bucket

B.Amazon CloudFront distribution with the S3 bucket as origin

C.RDS read replicas

D.An EC2 Auto Scaling group in one Region

AnswerB

CloudFront caches content at edge locations close to users, reducing latency.

Why this answer

Amazon CloudFront is a global content delivery network (CDN) that caches static content (images, JavaScript) at edge locations closer to users, drastically reducing latency for distant countries. By using the S3 bucket as the origin, CloudFront offloads requests from S3 and accelerates delivery via HTTP/2, TCP optimizations, and persistent connections. This is the most effective managed AWS-native solution for improving global load times for static assets.

Exam trap

The trap here is that candidates may confuse scaling storage (larger bucket) or compute (Auto Scaling) with performance improvement, overlooking that latency for static content is primarily a network distance problem solved by a CDN like CloudFront.

How to eliminate wrong answers

Option A is wrong because S3 buckets have no configurable size, and storing more data does not improve transfer speed or reduce latency; requests from distant users still travel to the bucket's Region. Option C is wrong because RDS read replicas are designed for scaling database read traffic, not for serving static files or accelerating HTTP content delivery. Option D is wrong because an EC2 Auto Scaling group in a single Region does not reduce latency for users in distant countries; it only provides regional scalability and fault tolerance, not global edge caching.


35

MCQmedium

A DynamoDB table stores device status items. The partition key is deviceId, and the partition distribution is healthy (no single partition dominates). However, during peak periods the application experiences high read latency because many clients repeatedly request the latest status for the same devices. Which action best improves read latency without changing the DynamoDB partitioning model?

A.Add Amazon DAX as a caching layer in front of DynamoDB and route repeated read operations through DAX.

B.Change the partition key to a random value for each request to eliminate hot partitions.

C.Increase write capacity only, because writes generally determine read latency in DynamoDB.

D.Create an additional Global Secondary Index (GSI) and read exclusively from the index to accelerate reads.

AnswerA

Amazon DAX is an in-memory caching layer for DynamoDB that accelerates repeated reads. When many clients request the same items (for example, “latest status” point reads by deviceId), DAX can serve cached responses directly, reducing round trips to DynamoDB and lowering read latency during peak periods.

Why this answer

Amazon DAX is a fully managed, in-memory cache for DynamoDB that provides microsecond read latency for cached items. By caching the results of repeated GetItem and Query requests for the same device status items, DAX offloads read traffic from the underlying DynamoDB table, reducing the number of read capacity units consumed and avoiding a round trip to the table's storage for each repeated read. This directly addresses the high read latency during peak periods without altering the existing partition key or partitioning model.
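The read-through behaviour described above can be illustrated with a small in-process simulation. This is not the DAX client: `TtlReadThroughCache`, `fetch_from_table`, and the 5-second TTL are hypothetical stand-ins chosen only to show how repeated reads of one item collapse into a single table read.

```python
import time
from typing import Callable, Dict, Tuple


class TtlReadThroughCache:
    """Toy read-through cache: serve hits from memory, load from the table on a miss."""

    def __init__(self, loader: Callable[[str], dict], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[float, dict]] = {}
        self.table_reads = 0  # how many reads actually reached the table

    def get(self, device_id: str) -> dict:
        now = self._clock()
        cached = self._items.get(device_id)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # cache hit: no table read
        self.table_reads += 1
        item = self._loader(device_id)
        self._items[device_id] = (now, item)
        return item


def fetch_from_table(device_id: str) -> dict:
    """Hypothetical stand-in for a DynamoDB GetItem call."""
    return {"deviceId": device_id, "status": "ONLINE"}


cache = TtlReadThroughCache(fetch_from_table, ttl_seconds=5.0)
for _ in range(1000):
    cache.get("device-42")
print(cache.table_reads)
```

With a thousand identical reads inside the TTL window, only the first one reaches the loader; the rest are answered from memory, which is the effect the question is asking for.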

Exam trap

The trap here is that candidates may think a GSI can magically speed up reads, but GSIs do not provide caching and still read from the same storage layer, so they do not reduce latency for repeated identical queries.

How to eliminate wrong answers

Option B is wrong because changing the partition key to a random value would break the ability to query for the latest status of a specific device, as the partition key is used to identify the device; this would require a complete redesign of the access pattern and data model. Option C is wrong because increasing write capacity does not reduce read latency; read latency is primarily affected by the number of read requests and the time to fetch data from storage, not by write capacity. Option D is wrong because creating a Global Secondary Index (GSI) does not inherently accelerate reads; while a GSI can support different query patterns, it still reads from the same underlying storage and does not provide caching, so repeated reads for the same items would still incur the same latency.


36

MCQmedium

A containerized service fleet running on EC2 instances needs to share user-uploaded files and access them with low latency. The workload is bursty: sometimes dozens of instances concurrently read the same directory for short periods, and then traffic drops. Which Amazon EFS configuration best matches these performance needs?

A.Use Amazon EFS General Purpose performance mode and Throughput mode set to Bursting.

B.Use Amazon EFS Max I/O performance mode with Throughput mode set to Provisioned.

C.Use Amazon EFS General Purpose performance mode with Throughput mode set to Provisioned.

D.Use Amazon EFS Max I/O performance mode with Throughput mode set to Bursting.

AnswerA

EFS General Purpose performance mode is designed for latency-sensitive use cases with a broad range of I/O sizes, including typical file-sharing and web-content workloads. Throughput mode Bursting provides baseline throughput and allows throughput to scale up during demand spikes, which matches the pattern of short read bursts from many instances. When traffic drops, the system returns to baseline without requiring you to provision peak throughput for all time.

Why this answer

Option A is correct because the workload is latency-sensitive and bursty. General Purpose performance mode has the lowest per-operation latency, which suits dozens of instances concurrently reading the same directory. Bursting Throughput mode lets the file system accumulate burst credits during idle periods and spend them during short spikes, matching the described pattern without paying for provisioned throughput.
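The credit mechanism can be sketched as a simple token bucket. This is a simplified model, not EFS's actual accounting: the function name, the one-second time step, and every number in the usage note are assumptions made for illustration.

```python
def simulate_burst_credits(demand_mibps, baseline_mibps, burst_mibps, initial_credits_mib):
    """Return (throughput served each second, credits left at the end).

    Each entry in demand_mibps is one second of requested throughput in MiB/s.
    """
    credits = float(initial_credits_mib)
    served = []
    for demand in demand_mibps:
        # Throughput above the baseline is only available while credits remain.
        ceiling = burst_mibps if credits > 0 else baseline_mibps
        rate = min(demand, ceiling)
        # Running below baseline earns credits; running above it spends them.
        credits = max(0.0, credits + (baseline_mibps - rate))
        served.append(rate)
    return served, credits


# Nine idle seconds bank credits, then two seconds of heavy reads arrive.
served, remaining = simulate_burst_credits(
    [0] * 9 + [100, 100], baseline_mibps=10, burst_mibps=100, initial_credits_mib=0
)
print(served, remaining)
```

In this toy run the first heavy second is served at the burst rate using the banked credits, and the second falls back to the baseline once they are spent, which is why Bursting mode suits short spikes separated by quiet periods.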

Exam trap

The trap here is that candidates often assume Max I/O is always better for high concurrency, but they overlook that General Purpose mode provides lower per-operation latency for directory-heavy bursty reads, which is the actual requirement.

How to eliminate wrong answers

Option B is wrong because Max I/O performance mode is designed for highly parallelized workloads (e.g., thousands of instances) and accepts higher per-operation latency in exchange for aggregate throughput, which is not suitable for low-latency access to the same directory. Option C is wrong because Provisioned Throughput mode is intended for steady-state throughput requirements and would waste cost on a bursty workload that could use Bursting mode's credit-based model. Option D is wrong because Max I/O performance mode is not optimal for low-latency, directory-heavy access patterns, and while Bursting mode fits the bursty nature, the combination with Max I/O undermines the low-latency requirement.


37

MCQeasy

A customer-facing application has a relational data model and needs frequent complex queries (joins and aggregations), but it also experiences a significant read-heavy workload. Which design choice best improves read performance while keeping relational features?

A.Use DynamoDB with a single partition key and avoid indexes to keep writes simple.

B.Add read replicas to an RDS or Aurora cluster and keep the primary for writes.

C.Store the data in S3 and query it directly from the application without a database.

D.Switch the database to DynamoDB but keep using the same relational SQL queries and joins.

AnswerB

Read replicas offload read operations from the primary database instance, improving read throughput and reducing contention with writes. RDS/Aurora preserve relational capabilities like joins and SQL queries. This is a common and practical way to scale performance for read-heavy workloads without completely changing the data model.

Why this answer

Adding read replicas to an RDS or Aurora cluster offloads read traffic from the primary instance, improving read performance for complex queries (joins and aggregations) while preserving the full relational data model and SQL capabilities. Aurora’s distributed storage layer also allows replicas to serve reads with minimal replication lag, making this the optimal choice for read-heavy workloads that require relational features.

Exam trap

The trap here is that candidates often assume NoSQL (DynamoDB) is always the best choice for read-heavy workloads, overlooking that complex relational queries and joins are not supported, making read replicas on RDS/Aurora the correct relational scaling solution.

How to eliminate wrong answers

Option A is wrong because DynamoDB with a single partition key and no indexes would severely limit query flexibility and performance for complex joins and aggregations, which are not natively supported by NoSQL. Option C is wrong because S3 is an object store with no native support for relational queries, joins, or aggregations; querying it directly would require scanning entire datasets and implementing complex logic in the application, leading to poor performance and high latency. Option D is wrong because DynamoDB does not support SQL joins or complex relational queries; attempting to use the same SQL queries would fail or require significant application-level re-engineering, defeating the purpose of keeping relational features.


38

MCQmedium

A global mobile game backend serves mostly static images and JavaScript files from an S3 origin. Users in distant countries report slow load times. What should improve performance most? The design must avoid adding custom operational scripts.

A.RDS read replicas

B.Amazon CloudFront distribution with the S3 bucket as origin

C.A larger S3 bucket

D.An EC2 Auto Scaling group in one Region

AnswerB

CloudFront caches content at edge locations close to users, reducing latency.

Why this answer

Amazon CloudFront is a content delivery network (CDN) that caches static content at edge locations worldwide, reducing latency for users in distant countries by serving files from the nearest edge. Using the S3 bucket as the origin, CloudFront distributes the content globally without requiring any custom operational scripts, directly addressing the slow load times for static assets.

Exam trap

The trap here is that candidates may confuse database read replicas (Option A) with content delivery, not realizing that static asset acceleration requires a CDN like CloudFront, not a database scaling solution.

How to eliminate wrong answers

Option A is wrong because RDS read replicas are designed to offload read traffic from a relational database, not to accelerate delivery of static files stored in S3. Option C is wrong because increasing the S3 bucket size does not improve performance; S3 performance is independent of bucket size and does not reduce latency for distant users. Option D is wrong because an EC2 Auto Scaling group in a single Region does not provide global edge caching; it only scales compute capacity within one geographic area, failing to reduce latency for users in other regions.


39

MCQhard

A DynamoDB table for a travel booking site has a partition key based only on the current date. Write throttling occurs during business hours. What is the best design change? The design must avoid adding custom operational scripts.

A.Create a global secondary index with the same date key

B.Move the table to S3 Glacier Instant Retrieval

C.Reduce the table's write capacity

D.Use a higher-cardinality partition key that distributes writes across partitions

AnswerD

A low-cardinality hot partition causes throttling; a better key spreads writes more evenly.

Why this answer

Option D is correct because using a low-cardinality partition key like the current date causes all writes to land on a single partition, leading to throttling. By designing a higher-cardinality key (e.g., combining date with a random suffix or user ID), writes are distributed evenly across partitions, fully utilizing the provisioned write capacity without custom scripts.
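A minimal sketch of one way to build such a key, assuming a hypothetical `bookingId` attribute and an arbitrary shard count of 10 (neither appears in the question). A hash-derived suffix is used here instead of a random one so that the same booking always maps to the same key.

```python
import hashlib
from collections import Counter

SHARD_COUNT = 10  # hypothetical; size it to the write rate that must be absorbed


def sharded_partition_key(booking_date: str, booking_id: str,
                          shard_count: int = SHARD_COUNT) -> str:
    """Append a deterministic shard suffix derived from the booking ID."""
    digest = hashlib.sha256(booking_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shard_count
    return f"{booking_date}#{shard}"


# All writes share one date, but now spread over several partition key values.
keys = [sharded_partition_key("2024-05-01", f"booking-{i}") for i in range(10_000)]
distribution = Counter(keys)
print(sorted(distribution.items()))
```

The trade-off is on the read side: fetching every booking for a date now means querying each shard suffix and combining the results.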

Exam trap

The trap here is that candidates may think adding a GSI (Option A) solves the issue, but GSIs inherit the same partition key design flaws and can also throttle independently.

How to eliminate wrong answers

Option A is wrong because a global secondary index (GSI) with the same date key would still concentrate writes on a single partition in the index, replicating the throttling issue. Option B is wrong because S3 Glacier Instant Retrieval is a storage class for infrequently accessed objects, not a replacement for DynamoDB's transactional write throughput, and moving the table would break the application's access pattern. Option C is wrong because reducing write capacity would worsen throttling during business hours, not solve the underlying partition hot-spotting problem.


40

MCQeasy

A travel booking site uses EC2 instances behind an ALB. CPU is consistently high during peak traffic, and request latency rises. What should be configured? The architecture review board prefers a managed AWS-native control.

A.A VPC endpoint for CloudWatch only

B.Auto Scaling policy based on an appropriate CloudWatch metric

C.S3 Object Lock

D.Disable health checks

AnswerB

Auto Scaling adds capacity when load increases and removes it when load falls.

Why this answer

The correct answer is B because an Auto Scaling policy based on an appropriate CloudWatch metric (such as CPUUtilization or request latency) dynamically adds or removes EC2 instances to match demand. This managed AWS-native control directly addresses high CPU and rising latency during peak traffic by scaling out capacity, which is the preferred approach per the architecture review board's requirement for a managed solution.
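As a rough illustration, the sketch below builds the keyword arguments for a target tracking policy in the shape that boto3's Auto Scaling `put_scaling_policy` call accepts, plus a proportional estimate of the resulting fleet size. The group name, the 50% target, and the `estimate_desired_capacity` helper are assumptions for the example; the helper is only an approximation and not the service's actual scaling algorithm.

```python
import math


def target_tracking_policy(asg_name: str, target_cpu_percent: float) -> dict:
    """Build keyword arguments for a CPU-based target tracking scaling policy."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": target_cpu_percent,
        },
    }


def estimate_desired_capacity(current_instances: int, observed_cpu: float,
                              target_cpu: float) -> int:
    """Proportional estimate of the fleet size that returns average CPU to target."""
    return max(1, math.ceil(current_instances * observed_cpu / target_cpu))


policy = target_tracking_policy("booking-web-asg", 50.0)  # hypothetical group name
print(policy["TargetTrackingConfiguration"]["TargetValue"])
print(estimate_desired_capacity(4, 90.0, 50.0))
```

The dictionary would be passed as `client.put_scaling_policy(**policy)` on a boto3 `autoscaling` client; no AWS call is made in the sketch itself.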

Exam trap

The trap here is that candidates may confuse monitoring (VPC endpoints) or data protection (S3 Object Lock) with performance scaling, overlooking that Auto Scaling is the direct AWS-native solution for handling variable load and high CPU.

How to eliminate wrong answers

Option A is wrong because a VPC endpoint for CloudWatch only provides private connectivity to CloudWatch APIs, not scaling or performance improvement; it does not reduce CPU load or latency. Option C is wrong because S3 Object Lock is a data protection feature for preventing object deletion or overwrites in S3, unrelated to EC2 performance or scaling. Option D is wrong because disabling health checks would cause the ALB to route traffic to unhealthy instances, worsening latency and availability, not solving the high CPU issue.


41

MCQeasy

An application repeatedly reads the same DynamoDB items with very low latency requirements. The application can tolerate slightly stale data (for example, within a few seconds). You want to improve read latency without changing the existing DynamoDB table schema. Which service is the best choice?

A.Amazon DAX

B.Amazon S3 Transfer Acceleration

C.Amazon EFS

D.AWS CloudTrail for data plane reads

AnswerA

Amazon DAX is an in-memory cache specifically designed for DynamoDB reads. It can significantly reduce read latency for frequently accessed items. Because the application can tolerate brief staleness, DAX’s caching behavior is appropriate and does not require a DynamoDB schema change.

Why this answer

Amazon DAX (DynamoDB Accelerator) is an in-memory cache that sits between your application and DynamoDB, providing microsecond read latency for frequently accessed items. Since the application can tolerate slightly stale data (within seconds), DAX's TTL-based caching is a good fit, provided the cache TTL is configured to match that tolerance (the default item-cache TTL is 5 minutes). It reduces read pressure on DynamoDB while serving cached results with significantly lower latency than direct DynamoDB reads.

Exam trap

The trap here is confusing a caching layer (DAX) with unrelated acceleration or storage services (S3 Transfer Acceleration, EFS) or with auditing tools (CloudTrail), leading candidates to pick options that don't address DynamoDB read latency at all.

How to eliminate wrong answers

Option B (Amazon S3 Transfer Acceleration) is wrong because it accelerates transfers to and from S3 over long distances using edge locations, not DynamoDB reads, and does not address DynamoDB latency. Option C (Amazon EFS) is wrong because it is a file storage service for EC2 instances, not a cache for DynamoDB items, and introduces network filesystem overhead incompatible with sub-millisecond read requirements. Option D (AWS CloudTrail for data plane reads) is wrong because CloudTrail records API activity for auditing, not caching or accelerating data reads; enabling it adds cost without improving read performance.


42

MCQmedium

A DynamoDB-backed multi-tenant app experiences throttling. Most write traffic for tenant 'ACME' targets a single logical stream of events (you write items for ACME in near-real time). The table currently uses partition key = tenantId and sort key = eventTimestamp. CloudWatch shows partition-level throttling concentrated in the ACME partition. What design change most directly improves write throughput for the hottest tenant while still enabling efficient queries for recent events for that tenant?

A.Add a Global Secondary Index (GSI) with the same partition key (tenantId) and eventTimestamp, and rely on the GSI to spread load.

B.Mitigate the hotspot by changing the partition key to include a shard value (for example, tenantId + '#' + shardId) and write using shardId. Query recent events by fanning out across ACME shards and merging results by eventTimestamp.

C.Increase the table’s write capacity (or on-demand baseline) without changing the partition key, because DynamoDB will automatically balance hotspots.

D.Switch the sort key to a random value to prevent writes from landing on the same physical partition.

AnswerB

In DynamoDB, the partition key controls which physical partitions receive traffic for that key value. By adding shardId into the partition key, ACME writes are distributed across multiple partitions, increasing aggregate write capacity and reducing partition-level throttling. Efficient recent-event queries are still possible by querying each ACME shard for the relevant time range (using eventTimestamp as the sort key) and merging the ordered results.

Why this answer

Option B is correct because it directly addresses the partition-level throttling by introducing a shard key (e.g., tenantId + '#' + shardId) as the partition key, which distributes ACME's write load across multiple physical partitions. To query recent events for ACME, the application must fan out queries across all shards and merge results by eventTimestamp, which is efficient because each shard holds a subset of the data and the sort key remains eventTimestamp for ordering.
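The fan-out-and-merge read path can be sketched as follows. The per-shard lists stand in for the results of one Query per shard, each returned newest-first; the helper names, the shard count of 3, and the integer timestamps are all hypothetical.

```python
import heapq
from itertools import islice


def shard_keys(tenant_id: str, shard_count: int) -> list:
    """List every partition key value that holds data for the tenant."""
    return [f"{tenant_id}#{shard}" for shard in range(shard_count)]


def latest_events(per_shard_results, limit: int) -> list:
    """Merge per-shard lists (each sorted newest-first) and keep the newest `limit`."""
    merged = heapq.merge(
        *per_shard_results,
        key=lambda item: item["eventTimestamp"],
        reverse=True,
    )
    return list(islice(merged, limit))


# Stand-ins for three per-shard query results, already in descending time order.
shard_results = [
    [{"eventTimestamp": 105}, {"eventTimestamp": 101}],
    [{"eventTimestamp": 104}, {"eventTimestamp": 102}],
    [{"eventTimestamp": 103}],
]
print(shard_keys("ACME", 3))
print([e["eventTimestamp"] for e in latest_events(shard_results, 3)])
```

Because each shard's results are already ordered by the sort key, the merge is lazy and only pulls as many items as the caller asks for.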

Exam trap

The trap here is that candidates often think increasing provisioned capacity or switching to on-demand mode will automatically resolve a hot partition, but DynamoDB's per-partition throughput limit (3,000 RCU and 1,000 WCU) is a hard ceiling that cannot be overcome without redistributing the partition key.

How to eliminate wrong answers

Option A is wrong because adding a GSI with the same partition key (tenantId) does not spread the write load; the base table still experiences the same hotspot, and the GSI inherits the same partition-level throttling. Option C is wrong because adding table-level write capacity does not distribute a single hot key's load across more partitions; the per-partition ceiling still applies, so ACME's writes continue to throttle. Option D is wrong because changing the sort key to a random value would prevent efficient queries for recent events (since sort key ordering is lost) and does not change the partition key, so writes still target the same physical partition.


43

MCQhard

Based on the exhibit, a serverless API on AWS Lambda experiences a predictable cold-start penalty every weekday at 09:00 UTC when a marketing campaign begins. The team wants the first requests to stay fast while minimizing extra cost during quiet periods. What is the best approach?

A.Enable provisioned concurrency on the published version and schedule it to scale up shortly before the spike.

B.Increase the Lambda timeout so cold starts have more time to complete.

C.Move the function behind an Application Load Balancer to improve warm-up behavior.

D.Increase the function memory to the maximum value and leave concurrency unchanged.

AnswerA

Provisioned concurrency keeps warm execution environments ready for the alias or version, which removes the cold-start penalty. Scheduling it only around the known spike keeps performance high while limiting unnecessary cost during idle periods.

Why this answer

Provisioned concurrency pre-warms a specified number of Lambda execution environments so that incoming requests do not incur a cold start. By scheduling the provisioned concurrency to scale up just before the 09:00 UTC spike and scale down afterward, the team eliminates the cold-start penalty during the campaign while minimizing cost during quiet periods. This directly addresses the predictable, time-bound traffic pattern without requiring code changes or over-provisioning.
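One way to express that schedule is with two Application Auto Scaling scheduled actions. The sketch below only builds the keyword arguments in the shape boto3's `application-autoscaling` `put_scheduled_action` call accepts; the function name `checkout-api`, the alias `live`, the count of 50, the 08:45 warm-up, and the 11:00 UTC scale-down are all assumptions, since the question does not say how large or how long the spike is.

```python
def provisioned_concurrency_schedule(function_name: str, alias: str,
                                     warm_environments: int) -> list:
    """Build scale-up and scale-down scheduled actions for a Lambda alias."""
    common = {
        "ServiceNamespace": "lambda",
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
    }
    scale_up = {
        **common,
        "ScheduledActionName": "warm-before-campaign",
        # 08:45 UTC on weekdays, shortly before the 09:00 spike.
        "Schedule": "cron(45 8 ? * MON-FRI *)",
        "ScalableTargetAction": {
            "MinCapacity": warm_environments,
            "MaxCapacity": warm_environments,
        },
    }
    scale_down = {
        **common,
        "ScheduledActionName": "release-after-campaign",
        # Hypothetical end of the busy window.
        "Schedule": "cron(0 11 ? * MON-FRI *)",
        "ScalableTargetAction": {"MinCapacity": 0, "MaxCapacity": 0},
    }
    return [scale_up, scale_down]


for action in provisioned_concurrency_schedule("checkout-api", "live", 50):
    print(action["ScheduledActionName"], action["Schedule"])
```

Each dictionary would be passed to `put_scheduled_action(**action)` after the alias has been registered as a scalable target; the sketch itself makes no AWS calls.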

Exam trap

The trap here is that candidates confuse increasing memory or timeout with solving cold starts, or they mistakenly think an ALB can pre-warm Lambda, when in fact only provisioned concurrency guarantees warm containers for the first requests in a predictable traffic spike.

How to eliminate wrong answers

Option B is wrong because increasing the Lambda timeout does not prevent cold starts; it only extends the maximum execution duration, which has no effect on the initialization latency of a new execution environment. Option C is wrong because placing the function behind an Application Load Balancer does not warm up the Lambda; ALB is a request router and does not maintain warm containers or alter Lambda's scaling behavior. Option D is wrong because increasing memory (which also increases CPU allocation) can reduce cold-start duration but does not eliminate it, and setting memory to the maximum value (10,240 MB) would significantly increase cost without guaranteeing zero cold starts for the first requests.


44

MCQeasy

A data processing application runs on a single EC2 instance and needs persistent block storage with sustained low-latency random read/write performance (high IOPS). Which storage choice is most appropriate?

A.EBS io2 provisioned IOPS SSD

B.Amazon S3 Standard

C.Amazon EFS for POSIX file sharing between multiple instances

D.EBS Throughput Optimized HDD (st1) storage

AnswerA

EBS io2 is built for high-performance, low-latency block storage with provisioned IOPS.

Why this answer

Amazon EBS io2 Provisioned IOPS SSD volumes are designed for I/O-intensive workloads that require sustained, low-latency random read/write performance with high IOPS. They provide consistent performance by allowing you to provision a specific IOPS level (up to 256,000 IOPS per volume) and are designed for 99.999% durability, making them ideal for a single EC2 instance needing persistent block storage.

Exam trap

The trap here is that candidates often confuse throughput-optimized HDD (st1) with IOPS-optimized SSD (io2) because both are EBS volume types, but st1 is designed for sequential, not random, I/O workloads.

How to eliminate wrong answers

Option B is wrong because Amazon S3 is an object storage service accessed via HTTP/HTTPS, not a block storage device, and cannot be attached directly to an EC2 instance for low-latency random read/write operations. Option C is wrong because Amazon EFS is a file-level, NFS-based storage service designed for shared access across multiple EC2 instances, not for providing persistent block storage with sustained low-latency random I/O to a single instance. Option D is wrong because EBS Throughput Optimized HDD (st1) volumes are optimized for large, sequential workloads (e.g., big data, log processing) and cannot deliver the sustained low-latency random I/O performance required for high IOPS workloads.


45

MCQeasy

Based on the exhibit, which Amazon EFS performance mode is the best fit for this workload?

A.Use General Purpose performance mode for low-latency access.

B.Use Max I/O performance mode to optimize for the highest possible latency tolerance.

C.Use One Zone storage class to increase metadata speed.

D.Use Provisioned Throughput mode because it is the only performance mode available.

AnswerA

General Purpose is the best EFS performance mode when the priority is low latency for small file operations. The exhibit describes a moderate number of clients and latency-sensitive metadata access, which matches the strengths of General Purpose. It is the usual choice for most applications unless the workload specifically needs very large-scale parallel throughput.

Why this answer

General Purpose performance mode is the best fit for this workload because it provides the lowest per-operation latency, which is critical for applications like content management, web serving, or home directories that depend on fast metadata and small-file operations. Max I/O mode, in contrast, accepts higher latency in exchange for higher aggregate throughput and IOPS, making it unsuitable for latency-sensitive workloads.

Exam trap

The trap here is that candidates confuse performance modes (General Purpose vs. Max I/O) with throughput modes (Bursting vs. Provisioned) or storage classes (Standard vs. One Zone), leading them to select options that address throughput or availability instead of latency requirements.

How to eliminate wrong answers

Option B is wrong because Max I/O performance mode is designed for high aggregate throughput and IOPS at the cost of higher latency; it is intended for large-scale, parallel workloads like big data analytics, not latency-sensitive access. Option C is wrong because One Zone is a storage class that keeps data in a single Availability Zone, not a performance mode, and it does not affect metadata speed. Option D is wrong because Provisioned Throughput is a throughput mode, not a performance mode; Amazon EFS offers two performance modes (General Purpose and Max I/O) and separate throughput modes (Bursting, Provisioned, and Elastic).


46

Multi-Selecthard

A latency-sensitive video platform uploads large files to S3 from users around the world. Which two features can improve upload performance?

Select 2 answers

A.S3 Object Lock

B.S3 Transfer Acceleration

C.S3 multipart upload

D.S3 Inventory

AnswersB, C

Transfer Acceleration uses optimized edge paths into AWS for long-distance S3 transfers.

Why this answer

S3 Transfer Acceleration (B) uses AWS edge locations to route uploads over optimized network paths, reducing latency and packet loss for global users. S3 multipart upload (C) allows large files to be uploaded in parallel parts, improving throughput and enabling retries of individual parts without restarting the entire upload.
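A minimal sketch of how a client might split a large object into parts before uploading them in parallel. The 5 MiB minimum part size and the 10,000-part ceiling are S3 limits; the `plan_parts` helper and the 64 MiB preferred part size are assumptions for illustration.

```python
import math

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last
MAX_PARTS = 10_000               # S3 maximum number of parts per upload


def plan_parts(object_size: int, preferred_part_size: int = 64 * 1024 * 1024) -> list:
    """Return (part_number, start_byte, end_byte_inclusive) tuples covering the object."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    part_size = max(preferred_part_size, MIN_PART_SIZE)
    # Grow the part size if the preferred size would need more than MAX_PARTS parts.
    part_size = max(part_size, math.ceil(object_size / MAX_PARTS))
    parts = []
    start = 0
    number = 1
    while start < object_size:
        end = min(start + part_size, object_size) - 1
        parts.append((number, start, end))
        start = end + 1
        number += 1
    return parts


MIB = 1024 * 1024
for part in plan_parts(200 * MIB):
    print(part)
```

Each tuple describes an independent byte range, so parts can be sent concurrently and a failed part can be retried on its own without restarting the whole upload.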

Exam trap

The trap here is that candidates may confuse S3 Transfer Acceleration with CloudFront or think multipart upload is only for resumability, when in fact both features directly address latency and throughput for global, large-file uploads.


47

MCQmedium

A high-volume analytics dashboard writes streaming click events that must be processed by multiple independent consumers. Which service is most appropriate? The architecture review board prefers a managed AWS-native control.

A.Amazon Route 53

B.Amazon EBS

C.Amazon Kinesis Data Streams

D.AWS DataSync

AnswerC

Kinesis Data Streams supports high-throughput event ingestion with multiple consumers reading from the stream.

Why this answer

Amazon Kinesis Data Streams is the most appropriate service because it is a managed, AWS-native solution designed for real-time streaming data ingestion and processing. It durably stores records (24 hours by default, extendable up to 365 days) and allows multiple independent consumers (e.g., Lambda, Kinesis Data Analytics, EC2) to read from the same stream concurrently using enhanced fan-out or shared throughput, meeting the requirement for high-volume click event processing.

Exam trap

The trap here is that candidates may confuse Amazon Kinesis Data Streams with Amazon Kinesis Data Firehose, but Firehose is a near-real-time delivery service that does not support multiple independent consumers reading the same data stream concurrently.

How to eliminate wrong answers

Option A is wrong because Amazon Route 53 is a DNS web service that translates domain names to IP addresses; it does not ingest, store, or process streaming data. Option B is wrong because Amazon EBS provides block-level storage volumes for EC2 instances; it is not a streaming data service and cannot support multiple independent consumers reading a continuous data stream. Option D is wrong because AWS DataSync is a data transfer service for moving large datasets between on-premises storage and AWS services (e.g., S3, EFS) over the network; it is designed for batch transfers, not real-time streaming or concurrent consumer processing.


48

Multi-Selecthard

A retail analytics table stores events in Amazon DynamoDB with partition key tenantId and sort key eventTime. During a promotion, one tenant generates most writes and repeatedly polls the same latest-status items, causing throttling on a single partition key and high latency on reads. The business can tolerate read results that are a few seconds stale. Which two changes will most effectively reduce throttling and latency? Select two.

Select 2 answers

A.Introduce write sharding by adding a bounded random suffix to the hot tenant partition key and fan out reads across the shards.

B.Add DynamoDB Accelerator (DAX) in front of the table for the repeated status reads.

C.Keep the same key design and increase only the table’s provisioned RCUs and WCUs.

D.Replace the table reads with a Scan operation to distribute the load across all partitions.

E.Move the table to another Availability Zone so the hot tenant uses a different storage node.

AnswersA, B

Sharding spreads the hot tenant’s traffic across multiple partitions so DynamoDB is no longer forced to serve all writes through one physical partition. Querying across the shard set restores access to the tenant’s data while reducing throttling. This is the standard fix when a single partition key becomes a hot spot.

Why this answer

Option A is correct because write sharding distributes the hot tenant's writes across multiple partitions by appending a bounded random suffix to the partition key, preventing a single partition from throttling. Reads then fan out across all shards and merge the results, which keeps the tenant's data queryable without changing the overall data model. Option B is correct because DAX serves the repeatedly polled latest-status items from memory, and the business's tolerance for results that are a few seconds stale makes cached reads acceptable.

Exam trap

The trap here is that candidates often assume DAX alone can fix both read and write throttling, but DAX only caches reads and does not address the write-side partition bottleneck that causes throttling in the first place.


49

Multi-Selecthard

A media company serves versioned JavaScript and CSS files from Amazon S3 through CloudFront. After each release, the cache hit ratio drops sharply because the same distribution also fronts a personalized API path, and the current cache policy forwards cookies, all query strings, and several headers to every origin request. The static assets already use content-hashed filenames. Which two changes will most directly improve cache hit ratio for the static assets without changing the application behavior? Select two.

Select 2 answers

A.Create a dedicated cache behavior for the static asset path that excludes cookies, query strings, and unneeded headers from the cache key.

B.Keep the content-hashed filenames and send long Cache-Control max-age and immutable headers for the versioned objects.

C.Increase the size of the S3 bucket’s underlying storage to absorb more origin traffic.

D.Add Lambda@Edge logic to append a timestamp to every asset request so updates are always fetched immediately.

E.Disable compression so CloudFront can treat each object as a separate cache entry.

AnswersA, B

Separating the static asset behavior lets CloudFront cache those objects independently from the personalized API. Excluding cookies, query strings, and unnecessary headers prevents cache fragmentation, so many viewers can reuse the same cached object. This is the most direct way to raise hit ratio without altering how the application serves assets.

Why this answer

Option A is correct because creating a dedicated cache behavior for the static asset path (e.g., /static/*) allows you to configure a cache policy that excludes cookies, query strings, and unneeded headers from the cache key. Since the static assets use content-hashed filenames, they are immutable and do not vary by user-specific attributes. By removing these variables from the cache key, CloudFront can serve the same cached object to all users, drastically improving the cache hit ratio. Option B is correct because a content-hashed URL never changes its content, so a long Cache-Control max-age with the immutable directive lets CloudFront and browsers keep each versioned object until a release publishes a new filename.

Exam trap

The trap here is that candidates may think that content-hashed filenames alone guarantee high cache hit ratios, but they overlook that the shared cache policy forwarding cookies and query strings creates many unique cache keys for the same static file, negating the benefit of hashed filenames.

50

MCQmedium

A high-volume analytics dashboard writes streaming click events that must be processed by multiple independent consumers. Which service is most appropriate? The design must avoid adding custom operational scripts.

A.Amazon Route 53

B.Amazon EBS

C.Amazon Kinesis Data Streams

D.AWS DataSync

AnswerC

Kinesis Data Streams supports high-throughput event ingestion with multiple consumers reading from the stream.

Why this answer

Amazon Kinesis Data Streams is the correct choice because it is designed for real-time streaming data ingestion and processing by multiple independent consumers. Each consumer can read from the stream at its own pace using its own shard iterator, enabling parallel processing of click events without custom scripts. This aligns with the requirement for a high-volume analytics dashboard where multiple downstream applications need to consume the same stream independently.

Exam trap

The trap here is that candidates may confuse Kinesis Data Streams with Kinesis Data Firehose, but Firehose delivers data to a single destination and does not support multiple independent consumers natively.

How to eliminate wrong answers

Option A is wrong because Amazon Route 53 is a DNS and traffic management service, not designed for streaming data ingestion or processing. Option B is wrong because Amazon EBS provides block-level storage volumes for EC2 instances, not a streaming data platform for multiple consumers. Option D is wrong because AWS DataSync is a data transfer service for moving large datasets between on-premises storage and AWS services, not for real-time streaming or multi-consumer processing.

51

MCQhard

A document portal needs low-latency full-text search across product descriptions and filtered attributes. Which managed service is most suitable?

A.Amazon OpenSearch Service

B.AWS Config

C.Amazon EFS

D.Amazon SQS

AnswerA

OpenSearch is designed for search and analytics over indexed text and structured fields.

Why this answer

Amazon OpenSearch Service is purpose-built for full-text search, log analytics, and real-time application monitoring. It provides low-latency indexing and querying of unstructured and semi-structured data, making it ideal for searching product descriptions and filtering attributes. The service uses a RESTful API and supports advanced query DSL for complex search operations.

Exam trap

The trap here is that candidates may confuse a storage service (EFS) or a messaging service (SQS) with a search service, or mistakenly think AWS Config can be used for text search because it stores configuration data in a queryable format.

How to eliminate wrong answers

Option B (AWS Config) is wrong because it is a resource inventory and compliance auditing service, not a search engine; it cannot perform full-text search across product descriptions. Option C (Amazon EFS) is wrong because it is a scalable file storage service for Linux-based workloads, not a search or indexing service; it provides shared file access but no search capabilities. Option D (Amazon SQS) is wrong because it is a fully managed message queuing service for decoupling application components, not a search service; it cannot index or query text data.

52

Multi-Selectmedium

A team is splitting a new workload into two fronts. The first front serves HTTPS microservices that need host- and path-based routing plus health checks. The second front must handle TCP and UDP traffic for a real-time service and preserve static IP addresses for firewall allowlisting. Which two AWS load balancer choices best match these requirements? Select two.

Select 2 answers

A.Application Load Balancer

B.Network Load Balancer

C.Amazon API Gateway

D.Amazon CloudFront

E.Gateway Load Balancer

AnswersA, B

Application Load Balancer supports HTTP and HTTPS routing with host- and path-based rules, making it ideal for microservices.

Why this answer

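The Application Load Balancer (ALB) is correct because it supports host-based and path-based routing for HTTP/HTTPS traffic, which is essential for the microservices front. It also provides health checks at the target group level, enabling automatic routing away from unhealthy instances. ALB operates at Layer 7, making it ideal for the HTTPS microservices requirement. The Network Load Balancer (NLB) is correct for the second front because it operates at Layer 4, supports TCP and UDP listeners, and exposes one static IP address per Availability Zone (optionally an Elastic IP), which suits firewall allowlisting.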

Exam trap

The trap here is that candidates often confuse the Gateway Load Balancer (GWLB) with the Network Load Balancer (NLB), but GWLB is built for inserting transparent network appliances (such as firewalls) into the traffic path, not for fronting a TCP/UDP application behind static IP addresses.

53

MCQmedium

An API team runs an AWS Lambda function behind an Application Load Balancer (ALB). During predictable hourly traffic spikes, p95 response latency increases due to occasional cold starts. The team wants stable latency during those spikes without permanently overprovisioning resources for all functions. Which configuration is the most appropriate way to reduce cold starts for this Lambda function?

A.Publish a version of the function and configure provisioned concurrency on an alias, using autoscaling for the alias.

B.Increase the function memory size and rely on faster initialization to reduce cold starts.

C.Set reserved concurrency equal to the expected peak requests per second for the function.

D.Use an event source mapping with a higher batch size so Lambda triggers earlier and keeps the runtime warm.

AnswerA

Provisioned concurrency pre-initializes execution environments for a specific published function version, or for an alias that points to one (it cannot be set on $LATEST). By attaching provisioned concurrency to an alias, you can control warm capacity and use Application Auto Scaling (scheduled actions or target tracking) to raise it ahead of predictable spikes and lower it afterwards, reducing cold-start-driven latency increases.

Why this answer

Provisioned concurrency initializes a specified number of execution environments in advance, keeping them warm and ready to handle requests without cold start latency. By configuring provisioned concurrency on an alias with autoscaling, the team can dynamically adjust the number of pre-warmed environments to match predictable traffic spikes, avoiding permanent overprovisioning while ensuring stable p95 latency.

Exam trap

The trap here is confusing reserved concurrency (which limits concurrency but does not prevent cold starts) with provisioned concurrency (which pre-warms environments), leading candidates to select Option C as a cost-saving measure that fails to address latency.

How to eliminate wrong answers

Option B is wrong because increasing memory size can shorten initialization for some runtimes (e.g., Java, .NET) but does not prevent cold starts from occurring, so latency during spikes remains unpredictable. Option C is wrong because reserved concurrency caps the function's maximum concurrent executions (and reserves that share of the account limit) but does not pre-initialize any environments, leaving cold starts intact. Option D is wrong because event source mappings apply only to poll-based sources such as Kinesis, DynamoDB Streams, and SQS, not to ALB invocations, and a higher batch size simply delivers more records per invocation; it does not keep the runtime warm.
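
As a rough sketch of how the alias-level provisioned concurrency in Option A could be scheduled around hourly spikes, the Python below only builds the request parameters for Application Auto Scaling's RegisterScalableTarget and PutScheduledAction operations. The function name, alias, capacities, and schedule are hypothetical; with boto3 the dicts would be passed as keyword arguments to the `application-autoscaling` client's `register_scalable_target` and `put_scheduled_action` methods.

```python
def resource_id(function_name: str, alias: str) -> str:
    """Application Auto Scaling identifies a Lambda alias as function:<name>:<alias>."""
    return f"function:{function_name}:{alias}"


def scalable_target(function_name: str, alias: str,
                    min_capacity: int, max_capacity: int) -> dict:
    """Parameters for RegisterScalableTarget on the alias's provisioned concurrency."""
    return {
        "ServiceNamespace": "lambda",
        "ResourceId": resource_id(function_name, alias),
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }


def scheduled_action(function_name: str, alias: str, name: str,
                     cron_fields: str, capacity: int) -> dict:
    """Parameters for PutScheduledAction that pin capacity at a given time."""
    return {
        "ServiceNamespace": "lambda",
        "ScheduledActionName": name,
        "ResourceId": resource_id(function_name, alias),
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
        "Schedule": f"cron({cron_fields})",
        "ScalableTargetAction": {"MinCapacity": capacity, "MaxCapacity": capacity},
    }


FUNCTION = "orders-api"  # hypothetical function name
ALIAS = "live"           # hypothetical alias pointing at a published version

target = scalable_target(FUNCTION, ALIAS, min_capacity=5, max_capacity=100)
# Warm up five minutes before each hour, release capacity at quarter past.
warm_up = scheduled_action(FUNCTION, ALIAS, "pre-spike-warm-up", "55 * * * ? *", 80)
cool_down = scheduled_action(FUNCTION, ALIAS, "post-spike-cool-down", "15 * * * ? *", 5)
```

Scheduled actions suit the predictable hourly pattern in the question; a target tracking policy on provisioned concurrency utilization would be the alternative when the spike timing is less regular.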

54

MCQeasy

Your web application runs on EC2 instances behind an Application Load Balancer (ALB). During traffic spikes, p95 response time increases, but average CPU utilization remains below 40%. The current Auto Scaling policy scales based on average CPU%. What should you change to improve performance during spikes?

A.Keep scaling on CPU% to avoid over-scaling

B.Scale on a request-driven metric such as ALB RequestCount per target (or target-group request rate)

C.Disable scaling and manually increase capacity during business hours

D.Scale only when network packet drops fall below a threshold

AnswerB

A request-driven metric correlates directly with incoming workload pressure. Scaling on request rate helps ensure enough capacity is added before request queues build up, which can reduce p95 response time even when CPU remains low.

Why this answer

The p95 response time is increasing during traffic spikes while CPU utilization remains low, indicating that the bottleneck is not compute capacity but rather request handling or connection overhead. By scaling on ALB RequestCountPerTarget, you directly target the metric causing latency—each target's request load—rather than an indirect metric like CPU. This ensures that new instances are launched precisely when individual targets are overwhelmed by requests, reducing queueing delays and improving response times.

Exam trap

The trap here is that candidates assume high latency always means high CPU, but AWS tests the understanding that p95 latency can spike due to request queueing even when CPU is idle, making request-based scaling the correct choice over CPU-based scaling.

How to eliminate wrong answers

Option A is wrong because continuing to scale on CPU% ignores the actual symptom (high p95 latency with low CPU), leading to under-provisioning during request bursts. Option C is wrong because manual scaling during business hours is not elastic and cannot react to unpredictable traffic spikes, violating the principle of auto scaling for performance. Option D is wrong because scaling on network packet drops is irrelevant to the described issue (low CPU, high latency) and packet drops typically indicate network congestion or buffer exhaustion, not request overload on the application layer.
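
To make the request-driven scaling idea concrete, here is a simplified Python model of sizing a group from request load. It is an assumption-laden sketch, not the algorithm Auto Scaling actually runs: the function name, the per-target request target, and the group limits are all hypothetical.

```python
import math


def desired_capacity(total_requests_per_min: int, target_per_instance: int,
                     min_size: int, max_size: int) -> int:
    """Instances needed so each one handles about target_per_instance requests."""
    needed = math.ceil(total_requests_per_min / target_per_instance)
    # Clamp to the group's configured bounds.
    return max(min_size, min(max_size, needed))


# 12,000 requests per minute with a target of 1,000 per instance.
spike = desired_capacity(12_000, 1_000, min_size=2, max_size=20)
quiet = desired_capacity(500, 1_000, min_size=2, max_size=20)
```

Note that CPU never appears in the calculation: the group grows because request load per target rises, which is exactly the signal the CPU-based policy in the question was missing.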

55

MCQhard

A media archive needs low-latency full-text search across product descriptions and filtered attributes. Which managed service is most suitable? The design must avoid adding custom operational scripts.

A.AWS Config

B.Amazon OpenSearch Service

C.Amazon EFS

D.Amazon SQS

AnswerB

OpenSearch is designed for search and analytics over indexed text and structured fields.

Why this answer

Amazon OpenSearch Service is the correct choice because it provides managed, low-latency full-text search capabilities with support for filtering on structured attributes (e.g., product categories, price ranges). It indexes JSON documents and exposes a RESTful API for search queries, eliminating the need for custom operational scripts while meeting the media archive's requirements.

Exam trap

The trap here is that candidates might confuse AWS Config's resource tracking or EFS's file storage with search capabilities, overlooking that OpenSearch Service is the only managed option purpose-built for full-text search and filtering.

How to eliminate wrong answers

Option A is wrong because AWS Config is a service for auditing and evaluating resource configurations against compliance rules, not for full-text search or indexing product descriptions. Option C is wrong because Amazon EFS is a scalable NFS file system for shared storage, not a search engine; it cannot perform low-latency full-text queries across text content. Option D is wrong because Amazon SQS is a managed message queue for decoupling application components, not a search or indexing service, and it does not support querying stored data.

56

MCQhard

Based on the exhibit, which storage choice best matches the workload requirements?

A.Use io2 EBS volumes because they provide the highest durable block storage performance.

B.Use instance store NVMe for the temporary processing workspace.

C.Use Amazon EFS for the workspace so the temporary files survive instance replacement.

D.Use S3 as the working directory and read and write the intermediate files directly there.

AnswerB

Instance store fits a high-IOPS scratch workload where data can be lost safely and rebuilt from S3. The benchmark shows extremely low latency and very high random I/O performance, which is ideal for intermediate transcode files. Because the job can be retried from the source object, persistence is not needed on the local workspace.

Why this answer

Instance store NVMe volumes provide temporary, ephemeral block storage directly attached to the EC2 instance, offering extremely low latency and high throughput for temporary processing workspaces. Since the workload requires a temporary workspace where data does not need to persist beyond the instance lifecycle, instance store is the optimal choice because it avoids the cost and overhead of durable storage while delivering the highest performance for scratch data.

Exam trap

The trap here is that candidates often choose durable storage options like EBS or EFS because they are familiar and seem 'safer,' failing to recognize that the workload explicitly requires a temporary workspace where data does not need to persist, making instance store the most performant and cost-effective choice.

How to eliminate wrong answers

Option A is wrong because io2 EBS volumes are designed for durable, persistent block storage with provisioned IOPS, which is unnecessary and cost-inefficient for temporary processing data that can be rebuilt from the source. Option C is wrong because Amazon EFS is a durable, shared network file system; the workload does not need its files to survive instance replacement, and EFS adds network latency that a scratch workspace with heavy random I/O should avoid. Option D is wrong because S3 is object storage accessed over an HTTP API: each read or write is a separate request with far higher latency than local NVMe, and objects cannot be modified in place, making it unsuitable as a working directory for high-frequency intermediate I/O.

57

Drag & Dropmedium

Arrange the steps to troubleshoot an EC2 instance that is unreachable via SSH.

Why this order

Start with security group, then NACL, routing, public IP, and finally OS-level logs.

58

MCQhard

Based on the exhibit, which design change is the best way to reduce the observed read latency for this DynamoDB-backed service?

A.Add a DynamoDB Accelerator (DAX) cluster in front of the table and send repeated read traffic through it.

B.Increase the on-demand table limits so DynamoDB can automatically absorb more traffic.

C.Create a global secondary index on tenantId to distribute the load across more partitions.

D.Move the dashboard data into S3 and use Lambda functions to read it on demand.

AnswerA

DAX is designed to accelerate repeated eventually consistent reads from DynamoDB by caching hot items in memory. The exhibit shows one tenant driving most of the reads and the same dashboard items being requested repeatedly within a short window, which is an excellent fit for DAX. It reduces latency and offloads the hot key without requiring a schema redesign.

Why this answer

Adding a DynamoDB Accelerator (DAX) cluster in front of the table reduces read latency by providing an in-memory cache that serves repeated eventually consistent reads with microsecond response times instead of going to the underlying table. This directly addresses the observed latency for frequently accessed data, as DAX is optimized for read-heavy workloads. Note that DAX caches only eventually consistent reads; strongly consistent reads are passed through to DynamoDB and are not served from the cache.

Exam trap

The trap here is that candidates confuse increasing throughput capacity (Option B) with reducing latency, not realizing that extra capacity prevents throttling but does not make an individual DynamoDB read faster, and that caching (DAX) is the correct solution for repeated read-heavy workloads.

How to eliminate wrong answers

Option B is wrong because increasing on-demand table limits does not inherently reduce read latency; on-demand scaling handles throughput capacity but does not improve the per-request latency of DynamoDB's storage layer. Option C is wrong because a global secondary index (GSI) on tenantId does not cache anything and does not spread the hot tenant's reads: every item for that tenant shares the same index partition key, and each read still goes to DynamoDB's storage layer. Option D is wrong because moving dashboard data to S3 and using Lambda to read it on demand introduces additional latency from S3 GET requests and Lambda cold starts, which is typically slower than DynamoDB's single-digit millisecond reads, especially for repeated access patterns.
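
The effect of putting a cache in front of repeated reads can be illustrated with a small read-through cache in Python. This is a generic sketch of the caching idea, not DAX's implementation or API; the class name, the 300-second TTL, and the item shape are assumptions.

```python
import time
from typing import Callable


class ReadThroughCache:
    """Serve repeated reads from memory; fall back to the loader on a miss."""

    def __init__(self, load: Callable[[str], dict], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, dict]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> dict:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: read from the backing table and cache the result.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (now + self._ttl, value)
        return value


table_reads = []  # records every read that reaches the backing table


def read_from_table(key: str) -> dict:
    table_reads.append(key)
    return {"pk": key, "widgets": 3}


cache = ReadThroughCache(read_from_table, ttl_seconds=300)
for _ in range(100):
    cache.get("tenant-42#dashboard")
```

Only the first of the hundred reads reaches the table; the rest are served from memory, which is why a cache helps both latency and load on the hot key. The TTL is also the source of staleness, so this only fits reads that tolerate eventual consistency.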

59

MCQhard

A DynamoDB table for a retail API has a partition key based only on the current date. Write throttling occurs during business hours. What is the best design change? The team wants the control to be enforceable during normal operations.

A.Use a higher-cardinality partition key that distributes writes across partitions

B.Create a global secondary index with the same date key

C.Reduce the table's write capacity

D.Move the table to S3 Glacier Instant Retrieval

AnswerA

A low-cardinality hot partition causes throttling; a better key spreads writes more evenly.

Why this answer

A is correct because using a partition key based solely on the current date creates a hot partition — all writes for a given day go to a single partition, causing throttling during peak hours. Increasing the partition key's cardinality (e.g., by appending a random suffix or a user ID) distributes writes evenly across multiple partitions, allowing DynamoDB to use its full write capacity without throttling. This design change is enforceable during normal operations because it modifies the data model rather than relying on temporary capacity adjustments.

Exam trap

The trap here is that candidates often treat throttling as a capacity or indexing problem and reach for a capacity change (Option C) or an extra index (Option B), missing the root cause: a hot partition created by a low-cardinality partition key, a classic DynamoDB design anti-pattern tested in SAA-C03.

How to eliminate wrong answers

Option B is wrong because creating a global secondary index (GSI) with the same date key does not solve the hot partition issue — the GSI would inherit the same skewed write pattern and could itself become throttled, and GSIs do not redistribute writes to the base table. Option C is wrong because reducing the table's write capacity would worsen throttling during business hours, not resolve it; the problem is uneven distribution, not insufficient total capacity. Option D is wrong because S3 Glacier Instant Retrieval is an object storage class for infrequently accessed data with millisecond retrieval, not a replacement for DynamoDB's low-latency, high-throughput transactional workloads, and moving the table would break the API's real-time access requirements.

60

Multi-Selecthard

A rendering service runs on a single EC2 instance and writes a large working set of metadata to disk using sustained random reads and writes. The data must persist across stops and restarts, and the team sees queue depth spikes when the job reaches peak throughput. Which changes should the team make? Select three.

Select 3 answers

A.Use an Amazon EBS io2 volume with provisioned IOPS for the metadata store.

B.Run the workload on a Nitro-based, EBS-optimized instance that has enough EBS bandwidth.

C.Place the EC2 instance and the EBS volume in the same Availability Zone.

D.Move the working set to Amazon EFS because it automatically stripes across Availability Zones.

E.Store the metadata in Amazon S3 because object storage is cheaper and supports random writes.

AnswersA, B, C

Correct because io2 is designed for high, sustained IOPS with low latency. Provisioned IOPS is the right control when random disk activity, not capacity, is the bottleneck.

Why this answer

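Option A is correct because an Amazon EBS io2 volume with provisioned IOPS is designed for I/O-intensive workloads with sustained random reads and writes, such as the rendering service's metadata store. The io2 volume type offers high durability (99.999%) and consistent low-latency performance, which directly addresses the queue depth spikes caused by peak throughput demands. Provisioned IOPS ensures the volume can handle the required random I/O without throttling, meeting the persistence requirement across stops and restarts. Option B is correct because a Nitro-based, EBS-optimized instance provides dedicated bandwidth to EBS, so the instance does not become the bottleneck in front of a fast volume. Option C is correct because an EBS volume can only be attached to an instance in the same Availability Zone.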

Exam trap

The trap here is that candidates may treat Amazon EFS or S3 as suitable for random I/O workloads. The exam tests understanding that Provisioned IOPS EBS volumes such as io2 are the persistent block storage option built for sustained random reads and writes with consistent low latency, while EFS is a shared network file system and S3 is object storage that cannot update data in place.

61

Multi-Selecthard

A latency-sensitive video platform uploads large files to S3 from users around the world. Which two features can improve upload performance? The architecture review board prefers a managed AWS-native control.

Select 2 answers

A.S3 Object Lock

B.S3 Transfer Acceleration

C.S3 multipart upload

D.S3 Inventory

AnswersB, C

Transfer Acceleration uses optimized edge paths into AWS for long-distance S3 transfers.

Why this answer

S3 Transfer Acceleration (B) uses AWS edge locations to accelerate uploads over long distances by routing traffic through the AWS global network, reducing latency and packet loss compared to the public internet. Multipart upload (C) improves performance by splitting large files into smaller parts that can be uploaded in parallel, increasing throughput and allowing retries of individual parts without restarting the entire upload.

Exam trap

The trap here is that candidates may confuse S3 Transfer Acceleration with CloudFront or think multipart upload is only for reliability, not performance, while overlooking that both features are managed AWS-native controls that directly address latency and throughput for large file uploads.
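
The way multipart upload splits a large file can be sketched as a part-planning function in Python. The 5 MiB minimum part size (for every part except the last) and the 10,000-part limit are S3's documented constraints; the 64 MiB default and the function name `plan_parts` are assumptions for the example, and no upload is actually performed.

```python
import math

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB  # S3 minimum for every part except the last
MAX_PARTS = 10_000       # S3 maximum number of parts in one upload


def plan_parts(object_size: int, part_size: int = 64 * MIB) -> list[tuple[int, int, int]]:
    """Return (part_number, first_byte, last_byte) for each part of the upload."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 minimum")
    # Grow the part size if the object would otherwise need too many parts.
    part_size = max(part_size, math.ceil(object_size / MAX_PARTS))
    parts = []
    offset = 0
    number = 1
    while offset < object_size:
        last_byte = min(offset + part_size, object_size) - 1
        parts.append((number, offset, last_byte))
        offset = last_byte + 1
        number += 1
    return parts


parts = plan_parts(1000 * MIB)  # a 1000 MiB video file
```

Each tuple is an independent byte range, so the parts can be uploaded in parallel and a failed part can be retried without resending the others.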

62

MCQmedium

A video platform uses Amazon Aurora. The workload has many short-lived database connections from Lambda functions, causing connection storms. What should be added? The architecture review board prefers a managed AWS-native control.

A.S3 Select

B.An internet gateway

C.A larger Route 53 hosted zone

D.RDS Proxy

AnswerD

RDS Proxy pools and manages database connections, improving scalability for serverless and bursty workloads.

Why this answer

RDS Proxy is the correct choice because it manages a pool of reusable database connections, allowing Lambda functions to share and reuse connections rather than opening and closing them with each invocation. This mitigates connection storms by multiplexing many short-lived Lambda connections over a smaller set of long-lived database connections, reducing the load on the Aurora database, all as a fully managed AWS-native service.

Exam trap

The trap here is that candidates may confuse connection pooling with scaling the database instance or adding network components, but the correct solution is a dedicated proxy layer that manages connections at the application-to-database boundary.

How to eliminate wrong answers

Option A is wrong because S3 Select is a service for retrieving subsets of data from objects in Amazon S3 using SQL expressions; it has no role in managing database connections or mitigating connection storms. Option B is wrong because an internet gateway enables VPC-to-internet communication for public subnets; it does not handle database connection pooling or reduce the number of connections to Aurora. Option C is wrong because a larger Route 53 hosted zone increases the capacity for DNS records but does not affect database connection management or prevent connection storms.
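
The connection-pooling idea behind this answer can be illustrated with a toy pool in Python. It is a sketch of the general technique only, not how RDS Proxy is implemented; `FakeConnection`, `ConnectionPool`, and the pool size of 5 are hypothetical.

```python
import queue


class FakeConnection:
    """Stand-in for an expensive database connection."""

    opened = 0  # counts how many connections were ever created

    def __init__(self) -> None:
        FakeConnection.opened += 1

    def execute(self, sql: str) -> str:
        return f"ok: {sql}"


class ConnectionPool:
    """Hand a fixed set of long-lived connections to many short-lived callers."""

    def __init__(self, size: int) -> None:
        self._idle: queue.Queue = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeConnection())

    def run(self, sql: str) -> str:
        conn = self._idle.get()  # blocks while every connection is busy
        try:
            return conn.execute(sql)
        finally:
            self._idle.put(conn)  # return the connection for reuse


pool = ConnectionPool(size=5)
# One thousand short-lived "invocations" share the same five connections.
results = [pool.run("SELECT 1") for _ in range(1000)]
```

The database sees five connections regardless of how many callers arrive, which is the property that protects it during a burst of Lambda invocations.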

63

MCQeasy

A compute workload uses temporary scratch space for intermediate results (reproducible), and it can tolerate data loss if the instance is terminated. The workload benefits from very high local I/O throughput. Which storage option is the best fit for the scratch data?

A.Amazon EBS General Purpose (gp3) volumes to persist intermediate results across reboots.

B.Amazon EFS for a shared file system between multiple instances.

C.Instance store for local temporary files that can be lost when the instance stops.

D.Amazon S3 for scratch data so it is always durable and accessible from anywhere.

AnswerC

Instance store is designed for temporary high-performance local storage and is acceptable when loss is tolerable.

Why this answer

Instance store volumes provide very high local I/O throughput because they are physically attached to the host server, making them ideal for temporary scratch data that is reproducible and can tolerate loss. Since the workload explicitly accepts data loss on instance termination and does not require persistence across reboots, instance store is the best fit for this use case.

Exam trap

The trap here is that candidates often choose EBS gp3 (Option A) because they assume all block storage is persistent and high-performance, overlooking the fact that instance store offers even higher local throughput and is explicitly designed for temporary, loss-tolerant workloads.

How to eliminate wrong answers

Option A is wrong because Amazon EBS gp3 volumes, while offering good performance, have lower maximum IOPS and throughput compared to instance store and are designed for persistent block storage, which is unnecessary for scratch data that can be regenerated. Option B is wrong because Amazon EFS is a network file system that introduces latency and throughput limitations, and the workload does not require shared access between multiple instances. Option D is wrong because Amazon S3 is object storage with higher latency and lower throughput than local storage, and it is designed for durable, accessible data, not for high-performance temporary scratch space.

64

MCQmedium

A global mobile game backend serves mostly static images and JavaScript files from an S3 origin. Users in distant countries report slow load times. What should improve performance most? The architecture review board prefers a managed AWS-native control.

A.RDS read replicas

B.Amazon CloudFront distribution with the S3 bucket as origin

C.A larger S3 bucket

D.An EC2 Auto Scaling group in one Region

AnswerB

CloudFront caches content at edge locations close to users, reducing latency.

Why this answer

Amazon CloudFront is a global content delivery network (CDN) that caches static content (images, JavaScript) at edge locations close to users, drastically reducing latency. By using the S3 bucket as the origin, CloudFront offloads requests from S3 and serves cached objects from the nearest edge, which directly addresses slow load times for distant users. This is a managed AWS-native service that aligns with the architecture review board's preference.

Exam trap

The trap here is that candidates may think increasing S3 bucket size or using RDS replicas can improve static content delivery, but the core issue is geographic latency, which only a CDN like CloudFront can solve by caching content at edge locations.

How to eliminate wrong answers

Option A is wrong because RDS read replicas are designed to offload read traffic from a relational database, not to accelerate delivery of static files stored in S3; they have no effect on S3 latency. Option C is wrong because S3 buckets have no size setting to increase, and S3 performance does not depend on how much data a bucket holds; the latency comes from the distance between users and the bucket's Region. Option D is wrong because an EC2 Auto Scaling group in a single Region does not provide geographic distribution; users in distant countries would still experience high latency connecting to that single Region, and it adds unnecessary compute overhead for serving static content.

65

MCQhard

Based on the exhibit, which change will most improve the CloudFront cache hit ratio for the static assets while still serving the same files to all users?

A.Create a custom cache policy that includes only the v query string and excludes cookies.

B.Enable Origin Shield and keep the current cache behavior unchanged.

C.Move the static assets to individual presigned URLs for each viewer.

D.Increase the CloudFront default TTL to 24 hours while continuing to forward all cookies and query strings.

AnswerA

This removes unnecessary cache-key fragmentation. Since all users receive identical static files, forwarding user-specific cookies and irrelevant query strings destroys cache reuse. Keeping only the version parameter preserves correct object variation while allowing many more requests to hit the same cached object at the edge.

Why this answer

The CloudFront cache hit ratio for static assets is reduced when query strings and cookies are forwarded to the origin, because each unique combination creates a separate cache entry. By creating a custom cache policy that includes only the 'v' query string (used for versioning) and excludes cookies, CloudFront can cache a single object for all users regardless of other query parameters or cookie values, maximizing cache hits while still serving the same file.

Exam trap

The trap here is that candidates assume increasing TTL or enabling Origin Shield will fix a low cache hit ratio, when the real issue is an overly broad cache key caused by forwarding all query strings and cookies.

How to eliminate wrong answers

Option B is wrong because enabling Origin Shield reduces load on the origin and improves cache fill efficiency, but it does not address the root cause of low cache hit ratio—forwarding all query strings and cookies still creates many unique cache keys. Option C is wrong because moving static assets to individual presigned URLs for each viewer would force CloudFront to treat each URL as a distinct object, drastically reducing the cache hit ratio and defeating the purpose of caching. Option D is wrong because increasing the default TTL to 24 hours while continuing to forward all cookies and query strings does not reduce the number of unique cache keys; CloudFront will still cache separate copies for each cookie and query string combination, so the cache hit ratio remains low.
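
Cache-key fragmentation is easy to see in a small simulation. The Python below is a simplified model of how a cache key might be derived from a request, not CloudFront's actual key format; the function name, the sample URLs, and the cookie values are made up for illustration.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlsplit


def cache_key(url: str, cookies: dict, query_allowlist: Optional[frozenset],
              include_cookies: bool) -> tuple:
    """Build a key from the path plus whichever query strings and cookies are included."""
    parts = urlsplit(url)
    query = tuple(sorted(
        (name, value) for name, value in parse_qsl(parts.query)
        if query_allowlist is None or name in query_allowlist
    ))
    cookie_part = tuple(sorted(cookies.items())) if include_cookies else ()
    return (parts.path, query, cookie_part)


# Four requests for the same asset from viewers with different cookies and tracking tags.
requests = [
    ("/static/app.js?v=42&utm_source=mail", {"session": "a1"}),
    ("/static/app.js?v=42&utm_source=ad", {"session": "b2"}),
    ("/static/app.js?v=42", {"session": "c3"}),
    ("/static/app.js?v=43", {"session": "a1"}),
]

# Current behavior: every query string and cookie is part of the key.
forward_all = {cache_key(url, cookies, None, True) for url, cookies in requests}
# Option A: only the v query string is part of the key, cookies are excluded.
version_only = {cache_key(url, cookies, frozenset({"v"}), False) for url, cookies in requests}
```

With everything in the key, the four requests produce four separate cache entries; with only `v` in the key they collapse to two, one per real version of the file.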

66

MCQmedium

A high-volume analytics dashboard writes streaming click events that must be processed by multiple independent consumers. Which service is most appropriate?

A.Amazon Route 53

B.Amazon EBS

C.Amazon Kinesis Data Streams

D.AWS DataSync

AnswerC

Kinesis Data Streams supports high-throughput event ingestion with multiple consumers reading from the stream.

Why this answer

Amazon Kinesis Data Streams is the most appropriate service because it is designed for real-time streaming data ingestion and lets multiple independent consumer applications read the same stream in parallel. With the default shared-throughput model, each shard supports up to 5 read transactions per second and 2 MB per second of reads, shared by all consumers polling that shard. With enhanced fan-out, each registered consumer gets its own 2 MB per second per shard, so consumers processing the same click events do not compete for read throughput.

Exam trap

The trap here is that candidates often confuse Amazon Kinesis Data Streams with Amazon SQS or Amazon SNS. An SQS queue hands each message to a single consumer and the message is deleted after processing, while a standard SNS topic pushes notifications to subscribers rather than letting each consumer read a retained stream at its own pace. Neither gives multiple independent consumers ordered, replayable access to the same stream of data.

How to eliminate wrong answers

Option A is wrong because Amazon Route 53 is a DNS web service that translates domain names to IP addresses and does not ingest or process streaming data. Option B is wrong because Amazon EBS provides block-level storage volumes for EC2 instances and cannot natively support multiple independent consumers reading a continuous stream of events. Option D is wrong because AWS DataSync is a data transfer service for moving large datasets between on-premises storage and AWS services, not for real-time streaming event processing.

67

MCQeasy

A web application uses an Amazon Aurora DB cluster. The workload is becoming read-heavy, and the application team wants to increase read throughput without changing the database schema. They can adjust the application to route reads differently. What should they do?

A.Add Aurora read replicas and route read queries to the cluster reader endpoint

B.Switch the cluster to Multi-AZ with a longer failover target clock

C.Move all reads to the writer endpoint to reduce connection overhead

D.Disable automated backups to reduce storage overhead and speed reads

AnswerA

Aurora read replicas scale out read capacity. By routing read traffic to the cluster reader endpoint, the application can distribute SELECT queries across replicas, improving overall read throughput without schema changes.

Why this answer

Adding Aurora read replicas and routing read queries to the cluster reader endpoint is the correct approach because Aurora replicas share the same underlying storage volume as the primary instance, so they can serve read traffic with minimal replication lag. The reader endpoint automatically load-balances connections across all available replicas, increasing aggregate read throughput without requiring any schema changes.
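A minimal Python sketch of the read/write routing the application would need is shown below. The endpoint hostnames are placeholders, and the statement classification is deliberately simplistic; a real application would more likely configure two connection pools than inspect SQL text.

```python
# Placeholder hostnames (assumptions): substitute the cluster's real endpoints.
WRITER_ENDPOINT = "mycluster.cluster-EXAMPLE.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "mycluster.cluster-ro-EXAMPLE.us-east-1.rds.amazonaws.com"

READ_PREFIXES = ("select", "show", "explain")


def endpoint_for(sql, in_transaction=False):
    """Pick the endpoint for a statement (simplified routing rule)."""
    statement = sql.lstrip().lower()
    is_read = statement.startswith(READ_PREFIXES) and "for update" not in statement
    # Reads inside a transaction stay on the writer to see their own writes.
    if is_read and not in_transaction:
        return READER_ENDPOINT
    return WRITER_ENDPOINT


report_target = endpoint_for("SELECT count(*) FROM orders")
write_target = endpoint_for("INSERT INTO orders (id) VALUES (1)")
locking_target = endpoint_for("SELECT * FROM orders WHERE id = 1 FOR UPDATE")
```

Only the connection target changes; the schema and the queries themselves stay as they are.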

Exam trap

The trap here is confusing Multi-AZ with read replicas: candidates often think Multi-AZ improves read performance, but in standard RDS Multi-AZ the standby is passive and cannot serve reads, whereas Aurora's architecture allows all replicas to actively handle read traffic.

How to eliminate wrong answers

Option B is wrong because Multi-AZ with a longer failover target clock does not increase read throughput; it only provides high availability by maintaining a standby in another Availability Zone, and the standby cannot serve reads. Option C is wrong because moving all reads to the writer endpoint would increase load on the single writer instance, reducing overall read throughput and potentially impacting write performance. Option D is wrong because disabling automated backups does not increase read throughput; backups are stored separately and do not affect the performance of read operations on the cluster.

68

MCQmedium

Your company currently uses an Application Load Balancer (ALB) in front of a service that receives a large number of TCP and UDP packets (including UDP-based telemetry). During load tests, you need to support both TCP and UDP traffic at high throughput while keeping stable IP endpoints for a downstream firewall allowlist. Which change best meets these requirements?

A.Switch to a Network Load Balancer (NLB) configured for TCP/UDP, and use Elastic IPs to provide stable endpoint IP addresses for allowlisting.

B.Keep the ALB and add an AWS WAF Web ACL to improve throughput and add static IP support.

C.Replace the ALB with an API Gateway REST API to support UDP because API Gateway can forward UDP packets.

D.Use an Auto Scaling group with multiple EC2 instances and no load balancer to avoid any networking bottlenecks.

AnswerA

NLB operates at Layer 4 and supports both TCP and UDP. For stable IP allowlists, you can associate Elastic IP addresses with the NLB so the load balancer exposes consistent IPs (as opposed to relying on dynamic addresses). This combination directly satisfies protocol support and stable endpoint requirements.

Why this answer

A Network Load Balancer (NLB) operates at Layer 4 and can handle both TCP and UDP traffic natively, unlike an ALB, which only supports HTTP/HTTPS and cannot forward UDP packets. By assigning one Elastic IP per enabled Availability Zone to an internet-facing NLB, you provide stable, static IP endpoints that can be added to a downstream firewall allowlist, meeting both the protocol and throughput requirements.
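As a rough illustration, the Python sketch below builds request parameters shaped like the Elastic Load Balancing v2 `CreateLoadBalancer` and `CreateListener` calls. It only constructs dictionaries and calls no AWS API; the helper names, subnet IDs, allocation IDs and port are all hypothetical.

```python
def nlb_request(name, subnet_to_eip_allocation):
    """Build parameters for an internet-facing NLB with one Elastic IP per subnet."""
    return {
        "Name": name,
        "Type": "network",
        "Scheme": "internet-facing",
        # Each subnet mapping pins an Elastic IP allocation to that zone's node.
        "SubnetMappings": [
            {"SubnetId": subnet_id, "AllocationId": allocation_id}
            for subnet_id, allocation_id in sorted(subnet_to_eip_allocation.items())
        ],
    }


def listener_request(load_balancer_arn, target_group_arn, port):
    """Build parameters for a listener that accepts both TCP and UDP on one port."""
    return {
        "LoadBalancerArn": load_balancer_arn,
        "Protocol": "TCP_UDP",
        "Port": port,
        "DefaultActions": [{"Type": "forward", "TargetGroupArn": target_group_arn}],
    }


# Hypothetical identifiers, for illustration only.
lb_params = nlb_request(
    "telemetry-nlb",
    {"subnet-aaaa1111": "eipalloc-0aaa", "subnet-bbbb2222": "eipalloc-0bbb"},
)
listener_params = listener_request("arn:placeholder:nlb", "arn:placeholder:tg", 5000)
```

With boto3, dictionaries like these would be passed as keyword arguments to the `elbv2` client's `create_load_balancer` and `create_listener` methods, and the Elastic IP addresses behind the allocation IDs are what the firewall team would allowlist.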

Exam trap

The trap here is that candidates assume an ALB can handle all traffic types because it is the most commonly used load balancer, but they forget that ALB is strictly Layer 7 and cannot process UDP packets, making the NLB the only correct choice for mixed TCP/UDP workloads requiring static IPs.

How to eliminate wrong answers

Option B is wrong because an ALB cannot handle UDP traffic (it is a Layer 7 load balancer for HTTP/HTTPS, including WebSocket), and AWS WAF does not add static IP support or improve throughput for Layer 4 traffic. Option C is wrong because API Gateway cannot forward UDP packets; REST APIs accept only HTTPS requests. Option D is wrong because removing the load balancer eliminates the stable front-end IP addresses required for the firewall allowlist (instance addresses change as the group scales) and leaves nothing to distribute the TCP/UDP traffic across instances.

69

MCQhard

Based on the exhibit, which change best reduces latency during peak traffic without overprovisioning the fleet?

A.Replace the instances with a larger instance family so each server has more headroom.

B.Change the Auto Scaling policy to target tracking on ALB RequestCountPerTarget.

C.Use scheduled scaling to add instances only during the business hours peak window.

D.Replace the ALB with a Network Load Balancer to reduce request latency.

AnswerB

RequestCountPerTarget matches the actual demand reaching each instance and scales capacity before the thread pool saturates. Because CPU is still low, CPU-based scaling would react too late or not at all. Target tracking on request count helps keep queue depth and latency down while avoiding unnecessary overprovisioning during quieter periods.

Why this answer

Option B is correct because using a target tracking scaling policy on ALB RequestCountPerTarget dynamically adjusts the fleet size based on the actual number of requests each instance receives. This ensures that during peak traffic, additional instances are added only when needed, reducing latency by distributing the load without overprovisioning. It directly addresses the goal of minimizing latency during spikes while maintaining cost efficiency.
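The sketch below, in Python, shows what such a policy definition might look like and how capacity follows request volume. The dictionary mirrors the shape of the EC2 Auto Scaling `PutScalingPolicy` request; the group name, resource label and target value are placeholders, and `desired_capacity` is a simplified approximation rather than the service's actual algorithm.

```python
import math


def request_count_policy(asg_name, resource_label, target_per_instance):
    """Build a target tracking policy on ALBRequestCountPerTarget."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": "requests-per-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ALBRequestCountPerTarget",
                # Identifies the ALB and target group the metric is read from.
                "ResourceLabel": resource_label,
            },
            "TargetValue": float(target_per_instance),
        },
    }


def desired_capacity(total_requests, target_per_instance, min_size, max_size):
    """Approximate fleet size that keeps each instance near the target."""
    needed = math.ceil(total_requests / target_per_instance)
    return max(min_size, min(max_size, needed))


policy = request_count_policy(
    "web-asg",
    "app/my-alb/EXAMPLE/targetgroup/my-targets/EXAMPLE",
    1000,
)

peak = desired_capacity(12000, 1000, min_size=2, max_size=20)
quiet = desired_capacity(500, 1000, min_size=2, max_size=20)
```

Under these assumed numbers the fleet grows to 12 instances at peak and falls back to the minimum of 2 when traffic is quiet, which is the "no overprovisioning" behavior the question asks for.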

Exam trap

The trap here is that candidates confuse reducing latency with scaling the fleet, often choosing a load balancer change (Option D) or a static instance upgrade (Option A) instead of recognizing that dynamic scaling based on per-target request count is the correct method to handle peak traffic without overprovisioning.

How to eliminate wrong answers

Option A is wrong because replacing instances with a larger family increases per-instance capacity but does not scale the fleet dynamically; it leads to overprovisioning during low traffic and may not handle sudden spikes without manual intervention. Option C is wrong because scheduled scaling adds instances only during a fixed business hours window, which cannot adapt to variable or unexpected peak traffic patterns outside that window, potentially causing latency or waste. Option D is wrong because replacing the ALB with a Network Load Balancer (NLB) reduces latency at the transport layer but does not address the need to scale the fleet; NLB lacks application-layer metrics like request count per target, which are essential for the described scaling requirement.

70

MCQmedium

An analytics dashboard uses an Application Load Balancer in one Region. Global users need lower network latency to the application without caching dynamic responses. What should be considered? The architecture review board prefers a managed AWS-native control.

A.AWS Global Accelerator

B.S3 Cross-Region Replication

C.AWS Backup cross-Region copy

D.CloudFront only with long TTLs

AnswerA

Global Accelerator routes traffic over the AWS global network to improve performance for TCP/UDP applications without relying on caching.

Why this answer

AWS Global Accelerator uses the AWS global network to route traffic from edge locations to the Application Load Balancer, reducing internet latency and jitter. It does not cache responses, making it ideal for dynamic content where caching is not desired. This managed service provides static IP addresses and improves performance without modifying the application.

Exam trap

The trap here is that candidates often choose CloudFront for any performance improvement, but the requirement not to cache dynamic responses rules out the only CloudFront option offered, which relies on long TTLs. Global Accelerator improves the network path to the ALB without caching anything.

How to eliminate wrong answers

Option B is wrong because S3 Cross-Region Replication replicates objects between S3 buckets, not traffic routing, and does not reduce network latency for ALB-based applications. Option C is wrong because AWS Backup cross-Region copy is for disaster recovery of backup data, not for improving real-time network performance to an ALB. Option D is wrong because CloudFront with long TTLs caches responses at edge locations, which is unsuitable for dynamic content that must not be cached; additionally, CloudFront is a CDN, not a network optimization service for uncached traffic.

71

MCQeasy

A retail analytics app uses Amazon RDS for PostgreSQL. Read traffic is growing, and the database CPU spikes mainly due to SELECT-heavy workloads. Writes are less frequent, and the app can tolerate eventually consistent reads for the reports. What is the most appropriate AWS-native way to improve read performance with minimal application changes?

A.Create an RDS read replica and point the reporting queries to the replica endpoint.

B.Switch the cluster to DynamoDB without redesigning the data model.

C.Enable S3 event notifications to trigger a Lambda function after each write to the database.

D.Replace the RDS instance class with a smaller size to reduce cost and improve performance.

AnswerA

Read replicas offload reads from the primary and can speed up SELECT-heavy workloads with minimal changes.

Why this answer

Creating an RDS read replica is the most appropriate AWS-native solution because it offloads SELECT-heavy workloads from the primary database instance to a read-only copy, reducing CPU spikes on the primary. The application can tolerate eventually consistent reads for reports, which matches the asynchronous replication used by RDS read replicas (lag is usually small but depends on write load). This requires minimal application changes (only pointing the reporting queries at the replica endpoint) and relies on PostgreSQL's built-in replication capabilities.

Exam trap

The trap here is that candidates might assume read replicas require application changes to handle eventual consistency, but the question explicitly states the app can tolerate eventually consistent reads, making the replica endpoint swap a minimal-change solution.

How to eliminate wrong answers

Option B is wrong because switching to DynamoDB without redesigning the data model would require significant application changes, including re-architecting the schema, query patterns, and transaction handling, which contradicts the requirement for minimal application changes. Option C is wrong because enabling S3 event notifications to trigger a Lambda function after each write does not directly improve read performance on the RDS database; it adds complexity and latency without addressing the CPU spikes from SELECT queries. Option D is wrong because replacing the RDS instance class with a smaller size would reduce compute capacity, likely worsening CPU spikes and degrading performance, not improving it.

72

MCQeasy

A service performs many repeated read requests for the same DynamoDB items. The reads are latency-sensitive, but the application can tolerate slightly stale data. Which AWS service is the best fit to reduce read latency?

A.Amazon DAX (DynamoDB Accelerator)

B.Amazon S3 Select

C.Amazon SQS FIFO queue

D.AWS Lambda provisioned concurrency

AnswerA

Amazon DAX is an in-memory cache for DynamoDB. It reduces latency for repeated reads by caching results and serving subsequent read requests from the DAX cluster rather than repeatedly calling DynamoDB. Because it provides cached reads that may be slightly stale, it matches the scenario’s tolerance.

Why this answer

Amazon DAX (DynamoDB Accelerator) is an in-memory cache designed specifically for DynamoDB. It reduces read latency from single-digit milliseconds to microseconds by caching frequently accessed items, and it serves eventually consistent reads from the cache, which aligns with the application's tolerance for slightly stale data. Cache hits are answered by the DAX cluster without consuming DynamoDB read capacity units, making DAX the best fit for this latency-sensitive workload.
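The trade-off between latency and staleness can be sketched with a generic read-through cache in Python. This is a conceptual model, not the DAX client; the `ReadThroughCache` class, the 300-second TTL, the fake clock and the sample item are assumptions chosen for illustration.

```python
import time


class ReadThroughCache:
    """Serve repeated reads from memory; fall back to the loader on a miss."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Miss or expired entry: read from the backing table and cache it.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (value, now + self._ttl)
        return value


table = {"sku-1": {"price": 10}}  # stands in for the DynamoDB table
backend_reads = []


def load_from_table(key):
    backend_reads.append(key)
    return dict(table[key])  # copy, as a real read returns a snapshot


fake_now = [0.0]  # controllable clock so the example does not need to sleep
cache = ReadThroughCache(load_from_table, ttl_seconds=300, clock=lambda: fake_now[0])

cache.get("sku-1")                # miss: goes to the table
table["sku-1"]["price"] = 12      # item changes behind the cache
stale = cache.get("sku-1")        # hit: still the old price
fake_now[0] = 301.0               # TTL has elapsed
fresh = cache.get("sku-1")        # miss: reloaded from the table
```

The second read is served from memory and returns the old price, which is the "slightly stale" behavior the scenario accepts in exchange for lower latency and fewer table reads.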

Exam trap

The trap here is that candidates often confuse caching services (DAX) with data retrieval services (S3 Select) or assume that a queue (SQS) or compute optimization (Lambda provisioned concurrency) can solve read latency issues, when only a purpose-built in-memory cache like DAX directly addresses repeated DynamoDB reads with stale data tolerance.

How to eliminate wrong answers

Option B (Amazon S3 Select) is wrong because it retrieves subsets of data from objects stored in S3 using SQL-like queries, not from DynamoDB items, and it does not provide a caching layer to reduce read latency for repeated DynamoDB reads. Option C (Amazon SQS FIFO queue) is wrong because it is a message queuing service for decoupling and ordering messages, not a caching or read-acceleration service for DynamoDB; it adds latency rather than reducing it for repeated reads. Option D (AWS Lambda provisioned concurrency) is wrong because it pre-warms Lambda execution environments to reduce cold starts, but it does not cache DynamoDB items or reduce read latency for repeated database queries.

73

Multi-Selectmedium

A distributed simulation launches 40 EC2 instances that exchange small packets frequently and are sensitive to cross-instance latency. The workload stays in one Availability Zone and can use the same instance family across nodes. Which two choices improve network performance the most? Select two.

Select 2 answers

A.Launch all instances in a cluster placement group.

B.Place the instances across several Availability Zones for higher aggregate resilience.

C.Choose an instance family with high network bandwidth and enhanced networking support.

D.Use a spread placement group to pack the instances tightly together.

E.Put the workload behind CloudFront so internal node communication is faster.

AnswersA, C

A cluster placement group places instances physically close together within one Availability Zone, which reduces inter-node latency and jitter. This is the standard AWS pattern for tightly coupled distributed workloads such as simulations, MPI-style jobs, and HPC clusters.

Why this answer

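A cluster placement group provides low-latency, high-bandwidth network connectivity by packing instances close together inside a single Availability Zone. This minimizes cross-instance latency and maximizes throughput for frequent small-packet exchanges, which is ideal for tightly coupled distributed simulations. Choosing an instance family with high network bandwidth and enhanced networking support complements the placement group: enhanced networking delivers higher packet-per-second performance and lower inter-instance latency, which matters most for workloads that exchange many small packets.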

Exam trap

The trap here is confusing a spread placement group (which is for high availability by isolating instances on different hardware) with a cluster placement group (which is for low latency by grouping instances closely together).

74

Matchinghard

A media platform serves global users through Amazon CloudFront and an S3 origin. Match each requirement on the left to the CloudFront configuration or behavior on the right.

Concepts

Matches

Use CloudFront Origin Access Control and allow only the distribution in the bucket policy.

Use versioned object filenames or hashed asset names with a long TTL.

Exclude the tracking query string from the cache key with a cache policy.

Use CloudFront signed URLs or signed cookies.

Why these pairings

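Geo restriction blocks viewers by country; Lambda@Edge can inspect the User-Agent header to tailor responses by device; cache behaviors set caching rules per path pattern; checking the Referer header helps prevent hotlinking; Origin Shield adds a centralized caching layer that reduces load on the origin; origin groups provide failover between a primary and a secondary origin.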

75

MCQeasy

A web service runs on an Auto Scaling group (ASG). The team updates configuration (AMIs, environment variables) in a Launch Template and wants new instances created during scale-out to use the latest Launch Template version. What should the architect do?

A.Leave the ASG attached to the previous Launch Template version so scale-out is stable.

B.Set the ASG to use the latest Launch Template version and optionally start an instance refresh for existing instances.

C.Manually SSH into each new instance and reconfigure it after it launches.

D.Move the configuration changes into a security group rule so the ASG updates them automatically.

AnswerB

ASG scale-out uses the configured Launch Template version at instance launch time. Switching the ASG to the latest version ensures new instances are consistent. An instance refresh helps apply changes to running instances safely and predictably.

Why this answer

Option B is correct because the Auto Scaling group can be configured to use the latest version of a launch template by specifying the `$Latest` version. This ensures that any new instances launched during scale-out automatically use the most recent configuration. Additionally, an instance refresh can be initiated to update existing instances to the latest template version without manual intervention.
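How a version reference is resolved at launch time can be modeled in a few lines of Python. This is an illustrative sketch, not the Auto Scaling implementation; `resolve_version` and the sample version numbers are hypothetical, while `$Latest` and `$Default` are the special values a launch template version field accepts.

```python
def resolve_version(requested, versions, default_version):
    """Resolve a launch template version reference to a concrete number."""
    if requested == "$Latest":
        return max(versions)
    if requested == "$Default":
        return default_version
    number = int(requested)
    if number not in versions:
        raise ValueError(f"launch template version {number} does not exist")
    return number


versions = [1, 2, 3]   # existing template versions (hypothetical)
default_version = 1

pinned_before = resolve_version("2", versions, default_version)
latest_before = resolve_version("$Latest", versions, default_version)

versions.append(4)      # the team publishes a new template version

pinned_after = resolve_version("2", versions, default_version)
latest_after = resolve_version("$Latest", versions, default_version)
```

A group pinned to version 2 keeps launching version 2 after the update, while a group set to `$Latest` picks up version 4 on its next scale-out. Instances that are already running are unaffected either way until an instance refresh replaces them.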

Exam trap

The trap here is that candidates may think the ASG automatically updates existing instances when the launch template is updated, but in reality, only new instances launched after the update use the new version unless an instance refresh is explicitly triggered.

How to eliminate wrong answers

Option A is wrong because leaving the ASG attached to a previous launch template version means new instances will not receive the updated configuration, defeating the purpose of updating the template. Option C is wrong because manually SSHing into each new instance is not scalable, violates infrastructure-as-code principles, and is error-prone in an auto-scaling environment. Option D is wrong because security group rules control network traffic, not instance configuration (such as AMIs or environment variables), and cannot propagate launch template changes.
