Knowledge + Practice

CCNA Db Design Questions

75 of 444 questions · Page 3/6 · Db Design topic · Answers revealed

151

MCQ · hard

A company uses Amazon DynamoDB for a session management system. They need to store session data with a TTL of 24 hours. However, they notice that expired items are not being deleted promptly, causing storage costs to increase. What is the most likely cause?

A. The table has insufficient write capacity

B. TTL is not enabled on the table

C. DynamoDB typically deletes expired items within 48 hours

D. The TTL attribute is set as a string instead of a number

Answer: C

TTL deletion runs as a background process, so expired items can remain in the table for up to 48 hours.

Why this answer

Option C is correct because DynamoDB's TTL mechanism typically deletes expired items within 48 hours, not immediately. The service processes TTL deletions as a background process, and while items are marked as expired at the TTL time, actual deletion can be delayed up to 48 hours. This explains why expired session data persists and increases storage costs despite TTL being properly configured.

Exam trap

The trap here is that candidates assume TTL deletions are instantaneous or happen within minutes. AWS documentation has described deletion as typically occurring within 48 hours of expiration (newer documentation says within a few days), so delayed deletion is expected behavior rather than a misconfiguration.

How to eliminate wrong answers

Option A is wrong because write capacity affects throughput for writes, not the timing of TTL-based deletions; TTL deletions consume no write capacity units. Option B is wrong because the question states the company 'needs to store session data with a TTL of 24 hours,' implying TTL is enabled; if TTL were not enabled, no expired items would be deleted at all, not just delayed. Option D is wrong because it does not match the symptom: the TTL attribute must be a Number holding a Unix epoch timestamp in seconds, and DynamoDB ignores a TTL attribute stored as a String, so those items would never be deleted rather than being deleted late.
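
Because expired items can linger until the background process removes them, applications usually store the TTL as a Number and filter out stale items at read time. The following is a minimal sketch of both ideas, not a definitive implementation; the attribute name `expires_at` and the item shape are assumptions made for illustration.

```python
import time

SESSION_TTL_SECONDS = 24 * 60 * 60  # 24 hours, as in the question


def build_session_item(session_id, now=None):
    """Return a low-level DynamoDB item whose TTL attribute is a Number (epoch seconds)."""
    now = int(time.time()) if now is None else int(now)
    return {
        "session_id": {"S": session_id},
        # "N" marks the Number type; "expires_at" is a hypothetical attribute name.
        "expires_at": {"N": str(now + SESSION_TTL_SECONDS)},
    }


def live_items_filter(now=None):
    """Filter-expression parameters that hide items already past their TTL."""
    now = int(time.time()) if now is None else int(now)
    return {
        "FilterExpression": "#exp > :now",
        "ExpressionAttributeNames": {"#exp": "expires_at"},
        "ExpressionAttributeValues": {":now": {"N": str(now)}},
    }


item = build_session_item("abc123", now=1_700_000_000)
print(item["expires_at"])  # {'N': '1700086400'}
```

The filter parameters can be merged into a Query or Scan request so that sessions past their expiry are not returned even while they still occupy storage.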

152

MCQ · easy

A startup is building a social media analytics application that ingests high-velocity streaming data from multiple sources. The data consists of JSON objects with varying schemas. The application needs to store this data for real-time querying and later batch processing. Which AWS database solution is most cost-effective and scalable for this workload?

A. Amazon ElastiCache for Redis with persistence enabled.

B. Amazon DynamoDB with on-demand capacity.

C. Amazon RDS for MySQL with multiple read replicas.

D. Amazon Redshift with auto-ingest from Kinesis.

Answer: B

DynamoDB handles high-velocity writes and varying schemas, and is cost-effective for unpredictable workloads.

Why this answer

Amazon DynamoDB with on-demand capacity is the most cost-effective and scalable solution for this workload because it is a fully managed NoSQL database that can handle high-velocity streaming data with varying JSON schemas without requiring schema definition or provisioning. Its on-demand capacity mode automatically scales to accommodate unpredictable traffic spikes, making it ideal for real-time querying and batch processing via features like DynamoDB Streams and integration with AWS Glue or EMR.

Exam trap

The trap here is that candidates often choose Amazon Redshift (Option D) because they associate streaming data with data warehousing, but Redshift is optimized for batch analytics on structured data, not for real-time ingestion and querying of schema-less JSON, making DynamoDB the correct choice for this specific workload.

How to eliminate wrong answers

Option A is wrong because Amazon ElastiCache for Redis is an in-memory cache, not a durable primary data store; while persistence can be enabled, it is not designed for long-term storage of high-velocity streaming data with varying schemas and would be cost-prohibitive for large datasets. Option C is wrong because Amazon RDS for MySQL requires a fixed schema, which cannot handle JSON objects with varying schemas efficiently, and its read replicas do not address the write scalability needed for high-velocity ingestion. Option D is wrong because Amazon Redshift is a columnar data warehouse optimized for analytical queries on structured data, not for real-time querying of raw JSON with varying schemas; auto-ingest from Kinesis adds latency and cost, and Redshift is not designed for high-frequency point lookups or schema-less data.

153

MCQ · easy

A startup wants to store session data for a web application. Each session is small (under 1 KB) and accessed frequently with low latency. The data can be ephemeral and does not require complex queries. Which AWS database service is most suitable?

A. Amazon DynamoDB

B. Amazon RDS for PostgreSQL

C. Amazon Neptune

D. Amazon ElastiCache for Redis

Answer: D

In-memory key-value store with sub-millisecond latency, ideal for session management.

Why this answer

Option D is correct because Amazon ElastiCache for Redis provides in-memory storage with low latency, perfect for session data. Option A is wrong because DynamoDB is also suitable, but for this use case ElastiCache is simpler and faster. Option B is wrong because RDS is relational and overkill for simple key-value session data.

Option C is wrong because Neptune is a graph database, not appropriate for session storage.

154

MCQ · hard

Refer to the exhibit. A DynamoDB table has a primary key of pk (partition key) and sk (sort key). An application needs to perform GetItem and Query operations but should only be allowed to retrieve the pk and sk attributes. The IAM policy above is applied to the application's IAM role. Why does the policy fail to achieve the goal?

A. The Deny statement uses the wrong condition key; it should use 'dynamodb:Select' instead of 'dynamodb:Attributes'.

B. The policy should use 'dynamodb:ReturnValues' condition key.

C. The Deny statement does not prevent retrieval of all attributes when no ProjectionExpression is specified.

D. The Allow statement should include 'dynamodb:Scan' to allow Query operations.

Answer: C

If the request does not name any attributes, the attribute condition has nothing to compare, so it cannot restrict the request and the full item is returned.

Why this answer

The Deny statement compares the attributes named in the request (the dynamodb:Attributes condition key) against the permitted list using ForAllValues:StringNotEquals. GetItem and Query only name attributes when the caller supplies a ProjectionExpression. When a request names none, dynamodb:Attributes carries no values, so the attribute check has nothing to constrain and the operation returns the full item.

A policy that relies on dynamodb:Attributes alone therefore leaves this gap. The pattern AWS documents for fine-grained access control is an Allow statement that uses ForAllValues:StringEquals on dynamodb:Attributes together with a dynamodb:Select condition requiring SPECIFIC_ATTRIBUTES, so that requests without an explicit attribute list are rejected.
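
The attribute-restriction pattern can be sketched as a policy document built in Python. This is an illustrative sketch, not the policy from the exhibit; the table ARN, account ID, and table name are placeholders chosen for the example.

```python
import json

ALLOWED_ATTRIBUTES = ["pk", "sk"]

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            # Placeholder ARN: region, account ID, and table name are assumptions.
            "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/ExampleTable",
            "Condition": {
                # Every attribute named in the request must be in the allowed list.
                "ForAllValues:StringEquals": {
                    "dynamodb:Attributes": ALLOWED_ATTRIBUTES
                },
                # Require the request to name specific attributes, which closes
                # the gap left when no ProjectionExpression is supplied.
                "StringEqualsIfExists": {
                    "dynamodb:Select": "SPECIFIC_ATTRIBUTES"
                },
            },
        }
    ],
}

print(json.dumps(policy, indent=2))
```

The second condition is the part that addresses the failure described above: without it, a request that omits the attribute list passes the first condition trivially.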

155

MCQ · hard

A company runs a time-series application that collects sensor data from millions of IoT devices. The data is written in batches every minute and queried to generate hourly, daily, and monthly aggregates. The database must support high ingestion rates and efficient storage. Which database service is most appropriate?

A. Amazon DynamoDB with TTL

B. Amazon Redshift

C. Amazon RDS for PostgreSQL

D. Amazon Timestream

Answer: D

Timestream is purpose-built for time-series data, with automatic storage tiering and aggregate functions.

Why this answer

Amazon Timestream is purpose-built for time-series data, offering a serverless architecture that automatically scales to handle high ingestion rates from millions of IoT devices. It optimizes storage by separating recent data (in a memory store) from historical data (in a magnetic store), and its built-in time functions (e.g., `bin`, `date_trunc`) group readings into hourly, daily, and monthly buckets for aggregation without manual partitioning or indexing.

Exam trap

The trap here is that candidates often choose Amazon DynamoDB for high ingestion rates, overlooking that time-series workloads require efficient time-based aggregation and storage optimization, which DynamoDB lacks, while Timestream is the only AWS service purpose-built for this exact use case.

How to eliminate wrong answers

Option A is wrong because Amazon DynamoDB with TTL is a key-value and document database optimized for low-latency lookups, not for time-series analytics; TTL only expires old data but does not provide native time-based aggregation or efficient range scans over time intervals. Option B is wrong because Amazon Redshift is a columnar data warehouse designed for complex analytical queries on structured data, but its high overhead for small, frequent batch writes (every minute) and lack of native time-series optimization make it unsuitable for high-ingestion IoT workloads. Option C is wrong because Amazon RDS for PostgreSQL is a relational database that requires manual schema design, indexing, and partitioning to handle time-series data, and it cannot match the ingestion throughput or storage efficiency of a purpose-built time-series engine.
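
To make the idea of time-bucketed aggregation concrete, here is a plain-Python sketch of what an hourly bin-and-average computes. It does not use Timestream or any AWS API; the sample readings are made up for illustration.

```python
from collections import defaultdict


def hourly_averages(readings):
    """readings: iterable of (epoch_seconds, value); returns {hour_start: mean}."""
    buckets = defaultdict(list)
    for ts, value in readings:
        # Floor each timestamp to the start of its hour (3600-second bucket).
        buckets[ts - ts % 3600].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


sample = [(0, 10.0), (60, 20.0), (3600, 30.0), (3660, 50.0)]
print(hourly_averages(sample))  # {0: 15.0, 3600: 40.0}
```

A purpose-built time-series engine performs this grouping inside the query engine, which is the capability the wrong options lack natively.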

156

MCQ · hard

A company is running an Oracle database on Amazon RDS with the configuration shown in the exhibit. The application is experiencing high latency for write operations. The storage is consistently showing high queue depth and write latency. Which change will most improve write performance?

A. Increase allocated storage to 1,000 GB to get higher gp2 baseline IOPS.

B. Enable storage auto scaling and increase storage throughput to 500 MB/s.

C. Change the DB instance class to db.r5.2xlarge.

D. Migrate to io1 or io2 storage with higher provisioned IOPS.

Answer: D

Provisioned IOPS storage provides consistent low-latency performance for write-intensive workloads.

Why this answer

Option D is correct because migrating to io1 or io2 block storage with higher provisioned IOPS directly addresses the root cause of high queue depth and write latency. Unlike gp2, which has a baseline IOPS of 3 per GB (up to 16,000 IOPS at 5,334 GB) and a burst bucket that depletes under sustained load, io1/io2 provide consistent, provisioned IOPS independent of volume size. This ensures the storage subsystem can keep up with the write workload, reducing queue depth and latency.

Exam trap

The trap here is that candidates often assume increasing storage size (Option A) or instance class (Option C) will fix I/O bottlenecks, but the real constraint is the gp2 burst model and insufficient provisioned IOPS for sustained write-heavy workloads.

How to eliminate wrong answers

Option A is wrong because increasing gp2 storage to 1,000 GB only raises baseline IOPS to 3,000 (3 IOPS/GB), which may still be insufficient for a sustained write-heavy workload. Option B is wrong because storage auto scaling only grows the allocated storage automatically; it does not raise throughput, and gp2 throughput tops out at 250 MiB/s per volume, so 500 MB/s is not achievable on gp2. The symptoms also point to an IOPS bottleneck rather than a throughput bottleneck. Option C is wrong because changing the DB instance class to db.r5.2xlarge improves CPU and memory but does not raise the volume's IOPS or reduce its queue depth; the bottleneck is at the EBS volume, not the compute instance.
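
The gp2 arithmetic behind Option A can be checked with a short calculation. This sketch assumes the standard gp2 rule of 3 IOPS per GiB with a floor of 100 and a ceiling of 16,000; the volume sizes are examples, not values from the exhibit.

```python
def gp2_baseline_iops(size_gib):
    """Baseline IOPS for a gp2 volume: 3 per GiB, clamped to [100, 16000]."""
    return min(max(100, 3 * size_gib), 16_000)


for size in (200, 1_000, 5_334):
    print(size, gp2_baseline_iops(size))
# 200 600
# 1000 3000
# 5334 16000
```

With provisioned IOPS storage the IOPS figure is chosen directly instead of being derived from the volume size, which is why Option D addresses the bottleneck.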

157

MCQ · medium

A social media company uses Amazon ElastiCache for Redis to store user session data. The application experiences occasional spikes in traffic, causing cache misses and increased latency. The operations team notices that the cache hit ratio drops to 60% during spikes. Which action would most effectively improve the cache hit ratio?

A. Enable ElastiCache for Redis Cluster Mode to distribute data across shards.

B. Upgrade to a larger cache node type to store more data.

C. Enable Redis persistence (AOF) and configure a replication group with a read replica.

D. Decrease the TTL of session keys to force more frequent refreshes.

Answer: C

Persistence ensures data survives restarts, and replicas can help rebuild the cache faster after a spike-induced eviction.

Why this answer

Option C is correct because enabling persistence and replication ensures data availability to rebuild the cache. Option A is wrong because Cluster Mode improves scalability but not hit ratio. Option B is wrong because a larger node type may help but does not address the root cause of the misses.

Option D is wrong because decreasing the TTL expires session keys sooner, which produces more cache misses, not fewer.

158

Multi-Select · medium

A company is migrating a large Oracle data warehouse to Amazon Redshift. Which THREE design considerations are important for performance optimization?

Select 3 answers

A. Choose appropriate distribution styles (KEY, ALL, EVEN).

B. Use compression encodings for columns.

C. Define sort keys for commonly filtered columns.

D. Enable cross-Region replication for data locality.

E. Use row-based storage for fact tables.

Answers: A, B, C

Affects data distribution across nodes.

Why this answer

Option A is correct because distribution styles (KEY, ALL, EVEN) control how data is distributed across compute nodes in Amazon Redshift. Choosing the right distribution style minimizes data movement during joins and aggregations, which is critical for performance in a large data warehouse migration from Oracle. For example, using KEY distribution on a frequently joined column keeps related rows on the same node, reducing network traffic.

Exam trap

The trap here is that candidates may confuse disaster recovery features (like cross-Region replication) with performance design choices, or mistakenly think row-based storage applies to Redshift because of their Oracle background, where row storage is the norm.

159

MCQ · medium

A company wants to run a MongoDB-compatible database on AWS with automated patching and backups. Which service should they choose?

A. Amazon DocumentDB (with MongoDB compatibility)

B. Self-managed MongoDB on Amazon EC2

C. Amazon DynamoDB

D. Amazon RDS for MySQL

Answer: A

Fully managed MongoDB-compatible database.

Why this answer

Amazon DocumentDB with MongoDB compatibility is the correct choice because it is a fully managed, MongoDB-compatible database service that provides automated patching and backups. It supports the MongoDB wire protocol and drivers, allowing existing MongoDB applications to migrate with minimal changes while offloading administrative tasks like patching, backup, and replication to AWS.

Exam trap

The trap here is that candidates may read Amazon DocumentDB's MongoDB compatibility as full feature parity. DocumentDB does not support every MongoDB feature (e.g., some aggregation pipeline stages and operators), but the question only asks for a managed, MongoDB-compatible service with automated patching and backups, and DocumentDB is the only option that fits.

How to eliminate wrong answers

Option B is wrong because self-managed MongoDB on Amazon EC2 requires the company to manually handle patching, backups, and operational overhead, which contradicts the requirement for automated patching and backups. Option C is wrong because Amazon DynamoDB is a key-value and document database that is not MongoDB-compatible; it uses a different API and data model, so it cannot run MongoDB workloads. Option D is wrong because Amazon RDS for MySQL is a relational database that does not support the MongoDB wire protocol or document model, making it incompatible with MongoDB-based applications.

160

Multi-Select · medium

Which THREE considerations are important when designing a database for a global, multi-Region application using Amazon DynamoDB Global Tables? (Select THREE.)

Select 3 answers

A. Using a single write region to avoid conflicts

B. Strongly consistent reads across regions

C. Application tolerance for eventually consistent reads

D. Provisioning sufficient write capacity in each region

E. Conflict resolution using last writer wins

Answers: C, D, E

Cross-region replication is asynchronous; reads may be eventually consistent.

Why this answer

Global Tables replicate data across Regions asynchronously, so conflict resolution (last writer wins) is important. Applications must handle eventually consistent reads. Write capacity must be provisioned in each Region.

Option A (single write Region) defeats the purpose of Global Tables. Option B (strong consistency) is not guaranteed across Regions.
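
The last-writer-wins rule can be illustrated with a small simulation. This only mimics the idea of keeping the write with the latest timestamp; it is not how DynamoDB implements reconciliation internally, and the Regions, timestamps, and values are invented for the example.

```python
def last_writer_wins(versions):
    """versions: list of (write_timestamp, region, value); keep the latest write."""
    return max(versions, key=lambda version: version[0])


# Two Regions update the same item at nearly the same time.
concurrent_writes = [
    (1700000000.120, "us-east-1", "cart=3"),
    (1700000000.450, "eu-west-1", "cart=5"),
]

winner = last_writer_wins(concurrent_writes)
print(winner[1], winner[2])  # eu-west-1 cart=5
```

The earlier write is silently discarded, which is why the application has to tolerate this behavior when more than one Region accepts writes for the same item.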

161

MCQ · easy

A company needs to store and query JSON documents that have varying attributes. The workload is read-heavy, and the team wants to use SQL-like queries. Which service is most appropriate?

A. Amazon Neptune

B. Amazon DocumentDB (with MongoDB compatibility)

C. Amazon DynamoDB with PartiQL

D. Amazon RDS for MySQL with JSON data type

Answer: B

DocumentDB is a document database that natively stores JSON and supports MongoDB queries.

Why this answer

Option B is correct because Amazon DocumentDB (with MongoDB compatibility) stores JSON documents natively and supports MongoDB-like queries. Option A is wrong because Neptune is a graph database. Option C is wrong because DynamoDB is key-value, not optimized for rich JSON queries.

Option D is wrong because RDS requires a fixed schema.

162

MCQ · medium

A company runs a MySQL database on Amazon RDS. They need to export a subset of data to Amazon S3 for analysis using Amazon Athena. The data is stored in multiple tables with complex joins. What is the MOST efficient way to export the data?

A. Use the COPY command from Amazon Redshift to pull data from RDS.

B. Use the SELECT INTO OUTFILE S3 extension to export the query results directly to S3.

C. Use the AWS Schema Conversion Tool (SCT) to extract the data to S3.

D. Use mysqldump to export the tables and then upload the files to S3.

Answer: B

This statement exports the result of an arbitrary SQL query directly to S3.

Why this answer

The SELECT INTO OUTFILE S3 statement takes a SQL query (including joins) and writes the results directly to S3 as delimited text files such as CSV. This is efficient because it does not require an intermediate EC2 instance. AWS documents this statement for Amazon Aurora MySQL, so the stated answer assumes it is available to the database in the scenario.

Option A (Redshift COPY) loads data into Redshift tables; it does not export RDS query results to S3. Option C (SCT) is primarily a schema conversion tool, not a way to export the result of a join query. Option D (mysqldump) exports table by table and requires manual processing and upload.

163

MCQ · medium

A company uses Amazon DynamoDB for a time-series IoT workload. Each device sends a data point every minute. The primary key consists of device_id (partition key) and timestamp (sort key). The company wants to efficiently retrieve the latest 10 data points for a specific device. Which query design is most efficient?

A. Use GetItem on the device_id partition key with the maximum timestamp.

B. Query the table with ScanIndexForward=true and Limit=10, then reverse the result set.

C. Query the table with ScanIndexForward=false and Limit=10.

D. Scan the entire table and filter by device_id, then sort by timestamp.

Answer: C

This returns the most recent 10 items in descending order by timestamp.

Why this answer

Option C is correct because Query with ScanIndexForward=false retrieves items in descending order by the sort key (timestamp), and Limit=10 stops after the first 10 items, which are the most recent 10 data points for the given device_id. This is the most efficient design as it reads only the 10 items needed, relying on the table's own sort key order without any post-processing.

Exam trap

The trap here is that candidates may confuse ScanIndexForward=true with 'latest' results, or incorrectly assume GetItem can retrieve the maximum sort key without knowing its value, leading them to choose inefficient options like scanning or reversing an ascending query.

How to eliminate wrong answers

Option A is wrong because GetItem requires both partition key and sort key; using only device_id with a maximum timestamp is not a valid operation—GetItem cannot compute a max value, and you would need to know the exact timestamp. Option B is wrong because ScanIndexForward=true retrieves items in ascending order (oldest first), so with Limit=10 you get the oldest 10 items, not the latest; reversing the result set still gives the oldest 10, not the newest. Option D is wrong because Scan reads the entire table, which is inefficient and costly for large datasets, and filtering by device_id after scanning defeats the purpose of using DynamoDB's indexed access.
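
The difference between Options B and C can be sketched in Python. The function builds low-level Query parameters of the kind a DynamoDB client accepts; the table name and device ID are hypothetical, and no AWS call is made. The second part uses plain lists to show why reversing an ascending result does not yield the newest items.

```python
def latest_points_query(table_name, device_id, limit=10):
    """Build Query parameters that return the newest items first."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "device_id = :d",
        "ExpressionAttributeValues": {":d": {"S": device_id}},
        "ScanIndexForward": False,  # descending sort-key (timestamp) order
        "Limit": limit,
    }


params = latest_points_query("SensorData", "device-42")  # hypothetical names

# Plain-list illustration: 100 timestamps for one device.
timestamps = list(range(1, 101))
ascending_first_10_reversed = sorted(timestamps)[:10][::-1]   # Option B
descending_first_10 = sorted(timestamps, reverse=True)[:10]   # Option C

print(ascending_first_10_reversed)  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(descending_first_10)          # [100, 99, 98, 97, 96, 95, 94, 93, 92, 91]
```

Option B returns the ten oldest points in reverse order, while Option C returns the ten newest.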

164

MCQ · easy

A company is using Amazon ElastiCache for Redis as a caching layer for frequently accessed data. The application needs to support caching of session data that must be highly available across multiple Availability Zones. Which ElastiCache configuration should be used?

A. Deploy a single Redis node in one Availability Zone.

B. Deploy a Redis cluster with cluster mode disabled.

C. Deploy a Memcached cluster with multiple nodes.

D. Deploy a Redis cluster with cluster mode enabled and replica nodes in a different Availability Zone.

Answer: D

Cluster mode with replicas across AZs provides high availability and automatic failover.

Why this answer

Option D is correct because deploying a Redis cluster with cluster mode enabled and replica nodes in a different Availability Zone provides both high availability and automatic failover for session data. ElastiCache for Redis with cluster mode enabled supports sharding and replication, allowing replica nodes to be placed in a separate AZ to survive an AZ failure. This configuration ensures session data remains accessible even if the primary node or an entire AZ becomes unavailable, meeting the requirement for multi-AZ high availability.

Exam trap

The trap here is that candidates confuse cluster mode with Multi-AZ support. Both cluster mode enabled and cluster mode disabled can place replicas in a different Availability Zone. What sets Option D apart is that it is the only choice that explicitly places replica nodes in a second AZ, which is what allows failover and survival of an AZ failure.

How to eliminate wrong answers

Option A is wrong because a single Redis node in one AZ provides no redundancy; if the node or AZ fails, all session data is lost and the application becomes unavailable. Option B is wrong as worded: a cluster with cluster mode disabled is a single shard, and the option says nothing about replicas or a second AZ, so it does not by itself provide multi-AZ availability (cluster mode disabled can be made highly available by adding cross-AZ replicas, but the option does not state that). Option C is wrong because Memcached does not support replication or persistence; it is a pure caching engine with no built-in high availability or failover, so it cannot guarantee session data durability across AZ failures.

165

MCQ · easy

A company uses Amazon DynamoDB for a gaming application that stores player data. The application frequently accesses items by the player's user ID. However, the company also needs to query players by their subscription tier (Gold, Silver, Bronze) and registration date. Which design should the database specialist recommend for this access pattern?

A. Export the data to Amazon Elasticsearch Service for querying.

B. Create a Local Secondary Index (LSI) on subscription tier and registration date.

C. Enable DynamoDB Streams and process the stream to populate a separate table.

D. Create a Global Secondary Index (GSI) on subscription tier and registration date.

Answer: D

A GSI allows querying on different attributes with its own partition and sort keys.

Why this answer

A Global Secondary Index (GSI) on subscription tier and registration date is the correct choice because it allows efficient querying on non-primary key attributes without affecting the base table's primary key structure. DynamoDB GSIs support eventually consistent reads and can be created on any table, enabling the required access pattern of querying players by subscription tier and registration date while maintaining the primary access pattern by user ID.

Exam trap

The exam often tests the distinction between LSI and GSI, where candidates mistakenly choose LSI because they think it's the only index that can include multiple attributes, but they forget that LSI shares the base table's partition key and cannot be added after table creation.

How to eliminate wrong answers

Option A is wrong because exporting data to Amazon Elasticsearch Service introduces unnecessary complexity, latency, and cost for a simple query pattern that DynamoDB can handle natively with an index. Option B is wrong because a Local Secondary Index (LSI) can only be created at table creation time and shares the same partition key as the base table, which would not allow efficient querying by subscription tier and registration date as a composite sort key across all partitions. Option C is wrong because enabling DynamoDB Streams and populating a separate table adds operational overhead and eventual consistency delays without providing the direct query capability that a GSI offers.
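
A GSI for this access pattern can be described with UpdateTable-style parameters. The sketch below only builds the request dictionary; the table, index, and attribute names are assumptions, and it assumes an on-demand table (a provisioned table would also need throughput settings for the index).

```python
def tier_date_gsi_update(table_name):
    """Build UpdateTable parameters that add a GSI keyed on tier and date."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "subscription_tier", "AttributeType": "S"},
            {"AttributeName": "registration_date", "AttributeType": "S"},
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": "tier-registration-index",  # hypothetical name
                    "KeySchema": [
                        # Partition key differs from the base table's user ID.
                        {"AttributeName": "subscription_tier", "KeyType": "HASH"},
                        {"AttributeName": "registration_date", "KeyType": "RANGE"},
                    ],
                    "Projection": {"ProjectionType": "ALL"},
                }
            }
        ],
    }


request = tier_date_gsi_update("Players")  # hypothetical table name
print(request["GlobalSecondaryIndexUpdates"][0]["Create"]["KeySchema"])
```

Because the index has its own partition key, it can be added to an existing table, which an LSI cannot.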

166

MCQ · easy

A company wants to store and analyze time-series sensor data from millions of IoT devices. The data is append-only and rarely updated. Queries aggregate data over time ranges. Which AWS database service is most cost-effective and performant for this workload?

A. Amazon DynamoDB with time-series design pattern

B. Amazon Timestream

C. Amazon Redshift

D. Amazon RDS for MySQL with partitioning by date

Answer: B

Amazon Timestream is a fast, scalable, fully managed time-series database service.

Why this answer

Option B is correct because Timestream is purpose-built for time-series data, offering automatic tiering and optimized aggregation queries. Option A (DynamoDB) can handle time-series data but requires careful design and is less cost-effective at large scale. Option C (Redshift) is for analytical workloads but is heavier and more expensive for IoT sensor data.

Option D (RDS) is relational and not optimized for time-series data.

167

MCQ · medium

A company is designing a relational database for an e-commerce application that requires high availability and automated failover across AWS Regions. Which AWS database service should they use?

A. Amazon DynamoDB Global Tables

B. Amazon RDS with Multi-AZ deployment

C. Amazon Aurora Global Database

D. Amazon Redshift with cross-Region snapshot copy

Answer: C

Supports cross-Region replication and failover.

Why this answer

Amazon Aurora Global Database provides cross-Region replication and automated failover, meeting the high availability requirements. Option A (DynamoDB) is NoSQL, not relational. Option B (RDS Multi-AZ) is single-Region.

Option D (Redshift) is a data warehouse, not for OLTP.

168

MCQ · hard

Refer to the exhibit. An IAM policy is attached to a role used by an application that accesses the DynamoDB 'Orders' table. The application needs to perform a Scan operation on the table. According to the policy, is the Scan operation allowed?

A. Yes, but only if the scan uses a filter expression

B. No, because the Deny statement blocks all actions

C. Yes, because the policy explicitly allows Scan

D. No, because the policy does not specify a condition

Answer: C

The Allow statement includes 'Scan', so it is permitted.

Why this answer

The IAM policy includes an explicit Allow statement for the `dynamodb:Scan` action on the `Orders` table. An explicit Allow overrides the default implicit Deny, but an explicit Deny would override the Allow, so the Deny statement has to be checked as well. A Deny with `NotAction` denies every action except the ones it lists. For the stated answer to hold, `dynamodb:Scan` must be among those listed exceptions, so the Deny does not apply to it and the Scan operation is allowed.

Exam trap

The trap here is that candidates misread the Deny statement's `NotAction` as a blanket denial of all actions, when in fact it denies only the actions that are not listed, leaving the explicit Allow for Scan in effect.

How to eliminate wrong answers

Option A is wrong because the policy does not require a filter expression for Scan; filter expressions are optional and do not affect IAM authorization. Option B is wrong because the Deny statement uses `NotAction`, which blocks every action except those listed; `dynamodb:Scan` falls among the listed exceptions, so it is not blocked. Option D is wrong because IAM policies do not require a condition element for an action to be allowed; conditions are optional and only refine permissions.
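
The interaction between an explicit Allow and a Deny that uses `NotAction` can be sketched with a deliberately simplified evaluator. The policy below is hypothetical (the exhibit is not reproduced here), and the evaluator ignores resources, conditions, and wildcards, which real IAM evaluation takes into account.

```python
# Hypothetical statements: Allow two actions, Deny everything except those two.
statements = [
    {"Effect": "Allow", "Action": ["dynamodb:Scan", "dynamodb:GetItem"]},
    {"Effect": "Deny", "NotAction": ["dynamodb:Scan", "dynamodb:GetItem"]},
]


def is_allowed(action, policy_statements):
    """Simplified IAM logic: explicit Deny wins, then explicit Allow, else deny."""
    allowed = False
    for statement in policy_statements:
        if "Action" in statement:
            matches = action in statement["Action"]
        else:
            # NotAction matches every action that is NOT in the list.
            matches = action not in statement["NotAction"]
        if matches and statement["Effect"] == "Deny":
            return False
        if matches and statement["Effect"] == "Allow":
            allowed = True
    return allowed


print(is_allowed("dynamodb:Scan", statements))        # True
print(is_allowed("dynamodb:DeleteItem", statements))  # False
```

Scan passes because it is exempt from the Deny and covered by the Allow; DeleteItem is caught by the Deny because it is not in the `NotAction` list.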

169

MCQ · easy

A company wants to run a graph database for a social network application. The data model involves users, posts, comments, and likes, with many-to-many relationships. Which AWS database service is most appropriate?

A. Amazon RDS for PostgreSQL

B. Amazon Neptune

C. Amazon DocumentDB

D. Amazon DynamoDB

Answer: B

Neptune is purpose-built for graph databases and efficiently handles complex relationships.

Why this answer

Amazon Neptune is a fully managed graph database service optimized for highly connected data. DynamoDB is NoSQL but not graph. RDS is relational, which can model graphs but with complex queries.

DocumentDB is document-based.

170

MCQ · hard

A financial services company uses Amazon RDS for MySQL to store transaction data. The database has a single table 'transactions' with 500 million rows. The table has an auto-increment primary key and an index on 'transaction_date'. The company runs a monthly report that aggregates transactions by account_id and transaction_date. The report query uses a GROUP BY on account_id and transaction_date, and scans the entire table. The query takes over 2 hours to complete and often times out. The DBA suggests creating a materialized view. However, the company wants to minimize operational overhead. Which solution meets the requirements with the LEAST operational overhead?

A. Increase the RDS instance size to the largest available to improve performance.

B. Migrate the reporting workload to Amazon Redshift by loading the transactions table into Redshift and running the report query there.

C. Create a materialized view in MySQL that pre-aggregates the data and refreshes it nightly.

D. Add a composite index on (account_id, transaction_date) to speed up the GROUP BY.

Answer: B

Redshift is optimized for analytical queries and can handle large aggregations efficiently with minimal operational overhead.

Why this answer

Option B is correct because Amazon Redshift is purpose-built for large-scale analytical queries. By migrating the reporting workload to Redshift, the company offloads the heavy aggregation from the transactional RDS instance to a columnar storage engine that can scan and aggregate 500 million rows efficiently using massively parallel processing (MPP). This approach requires no changes to the existing RDS database and minimizes operational overhead compared to managing a materialized view or manual indexing.

Exam trap

The trap here is that candidates often assume a larger instance or a composite index can fix any performance issue, but the DBS-C01 exam tests the understanding that analytical workloads require a different engine (Redshift) and that operational overhead includes ongoing maintenance, not just initial setup.

How to eliminate wrong answers

Option A is wrong because simply increasing the RDS instance size does not address the fundamental architectural limitation: MySQL is optimized for OLTP, not for full-table scans and large aggregations; the query will still be I/O and CPU-bound, and scaling vertically has a hard ceiling and high cost. Option C is wrong because MySQL has no native materialized views, so emulating one with a summary table adds significant operational overhead: it requires custom refresh logic and storage management, and risks data staleness, contradicting the requirement to minimize overhead. Option D is wrong because adding a composite index on (account_id, transaction_date) does not change the fact that the report aggregates every row; even if the optimizer uses the index, it must still read all 500 million entries, so the gain is limited.

171

Multi-Selecthard

Which THREE factors should be considered when designing a database for a high-traffic web application that requires low-latency reads and writes?

Select 3 answers

A.Caching layer

B.Partitioning strategy

C.Strict normalization

D.Connection pooling

E.Denormalization of data

AnswersA, B, D

Caching reduces database load and latency.

Why this answer

A caching layer (e.g., Amazon ElastiCache for Redis or Memcached) reduces read latency by serving frequently accessed data from in-memory stores, offloading the primary database. For high-traffic web applications, this minimizes disk I/O and improves response times for both reads and writes when combined with write-through or write-behind strategies.
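As a rough illustration of the write-through strategy mentioned above, the sketch below uses plain Python dictionaries as stand-ins for the cache and the database. The class and method names are hypothetical, and no real ElastiCache or database client is involved.

```python
class WriteThroughCache:
    """Toy write-through cache: every write goes to the database and the cache together."""

    def __init__(self):
        self.cache = {}      # stand-in for an in-memory store such as Redis
        self.database = {}   # stand-in for the primary database
        self.db_reads = 0    # counts how often a read had to reach the database

    def write(self, key, value):
        # Write-through: persist first, then update the cache so later reads hit memory.
        self.database[key] = value
        self.cache[key] = value

    def read(self, key):
        if key in self.cache:
            return self.cache[key]      # cache hit, no database access
        self.db_reads += 1              # cache miss, fall back to the database
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value     # populate the cache for the next read
        return value


store = WriteThroughCache()
store.write("user:1", {"name": "Ana"})
profile = store.read("user:1")  # served from the cache, db_reads stays at 0
```

Because the write path updates both stores, a read that follows a write never touches the database, which is the latency benefit the explanation describes.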

Exam trap

The trap here is that candidates may confuse denormalization as a mandatory design choice for low-latency reads, when in fact it is a trade-off that can complicate writes and is not a core factor for both low-latency reads and writes in a high-traffic web application.

172

MCQhard

A company uses Amazon RDS for MySQL with a Multi-AZ deployment. During a recent failover, the application experienced a 2-minute downtime because it was connecting to the primary instance endpoint. The company needs to reduce failover downtime to under 30 seconds. What should be done?

A.Implement Amazon ElastiCache to cache database connections.

B.Use the Multi-AZ DB cluster endpoint instead of the instance endpoint.

C.Increase the instance size to improve failover speed.

D.Deploy a read replica and promote it manually during failover.

AnswerB

Cluster endpoint automatically redirects to the new primary after failover.

Why this answer

The Multi-AZ DB cluster endpoint provides a single DNS name that automatically routes connections to the current writer instance, eliminating the need for application-side reconnection logic. During a failover, the endpoint updates its DNS record to point to the new primary within seconds, reducing downtime to under 30 seconds. This is the recommended approach for minimizing failover disruption in Multi-AZ deployments.

Exam trap

The trap here is that candidates assume increasing instance size or using read replicas will speed up failover, but the real bottleneck is DNS propagation and the lack of an automatic redirect for the instance endpoint, which the cluster endpoint specifically addresses.

How to eliminate wrong answers

Option A is wrong because ElastiCache caches query results or session data, not database connections; it does not reduce failover downtime for the database itself. Option C is wrong because increasing instance size improves performance but does not affect the failover process timing, which is governed by DNS propagation and replication lag, not compute capacity. Option D is wrong because promoting a read replica manually requires application reconfiguration and typically takes longer than 30 seconds due to DNS changes and manual intervention, defeating the goal of automated fast failover.

173

MCQeasy

A startup is building a social media application that requires storing user profiles, posts, comments, and likes. The workload has variable traffic, with spikes after marketing campaigns. The team expects to run complex JOIN queries to generate a user's feed. Which AWS database service is MOST suitable for this relational workload?

A.Amazon Neptune

B.Amazon RDS for PostgreSQL

C.Amazon DynamoDB with global secondary indexes

D.Amazon ElastiCache for Redis

AnswerB

RDS PostgreSQL offers full relational capabilities and managed scaling.

Why this answer

Amazon RDS provides managed relational databases (such as MySQL and PostgreSQL) that support complex JOINs and can be scaled vertically or with read replicas. Option A is wrong because Neptune is a graph database, which is overkill and not optimized for typical relational queries. Option C is wrong because DynamoDB is NoSQL and does not natively support JOINs. Option D is wrong because ElastiCache is an in-memory cache, not a primary database.

174

MCQeasy

A company needs to store session state for a web application that runs on Amazon EC2 instances behind an Application Load Balancer. The session data is small (less than 1 KB per user) and must be highly available with low latency. Which AWS database service is best for this use case?

A.Amazon S3

B.Amazon ElastiCache for Redis

C.Amazon DynamoDB

D.Amazon RDS for MySQL

AnswerB

In-memory storage with low latency and high availability.

Why this answer

Amazon ElastiCache for Redis is ideal for storing session state because it is an in-memory key-value store with low latency and supports features like TTL and replication for high availability. Option A (S3) is wrong because it is object storage with higher latency. Option C (DynamoDB) is wrong because its latency is higher than that of an in-memory store for this kind of small, frequently accessed data. Option D (RDS) is wrong because it is a relational database with higher latency.

175

MCQhard

A company runs a global application using Amazon Aurora Global Database. The primary region is us-east-1, and secondary regions are eu-west-1 and ap-southeast-1. The application reports that writes to the primary are taking longer than expected. What is the most likely cause?

A.Multi-AZ failover occurred in the primary region.

B.The Global Database replication to secondary regions is causing synchronous commit latencies.

C.The primary DB instance is under-provisioned.

D.Read replicas in secondary regions are overloaded.

AnswerC

Aurora Global Database replicates across regions asynchronously, so an under-provisioned primary instance is the more likely cause of slow writes.

Why this answer

Aurora Global Database replicates from the primary to secondary regions asynchronously, not synchronously. The primary commits locally and then replicates to the secondaries, so write latency on the primary is not directly affected by the secondary regions. Commit delays of that kind would be a symptom of synchronous replication, which is not how Aurora Global Database works.

The most likely cause is that the primary DB instance is under-provisioned (Option C), leading to resource contention and slower write operations.

Exam trap

The trap here is confusing asynchronous replication with synchronous replication; candidates may incorrectly assume that Global Database replication causes synchronous commit delays, but Aurora Global Database is designed to avoid that by using asynchronous replication.

How to eliminate wrong answers

Option A is wrong because Multi-AZ failover in the primary region would cause a brief write outage or failover time, not consistently longer write latencies; after failover, writes resume normally. Option B is wrong because Aurora Global Database replication is asynchronous, not synchronous; synchronous replication would cause commit latency, but that is not how Aurora Global Database works. Option D is wrong because read replicas in secondary regions are read-only and do not affect write performance on the primary; they handle only read traffic.

176

MCQmedium

A company is running a MongoDB workload on-premises and wants to migrate to AWS with minimal operational overhead. The application uses MongoDB-specific features like aggregation pipelines. Which service is best?

A.Amazon DynamoDB

B.Amazon DocumentDB

C.Amazon RDS for PostgreSQL

D.Amazon Elasticsearch Service

AnswerB

DocumentDB is MongoDB-compatible and fully managed, reducing operational overhead.

Why this answer

Amazon DocumentDB is the correct choice because it is a fully managed, MongoDB-compatible document database that supports MongoDB-specific features like aggregation pipelines, indexes, and queries. It minimizes operational overhead by handling hardware provisioning, patching, backups, and replication, making it ideal for migrating an on-premises MongoDB workload to AWS without significant application changes.

Exam trap

The trap here is that candidates often choose Amazon DynamoDB because it is a NoSQL database, but they overlook that DynamoDB lacks MongoDB wire protocol compatibility and aggregation pipeline support, forcing a complete application rewrite.

How to eliminate wrong answers

Option A is wrong because Amazon DynamoDB is a key-value and document database that does not support MongoDB aggregation pipelines or MongoDB wire protocol, requiring significant application rewrites. Option C is wrong because Amazon RDS for PostgreSQL is a relational database that does not natively support MongoDB's document model or aggregation pipelines, forcing schema redesign and data migration complexity. Option D is wrong because Amazon Elasticsearch Service is a search and analytics engine, not a document database, and lacks MongoDB compatibility, making it unsuitable for running MongoDB workloads.

177

Multi-Selectmedium

A company is designing a disaster recovery strategy for an Amazon RDS for PostgreSQL database. They need a Recovery Point Objective (RPO) of less than 5 seconds and a Recovery Time Objective (RTO) of less than 1 minute. Which TWO actions should they take? (Choose two.)

Select 2 answers

A.Use AWS Database Migration Service for continuous replication to a separate instance

B.Create a cross-region read replica and manually update DNS in a disaster

C.Take hourly snapshots and restore in another region

D.Create a cross-region read replica and configure automatic failover using Amazon Route 53 health checks

E.Configure Multi-AZ deployment with a synchronous standby in another AZ

AnswersD, E

Automatic failover with health checks can achieve RTO <1 minute and RPO <5 seconds with synchronous replication.

Why this answer

Option D is correct because a cross-region read replica can be promoted to a primary instance in under a minute, and with Amazon Route 53 health checks configured for automatic failover, the DNS update occurs automatically, meeting the RTO of less than 1 minute. The asynchronous replication lag is typically sub-second, achieving an RPO of less than 5 seconds. Option E is correct because a Multi-AZ deployment with a synchronous standby in another Availability Zone provides automatic failover with no data loss (RPO of 0) and failover completes in about 30-60 seconds, satisfying both RPO and RTO requirements.

Exam trap

The trap here is that candidates often assume cross-region read replicas support automatic failover natively, but they do not; you must explicitly configure Route 53 health checks and DNS failover to achieve the required RTO, while Multi-AZ provides automatic failover but only within the same region, not cross-region.

178

MCQeasy

A startup is building a social media application. User profiles, posts, and comments have relationships but the team expects rapid growth and wants to scale horizontally with no single points of failure. They need a database that supports flexible schemas for different content types. Which database service is most appropriate?

A.Amazon DynamoDB

B.Amazon Neptune

C.Amazon Redshift

D.Amazon RDS for MySQL

AnswerA

DynamoDB provides horizontal scaling, flexible schema, and high availability.

Why this answer

Amazon DynamoDB is a fully managed NoSQL key-value and document database that scales horizontally, is highly available, and supports flexible schemas. RDS is relational with fixed schemas. Redshift is for analytics. Neptune is for graph databases, not general purpose.

179

MCQeasy

A company needs to store application logs for 90 days and run periodic analytical queries. The logs are generated at 1 TB per day. Which storage solution is most cost-effective?

A.Store logs in Amazon RDS for MySQL with partitioning.

B.Store logs in Amazon Redshift with automatic compression.

C.Store logs in Amazon DynamoDB with TTL for expiration.

D.Store logs in Amazon S3 and use S3 Select for queries.

AnswerD

S3 is cost-effective and S3 Select supports queries.

Why this answer

Amazon S3 is the most cost-effective storage solution for 90-day retention of 1 TB/day of application logs, as it offers low-cost object storage with lifecycle policies to automatically expire data after 90 days. S3 Select allows you to run analytical queries (e.g., filtering, aggregations) directly on the data stored in S3 using SQL-like statements, without needing to load data into a separate analytics engine, thus minimizing compute costs and operational overhead.

Exam trap

The trap here is that candidates often over-engineer the solution by choosing a database or data warehouse (like Redshift or RDS) for log storage, forgetting that S3 with S3 Select is purpose-built for cost-effective storage and serverless querying of large datasets with minimal operational complexity.

How to eliminate wrong answers

Option A is wrong because Amazon RDS for MySQL is a relational database designed for transactional workloads, not for storing and querying log data at this scale (1 TB/day × 90 days = 90 TB); it would be prohibitively expensive for 90 TB of logs and lacks native log expiration features. Option B is wrong because Amazon Redshift is a data warehouse optimized for complex analytical queries on structured data, but it is overkill and costly for simple log retention and periodic queries; it also requires loading data into the warehouse, incurring additional compute and storage costs. Option C is wrong because Amazon DynamoDB is a NoSQL key-value and document database designed for low-latency access at scale, but it is not cost-effective for storing 90 TB of log data due to its per-GB storage cost and provisioned throughput costs; while TTL can expire items, DynamoDB is not optimized for analytical queries like S3 Select.
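The retention arithmetic and the lifecycle policy mentioned above can be sketched as follows. This is a minimal illustration, not a definitive setup: the `app-logs/` prefix, the rule ID, and the helper names are assumptions, and `apply_lifecycle` is defined but not called because it needs boto3 and AWS credentials.

```python
DAILY_LOG_TB = 1        # from the question: 1 TB of logs per day
RETENTION_DAYS = 90     # from the question: keep logs for 90 days

# Steady-state volume once the oldest logs begin to expire.
steady_state_tb = DAILY_LOG_TB * RETENTION_DAYS


def build_lifecycle_configuration(prefix, days):
    """Return an S3 lifecycle configuration that expires objects under `prefix`."""
    return {
        "Rules": [
            {
                "ID": f"expire-after-{days}-days",   # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_lifecycle(bucket, configuration):
    """Send the rule to S3; not called here because it needs real credentials."""
    import boto3  # imported lazily so the rest of the sketch runs without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=configuration
    )


config = build_lifecycle_configuration("app-logs/", RETENTION_DAYS)
```

With the rule in place, S3 removes objects on its own schedule after 90 days, so no cleanup job has to be operated.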

180

MCQmedium

A company is designing a database for an e-commerce platform that requires high availability and automatic failover with minimal downtime. The application performs both OLTP and read-heavy analytics. Which AWS database service should be used?

A.Amazon DynamoDB

B.Amazon RDS for MySQL

C.Amazon Redshift

D.Amazon Aurora

AnswerD

Aurora offers high availability, automatic failover, and up to 15 read replicas for analytics.

Why this answer

Amazon Aurora is the correct choice because it combines the high availability and automatic failover of a relational database with the performance needed for both OLTP and read-heavy analytics. Aurora provides six-way replication across three Availability Zones, automatic failover in under 30 seconds, and supports up to 15 low-latency read replicas that can offload analytics queries without impacting write performance.

Exam trap

The trap here is that candidates often choose Amazon RDS for MySQL because they assume Multi-AZ provides automatic failover and read replicas for analytics, but they overlook that Aurora offers faster failover, better read replica performance, and integrated storage replication without the need for separate Multi-AZ configuration.

How to eliminate wrong answers

Option A is wrong because Amazon DynamoDB is a NoSQL key-value and document database optimized for high-scale OLTP workloads, but it lacks native support for complex SQL joins, aggregations, and the relational schema required for read-heavy analytics typical of an e-commerce platform. Option B is wrong because Amazon RDS for MySQL, while supporting read replicas, has a single-AZ primary by default and requires Multi-AZ deployment for failover, which still incurs a longer failover time (typically 1-2 minutes) and does not offer the same level of read replica performance or automatic scaling as Aurora. Option C is wrong because Amazon Redshift is a columnar data warehouse designed for large-scale analytics and OLAP workloads, not for OLTP transactions with high concurrency and sub-millisecond latency requirements.

181

MCQhard

A company is designing a database for a mobile application that requires offline synchronization. Users should be able to read and write data while offline, and changes should sync when connectivity is restored. Which AWS service supports this pattern?

A.Amazon RDS Proxy

B.Amazon S3 Transfer Acceleration

C.Amazon Cognito

D.AWS AppSync with Amazon DynamoDB

AnswerD

AppSync supports offline data sync with conflict resolution.

Why this answer

AWS AppSync with Amazon DynamoDB is correct because AppSync provides managed GraphQL APIs that support offline data synchronization via its client SDKs. When a mobile app is offline, mutations are queued locally and automatically replayed against DynamoDB once connectivity is restored, using a conflict resolution mechanism (e.g., last-writer-wins or custom resolvers) to merge changes.

Exam trap

The trap here is that candidates may confuse Amazon Cognito's authentication capabilities with the offline sync feature, overlooking that AppSync is the service that actually provides the offline mutation queue and conflict resolution.

How to eliminate wrong answers

Option A is wrong because Amazon RDS Proxy is a connection pooling service for relational databases, not designed for offline sync or mobile client data caching. Option B is wrong because Amazon S3 Transfer Acceleration speeds up uploads to S3 over long distances using edge locations, but it does not provide offline write queuing or conflict resolution for application data. Option C is wrong because Amazon Cognito is an identity and user management service; while it can integrate with AppSync for authentication, it alone does not enable offline data synchronization or local mutation storage.
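The offline pattern described above (queue mutations locally, replay them on reconnect, resolve conflicts with last-writer-wins) can be sketched in plain Python. This is a toy model under stated assumptions: it does not use the AppSync client SDKs, the `OfflineClient` class is hypothetical, and a dictionary stands in for the DynamoDB table.

```python
class OfflineClient:
    """Toy client that queues writes while offline and replays them on reconnect."""

    def __init__(self, server_store):
        self.server = server_store   # dict standing in for the DynamoDB table
        self.online = True
        self.pending = []            # queued (key, value, timestamp) mutations

    def write(self, key, value, timestamp):
        if self.online:
            self._apply(key, value, timestamp)
        else:
            self.pending.append((key, value, timestamp))  # hold until reconnect

    def reconnect(self):
        self.online = True
        for key, value, timestamp in self.pending:
            self._apply(key, value, timestamp)            # replay in original order
        self.pending.clear()

    def _apply(self, key, value, timestamp):
        # Last-writer-wins: keep whichever version carries the newer timestamp.
        current = self.server.get(key)
        if current is None or timestamp >= current["timestamp"]:
            self.server[key] = {"value": value, "timestamp": timestamp}


table = {}
phone = OfflineClient(table)
phone.online = False
phone.write("note:1", "draft written offline", timestamp=10)
phone.write("note:2", "stale edit", timestamp=5)

# Another device updates note:2 on the server while the phone is offline.
table["note:2"] = {"value": "newer edit from tablet", "timestamp": 8}

phone.reconnect()
```

After the reconnect, the offline draft for `note:1` is stored, while the stale edit to `note:2` loses to the newer server version.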

182

Multi-Selecthard

A company is using Amazon DynamoDB to store IoT sensor data. The application writes a large volume of data and needs to read recent data by timestamp. The table has a partition key of device_id and a sort key of timestamp. The access pattern is to read the latest data for a specific device. Which TWO design patterns will optimize read performance and reduce costs?

Select 2 answers

A.Use adaptive capacity to evenly distribute traffic across partitions.

B.Use DynamoDB Accelerator (DAX) to cache the most recent reads.

C.Enable auto scaling for the table to handle spikes.

D.Use DynamoDB Transactions for consistent reads.

E.Create a global secondary index (GSI) with device_id as partition key and timestamp as sort key.

AnswersA, B

Adaptive capacity helps handle hot partitions, improving performance and cost efficiency.

Why this answer

Option A is correct because adaptive capacity allows DynamoDB to automatically manage partition traffic distribution, preventing hot partitions when a single device_id receives a high volume of writes. This ensures consistent read performance without manual partition management. Option B is correct because DAX provides an in-memory cache for the most frequently accessed data, reducing read latency and read capacity unit consumption for repeated queries of recent sensor data.

Exam trap

The trap here is that candidates often confuse auto scaling with adaptive capacity, or assume a GSI is always beneficial, not realizing that duplicating the base table key structure adds cost without performance gain.

183

MCQmedium

A company is running a MySQL database on an EC2 instance and wants to migrate to Amazon RDS for MySQL with minimal downtime. The database is 500 GB in size and has a high write workload. Which migration approach is most appropriate?

A.Export data to Amazon S3 and use AWS Glue to load into RDS.

B.Copy the MySQL data directory to Amazon EBS and attach to RDS.

C.Take a mysqldump from the source and import into RDS.

D.Use AWS Database Migration Service (DMS) with ongoing replication.

AnswerD

DMS supports live migration with minimal downtime.

Why this answer

AWS DMS with ongoing replication (change data capture) is the most appropriate approach because it allows you to migrate the 500 GB database with minimal downtime. DMS performs a full load of the existing data and then continuously replicates ongoing changes from the source MySQL EC2 instance to the target Amazon RDS for MySQL, enabling a cutover with only a brief pause in writes.

Exam trap

The trap here is that candidates often choose mysqldump (Option C) because it is a familiar tool, but they overlook the requirement for minimal downtime and the impact of a high write workload on the time needed to complete a consistent export.

How to eliminate wrong answers

Option A is wrong because AWS Glue is an ETL service designed for transforming and loading data into data lakes or analytics services, not for direct database migration with minimal downtime; it cannot handle ongoing replication of MySQL binary logs. Option B is wrong because you cannot attach an EBS volume to an RDS instance; RDS manages its own storage and does not allow direct mounting of external EBS volumes. Option C is wrong because mysqldump is a logical backup tool that requires taking the source database offline or locking tables to ensure consistency, resulting in significant downtime for a 500 GB database with a high write workload.
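A DMS task that combines the full load with ongoing replication might be parameterized roughly as below. This is a sketch under assumptions: the task identifier and ARNs are placeholders, the selection rule simply includes every table, and `create_task` is defined but not called because it needs boto3, credentials, and existing DMS endpoints.

```python
import json


def build_replication_task_params(source_arn, target_arn, instance_arn):
    """Build keyword arguments for a full-load-plus-CDC DMS task."""
    table_mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-all-tables",
                "object-locator": {"schema-name": "%", "table-name": "%"},
                "rule-action": "include",
            }
        ]
    }
    return {
        "ReplicationTaskIdentifier": "mysql-ec2-to-rds",  # hypothetical name
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        # Full load of existing data, then change data capture until cutover.
        "MigrationType": "full-load-and-cdc",
        "TableMappings": json.dumps(table_mappings),
    }


def create_task(params):
    """Create the task in DMS; not called here because it needs a real account."""
    import boto3  # imported lazily so the sketch runs without boto3

    return boto3.client("dms").create_replication_task(**params)


params = build_replication_task_params(
    "arn:aws:dms:example:source",      # placeholder ARNs, not real resources
    "arn:aws:dms:example:target",
    "arn:aws:dms:example:instance",
)
```

The `full-load-and-cdc` migration type is what keeps the target in sync after the initial copy, so writes only need to pause briefly at cutover.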

184

MCQeasy

A startup is building a mobile app backend using Amazon DynamoDB. They anticipate unpredictable traffic spikes. Which DynamoDB feature should they use to handle the spikes without manual intervention?

A.Use DynamoDB Accelerator (DAX) as a cache layer.

B.Enable DynamoDB Auto Scaling for read and write capacity.

C.Set up a TTL (Time to Live) to automatically expire old items.

D.Implement DynamoDB Global Tables for multi-region replication.

AnswerB

Auto Scaling adjusts capacity based on traffic patterns, handling spikes automatically.

Why this answer

DynamoDB Auto Scaling adjusts provisioned read and write throughput based on actual traffic, using CloudWatch alarms, so it handles spikes without manual intervention. On-demand capacity mode is also designed for unpredictable traffic, but it is not among the options and can be more expensive for predictable workloads.

Option A is wrong because DAX is a caching layer and does not add write capacity. Option C is wrong because TTL only expires old items. Option D is wrong because Global Tables provide multi-region replication, not capacity scaling.
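For illustration, the Auto Scaling setup could be expressed through Application Auto Scaling roughly as follows. This is a hedged sketch: the table name, capacity limits, policy name, and 70% target are assumptions, and `apply_autoscaling` is defined but not called because it needs boto3 and AWS credentials.

```python
def build_write_autoscaling(table_name, min_wcu, max_wcu, target_utilization):
    """Build the two requests that enable target-tracking scaling for table writes."""
    resource_id = f"table/{table_name}"
    dimension = "dynamodb:table:WriteCapacityUnits"
    scalable_target = {
        "ServiceNamespace": "dynamodb",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_wcu,
        "MaxCapacity": max_wcu,
    }
    scaling_policy = {
        "PolicyName": f"{table_name}-write-target-tracking",  # hypothetical name
        "ServiceNamespace": "dynamodb",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_utilization,  # percent of provisioned capacity
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "DynamoDBWriteCapacityUtilization"
            },
        },
    }
    return scalable_target, scaling_policy


def apply_autoscaling(scalable_target, scaling_policy):
    """Register the target and policy; not called here because it needs a real account."""
    import boto3  # imported lazily so the sketch runs without boto3

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**scalable_target)
    client.put_scaling_policy(**scaling_policy)


target, policy = build_write_autoscaling("Sessions", 5, 500, 70.0)
```

A matching pair of requests with `dynamodb:table:ReadCapacityUnits` and `DynamoDBReadCapacityUtilization` would cover the read side.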

185

MCQmedium

Refer to the exhibit. An application team notices that the MySQL RDS instance 'mydb' is running at 80% CPU utilization during peak hours. They need to improve read performance without increasing the CPU load on the primary instance. Which action should they take?

A.Increase the DB instance class to db.r5.xlarge

B.Create a Read Replica in the same region

C.Change storage type to io1 with higher IOPS

D.Enable Multi-AZ deployment

AnswerB

Read Replica offloads read queries, reducing CPU on primary.

Why this answer

Creating a Read Replica offloads read traffic from the primary MySQL RDS instance, reducing CPU load on the primary while improving read performance for applications. Read Replicas asynchronously replicate data using MySQL’s native binlog-based replication, allowing the primary to focus on write operations without additional CPU overhead from serving reads.

Exam trap

The trap here is confusing Multi-AZ (which provides failover but no read scaling) with Read Replicas (which offload reads), leading candidates to select Multi-AZ when the goal is to reduce CPU load on the primary.

How to eliminate wrong answers

Option A is wrong because increasing the DB instance class to db.r5.xlarge would add more CPU and memory to the primary instance, but it does not offload read traffic; the primary would still handle all read requests, potentially increasing CPU utilization further. Option C is wrong because changing storage type to io1 with higher IOPS improves disk I/O performance but does not reduce CPU load; CPU utilization is driven by query processing, not storage throughput. Option D is wrong because enabling Multi-AZ deployment provides high availability and automatic failover via synchronous standby replication, but it does not offload read traffic; the standby replica cannot serve reads, so CPU load on the primary remains unchanged.
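Offloading reads only helps if the application actually sends them to the replica. The sketch below shows one simple way to split traffic by statement type; the `ReadWriteRouter` class and the endpoint names are hypothetical, and the SELECT check is deliberately naive.

```python
class ReadWriteRouter:
    """Toy router: SELECT statements go to replicas, everything else to the primary."""

    def __init__(self, primary_endpoint, replica_endpoints):
        self.primary = primary_endpoint
        self.replicas = list(replica_endpoints)
        self._next = 0   # round-robin position across replicas

    def endpoint_for(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if not is_read or not self.replicas:
            return self.primary   # writes (and reads with no replica) hit the primary
        endpoint = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return endpoint


router = ReadWriteRouter(
    "mydb-primary.example",                               # placeholder endpoints
    ["mydb-replica-1.example", "mydb-replica-2.example"],
)
targets = [
    router.endpoint_for("SELECT * FROM orders WHERE id = 1"),
    router.endpoint_for("UPDATE orders SET status = 'paid' WHERE id = 1"),
    router.endpoint_for("select count(*) from orders"),
]
```

Because replication to the replica is asynchronous, reads routed this way can be slightly stale, so queries that must see the latest write should still go to the primary.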

186

MCQeasy

A mobile gaming company needs a database to store player scores and leaderboards. The data must be updated in real time as players finish games. The database must support high write throughput and provide sub-millisecond read latency for leaderboard queries. Which database is best suited?

A.Amazon RDS for MySQL with read replicas

B.Amazon Redshift

C.Amazon ElastiCache for Redis

D.Amazon DynamoDB

AnswerD

DynamoDB offers consistent single-digit millisecond latency and high throughput.

Why this answer

Amazon DynamoDB is a NoSQL database that provides single-digit millisecond latency and can handle high write throughput. It also supports sorted data for leaderboards via Local Secondary Indexes or Global Secondary Indexes. Note that DynamoDB on its own delivers single-digit millisecond rather than sub-millisecond reads; reaching microsecond-level read latency would require adding DAX. RDS is slower and not designed for sub-millisecond reads. Redshift is for analytics. ElastiCache is a cache, not a primary database.

187

MCQhard

A company uses Amazon Aurora MySQL-Compatible Edition for its e-commerce platform. During flash sales, the database experiences high write contention on the 'orders' table, causing slow inserts and deadlocks. The development team wants to reduce contention without changing the application code. Which database design strategy is MOST effective?

A.Implement manual sharding across multiple Aurora clusters

B.Add more read replicas to offload read traffic

C.Use a larger instance type with higher IOPS

D.Enable Aurora Multi-Master to allow multiple write nodes

AnswerD

Multi-Master allows concurrent writes, reducing contention.

Why this answer

Option D is correct because an Aurora Multi-Master cluster allows multiple writer instances to handle writes concurrently, reducing contention. Option A is wrong because manual sharding requires application changes. Option B is wrong because read replicas are for read scaling, not writes. Option C is wrong because vertical scaling may help but does not address write contention.

188

MCQeasy

A company needs to store and query graph data (nodes and edges) for a social network. They require low-latency traversals. Which AWS database is best suited?

A.Amazon Neptune

B.Amazon RDS for MySQL

C.Amazon ElastiCache for Redis

D.Amazon DynamoDB

AnswerA

Purpose-built graph database for low-latency traversals.

Why this answer

Option A is correct because Neptune is a purpose-built graph database. Option B is wrong because RDS is relational. Option C is wrong because ElastiCache is an in-memory cache, not a graph database. Option D is wrong because DynamoDB is a key-value database.

189

MCQeasy

A startup is building a real-time leaderboard for a gaming application. The data is highly dynamic with frequent updates and requires single-digit millisecond latency. Which database is most suitable?

A.Amazon Neptune

B.Amazon Redshift

C.Amazon DynamoDB

D.Amazon RDS for PostgreSQL

AnswerC

DynamoDB offers consistent single-digit millisecond performance, ideal for real-time leaderboards.

Why this answer

Amazon DynamoDB is the most suitable choice because it is a fully managed NoSQL key-value and document database that delivers single-digit millisecond latency at any scale, making it ideal for real-time leaderboards with high-frequency updates. Its DAX (DynamoDB Accelerator) caching layer can further reduce read latency to microseconds, while its auto-scaling and on-demand capacity modes handle the highly dynamic workload without downtime.

Exam trap

The exam often tests the misconception that a relational database like PostgreSQL is inherently faster for all real-time workloads. The trap here is that the question explicitly requires 'single-digit millisecond latency' and data that is 'highly dynamic with frequent updates', characteristics that DynamoDB's NoSQL architecture is specifically designed to meet, whereas RDS for PostgreSQL would introduce latency from locking, indexing overhead, and connection management that prevents it from consistently achieving that performance under high write loads.

How to eliminate wrong answers

Option A is wrong because Amazon Neptune is a graph database optimized for highly connected data (e.g., social networks, fraud detection), not for high-throughput, low-latency key-value access patterns required by a real-time leaderboard. Option B is wrong because Amazon Redshift is a petabyte-scale data warehouse designed for complex analytical queries on large datasets, not for single-digit millisecond transactional updates or real-time point lookups. Option D is wrong because Amazon RDS for PostgreSQL is a relational database that, while capable, introduces overhead from ACID transactions, indexing, and connection management that typically results in higher latency (often 5–20 ms or more) compared to DynamoDB's optimized NoSQL engine, and it does not natively support the auto-scaling or DAX caching needed for such a dynamic workload.

190

MCQeasy

A financial services company is migrating its Oracle database to Amazon Aurora PostgreSQL. The database runs a critical batch processing job every night that updates millions of rows. The company needs the migration to minimize downtime and ensure data integrity. Which AWS service should the database specialist use to perform the migration?

A.AWS Database Migration Service (AWS DMS) with ongoing replication from Oracle to Aurora PostgreSQL

B.AWS Data Pipeline to export data from Oracle and import into Aurora PostgreSQL

C.AWS Schema Conversion Tool (AWS SCT) to convert the schema and then use native PostgreSQL tools to migrate data

D.AWS Glue to extract data from Oracle and load into Aurora PostgreSQL

AnswerA

AWS DMS can perform a one-time migration and then use ongoing replication to keep the target in sync with the source, minimizing downtime.

Why this answer

AWS DMS with ongoing replication (change data capture, CDC) is the correct choice because it enables a near-zero-downtime migration by continuously replicating changes from the source Oracle database to the target Aurora PostgreSQL while the source remains fully operational. After the initial full load, DMS applies ongoing transactions, allowing you to cut over with minimal interruption. This directly addresses the requirement to minimize downtime for the nightly batch job and ensures data integrity through transactional consistency.

Exam trap

The trap here is that candidates often confuse AWS DMS with ETL tools like Glue or Data Pipeline, assuming any data movement service can handle live migrations, but only DMS provides the transactional consistency and CDC required for near-zero-downtime database migrations.

How to eliminate wrong answers

Option B is wrong because AWS Data Pipeline is a workflow orchestration service, not a database migration tool; it lacks built-in CDC capabilities and would require manual scripting to handle ongoing replication, leading to significant downtime. Option C is wrong because AWS SCT only converts the schema and code, not the data; using native PostgreSQL tools for data migration would require the source database to be offline or heavily throttled, causing unacceptable downtime for the nightly batch job. Option D is wrong because AWS Glue is an ETL service designed for data transformation and analytics, not for transactional database migrations; it does not support ongoing replication (CDC) and would require the source to be quiesced, breaking the requirement for minimal downtime.

Practice this question →

191

MCQmedium

A company is building a real-time chat application using Amazon DynamoDB. Each message has a conversation ID, timestamp, sender, and content. The primary access pattern is to retrieve the most recent 50 messages for a given conversation, ordered by timestamp. Which table design minimizes cost and latency?

A.Use a composite primary key of conversation ID and message ID, and enable DynamoDB Streams to process messages.

B.Use DynamoDB Accelerator (DAX) to cache the most recent messages.

C.Use conversation ID as partition key and timestamp as sort key; query with ScanIndexForward=false and Limit=50.

D.Use conversation ID as partition key and a GSI on timestamp.

AnswerC

Directly supports the access pattern without additional indexes.

Why this answer

Option C is correct because using conversation ID as the partition key and timestamp as the sort key allows a single Query operation with ScanIndexForward=false to retrieve items in reverse chronological order, and Limit=50 ensures only the most recent 50 messages are returned. This design minimizes cost by avoiding full table scans or additional indexes, and minimizes latency by leveraging DynamoDB's native sort key ordering without needing external caching or streams.
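
As a minimal sketch of the query described in option C, the function below builds the parameters for the low-level DynamoDB Query operation (for example, as passed to boto3's `client.query`). The table name `ChatMessages` and the attribute names `conversation_id` and `sent_at` are assumptions made for illustration; they do not come from the question.

```python
def recent_messages_query(conversation_id: str, limit: int = 50) -> dict:
    """Build Query parameters for the newest `limit` messages in one conversation."""
    return {
        "TableName": "ChatMessages",  # hypothetical table name
        # Partition key equality only; the sort key (sent_at, assumed) supplies the ordering.
        "KeyConditionExpression": "conversation_id = :cid",
        "ExpressionAttributeValues": {":cid": {"S": conversation_id}},
        "ScanIndexForward": False,  # descending sort-key order, newest first
        "Limit": limit,             # stop after `limit` items
    }


params = recent_messages_query("conv-123")
print(params["ScanIndexForward"], params["Limit"])
```

Because the ordering comes from the sort key, no index or cache is involved; a caller would pass these parameters as `client.query(**params)`.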

Exam trap

The trap here is that candidates may over-engineer the solution by adding unnecessary components like DAX or GSIs, when DynamoDB's native sort key and query parameters directly solve the access pattern with minimal cost and latency.

How to eliminate wrong answers

Option A is wrong because enabling DynamoDB Streams adds cost and complexity without addressing the access pattern; streams are for change data capture, not for efficient querying of recent messages. Option B is wrong because DAX is an in-memory cache that reduces read latency but adds cost and complexity; the primary access pattern can be efficiently served directly from DynamoDB without caching, making DAX unnecessary and more expensive. Option D is wrong because using a GSI on timestamp introduces additional storage and write costs, and the query still requires a ScanIndexForward=false with Limit=50 on the GSI, which is redundant since the base table's sort key already supports the same pattern more efficiently.

Practice this question →

192

MCQhard

A financial services company uses Amazon Aurora MySQL-Compatible Edition for transaction processing. They need to run complex analytical queries on the same data without impacting transactional performance. Which solution meets these requirements?

A.Use Aurora Zero-ETL integration with Amazon Redshift

B.Enable Performance Insights and use RDS Proxy

C.Export data to Amazon S3 and query with Athena

D.Create an Aurora Replica and run analytical queries against it

AnswerA

Zero-ETL integration replicates Aurora data into Redshift in near real time, so analytical queries run in Redshift without impacting Aurora's transactional performance.

Why this answer

Aurora Zero-ETL integration with Amazon Redshift allows you to run complex analytical queries on transactional data without impacting Aurora's performance. It eliminates the need for extract, transform, and load (ETL) pipelines by automatically replicating data from Aurora to Redshift in near real-time, ensuring that analytical workloads are offloaded to a separate, optimized analytics engine.

Exam trap

The trap here is that candidates often assume an Aurora Replica (Option D) is sufficient for read-heavy analytics, but they overlook that it still shares the same storage subsystem and can cause I/O contention and replication lag under heavy analytical loads.

How to eliminate wrong answers

Option B is wrong because Performance Insights and RDS Proxy are designed for monitoring and connection management, not for offloading analytical queries; they do not prevent analytical workloads from consuming Aurora's compute and I/O resources. Option C is wrong because exporting data to S3 and querying with Athena introduces latency and manual ETL steps, and Athena is optimized for ad-hoc querying of data lakes, not for continuous, complex analytical queries on live transactional data. Option D is wrong because an Aurora Replica shares the same underlying storage and can still impact the primary instance's performance during heavy analytical queries, as it competes for storage I/O and can cause replication lag.

Practice this question →

193

MCQeasy

A company is designing a database for an IoT application that ingests millions of small sensor readings per second. The data is time-series and queries are mostly range scans over time. The company needs a cost-effective solution with high write throughput. Which AWS service should the database specialist recommend?

A.Amazon Redshift with auto-ingest

B.Amazon Timestream

C.Amazon RDS for PostgreSQL with pg_partman extension

D.Amazon DynamoDB with time-series data modeling

AnswerB

Timestream is a serverless time-series database optimized for IoT data.

Why this answer

Amazon Timestream is a purpose-built time-series database designed for IoT and operational applications that ingest millions of data points per second. It automatically manages storage tiers (in-memory and magnetic) to optimize cost, and its query engine is optimized for range scans over time, making it the most cost-effective and high-throughput choice for this workload.

Exam trap

The trap here is that candidates often default to DynamoDB for high-throughput workloads without recognizing that Timestream is the only AWS service purpose-built for time-series data, offering automatic tiering and optimized time-range queries that DynamoDB cannot match without complex custom sharding and indexing.

How to eliminate wrong answers

Option A is wrong because Amazon Redshift is a columnar data warehouse optimized for complex analytical queries on large datasets, not for high-velocity, high-volume time-series ingestion; its auto-ingest feature cannot handle millions of writes per second without significant cost and latency. Option C is wrong because Amazon RDS for PostgreSQL with pg_partman is a relational database that, even with partitioning, cannot sustain millions of writes per second due to single-writer limitations and transaction overhead, and it lacks the specialized storage tiering for time-series data. Option D is wrong because while DynamoDB can be modeled for time-series data, it is not purpose-built for time-series workloads; it requires manual partitioning and TTL management, and its query model is less efficient for range scans over time compared to Timestream's native time-based indexing.

Practice this question →

194

MCQmedium

A company is designing a database for an IoT application that receives millions of sensor readings per second. Each reading is a small JSON payload (timestamp, device_id, metric, value). The primary query pattern retrieves the most recent reading for a given device. Which AWS database service is BEST suited for this workload?

A.Amazon Aurora

B.Amazon DynamoDB

C.Amazon ElastiCache for Redis

D.Amazon RDS for MySQL

AnswerB

Handles high write throughput and supports efficient point queries with sort key.

Why this answer

DynamoDB is a key-value and document database optimized for high-velocity writes and low-latency point lookups. With device_id as the partition key and timestamp as the sort key, a Query with ScanIndexForward=false and Limit=1 returns the most recent reading for a device efficiently. Amazon RDS for MySQL and Amazon Aurora are relational and not ideal for this write throughput. ElastiCache for Redis is a cache, not a durable database.

Practice this question →

195

MCQeasy

A company is building a social network application that needs to store user profiles, friend relationships, and a feed of posts. The feed queries are complex, involving graph traversals (e.g., friends of friends). Which database is best suited for the relationship data?

A.Amazon DynamoDB

B.Amazon RDS for MySQL

C.Amazon ElastiCache for Redis

D.Amazon Neptune

AnswerD

Neptune is a graph database purpose-built for traversing relationships.

Why this answer

Amazon Neptune is a fully managed graph database service optimized for storing and querying highly connected data. It supports both property graph and RDF models, and it uses graph traversal languages like Gremlin and SPARQL, making it ideal for complex friend-of-friend queries and social network relationship data.

Exam trap

The trap here is that candidates often choose DynamoDB for its scalability or RDS for its familiarity with JOINs, failing to recognize that graph databases are purpose-built for relationship-heavy workloads and that the exam specifically tests the ability to match database types to query patterns.

How to eliminate wrong answers

Option A is wrong because Amazon DynamoDB is a key-value and document database that does not natively support graph traversals; querying friends of friends would require multiple expensive client-side joins or scans. Option B is wrong because Amazon RDS for MySQL is a relational database that can model relationships with JOINs, but it suffers from performance degradation and complexity as the depth of graph traversals increases, lacking native graph traversal optimizations. Option C is wrong because Amazon ElastiCache for Redis is an in-memory data store primarily used for caching, session management, and simple data structures; while it can store adjacency lists, it does not provide a graph query language or support complex multi-hop traversals efficiently.

Practice this question →

196

Drag & Dropmedium

Arrange the steps to enable encryption at rest for an existing unencrypted Amazon RDS for MariaDB DB instance in the correct order.

Why this order

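Encryption at rest cannot be enabled in place on an existing unencrypted instance. Take a snapshot of the instance, copy the snapshot with encryption enabled, restore a new DB instance from the encrypted copy, and then point applications at the new instance.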
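
The snapshot-based sequence can be sketched as follows. This is an illustration under stated assumptions, not a production runbook: `rds` is assumed to be a boto3 RDS client, and the snapshot and instance names are hypothetical. The cutover of applications to the new endpoint is left out.

```python
def encrypt_existing_instance(rds, instance_id: str, kms_key_id: str) -> str:
    """Create an encrypted copy of an unencrypted RDS instance; return the new instance id.

    `rds` is assumed to be a boto3 RDS client (or an object with the same methods).
    """
    plain_snapshot = f"{instance_id}-plain-snap"        # hypothetical names
    encrypted_snapshot = f"{instance_id}-encrypted-snap"
    new_instance = f"{instance_id}-encrypted"
    waiter = rds.get_waiter("db_snapshot_available")

    # 1. Snapshot the unencrypted instance.
    rds.create_db_snapshot(
        DBInstanceIdentifier=instance_id,
        DBSnapshotIdentifier=plain_snapshot,
    )
    waiter.wait(DBSnapshotIdentifier=plain_snapshot)

    # 2. Copy the snapshot, supplying a KMS key so the copy is encrypted.
    rds.copy_db_snapshot(
        SourceDBSnapshotIdentifier=plain_snapshot,
        TargetDBSnapshotIdentifier=encrypted_snapshot,
        KmsKeyId=kms_key_id,
    )
    waiter.wait(DBSnapshotIdentifier=encrypted_snapshot)

    # 3. Restore a new instance from the encrypted copy.
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=new_instance,
        DBSnapshotIdentifier=encrypted_snapshot,
    )
    return new_instance
```

Passing the client in as a parameter keeps the ordering of the steps easy to check with a stub before running it against a real account.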

Practice this question →

197

MCQeasy

A startup is building a social media analytics platform. The workload is write-heavy, with millions of events per day containing user actions (likes, shares, comments). The data model is simple: each event is a JSON document with a timestamp, user ID, and action type. Queries are primarily aggregations over time (e.g., count of likes per hour) and require low-latency responses for dashboards. The team wants to minimize operational overhead and cost. Which database service is most appropriate?

A.Amazon ElastiCache for Redis to store aggregated counts.

B.Amazon RDS for PostgreSQL with TimescaleDB extension.

C.Amazon Timestream, a purpose-built time-series database.

D.Amazon DynamoDB with global secondary indexes on timestamp and action type.

AnswerC

Timestream is designed for high write throughput and time-based aggregations.

Why this answer

Option C is correct. Amazon Timestream is optimized for time-series data, supports high write throughput, and provides built-in aggregation functions for time-based queries. Option D (DynamoDB) is not ideal for time-series aggregations without additional processing. Option B (RDS for PostgreSQL) can handle the workload but requires more management and scaling effort. Option A (ElastiCache) is a cache, not a durable database.

Practice this question →

198

MCQmedium

A company is building a mobile application that requires users to be able to query their order history quickly. The data is stored in Amazon DynamoDB, and each user has up to 10,000 orders over time. The application needs to support pagination and filtering by order date. What is the MOST efficient way to model this data in DynamoDB?

A.Scan the entire table and filter on user ID

B.Store all orders as a JSON document in a single item per user

C.Use user ID as the partition key and a Global Secondary Index on order date

D.Use user ID as the partition key and order date as the sort key

AnswerD

Allows range queries on order date and efficient pagination.

Why this answer

Option D is correct because using user ID as the partition key ensures all orders for a user are co-located on a single partition, enabling efficient queries. Adding order date as the sort key allows the application to filter and paginate by date range using the Query API with KeyConditionExpression, which is far more efficient than scanning or using a secondary index.
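
A rough sketch of date-range filtering with pagination on this key design is shown below. The parameter names are those of the DynamoDB Query operation; the table name `Orders`, the attribute names, and the choice to store `order_date` as an ISO 8601 string (which sorts lexicographically in date order) are assumptions for illustration. `query` stands for any callable with the shape of boto3's `client.query`.

```python
from typing import Callable, Iterator, Optional


def orders_query(user_id: str, start: str, end: str,
                 page_size: int = 25, start_key: Optional[dict] = None) -> dict:
    """Build Query parameters for one user's orders within a date range."""
    params = {
        "TableName": "Orders",  # hypothetical table name
        "KeyConditionExpression":
            "user_id = :uid AND order_date BETWEEN :start AND :end",
        "ExpressionAttributeValues": {
            ":uid": {"S": user_id},
            ":start": {"S": start},
            ":end": {"S": end},
        },
        "Limit": page_size,
    }
    if start_key is not None:
        # Resume where the previous page stopped.
        params["ExclusiveStartKey"] = start_key
    return params


def iter_order_pages(query: Callable[..., dict], user_id: str,
                     start: str, end: str) -> Iterator[list]:
    """Yield pages of items until the response carries no LastEvaluatedKey."""
    start_key = None
    while True:
        page = query(**orders_query(user_id, start, end, start_key=start_key))
        yield page.get("Items", [])
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:
            break
```

A mobile client would typically fetch one page per screen and hand `LastEvaluatedKey` back as the cursor for the next request, rather than draining all pages at once.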

Exam trap

The trap here is that candidates often choose a Global Secondary Index (Option C) thinking it is necessary for date-based filtering, but the sort key on the base table is more efficient and avoids the cost and eventual consistency of a GSI when the partition key already isolates the user's data.

How to eliminate wrong answers

Option A is wrong because scanning the entire table and filtering on user ID would read every item in the table, consuming excessive read capacity and causing high latency, especially as the table grows. Option B is wrong because storing all orders as a JSON document in a single item per user would exceed DynamoDB's 400 KB item size limit when a user has up to 10,000 orders, and it prevents efficient filtering and pagination by order date. Option C is wrong because while a Global Secondary Index (GSI) on order date could support date-based queries, it would require a separate query to retrieve orders for a specific user and would not be as efficient as using the sort key on the base table, which avoids the eventual consistency and additional cost of a GSI.

Practice this question →

199

MCQeasy

A company runs a time-series application that records sensor data every second. The data volume is 500 GB per month and grows continuously. They need to query the last 30 days of data frequently and older data rarely. Which database design is MOST appropriate?

A.Amazon Timestream

B.Amazon RDS for PostgreSQL with partitioning

C.Amazon DynamoDB with TTL

D.Amazon S3 with Athena and partitioning

AnswerA

Timestream is purpose-built for time-series data with automatic tiering.

Why this answer

Amazon Timestream is purpose-built for time-series data, automatically storing recent data in memory for fast queries and moving older data to a cost-optimized store. This matches the workload of frequent queries on the last 30 days and rare queries on older data, with continuous growth at 500 GB/month.

Exam trap

The trap here is that candidates often choose DynamoDB with TTL because they associate TTL with data lifecycle management, but they overlook that DynamoDB lacks native time-series query capabilities and efficient range scans, making it a poor fit for frequent time-based queries.

How to eliminate wrong answers

Option B is wrong because Amazon RDS for PostgreSQL with partitioning requires manual management of partition maintenance, vacuuming, and scaling, and does not natively separate hot and cold storage tiers for time-series data, leading to higher operational overhead and cost for this volume. Option C is wrong because Amazon DynamoDB with TTL only handles data expiration, not efficient range scans or aggregation queries over time-series data; it lacks native time-based query optimization and can result in high read costs for scanning large time ranges. Option D is wrong because Amazon S3 with Athena and partitioning requires running a query engine that incurs per-scan costs and latency, making it unsuitable for frequent sub-second queries on the last 30 days of data, and it lacks a built-in hot/cold storage tier.

Practice this question →

200

MCQmedium

An application is receiving the error shown in the exhibit. The application uses connection pooling. The RDS instance is a db.r5.large with max_connections set to 1000. What is the most likely cause?

A.The security group is blocking incoming connections.

B.The max_connections parameter is set too low for the instance size.

C.The connection pool in the application is not releasing idle connections.

D.The RDS instance is in a different VPC than the application.

AnswerC

Leaked connections accumulate, exhausting the max_connections limit.

Why this answer

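Option C is correct because a connection pool that does not release idle or leaked connections accumulates them until the instance's max_connections limit is exhausted. Option A is wrong because a security group that blocked traffic would prevent any connection from being established rather than limit how many are open. Option B is wrong because a max_connections value of 1000 is not low for a db.r5.large; a well-behaved pool should stay far below it. Option D is wrong because the error indicates connections are being accepted but exhausted, which could not happen if the application had no network path to the instance.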
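
To illustrate how leaked connections are usually avoided, here is a toy pool that always returns a connection, even when the code using it raises. `TinyPool` is a hypothetical stand-in written for this example, not the API of any real driver or pooling library; real pools expose the same idea through their own context managers.

```python
import queue
from contextlib import contextmanager


class TinyPool:
    """Toy fixed-size pool; strings stand in for real database connections."""

    def __init__(self, size: int):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(f"conn-{i}")

    @contextmanager
    def connection(self, timeout: float = 1.0):
        # Raises queue.Empty if every connection is checked out (pool exhausted).
        conn = self._free.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Runs on success and on error, so the connection is never leaked.
            self._free.put(conn)

    def available(self) -> int:
        return self._free.qsize()


pool = TinyPool(size=2)
try:
    with pool.connection() as conn:
        raise RuntimeError("query failed")
except RuntimeError:
    pass
print(pool.available())  # both connections are back in the pool
```

Code that checks out a connection without a `finally` (or an equivalent context manager) loses one connection per failure, which is the accumulation pattern the answer describes.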

Practice this question →

201

MCQmedium

A company is migrating an on-premises PostgreSQL database to Amazon RDS for PostgreSQL. The database has a large table that is frequently accessed by reporting queries. The reporting queries filter on a column that has a high cardinality but low selectivity. To optimize query performance on this table, which design choice should the database specialist recommend?

A.Partition the table by the filter column

B.Use a read replica to offload reporting queries

C.Increase the provisioned read IOPS for the RDS instance

D.Create a covering index on the filter column

AnswerD

A covering index includes all columns needed, allowing query results to be returned from the index alone.

Why this answer

Option D is correct because a covering index includes all columns needed by the reporting queries, allowing PostgreSQL to satisfy the query entirely from the index without accessing the heap (table) pages. This eliminates the overhead of random I/O for row lookups, which is especially beneficial when filtering on a high-cardinality, low-selectivity column where many rows match but the index scan alone can return the required data. In Amazon RDS for PostgreSQL, this reduces read IOPS consumption and improves query latency.
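
The effect of a covering index can be observed in a query plan. The sketch below uses SQLite (via Python's standard library) purely as a stand-in, since it runs anywhere; the table and column names are hypothetical. In PostgreSQL the comparable construct would be an index such as `CREATE INDEX ... ON orders (region) INCLUDE (amount)`, with the planner choosing an index-only scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL, notes TEXT)"
)
# The index contains every column the report query reads (region, amount).
conn.execute("CREATE INDEX idx_orders_region_amount ON orders (region, amount)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT region, amount FROM orders WHERE region = ?",
    ("eu",),
).fetchall()
# The last column of each plan row is the human-readable detail text.
plan_text = " ".join(row[3] for row in plan)
print(plan_text)
```

If the query also selected `notes`, the index would no longer cover it and the engine would have to visit the table rows as well.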

Exam trap

The trap here is that candidates often choose partitioning (Option A) for any large table with filtering, but fail to recognize that low selectivity means partitioning offers no pruning benefit, while a covering index directly reduces I/O by avoiding heap access.

How to eliminate wrong answers

Option A is wrong because partitioning by a high-cardinality, low-selectivity column would create many partitions with similar row counts, offering minimal pruning benefit and adding management overhead without improving query performance. Option B is wrong because a read replica offloads the query execution but does not optimize the query itself; the same slow table scan or index lookup would still occur on the replica. Option C is wrong because increasing provisioned read IOPS addresses throughput capacity but does not reduce the number of I/O operations required; the query still performs the same inefficient access pattern.

Practice this question →

202

MCQhard

An e-commerce platform uses Amazon RDS for PostgreSQL to store order data. The database has a table "orders" with 500 million rows. The application runs a report query that aggregates daily sales for the last 30 days. The query currently scans the entire table and takes 15 minutes to complete. The team needs to reduce the query time to under 30 seconds. Which solution is MOST cost-effective?

A.Partition the table by month and query only the relevant partitions.

B.Create a materialized view that stores daily sales aggregates and refresh it nightly.

C.Add a composite index on the date column and the sales amount column.

D.Upgrade the RDS instance to a larger size with more vCPUs and memory.

AnswerB

The report reads pre-computed aggregates, reducing query time drastically.

Why this answer

Option B is correct because a materialized view pre-aggregates the daily sales, so the report reads a small set of summary rows, and the nightly refresh keeps the full-table aggregation out of the report's path. Option A is wrong because partitioning by month narrows the scan, but the query must still read and aggregate every row in the relevant partitions. Option C is wrong because an index on the date column helps locate rows, but the query still has to read and aggregate a large portion of the data. Option D is wrong because upgrading to a larger instance is costly and may not bring the query under 30 seconds.
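
The pre-aggregation idea can be sketched with a summary table that is rebuilt on a schedule. SQLite has no materialized views, so this example swaps in a plain table as a stand-in; in PostgreSQL the same role is played by `CREATE MATERIALIZED VIEW` and `REFRESH MATERIALIZED VIEW`. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2024-01-01", 10.0), ("2024-01-01", 5.0), ("2024-01-02", 7.5)],
)


def refresh_daily_sales(conn: sqlite3.Connection) -> None:
    """Rebuild the summary table; stands in for a nightly materialized-view refresh."""
    with conn:
        conn.execute("DROP TABLE IF EXISTS daily_sales")
        conn.execute(
            "CREATE TABLE daily_sales AS "
            "SELECT order_date, SUM(amount) AS total FROM orders GROUP BY order_date"
        )


refresh_daily_sales(conn)
# The report reads one row per day instead of scanning the orders table.
rows = conn.execute(
    "SELECT order_date, total FROM daily_sales ORDER BY order_date"
).fetchall()
print(rows)
```

The trade-off is staleness: the report reflects data as of the last refresh, which is acceptable for a daily sales aggregate.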

Practice this question →

203

MCQmedium

A company is migrating an on-premises Oracle database to Amazon Aurora PostgreSQL. The database is 1 TB and has complex stored procedures. The migration must be completed within a 4-hour downtime window. Which migration approach is most efficient?

A.Use AWS Schema Conversion Tool (SCT) to convert schema only.

B.Use AWS SCT to convert schema and code, then AWS DMS for data migration.

C.Use Oracle Data Pump to export and pg_restore to import.

D.Use AWS DMS with ongoing replication.

AnswerB

SCT converts schema/code, DMS migrates data.

Why this answer

Option B is correct because AWS SCT converts the Oracle schema and complex stored procedures to Aurora PostgreSQL-compatible code, while AWS DMS performs the full data migration within the 4-hour window. This combination handles both schema/code conversion and bulk data transfer efficiently, meeting the time constraint.

Exam trap

The trap here is that candidates may think DMS alone can handle the entire migration, overlooking that schema and stored procedure conversion is a prerequisite that SCT must address first.

How to eliminate wrong answers

Option A is wrong because using SCT for schema only leaves the stored procedures unconverted, and no data migration is performed, so the database cannot be used. Option C is wrong because Oracle Data Pump and pg_restore are manual, offline tools that require significant downtime for a 1 TB database and do not handle stored procedure conversion automatically, likely exceeding the 4-hour window. Option D is wrong because DMS with ongoing replication alone does not convert the schema or stored procedures; it requires a compatible target schema, which is missing without SCT.

Practice this question →

204

MCQhard

Refer to the exhibit. An application uses Cognito identity pools to authenticate users and dynamodb:LeadingKeys condition to restrict access to items where the partition key matches the user's sub. Some users report that they can see items belonging to other users. What is the most likely cause?

A.The IAM policy does not include a condition for the table

B.The Cognito identity pool is not configured correctly

C.The table ARN is incorrect

D.The ForAllValues:StringEquals should be ForAnyValue:StringEquals

AnswerD

ForAllValues also evaluates to true when a request supplies no leading keys, so it can allow access to unintended items. ForAnyValue requires at least one key in the request to match.

Why this answer

Option D is correct because the ForAllValues set operator checks that every leading key in the request matches the allowed value, but it also evaluates to true when the request contains no values for the condition key. A request that supplies no leading keys therefore passes the condition even though nothing was matched against the user's sub. ForAnyValue:StringEquals returns false when the key is absent, so it requires at least one key in the request to match. Option A is wrong because the policy does include a condition. Option B is wrong because users are authenticating successfully; identity pool settings such as the token audience do not govern item-level access. Option C is wrong because an incorrect table ARN would cause requests to be denied rather than expose other users' items.

Practice this question →

205

MCQeasy

A startup is building a mobile app backend with user profiles and social features. They need a database that can handle flexible schemas, high read throughput for user profiles, and strong consistency for friend requests. Which database service should they choose?

A.Amazon RDS for MySQL

B.Amazon Neptune

C.Amazon DynamoDB

D.Amazon DocumentDB

AnswerC

DynamoDB provides flexible schema and high performance with strong consistency.

Why this answer

Option C is correct because Amazon DynamoDB offers a flexible schema, high throughput, and optional strongly consistent reads. Option A (RDS for MySQL) is relational with a fixed schema. Option B (Neptune) is a purpose-built graph database. Option D (DocumentDB) is MongoDB-compatible but not as optimized for consistent low-latency reads across a wide range of access patterns.

Practice this question →

206

MCQhard

A company has a multi-player game that uses DynamoDB to store game state. The access pattern is write-heavy, and the game state for each active game session is updated frequently. The team notices throttling on the table during peak hours. The table has a partition key of game_id and no sort key. What design change would best reduce throttling?

A.Use a composite key with a random suffix on the partition key.

B.Enable DynamoDB global tables.

C.Enable DynamoDB Accelerator (DAX) for the table.

D.Increase the provisioned read capacity units (RCU).

AnswerA

Write sharding distributes writes across multiple partitions, reducing hot spots.

Why this answer

Option A is correct because write sharding, which appends a suffix to the partition key, distributes writes across multiple partitions and avoids hot spots. Option B is wrong because global tables replicate data across Regions; they do not reduce write throttling. Option C is wrong because DynamoDB Accelerator (DAX) only caches reads. Option D is wrong because increasing read capacity does not help write throttling.
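
Write sharding can be sketched in a few lines. The shard count of 10 and the `#` separator are assumptions chosen for illustration; the right shard count depends on the write rate of the hottest key.

```python
import random

SHARD_COUNT = 10  # hypothetical; size this to the hottest key's write rate


def sharded_key(game_id: str, shard_count: int = SHARD_COUNT) -> str:
    """Partition key for a write: the logical id plus a random shard suffix."""
    return f"{game_id}#{random.randrange(shard_count)}"


def all_shard_keys(game_id: str, shard_count: int = SHARD_COUNT) -> list:
    """Every partition key a reader must query to reassemble one logical id."""
    return [f"{game_id}#{n}" for n in range(shard_count)]


print(sharded_key("game-42"))
print(all_shard_keys("game-42", shard_count=3))
```

The cost of this pattern falls on reads: a reader has to query each shard key and merge the results, so it suits workloads where writes dominate.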

Practice this question →

207

MCQeasy

A social media startup is selecting a database for user profiles with a flexible schema and high write throughput. The application is built on Node.js and requires low-latency access. Which database should they choose?

A.Amazon Aurora

B.Amazon ElastiCache for Redis

C.Amazon RDS for MySQL

D.Amazon DynamoDB

AnswerD

NoSQL, flexible schema, high throughput.

Why this answer

Amazon DynamoDB (Option D) is a NoSQL key-value and document database that offers a flexible schema and high throughput with low latency. Option A (Aurora) and Option C (RDS for MySQL) are relational. Option B (ElastiCache for Redis) is a cache, not a primary database.

Practice this question →

208

MCQhard

A startup is building a real-time analytics dashboard on AWS. The data arrives as time-series events from IoT devices at a rate of 10,000 writes per second. Each event is approximately 1 KB. The dashboard requires sub-second query latency for the last hour of data and must support ad-hoc analytical queries on historical data spanning months. The team needs to design a cost-effective database solution. Which combination of AWS services should be used?

A.Amazon ElastiCache for Redis for real-time queries, and Amazon OpenSearch Service for historical analytics.

B.Amazon DynamoDB with DynamoDB Accelerator (DAX) for real-time queries, and Amazon S3 with Amazon Athena for historical analytics.

C.Amazon Redshift for both real-time and historical queries, using auto-scaling and materialized views.

D.Amazon RDS for PostgreSQL with read replicas for real-time queries, and Amazon Redshift for historical analytics.

AnswerB

DynamoDB handles high write throughput, DAX provides sub-second reads, and S3 with Athena allows cost-effective ad-hoc queries on historical data.

Why this answer

Option B is correct because DynamoDB sustains the 10,000 writes per second, DynamoDB Accelerator (DAX) provides sub-second read latency for hot data, and Amazon S3 with Athena supports ad-hoc queries on historical data cost-effectively. Option A is wrong because ElastiCache for Redis is an in-memory cache rather than a durable store for the incoming write stream, and it is not designed for analytics. Option C is wrong because Amazon Redshift is optimized for complex analytics but not for sub-second real-time queries. Option D is wrong because Amazon RDS for PostgreSQL with read replicas is not designed for 10,000 writes per second, and querying historical data would be expensive and slow.

Practice this question →

209

MCQeasy

A company is building a real-time analytics dashboard for IoT sensor data. The data arrives as JSON and needs to be stored in a way that supports fast ingestion and complex queries. Which database service is best suited?

A.Amazon RDS for PostgreSQL

B.Amazon DynamoDB with TTL

C.Amazon Timestream

D.Amazon Redshift

AnswerC

Timestream is purpose-built for time-series data with fast ingestion and analytical functions.

Why this answer

Amazon Timestream is a time-series database optimized for IoT data, with fast ingestion and built-in analytics functions. DynamoDB is not optimized for time-series queries. RDS is not designed for high-velocity time-series data.

Redshift is for batch analytics.

Practice this question →

210

MCQhard

A financial services company is designing a ledger system using Amazon QLDB. The application records transactions that must never be modified or deleted. The company expects high write throughput and needs to ensure that the ledger can handle the load without throttling. Which design consideration is MOST important to achieve this?

A.Partition the ledger table by transaction date to distribute write load.

B.Create multiple ledgers and distribute writes across them.

C.Enable auto-scaling on the ledger to handle bursts of traffic.

D.Batch multiple document inserts into a single transaction to reduce the number of transactions.

AnswerD

Batching reduces the number of transactions, helping to stay within throughput limits.

Why this answer

Option D is correct because Amazon QLDB charges per transaction (document insert, update, or delete) and has a maximum throughput limit of 1,000 transactions per second per ledger. By batching multiple document inserts into a single transaction, you reduce the number of transactions, thereby staying within the throughput limit while still achieving high write throughput. This approach directly addresses the need to avoid throttling without sacrificing the immutability requirements of the ledger system.

Exam trap

The trap here is that candidates often assume QLDB supports auto-scaling like DynamoDB or Aurora, but QLDB has a fixed throughput limit and requires batching to handle high write loads without throttling.

How to eliminate wrong answers

Option A is wrong because QLDB is a fully managed ledger database that automatically partitions data; manual partitioning by transaction date is not supported and would not distribute write load. Option B is wrong because creating multiple ledgers increases operational complexity and does not inherently increase write throughput per ledger; QLDB's throughput limit applies per ledger, and distributing writes across ledgers would require application-level sharding, which is not a recommended design for a single ledger system. Option C is wrong because QLDB does not support auto-scaling; it has a fixed throughput limit of 1,000 transactions per second per ledger, and enabling auto-scaling is not a feature available in QLDB.

Practice this question →

211

Multi-Selecteasy

A company is building a microservices architecture and needs a database for a service that stores JSON documents with variable schema. The database must support high availability and automatic scaling. Which TWO services meet these requirements? (Choose two.)

Select 2 answers

A.Amazon ElastiCache for Redis

B.Amazon DynamoDB

C.Amazon DocumentDB

D.Amazon Neptune

E.Amazon RDS for MySQL

AnswersB, C

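DynamoDB and DocumentDB both store JSON documents with flexible schemas and are highly available; DynamoDB scales throughput automatically, and DocumentDB scales storage automatically.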

Why this answer

Amazon DynamoDB is correct because it is a fully managed NoSQL key-value and document database that natively supports JSON documents with variable schema, offers high availability through multi-AZ replication, and provides automatic scaling via its on-demand capacity mode or auto-scaling policies. It is ideal for microservices architectures that require low-latency, serverless, and elastic throughput.

Exam trap

AWS often tests the misconception that any database supporting JSON (like MySQL with JSON data type) qualifies as a document database for variable schema workloads, but the key differentiator is automatic scaling and native document store capabilities, which DynamoDB and DocumentDB provide, while RDS does not.

212

MCQhard

A social media company uses Amazon DynamoDB to store user posts. The table has a partition key of 'user_id' and a sort key of 'post_timestamp'. Each item is about 10 KB. The application needs to retrieve all posts for a given user within a date range. The company recently added a new feature that allows users to 'like' posts, and they store the like count as an attribute in the post item. The like count is updated frequently. The application experiences high write throttling on the table. The table has 1000 WCUs provisioned. The write pattern is bursty. Which design change would MOST effectively reduce write throttling?

A.Increase the WCUs to 5000.

B.Enable DynamoDB Accelerator (DAX) to cache writes.

C.Add a random suffix to the user_id partition key to distribute writes across multiple partitions.

D.Create a Global Secondary Index (GSI) on the like count attribute.

AnswerC

Sharding spreads the write load evenly across partitions, reducing throttling.

Why this answer

Option C is correct because the write throttling is caused by a 'hot partition': all writes for a given user_id go to the same partition, and a burst of writes (for example, many likes on a single popular post) can exceed the 1,000 WCU limit that applies to each partition. Adding a random suffix to the user_id partition key distributes those writes across multiple partitions, which raises the write throughput available for that logical user's data. The trade-off is on the read side: retrieving a user's posts for a date range then requires querying every suffix and merging the results. This write-sharding approach is a common DynamoDB design pattern for high-traffic keys.

Exam trap

The trap here is that candidates often assume increasing provisioned capacity (Option A) is the universal fix for throttling, but AWS specifically tests the understanding that DynamoDB's partition-level throughput limits require data distribution changes, not just capacity increases.

How to eliminate wrong answers

Option A is wrong because simply increasing WCUs to 5000 does not solve the hot partition issue: the writes are still concentrated on a single partition key value (user_id), and a single partition can serve at most 1,000 WCUs (the 3,000 figure often quoted alongside it is the per-partition read limit, in RCUs), so throttling will persist. Option B is wrong because DynamoDB Accelerator (DAX) is an in-memory cache that speeds up reads; writes pass through it to the table, and it does not absorb or buffer write requests, so it cannot reduce write throttling. Option D is wrong because creating a Global Secondary Index (GSI) on the like count attribute does not affect the base table's write capacity or partition distribution; GSIs have their own write capacity and are used for querying, not for alleviating write contention on the base table.
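
As a rough illustration of the random-suffix idea in Option C, the sketch below builds sharded partition key values in plain Python. The function names, the '#' separator, and the shard count are assumptions made for this example; they are not part of the question or of any DynamoDB API.

```python
import random


def sharded_partition_key(user_id: str, shard_count: int) -> str:
    """Return a write key such as 'user42#3' with a random shard suffix."""
    return f"{user_id}#{random.randrange(shard_count)}"


def all_partition_keys(user_id: str, shard_count: int) -> list[str]:
    """Return every shard key for a user; a date-range read must query each one."""
    return [f"{user_id}#{shard}" for shard in range(shard_count)]
```

Writes would use sharded_partition_key, while a "posts in a date range" read would loop over all_partition_keys and merge the per-shard results, which is the read-side cost of this design.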

213

MCQhard

A company is designing a disaster recovery plan for an Amazon DynamoDB table that stores critical session data. The table is provisioned with on-demand capacity. The recovery objective is to have the data available in another AWS Region within 15 minutes of a regional outage. Which design should they choose?

A.Use DynamoDB on-demand backups and restore to another Region.

B.Use DynamoDB Streams to replicate data to a table in another Region via AWS Lambda.

C.Use DynamoDB Global Tables to replicate data across Regions.

D.Create cross-Region Read Replicas for DynamoDB.

AnswerC

Global Tables provide active-active replication across Regions.

Why this answer

Option C is correct because DynamoDB Global Tables provide multi-Region, fully replicated tables with automatic conflict resolution, enabling active-active replication that meets the 15-minute recovery objective without manual intervention. Global Tables typically replicate changes to the other Regions within about a second, so a current copy of the data is already available in the second Region when a regional outage occurs.

Exam trap

The trap here is that candidates confuse DynamoDB Global Tables with cross-Region Read Replicas (which exist in RDS but not DynamoDB) or assume that on-demand backups can meet a 15-minute RTO, ignoring the manual restore time and lack of continuous replication.

How to eliminate wrong answers

Option A is wrong because on-demand backups are point-in-time snapshots that require manual restore to another Region, which typically takes longer than 15 minutes and does not provide continuous replication for real-time availability. Option B is wrong because DynamoDB Streams with AWS Lambda introduces eventual consistency and potential replication lag that can exceed 15 minutes, and it requires custom code for conflict resolution and error handling, making it less reliable for strict RTOs. Option D is wrong because DynamoDB does not support cross-Region Read Replicas; this feature is available in Amazon RDS (e.g., Aurora, MySQL) but not in DynamoDB, which uses Global Tables for multi-Region replication.

214

MCQhard

A gaming company uses Amazon DynamoDB to store player profiles. The table has partition key 'player_id' and sort key 'game_id'. During a new game launch, write traffic to a subset of players (influencers) spikes, causing throttling. The table uses on-demand capacity. Which solution resolves the hot key issue?

A.Increase the maximum read capacity units in the on-demand settings

B.Switch to provisioned capacity mode and increase write capacity units

C.Add a random suffix to the partition key for the hot players to distribute writes

D.Enable DynamoDB Accelerator (DAX) to cache writes

AnswerC

Spreading hot keys across partitions resolves hot key throttling.

Why this answer

Option C is correct because adding a suffix to the partition key for hot keys distributes writes across multiple partitions. Option A (increase read capacity) doesn't help writes. Option B (switch to provisioned capacity) doesn't solve the hot key problem; adaptive capacity works in both modes, and the writes would still land on the same partition key.

Option D (DAX) is a read cache and does not reduce write throttling.

215

MCQhard

A financial services company uses Amazon DynamoDB to store transaction records. The table has a partition key of 'AccountId' and a sort key of 'TransactionDate'. The company needs to run analytical queries that aggregate transactions by account and month. Currently, queries are slow due to full table scans. Which design change will improve query performance most effectively?

A.Add DynamoDB Accelerator (DAX) to the table.

B.Change the table's sort key from TransactionDate to Month.

C.Enable DynamoDB Streams and process the stream with AWS Lambda to pre-aggregate results.

D.Create a Global Secondary Index (GSI) with partition key AccountId and sort key Month.

AnswerD

Allows efficient aggregation queries using the GSI.

Why this answer

Option D is correct because creating a Global Secondary Index (GSI) with AccountId as partition key and Month as sort key allows efficient querying by account and month without scans. This assumes each item carries a Month attribute derived from TransactionDate. Option B is wrong because changing the sort key to Month loses the ability to query by exact date. Option C is wrong because enabling DynamoDB Streams does not by itself improve query performance.

Option A is wrong because DynamoDB Accelerator (DAX) speeds up repeated reads but does not change the query pattern, so the scans remain.
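
The GSI in Option D needs a Month attribute on each item. A minimal sketch of deriving it at write time is shown below, assuming TransactionDate is stored as an ISO 8601 date string such as '2024-05-17'; the helper names and the 'YYYY-MM' format are illustrative assumptions, not something the question specifies.

```python
from datetime import date


def month_key(transaction_date: str) -> str:
    """Derive a 'YYYY-MM' value from an ISO 8601 date such as '2024-05-17'."""
    parsed = date.fromisoformat(transaction_date)
    return f"{parsed.year:04d}-{parsed.month:02d}"


def with_month(item: dict) -> dict:
    """Return a copy of the item with a Month attribute added for the GSI sort key."""
    return {**item, "Month": month_key(item["TransactionDate"])}
```

With the attribute in place, a query on the index for one AccountId and one Month value touches only that account's items for that month.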

216

Multi-Selectmedium

A company is designing a database for a global e-commerce platform that requires low-latency reads and writes from multiple AWS Regions. The database must support ACID transactions and complex queries with joins. Which TWO services should they consider? (Choose two.)

Select 2 answers

A.Amazon DynamoDB with Global Tables

B.Amazon ElastiCache for Redis with global datastore

C.Amazon RDS for MySQL with cross-Region read replicas

D.Amazon Aurora with Aurora Global Database

E.Amazon Redshift with cross-Region snapshots

AnswersA, D

DynamoDB Global Tables provide multi-region low-latency writes and reads, but lack complex joins.

Why this answer

Amazon Aurora with Aurora Global Database (Option D) is the strongest fit: it is a relational engine that supports ACID transactions and complex queries with joins, and it replicates data to secondary Regions so that reads can be served locally with low latency. Writes are accepted in a single primary Region.

Amazon DynamoDB with Global Tables (Option A) provides low-latency reads and writes in multiple Regions, but it is not relational and does not support joins natively. It is included in the answer as the service to consider for the parts of the platform that need multi-Region writes and key-value access, alongside Aurora for the relational queries. Strictly speaking, no option other than Aurora satisfies the join requirement, so read the stem's 'should they consider' as allowing the two services to be used together.

Option B is wrong because ElastiCache for Redis is an in-memory cache, not a primary database with ACID transactions and joins. Option C is wrong because RDS for MySQL cross-Region read replicas serve reads only; writes still go to a single Region, and RDS has no native global database capability like Aurora. Option E is wrong because Redshift is an analytics data warehouse, and cross-Region snapshots are a backup mechanism rather than a way to serve low-latency transactions.

217

Multi-Selectmedium

A company is designing a database solution for a global user base that requires single-digit millisecond read latency for user profile data. The data is eventually consistent and can tolerate a few seconds of staleness. Which TWO AWS services or features should be combined to achieve this latency?

Select 2 answers

A.Amazon ElastiCache for Redis with global datastore.

B.Amazon DynamoDB Global Tables.

C.Amazon CloudFront with a custom origin pointing to DynamoDB.

D.Amazon RDS for MySQL with cross-Region read replicas.

E.Amazon Aurora Global Database.

AnswersB, C

Global Tables replicate data across regions, enabling low-latency local reads.

Why this answer

Amazon DynamoDB Global Tables provides a fully managed, multi-Region, multi-active database that replicates data across AWS Regions with sub-second latency, enabling single-digit millisecond reads for user profile data. Combined with Amazon CloudFront as a CDN, you can cache DynamoDB responses at edge locations, further reducing read latency for a global user base while tolerating eventual consistency and a few seconds of staleness.

Exam trap

The trap here is that candidates may assume Amazon ElastiCache or Aurora Global Database are required for single-digit millisecond latency, overlooking how DynamoDB Global Tables combined with CloudFront caching can achieve this without the complexity of managing a separate cache layer or dealing with cross-Region replication lag.

218

MCQeasy

A company is building a document management system where each document can have multiple tags and users need to query documents by any combination of tags. The number of tags per document is up to 20, and the total number of documents is expected to be 50 million. Which database design is most appropriate for this flexible tag-based querying?

A.Amazon DynamoDB with a global secondary index on the tag attribute

B.Amazon RDS for MySQL with a normalized schema

C.Amazon Neptune

D.Amazon ElastiCache for Memcached

AnswerA

DynamoDB scales easily and supports flexible tag queries.

Why this answer

Amazon DynamoDB with a global secondary index on the tag attribute allows efficient querying by tag. Because an index key must be a scalar value, this implies storing one item per document-tag pair rather than a list of tags on a single item. To query multiple tags, application-side intersection is needed. Option B (RDS for MySQL with a normalized schema) is wrong because it introduces complex joins and is less scalable.

Option D (ElastiCache for Memcached) is wrong because it is not a persistent database for primary storage. Option C (Neptune) is wrong because although graph databases can handle tags, they are not the most straightforward choice for simple tag queries.
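
The application-side intersection mentioned above can be sketched as follows. Here tag_index is a hypothetical in-memory stand-in for the results of one index query per tag (tag name mapped to the set of matching document IDs); in a real system each lookup would be a separate query against the GSI.

```python
def documents_with_all_tags(tag_index: dict[str, set[str]], tags: list[str]) -> set[str]:
    """Return the IDs of documents that carry every requested tag."""
    if not tags:
        return set()
    # One lookup per tag; an unknown tag matches no documents.
    result_sets = [tag_index.get(tag, set()) for tag in tags]
    # Start from the smallest set so the intersection does the least work.
    result_sets.sort(key=len)
    return set.intersection(*result_sets)
```

The cost grows with the number of tags in the query and with the size of each tag's result set, which is worth keeping in mind at 50 million documents.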

219

MCQeasy

A startup is building a social media application that requires a database to store user relationships (followers, following) and support graph queries. The data volume is expected to grow to tens of terabytes. Which AWS database service is most suitable for this workload?

A.Amazon RDS for MySQL with self-joins.

B.Amazon Redshift.

C.Amazon DynamoDB with adjacency list design.

D.Amazon Neptune.

AnswerD

Neptune is a purpose-built graph database.

Why this answer

Amazon Neptune is a fully managed graph database service optimized for storing and querying highly connected data, such as social media user relationships (followers, following). It supports both property graph and RDF models, enabling efficient graph traversal queries using Gremlin or SPARQL, which is ideal for this workload. Neptune scales to tens of terabytes and provides low-latency query performance for complex graph patterns, making it the most suitable choice.

Exam trap

The trap here is that candidates often choose DynamoDB (Option C) because they associate it with NoSQL scalability, but they overlook that adjacency list designs in DynamoDB require multiple queries and client-side logic for graph traversals, making it unsuitable for deep or multi-hop relationship queries at scale.

How to eliminate wrong answers

Option A is wrong because Amazon RDS for MySQL with self-joins is a relational database that does not natively support graph traversal operations; self-joins become exponentially slower and more complex as the depth of relationships increases, leading to poor performance at tens of terabytes. Option B is wrong because Amazon Redshift is a columnar data warehouse designed for analytical queries on large datasets, not for real-time graph queries or transactional relationship storage, and it lacks native graph traversal capabilities. Option C is wrong because Amazon DynamoDB with adjacency list design can model simple one-to-many relationships but is not optimized for multi-hop graph traversals; queries like 'find followers of followers' require multiple round trips and client-side joins, resulting in high latency and complexity at scale.
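
To make the 'multiple round trips' point about Option C concrete, here is a small sketch of a followers-of-followers lookup over an adjacency list. The get_followers callable is a hypothetical stand-in for one query per user against the table; nothing here is a DynamoDB or Neptune API.

```python
from typing import Callable


def followers_of_followers(get_followers: Callable[[str], set[str]], user_id: str) -> set[str]:
    """Two-hop traversal: one lookup for the user, then one lookup per follower."""
    first_hop = get_followers(user_id)
    second_hop: set[str] = set()
    for follower in first_hop:
        # Each iteration would be a separate round trip to the database.
        second_hop |= get_followers(follower)
    # Exclude the starting user if the graph loops back to them.
    return second_hop - {user_id}
```

A user with N followers costs 1 + N lookups for two hops, and each additional hop multiplies that again, which is the work a graph engine performs inside a single traversal query.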

220

MCQmedium

A company uses Amazon DynamoDB for a session management workload. The access pattern is random and requires single-digit millisecond latency. The table has a read capacity of 5000 RCU. During peak hours, read requests occasionally exceed this capacity, causing throttling. Which design change is most appropriate to handle traffic spikes?

A.Switch to DynamoDB on-demand mode.

B.Add a global secondary index with different partition key.

C.Enable DynamoDB auto scaling for reads.

D.Add a DAX cluster to cache read requests.

AnswerC

Dynamically adjusts capacity to handle spikes.

Why this answer

Option C is correct because DynamoDB auto scaling allows the table to dynamically adjust its provisioned read capacity (RCU) in response to traffic spikes, preventing throttling while maintaining single-digit millisecond latency. This is the most appropriate design change for a session management workload with random access patterns, as it handles occasional bursts without requiring manual intervention or architectural changes.

Exam trap

The trap here is that candidates often choose DAX (Option D) thinking it solves throttling by caching, but DAX only reduces read load if the same items are frequently requested—random access patterns with low cache hit rates make DAX ineffective for preventing throttling, and it does not increase the table's RCU limit.

How to eliminate wrong answers

Option A is wrong because switching to DynamoDB on-demand mode would eliminate throttling but introduces unpredictable costs and may not be cost-effective for a workload with a baseline of 5000 RCU and only occasional spikes; on-demand is designed for unpredictable or new workloads, not for optimizing cost in a known pattern. Option B is wrong because adding a global secondary index (GSI) with a different partition key does not address read capacity throttling on the base table; GSIs have their own read/write capacity and are used for alternative query patterns, not for scaling existing read throughput. Option D is wrong because adding a DAX cluster caches read requests to reduce latency and offload reads from the table, but it does not increase the provisioned read capacity; if the cache misses or the DAX cluster itself is overwhelmed, throttling can still occur on the underlying table.
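
DynamoDB auto scaling works by tracking a target utilization (consumed capacity divided by provisioned capacity). The sketch below is a simplified model of that arithmetic, not the service's actual algorithm; the function name, the parameters, and the example limits are assumptions for illustration.

```python
import math


def desired_read_capacity(consumed_rcu: float, target_utilization: float,
                          min_rcu: int, max_rcu: int) -> int:
    """Capacity that would put consumption at the target utilization, within limits."""
    needed = math.ceil(consumed_rcu / target_utilization)
    # Clamp to the configured minimum and maximum for the table.
    return max(min_rcu, min(max_rcu, needed))
```

For example, with a 70% target, a 5,000 RCU minimum, and a 20,000 RCU maximum, a sustained 6,000 RCU of consumption maps to 8,572 provisioned RCU. Scaling reacts to sustained changes rather than instantly, so very short bursts can still be throttled before capacity is adjusted.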

221

MCQmedium

A company is running a MySQL database on Amazon RDS for a customer relationship management (CRM) application. The database has a table named 'contacts' with over 100 million rows. The application frequently runs queries to find contacts by email address. The email column has a B-tree index. Recently, the application started experiencing slow query performance. The team checked CloudWatch metrics and saw that the ReadIOPS for the RDS instance is consistently at 80% of the provisioned IOPS limit. The instance type is db.r5.large with 3000 provisioned IOPS (gp2). The buffer pool hit ratio is 95%. What is the most cost-effective design change to improve query performance?

A.Upgrade the RDS instance to db.r5.xlarge with 6000 provisioned IOPS.

B.Migrate the contacts table to Amazon DynamoDB with email as partition key.

C.Implement an Amazon OpenSearch Service cluster for email search.

D.Increase the buffer pool size by changing to a memory-optimized instance.

AnswerA

Increases IOPS capacity, reducing IO bottleneck.

Why this answer

Option A is correct because moving to a db.r5.xlarge with 6000 provisioned IOPS (gp2) doubles IOPS capacity, reducing IO wait. Option B is wrong because migrating to DynamoDB requires application changes. Option C is wrong because an OpenSearch Service cluster adds complexity and cost.

Option D is wrong because the buffer pool hit ratio is already high; increasing memory won't help much.

222

MCQeasy

A startup is building a mobile application that needs to store user profiles and preferences. The data is schema-less and will grow rapidly. The application requires single-digit millisecond latency for reads and writes. Which AWS database should they choose?

A.Amazon Aurora (MySQL compatible)

B.Amazon Redshift

C.Amazon RDS for SQL Server

D.Amazon DynamoDB

AnswerD

DynamoDB supports schema-less design and low-latency access.

Why this answer

Amazon DynamoDB is a fully managed NoSQL database designed for low-latency, schema-less data at any scale. Option A (Aurora) is relational and requires a predefined schema. Option B (Redshift) is for analytics.

Option C (RDS for SQL Server) is relational and not optimized for rapid scaling of schema-less data.

223

MCQhard

A company runs a critical e-commerce platform on Amazon Aurora MySQL. The database is 2 TB and experiences a sudden spike in write latency during flash sales. The application uses auto-generated UUIDs as primary keys. The CPU utilization on the writer instance is 80%, and the read replicas show low utilization. Write latency has increased from 5 ms to 200 ms. The company needs to reduce write latency with minimal application changes. Which course of action is MOST effective?

A.Implement sharding across multiple Aurora clusters.

B.Change the primary key to an auto-increment BIGINT and recreate indexes.

C.Add more read replicas and redirect write-heavy queries to replicas.

D.Upgrade the writer instance to a larger instance type with more IOPS.

AnswerB

Sequential keys reduce index page splits, improving write performance.

Why this answer

B is correct because UUID primary keys cause random writes that fragment the B-tree index, leading to frequent page splits and high write latency. Changing to an auto-increment BIGINT allows sequential writes, which fill index pages contiguously and reduce the write amplification that drives latency from 5 ms to 200 ms. This requires no application logic changes beyond the schema migration, making it the most effective minimal-change solution.

Exam trap

The trap here is that candidates assume scaling compute or storage (Option D) is the universal fix for write latency, but the exam specifically tests the impact of primary key design on index write amplification in Aurora MySQL.

How to eliminate wrong answers

Option A is wrong because sharding across multiple Aurora clusters adds significant application complexity (e.g., distributed transactions, cross-cluster joins) and does not address the root cause of random-write overhead from UUIDs. Option C is wrong because read replicas cannot handle write traffic; they only serve read queries, so redirecting write-heavy queries to replicas is impossible and would not reduce write latency on the writer instance. Option D is wrong because upgrading the instance type with more IOPS only masks the symptom of high write latency; it does not fix the underlying index fragmentation caused by UUID primary keys, so the latency will persist after the upgrade.
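
The difference between random and sequential keys can be illustrated with a toy model in which a sorted Python list stands in for the key order of a B-tree index. This is a deliberate simplification for intuition only; it does not model InnoDB or Aurora page structures, and the function name is made up for the example.

```python
import bisect
import uuid


def count_mid_index_inserts(keys) -> int:
    """Count inserts that land before the current end of the sorted key order."""
    ordered = []
    mid_inserts = 0
    for key in keys:
        position = bisect.bisect_right(ordered, key)
        if position != len(ordered):
            # The new key falls between existing keys instead of being appended.
            mid_inserts += 1
        ordered.insert(position, key)
    return mid_inserts


sequential_result = count_mid_index_inserts(range(1, 1001))
uuid_result = count_mid_index_inserts(str(uuid.uuid4()) for _ in range(1000))
```

Sequential keys always append at the end, so sequential_result is 0, while nearly every random UUID lands somewhere in the middle. In a real index, those mid-structure inserts are the ones that touch scattered pages and trigger page splits.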

224

Multi-Selectmedium

Which THREE factors should be considered when selecting a database for a time-series workload (e.g., IoT sensor data) that requires high write throughput and efficient data retention?

Select 3 answers

A.Normalize the schema to reduce data duplication.

B.Use Amazon RDS Proxy to manage database connections.

C.Configure automatic data expiration using TTL (Time-to-Live).

D.Partition the table by time intervals (e.g., hourly or daily).

E.Use Amazon Timestream for its built-in time-series optimizations.

AnswersC, D, E

TTL automates data retention.

Why this answer

Options C, D, and E are correct. Amazon Timestream is purpose-built for time-series data and supports data retention policies. Partitioning by time intervals can improve query performance and makes it easy to drop old data in bulk.

Configuring automatic data expiration using TTL is essential for managing the data lifecycle. Option A (normalization) is not typically beneficial for time-series workloads, as it can degrade write performance. Option B (using Amazon RDS Proxy) is for connection pooling and is not specific to time-series workloads.
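
Time partitioning and TTL both come down to simple timestamp arithmetic. The sketch below assumes readings carry a Unix epoch timestamp in seconds, hourly buckets in UTC, and an expiry stored as epoch seconds; the function names, the bucket format, and the retention period are illustrative assumptions.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400


def time_bucket(epoch_seconds: int) -> str:
    """Return an hourly partition label such as '2024-05-17T09' (UTC)."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H")


def expires_at(epoch_seconds: int, retention_days: int) -> int:
    """Return the epoch second after which the reading may be expired."""
    return epoch_seconds + retention_days * SECONDS_PER_DAY
```

Choosing a coarser bucket (daily instead of hourly) means fewer partitions to manage but larger ones to scan, so the interval is usually matched to the most common query window.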

225

MCQeasy

A startup is building a real-time leaderboard for a mobile game using Amazon DynamoDB. The leaderboard must update frequently and support global access with low latency. Which database design approach is most suitable?

A.Use Amazon DynamoDB global tables with appropriate partition key design.

B.Use Amazon ElastiCache for Redis with replication across Regions.

C.Use Amazon Aurora Global Database with a single writer and multiple readers.

D.Use Amazon S3 with event notifications to update a leaderboard file.

AnswerA

Provides low-latency global access and high throughput.

Why this answer

Amazon DynamoDB global tables provide multi-Region, fully managed, multi-master replication, which is ideal for a real-time leaderboard requiring frequent updates and low-latency global access. By designing an appropriate partition key (e.g., game ID or time-based composite key), you can distribute write traffic evenly and avoid hot partitions, ensuring consistent performance under high update frequency.

Exam trap

The trap here is that candidates often assume a caching layer like ElastiCache is always the best for low-latency global access, but they overlook the need for multi-Region write capability and the inherent limitations of Redis cross-Region replication for high-frequency updates.

How to eliminate wrong answers

Option B is wrong because Amazon ElastiCache for Redis is not multi-active across Regions; its Global Datastore feature replicates data to other Regions but accepts writes only in the primary Region, so it does not offer the multi-Region writes or automatic conflict resolution that DynamoDB global tables provide for frequent updates. Option C is wrong because Amazon Aurora Global Database is designed for relational workloads with a single writer and multiple readers, which cannot handle the high-velocity, concurrent writes required by a real-time leaderboard without introducing write bottlenecks and latency. Option D is wrong because Amazon S3 with event notifications is not a real-time database; it introduces significant latency for updates and lacks the low-latency query capabilities needed for a live leaderboard, making it unsuitable for frequent updates and global access.
