Knowledge + Practice

CCNA Troubleshooting Questions

75 of 300 questions · Page 1/4 · Troubleshooting topic · Answers revealed

1

MCQhard

A company notices that its Aurora MySQL cluster has a high number of locks and deadlocks. The application uses read replicas for read scaling. What is the MOST likely cause?

A.Performance Insights is enabled

B.The writer and reader instances are of different sizes

C.Long-running transactions on the writer instance

D.Read replicas are performing write operations

AnswerC

Long transactions hold locks, increasing contention and deadlock probability.

Why this answer

Option C is correct because long-running transactions on the writer hold locks for longer, which increases contention and the chance of deadlocks. Option D is wrong because Aurora read replicas are read-only and do not take locks on the writer. Option B is wrong because different instance sizes do not cause deadlocks.

Option A is wrong because Performance Insights is a diagnostic tool, not a cause.

2

MCQhard

A company uses Amazon DynamoDB with global tables. During a regional failure, the application in the secondary region experiences higher latency and throttling. The DynamoDB table's WriteCapacityUnits are set to 10000 in both regions. Which action should be taken to reduce throttling during failover?

A.Switch the table to on-demand capacity mode

B.Enable DynamoDB auto scaling for write capacity in both regions

C.Disable global tables and use application-level replication

D.Increase the write capacity of the secondary region to 20000

AnswerB

Auto scaling adjusts capacity automatically to handle traffic spikes and reduce throttling.

Why this answer

Option B is correct because enabling auto scaling allows DynamoDB to adjust capacity based on traffic patterns. Option D is wrong because increasing only the secondary region may not be sufficient without auto scaling. Option C is wrong because turning off global tables would prevent replication.

Option A is wrong because switching to on-demand is expensive and not the best practice for predictable workloads.

3

MCQhard

A developer receives a 'ResourceNotFoundException' when trying to describe a DynamoDB table. The developer runs the command shown in the exhibit and gets the output. What is the most likely cause?

A.The developer is using a different AWS region or the table name has incorrect case.

B.The table is not in ACTIVE state.

C.The table ARN is incorrect.

D.The developer does not have permission to describe the table.

AnswerA

Correct. The error likely stems from using the wrong region or incorrect table name casing.

Why this answer

Option A is correct because the table is in us-west-2, but the developer may be querying a different region, or the table name may not match exactly (table names are case-sensitive). Option B is wrong because the table is ACTIVE. Option C is wrong because the ARN is present.

Option D is wrong because the command succeeded.

4

MCQmedium

A company uses Amazon DynamoDB global tables for a multi-region application. They notice that writes in one region are not appearing in another region after several minutes. What should they check first?

A.Check the ReplicationLatency metric in Amazon CloudWatch

B.Verify that auto scaling is configured identically in both regions

C.Ensure DynamoDB Streams are enabled on the table

D.Check the table size in both regions

AnswerA

This metric shows the lag between regions.

Why this answer

Option A is correct because the ReplicationLatency metric shows the delay between regions. Option C is wrong because global tables already rely on Streams for replication, so the metric is the first thing to check. Option B is wrong because auto scaling affects capacity, not replication.

Option D is wrong because table size doesn't directly cause replication delay.

5

Multi-Selectmedium

A database specialist is troubleshooting an Amazon DynamoDB table that is experiencing high throttling on write requests. The table has on-demand capacity and uses a composite primary key (partition key and sort key). Which THREE actions should the specialist take to identify and resolve the issue?

Select 3 answers

A.Examine the partition key value distribution to identify hot partitions

B.Implement DynamoDB Accelerator (DAX) to offload read traffic

C.Change the table to provisioned capacity mode

D.Increase the read capacity units on the table

E.Review Amazon CloudWatch metrics for 'WriteThrottleEvents' and 'ConsumedWriteCapacityUnits'

AnswersA, B, E

Uneven distribution causes throttling on hot partitions.

Why this answer

Option A is correct because high throttling on write requests in DynamoDB often results from uneven partition key value distribution, creating 'hot partitions' that exceed the per-partition throughput limits. By examining the distribution, the specialist can identify which keys are causing the bottleneck and then apply strategies like write sharding or adjusting the partition key design to spread traffic evenly.

Exam trap

The trap here is that candidates may confuse read and write capacity units or assume that on-demand capacity eliminates all throttling, when in fact hot partitions can still cause throttling regardless of the capacity mode.
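
The write sharding mentioned above can be sketched as follows. This is a minimal illustration, not DynamoDB's own API: the function names, the `#` separator, and the shard count of 10 are all assumptions made for the example.

```python
import hashlib
from typing import List


def sharded_partition_key(base_key: str, item_id: str, shard_count: int = 10) -> str:
    """Append a deterministic shard suffix to a hot partition key.

    Writes that would all land on `base_key` are spread over
    `shard_count` distinct partition key values instead.
    """
    digest = hashlib.sha256(item_id.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % shard_count
    return f"{base_key}#{shard}"


def all_shard_keys(base_key: str, shard_count: int = 10) -> List[str]:
    """List every sharded key, which a reader must query to see all items."""
    return [f"{base_key}#{n}" for n in range(shard_count)]
```

The trade-off is on the read side: fetching everything for one logical key now means querying each sharded key and merging the results.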

6

Multi-Selectmedium

A company is troubleshooting a performance issue with an Amazon RDS for MariaDB instance. The CloudWatch metric 'ReadIOPS' is consistently high, but 'WriteIOPS' is low. Which TWO actions could help improve read performance?

Select 2 answers

A.Add a read replica to offload read queries.

B.Increase the allocated storage to improve IOPS.

C.Use Amazon ElastiCache to cache query results.

D.Increase the DB instance class to one with more memory.

E.Enable Multi-AZ to improve read performance.

AnswersA, D

Read replicas handle read traffic, reducing load on the primary.

Why this answer

Adding a read replica (Option A) offloads read traffic from the primary. Increasing the instance size (Option D) can provide more memory for caching, reducing read I/O. Option B (increasing storage) may help if it increases IOPS, but not necessarily.

Option E (enabling Multi-AZ) does not improve read performance. Option C (using Amazon ElastiCache) is a valid approach but not specific to RDS.

7

MCQmedium

A company is using Amazon ElastiCache for Redis as a caching layer. The application performance degrades when cache misses increase. Which metric should be monitored to track the cache hit rate?

A.CurrConnections

B.CacheHits and CacheMisses

C.CPUUtilization

D.Evictions

AnswerB

CacheHits and CacheMisses are used to calculate hit rate.

Why this answer

Option B is correct because CacheHits and CacheMisses together give the hit rate. Option C is wrong because CPUUtilization measures resource usage. Option D is wrong because Evictions indicates memory pressure, not hit rate.

Option A is wrong because CurrConnections shows active connections.
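
As a small worked example, the hit rate is the share of lookups served from the cache. The function name is an assumption for illustration; the two inputs correspond to the CacheHits and CacheMisses metric values over the same period.

```python
def cache_hit_rate(cache_hits: int, cache_misses: int) -> float:
    """Return hits / (hits + misses), or 0.0 when there were no lookups."""
    total = cache_hits + cache_misses
    if total == 0:
        return 0.0
    return cache_hits / total
```

For instance, 900 hits and 100 misses in the same interval give a hit rate of 0.9.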

8

MCQmedium

A team is using Amazon RDS for Oracle with an option group that includes the Oracle Enterprise Manager (OEM) option. After modifying the option group to add a new option, the DB instance is stuck in the 'modifying' state for an extended period. What should the team do?

A.Reboot the DB instance to complete the modification.

B.Contact AWS Support to force the modification.

C.Create a new DB instance with the desired options and migrate the data.

D.Modify the DB instance again to reset the state.

AnswerA

Some option changes require a reboot to take effect.

Why this answer

Option A is correct because adding certain options may require a reboot. Option B is incorrect because the option group modification is likely valid; the issue is that it requires a reboot. Option D is incorrect because modifying the DB instance again would not help.

Option C is incorrect because the DB instance is not in a failed state.

9

MCQhard

Refer to the exhibit. An IAM policy is attached to a user. The user reports that they cannot delete the production-db database. Which statement best explains the behavior?

A.An explicit Deny statement prevents the deletion of the production-db instance

B.The user needs additional permissions to delete any DB instance

C.The user does not have permission to describe DB instances

D.The user does not have permission to create a DB instance

AnswerA

Explicit Deny overrides Allow.

Why this answer

Option A is correct because an explicit Deny overrides any Allow. The Deny statement specifically denies DeleteDBInstance on that resource. Option D is wrong because the policy allows CreateDBInstance.

Option C is wrong because DescribeDBInstances is allowed. Option B is wrong because the Deny is explicitly on the production-db ARN.
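
The "explicit Deny overrides Allow" rule can be sketched with a toy evaluator. This is a deliberate simplification of IAM policy evaluation (no conditions, principals, or boundaries, and case-sensitive matching), and the policy below is hypothetical because the exhibit is not reproduced here.

```python
from fnmatch import fnmatchcase


def is_allowed(statements, action, resource):
    """Toy evaluation: any matching Deny wins; otherwise a matching Allow is needed."""
    allowed = False
    for stmt in statements:
        if not any(fnmatchcase(action, pattern) for pattern in stmt["Action"]):
            continue
        if not any(fnmatchcase(resource, pattern) for pattern in stmt["Resource"]):
            continue
        if stmt["Effect"] == "Deny":
            return False  # explicit Deny short-circuits everything else
        allowed = True
    return allowed  # False here means implicit deny (nothing allowed it)


# Hypothetical policy: account ID, region, and ARN are made up for illustration.
PROD_ARN = "arn:aws:rds:us-east-1:123456789012:db:production-db"
policy = [
    {"Effect": "Allow", "Action": ["rds:*"], "Resource": ["*"]},
    {"Effect": "Deny", "Action": ["rds:DeleteDBInstance"], "Resource": [PROD_ARN]},
]
```

With this policy, deleting `production-db` is refused even though `rds:*` is allowed, while deleting any other instance is permitted.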

10

MCQeasy

A company is using Amazon DynamoDB and has enabled DynamoDB Streams. The application needs to process stream records in real-time. Which AWS service can be used to invoke a Lambda function automatically for each stream record?

A.Amazon Kinesis Data Firehose

B.Amazon Simple Queue Service (SQS)

C.AWS Step Functions

D.AWS Lambda

AnswerD

Lambda can be configured as a trigger for DynamoDB Streams to process each stream record.

Why this answer

Option D is correct because Lambda can be triggered directly from DynamoDB Streams. Option A is wrong because Kinesis Data Firehose is for loading streaming data into destinations, not for triggering Lambda. Option B is wrong because SQS is for message queuing and does not read directly from DynamoDB Streams.

Option C is wrong because Step Functions coordinates workflows, but does not directly trigger from DynamoDB Streams without Lambda.

11

MCQmedium

A database specialist is investigating a sudden increase in Amazon RDS for PostgreSQL connections. The DB instance's CloudWatch metric DatabaseConnections shows a spike from 100 to 500 within minutes. The application connects using a connection pool. Which step should the specialist take first to mitigate the issue while preserving application availability?

A.Use the RDS console to terminate all active connections and then restart the database.

B.Modify the security group to restrict inbound traffic to the database.

C.Increase the DB instance size to handle more connections.

D.Modify the DB parameter group to reduce the max_connections value and reboot the instance to apply changes.

AnswerD

Lowering max_connections limits the number of concurrent connections, preventing overload.

Why this answer

Option D is correct because reducing the maximum connections in the parameter group and rebooting immediately limits the active connections, preventing the database from being overwhelmed. Option B is wrong because modifying the security group does not affect the number of connections. Option C is wrong because increasing the instance size may help but takes time to provision.

Option A is wrong because terminating all connections will disrupt the application.

12

Multi-Selecteasy

A company is using Amazon RDS for MySQL and has enabled Enhanced Monitoring. The database administrator wants to identify the top contributors to disk I/O. Which THREE metrics from Enhanced Monitoring should they examine?

Select 3 answers

A.DirtyBufferFlushRate

B.NetworkThroughput

C.LogicalReads

D.WriteOps

E.PhysicalReads

AnswersA, D, E

Indicates how often dirty buffers are written to disk.

Why this answer

Option E is correct because physical reads cause disk I/O. Option D is correct because write operations generate I/O. Option A is correct because dirty buffer flushes cause writes.

Option C is incorrect because logical reads are served from memory, not disk. Option B is incorrect because network throughput is not disk I/O.

13

MCQeasy

A database specialist is troubleshooting a slow Amazon RDS for PostgreSQL query. The specialist has enabled Performance Insights and sees that the database load is high. Which additional tool can provide detailed information about the specific queries causing the load?

A.Use VPC Flow Logs to analyze network traffic to the database.

B.Use Amazon CloudWatch Logs to analyze the PostgreSQL error logs.

C.Use Enhanced Monitoring to view OS-level metrics and correlate with performance insights.

D.Use AWS CloudTrail to view database API calls.

AnswerC

Enhanced Monitoring provides OS metrics that help diagnose resource contention.

Why this answer

Option C is correct because Enhanced Monitoring provides OS-level metrics, while Performance Insights provides query-level details. Option B is wrong because CloudWatch Logs is for log files. Option D is wrong because AWS CloudTrail is for API calls.

Option A is wrong because VPC Flow Logs capture network traffic.

14

Multi-Selecthard

A company is running a critical Oracle database on Amazon RDS. The DBA wants to set up monitoring to detect if the database is experiencing a high number of full table scans, which may indicate missing indexes. Which TWO metrics should the DBA monitor? (Choose TWO.)

Select 2 answers

A.TableScanRows

B.FullTableScans

C.BufferCacheHitRatio

D.UserCommits

E.RedoLogSpaceUsage

AnswersA, B

Shows the number of rows scanned in full table scans.

Why this answer

Options A and B are correct. FullTableScans (B) directly counts full scan operations. TableScanRows (A) indicates the number of rows read via full scans.

Option C is wrong because BufferCacheHitRatio indicates cache efficiency, not full scans. Option D is wrong because UserCommits shows transaction commits. Option E is wrong because RedoLogSpaceUsage relates to redo logging.

15

Multi-Selecteasy

Which TWO AWS services can be used to monitor the performance of an Amazon DynamoDB table and send alerts when throttling occurs? (Choose two.)

Select 2 answers

A.Amazon Inspector

B.Amazon CloudWatch Alarms

C.VPC Flow Logs

D.AWS Config

E.Amazon CloudWatch

AnswersB, E

CloudWatch Alarms can trigger notifications based on metrics.

Why this answer

Options B and E are correct. Amazon CloudWatch provides metrics for DynamoDB, and CloudWatch Alarms send notifications when those metrics cross a threshold. AWS CloudTrail logs API calls but does not monitor performance metrics.

Option D is incorrect because AWS Config tracks resource changes, not performance. Option C is incorrect because VPC Flow Logs capture network traffic. Option A is incorrect because Amazon Inspector is a security assessment service.

16

MCQhard

A company is using Amazon DynamoDB with auto scaling for a social media application. The table has a partition key of 'user_id'. The application performs many small writes (update user profile) and reads (fetch user profile). Recently, the application's response time has increased. The DBA checks CloudWatch and sees that 'ConsumedWriteCapacityUnits' is close to 'ProvisionedWriteCapacityUnits', and 'WriteThrottleEvents' is low. However, 'ReadThrottleEvents' is high. The table has 1000 WCU and 1000 RCU provisioned. The auto scaling is configured to add capacity when utilization exceeds 70%. The DBA also notices that 'ReadThrottleEvents' spikes during peak hours. What should the DBA do to reduce read throttling?

A.Decrease the provisioned write capacity to 500 WCU to free up resources.

B.Increase the auto scaling target utilization to 90% to allow more headroom.

C.Change the storage type to General Purpose SSD (gp2) to improve I/O.

D.Increase the provisioned read capacity to 2000 RCU or implement DAX caching.

AnswerD

Increasing RCU reduces throttling; DAX offloads reads.

Why this answer

Option D is correct because increasing RCU or using DAX can reduce read throttling. Option B is wrong because raising the target utilization from 70% to 90% leaves less headroom, not more. Option A is wrong because decreasing WCU doesn't help reads.

Option C is wrong because GP2 is for RDS, not DynamoDB.
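
The effect of the target utilization setting can be shown with a little arithmetic. The sketch below assumes a simplified target-tracking model in which provisioned capacity settles at consumed capacity divided by the target; the function name is hypothetical and the real scaling behavior also involves alarms and delays.

```python
def provisioned_for_target(consumed_units: int, target_percent: int) -> int:
    """Capacity needed so that consumed / provisioned is at most target_percent."""
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be in (0, 100]")
    # Integer ceiling division avoids floating-point rounding surprises.
    return (consumed_units * 100 + target_percent - 1) // target_percent


def headroom(consumed_units: int, target_percent: int) -> int:
    """Spare capacity units available to absorb a spike before throttling."""
    return provisioned_for_target(consumed_units, target_percent) - consumed_units
```

At 700 consumed units, a 70% target settles at 1000 provisioned units (300 spare), while a 90% target settles at 778 (78 spare), so a higher target means less room for peak-hour spikes.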

17

MCQmedium

A company is running a production Amazon Aurora MySQL database. The database performance has degraded over the past week. The DBA suspects an increase in lock waits. Which tool should be used to identify queries experiencing lock waits?

A.Amazon CloudWatch Logs

B.Amazon RDS Enhanced Monitoring

C.Amazon RDS Performance Insights

D.AWS Trusted Advisor

AnswerC

Performance Insights shows wait events like lock waits.

Why this answer

Option C is correct because Performance Insights shows wait events and SQL queries. Option A is wrong because CloudWatch Logs doesn't show lock waits directly. Option B is wrong because Enhanced Monitoring shows OS metrics.

Option D is wrong because AWS Trusted Advisor is for best practices.

18

MCQmedium

A developer is trying to connect to an RDS for PostgreSQL instance using the endpoint shown in the exhibit. The connection fails with a timeout. Which of the following is the most likely cause?

A.The endpoint address is incorrect.

B.The DB instance requires SSL encryption to connect.

C.The security group does not allow inbound traffic on port 5432 from the client IP.

D.The DB instance is in a Multi-AZ configuration and requires a different endpoint.

AnswerC

Correct. A timeout often indicates network connectivity issues, such as security group rules blocking the port.

Why this answer

Option C is correct because port 5432 is the default PostgreSQL port, and a security group that does not allow inbound traffic on that port causes the connection to time out. Option A is wrong because the endpoint is correct. Option D is wrong because Multi-AZ does not affect connectivity.

Option B is wrong because there is no encryption requirement implied.
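
The way a connection fails is itself a useful hint. The sketch below is a generic TCP probe using Python's standard `socket` module; the function names and the short labels are assumptions for illustration, and the mapping from failure mode to cause is a rule of thumb rather than a guarantee.

```python
import socket


def classify_connect_error(exc: BaseException) -> str:
    """Map a connection failure to a rough diagnosis label."""
    if isinstance(exc, (socket.timeout, TimeoutError)):
        # Packets silently dropped: typical of a security group or NACL block.
        return "timeout"
    if isinstance(exc, ConnectionRefusedError):
        # Host reachable, but nothing is listening on that port.
        return "refused"
    if isinstance(exc, socket.gaierror):
        # The endpoint name did not resolve.
        return "dns"
    return "other"


def probe(host: str, port: int = 5432, timeout: float = 3.0) -> str:
    """Try a plain TCP connection and report how it ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except OSError as exc:
        return classify_connect_error(exc)
```

Calling `probe` with the instance endpoint from the client machine would return `"timeout"` in the scenario described in this question.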

19

MCQmedium

Refer to the exhibit. A database administrator runs the AWS CLI command to describe events for an RDS instance. Which conclusion is most likely correct based on the output?

A.The Multi-AZ failover failed and the instance restarted.

B.The DB instance was manually restarted by an administrator.

C.The DB instance experienced a Multi-AZ failover and subsequently restarted.

D.The DB instance was restored from a snapshot and then restarted.

AnswerC

The sequence shows a failover completed, then the instance restarted.

Why this answer

Option C is correct. The output shows a failover event followed by a restart, which is typical after a failover. Option B is incorrect because the events show a failover, not a manual restart.

Option A is incorrect because the failover was completed, not failed. Option D is incorrect because there is no indication of a snapshot restore.

20

MCQeasy

A DBA is investigating a sudden increase in database connections to an Amazon RDS for SQL Server instance. The application is running on Amazon EC2 instances behind an Application Load Balancer. Which tool can provide real-time information about active connections?

A.AWS Trusted Advisor

B.VPC Flow Logs

C.Amazon RDS Performance Insights

D.AWS CloudTrail

AnswerC

Performance Insights shows database load, including connections.

Why this answer

Option C is correct because RDS Performance Insights shows active session information including connections. Option D is wrong because CloudTrail tracks API calls, not connections. Option B is wrong because VPC Flow Logs track network traffic, not database connections.

Option A is wrong because Trusted Advisor gives best practice checks, not real-time connections.

21

Multi-Selecthard

A company is using Amazon DynamoDB with auto scaling for read and write capacity. During a traffic spike, write requests are being throttled even though the table's write capacity is below the maximum limit. Which TWO actions should the team take to resolve the throttling?

Select 2 answers

A.Enable DynamoDB Streams on the table to offload writes to a Lambda function

B.Create a DynamoDB global table to distribute writes across regions

C.Review the table's partition key design to ensure even distribution of write traffic

D.Decrease the read capacity to free up resources for writes

E.Pre-warm the table by temporarily increasing the write capacity manually before the expected spike

AnswersC, E

Uneven distribution can cause hot partitions and throttling.

Why this answer

Option E is correct because auto scaling may lag behind a spike when provisioning capacity; pre-warming can help. Option C is correct because a hot partition can cause throttling even if overall capacity is not exhausted. Option A is incorrect because DynamoDB Streams capture item changes after they are written and do not offload write traffic.

Option B is incorrect because global tables add complexity and are not a direct fix for throttling. Option D is incorrect because decreasing read capacity does not help writes.

22

MCQhard

A company has an Amazon DynamoDB table with on-demand capacity. Users report that write requests are occasionally throttled during peak hours. The application uses the AWS SDK and retries with exponential backoff. Which monitoring approach should be used to identify the cause of throttling?

A.Use AWS CloudTrail to monitor PutItem API calls for throttling errors.

B.Analyze VPC Flow Logs to check for network congestion.

C.Monitor CloudWatch metrics for ThrottledRequests and ConsumedWriteCapacityUnits.

D.Enable DynamoDB Streams and process records to monitor throttled requests.

AnswerC

CloudWatch metrics directly show throttling and capacity consumption.

Why this answer

Option C is correct because CloudWatch metrics such as ThrottledRequests, WriteThrottleEvents, and ConsumedWriteCapacityUnits help identify when throttling occurs and how much capacity is being consumed at that time. Option D is wrong because DynamoDB Streams track changes, not throttling. Option A is wrong because CloudTrail logs API calls but not throttling events at the item level.

Option B is wrong because VPC Flow Logs capture network traffic, not database throttling.
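
The retry behavior mentioned in the question can be sketched as exponential backoff with full jitter. This is a generic illustration, not the AWS SDK's implementation: `ThrottledError`, the function names, and the base and cap values are all assumptions.

```python
import random
import time


class ThrottledError(Exception):
    """Stand-in for a throttling error raised by the wrapped operation."""


def backoff_delays(max_retries, base=0.05, cap=2.0, rng=random.random):
    """Yield one randomized delay per retry, doubling the upper bound each time."""
    for attempt in range(max_retries):
        yield rng() * min(cap, base * 2 ** attempt)


def call_with_backoff(operation, max_retries=5, sleep=time.sleep):
    """Run `operation`, sleeping with backoff after each throttled attempt."""
    for delay in backoff_delays(max_retries):
        try:
            return operation()
        except ThrottledError:
            sleep(delay)
    # Final attempt after the last sleep; a throttling error here propagates.
    return operation()
```

Retries hide throttling from users but do not remove it, which is why the throttling metrics still need to be watched.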

23

Multi-Selecthard

Which THREE metrics should be monitored to troubleshoot a slow Amazon Redshift query? (Choose three.)

Select 3 answers

A.DatabaseConnections

B.DiskSpaceUsage

C.WLMQueueWaitTime

D.QueryDuration

E.NetworkThroughput

AnswersB, C, D

High disk usage can cause spills to disk, slowing queries.

Why this answer

Options B, C, and D are correct. WLMQueueWaitTime measures time queries wait in queue, QueryDuration measures execution time, and DiskSpaceUsage indicates if disk spills are causing slowness. Option E (NetworkThroughput) is about network, not query performance.

Option A (DatabaseConnections) is about connection count, not query speed.

24

MCQeasy

A developer is troubleshooting an application that is unable to write to a DynamoDB table. The above IAM policy is attached to the IAM role used by the application. What is the likely cause?

A.The Deny statement overrides the Allow statement, blocking all DynamoDB actions.

B.The table name in the Resource ARN is incorrect.

C.The IAM user does not exist.

D.The role is not correctly assumed by the application.

AnswerA

Correct. An explicit Deny overrides any Allow, so PutItem is blocked.

Why this answer

Option A is correct because the Deny statement for all DynamoDB actions overrides the Allow for PutItem. Deny statements always take precedence. Option B is wrong because the table name is correct.

Option D is wrong because the role is assumed. Option C is wrong because the user exists.

25

MCQhard

A database specialist is monitoring an Amazon DynamoDB global table with two replicas in separate regions. The specialist notices that the 'ReplicatedWriteConflictCount' metric is increasing. What is the MOST likely cause?

A.Insufficient write capacity in one of the regions

B.High network latency between the regions

C.The application is using strongly consistent reads

D.The same item is being written concurrently in multiple regions

AnswerD

Global tables use last-writer-wins; concurrent writes increase conflict count.

Why this answer

Option D is correct because concurrent writes to the same item in different regions cause conflicts. Option A is wrong because provisioned throughput affects throttling, not conflicts. Option B is wrong because network latency does not cause conflicts.

Option C is wrong because the read consistency setting does not cause write conflicts.
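
Last-writer-wins resolution can be sketched as picking the write with the latest timestamp. The `ReplicaWrite` type and the tie-break on region name are assumptions made for this illustration, not a description of DynamoDB internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaWrite:
    region: str
    timestamp_ms: int
    value: str


def resolve_last_writer_wins(writes):
    """Return the surviving write: latest timestamp, region name as a tie-break."""
    return max(writes, key=lambda w: (w.timestamp_ms, w.region))


# Two regions update the same item within a few milliseconds of each other.
conflict = [
    ReplicaWrite("us-east-1", 1_700_000_000_000, "status=shipped"),
    ReplicaWrite("eu-west-1", 1_700_000_000_005, "status=cancelled"),
]
winner = resolve_last_writer_wins(conflict)
```

Here `winner` is the `eu-west-1` write, and the `us-east-1` update is discarded, which is the kind of event a rising conflict count points to.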

26

MCQeasy

A developer is troubleshooting a slow query on Amazon RDS for MySQL. The query joins three large tables and runs frequently. What is the most effective way to identify the bottleneck?

A.Check the RDS Events for any maintenance notifications

B.Review Amazon CloudWatch CPU and memory metrics

C.Enable the slow query log and analyze the output

D.Use AWS Database Migration Service to migrate to a larger instance

AnswerC

Slow query log records queries that take longer than a set time.

Why this answer

Option C is correct because enabling the slow query log captures queries that exceed a time threshold, allowing analysis. Option A is wrong because RDS events report instance-level occurrences such as maintenance, not query behavior. Option D is wrong because AWS Database Migration Service moves data between databases and does not diagnose queries.

Option B is wrong because CloudWatch metrics show overall performance, not per-query detail.

27

MCQeasy

A company is migrating an on-premises Oracle database to Amazon RDS for Oracle. After migration, the application team reports that queries are slower than before. Which metric in CloudWatch should the DBA review first to check if the instance is resource-constrained?

A.SwapUsage

B.CPUUtilization

C.FreeableMemory

D.DatabaseConnections

AnswerB

High CPU could indicate resource contention affecting query performance.

Why this answer

Option B is correct because CPUUtilization is a primary indicator of resource saturation. Option D is wrong because DatabaseConnections shows concurrent connections, not resource usage. Option C is wrong because FreeableMemory is important but CPU is more likely the first bottleneck.

Option A is wrong because SwapUsage is relevant but not the first metric to check.

28

MCQhard

A company runs a critical e-commerce application on Amazon Aurora MySQL with a single DB instance. The database has 8 TB of data and uses the default writer endpoint. Recently, the application experienced a 10-minute outage during a primary instance failover. The failover was triggered by an underlying hardware issue. The database specialist needs to minimize downtime during future failovers. The application team is unwilling to modify the application code to handle connection retries. The company has a 99.99% SLA requirement. Which solution should the database specialist implement to meet the SLA with minimal application changes?

A.Increase the DB instance class to a larger size to improve performance and reduce failover time

B.Enable Multi-AZ deployment with automatic failover

C.Create an Amazon RDS Proxy and configure the application to connect to the proxy endpoint

D.Create a cross-Region read replica and promote it to primary during failover

AnswerC

RDS Proxy handles failover seamlessly by preserving connections and reducing downtime.

Why this answer

Amazon RDS Proxy sits between the application and the database, pooling and reusing database connections. During a failover, RDS Proxy maintains the client connections and transparently reconnects to the new primary instance, so the application does not experience a connection loss and does not need to implement retry logic. This directly addresses the 10-minute outage by reducing failover downtime to seconds, meeting the 99.99% SLA without application code changes.

Exam trap

The trap here is that candidates often assume Multi-AZ (Option B) is sufficient for zero-downtime failover, but they overlook that the application must handle connection retries, which the question explicitly prohibits, making RDS Proxy the only solution that provides transparent failover without code changes.

How to eliminate wrong answers

Option A is wrong because increasing the DB instance class does not reduce failover time; failover duration is determined by the time to detect the failure, promote a replica, and flush transactions, not by instance size. Option B is wrong because while Multi-AZ with automatic failover provides a standby in a different Availability Zone, the application still experiences a connection break during failover and must handle retries, which the team is unwilling to do; the outage would still be several minutes. Option D is wrong because a cross-Region read replica requires manual promotion and DNS changes, leading to significantly longer downtime than 10 minutes, and it does not provide automatic failover or transparent reconnection without application changes.

29

Multi-Selecteasy

A company uses Amazon RDS for PostgreSQL with Multi-AZ. The primary instance fails and a failover occurs. After failover, the application reports elevated write latency. Which TWO are possible causes?

Select 2 answers

A.A read replica is now promoting to primary

B.The buffer pool is not warm on the new primary

C.The new primary has a smaller instance size

D.Automated backups are running on the new primary

E.Application DNS cache still points to the old primary IP

AnswersB, E

Cold buffer pool increases read I/O.

Why this answer

After a Multi-AZ failover, the new primary instance starts with a cold buffer pool (no cached data blocks). PostgreSQL relies on shared buffers to cache frequently accessed data; without a warm cache, every read request must fetch data from disk, which increases I/O and write latency because writes often require reading the affected pages first. This is a known behavior in RDS for PostgreSQL after failover, and it resolves as the buffer pool warms up over time.

Exam trap

The trap here is that candidates often confuse Multi-AZ failover with read replica promotion, or assume that automated backups cause performance degradation, when in fact the cold buffer pool is the primary culprit for elevated write latency after failover.

30

MCQhard

A DevOps engineer notices that an Amazon RDS for PostgreSQL instance has been in 'storage-full' state for the past 30 minutes. The instance has 500 GB of General Purpose SSD (gp2) storage, and the free storage space is 0 bytes. The database is critical and cannot tolerate downtime. What is the MOST efficient way to resolve this issue while minimizing downtime?

A.Take a snapshot of the DB instance and restore it to a new instance with larger storage

B.Modify the DB instance to increase allocated storage to 1,000 GB

C.Enable storage auto-scaling on the DB instance

D.Delete unnecessary data, such as old logs or temporary tables

AnswerB

Modifying storage online adds space without downtime.

Why this answer

Option B is correct because modifying storage to 1,000 GB does not require downtime and adds space directly. Option D is wrong because deleting logs may not free enough space and requires manual intervention. Option A is wrong because taking a snapshot and restoring to a larger instance causes downtime.

Option C is wrong because enabling storage auto-scaling does not immediately solve the current full state.

31

MCQhard

A company is using Amazon DynamoDB for a high-traffic application. Users report occasional 'ProvisionedThroughputExceededException' errors. The application uses consistent reads and retries with exponential backoff. What is the MOST efficient way to handle these errors and reduce the number of retries?

A.Increase the provisioned read capacity units manually

B.Switch to eventually consistent reads

C.Enable DynamoDB Auto Scaling for the table

D.Increase the provisioned write capacity units manually

AnswerC

Auto Scaling adjusts capacity automatically to handle traffic spikes.

Why this answer

Option C is correct because DynamoDB Auto Scaling adjusts the provisioned throughput automatically in response to traffic patterns, reducing throttling. Option A is wrong because increasing read capacity manually does not adapt to changing traffic and is not efficient. Option B is wrong because switching to eventually consistent reads may reduce throttling but changes the consistency model, which may not be acceptable.

Option D is wrong because increasing write capacity does not address read throttling.
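To make the consistency trade-off concrete, here is a minimal sketch of how read capacity is consumed. It assumes the standard DynamoDB accounting: reads are billed in 4 KB blocks, and an eventually consistent read costs half of a strongly consistent one. The function name and the item sizes are hypothetical, chosen only for illustration.

```python
import math


def read_capacity_units(item_size_bytes: int, strongly_consistent: bool) -> float:
    """RCUs consumed per second by one read per second of an item of this size."""
    # Reads are billed in 4 KB blocks, rounded up to the next block.
    blocks = max(1, math.ceil(item_size_bytes / 4096))
    # An eventually consistent read costs half as much as a strongly consistent one.
    return blocks * (1.0 if strongly_consistent else 0.5)


if __name__ == "__main__":
    for size in (1_000, 6_000):  # hypothetical item sizes in bytes
        strong = read_capacity_units(size, strongly_consistent=True)
        eventual = read_capacity_units(size, strongly_consistent=False)
        print(f"{size} bytes: strong={strong} RCU, eventual={eventual} RCU")
```

Halving the cost per read explains why eventually consistent reads reduce throttling, but the change in consistency guarantees remains a separate decision.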

Practice this question →

32

MCQmedium

A company is monitoring an Amazon Aurora MySQL DB cluster. They observe that the AuroraReplicaLagMaximum metric is consistently above 10 seconds. Which action would best reduce the replica lag?

A.Increase the instance size of the writer.

B.Increase the instance size of the reader.

C.Reduce the number of transactions per second.

D.Enable Multi-AZ on the cluster.

AnswerB

Correct. A larger reader instance can apply changes faster, reducing lag.

Why this answer

Option B is correct because a larger reader instance can apply changes faster. Option A is wrong because increasing the writer size does not help when the reader is the bottleneck. Option C is wrong because reducing the number of transactions per second would lower the lag but is rarely feasible for a production workload.

Option D is wrong because Aurora already replicates its storage across multiple Availability Zones; Multi-AZ addresses availability, not replica lag.

Practice this question →

33

MCQeasy

A developer is troubleshooting a slow-running query on an Amazon RDS for PostgreSQL instance. The query is performing a sequential scan on a large table. Which AWS service or feature should the developer use to identify the missing index?

A.Amazon RDS Enhanced Monitoring

B.AWS CloudTrail

C.Amazon RDS Performance Insights

D.Amazon CloudWatch Logs

AnswerC

Correct. Performance Insights helps identify performance bottlenecks such as missing indexes.

Why this answer

Option C is correct because Performance Insights analyzes database load by SQL statement, which helps pinpoint the query that is scanning the table and needs an index. Option A is wrong because RDS Enhanced Monitoring provides OS-level metrics. Option B is wrong because AWS CloudTrail logs API calls.

Option D is wrong because CloudWatch Logs stores log data and does not analyze query performance.

Practice this question →

34

MCQeasy

Refer to the exhibit. A database administrator retrieves CloudWatch metrics for an RDS instance. What is the trend of CPU utilization during the monitored period?

A.CPU utilization is decreasing over time.

B.CPU utilization is constant around 90%.

C.CPU utilization fluctuates randomly.

D.CPU utilization is increasing over time.

AnswerD

The average values go from 75.5% to 98.2%.

Why this answer

Option D is correct. The average CPU utilization increases from 75.5% to 98.2% over the hour, indicating a steady increase. Option A is incorrect because utilization is not decreasing.

Option B is incorrect because utilization is not constant. Option C is incorrect because utilization does not fluctuate; it increases monotonically.

Practice this question →

35

MCQmedium

A company's RDS for SQL Server instance is frequently running out of disk space. The instance uses General Purpose SSD (gp2) storage. Which monitoring step will help identify the root cause?

A.Review CloudTrail logs for API calls

B.Monitor FreeStorageSpace and BinaryLogUsage metrics

C.Monitor BackupStorageUsed metric

D.Enable Enhanced Monitoring

AnswerB

FreeStorageSpace shows remaining space; BinaryLogUsage tracks transaction log growth.

Why this answer

Option B is correct because monitoring FreeStorageSpace and transaction log usage reveals what consumes space. Option A is wrong because CloudTrail records API activity, not storage consumption. Option C is wrong because backup storage is tracked separately from the instance's allocated storage.

Option D is wrong because Enhanced Monitoring shows OS-level metrics, not disk usage by database.

Practice this question →

36

Multi-Selecteasy

A company uses Amazon Redshift and notices that queries are taking longer than usual. CloudWatch metrics show 'CPUUtilization' is high and 'DiskSpace' is low. Which TWO actions can help improve query performance?

Select 2 answers

A.Disable concurrency scaling to free resources

B.Add more nodes to the cluster

C.Enable Multi-AZ to distribute load

D.Run VACUUM to reclaim space

E.Optimize sort keys to reduce data scanned

AnswersB, E

Adding nodes increases parallelism and resources for queries.

Why this answer

Option B is correct because adding nodes distributes the workload and increases CPU and I/O resources. Option E is correct because optimizing sort keys reduces the amount of data scanned, speeding up queries. Option A is wrong because disabling concurrency scaling removes capacity and reduces performance.

Option C is wrong because Multi-AZ addresses availability, not query performance. Option D is wrong because VACUUM reorganizes data and itself needs working space, so it may not help when the disk is nearly full.

Practice this question →

37

MCQhard

A company is running a self-managed MongoDB cluster on Amazon EC2. The cluster consists of three replica set members in different Availability Zones. The primary node recently experienced a crash, and the cluster failed over to a secondary. However, the new primary is showing significantly higher latency. The operations team wants to ensure that failover is fast and consistent. What should be done to improve the failover reliability?

A.Use Amazon EBS io2 Block Express volumes with provisioned IOPS.

B.Use Amazon EBS Multi-Attach to allow all replicas to share the same volume.

C.Switch to instance store volumes for better I/O performance.

D.Configure EBS snapshots to be taken every 5 minutes.

AnswerA

io2 Block Express provides consistent low-latency performance, reducing replica lag.

Why this answer

Option A is correct because Amazon EBS io2 Block Express volumes provide higher IOPS and lower latency, which reduces the time the secondary needs to catch up. Option B is wrong because EBS Multi-Attach shares a single volume between instances, whereas each MongoDB replica set member needs its own copy of the data. Option C is wrong because instance store volumes may offer higher performance but are ephemeral, so data loss is possible.

Option D is wrong because snapshots are for backups, not for improving failover performance.

Practice this question →

38

MCQmedium

A company is migrating an on-premises Oracle database to Amazon RDS for Oracle. The migration uses AWS DMS. After the migration, the database has a 30-minute recovery point objective (RPO). Which Amazon RDS feature should be configured to meet a 5-minute RPO?

A.Enable Multi-AZ deployment

B.Take manual snapshots every 5 minutes

C.Enable automated backups with a retention period of 7 days

D.Create a read replica in another region

AnswerC

Automated backups enable point-in-time recovery, typically to within the last five minutes.

Why this answer

Option C is correct because automated backups with point-in-time recovery enable recovery to any point within the retention period, meeting the 5-minute RPO. Option A is wrong because Multi-AZ is for high availability, not backup. Option B is wrong because manual snapshots are taken on demand, not continuously.

Option D is wrong because read replicas don't provide backup.

Practice this question →

39

MCQhard

A company runs an e-commerce platform using Amazon Aurora MySQL with Multi-AZ deployment. The application has a read-heavy workload and uses a mix of SELECT and UPDATE queries. Recently, the company migrated from a db.r5.large to a db.r5.2xlarge instance class to handle increased traffic. However, after the migration, the CPU utilization remains high during peak hours, and the application's page load times have increased. The DBA notices that the 'Read IOPS' metric is high, but the 'Read Latency' metric is low. There is also a high number of 'Select' queries in the database. The application uses a single database endpoint. What should the DBA do to reduce CPU utilization and improve read performance?

A.Enable Multi-AZ with one standby replica.

B.Enable Performance Insights and analyze the top SQL.

C.Upgrade the instance class to db.r5.4xlarge.

D.Create one or more Aurora Replicas and modify the application to use read-only endpoints for SELECT queries.

AnswerD

Read replicas offload read traffic, reducing CPU on primary.

Why this answer

Option D is correct because creating Aurora Replicas offloads SELECT queries from the primary, reducing CPU. Option A is wrong because Multi-AZ is for failover, not read scaling. Option B is wrong because enabling Performance Insights only helps monitoring, not performance.

Option C is wrong because increasing instance size again may not be cost-effective and does not address the read-heavy workload.

Practice this question →

40

MCQhard

A company runs an e-commerce platform on AWS using a multi-tier architecture. The application tier consists of Auto Scaling groups of EC2 instances behind an Application Load Balancer. The database tier uses Amazon RDS for MySQL with Multi-AZ deployment. Recently, the operations team noticed that during flash sales, the application becomes unresponsive and users receive 503 errors. The team checks CloudWatch metrics and sees that the RDS instance's CPU utilization spikes to 100%, and the `DatabaseConnections` metric also spikes to the maximum allowed value of 500. The application uses connection pooling with a maximum of 200 connections, but the metric shows 500 connections. The team suspects that the connection pooling configuration is not being honored. The application code is written in Python and uses SQLAlchemy with a connection pool size of 10 per application instance. There are 20 application instances in the Auto Scaling group during peak times. The team wants to resolve the issue without increasing the database instance size. What should the team do?

A.Reduce the Auto Scaling group's desired capacity to 10 instances during flash sales

B.Set the `max_connections` parameter in the RDS parameter group to 200 and configure the application to handle connection errors with retry logic

C.Migrate the database to Amazon Aurora MySQL with Auto Scaling enabled

D.Increase the SQLAlchemy pool size to 25 per instance to reduce connection contention

AnswerB

Limiting max_connections to 200 ensures the database does not accept more connections than the application intends, and retry logic handles connection failures.

Why this answer

Option B is correct because setting `max_connections` in the RDS parameter group enforces a hard limit, so the database rejects connections beyond what the application intends to open, and the application can handle the resulting connection errors gracefully with retry logic. Option A is wrong because reducing the number of application instances may reduce load but is not a scalable solution and doesn't address the root cause of the connection limit. Option C is wrong because switching to Aurora may help but is a larger change and does not fix the connection management issue directly.

Option D is wrong because increasing the connection pool size per instance would increase the total connections, worsening the problem.
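One possible reason the observed connection count exceeds the intended 200 can be sketched with simple arithmetic. SQLAlchemy's default queue pool allows `pool_size` persistent connections plus up to `max_overflow` temporary ones (10 by default), and each worker process creates its own pool. The question does not state the overflow setting or the number of worker processes per instance, so both are assumptions here, and the function name is hypothetical.

```python
def worst_case_connections(
    app_instances: int,
    pool_size: int,
    max_overflow: int = 10,          # SQLAlchemy's default overflow allowance
    processes_per_instance: int = 1,  # assumption: not stated in the scenario
) -> int:
    """Upper bound on database connections opened by the whole fleet."""
    # Each engine can hold pool_size connections plus max_overflow extras.
    per_engine = pool_size + max_overflow
    # Every worker process on every instance has its own engine and pool.
    return app_instances * processes_per_instance * per_engine


if __name__ == "__main__":
    # Scenario values: 20 instances, pool size 10.
    print(worst_case_connections(20, 10))                  # default overflow
    print(worst_case_connections(20, 10, max_overflow=0))  # overflow disabled
```

Under these assumptions the fleet can open 400 connections with the default overflow and 200 only when overflow is disabled, which is why a server-side cap is a more dependable limit than the pool size alone.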

Practice this question →

41

MCQhard

A company is using Amazon DynamoDB as the primary database for a global e-commerce application. During the holiday season, the application experiences throttling on write requests even though the read and write capacity units are well below the provisioned limits. The table uses on-demand capacity mode. What is the most likely cause of this throttling?

A.There is a hot partition due to an uneven write distribution across partition keys.

B.The table's provisioned write capacity is set too low.

C.The table has exceeded the maximum write capacity units per partition.

D.The AWS account has reached the DynamoDB write throughput limit per region.

AnswerA

Hot partitions cause throttling even in on-demand mode because DynamoDB limits throughput per partition.

Why this answer

Option A is correct because on-demand capacity mode scales automatically but can still throttle when traffic is unevenly distributed across partition keys, which creates hot partitions. Option B is incorrect because the table uses on-demand capacity mode, not provisioned. Option C is incorrect because a per-partition limit is only reached when writes concentrate on a few keys, which is the uneven distribution described in Option A.

Option D is incorrect because an account-level throughput quota would not explain throttling while consumption stays well below the limits.

Practice this question →

42

MCQeasy

A developer reports that an application's write requests to a DynamoDB table are failing with ProvisionedThroughputExceededException. The table uses provisioned capacity. Which immediate action will resolve the issue?

A.Switch the table to on-demand capacity

B.Implement exponential backoff in the application

C.Enable DynamoDB Accelerator (DAX)

D.Delete all global secondary indexes

AnswerB

Exponential backoff retries requests with increasing delays, reducing throttling.

Why this answer

Option B is correct because implementing exponential backoff reduces the request rate temporarily. Option A is wrong because switching to on-demand capacity is a table-level change that does not take effect immediately. Option C is wrong because DAX caches reads, not writes.

Option D is wrong because deleting global secondary indexes does not address throttled writes to the main table.
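Exponential backoff can be sketched in a few lines. This is an illustrative version with full jitter, not the retry logic of any particular SDK (the AWS SDKs ship their own). `ThrottledError` is a hypothetical stand-in for `ProvisionedThroughputExceededException`, and the base delay, cap, and retry count are arbitrary assumptions.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceededException."""


def backoff_delays(max_retries, base=0.05, cap=2.0):
    """Upper bound of the sleep window before each retry, doubling up to cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def call_with_backoff(operation, max_retries=5, base=0.05, cap=2.0, sleep=time.sleep):
    """Call operation(), retrying on ThrottledError with exponential backoff."""
    for ceiling in backoff_delays(max_retries, base, cap):
        try:
            return operation()
        except ThrottledError:
            # Full jitter: wait a random time within the current window so
            # that many clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
    # Final attempt; if it is still throttled, the exception propagates.
    return operation()
```

Passing `sleep` as a parameter keeps the function easy to test; production code would leave the default `time.sleep` in place.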

Practice this question →

43

MCQeasy

A startup is using Amazon DynamoDB for a gaming leaderboard. The table has a partition key of 'game_id' and a sort key of 'score'. The application frequently queries the top 10 scores for a given game. Recently, users have reported that the leaderboard is showing stale data. The DBA checks the CloudWatch metrics and sees no throttling. The table has auto scaling enabled. The application uses eventually consistent reads. The DBA suspects that the issue is related to write conflicts. What should the DBA do to ensure the leaderboard shows the most recent data?

A.Modify the application to use strongly consistent reads for leaderboard queries.

B.Enable DynamoDB Streams and process updates in near-real-time.

C.Enable DynamoDB Accelerator (DAX) for caching.

D.Increase the write capacity units to reduce write throttling.

AnswerA

Strongly consistent reads return the most up-to-date data.

Why this answer

Option A is correct because using strongly consistent reads ensures the latest data is read. Option B is wrong because DynamoDB Streams are for change data capture, not consistency. Option C is wrong because DAX caches data and may serve stale data.

Option D is wrong because increasing WCU does not affect consistency.

Practice this question →

44

MCQmedium

A company is using Amazon DynamoDB for a gaming leaderboard. The table has a partition key of 'game_id' and a sort key of 'score'. The table is configured with on-demand capacity. During a major tournament, the application experiences high latency and some requests return 'ProvisionedThroughputExceededException' errors. The CloudWatch metric 'ThrottledRequests' spikes. The application uses a single partition key for all writes during the tournament (game_id = 'tournament_final'). What is the most likely cause of the throttling, and what is the best solution?

A.The application is using a single partition key, causing all writes to go to one partition. The team should redesign the partition key to distribute writes across multiple partitions

B.The application is using a single partition key, causing all writes to go to one partition. The team should implement DAX to cache writes

C.The table has a global secondary index that is throttling writes; the team should remove the GSI

D.The table is using on-demand capacity, which has a maximum throughput limit per partition; the team should switch to provisioned capacity with auto scaling

AnswerA

Distributing the write load across partitions avoids throttling.

Why this answer

Option A is correct because using a single partition key creates a hot partition, leading to throttling even with on-demand capacity (which has per-partition limits). The best solution is to redesign the partition key to distribute writes. Option B is incorrect because DAX caches reads, not writes.

Option C is incorrect because the scenario does not mention a GSI; the single partition key already explains the throttling. Option D is incorrect because switching to provisioned capacity with auto scaling leaves the per-partition limit in place.
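One common way to redesign a hot partition key is write sharding: append a suffix so that writes for one logical key spread over several partition key values. The sketch below is an assumption-laden illustration, not a prescribed design; the `#` separator, the shard count, and the helper names are hypothetical, and reading the leaderboard then requires querying every shard and merging the results.

```python
import random


def sharded_key(base_key: str, shard_count: int, rng=random) -> str:
    """Partition key for a write: the logical key plus a random shard suffix."""
    return f"{base_key}#{rng.randrange(shard_count)}"


def all_shard_keys(base_key: str, shard_count: int) -> list:
    """Every partition key a reader must query to cover the logical key."""
    return [f"{base_key}#{i}" for i in range(shard_count)]


def top_scores(per_shard_results, limit=10):
    """Merge the score lists returned by each shard and keep the highest."""
    merged = [score for shard in per_shard_results for score in shard]
    return sorted(merged, reverse=True)[:limit]


if __name__ == "__main__":
    print(sharded_key("tournament_final", 8))
    print(all_shard_keys("tournament_final", 3))
    print(top_scores([[5, 1], [9, 3]], limit=3))
```

The trade-off is that writes scale with the number of shards while each leaderboard read fans out to all of them, so the shard count is a balance between write throughput and read cost.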

Practice this question →

45

MCQmedium

A company's Amazon Redshift cluster is experiencing slow query performance. The cluster has three nodes. The administrator wants to identify if the issue is due to data distribution skew. Which approach should be used?

A.Examine the STL_QUERY table to analyze query execution times.

B.Check the STV_WLM_SERVICE_STATE table to see current queue state.

C.Query the STV_SLICES table to compare disk usage across slices.

D.Review CloudWatch metrics for CPUUtilization per node.

AnswerC

STV_SLICES shows disk usage per slice, indicating skew.

Why this answer

Option C is correct because checking the STV_SLICES table shows disk usage per slice, revealing data distribution skew. Option A is wrong because the STL_QUERY table logs query text and execution times, not distribution. Option B is wrong because the STV_WLM_SERVICE_STATE table shows queue state, not distribution.

Option D is wrong because CloudWatch metrics like CPUUtilization do not indicate distribution skew.

Practice this question →

46

MCQhard

An application uses Amazon ElastiCache for Redis as a session store. Users report that sessions are being lost intermittently. The ElastiCache cluster has replication enabled with one replica. CloudWatch metrics show 'Evictions' spiking during peak hours. What is the MOST likely cause?

A.Replication lag between primary and replica is causing read failures

B.Encryption in transit is enabled and causes decryption errors

C.The cache's memory is full and the eviction policy is removing keys

D.The cluster is performing automatic snapshots that block writes

AnswerC

Eviction spikes indicate that the cache is out of memory and is removing keys to make space, causing session loss.

Why this answer

Option C is correct because evictions occur when memory is full and the cache removes keys based on the eviction policy. This would cause session loss. Option A is wrong because replication lag would cause stale data, not loss.

Option B is wrong because encryption in transit does not affect memory. Option D is wrong because snapshotting may cause latency but not evictions.

Practice this question →

47

MCQeasy

A company uses Amazon DocumentDB (with MongoDB compatibility) for its content management system. The application runs on EC2 instances and connects to a DocumentDB cluster with one instance (db.r5.large). Recently, users reported that retrieving documents takes longer than usual. CloudWatch metrics show that the CPU utilization of the DocumentDB instance is at 90% and the freeable memory is below 100 MB. The team has verified that no query optimization is possible. Which action should the team take FIRST to improve performance?

A.Add a read replica instance to offload read traffic.

B.Create additional indexes on frequently queried fields.

C.Increase the storage volume size to improve I/O performance.

D.Scale up the instance to db.r5.xlarge.

AnswerD

A larger instance provides more CPU and memory, directly addressing the resource exhaustion.

Why this answer

Option D is correct because the high CPU and low memory indicate the instance is overworked. Scaling up to a larger instance provides more CPU and memory resources. Option A is incorrect because adding a reader instance helps with read scaling but does not reduce CPU/memory pressure on the primary writer.

Option B is incorrect because creating an index may improve query performance but the team already ruled out query optimization. Option C is incorrect because increasing storage does not directly affect CPU or memory.

Practice this question →

48

Multi-Selecthard

An Amazon DynamoDB table is experiencing throttled write requests. The table uses provisioned capacity with auto-scaling enabled. Which THREE factors could contribute to throttling despite auto-scaling?

Select 3 answers

A.Global secondary index is defined with same partition key

B.Auto-scaling maximum capacity is set too low

C.Sudden traffic spike that exceeds the max capacity

D.Use of eventually consistent reads

E.Uneven key distribution causing hot partitions

AnswersB, C, E

If max is reached, throttling occurs.

Why this answer

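Options B, C, and E are correct. Auto-scaling cannot raise capacity beyond a maximum that is set too low (B), cannot react quickly enough to a sudden burst that exceeds the maximum (C), and cannot fix hot partitions caused by uneven key distribution (E). Option A is wrong because a global secondary index has its own provisioned write capacity, and sharing the table's partition key does not by itself cause throttling.

Option D is wrong because eventual consistency applies to reads and doesn't cause write throttling.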

Practice this question →

49

Multi-Selecthard

A company is using Amazon DynamoDB with auto scaling enabled. The table has a provisioned read capacity of 10,000 RCU and write capacity of 5,000 WCU. Auto scaling target utilization is 70%. The table experiences a sudden spike in read traffic, reaching 12,000 RCU. The table throttles some requests. Which THREE actions should the company take to prevent future throttling?

Select 3 answers

A.Implement exponential backoff in the application to retry throttled requests.

B.Increase the maximum read capacity in the auto scaling configuration.

C.Decrease the auto scaling target utilization to 50% to scale out earlier.

D.Increase the write capacity to 10,000 WCU.

E.Enable DAX to cache read requests and reduce the load on the table.

AnswersA, B, E

Exponential backoff helps handle throttled requests without data loss.

Why this answer

Option A is correct because implementing exponential backoff helps handle throttling gracefully. Option B is correct because increasing the maximum read capacity gives auto scaling room to grow and prevents throttling. Option E is correct because enabling DynamoDB Accelerator (DAX) reduces read load.

Option C is wrong because decreasing target utilization would cause auto scaling to scale out earlier, but it would not prevent throttling during spikes because auto scaling reacts after the spike. Option D is wrong because write capacity is not the issue.
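The effect of the auto scaling maximum can be illustrated with a simplified model of target tracking: the service aims for provisioned capacity such that consumed capacity divided by provisioned capacity is near the target utilization, clamped to the configured minimum and maximum. This sketch ignores the reaction delay of real scaling, and the minimum and maximum values used below are hypothetical because the scenario does not state them.

```python
import math


def target_provisioned(consumed, target_utilization, min_capacity, max_capacity):
    """Capacity that target tracking would aim for, clamped to the limits."""
    # Provisioned capacity at which consumed / provisioned equals the target.
    desired = math.ceil(consumed / target_utilization)
    return max(min_capacity, min(max_capacity, desired))


if __name__ == "__main__":
    spike = 12_000  # RCU consumed during the spike in the scenario
    # Hypothetical maximum equal to the current 10,000 RCU: scaling is capped.
    print(target_provisioned(spike, 0.70, 1_000, 10_000))
    # Hypothetical higher maximum: scaling can reach the desired capacity.
    print(target_provisioned(spike, 0.70, 1_000, 20_000))
```

With the cap at 10,000 RCU the table stays below the 12,000 RCU demand no matter how long auto scaling runs, whereas a higher maximum lets it settle at about 17,143 RCU.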

Practice this question →

50

Multi-Selectmedium

Which TWO CloudWatch metrics should be monitored to detect storage performance issues for an Amazon RDS for MySQL instance? (Choose two.)

Select 2 answers

A.NetworkReceiveThroughput

B.DatabaseConnections

C.WriteIOPS

D.CPUUtilization

E.ReadIOPS

AnswersC, E

WriteIOPS indicates storage write performance.

Why this answer

Option C and Option E are correct. WriteIOPS and ReadIOPS measure the input/output operations per second, which can indicate storage performance issues. Option A (NetworkReceiveThroughput) is about network throughput.

Option B (DatabaseConnections) is about connections, not storage performance. Option D (CPUUtilization) is about CPU usage.

Practice this question →

51

MCQeasy

A DevOps engineer notices that an Amazon DynamoDB table's read capacity is frequently throttled during peak hours. The table has a write-once, read-many workload. Which action is MOST cost-effective to reduce throttling?

A.Enable auto-scaling for read capacity

B.Enable DynamoDB Accelerator (DAX)

C.Switch to On-Demand capacity mode

D.Increase the provisioned read capacity units

AnswerB

DAX caches reads, reducing read load on the table.

Why this answer

Option B is correct because DynamoDB Accelerator (DAX) caches reads, reducing read capacity consumption. Option A is wrong because auto-scaling still incurs the cost of the additional provisioned capacity. Option C is wrong because changing to On-Demand may be more expensive for predictable workloads.

Option D is wrong because increasing read capacity costs more.

Practice this question →

52

MCQmedium

A company is using Amazon DocumentDB (with MongoDB compatibility) for a content management system. The application team notices that write operations are taking longer than usual. CloudWatch metrics show high WriteLatency and a growing number of documents in the oplog. Which step should the database specialist take to troubleshoot the issue?

A.Enable Multi-AZ on the cluster to offload reads to the standby.

B.Increase the instance size of the primary instance to handle more writes.

C.Increase the allocated storage to improve I/O throughput.

D.Check the CPU and memory utilization of the secondary instance and consider scaling it up.

AnswerD

Secondary might be bottlenecked; scaling it up can reduce replication lag and write latency.

Why this answer

Option D is correct because high WriteLatency and growing oplog suggest that the secondary instance is too slow to apply operations, causing replication lag. Checking the secondary's metrics helps diagnose. Option A is wrong because enabling Multi-AZ does not directly address write latency.

Option B is wrong because increasing the instance class may help but should be done after diagnosis. Option C is wrong because increasing storage does not improve write performance.

Practice this question →

53

Multi-Selecthard

A database administrator is monitoring an Amazon RDS for MySQL instance and sees the following CloudWatch metrics: 'DiskQueueDepth' is consistently at 10, 'WriteLatency' is 20 ms, 'FreeStorageSpace' is less than 10% of total. The instance uses gp2 storage. Which THREE actions should be taken to improve performance?

Select 3 answers

A.Switch to Provisioned IOPS (io1 or io2) for consistent performance

B.Increase allocated storage to improve baseline IOPS

C.Delete unnecessary data to free up storage space

D.Enable Multi-AZ to increase I/O capacity

E.Enable Performance Insights to identify slow queries

AnswersA, B, C

Provisioned IOPS ensures consistent I/O performance regardless of storage size.

Why this answer

Option A is correct because a high 'DiskQueueDepth' indicates an I/O bottleneck; switching to Provisioned IOPS provides consistent I/O. Option B is correct because low free space can cause performance degradation on gp2, and increasing storage increases baseline IOPS. Option C is correct because deleting old data frees up space and can improve performance.

Option D is wrong because enabling Multi-AZ does not improve I/O performance; it adds overhead. Option E is wrong because Performance Insights is a monitoring tool, not a fix. So correct: A, B, C.
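The link between allocated storage and gp2 baseline IOPS can be shown with a short calculation. It assumes the general EBS gp2 rule of 3 IOPS per GiB with a floor of 100 and a ceiling of 16,000 per volume; RDS may stripe larger allocations across several volumes, so treat the results as approximate. The function name and the sample sizes are hypothetical.

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """Approximate gp2 baseline IOPS for a single volume of the given size."""
    # 3 IOPS per GiB, never below 100 and never above 16,000 per volume.
    return min(16_000, max(100, 3 * size_gib))


if __name__ == "__main__":
    for size in (20, 500, 1_000, 6_000):  # hypothetical sizes in GiB
        print(f"{size} GiB -> {gp2_baseline_iops(size)} baseline IOPS")
```

Under this rule, doubling a 500 GiB allocation to 1,000 GiB doubles the baseline from 1,500 to 3,000 IOPS, which is why adding storage can relieve an I/O bottleneck on gp2.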

Practice this question →

54

MCQeasy

A company is using Amazon DynamoDB for a gaming leaderboard application. Recently, users have experienced increased latency when updating scores. The DynamoDB table has on-demand capacity mode. The application performs UpdateItem calls with a condition expression. Which action is most likely to reduce the latency?

A.Add a global secondary index (GSI) with the score as the sort key to improve update performance.

B.Switch the table to provisioned capacity and increase the read capacity units to handle peak load.

C.Disable conditional writes to reduce the overhead of condition expression evaluation.

D.Ensure that there are no throttled requests in the CloudWatch metrics and verify that the table is not experiencing hot partitions.

AnswerD

On-demand mode automatically scales, but hot partitions can cause latency; checking metrics helps identify partition issues.

Why this answer

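Option D is correct because on-demand capacity absorbs overall traffic growth automatically, so the remaining likely causes of latency are throttled requests and hot partitions, which the CloudWatch metrics reveal. Option B is wrong because increasing read capacity does not affect UpdateItem writes, and on-demand mode already adjusts capacity automatically. Option C is wrong because the condition expression provides the application's optimistic locking, and removing it would sacrifice correctness.

Option A is wrong because adding a global secondary index does not directly reduce UpdateItem latency.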

Practice this question →

55

MCQmedium

A company uses Amazon Aurora MySQL. They notice that the DB cluster's failover took longer than expected during a recent primary instance failure. CloudWatch shows Failover latency of 120 seconds. Which configuration change would most likely reduce the failover time?

A.Increase the instance class of the primary and replica instances.

B.Increase the backup retention period to 35 days.

C.Enable Multi-AZ on the DB cluster.

D.Configure the application to use the cluster endpoint with Aurora JDBC driver's fast failover feature.

AnswerD

Fast failover reduces failover detection and recovery time.

Why this answer

Option D is correct because Aurora's fast failover requires the JDBC driver to use the cluster endpoint with the Aurora hostlist provider. Option A is wrong because increasing instance size does not reduce failover time. Option B is wrong because increasing backup retention does not affect failover.

Option C is wrong because Multi-AZ is already inherent in Aurora.

Practice this question →

56

MCQmedium

A company uses Amazon ElastiCache for Redis as a caching layer for a web application. They notice increased latency and cache miss rates. The cache cluster has 5 nodes with replication. Which metric should be monitored to identify if the cache is under-provisioned?

A.ReplicationLag

B.CacheHits

C.CPUUtilization

D.Evictions

AnswerC

High CPU suggests nodes are processing too many requests.

Why this answer

Option C is correct because high CPUUtilization indicates the cache nodes are overloaded. Option A is wrong because ReplicationLag indicates replication issues, not capacity. Option B is wrong because CacheHits are a measure of effectiveness, not provisioning.

Option D is wrong because Evictions occur when memory is full, but CPU is a more direct indicator of throughput.

Practice this question →

57

MCQeasy

A developer is troubleshooting slow queries in Amazon RDS for MySQL. The 'Threads_running' status variable is consistently above 200. The application uses connection pooling. Which metric should be monitored to identify the root cause?

A.Innodb_row_lock_current_waits

B.Queries_per_second

C.Slow_queries

D.Threads_connected

AnswerA

High thread count with many lock waits indicates contention.

Why this answer

Option A is correct because a high 'Threads_running' value often indicates queries waiting on locks or I/O, and 'Innodb_row_lock_current_waits' shows how many are currently waiting on row locks. Option B is wrong because a queries-per-second figure measures throughput, not why threads are stalled. Option C is wrong because 'Slow_queries' counts only long-running queries.

Option D is wrong because 'Threads_connected' shows open connections, not concurrently active queries.

Practice this question →

58

MCQmedium

A company is running Amazon RDS for MySQL and notices that the database CPU utilization is consistently above 80% during peak hours. The application performance is degrading. Which action should be taken first to troubleshoot the issue?

A.Increase the instance size of the RDS instance immediately.

B.Create a read replica to offload read traffic.

C.Enable Performance Insights to identify the queries causing high CPU usage.

D.Switch the database engine to Amazon Aurora for better performance.

AnswerC

Performance Insights helps identify performance bottlenecks.

Why this answer

Option C is correct because enabling Performance Insights provides a detailed analysis of database performance, helping to identify the root cause of high CPU utilization. Option A is wrong because increasing instance size without understanding the cause may lead to unnecessary costs. Option B is wrong because creating a read replica does not directly address CPU utilization on the primary instance.

Option D is wrong because switching to a different database engine is a major change and not a troubleshooting step.

Practice this question →

59

Multi-Selectmedium

A company is troubleshooting an Amazon DynamoDB table that is throttling write requests. The table has a partition key ('userId') and a sort key ('timestamp'). The 'WriteCapacityUnits' is set to 1000. CloudWatch shows 'ThrottledWriteRequests' but the 'ConsumedWriteCapacityUnits' is only 500. Which TWO actions could resolve the throttling?

Select 2 answers

A.Add a random suffix to the partition key to distribute writes more evenly

B.Enable DynamoDB Accelerator (DAX) to cache writes

C.Increase the write capacity units to allow more throughput

D.Enable Global Tables to replicate writes across regions

E.Remove the sort key and use only a partition key

AnswersA, C

Randomizing the partition key helps distribute write load across partitions, reducing throttling.

Why this answer

Throttling while 'ConsumedWriteCapacityUnits' (500) is well below the provisioned 1000 WCU indicates a hot partition: writes are concentrated on a few 'userId' values, so one partition reaches its own limit (a single partition supports up to 1000 WCU) while the table as a whole is underutilized.

Option A is correct because adding a random suffix to the partition key is a design change that spreads writes for a busy 'userId' across more partitions. Option C is correct because increasing the provisioned write capacity adds headroom for peaks and may increase the partition count, although it does not by itself fix a skewed key.

Option B is wrong because DAX is a read cache; writes still reach the table and consume write capacity. Option D is wrong because Global Tables add replicated writes and could increase throttling. Option E is wrong because removing the sort key changes the table structure but does not affect how writes are distributed across partitions.
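The random-suffix approach from Option A can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the function names, the '#' separator, and the shard count of 10 are assumptions chosen for the example, and no DynamoDB call is made.

```python
import random

SHARD_COUNT = 10  # hypothetical number of suffixes per logical userId


def with_random_suffix(user_id, shard_count=SHARD_COUNT, rng=random):
    """Return a partition key such as 'user-42#7' so writes spread over several keys."""
    return f"{user_id}#{rng.randrange(shard_count)}"


def all_partition_keys(user_id, shard_count=SHARD_COUNT):
    """List every suffixed key a reader must query to reassemble one user's items."""
    return [f"{user_id}#{n}" for n in range(shard_count)]
```

The trade-off is on the read side: because a write may land under any suffix, reading one user's items means querying every key returned by all_partition_keys and merging the results.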

60

MCQhard

Refer to the exhibit. An IAM policy is attached to a user who is attempting to run a Scan operation on the Orders table using the AWS CLI. The Scan operation fails with an AccessDeniedException. What is the most likely reason?

A.The resource ARN does not include the table name.

B.The Scan action is not allowed in the policy.

C.The condition requires the partition key to be 'CustomerID', but the Scan operation does not specify a partition key.

D.The 'ForAllValues:StringEquals' condition set operator prevents the Scan operation because it requires all leading keys to match a single value, which is impossible for a Scan.

AnswerD

'ForAllValues' evaluates to false if the request has no leading keys (as in Scan) or multiple keys.

Why this answer

Option D is correct. The condition 'dynamodb:LeadingKeys' applies only to Query and Scan operations when the condition key is used to restrict partition key values. However, the condition 'ForAllValues:StringEquals' requires that all leading keys in the request match the specified value.

For a Scan operation without a specific partition key, the condition cannot be satisfied, leading to denial. Option A is incorrect because the resource ARN includes the table name, so it is valid. Option B is incorrect because the policy allows Scan action.

Option C is incorrect because the condition is on LeadingKeys, not on the table.

61

MCQmedium

A company is using Amazon RDS for MySQL with Multi-AZ deployment. The database experiences a sudden increase in latency and the application reports timeouts. CloudWatch shows elevated 'ReadLatency' and 'WriteLatency' metrics, while 'CPUUtilization' and 'DatabaseConnections' remain normal. Which is the MOST likely cause?

A.A runaway query is consuming CPU resources

B.A Multi-AZ failover occurred

C.A large transaction is being processed

D.The database has insufficient provisioned IOPS

AnswerC

Large transactions can cause high I/O wait and latency without high CPU or connections.

Why this answer

Option C is correct because a large transaction can cause increased latency without high CPU or connection count, as it may be waiting on disk I/O or replication. Option B is wrong because a Multi-AZ failover is automatic and would show a spike followed by recovery. Option D is wrong because insufficient storage I/O would show in the 'BurstBalance' or 'WriteIOPS' metrics, not just in latency.

Option A is wrong because normal CPU utilization rules out a runaway query that is consuming CPU, though a query could still be I/O-bound. A large transaction is a common cause of these symptoms.

62

MCQeasy

A developer notices that an Amazon RDS for PostgreSQL DB instance is running low on free storage space. The instance has 100 GB of allocated storage. What is the recommended first step to troubleshoot this issue?

A.Enable storage auto scaling

B.Modify the DB instance to increase allocated storage

C.Check for unused indexes or table bloat using pg_repack or similar tools

D.Delete the oldest transaction logs

AnswerC

Index bloat and table bloat are common causes of storage consumption.

Why this answer

Option C is correct because checking for unused indexes or bloat is a typical starting point for storage issues. Option B is wrong because increasing allocated storage is a solution, not a troubleshooting step. Option D is wrong because deleting logs may not recover much space.

Option A is wrong because enabling storage auto scaling is a preventive measure, not a troubleshooting step.

63

MCQmedium

A company is using Amazon Redshift for data warehousing. Users report that queries are taking longer than expected. Which CloudWatch metric should be monitored to identify if queries are waiting for resources due to concurrency scaling?

A.WLMQueueLength

B.DiskSpaceUsage

C.QueryDuration

D.ConcurrencyScalingActiveQueries

AnswerD

This metric shows the number of queries running on concurrency scaling clusters.

Why this answer

Option D is correct because ConcurrencyScalingActiveQueries indicates queries running on concurrency scaling clusters. Option A is wrong because WLMQueueLength shows queue wait, not concurrency scaling. Option C is wrong because QueryDuration measures query execution time.

Option B is wrong because DiskSpaceUsage is for storage, not concurrency.

64

MCQeasy

An Amazon RDS for Oracle instance is experiencing high swap usage. Which metric should be monitored to determine if the instance is memory-constrained?

A.CPUUtilization

B.SwapUsage

C.WriteIOPS

D.FreeableMemory

AnswerB

High swap usage indicates memory pressure.

Why this answer

SwapUsage indicates that the instance is using swap space, which is a sign of memory pressure. CPUUtilization is for CPU, not memory. FreeableMemory shows available memory, but swap usage directly indicates memory constraint.

65

MCQeasy

A database specialist is trying to connect to an Amazon RDS for MySQL instance from an EC2 instance but receives a 'Connection timed out' error. The security group for the RDS instance allows inbound traffic on port 3306 from the security group of the EC2 instance. What should the specialist check next?

A.Check the network ACL associated with the subnet of the RDS instance to ensure it allows inbound traffic on port 3306 and outbound traffic on ephemeral ports.

B.Check that the RDS instance has a public DNS name and the EC2 instance can resolve it.

C.Ensure that the VPC has an internet gateway attached and the route table has a route to it.

D.Verify that the security group for the EC2 instance allows outbound traffic on port 3306.

AnswerA

Network ACLs are stateless and must allow both inbound and outbound traffic.

Why this answer

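Option A is correct because network ACLs are stateless: the ACL on the RDS subnet must allow inbound traffic on port 3306 and also outbound traffic on ephemeral ports for the response. Option D is wrong because security groups are stateful and allow all outbound traffic by default, so the EC2 security group is a less likely cause. Option B is wrong because a DNS resolution failure would produce a name-resolution error, not 'Connection timed out'.

Option C is wrong because traffic between an EC2 instance and an RDS instance inside the VPC does not require an internet gateway.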
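To make the stateless behavior concrete, here is a toy model of a network ACL check. It is a simplified sketch under stated assumptions: real network ACLs also have rule numbers, allow/deny actions, protocols, and CIDR ranges, all of which are ignored here, and the ephemeral range 1024-65535 and the client port 49152 are assumptions for illustration.

```python
from dataclasses import dataclass

EPHEMERAL_PORTS = range(1024, 65536)  # assumed client ephemeral port range


@dataclass(frozen=True)
class Rule:
    direction: str  # "inbound" or "outbound", relative to the DB subnet
    ports: range    # ports this (allow-only) rule covers


def allowed(rules, direction, port):
    """True if any rule permits traffic in this direction on this port."""
    return any(r.direction == direction and port in r.ports for r in rules)


def connection_succeeds(db_subnet_rules, db_port, client_port):
    """A stateless ACL checks the request and the reply independently."""
    request_ok = allowed(db_subnet_rules, "inbound", db_port)
    reply_ok = allowed(db_subnet_rules, "outbound", client_port)
    return request_ok and reply_ok


inbound_only = [Rule("inbound", range(3306, 3307))]
both_ways = inbound_only + [Rule("outbound", EPHEMERAL_PORTS)]
```

With inbound_only, the request reaches the database but the reply is dropped on the way out, which the client experiences as a timeout; both_ways allows the full round trip.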

66

MCQeasy

Refer to the exhibit. A developer runs the AWS CLI command and receives the output shown. What is this output?

A.The DNS endpoint of the DB instance

B.The private IP address of the DB instance

C.The reader endpoint of a Multi-AZ cluster

D.The resource ID of the DB instance

AnswerA

RDS provides a DNS endpoint for connections.

Why this answer

Option A is correct. The output is the DNS endpoint of the RDS instance. Option B is wrong because the output is not an IP address.

Option D is wrong because a resource ID looks different. Option C is wrong because the reader endpoint includes '-ro'.

67

MCQmedium

A company is using Amazon RDS for MySQL and notices that database connections are being rejected intermittently. The application logs show 'Too many connections' errors. The DB instance has 1000 max_connections. Which action should the DBA take to troubleshoot and resolve this issue without impacting performance?

A.Increase the max_connections parameter to 5000 in the DB parameter group

B.Create a read replica to offload read traffic

C.Enable Performance Insights and review the 'DB Connections' metric to identify spikes and troubleshoot application connection pooling

D.Set the 'wait_timeout' parameter to a lower value to close idle connections faster

AnswerC

Performance Insights helps identify the source of connection bursts and allows tuning of the application's connection pooling behavior.

Why this answer

Option C is correct because enabling Performance Insights allows the DBA to monitor the 'DB Connections' metric in near real-time, identify exactly when connection spikes occur, and correlate those spikes with application behavior. This diagnostic approach pinpoints the root cause—such as a connection leak or insufficient connection pooling—without making changes that could degrade performance. Increasing max_connections or lowering wait_timeout without understanding the usage pattern can lead to resource exhaustion or premature connection termination.

Exam trap

The trap here is that candidates assume increasing max_connections or lowering timeouts is a quick fix, but AWS tests the ability to diagnose first using monitoring tools (Performance Insights) before making configuration changes that could harm performance or availability.

How to eliminate wrong answers

Option A is wrong because blindly increasing max_connections to 5000 does not resolve the underlying cause of connection spikes and can overwhelm the DB instance's memory and CPU, leading to worse performance or instability. Option B is wrong because a read replica offloads read traffic but does not address the 'Too many connections' error, which is a connection limit issue affecting all connections (reads and writes) on the primary instance. Option D is wrong because reducing wait_timeout may close idle connections faster, but it can disrupt long-running transactions or applications with legitimate idle periods, and it does not fix the root cause of connection spikes or leaks.

68

MCQhard

A company is running an Amazon RDS for Oracle database in Multi-AZ. The primary instance fails over unexpectedly. The DBA wants to determine the cause of the failover. What should the DBA do?

A.Review the Enhanced Monitoring metrics for the primary instance.

B.Query the database error logs for the failover time.

C.View the RDS events in the AWS Management Console.

D.Check AWS CloudTrail for any database-related API calls.

AnswerC

RDS events provide details about failover reasons.

Why this answer

Option C is correct because RDS events log failover reasons. Option A is wrong because Enhanced Monitoring does not capture failover events. Option D is wrong because CloudTrail logs API calls, not failover reasons.

Option B is wrong because the database error logs may not include the failover cause.

69

MCQhard

A company's Amazon RDS for PostgreSQL instance is experiencing a high number of connections, causing performance degradation. The DBA wants to identify which user and application are creating the most connections. What should the DBA do?

A.Enable AWS CloudTrail to log database logins.

B.Enable Performance Insights and use the 'db.sql_tokenized' dimension to analyze connections by user.

C.Enable Enhanced Monitoring and check the 'Connection Count' metric.

D.Enable VPC Flow Logs to track connection attempts.

AnswerB

Performance Insights provides SQL-level performance data, including top users and applications.

Why this answer

Option B is correct because Performance Insights with the db.sql_tokenized dimension allows grouping by user and application. Option C is wrong because RDS Enhanced Monitoring does not show SQL-level details. Option D is wrong because VPC Flow Logs capture network traffic, not database connections.

Option A is wrong because CloudTrail logs API calls, not database connections.

70

MCQeasy

A developer needs to monitor the number of throttled read requests for a DynamoDB table. Which CloudWatch metric should be used?

A.ReadThrottleEvents

B.ThrottledWriteEvents

C.SuccessfulRequestLatency

D.ConsumedReadCapacityUnits

AnswerA

This metric directly counts throttled read requests.

Why this answer

Option A is correct because ReadThrottleEvents counts throttled read requests. Option D is wrong because ConsumedReadCapacityUnits shows successful reads, not throttled ones. Option B is wrong because ThrottledWriteEvents counts write throttles.

Option C is wrong because SuccessfulRequestLatency measures latency, not throttles.

71

MCQhard

A team is troubleshooting an Amazon RDS for SQL Server instance that is running out of storage. The instance uses General Purpose SSD (gp2) storage. The team wants to increase storage without downtime. Which action should they take?

A.Migrate to gp3 storage.

B.Add a read replica to offload queries.

C.Take a snapshot and restore to a larger instance.

D.Modify the DB instance to increase allocated storage.

AnswerD

Correct. RDS allows modifying storage online without downtime.

Why this answer

Option D is correct because RDS supports increasing allocated storage without downtime; the change can be applied immediately or during the next maintenance window. Option C is wrong because taking a snapshot and restoring it creates a new instance and involves a cutover rather than an online change. Option A is wrong because migrating to gp3 changes the storage type but does not by itself increase allocated storage.

Option B is wrong because read replicas do not increase storage on the primary.

72

MCQmedium

A company uses Amazon Redshift for data warehousing. They run a daily ETL job that loads data into the cluster. Recently, the job started failing with 'Disk Full' errors. The cluster has 5 RA3 nodes. Which step should be taken to resolve the issue?

A.Disable concurrency scaling to free up resources

B.Run a VACUUM command to reclaim space from deleted rows

C.Resize the cluster to a larger node type or add more nodes

D.Enable Redshift Spectrum to offload queries to S3

AnswerC

RA3 nodes separate compute and storage; you can increase storage by resizing or adding nodes.

Why this answer

Option C is correct because RA3 nodes use managed storage; resizing the cluster to a different node type or adding more nodes can increase storage capacity. Option B is wrong because VACUUM reorganizes data but does not free space if the disk is full; it may even require temporary space. Option D is wrong because Redshift Spectrum queries data in S3 and does not free space on the cluster.

Option A is wrong because disabling concurrency scaling does not free disk space.

73

Multi-Selecthard

Which TWO settings should be verified when troubleshooting an RDS for MySQL instance that has a high number of aborted connections? (Choose 2.)

Select 2 answers

A.connect_timeout parameter

B.max_allowed_packet parameter

C.query_cache_type parameter

D.binlog_retention_hours parameter

E.max_connections parameter

AnswersA, B

Low connect_timeout can cause aborted connections if client takes too long.

Why this answer

Options A and B are correct because a low connect_timeout and a too-small max_allowed_packet can both cause aborted connections. Option E is wrong because max_connections limits total connections but does not cause aborts. Option C is wrong because the query cache is not related to connections.

Option D is wrong because binlog retention is for replication.

74

Multi-Selecthard

A company is using Amazon DynamoDB with provisioned capacity. The application is experiencing throttling on write requests. The database specialist needs to identify the cause. Which THREE metrics should be reviewed in CloudWatch? (Select THREE.)

Select 3 answers

A.ConsumedWriteCapacityUnits

B.WriteThrottleEvents

C.ThrottledWriteRequests

D.ReadThrottleEvents

E.SuccessfulRequestLatency

AnswersA, B, C

Shows consumed capacity to compare with provisioned capacity.

Why this answer

Option A is correct because 'ConsumedWriteCapacityUnits' helps determine whether provisioned capacity is being fully used. Option B is correct because 'WriteThrottleEvents' counts write throttling events. Option C is correct because 'ThrottledWriteRequests' directly indicates throttled write requests.

Option D is wrong because 'ReadThrottleEvents' is for reads, not writes. Option E is wrong because 'SuccessfulRequestLatency' measures latency, not throttling.
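As a rough sketch of how these metrics might be pulled, the helper below builds the parameters for a CloudWatch GetMetricStatistics request. The table name 'Orders', the 60-minute window, and the function names are assumptions for illustration; the metric names are taken from the question as written and should be confirmed against the CloudWatch console. No AWS call is made here; the resulting dictionary could be passed to boto3's get_metric_statistics.

```python
from datetime import datetime, timedelta, timezone


def write_metric_params(table_name, metric_name, end_time, minutes=60, period=60):
    """Build GetMetricStatistics parameters for one DynamoDB table metric."""
    return {
        "Namespace": "AWS/DynamoDB",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "TableName", "Value": table_name}],
        "StartTime": end_time - timedelta(minutes=minutes),
        "EndTime": end_time,
        "Period": period,          # seconds per datapoint
        "Statistics": ["Sum"],
    }


def average_wcu_per_second(sum_for_period, period):
    """Convert a per-period Sum of consumed units into an average per-second rate."""
    return sum_for_period / period


now = datetime(2024, 1, 1, tzinfo=timezone.utc)  # fixed timestamp for the example
params = write_metric_params("Orders", "ConsumedWriteCapacityUnits", now)
```

Dividing the Sum by the period matters when comparing against provisioned capacity, which is expressed per second: a one-minute Sum of 30000 consumed units corresponds to an average of 500 WCU.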

75

MCQmedium

A company is running an Amazon Aurora MySQL database cluster. The database specialist notices that the write latency is high during peak hours. The cluster consists of one writer and two reader instances. Which action should the specialist take to reduce write latency?

A.Enable Auto Scaling on the cluster to automatically adjust capacity.

B.Increase the instance class of the writer instance to a larger size.

C.Enable Multi-AZ deployment for the cluster.

D.Add more reader instances to distribute the read load.

AnswerB

A larger instance class provides more CPU and memory, reducing write latency.

Why this answer

Option B is correct because increasing the instance class of the writer can improve write performance. Option D is wrong because adding reader instances does not help with write latency. Option A is wrong because Auto Scaling does not apply to Aurora instance classes automatically.

Option C is wrong because Multi-AZ is already inherent in Aurora.
