
CCNA Data Store Management Questions

75 of 456 questions · Page 6/7 · Data Store Management topic · Answers revealed


376

Multi-Select · medium

A data engineer is optimizing an Amazon RDS for MySQL database that experiences high write throughput. The engineer wants to improve write performance and reduce latency. Which TWO database-level configuration changes can help achieve this?

Select 2 answers

A. Use Provisioned IOPS (io1 or io2) storage.

B. Reduce the backup retention period to 1 day.

C. Increase the DB instance class to a larger size.

D. Create a Read Replica to offload writes.

E. Enable Multi-AZ for high availability.

Answers: A, C

Provisioned IOPS provides consistent low-latency writes.

Why this answer

Option A is correct because Provisioned IOPS (io1/io2) storage provides consistent, low-latency writes. Option C is correct because a larger DB instance class provides more CPU and memory, which can improve write performance.

Option B is wrong because reducing the retention period of automated backups frees up backup storage but does not improve write performance. Option D is wrong because Read Replicas are for read scaling and do not offload writes from the primary. Option E is wrong because Multi-AZ helps with availability but not write performance directly.


377

MCQ · medium

A company stores sensitive customer data in Amazon S3. The security team requires that all objects be encrypted at rest using server-side encryption with customer-provided keys (SSE-C). Which bucket policy condition will enforce this requirement?

A. s3:x-amz-server-side-encryption-aws-kms-key-id

B. s3:x-amz-server-side-encryption

C. s3:x-amz-server-side-encryption-customer-key

D. s3:x-amz-server-side-encryption-customer-algorithm

Answer: D

Requiring this condition key to be present forces every upload to carry the SSE-C headers.

Why this answer

Option D is correct because `s3:x-amz-server-side-encryption-customer-algorithm` is the condition key that S3 documents for SSE-C requests. A bucket policy that denies `s3:PutObject` when this key is absent (a `Null` condition set to `true`) rejects any upload that does not include the `x-amz-server-side-encryption-customer-algorithm` header. S3 in turn rejects an SSE-C request that supplies the algorithm header without the accompanying key headers, so every accepted object is encrypted at rest with a customer-provided key.

Exam trap

AWS often tests the distinction between condition keys that enforce the encryption method (SSE-S3, SSE-KMS, SSE-C) and those that enforce specific parameters like the key ID. Here the trap is `s3:x-amz-server-side-encryption-customer-key`: it mirrors a real request header, but S3 does not document a policy condition key for the customer-provided key itself.

How to eliminate wrong answers

Option A is wrong because `s3:x-amz-server-side-encryption-aws-kms-key-id` is used to enforce the use of a specific AWS KMS key ID for SSE-KMS, not SSE-C. Option B is wrong because `s3:x-amz-server-side-encryption` is used to enforce the encryption mode (e.g., AES256 or aws:kms) for SSE-S3 or SSE-KMS, but it does not enforce the use of customer-provided keys required for SSE-C. Option C is wrong because `x-amz-server-side-encryption-customer-key` is a request header that carries the key material, not a documented condition key that a bucket policy can test.


378

MCQ · hard

A company has an Amazon DynamoDB table with a provisioned write capacity of 1000 WCU. During a flash sale, the write traffic spikes to 5000 WCU for 10 minutes. The table is not auto-scaled. Which action should the data engineer take to handle the spike without throttling?

A. Convert the table to on-demand capacity mode before the sale.

B. Set a CloudWatch alarm to increase provisioned capacity when write throttling occurs.

C. Use DynamoDB Accelerator (DAX) to cache writes.

D. Enable auto-scaling with a target utilization of 70% and a maximum capacity of 5000 WCU.

Answer: A

On-demand capacity mode removes the fixed 1000 WCU ceiling before the spike arrives.

Why this answer

Option A is correct because on-demand mode bills per request and scales with traffic instead of enforcing a provisioned limit, so switching before the sale avoids the throttling that a 1000 WCU table would hit at 5000 WCU. Pre-provisioning at least 5000 WCU ahead of the sale would also work, but it is not among the options.

The original answer key marked Option C as correct. That is an error: DAX is an in-memory read cache, and writes that pass through DAX still consume the table's write capacity.

Exam trap

The trap here is that DAX is often associated with performance improvement, but candidates may mistakenly think it handles write spikes, whereas DAX only caches reads and does not affect write capacity or throttling.

How to eliminate wrong answers

Option B is wrong because setting a CloudWatch alarm to increase provisioned capacity when write throttling occurs is reactive: requests are throttled before the alarm triggers and the capacity increase takes effect. Option C is wrong because DAX caches reads and does not buffer or absorb writes. Option D is wrong because auto-scaling reacts to sustained utilization over several minutes, so it cannot respond instantly to a sudden 10-minute spike and writes are throttled while it catches up.


379

MCQ · medium

A company uses Amazon Redshift for analytics. The data engineer notices that some queries are slow and the EXPLAIN plan shows a 'Seq Scan' on a large table. Which data store management action would most likely improve query performance?

A. Run the ANALYZE command to update table statistics.

B. Enable automatic compression on the table.

C. Define appropriate sort keys and distribution styles.

D. Run the VACUUM command to reclaim space.

Answer: C

Sort keys and distribution styles can reduce data scanning and improve join performance.

Why this answer

Option C is correct because using sort keys and distribution styles can significantly improve query performance by reducing scans and data shuffling. Option A is incorrect because ANALYZE updates statistics but does not change the physical layout. Option D is incorrect because VACUUM reclaims space but does not directly improve scan performance.

Option B is incorrect because compression mainly reduces the storage footprint and does not remove the full-table scan.


380

Multi-Select · medium

A data engineer is migrating a large Oracle data warehouse to Amazon Redshift. The engineer needs to ensure optimal performance. Which TWO practices should the engineer follow?

Select 2 answers

A. Choose appropriate sort keys based on common query patterns.

B. Design the schema as a normalized star schema with row-based storage.

C. Manually define compression encodings for each column.

D. Stage data in Amazon S3 before loading into Redshift.

E. Use DISTKEY to distribute data evenly across nodes.

Answers: A, E

Sort keys reduce the amount of data scanned.

Why this answer

Option A is correct because Amazon Redshift uses sort keys to physically order data on disk, which allows the query optimizer to skip large blocks of data during scans via zone maps. Choosing sort keys based on common query patterns (e.g., range filters or frequent GROUP BY columns) dramatically reduces I/O and improves query performance, especially for large tables.

Exam trap

The trap here is that candidates often confuse Redshift's columnar storage with row-based storage and assume a normalized star schema is optimal, when in fact Redshift is designed for denormalized, columnar tables with explicit sort and distribution keys.


381

Multi-Select · medium

Which THREE storage classes in Amazon S3 are designed for infrequently accessed data with millisecond retrieval times? (Select THREE.)

Select 3 answers

A. S3 Glacier Flexible Retrieval

B. S3 One Zone-IA

C. S3 Glacier Deep Archive

D. S3 Intelligent-Tiering

E. S3 Standard-IA

Answers: B, D, E

One Zone-IA also provides millisecond retrieval for infrequently accessed data.

Why this answer

S3 One Zone-IA is designed for infrequently accessed data that requires millisecond retrieval times, but does not require the resilience of multiple Availability Zones. It stores data in a single AZ and offers the same low-latency performance as S3 Standard, making it suitable for non-critical, infrequently accessed data.

Exam trap

The trap here is that candidates often confuse S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive as having millisecond retrieval times, but these classes are designed for archival access with retrieval times measured in minutes or hours, not milliseconds.


382

MCQ · medium

A company is using Amazon RDS for MySQL with Multi-AZ deployment. The primary DB instance experiences a hardware failure, causing automatic failover to the standby. After the failover, the application reports that the database endpoint is unreachable for about 60 seconds. What is the MOST likely cause?

A. The standby instance took longer than expected to promote to primary.

B. The standby instance was not in a synchronized state and required a manual promotion.

C. The application was using the wrong endpoint and needed to be reconfigured.

D. The DNS record for the DB instance endpoint needed to update to point to the new primary.

Answer: D

DNS propagation causes the 60-second delay.

Why this answer

Option D is correct because during failover the DNS record for the endpoint is updated to point to the standby instance, and clients cannot connect until they resolve the new address, which takes about 60 seconds. Option A is wrong because a failover typically completes in 60-120 seconds, so 60 seconds is not longer than expected. Option B is wrong because RDS keeps the standby synchronized and promotes it automatically, without manual promotion.

Option C is wrong because the endpoint name does not change during failover, so the application does not need to be reconfigured.


383

MCQ · hard

A data engineer is designing a data lake on Amazon S3. The data is ingested from multiple sources and stored in a partitioned structure under the 'landing' prefix. The engineer needs to ensure that only authorized applications can write to the 'landing' zone, while all AWS accounts in the organization can read the data. Which combination of S3 bucket policies and IAM policies should be used?

A. Use bucket ACLs to grant write access to the authorized IAM roles and read access to all authenticated users.

B. Use S3 Object Ownership to enforce bucket owner enforced. Grant write access via IAM roles.

C. Create a bucket policy with a Deny for all principals except the authorized IAM roles on the 'landing' prefix. Add a separate statement allowing read access to the organization.

D. Create an IAM policy that allows s3:PutObject only for the 'landing' prefix and attach it to the authorized roles. Allow read access via an S3 Access Point.

Answer: C

This explicitly restricts write access while allowing reads.

Why this answer

Option C is correct because it uses a bucket policy with an explicit Deny on the 'landing' prefix for all principals except the authorized IAM roles, ensuring only those roles can write. A separate Allow statement grants read access to the entire organization (e.g., via the `aws:PrincipalOrgID` condition key), which satisfies the requirement that all AWS accounts in the organization can read the data. This approach leverages S3 bucket policies for cross-account access control without relying on ACLs or IAM policies alone.

Exam trap

The trap here is that candidates often confuse IAM policies (which are identity-based and only apply within the same account) with resource-based policies (like S3 bucket policies) that are required for cross-account access, leading them to choose Option D or A without realizing the need for an explicit Deny or organization-wide condition key.

How to eliminate wrong answers

Option A is wrong because bucket ACLs do not support condition keys like `aws:PrincipalOrgID` and cannot restrict write access to specific IAM roles across accounts; they also grant read access to 'all authenticated users' (a deprecated concept that includes any authenticated AWS user, not just the organization). Option B is wrong because S3 Object Ownership with 'bucket owner enforced' only ensures the bucket owner retains object ownership, but does not by itself restrict write access to authorized roles or grant read access to the organization; it must be combined with a bucket policy. Option D is wrong because an IAM policy attached to roles only controls permissions within the same account and cannot grant cross-account read access to the entire organization; an S3 Access Point can simplify access but does not inherently allow all organization accounts to read without additional bucket policies or resource-based policies.
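The policy shape described for Option C can be sketched as a Python dictionary that serializes to bucket-policy JSON. This is a minimal sketch, not a vetted production policy: the bucket name, organization ID, and role ARN are hypothetical placeholders, and using `ArnNotEquals` on `aws:PrincipalArn` is one common way to express "everyone except these roles", assumed here for illustration.

```python
import json

# Hypothetical names for illustration only; none appear in the question.
BUCKET = "example-data-lake"
ORG_ID = "o-exampleorgid"
WRITER_ROLES = ["arn:aws:iam::111122223333:role/ingest-app"]


def build_landing_policy(bucket, org_id, writer_role_arns):
    """Return a bucket policy dict: deny writes by non-writers, allow org-wide reads."""
    landing = f"arn:aws:s3:::{bucket}/landing/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Explicit Deny for every principal whose ARN is not an authorized role.
                "Sid": "DenyLandingWritesExceptAuthorizedRoles",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": landing,
                "Condition": {
                    "ArnNotEquals": {"aws:PrincipalArn": writer_role_arns}
                },
            },
            {
                # Read access limited to principals that belong to the organization.
                "Sid": "AllowOrgRead",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": landing,
                "Condition": {"StringEquals": {"aws:PrincipalOrgID": org_id}},
            },
        ],
    }


policy = build_landing_policy(BUCKET, ORG_ID, WRITER_ROLES)
print(json.dumps(policy, indent=2))
```

The Deny statement only blocks unauthorized writers; the authorized roles still need an Allow for `s3:PutObject`, either in their own IAM policies or in an additional bucket policy statement.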


384

MCQ · easy

A data engineer needs to store large amounts of data that is accessed infrequently but must be retrieved immediately when needed. Which Amazon S3 storage class is most cost-effective?

A. S3 Intelligent-Tiering

B. S3 One Zone-IA

C. S3 Standard-IA

D. S3 Glacier Deep Archive

Answer: C

S3 Standard-IA is designed for infrequent access with millisecond retrieval.

Why this answer

Option C is correct because S3 Standard-IA is for infrequent access with immediate retrieval. Option D is wrong because S3 Glacier Deep Archive has retrieval delays measured in hours. Option B is wrong because S3 One Zone-IA stores data in a single Availability Zone and is intended for data that can be recreated.

Option A is wrong because S3 Intelligent-Tiering monitors access patterns but may not be the most cost-effective choice when the access pattern is already known.


385

Multi-Select · hard

Which THREE considerations are important when designing a DynamoDB table for high-traffic gaming leaderboards? (Choose three.)

Select 3 answers

A. Use strongly consistent reads for all queries

B. Enable DynamoDB Accelerator (DAX) for low-latency reads

C. Use Time to Live (TTL) to automatically expire old scores

D. Use DynamoDB Adaptive Capacity to handle uneven access patterns

E. Enable DynamoDB Streams for real-time updates

Answers: B, C, D

DAX provides caching for fast reads.

Why this answer

DynamoDB Accelerator (DAX) provides in-memory caching for DynamoDB tables, reducing read latency from single-digit milliseconds to microseconds. For high-traffic gaming leaderboards, where millions of players query scores concurrently, DAX offloads read traffic from the main table, preventing throttling and ensuring consistent low-latency responses for the most frequently accessed data.

Exam trap

The trap here is that candidates often confuse DynamoDB Streams with a read-acceleration feature, but Streams are strictly for change data capture and do not reduce query latency or handle high read throughput.


386

MCQ · easy

A media company stores video files in an Amazon S3 bucket with S3 Standard storage class. The files are accessed frequently for the first 30 days, then rarely after that. However, the company must be able to restore any deleted file within 7 days. The company wants to minimize storage costs while meeting the access and retention requirements. What should a data engineer do?

A. Use S3 Standard-IA storage class from the start.

B. Use a lifecycle policy to transition objects to S3 One Zone-IA after 30 days.

C. Use S3 Glacier Deep Archive after 30 days and enable S3 Object Lock for retention.

D. Use S3 Intelligent-Tiering and enable S3 Versioning on the bucket.

Answer: D

S3 Intelligent-Tiering optimizes costs by moving data between access tiers, and Versioning allows recovery of deleted objects.

Why this answer

Option D is correct because S3 Intelligent-Tiering automatically moves objects between access tiers (frequent, infrequent, and archive instant access) based on changing access patterns, which minimizes storage costs for data with unknown or changing access patterns. Enabling S3 Versioning allows the company to restore any deleted file within 7 days by reverting to a previous version, meeting the retention requirement without additional cost for a separate backup.

Exam trap

The trap here is that candidates may overlook the requirement to restore deleted files within 7 days and focus only on cost optimization, leading them to choose a storage class like S3 Standard-IA or S3 One Zone-IA that lacks versioning or retention capabilities, or they may incorrectly assume S3 Object Lock can restore already deleted files.

How to eliminate wrong answers

Option A is wrong because S3 Standard-IA has a minimum storage duration charge of 30 days and a per-GB retrieval cost, making it more expensive than S3 Standard for the first 30 days of frequent access, and it does not provide the ability to restore deleted files within 7 days. Option B is wrong because S3 One Zone-IA does not provide the same durability as S3 Standard (it stores data in a single Availability Zone) and lacks versioning or retention features to restore deleted files within 7 days; additionally, transitioning after 30 days incurs lifecycle transition costs. Option C is wrong because S3 Glacier Deep Archive has a minimum storage duration of 180 days and a retrieval time of 12 hours or more, which does not meet the requirement to restore deleted files within 7 days; S3 Object Lock only prevents object deletion or overwrites, but does not enable restoration of already deleted files.


387

MCQ · hard

A data team uses the CloudFormation template in the exhibit to create an S3 bucket for storing log files. After one year, they notice that the bucket size is larger than expected. They investigate and find that older versions of objects are not being deleted or transitioned. What is the most likely cause?

A. The lifecycle rule does not apply to noncurrent versions because it lacks NoncurrentVersionExpiration or NoncurrentVersionTransition.

B. The lifecycle rule is not enabled because the Status property is not set to 'Enabled' properly.

C. The bucket has versioning enabled, but the lifecycle rule only applies to current versions.

D. The expiration in days is set to 365, which is too short.

Answer: A

Old versions are not managed by the rule.

Why this answer

Option A is correct because the lifecycle rule in the CloudFormation template is missing the `NoncurrentVersionExpiration` or `NoncurrentVersionTransition` properties. When S3 bucket versioning is enabled, lifecycle rules that only specify `ExpirationInDays` or `Transitions` apply exclusively to the current version of objects. To manage older (noncurrent) versions, you must explicitly include `NoncurrentVersionExpiration` (or the older `NoncurrentVersionExpirationInDays`) or `NoncurrentVersionTransitions` in the rule.

Without these, noncurrent versions accumulate indefinitely, causing the bucket size to grow larger than expected.

Exam trap

The trap here is that candidates assume a lifecycle rule with `ExpirationInDays` automatically cleans up all versions of an object, but in S3 with versioning enabled, it only affects the current version, leaving noncurrent versions to accumulate.

How to eliminate wrong answers

Option B is wrong because the `Status` property set to 'Enabled' is not the issue; the lifecycle rule is active, but it only targets current versions. Option C is wrong because the lifecycle rule does apply to current versions, but the problem is that it does not apply to noncurrent versions, which is the exact reason for the bucket size growth. Option D is wrong because the expiration in days being set to 365 is not too short; it is a reasonable period, but the rule still only affects current versions, leaving noncurrent versions untouched.
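The difference between a rule that only covers current versions and one that also covers noncurrent versions can be sketched as follows. The exhibit is not reproduced here, so both rules are hypothetical; they use the JSON shape of the S3 lifecycle API (as accepted by boto3's `put_bucket_lifecycle_configuration`) rather than CloudFormation syntax, and the 30-day noncurrent retention is an assumed value.

```python
def covers_noncurrent_versions(rule):
    """Return True if a lifecycle rule has any action for noncurrent versions."""
    noncurrent_actions = ("NoncurrentVersionExpiration", "NoncurrentVersionTransitions")
    return any(action in rule for action in noncurrent_actions)


# Hypothetical rule resembling the scenario: it expires current versions only.
current_only_rule = {
    "ID": "expire-logs",
    "Status": "Enabled",
    "Filter": {"Prefix": ""},
    "Expiration": {"Days": 365},
}

# Same rule with an added action for noncurrent versions (30 days is an assumption).
fixed_rule = dict(
    current_only_rule,
    NoncurrentVersionExpiration={"NoncurrentDays": 30},
)

print(covers_noncurrent_versions(current_only_rule))
print(covers_noncurrent_versions(fixed_rule))
```

With the second rule, each noncurrent version is removed a fixed number of days after it stops being the current version, so old versions no longer accumulate.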


388

MCQ · medium

A company uses Amazon DynamoDB with global tables in three AWS Regions. The data engineer needs to ensure that writes to the table in us-east-1 are replicated to other regions with minimal latency. Which DynamoDB feature should be used?

A. DynamoDB Global Tables

B. DynamoDB Time to Live (TTL)

C. DynamoDB Streams

D. DynamoDB Accelerator (DAX)

Answer: A

Global Tables replicate data across regions automatically.

Why this answer

DynamoDB Global Tables provide multi-Region replication with low latency, so Option A is correct. Option B is wrong because TTL is for automatic item expiration.

Option C is wrong because Streams capture changes but do not automatically replicate them to other Regions without custom code. Option D is wrong because DAX is a cache, not a replication feature.


389

MCQ · hard

A company has a DynamoDB table with a partition key of 'user_id' and a sort key of 'timestamp'. They need to query all items for a user within a date range. Which query operation should be used?

A. BatchGetItem with multiple keys

B. Query with KeyConditionExpression on partition key and sort key

C. GetItem with both partition and sort key

D. Scan with FilterExpression

Answer: B

Query is efficient for this access pattern.

Why this answer

The Query operation in DynamoDB is designed to retrieve items based on a specific partition key and an optional sort key condition. Since the table has a partition key of 'user_id' and a sort key of 'timestamp', using Query with a KeyConditionExpression that filters on the partition key (user_id) and a range condition on the sort key (timestamp) is the most efficient and correct approach to get all items for a user within a date range.

Exam trap

The trap here is that candidates often confuse BatchGetItem with Query, thinking BatchGetItem can handle range queries, but BatchGetItem only retrieves items by exact primary key values and cannot filter by sort key conditions.

How to eliminate wrong answers

Option A is wrong because BatchGetItem retrieves items by their primary key (partition key and sort key) but does not support range-based filtering on the sort key; it only fetches specific items by exact key values, not a range of timestamps. Option C is wrong because GetItem retrieves a single item by its full primary key (both partition key and sort key), so it cannot return multiple items or filter by a date range. Option D is wrong because Scan reads the entire table and then applies a FilterExpression, which is inefficient and costly for large tables, and it should be avoided when a more targeted Query operation can be used.
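The Query described above can be sketched as the request parameters for the low-level DynamoDB `Query` API. This is a minimal sketch under stated assumptions: the table name and values are hypothetical, timestamps are assumed to be stored as ISO-8601 strings, and `#ts` is a placeholder because `timestamp` is a DynamoDB reserved word. The function only builds the parameters; passing them to a client is left out.

```python
def build_range_query(table_name, user_id, start_ts, end_ts):
    """Build Query parameters for one user's items within a timestamp range."""
    return {
        "TableName": table_name,
        # Equality on the partition key plus a range condition on the sort key.
        "KeyConditionExpression": "user_id = :uid AND #ts BETWEEN :start AND :end",
        # "timestamp" is a reserved word, so it is referenced through a placeholder.
        "ExpressionAttributeNames": {"#ts": "timestamp"},
        "ExpressionAttributeValues": {
            ":uid": {"S": user_id},
            ":start": {"S": start_ts},
            ":end": {"S": end_ts},
        },
    }


# Hypothetical table and values for illustration.
params = build_range_query(
    "UserEvents", "user-123", "2024-01-01T00:00:00Z", "2024-01-31T23:59:59Z"
)
print(params["KeyConditionExpression"])
```

With boto3 installed and credentials configured, these parameters could be passed as `client.query(**params)`; results larger than 1 MB are paginated, so a real caller would also follow `LastEvaluatedKey`.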


390

MCQ · hard

A company runs an Amazon Redshift cluster with 10 RA3 nodes. The data warehouse stores 50 TB of data. The company notices that queries are slow and the cluster's storage utilization is high. The data engineer needs to improve query performance and reduce storage costs without changing the cluster's node count. Which action should the engineer take?

A. Use Redshift Spectrum to offload historical data to Amazon S3 and query it in place.

B. Change the distribution style of large tables to DISTSTYLE ALL.

C. Migrate the cluster to Dense Compute node types.

D. Enable concurrency scaling to handle more concurrent queries.

Answer: A

Spectrum queries data in S3, reducing cluster storage and allowing faster queries on hot data.

Why this answer

Option A is correct. Redshift Spectrum allows querying data directly in S3 without loading it into the cluster, offloading storage and compute. Option D is wrong because concurrency scaling adds compute but provides no storage relief.

Option B is wrong because distribution style affects performance but not storage costs. Option C is wrong because Dense Compute is a different node type from RA3, which uses managed storage, so migrating does not reduce storage costs.


391

MCQ · easy

A company uses an Amazon RDS for MySQL DB instance with Multi-AZ deployment. The primary DB instance fails unexpectedly. What happens to the database endpoint?

A. A new endpoint is created for the standby and the application must use the new endpoint.

B. The existing endpoint continues to work and automatically points to the standby DB instance.

C. The database becomes unavailable until the primary is restored from a snapshot.

D. The existing endpoint is deleted and a new endpoint is provided after manual DNS update.

Answer: B

RDS automatically updates the DNS CNAME record to point to the standby instance.

Why this answer

With Multi-AZ, RDS automatically fails over to the standby in another AZ. The CNAME record is updated to point to the standby, so the endpoint remains the same. Option B is correct.

Option A is wrong because RDS does not create a new endpoint. Option C is wrong because the standby is promoted to primary automatically, so no snapshot restore is needed. Option D is wrong because failover is automatic and typically completes within minutes, with no manual DNS update.
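Because the endpoint name stays the same, an application usually only needs to reconnect after a failover. The following is a minimal sketch of a retry loop, not an RDS-specific API: `connect` is a hypothetical callable supplied by the caller that opens a new connection to the unchanged endpoint, and the attempt count and delay are assumed values.

```python
import time


def connect_with_retry(connect, attempts=6, delay_seconds=10.0, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts are exhausted."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # Each call should open a fresh connection so the hostname is resolved again.
            return connect()
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay_seconds)
    raise last_error


# Simulated endpoint that is unreachable for the first two attempts.
calls = {"count": 0}


def fake_connect():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("endpoint unreachable during failover")
    return "connected"


result = connect_with_retry(fake_connect, sleep=lambda seconds: None)
print(result, calls["count"])
```

The injected `sleep` makes the loop easy to test; a real client would also need to avoid caching the old DNS answer longer than the record's TTL.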


392

MCQ · medium

Refer to the exhibit. A data engineer configured the lifecycle policy shown. The 'logs/' prefix contains important audit logs. After 365 days, what happens to the objects?

A. Objects are permanently deleted.

B. Objects are transitioned to Glacier Deep Archive.

C. Objects are transitioned to Glacier.

D. Objects are transitioned to Standard-IA.

Answer: A

The expiration action with Days: 365 deletes the objects after 365 days.

Why this answer

Option A is correct because the expiration action deletes objects after 365 days. Option B is wrong because the policy transitions to Glacier, not Deep Archive. Option D is wrong because the transition to Standard-IA occurs at 30 days, not 365.

Option C is wrong because the transition to Glacier occurs at 90 days, not 365.


393

Multi-Select · easy

Which TWO statements about Amazon Redshift data distribution are correct? (Choose two.)

Select 2 answers

A. DISTSTYLE is a distribution style option

B. AUTO distribution always chooses EVEN

C. KEY distribution places rows with the same distribution key on the same slice

D. EVEN distribution distributes rows across slices evenly

E. ALL distribution distributes data across all slices

Answers: C, D

KEY distribution colocates data by key.

Why this answer

Option C is correct because in Amazon Redshift, KEY distribution places all rows with the same distribution key value on the same slice (compute node segment). This ensures that join operations on the distribution key are collocated, reducing data movement across the network and improving query performance.

Exam trap

The trap here is confusing distribution styles with distribution options (e.g., DISTSTYLE is a parameter, not a style) and misunderstanding that ALL distribution replicates the entire table to every node, not slices, while AUTO dynamically selects the best style rather than defaulting to EVEN.


394

MCQ · hard

A data engineer notices that an Amazon Redshift cluster's storage utilization has grown unexpectedly. The cluster uses automatic compression and has a mix of fact and dimension tables. The engineer runs VACUUM and ANALYZE, but storage does not decrease. Which action is most likely to reduce storage consumption?

A. Perform a DEEP COPY on the largest tables.

B. Run VACUUM with the BOOST option.

C. Run ANALYZE with the FULL keyword on all tables.

D. Modify the sort key on the largest tables to a more selective column.

Answer: A

DEEP COPY recreates the table with optimal compression, reclaiming storage from deleted rows and reorganizing data.

Why this answer

Option A is correct because a deep copy recreates the table and bulk-inserts its rows, producing a fresh, fully sorted storage layout and reclaiming space that VACUUM alone did not recover. In Redshift, VACUUM re-sorts rows and reclaims space from deleted rows but does not change column encodings. A deep copy is a technique rather than a single SQL command (for example, CREATE TABLE ... LIKE followed by INSERT INTO ... SELECT, or CREATE TABLE AS); it physically rewrites the data, which can significantly reduce storage consumption when automatic compression has left suboptimal encodings or when historical updates have bloated the table.

Exam trap

The trap here is that candidates confuse VACUUM's space reclamation (which only removes deleted rows) with the need to physically rebuild the table to reapply compression, assuming VACUUM or ANALYZE can fix storage bloat caused by suboptimal encodings.

How to eliminate wrong answers

Option B is wrong because the BOOST option only gives VACUUM additional resources so that it finishes faster; it reclaims no more space than the VACUUM that was already run, and it does not reapply compression. Option C is wrong because ANALYZE, whatever keywords are supplied, updates table statistics for the query planner but does not modify physical storage or reclaim space; it only refreshes metadata about data distribution and does not affect the actual data blocks. Option D is wrong because modifying the sort key on the largest tables changes the physical order of rows on disk, which can improve query performance but does not directly reduce storage consumption; sort keys affect how data is organized, not the amount of space used by existing data.
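One deep copy approach, recreating the table with CREATE TABLE ... LIKE and reloading it, can be sketched as a sequence of SQL statements. This is a minimal sketch that only generates the statements: the table name and the `_deepcopy` suffix are hypothetical, identifiers are not validated or quoted, and running the statements (ideally inside one transaction) is left to whatever SQL client is in use.

```python
def deep_copy_statements(table):
    """Return the SQL statements for a LIKE-based deep copy of one table."""
    copy_name = f"{table}_deepcopy"  # hypothetical naming convention
    return [
        # New empty table that inherits column encodings, sort key and dist key.
        f"CREATE TABLE {copy_name} (LIKE {table});",
        # Bulk insert rewrites every row into freshly sorted blocks.
        f"INSERT INTO {copy_name} (SELECT * FROM {table});",
        # Swap the copy in place of the original.
        f"DROP TABLE {table};",
        f"ALTER TABLE {copy_name} RENAME TO {table};",
    ]


for statement in deep_copy_statements("sales_fact"):
    print(statement)
```

Because LIKE keeps the existing column encodings, changing the encodings themselves would need the original DDL with new ENCODE settings instead of the first statement.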


395

MCQ · hard

A company uses Amazon S3 to store large datasets for analytics. Each dataset is stored in a separate prefix and consists of thousands of small objects (1-10 KB each). The company notices that listing objects in a prefix takes several seconds, slowing down data processing. Which solution would MOST improve listing performance?

A. Add a lifecycle policy to transition objects to S3 Glacier.

B. Use S3 Select to filter objects during listing.

C. Use S3 Inventory to generate a daily listing of objects.

D. Increase the number of parallel requests by using more prefixes.

Answer: C

S3 Inventory provides a pre-generated list that can be queried quickly.

Why this answer

S3 Inventory provides a scheduled CSV/Parquet file listing all objects in a bucket or prefix, including metadata like size and last modified date. By querying this inventory file instead of issuing real-time ListObject API calls, you avoid the latency of enumerating thousands of small objects, dramatically improving listing performance for analytics workflows.

Exam trap

The trap here is that candidates confuse S3 Select (which filters object content) with filtering object keys during listing, or assume that parallel requests to a single prefix are allowed, when in fact S3 throttles ListObject calls per prefix and parallelism only helps across different prefixes.

How to eliminate wrong answers

Option A is wrong because transitioning objects to S3 Glacier does not improve listing performance; it only changes storage class and adds retrieval latency, while the ListObject API still must enumerate all objects. Option B is wrong because S3 Select is used to filter the content of objects (e.g., SQL queries on CSV/JSON data), not to filter object keys during listing; it cannot accelerate the ListObject operation. Option D is wrong because increasing parallel requests with more prefixes would require redesigning the data layout and does not reduce the time to list a single prefix; the bottleneck is the number of objects in that prefix, not parallelism.


396

MCQ · hard

A data engineering team needs to store log files for 90 days with immediate access, then archive them for 7 years with infrequent access. Which S3 storage class configuration meets these requirements cost-effectively?

A. Use S3 One Zone-IA for 90 days, then lifecycle to S3 Glacier Deep Archive

B. Use S3 Intelligent-Tiering with lifecycle transition to S3 Glacier Deep Archive after 90 days

C. Use S3 Glacier Instant Retrieval for 90 days, then lifecycle to S3 Glacier Flexible Retrieval

D. Use S3 Standard for 90 days, then lifecycle policy to S3 Glacier Deep Archive

Answer: B

Intelligent-Tiering optimizes costs for unknown patterns, and lifecycle to Deep Archive meets long-term retention.

Why this answer

Option B is correct because S3 Intelligent-Tiering automatically moves objects between frequent and infrequent access tiers during the first 90 days, and a lifecycle rule can then transition them to S3 Glacier Deep Archive for long-term retention. Option D is wrong because S3 Standard charges the frequent-access rate for the whole 90 days, regardless of how often the logs are read. Option C is wrong because S3 Glacier Instant Retrieval is not cost-effective for the first 90 days.

Option A is wrong because S3 One Zone-IA keeps data in a single Availability Zone, so it is less resilient than the multi-AZ storage classes.

Practice this question →

397

MCQhard

A company uses Amazon DynamoDB for a gaming leaderboard. The table has a partition key of 'GameId' and a sort key of 'Score'. The application needs to query the top 10 scores for a given game. Which DynamoDB feature should be used for optimal performance?

A.Use a Query operation on the base table with ScanIndexForward set to false.

B.Enable DynamoDB Streams and use a Lambda function to compute the leaderboard.

C.Use DynamoDB Accelerator (DAX) to cache the results of a Scan operation.

D.Create a Global Secondary Index with the same partition key and sort key, then query with ScanIndexForward false.

AnswerD

A GSI optimizes the query pattern for fetching top scores per game.

Why this answer

Option D is correct because a Global Secondary Index with 'GameId' as partition key and 'Score' as sort key allows efficient querying with ScanIndexForward=false to get top scores. Option A is treated as wrong here: querying the base table by 'GameId' in descending 'Score' order would work, but it is considered less efficient when items carry many other attributes, whereas a GSI can be dedicated to this access pattern.

Option B is wrong because DynamoDB Streams is for change data capture, not querying. Option C is wrong because DAX is a caching layer, not a querying feature.
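As a rough illustration of the descending query discussed above, the sketch below builds the request parameters for a DynamoDB Query call. The table name, index name, and the assumption that 'GameId' is a string attribute are hypothetical; the dictionary is shaped for the `query` method of boto3's low-level DynamoDB client, but nothing here contacts AWS.

```python
def top_scores_query(table_name, game_id, limit=10, index_name=None):
    """Build Query parameters that return the highest scores for one game."""
    params = {
        "TableName": table_name,
        # Match a single partition (one game).
        "KeyConditionExpression": "GameId = :game",
        # Assumption: GameId is stored as a string ("S") attribute.
        "ExpressionAttributeValues": {":game": {"S": game_id}},
        # False reads the sort key ('Score') in descending order.
        "ScanIndexForward": False,
        # Stop after the first `limit` items, i.e. the top scores.
        "Limit": limit,
    }
    if index_name is not None:
        # Only needed when the query targets a secondary index.
        params["IndexName"] = index_name
    return params


# Hypothetical names, for illustration only.
example = top_scores_query("Leaderboard", "game-42", index_name="GameScoreIndex")
```

With a real client, the result could be passed as `client.query(**example)`; leaving out `index_name` targets the base table instead.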

Practice this question →

398

MCQmedium

A company uses Amazon RDS for MySQL with Multi-AZ deployment. The primary instance fails, and automatic failover occurs. After failover, the application experiences higher latency. What is the most likely cause?

A.The read replica is now the primary and cannot handle write traffic.

B.The failover process disabled automatic backups.

C.The DNS endpoint did not update to point to the new primary.

D.The new primary instance is in a different Availability Zone, increasing network latency.

AnswerD

Cross-AZ latency can be higher than same-AZ.

Why this answer

Option D is correct because after failover the standby, which runs in a different Availability Zone, becomes the new primary; if the application servers remain in the original AZ, requests now cross AZs and may see higher latency. Option C is incorrect because the DNS record is updated automatically during failover. Option A is incorrect because read replicas are not involved in a Multi-AZ failover.

Option B is incorrect because failover does not disable automated backups.

Practice this question →

399

MCQeasy

A company needs to store relational data that requires complex joins and transactional consistency. The workload is predictable and the data size is less than 500 GB. Which AWS service is MOST cost-effective for this use case?

A.Amazon Redshift

B.Amazon S3

C.Amazon RDS for PostgreSQL

D.Amazon DynamoDB

AnswerC

RDS offers managed relational databases with full SQL support, ideal for transactional workloads up to 500 GB.

Why this answer

Option C is correct because Amazon RDS provides managed relational databases with support for complex joins and transactions. Option D is wrong because DynamoDB is NoSQL and does not support complex joins. Option A is wrong because Redshift is for analytics and is overkill for 500 GB.

Option B is wrong because S3 is not a relational database.

Practice this question →

400

MCQmedium

The exhibit shows the lifecycle configuration for an S3 bucket. Objects in the bucket are 200 days old on average. What will happen to the objects?

A.Objects will be transitioned to GLACIER after 90 days and deleted after 365 days.

B.Objects are in GLACIER now and will be deleted after 365 days from creation.

C.Objects will be transitioned to GLACIER after 200 days and deleted after 365 days.

D.Objects will be deleted after 90 days.

AnswerB

At 200 days, objects have been transitioned; expiration is at 365 days.

Why this answer

The lifecycle configuration shows a current version action to transition to GLACIOR (a misspelling of GLACIER) immediately (0 days after creation) and an expiration action to permanently delete the object 365 days after creation. Since the objects are already 200 days old on average, they have already been transitioned to GLACIER storage class. The expiration rule will delete them 365 days from their creation date, not from the current time.

Exam trap

The trap here is that candidates misinterpret the '0 days' transition as 'no transition' or assume the average age of 200 days means the transition hasn't happened yet, when in fact the lifecycle rules are based on creation date, not current age.

How to eliminate wrong answers

Option A is wrong because the lifecycle rule transitions objects to GLACIER immediately (0 days), not after 90 days. Option C is wrong because the transition occurs at 0 days, not after 200 days. Option D is wrong because the expiration deletes objects after 365 days, not after 90 days.
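The exhibit is not reproduced here, so the following is only a sketch of the reasoning, assuming a rule with a 0-day transition to GLACIER and a 365-day expiration as described above. `LifecycleRule` and `object_state` are hypothetical helpers, not S3 APIs, and real lifecycle actions run asynchronously, so actual timing can lag the thresholds.

```python
from dataclasses import dataclass


@dataclass
class LifecycleRule:
    transition_days: int    # days after creation before the transition
    transition_class: str   # target storage class
    expiration_days: int    # days after creation before deletion


def object_state(rule, age_days, initial_class="STANDARD"):
    """Return the expected state of an object that is `age_days` old."""
    # Both thresholds are measured from the object's creation date.
    if age_days >= rule.expiration_days:
        return "EXPIRED"
    if age_days >= rule.transition_days:
        return rule.transition_class
    return initial_class


rule = LifecycleRule(transition_days=0, transition_class="GLACIER", expiration_days=365)
state_at_200_days = object_state(rule, 200)
days_until_expiration = rule.expiration_days - 200
```

Under these assumptions a 200-day-old object is already in GLACIER and has 165 days left before expiration.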

Practice this question →

401

MCQmedium

A company stores sensitive user data in an Amazon RDS for PostgreSQL DB instance. A security audit requires that all data be encrypted at rest. The database is currently unencrypted. What is the MOST operationally efficient way to enable encryption at rest?

A.Create a read replica with encryption enabled and promote it.

B.Take a snapshot of the DB instance, copy it with encryption enabled, and restore the snapshot to a new DB instance.

C.Modify the DB instance and enable encryption in the console.

D.Modify the DB parameter group to include encryption parameters and reboot the instance.

AnswerB

This is the standard method to enable encryption for an existing unencrypted RDS instance.

Why this answer

Option B is correct because RDS for PostgreSQL does not support enabling encryption at rest on an existing unencrypted DB instance directly. The only way to achieve this is by taking a snapshot of the unencrypted instance, creating an encrypted copy of that snapshot, and then restoring it to a new encrypted DB instance. This method is operationally efficient as it uses native RDS snapshot copy and restore capabilities without requiring additional infrastructure or manual data migration.

Exam trap

The trap here is that candidates assume encryption can be toggled on via a simple 'Modify' operation in the console or CLI, but AWS RDS explicitly requires a snapshot copy and restore for existing unencrypted instances, a detail often overlooked in favor of more familiar modification workflows.

How to eliminate wrong answers

Option A is wrong because creating a read replica of an unencrypted source DB instance does not allow enabling encryption on the replica; RDS read replicas inherit the encryption setting of the source, and you cannot enable encryption on a replica if the source is unencrypted. Option C is wrong because the RDS console does not provide a 'Modify' option to enable encryption at rest on an existing unencrypted DB instance; encryption can only be specified at creation time or via snapshot restore. Option D is wrong because modifying the DB parameter group does not affect storage encryption; encryption at rest is a storage-layer feature controlled by the RDS instance configuration, not by PostgreSQL parameters.
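The snapshot, encrypted copy, and restore sequence can be sketched as follows. This is a minimal outline, assuming `rds` is a boto3 RDS client (for example `boto3.client("rds")`); the snapshot naming scheme, instance identifiers, and KMS key are hypothetical, and a real migration would also need to carry over settings such as subnet group, security groups, and parameter group.

```python
def encrypt_existing_instance(rds, source_id, target_id, kms_key_id):
    """Create an encrypted copy of an unencrypted RDS instance."""
    plain_snapshot = f"{source_id}-plain"          # hypothetical naming scheme
    encrypted_snapshot = f"{source_id}-encrypted"  # hypothetical naming scheme

    # 1. Snapshot the unencrypted source instance.
    rds.create_db_snapshot(
        DBSnapshotIdentifier=plain_snapshot,
        DBInstanceIdentifier=source_id,
    )
    rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=plain_snapshot)

    # 2. Copy the snapshot; supplying a KMS key makes the copy encrypted.
    rds.copy_db_snapshot(
        SourceDBSnapshotIdentifier=plain_snapshot,
        TargetDBSnapshotIdentifier=encrypted_snapshot,
        KmsKeyId=kms_key_id,
    )
    rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=encrypted_snapshot)

    # 3. Restore the encrypted snapshot into a new DB instance.
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=target_id,
        DBSnapshotIdentifier=encrypted_snapshot,
    )
    return encrypted_snapshot
```

The application is then pointed at the new instance's endpoint; the original unencrypted instance stays untouched until it is deleted.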

Practice this question →

402

MCQmedium

A data engineer is designing a data store for a time-series application that requires sub-millisecond read latency for the latest data and high ingestion rates. Which AWS service is most suitable?

A.Amazon DynamoDB

B.Amazon ElastiCache for Redis

C.Amazon RDS for PostgreSQL

D.Amazon Timestream

AnswerD

Timestream is purpose-built for time-series data with fast queries.

Why this answer

Option D is correct because Amazon Timestream is a fast, scalable, serverless time-series database. Option C is wrong because Amazon RDS is not optimized for time-series workloads. Option A is wrong because Amazon DynamoDB is a key-value store, not purpose-built for time-series.

Option B is wrong because Amazon ElastiCache is a caching layer, not a primary datastore for time-series.

Practice this question →

403

MCQeasy

A company is migrating an on-premises MongoDB database to Amazon DocumentDB. The data engineer needs to ensure minimal downtime during migration. Which AWS service should be used to facilitate the migration?

A.AWS Glue

B.AWS Snowball

C.AWS Database Migration Service (DMS)

D.Amazon S3 Transfer Acceleration

AnswerC

DMS supports live migration from MongoDB to DocumentDB with minimal downtime.

Why this answer

Option C is correct because AWS DMS supports live migration with minimal downtime. Option D is wrong because S3 Transfer Acceleration speeds up transfers to object storage; it does not migrate databases. Option B is wrong because Snowball is for large data transfers, not continuous replication.

Option A is wrong because AWS Glue is for ETL, not live database migration.

Practice this question →

404

MCQeasy

A data engineer is designing a data lake on Amazon S3. The data lake will store raw data, transformed data, and curated datasets. The engineer needs to ensure that raw data is immutable (never overwritten or deleted) and that only authorized users can access the transformed data. Which combination of S3 features should the engineer use?

A.Use S3 Lifecycle policies to archive raw data to S3 Glacier and set bucket policies for transformed data.

B.Enable S3 Versioning and use S3 Access Points for each prefix.

C.Enable default encryption with SSE-KMS and use S3 bucket policies to restrict access.

D.Enable S3 Object Lock in compliance mode on the raw data prefix and use bucket policies to restrict access to transformed data prefix.

AnswerD

Object Lock in compliance mode prevents writes and deletes; bucket policies control access.

Why this answer

Option D is correct. S3 Object Lock prevents objects from being deleted or overwritten; S3 bucket policies control access to specific prefixes. Option B is wrong because versioning alone does not prevent deletion; delete markers can be placed.

Option A is wrong because lifecycle policies only move or expire objects; archiving raw data to S3 Glacier does not stop it from being overwritten or deleted. Option C is wrong because SSE-KMS encrypts data but does not prevent deletion.

Practice this question →

405

MCQeasy

A data engineer needs to store semi-structured JSON data that is accessed infrequently but must be retrievable within 5 minutes. The data is immutable once stored. Which storage solution is MOST cost-effective?

A.Amazon S3 Glacier Deep Archive

B.Amazon S3 Standard

C.Amazon S3 One Zone-IA

D.Amazon S3 Glacier Instant Retrieval

AnswerC

S3 One Zone-IA is cost-effective for infrequently accessed data with low latency retrieval.

Why this answer

Amazon S3 Glacier Deep Archive is the lowest-cost storage option for rarely accessed data, but its retrievals take up to 12 hours (standard) or 48 hours (bulk), so it cannot meet a 5-minute requirement. S3 One Zone-IA is cost-effective for infrequently accessed data that can be recreated, and retrieval is within milliseconds. Option B is wrong because S3 Standard is more expensive.

Option D is wrong because S3 Glacier Instant Retrieval, although it also retrieves in milliseconds, is aimed at data accessed about once per quarter and carries higher retrieval charges and a longer minimum storage duration than One Zone-IA. Option A is wrong because S3 Glacier Deep Archive retrieval times are far too long.

Practice this question →

406

MCQhard

A company runs a critical application on Amazon RDS for MySQL that requires a Recovery Point Objective (RPO) of 5 minutes and a Recovery Time Objective (RTO) of 1 hour. The database is 500 GB. What is the MOST cost-effective disaster recovery solution that meets these requirements?

A.Deploy the database in a single Availability Zone and perform manual point-in-time restores.

B.Take automated snapshots daily and store them in Amazon S3.

C.Use a Multi-AZ deployment with automatic failover.

D.Create a cross-region read replica and promote it during a disaster.

AnswerC

Multi-AZ provides synchronous replication to a standby in another AZ, achieving RPO of seconds and RTO of minutes.

Why this answer

Multi-AZ RDS for MySQL provides synchronous standby replication to a second Availability Zone, enabling automatic failover with minimal data loss (typically zero) and RTO under 1 hour. This meets the RPO of 5 minutes and RTO of 1 hour without manual intervention, and is more cost-effective than a cross-region replica for a 500 GB database.

Exam trap

The trap here is that candidates often confuse Multi-AZ (synchronous, same-region, automatic failover) with cross-region read replicas (asynchronous, manual promotion), assuming both provide similar DR capabilities, but Multi-AZ is the only option that meets both RPO and RTO cost-effectively for a single-region requirement.

How to eliminate wrong answers

Option A is wrong because manual point-in-time restores from backups cannot achieve an RTO of 1 hour due to the time required to restore 500 GB from S3, and RPO depends on backup frequency, which is not guaranteed to be 5 minutes. Option B is wrong because daily automated snapshots provide an RPO of up to 24 hours, far exceeding the 5-minute requirement, and restoring from snapshots takes longer than 1 hour for a 500 GB database. Option D is wrong because a cross-region read replica uses asynchronous replication, which can introduce lag exceeding 5 minutes, and promoting it during a disaster requires manual steps that increase RTO beyond 1 hour; it is also more expensive due to cross-region data transfer costs.

Practice this question →

407

Multi-Selectmedium

Which TWO statements are true about Amazon Redshift distribution styles? (Choose TWO.)

Select 2 answers

A.KEY distribution is always the best choice to minimize data skew.

B.AUTO distribution always selects EVEN distribution.

C.ALL distribution copies the entire table to every node.

D.Redshift automatically assigns a ROUND ROBIN distribution style by default.

E.EVEN distribution distributes rows across slices in a round-robin fashion.

AnswersC, E

ALL distribution is useful for small tables that are frequently joined.

Why this answer

Option E is correct: EVEN distribution assigns rows to slices in a round-robin fashion, regardless of the values in any column. Option C is correct: ALL distribution copies the entire table to every node, which can improve join performance but uses more storage. Option A is wrong because KEY distribution can cause skew.

Option B is wrong because AUTO distribution lets Redshift choose the style, and it does not always pick EVEN. Option D is wrong because Redshift has no distribution style named ROUND ROBIN; the styles are EVEN, KEY, ALL, and AUTO, and AUTO is the default.
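To make the skew argument concrete, the toy simulation below compares round-robin (EVEN-style) placement with hash-by-key (KEY-style) placement. It is only an illustration: the slice count, the sample rows, and the use of CRC32 as the hash are assumptions, and Redshift's internal hashing is not modeled.

```python
import zlib
from collections import Counter


def even_distribution(rows, num_slices):
    """Round-robin placement: row i goes to slice i mod num_slices."""
    counts = Counter(i % num_slices for i, _ in enumerate(rows))
    return [counts[s] for s in range(num_slices)]


def key_distribution(rows, num_slices, key):
    """Hash placement: rows with equal key values land on the same slice."""
    counts = Counter()
    for row in rows:
        # CRC32 is a stand-in hash, chosen only because it is deterministic.
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        counts[digest % num_slices] += 1
    return [counts[s] for s in range(num_slices)]


# Hypothetical data: 90% of rows share one country value.
rows = [{"order_id": i, "country": "US" if i % 10 else "CA"} for i in range(1000)]

even_counts = even_distribution(rows, 4)
skewed_counts = key_distribution(rows, 4, "country")
```

Round-robin placement gives every slice the same share, while hashing on a low-cardinality column such as `country` piles most rows onto one slice.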

Practice this question →

408

MCQhard

A company is running a MySQL database on Amazon RDS. The database size is 2 TB, and the company needs to migrate it to Amazon Aurora MySQL with minimal downtime. Which migration strategy is most appropriate?

A.Create an Aurora MySQL read replica from the RDS instance, then promote it.

B.Use mysqldump to export the database and import it into Aurora.

C.Take a snapshot of the RDS instance and restore it as an Aurora cluster.

D.Use AWS Database Migration Service (DMS) with full load and ongoing replication.

AnswerA

This approach allows replication with minimal downtime, then promote to master.

Why this answer

Creating an Aurora MySQL read replica from the existing RDS MySQL instance allows the Aurora cluster to stay synchronized with the source using MySQL’s native binlog replication. Once the replica lag reaches zero, you can promote it to a standalone Aurora cluster with minimal downtime, typically just a few seconds to stop writes and redirect traffic. This approach avoids the lengthy export/import process and leverages Amazon’s managed replication for near-zero-downtime migration.

Exam trap

The trap here is that candidates often assume DMS is always the best for minimal downtime, but for MySQL-to-Aurora migrations, the native read-replica promotion is simpler, faster, and fully managed by AWS, making it the most appropriate choice for this specific scenario.

How to eliminate wrong answers

Option B is wrong because mysqldump exports data as SQL statements, which for a 2 TB database would take hours to export and even longer to import, causing significant downtime and potential consistency issues. Option C is wrong because RDS snapshots cannot be directly restored as an Aurora cluster; you must first migrate the snapshot to an Aurora-compatible format using AWS DMS or the RDS-to-Aurora snapshot migration feature, which still requires downtime. Option D is wrong because while DMS with full load and ongoing replication can achieve minimal downtime, it adds unnecessary complexity and overhead compared to the simpler, native read-replica promotion method, which is the recommended AWS approach for MySQL-to-Aurora migrations.

Practice this question →

409

MCQeasy

A data engineer needs to store JSON documents that are frequently updated and require ACID transactions. Which AWS database service is most appropriate?

A.Amazon Neptune

B.Amazon DocumentDB

C.Amazon DynamoDB

D.Amazon S3

AnswerC

DynamoDB supports JSON and ACID transactions.

Why this answer

Option C is correct because Amazon DynamoDB supports JSON documents and ACID transactions via DynamoDB Transactions. Option D is wrong because Amazon S3 is not a database and does not support transactions. Option B is wrong because Amazon DocumentDB is a MongoDB-compatible database but ACID transactions are limited.

Option A is wrong because Amazon Neptune is a graph database, not optimized for JSON documents.

Practice this question →

410

MCQmedium

A data engineering team uses AWS Glue ETL jobs to process data from Amazon S3 and load it into an Amazon Redshift cluster. The cluster has a single node of type dc2.large. The team notices that the ETL jobs are failing intermittently with errors related to disk space. The Redshift cluster shows that the disk is nearly full. The team needs to resolve the disk space issue and ensure the ETL jobs can complete successfully without increasing costs significantly. Which solution should the team implement?

A.Convert the cluster to use RA3 node types (e.g., ra3.xlarge) with managed storage.

B.Load data into a staging table first, then perform a VACUUM and ANALYZE on the target tables.

C.Add more nodes to the Redshift cluster by resizing to a multi-node dc2.large configuration.

D.Set the table's distribution style to ALL for fact tables to avoid data redistribution during joins.

AnswerA

RA3 nodes separate compute from storage, allowing storage to scale independently and cost-effectively.

Why this answer

Option A (converting to a ra3.xlarge node) is correct because RA3 nodes use managed storage, allowing scaling of compute and storage independently. This resolves disk space issues without over-provisioning. Option C (adding more dc2.large nodes) increases cost and still has fixed storage.

Option D (using diststyle ALL) may improve query performance but does not add disk space. Option B (loading into a staging table and using VACUUM) helps with reclaiming space but does not address the underlying insufficient storage capacity.

Practice this question →

411

MCQmedium

A company is migrating an on-premises MySQL database to Amazon RDS for MySQL. The database is 500 GB and has a 24/7 uptime requirement. The migration must minimize downtime. Which approach should be used?

A.Take a snapshot of the on-premises database, convert it to a volume, and restore to RDS.

B.Use AWS Database Migration Service (DMS) with ongoing replication to migrate the data.

C.Export the database using mysqldump and import it into RDS using mysql command.

D.Create an RDS MySQL read replica from the on-premises database using native replication.

AnswerB

DMS supports ongoing replication, minimizing downtime by allowing a final cutover after the initial load.

Why this answer

AWS DMS with ongoing replication (change data capture) allows you to perform a full load of the 500 GB database and then continuously replicate changes from the on-premises MySQL source to the Amazon RDS target. This minimizes downtime because you can cut over to RDS in seconds after the target is synchronized, rather than taking the source offline for an extended period.

Exam trap

The trap here is that candidates often choose mysqldump (Option C) because it is a familiar tool, but they overlook the requirement for minimal downtime and the fact that a 500 GB dump/import would take hours, violating the 24/7 uptime requirement.

How to eliminate wrong answers

Option A is wrong because taking a snapshot of an on-premises database and converting it to a volume is not a supported method for migrating to RDS; snapshots are native to AWS block storage and cannot be directly created from an on-premises database. Option C is wrong because using mysqldump and mysql import requires the source database to be read-locked or offline during the export/import process, causing significant downtime for a 500 GB database with a 24/7 uptime requirement. Option D is wrong because the RDS read replica feature only accepts another RDS DB instance as its source, so a replica cannot be created directly from an on-premises server. RDS for MySQL can replicate from an external source, but that requires seeding the data first and then configuring binlog replication manually through RDS stored procedures, which is more operational work than a DMS task.

Practice this question →

412

MCQmedium

A data engineer applies the above IAM policy to an IAM user. The user attempts to download an object from the bucket 'example-bucket' that is encrypted with SSE-S3 (AES256). Will the request succeed?

A.Yes, but only if the user also has s3:ListBucket permission.

B.No, because the policy requires the encryption to be specified in the request.

C.Yes, because the object is encrypted with SSE-S3 which uses AES256.

D.No, because the policy does not allow the s3:GetObject action for encrypted objects.

AnswerC

The condition matches the encryption algorithm.

Why this answer

Option C is correct because the condition requires the object to be encrypted with AES256, and SSE-S3 uses AES256. Option B is incorrect because the condition checks the encryption header, not the key type. Option A is incorrect because the condition is already satisfied, and downloading an object by key does not require s3:ListBucket.

Option D is incorrect because the condition is satisfied.

Practice this question →

413

MCQeasy

A company needs to store JSON documents that are frequently read and written by a web application. The data must be highly available and durable across multiple Availability Zones. Which AWS database service meets these requirements?

A.Amazon RDS for PostgreSQL

B.Amazon S3

C.Amazon DynamoDB

D.Amazon ElastiCache for Redis

AnswerC

DynamoDB is a fully managed NoSQL database that supports JSON documents and offers multi-AZ durability.

Why this answer

Amazon DynamoDB is a fully managed NoSQL key-value and document database that provides single-digit millisecond performance at any scale. It stores JSON documents natively, supports frequent reads and writes, and offers built-in high availability and durability by automatically replicating data across multiple Availability Zones (AZs) in an AWS Region. This makes it the ideal choice for the described web application workload.

Exam trap

The trap here is that candidates often confuse Amazon S3's high durability and availability with database capabilities, overlooking that S3 is an object store with higher latency and no native query support, while DynamoDB is purpose-built for low-latency, high-throughput document storage with ACID transactions via DynamoDB Transactions.

How to eliminate wrong answers

Option A is wrong because Amazon RDS for PostgreSQL is a relational database that stores data in tables with a fixed schema, not as JSON documents natively, and while it can be deployed in a Multi-AZ configuration for high availability, it does not provide the same level of automatic, seamless scaling and native JSON document support as DynamoDB. Option B is wrong because Amazon S3 is an object storage service, not a database; it can store JSON files but is not designed for frequent, low-latency read/write operations from a web application, and it lacks features like atomic transactions and query capabilities that a database provides. Option D is wrong because Amazon ElastiCache for Redis is an in-memory cache, not a durable database; while it can store JSON documents using the RedisJSON module, data is primarily stored in memory and is not durable by default across AZs, making it unsuitable for the durability and persistence requirements of a primary data store.

Practice this question →

414

MCQeasy

A company runs a MySQL database on Amazon RDS. The database size is 500 GB and is experiencing high read traffic. The team wants to improve read performance with minimal operational overhead. Which action should they take?

A.Create a read replica in the same region

B.Enable Multi-AZ deployment

C.Implement Amazon ElastiCache for caching

D.Upgrade to a larger instance class

AnswerA

Read replicas offload read traffic with minimal operational overhead.

Why this answer

Option A is correct because adding a read replica offloads read traffic from the primary with minimal overhead. Option B (Multi-AZ) is for high availability, not read performance. Option D (larger instance) increases cost but does not specifically address read workload.

Option C (ElastiCache) adds caching but requires application changes.

Practice this question →

415

MCQmedium

A data engineer notices that an Amazon Redshift cluster is running low on disk space. The cluster has three nodes of type dc2.large. Which action will increase the available storage capacity?

A.Increase the number of nodes in the cluster.

B.Mount an Amazon S3 bucket as a file system to store data.

C.Change the volume type to Provisioned IOPS SSD (io1) to increase capacity.

D.Enable automatic compression on the tables.

AnswerA

Adding nodes increases total storage capacity.

Why this answer

Option A is correct because increasing the number of nodes adds more storage: each dc2.large node contributes a fixed amount of local storage. Resizing to a larger node type would also add storage, but it is not among the options. Option C is incorrect because Redshift does not expose a volume type setting, so Provisioned IOPS is not available. Option B is incorrect because S3 is not directly attachable.

Option D is incorrect because compression reduces data size but does not add capacity.

Practice this question →

416

MCQmedium

A data engineer is designing a data lake on Amazon S3. Data is ingested from multiple sources in JSON format. The engineer needs to optimize query performance for Amazon Athena while minimizing storage costs. Which storage strategy should the engineer use?

A.Store data as CSV files in a single S3 bucket without prefixes.

B.Convert data to Parquet format and partition by date.

C.Store data as JSON files in a single prefix without partitioning.

D.Store compressed JSON files in Amazon S3 Glacier.

AnswerB

Parquet is columnar and compressed; partitioning improves query performance.

Why this answer

Parquet is a columnar format that reduces storage size and improves query performance in Athena. Partitioning by date further optimizes queries that filter by date. Option B is correct.

Option C: storing as raw JSON with no partitioning leads to higher costs and slower queries. Option D: using Glacier for hot data adds retrieval latency and is not suitable for frequent queries. Option A: storing CSV in a single bucket with no prefix structure causes full scans.
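One common way to partition by date is a Hive-style key layout, which Athena can map to partition columns. The sketch below shows how such keys might be built and why a date filter only needs to touch one prefix; the `curated/events` prefix, file names, and the year/month/day column names are assumptions for illustration.

```python
from datetime import date


def partition_key(table_prefix, event_date, file_name):
    """Build a Hive-style S3 key such as prefix/year=2024/month=03/day=05/file."""
    return (
        f"{table_prefix}/year={event_date.year:04d}"
        f"/month={event_date.month:02d}/day={event_date.day:02d}/{file_name}"
    )


def keys_for_day(keys, table_prefix, event_date):
    """Keep only the keys under one day's partition prefix."""
    day_prefix = partition_key(table_prefix, event_date, "")
    return [k for k in keys if k.startswith(day_prefix)]


# Hypothetical objects in the data lake.
keys = [
    partition_key("curated/events", date(2024, 3, 5), "part-0000.parquet"),
    partition_key("curated/events", date(2024, 3, 5), "part-0001.parquet"),
    partition_key("curated/events", date(2024, 3, 6), "part-0000.parquet"),
]
march_5 = keys_for_day(keys, "curated/events", date(2024, 3, 5))
```

A query that filters on a single day then reads only the objects under that day's prefix instead of scanning the whole dataset.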

Practice this question →

417

Multi-Selecteasy

Which TWO AWS services can be used to automatically back up an Amazon RDS for SQL Server DB instance? (Choose TWO.)

Select 2 answers

A.AWS Database Migration Service (DMS)

B.AWS Data Pipeline

C.Amazon RDS automated backups

D.Amazon S3

E.AWS Backup

AnswersC, E

RDS provides automated backups by default.

Why this answer

RDS automated backups are enabled by default and retain backups for up to 35 days. AWS Backup is a centralized backup service that can manage RDS backups with custom policies and retention. Options C and E are correct.

Option A: DMS is for migration, not backup. Option D: S3 is storage, not an automatic backup service. Option B: Data Pipeline can orchestrate backups but is not the primary automatic backup service.

Practice this question →

418

MCQhard

A data engineering team is designing a data lake on Amazon S3. They need to store raw data in a format that supports schema evolution and is optimized for analytics with Amazon Athena. Which storage format should they use?

A.Parquet

B.CSV

C.Avro

D.JSON

AnswerA

Parquet is columnar, supports schema evolution, and is optimized for Athena.

Why this answer

Option A is correct because Parquet is a columnar format that supports schema evolution and is optimized for Athena. Option B (CSV) does not support schema evolution and is less efficient. Option C (Avro) is row-based and not as efficient for columnar queries.

Option D (JSON) is text-based and not optimized.

Practice this question →

419

MCQeasy

A company is using Amazon S3 to store critical data and needs to ensure that objects are automatically transitioned to S3 Glacier Deep Archive after 180 days to reduce costs. Which S3 lifecycle action should be configured?

A.Expiration

B.Transition

C.AbortIncompleteMultipartUpload

D.NoncurrentVersionTransition

AnswerB

Transition moves objects to another storage class based on age.

Why this answer

Option B is correct because the S3 lifecycle 'Transition' action is specifically designed to move objects between storage classes after a specified number of days. To reduce costs by moving objects to S3 Glacier Deep Archive after 180 days, you configure a lifecycle rule with a Transition action that targets the 'DEEP_ARCHIVE' storage class at the 180-day mark.

Exam trap

The trap here is that candidates often confuse 'Expiration' (deletion) with 'Transition' (storage class change), or incorrectly apply 'NoncurrentVersionTransition' when the question does not mention versioning or noncurrent versions.

How to eliminate wrong answers

Option A is wrong because 'Expiration' is used to permanently delete objects after a set period, not to transition them to a different storage class. Option C is wrong because 'AbortIncompleteMultipartUpload' is used to clean up incomplete multipart uploads after a specified number of days, not to transition objects between storage classes. Option D is wrong because 'NoncurrentVersionTransition' applies only to noncurrent versions of versioned objects, not to current versions, and the question does not specify versioning or noncurrent versions.
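For reference, a lifecycle rule of this kind might look like the sketch below. The dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` expects for its `LifecycleConfiguration` argument; the rule ID and the empty prefix (meaning the whole bucket) are assumptions, and nothing here calls AWS.

```python
def deep_archive_rule(rule_id, prefix="", days=180):
    """Build one lifecycle rule that transitions current objects to Deep Archive."""
    return {
        "ID": rule_id,
        "Status": "Enabled",
        # An empty prefix applies the rule to every object in the bucket.
        "Filter": {"Prefix": prefix},
        # Transition changes the storage class; it does not delete anything.
        "Transitions": [{"Days": days, "StorageClass": "DEEP_ARCHIVE"}],
    }


# Hypothetical rule ID, for illustration only.
lifecycle_configuration = {"Rules": [deep_archive_rule("archive-after-180-days")]}
```

Note that the rule contains no `Expiration` element, so objects are moved to the cheaper class but never removed by this rule.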

Practice this question →

420

MCQeasy

A startup is building a ride-sharing application that uses Amazon DynamoDB to store trip data. The table has a partition key of 'trip_id' and a sort key of 'status'. The application writes a new item when a trip starts and updates the status when the trip ends. The development team is experiencing high write latency during peak hours. The table is provisioned with 5,000 write capacity units (WCU) and 5,000 read capacity units (RCU). CloudWatch metrics show that WriteThrottleEvents are occurring frequently, but the consumed write capacity is never above 4,000 WCU. The team suspects that the issue is due to hot partitions. How should the data engineer resolve this issue?

A.Modify the application to add a random suffix to the partition key when writing items.

B.Enable DynamoDB Accelerator (DAX) to cache write operations.

C.Decrease the provisioned RCU to 2,000 to reduce costs.

D.Increase the provisioned WCU to 10,000 to handle the spikes.

AnswerA

Adding random suffix distributes writes across multiple partitions, reducing hot spots.

Why this answer

Option A is correct because adding a random suffix to the partition key spreads writes across more partitions, avoiding hot spots. Option D is incorrect because increasing WCU does not solve the hot partition issue; throttling happens at the partition level, and consumed capacity is already below the provisioned 5,000 WCU. Option B is incorrect because DAX is a cache that speeds up reads; it does not relieve write throttling.

Option C is incorrect because decreasing RCU is unrelated to write throttling.
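A minimal sketch of the random-suffix technique follows, assuming a shard count of 10 and a `#` separator. Both are illustrative choices, not values from the question.

```python
import random

SHARD_COUNT = 10  # hypothetical; size it to the write volume of the hottest key


def sharded_key(base_key: str, shard_count: int = SHARD_COUNT) -> str:
    """Return the partition key with a random shard suffix, e.g. 'trip-42#7'."""
    return f"{base_key}#{random.randrange(shard_count)}"


def all_shard_keys(base_key: str, shard_count: int = SHARD_COUNT) -> list[str]:
    """List every suffixed key a reader must query to reassemble the items."""
    return [f"{base_key}#{i}" for i in range(shard_count)]


print(sharded_key("trip-42"))
print(all_shard_keys("trip-42"))
```

The trade-off is on the read side: because the suffix is random, a reader has to query every key returned by `all_shard_keys` and merge the results.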

Practice this question →

421

MCQmedium

A company uses Amazon Redshift for data warehousing. The data engineer notices that query performance has degraded over time. The tables are frequently updated with new data, and the data engineer suspects that the distribution style is causing data skew. Which distribution style should the data engineer use to minimize data skew?

A.KEY distribution on a column with high cardinality

B.AUTO distribution

C.ALL distribution

D.EVEN distribution

AnswerD

Distributes rows evenly, ideal for preventing skew.

Why this answer

Option D is correct because EVEN distribution spreads rows across slices in a round-robin fashion, minimizing skew when no good distribution key exists. Option A is wrong because KEY distribution can still cause skew if the key's values are unevenly distributed. Option C is wrong because ALL distribution copies the entire table to every node, which is not efficient for large, frequently updated tables.

Option B is wrong because AUTO lets Redshift choose the style, and it may not minimize skew if the key is poorly chosen.
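To make the skew argument concrete, here is a small simulation comparing round-robin (EVEN-like) placement with hash-based (KEY-like) placement. It is only a sketch: the CRC32 hash, the four slices, and the sample data are assumptions and do not reflect Redshift's internal hashing.

```python
import zlib
from collections import Counter


def skew_ratio(rows_per_slice: Counter, slices: int) -> float:
    """Largest slice divided by the mean slice size; 1.0 means perfectly even."""
    total = sum(rows_per_slice.values())
    return max(rows_per_slice.values()) / (total / slices)


def distribute_even(keys, slices):
    # Round-robin placement: the row's position decides the slice.
    return Counter(i % slices for i, _ in enumerate(keys))


def distribute_key(keys, slices):
    # Hash placement: equal key values always land on the same slice.
    return Counter(zlib.crc32(k.encode()) % slices for k in keys)


# Hypothetical data: one dominant value plus ten distinct values.
keys = ["customer-1"] * 90 + [f"customer-{n}" for n in range(2, 12)]
SLICES = 4
even_skew = skew_ratio(distribute_even(keys, SLICES), SLICES)
key_skew = skew_ratio(distribute_key(keys, SLICES), SLICES)
print(f"EVEN skew: {even_skew:.2f}, KEY skew: {key_skew:.2f}")
```

With one value dominating the column, hash placement piles at least 90 of the 100 rows onto a single slice, while round-robin placement gives every slice 25 rows.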

Practice this question →

422

MCQeasy

A company needs to store archival logs that must be retained for 10 years. The logs are accessed infrequently, but when accessed, retrieval must occur within 12 hours. Which storage class is MOST cost-effective?

A.Amazon S3 Glacier Deep Archive

B.Amazon S3 Intelligent-Tiering

C.Amazon S3 Standard

D.Amazon S3 One Zone-Infrequent Access

AnswerA

Glacier Deep Archive provides the lowest storage cost with retrieval times up to 12 hours.

Why this answer

Option A is correct because S3 Glacier Deep Archive is the lowest-cost storage class and its standard retrieval completes within 12 hours. Option C is wrong because S3 Standard is expensive for long-term archival. Option B is wrong because S3 Intelligent-Tiering is designed for unknown access patterns, not pure archival.

Option D is wrong because S3 One Zone-IA is less resilient (data is kept in a single Availability Zone), charges per-GB retrieval fees, and costs more per GB than Deep Archive, so it is not cost-effective for 10-year retention.

Practice this question →

423

MCQmedium

A data engineer is migrating an on-premises Apache Hive data warehouse to Amazon EMR. The warehouse contains partitioned tables stored in HDFS. The engineer wants to use Amazon S3 as the storage layer for the EMR cluster. What is the MOST important consideration for maintaining query performance on S3?

A.Ensure that the table partitions are organized in a way that minimizes S3 LIST requests

B.Configure EMR to use HDFS for storage instead of S3 for better performance

C.Use DynamoDB as the Hive metastore to improve metadata access

D.Use Amazon Redshift Spectrum to query the data directly from S3

AnswerA

S3 LIST operations are slower than HDFS; partitioning by common query filters and using partition projection can improve performance.

Why this answer

When using Amazon S3 as the storage layer for an EMR cluster, the most critical factor for query performance is minimizing S3 LIST requests. S3 LIST operations are significantly slower and more expensive than GET requests, and Hive/Spark queries on partitioned tables often issue LIST requests to discover partition locations. By organizing partitions with a common prefix (e.g., `year=2023/month=01/day=15/`) and using partition pruning, you reduce the number of LIST calls, directly improving query latency and reducing S3 API costs.
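The partition layout mentioned above can be sketched as a small helper that builds Hive-style prefixes. The `logs` table root and the function names are hypothetical; the `year=/month=/day=` pattern is the one quoted in the explanation.

```python
from datetime import date, timedelta


def partition_prefix(table_root: str, day: date) -> str:
    """Hive-style prefix such as 'logs/year=2023/month=01/day=15/'."""
    return (
        f"{table_root}/year={day.year:04d}"
        f"/month={day.month:02d}/day={day.day:02d}/"
    )


def prefixes_for_range(table_root: str, start: date, end: date) -> list[str]:
    """Prefixes a pruned query needs to list for an inclusive date range."""
    days = (end - start).days + 1
    return [
        partition_prefix(table_root, start + timedelta(days=i))
        for i in range(days)
    ]


print(prefixes_for_range("logs", date(2023, 1, 14), date(2023, 1, 15)))
```

A query filtered to two days only needs the two prefixes returned here, rather than a listing of the whole table root, which is where the saving in LIST calls comes from.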

Exam trap

The trap here is that candidates may focus on metastore performance (Option C) or alternative query engines (Option D), missing the fundamental S3 performance bottleneck of LIST requests when querying partitioned data on EMR.

How to eliminate wrong answers

Option B is wrong because using HDFS for storage would negate the benefits of S3 (durability, scalability, cost) and is not the 'most important consideration' for maintaining query performance on S3; EMR can use S3 with optimizations like EMRFS and consistent view. Option C is wrong because DynamoDB is used as a Hive metastore for high availability and scalability, not to improve metadata access performance for S3 queries; the metastore choice does not directly address S3 LIST request overhead. Option D is wrong because Redshift Spectrum is a separate service for querying S3 data with Redshift, not an EMR optimization; it does not address the core performance issue of S3 LIST requests in an EMR context.

Practice this question →

424

MCQmedium

A data engineer needs to store clickstream data from a web application in Amazon S3. Each event is about 5 KB, and the application generates 1 million events per hour. The data is used for real-time analytics and also for batch processing. The engineer wants to minimize storage costs while ensuring that data is available for real-time queries as soon as it is written. Which storage class should the engineer use for the S3 bucket?

A.S3 Standard.

B.S3 Intelligent-Tiering.

C.S3 Standard-IA.

D.S3 Glacier Instant Retrieval.

AnswerA

Standard offers the best performance for frequently accessed data and no retrieval fees.

Why this answer

Option A is correct because S3 Standard offers low latency and high throughput for frequently accessed data, with no retrieval fees, so it suits real-time analytics. Option B is incorrect because S3 Intelligent-Tiering adds a monitoring charge and is best for unknown access patterns; for a known frequent access pattern, Standard is cheaper. Option C is incorrect because S3 Standard-IA has a retrieval fee and is not ideal for frequent access.

Option D is incorrect because S3 Glacier Instant Retrieval is for long-lived, rarely accessed data requiring millisecond retrieval; its retrieval fees make it more expensive than Standard for frequent access.

Practice this question →

425

Multi-Selectmedium

A data engineer is designing a disaster recovery plan for an Amazon RDS for MySQL database. The database must be recoverable within 1 hour in a different AWS Region. Which TWO actions should the engineer take?

Select 2 answers

A.Create a cross-Region read replica.

B.Enable Multi-AZ deployment.

C.Enable automated backups with cross-Region copy.

D.Take manual snapshots and copy them to an S3 bucket in the other Region.

E.Use Amazon EventBridge to schedule snapshot copies.

AnswersA, C

A read replica can be promoted to a primary in another Region for disaster recovery.

Why this answer

A cross-Region read replica for Amazon RDS for MySQL provides a fully provisioned secondary database in a different AWS Region that can be promoted to a standalone primary in minutes, meeting the 1-hour recovery time objective (RTO). This approach ensures continuous replication from the source database, minimizing data loss and enabling rapid failover without manual snapshot management.

Exam trap

The trap here is that candidates often confuse Multi-AZ (high availability within a Region) with cross-Region disaster recovery, or they assume that scheduling snapshot copies via EventBridge is sufficient for fast recovery, ignoring the significant restore time required for snapshots.

Practice this question →

426

Multi-Selecthard

A data engineer is using Amazon Athena to query data stored in an S3 bucket. The queries are running slowly. Which THREE actions can improve query performance?

Select 3 answers

A.Partition the data on commonly filtered columns.

B.Convert the data to JSON format for better schema evolution.

C.Move the data to S3 Standard-IA storage class.

D.Convert the data to a columnar format such as Parquet or ORC.

E.Use compression (e.g., Snappy, Gzip) on the data files.

AnswersA, D, E

Partition pruning reduces amount of data scanned.

Why this answer

Options A, D, and E are correct. Partitioning reduces data scanned. Columnar formats like Parquet improve compression and query speed.

Compression reduces data size. Option C is incorrect because S3 Standard-IA changes storage and retrieval costs, not query performance. Option B is incorrect because converting to JSON (text) would increase data size and slow queries.

Practice this question →

427

MCQeasy

A data engineer is designing a data lake on Amazon S3. Which feature should be used to manage the lifecycle of objects and move them to cheaper storage classes automatically?

A.S3 Lifecycle policies

B.S3 Object Lock

C.S3 Storage Class Analysis

D.S3 Inventory

AnswerA

Automatically transitions objects to cheaper storage.

Why this answer

S3 Lifecycle policies automate transitioning objects between storage classes and can also expire objects. S3 Inventory is for reporting, S3 Storage Class Analysis is for analyzing access patterns, and S3 Object Lock is for compliance (write-once-read-many retention).

Practice this question →

428

MCQmedium

Refer to the exhibit. A data engineer needs to connect to the Redshift cluster from an EC2 instance in the same VPC. The engineer can ping the EC2 instance but cannot connect to Redshift using the endpoint address and port 5439. What is the most likely cause?

A.The security group for the Redshift cluster does not allow inbound traffic on port 5439 from the EC2 instance.

B.The Redshift cluster is in a different VPC.

C.The Redshift cluster is not in an available state.

D.The Redshift cluster is publicly accessible and requires an internet gateway.

AnswerA

Security group rules must permit the connection.

Why this answer

The most likely cause is that the security group associated with the Redshift cluster does not have an inbound rule allowing TCP traffic on port 5439 from the security group or IP address of the EC2 instance. Since the engineer can ping the EC2 instance (ICMP works), but cannot connect to Redshift on port 5439, this points to a firewall or security group rule blocking the specific port, not a network reachability issue.
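The difference between ICMP reachability and TCP port-level connectivity can be checked with a short probe. This is a minimal sketch using only the standard library; the function name and the 3-second timeout are assumptions.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `tcp_port_open(endpoint, 5439)` from the EC2 instance, where `endpoint` is the cluster's endpoint address, returns `False` when a security group drops the traffic, even though a ping to another host in the VPC succeeds.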

Exam trap

AWS often tests the distinction between ICMP reachability (ping) and TCP port-level connectivity, leading candidates to overlook security group rules when they see successful ping results.

How to eliminate wrong answers

Option B is wrong because if the Redshift cluster were in a different VPC, the engineer would not be able to ping the EC2 instance from the same VPC context, and VPC peering or transit gateway would be required; the question states they are in the same VPC. Option C is wrong because if the cluster were not in an available state, the engineer would likely receive a different error (e.g., 'cluster not found' or connection timeout), and the question does not indicate any cluster status issues. Option D is wrong because the Redshift cluster is in the same VPC as the EC2 instance, so public accessibility and an internet gateway are not required; traffic stays within the VPC and uses private IPs.

Practice this question →

429

Multi-Selecteasy

A company is evaluating Amazon DynamoDB for a new application. The application requires single-digit millisecond latency for read and write operations. Which TWO DynamoDB features should the company enable to achieve this? (Choose TWO.)

Select 2 answers

A.Use DAX with Write-Through caching.

B.Enable DynamoDB Global Tables.

C.Enable DynamoDB Streams.

D.Enable DynamoDB Accelerator (DAX).

E.Enable auto-scaling for read and write capacity.

AnswersA, D

Why this answer

Option A is correct because DAX with Write-Through caching ensures that every write to DynamoDB is also written to the DAX cache, so subsequent reads of the same item are served from the in-memory cache with single-digit millisecond latency. Option D is correct because DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache that reduces read response times from single-digit milliseconds to microseconds, directly meeting the latency requirement for read operations.

Exam trap

The trap here is that candidates often confuse DynamoDB Accelerator (DAX) with DynamoDB Global Tables, assuming that multi-region replication improves local latency, when in fact DAX is the only feature that provides an in-memory cache for single-digit millisecond reads within a region.

Practice this question →

430

Multi-Selectmedium

Which TWO actions can improve query performance on an Amazon Redshift cluster? (Choose two.)

Select 2 answers

A.Define appropriate sort keys

B.Increase the number of nodes

C.Use EVEN distribution style for all tables

D.Use columnar compression

E.Run VACUUM command regularly

AnswersA, D

Sort keys reduce the amount of data scanned.

Why this answer

Options A and D are correct because sort keys improve data organization and compression reduces I/O. Option C (EVEN distribution for all tables) affects how data is distributed but does not directly improve query performance. Option B (increasing the number of nodes) adds cost.

Option E (VACUUM) reclaims space but does not directly improve query speed.

Practice this question →

431

Multi-Selecthard

A data engineer is designing a data lake on Amazon S3 for analytics. The data includes sensitive PII that must be encrypted at rest. The company requires that the encryption keys be managed by the company's own hardware security module (HSM) and rotated every 90 days. Which TWO options meet these requirements? (Choose TWO.)

Select 2 answers

A.Use S3 server-side encryption with AWS KMS and an AWS managed key

B.Use S3 server-side encryption with customer-provided keys (SSE-C)

C.Use client-side encryption with keys stored in AWS Secrets Manager

D.Use S3 server-side encryption with AWS KMS (SSE-KMS) and a customer-managed key with imported key material from your HSM

E.Use S3 server-side encryption with S3 managed keys (SSE-S3)

AnswersB, D

SSE-C allows you to supply your own encryption keys, which you can rotate by re-encrypting objects.

Why this answer

Option B is correct because SSE-C allows you to provide your own encryption keys, which can be managed and rotated from your own HSM. The keys are used server-side by S3 to encrypt objects at rest, but S3 does not store the keys—you manage them entirely, meeting the requirement for key management on your own HSM with 90-day rotation.

Exam trap

The trap here is that candidates often assume only SSE-KMS can meet key management requirements, but they overlook that SSE-C directly supports customer-supplied keys from an HSM without any AWS key storage, and that SSE-KMS with imported key material also satisfies the HSM and rotation needs when properly configured.

Practice this question →

432

MCQeasy

A data engineer needs to store semi-structured JSON log files from multiple sources and query them using SQL. The data is rarely updated and access frequency is low. Which storage solution is MOST cost-effective?

A.Amazon Redshift with JSON ingestion and compression.

B.Amazon DynamoDB with JSON documents.

C.Amazon S3 with Amazon Athena for querying.

D.Amazon RDS for PostgreSQL with JSONB columns.

AnswerC

S3 provides cheap storage and Athena allows serverless SQL queries, ideal for low-frequency access.

Why this answer

Amazon S3 with Athena is the most cost-effective solution because the data is semi-structured JSON, rarely updated, and accessed infrequently. S3 provides low-cost storage for static data, and Athena uses a serverless, pay-per-query model, eliminating the need for a running cluster or provisioned capacity. This combination avoids the fixed costs of Redshift, DynamoDB, or RDS, making it ideal for low-frequency SQL querying of archival logs.

Exam trap

The trap here is that candidates often choose Redshift or RDS because they associate SQL querying with traditional databases, overlooking that Athena's serverless, pay-per-query model is far more cost-effective for infrequent access to static data stored in S3.

How to eliminate wrong answers

Option A is wrong because Amazon Redshift requires a provisioned cluster with ongoing compute costs, making it overkill and expensive for rarely accessed data; its JSON ingestion and compression do not offset the fixed infrastructure cost. Option B is wrong because Amazon DynamoDB is a NoSQL key-value store optimized for high-frequency, low-latency reads/writes, not for SQL-based ad-hoc querying of large JSON logs; its on-demand capacity mode still incurs per-request charges that are wasteful for infrequent access. Option D is wrong because Amazon RDS for PostgreSQL with JSONB columns requires a provisioned database instance with continuous compute and storage costs, and while JSONB supports indexing, it is not cost-effective for rarely queried, static log data compared to S3's pay-per-byte storage and Athena's pay-per-query model.

Practice this question →

433

MCQeasy

A company is migrating its on-premises MySQL database to Amazon RDS for MySQL. They want to minimize downtime and ensure data consistency. Which AWS service should be used for the migration?

A.AWS S3 Transfer Acceleration

B.AWS Glue

C.AWS Database Migration Service (DMS)

D.AWS Snowball Edge

AnswerC

DMS supports continuous replication and minimal downtime for database migrations.

Why this answer

AWS Database Migration Service (DMS) is the correct choice because it is specifically designed for migrating databases to AWS with minimal downtime. DMS supports homogeneous migrations like MySQL to Amazon RDS for MySQL, and it uses ongoing replication (change data capture) to keep the source and target databases in sync during the migration, ensuring data consistency and allowing a cutover with only seconds of downtime.

Exam trap

The trap here is that candidates may confuse AWS DMS with AWS Glue, thinking both are for data migration, but Glue is for batch ETL and cannot perform live database replication with minimal downtime, while DMS is purpose-built for that task.

How to eliminate wrong answers

Option A is wrong because AWS S3 Transfer Acceleration is a service that speeds up uploads to Amazon S3 by using optimized network paths and edge locations; it has no capability to migrate or replicate a live MySQL database to RDS. Option B is wrong because AWS Glue is a serverless data integration service for ETL (extract, transform, load) jobs, primarily used for preparing and transforming data for analytics, not for ongoing database replication or minimizing downtime during a live database migration. Option D is wrong because AWS Snowball Edge is a physical data transport device used for large-scale data transfers over slow or unreliable networks, but it is not suitable for minimizing downtime in a live database migration as it involves shipping hardware and cannot perform continuous replication.

Practice this question →

434

MCQmedium

A data engineer reviews this IAM policy attached to an S3 bucket. What is the effect of this policy?

A.Denies PutObject when encryption is not SSE-KMS.

B.Denies all PutObject requests.

C.Allows PutObject only when the object is encrypted with SSE-KMS.

D.Allows PutObject only when the object is NOT encrypted with SSE-KMS.

AnswerD

The condition StringNotEquals allows only if encryption is not KMS.

Why this answer

Option D is correct because the IAM policy uses a `Deny` effect with a `StringNotEquals` condition on `s3:x-amz-server-side-encryption` for `aws:kms`. This means that if the encryption header is NOT `aws:kms` (i.e., SSE-KMS), the request is denied. The net effect is that `PutObject` is allowed only when the object is encrypted with SSE-KMS, as any other encryption or no encryption triggers the deny.

Exam trap

The trap here is that candidates misread `StringNotEquals` as `StringEquals`, thinking the policy denies SSE-KMS encryption, when in fact it denies everything except SSE-KMS.

How to eliminate wrong answers

Option A is wrong because the policy denies `PutObject` when encryption is NOT SSE-KMS, not when it is SSE-KMS; the condition `StringNotEquals` denies non-matching values. Option B is wrong because the policy does not deny all `PutObject` requests; it only denies those that do not meet the encryption condition, allowing those with SSE-KMS. Option C is wrong because the policy uses a `Deny` effect, not an `Allow`; it does not explicitly allow `PutObject` with SSE-KMS, but rather denies everything else, making SSE-KMS the only permitted case.

Practice this question →

435

MCQmedium

A company uses Amazon S3 to store sensitive documents. The security team has mandated that all objects must be encrypted at rest using server-side encryption with AWS KMS (SSE-KMS). Additionally, the company wants to ensure that any attempt to upload an unencrypted object is denied. A data engineer has configured a bucket policy that denies PutObject if the encryption header does not include x-amz-server-side-encryption: aws:kms. However, the engineer notices that some objects are still being stored without encryption. Upon investigation, the engineer suspects that the policy is not being evaluated correctly. What should the engineer do to ensure that all objects are encrypted with SSE-KMS?

A.Use an IAM policy to require encryption instead of a bucket policy.

B.Enable S3 Block Public Access settings.

C.Add a condition to the bucket policy that checks for aws:SourceVpce.

D.Enable default encryption on the S3 bucket with SSE-KMS.

AnswerD

Default encryption ensures all objects are encrypted, complementing the bucket policy.

Why this answer

Option D is correct because enabling default encryption on the S3 bucket with SSE-KMS ensures that any object uploaded without an explicit encryption header is automatically encrypted with SSE-KMS. This closes the gap where the bucket policy condition fails to catch uploads that omit the `x-amz-server-side-encryption` header entirely, as the policy only denies requests with an incorrect header but does not block requests that lack the header altogether. Default encryption applies server-side encryption at the bucket level, making it impossible to store an unencrypted object.

Exam trap

The trap here is that candidates assume a bucket policy condition denying PutObject without `x-amz-server-side-encryption: aws:kms` will block all unencrypted uploads, but they overlook that the condition only matches when the header is present with a wrong value, not when the header is absent entirely.

How to eliminate wrong answers

Option A is wrong because IAM policies cannot enforce encryption requirements on S3 PutObject operations as effectively as bucket policies; IAM policies lack the ability to condition on S3-specific headers like `x-amz-server-side-encryption`, and they apply to users/roles rather than the bucket itself, leaving gaps for anonymous or cross-account uploads. Option B is wrong because S3 Block Public Access settings only prevent public access to objects and buckets, not encryption enforcement; they have no effect on whether objects are encrypted at rest. Option C is wrong because checking for `aws:SourceVpce` restricts access based on VPC endpoint origin, which is unrelated to encryption requirements and would not prevent unencrypted uploads from other sources.

Practice this question →

436

Multi-Selectmedium

Which TWO actions can help improve the read performance of an Amazon DynamoDB table that is experiencing throttling? (Choose two.)

Select 2 answers

A.Enable DynamoDB Accelerator (DAX) for write-heavy workloads.

B.Increase the provisioned read capacity units (RCUs) for the table.

C.Use eventually consistent reads instead of strongly consistent reads.

D.Add a global secondary index (GSI) with a different partition key.

E.Enable auto-scaling with a lower minimum capacity.

AnswersB, D

More RCUs allow higher throughput.

Why this answer

Options B and D are correct. Increasing read capacity directly addresses throttling, and adding a GSI distributes read load. Option A targets write-heavy workloads, not read throttling.

Option C reduces latency but not throttling. Option E is for cost reduction.

Practice this question →

437

MCQeasy

A data engineer runs the above SQL commands on an Amazon Redshift cluster. The table 'users' is created with DISTSTYLE EVEN. What is the effect of the DISTSTYLE EVEN on query performance?

A.It stores all data on a single node for fast local queries.

B.It ensures data is evenly distributed across all nodes to prevent data skew.

C.It reduces data movement during queries by co-locating data based on user_id.

D.It improves join performance when joining on user_id.

AnswerB

EVEN distribution spreads rows evenly, avoiding skew.

Why this answer

Option B is correct because DISTSTYLE EVEN distributes rows evenly across nodes, which is beneficial when there is no clear join or aggregation key. Option A is incorrect because EVEN spreads rows across all slices rather than storing them on a single node. Option C is incorrect because EVEN distribution does not co-locate rows by user_id, so it does not reduce data movement.

Option D is incorrect because EVEN distribution does not optimize for joins on user_id.

Practice this question →

438

MCQhard

A data engineer is troubleshooting an Amazon Redshift cluster that is experiencing slow query performance. The engineer notices that the disk space is heavily utilized and queries are spilling to disk. What is the most cost-effective solution to improve performance?

A.Run VACUUM command to reclaim space

B.Change distribution style to KEY

C.Resize the cluster to a larger node type or add nodes

D.Apply compression encoding to tables

AnswerC

Adding memory and disk reduces spilling to disk.

Why this answer

When queries spill to disk due to heavy disk utilization, the root cause is insufficient memory or compute capacity relative to the workload. Resizing the cluster (adding nodes or moving to a larger node type) directly increases available memory and CPU, reducing or eliminating disk spill and improving query performance. This is the most cost-effective solution because it scales resources proportionally without requiring manual tuning or schema changes.

Exam trap

The trap here is that candidates confuse disk space management (VACUUM, compression) with memory/query execution issues, leading them to choose storage optimization options when the real bottleneck is insufficient compute resources.

How to eliminate wrong answers

Option A is wrong because VACUUM reclaims space from deleted rows but does not increase memory or reduce disk spill; it only reorganizes existing data. Option B is wrong because changing distribution style (e.g., to KEY) optimizes data redistribution for joins but does not address insufficient memory or disk spill. Option D is wrong because applying compression encoding reduces storage footprint and I/O, but does not increase memory or compute capacity to prevent queries from spilling to disk.

Practice this question →

439

Multi-Selectmedium

A company uses AWS Glue to catalog data stored in Amazon S3. The data is in Parquet format and partitioned by date. The company wants to improve query performance in Amazon Athena and reduce costs. Which THREE actions should the company take? (Choose THREE.)

Select 3 answers

A.Convert the data to JSON format for better schema evolution.

B.Use Glue DataBrew to clean the data before querying.

C.Partition the data by date so Athena can use partition pruning.

D.Ensure the data is in a columnar format like Parquet or ORC.

E.Compress the data using a codec like Snappy or Gzip.

AnswersC, D, E

Partition pruning limits the amount of data scanned per query.

Why this answer

Using columnar formats like Parquet improves performance and reduces scanned data. Partitioning by date allows Athena to prune partitions. Compressing data reduces storage and scan costs.

Converting to JSON would hurt performance. Increasing partition count may help but is not a guarantee. Using Glue DataBrew is for data preparation, not performance optimization.
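The combined effect of partition pruning and a columnar format can be approximated with a toy estimate of bytes scanned. Everything here is an assumption for illustration: the 30 daily partitions of 1 GiB each, the idea that a query touching 2 of 10 equally sized columns reads 20% of a partition, and the function name.

```python
def bytes_scanned(partition_sizes: dict[str, int], wanted_dates: set[str],
                  column_fraction: float = 1.0) -> int:
    """Estimate bytes read: matching partitions only, selected columns' share."""
    pruned = sum(
        size for day, size in partition_sizes.items() if day in wanted_dates
    )
    return int(pruned * column_fraction)


# Hypothetical table: 30 daily partitions of 1 GiB each.
GIB = 1024 ** 3
sizes = {f"2024-01-{d:02d}": GIB for d in range(1, 31)}

full_scan = bytes_scanned(sizes, set(sizes))
one_day_two_of_ten_columns = bytes_scanned(
    sizes, {"2024-01-15"}, column_fraction=0.2
)
print(full_scan, one_day_two_of_ten_columns)
```

Because Athena bills by data scanned, a smaller estimate here corresponds to both a faster and a cheaper query; compression would shrink both figures further.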

Practice this question →

440

MCQmedium

A company uses Amazon Redshift for data warehousing. The data team notices that queries are slow due to high disk usage on the cluster. They need to free up space without deleting any data. What should they do?

A.Change the table's sort keys

B.Run a deep copy to re-sort and reclaim space

C.Run VACUUM command

D.Add more nodes to the cluster

AnswerB

Deep copy reorganizes data and reclaims disk space effectively.

Why this answer

Option B is correct because a deep copy reorganizes data and reclaims disk space. Option C (VACUUM) reclaims space from deleted rows but does not reorganize. Option D (adding nodes) adds cost.

Option A (changing sort keys) requires a table redesign.

Practice this question →

441

MCQeasy

A company wants to enforce that all data in an S3 bucket is encrypted at rest using AWS KMS. Which bucket policy condition key should be used?

A.s3:x-amz-acl with value bucket-owner-full-control

B.s3:x-amz-server-side-encryption with value aws:kms

C.s3:x-amz-server-side-encryption with value AES256

D.aws:SourceIp with value 10.0.0.0/8

AnswerB

Enforces SSE-KMS.

Why this answer

Condition key 's3:x-amz-server-side-encryption' with value 'aws:kms' enforces KMS encryption. 's3:x-amz-server-side-encryption-aws-kms-key-id' is for specific key ID.
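A bucket policy using this condition key might look like the sketch below, built as a Python dictionary. The bucket name and statement ID are hypothetical, and this is one common way to express the rule rather than the only one.

```python
import json


def require_sse_kms_policy(bucket_name: str) -> dict:
    """Bucket policy that denies PutObject unless the request asks for SSE-KMS."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyNonKmsUploads",  # hypothetical statement ID
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{bucket_name}/*",
            # Deny + StringNotEquals: anything other than aws:kms is rejected.
            "Condition": {
                "StringNotEquals": {
                    "s3:x-amz-server-side-encryption": "aws:kms"
                }
            },
        }],
    }


print(json.dumps(require_sse_kms_policy("example-bucket"), indent=2))
```

Swapping the condition key for `s3:x-amz-server-side-encryption-aws-kms-key-id` and supplying a key ARN as the value would restrict uploads to one specific KMS key instead.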

Practice this question →

442

MCQmedium

A company is using Amazon RDS for MySQL with Multi-AZ deployment. The database experiences intermittent slowdowns during peak hours. The company's DevOps team suspects that the primary instance is overwhelmed. Which action should the team take to distribute the read load without changing the application code?

A.Increase the instance size of the RDS instance.

B.Create a read replica and modify the connection string to point to the replica for read queries.

C.Enable Multi-AZ on the existing instance.

D.Configure DynamoDB Accelerator (DAX) in front of the RDS instance.

AnswerB

Read replicas offload read traffic from the primary instance.

Why this answer

Creating a read replica and modifying the connection string to point to the replica for read queries (Option B) offloads read traffic from the primary RDS instance without requiring application code changes. This directly addresses the intermittent slowdowns during peak hours by distributing the read load, leveraging MySQL’s native replication to keep the replica synchronized. The key constraint is 'without changing the application code,' which is satisfied by simply updating the connection string in the application configuration.

Exam trap

The trap here is that candidates confuse Multi-AZ with read replicas, assuming Multi-AZ can distribute read traffic, but in RDS for MySQL, the standby in a Multi-AZ deployment is not accessible for reads—it only provides failover support.

How to eliminate wrong answers

Option A is wrong because increasing the instance size scales vertically, which does not distribute the read load; it only provides more resources to a single instance, which may still be overwhelmed during peak hours and does not leverage Multi-AZ or read replicas. Option C is wrong because Multi-AZ is already enabled (as stated in the question) and its purpose is high availability and failover, not read load distribution; the standby instance in Multi-AZ cannot serve read traffic. Option D is wrong because DynamoDB Accelerator (DAX) is an in-memory cache for Amazon DynamoDB, not for Amazon RDS for MySQL; it cannot be placed in front of an RDS instance and would require significant application code changes to integrate.

Practice this question →

443

Matchingmedium

Match each AWS monitoring tool to its primary use.

Drag a concept onto its matching description — or click a concept then click the description.

Concepts

Matches

Metrics, logs, and alarms

API call history and auditing

Trace and analyze distributed applications

Event-driven automation

Resource configuration tracking

Why these pairings

Monitoring tools help maintain data pipelines.

444

Multi-Select · medium

A data engineer is evaluating storage options for a new application that requires low-latency access to unstructured blobs (up to 5 TB each) with high throughput. The data will be accessed frequently for the first 30 days and then rarely. Which TWO storage solutions meet these requirements? (Choose TWO)

Select 2 answers

A. Amazon S3 with lifecycle policies

B. Amazon EBS with io2 Block Express volumes

C. Amazon EFS

D. Amazon RDS for PostgreSQL

E. Amazon FSx for Lustre

Answers: A, E

S3 can handle large objects and lifecycle policies automate transitions to cost-optimized storage.

Why this answer

Amazon S3 with lifecycle policies is correct because S3 provides low-latency access to unstructured blobs (up to 5 TB each) with high throughput, and lifecycle policies can automatically transition objects to colder storage tiers (e.g., S3 Glacier Deep Archive) after 30 days, matching the access pattern of frequent then rare access.

Exam trap

The trap here is that candidates may confuse block storage (EBS) or file storage (EFS) with object storage (S3), or overlook that lifecycle policies are the key to handling the 'frequent then rare' access pattern, leading them to choose EBS or EFS for blob storage.
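
As a rough illustration of the "frequent for 30 days, then rare" pattern, the sketch below builds a lifecycle configuration in the dictionary shape accepted by boto3's `put_bucket_lifecycle_configuration`. The rule ID, the whole-bucket prefix filter, the helper name `build_lifecycle_config`, and the default target class are assumptions made for this example; `STANDARD_IA` could be swapped for `DEEP_ARCHIVE` if retrieval delays are acceptable.

```python
import json


def build_lifecycle_config(days_until_cold: int = 30,
                           storage_class: str = "STANDARD_IA") -> dict:
    """Return a lifecycle configuration that moves objects to a colder class."""
    return {
        "Rules": [
            {
                "ID": "move-to-cold-after-hot-window",  # hypothetical rule name
                "Filter": {"Prefix": ""},               # empty prefix = whole bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_until_cold, "StorageClass": storage_class}
                ],
            }
        ]
    }


config = build_lifecycle_config()
print(json.dumps(config, indent=2))

# Applying it needs AWS credentials; the bucket name is a placeholder:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=config)
```

The configuration is built separately from the API call so it can be reviewed or unit-tested before anything is applied to a bucket.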

445

MCQ · medium

A company has an S3 bucket with millions of objects. The data engineer needs to identify which objects have not been accessed for 90 days so they can be moved to a lower-cost storage class. Which feature should be used?

A. S3 Storage Class Analysis

B. S3 Inventory

C. S3 Server Access Logs

D. S3 Event Notifications

Answer: A

It analyzes access patterns and provides recommendations for lifecycle transitions.

Why this answer

S3 Storage Class Analysis is the correct feature because it observes access patterns over time and recommends when to transition data to a lower-cost storage class. It reports how much data in each object-age group is actually retrieved, so it shows whether data older than 90 days is rarely accessed. The resulting report informs lifecycle policy creation, directly addressing the requirement to identify data for cost optimization.

Exam trap

The trap here is that candidates often confuse S3 Inventory (which lists objects) with S3 Storage Class Analysis (which analyzes access patterns), assuming that a list of objects is sufficient to determine access frequency, but Inventory lacks the temporal access data needed for this task.

How to eliminate wrong answers

Option B (S3 Inventory) is wrong because it provides a flat list of all objects and their metadata (e.g., size, storage class) but does not track access patterns or last-accessed timestamps, so it cannot identify objects unused for 90 days. Option C (S3 Server Access Logs) is wrong because it records detailed request-level logs (e.g., requester, operation, timestamp) but requires custom parsing and aggregation to derive last-access dates, and it does not natively provide a summary of objects not accessed for a specific period. Option D (S3 Event Notifications) is wrong because it triggers real-time events for object operations (e.g., PUT, POST, DELETE) but does not store historical access data or analyze access patterns over time, making it unsuitable for identifying long-unused objects.
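
To make the "custom parsing and aggregation" cost of Option C concrete, here is a minimal sketch of the aggregation step alone. It assumes the access logs have already been parsed into simplified `(key, timestamp)` tuples, which is not the real server access log format; the function names and sample records are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def last_access_by_key(records):
    """Map each object key to its most recent recorded access time."""
    latest = {}
    for key, ts in records:
        if key not in latest or ts > latest[key]:
            latest[key] = ts
    return latest


def idle_keys(records, now, max_idle_days=90):
    """Return keys whose latest recorded access is older than the cutoff."""
    cutoff = now - timedelta(days=max_idle_days)
    latest = last_access_by_key(records)
    return sorted(k for k, ts in latest.items() if ts < cutoff)


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
records = [
    ("reports/a.csv", datetime(2024, 1, 5, tzinfo=timezone.utc)),
    ("reports/a.csv", datetime(2024, 5, 20, tzinfo=timezone.utc)),
    ("raw/b.json", datetime(2024, 2, 1, tzinfo=timezone.utc)),
]
print(idle_keys(records, now))  # ['raw/b.json']
```

Objects that were never requested during the logging window do not appear in the logs at all, so a real pipeline would also have to join against an object listing. That extra work is part of why the logs are the higher-effort option.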

446

MCQ · easy

A data engineer needs to store time-series data from IoT devices. The data is write-heavy and requires low-latency queries by device ID and timestamp. The data volume is expected to grow to terabytes. Which AWS database service is most suitable?

A. Amazon RDS for MySQL

B. Amazon ElastiCache for Redis

C. Amazon DynamoDB

D. Amazon Timestream

Answer: D

Timestream is designed for time-series data.

Why this answer

Amazon Timestream is purpose-built for time-series data, offering automatic tiered storage (in-memory for recent data and magnetic for historical) to handle write-heavy IoT workloads at scale. It supports low-latency queries by device ID and timestamp via its SQL-compatible query engine, making it the most suitable choice for terabytes of time-series data.

Exam trap

The trap here is that candidates often choose DynamoDB (Option C) because of its high write throughput and low-latency queries, but they overlook the lack of native time-series optimizations, leading to complex manual partitioning and TTL management that Timestream handles automatically.

How to eliminate wrong answers

Option A is wrong because Amazon RDS for MySQL is a relational database optimized for OLTP workloads with structured queries, not for the high-volume, write-heavy, time-series pattern that requires automatic data retention policies and time-based partitioning. Option B is wrong because Amazon ElastiCache for Redis is an in-memory cache designed for sub-millisecond read/write performance on hot data, but it cannot cost-effectively store terabytes of data and lacks native time-series query optimizations like downsampling and interpolation. Option C is wrong because Amazon DynamoDB is a key-value and document database that can handle high write throughput, but it does not have built-in time-series functions (e.g., time-based aggregation, retention policies) and requires manual partitioning and TTL management to handle time-series data efficiently at terabyte scale.
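
The "manual partitioning and TTL management" mentioned for DynamoDB might look roughly like the sketch below, which shapes one reading as a plain dictionary without calling AWS. The key layout (`device_id#date` as partition key, ISO timestamp as sort key), the attribute names `pk`, `sk`, and `expires_at`, and the 30-day retention are all assumptions for illustration. DynamoDB TTL does expect a numeric attribute holding epoch seconds.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30  # assumed retention window, not taken from the question


def build_item(device_id: str, ts: datetime, reading: float) -> dict:
    """Shape one reading as a DynamoDB-style item (plain dict, no AWS call)."""
    return {
        # One partition per device per day keeps partitions from growing unbounded.
        "pk": f"{device_id}#{ts:%Y-%m-%d}",
        # ISO-8601 strings in a single time zone sort chronologically.
        "sk": ts.isoformat(),
        # Note: boto3 would need Decimal rather than float for this value.
        "reading": reading,
        # TTL attribute: epoch seconds after which the item may be expired.
        "expires_at": int((ts + timedelta(days=RETENTION_DAYS)).timestamp()),
    }


item = build_item("sensor-42",
                  datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc),
                  21.5)
print(item)
```

Every piece of this (bucket granularity, sort key format, expiry arithmetic) is application code that has to be designed and maintained, which is the overhead the explanation says Timestream removes.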

447

MCQ · hard

A company has an Amazon RDS for MySQL database that is experiencing performance issues due to a large number of read requests. The application is read-heavy and can tolerate eventually consistent reads. Which action will reduce the load on the primary database with the least operational overhead?

A. Create a read replica in the same region

B. Use Amazon ElastiCache for caching

C. Enable Multi-AZ deployment

D. Increase the instance size of the primary DB

Answer: A

Offloads read traffic with minimal overhead.

Why this answer

Creating a read replica offloads read queries from the primary, reducing load. Multi-AZ is for high availability, not read scaling. Caching or increasing instance size may help but read replica is simplest for read-heavy workloads.

448

MCQ · easy

A data engineer needs to store semi-structured JSON data that is accessed infrequently but requires millisecond retrieval latency. The data is immutable once written. Which AWS service is most cost-effective?

A. Amazon DynamoDB with on-demand capacity

B. Amazon ElastiCache for Redis

C. Amazon RDS for PostgreSQL with JSONB

D. Amazon S3 (Standard-IA) with S3 Select

Answer: D

S3 Select can retrieve subsets of JSON data efficiently, and Standard-IA is cost-effective for infrequent access.

Why this answer

Option D is correct because S3 with S3 Select can retrieve specific JSON fields with low latency for infrequent access, and Standard-IA reduces cost. Option A (DynamoDB) is for frequent access patterns. Option B (ElastiCache) is for caching, not durable storage.

Option C (RDS) is for relational data.

449

MCQ · hard

A company uses Amazon DynamoDB to store session data for a web application. The application experiences occasional spikes in traffic, causing throttling on the table. The data engineer needs to implement a solution that handles traffic spikes without manual intervention and minimizes cost. What should the data engineer do?

A. Switch to provisioned capacity with a high fixed read/write capacity.

B. Implement DynamoDB Accelerator (DAX) to cache read requests.

C. Purchase DynamoDB reserved capacity.

D. Enable DynamoDB Auto Scaling for the table.

Answer: D

Auto Scaling adjusts capacity automatically to handle spikes and minimize cost.

Why this answer

DynamoDB Auto Scaling automatically adjusts read and write capacity based on traffic patterns, handling spikes without manual intervention. Option A is wrong because provisioned capacity with fixed values would either underprovision (causing throttling) or overprovision (wasting cost). Option B is wrong because DynamoDB Accelerator (DAX) is a caching layer that reduces read load but doesn't handle write spikes and adds cost.

Option C is wrong because reserved capacity offers discounts but doesn't handle spikes automatically.
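
For a sense of what enabling Auto Scaling involves, the sketch below builds the two request payloads used with the Application Auto Scaling API: one to register the table's write capacity as a scalable target, and one for a target-tracking policy. The table name, capacity bounds, 70% target utilization, policy name, and the helper `build_autoscaling_requests` are assumptions for this example; nothing is sent to AWS.

```python
def build_autoscaling_requests(table_name: str, min_wcu: int, max_wcu: int,
                               target_utilization: float = 70.0):
    """Return (scalable_target, scaling_policy) dicts for a table's write capacity."""
    resource_id = f"table/{table_name}"
    dimension = "dynamodb:table:WriteCapacityUnits"

    scalable_target = {
        "ServiceNamespace": "dynamodb",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_wcu,   # floor keeps cost low during quiet periods
        "MaxCapacity": max_wcu,   # ceiling caps spend during spikes
    }
    scaling_policy = {
        "PolicyName": f"{table_name}-write-target-tracking",  # hypothetical name
        "ServiceNamespace": "dynamodb",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_utilization,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "DynamoDBWriteCapacityUtilization"
            },
        },
    }
    return scalable_target, scaling_policy


target, policy = build_autoscaling_requests("sessions", min_wcu=5, max_wcu=500)
print(target["ResourceId"], policy["PolicyType"])

# Applying them needs AWS credentials:
# client = boto3.client("application-autoscaling")
# client.register_scalable_target(**target)
# client.put_scaling_policy(**policy)
```

A matching pair of payloads with `dynamodb:table:ReadCapacityUnits` and `DynamoDBReadCapacityUtilization` would cover the read side.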

450

Multi-Select · hard

A company is migrating a legacy data warehouse to Amazon Redshift. They need to choose a distribution style to minimize data movement during joins. Which THREE factors should they consider?

Select 3 answers

A. The size of the table (number of rows).

B. The join frequency with other tables on specific columns.

C. The number of columns in the table.

D. Whether the table is a fact or dimension table.

E. The data type of the distribution key column.

Answers: A, B, D

Large tables need careful distribution to avoid skew.

Why this answer

Option A is correct because the size of the table (number of rows) directly influences the distribution strategy. In Amazon Redshift, large tables benefit from a distribution style that evenly distributes rows across slices to avoid data skew, which can cause performance bottlenecks during joins. Choosing a distribution key that aligns with the join columns minimizes data movement, but the table size determines whether an ALL distribution (for small tables) or a KEY distribution (for large tables) is more appropriate to reduce shuffling.

Exam trap

The trap here is that candidates may overthink irrelevant table properties like column count or data types, while the core considerations for minimizing data movement are table size, join frequency, and table role (fact vs. dimension).
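
The three factors can be combined into a simple rule of thumb, sketched below. This is an illustrative heuristic only, not Redshift's own logic: the row-count cutoff, the function name `suggest_dist_style`, and the example column names are hypothetical. The returned strings use the `DISTSTYLE` and `DISTKEY` table attributes from Redshift's `CREATE TABLE` syntax.

```python
from typing import Optional

SMALL_TABLE_ROWS = 1_000_000  # hypothetical cutoff for "small enough to copy everywhere"


def suggest_dist_style(row_count: int, is_dimension: bool,
                       frequent_join_column: Optional[str]) -> str:
    """Suggest a distribution clause from table size, role, and join pattern."""
    # Small dimension tables: copy to every node so joins need no redistribution.
    if is_dimension and row_count <= SMALL_TABLE_ROWS:
        return "DISTSTYLE ALL"
    # Tables joined often on one column: co-locate matching rows on that key.
    if frequent_join_column:
        return f"DISTSTYLE KEY DISTKEY ({frequent_join_column})"
    # No dominant join column: spread rows evenly to avoid skew.
    return "DISTSTYLE EVEN"


print(suggest_dist_style(50_000, True, "region_id"))
print(suggest_dist_style(2_000_000_000, False, "customer_id"))
print(suggest_dist_style(5_000_000, False, None))
```

Note that neither the column count nor the key's data type appears as an input, which mirrors why options C and E are distractors.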
