CCNA Data Store Management Questions

75 of 456 questions · Page 2/7 · Data Store Management

76
Multi-Select · medium

Which THREE of the following are valid storage classes in Amazon S3? (Choose THREE.)

Select 3 answers
A. S3 Standard
B. S3 Archive
C. S3 Intelligent-Tiering
D. S3 Cold
E. S3 One Zone-IA
Answers: A, C, E

S3 Standard is a general-purpose storage class.

Why this answer

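S3 Standard is a valid storage class designed for frequently accessed data with low latency and high throughput. It offers 99.999999999% durability and 99.99% availability, making it suitable for a wide range of use cases such as cloud applications, dynamic websites, and content distribution. S3 Intelligent-Tiering is also valid: it moves objects between access tiers automatically as access patterns change. S3 One Zone-IA is valid as well: it stores infrequently accessed data in a single Availability Zone at a lower cost.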

Exam trap

AWS often tests the distinction between valid S3 storage classes and fabricated names like 'S3 Archive' or 'S3 Cold', expecting candidates to recall the exact naming conventions (e.g., S3 Glacier, S3 Glacier Deep Archive) rather than generic terms.
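To make the naming distinction concrete, here is a minimal sketch that maps console names to the `StorageClass` values the S3 API accepts (for example in boto3's `put_object`). The mapping covers the common classes only and is not exhaustive; the helper name `api_storage_class` is hypothetical.

```python
# Console name -> StorageClass value used by the S3 API (common classes only).
S3_STORAGE_CLASSES = {
    "S3 Standard": "STANDARD",
    "S3 Intelligent-Tiering": "INTELLIGENT_TIERING",
    "S3 Standard-IA": "STANDARD_IA",
    "S3 One Zone-IA": "ONEZONE_IA",
    "S3 Glacier Instant Retrieval": "GLACIER_IR",
    "S3 Glacier Flexible Retrieval": "GLACIER",
    "S3 Glacier Deep Archive": "DEEP_ARCHIVE",
}


def api_storage_class(console_name: str) -> str:
    """Return the API value for a real storage class; raise for invented names."""
    try:
        return S3_STORAGE_CLASSES[console_name]
    except KeyError:
        raise ValueError(f"not an S3 storage class: {console_name!r}") from None
```

Usage would look like `s3.put_object(Bucket=..., Key=..., Body=..., StorageClass=api_storage_class("S3 One Zone-IA"))`; the fabricated names from options B and D raise instead of silently mapping to something.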

77
Multi-Select · medium

A data engineer is designing a disaster recovery strategy for an Amazon RDS for PostgreSQL database. The primary database is in us-east-1. Which TWO approaches provide cross-region disaster recovery?

Select 2 answers
A. Configure cross-region automated backups to copy to us-west-2.
B. Take a manual snapshot and copy it to us-west-2 daily.
C. Use Amazon S3 cross-region replication for the database export.
D. Enable Multi-AZ in us-east-1.
E. Create a cross-region read replica in us-west-2.
Answers: A, E

Backups are automatically copied and can be restored.

Why this answer

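Option E is correct because a cross-region read replica can be promoted to a standalone primary in us-west-2. Option A is correct because cross-region automated backups are replicated to us-west-2 and can be restored there. Option D is wrong because Multi-AZ protects only against failures within a single region.

Option B is wrong because manual snapshots stay in their source region unless explicitly copied, so a daily copy is a manual, error-prone process. Option C is wrong because replicating a database export with S3 cross-region replication is not a database recovery mechanism.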
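A minimal sketch of the two correct approaches as boto3 request arguments, assuming an RDS client created in us-west-2. The ARN and identifiers are hypothetical placeholders. A cross-region replica must reference its source by ARN, and boto3 uses `SourceRegion` to presign the request; encrypted instances would additionally need a `KmsKeyId` valid in the destination region.

```python
# Hypothetical source instance ARN; replace with your own.
SOURCE_ARN = "arn:aws:rds:us-east-1:111122223333:db:orders-db"


def replica_request(source_arn: str, replica_id: str) -> dict:
    """Arguments for rds.create_db_instance_read_replica (option E)."""
    return {
        "DBInstanceIdentifier": replica_id,
        "SourceDBInstanceIdentifier": source_arn,  # must be an ARN across regions
        "SourceRegion": "us-east-1",  # boto3 presigns the call for this region
    }


def backup_replication_request(source_arn: str, retention_days: int) -> dict:
    """Arguments for rds.start_db_instance_automated_backups_replication (option A)."""
    return {
        "SourceDBInstanceArn": source_arn,
        "BackupRetentionPeriod": retention_days,
    }
```

Usage: `boto3.client("rds", region_name="us-west-2").create_db_instance_read_replica(**replica_request(SOURCE_ARN, "orders-db-dr"))`.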

78
MCQ · medium

A data engineer is migrating an on-premises Apache HBase workload to Amazon DynamoDB. The application requires strongly consistent reads and the ability to query by a composite key (partition key + sort key). Which DynamoDB table design should be used?

A. Create a table with a partition key and sort key, and use ConsistentRead parameter.
B. Use a secondary index with strongly consistent reads.
C. Use a local secondary index (LSI) with the composite key.
D. Create a global secondary index (GSI) with the composite key.
Answer: A

Table key provides composite key querying; ConsistentRead ensures strong consistency.

Why this answer

DynamoDB natively supports strongly consistent reads when you use the `ConsistentRead` parameter set to `true` on GetItem, Query, or Scan operations. By defining a table with a partition key and sort key, you can directly query by the composite key (partition key + sort key) with strong consistency, meeting both requirements without additional infrastructure.

Exam trap

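The trap here is that candidates assume any secondary index can serve strongly consistent reads. In DynamoDB, global secondary indexes support only eventually consistent reads; local secondary indexes do support strongly consistent reads, but no index is needed when the base table's own composite key already meets the requirement.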

How to eliminate wrong answers

Option B is wrong because it adds an index the requirement does not need, and if that index is a GSI it cannot be read with strong consistency at all. Option C is wrong because a local secondary index (LSI) shares the table's partition key and only provides an alternative sort key; an LSI can be read with strong consistency, but it is unnecessary here because the base table's partition key and sort key already support the required queries. Option D is wrong because a global secondary index (GSI) supports only eventually consistent reads and cannot be used for strongly consistent queries, regardless of the key schema.
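A minimal sketch of option A as arguments for a low-level boto3 DynamoDB `query()` call. The schema (partition key `customer_id`, sort key `ts` holding ISO-8601 strings) and table name are hypothetical. Note there is no `IndexName`: the query runs against the base table, where `ConsistentRead=True` is allowed.

```python
def consistent_query_request(table: str, customer_id: str, start_ts: str, end_ts: str) -> dict:
    """Arguments for client.query() on the base table with a composite key.

    Assumes a hypothetical schema: partition key "customer_id" (S) and
    sort key "ts" (S, ISO-8601 timestamp, so string order is time order).
    """
    return {
        "TableName": table,
        "KeyConditionExpression": "customer_id = :cid AND ts BETWEEN :start AND :end",
        "ExpressionAttributeValues": {
            ":cid": {"S": customer_id},
            ":start": {"S": start_ts},
            ":end": {"S": end_ts},
        },
        # Strong consistency is available on the base table (and LSIs), never on GSIs.
        "ConsistentRead": True,
    }
```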

79
MCQ · hard

A company is using Amazon ElastiCache for Redis to cache frequently accessed data. The cache hit ratio is low, and the engineering team suspects that the eviction policy is causing important data to be removed. Which eviction policy should be used to minimize eviction of the most frequently accessed keys?

A. allkeys-lru
B. allkeys-lfu
C. noeviction
D. volatile-lru
Answer: B

LFU evicts least frequently used keys, retaining popular ones.

Why this answer

Option B is correct because the LFU (Least Frequently Used) eviction policy evicts the keys that are accessed least often, preserving frequently accessed keys. Option A is wrong because allkeys-lru evicts the least recently used keys, which can remove a frequently accessed key that simply has not been accessed recently. Option D is wrong because volatile-lru only considers keys that have a TTL set.

Option C is wrong because noeviction will return errors when memory is full.
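The difference between LRU and LFU can be seen in a small simulation. This is a sketch of exact LRU and LFU on a two-slot cache, not Redis's implementation (Redis approximates both by sampling keys and uses a decaying counter for LFU). In ElastiCache the policy itself is chosen with the `maxmemory-policy` parameter in the cluster's parameter group.

```python
from collections import Counter, OrderedDict


def surviving_keys(policy: str, capacity: int, accesses: list[str]) -> set[str]:
    """Replay accesses through a fixed-size cache and return the keys left in it.

    "lru" evicts the least recently used key; "lfu" evicts the cached key with
    the lowest access count (ties go to the least recently used key).
    """
    cache: OrderedDict = OrderedDict()  # keys ordered from oldest to newest use
    hits: Counter = Counter()
    for key in accesses:
        hits[key] += 1
        if key in cache:
            cache.move_to_end(key)  # a hit refreshes recency
            continue
        if len(cache) >= capacity:
            if policy == "lru":
                cache.popitem(last=False)  # drop the oldest entry
            elif policy == "lfu":
                victim = min(cache, key=lambda k: hits[k])
                del cache[victim]
            else:
                raise ValueError(f"unknown policy: {policy}")
        cache[key] = None
    return set(cache)


# A hot key is read five times, then two one-off keys arrive.
pattern = ["hot"] * 5 + ["a", "b"]
```

With `pattern`, LRU ends up holding `{"a", "b"}` and has evicted the hot key, while LFU keeps `{"hot", "b"}` because the one-off key `a` has the lowest count.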

80
MCQ · medium

A company uses Amazon S3 to store images that are accessed by a web application. The application generates presigned URLs for users to download images. Recently, the application has been experiencing errors when generating presigned URLs for objects that were uploaded using multipart upload. The errors indicate that the presigned URL does not work. The data engineer needs to ensure that presigned URLs work for all objects, including those uploaded via multipart upload. What should the data engineer do?

A. Use a different signing algorithm when generating the presigned URL.
B. Ensure that the IAM user or role used to generate the presigned URL has s3:GetObject permission for the object.
C. Enable S3 Versioning on the bucket.
D. Re-upload the objects using single-part upload instead of multipart upload.
Answer: B

A presigned URL carries the permissions of the identity that signed it, so that identity needs s3:GetObject.

Why this answer

Multipart uploads can produce ETags that are not simple MD5 hashes, but that does not affect presigned URLs: a presigned GET works the same way regardless of how the object was uploaded, as long as no bucket policy denies the request. A presigned URL is evaluated with the permissions of the IAM user or role that signed it, so if that identity lacks s3:GetObject on the object, the URL fails when it is used.

The correct action is therefore to ensure the IAM policy grants s3:GetObject to the signing identity and that the URL is generated with a supported method (e.g., the AWS SDK). Option B is correct. Option C is wrong because enabling versioning does not affect presigned URL generation.

Option A is wrong because a different signing algorithm is unnecessary; SigV4 is the default. Option D is wrong because multipart upload does not affect presigned URLs.
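A minimal sketch of the fix: an identity policy for the role that signs the URLs, granting `s3:GetObject` on the image objects. The bucket name and optional prefix are hypothetical. If the objects use SSE-KMS, the signing identity would also need `kms:Decrypt` on the key.

```python
def presign_read_policy(bucket: str, prefix: str = "") -> dict:
    """Identity policy letting the URL-signing role read objects under a prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowPresignedDownloads",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                # Object-level ARN: bucket name, then key prefix, then wildcard.
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }
```

With that policy attached, URLs produced by boto3's `s3.generate_presigned_url("get_object", Params={"Bucket": ..., "Key": ...}, ExpiresIn=3600)` work for multipart and single-part objects alike.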

81
Multi-Select · easy

Which THREE actions can help improve read performance in Amazon DynamoDB? (Choose THREE.)

Select 3 answers
A. Use DynamoDB global tables to replicate data.
B. Use parallel scans to distribute read load across partitions.
C. Use strongly consistent reads for all queries.
D. Enable DynamoDB Accelerator (DAX) to cache reads.
E. Increase the read capacity units (RCU) for the table.
Answers: B, D, E

Parallel scans can improve scan performance.

Why this answer

Option B is correct: a parallel scan splits a Scan into segments that separate workers read concurrently, increasing scan throughput across partitions. Option D is correct: DAX is an in-memory cache that serves repeated reads with microsecond latency and offloads the table. Option E is correct: increasing RCUs raises the table's provisioned read throughput and reduces read throttling.

Option A is wrong because global tables are a multi-Region replication and availability feature; they do not raise the read throughput of a table within one Region. Option C is wrong because strongly consistent reads consume twice the read capacity of eventually consistent reads and can have higher latency, so using them for every query works against read performance.
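The capacity arithmetic behind options C and E, plus the request shape for option B, can be sketched as follows. The RCU rule (item size rounded up to 4 KB units; a strongly consistent read costs 1 RCU per unit, an eventually consistent read half that) is DynamoDB's documented model; the table name and helper names are hypothetical.

```python
import math


def read_capacity_units(item_size_kb: float, reads_per_sec: int, strongly_consistent: bool) -> int:
    """Provisioned RCUs needed for a steady read rate of items of one size."""
    units_per_read = math.ceil(item_size_kb / 4)  # round item size up to 4 KB units
    total = units_per_read * reads_per_sec
    return total if strongly_consistent else math.ceil(total / 2)


def parallel_scan_requests(table: str, total_segments: int) -> list[dict]:
    """One scan() argument set per worker; each worker reads a disjoint segment."""
    return [
        {"TableName": table, "Segment": segment, "TotalSegments": total_segments}
        for segment in range(total_segments)
    ]
```

For 6 KB items read 100 times per second, strongly consistent reads need 200 RCUs versus 100 for eventually consistent reads, which is why option C hurts rather than helps.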

82
MCQ · hard

A company is using Amazon DynamoDB for an e-commerce application. The application experiences sudden spikes in traffic, causing throttling errors. The data engineer needs to handle the spikes cost-effectively. Which solution should be used?

A. Implement DynamoDB Accelerator (DAX) to cache reads.
B. Switch to DynamoDB on-demand capacity mode.
C. Use DynamoDB auto scaling with a target utilization of 70%.
D. Provision high read and write capacity units to handle peak traffic.
Answer: C

Auto scaling adjusts capacity dynamically based on traffic.

Why this answer

Option C is correct because DynamoDB auto scaling with a target utilization of 70% allows the table to dynamically adjust provisioned read/write capacity based on actual traffic patterns, handling sudden spikes without manual intervention while avoiding over-provisioning. This balances performance and cost by scaling up during spikes and scaling down during low traffic, preventing throttling errors cost-effectively.

Exam trap

The trap here is that candidates often confuse caching (DAX) with scaling, or assume on-demand mode is always the best for spikes without considering cost, when the question explicitly requires a cost-effective solution for sudden but intermittent traffic.

How to eliminate wrong answers

Option A is wrong because DynamoDB Accelerator (DAX) is an in-memory read cache: it can reduce read throttling for repeated reads, but it does not address write throttling, and the stem does not say the spikes are read-only. Option B is wrong because DynamoDB on-demand capacity mode automatically scales to handle spikes but is significantly more expensive for predictable or moderate workloads, making it less cost-effective than auto scaling for this scenario. Option D is wrong because provisioning high read and write capacity units to handle peak traffic leads to over-provisioning and wasted cost during normal or low traffic periods, as you pay for the provisioned capacity regardless of actual usage.
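A minimal sketch of option C as Application Auto Scaling request arguments for the write dimension; a matching pair for `dynamodb:table:ReadCapacityUnits` with `DynamoDBReadCapacityUtilization` would cover reads. The table name and min/max values are hypothetical.

```python
def write_autoscaling_requests(table: str, min_wcu: int, max_wcu: int, target_pct: float = 70.0):
    """Arguments for register_scalable_target and put_scaling_policy (target tracking)."""
    resource_id = f"table/{table}"
    dimension = "dynamodb:table:WriteCapacityUnits"
    register = {
        "ServiceNamespace": "dynamodb",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_wcu,
        "MaxCapacity": max_wcu,
    }
    policy = {
        "PolicyName": f"{table}-write-target-tracking",
        "ServiceNamespace": "dynamodb",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_pct,  # keep consumed/provisioned WCU near 70%
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "DynamoDBWriteCapacityUtilization"
            },
        },
    }
    return register, policy
```

Usage: with `client = boto3.client("application-autoscaling")`, call `client.register_scalable_target(**register)` and then `client.put_scaling_policy(**policy)`.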

83
Multi-Select · medium

A data engineer is designing a data store for a real-time analytics application that requires sub-millisecond read and write latency for time-series data. The data volume is expected to grow to hundreds of terabytes. Which TWO AWS services should the engineer consider? (Choose TWO.)

Select 2 answers
A. Amazon Redshift
B. Amazon DynamoDB with Time-to-Live (TTL)
C. Amazon ElastiCache for Redis
D. Amazon RDS for PostgreSQL
E. Amazon Timestream
Answers: B, E

DynamoDB supports low-latency reads/writes and TTL for automatic expiration of old data.

Why this answer

Options B and E are correct. Amazon Timestream is a purpose-built time-series database with fast query performance, and Amazon DynamoDB delivers consistent low latency at any scale while TTL automatically expires old time-series items. Option A (Redshift) is a data warehouse for analytics, not low-latency reads and writes; Option D (RDS) is relational and slower for time-series data at this scale; Option C (ElastiCache) is in-memory, so hundreds of terabytes would be limited by memory size.
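DynamoDB TTL reads a Number attribute holding a Unix timestamp in seconds. A minimal sketch of stamping time-series items follows; `expires_at` is a hypothetical attribute name that must match the one enabled with `update_time_to_live(TableName=..., TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"})`.

```python
from datetime import datetime, timedelta, timezone


def with_ttl(item: dict, event_time: datetime, retention: timedelta) -> dict:
    """Return a copy of item with an epoch-seconds expiry attribute for DynamoDB TTL."""
    expires = event_time + retention
    return {**item, "expires_at": int(expires.timestamp())}
```

TTL deletion runs asynchronously in the background, so expired items can stay readable for a while; queries that must exclude them should also filter on `expires_at`.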

84
MCQ · medium

A company is running a production Amazon RDS for MySQL Multi-AZ DB instance. The database experiences a sudden spike in read requests, causing performance degradation. The company needs to improve read scalability with minimal application changes. Which solution should the data engineer recommend?

A. Implement DynamoDB Accelerator (DAX) in front of the database.
B. Enable Multi-AZ on the existing DB instance.
C. Increase the DB instance size to a larger instance class.
D. Create an Amazon RDS Read Replica and update the application to use it for read queries.
Answer: D

Read Replicas handle read traffic, offloading the primary instance and improving scalability.

Why this answer

Creating one or more Read Replicas distributes read traffic away from the primary DB instance, improving read scalability with minimal application changes (only the connection strings for read queries need to change). Option B is wrong because Multi-AZ is for high availability, not read scaling, and the instance already uses it. Option C is wrong because a larger instance class is vertical scaling: it adds resources but may not be cost-effective.

Option A is wrong because DynamoDB Accelerator (DAX) works only with DynamoDB, not RDS.
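The "update the application" step can be as small as a read/write router. This is a simplified sketch with hypothetical endpoints: plain `SELECT` statements go to the replica, while writes and locking reads (`SELECT ... FOR UPDATE`) stay on the primary. Anything else, including CTEs, conservatively goes to the primary.

```python
# Hypothetical endpoints; a Read Replica has its own endpoint, separate from the primary's.
PRIMARY = "orders-db.abc123.us-east-1.rds.amazonaws.com"
REPLICA = "orders-db-replica-1.abc123.us-east-1.rds.amazonaws.com"


def endpoint_for(sql: str) -> str:
    """Route read-only statements to the replica and everything else to the primary."""
    words = sql.split()
    if not words:
        return PRIMARY
    is_select = words[0].upper() == "SELECT"
    locks_rows = "FOR UPDATE" in " ".join(words).upper()
    return REPLICA if is_select and not locks_rows else PRIMARY
```

Replication to a Read Replica is asynchronous, so reads that must see a write the application just made should still go to the primary.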

85
Multi-Select · easy

A company is designing a data lake on Amazon S3. The data includes CSV files, Parquet files, and images. The data engineering team needs to catalog the metadata and enable SQL queries. Which TWO AWS services should be used together?

Select 2 answers
A. Amazon EMR
B. Amazon Redshift Spectrum
C. Amazon QuickSight
D. Amazon Athena
E. AWS Glue
Answers: D, E

Athena can directly query data in S3 using the Glue Data Catalog.

Why this answer

Amazon Athena is correct because it is a serverless interactive query service that can directly query data stored in Amazon S3 using standard SQL, without needing to load or transform data. AWS Glue is correct because it provides a fully managed data catalog (AWS Glue Data Catalog) that stores metadata about the data lake's schema, partitions, and locations, which Athena can use to discover and query the data efficiently.

Exam trap

The trap here is that candidates often confuse Amazon Redshift Spectrum (which requires a Redshift cluster) with Athena (which is serverless), or they think Amazon EMR is needed for SQL queries on S3, not realizing Athena provides a simpler, cluster-free solution.
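A minimal sketch of the Athena side as arguments for boto3's `athena.start_query_execution`. `AwsDataCatalog` is the name Athena uses for the AWS Glue Data Catalog; the database `datalake_raw` (e.g. populated by a Glue crawler), the table, and the results bucket are hypothetical.

```python
def athena_query_request(sql: str, database: str, results_uri: str) -> dict:
    """Arguments for start_query_execution against a Glue Data Catalog database."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {
            "Catalog": "AwsDataCatalog",  # the Glue Data Catalog as seen by Athena
            "Database": database,
        },
        "ResultConfiguration": {"OutputLocation": results_uri},  # where results land
    }
```

Usage: `boto3.client("athena").start_query_execution(**athena_query_request("SELECT count(*) FROM events_parquet", "datalake_raw", "s3://example-athena-results/"))`.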

86
MCQ · medium

An e-commerce company uses Amazon DynamoDB as the primary data store for its product catalog. The table has a simple primary key (ProductID) and handles 10,000 writes per second during peak hours. Recently, the engineering team noticed increased write latency and throttled requests during peak times. The table's provisioned write capacity is set to 12,000 WCU. What is the most likely cause of the throttling?

A. The table has reached the maximum number of partitions
B. DynamoDB Accelerator (DAX) is not configured
C. Write traffic is unevenly distributed across partitions
D. A global secondary index is consuming write capacity
Answer: C

Uneven distribution can cause some partitions to throttle even if total capacity is adequate.

Why this answer

Option C is correct because DynamoDB partitions data by the hash of the partition key. If write traffic is unevenly distributed across partitions (e.g., a few ProductIDs receive most writes), those hot partitions can exceed their individual throughput limits (1,000 WCU per partition; the read limit is 3,000 RCU), causing throttling even when the table's total provisioned 12,000 WCU is not fully utilized.

Exam trap

The trap here is that candidates assume throttling only occurs when total provisioned capacity is exceeded, overlooking the per-partition throughput limits that cause throttling on hot partitions even when the table's overall WCU is underutilized.

How to eliminate wrong answers

Option A is wrong because DynamoDB tables do not have a practical maximum number of partitions; partitions are added automatically as storage and throughput needs grow. Option B is wrong because DAX is an in-memory cache for reads; it does not reduce write capacity consumption or write throttling. Option D is wrong because, in provisioned mode, a global secondary index (GSI) has its own provisioned write capacity, separate from the table's 12,000 WCU; an under-provisioned GSI can throttle base-table writes, but nothing in the scenario points to a GSI, whereas uneven writes on a simple ProductID key directly explain throttling below the table's total capacity.
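One common mitigation for a hot key is write sharding with a calculated suffix. A minimal sketch follows; the key format and the choice of `order_id` as the value that picks the shard are assumptions, and CRC32 is used only because it is deterministic, so a reader that knows `order_id` can find the right shard.

```python
import zlib


def sharded_partition_key(product_id: str, order_id: str, shards: int) -> str:
    """Spread writes for one ProductID over `shards` partition key values."""
    suffix = zlib.crc32(order_id.encode("utf-8")) % shards
    return f"{product_id}#{suffix}"
```

The trade-off is on the read side: fetching everything for one product means querying each suffix from `0` to `shards - 1` and merging the results.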

87
MCQ · medium

A media company stores video metadata in Amazon RDS for PostgreSQL. The database is 500 GB and experiences high write traffic. The data engineer notices that the transaction log (WAL) is growing rapidly, causing storage issues. The company needs to retain backups for 30 days for compliance. The database is currently using automated backups with a retention period of 7 days. Which solution should the engineer implement to address the WAL growth while meeting compliance requirements?

A. Create manual snapshots daily and delete automated backups.
B. Change the instance type to a larger one with more storage.
C. Increase the backup retention period to 30 days.
D. Configure the database to stream WAL files to Amazon S3.
Answer: C

Automated backups manage WAL retention; RDS purges WAL older than retention period.

Why this answer

Option C is correct because increasing the backup retention period to 30 days meets the compliance requirement with the native automated backup mechanism, which is also the setting that governs WAL retention in RDS. In Amazon RDS for PostgreSQL, automated backups rely on archived WAL files to support point-in-time recovery (PITR), and RDS manages that archive according to the backup retention window rather than exposing it for manual control.

By extending the retention to 30 days, the engineer lets RDS handle WAL archiving and cleanup within the new window while meeting the 30-day compliance requirement.

Exam trap

The trap here is that candidates may think manual snapshots or streaming WAL to S3 are valid solutions, but RDS for PostgreSQL does not support direct WAL streaming to S3, and manual snapshots do not control WAL retention—only the automated backup retention period governs WAL cleanup in RDS.

How to eliminate wrong answers

Option A is wrong because creating manual snapshots daily and deleting automated backups removes the ability to perform point-in-time recovery (PITR) within the retention window, and manual snapshots do not manage WAL growth—WAL files are still retained for automated backup purposes until they are no longer needed. Option B is wrong because changing the instance type to a larger one with more storage only addresses the symptom (storage filling up) but does not stop the underlying WAL growth; it merely postpones the storage issue and increases cost without solving the root cause. Option D is wrong because streaming WAL files to Amazon S3 is not a native feature of Amazon RDS for PostgreSQL; RDS manages WAL internally and does not expose direct WAL streaming to S3—this option reflects a misunderstanding of RDS architecture.
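A minimal sketch of option C as arguments for boto3's `rds.modify_db_instance`. RDS accepts 0 to 35 days of automated backup retention, and 0 disables automated backups (and with them PITR), so the helper rejects it. The instance identifier is hypothetical.

```python
def backup_retention_request(instance_id: str, days: int, apply_immediately: bool = True) -> dict:
    """Arguments for modify_db_instance that set the automated backup retention."""
    if not 1 <= days <= 35:
        raise ValueError("retention must be 1-35 days to keep automated backups and PITR")
    return {
        "DBInstanceIdentifier": instance_id,
        "BackupRetentionPeriod": days,
        "ApplyImmediately": apply_immediately,
    }
```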

88
MCQ · easy

A company needs to store streaming data from IoT devices with a retention period of 7 days for real-time analysis. Which AWS service is most suitable?

A. Amazon DynamoDB
B. Amazon Kinesis Data Firehose
C. Amazon Kinesis Data Streams
D. Amazon S3
Answer: C

Kinesis Data Streams supports real-time data ingestion with adjustable retention.

Why this answer

Amazon Kinesis Data Streams is the most suitable service because it is designed for real-time ingestion and processing of streaming data, such as IoT device telemetry, with a default retention period of 24 hours, extendable up to 365 days. The required 7-day retention falls within that range, and consumers can process records in near real time using the Kinesis Client Library (KCL) or AWS Lambda.

Exam trap

The trap here is that candidates often confuse Kinesis Data Firehose with Kinesis Data Streams, assuming Firehose can retain data for a period, but Firehose is a delivery service with no retention—data is immediately delivered to a destination, whereas Data Streams provides a durable buffer with configurable retention for real-time consumption.

How to eliminate wrong answers

Option A is wrong because Amazon DynamoDB is a NoSQL key-value and document database optimized for low-latency reads and writes, not for streaming ingestion or temporary buffering with a retention period; it stores data indefinitely unless TTL is configured. Option B is wrong because Amazon Kinesis Data Firehose is designed to load streaming data into destinations such as S3, Redshift, or Amazon OpenSearch Service; it buffers data only briefly before delivery and does not support custom retention periods or replay by multiple consumers, so data is not retained for 7 days. Option D is wrong because Amazon S3 is object storage with no built-in streaming ingestion or real-time processing; it is a destination for stored data, not a buffer for real-time analysis with a 7-day retention window.
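A minimal sketch of setting the 7-day window with boto3's `kinesis.increase_stream_retention_period`, which takes hours (valid range 24 to 8,760). The stream name is hypothetical.

```python
def retention_request(stream: str, days: int) -> dict:
    """Arguments for increase_stream_retention_period, converting days to hours."""
    hours = days * 24
    if not 24 <= hours <= 8760:
        raise ValueError("Kinesis retention must be between 24 and 8,760 hours")
    return {"StreamName": stream, "RetentionPeriodHours": hours}
```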

89
MCQ · hard

A financial services company has an Amazon DynamoDB table named 'Transactions' with provisioned read capacity of 10,000 RCU and write capacity of 5,000 WCU. The table stores transaction records for the past 90 days. The application performs point reads by transaction ID (partition key) and range queries by customer ID and timestamp (GSI). Recently, the company started a new marketing campaign, causing a sudden spike in write traffic. The write capacity is now at 4,500 WCU, and the application is experiencing occasional throttling on writes. The data engineer needs to ensure that writes are not throttled during future campaigns, while keeping costs low. The table currently has auto scaling enabled with a maximum capacity of 10,000 WCU. Which solution should the engineer implement?

A. Switch the table to DynamoDB on-demand capacity mode.
B. Use DynamoDB Accelerator (DAX) to cache write requests.
C. Implement an Amazon SQS queue to buffer write requests and process them in batches.
D. Increase the maximum write capacity in the auto scaling configuration to 20,000 WCU.
Answer: A

On-demand mode scales automatically to absorb traffic spikes without capacity planning, eliminating most throttling.

Why this answer

Option A is correct. DynamoDB on-demand mode automatically scales to accommodate traffic spikes, eliminating throttling without manual intervention. Option D is wrong because raising the auto scaling maximum still leaves a ceiling, and auto scaling may not react quickly enough to a sudden spike.

Option C is wrong because buffering writes through SQS adds latency and complexity, which is not ideal for real-time writes. Option B is wrong because DAX caches reads, not writes.

90
MCQ · medium

A data engineer is responsible for a data warehouse on Amazon Redshift that stores 5 TB of data. The engineer needs to load 50 GB of new data daily from Amazon S3 into Redshift. The current load process uses the COPY command and takes 2 hours, which is within the maintenance window. However, the engineer wants to optimize the load time and reduce the impact on concurrent queries. The engineer notices that the tables are not distributed evenly across the slices. The cluster has 4 nodes of dc2.large. Which approach will best improve load performance?

A. Increase the cluster size to 8 nodes.
B. Change the distribution style of the tables to EVEN.
C. Use GZIP compression on the S3 files.
D. Add sort keys to the tables based on the load timestamp.
Answer: B

EVEN distribution ensures each slice gets an equal amount of data, improving parallelism.

Why this answer

Option B is correct because the COPY command distributes data across slices based on the table's distribution style. With dc2.large nodes, each node has 2 slices, so a 4-node cluster has 8 slices. If tables are not distributed evenly, some slices handle more data, causing bottlenecks.

Changing the distribution style to EVEN forces rows to be spread uniformly across all slices, maximizing parallelism during the COPY load and reducing load time.

Exam trap

The trap here is that candidates often assume adding more nodes (scaling out) always improves load performance, but the real bottleneck is slice-level data skew, which EVEN distribution directly fixes without additional cost.

How to eliminate wrong answers

Option A is wrong because increasing the cluster size to 8 nodes adds cost and complexity without addressing the root cause of uneven data distribution; the load time improvement would be marginal if slices are still unbalanced. Option C is wrong because GZIP-compressed S3 files (loaded with the COPY command's GZIP option) mainly reduce storage and transfer time; the bottleneck here is slice imbalance, not I/O or network bandwidth. Option D is wrong because adding sort keys based on load timestamp improves query performance for range-restricted scans, but does not affect how the COPY command distributes data across slices during the load process.
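The effect of skew on the 8 slices (4 × dc2.large, 2 slices each) can be shown with a simplified model. This is a sketch, not Redshift's internals: EVEN is modeled as round-robin, which is how Redshift assigns rows under EVEN, and KEY uses CRC32 as a stand-in for Redshift's own hash. The country-code data is hypothetical.

```python
import zlib
from collections import Counter


def rows_per_slice(dist_values: list, n_slices: int, style: str) -> list[int]:
    """Count rows landing on each slice under EVEN or KEY distribution."""
    counts: Counter = Counter()
    for row_number, value in enumerate(dist_values):
        if style == "EVEN":
            slice_id = row_number % n_slices  # round-robin across slices
        elif style == "KEY":
            # Equal key values always hash to the same slice.
            slice_id = zlib.crc32(str(value).encode("utf-8")) % n_slices
        else:
            raise ValueError(f"unsupported style: {style}")
        counts[slice_id] += 1
    return [counts[s] for s in range(n_slices)]


# 1,000 rows where 900 share one distribution key value.
rows = ["US"] * 900 + [f"C{i}" for i in range(100)]
```

Under KEY distribution at least 900 of the 1,000 rows pile onto one slice, while EVEN gives each slice exactly 125. In Redshift the change itself would be `ALTER TABLE <table> ALTER DISTSTYLE EVEN;`.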

91
MCQ · hard

A company uses Amazon Redshift for its data warehouse. The data engineer notices that queries are slow on a large table that is frequently filtered on a column 'transaction_date'. Which optimization technique best improves query performance?

A. Apply compression encoding to 'transaction_date'.
B. Set the sort key to 'transaction_date'.
C. Set the distribution key to 'transaction_date'.
D. Run VACUUM on the table.
Answer: B

Sort keys enable zone maps to skip irrelevant blocks.

Why this answer

Setting the sort key to 'transaction_date' organizes the table data physically by that column, which allows Redshift to use zone maps to skip blocks that don't match query filters. This dramatically reduces the amount of data scanned for range-restricted queries on 'transaction_date', improving query performance.

Exam trap

The trap here is that candidates confuse distribution keys (which optimize joins) with sort keys (which optimize filtering and range scans), leading them to pick distribution key as the answer for a single-table filter performance issue.

How to eliminate wrong answers

Option A is wrong because compression encoding reduces storage and I/O per block but does not let Redshift skip blocks; without a sort order on 'transaction_date', each block's zone map spans a wide range of dates, so most blocks must still be read. Option C is wrong because setting the distribution key to 'transaction_date' distributes rows across nodes based on that column, which can help with joins but does not improve the efficiency of range-restricted scans on a single table. Option D is wrong because VACUUM reclaims space and re-sorts rows according to the table's existing sort key; without a sort key on 'transaction_date', re-sorting does nothing for filters on that column.
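Zone maps store the minimum and maximum value of each block, and a block is skipped when its range cannot match the filter. A simplified sketch with hypothetical numbers: dates are day numbers 0 to 99 with 10 rows each, and a "block" holds 100 rows (real Redshift blocks are 1 MB).

```python
def blocks_scanned(values: list[int], block_size: int, lo: int, hi: int) -> int:
    """Count blocks whose [min, max] zone map overlaps the filter range [lo, hi]."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    zone_maps = [(min(block), max(block)) for block in blocks]
    return sum(1 for block_min, block_max in zone_maps if block_max >= lo and block_min <= hi)


# Days 0..99 inserted round-robin (unsorted), 10 rows per day.
unsorted_days = [day for _ in range(10) for day in range(100)]
sorted_days = sorted(unsorted_days)
```

Filtering days 20 to 29 touches 1 of 10 blocks when the data is sorted on the date, but all 10 when it is not, because every unsorted block spans days 0 to 99. In Redshift the sort key can be changed with `ALTER TABLE <table> ALTER SORTKEY (transaction_date);`.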

92
MCQ · hard

A company is using Amazon RDS for SQL Server with Multi-AZ. The database has a 500 GB data file and 100 GB log file. The application experiences high latency during peak hours. Monitoring shows high WriteIOPS on the primary. Which change will reduce latency without losing the ability to failover?

A. Reduce the log file size by changing recovery model
B. Increase the provisioned IOPS on the RDS instance
C. Create a Read Replica in a different Availability Zone
D. Switch to Multi-AZ with two readable standbys
Answer: B

Higher IOPS reduces write latency.

Why this answer

Option B is correct because increasing provisioned IOPS reduces write latency, and Multi-AZ failover is unaffected. Option C is wrong because Read Replicas offload reads, not writes. Option A is wrong because log file size is not the cause of the latency.

Option D is wrong because adding readable standbys does not reduce write latency on the primary.
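A minimal sketch of option B as arguments for boto3's `rds.modify_db_instance`, assuming io1 Provisioned IOPS storage. The identifier and IOPS figure are hypothetical placeholders that would be sized from the observed WriteIOPS metric. With `ApplyImmediately=False` the change waits for the next maintenance window.

```python
def provisioned_iops_request(instance_id: str, iops: int, apply_immediately: bool = False) -> dict:
    """Arguments for modify_db_instance that raise provisioned IOPS on io1 storage."""
    if iops <= 0:
        raise ValueError("iops must be a positive integer")
    return {
        "DBInstanceIdentifier": instance_id,
        "StorageType": "io1",  # assumption: Provisioned IOPS SSD storage
        "Iops": iops,
        "ApplyImmediately": apply_immediately,
    }
```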

93
MCQ · medium

A company stores log files in Amazon S3. They want to automatically move logs older than 90 days to S3 Glacier Deep Archive to reduce costs. Which S3 feature should be used?

A. S3 Intelligent-Tiering
B. S3 Lifecycle configuration
C. S3 Replication
D. S3 Object Lock
Answer: B

Lifecycle policies can move objects to Glacier Deep Archive after 90 days.

Why this answer

S3 Lifecycle configuration allows you to define rules that automatically transition objects to colder storage classes, such as S3 Glacier Deep Archive, based on age. By setting a rule to move objects older than 90 days to S3 Glacier Deep Archive, you reduce storage costs without manual intervention. This is the correct feature for automating tier-based data lifecycle management.

Exam trap

The trap here is that candidates may confuse S3 Intelligent-Tiering with lifecycle policies, but Intelligent-Tiering moves objects based on access patterns rather than age and is designed for unpredictable access, not fixed retention schedules.

How to eliminate wrong answers

Option A is wrong because S3 Intelligent-Tiering moves data between its own access tiers based on changing access patterns, not on a fixed age-based schedule; even its optional Deep Archive Access tier requires at least 180 consecutive days without access, so it cannot implement a 90-day, age-based transition to the S3 Glacier Deep Archive storage class. Option C is wrong because S3 Replication is used to copy objects across buckets or regions for redundancy or compliance, not to transition objects to colder storage classes. Option D is wrong because S3 Object Lock is designed to prevent object deletion or overwrites for a specified retention period, not to manage storage tier transitions.
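A minimal sketch of the lifecycle rule as the `LifecycleConfiguration` argument for boto3's `s3.put_bucket_lifecycle_configuration`. The `logs/` prefix and the rule-ID scheme are hypothetical.

```python
def deep_archive_rule(prefix: str, days: int = 90) -> dict:
    """Lifecycle configuration moving objects under prefix to Glacier Deep Archive."""
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.rstrip('/') or 'all'}-after-{days}d",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Age-based: counted from each object's creation date.
                "Transitions": [{"Days": days, "StorageClass": "DEEP_ARCHIVE"}],
            }
        ]
    }
```

Usage: `s3.put_bucket_lifecycle_configuration(Bucket="example-logs", LifecycleConfiguration=deep_archive_rule("logs/"))`.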

94
Multi-Select · medium

A data engineer is designing a data lake on Amazon S3. The data lake must support both batch and streaming ingestion. Which TWO AWS services can ingest data directly into S3? (Choose TWO.)

Select 2 answers
A. Amazon RDS
B. AWS Glue
C. Amazon EMR
D. Amazon DynamoDB
E. Amazon Kinesis Data Firehose
Answers: B, E

AWS Glue can ingest batch data and write to S3.

Why this answer

AWS Glue is correct because its ETL jobs (batch jobs, plus streaming ETL jobs for streaming sources) read from various sources and write the processed data directly to S3; Glue crawlers only catalog data and do not ingest it. Amazon Kinesis Data Firehose is correct because it is a fully managed service that can capture, transform, and load streaming data directly into S3 without requiring custom code.

Exam trap

The trap here is that candidates often confuse services that process data already in S3 (like EMR) with services built to ingest data into S3, or they treat data stores such as RDS and DynamoDB, whose S3 export features only move their own data, as general-purpose ingestion services.
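A minimal sketch of the streaming path as arguments for boto3's `firehose.create_delivery_stream` with an S3 destination. The stream name, ARNs, prefix, and buffering values are hypothetical; Firehose flushes to S3 when either buffering threshold is reached.

```python
def firehose_to_s3_request(name: str, role_arn: str, bucket_arn: str, prefix: str) -> dict:
    """Arguments for create_delivery_stream delivering directly into a data lake bucket."""
    return {
        "DeliveryStreamName": name,
        "DeliveryStreamType": "DirectPut",  # producers call PutRecord/PutRecordBatch
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,  # role Firehose assumes to write to the bucket
            "BucketARN": bucket_arn,
            "Prefix": prefix,
            "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 300},
            "CompressionFormat": "GZIP",
        },
    }
```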

95
MCQ · hard

A company uses Amazon DynamoDB with on-demand capacity. They notice higher than expected costs due to a sudden spike in read traffic from a reporting job. The reporting job scans the entire table daily. What is the most cost-effective way to reduce costs while maintaining the same reporting output?

A. Enable DynamoDB Accelerator (DAX) for caching.
B. Use a Global Secondary Index (GSI) with a sort key that matches the reporting query pattern.
C. Set a TTL attribute to automatically expire old data.
D. Reduce the read capacity units (RCU) in the table.
Answer: B

A GSI allows efficient querying instead of scanning, reducing read costs.

Why this answer

Option B is correct because using a Global Secondary Index (GSI) with a sort key tailored to the reporting query pattern allows the reporting job to query only the relevant items instead of scanning the entire table. This reduces the read request units consumed per operation, directly lowering costs under on-demand capacity, which bills per read request unit. The reporting output remains the same as long as the index's keys and projected attributes cover every item and attribute the report needs.

Exam trap

The trap here is that candidates may confuse DAX as a general cost-saver for all read patterns, but DAX only helps with repeated, cached reads, not with unique full-table scans that read different data each time.

How to eliminate wrong answers

Option A is wrong because DynamoDB Accelerator (DAX) is an in-memory cache that reduces read latency and cost for repeated reads, but the reporting job scans the entire table daily, meaning each scan reads data that is not cached from previous runs, so DAX would not reduce costs. Option C is wrong because setting a TTL attribute automatically expires old data after a specified time, which reduces storage costs but does not affect the read cost of the daily scan; the reporting job still scans all remaining items. Option D is wrong because the table uses on-demand capacity, which has no provisioned read capacity units (RCU) to reduce; on-demand capacity scales automatically and bills per read request unit.
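A minimal sketch of option B as arguments for a low-level boto3 `query()` against a GSI. The index name `ReportDate-index`, its partition key `report_date`, and the table name are hypothetical. There is no `ConsistentRead` flag because GSIs support only eventually consistent reads; large results still paginate through `LastEvaluatedKey`.

```python
def report_query_request(table: str, report_date: str) -> dict:
    """Arguments for client.query() reading one day's items through a hypothetical GSI."""
    return {
        "TableName": table,
        "IndexName": "ReportDate-index",  # hypothetical GSI keyed on report_date
        "KeyConditionExpression": "report_date = :d",
        "ExpressionAttributeValues": {":d": {"S": report_date}},
    }
```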

96
MCQ · easy

A data engineer needs to store semi-structured JSON data that is accessed infrequently but requires immediate retrieval when needed. The data must be durable and cost-effective. Which Amazon S3 storage class should be used?

A. S3 Standard-IA
B. S3 Glacier
C. S3 Standard
D. S3 One Zone-IA
Answer: A

Standard-IA is for infrequent access with immediate retrieval.

Why this answer

S3 Standard-IA is designed for infrequently accessed data that requires rapid access when needed, with lower storage cost than S3 Standard. Option A is correct. Option C: S3 Standard is for frequently accessed data and costs more to store.

Option D: S3 One Zone-IA stores data in a single Availability Zone, so it is less resilient (data is lost if that AZ is destroyed). Option B: S3 Glacier has retrieval delays (minutes to hours).

97
Multi-Selectmedium

A company uses Amazon RDS for MySQL with Multi-AZ deployment. The primary instance fails, and automatic failover occurs. After failover, the data engineer notices that the database endpoint now resolves to a different IP address. Which TWO statements are true about this scenario? (Choose TWO.)

Select 2 answers
A.The standby instance is created in the same Availability Zone as the failed primary.
B.A manual DNS update is required to connect to the new primary.
C.The DNS CNAME record is updated to point to the new primary.
D.The endpoint changes to the standby instance's endpoint.
E.The applications can continue using the same database endpoint.
AnswersC, E

RDS updates the CNAME automatically.

Why this answer

Option C is correct because when Amazon RDS performs automatic failover in a Multi-AZ deployment, it updates the DNS CNAME record for the primary DB instance to point to the new primary (formerly the standby). This ensures that applications using the original endpoint are transparently redirected to the new primary without manual intervention.

Exam trap

The trap here is that candidates may think the endpoint changes to the standby's endpoint (Option D) or that a manual DNS update is needed (Option B), when in fact the original endpoint remains the same and RDS handles the DNS update automatically via CNAME.

98
MCQhard

Refer to the exhibit. An IAM policy is attached to an IAM role used by an application. The application needs to decrypt objects in an S3 bucket using a customer managed KMS key. What is the effect of this policy?

A.The application cannot perform any KMS operations.
B.The application can decrypt objects from any service.
C.The application can decrypt objects only when accessing them through S3.
D.The application can encrypt but not decrypt objects.
AnswerC

The Allow statement's kms:ViaService condition permits decryption only through the S3 service.

Why this answer

The IAM policy grants the `kms:Decrypt` permission with a `kms:ViaService` condition set to the S3 service (the condition value takes the form `s3.<region>.amazonaws.com`). This condition restricts the decryption operation to requests that S3 makes on the application's behalf. Therefore, the application can decrypt objects only when accessing them through S3, not via direct KMS API calls or other services.
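Because the exhibit is not reproduced here, the following Python sketch shows one plausible shape for such a policy. The account ID, Region, and key ID are placeholders, and `is_decrypt_allowed` is a hypothetical toy check of this single Allow statement, not the real IAM policy evaluation logic.

```python
import json

# Placeholder key ARN (hypothetical account, Region, and key ID).
KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowDecryptOnlyThroughS3",
            "Effect": "Allow",
            "Action": "kms:Decrypt",
            "Resource": KEY_ARN,
            # kms:ViaService holds the endpoint of the service calling KMS.
            "Condition": {
                "StringEquals": {"kms:ViaService": "s3.us-east-1.amazonaws.com"}
            },
        }
    ],
}

def is_decrypt_allowed(request_context: dict) -> bool:
    """Toy check of the single Allow statement above (not real IAM evaluation):
    the action and resource must match and kms:ViaService must equal the S3 endpoint."""
    stmt = policy["Statement"][0]
    expected = stmt["Condition"]["StringEquals"]["kms:ViaService"]
    return (
        request_context.get("action") == stmt["Action"]
        and request_context.get("resource") == stmt["Resource"]
        and request_context.get("kms:ViaService") == expected
    )

print(json.dumps(policy, indent=2))
```

A direct `kms:Decrypt` call from the application carries no `kms:ViaService` value, so the condition does not match and the toy check returns `False`; the same call made by S3 during a GetObject request matches.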

Exam trap

AWS often tests the `kms:ViaService` condition key to trap candidates who assume that granting `kms:Decrypt` alone allows decryption from any source, ignoring the service-specific restriction.

How to eliminate wrong answers

Option A is wrong because the policy explicitly allows `kms:Decrypt` under the condition, so the application can perform KMS decryption operations when invoked via S3. Option B is wrong because the `kms:ViaService` condition restricts decryption to S3 only, preventing decryption from any other service or direct KMS API calls. Option D is wrong because the policy grants `kms:Decrypt` permission, not `kms:Encrypt`, so the application can decrypt but not encrypt objects.

99
MCQhard

A company uses Amazon DynamoDB as a session store for a web application. During peak hours, the application experiences high latency and throttling on the DynamoDB table. The table has a read capacity of 5000 RCU and write capacity of 2000 WCU. The application reads and writes session data using the session ID as the partition key. What is the most cost-effective solution to reduce throttling?

A.Enable Auto Scaling on the table to automatically adjust capacity.
B.Increase the read capacity units (RCU) and write capacity units (WCU) to 10000 each.
C.Enable DynamoDB global tables to distribute read traffic.
D.Implement DynamoDB Accelerator (DAX) to cache frequent reads.
AnswerD

DAX reduces read load on the table, mitigating throttling cost-effectively.

Why this answer

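Option D is correct because DynamoDB Accelerator (DAX) caches session reads in memory, so repeated reads of the same session ID are served from the cache instead of consuming table read capacity; this reduces read throttling at lower cost than adding capacity. Option B is incorrect because raising RCU and WCU to 10,000 each would reduce throttling but at a much higher cost. Option C is incorrect because global tables replicate data to other Regions, which adds complexity and cost and does not directly address throttling on this table.

Option A is incorrect because Auto Scaling adjusts provisioned capacity in response to usage but reacts with a delay, so sudden peaks can still be throttled, and it can increase costs if it is not tuned.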

100
Multi-Selectmedium

A data engineer needs to store event data from IoT devices that arrives in bursts. The data is key-value and requires single-digit millisecond read and write latency. The engineer also needs to run complex analytical queries on the data for reporting. Which TWO services should be used together? (Choose TWO.)

Select 2 answers
A.Amazon DynamoDB
B.Amazon ElastiCache for Redis
C.Amazon Redshift
D.Amazon S3
E.Amazon RDS for MySQL
AnswersA, C

Provides low-latency access for key-value data.

Why this answer

Amazon DynamoDB (A) is correct because it provides single-digit millisecond read and write latency at any scale, making it a good fit for IoT event data that arrives in bursts, and its key-value data model matches the requirement. Amazon Redshift (C) is correct because it runs the complex analytical SQL queries needed for reporting; data can be loaded from DynamoDB into Redshift (for example, with the COPY command), so each service handles the workload it is built for.

Exam trap

The trap here is that candidates often choose ElastiCache for Redis because of its low latency, forgetting that it is not a durable data store for analytical queries, or they pick S3 thinking it can serve as a primary database, ignoring its lack of single-digit millisecond latency for key-value access.

101
Multi-Selectmedium

Which TWO options are valid ways to reduce storage costs for an Amazon S3 data lake that stores historical data rarely accessed after 30 days? (Choose TWO.)

Select 2 answers
A.Enable S3 Transfer Acceleration for all uploads.
B.Create a lifecycle policy to transition objects to S3 Standard-IA after 30 days.
C.Create a lifecycle policy to delete objects after 30 days.
D.Create a lifecycle policy to transition objects to S3 Glacier Deep Archive after 90 days.
E.Enable S3 Versioning to preserve all object versions.
AnswersB, D

Standard-IA reduces storage cost for infrequent access.

Why this answer

S3 Lifecycle policies can transition objects to S3 Standard-IA after 30 days (lower storage cost for infrequently accessed data) and later to S3 Glacier Deep Archive for long-term archival. Options B and D are correct. Option C is wrong because deleting objects after 30 days would lose historical data that must be kept.

Option E is wrong because enabling versioning retains every object version, which increases storage costs. Option A is wrong because S3 Transfer Acceleration speeds up long-distance uploads and adds a per-GB charge; it does not reduce storage costs.

102
MCQhard

A company stores IoT sensor data in an Amazon S3 bucket. The data is ingested every minute and each object is about 10 KB. The data must be stored for at least 7 years for compliance. Which lifecycle policy configuration minimizes storage costs?

A.Transition to S3 One Zone-IA after 30 days, then to S3 Glacier Deep Archive after 365 days, and expire after 2555 days.
B.Transition to S3 Glacier Flexible Retrieval after 90 days and expire after 2555 days.
C.Transition to S3 Glacier Deep Archive after 30 days and expire after 2555 days.
D.Transition to S3 Standard-IA after 30 days, then to S3 Glacier Deep Archive after 365 days, and expire after 2555 days.
AnswerC

This minimizes cost by moving to the cheapest storage class early and retaining for 7 years.

Why this answer

Option C is correct because it moves objects to S3 Glacier Deep Archive, the lowest-cost S3 storage class, after only 30 days and expires them after 2555 days (7 × 365), which meets the 7-year retention requirement. Nothing in the requirements calls for fast retrieval, so Deep Archive's retrieval time of hours is acceptable.

Option A is incorrect because it keeps the data in S3 One Zone-IA for most of the first year, which costs more than Deep Archive, and One Zone-IA stores data in a single Availability Zone, a weak choice for compliance data. Option B is incorrect because it keeps the data in S3 Standard for 90 days and then uses S3 Glacier Flexible Retrieval, which costs more per GB than Deep Archive. Option D is incorrect because it keeps the data in S3 Standard-IA for most of the first year before archiving, which costs more than moving it to Deep Archive at 30 days.
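As a sketch, option C could be expressed as the following Python dict in the shape accepted by the S3 PutBucketLifecycleConfiguration API (for example, as the `LifecycleConfiguration` argument of boto3's `put_bucket_lifecycle_configuration`). The rule ID and the empty prefix filter are hypothetical choices; the dict is only built and checked here, not sent to AWS.

```python
# Retention arithmetic used by the answer options: 7 years of 365 days.
RETENTION_YEARS = 7
EXPIRATION_DAYS = RETENTION_YEARS * 365  # 2555

lifecycle_configuration = {
    "Rules": [
        {
            "ID": "iot-archive-then-expire",  # hypothetical rule name
            "Filter": {"Prefix": ""},         # apply to every object in the bucket
            "Status": "Enabled",
            # Move objects to Glacier Deep Archive 30 days after creation.
            "Transitions": [{"Days": 30, "StorageClass": "DEEP_ARCHIVE"}],
            # Delete objects once the retention period has passed.
            "Expiration": {"Days": EXPIRATION_DAYS},
        }
    ]
}
```

Note that 7 × 365 ignores leap days, so a rule that must cover seven full calendar years may need one or two extra days.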

103
MCQmedium

A data engineer needs to migrate an on-premises Apache Hadoop cluster to AWS. The cluster stores data in HDFS and runs MapReduce jobs. The company wants to minimize operational overhead and leverage serverless technologies where possible. Which AWS service should the data engineer use to replace HDFS storage?

A.Amazon EBS
B.Amazon EMR
C.Amazon S3
D.Amazon Redshift
AnswerC

S3 is the recommended storage for Hadoop on AWS, replacing HDFS with durable object storage.

Why this answer

Amazon S3 is the recommended storage layer for Hadoop workloads on AWS, providing durable, scalable object storage with minimal operational overhead (Amazon EMR reads and writes S3 data through EMRFS). Option B is wrong because EMR is a managed compute service for running Hadoop and Spark, not a storage layer. Option A is wrong because EBS is block storage attached to individual EC2 instances, not a standalone distributed store for Hadoop data.

Option D is wrong because Redshift is a data warehouse, not a replacement for HDFS.

104
MCQmedium

A data engineering team is using Amazon DynamoDB to store user session data for a web application. The application experiences sudden spikes in traffic, causing throttling on the DynamoDB table. The team wants to minimize throttling without over-provisioning read/write capacity. Which solution should the team implement?

A.Enable DynamoDB Time to Live (TTL) to automatically delete expired items.
B.Disable auto scaling and manually set a high provisioned capacity.
C.Use Amazon RDS read replicas to offload read traffic.
D.Enable DynamoDB Accelerator (DAX) caching layer.
AnswerD

DAX caches frequently read items, reducing read capacity consumption and throttling.

Why this answer

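Option D is correct because DynamoDB Accelerator (DAX) provides a fully managed in-memory cache that reduces read load on the table, which minimizes throttling for read-heavy workloads without over-provisioning capacity. Option B is incorrect because disabling auto scaling and fixing a high provisioned capacity is exactly the over-provisioning the team wants to avoid, and a fixed value can still be exceeded by a spike. Option C is incorrect because RDS read replicas work only with Amazon RDS databases and cannot serve reads for a DynamoDB table.

Option A is incorrect because TTL only deletes expired items; it does not reduce the request rate that causes throttling.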
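The effect of a read cache on table traffic can be illustrated with a toy read-through cache in Python. This is NOT the DAX client API: the "table" is a plain dict, the counter stands in for read requests that reach DynamoDB, and the cache never expires entries (real DAX expires cached items, with a default item TTL of 5 minutes).

```python
class ReadThroughCache:
    """Toy read-through cache: serve hits from memory, read the table on a miss."""

    def __init__(self, table: dict):
        self.table = table
        self.cache: dict = {}
        self.table_reads = 0  # stands in for read requests reaching the table

    def get(self, key):
        if key in self.cache:
            return self.cache[key]  # cache hit: no table read
        self.table_reads += 1       # cache miss: read from the table once
        value = self.table.get(key)
        self.cache[key] = value
        return value

sessions = {f"session-{i}": {"user": i} for i in range(3)}
cache = ReadThroughCache(sessions)

# Hypothetical spike: 1,000 reads spread over the same 3 hot sessions.
for n in range(1000):
    cache.get(f"session-{n % 3}")

print(cache.table_reads)  # 3
```

The same model shows the limit discussed earlier for full-table scans: if every read targets a different key, each one is a miss and the table sees as many reads as before.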

105
MCQmedium

A data engineer is configuring Amazon S3 Lifecycle policies to transition objects between storage classes. The data is accessed frequently for the first 30 days, then rarely for the next 90 days, after which it must be archived. The engineer wants to minimize costs while ensuring immediate retrieval for the first 30 days. Which lifecycle policy should the engineer implement?

A.Transition to Glacier Flexible Retrieval after 30 days, then delete after 120 days
B.Transition to One Zone-IA after 30 days, then to Glacier Deep Archive after 120 days
C.Transition to Glacier Deep Archive after 30 days, then delete after 120 days
D.Transition to Standard-IA after 30 days, then to Glacier Deep Archive after 120 days
AnswerD

Standard-IA is cost-effective for rarely accessed data; Glacier Deep Archive is cheapest for archiving.

Why this answer

Option D is correct because it transitions objects from S3 Standard (immediate retrieval, frequent access) to S3 Standard-IA (lower cost for infrequent access, immediate retrieval) after 30 days, then to S3 Glacier Deep Archive (lowest-cost archival storage) after 120 days. This matches the access pattern: frequent for 30 days, rare for 90 days, then archived, while minimizing cost and maintaining immediate retrieval for the first 30 days.
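Option D can be sketched as a single S3 lifecycle rule in the PutBucketLifecycleConfiguration shape. The rule ID and empty prefix are hypothetical, and `check_transitions` is a small illustrative helper (not part of any AWS SDK) that confirms the transitions are ordered and respect the 30-day minimum S3 requires before objects move from S3 Standard to an IA class.

```python
rule = {
    "ID": "warm-then-archive",  # hypothetical rule name
    "Filter": {"Prefix": ""},   # apply to every object in the bucket
    "Status": "Enabled",
    "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},    # rare access, still immediate retrieval
        {"Days": 120, "StorageClass": "DEEP_ARCHIVE"},  # lowest-cost archive
    ],
}

def check_transitions(rule: dict) -> None:
    """Raise ValueError if transition days are not strictly increasing or an
    IA transition happens earlier than 30 days after object creation."""
    days = [t["Days"] for t in rule["Transitions"]]
    if days != sorted(days) or len(set(days)) != len(days):
        raise ValueError("transitions must have strictly increasing Days")
    for t in rule["Transitions"]:
        if t["StorageClass"] in ("STANDARD_IA", "ONEZONE_IA") and t["Days"] < 30:
            raise ValueError("S3 requires at least 30 days before an IA transition")

check_transitions(rule)  # passes for option D
```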

Exam trap

The trap here is that candidates often choose Glacier Deep Archive too early (e.g., after 30 days) to minimize cost, forgetting that the data is still accessed, though rarely, during the next 90 days, which requires a storage class with immediate retrieval (Standard-IA) before archiving.

How to eliminate wrong answers

Option A is wrong because transitioning to Glacier Flexible Retrieval after 30 days imposes retrieval delays (minutes to hours) on data that is still occasionally accessed during the next 90 days, and it deletes the data after 120 days instead of archiving it. Option B is wrong because, although One Zone-IA costs less than Standard-IA, it stores data in a single Availability Zone, so the data can be lost if that AZ is destroyed, and it offers lower availability; that resilience trade-off is inappropriate for data that must later be archived. Option C is wrong because Glacier Deep Archive retrievals take up to 12 hours (Standard) or 48 hours (Bulk), which is too slow for data still accessed during days 30 to 120; it also deletes the data at 120 days instead of archiving it, and deleting before Deep Archive's 180-day minimum storage duration incurs an early-deletion charge.

106
Multi-Selectmedium

A data engineer is designing a data pipeline that ingests streaming data from IoT devices into Amazon S3 using Amazon Kinesis Data Firehose. The data must be transformed from JSON to Parquet format before storage. Which TWO actions should the data engineer take to achieve this?

Select 2 answers
A.Enable Firehose's built-in Parquet conversion without any additional configuration.
B.Use Amazon Kinesis Data Analytics to convert the data format.
C.Configure Firehose to convert the data to Apache Avro format.
D.Create a Glue Data Catalog table defining the schema and configure Firehose to use the table for Parquet conversion.
E.Create an AWS Lambda function to transform the data to Parquet and use it as a Firehose transformation.
AnswersD, E

Firehose can use the schema from Glue Data Catalog to convert to Parquet.

Why this answer

Kinesis Data Firehose can convert JSON to Parquet using a schema from a Glue Data Catalog table. Option D is correct because Firehose's record format conversion reads the schema from the Glue Data Catalog table and writes Parquet. Option E is correct because Firehose can invoke an AWS Lambda function to transform incoming records before delivery.

Option C is wrong because Firehose record format conversion outputs only Apache Parquet or Apache ORC, not Avro, and Avro would not meet the Parquet requirement anyway. Option B is wrong because Kinesis Data Analytics is for real-time analytics, not format conversion. Option A is wrong because Firehose cannot convert to Parquet without a schema; it needs a Glue Data Catalog table.
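The two pieces can be sketched as Python dicts in the shape of a Firehose `ExtendedS3DestinationConfiguration` (as passed to boto3's `create_delivery_stream`). The role ARN, Lambda ARN, database, and table names are hypothetical placeholders. In this sketch the Lambda processor only pre-processes JSON records; the Parquet conversion itself is done by `DataFormatConversionConfiguration`.

```python
# Record format conversion: JSON in, Parquet out, schema from a Glue table.
data_format_conversion = {
    "Enabled": True,
    "SchemaConfiguration": {
        "RoleARN": "arn:aws:iam::111122223333:role/example-firehose-glue-role",
        "DatabaseName": "iot_db",      # hypothetical Glue database
        "TableName": "device_events",  # hypothetical Glue table holding the schema
        "Region": "us-east-1",
        "VersionId": "LATEST",
    },
    # Incoming JSON records are deserialized with the OpenX JSON SerDe.
    "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
    # Output files are written as Apache Parquet.
    "OutputFormatConfiguration": {"Serializer": {"ParquetSerDe": {}}},
}

# Optional Lambda transformation step applied to records before conversion.
processing_configuration = {
    "Enabled": True,
    "Processors": [
        {
            "Type": "Lambda",
            "Parameters": [
                {
                    "ParameterName": "LambdaArn",
                    "ParameterValue": "arn:aws:lambda:us-east-1:111122223333:function:example-normalize-json",
                }
            ],
        }
    ],
}
```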

107
MCQhard

A company uses Amazon S3 to store sensitive financial data. The security team requires that all objects be encrypted at rest using AWS KMS with a customer-managed key. Additionally, they want to audit all KMS decrypt calls for compliance. Which configuration should be used to meet these requirements?

A.Enable default encryption on the bucket with SSE-KMS using an AWS managed key.
B.Use SSE-S3 with a bucket policy that denies uploads without encryption.
C.Use SSE-KMS with a customer-managed KMS key and enable CloudTrail data events for the key.
D.Use SSE-C with client-managed keys and log S3 API calls.
AnswerC

SSE-KMS with customer-managed key and CloudTrail auditing meets requirements.

Why this answer

Option C is correct because SSE-KMS with a customer-managed key lets the company control the key's policy, rotation, and lifecycle, which meets the requirement for a customer-managed key. Every Decrypt request against that key is recorded by AWS CloudTrail (AWS KMS cryptographic operations are logged as CloudTrail management events, which trails capture unless KMS events are explicitly excluded), providing the audit trail needed for compliance.

Exam trap

The trap here is that candidates may confuse enabling default encryption on the bucket (which can use SSE-KMS) with the need for a customer-managed key and CloudTrail data events, or they may think SSE-S3 or SSE-C can satisfy the audit requirement without KMS-specific logging.

How to eliminate wrong answers

Option A is wrong because it uses an AWS managed key, not a customer-managed key, so the security team cannot control key rotation or access policies. Option B is wrong because SSE-S3 uses server-side encryption with S3-managed keys, which does not provide customer-managed key control, and the bucket policy only enforces encryption, not auditing of decrypt calls. Option D is wrong because SSE-C requires the client to manage the encryption keys, which does not meet the requirement for AWS KMS, and logging S3 API calls alone does not capture KMS decrypt events.

108
MCQeasy

A startup is building a mobile application that requires a database to store user profiles and preferences. The database must scale automatically with minimal administration. Which AWS service should they use?

A.Amazon Redshift
B.Amazon Aurora
C.Amazon DynamoDB
D.Amazon RDS for PostgreSQL
AnswerC

DynamoDB scales automatically with on-demand capacity.

Why this answer

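Option C is correct because Amazon DynamoDB is a fully managed NoSQL database that scales automatically (for example, in on-demand capacity mode) with minimal administration. Option D (RDS for PostgreSQL) requires instance sizing and manual scaling. Option A (Redshift) is a data warehouse for analytics, not an application database.

Option B (Aurora) is relational and still requires some instance and capacity management.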

109
Multi-Selecthard

Which TWO of the following are best practices for Amazon Redshift table design? (Choose TWO.)

Select 2 answers
A.Choose sort keys based on query patterns
B.Use INSERT statements for large data loads
C.Avoid compression encoding to reduce CPU overhead
D.Specify distribution keys to minimize data movement
E.Set distribution style to ALL for all tables
AnswersA, D

Sort keys improve query performance for range-filtered queries.

Why this answer

Options A and D are correct. Choosing sort keys based on query patterns and specifying distribution keys to minimize data movement are Redshift best practices. Option C is wrong because compression encoding should be applied, not avoided; it reduces storage and disk I/O.

Option B is wrong because COPY, which loads data in parallel, is the preferred method for large loads, not INSERT statements. Option E is wrong because DISTSTYLE ALL copies the whole table to every node; it suits small, slowly changing dimension tables, not every table.

110
Multi-Selecthard

A company is migrating a large Oracle data warehouse to Amazon Redshift. Which THREE considerations are important for optimizing the Redshift cluster?

Select 3 answers
A.Purchasing reserved instances for the cluster.
B.Using columnar storage format.
C.Defining appropriate sort keys for the tables.
D.Applying compression encoding to columns.
E.Choosing the right distribution style (KEY, ALL, EVEN).
AnswersC, D, E

Improves query performance by reducing scans.

Why this answer

Distribution style (E), sort keys (C), and compression encoding (D) are key to Redshift performance. Columnar storage (B) is inherent to Redshift, so it is not a design choice to make.

Reserved instances (A) reduce cost; they do not improve performance.

111
MCQhard

A company has an Amazon DynamoDB table with on-demand capacity mode. The table stores session data for a web application. Recently, the application experienced throttling errors during a traffic spike. The team wants to prevent future throttling while optimizing costs. What should they do?

A.Implement a DynamoDB Accelerator (DAX) cluster
B.Enable DynamoDB auto scaling on the table
C.Switch to provisioned capacity with auto scaling
D.Increase the read and write capacity of the table
AnswerA

DAX provides in-memory caching to reduce read throttling.

Why this answer

A DynamoDB Accelerator (DAX) cluster provides an in-memory cache that absorbs read-heavy traffic spikes, reducing the number of read requests that reach the underlying DynamoDB table. On-demand capacity scales automatically, but a table can still be throttled when traffic more than doubles its previous peak within 30 minutes; for a session store, the bulk of that spike is most likely repeated reads of the same sessions. DAX offloads those reads without any change to capacity mode, and it is cost-effective because cache hits do not consume the table's read request units.

Exam trap

The trap here is that candidates assume throttling in on-demand mode must be fixed by switching to provisioned capacity or enabling auto scaling, but they overlook that on-demand already scales automatically and the real solution is to reduce read load via caching with DAX.

How to eliminate wrong answers

Option B is wrong because DynamoDB auto scaling is only available for provisioned capacity mode, not on-demand mode; on-demand already scales automatically, so enabling auto scaling is not applicable. Option C is wrong because switching to provisioned capacity with auto scaling would introduce management overhead and potential cost inefficiency compared to on-demand, and it does not address the root cause of read throttling during spikes as effectively as caching. Option D is wrong because increasing read and write capacity is only possible in provisioned mode; in on-demand mode, you cannot manually increase capacity, and doing so would not prevent throttling caused by read-heavy spikes without incurring unnecessary costs.

112
Multi-Selecteasy

A company is using Amazon RDS for MySQL and wants to automate backups for point-in-time recovery. Which TWO actions should be taken? (Choose TWO.)

Select 2 answers
A.Enable automated backups with a retention period.
B.Use AWS Backup to schedule backups.
C.Set the backup retention period to the desired number of days.
D.Enable Multi-AZ deployment.
E.Take manual snapshots daily.
AnswersA, C

Automated backups provide point-in-time recovery and are enabled by default.

Why this answer

Options A and C are correct. Automated backups are enabled by default, and the retention period can be set from 1 to 35 days (0 disables them). Option E is wrong because manual snapshots are not automated and do not provide point-in-time recovery.

Option D is wrong because Multi-AZ is for high availability, not backups. Option B is wrong because AWS Backup is an optional service and is not required; RDS native automated backups suffice.

113
MCQhard

An S3 bucket policy, shown in the exhibit, is attached to an S3 bucket and grants permissions to the IAM role 'DataLakeRole'. The role is assumed by an AWS Glue job. The Glue job is failing with 'Access Denied' errors when trying to list objects in the bucket. Which action should be added to the policy to fix the issue?

A.Add s3:ListObjects action for the bucket ARN.
B.Add s3:ListBucket action for the bucket ARN (arn:aws:s3:::my-data-lake).
C.Add s3:GetObjectVersion action for the object ARN.
D.Add s3:ListBucket action for the object ARN (arn:aws:s3:::my-data-lake/*).
AnswerB

ListBucket is required to list objects in the bucket.

Why this answer

The Glue job is failing with 'Access Denied' when trying to list objects, which requires the s3:ListBucket permission on the bucket itself (not on objects). Option B correctly adds s3:ListBucket for the bucket ARN (arn:aws:s3:::my-data-lake), which grants permission to list the contents of the bucket. Without this action, even if other permissions exist, the ListObjectsV2 API call used by AWS Glue to enumerate objects will be denied.
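The difference between bucket-level and object-level resources can be seen in a Python sketch of a corrected bucket policy. The account ID is a placeholder and the `s3:GetObject` statement is an assumption about what the Glue job also needs; the exhibit's exact statements are not reproduced here.

```python
import json

ROLE_ARN = "arn:aws:iam::111122223333:role/DataLakeRole"  # placeholder account ID
BUCKET = "my-data-lake"

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Listing is a bucket-level action: the resource is the bucket ARN.
            "Sid": "AllowListBucket",
            "Effect": "Allow",
            "Principal": {"AWS": ROLE_ARN},
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
        {
            # Reading objects is an object-level action: the resource is bucket/*.
            "Sid": "AllowGetObject",
            "Effect": "Allow",
            "Principal": {"AWS": ROLE_ARN},
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
    ],
}

print(json.dumps(bucket_policy, indent=2))
```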

Exam trap

The trap here is that candidates confuse s3:ListBucket (which applies to the bucket itself) with s3:GetObject or s3:ListObjects (which are often misapplied to object ARNs), leading them to pick Option D or A, not realizing that listing requires the bucket-level permission and a bucket ARN, not an object ARN.

How to eliminate wrong answers

Option A is wrong because s3:ListObjects is not an IAM action: ListObjects and ListObjectsV2 are S3 API operations, and both are authorized by the s3:ListBucket action on the bucket ARN. Option C is wrong because s3:GetObjectVersion retrieves a specific version of an object; it does not list objects and does not address the 'list objects' failure. Option D is wrong because s3:ListBucket must be granted on the bucket ARN (arn:aws:s3:::my-data-lake), not on an object ARN (arn:aws:s3:::my-data-lake/*); a statement with an object ARN never matches the bucket-level list request, so it does not grant permission to list the bucket's contents.

114
Multi-Selecthard

A company stores sensitive financial data in an Amazon Redshift cluster. The data engineer must ensure that all queries are logged for audit purposes and that the logs are stored in Amazon S3 with server-side encryption. Which THREE steps should the data engineer take to meet these requirements?

Select 3 answers
A.Configure audit logs to be stored in an Amazon S3 bucket.
B.Enable encryption on the Redshift cluster.
C.Enable AWS CloudTrail to log Redshift queries.
D.Enable audit logging on the Redshift cluster.
E.Enable default encryption on the S3 bucket using SSE-S3 or SSE-KMS.
AnswersA, D, E

Audit logs can be delivered to an S3 bucket.

Why this answer

To audit queries, enable audit logging on the cluster, deliver the logs to an S3 bucket, and make sure that bucket encrypts them. Option D is correct because audit logging captures connection, user, and user activity (query) logs; the user activity log also requires the enable_user_activity_logging parameter to be set to true. Option A is correct because the audit logs are delivered to an S3 bucket.

Option E is correct because default encryption on the S3 bucket encrypts the log files at rest. Option C is wrong because CloudTrail records Redshift API calls, not the SQL queries run on the cluster. Option B is wrong because enabling encryption on the Redshift cluster encrypts the cluster's data at rest, not the audit logs in S3.

115
Multi-Selecteasy

A data engineer needs to store streaming data from multiple sources into Amazon S3. The data should be organized by source, date, and hour. The engineer wants to minimize processing overhead. Which THREE S3 features should the engineer use to achieve this? (Choose THREE.)

Select 3 answers
A.S3 Inventory to list objects and their metadata.
B.S3 Object Lock to prevent object modifications.
C.S3 Batch Operations to rename objects after upload.
D.S3 Event Notifications to invoke Lambda functions for data processing.
E.S3 prefixes to create a folder structure (e.g., source=.../date=.../hour=...).
AnswersA, D, E

Inventory helps audit and manage the stored data.

Why this answer

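S3 prefixes (E) organize objects into a source/date/hour hierarchy without any extra processing. S3 Inventory (A) provides a scheduled list of objects and their metadata. S3 Event Notifications (D) trigger downstream processing, such as Lambda functions, as objects arrive.

Batch Operations (C) are for bulk actions on existing objects; renaming objects after upload adds processing overhead instead of organizing data at write time. Object Lock (B) is for retention, not organization.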
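The prefix layout from option E can be produced at write time with a small key-building function. This is a sketch: `partitioned_key` and the file name are hypothetical, and the Hive-style `name=value` segments mirror the example in the option.

```python
from datetime import datetime, timezone

def partitioned_key(source: str, event_time: datetime, filename: str) -> str:
    """Build an S3 key of the form source=.../date=YYYY-MM-DD/hour=HH/filename,
    normalizing the timestamp to UTC so partitions do not depend on local time."""
    ts = event_time.astimezone(timezone.utc)
    return f"source={source}/date={ts:%Y-%m-%d}/hour={ts:%H}/{filename}"

key = partitioned_key(
    "sensor-a",
    datetime(2024, 5, 17, 9, 42, tzinfo=timezone.utc),
    "batch-0001.json",
)
print(key)  # source=sensor-a/date=2024-05-17/hour=09/batch-0001.json
```

Because the layout is decided when each object is written, no later rename or Batch Operations job is needed.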

116
MCQhard

A company uses Amazon Redshift for analytics. The data engineer notices that queries are slow and the system is experiencing high disk usage. The engineer suspects that the distribution style is suboptimal. Which action should the engineer take to improve query performance?

A.Convert all tables to use SORTKEY on the most frequently filtered column.
B.Increase the number of nodes in the cluster to distribute data across more slices.
C.Use the DISTSTYLE AUTO setting and analyze query patterns to let Redshift choose.
D.Set all tables to DISTSTYLE EVEN to distribute data evenly.
AnswerC

AUTO adapts distribution based on workload.

Why this answer

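Option C is correct because DISTSTYLE AUTO lets Redshift choose and adjust the distribution style based on table size and query patterns. Option D forces DISTSTYLE EVEN on every table, which can increase data movement for joins. Option B adds nodes and storage but does not fix a poor distribution style.

Option A changes sort keys, not distribution.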

117
MCQmedium

A data engineer is migrating an on-premises PostgreSQL database to Amazon RDS for PostgreSQL. The database is 2 TB in size and has a tight migration window. Which migration approach minimizes downtime?

A.Use AWS Database Migration Service (DMS) with full load only.
B.Use pg_dump to export the database and pg_restore to import into RDS.
C.Create a read replica in RDS and promote it when ready.
D.Use AWS DMS with ongoing replication to capture changes during migration.
AnswerD

Ongoing replication syncs changes until cutover, minimizing downtime.

Why this answer

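Option D is correct because AWS DMS with ongoing replication (change data capture) keeps the target in sync while the initial 2 TB load runs, so downtime is limited to the final cutover. Option A (full load only) does not capture changes made during the load, so writes must stop for the whole copy. Option B (pg_dump and pg_restore) is an offline dump and restore that is slow for 2 TB and requires downtime for its entire duration.

Option C is not possible as stated: RDS read replicas can only be created from an existing RDS DB instance, not from an on-premises database.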

118
MCQhard

A data engineer ran the command shown in the exhibit on the bucket 'my-data-lake'. The engineer then tries to delete an object version but receives an 'AccessDenied' error. The engineer has full S3 permissions via IAM. What is the most likely reason for the error?

A.Versioning is suspended on the bucket
B.MFA Delete is enabled, requiring multi-factor authentication
C.The bucket policy denies s3:DeleteObjectVersion
D.An S3 Object Lock retention policy is in effect
AnswerB

MFA Delete requires additional authentication to delete versions.

Why this answer

The command shown in the exhibit (likely `aws s3api put-bucket-versioning --bucket my-data-lake --versioning-configuration Status=Enabled,MFADelete=Enabled`) enables both versioning and MFA Delete on the bucket. When MFA Delete is enabled, any operation that permanently deletes an object version or changes the versioning state requires the request to include a multi-factor authentication token. Even though the engineer has full S3 permissions via IAM, the missing MFA token causes the 'AccessDenied' error.

This is a bucket-level setting that overrides IAM permissions for these specific operations.

Exam trap

The trap here is that candidates assume 'full S3 permissions via IAM' guarantees all operations succeed, but MFA Delete is a bucket-level condition that overrides IAM permissions for version deletion and versioning state changes, requiring explicit MFA authentication.

How to eliminate wrong answers

Option A is wrong because suspending versioning does not prevent deletion of existing object versions; it only stops new versions from being created, so the engineer could still delete versions with appropriate IAM permissions. Option C is wrong because nothing in the scenario indicates a bucket policy that denies s3:DeleteObjectVersion; the exhibit command configures versioning and MFA Delete, which explains the error on its own. Option D is wrong because the exhibit command does not configure S3 Object Lock, so there is no evidence of a retention period; MFA Delete is the setting that blocks version deletion without an MFA token.

119
MCQhard

A data engineer is designing a multi-Region disaster recovery solution for an Amazon DynamoDB table. The table must be available in a secondary Region with minimal data loss and automatic failover. Which feature should be used?

A.DynamoDB on-demand backup and restore in the secondary Region
B.DynamoDB global tables
C.DynamoDB point-in-time recovery (PITR)
D.DynamoDB cross-Region snapshot export to S3
AnswerB

Global tables replicate data across Regions and support automatic failover.

Why this answer

DynamoDB global tables provide a fully managed, multi-Region, multi-active database that automatically replicates writes across the selected AWS Regions, typically within a second. Because every replica table accepts reads and writes, the application can fail over to the secondary Region without restoring or promoting anything, which keeps data loss minimal and meets the disaster recovery requirements.

Exam trap

The trap here is that candidates often confuse point-in-time recovery (PITR) with cross-Region disaster recovery, but PITR is a single-Region feature that does not provide automatic failover or multi-Region replication.

How to eliminate wrong answers

Option A is wrong because on-demand backup and restore is a manual process that requires user intervention to initiate a restore in the secondary Region, not providing automatic failover or minimal data loss in real time. Option C is wrong because point-in-time recovery (PITR) protects against accidental writes or deletes within a single Region by restoring to a point in time, but it does not replicate data across Regions or enable automatic failover. Option D is wrong because cross-Region snapshot export to S3 is a manual, batch-oriented process that exports table data to Amazon S3 in another Region, requiring manual import and setup for failover, and does not provide automatic, continuous replication or failover.

120
MCQmedium

A company is using an Amazon RDS for MySQL database for its e-commerce platform. During a recent flash sale, the database experienced high read traffic, causing slow query performance. The company needs a solution that offloads read traffic with minimal application changes. Which action should be taken?

A.Enable DynamoDB Accelerator (DAX) on the RDS instance.
B.Migrate the database to Amazon Aurora and enable Aurora Global Database.
C.Implement Amazon ElastiCache for Redis to cache database queries.
D.Create an Amazon RDS read replica in the same region.
AnswerD

Read replicas offload read traffic from the primary instance with minimal application changes.

Why this answer

Creating an Amazon RDS read replica in the same region offloads read traffic from the primary DB instance by directing read queries to a read-only copy. This requires minimal application changes—only modifying the database connection string to point read queries to the replica endpoint. RDS read replicas use MySQL's native asynchronous replication, making them ideal for scaling read-heavy workloads like flash sales.
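The "minimal application changes" usually amount to sending reads to a second connection. The sketch below shows one simple routing rule; both endpoint hostnames are hypothetical, and treating statements by their first keyword is a simplifying assumption, not a feature of RDS.

```python
# Hypothetical endpoints; real values come from the RDS console or DescribeDBInstances.
PRIMARY_ENDPOINT = "shop-db.example.us-east-1.rds.amazonaws.com"
REPLICA_ENDPOINT = "shop-db-replica.example.us-east-1.rds.amazonaws.com"

# Statement verbs treated as read-only in this sketch.
READ_ONLY_VERBS = {"select", "show", "describe", "explain"}


def choose_endpoint(sql: str) -> str:
    """Send plain reads to the read replica and everything else to the primary."""
    words = sql.strip().lower().split()
    if not words:
        raise ValueError("empty SQL statement")
    # SELECT ... FOR UPDATE takes row locks, so it must run on the primary.
    locking_read = "for update" in " ".join(words)
    if words[0] in READ_ONLY_VERBS and not locking_read:
        return REPLICA_ENDPOINT
    return PRIMARY_ENDPOINT


print(choose_endpoint("SELECT name, price FROM products WHERE id = 42"))
print(choose_endpoint("UPDATE inventory SET qty = qty - 1 WHERE id = 42"))
```

Because replication is asynchronous, a replica can lag slightly behind the primary; reads that must see a write the same request just made are safer on the primary.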

Exam trap

The trap here is that candidates may choose ElastiCache (Option C) because it is a caching solution, but they overlook the explicit requirement for minimal application changes, which caching typically does not satisfy without code modifications.

How to eliminate wrong answers

Option A is wrong because DynamoDB Accelerator (DAX) is an in-memory cache for Amazon DynamoDB, not for RDS for MySQL; it cannot be enabled on an RDS instance. Option B is wrong because migrating to Aurora and enabling Aurora Global Database is designed for cross-region disaster recovery and global reads, not for offloading read traffic within a single region, and it requires significant application and migration effort. Option C is wrong because while ElastiCache for Redis can cache query results, it requires application code changes to implement caching logic (e.g., cache-aside pattern), which contradicts the requirement for minimal application changes.

121
Multi-Selectmedium

Which TWO actions can help optimize Amazon S3 storage costs for a data lake? (Choose two.)

Select 2 answers
A.Enable S3 Replication to another region
B.Use S3 Intelligent-Tiering for unpredictable access patterns
C.Use S3 Select to retrieve only needed data
D.Enable S3 Transfer Acceleration
E.Implement S3 Lifecycle policies to transition objects to Glacier
AnswersB, E

Intelligent-Tiering automatically optimizes costs based on access patterns.

Why this answer

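S3 Intelligent-Tiering automatically moves objects between access tiers as access patterns change: objects not accessed for 30 consecutive days move to the Infrequent Access tier, and after 90 days to the Archive Instant Access tier. There are no retrieval fees, only a small monthly per-object monitoring and automation charge. This suits a data lake with unpredictable access patterns because it optimizes costs without manual lifecycle rule adjustments. S3 Lifecycle policies (Option E) complement it for data whose aging is predictable, transitioning objects to S3 Glacier storage classes after a set number of days.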
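A minimal sketch of the two cost levers together, assuming boto3's `put_bucket_lifecycle_configuration` request shape: one rule archives a hypothetical `raw/` prefix to S3 Glacier Flexible Retrieval (storage class `GLACIER`), and another moves a hypothetical `curated/` prefix into Intelligent-Tiering. The prefixes and the 90-day threshold are illustrative assumptions.

```python
import json


def data_lake_lifecycle(raw_prefix: str, curated_prefix: str, archive_after_days: int) -> dict:
    """Lifecycle configuration combining a Glacier transition and Intelligent-Tiering."""
    return {
        "Rules": [
            {
                # Raw landing data is rarely reread after processing: archive it.
                "ID": "archive-raw",
                "Filter": {"Prefix": raw_prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": archive_after_days, "StorageClass": "GLACIER"}],
            },
            {
                # Curated data has unpredictable access: let Intelligent-Tiering decide.
                "ID": "tier-curated",
                "Filter": {"Prefix": curated_prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}],
            },
        ]
    }


config = data_lake_lifecycle("raw/", "curated/", 90)
print(json.dumps(config, indent=2))
# With boto3: s3.put_bucket_lifecycle_configuration(Bucket="my-data-lake", LifecycleConfiguration=config)
```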

Exam trap

The trap here is that candidates confuse cost optimization for storage (reducing stored data cost) with cost optimization for data transfer or retrieval, leading them to select options like S3 Select or Transfer Acceleration that address different cost dimensions.

122
MCQeasy

A company uses Amazon S3 to store customer documents. The data engineer needs to ensure that all objects uploaded to a specific S3 bucket are automatically encrypted with a customer-managed AWS KMS key. What should the data engineer do?

A.Use pre-signed URLs for all uploads that include encryption parameters.
B.Create a bucket policy that denies uploads without encryption.
C.Enable S3 Versioning on the bucket.
D.Set default encryption on the bucket to use SSE-KMS with the customer-managed key.
AnswerD

Default encryption automatically encrypts all objects with the specified KMS key.

Why this answer

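Setting default encryption on the bucket to SSE-KMS with the customer-managed key ensures that every new object is encrypted with that key, even when the upload request does not specify encryption. Option A is wrong because pre-signed URLs only authorize individual requests; they depend on every uploader generating them correctly and do not apply encryption automatically. Option B is wrong because a bucket policy can reject unencrypted uploads but does not encrypt anything itself.

Option C is wrong because S3 Versioning preserves object versions and has no effect on encryption.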
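As a sketch, assuming boto3's `put_bucket_encryption` request shape, the default-encryption rule for option D looks like this. The key ARN and bucket name are hypothetical placeholders.

```python
import json


def default_kms_encryption(kms_key_arn: str) -> dict:
    """Bucket default-encryption rule: SSE-KMS with a customer-managed key."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                # S3 Bucket Keys cut the number of requests S3 makes to KMS.
                "BucketKeyEnabled": True,
            }
        ]
    }


# Hypothetical key ARN.
config = default_kms_encryption("arn:aws:kms:us-east-1:111122223333:key/example-key-id")
print(json.dumps(config, indent=2))
# With boto3: s3.put_bucket_encryption(Bucket="customer-docs", ServerSideEncryptionConfiguration=config)
```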

123
MCQhard

A data engineer is troubleshooting an Amazon Redshift cluster that is running out of disk space. The engineer runs STV_PARTITIONS and notices that some slices have significantly more data than others. What is the most likely cause and solution?

A.Poorly chosen sort keys; redefine sort keys
B.Data distribution skew due to uneven distribution style; change distribution style to EVEN or correct KEY
C.Some nodes are underutilized; add more nodes
D.Concurrency scaling is disabled; enable concurrency scaling
AnswerB

Uneven distribution can cause some slices to fill up faster.

Why this answer

B is correct because significant variation in disk usage across slices indicates data distribution skew. (STV_PARTITIONS reports usage per node and disk partition; SVV_DISKUSAGE and the skew_rows column of SVV_TABLE_INFO narrow the skew down to specific slices and tables.) Uneven distribution causes some slices to fill faster, producing disk-full errors while other slices still have free space. Changing the distribution style to EVEN (for tables that are not joined on a common key) or correcting the KEY distribution (using a high-cardinality, evenly distributed column) rebalances data across slices.
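The ratio below mirrors how the skew_rows column of SVV_TABLE_INFO is defined: rows on the slice with the most rows divided by rows on the slice with the fewest. The per-slice counts are hypothetical; in practice they would come from the system tables.

```python
def skew_ratio(rows_per_slice: dict) -> float:
    """Rows on the fullest slice divided by rows on the emptiest slice.

    1.0 means perfectly even; larger values mean more skew.
    """
    counts = list(rows_per_slice.values())
    if not counts:
        raise ValueError("no slices supplied")
    fewest = min(counts)
    if fewest == 0:
        return float("inf")  # at least one slice holds no rows at all
    return max(counts) / fewest


# Hypothetical per-slice row counts for one table on a 4-slice cluster.
example = {0: 1_000_000, 1: 1_050_000, 2: 9_800_000, 3: 990_000}
print(round(skew_ratio(example), 2))
```

A ratio near 1.0 after changing the distribution style is the sign that the rebalance worked.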

Exam trap

The trap here is that candidates confuse sort keys (which improve query performance via zone maps) with distribution keys (which control data placement across slices), leading them to incorrectly select sort key redefinition as the fix for disk space skew.

How to eliminate wrong answers

Option A is wrong because sort keys affect query performance (min/max zone maps and block pruning), not how data is distributed across slices; disk space skew is a distribution issue, not a sort key issue. Option C is wrong because adding nodes increases total cluster capacity but does not fix existing data skew; the problem is uneven data placement, not insufficient total nodes. Option D is wrong because concurrency scaling handles workload bursts by adding transient compute capacity, not disk space; it does not affect how data is stored on existing slices.

124
Multi-Selecthard

A data engineer is designing a data store for a real-time analytics application that requires sub-millisecond read and write latency. The data is accessed via a REST API. Which AWS services should the engineer consider? (Choose THREE.)

Select 3 answers
A.Amazon S3
B.Amazon RDS for MySQL
C.Amazon ElastiCache for Redis
D.Amazon DynamoDB
E.DynamoDB Accelerator (DAX)
AnswersC, D, E

Redis provides sub-millisecond latency.

Why this answer

Amazon ElastiCache for Redis is correct because it is an in-memory data store that delivers sub-millisecond read and write latency, which is ideal for real-time analytics. It supports data structures such as strings, hashes, and sorted sets. Redis does not expose a REST API itself, so the REST interface is provided by an application layer in front of the cache. DynamoDB Accelerator (DAX) brings cached reads from DynamoDB into the microsecond range, and DynamoDB is selected because DAX requires it as the backing store and it is natively accessed through an HTTPS API; on its own, DynamoDB typically delivers single-digit-millisecond latency.

Exam trap

The trap here is assuming that DynamoDB alone provides sub-millisecond latency. DynamoDB's typical latency is single-digit milliseconds; DAX is what brings cached, eventually consistent reads below a millisecond for read-heavy workloads, while strongly consistent reads pass through DAX to DynamoDB.

125
MCQmedium

A company stores sensitive data in an S3 bucket. To meet compliance requirements, they must ensure that all objects are encrypted at rest using server-side encryption with AWS KMS. Which bucket policy statement should be applied to deny uploads that do not use the required encryption?

A.{"Effect":"Deny","Principal":"*","Action":"s3:PutObject","Resource":"arn:aws:s3:::bucketname/*","Condition":{"StringNotEquals":{"s3:x-amz-server-side-encryption":"aws:kms"}}}
B.{"Effect":"Deny","Principal":"*","Action":"s3:PutObject","Resource":"arn:aws:s3:::bucketname/*","Condition":{"StringNotEquals":{"s3:x-amz-server-side-encryption":"AES256"}}}
C.{"Effect":"Deny","Principal":"*","Action":"s3:PutObject","Resource":"arn:aws:s3:::bucketname/*","Condition":{"StringNotEquals":{"s3:x-amz-server-side-encryption-aws-kms-key-id":"arn:aws:kms:us-east-1:123456789012:key/abc123"}}}
D.{"Effect":"Deny","Principal":"*","Action":"s3:PutObject","Resource":"arn:aws:s3:::bucketname/*","Condition":{"Null":{"s3:x-amz-server-side-encryption":"true"}}}
AnswerA

Correctly denies PutObject if the encryption is not SSE-KMS.

Why this answer

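Option A is correct because its StringNotEquals condition denies any PutObject request whose s3:x-amz-server-side-encryption value is not aws:kms, so only SSE-KMS uploads are allowed. Because negated condition operators evaluate to true when the key is absent, uploads that omit the header are also denied. Option B is wrong because it requires AES256, which is SSE-S3, not SSE-KMS. Option C is wrong because it conditions on one specific KMS key ARN rather than on the encryption type, so it enforces a particular key instead of the stated requirement.

Option D is wrong because the Null condition denies only requests that omit the header entirely; an upload that specifies AES256 (SSE-S3) would still be allowed.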
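To see why A is the only correct statement, the sketch below models the Deny conditions of options A, B, and D against three kinds of uploads. It is a deliberately simplified model of IAM condition evaluation (one single-valued key, no other policy statements); the request context is represented as a plain dict keyed by condition key.

```python
ENC_KEY = "s3:x-amz-server-side-encryption"


def deny_option_a(context: dict) -> bool:
    # StringNotEquals: true when the value differs OR the key is absent.
    return context.get(ENC_KEY) != "aws:kms"


def deny_option_b(context: dict) -> bool:
    return context.get(ENC_KEY) != "AES256"


def deny_option_d(context: dict) -> bool:
    # Null with "true": matches only when the key is absent from the request.
    return ENC_KEY not in context


requests = {
    "no header": {},
    "SSE-S3": {ENC_KEY: "AES256"},
    "SSE-KMS": {ENC_KEY: "aws:kms"},
}
for label, ctx in requests.items():
    print(label, deny_option_a(ctx), deny_option_b(ctx), deny_option_d(ctx))
```

Only option A denies both the header-less upload and the SSE-S3 upload while allowing SSE-KMS.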

126
Multi-Selectmedium

A company is using Amazon S3 to store sensitive data. They need to ensure that all objects are encrypted at rest. Which combination of actions should be taken? (Choose TWO.)

Select 2 answers
A.Enable S3 Versioning on the bucket.
B.Enable MFA Delete on the bucket.
C.Configure S3 Access Points with network policies.
D.Use a bucket policy to deny PutObject requests that do not include the x-amz-server-side-encryption header.
E.Enable default encryption on the S3 bucket.
AnswersD, E

Policy enforces encryption at upload time.

Why this answer

Option D is correct because a bucket policy that denies PutObject requests lacking the `x-amz-server-side-encryption` header enforces encryption at the time of upload, ensuring that any object written without explicit encryption headers is rejected. Option E is correct because enabling default encryption on the S3 bucket automatically applies server-side encryption (SSE-S3 or SSE-KMS) to any object uploaded without specifying encryption headers, providing a fallback that covers all objects. Together, these actions ensure that every object stored in the bucket is encrypted at rest, either by explicit client request or by default bucket settings.

Exam trap

The trap here is that candidates often confuse data protection features like Versioning or MFA Delete with encryption controls, or assume that network policies (Access Points) somehow enforce encryption, when in reality only explicit bucket policies and default encryption settings directly ensure objects are encrypted at rest.

127
MCQeasy

A data engineer needs to store semi-structured JSON logs from multiple sources in a centralized data store for querying using SQL. The logs are immutable and need to be retained for 90 days. Which AWS service should be used?

A.Amazon RDS for MySQL.
B.Amazon DynamoDB.
C.Amazon S3 with Amazon Athena.
D.Amazon ElastiCache for Redis.
AnswerC

S3 stores JSON logs, Athena enables SQL queries.

Why this answer

Amazon S3 with Amazon Athena is the correct choice because S3 provides durable, cost-effective storage for immutable semi-structured JSON logs, and Athena runs serverless SQL queries directly against the data in S3, using its built-in JSON SerDe, without loading or transforming it first. An S3 Lifecycle expiration rule can delete the logs after 90 days to meet the retention requirement.

Exam trap

The trap here is that candidates may choose DynamoDB for its JSON support, overlooking that it is not designed for cost-effective retention of immutable logs or for ad hoc analytical SQL, while S3 with Athena directly addresses both requirements.

How to eliminate wrong answers

Option A is wrong because Amazon RDS for MySQL is a relational database designed for structured data with predefined schemas; it is not optimized for large volumes of immutable semi-structured JSON logs and costs more for retention. Option B is wrong because, although Amazon DynamoDB can store JSON documents and supports PartiQL (a SQL-compatible query language), its per-request and storage pricing make it expensive for retaining logs, and it is not designed for ad hoc analytical SQL across large datasets. Option D is wrong because Amazon ElastiCache for Redis is an in-memory cache for low-latency access to transient data, not durable long-term storage, and it does not support SQL queries.

128
MCQmedium

A company has an Amazon S3 bucket with versioning enabled. They want to automatically delete noncurrent versions of objects after 30 days. Which lifecycle rule action should be used?

A.Expiration
B.NoncurrentVersionExpiration
C.NoncurrentVersionTransition
D.AbortIncompleteMultipartUpload
AnswerB

This action deletes noncurrent versions after a specified number of days.

Why this answer

The NoncurrentVersionExpiration lifecycle action is specifically designed to remove noncurrent object versions after a specified number of days. Since versioning is enabled and the requirement is to delete noncurrent versions after 30 days, this action directly meets the goal without affecting current versions or other lifecycle aspects.
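As a sketch, assuming boto3's lifecycle configuration shape, the rule for this scenario contains only a NoncurrentVersionExpiration action. The empty prefix (whole bucket) and the rule ID are illustrative choices.

```python
def expire_noncurrent_versions(days: int, prefix: str = "") -> dict:
    """Lifecycle configuration that deletes noncurrent versions after `days` days."""
    if days < 1:
        raise ValueError("NoncurrentDays must be a positive integer")
    return {
        "Rules": [
            {
                "ID": f"expire-noncurrent-after-{days}-days",
                "Filter": {"Prefix": prefix},  # empty prefix applies to the whole bucket
                "Status": "Enabled",
                # Acts only on noncurrent versions; current versions are untouched.
                "NoncurrentVersionExpiration": {"NoncurrentDays": days},
            }
        ]
    }


config = expire_noncurrent_versions(30)
print(config)
```

Note that the rule has no top-level `Expiration` element, so current versions are never affected.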

Exam trap

The trap here is confusing NoncurrentVersionExpiration with Expiration, as candidates often mistakenly apply the standard Expiration action to delete old versions, not realizing it only affects the current version.

How to eliminate wrong answers

Option A is wrong because, in a versioning-enabled bucket, Expiration acts on the current version by adding a delete marker (which turns the current version into a noncurrent one) rather than removing noncurrent versions; in a nonversioned bucket it permanently deletes the object. Option C is wrong because NoncurrentVersionTransition moves noncurrent versions to a different storage class (e.g., S3 Glacier) but does not delete them. Option D is wrong because AbortIncompleteMultipartUpload only aborts incomplete multipart uploads that are older than a specified number of days and has no effect on existing object versions.

129
Multi-Selecthard

A company is using Amazon Redshift for a data warehouse. The data engineer needs to improve query performance for a table that is frequently joined with other tables on a specific column. Which THREE actions would help improve join performance? (Choose THREE.)

Select 3 answers
A.Set the distribution style to KEY on the join column
B.Apply a SORTKEY on the join column
C.Use DISTKEY on the join column to co-locate data
D.Use DISTSTYLE ALL to replicate the table to all nodes
E.Change the column data type to a fixed-length CHAR
AnswersA, B, C

Option A is equivalent to option C; both co-locate rows with matching join keys.

Why this answer

Setting the distribution style to KEY on the join column (options A and C describe the same action) places rows with the same join key value on the same slice. Redshift can then perform a co-located join, avoiding the expensive redistribution or broadcast of data across the network during query execution. Adding a sort key on the join column (option B) lets Redshift use a merge join when both tables are distributed and sorted on that column, which is typically faster than a hash join.
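The toy model below illustrates why KEY distribution co-locates joins: both tables hash the same join column with the same function, so matching rows always land on the same slice. CRC32 is only a stand-in; Redshift's internal hash function is different, and the 4-slice cluster and sample rows are hypothetical.

```python
import zlib

NUM_SLICES = 4  # hypothetical cluster size


def slice_for(key: str) -> int:
    # Stand-in hash; Redshift uses its own internal hash function.
    return zlib.crc32(key.encode("utf-8")) % NUM_SLICES


customers = [("c1", "Alice"), ("c2", "Bob"), ("c3", "Chen")]
orders = [("o1", "c1"), ("o2", "c3"), ("o3", "c1"), ("o4", "c2")]

# DISTKEY(customer_id) on both tables: hash the join column, not the primary key.
customer_slice = {cid: slice_for(cid) for cid, _ in customers}
order_slice = {oid: slice_for(cid) for oid, cid in orders}


def join_is_colocated() -> bool:
    """True if every order sits on the same slice as its customer."""
    return all(order_slice[oid] == customer_slice[cid] for oid, cid in orders)


print(customer_slice, order_slice, join_is_colocated())
```

If the orders table were instead distributed on its own order ID, matching rows would generally land on different slices and Redshift would have to move data before joining.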

Exam trap

The trap here is that DISTSTYLE ALL (option D) also avoids data redistribution, but it copies the entire table to every node, which suits only small, slowly changing dimension tables. For a table frequently joined on a specific column, the expected answer is KEY distribution plus a sort key on that column (A, B, and C), so D is a distractor even though it can help joins in other situations.

130
MCQeasy

A company wants to store data from thousands of IoT devices with varying data rates. The data must be stored in a schema-on-read fashion and support SQL queries. Which AWS service should be used?

A.Amazon RDS for MySQL
B.Amazon S3 with Amazon Athena
C.Amazon DynamoDB
D.Amazon Redshift
AnswerB

S3 provides scalable storage, and Athena enables SQL queries with schema-on-read.

Why this answer

Option B is correct because Amazon Athena queries data directly in S3 using SQL and applies the schema at query time (schema-on-read). Option A is wrong because RDS for MySQL is schema-on-write. Option C is wrong because DynamoDB is a NoSQL store that is not designed for SQL analytics over raw IoT data.

Option D is wrong because Redshift (outside of Redshift Spectrum) is schema-on-write.

131
Multi-Selecteasy

Which TWO of the following are features of Amazon RDS Multi-AZ deployments? (Choose 2.)

Select 2 answers
A.Read replicas in the same region for offloading read traffic.
B.Automatic failover to a standby instance in case of an AZ failure.
C.A standby instance that is not accessible for reads or writes.
D.Automatic storage scaling based on usage.
E.Synchronous replication across AWS Regions.
AnswersB, C

Multi-AZ automatically fails over to the standby in another AZ.

Why this answer

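Option B is correct because Multi-AZ provides automatic failover to a standby instance in another Availability Zone. Option C is correct because, in a Multi-AZ DB instance deployment, the standby exists only for failover and cannot serve reads or writes. Option A is wrong because read replicas are a separate feature from Multi-AZ.

Option D is wrong because storage autoscaling is a separate RDS feature, not part of Multi-AZ. Option E is wrong because Multi-AZ replicates synchronously within a single Region, not across Regions.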

132
MCQmedium

A data engineer is designing a data store for a real-time leaderboard application that requires sub-millisecond read and write latency. The leaderboard stores scores for millions of users and needs to be sorted by score. Which AWS service should the engineer use?

A.Amazon RDS for PostgreSQL with an index on score
B.Amazon DynamoDB with a global secondary index on score
C.Amazon ElastiCache for Redis with a sorted set
D.Amazon Neptune with a graph model
AnswerC

Redis sorted sets provide O(log N) operations and sub-millisecond latency.

Why this answer

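Option C is correct because ElastiCache for Redis sorted sets keep members ordered by score, with O(log N) updates and sub-millisecond latency. Option A (RDS for PostgreSQL) can index scores but is disk-based and slower. Option B (DynamoDB) is fast, but a GSI on score would either need a single hot partition key or scatter the ranking across partitions.

Option D (Neptune) is a graph database and does not fit a ranking workload.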
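The model below mimics the semantics of the Redis commands a leaderboard relies on (ZADD, ZREVRANGE, ZREVRANK), including Redis's rule that equal scores are ordered by member name. It is an in-process sketch, not Redis: Redis backs sorted sets with a skip list plus a hash table for O(log N) updates, whereas this Python list with `bisect` has O(N) inserts. Member names and scores are hypothetical; a real deployment would send these commands to ElastiCache through a Redis client library.

```python
import bisect


class Leaderboard:
    """In-process model of a Redis sorted set used as a leaderboard."""

    def __init__(self) -> None:
        self._scores = {}    # member -> score
        self._ordered = []   # ascending list of (score, member) tuples

    def zadd(self, member: str, score: float) -> None:
        """Insert a member or update its score."""
        old = self._scores.get(member)
        if old is not None:
            del self._ordered[bisect.bisect_left(self._ordered, (old, member))]
        self._scores[member] = score
        bisect.insort(self._ordered, (score, member))

    def zrevrange(self, start: int, stop: int) -> list:
        """Members from highest to lowest score; stop is inclusive and may be negative."""
        if stop < 0:
            stop += len(self._ordered)
        descending = self._ordered[::-1]
        return [(member, score) for score, member in descending[start:stop + 1]]

    def zrevrank(self, member: str) -> int:
        """0-based rank with the highest score ranked 0."""
        i = bisect.bisect_left(self._ordered, (self._scores[member], member))
        return len(self._ordered) - 1 - i


lb = Leaderboard()
for player, points in [("alice", 120), ("bob", 95), ("carol", 150), ("bob", 200)]:
    lb.zadd(player, points)
print(lb.zrevrange(0, 2), lb.zrevrank("alice"))
```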

133
Multi-Selecthard

Which THREE steps are recommended for migrating an on-premises Oracle database to Amazon RDS for Oracle with minimal downtime? (Choose 3.)

Select 3 answers
A.Set up a VPN or Direct Connect between on-premises and AWS
B.Disable archiving on the source database
C.Use AWS Schema Conversion Tool (SCT) to convert the schema
D.Perform a full load migration without change data capture
E.Use AWS Database Migration Service (DMS) for ongoing replication
AnswersA, C, E

Secure connectivity is essential.

Why this answer

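Options A, C, and E are correct because a VPN or AWS Direct Connect link provides secure connectivity between the data center and AWS, SCT assesses and converts the schema, and DMS ongoing replication (change data capture) keeps the target in sync until cutover. Option B is wrong because DMS change data capture from Oracle reads archived redo logs, so the source must stay in ARCHIVELOG mode; disabling archiving would also prevent point-in-time recovery. Option D is wrong because a full load without change data capture requires stopping writes for the entire load, which does not minimize downtime.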

134
Multi-Selecteasy

A data engineer is setting up Amazon S3 bucket policies for a data lake. The security team requires that all objects uploaded to the bucket be encrypted at rest using server-side encryption. Which TWO methods can enforce encryption at upload time?

Select 2 answers
A.Enable S3 Transfer Acceleration.
B.Enable AWS CloudTrail to monitor uploads.
C.Enable AWS KMS automatic key rotation.
D.Enable S3 default encryption on the bucket.
E.Create a bucket policy that denies PutObject if the x-amz-server-side-encryption header is missing.
AnswersD, E

Default encryption automatically encrypts objects.

Why this answer

Option D is correct because enabling S3 default encryption on the bucket automatically applies server-side encryption (SSE-S3 or SSE-KMS) to all objects uploaded without an encryption header, ensuring encryption at rest. Option E is correct because a bucket policy with a Deny effect on PutObject when the x-amz-server-side-encryption header is missing enforces encryption at upload time by rejecting unencrypted uploads, providing a complementary enforcement mechanism.

Exam trap

The trap here is that candidates often confuse default encryption (which applies encryption automatically but does not block unencrypted uploads) with a bucket policy that explicitly denies unencrypted uploads, thinking either alone is sufficient, when both are needed for full enforcement.

135
MCQmedium

A data engineer applies the bucket policy shown in the exhibit to an S3 bucket. The bucket contains sensitive data that must be encrypted at rest and accessed only over HTTPS. Which of the following statements is true?

A.The policy allows both HTTP and HTTPS access.
B.The policy allows anonymous access to list objects in the bucket.
C.The policy enforces that all PutObject requests must include the x-amz-server-side-encryption header with value AES256.
D.The policy requires the use of AWS KMS for server-side encryption.
AnswerC

The policy requires PutObject requests to set s3:x-amz-server-side-encryption to AES256.

Why this answer

Option C is correct because the bucket policy includes a condition that denies PutObject requests unless the `s3:x-amz-server-side-encryption` header is present and set to `AES256`. This enforces server-side encryption with S3-managed keys (SSE-S3) for all uploads, ensuring data at rest is encrypted.

Exam trap

AWS often tests the distinction between SSE-S3 (`AES256`) and SSE-KMS (`aws:kms`) in bucket policy conditions, and candidates may mistakenly think the policy requires KMS when it actually specifies AES256.

How to eliminate wrong answers

Option A is wrong because the policy includes a `Deny` statement that blocks requests when `aws:SecureTransport` is `false`, which effectively denies HTTP access and allows only HTTPS. Option B is wrong because the policy does not grant any `s3:ListBucket` permission to anonymous principals; it only denies requests that fail encryption or transport conditions, but does not allow anonymous listing. Option D is wrong because the policy requires the `x-amz-server-side-encryption` header with value `AES256`, which corresponds to SSE-S3, not AWS KMS (which would require `aws:kms`).

136
MCQmedium

A company is migrating its on-premises PostgreSQL database to Amazon RDS for PostgreSQL. The database is 5 TB in size and supports a critical application that requires less than 30 minutes of downtime. The company has a 1 Gbps network connection to AWS. The data engineering team plans to use AWS Database Migration Service (DMS) with change data capture (CDC) to keep the target in sync. During the full load phase, DMS is taking longer than expected, and the team is concerned about meeting the downtime window. Which action should the team take to speed up the full load?

A.Increase the compute capacity of the target RDS instance.
B.Enable DMS validation to ensure data integrity.
C.Use AWS Snowball Edge to transfer the data offline.
D.Create multiple DMS tasks to load different tables in parallel.
AnswerD

Parallel tasks increase throughput.

Why this answer

Creating multiple DMS tasks to load different tables in parallel (Option D) is the correct action. A single task loads only a limited number of tables at once (controlled by the MaxFullLoadSubTasks task setting, default 8), so splitting the largest tables across several tasks uses more of the 1 Gbps link and shortens the full load. With CDC in place, the full load runs while the source stays online; the 30-minute downtime window applies to the final cutover, and a shorter full load reduces the backlog of changes that must be applied before that cutover.
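The sketch below does two things under stated assumptions. First, it estimates the raw copy time for 5 TB over a 1 Gbps link. Second, it splits tables across tasks with a simple greedy size balance and emits one DMS table-mapping document (selection rules) per task. The table names, sizes, schema name, and two-task split are hypothetical, and greedy balancing is just one reasonable heuristic.

```python
import json


def transfer_hours(size_tb: float, link_gbps: float, utilization: float = 1.0) -> float:
    """Hours to move size_tb decimal terabytes over a link running at the given utilization."""
    bits = size_tb * 1e12 * 8
    return bits / (link_gbps * 1e9 * utilization) / 3600


def split_tables(table_sizes_gb: dict, num_tasks: int) -> list:
    """Greedy split: give the largest remaining table to the currently lightest task."""
    tasks = [[] for _ in range(num_tasks)]
    loads = [0.0] * num_tasks
    for table, size in sorted(table_sizes_gb.items(), key=lambda kv: kv[1], reverse=True):
        lightest = loads.index(min(loads))
        tasks[lightest].append(table)
        loads[lightest] += size
    return tasks


def table_mapping(schema: str, tables: list) -> str:
    """DMS table-mapping JSON with one selection rule per table."""
    rules = [
        {
            "rule-type": "selection",
            "rule-id": str(n),
            "rule-name": f"include-{table}",
            "object-locator": {"schema-name": schema, "table-name": table},
            "rule-action": "include",
        }
        for n, table in enumerate(tables, start=1)
    ]
    return json.dumps({"rules": rules})


# Hypothetical table sizes (GB) adding up to the 5 TB database.
sizes = {"orders": 2000, "order_items": 1500, "customers": 900, "products": 400, "audit_log": 200}
print(f"{transfer_hours(5, 1):.1f} h for 5 TB at full line rate")
for i, tables in enumerate(split_tables(sizes, 2), start=1):
    print(f"task {i}:", table_mapping("public", tables))
```

At full line rate the copy alone takes about 11 hours, far longer than the downtime window. This is why the full load runs while the source stays online and only the CDC cutover counts as downtime. Each mapping string would be passed as the TableMappings parameter of a separate replication task.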

Exam trap

The trap here is that candidates assume increasing the target instance size (Option A) will speed up the full load, overlooking that the bottleneck is the limited load parallelism of a single DMS task, not the target's write capacity.

How to eliminate wrong answers

Option A is wrong because a larger target instance does not address the limited load parallelism of a single DMS task; the target could ingest data faster, but the task still loads only a few tables at a time. Option B is wrong because enabling DMS validation adds overhead by comparing source and target rows, which slows the migration rather than speeding it up. Option C is wrong because AWS Snowball Edge is an offline transfer device that takes days to ship and load, which does not help a migration that already relies on DMS and CDC over the network; the 1 Gbps connection is sufficient if parallelism is used.

137
MCQeasy

A data engineer needs to store semi-structured JSON logs from an application for up to 30 days, with infrequent access. Which storage solution is the most cost-effective?

A.Amazon S3 Glacier Deep Archive
B.Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA)
C.Amazon S3 Standard
D.Amazon S3 Standard-Infrequent Access (S3 Standard-IA)
AnswerD

Cost-effective for infrequently accessed data with rapid access needs.

Why this answer

Amazon S3 Standard-Infrequent Access (S3 Standard-IA) is the most cost-effective choice for storing semi-structured JSON logs for up to 30 days with infrequent access. It offers low storage cost (compared to S3 Standard) while providing low-latency retrieval and high durability (99.999999999%) across multiple Availability Zones, making it ideal for data that is accessed less frequently but needs immediate availability when requested.

Exam trap

The trap here is that candidates often choose S3 One Zone-IA (Option B) because its per-GB storage price is lower, overlooking that it keeps data in a single Availability Zone, so the logs could be lost if that AZ is destroyed. The expected answer accepts a slightly higher price in exchange for multi-AZ resilience.

How to eliminate wrong answers

Option A is wrong because Amazon S3 Glacier Deep Archive is designed for long-term archival (retrieval times of 12-48 hours) and has a minimum storage duration of 180 days, making it unsuitable for a 30-day retention period with infrequent but potentially immediate access needs. Option B is wrong because S3 One Zone-IA stores data in a single Availability Zone, so it lacks the multi-AZ resilience expected for these logs; its lower per-GB price does not outweigh the risk of losing data if that AZ fails. Option C is wrong because Amazon S3 Standard is optimized for frequently accessed data and has a higher storage price per GB, making it overpriced for logs that are accessed infrequently over a 30-day period.

138
MCQeasy

A company stores its application logs in an Amazon S3 bucket. The logs are accessed frequently for the first 30 days, after which they are rarely accessed but must be retained for 7 years for compliance. The company wants to optimize storage costs while maintaining immediate retrieval availability for the first 30 days and the ability to retrieve logs within 12 hours after that. Which lifecycle policy should the data engineer configure?

A.Delete objects after 30 days to minimize storage costs.
B.Transition objects to S3 Standard-IA after 30 days and then to S3 Glacier Deep Archive after 1 year.
C.Transition objects to S3 One Zone-IA after 30 days and delete after 7 years.
D.Transition objects to S3 Glacier Flexible Retrieval after 30 days and delete after 7 years.
AnswerB

Objects stay in S3 Standard for the first 30 days, then move to Standard-IA (still immediate retrieval) and later to Deep Archive for cost-effective long-term retention.

Why this answer

Option B is correct because objects stay in S3 Standard for the first 30 days with immediate access, then move to S3 Standard-IA, which is cheaper for rarely accessed data but still retrieves immediately, and after 1 year move to S3 Glacier Deep Archive, the lowest-cost class, whose standard retrievals complete within 12 hours. Option C is incorrect because S3 One Zone-IA keeps data in a single Availability Zone, which is not resilient enough for 7-year compliance retention. Option D is incorrect because, although S3 Glacier Flexible Retrieval meets the 12-hour requirement (standard retrievals take 3-5 hours), it costs more than Deep Archive for multi-year retention.

Option A is incorrect because deleting after 30 days violates the 7-year retention requirement.
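As a sketch, assuming boto3's lifecycle configuration shape, option B translates into a single rule with two transitions. The `logs/` prefix and rule ID are hypothetical.

```python
def log_retention_lifecycle(prefix: str = "logs/") -> dict:
    """Lifecycle rule for option B: Standard, then Standard-IA at 30 days, then Deep Archive at 365 days."""
    return {
        "Rules": [
            {
                "ID": "logs-ia-then-deep-archive",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    # Days 0-29: objects stay in S3 Standard for frequent, immediate access.
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    # From day 365: Deep Archive, lowest cost, standard retrieval within 12 hours.
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


config = log_retention_lifecycle()
print(config)
```

Option B does not mention deletion. If the logs should be removed once the compliance period ends, an `"Expiration": {"Days": 2555}` element (about 7 years) could be added to the same rule.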

139
MCQeasy

A data engineer is migrating an on-premises PostgreSQL database to Amazon RDS for PostgreSQL. The database is 2 TB in size. The engineer needs to minimize downtime. Which AWS service should be used for the migration?

A.AWS Data Pipeline
B.AWS Database Migration Service (DMS)
C.AWS Snowball
D.Amazon S3
AnswerB

DMS supports continuous replication with minimal downtime.

Why this answer

AWS Database Migration Service (DMS) supports live migration with minimal downtime by using change data capture (CDC) to replicate ongoing changes, so Option B is correct. Option D is wrong because Amazon S3 is object storage, not a database migration service.

Option C is wrong because Snowball performs offline bulk transfer, which on its own would require downtime while the device is shipped and loaded. Option A is wrong because AWS Data Pipeline orchestrates data processing workflows and is not a database migration tool.

140
Multi-Selectmedium

Which TWO options are valid ways to encrypt data at rest in Amazon S3? (Choose two.)

Select 2 answers
A.Client-Side Encryption
B.SSL/TLS Encryption
C.Server-Side Encryption with S3-Managed Keys (SSE-S3)
D.IAM Policy Encryption
E.Server-Side Encryption with AWS KMS-Managed Keys (SSE-KMS)
AnswersC, E

SSE-S3 is a server-side encryption option.

Why this answer

Server-Side Encryption with S3-Managed Keys (SSE-S3) is a valid method for encrypting data at rest in Amazon S3. It uses AES-256 to encrypt objects automatically when they are written to S3 and decrypt them when they are accessed, with the keys managed entirely by AWS; the only client-side step is setting the `x-amz-server-side-encryption` header to `AES256` (or relying on bucket default encryption). SSE-KMS (Option E) works the same way but uses keys managed in AWS KMS, selected with the header value `aws:kms`.
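The sketch below builds the encryption arguments for an upload, assuming boto3's `put_object` parameter names (`ServerSideEncryption`, `SSEKMSKeyId`). The mode labels are this example's own, and the key ARN is a hypothetical placeholder.

```python
from typing import Optional


def sse_params(mode: str, kms_key_id: Optional[str] = None) -> dict:
    """Extra put_object arguments for SSE-S3 or SSE-KMS."""
    if mode == "SSE-S3":
        return {"ServerSideEncryption": "AES256"}
    if mode == "SSE-KMS":
        params = {"ServerSideEncryption": "aws:kms"}
        if kms_key_id:
            # Omit the key ID to use the AWS managed key (aws/s3).
            params["SSEKMSKeyId"] = kms_key_id
        return params
    raise ValueError(f"unsupported mode: {mode}")


print(sse_params("SSE-S3"))
print(sse_params("SSE-KMS", "arn:aws:kms:us-east-1:111122223333:key/example-key-id"))
# With boto3: s3.put_object(Bucket="my-bucket", Key="a.txt", Body=b"...", **sse_params("SSE-S3"))
```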

Exam trap

The trap here is that candidates confuse encryption in transit (SSL/TLS) with encryption at rest, or think IAM policies can encrypt data. Client-side encryption does leave data encrypted at rest, but it is performed by the client before upload rather than by S3; the server-side options (SSE-S3, SSE-KMS, SSE-C) are the ones S3 itself applies.

141
MCQhard

A company has an Amazon Redshift cluster with a mix of frequently accessed hot data and rarely accessed cold data. They want to reduce storage costs without affecting query performance for the hot data. Which strategy is MOST effective?

A.Use RA3 nodes with managed storage to automatically offload cold data to Amazon S3.
B.Reduce the number of nodes and increase the number of slices.
C.Create external tables in Redshift Spectrum to query cold data in S3.
D.Use Dense Compute nodes and unload cold data to Amazon S3 manually.
AnswerA

RA3 nodes use managed storage that automatically moves cold data to S3, reducing local storage costs.

Why this answer

RA3 nodes with managed storage automatically separate compute and storage, offloading cold data to Amazon S3 while keeping hot data on local SSD for fast queries. This reduces storage costs without manual intervention or affecting hot data performance.

Exam trap

The trap here is that candidates may choose Redshift Spectrum (Option C) thinking it automatically offloads cold data, but Spectrum requires manual external table creation and does not integrate with the cluster's automatic storage tiering.

How to eliminate wrong answers

Option B is wrong because reducing nodes and increasing slices does not address cold data storage; it changes cluster configuration without reducing storage costs for cold data. Option C is wrong because creating external tables in Redshift Spectrum allows querying cold data in S3 but does not automatically offload cold data from the cluster; it requires manual data movement and schema management. Option D is wrong because Dense Compute nodes are compute-optimized and do not support managed storage offloading; manually unloading cold data to S3 adds operational overhead and does not leverage automatic tiering.

142
MCQhard

A data engineer runs this CLI command. Which query is MOST efficient against this table?

A.Query the CustomerIndex GSI by CustomerID and OrderDate.
B.Scan the table to find all orders for a CustomerID.
C.Create a local secondary index on CustomerID.
D.Query by OrderID and filter by OrderDate.
AnswerA

The GSI is designed for this query pattern.

Why this answer

The CLI command likely created a global secondary index (GSI) named CustomerIndex on CustomerID and OrderDate. Querying this GSI directly is the most efficient because it uses the index's sort key to retrieve only the relevant items without scanning the entire table, minimizing read capacity consumption.
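Because the exhibit is not shown, the sketch below assumes what the explanation infers: a GSI named `CustomerIndex` with `CustomerID` as its partition key and `OrderDate` as its sort key, with dates stored as ISO-8601 strings. The table name `Orders` and the sample values are hypothetical. The dict follows the low-level `Query` request shape (typed attribute values).

```python
def customer_orders_query(customer_id: str, start_date: str, end_date: str) -> dict:
    """Query parameters for one customer's orders in a date range, via the GSI."""
    return {
        "TableName": "Orders",  # hypothetical table name
        "IndexName": "CustomerIndex",
        # Partition key equality plus a range condition on the sort key.
        "KeyConditionExpression": "CustomerID = :cid AND OrderDate BETWEEN :start AND :end",
        "ExpressionAttributeValues": {
            ":cid": {"S": customer_id},
            ":start": {"S": start_date},
            ":end": {"S": end_date},
        },
    }


params = customer_orders_query("C-1001", "2024-01-01", "2024-01-31")
print(params)
# With boto3: boto3.client("dynamodb").query(**params)
```

Queries against a GSI are always eventually consistent, so strongly consistent reads still have to go to the base table's own key.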

Exam trap

The trap here is that candidates often default to scanning or creating a local secondary index without recognizing that a GSI already exists and is purpose-built for the query pattern, leading to inefficient or invalid solutions.

How to eliminate wrong answers

Option B is wrong because scanning the entire table to find orders for a specific CustomerID reads every item instead of using an index to locate the data directly, which is slow and costly. Option C is wrong because a local secondary index must use the same partition key as the base table (presumably OrderID), so it cannot support lookups by CustomerID, and local secondary indexes can only be defined when the table is created. Option D is wrong because querying by OrderID does not match the access pattern of retrieving a customer's orders, and a filter on OrderDate is applied only after items are read, so it still consumes read capacity for every item the query reads.

143
MCQhard

A company is using Amazon DynamoDB with on-demand capacity for a gaming application that experiences unpredictable traffic spikes. The application consistently sees 'ProvisionedThroughputExceededException' errors during spikes. The data engineer needs to resolve this issue without changing the application code. What should the engineer do?

A.Switch the table to on-demand capacity mode
B.Enable DynamoDB Accelerator (DAX) to cache read requests
C.Increase the read capacity units
D.Enable auto scaling for the table with a higher maximum capacity
AnswerA

On-demand mode automatically scales to handle traffic spikes without throttling.

Why this answer

Although the question states that the table uses on-demand capacity, the 'ProvisionedThroughputExceededException' error indicates that the table is actually in provisioned mode. Switching to on-demand capacity mode lets DynamoDB scale throughput automatically with traffic spikes. This removes the provisioned-capacity limit that causes the throttling and requires no application code changes.
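Here is a minimal Python sketch of the check and the fix, assuming boto3-style request and response dictionaries. It reads the billing mode from a DescribeTable response and builds the UpdateTable parameters that switch the table to on-demand (`BillingMode="PAY_PER_REQUEST"`). The table name "GameScores" and the sample response are hypothetical.

```python
# Sketch: detect provisioned mode from a DescribeTable response and build the
# UpdateTable request that switches the table to on-demand (PAY_PER_REQUEST).
# The table name "GameScores" and the sample response are hypothetical.


def current_billing_mode(describe_table_response: dict) -> str:
    """Return 'PROVISIONED' or 'PAY_PER_REQUEST' from a DescribeTable response."""
    summary = describe_table_response["Table"].get("BillingModeSummary")
    # Tables that have only ever used provisioned mode may omit BillingModeSummary.
    return summary["BillingMode"] if summary else "PROVISIONED"


def switch_to_on_demand_params(table_name: str) -> dict:
    """Return kwargs for boto3's client("dynamodb").update_table(...)."""
    return {"TableName": table_name, "BillingMode": "PAY_PER_REQUEST"}


if __name__ == "__main__":
    # Trimmed-down example of a DescribeTable response for a provisioned table.
    sample = {"Table": {"TableName": "GameScores",
                        "ProvisionedThroughput": {"ReadCapacityUnits": 100,
                                                  "WriteCapacityUnits": 100}}}
    if current_billing_mode(sample) == "PROVISIONED":
        print(switch_to_on_demand_params("GameScores"))
    # With AWS credentials configured:
    #   import boto3
    #   boto3.client("dynamodb").update_table(**switch_to_on_demand_params("GameScores"))
```

The change is purely a table setting, which is why it satisfies the "no code changes" constraint.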

Exam trap

The trap here is that candidates assume the table is already on-demand because the question states 'on-demand capacity,' but the error message 'ProvisionedThroughputExceededException' reveals the table is actually in provisioned mode, testing whether you recognize the mismatch between the stated configuration and the error.

How to eliminate wrong answers

Option B is wrong because DynamoDB Accelerator (DAX) only caches reads to reduce latency and read load. It does not relieve write throttling, and using DAX requires the application to switch to the DAX client, which is a code change. Option C is wrong because increasing read capacity units addresses only read throughput, while the exception can be raised for reads or writes. It is also a manual change that cannot anticipate unpredictable spikes. Option D is wrong because auto scaling still operates in provisioned mode and reacts to sustained utilization, so requests can be throttled during sudden spikes before capacity scales up.

144
MCQeasy

A company is using Amazon S3 to store sensitive data. They need to automatically transition objects to S3 Glacier Deep Archive after 90 days and delete them after 7 years. Which S3 lifecycle configuration action should be used?

A.Transition
B.AbortIncompleteMultipartUpload
C.Expiration
D.NoncurrentVersionExpiration
AnswerA

Transition moves objects to another storage class based on age.

Why this answer

Option A is correct because the S3 lifecycle 'Transition' action is specifically designed to move objects between storage classes. To automatically move objects to S3 Glacier Deep Archive after 90 days, you define a transition rule with a 'Days' value of 90 and a 'StorageClass' of 'DEEP_ARCHIVE'. This action directly meets the requirement for transitioning data to a colder storage tier.
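The rule described above can be sketched as a lifecycle configuration in the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts. The Transition action handles the 90-day move to Glacier Deep Archive. The 7-year deletion from the question stem is covered by an Expiration action in the same rule. The rule ID, the empty prefix filter, and the bucket name are hypothetical, and 7 years is approximated as 2,555 days.

```python
import json

# Sketch: one S3 lifecycle rule with a Transition (Deep Archive at day 90)
# and an Expiration (delete at ~7 years). Rule ID and prefix are hypothetical;
# 7 years is approximated as 7 * 365 days (leap days ignored).
DAYS_TO_DEEP_ARCHIVE = 90
DAYS_TO_DELETE = 7 * 365  # 2555 days

lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-delete",
            "Filter": {"Prefix": ""},  # empty prefix: applies to every object
            "Status": "Enabled",
            # Transition: move objects to a colder storage class; data is kept.
            "Transitions": [
                {"Days": DAYS_TO_DEEP_ARCHIVE, "StorageClass": "DEEP_ARCHIVE"}
            ],
            # Expiration: delete objects once they reach the given age.
            "Expiration": {"Days": DAYS_TO_DELETE},
        }
    ]
}

if __name__ == "__main__":
    print(json.dumps(lifecycle_configuration, indent=2))
    # With AWS credentials configured (bucket name is hypothetical):
    #   import boto3
    #   boto3.client("s3").put_bucket_lifecycle_configuration(
    #       Bucket="example-sensitive-data",
    #       LifecycleConfiguration=lifecycle_configuration)
```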

Exam trap

The trap here is that candidates often confuse 'Expiration' with 'Transition', thinking that deleting objects after a period is the same as moving them to a colder storage class, but expiration deletes data while transition preserves it in a different tier.

How to eliminate wrong answers

Option B is wrong because 'AbortIncompleteMultipartUpload' cleans up multipart uploads that are not completed within a specified number of days; it does not transition or delete objects. Option C is wrong because 'Expiration' deletes objects once they reach a specified age. It covers the 7-year deletion requirement and would be configured alongside the transition in the same rule, but it cannot move objects to Glacier Deep Archive at 90 days. Option D is wrong because 'NoncurrentVersionExpiration' deletes noncurrent versions of versioned objects; it does not transition current objects based on age.

145
MCQhard

A company uses Amazon Redshift for a data warehouse. They notice that queries are slow due to heavy data skew. Which optimization technique should be applied first?

A.Configure workload management (WLM) queues
B.Define sort keys on frequently filtered columns
C.Set an appropriate distribution style
D.Apply compression encodings to columns
AnswerC

Correct distribution style reduces data skew and improves query performance.

Why this answer

Data skew occurs when rows are distributed unevenly across Redshift slices, causing some nodes to process far more data than others. Setting an appropriate distribution style (e.g., KEY, EVEN, or ALL) redistributes the data to balance the workload, directly addressing the root cause of the slowness. This is the first optimization to apply because skew is a fundamental distribution issue that other tuning steps cannot fix.
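Here is a toy Python model of why skew hurts. It assigns rows to slices either by hashing a distribution key or by dealing them round-robin, then reports the largest slice relative to the average. A query step finishes only when its slowest slice does, so a ratio well above 1.0 means some slices sit idle while one overloaded slice works. The hash function (`zlib.crc32`), the slice count, and the key values are illustrative assumptions, not Redshift internals.

```python
import zlib
from collections import Counter

# Toy model (not Redshift's actual hash function): rows are assigned to slices
# either by hashing a distribution key (KEY style) or round-robin (EVEN style).
NUM_SLICES = 4


def key_distribution(keys, num_slices=NUM_SLICES):
    """Count rows per slice when each row goes to crc32(key) % num_slices."""
    counts = Counter(zlib.crc32(str(k).encode()) % num_slices for k in keys)
    return [counts.get(s, 0) for s in range(num_slices)]


def even_distribution(num_rows, num_slices=NUM_SLICES):
    """Count rows per slice when rows are dealt out round-robin."""
    return [num_rows // num_slices + (1 if s < num_rows % num_slices else 0)
            for s in range(num_slices)]


def skew_ratio(rows_per_slice):
    """Largest slice divided by the average slice; 1.0 means perfectly balanced."""
    average = sum(rows_per_slice) / len(rows_per_slice)
    return max(rows_per_slice) / average


if __name__ == "__main__":
    # Hypothetical key: one value ("ACME") accounts for most of the rows.
    keys = ["ACME"] * 7_000 + [f"cust-{i}" for i in range(3_000)]
    key_slices = key_distribution(keys)
    even_slices = even_distribution(len(keys))
    print("KEY :", key_slices, round(skew_ratio(key_slices), 2))
    print("EVEN:", even_slices, round(skew_ratio(even_slices), 2))
```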

Exam trap

The trap here is that candidates often confuse distribution skew with sort key optimization or compression, mistakenly believing that improving data organization on disk (sort keys) or reducing I/O (compression) will fix uneven data distribution across nodes.

How to eliminate wrong answers

Option A is wrong because WLM queues manage concurrency and memory allocation for query slots, not the physical distribution of data across nodes; they cannot fix performance degradation caused by data skew. Option B is wrong because sort keys optimize the order of data on disk to improve range-restricted scans and merge joins, but they do not redistribute data or alleviate skew across slices. Option D is wrong because compression encodings reduce storage footprint and I/O by compressing column data, but they have no effect on how rows are distributed across nodes or on query parallelism.

146
Multi-Selecthard

Which THREE factors should a data engineer consider when choosing between Amazon S3 and Amazon DynamoDB for storing time-series data? (Choose three.)

Select 3 answers
A.Required query complexity (simple key lookups vs. range scans)
B.Application latency requirements
C.Cost per GB of storage
D.Data access patterns (random vs. sequential)
E.Total data volume
AnswersA, B, D

DynamoDB excels at key lookups; S3 is better for scans.

Why this answer

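Options A, B, and D are correct because query complexity, latency requirements, and data access patterns determine which service fits. DynamoDB suits low-latency key lookups, while S3 suits large sequential scans. Option C (cost per GB) is a consideration but is not specific to time-series data. Option E (total data volume) is less relevant because both services scale to very large volumes.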

147
MCQmedium

A company uses Amazon Redshift for its data warehouse. The data engineer notices that queries are running slower than expected. The system administrator reports that the cluster's disk space is 80% full. Which action should the engineer take to improve query performance?

A.Redesign the sort keys to optimize query performance.
B.Run the VACUUM command to reclaim space.
C.Add more nodes to the cluster to increase storage and compute capacity.
D.Enable concurrency scaling to handle more queries.
AnswerC

Adding nodes increases both storage and compute, improving performance.

Why this answer

When a Redshift cluster's disk space is 80% full, query performance degrades because Redshift relies on large sequential I/O operations, and high disk utilization forces more random I/O and increases the likelihood of spilling to disk. Adding nodes increases both storage capacity and compute resources, directly alleviating the I/O bottleneck and improving query throughput. This is the recommended scaling action when disk space exceeds 70-80% utilization.
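Here is a back-of-the-envelope Python sketch of how adding nodes relieves disk pressure. It assumes the same node type before and after the resize and data redistributed evenly. Stored data stays constant while total capacity grows with the node count. The 4-node and 6-node figures are hypothetical.

```python
# Back-of-the-envelope sketch: disk utilization after a resize, assuming the
# same node type before and after and data redistributed evenly across nodes.
def utilization_after_resize(used_pct: float, old_nodes: int, new_nodes: int) -> float:
    if old_nodes <= 0 or new_nodes <= 0:
        raise ValueError("node counts must be positive")
    # Used bytes stay the same; total capacity scales with the node count.
    return used_pct * old_nodes / new_nodes


if __name__ == "__main__":
    # Hypothetical 4-node cluster at 80% full, resized to 6 nodes.
    print(f"{utilization_after_resize(80.0, 4, 6):.1f}%")  # 53.3%
```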

Exam trap

The trap here is that candidates often confuse the symptom (slow queries) with a need for sort key optimization or vacuuming, when the root cause is insufficient storage capacity causing I/O bottlenecks, which only adding nodes can resolve.

How to eliminate wrong answers

Option A is wrong because redesigning sort keys improves data ordering and block pruning for specific query patterns, but it does not address the underlying shortage of storage capacity that causes I/O contention. Option B is wrong because VACUUM reclaims space from deleted rows and re-sorts data but does not increase total disk capacity. At 80% full, vacuuming may recover only a small amount of space and will not resolve the degradation caused by high disk utilization. Option D is wrong because concurrency scaling adds transient compute capacity for bursts of concurrent queries, but it does not increase the primary cluster's storage or reduce disk pressure. Performance issues caused by a nearly full disk therefore persist.

148
MCQeasy

A data engineer needs to store semi-structured JSON logs from AWS CloudTrail. The logs are append-only and rarely accessed after 90 days. Which storage solution is MOST cost-effective?

A.Amazon S3 Glacier Deep Archive
B.Amazon S3 Standard
C.Amazon EBS with cold HDD volumes
D.Amazon DynamoDB with on-demand capacity
AnswerA

Glacier Deep Archive offers the lowest cost for long-term archival data.

Why this answer

Amazon S3 Glacier Deep Archive is the most cost-effective storage solution for CloudTrail logs that are append-only and rarely accessed after 90 days. It offers the lowest storage cost among AWS options (approximately $0.00099 per GB/month) and is designed for data that is accessed at most once or twice per year, with retrieval times of 12–48 hours. Since the logs are rarely accessed after 90 days, the retrieval latency is acceptable, and the cost savings over S3 Standard (which costs ~$0.023 per GB/month) are substantial.
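A quick Python calculation using the approximate per-GB-month prices quoted above (S3 Standard ~$0.023, Glacier Deep Archive ~$0.00099, DynamoDB ~$0.25) for a hypothetical 1,024 GB of logs. It covers storage only. Retrieval fees, request charges, and Deep Archive's minimum storage duration are ignored, and actual prices vary by Region.

```python
# Sketch: monthly storage cost at the approximate per-GB-month prices quoted
# in the explanation (storage only; retrieval, request, and minimum-duration
# charges are ignored; prices vary by Region).
PRICE_PER_GB_MONTH = {
    "S3 Standard": 0.023,
    "S3 Glacier Deep Archive": 0.00099,
    "DynamoDB": 0.25,
}


def monthly_storage_cost(gigabytes: float) -> dict:
    """Return the monthly storage cost in USD for each option, rounded to cents."""
    return {name: round(gigabytes * price, 2) for name, price in PRICE_PER_GB_MONTH.items()}


if __name__ == "__main__":
    # Hypothetical archive of 1 TB (1,024 GB) of CloudTrail logs.
    for name, cost in monthly_storage_cost(1024).items():
        print(f"{name:>24}: ${cost:,.2f} per month")
```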

Exam trap

The trap here is that candidates may choose S3 Standard or DynamoDB because they assume CloudTrail logs need frequent querying, but the question explicitly states 'rarely accessed after 90 days,' making Glacier Deep Archive the correct cost-optimal choice despite its longer retrieval time.

How to eliminate wrong answers

Option B is wrong because Amazon S3 Standard is designed for frequently accessed data and costs significantly more than Glacier Deep Archive, making it cost-inefficient for data that is rarely accessed after 90 days. Option C is wrong because Amazon EBS Cold HDD (sc1) volumes are block storage designed to be attached to EC2 instances, not a standalone store for append-only logs. They are billed on provisioned volume size and require running EC2 instances to access the data, which adds cost and complexity. Option D is wrong because Amazon DynamoDB with on-demand capacity is a NoSQL database optimized for low-latency, high-frequency access, not for cost-effective archival of append-only logs. Its storage cost (about $0.25 per GB/month) is roughly 250 times that of Glacier Deep Archive.

149
MCQhard

A data engineering team is managing an Amazon Redshift cluster that is used for BI reporting. The cluster has a mix of large tables (some over 1 TB) and many smaller tables. The team notices that queries on a large fact table are slow. The fact table is distributed using KEY distribution on the customer_id column, which has high cardinality. The team wants to improve query performance. They have the option to change the distribution style and sort key. Which redesign should they implement?

A.Keep the distribution style as AUTO and set the sort key to customer_id.
B.Change the distribution style to ALL and set the sort key to customer_id.
C.Change the distribution style to KEY on a different column with high cardinality.
D.Change the distribution style to EVEN and set the sort key to a date column used in WHERE clauses.
AnswerD

EVEN distributes evenly; sort key on date improves query performance.

Why this answer

Option D is correct because EVEN distribution spreads rows round-robin across all slices. This avoids the skew that KEY distribution on customer_id can cause when some customer_id values have far more rows than others, even though the column has high cardinality. Setting the sort key to a date column used in WHERE clauses enables range-restricted scans, so common BI queries that filter by date read far fewer blocks. Together, these changes improve query performance by maximizing parallelism and minimizing I/O.
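Here is a toy Python model of range-restricted scans. Redshift keeps per-block minimum and maximum values (zone maps) and skips blocks whose range cannot match the filter. Sorting by the date column keeps each block's date range narrow, so a one-month filter touches only a few blocks. Unsorted data spreads every month across nearly every block. The block size, row counts, and dates are hypothetical.

```python
import random
from datetime import date, timedelta

# Toy model of zone maps: each block records the min/max of a column, and a
# block is skipped when its range cannot overlap the WHERE clause range.
BLOCK_SIZE = 100  # rows per block (arbitrary, for illustration)


def blocks_scanned(dates, start, end, block_size=BLOCK_SIZE):
    """Count blocks whose [min, max] date range overlaps [start, end]."""
    scanned = 0
    for i in range(0, len(dates), block_size):
        block = dates[i:i + block_size]
        if min(block) <= end and max(block) >= start:
            scanned += 1
    return scanned


if __name__ == "__main__":
    first = date(2024, 1, 1)
    # Hypothetical fact table: 10 rows per day for 365 days (37 blocks).
    rows = [first + timedelta(days=d) for d in range(365) for _ in range(10)]
    shuffled = rows[:]
    random.Random(42).shuffle(shuffled)  # unsorted: rows land in arbitrary order
    query = (date(2024, 3, 1), date(2024, 3, 31))  # WHERE order_date BETWEEN ...
    print("sorted by date:", blocks_scanned(sorted(rows), *query), "blocks")
    print("unsorted      :", blocks_scanned(shuffled, *query), "blocks")
```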

Exam trap

The trap here is that candidates often assume KEY distribution on a high-cardinality column is optimal for large tables, but they overlook that even high-cardinality keys can cause severe data skew if the distribution key values are not uniformly distributed across nodes, leading to poor query performance.

How to eliminate wrong answers

Option A is wrong because AUTO distribution may default to KEY on customer_id, which already causes data skew and slow performance, and setting the sort key to customer_id does not address the distribution imbalance. Option B is wrong because ALL distribution copies the entire table to every node, which is impractical for a 1 TB fact table due to excessive storage and maintenance overhead, and it does not improve scan efficiency for large tables. Option C is wrong because changing the KEY distribution to a different high-cardinality column does not guarantee even distribution and may still lead to skew; the core issue is that KEY distribution on a high-cardinality column does not inherently balance data across slices.

150
MCQmedium

A data engineer needs to transfer 10 TB of data from an on-premises Hadoop cluster to Amazon S3. The network bandwidth is limited to 100 Mbps, and the transfer must be completed within 48 hours. Which solution meets the requirements?

A.Use AWS DataSync to transfer data online
B.Use AWS Snowball Edge device to transfer data offline
C.Use S3 Transfer Acceleration over the internet
D.Set up AWS Direct Connect to increase bandwidth
AnswerB

Snowball Edge can transfer 10 TB offline within days.

Why this answer

Option B is correct because an AWS Snowball Edge device moves large data volumes offline, so the transfer is not limited by the 100 Mbps link. Option C (S3 Transfer Acceleration) speeds up long-distance transfers but cannot push more data than the 100 Mbps link carries. Option D (Direct Connect) would require setup time to provision a new dedicated connection.

Option A (AWS DataSync) still depends on network bandwidth.
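Here is a worked Python check of why an online transfer misses the deadline. It assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits/s) and a link that sustains its full 100 Mbps, which real transfers rarely do.

```python
# Worked arithmetic: how long does 10 TB take over a 100 Mbps link?
# Assumes decimal units (1 TB = 10**12 bytes, 1 Mbps = 10**6 bit/s) and that
# the full link rate is sustained for the entire transfer.
def transfer_hours(terabytes: float, megabits_per_second: float) -> float:
    bits = terabytes * 10**12 * 8
    seconds = bits / (megabits_per_second * 10**6)
    return seconds / 3600


if __name__ == "__main__":
    hours = transfer_hours(10, 100)
    print(f"{hours:.0f} hours (~{hours / 24:.1f} days) vs. a 48-hour deadline")
```

The result, about 222 hours (roughly 9.3 days), is more than four times the 48-hour window even before protocol overhead. Any option that relies on the existing link is therefore ruled out.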
