Knowledge + Practice

CCNA Data Store Management Questions

75 of 456 questions · Page 4/7 · Data Store Management topic · Answers revealed

226

MCQhard

A data engineer runs the AWS CLI command shown in the exhibit to list objects in an S3 bucket. The command returns only two objects even though the bucket contains thousands of objects under the prefix. What should the engineer do to retrieve the next batch of objects?

A.Increase the --max-items value to a larger number.

B.Use the --starting-token parameter with the value from the NextToken field.

C.Use the --page-size parameter to request more items per API call.

D.Change the --prefix to a more specific value.

AnswerB

The NextToken is used with --starting-token to get the next page of results.

Why this answer

The AWS CLI `list-objects-v2` command paginates results by default. When the output is truncated, the response includes a `NextToken` field. To retrieve the next batch, the engineer must use the `--starting-token` parameter with the value from that `NextToken` field, which tells the CLI to resume listing from where it left off.

Exam trap

The trap here is confusing `--page-size` (which controls API call size but not pagination) with `--starting-token` (which actually advances the pagination cursor), leading candidates to incorrectly choose option C.

How to eliminate wrong answers

Option A is wrong because `--max-items` caps the total number of items the CLI returns for one command; raising it reruns the listing from the beginning and returns a larger first batch rather than resuming after the objects already returned. Option C is wrong because `--page-size` controls the number of items requested per underlying API call (e.g., `ListObjectsV2`), and the CLI handles those service calls automatically; changing it affects only the size of each API request, not where the listing resumes. Option D is wrong because changing the `--prefix` would filter to a different set of objects, not retrieve the next batch of objects under the original prefix.
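The token hand-off described above can be sketched without touching AWS at all. The snippet below is a minimal simulation, not the CLI's implementation: `ALL_KEYS`, `list_page`, and `list_all` are hypothetical names, and the token is assumed to encode a simple offset (real tokens are opaque).

```python
from typing import Optional

# Hypothetical stand-in for the bucket listing: 10 keys under one prefix.
ALL_KEYS = [f"logs/object-{i:03d}.json" for i in range(10)]


def list_page(max_items: int, starting_token: Optional[str] = None) -> dict:
    """Return one batch of keys plus a NextToken when more keys remain."""
    start = int(starting_token) if starting_token else 0
    end = start + max_items
    page = {"Keys": ALL_KEYS[start:end]}
    if end < len(ALL_KEYS):
        # Assumption for this sketch: the token is just the next offset.
        page["NextToken"] = str(end)
    return page


def list_all(max_items: int) -> list:
    """Feed each NextToken back as the starting token until none is returned."""
    keys, token = [], None
    while True:
        page = list_page(max_items, token)
        keys.extend(page["Keys"])
        token = page.get("NextToken")
        if token is None:
            return keys


first = list_page(max_items=2)
second = list_page(max_items=2, starting_token=first["NextToken"])
```

Calling `list_page(max_items=2)` again without a token returns the same first two keys, which is why repeating the command unchanged never reaches the next batch.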

227

MCQeasy

A company uses Amazon DynamoDB to store session data for a web application. The application experiences sudden spikes in traffic, causing occasional throttling errors. The data engineer needs to handle these spikes without over-provisioning capacity. What is the MOST cost-effective solution?

A.Set up a TTL (Time to Live) attribute to automatically delete old session data.

B.Enable DynamoDB Accelerator (DAX) to cache read requests.

C.Configure DynamoDB Auto Scaling to automatically adjust provisioned capacity.

D.Switch to DynamoDB on-demand mode.

AnswerC

Auto Scaling dynamically adapts to traffic patterns, preventing throttling and reducing cost.

Why this answer

DynamoDB Auto Scaling (option C) is the most cost-effective solution because it automatically adjusts the provisioned read/write capacity based on actual traffic patterns, handling sudden spikes without manual intervention or over-provisioning. This avoids paying for unused capacity during low-traffic periods while still accommodating bursts within the configured limits.

Exam trap

The trap here is that candidates often confuse DynamoDB on-demand mode (option D) as the default solution for unpredictable traffic, but the exam tests cost optimization—on-demand is premium-priced per request, while Auto Scaling with provisioned capacity is more cost-effective for workloads with variable but not extreme traffic patterns.

How to eliminate wrong answers

Option A is wrong because TTL (Time to Live) only deletes expired session data to reduce storage costs and stale items, but it does not address throttling errors caused by capacity limits during traffic spikes. Option B is wrong because DynamoDB Accelerator (DAX) is an in-memory cache that improves read performance and reduces read throttling, but it does not help with write throttling or capacity management for sudden spikes. Option D is wrong because DynamoDB on-demand mode automatically scales to handle any traffic level, but it is significantly more expensive for predictable or steady-state workloads compared to provisioned capacity with Auto Scaling, making it less cost-effective for this use case.

228

MCQmedium

A data engineer needs to store JSON documents that are accessed by a key-value pattern. The workload requires single-digit millisecond latency at any scale. Which AWS service is most appropriate?

A.Amazon DocumentDB (with MongoDB compatibility)

B.Amazon RDS for PostgreSQL

C.Amazon DynamoDB

D.Amazon Neptune

AnswerC

Key-value and document store with consistent low latency.

Why this answer

Amazon DynamoDB is a key-value and document database that provides single-digit millisecond latency at any scale. RDS is relational, Neptune is a graph database, and DocumentDB is MongoDB-compatible but may not guarantee the same latency at extreme scale.

229

MCQmedium

A data engineer is designing a data lake on Amazon S3. The data is accessed frequently for the first 30 days, then rarely after that. The engineer needs to minimize storage costs while ensuring data is available within minutes for the first 30 days and can be retrieved within 12 hours after that. Which lifecycle policy should be applied?

A.Transition to S3 One Zone-IA after 30 days.

B.Transition to S3 Standard-IA after 30 days.

C.Transition to S3 Glacier Deep Archive after 30 days.

D.Transition to S3 Glacier Flexible Retrieval after 30 days.

AnswerC

Cost-effective for rarely accessed data with 12-hour retrieval.

Why this answer

Option C is correct because S3 Glacier Deep Archive offers the lowest storage cost for data that is rarely accessed after 30 days, and its retrieval time (within 12 hours) matches the requirement. The lifecycle policy transitions objects from S3 Standard (or S3 Intelligent-Tiering) to S3 Glacier Deep Archive after 30 days, minimizing costs while meeting the 12-hour retrieval window.

Exam trap

AWS often tests the distinction between retrieval time and cost, leading candidates to choose S3 Glacier Flexible Retrieval (Option D) because it is a 'Glacier' tier, but they overlook that Deep Archive is cheaper and still meets the 12-hour retrieval requirement.

How to eliminate wrong answers

Option A is wrong because S3 One Zone-IA is designed for infrequently accessed data that can be recreated if lost; it offers immediate retrieval, which the workload does not need after 30 days, and its storage cost is higher than that of the archive classes. Option B is wrong because S3 Standard-IA is for infrequently accessed data with immediate retrieval, but it is more expensive than Glacier Deep Archive for data that is rarely accessed after 30 days. Option D is wrong because S3 Glacier Flexible Retrieval, with retrieval times from minutes to hours (typically 1-5 minutes for expedited, 3-5 hours for standard), also meets the 12-hour requirement but costs more to store than Glacier Deep Archive, so it is not the cheapest option that qualifies.

230

Multi-Selectmedium

A data engineer is designing a data lake on Amazon S3 that will be accessed by multiple AWS Glue ETL jobs. The engineer needs to ensure that the data is organized efficiently for querying and that sensitive columns are masked for certain users. Which TWO actions should the engineer take? (Choose TWO.)

Select 2 answers

A.Use AWS Lake Formation to define column-level permissions for sensitive data.

B.Configure AWS Glue Data Catalog to automatically mask sensitive columns in table definitions.

C.Organize data in S3 using a partition structure like 'year=YYYY/month=MM/day=DD/region=XX/'.

D.Use S3 object tags to label sensitive data and apply bucket policies to restrict access.

E.Implement S3 lifecycle policies to transition sensitive data to S3 Glacier after 30 days.

AnswersA, C

Lake Formation provides column-level security to mask sensitive columns.

Why this answer

Option A is correct because AWS Lake Formation provides fine-grained access control at the column level, allowing you to mask or restrict sensitive columns (e.g., PII) for specific IAM roles or users without altering the underlying data in S3. This is achieved through Lake Formation’s column-level permissions and data filtering, which integrate directly with the AWS Glue Data Catalog and query engines like Athena and Redshift Spectrum. Option C is correct because Hive-style partition prefixes such as 'year=YYYY/month=MM/day=DD/region=XX/' let Glue jobs and query engines read only the partitions a query needs instead of scanning the whole dataset.

Exam trap

The trap here is that candidates often confuse S3 object tags or bucket policies with fine-grained column-level access control, or assume the Glue Data Catalog can natively mask columns, when in fact only Lake Formation provides that capability.
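The partition layout in option C can be illustrated with a small helper. This is a sketch only: `partition_prefix` and `parse_partitions` are hypothetical names, and the date and region values are made up for the example.

```python
from datetime import date


def partition_prefix(day: date, region: str) -> str:
    """Build a Hive-style key prefix in the layout shown in option C."""
    return (
        f"year={day.year:04d}/month={day.month:02d}/"
        f"day={day.day:02d}/region={region}/"
    )


def parse_partitions(prefix: str) -> dict:
    """Recover the partition column values from a key prefix."""
    segments = prefix.strip("/").split("/")
    return dict(segment.split("=", 1) for segment in segments)


# Illustrative values, not taken from the question.
prefix = partition_prefix(date(2024, 3, 7), "eu")
columns = parse_partitions(prefix)
```

Because every object key carries its partition values, a reader filtering on `year`, `month`, `day`, or `region` can skip prefixes that cannot match.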

231

MCQeasy

A startup uses Amazon S3 to store user-uploaded images. The images are accessed frequently for the first week after upload, but after that they are rarely accessed. The company wants to optimize storage costs without compromising availability. The data engineer must implement a lifecycle policy to transition objects to a more cost-effective storage class after 30 days. The objects must be retrievable within minutes. Which storage class should the engineer transition the objects to?

A.S3 One Zone-Infrequent Access

B.S3 Standard

C.S3 Standard-Infrequent Access

D.S3 Glacier Instant Retrieval

AnswerC

Cost-effective for infrequent access with rapid retrieval.

Why this answer

S3 Standard-Infrequent Access (S3 Standard-IA) is the correct choice because it offers a per-GB storage cost lower than S3 Standard while maintaining high durability (99.999999999%) and availability (99.9%), with retrieval times in milliseconds. The lifecycle policy transitions objects after 30 days, aligning with the access pattern where images are rarely accessed after the first week, and the requirement for retrieval within minutes is satisfied by S3 Standard-IA's instant access.

Exam trap

The trap here is that candidates confuse 'retrievable within minutes' with S3 Glacier Instant Retrieval, overlooking that S3 Standard-IA also provides immediate access and is more cost-effective for data that is rarely accessed but not archival, especially with a 30-day transition window that avoids the 90-day minimum storage charge of Glacier Instant Retrieval.

How to eliminate wrong answers

Option A is wrong because S3 One Zone-Infrequent Access stores data in a single Availability Zone, which lowers its designed availability (99.5% versus 99.9% for S3 Standard-IA) and means data can be lost if that Availability Zone is destroyed, so it does not meet the requirement of 'without compromising availability'. Option B is wrong because S3 Standard is the default storage class with higher storage costs, and transitioning to it would not optimize costs; it is intended for frequently accessed data, not for rarely accessed objects after 30 days. Option D is wrong because S3 Glacier Instant Retrieval, while offering millisecond retrieval, is designed for long-term archival with a minimum storage duration charge of 90 days, making it cost-ineffective for a 30-day transition and not aligned with the 'rarely accessed' pattern described.

232

MCQmedium

A data engineer needs to store semi-structured JSON data that is accessed infrequently but must be retrievable within minutes. The data is generated by IoT devices and each object is about 500 KB. The engineer wants the most cost-effective storage solution. Which AWS service should be used?

A.Amazon S3 Glacier Deep Archive

B.Amazon S3 Standard

C.Amazon S3 Standard-Infrequent Access (S3 Standard-IA)

D.Amazon Elastic Block Store (EBS)

AnswerC

Cost-effective for infrequent access with rapid retrieval.

Why this answer

Amazon S3 Standard-Infrequent Access (S3 Standard-IA) is the correct choice because it is designed for data that is accessed infrequently but requires rapid retrieval (within minutes). The 500 KB JSON objects from IoT devices fit the use case, and S3 Standard-IA offers lower storage costs than S3 Standard while maintaining the same low-latency retrieval performance, making it the most cost-effective option for this scenario.

Exam trap

The trap here is that candidates often confuse 'infrequent access' with 'archival' and choose Glacier Deep Archive, overlooking the retrieval time requirement of 'within minutes', which S3 Standard-IA satisfies but Glacier Deep Archive does not.

How to eliminate wrong answers

Option A is wrong because Amazon S3 Glacier Deep Archive is intended for long-term archival data with retrieval times of up to 12 hours for standard retrievals (longer for bulk retrievals), not within minutes. Option B is wrong because Amazon S3 Standard is optimized for frequently accessed data with higher storage costs, making it less cost-effective for infrequently accessed IoT data. Option D is wrong because Amazon Elastic Block Store (EBS) is a block-level storage service designed for EC2 instances, not for storing semi-structured JSON objects as a standalone data store, and it incurs costs even when not in use.

233

MCQhard

A company runs a MySQL-compatible Amazon Aurora database for its e-commerce platform. The database experiences high write latency during peak hours. The application performs frequent INSERT and UPDATE operations on a table with 50 million rows. The DB instance is db.r5.large with 500 GB of Provisioned IOPS storage. A recent performance analysis shows that the average queue depth is consistently above 32 and write latency exceeds 50 ms. The company needs to reduce write latency without changing the application code. What should a data engineer do?

A.Enable Aurora Auto Scaling to automatically add reader instances.

B.Convert the cluster to Aurora Serverless v2 to automatically scale compute capacity.

C.Resize the DB instance to a larger instance type such as db.r5.xlarge.

D.Create a read replica and configure the application to offload read queries.

AnswerC

Increasing instance size provides more CPU and memory, reducing queue depth and write latency.

Why this answer

Option C is correct because the high queue depth (consistently above 32) and write latency exceeding 50 ms indicate that the current db.r5.large instance is CPU or I/O constrained for the write workload. Resizing to a larger instance type such as db.r5.xlarge increases the available vCPUs, memory, and network bandwidth, which directly reduces queue depth and write latency by allowing more concurrent write operations to be processed. This solution does not require application code changes and addresses the root cause of insufficient compute capacity for the frequent INSERT and UPDATE operations on the 50-million-row table.

Exam trap

The trap here is that candidates confuse read scaling solutions (Auto Scaling, read replicas) with write performance improvements, or assume that Aurora Serverless v2 automatically solves all performance issues without considering that write latency is often tied to instance size and storage configuration.

How to eliminate wrong answers

Option A is wrong because Aurora Auto Scaling adds reader instances to handle read traffic, not write traffic; write operations are always handled by the writer instance, so adding readers does not reduce write latency. Option B is wrong because converting to Aurora Serverless v2 changes the scaling model but does not guarantee lower write latency; it may even introduce cold start delays or scaling cooldown periods that could worsen latency during peak hours, and it does not directly address the queue depth issue caused by insufficient instance resources. Option D is wrong because creating a read replica and offloading read queries does not affect write latency; write operations still go to the writer instance, which remains the bottleneck.

234

Multi-Selectmedium

A company is using Amazon S3 to store sensitive financial data. They need to ensure that all objects are encrypted at rest. Which TWO methods can achieve this? (Choose TWO.)

Select 2 answers

A.Enable S3 Transfer Acceleration on the bucket.

B.Apply a bucket policy that denies PutObject without encryption.

C.Use SSE-KMS to encrypt objects with AWS KMS.

D.Use client-side encryption before uploading objects.

E.Enable default encryption on the S3 bucket using SSE-S3.

AnswersC, E

SSE-KMS provides server-side encryption with customer-managed keys.

Why this answer

Options C and E are correct. SSE-KMS and SSE-S3 both provide server-side encryption at rest. Option D is wrong because client-side encryption is not managed by S3.

Option B is wrong because a bucket policy does not encrypt anything itself; it only rejects uploads that do not request encryption. Option A is wrong because S3 Transfer Acceleration is for speed, not encryption.
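As a rough illustration of the two server-side options, the sketch below builds the configuration document that the S3 `PutBucketEncryption` API accepts. The helper name `default_encryption` and the key alias are hypothetical, and the helper deliberately handles only the two algorithms discussed in this question.

```python
from typing import Optional


def default_encryption(algorithm: str, kms_key_id: Optional[str] = None) -> dict:
    """Build a ServerSideEncryptionConfiguration for PutBucketEncryption."""
    # This sketch covers only SSE-S3 ("AES256") and SSE-KMS ("aws:kms").
    if algorithm not in ("AES256", "aws:kms"):
        raise ValueError(f"unsupported algorithm in this sketch: {algorithm}")
    rule = {"SSEAlgorithm": algorithm}
    if kms_key_id is not None:
        if algorithm != "aws:kms":
            raise ValueError("a KMS key only applies to aws:kms")
        rule["KMSMasterKeyID"] = kms_key_id
    return {"Rules": [{"ApplyServerSideEncryptionByDefault": rule}]}


sse_s3 = default_encryption("AES256")
# "alias/finance-data" is a made-up key alias for illustration.
sse_kms = default_encryption("aws:kms", kms_key_id="alias/finance-data")

# With boto3 installed and credentials configured, either document could be
# applied with:
#   boto3.client("s3").put_bucket_encryption(
#       Bucket="<bucket>", ServerSideEncryptionConfiguration=sse_kms)
```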

235

MCQhard

A company is building a real-time analytics dashboard using Amazon Kinesis Data Streams and Amazon DynamoDB. The data engineer needs to ensure that the DynamoDB table can handle write spikes without throttling. Which approach is the most cost-effective?

A.Use DynamoDB Accelerator (DAX) to cache writes.

B.Use provisioned capacity with auto-scaling set to a maximum of 10,000 WCU.

C.Use an AWS Lambda function to buffer writes and batch them to DynamoDB.

D.Use DynamoDB on-demand capacity mode.

AnswerD

On-demand scales instantly and is cost-effective for unpredictable workloads.

Why this answer

DynamoDB on-demand capacity mode automatically scales to handle write spikes without requiring capacity planning or management, making it the most cost-effective choice for unpredictable workloads like a real-time analytics dashboard. It charges per request, so you only pay for the writes you actually use, avoiding over-provisioning costs.

Exam trap

The trap here is that candidates often confuse DAX's read caching with write handling, or assume that auto-scaling provisioned capacity can handle sudden spikes instantly, when in reality it scales gradually and can still throttle during rapid bursts.

How to eliminate wrong answers

Option A is wrong because DAX is an in-memory cache for reads, not writes; it does not prevent write throttling. Option B is wrong because provisioned capacity with auto-scaling can still throttle during sudden spikes due to the lag in scaling up, and setting a maximum of 10,000 WCU may be insufficient or wasteful. Option C is wrong because using Lambda to buffer writes adds latency and complexity, and while it can batch writes, it does not eliminate the risk of throttling if the batch rate exceeds the table's capacity.

236

MCQeasy

A data engineer needs to store time-series sensor data from thousands of IoT devices. The data is written once, read frequently for the last 24 hours, and rarely accessed after 30 days. Which storage solution is MOST cost-effective?

A.Amazon Redshift with automatic compression and distribution keys.

B.Amazon DynamoDB with TTL to expire data after 30 days.

C.Amazon Timestream with a 30-day retention policy.

D.Amazon S3 with lifecycle policies to transition to S3 Glacier after 30 days.

AnswerC

Timestream is designed for time-series data, with cost-effective tiered storage and built-in analytics.

Why this answer

Amazon Timestream is purpose-built for time-series data, offering automatic storage tiering where recent data resides in memory for fast queries and historical data is moved to a cost-optimized magnetic store. A 30-day retention policy aligns perfectly with the requirement to keep data accessible for frequent reads over the last 24 hours while automatically expiring older data, minimizing storage costs without manual intervention.

Exam trap

The trap here is that candidates often choose Amazon S3 with lifecycle policies (Option D) because they associate S3 with cost-effective storage, but they overlook the requirement for frequent reads of recent data, which S3 cannot serve with low latency without additional caching layers, and they miss that Timestream is the only AWS service natively designed for time-series data with automatic tiering and retention.

How to eliminate wrong answers

Option A is wrong because Amazon Redshift is a columnar data warehouse optimized for complex analytical queries on structured data, not for high-frequency time-series ingestion from thousands of IoT devices; its cost and overhead are excessive for simple sensor data storage. Option B is wrong because Amazon DynamoDB with TTL is designed for key-value and document workloads, not for time-series queries like range scans over the last 24 hours; TTL only deletes expired items but does not provide efficient time-based querying or automatic tiering, leading to higher read costs and complexity. Option D is wrong because Amazon S3 with lifecycle policies to transition to S3 Glacier after 30 days is cost-effective for archival but does not support low-latency frequent reads for the last 24 hours without additional services like S3 Select or Athena, and S3 is not optimized for high-write, time-ordered data ingestion from IoT devices.

237

MCQeasy

The exhibit shows the output of describe-table for a DynamoDB table. The application is experiencing throttling errors when reading data. What is the MOST likely cause?

A.The sort key is not used correctly for queries.

B.The table has a hot partition due to the HASH key.

C.The table size is too large, causing slow reads.

D.The table's provisioned read capacity is too low.

AnswerD

5 RCUs is very low; if the application reads more than 5 RCUs, throttling occurs.

Why this answer

The describe-table output shows the table has provisioned read capacity set to 5, but the application is experiencing throttling errors. Throttling occurs when read requests exceed the provisioned read capacity units (RCUs). Increasing the read capacity or implementing retries with exponential backoff would resolve this.

The throttling is directly caused by insufficient provisioned read capacity for the workload.
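To see how quickly 5 RCUs run out, the arithmetic can be worked through in a few lines. This sketch relies on the documented rule that one RCU covers one strongly consistent read per second of an item up to 4 KB (eventually consistent reads cost half); the function name and the workload numbers are illustrative assumptions, not values from the exhibit.

```python
import math


def read_capacity_units(
    item_size_bytes: int, reads_per_second: int, strongly_consistent: bool = True
) -> int:
    """Estimate the RCUs needed for a steady read rate on same-sized items."""
    # Each read is billed in 4 KB blocks, rounded up, with a minimum of one.
    units_per_read = max(1, math.ceil(item_size_bytes / 4096))
    if not strongly_consistent:
        units_per_read /= 2  # eventually consistent reads cost half
    return math.ceil(units_per_read * reads_per_second)


# Hypothetical workload: 3,000-byte items read 20 times per second.
needed_strong = read_capacity_units(3000, 20)
needed_eventual = read_capacity_units(3000, 20, strongly_consistent=False)
provisioned = 5
```

Under these assumed numbers both estimates exceed the 5 provisioned RCUs, so requests beyond the provisioned rate would be throttled once any burst allowance is used up.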

Exam trap

The trap here is that candidates may confuse throttling with performance issues like hot partitions or inefficient queries, but the describe-table output directly shows low provisioned read capacity, making insufficient capacity the most likely cause.

How to eliminate wrong answers

Option A is wrong because the sort key not being used correctly would cause inefficient queries (e.g., full table scans) but not necessarily throttling; throttling is a capacity issue, not a query pattern issue. Option B is wrong because a hot partition due to the HASH key would cause throttling on specific partitions, but the question states the application is experiencing throttling errors when reading data generally, not just on a single partition; the describe-table output does not indicate a hot partition. Option C is wrong because table size does not directly cause throttling; DynamoDB can handle large tables efficiently with proper partitioning, and throttling is based on provisioned capacity, not storage size.

238

MCQmedium

A data engineer needs to migrate an on-premises PostgreSQL database to Amazon RDS for PostgreSQL. The database is 2 TB and has a continuous stream of write operations. The migration should minimize downtime. Which AWS service should be used?

A.AWS DataSync

B.AWS Database Migration Service (DMS)

C.AWS Snowball Edge

D.AWS Glue

AnswerB

DMS supports ongoing replication and minimal downtime for database migrations.

Why this answer

Option B is correct because AWS DMS supports ongoing replication from on-premises PostgreSQL to RDS, allowing minimal downtime. Option A (DataSync) is for transferring files and objects between storage systems, not for replicating a live database. Option C (Snowball) is for offline data transfer, not for ongoing replication.

Option D (Glue) is for ETL, not for database migration with replication.

239

MCQmedium

A company is using Amazon S3 to store sensitive data. To meet compliance requirements, they need to automatically transition objects to S3 Glacier Deep Archive after 90 days and delete them after 7 years. What is the MOST cost-effective way to configure this?

A.Configure an S3 Lifecycle policy to transition objects to Glacier Deep Archive after 90 days and expire them after 7 years.

B.Manually move objects to Glacier Deep Archive and delete them using a script.

C.Use S3 Intelligent-Tiering to automatically move objects to Glacier Deep Archive and set expiration.

D.Enable S3 Object Lock with a retention period of 7 years and use a lifecycle policy to transition to Glacier Deep Archive.

AnswerA

Lifecycle policies provide automated transitions and expirations based on object age.

Why this answer

Option A is correct because a lifecycle policy can transition objects based on age and delete them after a specified period. Option B is wrong because manually moving and deleting objects is error-prone and not automated. Option C is wrong because S3 Intelligent-Tiering does not delete objects.

Option D is wrong because S3 Object Lock is for retention, not lifecycle management.
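A lifecycle rule of this kind might look like the following sketch, which builds the document passed to the S3 `PutBucketLifecycleConfiguration` API. The helper `lifecycle_rule` and the rule ID are hypothetical, and 7 years is approximated as 7 × 365 days, which ignores leap days.

```python
from typing import Optional


def lifecycle_rule(
    rule_id: str,
    transition_days: int,
    storage_class: str,
    expire_days: Optional[int] = None,
    prefix: str = "",
) -> dict:
    """Build one lifecycle rule: transition by age, then optionally expire."""
    rule = {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},  # empty prefix applies to every object
        "Status": "Enabled",
        "Transitions": [{"Days": transition_days, "StorageClass": storage_class}],
    }
    if expire_days is not None:
        if expire_days <= transition_days:
            raise ValueError("expiration must come after the transition")
        rule["Expiration"] = {"Days": expire_days}
    return rule


# Assumption: 7 years approximated as 7 * 365 = 2555 days.
config = {
    "Rules": [
        lifecycle_rule("archive-then-expire", 90, "DEEP_ARCHIVE", expire_days=7 * 365)
    ]
}

# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="<bucket>", LifecycleConfiguration=config)
```

The same helper with `transition_days=30` and no expiration would express a plain 30-day transition to an archive class.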

240

MCQmedium

Refer to the exhibit. A data engineer is troubleshooting write throttling on the Orders table. The table has a composite primary key (OrderID as partition key, CustomerID as sort key). The engineer notices that writes are throttled even though the write capacity is not fully utilized. What is the most likely cause?

A.The table is empty and has no items.

B.A global secondary index (GSI) is consuming write capacity.

C.The read capacity units are too low.

D.Writes are concentrated on a single partition key value.

AnswerD

Hot partition causes throttling even if total capacity is not exceeded.

Why this answer

D is correct because write throttling on an Amazon DynamoDB table occurs when requests exceed the throughput available to a specific partition, even if the overall table write capacity is underutilized. With a composite primary key where OrderID is the partition key, writes concentrated on a single OrderID value (e.g., a hot key) will hit that partition's limit of 1,000 WCU (the corresponding read limit is 3,000 RCU), causing throttling while other partitions remain idle.

Exam trap

AWS often tests the misconception that overall table capacity utilization is the sole indicator of throttling, but the trap here is that throttling can occur at the partition level due to hot keys, even when the table's total write capacity is underutilized.

How to eliminate wrong answers

Option A is wrong because an empty table does not cause write throttling; throttling is based on capacity consumption, not table size. Option B is wrong because a global secondary index (GSI) consumes write capacity from its own provisioned throughput, not from the base table's write capacity, and the question states the write capacity is not fully utilized. Option C is wrong because read capacity units are independent of write operations; low RCU would throttle reads, not writes.
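The hot-key effect can be made concrete with a toy calculation. This is a simplified sketch: it assumes each partition key value lands on its own partition and uses DynamoDB's per-partition ceiling of 1,000 WCU; the workload figures and the table's provisioned capacity are invented for illustration.

```python
PER_PARTITION_WCU_LIMIT = 1000  # per-partition write ceiling in DynamoDB


def throttled_keys(writes_per_second: dict, limit: int = PER_PARTITION_WCU_LIMIT) -> list:
    """Return partition key values whose write rate exceeds the partition limit."""
    return sorted(key for key, wcu in writes_per_second.items() if wcu > limit)


# Hypothetical write rates (WCU per second) by OrderID value.
workload = {"ORDER-1": 1500, "ORDER-2": 40, "ORDER-3": 60}
table_provisioned_wcu = 5000  # assumed table-level setting

total_wcu = sum(workload.values())
hot = throttled_keys(workload)
```

In this example the table uses 1,600 of its 5,000 WCU, yet `ORDER-1` alone exceeds what a single partition can absorb, so its writes are throttled while the table as a whole looks underutilized.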

241

Multi-Selecthard

A company runs an Amazon RDS for PostgreSQL instance for an OLTP application. The database size is 500 GB. The company wants to minimize downtime during backups and ensure point-in-time recovery (PITR) for the last 7 days. Which TWO features should the company use? (Choose TWO.)

Select 2 answers

A.Enable Multi-AZ deployment for high availability.

B.Create a read replica in a different Availability Zone.

C.Enable automated backups with a retention period of 7 days.

D.Create daily manual snapshots and copy them to another region.

E.Enable Enhanced Monitoring to track backup progress.

AnswersA, C

Multi-AZ reduces downtime during automated backups by taking backups from the standby.

Why this answer

Automated backups with a 7-day retention provide PITR. Multi-AZ deployment ensures high availability and reduces downtime during backups. Manual snapshots are not needed.

Enhanced Monitoring is for performance. Read replicas are for read scaling, not backup.

242

MCQmedium

A company stores sensitive data in an Amazon S3 bucket. A compliance requirement mandates that all data must be encrypted at rest with a key that is automatically rotated every year. The company also needs to maintain an audit trail of who used the key. Which solution meets these requirements?

A.Use AWS KMS customer managed keys (SSE-KMS) with automatic key rotation enabled.

B.Use customer-provided encryption keys (SSE-C) and rotate keys manually.

C.Use S3 managed keys (SSE-S3) and enable S3 server access logs.

D.Configure a bucket policy to enforce encryption using the 'aws:SecureTransport' condition.

AnswerA

KMS customer managed keys support automatic annual rotation and CloudTrail auditing.

Why this answer

Option A is correct because SSE-KMS uses AWS KMS customer managed keys, which support automatic annual rotation and provide CloudTrail audit logs for key usage. Option C is wrong because SSE-S3 does not provide a key usage audit trail. Option B is wrong because SSE-C requires the customer to manage keys and rotation.

Option D is wrong because bucket policies do not encrypt data.

243

MCQmedium

A company is migrating its on-premises Oracle database to Amazon RDS for Oracle. The database is 2 TB in size and has a 24-hour maintenance window. The migration must have minimal downtime. Which AWS service should be used for the migration?

A.Amazon RDS native backup and restore

B.AWS Database Migration Service (DMS)

C.Amazon S3 Transfer Acceleration

D.AWS Snowball Edge

AnswerB

DMS supports minimal downtime migrations.

Why this answer

AWS Database Migration Service (DMS) is the correct choice because it supports homogeneous migrations (on-premises Oracle to RDS for Oracle) with minimal downtime using ongoing replication via Oracle LogMiner or Binary Reader. DMS can perform a full load of the 2 TB database and then continuously replicate changes from the source to the target during the 24-hour maintenance window, allowing a final cutover with only seconds of downtime.

Exam trap

The trap here is that candidates often confuse AWS Snowball Edge for any large data migration, but Snowball is designed for offline transfers where downtime is acceptable, not for minimal-downtime online migrations that require continuous replication.

How to eliminate wrong answers

Option A is wrong because Amazon RDS native backup and restore requires creating a backup file from the on-premises Oracle database and restoring it into RDS, which involves significant downtime for the backup and restore process, and does not support ongoing replication for minimal downtime. Option C is wrong because Amazon S3 Transfer Acceleration is a service for speeding up uploads to S3 over the internet, but it does not provide database migration capabilities, schema conversion, or ongoing replication needed for a live database migration. Option D is wrong because AWS Snowball Edge is a physical data transfer device for moving large volumes of data (e.g., 2 TB) offline, which introduces days of latency for shipping and cannot achieve minimal downtime; it also lacks the ability to capture and apply ongoing transactional changes during transit.

244

Multi-Selectmedium

A company uses Amazon DynamoDB as a key-value store for a high-traffic application. The table has a provisioned read capacity of 10,000 RCUs and write capacity of 5,000 WCUs. The application experiences occasional throttling during peak hours. Which TWO actions can reduce throttling without changing the application code? (Choose TWO.)

Select 2 answers

A.Enable DynamoDB Auto Scaling for both read and write capacity.

B.Add a DynamoDB Accelerator (DAX) cluster for read-heavy workloads.

C.Switch to on-demand capacity mode for the table.

D.Change the read consistency model from eventually consistent to strongly consistent.

E.Use DynamoDB Global Tables to distribute the workload across regions.

AnswersA, B

Auto Scaling adjusts capacity based on traffic patterns.

Why this answer

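Enabling Auto Scaling adjusts provisioned capacity automatically as traffic changes. Adding a DAX cluster, an in-memory cache in front of the table, serves repeated reads from the cache and reduces read load on the table. Global Tables replicate the table to other Regions and do not increase the capacity of the existing table.

Switching to on-demand mode can help but may increase cost. Changing reads from eventually consistent to strongly consistent doubles the read capacity each read consumes, which would make throttling worse.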
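
The capacity arithmetic behind the consistency option can be sketched as follows. This is a minimal illustration, assuming DynamoDB's documented read pricing of one RCU per strongly consistent read of up to 4 KB and half an RCU for an eventually consistent read; the helper name, item size, and request rate are hypothetical and not taken from the question.

```python
import math


def required_rcus(item_size_bytes: int, reads_per_second: int, strongly_consistent: bool) -> float:
    """Estimate the RCUs a steady read workload consumes (hypothetical helper)."""
    # Reads are billed in 4 KB blocks, rounded up per item.
    blocks = math.ceil(item_size_bytes / 4096)
    # Strongly consistent reads cost 1 RCU per block; eventually consistent reads cost 0.5.
    cost_per_block = 1.0 if strongly_consistent else 0.5
    return blocks * cost_per_block * reads_per_second


# Assumed workload: 15,000 reads per second of 3,000-byte items.
eventual = required_rcus(3000, 15000, strongly_consistent=False)
strong = required_rcus(3000, 15000, strongly_consistent=True)
print(eventual, strong)
```

Under these assumed numbers the eventually consistent workload fits inside the table's 10,000 provisioned RCUs, while the same workload with strongly consistent reads would exceed them.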

245

Multi-Selecthard

A data engineer is troubleshooting slow query performance on an Amazon Redshift cluster. The cluster has 10 nodes and is using automatic distribution style. The engineer suspects that data distribution is causing excessive data movement. Which steps should the engineer take to diagnose and resolve the issue? (Choose THREE.)

Select 3 answers

A.Choose appropriate distribution keys for large tables

B.Use the EXPLAIN command to analyze query plans

C.Run the VACUUM command to reclaim space

D.Query the STL_DIST and STL_BCAST system tables

E.Increase the number of nodes in the cluster

AnswersA, B, D

Proper distribution keys minimize data movement.

Why this answer

Option A is correct because choosing appropriate distribution keys for large tables ensures that data is evenly distributed across the cluster slices, minimizing the need for data redistribution during joins and aggregations. Automatic distribution style may not always select the optimal key, leading to excessive data movement and slow query performance.

Exam trap

The trap here is that candidates often confuse VACUUM (which only reorganizes data within slices) with distribution optimization, or assume scaling out nodes automatically resolves distribution-related performance issues without addressing the underlying key choice.
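
Why the distribution key matters for joins can be shown with a toy model. The sketch below is not Redshift's actual hashing or planner: it uses `zlib.crc32` as a stand-in hash, a hypothetical `orders` data set, and assumes the `customers` table is distributed on `customer_id`. It counts how many order rows would have to move to another slice before they could be joined to their customer row.

```python
import zlib


def slice_for(key, num_slices: int) -> int:
    """Map a distribution-key value to a slice with a stable hash (toy stand-in)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_slices


def rows_to_redistribute(orders, order_dist_column: str, num_slices: int) -> int:
    """Count order rows that do not already sit on the slice holding their customer row.

    The customers table is assumed to be distributed on customer_id.
    """
    moved = 0
    for order in orders:
        order_slice = slice_for(order[order_dist_column], num_slices)
        customer_slice = slice_for(order["customer_id"], num_slices)
        if order_slice != customer_slice:
            moved += 1
    return moved


# Hypothetical data: 1,000 orders spread over 50 customers.
orders = [{"order_id": i, "customer_id": i % 50} for i in range(1000)]
colocated = rows_to_redistribute(orders, "customer_id", num_slices=4)
mismatched = rows_to_redistribute(orders, "order_id", num_slices=4)
print(colocated, mismatched)
```

When both tables share the join column as their distribution key, no rows need to move; distributing orders on an unrelated column forces redistribution at query time, which is the kind of movement a query plan or the distribution system tables would reveal.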

246

MCQhard

An application uses the 'orders' DynamoDB table with the schema and provisioned throughput shown in the exhibit. The application frequently queries by customer_id (range key) without specifying the order_id (partition key). What is the most likely impact on performance?

A.Queries will require a full table scan, consuming significant read capacity.

B.Queries will be throttled because the table does not have a global secondary index.

C.Queries will be fast because the sort key is indexed.

D.Queries will cause hot partitions on the table.

AnswerA

Without partition key, DynamoDB scans the entire table.

Why this answer

Option A is correct because a request that does not specify the partition key cannot use the Query operation and results in a full table scan, which is inefficient and consumes significant read capacity. Option C is incorrect because the sort key is ordered only within a partition, so it cannot be used on its own to locate items efficiently. Option D is incorrect because the issue is not hot partitions but full scans.

Option B is incorrect because the missing global secondary index does not cause throttling by itself; the scans simply consume more read capacity.

247

MCQmedium

A company is using Amazon RDS for MySQL with Multi-AZ deployment. The database size is 2 TB and the workload is read-heavy. To improve read performance, which option should be used?

A.Use Amazon ElastiCache to cache database queries

B.Increase the instance size to 16xlarge

C.Create Read Replicas in the same or different regions

D.Enable Multi-AZ on additional instances

AnswerC

Read Replicas allow offloading read traffic.

Why this answer

Option C is correct because Read Replicas offload read traffic from the primary instance. Option D is wrong because Multi-AZ is for high availability, not read scaling. Option B is wrong because increasing instance size helps but is less efficient than adding replicas.

Option A is wrong because ElastiCache is for caching, not directly for MySQL read scaling.

248

MCQhard

A data engineer attaches the above IAM policy to an IAM user. The user tries to download an object from my-bucket using the AWS CLI without specifying SSE headers. The object is stored with SSE-S3. Will the download succeed?

A.No, because the object is encrypted and the user does not have decrypt permission.

B.No, because the request does not include the required encryption header.

C.Yes, because the object is encrypted with SSE-S3, which uses AES256.

D.Yes, because the policy allows s3:GetObject on the bucket.

AnswerB

The condition requires the request to have x-amz-server-side-encryption: AES256.

Why this answer

Option B is correct because the policy grants `s3:GetObject` only on the condition that the request carries `x-amz-server-side-encryption: AES256`. A plain download does not send that header, so the condition key is absent from the request, the Allow statement does not match, and the request is implicitly denied with an Access Denied error. S3 itself does not need the header on a GET: it decrypts SSE-S3 objects transparently, so the failure comes from the policy condition rather than from the encryption.

Exam trap

The trap here is that candidates see `s3:GetObject` in the policy, know that SSE-S3 decryption is transparent, and assume the download succeeds, overlooking the condition attached to the Allow statement.

How to eliminate wrong answers

Option A is wrong because SSE-S3 keys are managed by S3 and the user does not need a separate 'decrypt permission'; S3 handles decryption transparently. Option C is wrong because, although the object is encrypted with AES256 under SSE-S3, the policy condition evaluates the request headers rather than the object's encryption state, and the request does not include the `x-amz-server-side-encryption` header. Option D is wrong because the policy allows `s3:GetObject` only when its condition is met; the request does not satisfy the condition, so the Allow does not apply.
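
How a condition on an Allow statement can block a request that lacks the header is sketched below. The exhibit is not reproduced here, so `ALLOW_STATEMENT` is a hypothetical policy statement built from the answer summary (a `StringEquals` condition on `s3:x-amz-server-side-encryption`), and `statement_allows` is a simplified stand-in for IAM evaluation, not the real engine. It relies on one documented rule: a `StringEquals` test does not match when the condition key is missing from the request.

```python
# Hypothetical statement reconstructed from the answer summary; not the actual exhibit.
ALLOW_STATEMENT = {
    "Effect": "Allow",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::my-bucket/*",
    "Condition": {"StringEquals": {"s3:x-amz-server-side-encryption": "AES256"}},
}


def statement_allows(statement: dict, action: str, request_context: dict) -> bool:
    """Simplified check: does this Allow statement apply to the request?"""
    if statement["Effect"] != "Allow" or statement["Action"] != action:
        return False
    required = statement.get("Condition", {}).get("StringEquals", {})
    # A key that is absent from the request context does not satisfy StringEquals.
    return all(request_context.get(key) == value for key, value in required.items())


plain_get_context = {}  # a download that sends no SSE header
allowed_with_condition = statement_allows(ALLOW_STATEMENT, "s3:GetObject", plain_get_context)

# The same statement with the condition removed would match the plain download.
unconditional = {k: v for k, v in ALLOW_STATEMENT.items() if k != "Condition"}
allowed_without_condition = statement_allows(unconditional, "s3:GetObject", plain_get_context)
print(allowed_with_condition, allowed_without_condition)
```

With no matching Allow and no other grant, the request falls through to the implicit deny.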

249

MCQmedium

A data engineer is troubleshooting a slow-running query on Amazon Redshift. The query scans a large table but returns few rows. Which diagnostic step should be taken first?

A.Use EXPLAIN to review the query plan.

B.Run ANALYZE on the table.

C.Check the concurrency scaling status.

D.Run VACUUM on the table.

AnswerA

Reveals how the query is executed, helps identify bottlenecks.

Why this answer

When a query scans a large table but returns few rows, the most likely cause is an inefficient query plan, such as a full table scan that cannot take advantage of sort keys and zone maps. Using EXPLAIN first reveals the execution plan, allowing the engineer to identify whether the query is performing unnecessary sequential scans, missing filter pushdown, or using suboptimal join strategies. This diagnostic step should precede tuning actions like ANALYZE or VACUUM, which address stale statistics and unsorted or deleted rows rather than the shape of the plan.

Exam trap

The trap here is that candidates often jump to performance-tuning commands like ANALYZE or VACUUM without first diagnosing the query plan, but the DEA-C01 exam emphasizes that EXPLAIN is the foundational step for identifying inefficient scan patterns before applying any corrective actions.

How to eliminate wrong answers

Option B is wrong because ANALYZE updates table statistics for the query optimizer, but if the query plan is already suboptimal (e.g., missing a WHERE clause filter), fresh statistics won't fix the root cause—EXPLAIN must be checked first. Option C is wrong because concurrency scaling handles increased query load by adding cluster capacity, but it does not improve the efficiency of a single slow query that scans many rows unnecessarily. Option D is wrong because VACUUM reclaims disk space and sorts rows for better compression, but it does not change the query execution path—a full table scan will remain a full table scan even after a vacuum.
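
The effect of zone maps on a selective filter can be illustrated with a toy model. This sketch assumes tiny 100-value blocks and made-up data; real Redshift blocks and metadata are far larger and more involved. It records each block's minimum and maximum and counts how many blocks a lookup for a single value must read.

```python
def build_zone_map(values, block_size):
    """Split values into fixed-size blocks and record each block's (min, max)."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [(min(block), max(block)) for block in blocks]


def blocks_scanned(zone_map, target):
    """Count blocks whose min/max range could contain the target value."""
    return sum(1 for low, high in zone_map if low <= target <= high)


# Sorted layout: each block covers a narrow, non-overlapping range.
sorted_values = list(range(1000))
# Scattered layout: block b holds b, b+10, b+20, ... so every block spans almost the full range.
scattered_values = [v for block in range(10) for v in range(block, 1000, 10)]

sorted_scan = blocks_scanned(build_zone_map(sorted_values, 100), 250)
scattered_scan = blocks_scanned(build_zone_map(scattered_values, 100), 250)
print(sorted_scan, scattered_scan)
```

In the sorted layout only one block can contain the value; in the scattered layout every block must be read, which is the pattern of a query that scans a large table to return few rows.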

250

Multi-Selecthard

A company is migrating a large Oracle database to Amazon Aurora PostgreSQL. The migration must have minimal downtime and preserve data consistency. Which THREE AWS services or features should be used?

Select 3 answers

A.Amazon RDS for Oracle as the target

B.AWS DataSync for initial load

C.AWS Schema Conversion Tool (SCT) for schema conversion

D.Amazon Aurora PostgreSQL as the target database

E.AWS Database Migration Service (DMS) for continuous replication

AnswersC, D, E

SCT converts Oracle schema to Aurora PostgreSQL compatible schema.

Why this answer

The AWS Schema Conversion Tool (SCT) is required to convert the source Oracle database schema (including stored procedures, functions, and data types) to a format compatible with Amazon Aurora PostgreSQL. Without SCT, the heterogeneous migration would fail due to incompatible SQL dialects and database objects.

Exam trap

The trap here is that candidates often confuse AWS DataSync (a file-transfer service) with database migration tools, or mistakenly think RDS for Oracle can serve as a migration target when the question explicitly specifies Aurora PostgreSQL.

251

MCQmedium

A company is using Amazon RDS for PostgreSQL with Multi-AZ deployment. The primary instance fails and a failover occurs. After the failover, the application cannot connect to the database. What is the MOST likely cause?

A.The database instance is in a 'stopped' state after failover.

B.The Multi-AZ failover requires manual intervention to complete.

C.The security group for the RDS instance was not updated during failover.

D.The application is using the old primary instance endpoint instead of the RDS CNAME.

AnswerD

The application should use the CNAME, which updates automatically after failover.

Why this answer

After a Multi-AZ failover in Amazon RDS for PostgreSQL, the DNS CNAME record is automatically updated to point to the new primary instance in the standby Availability Zone. If the application is configured with the old primary instance's endpoint (the specific IP or DNS name of the original instance) instead of the RDS CNAME (which remains constant), it will attempt to connect to the failed instance, which is no longer available. This is the most likely cause of connectivity loss because the CNAME is the stable connection point that follows the active primary.

Exam trap

The trap here is that candidates may assume security groups or instance state are the issue, but AWS explicitly tests the concept that the RDS CNAME is the correct connection target and that hardcoding endpoints leads to failover failures.

How to eliminate wrong answers

Option A is wrong because RDS Multi-AZ failover does not stop the database instance; the new primary is promoted and remains in an 'available' state. Option B is wrong because Multi-AZ failover is fully automated and requires no manual intervention to complete. Option C is wrong because security groups are associated with the RDS instance itself, not with a specific AZ or IP, and they remain unchanged during failover; the new primary inherits the same security group configuration.
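
The difference between following the endpoint name and pinning an address can be sketched with a toy DNS model. Everything here is illustrative: `ToyDns`, the endpoint name, and the IP addresses are hypothetical, and no real name resolution takes place.

```python
class ToyDns:
    """Stand-in for DNS: maps the RDS endpoint name to the current primary's IP."""

    def __init__(self, endpoint: str, primary_ip: str):
        self.records = {endpoint: primary_ip}

    def resolve(self, name: str) -> str:
        return self.records[name]

    def fail_over(self, endpoint: str, standby_ip: str) -> None:
        # After failover the endpoint name points at the promoted standby.
        self.records[endpoint] = standby_ip


ENDPOINT = "mydb.example.us-east-1.rds.amazonaws.com"  # hypothetical endpoint name
dns = ToyDns(ENDPOINT, "10.0.1.15")

pinned_ip = dns.resolve(ENDPOINT)      # application that resolves once and keeps the address
dns.fail_over(ENDPOINT, "10.0.2.27")   # Multi-AZ failover promotes the standby
fresh_ip = dns.resolve(ENDPOINT)       # application that resolves the name on each connect
print(pinned_ip, fresh_ip)
```

The application holding the pinned address keeps dialing the failed instance, while the one that resolves the endpoint name again reaches the new primary.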

252

MCQhard

The exhibit shows the output of describe-table for a DynamoDB table. The table is used for a reporting job that queries by 'pk' and filters on 'sk' using a range condition. The job is running slowly. What is the most likely cause?

A.The table lacks a global secondary index (GSI).

B.The provisioned read capacity is too low.

C.The table uses provisioned throughput instead of on-demand.

D.The table needs a local secondary index (LSI) on 'sk'.

AnswerB

5 RCU is very low for reporting queries.

Why this answer

The table has only 5 read capacity units, which is likely too low for the reporting job. Auto scaling or increasing RCU would help. Indexes and LSI are not shown to be causing slowness.
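
A rough estimate shows how limiting 5 RCUs is for a reporting job. The sketch below assumes DynamoDB's documented read pricing (4 KB per read unit, half cost for eventually consistent reads), ignores burst capacity, and uses a hypothetical 1 GiB of data read by the job; the helper name is illustrative.

```python
import math


def seconds_to_read(total_bytes: int, provisioned_rcus: float, strongly_consistent: bool = False) -> float:
    """Estimate how long a sustained read takes at a fixed RCU limit (burst capacity ignored)."""
    blocks = math.ceil(total_bytes / 4096)
    rcus_needed = blocks * (1.0 if strongly_consistent else 0.5)
    return rcus_needed / provisioned_rcus


one_gib = 1024 ** 3
hours_at_5_rcu = seconds_to_read(one_gib, 5) / 3600
hours_at_500_rcu = seconds_to_read(one_gib, 500) / 3600
print(round(hours_at_5_rcu, 2), round(hours_at_500_rcu, 3))
```

Under these assumptions, reading 1 GiB with eventually consistent reads takes a little over seven hours at 5 RCUs, and a hundredth of that at 500 RCUs.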

253

MCQmedium

A company is storing sensitive user data in an Amazon S3 bucket. The security team requires that all data be encrypted at rest using a customer-managed key stored in AWS KMS. The bucket policy must deny any PUT request that does not include the appropriate encryption header. Which bucket policy condition key should be used?

A.s3:x-amz-server-side-encryption-aws-kms-key-id

B.s3:x-amz-server-side-encryption

C.s3:x-amz-acl

D.aws:SourceArn

AnswerA

This condition key allows requiring a specific KMS key ID for encryption.

Why this answer

Option A is correct because the `s3:x-amz-server-side-encryption-aws-kms-key-id` condition key allows the bucket policy to enforce that PUT requests include a specific customer-managed KMS key ID in the `x-amz-server-side-encryption-aws-kms-key-id` header, ensuring encryption at rest with the required key. This directly meets the security team's requirement to deny PUT requests that lack the appropriate encryption header tied to a customer-managed KMS key.

Exam trap

The trap here is that candidates often confuse `s3:x-amz-server-side-encryption` (which only checks the encryption algorithm, not the key) with `s3:x-amz-server-side-encryption-aws-kms-key-id` (which checks the specific KMS key ID), leading them to pick option B when the requirement explicitly demands a customer-managed key.

How to eliminate wrong answers

Option B is wrong because `s3:x-amz-server-side-encryption` only checks whether the `x-amz-server-side-encryption` header is present (e.g., `AES256` or `aws:kms`), but it cannot enforce that a specific customer-managed KMS key ID is used; it would allow any KMS key, including AWS-managed keys. Option C is wrong because `s3:x-amz-acl` is used to control access control list (ACL) headers in requests, not encryption headers, so it is irrelevant to encryption enforcement. Option D is wrong because `aws:SourceArn` is a global condition key used to restrict requests based on the ARN of the source resource (e.g., an SNS topic or Lambda function), not to enforce encryption headers in S3 PUT requests.
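
A bucket policy that uses this condition key might look like the sketch below. The bucket name and key ARN are placeholders, and the statement is one common way to write the rule rather than the only one. It uses `StringNotEquals`, which also matches when the header is absent, so uploads without the header are denied as well.

```python
import json


def require_kms_key_policy(bucket: str, kms_key_arn: str) -> dict:
    """Build a bucket policy that denies PutObject unless the given KMS key is named."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyPutWithoutRequiredKmsKey",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption-aws-kms-key-id": kms_key_arn
                    }
                },
            }
        ],
    }


# Placeholder values for illustration only.
policy = require_kms_key_policy(
    "example-sensitive-bucket",
    "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
)
print(json.dumps(policy, indent=2))
```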

254

Multi-Selectmedium

A company is designing a data lake on Amazon S3 for analytics. The data includes sensitive personally identifiable information (PII). Which TWO actions should the company take to protect the data? (Choose TWO.)

Select 2 answers

A.Enable S3 Block Public Access.

B.Enable Requester Pays.

C.Enable S3 Transfer Acceleration.

D.Enable cross-region replication.

E.Enable default encryption with SSE-KMS.

AnswersA, E

Why this answer

Options A and E are correct. Option E is correct because default encryption with SSE-KMS provides encryption at rest and key management. Option A is correct because S3 Block Public Access prevents accidental public exposure.

Option C is wrong because S3 Transfer Acceleration is for speed, not security. Option D is wrong because cross-region replication does not protect data in the source bucket. Option B is wrong because Requester Pays does not control access.
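
The two protective settings can be expressed as the configuration structures that the S3 API accepts. This is a sketch that only builds the dictionaries; the key ARN is a placeholder, and the shapes are intended to match what boto3's `put_public_access_block` (`PublicAccessBlockConfiguration`) and `put_bucket_encryption` (`ServerSideEncryptionConfiguration`) expect.

```python
def pii_bucket_settings(kms_key_arn: str):
    """Return (public access block, default encryption) settings for a PII bucket."""
    public_access_block = {
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    }
    default_encryption = {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                }
            }
        ]
    }
    return public_access_block, default_encryption


# Placeholder key ARN for illustration only.
block_config, encryption_config = pii_bucket_settings(
    "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"
)
print(block_config)
print(encryption_config)
```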

255

MCQmedium

A data engineer runs the above CLI command to describe the DynamoDB table 'Orders'. The table has a partition key 'OrderID' and sort key 'CustomerID'. Which query operation is most efficient for retrieving all orders for a specific customer?

A.Query the table using CustomerID as the partition key

B.Scan the table and filter by CustomerID

C.Use GetItem with CustomerID as the key

D.Create a Global Secondary Index on CustomerID and query the index

AnswerD

A GSI allows efficient query by CustomerID alone.

Why this answer

Option D is correct because a Global Secondary Index (GSI) on CustomerID allows you to query efficiently using CustomerID as the partition key, avoiding a full table scan. Since the base table's primary key is (OrderID, CustomerID), you cannot directly query by CustomerID alone; a GSI provides an alternative access pattern optimized for this query.

Exam trap

The trap here is that candidates assume the sort key can be used as a query filter without an index, but DynamoDB requires the partition key for Query operations, and a Scan is often mistakenly chosen as a simpler alternative despite its performance cost.

How to eliminate wrong answers

Option A is wrong because CustomerID is the sort key, not the partition key, so a Query operation requires the partition key (OrderID) to be specified; you cannot query using only the sort key. Option B is wrong because a Scan reads every item in the table, which is inefficient and costly for large datasets, especially when a targeted query is possible. Option C is wrong because GetItem requires both the partition key and sort key to retrieve a single item; it cannot return multiple orders for a customer.
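
A Query against such an index could be parameterised as in the sketch below. The index name `CustomerID-index` is hypothetical, and `CustomerID` is assumed to be a string attribute; the dictionary follows the request shape of the low-level DynamoDB `Query` operation and is only built here, not sent.

```python
def query_orders_by_customer(customer_id: str) -> dict:
    """Build Query parameters that use a GSI keyed on CustomerID (index name is hypothetical)."""
    return {
        "TableName": "Orders",
        "IndexName": "CustomerID-index",
        # On the index, CustomerID is the partition key, so an equality condition is allowed.
        "KeyConditionExpression": "CustomerID = :cid",
        "ExpressionAttributeValues": {":cid": {"S": customer_id}},
    }


params = query_orders_by_customer("C-1001")
print(params)
```

Without `IndexName`, the same key condition would be rejected against the base table, because `CustomerID` is only the sort key there.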

256

MCQhard

Refer to the exhibit. A data engineer applies this bucket policy to an S3 bucket. A user within the 10.0.0.0/24 IP range attempts to upload an object to the bucket using an HTTP (non-HTTPS) request. What is the outcome?

A.The upload succeeds because the Allow statement grants permission.

B.The upload succeeds because the user's IP is allowed.

C.The upload fails because the user's IP is not in the allowed range for PutObject.

D.The upload fails because the request is not using HTTPS.

AnswerD

Explicit Deny for non-HTTPS requests.

Why this answer

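Option D is correct because the Deny statement explicitly denies all S3 actions when `aws:SecureTransport` is false, regardless of the Allow statement. An explicit Deny always overrides an Allow. Options A and B are wrong because the Deny applies even though the Allow statement matches and the user's IP is in the allowed range.

Option C is wrong because the user's IP is within 10.0.0.0/24 and the IP condition appears only on the Allow statement, so the IP address is not the reason for the failure.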
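
The deny-overrides rule can be sketched with a small evaluator. This is a simplified model, not the real IAM engine, and the two statements are hypothetical stand-ins for the exhibit: an Allow for `s3:PutObject` from 10.0.0.0/24 and a Deny for any request that is not sent over HTTPS.

```python
import ipaddress

ALLOWED_NETWORK = ipaddress.ip_network("10.0.0.0/24")

# Hypothetical statements modelled on the scenario; each has an effect and a match test.
STATEMENTS = [
    {
        "effect": "Allow",
        "matches": lambda r: r["action"] == "s3:PutObject"
        and ipaddress.ip_address(r["source_ip"]) in ALLOWED_NETWORK,
    },
    {
        "effect": "Deny",
        "matches": lambda r: not r["secure_transport"],
    },
]


def evaluate(statements, request) -> str:
    """Return 'Allow' or 'Deny': explicit deny wins, and no match means implicit deny."""
    decision = "Deny"  # implicit deny until some Allow matches
    for statement in statements:
        if statement["matches"](request):
            if statement["effect"] == "Deny":
                return "Deny"  # an explicit deny ends evaluation
            decision = "Allow"
    return decision


http_upload = {"action": "s3:PutObject", "source_ip": "10.0.0.25", "secure_transport": False}
https_upload = {"action": "s3:PutObject", "source_ip": "10.0.0.25", "secure_transport": True}
print(evaluate(STATEMENTS, http_upload), evaluate(STATEMENTS, https_upload))
```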

257

MCQmedium

A data engineer is migrating an on-premises MongoDB database to Amazon DocumentDB. Which migration strategy minimizes downtime?

A.Take a snapshot of the MongoDB database and restore it to DocumentDB.

B.Use AWS Database Migration Service (AWS DMS) with full load only.

C.Export data using mongodump and import using mongorestore.

D.Use AWS DMS with full load and ongoing replication from MongoDB to DocumentDB.

AnswerD

Ongoing replication allows near-zero downtime cutover.

Why this answer

AWS DMS with full load and ongoing replication (change data capture) minimizes downtime by continuously synchronizing changes from the source MongoDB to the target DocumentDB after the initial full load, allowing a cutover with only a brief pause. This is the only option that supports near-zero downtime migration for live databases.

Exam trap

The trap here is that candidates assume any AWS DMS migration automatically minimizes downtime, but only the full load plus ongoing replication (CDC) option achieves near-zero downtime, while full load only still requires a write stop.

How to eliminate wrong answers

Option A is wrong because taking a snapshot and restoring it captures only a point-in-time copy, requiring the source database to be offline or read-only during the snapshot, causing downtime. Option B is wrong because AWS DMS full load only transfers the current data once, without capturing ongoing changes, so any writes during the migration are lost and downtime is needed to stop writes before cutover. Option C is wrong because mongodump and mongorestore are offline tools that require the source MongoDB to stop accepting writes during the export, resulting in significant downtime.
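
The difference between options B and D comes down to the task's migration type. The sketch below builds the parameters for a DMS replication task in the shape boto3's `create_replication_task` accepts; the task identifier and all ARNs are placeholders, and the selection rule that includes every collection is an assumption made for illustration.

```python
import json


def replication_task_params(source_arn: str, target_arn: str, instance_arn: str) -> dict:
    """Build DMS task parameters for a full load followed by ongoing replication (CDC)."""
    table_mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-all",
                "object-locator": {"schema-name": "%", "table-name": "%"},
                "rule-action": "include",
            }
        ]
    }
    return {
        "ReplicationTaskIdentifier": "mongodb-to-documentdb",  # hypothetical name
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        # "full-load" alone would copy the data once and miss later writes.
        "MigrationType": "full-load-and-cdc",
        "TableMappings": json.dumps(table_mappings),
    }


task = replication_task_params(
    "arn:aws:dms:us-east-1:111122223333:endpoint:SOURCE-EXAMPLE",
    "arn:aws:dms:us-east-1:111122223333:endpoint:TARGET-EXAMPLE",
    "arn:aws:dms:us-east-1:111122223333:rep:INSTANCE-EXAMPLE",
)
print(task["MigrationType"])
```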

258

Multi-Selecthard

A data engineer is troubleshooting an AWS Glue ETL job that reads from an S3 bucket and writes to a DynamoDB table. The job fails with an AccessDeniedException. The IAM role attached to the Glue job has the policy shown in the exhibit. Which TWO additional permissions are required to resolve the issue?

Select 2 answers

A.kms:Decrypt

B.dynamodb:DescribeTable

C.iam:PassRole

D.s3:ListBucket

E.glue:GetJobRun

AnswersC, E

Glue needs to pass the role to itself.

Why this answer

Option C is correct because the AWS Glue job must pass the IAM role to itself when it runs, which requires the `iam:PassRole` permission. Without this, the job cannot assume the role and will fail with an AccessDeniedException, even if the role has the necessary S3 and DynamoDB permissions.

Exam trap

The trap here is that candidates often focus on the S3 or DynamoDB permissions listed in the exhibit and overlook the fundamental requirement for Glue to pass the role to itself, which is a prerequisite for any Glue job execution.

259

MCQhard

A financial services company stores transaction data in Amazon RDS for PostgreSQL. The company requires that all changes to the database be logged for audit purposes, including before and after images of updated rows. Which feature should the data engineer enable?

A.Enable automated backups and export logs to Amazon S3

B.Enable Enhanced Monitoring and publish logs to CloudWatch Logs

C.Set up logical replication using pglogical or native publication/subscription

D.Enable Multi-AZ deployment and read replicas

AnswerC

Logical replication provides row-level changes with before and after images.

Why this answer

Option C is correct because logical replication, using either pglogical or native PostgreSQL publication/subscription, captures row-level changes (INSERT, UPDATE, DELETE) and can include both the old and new values of updated rows. This meets the audit requirement for before-and-after images, as logical replication decodes the write-ahead log (WAL) to produce a change stream that includes full row snapshots.

Exam trap

The trap here is that candidates confuse database-level logging features (like Enhanced Monitoring or automated backups) with row-level change data capture, assuming any logging mechanism will capture before-and-after images, when only logical replication (or triggers with audit tables) provides that granularity.

How to eliminate wrong answers

Option A is wrong because automated backups capture point-in-time snapshots of the entire database, not a continuous, row-level change stream with before-and-after images; exporting logs to S3 provides error logs or slow query logs, not row-level audit trails. Option B is wrong because Enhanced Monitoring collects OS-level metrics (CPU, memory, disk I/O) and publishes them to CloudWatch Logs, not database row changes. Option D is wrong because Multi-AZ deployment provides high availability via synchronous standby replication, and read replicas serve read traffic; neither logs individual row modifications or provides before-and-after images.

260

MCQhard

A company uses Amazon EMR to process large datasets stored in Amazon S3. The cluster uses a transient configuration and stores intermediate data on HDFS. After a job fails due to a spot instance termination, the data engineer needs to rerun the job. What should the engineer do to minimize data loss and cost?

A.Use a long-running cluster with all on-demand instances to avoid interruptions.

B.Configure the cluster with a mix of spot and on-demand instances and set HDFS replication to 3.

C.Configure the cluster to use EMRFS and store intermediate data in S3.

D.Increase the HDFS replication factor to 5 and use only spot instances.

AnswerB

This balances cost and reliability; on-demand instances provide stability for HDFS NameNode and critical nodes.

Why this answer

Option B is correct because using a mix of spot and on-demand instances balances cost savings with fault tolerance, while setting HDFS replication to 3 ensures that intermediate data on HDFS survives the loss of a single node (e.g., a terminated spot instance). This allows the job to resume from the last checkpoint or reduce recomputation, minimizing both data loss and cost.

Exam trap

The trap here is that candidates assume storing intermediate data in S3 (EMRFS) is always better for durability, but they overlook that HDFS replication with spot/on-demand mix provides a cheaper, faster recovery for transient cluster workloads without incurring S3 write costs.

How to eliminate wrong answers

Option A is wrong because using all on-demand instances eliminates cost savings from spot instances and does not address the transient cluster design; a long-running cluster increases costs unnecessarily. Option C is wrong because storing intermediate data in S3 via EMRFS introduces higher latency and cost for transient data, and EMRFS does not natively support checkpointing for HDFS-dependent intermediate data. Option D is wrong because increasing HDFS replication to 5 consumes more storage and network bandwidth, and using only spot instances increases the risk of frequent failures without on-demand fallback, leading to higher recomputation costs.

261

MCQeasy

A data engineer needs to store archival data that is rarely accessed but must be retained for 7 years. The data should be retrievable within 12 hours. Which Amazon S3 storage class is MOST cost-effective?

A.S3 Intelligent-Tiering

B.S3 Glacier Flexible Retrieval

C.S3 Standard

D.S3 Glacier Deep Archive

AnswerD

Lowest cost with 12-hour retrieval.

Why this answer

Option D is correct because S3 Glacier Deep Archive is the cheapest storage class and supports retrieval within 12 hours. Option C (S3 Standard) is expensive. Option B (S3 Glacier Flexible Retrieval) is more expensive than Deep Archive.

Option A (S3 Intelligent-Tiering) may not archive automatically.

262

MCQeasy

A data engineer needs to store semi-structured JSON files that are accessed infrequently but must be retrievable within minutes. The data should be stored cost-effectively. Which storage solution meets these requirements?

A.Amazon S3 Glacier Flexible Retrieval storage class.

B.Amazon S3 Glacier Deep Archive storage class.

C.Amazon S3 Standard-Infrequent Access (S3 Standard-IA) storage class.

D.Amazon S3 Standard storage class.

AnswerC

S3 Standard-IA is cost-effective for infrequent access with millisecond retrieval.

Why this answer

Option C is correct because S3 Standard-IA is built for data that is accessed infrequently but must be available immediately: it offers millisecond retrieval at a lower storage price than S3 Standard, which easily meets the 'within minutes' requirement. S3 Glacier Instant Retrieval also offers millisecond retrieval, but it is not among the options.

Option A is wrong because S3 Glacier Flexible Retrieval retrievals take from minutes to hours, so retrieval within minutes is not assured. Option B is wrong because S3 Glacier Deep Archive retrievals take up to 12 hours. Option D is wrong because S3 Standard is priced for frequent access and is too expensive for rarely read data.

263

MCQeasy

A data engineer is designing a data lake on Amazon S3. The data is accessed frequently for the first 30 days, then rarely accessed after 90 days, and must be archived after 1 year. Which S3 lifecycle policy configuration meets these requirements with the lowest cost?

A.Transition to S3 One Zone-IA after 30 days, then to S3 Glacier Deep Archive after 365 days.

B.Transition to S3 Glacier Flexible Retrieval after 90 days.

C.Transition to S3 Glacier Instant Retrieval after 30 days, then expire after 365 days.

D.Transition to S3 Standard-IA after 30 days, then to S3 Glacier Deep Archive after 365 days.

AnswerD

Standard-IA is cost-effective for infrequent access, and Deep Archive is the cheapest archival tier.

Why this answer

Option D is correct because it transitions to S3 Standard-IA after 30 days (cost-effective for infrequent access) and to S3 Glacier Deep Archive after 365 days (lowest-cost archival). Option A transitions to S3 One Zone-IA, which stores data in a single Availability Zone and is not recommended for durability. Option B transitions to S3 Glacier Flexible Retrieval, which costs more than Deep Archive.

Option C transitions to S3 Glacier Instant Retrieval, which is expensive for archival, and expires the objects after 365 days instead of archiving them.
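
The Standard-IA then Deep Archive schedule can be written as a lifecycle configuration. The sketch below only builds and sanity-checks the structure, in the shape boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`; the rule ID and the empty prefix filter (apply to all objects) are assumptions.

```python
LIFECYCLE_CONFIGURATION = {
    "Rules": [
        {
            "ID": "standard-ia-then-deep-archive",  # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # assumed: rule applies to every object
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
}


def transitions_are_ordered(rule: dict) -> bool:
    """Check that each transition happens strictly later than the previous one."""
    days = [transition["Days"] for transition in rule["Transitions"]]
    return all(earlier < later for earlier, later in zip(days, days[1:]))


print(transitions_are_ordered(LIFECYCLE_CONFIGURATION["Rules"][0]))
```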

264

MCQeasy

A retail company stores customer transaction data in an Amazon S3 bucket. The data is encrypted using server-side encryption with AWS KMS (SSE-KMS). The company uses an IAM role to allow an Amazon Athena query service to read the data. The data engineer creates a new Athena workgroup and attempts to run a query on the S3 bucket. The query fails with an access denied error. The IAM role has permissions to decrypt the KMS key and read from the bucket. The engineer checks the S3 bucket policy and finds that it does not explicitly allow access. What is the most likely cause of the failure?

A.The S3 bucket is in a different AWS account than the Athena workgroup.

B.The S3 bucket policy does not grant the required permissions to the Athena service principal.

C.The IAM role does not have permission to use the KMS key for encryption operations.

D.Athena does not support querying data encrypted with SSE-KMS.

AnswerB

The S3 bucket policy must explicitly allow the Athena service or the IAM role to access the bucket.

Why this answer

Option B is correct because, although the IAM role has permissions, the S3 bucket policy might explicitly deny access or not grant access to the Athena service. Option C is wrong because the IAM role already has permission to use the KMS key. Option A is wrong because cross-account access is not mentioned; the role is in the same account.

Option D is wrong because Athena can query encrypted data with proper permissions.

265

MCQmedium

A company stores sensitive data in Amazon S3. They need to ensure that all objects are encrypted at rest. Which approach meets this requirement with minimal effort?

A.Use client-side encryption before uploading

B.Enable default encryption on the S3 bucket with SSE-S3

C.Enable S3 Versioning and MFA Delete

D.Use a bucket policy to deny PutObject without encryption

AnswerB

Automatically encrypts all new objects with minimal effort.

Why this answer

Enabling S3 default encryption (SSE-S3) ensures all new objects are encrypted automatically. Bucket policies can enforce encryption but require more effort.

266

MCQmedium

Refer to the exhibit. An IAM policy is attached to a user. The user cannot upload objects to the S3 bucket 'example-bucket' using the AWS CLI. What is the most likely cause?

A.The user is not using HTTPS for API calls

B.The policy does not allow s3:PutObject

C.The user is not in the same AWS region

D.The resource ARN does not include the bucket itself

AnswerA

The condition aws:SecureTransport requires HTTPS.

Why this answer

The IAM policy explicitly denies all actions unless the request uses HTTPS (via the `aws:SecureTransport` condition key). The AWS CLI uses HTTPS by default, so the failure indicates that the user's calls are being sent over plain HTTP, for example through an `--endpoint-url` that begins with `http://`. The `s3:PutObject` action is allowed in the policy, but the condition block overrides that permission when the request is not made over HTTPS.

Exam trap

AWS often tests the `aws:SecureTransport` condition key as a hidden denial, leading candidates to incorrectly assume the action is missing or the ARN is malformed.

How to eliminate wrong answers

Option B is wrong because the policy does allow `s3:PutObject` on the bucket; the issue is the condition key, not the action. Option C is wrong because IAM permissions are not tied to the Region the user calls from, and S3 bucket ARNs do not contain a Region at all. Option D is wrong because the resource ARN `arn:aws:s3:::example-bucket/*` correctly includes all objects within the bucket, which is the standard way to grant object-level permissions; the bucket itself is not needed for object uploads.

267

MCQeasy

A data engineer is designing a data lake on Amazon S3. The data includes customer PII that must be encrypted at rest. The company also requires that the encryption keys be rotated automatically every year. Which encryption solution should the engineer use?

A.SSE-KMS with automatic key rotation enabled

B.SSE-S3

C.SSE-C

D.Client-side encryption with AWS KMS

AnswerA

SSE-KMS allows you to use a customer managed key with automatic annual rotation, giving you control and auditability.

Why this answer

SSE-KMS with automatic key rotation enabled meets both requirements: it encrypts data at rest in S3 and lets the company rotate its customer managed KMS key (formerly called a customer master key, CMK) automatically. AWS KMS supports automatic rotation for symmetric customer managed keys, by default once a year, which satisfies the compliance need without manual intervention.

Exam trap

The trap here is that candidates often assume SSE-S3 satisfies the rotation requirement. SSE-S3 keys are rotated on a schedule managed entirely by AWS, which the customer can neither configure nor audit, whereas SSE-KMS with automatic rotation meets the explicit annual requirement.

How to eliminate wrong answers

Option B (SSE-S3) is wrong because, although it encrypts data at rest, its keys are managed and rotated by S3 on a schedule the customer cannot define or verify. Option C (SSE-C) is wrong because it requires the customer to provide and manage their own encryption keys, and AWS does not handle key rotation, making it unsuitable for automated annual rotation. Option D (Client-side encryption with AWS KMS) is wrong because it moves encryption into the application before data is sent to S3, which adds complexity without adding anything to the stated requirements; SSE-KMS already delivers encryption at rest with automatic rotation.
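
As a rough illustration, default bucket encryption with SSE-KMS can be described by a small configuration document. The sketch below builds one in the shape that boto3's `put_bucket_encryption` accepts as `ServerSideEncryptionConfiguration`; the key ARN is a made-up placeholder, and rotation itself would be turned on separately on the KMS key (for example with the KMS `enable_key_rotation` call).

```python
# Placeholder ARN for illustration only; not a real key.
EXAMPLE_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"


def default_sse_kms_config(key_arn: str) -> dict:
    """Return a default-encryption document that selects SSE-KMS."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": key_arn,
                },
                # Bucket Keys reduce the number of KMS requests S3 makes.
                "BucketKeyEnabled": True,
            }
        ]
    }


config = default_sse_kms_config(EXAMPLE_KEY_ARN)
print(config["Rules"][0]["ApplyServerSideEncryptionByDefault"]["SSEAlgorithm"])
```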

268

MCQhard

A company runs an e-commerce platform on AWS. The product catalog is stored in Amazon DynamoDB with a table that has a partition key of 'product_id' and a sort key of 'category'. The application frequently queries products by category and by product_id. Recently, the operations team noticed that read latency has increased significantly for queries that filter by category. The DynamoDB table has auto scaling enabled. The data engineer examines the CloudWatch metrics and sees that the ReadThrottleEvents metric is non-zero for the table, but the consumed read capacity is well below the provisioned limit. The table has a global secondary index (GSI) on the 'category' attribute. Which action is most likely to resolve the latency issue?

A.Switch the table to DynamoDB on-demand capacity mode.

B.Enable DynamoDB Accelerator (DAX) to cache read queries.

C.Increase the provisioned read capacity on the main table.

D.Redesign the GSI partition key to include a random suffix to distribute load across multiple partitions.

AnswerD

This prevents a hot partition on the GSI.

Why this answer

The issue is that the GSI on 'category' is experiencing hot partitions: 'category' has low cardinality, so all reads for a popular category land on the same index partition. Non-zero ReadThrottleEvents while consumed read capacity stays well below the provisioned limit is the signature of partition-level throttling rather than a table-wide capacity shortfall. Adding a random suffix to the GSI partition key spreads each category across multiple physical partitions, reducing hot spots and latency; the trade-off is that a query for one category must then read every suffix and merge the results.

Exam trap

The trap here is that candidates assume throttling always relates to the base table's provisioned capacity, overlooking that GSIs have independent capacity and can throttle even when the base table is underutilized, especially with a low-cardinality GSI partition key like 'category'.

How to eliminate wrong answers

Option A is wrong because switching to on-demand mode would not resolve the hot partition issue; it only eliminates the need to manage provisioned capacity but does not fix the underlying data skew that causes throttling on the GSI. Option B is wrong because DynamoDB Accelerator (DAX) caches read results but does not address the root cause of throttling on the GSI due to uneven partition access; it would only mask the symptom for cached queries. Option C is wrong because increasing provisioned read capacity on the main table does not affect the GSI's separate capacity; the throttling is occurring on the GSI, not the base table, and the consumed read capacity on the main table is already well below its limit.
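
A minimal sketch of the random-suffix idea, assuming a fixed shard count of 10 and a `category#n` key format (both are illustrative choices, not values taken from the question). Writers pick one suffix at random; readers enumerate every suffix and query each one.

```python
import random

SHARD_COUNT = 10  # assumed value; tune to the observed skew


def sharded_category(category: str, shards: int = SHARD_COUNT) -> str:
    """Key to store on write: the category plus a random shard suffix."""
    return f"{category}#{random.randrange(shards)}"


def all_shard_keys(category: str, shards: int = SHARD_COUNT) -> list[str]:
    """Keys to query on read: one per shard, results merged by the caller."""
    return [f"{category}#{i}" for i in range(shards)]


write_key = sharded_category("electronics")
read_keys = all_shard_keys("electronics")
print(write_key, len(read_keys))
```

Each read now costs one query per shard, so the shard count is a balance between spreading load and multiplying requests.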

269

MCQhard

A company is migrating an on-premises PostgreSQL database to Amazon Aurora PostgreSQL. The database is 2 TB in size. The migration must have minimal downtime. Which approach should the data engineer use?

A.Use AWS Schema Conversion Tool (SCT) to convert the schema and then copy data using S3.

B.Create an Aurora read replica from the on-premises database using native replication.

C.Use AWS Database Migration Service (DMS) with ongoing replication to minimize downtime.

D.Use pg_dump to export the database and pg_restore to import into Aurora.

AnswerC

DMS supports full load + CDC for minimal downtime.

Why this answer

AWS DMS with ongoing replication (change data capture) is the correct approach because it allows a full load of the 2 TB database followed by continuous replication of changes from the on-premises PostgreSQL source to the Aurora PostgreSQL target, minimizing downtime to a short cutover window. This is the only option that supports near-zero downtime migration for large databases by capturing ongoing transactions.

Exam trap

The trap here is that candidates often confuse native PostgreSQL replication (e.g., streaming replication) with AWS-managed replication, assuming an Aurora read replica can be created from any PostgreSQL source, but Aurora read replicas are only supported within the Aurora cluster itself.

How to eliminate wrong answers

Option A is wrong because AWS SCT is used for schema conversion (e.g., from Oracle to PostgreSQL), but the source is already PostgreSQL, so no schema conversion is needed; copying data via S3 adds unnecessary complexity and does not provide ongoing replication for minimal downtime. Option B is wrong because Aurora read replicas can only be created from an existing Aurora cluster or, as a migration path, an RDS instance, not from an on-premises PostgreSQL database. Option D is wrong because pg_dump/pg_restore produces a point-in-time copy: changes made after the dump starts are not captured, so writes to the source must stop for the entire export and import of 2 TB, resulting in significant downtime.

270

MCQmedium

A data engineer is designing a data lake on Amazon S3. The data is accessed frequently for the first 30 days, then rarely after that. Compliance requires that data be retained for 7 years. What is the MOST cost-effective storage strategy?

A.Use S3 Intelligent-Tiering for the entire 7 years.

B.Store all data in S3 Standard for 7 years.

C.Use S3 Standard for 30 days, then S3 One Zone-IA for 7 years.

D.Use S3 Standard for 30 days, then transition to S3 Standard-IA, then to S3 Glacier Deep Archive after 90 days.

AnswerD

Standard-IA for infrequent access, then Glacier Deep Archive for long-term retention.

Why this answer

Option D is the most cost-effective because it matches the access pattern: S3 Standard for the first 30 days of frequent access, then S3 Standard-IA for the next 60 days (reduced storage cost with a retrieval fee), and finally S3 Glacier Deep Archive for the remaining 6.75 years or so to meet the 7-year retention requirement at the lowest possible storage cost. This lifecycle policy minimizes total cost by transitioning data to progressively cheaper storage tiers as access frequency drops, while still meeting compliance.

Exam trap

AWS often tests the misconception that S3 Intelligent-Tiering is always the most cost-effective for unknown access patterns, but in this scenario with a known access pattern (frequent for 30 days, then rarely), a lifecycle policy with explicit transitions is cheaper because it avoids the per-object monitoring fee of Intelligent-Tiering.

How to eliminate wrong answers

Option A is wrong because S3 Intelligent-Tiering incurs a monthly monitoring and automation fee per object, which over 7 years would be more expensive than a lifecycle-based approach, especially for rarely accessed data. Option B is wrong because storing all data in S3 Standard for 7 years is the most expensive option, as it does not take advantage of lower-cost tiers for data that is rarely accessed after 30 days. Option C is wrong because S3 One Zone-IA keeps data in a single Availability Zone, so the data is lost if that zone is destroyed, which is a poor fit for compliance retention; it is also more expensive than Glacier Deep Archive for data accessed rarely over 7 years.
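
The lifecycle described in option D can be written down as a single rule. The sketch below builds it in the shape boto3's `put_bucket_lifecycle_configuration` accepts; the rule ID is hypothetical, and the expiration after 7 × 365 days is an added assumption (the question only says data must be retained for 7 years, not that it must then be deleted).

```python
RETENTION_DAYS = 7 * 365  # 2555 days; assumed expiry point, see note above

lifecycle_config = {
    "Rules": [
        {
            "ID": "standard-ia-then-deep-archive",  # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # apply to every object in the bucket
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "DEEP_ARCHIVE"},
            ],
            "Expiration": {"Days": RETENTION_DAYS},
        }
    ]
}

for step in lifecycle_config["Rules"][0]["Transitions"]:
    print(step["Days"], step["StorageClass"])
```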

271

MCQhard

A company uses Amazon DynamoDB with provisioned capacity. During a sales event, write traffic spikes and some requests receive ProvisionedThroughputExceeded exceptions. The reads are within limits. The data engineer needs to minimize latency for the spike without manual intervention. Which solution is MOST cost-effective?

A.Use Amazon SQS to buffer write requests and process them in batches.

B.Disable auto scaling and set write capacity to the peak observed value.

C.Enable DynamoDB auto scaling for write capacity with a target utilization of 70%.

D.Enable DynamoDB Accelerator (DAX) to cache write operations.

AnswerC

Auto scaling adjusts capacity dynamically based on traffic, handling spikes cost-effectively.

Why this answer

Option C is correct because DynamoDB auto scaling for write capacity automatically adjusts the provisioned write capacity units (WCUs) based on the actual traffic pattern, using a target utilization of 70% to balance cost and performance. This eliminates manual intervention and absorbs spikes by raising capacity as utilization climbs past the target, while remaining cost-effective since capacity scales down when traffic subsides.

Exam trap

The trap here is that candidates may confuse DAX as a solution for write performance, but DAX only accelerates reads (via caching) and does not mitigate write throttling, leading to an incorrect choice of Option D.

How to eliminate wrong answers

Option A is wrong because using Amazon SQS to buffer write requests introduces additional latency for processing batches, which contradicts the requirement to minimize latency during the spike, and it adds complexity and cost for queue management. Option B is wrong because disabling auto scaling and setting write capacity to the peak observed value is wasteful and costly, as it permanently allocates high capacity that is only needed during spikes, and it requires manual intervention to adjust. Option D is wrong because DynamoDB Accelerator (DAX) is an in-memory cache for read operations, not writes; it does not reduce write throttling or ProvisionedThroughputExceeded exceptions, and it adds cost without addressing the write capacity issue.
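
For reference, a 70% target-tracking setup is usually expressed as two Application Auto Scaling requests. The sketch below only builds the request parameters (in the shape of boto3's `register_scalable_target` and `put_scaling_policy`); the table name, policy name, and capacity bounds are assumptions for illustration.

```python
TABLE_NAME = "Orders"  # hypothetical table

scalable_target = {
    "ServiceNamespace": "dynamodb",
    "ResourceId": f"table/{TABLE_NAME}",
    "ScalableDimension": "dynamodb:table:WriteCapacityUnits",
    "MinCapacity": 100,      # assumed floor
    "MaxCapacity": 10_000,   # assumed ceiling
}

scaling_policy = {
    "PolicyName": "write-utilization-70",  # hypothetical name
    "ServiceNamespace": scalable_target["ServiceNamespace"],
    "ResourceId": scalable_target["ResourceId"],
    "ScalableDimension": scalable_target["ScalableDimension"],
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBWriteCapacityUtilization"
        },
    },
}

print(scaling_policy["TargetTrackingScalingPolicyConfiguration"]["TargetValue"])
```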

272

MCQeasy

A company stores critical financial data in Amazon DynamoDB. To meet compliance requirements, the data must be encrypted at rest with a customer-managed key. Which solution should the data engineer implement?

A.Configure the DynamoDB table to use a customer managed key from AWS KMS.

B.Use AWS CloudHSM to generate a key and import it into DynamoDB.

C.Enable default encryption on the DynamoDB table using S3-managed keys.

D.Use AWS Certificate Manager to issue a certificate and configure TLS.

AnswerA

DynamoDB integrates with KMS for customer-managed keys.

Why this answer

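Option A is correct because DynamoDB supports encryption at rest using AWS KMS customer managed keys. Option B is wrong because DynamoDB has no mechanism for importing a key directly from CloudHSM; table encryption keys are selected through AWS KMS. Option C is wrong because S3-managed keys are an Amazon S3 feature, not a DynamoDB encryption option.

Option D is wrong because AWS Certificate Manager and TLS protect data in transit, not data at rest.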

273

MCQhard

A company is using Amazon DynamoDB with on-demand capacity for a gaming application. During a new game launch, write traffic spikes to 50,000 writes per second, but the application experiences throttling. The DynamoDB table has a partition key of 'game_id' and a sort key of 'timestamp'. What is the MOST likely cause of throttling?

A.The table has not enabled auto-scaling for writes.

B.The table's on-demand capacity is insufficient for the write spike.

C.Hot partitions due to a skewed access pattern on the partition key 'game_id'.

D.The sort key is not optimal for write-heavy workloads.

AnswerC

If many writes go to the same partition key, that partition can be throttled even with on-demand capacity.

Why this answer

Option C is correct because DynamoDB on-demand capacity automatically scales to handle traffic spikes, but it still has per-partition throughput limits. With 'game_id' as the partition key, a single popular game can create a hot partition where all writes target the same partition, exceeding the partition's maximum write capacity (1,000 write capacity units per partition) and causing throttling, even though the overall table capacity is sufficient.

Exam trap

The trap here is that candidates assume on-demand capacity eliminates all throttling, but they overlook DynamoDB's per-partition throughput limits, which can cause throttling on hot partitions even with on-demand mode.

How to eliminate wrong answers

Option A is wrong because on-demand capacity does not use auto-scaling; it automatically adjusts capacity without needing auto-scaling enabled. Option B is wrong because on-demand capacity is designed to handle sudden spikes without manual provisioning, so insufficient capacity is not the issue—the problem is partition-level limits. Option D is wrong because the sort key does not affect write throughput distribution; partition key selection determines write distribution, and a sort key is irrelevant to throttling caused by hot partitions.
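
A deliberately simplified model makes the hot-partition effect concrete. It assumes each `game_id` maps to its own partition, every write costs 1 WCU, and the per-partition limit is the 1,000 WCU figure cited above; real DynamoDB behavior (adaptive capacity, partition splits) is more forgiving than this. The traffic numbers are invented for the example.

```python
PARTITION_WCU_LIMIT = 1_000  # per-partition write limit cited above


def throttled_writes(writes_per_key: dict[str, int],
                     limit: int = PARTITION_WCU_LIMIT) -> dict[str, int]:
    """Writes per second that exceed the partition limit, per key."""
    return {key: max(0, w - limit) for key, w in writes_per_key.items()}


# Skewed: one newly launched game receives almost all of the 50,000 writes/s.
skewed = {"new_game": 45_500, **{f"game_{i}": 500 for i in range(9)}}
# Even: the same 50,000 writes/s spread across 100 keys.
even = {f"game_{i}": 500 for i in range(100)}

print(sum(throttled_writes(skewed).values()))
print(sum(throttled_writes(even).values()))
```

The total traffic is identical in both cases; only the distribution across partition keys changes the outcome.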

274

MCQmedium

A data engineer needs to store semi-structured JSON logs from multiple microservices in a cost-effective manner for ad-hoc querying using SQL. Which AWS service should be used?

A.Amazon Athena with data in S3

B.Amazon DynamoDB

C.Amazon RDS for MySQL

D.Amazon Kinesis Data Analytics

AnswerA

Athena can query JSON in S3 directly using SQL, cost-effective for ad-hoc queries.

Why this answer

Amazon Athena is the correct choice because it allows you to query semi-structured JSON logs stored in S3 directly using standard SQL, without needing to load or transform the data. Athena's schema-on-read approach and pay-per-query pricing make it highly cost-effective for ad-hoc analysis of large volumes of log data, as you only pay for the data scanned during queries.

Exam trap

The trap here is that candidates often confuse Amazon Athena with Amazon Kinesis Data Analytics, mistakenly thinking that Kinesis is the go-to service for SQL-based log analysis, when in fact Kinesis is for real-time streaming and Athena is the correct serverless query service for stored data in S3.

How to eliminate wrong answers

Option B (Amazon DynamoDB) is wrong because it is a NoSQL key-value and document database optimized for low-latency, high-throughput transactional workloads, not for ad-hoc SQL querying of semi-structured logs; its query model is built around key-based access, so ad-hoc analysis would require expensive scans of large datasets. Option C (Amazon RDS for MySQL) is wrong because it requires you to predefine a schema, load the JSON logs into relational tables, and pay for provisioned compute and storage even when idle, making it less cost-effective for sporadic ad-hoc queries compared to Athena's serverless model. Option D (Amazon Kinesis Data Analytics) is wrong because it is designed for real-time stream processing and analytics on streaming data using SQL, not for querying stored JSON logs in S3; it would require continuous ingestion and incurs ongoing costs regardless of query frequency.

275

MCQhard

A company uses Amazon RDS for MySQL with Multi-AZ deployment. During a recent failover, the application experienced a brief outage because it cached the old database endpoint. Which solution would minimize application disruption during future failovers?

A.Use Amazon ElastiCache to cache database queries and absorb the failover delay.

B.Create a read replica in another Availability Zone and promote it during failover.

C.Configure the application to connect using the RDS instance's private IP address.

D.Use the RDS cluster endpoint in the application configuration.

AnswerD

The cluster endpoint automatically points to the current primary instance, enabling seamless failover.

Why this answer

The correct answer is D because RDS exposes a DNS endpoint rather than a fixed address, and that endpoint always resolves to the current primary. In a Multi-AZ DB cluster this is the cluster (writer) endpoint; in a Multi-AZ DB instance deployment the instance's endpoint name plays the same role. During a failover, RDS updates DNS so the same name resolves to the new primary. The application therefore needs to connect by that name and respect the DNS TTL instead of caching a resolved address, so that it reaches the current primary without manual intervention.

Exam trap

The trap here is that candidates confuse the cluster endpoint with the instance endpoint or private IP, assuming that static IPs or read replicas provide better failover behavior, when in fact the cluster endpoint is specifically designed for automatic failover in Multi-AZ deployments.

How to eliminate wrong answers

Option A is wrong because ElastiCache caches query results, not database endpoints; it does not address the DNS resolution or endpoint caching issue during a failover. Option B is wrong because promoting a read replica creates a new standalone instance with a different endpoint, requiring application reconfiguration and causing longer disruption than Multi-AZ automatic failover. Option C is wrong because private IP addresses can change after a failover (the new primary may have a different IP), and using IPs bypasses DNS-based failover mechanisms, leading to connectivity failures.

276

MCQeasy

A company is using Amazon S3 as a data lake. The data engineer needs to ensure that all objects uploaded to a specific bucket are automatically replicated to a bucket in another AWS Region for disaster recovery. Which configuration should the engineer implement?

A.Enable S3 Same-Region Replication (SRR) on the source bucket.

B.Enable S3 Cross-Region Replication (CRR) on the source bucket.

C.Use S3 Transfer Acceleration to copy objects to the destination.

D.Use S3 Batch Operations to copy existing objects.

AnswerB

CRR replicates objects across regions automatically.

Why this answer

S3 Cross-Region Replication (CRR) is the correct choice because it automatically replicates objects from a source bucket in one AWS Region to a destination bucket in a different AWS Region, meeting the disaster recovery requirement for geographic separation. CRR requires versioning to be enabled on both buckets and replicates new objects asynchronously after upload.

Exam trap

The trap here is that candidates confuse S3 Transfer Acceleration (which speeds up uploads) with replication, or assume S3 Batch Operations can be used for ongoing replication, when only CRR provides automatic, cross-region object replication for disaster recovery.

How to eliminate wrong answers

Option A is wrong because S3 Same-Region Replication (SRR) replicates objects within the same AWS Region, not across regions, so it does not provide disaster recovery across geographic boundaries. Option C is wrong because S3 Transfer Acceleration speeds up uploads over long distances using AWS edge locations but does not replicate objects to another bucket; it only improves transfer performance for clients. Option D is wrong because S3 Batch Operations is used for bulk actions like copying existing objects or tagging, but it is a one-time operation, not an automatic, ongoing replication configuration for new objects.
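
A replication rule is a small configuration attached to the source bucket. The sketch below builds one in the shape boto3's `put_bucket_replication` accepts as `ReplicationConfiguration`; the role and bucket ARNs and the rule ID are placeholders, and versioning is assumed to be enabled already on both buckets.

```python
def replication_config(role_arn: str, destination_bucket_arn: str) -> dict:
    """Replicate every new object to a bucket in another Region."""
    return {
        "Role": role_arn,  # IAM role S3 assumes to copy objects
        "Rules": [
            {
                "ID": "dr-cross-region",  # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {},  # empty filter: all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": destination_bucket_arn},
            }
        ],
    }


config = replication_config(
    "arn:aws:iam::111122223333:role/example-replication-role",
    "arn:aws:s3:::example-dr-bucket",
)
print(config["Rules"][0]["Status"])
```

Whether the rule counts as CRR or SRR depends only on the Region of the destination bucket, not on any field in the rule.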

277

MCQhard

A company is using Amazon DynamoDB with auto scaling enabled. During a marketing campaign, write traffic spikes, and some write requests fail with ProvisionedThroughputExceededException. The auto scaling policy has a target utilization of 70% and a maximum capacity that is high enough. What is the most likely cause of the throttling?

A.The table has a global secondary index that is throttling.

B.Auto scaling cannot react quickly enough to sudden traffic spikes.

C.The table does not have enough maximum capacity.

D.The auto scaling policy is not configured correctly.

AnswerB

Auto scaling has a lag, so sudden spikes can cause throttling.

Why this answer

Auto scaling in DynamoDB is driven by CloudWatch alarms on consumed capacity, so it reacts only after utilization has stayed above the target for a sustained period (on the order of minutes), and the capacity change then takes additional time to apply. When a sudden traffic spike occurs, write requests can exceed the current provisioned capacity before the auto scaling policy has had time to increase it. This delay causes ProvisionedThroughputExceededException errors, even though the maximum capacity is set high enough.

Exam trap

The trap here is that candidates assume a correctly configured auto scaling policy with sufficient maximum capacity will always prevent throttling, ignoring the inherent latency in auto scaling's reaction to sudden, short-lived traffic spikes.

How to eliminate wrong answers

Option A is wrong because nothing in the scenario mentions a global secondary index (GSI). An under-provisioned GSI can cause writes to the base table to be rejected, but with a sudden spike and an otherwise healthy configuration, scaling lag is the more likely explanation. Option C is wrong because the question explicitly states that the maximum capacity is high enough, so insufficient maximum capacity is not the cause. Option D is wrong because the auto scaling policy is configured with a target utilization of 70% and a high enough maximum capacity, which is a standard and correct configuration; the issue is the inherent lag in auto scaling's response to sudden spikes, not a misconfiguration.
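
Because of that lag, clients normally retry throttled writes with exponential backoff while capacity catches up (the AWS SDKs typically do this on their own). The sketch below shows the pattern in isolation; `ThrottledError` is a hypothetical stand-in for `ProvisionedThroughputExceededException`, and the delay values are arbitrary.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceededException."""


def write_with_backoff(write, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call write(); on throttling, wait with full jitter and try again."""
    for attempt in range(max_attempts):
        try:
            return write()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            # Full jitter: random wait of up to base_delay * 2**attempt seconds.
            sleep(random.uniform(0, base_delay * 2 ** attempt))


# Usage: a fake write that is throttled twice, then succeeds.
attempts = {"count": 0}


def flaky_write():
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise ThrottledError()
    return "ok"


result = write_with_backoff(flaky_write, sleep=lambda seconds: None)
print(result, attempts["count"])
```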

278

MCQmedium

A data engineer is analyzing a DynamoDB table for a session management application. The table currently has 10,000 items and is 1 MB in size. The application expects 1,000 writes per second during peak hours. What should the data engineer do to accommodate the write workload?

A.Reduce the item size to improve write performance.

B.Use global tables to distribute writes across regions.

C.Increase the write capacity units (WCU) for the table.

D.Enable DynamoDB Accelerator (DAX) to offload writes.

AnswerC

The current WCU is 5, insufficient for 1,000 writes/second.

Why this answer

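Option C is correct because the table has 5 write capacity units (WCU), which allows only 5 writes per second for items up to 1 KB. To handle 1,000 writes per second, the WCU must be increased. Option D is wrong because DAX is a read cache; writes still pass through to the table and consume its write capacity.

Option B is wrong because global tables replicate writes to other Regions but don't increase the table's write capacity. Option A is wrong because the items are already small (about 100 bytes each on average), and every write consumes at least 1 WCU, so reducing item size doesn't lower the capacity required.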
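
The capacity arithmetic can be checked with a few lines. This sketch assumes standard (non-transactional) writes, where one WCU covers one write per second of an item up to 1 KB, and treats 1 MB as 1,000,000 bytes; the function name is illustrative.

```python
import math


def required_wcu(writes_per_second: int, item_size_bytes: float) -> int:
    """WCU needed for standard writes: 1 WCU per started KB, per write."""
    wcu_per_write = max(1, math.ceil(item_size_bytes / 1024))
    return writes_per_second * wcu_per_write


avg_item_bytes = 1_000_000 / 10_000  # 1 MB across 10,000 items
needed = required_wcu(1_000, avg_item_bytes)
print(avg_item_bytes, needed)
```

With items this small, each write already costs the 1 WCU minimum, so the only lever is the provisioned WCU itself.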

279

MCQhard

A company is designing a data lake on Amazon S3. The data includes personal identifiable information (PII). The data engineer must ensure that only authorized users can access the data, and that access is logged for auditing. Which combination of services should the data engineer use?

A.S3 bucket policies with IAM policies and AWS CloudTrail with data events

B.Amazon S3 access points and VPC endpoints

C.Amazon Macie to discover PII and S3 Object Lock to prevent deletion

D.AWS KMS to encrypt data and AWS CloudTrail to log access

AnswerA

Bucket and IAM policies control access; CloudTrail logs data events for auditing.

Why this answer

Option A is correct because S3 bucket policies and IAM policies control access, and CloudTrail logs data events for auditing. Option B is wrong because access points and VPC endpoints shape the network path to the data but do not log access. Option C is wrong because Macie is for data discovery and classification and Object Lock prevents deletion; neither controls who can read the data.

Option D is wrong because KMS encrypts the data but does not by itself define which users are authorized to access the bucket.

280

MCQhard

A data engineer is troubleshooting an access denied error when an AWS Lambda function tries to decrypt an object encrypted with the KMS key 'abc123'. The Lambda function's execution role has the above policy attached. What is the likely cause of the error?

A.The Deny statement blocks all decrypt requests

B.The Lambda function does not have permission to call kms:GenerateDataKey

C.The KMS key policy does not grant the Lambda role decrypt permission

D.The IAM policy does not include kms:Decrypt permission

AnswerC

Key policies must also grant access; IAM alone may not be sufficient.

Why this answer

The error occurs because access to a KMS key is always governed by its key policy: the key policy must either grant the principal access directly or delegate access control to IAM policies in the account. While the IAM policy attached to the Lambda execution role includes kms:Decrypt, the key policy for 'abc123' does neither for the Lambda role. Since the key policy is a resource-based policy that KMS always evaluates, the request is denied even though the IAM policy allows it.

Exam trap

The trap here is that candidates assume IAM permissions alone are sufficient for KMS operations, overlooking that KMS key policies must explicitly grant access to the IAM role, which is a common source of access denied errors in cross-account or cross-service scenarios.

How to eliminate wrong answers

Option A is wrong because the Deny statement in the policy only blocks decrypt requests that do not include the encryption context 'Project=Alpha', not all decrypt requests; the error is likely due to missing key policy permissions, not a blanket Deny. Option B is wrong because the error is about decrypting an object, not generating a data key; kms:GenerateDataKey is used for encryption operations, not decryption, and the Lambda function is trying to decrypt, not encrypt. Option D is wrong because the IAM policy shown in the question includes kms:Decrypt permission (the policy lists kms:Decrypt as an allowed action), so the issue is not a missing IAM permission but rather the KMS key policy not granting the Lambda role decrypt permission.
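
As an illustration of the fix, the key policy would need a statement along these lines. The role ARN and statement ID are hypothetical; in a key policy, `"Resource": "*"` refers to the key the policy is attached to.

```python
def decrypt_statement(role_arn: str) -> dict:
    """Key policy statement letting one IAM role call kms:Decrypt."""
    return {
        "Sid": "AllowLambdaRoleDecrypt",  # hypothetical statement ID
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": "kms:Decrypt",
        "Resource": "*",  # means "this key" inside a key policy
    }


statement = decrypt_statement(
    "arn:aws:iam::111122223333:role/example-lambda-execution-role"
)
print(statement["Principal"]["AWS"])
```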

281

MCQeasy

A company uses Amazon DynamoDB as the primary data store for a gaming application. The application stores user profiles and game state. During peak hours, the application experiences throttling on writes to the UserProfiles table. The table's read capacity is underutilized. Which solution should resolve the write throttling?

A.Increase the provisioned write capacity units for the table.

B.Enable DynamoDB Accelerator (DAX) on the table.

C.Add a global secondary index (GSI) to the table.

D.Configure auto scaling for read capacity units.

AnswerA

Increasing write capacity resolves write throttling.

Why this answer

Write throttling occurs when the number of write requests exceeds the provisioned write capacity units (WCUs) for the DynamoDB table. Since the read capacity is underutilized, the correct solution is to increase the provisioned WCUs to accommodate the peak write traffic. This directly addresses the capacity deficit without affecting read operations.

Exam trap

The trap here is that candidates may confuse read and write capacity solutions, such as selecting DAX (which only helps reads) or auto scaling for reads, when the issue is specifically write throttling.

How to eliminate wrong answers

Option B is wrong because DynamoDB Accelerator (DAX) is an in-memory cache that improves read performance, not write throughput; it does not increase write capacity or reduce write throttling. Option C is wrong because every write to the base table must also be propagated to a global secondary index (GSI), which consumes the index's own write capacity; adding one increases total write cost and can introduce more throttling, not resolve it. Option D is wrong because auto scaling for read capacity units does not affect write throttling; write throttling requires adjusting write capacity, not read capacity.

282

MCQmedium

A data engineer is designing a data lake on Amazon S3. The data consists of sensitive personally identifiable information (PII) that must be encrypted at rest. The company requires that encryption keys be rotated every 90 days and that access to the keys be logged. Which encryption solution meets these requirements?

A.Use server-side encryption with customer-provided keys (SSE-C).

B.Use client-side encryption with a master key stored in AWS Secrets Manager.

C.Use server-side encryption with S3 managed keys (SSE-S3).

D.Use server-side encryption with AWS KMS (SSE-KMS) and enable automatic key rotation.

AnswerD

SSE-KMS provides customer-managed keys with rotation and logging via CloudTrail.

Why this answer

SSE-KMS with automatic key rotation (Option D) meets the requirements because it encrypts data at rest in S3, supports automatic rotation of customer managed keys with a configurable rotation period (the default is one year, so the period must be set to 90 days), and logs all key usage in AWS CloudTrail for auditing. This provides the necessary encryption, rotation, and access logging without managing keys externally.

Exam trap

The trap here is that candidates often assume SSE-S3's AWS-managed rotation satisfies the 90-day requirement, or assume SSE-C or client-side encryption can meet logging and rotation requirements without realizing they lack native AWS rotation and auditing capabilities.

How to eliminate wrong answers

Option A is wrong because SSE-C requires the customer to provide and manage their own encryption keys, and AWS does not support automatic key rotation or logging of key access for customer-provided keys. Option B is wrong because client-side encryption encrypts data before it reaches S3, and a master key stored in AWS Secrets Manager is not a KMS key: any rotation would have to be built and maintained by the customer, and use of the key inside the application is not recorded in CloudTrail the way KMS operations are. Option C is wrong because SSE-S3 uses S3-managed keys that are rotated on a schedule AWS controls (the customer cannot set it to 90 days) and does not provide granular access logging for key usage.

283

MCQhard

A CloudFormation template includes this IAM policy for a cross-account S3 upload use case. What is the purpose of the condition?

A.To enforce server-side encryption with KMS.

B.To limit the size of objects that can be uploaded.

C.To restrict uploads to only a specific AWS account.

D.To ensure that uploaded objects grant full control to the bucket owner.

AnswerD

The ACL bucket-owner-full-control grants the bucket owner full permissions.

Why this answer

The condition in the IAM policy uses the `s3:x-amz-acl` key with a value of `bucket-owner-full-control`. This ensures that any object uploaded to the S3 bucket explicitly grants the bucket owner full control over the object, overriding the default behavior where the uploading account retains ownership. This is critical in cross-account uploads to prevent the uploading account from retaining exclusive access to the objects.

Exam trap

The trap here is that candidates confuse the `s3:x-amz-acl` condition key with account-level restrictions or encryption settings, when in fact it specifically controls the Access Control List (ACL) applied to the uploaded object.

How to eliminate wrong answers

Option A is wrong because server-side encryption with KMS is enforced using the `s3:x-amz-server-side-encryption` condition key with the value `aws:kms`, not the `s3:x-amz-acl` key. Option B is wrong because object size limits are not expressed through an ACL condition; they are typically enforced with a `content-length-range` condition on POST uploads. Option C is wrong because restricting uploads to a specific AWS account is done with account-scoped condition keys such as `aws:PrincipalAccount` (or `aws:SourceAccount` for service-to-service calls), not the `s3:x-amz-acl` key, which controls object ACL permissions.
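
The template itself is not shown on this page, so the statement below is a hypothetical reconstruction of the kind of condition being described, together with a toy check of how the condition treats an upload's `x-amz-acl` header. The bucket name is a placeholder.

```python
# Hypothetical statement; only the condition key and value come from the text.
statement = {
    "Effect": "Allow",
    "Action": "s3:PutObject",
    "Resource": "arn:aws:s3:::example-destination-bucket/*",
    "Condition": {
        "StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}
    },
}


def upload_permitted(acl_header):
    """Toy check: the Allow applies only if the request sends the required ACL."""
    required = statement["Condition"]["StringEquals"]["s3:x-amz-acl"]
    return acl_header == required


print(upload_permitted("bucket-owner-full-control"))
print(upload_permitted("private"))
print(upload_permitted(None))  # header omitted
```

An uploader in another account would therefore pass the ACL explicitly on each upload, for example with the CLI's `--acl bucket-owner-full-control` option.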

284

Multi-Selectmedium

Which THREE of the following are best practices for managing data storage costs in Amazon S3? (Choose 3.)

Select 3 answers

A.Store all data in S3 Standard for maximum durability.

B.Use S3 Lifecycle policies to transition objects to S3 Glacier Deep Archive after a specified period.

C.Use S3 Object Lock to prevent object deletion and then apply a lifecycle policy to expire objects after a retention period.

D.Create multiple bucket replicas in different regions to ensure availability.

E.Enable S3 Intelligent-Tiering for data with unknown or changing access patterns.

AnswersB, C, E

Lifecycle policies reduce costs by moving data to cheaper storage.

Why this answer

Option B is correct because lifecycle policies transition data to lower-cost storage classes such as S3 Glacier Deep Archive. Option E is correct because S3 Intelligent-Tiering automatically optimizes costs for data with unknown or changing access patterns. Option C is correct because S3 Object Lock prevents deletion during the retention period and can be paired with a lifecycle policy that expires objects once that period ends.

Option A is wrong because S3 Standard is the most expensive choice for infrequently accessed data. Option D is wrong because replicating buckets across regions increases storage and transfer costs.

285

MCQhard

Refer to the exhibit. A data engineer is analyzing a query performance issue on an Amazon Redshift table. The table 'sales' has 100 million rows. The query is performing a full table scan. Which optimization should the engineer apply to improve query performance?

A.Change DISTKEY to region.

B.Use an interleaved sort key on (sale_date, region).

C.Use a compound sort key on (sale_date, region).

D.Change DISTSTYLE to ALL.

AnswerC

Compound sort key on sale_date first enables efficient range restriction, then region for aggregation.

Why this answer

Option C is correct. The query filters on sale_date and aggregates by region. A compound sort key starting with sale_date lets Redshift skip blocks that fall outside the date range, which reduces the amount of data scanned.

Option A is wrong because the existing DISTKEY (product_id) is fine for joins, and changing the distribution key to region affects how rows are spread across nodes, not how many blocks this query scans. Option D is wrong because DISTSTYLE ALL replicates the table to every node, increasing storage without reducing the scan. Option B is wrong because an interleaved sort key can be less efficient than a compound key for range-restricted queries on the leading column.
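
To make the block-skipping argument concrete, here is a toy model in Python. It is only a sketch of the idea (per-block min/max metadata used to prune blocks), not Redshift's actual storage engine; the block size, row count, and date range are arbitrary values chosen for illustration.

```python
import random
from datetime import date, timedelta

BLOCK_SIZE = 1000  # rows per block; an arbitrary illustrative value


def build_blocks(dates, block_size=BLOCK_SIZE):
    """Split rows into fixed-size blocks and record each block's (min, max) date."""
    blocks = []
    for i in range(0, len(dates), block_size):
        chunk = dates[i:i + block_size]
        blocks.append((min(chunk), max(chunk)))
    return blocks


def blocks_to_scan(blocks, start, end):
    """Count blocks whose [min, max] range overlaps the filter range."""
    return sum(1 for lo, hi in blocks if lo <= end and hi >= start)


random.seed(0)
first_day = date(2023, 1, 1)
rows = [first_day + timedelta(days=random.randrange(365)) for _ in range(100_000)]

unsorted_blocks = build_blocks(rows)        # rows stored in arrival order
sorted_blocks = build_blocks(sorted(rows))  # rows stored sorted by sale_date

start, end = date(2023, 6, 1), date(2023, 6, 30)
scanned_unsorted = blocks_to_scan(unsorted_blocks, start, end)
scanned_sorted = blocks_to_scan(sorted_blocks, start, end)

print(f"unsorted: {scanned_unsorted} of {len(unsorted_blocks)} blocks")
print(f"sorted:   {scanned_sorted} of {len(sorted_blocks)} blocks")
```

When rows are stored in date order, only the blocks covering the requested month overlap the filter; in arrival order, nearly every block contains some matching date and has to be read.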

286

MCQmedium

A data engineering team is managing an Amazon DynamoDB table that stores user session data. The table has a primary key of user_id (partition key) and session_id (sort key). The application performs strongly consistent reads on individual items. The team notices that read latency increases during peak hours. They suspect that the table is experiencing hot partitions. The team needs to improve read performance without changing the application code. Which solution should they implement?

A.Enable DynamoDB global tables to distribute reads across regions.

B.Enable DynamoDB Accelerator (DAX) for the table.

C.Increase the read capacity units for the table.

D.Change the application to use eventually consistent reads.

AnswerB

DAX caches reads and reduces latency.

Why this answer

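DynamoDB Accelerator (DAX) is a fully managed, in-memory cache that delivers up to 10x read performance improvement by caching frequently read items. Of the four options, it is the only one that takes read traffic off the hot partitions without redesigning the keys or relaxing consistency in the application. Keep in mind for real deployments that DAX serves eventually consistent reads from its cache and passes strongly consistent reads through to DynamoDB, and that applications connect through the DAX client, so the "no code changes" framing in the question is a simplification.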

Exam trap

AWS often tests the misconception that increasing provisioned capacity (RCUs) can solve hot partition issues, but candidates must remember that DynamoDB enforces a per-partition throughput limit that cannot be exceeded regardless of total table capacity.

How to eliminate wrong answers

Option A is wrong because global tables replicate data across regions for disaster recovery and low-latency global access, but they do not solve hot partition issues within a single table; they add complexity and cross-region latency. Option C is wrong because increasing read capacity units (RCUs) only raises the table's provisioned throughput; if a single partition key (user_id) is hot, the partition-level throughput limit (3,000 RCUs and 1,000 WCUs per second) still caps performance, so more RCUs cannot overcome a single partition's bottleneck. Option D is wrong because switching to eventually consistent reads sacrifices consistency and requires an application change, while the requirement states that the application performs strongly consistent reads and the code cannot be changed.

287

MCQmedium

A data engineer runs the above command and gets the output. What does the 'MFADelete' setting imply?

A.Any modification to an object requires MFA.

B.To permanently delete a version of an object, the user must provide MFA.

C.MFA is required for all read operations as well.

D.All operations on the bucket require MFA authentication.

AnswerB

MFADelete adds an extra layer of security for version deletions.

Why this answer

Option B is correct because MFADelete requires multi-factor authentication to permanently delete an object version (and to change the bucket's versioning state). Option A is wrong because ordinary object modifications, such as uploading a new version, do not require MFA. Option C is wrong because read operations are unaffected.

Option D is wrong because MFA is required only for those specific operations, not for every operation on the bucket.
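
The command and its output are not reproduced on this page. As a small illustration, the sketch below parses a hypothetical response shaped like the output of `aws s3api get-bucket-versioning` and encodes which operations the explanation says are MFA-protected. The sample values and the operation labels (`delete_object_version`, `change_versioning_state`, and so on) are made up for this example.

```python
import json

# Hypothetical output, shaped like the response of `aws s3api get-bucket-versioning`.
sample_output = '{"Status": "Enabled", "MFADelete": "Enabled"}'

# Illustrative labels for the operations that need an MFA code
# when MFA delete is enabled on the bucket.
MFA_PROTECTED = {"delete_object_version", "change_versioning_state"}


def requires_mfa(operation: str, versioning: dict) -> bool:
    """Return True if the operation needs MFA under the given versioning config."""
    return versioning.get("MFADelete") == "Enabled" and operation in MFA_PROTECTED


versioning = json.loads(sample_output)
for op in ("get_object", "put_object", "delete_object_version"):
    print(op, requires_mfa(op, versioning))
```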

288

MCQhard

A company uses Amazon DynamoDB for a gaming leaderboard. The table has a partition key of 'game_id' and a sort key of 'score'. The read capacity is provisioned at 1000 RCUs. During peak hours, users report high latency when querying the top 10 scores for a specific game. The DynamoDB metrics show ConsumedReadCapacityUnits averaging 800 but occasional throttling. What is the most likely cause and solution?

A.Create a global secondary index with the same key schema to distribute reads

B.Remove the sort key and use a global secondary index

C.Increase the provisioned RCUs to 2000

D.The hot game_id partition is exceeding its throughput; add DynamoDB Accelerator (DAX) to cache reads

AnswerD

DAX caches frequent reads, reducing load on the hot partition and lowering latency.

Why this answer

The hot game_id partition is exceeding its provisioned throughput because DynamoDB distributes RCUs evenly across partitions, and a single partition can only handle up to (1000 RCUs / number of partitions) per second. When a specific game_id becomes popular, all reads hit the same partition, causing throttling despite low overall consumed capacity. Adding DynamoDB Accelerator (DAX) caches the top 10 scores for that partition, reducing read pressure and eliminating throttling without increasing RCUs.

Exam trap

The trap here is that candidates see 'ConsumedReadCapacityUnits averaging 800' and assume overall capacity is sufficient, missing that DynamoDB throttles at the partition level, not the table level, so a hot partition can be throttled even when table-level consumption is below provisioned RCUs.

How to eliminate wrong answers

Option A is wrong because creating a global secondary index with the same key schema would not distribute reads across partitions—it would still have the same hot partition issue, as the GSI inherits the same partition key. Option B is wrong because removing the sort key and using a GSI would break the leaderboard's ability to query by score order, and the GSI would still suffer from the same hot partition if the partition key remains 'game_id'. Option C is wrong because increasing RCUs to 2000 would only double the per-partition limit, but the hot partition would still be throttled if the traffic spike exceeds the new per-partition limit; it does not address the root cause of uneven access patterns.
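
The arithmetic behind the trap can be sketched in a few lines of Python. This follows the simplified even-split model used in the explanation above; the partition count and per-partition demand are hypothetical numbers, and real tables also benefit from burst and adaptive capacity, which this sketch ignores.

```python
def even_share(provisioned_rcus: int, partitions: int) -> float:
    """Per-partition read capacity under a simple even split."""
    return provisioned_rcus / partitions


def throttled_reads(demand_per_partition: dict, share: float) -> dict:
    """Read units per second that exceed each partition's share."""
    return {
        name: max(0.0, demand - share)
        for name, demand in demand_per_partition.items()
    }


PROVISIONED_RCUS = 1000  # from the question
PARTITIONS = 4           # hypothetical partition count

# Hypothetical demand: one popular game_id lands on p0. Total is 800 RCUs.
demand = {"p0": 500, "p1": 100, "p2": 100, "p3": 100}

share = even_share(PROVISIONED_RCUS, PARTITIONS)
excess = throttled_reads(demand, share)

print(f"table-level demand: {sum(demand.values())} of {PROVISIONED_RCUS} RCUs")
print(f"per-partition share: {share}")
print(f"excess per partition: {excess}")
```

Under these assumptions the table as a whole stays below its provisioned capacity, yet the partition holding the popular game is asked for twice its share, which is the pattern the metrics in the question describe.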

289

MCQhard

A company runs an Amazon RDS for PostgreSQL instance that stores financial data. The company requires point-in-time recovery (PITR) with a retention period of 35 days. Additionally, the company needs to create a new database from a specific snapshot every night for testing. Which combination of actions should the data engineer take to meet these requirements?

A.Enable automated backups with a 35-day retention period and create a manual snapshot each night for testing.

B.Create a read replica and promote it to a new instance for testing each night.

C.Enable Multi-AZ and use the standby instance for testing.

D.Disable automated backups to reduce storage costs and take manual snapshots with 35-day retention.

AnswerA

Automated backups provide PITR; manual snapshots are independent and can be restored for testing.

Why this answer

Option A is correct because automated backups in Amazon RDS for PostgreSQL support a maximum retention period of 35 days, which satisfies the PITR requirement. Additionally, creating a manual snapshot each night provides a stable, independent copy for testing without interfering with the automated backup schedule or the source database's performance.

Exam trap

The trap here is that candidates often confuse the purpose of Multi-AZ standby instances (which are not directly usable for testing) or assume that manual snapshots alone can provide PITR, but automated backups are strictly required for point-in-time recovery in RDS.

How to eliminate wrong answers

Option B is wrong because a read replica is designed for read scaling and high availability, not for creating a nightly test database; promoting a read replica each night would disrupt replication and require re-creating the replica, which is inefficient and does not meet the PITR retention requirement. Option C is wrong because Multi-AZ provides high availability and automatic failover, but the standby instance is not directly accessible for testing; it cannot be used to create a new database without promoting it, which would break the Multi-AZ configuration. Option D is wrong because disabling automated backups eliminates the ability to perform point-in-time recovery (PITR), and manual snapshots alone do not support PITR; automated backups are required for transaction log retention and restore to any point within the retention window.

290

MCQeasy

A data engineer needs to store log files from multiple applications in a centralized location. The logs are generated in JSON format and each log entry is about 1 KB. The engineer needs to query the logs occasionally using SQL-like queries. Which AWS service is most appropriate?

A.Amazon DynamoDB

B.Amazon Redshift

C.Amazon Athena with data stored in S3

D.Amazon RDS for MySQL

AnswerC

Athena queries S3 data directly with SQL, suitable for occasional queries.

Why this answer

Amazon Athena is the most appropriate service because it allows you to query log files stored in S3 directly using standard SQL, without needing to load or transform the data. Since the logs are in JSON format and each entry is about 1 KB, Athena's schema-on-read approach works perfectly for occasional SQL-like queries, and you only pay for the data scanned per query, making it cost-effective for infrequent access.

Exam trap

The trap here is that candidates often choose Amazon Redshift or RDS because they think 'SQL-like queries' require a traditional database, overlooking Athena's ability to query data directly in S3 without loading it, which is a key serverless pattern for log analytics.

How to eliminate wrong answers

Option A is wrong because Amazon DynamoDB is a NoSQL key-value and document database optimized for low-latency, high-throughput access patterns, not for ad-hoc SQL-like queries on large volumes of log data, and it would require schema design and provisioning. Option B is wrong because Amazon Redshift is a petabyte-scale data warehouse designed for complex analytical queries on structured, transformed data, which is overkill and costly for occasional log queries on JSON files stored in S3. Option D is wrong because Amazon RDS for MySQL is a relational database that requires schema definition, data loading, and ongoing management, making it unsuitable for storing raw JSON log files directly without ETL, and it lacks the serverless, pay-per-query model for infrequent access.

291

MCQeasy

A company is using Amazon EMR to process large datasets stored in Amazon S3. The data engineer wants to reduce the time it takes to read data from S3 by optimizing the data format. Which file format should the engineer recommend?

A.CSV

B.Parquet

C.ORC

D.JSON

AnswerB

Parquet is columnar, compressed, and ideal for analytics.

Why this answer

Parquet is the correct choice because it is a columnar storage format that significantly reduces the amount of data read from Amazon S3 during analytical queries. By storing data column-wise, Parquet enables predicate pushdown and compression, which minimizes I/O and speeds up data processing in Amazon EMR, especially for large datasets.

Exam trap

The trap here is that candidates often treat ORC and Parquet as interchangeable for every engine, but the exam expects Parquet as the recommended columnar format for Amazon EMR because of its tight integration with Spark.

How to eliminate wrong answers

Option A is wrong because CSV is a row-oriented text format that requires full file scans and offers no compression or predicate pushdown, leading to slower reads. Option C is wrong because ORC is also a columnar format optimized for Hive workloads, but it is not natively as performant with Spark and EMR as Parquet, and the question asks for the best recommendation for EMR. Option D is wrong because JSON is a row-oriented, self-describing format that is verbose and lacks efficient compression or columnar access patterns, resulting in high I/O and slower processing.

292

Multi-Selecthard

Which TWO are benefits of using Amazon S3 Object Lock? (Choose TWO.)

Select 2 answers

A.Helps meet regulatory requirements for write-once-read-many (WORM) storage.

B.Encrypts objects at rest using AWS KMS.

C.Prevents objects from being deleted or overwritten for a fixed time.

D.Automatically transitions objects to lower-cost storage classes.

E.Enables automatic versioning of objects.

AnswersA, C

Object Lock supports compliance and governance modes.

Why this answer

Option A is correct: Object Lock helps meet regulatory requirements for WORM storage. Option C is correct: Object Lock prevents objects from being deleted or overwritten for a specified retention period. Option B is wrong: encryption is separate from Object Lock.

Option D is wrong: Object Lock does not automatically transition data to other storage classes. Option E is wrong: versioning is a separate S3 feature (and a prerequisite for Object Lock), not a benefit that Object Lock provides.

293

MCQhard

A company uses Amazon Redshift for its data warehouse. The data engineer notices that the most frequently accessed table is sorted by date, but queries often filter by customer_id. The table has 500 million rows and uses AUTO distribution style. What change would MOST improve query performance?

A.Change distribution style to KEY on customer_id.

B.Change distribution style to EVEN.

C.Change the sort key to include customer_id as a compound sort key.

D.Change the sort key to an interleaved sort key on date and customer_id.

AnswerC

A compound sort key with customer_id first will optimize queries filtering by customer_id.

Why this answer

A compound sort key with customer_id as the leading column improves performance for queries that filter by customer_id, because Redshift can skip blocks that do not contain the requested customers. Option A is wrong because KEY distribution on customer_id changes where rows are stored and may help joins, but it does not reduce the blocks scanned for a filter. Option B is wrong because AUTO distribution is already adequate, and switching to EVEN does nothing for filtering.

Option D is wrong because an interleaved sort key is useful when several columns are filtered with equal weight, but a compound sort key with customer_id as the first column is simpler and likely more efficient given the access pattern.

294

MCQhard

A data engineer needs to set up a new Amazon RDS for PostgreSQL database for a production workload. The database must be highly available and resilient to a single Availability Zone failure. Which configuration should the engineer choose?

A.Single-AZ with automated backups

B.Multi-AZ deployment with one standby in a different AZ

C.Multi-AZ with two readable standbys

D.Single-AZ with a read replica

AnswerB

Provides automatic failover and high availability.

Why this answer

Multi-AZ deployment with a standby in a different AZ provides high availability and automatic failover. Read replicas are for read scaling, not failover. Single-AZ lacks resilience.

295

MCQhard

A data engineering team is building a real-time analytics pipeline using Amazon Kinesis Data Streams, AWS Lambda, and Amazon DynamoDB. The Lambda function consumes records from the stream and writes aggregated data to a DynamoDB table. The application requires that each record be processed exactly once to avoid duplicates. The Lambda function is idempotent, but occasionally duplicate records are written due to retries from Kinesis. The team needs to ensure exactly-once semantics for DynamoDB writes. Which solution should they implement?

A.Enable DynamoDB Streams and use a second Lambda to deduplicate.

B.Use DynamoDB TransactWriteItems with a condition check on a unique transaction ID.

C.Use the Kinesis Client Library (KCL) to checkpoint after processing and ignore duplicates.

D.Ensure the Lambda function is idempotent by using upsert operations.

AnswerB

Condition check ensures only one write succeeds per unique ID.

Why this answer

Option B is correct because DynamoDB TransactWriteItems with a condition check on a unique transaction ID ensures that the write only succeeds if the transaction ID does not already exist in the table. This provides exactly-once semantics by preventing duplicate writes even when Kinesis retries deliver the same record multiple times. The condition check acts as a distributed lock at the item level, guaranteeing idempotency without relying on downstream deduplication.

Exam trap

The trap here is that candidates often assume idempotent Lambda functions alone guarantee exactly-once processing, but they overlook that Kinesis retries can still cause duplicate writes unless a conditional write with a unique identifier is used at the database level.

How to eliminate wrong answers

Option A is wrong because enabling DynamoDB Streams and using a second Lambda to deduplicate introduces eventual consistency and additional latency, and does not prevent duplicate writes at the point of ingestion; it only attempts to clean up duplicates after they have already been written. Option C is wrong because the Kinesis Client Library (KCL) checkpointing tracks processing progress but does not prevent duplicate records from being delivered to the Lambda function during retries, so duplicates can still be written to DynamoDB. Option D is wrong because simply ensuring the Lambda function is idempotent using upsert operations (e.g., UpdateItem) does not guarantee exactly-once semantics; upsert can still overwrite data or create duplicate items if the record lacks a unique identifier or condition check.
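
The conditional-write idea can be sketched without touching AWS at all. The Python below uses an in-memory stand-in for the table; in DynamoDB the same check would be expressed as a condition such as `attribute_not_exists(...)` on the write. The class and function names, and the choice of the record's sequence number as the unique ID, are assumptions made for this illustration.

```python
class ConditionalCheckFailed(Exception):
    """Stand-in for the error a conditional write raises when the item exists."""


class InMemoryTable:
    """Tiny in-memory substitute for a table keyed by a unique transaction ID."""

    def __init__(self):
        self._items = {}

    def put_if_absent(self, txn_id: str, item: dict) -> None:
        # The write succeeds only if no item with this ID exists yet.
        if txn_id in self._items:
            raise ConditionalCheckFailed(txn_id)
        self._items[txn_id] = item

    def __len__(self) -> int:
        return len(self._items)


def handle_record(table: InMemoryTable, record: dict) -> bool:
    """Write the record once; return False when a retry delivers a duplicate."""
    try:
        table.put_if_absent(record["sequence_number"], {"value": record["value"]})
        return True
    except ConditionalCheckFailed:
        return False


table = InMemoryTable()
records = [
    {"sequence_number": "seq-1", "value": 10},
    {"sequence_number": "seq-2", "value": 20},
    {"sequence_number": "seq-1", "value": 10},  # redelivered by a retry
]
results = [handle_record(table, r) for r in records]
print(results, len(table))
```

The redelivered record fails the condition and is skipped, so the table ends up with one item per unique ID regardless of how many times a batch is retried.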

296

MCQmedium

A media company stores large video files in Amazon S3 and uses Amazon CloudFront for content delivery. Users in different regions report slow download speeds for popular content. The data engineer needs to improve performance while minimizing cost. Which solution should the engineer implement?

A.Change the S3 storage class to S3 Standard-IA

B.Enable S3 Transfer Acceleration on the bucket

C.Create multiple S3 buckets in different regions and configure CloudFront with multiple origins

D.Enable CloudFront Origin Shield

AnswerD

Origin Shield provides an additional cache layer that improves cache hit ratio and reduces load on the origin, thereby improving download performance for users.

Why this answer

CloudFront Origin Shield acts as a centralized caching layer in front of the S3 origin, reducing the number of requests that reach the origin and improving cache hit ratios. This minimizes latency for users in different regions by serving popular content from the edge or shield cache, while also reducing origin load and data transfer costs.

Exam trap

The trap here is that candidates may assume S3 Transfer Acceleration (which speeds up long-distance transfers made directly against the bucket endpoint, most often uploads) also speeds up CloudFront delivery, or assume that multiple regional origins are needed when CloudFront's global edge network already handles geographic distribution.

How to eliminate wrong answers

Option A is wrong because changing the storage class to S3 Standard-IA reduces storage costs for infrequently accessed data but does not improve download speeds or reduce latency for users. Option B is wrong because S3 Transfer Acceleration speeds up long-distance transfers made directly to and from the bucket through AWS edge locations, most commonly uploads; it does not change how CloudFront caches or delivers content to viewers. Option C is wrong because creating multiple S3 buckets in different regions and configuring CloudFront with multiple origins increases complexity and cost without addressing the core issue of cache efficiency; CloudFront already uses a global edge network, and adding more origins does not inherently improve cache hit ratios or reduce origin load.

297

MCQhard

A company is using Amazon S3 to store sensitive data. The security team requires that all data be encrypted at rest using a customer-managed AWS KMS key. The data engineer must ensure that only a specific IAM role can decrypt the data. Which policy should the data engineer attach to the KMS key?

A.A KMS key policy that allows the IAM role to perform kms:Decrypt

B.An IAM user policy that allows kms:Decrypt for the specific key

C.An IAM policy attached to the role that allows kms:Decrypt

D.An S3 bucket policy that denies access unless encryption is used

AnswerA

KMS key policies grant permissions to use the key.

Why this answer

Option A is correct because the KMS key policy must grant the IAM role permission to decrypt. Option C is wrong because an IAM policy attached to the role is not sufficient on its own; the KMS key policy must also allow the role to use the key. Option D is wrong because an S3 bucket policy controls access to S3, not to the KMS key.

Option B is wrong because an IAM user policy grants access to a user rather than the required role, so it does not address role-based access.
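
As a minimal sketch of the statement the correct answer describes, the Python below builds the decrypt portion of a key policy. The account ID `111122223333` and the role name `ExampleDecryptRole` are placeholders, and a complete key policy would also need statements for key administration that are omitted here.

```python
import json

# Placeholder values for illustration only.
ACCOUNT_ID = "111122223333"
ROLE_NAME = "ExampleDecryptRole"


def build_decrypt_statement(account_id: str, role_name: str) -> dict:
    """Key policy statement that lets a single IAM role call kms:Decrypt."""
    return {
        "Sid": "AllowDecryptForOneRole",
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{account_id}:role/{role_name}"},
        "Action": "kms:Decrypt",
        # In a key policy, "*" refers to the key the policy is attached to.
        "Resource": "*",
    }


statement = build_decrypt_statement(ACCOUNT_ID, ROLE_NAME)
print(json.dumps(statement, indent=2))
```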

298

Multi-Selecthard

Which THREE factors should a data engineer consider when choosing between Amazon Redshift and Amazon Athena for querying large datasets in Amazon S3? (Choose three.)

Select 3 answers

A.Both support standard SQL queries.

B.Redshift requires provisioning and managing clusters, while Athena is serverless.

C.Athena charges per query based on data scanned, while Redshift charges for cluster compute capacity.

D.Athena can only query data stored in Amazon S3, while Redshift can also query data in S3.

E.Redshift is optimized for highly structured, frequently queried data, while Athena is better for ad-hoc queries on raw data.

AnswersB, C, E

Redshift needs cluster management; Athena is serverless.

Why this answer

Option B is correct because Amazon Redshift requires manual provisioning, configuration, and ongoing management of clusters, including node sizing, scaling, and maintenance windows. In contrast, Amazon Athena is a serverless service that automatically handles infrastructure, requiring no cluster management and allowing users to query data directly from Amazon S3 without any setup overhead.

Exam trap

The trap here is that candidates may assume Athena is limited to S3-only queries or that both services have identical SQL support, overlooking the fundamental architectural differences in provisioning, cost models, and workload optimization that are the real decision factors.

299

Multi-Selecteasy

Which TWO data stores are considered fully managed, serverless, and suitable for storing JSON documents?

Select 2 answers

A.Amazon Redshift

B.Amazon ElastiCache for Redis

C.Amazon DocumentDB (with MongoDB compatibility)

D.Amazon DynamoDB

E.Amazon RDS for MySQL

AnswersC, D

DocumentDB is a managed document database, supports JSON.

Why this answer

Amazon DocumentDB (with MongoDB compatibility) is a fully managed, serverless document database that natively stores JSON documents. It supports MongoDB workloads, allowing you to store, query, and index JSON data without managing infrastructure, making it ideal for content management and catalog applications.

Exam trap

AWS often tests the distinction between fully managed serverless services (DocumentDB, DynamoDB) and those requiring provisioning or cluster management (Redshift, ElastiCache, RDS), leading candidates to mistakenly select ElastiCache for Redis due to its JSON module support, ignoring its non-serverless nature and primary use as a cache.

300

MCQeasy

A data engineer has set up an Amazon S3 lifecycle policy to transition objects to Glacier Instant Retrieval after 30 days. After 60 days, objects should transition to Deep Archive. However, objects are not transitioning to Deep Archive. What is the most likely cause?

A.The bucket has versioning enabled.

B.Deep Archive is not supported in the bucket's region.

C.Objects are smaller than 128 KB.

D.The transition to Deep Archive requires a minimum of 30 days after the previous transition.

AnswerD

S3 requires at least 30 days between transitions to different storage classes.

Why this answer

The keyed answer rests on the spacing between successive lifecycle transitions: the second transition has to be scheduled at least 30 days after the first. With the first transition at day 30, the earliest day the Deep Archive transition can be scheduled under that rule is day 60, counted from object creation. The policy as described sits exactly on that boundary, so it should work once objects are more than 60 days old. If objects are not moving, the likely explanations are that they have not yet reached that age or that the second transition was configured with less than the required gap; of the options listed, only D describes a timing constraint of this kind.

Exam trap

The trap here is that candidates often overlook the 30-day minimum transition interval requirement and assume any transition timing is allowed, or they mistakenly attribute the failure to versioning or region limitations.

How to eliminate wrong answers

Option A is wrong because S3 versioning does not prevent lifecycle transitions; lifecycle policies can be applied to both current and noncurrent versions independently. Option B is wrong because Deep Archive is supported in all AWS regions where S3 is available, including the standard commercial regions. Option C is wrong because the 128 KB minimum object size restriction applies only to S3 Intelligent-Tiering and S3 Glacier Instant Retrieval for automatic tiering, not to lifecycle transitions to Deep Archive; lifecycle policies can transition objects of any size.
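
The timing rule in the explanation can be checked mechanically. The sketch below represents the rule from the question as a Python dictionary, using the same field names as an S3 lifecycle rule (`Transitions`, `Days`, `StorageClass`), and verifies the gap between transitions. The rule ID, the empty prefix filter, and the helper functions are illustrative assumptions, and the 30-day gap is the figure quoted in the explanation above.

```python
MIN_GAP_DAYS = 30  # spacing between transitions quoted in the explanation

# Lifecycle rule matching the scenario; the ID and filter are hypothetical.
lifecycle_rule = {
    "ID": "archive-example",
    "Status": "Enabled",
    "Filter": {"Prefix": ""},
    "Transitions": [
        {"Days": 30, "StorageClass": "GLACIER_IR"},
        {"Days": 60, "StorageClass": "DEEP_ARCHIVE"},
    ],
}


def transition_gaps(rule: dict) -> list:
    """Days between consecutive transitions, ordered by age threshold."""
    days = sorted(t["Days"] for t in rule["Transitions"])
    return [later - earlier for earlier, later in zip(days, days[1:])]


def satisfies_min_gap(rule: dict, min_gap: int = MIN_GAP_DAYS) -> bool:
    """True if every pair of consecutive transitions is at least min_gap apart."""
    return all(gap >= min_gap for gap in transition_gaps(rule))


def eligible_class(rule: dict, age_days: int):
    """Storage class of the latest transition the object's age has reached."""
    reached = [t for t in rule["Transitions"] if age_days >= t["Days"]]
    if not reached:
        return None
    return max(reached, key=lambda t: t["Days"])["StorageClass"]


print(transition_gaps(lifecycle_rule), satisfies_min_gap(lifecycle_rule))
for age in (10, 45, 61):
    print(age, eligible_class(lifecycle_rule, age))
```

A rule whose second transition were set to, say, day 45 would fail the gap check, which is the kind of misconfiguration the keyed answer points at.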
