Knowledge + Practice

CCNA Data Store Management Questions

75 of 456 questions · Page 3/7 · Data Store Management · Answers revealed


151

Multi-Selecthard

Which THREE of the following are benefits of using Amazon DynamoDB Accelerator (DAX)? (Choose three.)

Select 3 answers

A.Offloads read traffic from the DynamoDB table.

B.Improves write throughput by batching writes.

C.Reduces read latency from single-digit milliseconds to microseconds.

D.Supports write-through caching to improve write performance.

E.Provides in-memory caching for DynamoDB tables.

AnswersA, C, E

DAX handles read requests, reducing load on the table.

Why this answer

Option A is correct because DAX acts as a read-through cache that offloads read traffic from the DynamoDB table, reducing the number of read requests that hit the underlying table and thus lowering the consumed read capacity units (RCUs). This allows the table to handle more concurrent reads without scaling up provisioned capacity. Options C and E are correct because DAX is an in-memory cache, so cached reads return in microseconds rather than single-digit milliseconds.

Exam trap

The trap here is that candidates often assume DAX improves write performance. DAX is a write-through cache, so every write still passes through to the DynamoDB table and is neither batched nor accelerated; the performance benefit applies to reads only.
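The read-through and write-through behaviour described above can be illustrated with a toy model. This is only a sketch: the `ReadThroughCache` class, the dictionary standing in for the table, and the counters are all hypothetical and are not the DAX client API.

```python
class ReadThroughCache:
    """Toy model of a read-through / write-through cache (not the DAX client)."""

    def __init__(self, table):
        self.table = table        # plain dict standing in for the DynamoDB table
        self.cache = {}           # in-memory item cache
        self.table_reads = 0      # reads that reached the table
        self.table_writes = 0     # writes that reached the table

    def get(self, key):
        # Cache hit: served from memory, the table is not touched.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: read through to the table and remember the result.
        self.table_reads += 1
        value = self.table.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        # Write-through: the write always reaches the table, so it is not faster.
        self.table_writes += 1
        self.table[key] = value
        self.cache[key] = value


table = {"order-1": {"total": 42}}
cache = ReadThroughCache(table)
first = cache.get("order-1")    # miss: goes to the table
second = cache.get("order-1")   # hit: served from the cache
cache.put("order-2", {"total": 7})
```

Only the first read reaches the table, which is the read offload; the write still costs one table write, which is why write performance does not improve.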


152

MCQeasy

A company wants to migrate its on-premises MySQL database to Amazon RDS for MySQL with minimal downtime. Which AWS service should be used for the migration?

A.AWS Database Migration Service (DMS)

B.AWS Schema Conversion Tool (SCT)

C.AWS DataSync

D.AWS Direct Connect

AnswerA

Supports minimal downtime via ongoing replication.

Why this answer

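AWS Database Migration Service (DMS) supports continuous replication for minimal downtime. SCT converts schemas between database engines but does not perform the migration itself. DataSync transfers files and objects, not database contents. Direct Connect provides network connectivity only.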


153

MCQhard

A data engineer is migrating an on-premises Apache HBase workload to Amazon DynamoDB. The HBase table has a row key with composite structure: customer_id (10 chars) + timestamp (10 digits). The access pattern is to query by customer_id and retrieve the latest entries. How should the DynamoDB table be designed to optimize performance?

A.Create a table with partition key = customer_id and sort key = timestamp.

B.Use Amazon S3 with customer_id as prefix and timestamp as object name.

C.Create a table with partition key = concatenated customer_id and timestamp.

D.Create a table with partition key = timestamp and sort key = customer_id.

AnswerA

Allows querying by customer_id and sorting by timestamp to get latest entries.

Why this answer

Option A is correct because using customer_id as the partition key and timestamp as the sort key allows efficient queries for the latest entries per customer. Option B is incorrect because S3 is not suitable for low-latency queries. Option C is incorrect because a partition key that concatenates customer_id and timestamp must be matched exactly, so it cannot serve per-customer range queries efficiently.

Option D is incorrect because using timestamp as the partition key leads to hot partitions.
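As a rough illustration of how the partition key = customer_id, sort key = timestamp design serves the "latest entries" access pattern, the sketch below builds the arguments for a DynamoDB Query in the low-level client shape. The table name `CustomerEvents`, the sample customer ID, and the helper name are assumptions made for the example; no AWS call is made.

```python
def build_latest_entries_query(table_name, customer_id, limit=10):
    """Build keyword arguments for a DynamoDB Query (low-level client shape)."""
    return {
        "TableName": table_name,
        # Match a single partition: all items for one customer.
        "KeyConditionExpression": "customer_id = :cid",
        "ExpressionAttributeValues": {":cid": {"S": customer_id}},
        # Read the sort key (timestamp) in descending order: newest first.
        "ScanIndexForward": False,
        # Stop after the most recent `limit` items.
        "Limit": limit,
    }


# Hypothetical table name and customer ID, for illustration only.
params = build_latest_entries_query("CustomerEvents", "CUST000042", limit=5)
```

With boto3, a dictionary like this would typically be passed as `client.query(**params)`. Setting `ScanIndexForward` to `False` together with `Limit` is what returns only the most recent items.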


154

MCQhard

Refer to the exhibit. A data engineer is troubleshooting an IAM policy attached to a user. The user reports that they cannot upload objects to the S3 bucket 'data-lake-bucket' unless they explicitly specify the 'x-amz-server-side-encryption' header with value 'AES256'. The engineer wants to modify the policy to allow uploads without requiring encryption headers, but still enforce encryption on the bucket itself. Which change should the engineer make?

A.Remove the entire Deny statement.

B.Remove the Condition block from the Allow statement.

C.Change the Condition in the Allow statement to use aws:kms instead of AES256.

D.Set the bucket's default encryption to AES256 and keep the policy unchanged.

AnswerA

Removing the Deny allows uploads without encryption header; bucket default encryption can be used.

Why this answer

Option A is correct. The Deny statement with the StringNotEquals condition rejects any upload whose encryption header is not exactly AES256; removing the Deny statement allows uploads without the header, and the bucket's default encryption can then encrypt the objects. Option B is wrong because removing the Condition from the Allow statement leaves the Deny in force, so the encryption header is still required.

Option C is wrong because changing the condition to aws:kms still requires an encryption header. Option D is wrong because setting default encryption on the bucket does not override an explicit Deny.


155

MCQhard

A company is using Amazon Redshift for analytics. The cluster has 20 nodes and the data is evenly distributed. Query performance has degraded over time. The data engineer suspects that table maintenance is needed. Which set of operations should be performed to improve query performance?

A.Run VACUUM and ANALYZE commands on all tables

B.Run VACUUM FULL on all tables

C.Run REINDEX on all tables

D.Run ALTER TABLE APPEND to reorganize data

AnswerA

VACUUM reclaims space and sorts rows; ANALYZE updates statistics for the optimizer.

Why this answer

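Option A is correct because VACUUM reclaims space and re-sorts rows, and ANALYZE updates the statistics the query planner relies on. Option B is wrong because VACUUM FULL, which is the default VACUUM mode in Redshift, reclaims space and sorts rows but leaves table statistics stale without ANALYZE. Option C is wrong because Redshift has no standalone REINDEX command; VACUUM REINDEX applies only to tables with interleaved sort keys.

Option D is wrong because ALTER TABLE APPEND moves rows from one table into another and neither re-sorts data nor refreshes statistics.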


156

MCQhard

A company runs an Amazon RDS for PostgreSQL database. To meet disaster recovery requirements, they set up a cross-Region read replica. The replica has been lagging by several minutes. Which action is MOST effective to reduce the replica lag?

A.Enable Multi-AZ on the primary database.

B.Increase the instance size (memory and CPU) of the replica.

C.Increase the instance size of the primary database.

D.Decrease the instance size of the replica to reduce cost.

AnswerB

A larger replica can apply changes faster.

Why this answer

Increasing the instance size (memory and CPU) of the cross-Region read replica is the most effective action because replica lag in Amazon RDS for PostgreSQL is often caused by the replica being unable to keep up with the volume of write-ahead log (WAL) data arriving from the primary. A larger replica instance provides more compute and memory resources to apply WAL changes faster, reducing the replay lag. This directly addresses the bottleneck at the replica side without impacting the primary database.

Exam trap

The trap here is that candidates often assume the primary database is the bottleneck and choose to scale it up, but the lag is caused by the replica's inability to apply changes quickly enough, making the replica's instance size the correct lever to adjust.

How to eliminate wrong answers

Option A is wrong because enabling Multi-AZ on the primary database provides high availability within a single Region but does not reduce cross-Region replica lag; it may even increase lag due to synchronous replication overhead on the primary. Option C is wrong because increasing the instance size of the primary database improves its write performance but does not help the replica apply WAL data faster; the bottleneck is on the replica side, not the primary. Option D is wrong because decreasing the instance size of the replica would reduce its CPU and memory resources, worsening the replica lag by making it even harder to keep up with WAL replay.


157

MCQhard

A data engineer is reviewing an IAM policy that controls access to an S3 bucket. The policy is attached to a user group. The engineer notices that users are unable to download objects from the bucket. What is the likely cause?

A.The policy is attached to a user group instead of an IAM role.

B.The policy does not specify the correct bucket ARN.

C.The policy does not allow the s3:GetObject action.

D.The objects are encrypted using SSE-KMS, not SSE-S3.

AnswerD

The condition requires SSE-S3 (AES256), so SSE-KMS objects are denied.

Why this answer

Option D is correct because the policy condition requires that objects be encrypted with SSE-S3 (AES256). If the objects are encrypted with SSE-KMS or not encrypted, the request fails. Option C is wrong because the policy allows s3:GetObject.

Option B is wrong because the policy does not restrict bucket names. Option A is wrong because attaching the policy to a user group rather than a role is valid and is not the issue.


158

Multi-Selectmedium

A company is designing a data lake on AWS using Amazon S3. The data includes sensitive customer information that must be encrypted at rest. The company requires that encryption keys be managed by AWS, but the keys must be rotated automatically every year. Which TWO options meet these requirements? (Choose TWO.)

Select 2 answers

A.Use Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3).

B.Use client-side encryption with an AWS KMS key.

C.Use Server-Side Encryption with Customer-Provided Keys (SSE-C).

D.Use Server-Side Encryption with AWS KMS-Managed Keys (SSE-KMS) with manual key rotation.

E.Use Server-Side Encryption with AWS KMS-Managed Keys (SSE-KMS) with automatic key rotation enabled.

AnswersA, E

SSE-S3 automatically rotates keys every year.

Why this answer

SSE-S3 uses Amazon S3-managed keys with automatic rotation every year. SSE-KMS with automatic key rotation enabled also meets the key rotation requirement; for customer managed KMS keys, automatic rotation stays off until it is turned on. SSE-C does not use AWS-managed keys.

SSE-KMS with manual rotation is not automatic. Client-side encryption moves the encryption work into the application instead of relying on S3 server-side encryption.


159

MCQhard

A company has an Amazon Redshift cluster that stores petabytes of data. Queries are experiencing high disk usage due to large intermediate results. The data engineer needs to improve query performance without adding more nodes. Which action should the engineer take?

A.Set appropriate distribution keys to minimize data movement.

B.Configure workload management (WLM) queues to limit concurrency.

C.Apply column compression encoding to reduce data size.

D.Define sort keys on all columns used in WHERE clauses.

AnswerC

Compression reduces disk footprint and I/O, improving performance.

Why this answer

Option C is correct because compression reduces data size on disk and can improve query performance by reducing I/O. Option A is wrong because distribution keys affect how data is distributed across nodes, not intermediate result size directly. Option B is wrong because WLM queues manage concurrency, not disk usage.

Option D is wrong because sort keys affect row order, not intermediate result size.


160

MCQhard

A company runs a real-time analytics platform using Amazon Kinesis Data Streams with a shard count of 10. The data is consumed by an AWS Lambda function that writes to an Amazon DynamoDB table. The DynamoDB table has a partition key of 'user_id' and a sort key of 'timestamp'. The table is provisioned with 5000 RCUs and 5000 WCUs. Recently, the application experienced increased write latency and throttling errors (ProvisionedThroughputExceededException) on the DynamoDB table. The CloudWatch metrics show that ConsumedWriteCapacityUnits averages 4500 with occasional spikes to 6000. The Lambda function’s concurrency is set to 1000. The data engineer suspects the issue is due to hot partitions. Upon investigation, the engineer finds that a small number of users generate a disproportionately large amount of data. Which course of action would best resolve the throttling while minimizing cost?

A.Enable DynamoDB adaptive capacity and implement write sharding by adding a suffix to the partition key for high-volume users

B.Increase the provisioned WCUs to 10000 to handle spikes

C.Switch the DynamoDB table to on-demand capacity mode

D.Reduce the Lambda function concurrency to 100 to limit write requests

AnswerA

Adaptive capacity automatically manages partition throughput, and write sharding distributes writes across multiple partitions, reducing hot spots.

Why this answer

Option A is correct because the root cause is hot partitions caused by a small number of high-volume users. DynamoDB adaptive capacity lets the table shift throughput toward busier partitions to accommodate uneven access patterns, but the key fix is write sharding: adding a random or calculated suffix to the partition key for those high-volume users. This distributes writes across multiple physical partitions, eliminating the hot spot without requiring a global increase in provisioned capacity, thus resolving throttling while minimizing cost.

Exam trap

AWS often tests the misconception that throttling is always solved by increasing total provisioned capacity or switching to on-demand, when the real issue is partition-level hot spots that require key design changes like write sharding.

How to eliminate wrong answers

Option B is wrong because simply increasing provisioned WCUs to 10000 does not address the hot partition issue; the throttling is due to uneven distribution of writes across partitions, not a lack of total capacity, so this would waste money without fixing the root cause. Option C is wrong because switching to on-demand capacity mode would handle spikes but at a significantly higher cost for sustained high write volumes, and it still does not solve the hot partition problem — on-demand tables can still throttle individual partitions if a single partition exceeds 1,000 WCUs (the per-partition throughput limit). Option D is wrong because reducing Lambda concurrency to 100 would limit the total write throughput, potentially causing data backlogs in Kinesis, and it does not address the uneven distribution of writes across DynamoDB partitions; the hot partition would still be throttled even with fewer concurrent writers.
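Write sharding with a key suffix can be sketched as follows. The shard count of 10, the `#` separator, and both helper names are assumptions chosen for illustration; a real design would size the shard count to the heaviest users' write rate.

```python
import random

SHARD_COUNT = 10  # hypothetical number of shards per high-volume user


def sharded_key(user_id, shard_count=SHARD_COUNT, rng=random):
    """Return a partition key with a random shard suffix, e.g. 'user-123#7'."""
    suffix = rng.randrange(shard_count)
    return f"{user_id}#{suffix}"


def all_shard_keys(user_id, shard_count=SHARD_COUNT):
    """Return every partition key a reader must query to see all of a user's items."""
    return [f"{user_id}#{i}" for i in range(shard_count)]


write_key = sharded_key("user-123")      # one of user-123#0 .. user-123#9
read_keys = all_shard_keys("user-123")   # all ten keys, queried and merged on read
```

The trade-off is on the read side: writes spread across several partition key values, but reading one user's full history means querying every shard key and merging the results.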


161

MCQeasy

A company is migrating its on-premises Oracle data warehouse to Amazon Redshift. The data engineering team needs to load data from Oracle to Redshift using AWS DMS (Database Migration Service). The source database is 2 TB in size. The team wants to minimize downtime and ensure data consistency during full load. Which approach should they take?

A.Use UNLOAD command to export data from Oracle to S3, then COPY into Redshift.

B.Use AWS DMS to perform a full load, then enable ongoing replication to capture changes.

C.Stop the source database, export data to flat files, upload to S3, and use COPY to load into Redshift.

D.Use COPY command directly from Oracle to Redshift over JDBC.

AnswerB

Minimizes downtime and ensures consistency.

Why this answer

Option B is correct because AWS DMS can perform a full load of the 2 TB Oracle database while simultaneously capturing ongoing changes via CDC (Change Data Capture). After the full load completes, DMS applies the cached changes to Redshift, ensuring data consistency with minimal downtime. This approach avoids stopping the source database and leverages DMS's native replication capabilities.

Exam trap

The trap here is that candidates may confuse the UNLOAD command (Redshift export) with an Oracle export tool, or assume that a full database stop is required for consistency, when DMS's CDC capability eliminates that need.

How to eliminate wrong answers

Option A is wrong because the UNLOAD command is an Amazon Redshift command for exporting data from Redshift to S3, not for exporting from Oracle; it cannot be used on the source Oracle database. Option C is wrong because stopping the source database to export flat files introduces unnecessary downtime, which contradicts the goal of minimizing downtime; DMS can achieve consistency without halting operations. Option D is wrong because the COPY command in Redshift cannot read directly from Oracle over JDBC; it only loads data from S3, DynamoDB, or other supported sources, not from a live JDBC connection.


162

MCQeasy

A company is migrating an on-premises MySQL database to Amazon RDS for MySQL. The database is 500 GB in size. The migration must have minimal downtime and must be completed within a week. Which AWS service should the data engineer use to perform the migration?

A.Amazon S3 Transfer Acceleration.

B.AWS Snowball Edge.

C.AWS DataSync.

D.AWS Database Migration Service (DMS).

AnswerD

DMS supports ongoing replication for minimal downtime.

Why this answer

Option D is correct because AWS Database Migration Service (DMS) supports minimal downtime migration with ongoing replication from on-premises to RDS. Option A is incorrect because S3 Transfer Acceleration speeds up transfers to S3 object storage and is not a database migration tool. Option B is incorrect because Snowball Edge is for large data transfer but involves shipping a device and is not suitable for minimal downtime.

Option C is incorrect because DataSync is for file storage, not databases.


163

MCQeasy

A company stores sensitive customer data in Amazon S3. The security team requires that all data be encrypted at rest using server-side encryption with AWS KMS managed keys (SSE-KMS). Which S3 bucket policy condition will enforce this requirement?

A.s3:x-amz-server-side-encryption-aws-kms-key-id

B.s3:x-amz-server-side-encryption-customer-algorithm

C.s3:x-amz-server-side-encryption

D.s3:x-amz-server-side-encryption-aws-kms-key-id or s3:x-amz-server-side-encryption

AnswerA

This condition can enforce the use of a specific KMS key for encryption.

Why this answer

Option A is correct because the condition `s3:x-amz-server-side-encryption-aws-kms-key-id` enforces that objects uploaded to S3 must use a specific AWS KMS key for server-side encryption (SSE-KMS). This satisfies the security team's requirement that all data be encrypted at rest using SSE-KMS with AWS KMS managed keys, as it explicitly checks for the presence and value of the KMS key ID in the request.

Exam trap

The trap here is that candidates often confuse `s3:x-amz-server-side-encryption` (which only checks the encryption header value, not the key) with `s3:x-amz-server-side-encryption-aws-kms-key-id` (which enforces a specific KMS key), leading them to choose option C or D without realizing that SSE-S3 (AES256) would also satisfy the generic condition.

How to eliminate wrong answers

Option B is wrong because `s3:x-amz-server-side-encryption-customer-algorithm` is used to enforce server-side encryption with customer-provided encryption keys (SSE-C), not SSE-KMS. Option C is wrong because `s3:x-amz-server-side-encryption` only checks whether the `x-amz-server-side-encryption` header is present (e.g., with value `AES256` for SSE-S3 or `aws:kms` for SSE-KMS), but it does not enforce the use of a specific KMS key or even that KMS is used; it could be satisfied by SSE-S3. Option D is wrong because combining `s3:x-amz-server-side-encryption-aws-kms-key-id` with `s3:x-amz-server-side-encryption` is redundant and does not add enforcement for SSE-KMS; the key ID condition alone is sufficient, and the additional condition could allow SSE-S3 if not carefully combined with a deny rule.


164

MCQmedium

Refer to the exhibit. A DynamoDB table 'Orders' has a GSI 'CustomerDateIndex'. A developer tries to query the GSI for all orders of a customer between two dates. The query fails. What is the most likely reason?

A.The GSI does not include 'order_id' in the key schema

B.The 'customer_id' attribute is not a partition key in the GSI

C.The query uses a date format that is not lexicographically sortable as a string

D.The 'order_date' attribute is not a sort key in the GSI

AnswerC

String sort on dates requires ISO 8601 format.

Why this answer

Option C is correct because a BETWEEN condition on a string sort key compares values lexicographically, by UTF-8 byte order. If the 'order_date' attribute is stored in a format whose string order differs from chronological order (e.g., 'MM-DD-YYYY' instead of 'YYYY-MM-DD'), the BETWEEN condition returns wrong or empty results. The GSI's sort key must use a format whose string order matches date order for range queries to work properly.

Exam trap

AWS often tests the misconception that any string date format works for range queries, but the trap here is that DynamoDB requires lexicographically sortable strings for BETWEEN conditions, and non-ISO formats will silently fail or return incorrect results.

How to eliminate wrong answers

Option A is wrong because 'order_id' is not required in the GSI key schema for querying by customer and date range; the GSI only needs the partition key (customer_id) and sort key (order_date) for this query. Option B is wrong because 'customer_id' must be the partition key of the GSI for the query to work, and the question states the GSI is 'CustomerDateIndex', implying customer_id is the partition key. Option D is wrong because 'order_date' must be the sort key of the GSI for the BETWEEN query to work, and the index name 'CustomerDateIndex' suggests it is the sort key; the failure is due to date format, not missing sort key.
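The effect of the date format on string range comparisons can be checked with plain Python strings. This is only an analogy: Python compares strings by code point, which for ASCII date strings gives the same order as a byte-wise comparison. The dates and helper names are made up for the example.

```python
from datetime import date


def iso(d):
    """Format as YYYY-MM-DD, whose string order matches chronological order."""
    return d.strftime("%Y-%m-%d")


def us(d):
    """Format as MM-DD-YYYY, whose string order does not match chronological order."""
    return d.strftime("%m-%d-%Y")


start, end = date(2023, 12, 1), date(2024, 2, 1)
inside = date(2024, 1, 15)  # chronologically between start and end

# Equivalent of a BETWEEN test done on strings rather than on dates.
iso_ok = iso(start) <= iso(inside) <= iso(end)  # True
us_ok = us(start) <= us(inside) <= us(end)      # False: "12-01-2023" > "01-15-2024"
```

The date falls inside the range in both cases, but only the ISO 8601 strings preserve that ordering.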


165

MCQmedium

A company runs a critical application on Amazon RDS for MySQL. To ensure high availability and automatic failover, the database is deployed as a Multi-AZ DB instance. The application uses read-heavy workloads. Which additional configuration should be used to offload read traffic without impacting write performance?

A.Use the Multi-AZ standby instance for read queries.

B.Create one or more Read Replicas in different AZs.

C.Use Amazon ElastiCache to cache read queries.

D.Change to a Single-AZ deployment with a larger instance size.

AnswerB

Read Replicas can handle read traffic, and Multi-AZ ensures high availability for writes.

Why this answer

Option B is correct because Read Replicas can serve read traffic and reduce load on the primary instance, while Multi-AZ provides high availability. Option A (Multi-AZ standby for reads) does not offload reads; the standby is not used for reads. Option D (Single-AZ with a larger instance) does not provide high availability.

Option C (ElastiCache) caches query results but requires application changes and does not act as a database replica for arbitrary queries.


166

MCQmedium

A data engineering team is using Amazon DynamoDB to store time-series data for a monitoring application. The table has a primary key of device_id (partition key) and timestamp (sort key). The application queries data for a specific device over a time range. The team notices that read latency is high for devices that generate large amounts of data. They need to improve query performance. Which solution should they implement?

A.Enable DynamoDB Accelerator (DAX) for the table.

B.Change the application to use eventually consistent reads.

C.Create a global secondary index with device_id as partition key and a truncated timestamp as sort key to distribute writes.

D.Increase the read capacity units for the table.

AnswerC

Helps spread write load and improve read performance for time-range queries.

Why this answer

Option C is correct because creating a global secondary index (GSI) with device_id as the partition key and a truncated timestamp as the sort key helps distribute write traffic more evenly across partitions. This reduces hot partitions caused by devices that generate large amounts of data, thereby improving read latency by preventing throttling and reducing contention on a single partition.

Exam trap

The trap here is that candidates often mistake high read latency as a pure read-throughput issue and choose to increase RCUs or add DAX, overlooking the fact that the real bottleneck is write-side hot partitions causing throttling and increased latency for reads on that partition.

How to eliminate wrong answers

Option A is wrong because DynamoDB Accelerator (DAX) is an in-memory cache that reduces read latency for frequently accessed items, but it does not address the root cause of high read latency—hot partitions due to uneven write distribution. Option B is wrong because eventually consistent reads only reduce latency by returning data that may be slightly stale, but they do not solve the underlying partition-level performance issue caused by skewed write patterns. Option D is wrong because increasing read capacity units (RCUs) can help with throughput but does not mitigate the hot partition problem; if a single partition is overwhelmed, additional RCUs cannot be fully utilized due to DynamoDB's partition-level throughput limits.


167

Multi-Selectmedium

Which TWO of the following are valid approaches to implement fine-grained access control for Amazon DynamoDB items based on user attributes? (Choose 2.)

Select 2 answers

A.Enable row-level security in DynamoDB using AWS Lake Formation.

B.Configure a VPC endpoint with a bucket policy to restrict access to specific items.

C.Use Amazon Cognito identity pools with IAM roles that include conditions based on user attributes.

D.Store user-specific items in separate S3 buckets and use IAM policies to restrict bucket access.

E.Use IAM policies with condition keys like 'dynamodb:LeadingKeys' to restrict access to items with a specific partition key value.

AnswersC, E

Cognito can map user attributes to IAM roles with fine-grained policies.

Why this answer

Option C is correct because Amazon Cognito identity pools can be configured to assume IAM roles with fine-grained policies that use condition keys such as `dynamodb:LeadingKeys` or custom attribute-based conditions. This allows access to DynamoDB items to be restricted based on user-specific attributes (e.g., user ID) without hardcoding permissions per user. Option E is correct because the `dynamodb:LeadingKeys` condition key in an IAM policy limits access to items whose partition key matches the permitted value.

Exam trap

The trap here is that candidates often confuse DynamoDB's fine-grained access control with S3 bucket policies or Lake Formation row-level security, mistakenly applying S3 or data lake concepts to DynamoDB item-level permissions.
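A policy of the kind described above might look like the following sketch, written as a Python dictionary and serialized to JSON. The table ARN, account ID, action list, and helper name are placeholders chosen for illustration; the policy variable `${cognito-identity.amazonaws.com:sub}` assumes users authenticate through a Cognito identity pool and that each item's partition key is the user's identity ID.

```python
import json


def leading_keys_policy(table_arn):
    """Build an IAM policy limiting DynamoDB access to the caller's own partition key."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
                "Resource": table_arn,
                "Condition": {
                    # Every partition key in the request must equal the caller's identity ID.
                    "ForAllValues:StringEquals": {
                        "dynamodb:LeadingKeys": ["${cognito-identity.amazonaws.com:sub}"]
                    }
                },
            }
        ],
    }


# Placeholder ARN: region, account ID, and table name are illustrative only.
policy = leading_keys_policy("arn:aws:dynamodb:us-east-1:123456789012:table/UserData")
policy_json = json.dumps(policy, indent=2)
```

Because the policy variable is resolved per caller, one role policy covers all users without listing them individually.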


168

MCQmedium

A data engineer is migrating an on-premises Apache Cassandra cluster to Amazon Keyspaces (for Apache Cassandra). The cluster has 10 TB of data. The migration must minimize application downtime. Which strategy should the engineer use?

A.Set up a dual-write pattern where the application writes to both the on-premises cluster and Keyspaces, then switch reads to Keyspaces once data is synchronized.

B.Export the data using the Cassandra COPY command and import it into Keyspaces using the COPY command.

C.Take a snapshot of the on-premises cluster and restore it to Keyspaces using the Keyspaces console.

D.Use AWS Database Migration Service (DMS) to continuously replicate data from the on-premises cluster to Keyspaces.

AnswerA

This minimizes downtime by keeping both systems in sync and allows for a gradual cutover.

Why this answer

Option A is correct because the dual-write pattern allows the application to write to both the on-premises Cassandra cluster and Amazon Keyspaces simultaneously, ensuring data consistency with minimal downtime. Once the existing data is backfilled and the systems are synchronized, reads can be switched to Keyspaces with near-zero application interruption. This approach avoids the downtime required for bulk export/import or snapshot restore.

Exam trap

The trap here is that candidates often assume AWS DMS can handle any database migration, but DMS does not support Cassandra as a source, making Option D a distractor for those who overestimate DMS's capabilities.

How to eliminate wrong answers

Option B is wrong because the Cassandra COPY command is a bulk export/import tool that requires the application to stop writes during the migration to ensure consistency, causing significant downtime for 10 TB of data. Option C is wrong because taking a snapshot of the on-premises cluster and restoring it to Keyspaces via the console is not supported; Keyspaces does not provide a native snapshot restore feature from on-premises snapshots. Option D is wrong because AWS DMS does not support Apache Cassandra as a source for continuous replication; DMS supports relational databases and some NoSQL databases but not Cassandra.


169

MCQmedium

A media company stores video files in Amazon S3 buckets organized by content type. The company has a requirement to automatically archive files that are older than 90 days to Amazon S3 Glacier Deep Archive to reduce costs. However, the company wants to retain the ability to restore files within 12 hours if needed. The data engineer creates an S3 Lifecycle policy to transition objects to Glacier Deep Archive after 90 days. After deploying the policy, the engineer notices that the storage costs have not decreased significantly. On reviewing the bucket metrics, the engineer sees that many objects are being deleted directly by users before the lifecycle policy takes effect. The company needs to enforce the lifecycle policy and prevent premature deletions. What should the data engineer do to enforce the lifecycle policy?

A.Apply an S3 Object Lock retention policy with a default retention mode of GOVERNANCE and a retention period of 90 days.

B.Enable S3 Versioning on the bucket and add a lifecycle rule to transition noncurrent versions to Glacier Deep Archive after 90 days.

C.Enable S3 Multi-Factor Authentication (MFA) Delete on the bucket to require MFA for all delete operations.

D.Use S3 Glacier Select to query the objects and restore them if needed.

AnswerB

Versioning preserves deleted objects as noncurrent versions, which can then be transitioned by lifecycle rules.

Why this answer

Option B is correct because enabling S3 Versioning ensures that deletions create delete markers rather than permanently deleting objects, and an S3 Lifecycle rule with a noncurrent version transition then moves the retained versions to Deep Archive. Option C is wrong because MFA Delete is for extra security, not for lifecycle enforcement. Option A is wrong because Object Lock prevents object deletion entirely, which may not be desired.

Option D is wrong because S3 Glacier Select is for querying archived data, not for lifecycle management.
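The versioning-plus-lifecycle approach could be expressed with a lifecycle configuration along these lines. This is a sketch in the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts; the rule ID, the empty prefix filter, and the helper name are assumptions, and no AWS call is made here.

```python
def deep_archive_lifecycle(days=90):
    """Build a lifecycle configuration that archives current and noncurrent versions."""
    return {
        "Rules": [
            {
                "ID": "archive-after-90-days",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},       # empty prefix: apply to the whole bucket
                # Current versions move to Deep Archive after `days` days.
                "Transitions": [
                    {"Days": days, "StorageClass": "DEEP_ARCHIVE"}
                ],
                # Versions made noncurrent by a user delete are archived too.
                "NoncurrentVersionTransitions": [
                    {"NoncurrentDays": days, "StorageClass": "DEEP_ARCHIVE"}
                ],
            }
        ]
    }


lifecycle = deep_archive_lifecycle()
```

Versioning must already be enabled on the bucket for the noncurrent-version rule to have anything to act on.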


170

MCQhard

Refer to the exhibit. A data engineer ran the CLI command to check the configuration of an RDS instance named 'mydb'. Which statement accurately describes the current configuration?

A.The database is in the 'stopped' state

B.The database is a Single-AZ deployment and is not a read replica

C.The database is a read replica of another instance

D.The database is a Multi-AZ deployment

AnswerB

MultiAZ False and no ReadReplicaSourceDBInstanceIdentifier.

Why this answer

Option B is correct because the CLI command output shows 'DBInstanceStatus: available' and 'MultiAZ: False', indicating the database is running as a Single-AZ deployment. Additionally, the output contains no 'ReadReplicaSourceDBInstanceIdentifier' field, which would be present if the instance were a read replica.

Exam trap

AWS often tests the misconception that a database with 'available' status must be a read replica or Multi-AZ, but the absence of the 'ReadReplicaSourceDBInstanceIdentifier' and 'MultiAZ: False' clearly identify it as a standalone Single-AZ instance.

How to eliminate wrong answers

Option A is wrong because the CLI output shows 'DBInstanceStatus: available', not 'stopped', so the database is running, not stopped. Option C is wrong because the output lacks a 'ReadReplicaSourceDBInstanceIdentifier' field, which is mandatory for a read replica; without it, the instance is a primary database, not a replica. Option D is wrong because the output explicitly shows 'MultiAZ: False', which means it is a Single-AZ deployment, not Multi-AZ.
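The same reading of the output can be written as a small check over one DB instance record. The exhibit itself is not reproduced here, so the `sample` dictionary below is a hypothetical stand-in containing only the fields the explanation cites, and the helper name is made up.

```python
def describe_deployment(db):
    """Summarize one DBInstances entry from `aws rds describe-db-instances` output."""
    return {
        "running": db.get("DBInstanceStatus") == "available",
        "multi_az": bool(db.get("MultiAZ", False)),
        # A read replica carries the identifier of its source instance.
        "read_replica": bool(db.get("ReadReplicaSourceDBInstanceIdentifier")),
    }


# Hypothetical record mirroring the fields mentioned in the explanation.
sample = {
    "DBInstanceIdentifier": "mydb",
    "DBInstanceStatus": "available",
    "MultiAZ": False,
}

summary = describe_deployment(sample)
```

For this record the summary reports a running, Single-AZ instance that is not a read replica.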


171

MCQmedium

A company runs a production Amazon RDS for MySQL database with Multi-AZ deployment. The database experiences high read latency during peak hours. The company wants to improve read performance with minimal application changes. Which solution should a data engineer recommend?

A.Create one or more read replicas in a different Availability Zone.

B.Use Amazon DynamoDB Accelerator (DAX) as a caching layer.

C.Increase the DB instance size to a larger instance class.

D.Enable Multi-AZ on the existing instance.

AnswerA

Read replicas offload read traffic from the primary instance.

Why this answer

Option A is correct because read replicas serve read traffic from their own endpoint, which reduces load on the primary instance and improves read performance; the application only needs to point its read queries at the replica endpoint. Option B is wrong because DynamoDB Accelerator (DAX) is a cache for DynamoDB, not RDS. Option C is wrong because increasing the instance size may help but is more expensive and may require downtime.

Option D is wrong because the instance already uses Multi-AZ, and the Multi-AZ standby handles failover but does not offload reads.

Practice this question →

172

Multi-Selectmedium

A data engineering team is designing a data lake on Amazon S3 for storing sensor data from IoT devices. The data is written in near real-time and needs to be queried using Amazon Athena. Which TWO configurations should the team implement to optimize query performance and minimize costs?

Select 2 answers

A.Compress the data using GZIP.

B.Use S3 Standard-IA storage class.

C.Store the data in Apache Parquet format.

D.Partition the data by date and sensor ID.

E.Enable Requester Pays on the S3 bucket.

AnswersC, D

Parquet is columnar and reduces scan size.

Why this answer

Apache Parquet is a columnar storage format that allows Athena to read only the columns needed for a query, drastically reducing I/O and scan costs. Combined with compression (like Snappy or GZIP), Parquet minimizes the amount of data scanned per query, which directly lowers Athena's cost (charged per TB scanned) and improves query performance through predicate pushdown and efficient encoding.

Exam trap

AWS often tests the misconception that any compression (like GZIP alone) is sufficient for Athena optimization, but the trap is that without a columnar format like Parquet or ORC, compression alone does not enable column pruning or predicate pushdown, leading to higher scan costs and slower queries.
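
As a rough illustration of option D, the sketch below builds S3 object keys in the Hive-style `name=value` layout that Athena can use for partition pruning. The `sensor-data/` prefix, the sensor ID, and the file name are hypothetical.

```python
from datetime import date


def partition_key(sensor_id: str, day: date, filename: str) -> str:
    """Build an object key partitioned by date, then by sensor ID."""
    return f"sensor-data/date={day.isoformat()}/sensor_id={sensor_id}/{filename}"


key = partition_key("s-001", date(2024, 5, 17), "part-0000.parquet")
print(key)
```

A query that filters on `date` and `sensor_id` then only needs to read objects under the matching prefixes. With a very large number of sensors, partitioning by sensor ID can produce many small partitions, so the right granularity depends on the workload.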

Practice this question →

173

MCQeasy

An e-commerce application uses Amazon ElastiCache for Redis to cache product catalog data. The cache currently uses lazy loading. The team wants to ensure that frequently accessed product data is always fresh. Which caching strategy should they implement?

A.Write-through caching

B.Set a TTL of 5 minutes for all cached items

C.Use database read replicas to serve data

D.Lazy loading with TTL

AnswerA

Write-through updates cache directly on writes, ensuring data is always fresh.

Why this answer

Write-through caching ensures that data is written to the cache simultaneously with the database, guaranteeing that frequently accessed product data is always fresh. This strategy eliminates stale reads by synchronously updating the cache on every write, which directly addresses the requirement for freshness without relying on expiration or lazy population.

Exam trap

The trap here is that candidates often assume lazy loading with a short TTL is sufficient for freshness, but the exam tests the understanding that only write-through (or write-behind) strategies guarantee synchronous cache updates without relying on expiration windows.

How to eliminate wrong answers

Option B is wrong because setting a TTL of 5 minutes does not guarantee freshness; data can still become stale within the TTL window, and frequently accessed items may be served from the cache even after they have been updated in the database. Option C is wrong because read replicas are updated asynchronously and can lag the primary, and they do nothing to keep the product data cached in ElastiCache fresh. Option D is wrong because lazy loading with TTL still allows stale data to be served until the TTL expires or a cache miss triggers a refresh, which does not ensure that frequently accessed data is always fresh.
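
The difference between the two strategies can be sketched with plain dictionaries standing in for the database and the Redis cache. The class and method names are illustrative assumptions, not an ElastiCache API.

```python
class ProductStore:
    """Toy model: `db` is the source of truth, `cache` stands in for Redis."""

    def __init__(self):
        self.db = {}
        self.cache = {}

    def write_db_only(self, key, value):
        # Lazy loading: writes skip the cache, so a cached copy can go stale.
        self.db[key] = value

    def write_through(self, key, value):
        # Write-through: update the database and the cache in the same operation.
        self.db[key] = value
        self.cache[key] = value

    def read(self, key):
        # Populate the cache on a miss, then serve from the cache.
        if key not in self.cache:
            self.cache[key] = self.db[key]
        return self.cache[key]


store = ProductStore()
store.write_db_only("sku-1", "v1")
store.read("sku-1")                  # cache miss populates the cache with "v1"
store.write_db_only("sku-1", "v2")   # database updated, cache untouched
stale = store.read("sku-1")          # still served from the cache
store.write_through("sku-1", "v3")
fresh = store.read("sku-1")
print(stale, fresh)
```

In this model the lazy-loading read returns the old value until the entry is evicted or expires, while the write-through path keeps the cached copy in step with the database.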

Practice this question →

174

MCQhard

A data engineer at a media company is managing an Amazon RDS for MySQL database that stores user profiles and preferences. The database has been running on a db.r5.large instance with 500 GB of General Purpose SSD (gp2) storage. Recently, the application team has noticed increased query latency during peak hours. Amazon CloudWatch metrics show that the ReadIOPS metric is consistently peaking at 5,000 IOPS, which exceeds the baseline performance of the gp2 volume (1,500 IOPS baseline for 500 GB, with bursts up to 3,000 IOPS for short periods). The database is not CPU-bound, and memory utilization is moderate. The data engineer needs to resolve the I/O bottleneck with minimal cost increase. The company is open to changing the storage type or instance class, but wants to avoid over-provisioning. What should the data engineer do?

A.Change the storage type to General Purpose SSD (gp3) and set the provisioned IOPS to 5,000.

B.Enable Multi-AZ deployment to offload reads to the standby instance.

C.Change the storage type to Provisioned IOPS SSD (io1) and provision 5,000 IOPS.

D.Upgrade the instance to a db.r5.xlarge to get more memory and reduce I/O.

AnswerA

gp3 provides a baseline of 3,000 IOPS and can be scaled up to 5,000 at lower cost than io1.

Why this answer

Option A is correct because gp3 includes a baseline of 3,000 IOPS and 125 MB/s throughput in the storage price, and IOPS can be provisioned independently of volume size. This would give 5,000 IOPS without the burst limitations of gp2. Option C is incorrect because Provisioned IOPS (io1) storage would deliver the IOPS but at a higher cost than gp3.

Option B is incorrect because Multi-AZ does not improve read IOPS performance; the standby is not used for reads. Option D is incorrect because a larger memory-optimized instance does not directly improve storage IOPS; it adds unnecessary memory cost.
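
The gp2 figures quoted in the question follow from the general gp2 rule of 3 IOPS per GiB (minimum 100, maximum 16,000) with bursts to 3,000 IOPS on smaller volumes. The sketch below applies that rule to the 500 GB volume; it assumes the plain EBS gp2 formula and ignores any RDS-specific volume layout.

```python
GP2_BURST_IOPS = 3_000  # burst ceiling for smaller gp2 volumes


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline gp2 IOPS: 3 per GiB, clamped to the 100..16,000 range."""
    return max(100, min(16_000, 3 * size_gib))


baseline = gp2_baseline_iops(500)
# Gap between the 5,000 IOPS the workload needs and the best gp2 can do here.
shortfall = 5_000 - max(baseline, GP2_BURST_IOPS)
print(baseline, shortfall)
```

Even while bursting, the volume falls short of the observed demand, which is why provisioning 5,000 IOPS on gp3 closes the gap.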

Practice this question →

175

MCQhard

A company is using Amazon S3 to store sensitive customer data. The security team requires that all data be encrypted in transit and at rest. Additionally, they want to prevent any accidental public access. Which combination of actions should the data engineer take?

A.Enable default encryption with SSE-S3, enforce HTTPS only via bucket policy, and enable S3 Block Public Access.

B.Enable default encryption with SSE-KMS, allow both HTTP and HTTPS, and set bucket ACLs to private.

C.Use client-side encryption, enforce HTTPS via bucket policy, and enable S3 Block Public Access.

D.Enable default encryption with SSE-S3, allow HTTP and HTTPS, and use bucket ACLs to block public access.

AnswerA

SSE-S3 encrypts at rest, bucket policy enforces HTTPS, Block Public Access prevents public access.

Why this answer

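Option A is correct because it covers all three requirements: SSE-S3 default encryption protects data at rest, a bucket policy that denies requests where `aws:SecureTransport` is false enforces HTTPS, and S3 Block Public Access prevents accidental public exposure. Option B is wrong because it still allows HTTP, and private ACLs alone do not stop a later policy or ACL change from making data public. Option C enforces HTTPS and blocks public access, but client-side encryption depends on every writer encrypting correctly and adds key management overhead, so it is a weaker guarantee than default bucket encryption.

Option D is wrong because it allows HTTP and relies on bucket ACLs instead of S3 Block Public Access.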
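
A minimal sketch of the HTTPS-only part of this setup is shown below as a bucket policy document built in Python. The bucket name `example-sensitive-data` and the helper name are hypothetical; the statement uses the `aws:SecureTransport` condition key to deny any request that does not arrive over TLS.

```python
import json


def https_only_policy(bucket: str) -> dict:
    """Return a bucket policy that denies every request made without TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                # Matches requests sent over plain HTTP.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


policy = https_only_policy("example-sensitive-data")
print(json.dumps(policy, indent=2))
```

Default encryption and S3 Block Public Access are separate bucket settings and are not expressed in this policy.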

Practice this question →

176

MCQmedium

A company uses Amazon RDS for MySQL with Multi-AZ deployment. The database experiences high write latency during peak hours. The application uses InnoDB tables. Which action would reduce write latency without changing the application code?

A.Enable storage autoscaling on the DB instance

B.Add a read replica to offload writes

C.Enable Multi-AZ on the DB instance

D.Increase the DB instance class size

AnswerD

A larger instance class provides more resources, improving write throughput.

Why this answer

Increasing the DB instance class size (Option D) provides more CPU and memory resources, which directly improves the database's ability to handle high write loads by reducing contention and speeding up InnoDB transaction processing. This action requires no application code changes and is the most direct way to address write latency caused by resource constraints.

Exam trap

The trap here is that candidates often confuse read replicas with write scaling, assuming they can offload writes, when in fact they only handle SELECT queries and do not reduce write latency on the primary.

How to eliminate wrong answers

Option A is wrong because storage autoscaling only increases storage capacity when space is low, which does not address write latency caused by CPU or memory bottlenecks. Option B is wrong because read replicas are designed to offload read traffic, not write operations; writes still go to the primary instance, so write latency remains unchanged. Option C is wrong because Multi-AZ deployment provides high availability and automatic failover, but it does not improve write performance; in fact, synchronous replication to the standby can slightly increase write latency.

Practice this question →

177

Drag & Dropmedium

Arrange the steps to set up cross-region replication for an S3 bucket.

Why this order

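First, create the destination bucket in the other Region. Then enable versioning on both the source and destination buckets, which replication requires. Next, create an IAM role that S3 can assume to replicate objects, and finally add a replication rule on the source bucket that references that role and the destination bucket.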

Practice this question →

178

MCQmedium

A company uses Amazon S3 to store historical financial records. A compliance policy requires that all objects be encrypted with a customer-managed key stored in AWS KMS. The bucket is already configured with SSE-S3. What is the LEAST disruptive way to change the encryption to SSE-KMS?

A.Add a bucket policy to enforce SSE-KMS.

B.Update the bucket's default encryption settings to SSE-KMS.

C.Copy all objects to a new bucket that has default encryption set to SSE-KMS.

D.Use S3 Batch Operations to apply SSE-KMS to all existing objects.

AnswerC

Copying objects to a new bucket with SSE-KMS default encryption will re-encrypt them with the new key and is straightforward.

Why this answer

Option C is correct because changing the default encryption settings of an existing bucket (SSE-S3 to SSE-KMS) does not retroactively encrypt objects that were already stored with SSE-S3. Copying all objects to a new bucket that has default encryption set to SSE-KMS ensures every object is encrypted with a customer-managed key, as the copy operation re-encrypts each object using the new bucket's default settings. This approach is the least disruptive because it avoids modifying the original bucket's configuration or policies, which could break existing applications or access patterns.

Exam trap

The trap here is that candidates assume updating default encryption settings (Option B) will retroactively encrypt existing objects, but S3 default encryption only applies to new uploads, not to objects already stored with a different encryption method.

How to eliminate wrong answers

Option A is wrong because adding a bucket policy to enforce SSE-KMS only affects future uploads and does not change the encryption of existing objects, leaving them non-compliant. Option B is wrong because updating the bucket's default encryption settings to SSE-KMS only applies to new objects; existing objects remain encrypted with SSE-S3 and are not retroactively re-encrypted. Option D is wrong because S3 Batch Operations can apply SSE-KMS to existing objects, but this process is more disruptive than copying to a new bucket, as it requires careful management of permissions, potential downtime, and does not guarantee a clean separation of old and new encryption configurations.

Practice this question →

179

MCQhard

A large e-commerce company uses Amazon DynamoDB to store shopping cart data. The table has a partition key of 'user_id' and a sort key of 'item_id'. The application performs frequent updates to the 'quantity' attribute for items in a user's cart. Recently, the operations team noticed that write requests are being throttled during peak shopping hours. The table is provisioned with 10,000 write capacity units (WCUs) and uses DynamoDB Accelerator (DAX) for read caching. The data engineer suspects that the throttling is due to hot partitions. The application uses a single AWS SDK client configured with retries. After reviewing the Amazon CloudWatch metrics, the engineer sees that the WriteThrottleEvents metric spikes for a few partition keys. The table has a high number of partitions. What should the data engineer do to resolve the throttling issue with minimal application changes?

A.Increase the provisioned write capacity to 20,000 WCUs permanently.

B.Enable DynamoDB Global Tables to distribute writes across regions.

C.Add more nodes to the DAX cluster to offload write traffic.

D.Configure DynamoDB Auto Scaling with a maximum WCU setting of 20,000 and a target utilization of 70%.

AnswerD

Auto Scaling dynamically adjusts capacity based on traffic, reducing throttling without permanent overprovisioning.

Why this answer

Option D is correct because DynamoDB Auto Scaling with a higher maximum WCU setting lets the table scale up during peak demand without code changes, giving the busy partitions more headroom. Option A is wrong because a permanent manual increase does not adapt to variable traffic and overprovisions the table outside peak hours. Option C is wrong because DAX is a read cache and does not offload writes.

Option B is wrong because Global Tables replicate data across Regions but do not increase write capacity.

Practice this question →

180

MCQmedium

A company is using Amazon RDS for MySQL with Multi-AZ deployment. They notice that during a recent failover test, the application experienced a brief write outage. The application uses a connection string that points to the RDS instance endpoint. What is the MOST likely cause of the write outage?

A.The application is using a read replica endpoint, which does not support write operations.

B.The application is using the RDS instance endpoint instead of the cluster endpoint, so it does not automatically route to the standby after failover.

C.The application is connecting through a Network Load Balancer, which is not configured for cross-zone failover.

D.The application connection pool is exhausted because the failover caused all existing connections to drop simultaneously.

AnswerB

The endpoint name does not change, but during failover its DNS record (CNAME) is repointed to the new primary; existing connections drop and the application must reconnect once the update takes effect, which takes time.

Why this answer

Option B is correct because in a Multi-AZ RDS deployment, the instance endpoint always points to the current primary instance. During a failover, the DNS record for the instance endpoint is updated to point to the new primary, but existing connections to the old primary are dropped, and the DNS change can take time to propagate. The application's connection string using the instance endpoint means it does not automatically route to the standby during the failover transition, causing a brief write outage until the DNS update completes and the application reconnects.

In contrast, using a cluster endpoint (available for Aurora, not standard RDS) or implementing retry logic in the application would mitigate this.
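
The retry logic mentioned above might look like the following sketch. It is a generic exponential backoff loop, not an RDS or MySQL driver API: `connect` is any zero-argument callable supplied by the application, and the flaky connection is simulated.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call `connect` until it succeeds, backing off exponentially between tries."""
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Simulated failover: the first two attempts fail, the third succeeds.
calls = {"n": 0}


def flaky_connect():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("primary unavailable during failover")
    return "connected"


result = connect_with_retry(flaky_connect, sleep=lambda seconds: None)
print(result, calls["n"])
```

Retrying with a fresh connection gives the client a chance to resolve the endpoint again after the DNS record has been updated, provided the client does not cache the old address for too long.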

Exam trap

The trap here is that candidates often confuse the RDS instance endpoint with the cluster endpoint used in Amazon Aurora, assuming that Multi-AZ automatically provides a seamless, zero-downtime failover for writes, when in fact the instance endpoint requires DNS propagation and connection re-establishment.

How to eliminate wrong answers

Option A is wrong because a read replica endpoint is used for read-only traffic; while it does not support writes, the scenario describes a write outage during failover, not a persistent inability to write, and the application is using the instance endpoint, not a read replica endpoint. Option C is wrong because a Network Load Balancer is not a standard component in an RDS Multi-AZ architecture; RDS handles failover internally via DNS, and NLB is not involved in routing to RDS instances. Option D is wrong because while failover does cause existing connections to drop, connection pool exhaustion is a symptom of poor application retry logic, not the root cause of the write outage; the primary issue is the DNS propagation delay and the application's use of the instance endpoint.

Practice this question →

181

MCQhard

Refer to the exhibit. A data engineer has attached this bucket policy to an S3 bucket. What is the effect of this policy?

A.It enforces server-side encryption for all objects written to the bucket.

B.It allows the DataLakeRole to read and write objects, but only over HTTPS.

C.It allows anonymous access to the bucket for HTTPS requests.

D.It denies all access to the bucket except for requests from the DataLakeRole.

AnswerB

The allow statement grants GetObject and PutObject to the role; the deny statement blocks non-HTTPS requests for everyone.

Why this answer

Option B is correct because the bucket policy uses a condition key `aws:SecureTransport` set to `true`, which restricts access to HTTPS (TLS) connections only. The `Principal` is `DataLakeRole`, and the `Action` includes `s3:GetObject` and `s3:PutObject`, so the policy allows that role to read and write objects exclusively over HTTPS, enforcing encrypted data in transit.

Exam trap

AWS often tests the distinction between encryption in transit (HTTPS/TLS) and encryption at rest (SSE), leading candidates to confuse the `aws:SecureTransport` condition with server-side encryption requirements.

How to eliminate wrong answers

Option A is wrong because the policy does not reference `s3:x-amz-server-side-encryption` or any condition enforcing server-side encryption (SSE) at rest; it only enforces encryption in transit via `aws:SecureTransport`. Option C is wrong because the `Principal` is explicitly set to `DataLakeRole` (an IAM role ARN), not `"*"` or `{"AWS": "*"}`, so anonymous access is not granted. Option D is wrong because the policy allows `DataLakeRole` over HTTPS and restricts only non-HTTPS requests; it does not deny other principals, so their access may still be allowed by other policies (e.g., bucket ACLs or IAM policies).

Practice this question →

182

MCQeasy

A company uses Amazon S3 to store sensitive data. The security team requires that all data be encrypted at rest using a customer-managed key that is rotated annually. Which encryption option should be used?

A.SSE-KMS (Server-Side Encryption with AWS KMS).

B.SSE-S3 (Server-Side Encryption with S3-managed keys).

C.Client-side encryption.

D.SSE-C (Server-Side Encryption with Customer-Provided keys).

AnswerA

Allows customer-managed KMS key with annual rotation.

Why this answer

SSE-KMS is the correct choice because it allows you to use a customer-managed key (CMK) in AWS KMS, which you can configure to rotate automatically on an annual schedule. This satisfies the security team's requirement for encryption at rest with a key you control and rotate yearly, while still leveraging server-side encryption that integrates with S3's existing infrastructure.

Exam trap

The trap here is that candidates often confuse SSE-C with customer-managed keys, but SSE-C requires you to supply the key on every operation and does not support AWS-managed rotation, making it unsuitable for the 'rotated annually' requirement.

How to eliminate wrong answers

Option B (SSE-S3) is wrong because it uses S3-managed keys that are automatically rotated by AWS, not customer-managed keys, so you cannot control the rotation schedule or manage the key yourself. Option C (Client-side encryption) is wrong because it encrypts data before it reaches S3, which does not meet the requirement for server-side encryption at rest managed by AWS; it also places the key management burden entirely on the client, not the customer-managed key service. Option D (SSE-C) is wrong because it requires you to provide your own encryption key with each request, and AWS does not manage or rotate the key—you must handle key storage and rotation entirely outside of AWS, which contradicts the requirement for a customer-managed key that is rotated annually within AWS.

Practice this question →

183

MCQhard

Refer to the exhibit. A data engineer has attached this bucket policy to an S3 bucket named data-lake-bucket. The engineer wants to allow only GET requests from the corporate network (10.0.0.0/16) over HTTPS. However, users report that they cannot access objects even when connected to the corporate network. What is the issue?

A.The Deny statement should include a condition on the source IP.

B.The Allow statement should include a condition for SecureTransport.

C.The Allow statement should specify s3:GetObject instead of s3:GetObject.

D.The Deny statement blocks all requests that are not using HTTPS, including those from the corporate network.

AnswerD

Deny overrides Allow when condition is met.

Why this answer

Option D is correct because the Deny statement with `aws:SecureTransport` set to `false` blocks all HTTP requests. Since the Allow statement only permits GET requests from the corporate network (10.0.0.0/16) but does not require HTTPS, any request from that network that uses HTTP is denied by the explicit Deny. The Deny statement overrides the Allow, so even legitimate corporate users are blocked if they use HTTP.

Exam trap

AWS often tests the principle that an explicit Deny overrides any Allow, leading candidates to focus on fixing the Allow statement rather than recognizing that the Deny unconditionally blocks HTTP traffic from all sources, including the corporate network.

How to eliminate wrong answers

Option A is wrong because the Deny statement is conditioned on `aws:SecureTransport`, not on source IP; adding a source IP condition would not fix the HTTPS enforcement issue. Option B is wrong because, although the Allow statement has no SecureTransport condition and therefore permits both HTTP and HTTPS, adding one would change nothing: the explicit Deny already blocks every HTTP request. Option C is wrong because `s3:GetObject` is the correct action for GET requests; the typo 's3:GetObject' in the question is a red herring, and the actual policy uses the correct action.
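
The "explicit Deny overrides Allow" rule can be sketched as a tiny evaluator. The exhibit is not reproduced here, so the two statements below are modelled on the explanation (a Deny on non-HTTPS requests and an Allow for `s3:GetObject` from 10.0.0.0/16); the request dictionary and function name are illustrative assumptions.

```python
from ipaddress import ip_address, ip_network

CORPORATE_NETWORK = ip_network("10.0.0.0/16")


def evaluate(request: dict) -> str:
    """Evaluate a request against the two statements described above."""
    # Explicit Deny is checked first: it wins over any matching Allow.
    if not request["secure_transport"]:
        return "Deny"
    # Allow: GET object requests that originate from the corporate network.
    if (
        request["action"] == "s3:GetObject"
        and ip_address(request["source_ip"]) in CORPORATE_NETWORK
    ):
        return "Allow"
    # Nothing matched: implicit deny.
    return "Deny"


http_corp = evaluate({"action": "s3:GetObject", "source_ip": "10.0.4.20", "secure_transport": False})
https_corp = evaluate({"action": "s3:GetObject", "source_ip": "10.0.4.20", "secure_transport": True})
https_outside = evaluate({"action": "s3:GetObject", "source_ip": "203.0.113.9", "secure_transport": True})
print(http_corp, https_corp, https_outside)
```

Under this model a corporate user on plain HTTP is denied even though the Allow statement matches, which is the behaviour option D describes.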

Practice this question →

184

MCQhard

A data engineering team is responsible for an Amazon RDS for PostgreSQL instance that stores financial data. The database is 500 GB in size. The team needs to create a read replica in a different AWS Region for disaster recovery. The source database has automated backups enabled with a retention period of 7 days. The team initiates the cross-region read replica creation. After several hours, the replica status shows 'Replication Lag' of 30 minutes and is increasing. What should the team do to reduce the replication lag?

A.Modify the source DB instance to use a larger instance class.

B.Delete the replica and create a new one from a snapshot.

C.Increase the backup retention period to 35 days.

D.Enable Multi-AZ on the source database instance.

AnswerD

Multi-AZ provides a synchronous standby that reduces replication lag.

Why this answer

Option D is correct because enabling Multi-AZ on the source provides a synchronous standby and reduces replication lag by offloading backups and improving stability. Option C is incorrect because increasing backup retention does not affect replication. Option A is incorrect because modifying the DB instance class may help but is not the primary solution; Multi-AZ is more effective.

Option B is incorrect because deleting and recreating the replica may not solve the underlying issue.

Practice this question →

185

MCQmedium

A company is using Amazon RDS for MySQL and needs to automate backups with a retention period of 35 days. They also want to be able to restore to any point within the retention period. Which configuration should be used?

A.Enable manual snapshots daily and retain for 35 days.

B.Set the backup retention period to 35 days and enable automatic backups.

C.Set the backup retention period to 7 days and create daily manual snapshots.

D.Disable automated backups and rely on Multi-AZ for recovery.

AnswerB

Automated backups allow point-in-time recovery within the retention period.

Why this answer

Amazon RDS for MySQL supports automated backups with a configurable retention period of up to 35 days. By setting the backup retention period to 35 days and enabling automatic backups, RDS automatically performs daily snapshots and transaction log backups, enabling point-in-time recovery (PITR) to any second within the retention window. This meets the requirement for both a 35-day retention and full PITR capability without manual intervention.

Exam trap

The trap here is that candidates often confuse manual snapshots (which are retained indefinitely but do not support PITR) with automated backups (which support PITR but have a maximum retention of 35 days), leading them to choose Option A or C, thinking manual snapshots can extend the PITR window.

How to eliminate wrong answers

Option A is wrong because manual snapshots are not automatically taken daily and do not support point-in-time recovery; they only provide a single point-in-time restore, not continuous PITR. Option C is wrong because setting the backup retention period to 7 days limits automated backups and PITR to only 7 days, and adding daily manual snapshots does not extend the PITR window beyond 7 days. Option D is wrong because disabling automated backups eliminates both automated snapshots and transaction log backups, making PITR impossible; Multi-AZ provides high availability but does not create backups or enable recovery to any point in time.

Practice this question →

186

MCQeasy

A startup is building a real-time analytics application using Amazon Kinesis Data Streams and Amazon Kinesis Data Analytics. The application processes clickstream data from a website. The data is also stored in Amazon S3 for historical analysis. The company uses an S3 bucket with a lifecycle policy that transitions objects to Amazon S3 Glacier Deep Archive after 30 days. The data engineering team has configured a Kinesis Data Firehose delivery stream to write data to the S3 bucket. The team notices that the data in S3 is not being transitioned to Glacier Deep Archive after 30 days. The lifecycle policy is correctly configured and has been verified. What is the most likely cause of this issue?

A.The S3 lifecycle rule is configured with a filter that does not match the prefix used by Kinesis Data Firehose.

B.The Glacier Deep Archive storage class requires a minimum 90-day storage period, so the lifecycle policy cannot transition objects after 30 days.

C.The S3 bucket is not enabled for S3 Intelligent-Tiering, which is required for lifecycle transitions to Glacier Deep Archive.

D.The S3 bucket does not have S3 Batch Operations enabled to invoke the lifecycle policy.

AnswerA

If the prefix filter doesn't match the objects' prefixes, the rule won't apply.

Why this answer

Option A is correct because a lifecycle rule applies only to objects whose keys match its filter. Kinesis Data Firehose writes objects under a time-based prefix (by default `YYYY/MM/dd/HH` in UTC, optionally preceded by a custom prefix), so a rule that is otherwise valid but filters on a different prefix never selects those objects, and they are never transitioned.

Option D is wrong because S3 Batch Operations is unrelated to lifecycle transitions; S3 applies lifecycle rules automatically. Option B is wrong because the minimum storage duration for Glacier Deep Archive (180 days, not 90) affects early-deletion charges once an object is stored there; it does not prevent a lifecycle rule from transitioning objects after 30 days. Option C is wrong because S3 Intelligent-Tiering is an alternative storage class, not a requirement.
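
The prefix mismatch can be sketched as a simple string check. The object key and the `logs/` rule prefix below are hypothetical; the key only imitates a time-based layout of the kind Firehose produces.

```python
def rule_applies(object_key: str, rule_prefix: str) -> bool:
    """A prefix-filtered lifecycle rule selects only keys that start with the prefix."""
    return object_key.startswith(rule_prefix)


# Hypothetical object written under a time-based prefix.
firehose_key = "2024/05/17/13/clickstream-2024-05-17-13-05-00-example"

matches_logs_rule = rule_applies(firehose_key, "logs/")  # rule scoped to "logs/"
matches_unfiltered = rule_applies(firehose_key, "")      # empty prefix selects every key
print(matches_logs_rule, matches_unfiltered)
```

A rule scoped to `logs/` never selects this object, so either the rule filter or the Firehose prefix has to change for the transition to happen.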

Practice this question →

187

Multi-Selectmedium

A company is designing a data store for IoT sensor data that is written once and never updated. The data must be stored with high durability and low cost. Which TWO AWS storage services are most suitable? (Choose TWO.)

Select 2 answers

A.Amazon ElastiCache

B.Amazon EBS

C.Amazon S3

D.Amazon DynamoDB

E.Amazon S3 Glacier Deep Archive

AnswersC, E

S3 provides 99.999999999% durability and low cost for infrequently accessed data.

Why this answer

Amazon S3 is correct because it provides 99.999999999% (11 9's) durability, is designed for write-once-read-many (WORM) workloads, and offers low-cost storage tiers suitable for IoT sensor data that is never updated. S3's object storage model and lifecycle policies allow automatic transition to colder storage, making it ideal for immutable data at scale.

Exam trap

The trap here is that candidates often choose DynamoDB (D) for its scalability and low latency, overlooking that the question emphasizes low cost and write-once immutability, where S3 and Glacier Deep Archive are orders of magnitude cheaper per GB stored.

Practice this question →

188

Multi-Selectmedium

Which TWO actions should a data engineer take to encrypt data at rest in an Amazon S3 bucket? (Select TWO.)

Select 2 answers

A.Enable S3 Transfer Acceleration on the bucket.

B.Use client-side encryption before uploading objects to S3.

C.Configure the bucket to use SSE-KMS.

D.Enable default encryption on the bucket using SSE-S3.

E.Attach a bucket policy that denies unencrypted PUT requests.

AnswersC, D

SSE-KMS encrypts objects at rest using AWS KMS keys.

Why this answer

Option C is correct because SSE-KMS (Server-Side Encryption with AWS Key Management Service) encrypts data at rest in S3 by using a KMS key to manage encryption keys. This provides envelope encryption, where a CMK generates a data key that encrypts the object, and the data key is then encrypted by the CMK. Option D is correct because enabling default encryption on an S3 bucket using SSE-S3 (AES-256) ensures that all objects uploaded without explicit encryption headers are automatically encrypted at rest by S3's managed key.

Exam trap

The trap here is that candidates confuse enforcing encryption (via bucket policies) with actually performing encryption, or they mistakenly think client-side encryption is a bucket-level action rather than a client-side responsibility.

Practice this question →

189

Multi-Selecthard

A company uses Amazon Redshift for its data warehouse. The cluster has multiple node types and is configured with automated snapshots. The company needs to ensure high availability and disaster recovery across AWS Regions. Which THREE actions should the company take to meet these requirements? (Choose THREE.)

Select 3 answers

A.Enable automated snapshots with a retention period of at least 1 day.

B.Restore a snapshot from the secondary Region in the event of a disaster.

C.Create manual snapshots on a daily basis and copy them to another Region.

D.Configure the cluster to use multiple Availability Zones (multi-AZ) for high availability.

E.Configure cross-Region snapshot copy to replicate snapshots to another Region.

AnswersB, D, E

Restoring from a cross-Region snapshot provides DR capability.

Why this answer

Configuring cross-Region snapshot copy (E), restoring a snapshot in the secondary Region after a disaster (B), and running the cluster across multiple Availability Zones where it is supported (D) together provide high availability and DR. Option A is redundant because automated snapshots are already configured. Option C is unnecessary because cross-Region copy of the automated snapshots already covers DR without daily manual snapshots.

Practice this question →

190

MCQmedium

A data engineer is designing a data lake on Amazon S3. The data includes sensitive customer information that must be encrypted at rest. Which combination of actions meets this requirement with minimal operational overhead?

A.Enable default encryption on the S3 bucket using SSE-S3

B.Encrypt objects client-side before uploading

C.Use an S3 bucket policy to deny writes without encryption

D.Use an S3 Lifecycle policy to transition to Glacier

AnswerA

Default encryption ensures all new objects are encrypted automatically.

Why this answer

SSE-S3 provides server-side encryption with Amazon S3-managed keys, which encrypts data at rest with minimal operational overhead because AWS handles key management, rotation, and encryption/decryption transparently. Enabling default encryption on the S3 bucket ensures that all objects written to the bucket are automatically encrypted without requiring any client-side changes or additional code, meeting the requirement with the least administrative effort.

Exam trap

The trap here is that candidates often confuse 'enforcing encryption via bucket policy' (Option C) with 'automatically encrypting data' — the policy only denies unencrypted writes but does not reduce operational overhead because the client must still implement encryption logic.

How to eliminate wrong answers

Option B is wrong because client-side encryption requires the data engineer to manage encryption keys and perform encryption/decryption in the application code, adding significant operational overhead compared to server-side encryption. Option C is wrong because an S3 bucket policy that denies writes without encryption only enforces encryption at upload time but does not itself encrypt the data; it relies on the client to provide encryption headers, which still requires client-side logic and does not reduce overhead. Option D is wrong because an S3 Lifecycle policy that transitions objects to Glacier only moves data to a different storage class for cost optimization; it does not enable or enforce encryption at rest.

Practice this question →

191

MCQeasy

A company needs to migrate an on-premises 10 TB PostgreSQL database to Amazon RDS for PostgreSQL with minimal downtime. Which AWS service should be used for the migration?

A.AWS Storage Gateway

B.AWS Snowball Edge

C.AWS DataSync

D.AWS Database Migration Service (DMS)

AnswerD

DMS supports continuous replication.

Why this answer

Option D is correct because AWS DMS supports ongoing replication, which keeps the target in sync and minimizes downtime at cutover. Option A is wrong because AWS Storage Gateway provides hybrid access to AWS storage and does not migrate databases. Option B is wrong because Snowball Edge is for large offline data transfers and does not support ongoing replication.

Option C is wrong because AWS DataSync transfers files and objects between storage systems, not live database contents.

Practice this question →

192

MCQmedium

A data engineer is troubleshooting an Amazon RDS for MySQL instance that is experiencing high read latency. The instance is a Single-AZ db.r5.large with 100 GB of General Purpose (gp2) storage. Which action is most likely to reduce read latency?

A.Create a read replica and direct read queries to it.

B.Enable automatic backups with a 7-day retention.

C.Increase the allocated storage to 200 GB.

D.Convert the instance to a Multi-AZ deployment.

AnswerA

Offloads read traffic, reducing load on the primary.

Why this answer

Creating a read replica offloads SELECT queries from the primary instance, directly reducing the read load and thus read latency. Since the instance is Single-AZ and experiencing high read latency, a read replica distributes the read traffic without altering the existing storage or availability configuration.

Exam trap

The trap here is that candidates often confuse Multi-AZ with read replicas, assuming Multi-AZ provides read scaling, but Multi-AZ only provides a standby for failover, not a read endpoint.

How to eliminate wrong answers

Option B is wrong because enabling automatic backups with a 7-day retention does not reduce read latency; backups consume I/O and CPU resources, potentially increasing latency. Option C is wrong because increasing allocated storage to 200 GB on gp2 improves baseline IOPS (from 300 to 600) but does not directly address high read latency caused by read workload saturation; the issue is read demand, not storage throughput. Option D is wrong because converting to Multi-AZ provides high availability and failover support but does not reduce read latency; the standby in a Multi-AZ DB instance deployment does not serve read traffic.
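
The 300-to-600 IOPS figure quoted for Option C can be checked with a short sketch. It assumes the standard gp2 baseline formula (3 IOPS per GiB, with a floor of 100 and a cap of 16,000); the function name is hypothetical and used only for illustration.

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for a gp2 volume, assuming 3 IOPS/GiB, floor 100, cap 16,000."""
    return max(100, min(16_000, 3 * size_gib))


# Sizes from the question: 100 GB today, 200 GB if Option C were applied.
iops_before = gp2_baseline_iops(100)
iops_after = gp2_baseline_iops(200)
print(iops_before, iops_after)
```

Doubling storage doubles the baseline, but it does nothing to spread read queries across instances, which is why the read replica remains the better fit.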

Practice this question →

193

MCQeasy

A company stores its application logs in Amazon S3. The logs are generated daily and need to be retained for 3 years for compliance. The logs are accessed frequently for the first 30 days, occasionally for the next 6 months, and rarely after that. The data engineering team wants to minimize storage costs while ensuring that logs are available for retrieval within 12 hours for the first 6 months and within 48 hours after that. The team also wants to automatically delete logs after 3 years. Which lifecycle policy should the team implement?

A.Transition to S3 One Zone-IA after 30 days, delete after 6 months.

B.Transition to S3 Standard-IA after 30 days, delete after 3 years.

C.Transition to S3 Standard-IA after 30 days, to S3 Glacier after 6 months, delete after 3 years.

D.Transition to S3 Standard-IA after 30 days, to S3 Glacier Deep Archive after 6 months, delete after 3 years.

AnswerD

Meets cost and retrieval time requirements.

Why this answer

Option D is correct because it aligns with the access patterns and retrieval requirements: S3 Standard-IA after 30 days keeps occasionally accessed logs immediately retrievable through month 6, S3 Glacier Deep Archive after 6 months stores rarely accessed logs at the lowest cost with retrieval within 12 hours (standard) or 48 hours (bulk), and expiration removes the logs after 3 years. This minimizes storage costs while meeting the 48-hour retrieval window for older logs.

Exam trap

The trap here is that candidates may choose Option C (S3 Glacier) thinking it is the cheapest cold storage, but S3 Glacier Deep Archive is actually the lowest-cost option for data that is rarely accessed and can tolerate a 12-hour retrieval time, which still satisfies the 48-hour requirement.

How to eliminate wrong answers

Option A is wrong because it transitions to S3 One Zone-IA after 30 days, which stores data in a single Availability Zone and is a poor fit for compliance logs, and it deletes after 6 months instead of 3 years. Option B is wrong because it keeps logs in S3 Standard-IA for the entire 3 years, which is more expensive than transitioning to a colder storage class after 6 months, and it does not meet the cost-minimization goal. Option C is wrong because it transitions to S3 Glacier after 6 months, which has a retrieval time of 1-5 minutes for expedited or 3-5 hours for standard; that comfortably meets the 48-hour requirement, but it is not the most cost-effective option, because S3 Glacier Deep Archive is cheaper and still meets the 48-hour retrieval window.
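
As a rough sketch, the policy in Option D could be expressed in the shape that boto3's `put_bucket_lifecycle_configuration` expects. The rule ID is hypothetical, and the day counts assume 6 months is about 180 days and 3 years is 1,095 days; the question itself does not give exact day values.

```python
# Lifecycle configuration sketch for Option D (values are assumptions, see above).
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "log-retention",        # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},     # empty prefix: apply to every object
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
            ],
            "Expiration": {"Days": 1095},  # delete after roughly 3 years
        }
    ]
}

# Sanity check: transitions must be ordered and precede expiration.
rule = lifecycle_configuration["Rules"][0]
transition_days = [t["Days"] for t in rule["Transitions"]]
is_ordered = transition_days == sorted(transition_days)
expires_last = rule["Expiration"]["Days"] > max(transition_days)
print(is_ordered, expires_last)
```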

Practice this question →

194

MCQmedium

A data engineer needs to store semi-structured JSON data from IoT devices. The data is written once, read rarely, but must be queryable using SQL. The storage cost must be minimized. Which storage solution should the engineer choose?

A.Store JSON in Amazon Redshift as SUPER data type

B.Store JSON in an Amazon RDS for MySQL table

C.Store JSON documents in Amazon DynamoDB and use PartiQL for queries

D.Store JSON files in Amazon S3 and use Amazon Athena for queries

AnswerD

S3 provides low-cost storage; Athena enables SQL querying over JSON.

Why this answer

Amazon S3 provides the lowest-cost storage for data that is written once and rarely read, while Amazon Athena enables serverless SQL querying directly on JSON files stored in S3. This combination minimizes storage costs because S3 charges only for the data stored and retrieval, with no minimum fees or provisioning required, and Athena charges only for the data scanned per query. The workload's write-once, read-rarely pattern aligns perfectly with S3's durability and lifecycle policies, making it the most cost-effective choice.

Exam trap

The trap here is that candidates often choose DynamoDB (Option C) because it supports JSON natively and PartiQL provides SQL-like queries, but they overlook that DynamoDB's provisioned throughput and storage costs are significantly higher than S3 for write-once, read-rarely workloads, and that Athena on S3 is the serverless, cost-optimized solution for ad-hoc SQL queries on infrequently accessed data.

How to eliminate wrong answers

Option A is wrong because Amazon Redshift is a petabyte-scale data warehouse designed for high-performance analytics on structured and semi-structured data, but it incurs significant costs for provisioned clusters even when data is rarely queried, making it unsuitable for minimizing storage costs. Option B is wrong because Amazon RDS for MySQL is a relational database that requires provisioning and paying for a database instance 24/7, and storing JSON in a MySQL table incurs overhead for indexing and transactions that are unnecessary for write-once, read-rarely data. Option C is wrong because Amazon DynamoDB is a NoSQL key-value and document database optimized for low-latency, high-throughput workloads, but its storage costs are higher than S3 for rarely accessed data, and PartiQL queries on DynamoDB still consume read capacity units, leading to ongoing costs that exceed S3+Athena for infrequent queries.

Practice this question →

195

MCQmedium

A company is using Amazon DynamoDB to store session data for a web application. The data engineer needs to ensure that the data is encrypted at rest. Which action should the data engineer take?

A.Enable encryption at rest on the DynamoDB Accelerator (DAX) cluster.

B.Use client-side encryption before writing to DynamoDB.

C.Ensure encryption at rest is enabled on the DynamoDB table (default).

D.Enable DynamoDB Time to Live (TTL) to encrypt data.

AnswerC

DynamoDB encrypts at rest by default using AWS KMS.

Why this answer

Option C is correct because DynamoDB supports encryption at rest using AWS KMS, and it is enabled by default. Option A is wrong because encryption on a DynamoDB Accelerator (DAX) cluster protects only the cache, not the underlying table. Option B is wrong because client-side encryption is unnecessary for meeting an at-rest requirement and adds complexity.

Option D is wrong because TTL expires items after a timestamp and has nothing to do with encryption.

Practice this question →

196

MCQeasy

A data engineer deploys the CloudFormation template shown in the exhibit. After 60 days, what will be the storage class of objects in the bucket?

A.The objects will be in GLACIER storage class.

B.The objects will be deleted.

C.The objects will remain in STANDARD storage class because the rule is not triggered.

D.The objects will be immediately transitioned to GLACIER upon creation.

AnswerA

The lifecycle rule transitions objects to GLACIER once they reach the configured age, so by day 60 they are in GLACIER.

Why this answer

The CloudFormation template includes a lifecycle rule that transitions objects to the GLACIER storage class after 60 days. Since the rule is configured with a transition action to GLACIER, objects will be moved from their initial storage class (typically STANDARD) to GLACIER once they reach 60 days of age. This is a standard S3 lifecycle policy behavior, and the objects will not be deleted unless a separate expiration action is defined.

Exam trap

The trap here is that candidates may confuse the Days parameter with a countdown from the rule creation date rather than from the object's creation date, or assume that a lifecycle rule without an explicit expiration means objects are deleted by default.

How to eliminate wrong answers

Option B is wrong because the lifecycle rule only specifies a transition to GLACIER, not an expiration action; objects are not deleted unless explicitly configured with an expiration policy. Option C is wrong because the lifecycle rule is triggered automatically by S3 based on the object's age, and the rule will execute the transition to GLACIER after 60 days, so objects will not remain in STANDARD. Option D is wrong because the transition is not immediate upon creation; it occurs after the specified number of days (60 days) have elapsed from the object's creation date, as per the Days parameter in the lifecycle rule.

Practice this question →

197

MCQeasy

A data engineer needs to store transaction data that requires strong consistency, ACID transactions, and complex join queries. Which AWS service is most appropriate?

A.Amazon DynamoDB

B.Amazon RDS for PostgreSQL

C.Amazon S3

D.Amazon Redshift

AnswerB

RDS PostgreSQL provides ACID transactions and complex join support.

Why this answer

Amazon RDS for PostgreSQL is the most appropriate choice because it provides full ACID transaction support, strong consistency, and the ability to perform complex join queries using standard SQL. Unlike NoSQL or data warehouse solutions, PostgreSQL is a relational database that excels at enforcing referential integrity and supporting multi-table joins with advanced indexing.

Exam trap

The trap here is that candidates often confuse DynamoDB's 'eventually consistent reads' with strong consistency, or assume its limited transaction API can replace full ACID relational databases, but the question explicitly requires complex joins and ACID transactions, which only a relational database like PostgreSQL can provide.

How to eliminate wrong answers

Option A is wrong because Amazon DynamoDB is a NoSQL key-value and document database that does not support join queries; its transactional APIs provide ACID guarantees only for a bounded group of items per call and do not replace relational transactions spanning joined tables. Option C is wrong because Amazon S3 is an object storage service with no support for ACID transactions, complex joins, or relational query capabilities. Option D is wrong because Amazon Redshift is a columnar data warehouse optimized for analytical queries on large datasets, not for transactional workloads that need row-level ACID operations.

Practice this question →

198

MCQeasy

A data engineer applies the above IAM policy to a user. The user attempts to upload an object to the bucket 'my-data-lake' without specifying server-side encryption. What will happen?

A.The upload fails only if the bucket has a default encryption setting

B.The upload succeeds if the bucket policy allows unencrypted uploads

C.The upload succeeds because the policy allows s3:PutObject

D.The upload fails because the condition requires encryption

AnswerD

The condition requires AES256 encryption; not provided.

Why this answer

The IAM policy includes a condition that requires the `s3:x-amz-server-side-encryption` header to be present with a value of `AES256`. When the user attempts to upload an object without specifying server-side encryption, the condition is not satisfied, so the `s3:PutObject` permission is denied. This causes the upload to fail, regardless of any bucket default encryption settings or bucket policies.

Exam trap

The trap here is that candidates assume bucket default encryption automatically satisfies an IAM condition requiring the encryption header, but the condition checks the request headers, not the bucket's configuration.

How to eliminate wrong answers

Option A is wrong because the failure is due to the IAM policy condition, not the bucket's default encryption setting; even if the bucket has no default encryption, the IAM policy still denies the upload. Option B is wrong because the bucket policy is irrelevant here—the IAM policy explicitly denies unencrypted uploads via a condition, and bucket policies cannot override an explicit IAM deny. Option C is wrong because while the policy allows `s3:PutObject` in general, the condition `s3:x-amz-server-side-encryption` must be met; without it, the permission is effectively denied.
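
The exhibit is not reproduced here, so the following is only a sketch of one policy shape consistent with the explanation: an Allow on `s3:PutObject` guarded by a `StringEquals` condition on `s3:x-amz-server-side-encryption`. The `put_object_allowed` helper is a toy check written for illustration; it is not how IAM actually evaluates policies.

```python
# Assumed policy statement (the real exhibit may differ).
policy_statement = {
    "Effect": "Allow",
    "Action": "s3:PutObject",
    "Resource": "arn:aws:s3:::my-data-lake/*",
    "Condition": {
        "StringEquals": {"s3:x-amz-server-side-encryption": "AES256"}
    },
}


def put_object_allowed(request_headers: dict) -> bool:
    """Toy check: the Allow applies only if the request header matches the condition."""
    required = policy_statement["Condition"]["StringEquals"]
    sent = request_headers.get("x-amz-server-side-encryption")
    return sent == required["s3:x-amz-server-side-encryption"]


# An upload with no encryption header does not satisfy the condition.
print(put_object_allowed({}))
print(put_object_allowed({"x-amz-server-side-encryption": "AES256"}))
```

The check looks only at the request headers, which mirrors the exam trap: the bucket's default encryption setting never enters the evaluation.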

Practice this question →

199

MCQhard

A data engineer is troubleshooting an Amazon Redshift cluster that has been experiencing slow query performance. The engineer checks the system tables and finds that many queries are waiting on 'wlm_queued' time. The cluster has 10 nodes and uses automatic WLM. What is the most likely cause?

A.Network bandwidth saturation between nodes.

B.Sorting operations are too expensive.

C.The number of concurrent queries exceeds the available query slots.

D.Insufficient disk space on the cluster.

AnswerC

Automatic WLM limits concurrency, and queries queue when exceeded.

Why this answer

Option C is correct because 'wlm_queued' time indicates queries are waiting for a slot in a WLM queue. With automatic WLM, Redshift decides dynamically how many queries run concurrently based on available resources; when more queries are submitted than it will admit, the excess wait in the queue.

Option A (network) would show up as network-related waits. Option B (sort) would show up as long execution time, not queue time. Option D (disk) would show up as disk-related waits.

Practice this question →

200

Multi-Selecthard

Which THREE factors should a data engineer consider when choosing between Amazon RDS and Amazon DynamoDB for a new application? (Choose THREE.)

Select 3 answers

A.Requirement for ACID transactions across multiple tables.

B.Expected latency requirements for read/write operations.

C.Need for complex joins and relationships.

D.Ability to encrypt data at rest.

E.Support for multi-region disaster recovery.

AnswersA, B, C

RDS offers full ACID; DynamoDB transactions are limited.

Why this answer

RDS is relational and supports complex joins, while DynamoDB is NoSQL with a flexible schema and no join support. RDS offers ACID transactions natively, while DynamoDB supports transactions with limitations. DynamoDB provides single-digit millisecond latency at scale; RDS latency can be higher for complex queries.

Options A, B, and C are correct. Option D is not a differentiator because both services support encryption at rest. Option E is not a differentiator because both offer multi-Region options, such as cross-Region read replicas for RDS and global tables for DynamoDB.

Practice this question →

201

MCQmedium

A data engineer needs to migrate an on-premises MySQL database to Amazon RDS for MySQL with minimal downtime. Which approach should they use?

A.Use mysqldump to export the database and import into RDS.

B.Use AWS Database Migration Service (DMS) with ongoing replication from the source database.

C.Create an RDS read replica and promote it.

D.Use AWS Schema Conversion Tool (SCT) to convert the schema and then copy data.

AnswerB

DMS with ongoing replication minimizes downtime by continuously syncing changes.

Why this answer

AWS DMS with ongoing replication (change data capture, CDC) is the correct approach because it allows continuous synchronization from the on-premises MySQL source to the RDS target, enabling a cutover with minimal downtime. Unlike one-time export/import tools, DMS captures ongoing changes during the migration, so the target stays up-to-date until you switch over.
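
A minimal sketch of the task settings this implies, in the shape of the parameters accepted by boto3's DMS `create_replication_task` call. The task identifier is hypothetical, and the required endpoint and replication-instance ARNs are left out because they depend on the environment.

```python
import json

# Selection rule that includes every schema and table from the source.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {"schema-name": "%", "table-name": "%"},
            "rule-action": "include",
        }
    ]
}

replication_task_params = {
    "ReplicationTaskIdentifier": "mysql-to-rds-migration",  # hypothetical name
    # Full load first, then change data capture until cutover.
    "MigrationType": "full-load-and-cdc",
    "TableMappings": json.dumps(table_mappings),
}

print(replication_task_params["MigrationType"])
```

Choosing `full-load-and-cdc` rather than `full-load` is what keeps the target current while the application continues writing to the source.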

Exam trap

The trap here is that candidates confuse 'minimal downtime' with 'zero data loss' and assume a simple dump/import or a read replica (which only works for RDS-to-RDS) is sufficient, overlooking the need for ongoing replication to keep the target synchronized during the migration window.

How to eliminate wrong answers

Option A is wrong because mysqldump produces a one-time logical export: changes written to the source after the dump begins are not carried over, so the application must stay offline or read-only until the import finishes, which causes significant downtime. Option C is wrong because RDS read replicas can only be created from an existing RDS instance, not from an on-premises database, and promoting a replica does not migrate data from an external source. Option D is wrong because AWS Schema Conversion Tool (SCT) is designed for heterogeneous migrations (e.g., Oracle to Aurora) and does not handle data replication; for a homogeneous MySQL-to-MySQL migration, SCT is unnecessary and does not provide ongoing sync.

Practice this question →

202

MCQeasy

A data engineer needs to store semi-structured JSON data that is accessed infrequently but must be retrievable within minutes when needed. The data will be stored for 7 years for compliance. Which storage solution is MOST cost-effective?

A.Amazon S3 One Zone-Infrequent Access

B.Amazon S3 Standard

C.Amazon S3 Intelligent-Tiering

D.Amazon S3 Glacier Deep Archive

AnswerD

Lowest cost for long-term archival; standard retrievals complete within 12 hours and bulk retrievals within 48 hours.

Why this answer

Option D is the keyed answer because S3 Glacier Deep Archive is the lowest-cost storage class for long-term archival. Its standard retrievals complete within 12 hours and bulk retrievals within 48 hours, and it has no expedited tier, so it fits only if the retrieval window can be relaxed to hours. Option B is wrong because S3 Standard is more expensive for infrequent access. Option C is wrong because S3 Intelligent-Tiering has monitoring costs and may not be optimal for long-term archival.

Option A is wrong because S3 One Zone-IA keeps data in a single Availability Zone, which is a resilience risk for long-term compliance data.

Practice this question →

203

MCQhard

A company uses DynamoDB with global tables in two AWS Regions. The data engineer observes that a write to the table in us-east-1 is not immediately visible in a read from eu-west-1. What is the most likely reason?

A.Replication between regions is eventually consistent.

B.The read is using strongly consistent reads.

C.There is a write conflict that needs to be resolved.

D.DynamoDB Streams is not enabled on the table.

AnswerA

Global tables replicate asynchronously, so there is a propagation delay.

Why this answer

DynamoDB global tables use asynchronous replication between regions. When a write occurs in us-east-1, the change is propagated to eu-west-1 with a replication lag that is typically sub-second but not instantaneous. Reads in eu-west-1 are eventually consistent by default, meaning they may not reflect the most recent write until replication completes.

This is the expected behavior of DynamoDB global tables, which prioritize availability and partition tolerance over immediate consistency across regions.

Exam trap

The trap here is that candidates often assume DynamoDB global tables provide strong consistency across regions because they are familiar with single-region strongly consistent reads, but the exam tests the specific knowledge that cross-region replication is always eventually consistent and that strongly consistent reads are only valid within the same region.

How to eliminate wrong answers

Option B is wrong because a strongly consistent read only guarantees the most recent write made in the same Region; it does not cause the delay, and it cannot return a write that has not yet replicated from another Region. Option C is wrong because write conflicts in global tables are automatically resolved using a last-writer-wins algorithm based on the timestamp, and they do not cause writes to be invisible; a conflict would result in one write being overwritten, not a delay in visibility. Option D is wrong because DynamoDB Streams is not required for global table replication; global tables use their own internal replication mechanism, and enabling Streams is optional for change data capture or triggering Lambda functions, not for the core replication functionality.

Practice this question →

204

MCQhard

A company uses Amazon Redshift for analytics. A data engineer notices that queries are slow due to high disk usage on the compute nodes. The engineer needs to reclaim disk space without interrupting ongoing queries. Which action should the engineer take?

A.Use the COPY command to reload data

B.Run VACUUM FULL on all tables

C.Run VACUUM DELETE to reclaim space from deleted rows

D.Resize the cluster to a larger instance type

AnswerC

VACUUM DELETE reclaims space without exclusive locks and can run concurrently.

Why this answer

Option C is correct because VACUUM DELETE (written in Redshift SQL as `VACUUM DELETE ONLY`) reclaims disk space from rows marked for deletion without re-sorting the table, which makes it the lightest vacuum variant and allows ongoing queries to continue. In Amazon Redshift, deleted rows keep consuming disk space until a vacuum reclaims it.

Exam trap

The trap here is that candidates confuse VACUUM FULL with VACUUM DELETE, assuming any VACUUM operation reclaims space without considering the lock requirement and interruption to queries.

How to eliminate wrong answers

Option A is wrong because the COPY command loads data into Redshift but does not reclaim disk space; it only adds new data, potentially worsening disk usage. Option B is wrong because VACUUM FULL reclaims space and resorts rows but requires an exclusive table lock, which interrupts ongoing queries and is not suitable for a no-interruption requirement. Option D is wrong because resizing the cluster to a larger instance type adds more storage capacity but does not reclaim existing disk space; it also involves a temporary interruption during the resize process.

Practice this question →

205

Multi-Selecthard

A company uses Amazon Redshift for analytics. They notice that some queries are slow due to data redistribution. The data engineer wants to minimize data movement across nodes. Which table design strategy should be used? (Choose TWO.)

Select 2 answers

A.Set the distribution style to AUTO for all tables.

B.Define compound sort keys on frequently filtered columns.

C.Choose a distribution key that matches the join key for large tables.

D.Use EVEN distribution for all tables.

E.Use distribution style ALL for small dimension tables.

AnswersC, E

Matching distribution keys on joined tables keeps data co-located.

Why this answer

Option C is correct because when large tables are joined on their distribution keys, Redshift can perform a collocated join, meaning the matching rows are already on the same node slice, eliminating the need to redistribute data across the network. This directly minimizes data movement and speeds up query execution. Option E is correct because DISTSTYLE ALL places a full copy of a small dimension table on every node, so joins against it never require rows to be moved.

Exam trap

The trap here is that candidates often confuse distribution keys with sort keys, thinking that sorting alone can reduce data movement, or they assume AUTO distribution always optimizes for joins, when in fact it may default to EVEN or ALL without guaranteeing collocation for specific join patterns.
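
The collocation idea can be illustrated with a toy model. Redshift's actual hash function and slice assignment are internal, so `zlib.crc32`, the slice count, and the sample rows below are all stand-ins chosen for illustration.

```python
import zlib

NUM_SLICES = 4  # hypothetical slice count


def slice_for(dist_key_value: int) -> int:
    """Toy stand-in for KEY distribution: hash the key, map it to a slice."""
    return zlib.crc32(str(dist_key_value).encode()) % NUM_SLICES


# Both tables are distributed on customer_id, the join key.
orders = [(101, 1), (102, 2), (103, 1), (104, 3)]  # (order_id, customer_id)
customers = [(1, "Ana"), (2, "Ben"), (3, "Chen")]  # (customer_id, name)

order_slice = {order_id: slice_for(cust_id) for order_id, cust_id in orders}
customer_slice = {cust_id: slice_for(cust_id) for cust_id, _ in customers}

# Every order lands on the same slice as its customer, so the join is local.
colocated = all(order_slice[o] == customer_slice[c] for o, c in orders)
print(colocated)
```

Because both tables hash the same column, matching rows always meet on one slice; distributing `orders` on a different column would break that guarantee and force redistribution at query time.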

Practice this question →

206

Drag & Dropmedium

Order the steps to set up an Amazon EMR cluster for processing data in S3 using Spark.

Why this order

First, prepare the S3 bucket. Then launch the EMR cluster with Spark, configure instances, submit the job, and terminate.

Practice this question →

207

MCQmedium

A company is migrating its on-premises Oracle database to Amazon Aurora PostgreSQL. The migration must have minimal downtime. The source database is 2 TB and runs on a single server. Which AWS service should be used for the migration?

A.AWS DataSync

B.Amazon S3 Transfer Acceleration

C.AWS Database Migration Service (DMS)

D.AWS Snowball Edge

AnswerC

DMS provides minimal downtime migration with change data capture.

Why this answer

AWS Database Migration Service (DMS) is the correct choice because it supports heterogeneous migrations from Oracle to Amazon Aurora PostgreSQL with minimal downtime using ongoing replication (change data capture). DMS can handle a 2 TB source database by performing a full load followed by continuous replication of changes from the Oracle redo logs, allowing the target Aurora database to stay nearly in sync until cutover. Schema and code objects are converted separately, typically with the AWS Schema Conversion Tool (SCT).

Exam trap

The trap here is that candidates may confuse AWS DataSync or Snowball Edge as viable for database migrations because they handle large data volumes, but they lack the schema conversion and ongoing replication capabilities required for minimal-downtime database migrations.

How to eliminate wrong answers

Option A is wrong because AWS DataSync is designed for moving large datasets over the network between on-premises storage and AWS storage services (e.g., S3, EFS, FSx), not for database migrations with schema conversion and ongoing replication. Option B is wrong because Amazon S3 Transfer Acceleration only speeds up uploads to S3 buckets over the internet using optimized network paths; it does not migrate databases or handle schema conversion and CDC. Option D is wrong because AWS Snowball Edge is a physical data transfer device for offline bulk data movement, which would introduce significant downtime and cannot perform live replication or schema conversion for a database migration.

Practice this question →

208

MCQeasy

A data engineer needs to store semi-structured JSON data for a real-time analytics application. The data will be queried using SQL-like statements and must support high-speed ingestion with minimal latency. Which AWS service is best suited for this use case?

A.Amazon S3

B.Amazon Redshift

C.Amazon DynamoDB

D.Amazon Kinesis Data Analytics

AnswerD

Kinesis Data Analytics can query streaming data using SQL in real time.

Why this answer

Amazon Kinesis Data Analytics is best suited because it natively processes streaming JSON data using SQL-like statements (via Kinesis Data Analytics for SQL applications) with sub-second latency, enabling real-time analytics on semi-structured data without requiring a separate storage layer for ingestion. It integrates directly with Kinesis Data Streams or Firehose for high-speed ingestion and maps JSON payload fields to in-application stream columns through schema discovery.

Exam trap

The trap here is that candidates often confuse 'SQL-like queries' with traditional relational databases and pick Amazon Redshift, overlooking that Kinesis Data Analytics provides SQL-on-streaming capabilities specifically designed for real-time, semi-structured data without the batch-oriented latency of data warehouses.

How to eliminate wrong answers

Option A is wrong because Amazon S3 is an object store optimized for batch storage and retrieval, not for real-time streaming ingestion or SQL querying with low latency; querying JSON in S3 via Athena or S3 Select incurs seconds of overhead and does not support continuous, sub-second analytics. Option B is wrong because Amazon Redshift is a data warehouse designed for complex analytical queries on large, structured datasets, not for high-speed, real-time ingestion of semi-structured JSON; loading JSON into Redshift requires batch COPY commands or streaming via Kinesis Firehose with transformation, adding latency and complexity. Option C is wrong because Amazon DynamoDB is a NoSQL key-value and document database that stores JSON documents but offers only PartiQL, a SQL-compatible language limited to item-level statements with no joins or aggregations; it is built for low-latency lookups, not for running analytical SQL over incoming data.

Practice this question →

209

MCQhard

A data engineer is designing a data lake on Amazon S3. The data is ingested from multiple sources in Parquet format, partitioned by date. The engineer needs to ensure that queries using Amazon Athena are cost-effective and perform well. Which approach should the engineer take?

A.Store data in uncompressed CSV format and partition by year, month, day, hour.

B.Use JSON format with Snappy compression and partition by date only.

C.Use Gzip-compressed CSV files with no partitioning.

D.Use Parquet format with Snappy compression and partition by year, month, day.

AnswerD

Parquet is columnar, reducing I/O, and partitioning limits data scanned.

Why this answer

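Option D is correct because partitioning and columnar storage both reduce the data Athena scans, and Athena bills by data scanned. Option A increases cost because uncompressed CSV must be read in full and hourly partitions produce many small files. Option B is less efficient than Parquet because JSON is row-oriented.

Option C compresses but doesn't partition, so every query scans the whole dataset.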
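
A small sketch of the Hive-style key layout that year/month/day partitioning usually implies. The table root and the helper name are hypothetical; the point is only that each date maps to its own prefix, which Athena can skip when a query filters on the partition columns.

```python
from datetime import date


def partition_prefix(table_root: str, d: date) -> str:
    """Build a Hive-style partition prefix (year=/month=/day=) for one date."""
    return f"{table_root}/year={d.year}/month={d.month:02d}/day={d.day:02d}/"


# Hypothetical table location inside the data lake bucket.
prefix = partition_prefix("s3://my-data-lake/events", date(2024, 3, 7))
print(prefix)
```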

Practice this question →

210

MCQmedium

A data engineer runs the AWS CLI command to retrieve the lifecycle configuration of the 'my-data-lake' bucket. The output is shown in the exhibit. What is the effect of this lifecycle policy?

A.Objects in the 'logs/' prefix are deleted after 365 days and their delete markers are removed.

B.All objects in the bucket are moved to STANDARD_IA after 30 days.

C.Objects in the 'logs/' prefix are moved to S3 Standard-IA after 30 days, to Glacier after 90 days, and deleted after 365 days.

D.Objects in the 'logs/' prefix are moved to Glacier after 90 days and expired after 90 days.

AnswerC

Matches the transitions and expiration.

Why this answer

Option C is correct because the lifecycle policy explicitly applies to the 'logs/' prefix, transitioning objects to S3 Standard-IA after 30 days, then to Glacier after 90 days, and finally expiring (deleting) them after 365 days. The 'Expiration' action with 'Days: 365' permanently removes the objects, while the 'Transitions' define the storage class changes at the specified intervals.

Exam trap

The trap here is that candidates often overlook the prefix filter and assume the policy applies to the entire bucket, or they misread the expiration as occurring at 90 days instead of 365 days, leading to incorrect answers like B or D.

How to eliminate wrong answers

Option A is wrong because the lifecycle policy does not include any action to remove delete markers; the 'Expiration' action simply deletes the objects after 365 days, and delete marker removal would require a separate 'ExpiredObjectDeleteMarker' setting. Option B is wrong because the policy applies only to objects under the 'logs/' prefix, not to all objects in the bucket, and it leaves out the Glacier transition and the expiration. Option D is wrong because it omits the initial transition to S3 Standard-IA after 30 days and incorrectly states that objects are expired after 90 days, whereas the actual expiration is after 365 days.
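
The exhibit itself is not reproduced here, so the following is only a sketch of a configuration consistent with the explanation above. The rule ID and the helper `state_after` are hypothetical, and the sketch assumes objects start in S3 Standard.

```python
# Hypothetical reconstruction; the real exhibit is not shown in this document.
LIFECYCLE = {
    "Rules": [
        {
            "ID": "logs-tiering",  # assumed rule name
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}


def state_after(key: str, age_days: int, config: dict = LIFECYCLE) -> str:
    """Return the storage class (or 'EXPIRED') the rules imply for an object."""
    for rule in config["Rules"]:
        if rule["Status"] != "Enabled":
            continue
        if not key.startswith(rule["Filter"]["Prefix"]):
            continue  # the prefix filter limits the rule to logs/
        if age_days >= rule["Expiration"]["Days"]:
            return "EXPIRED"
        state = "STANDARD"
        for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
            if age_days >= transition["Days"]:
                state = transition["StorageClass"]
        return state
    return "STANDARD"  # no rule matched, so the object is left alone


if __name__ == "__main__":
    for age in (10, 45, 120, 400):
        print(age, state_after("logs/app.log", age))
    print(400, state_after("data/table.parquet", 400))
```

Objects outside the 'logs/' prefix never match the rule, which is why option B's "all objects" wording fails.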

211

MCQhard

A data engineering team uses Amazon Redshift for analytics. They notice that queries on a large fact table are slow. The table is distributed using DISTSTYLE ALL. Which design change would most likely improve query performance?

A.Change DISTSTYLE to EVEN to distribute rows evenly across slices.

B.Increase the number of nodes in the Redshift cluster.

C.Change the table to use a SORTKEY on the most frequently filtered column.

D.Change DISTSTYLE to KEY on a column used in frequent joins.

AnswerD

KEY distribution collocates rows on the same node, reducing data movement during joins.

Why this answer

DISTSTYLE ALL copies the entire table to every node, which is inefficient for large fact tables because it wastes storage and network bandwidth during data loading and query execution. Changing to DISTSTYLE KEY on a column used in frequent joins collocates related rows on the same slice, reducing the need to broadcast or redistribute data across the network during joins, which directly improves query performance.

Exam trap

The trap here is that candidates often assume adding a SORTKEY (Option C) is the universal performance fix, but for large fact tables the dominant bottleneck is data distribution and join collocation, not scan efficiency.

How to eliminate wrong answers

Option A is wrong because DISTSTYLE EVEN distributes rows randomly across slices, which can still cause significant data movement during joins and does not leverage join key locality, often leading to slower queries on large fact tables. Option B is wrong because simply increasing the number of nodes adds more slices and parallelism but does not address the root cause of inefficient data distribution; it may even worsen the overhead of broadcasting the ALL-distributed table. Option C is wrong because adding a SORTKEY improves the efficiency of range-restricted scans and ORDER BY operations, but it does not reduce the network shuffling required during joins, which is the primary bottleneck for a large fact table with DISTSTYLE ALL.

212

MCQmedium

Refer to the exhibit. A data engineer runs the above CLI command and sees the output. The security team requires that the RDS instance not be accessible from the internet. Which change should the engineer make?

A.Change the storage type to io1 for better performance.

B.Modify the DB instance to set PubliclyAccessible to false.

C.Enable Multi-AZ deployment to improve security.

D.Update the VPC security group to deny inbound traffic from 0.0.0.0/0.

AnswerB

This removes the public IP address.

Why this answer

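Option B is correct because setting PubliclyAccessible to false removes the instance's public IP address. Option A is wrong because the storage type does not affect accessibility. Option C is wrong because Multi-AZ is for high availability, not security. Option D is wrong because a security group can restrict traffic, but the public IP address is still assigned; security groups also support only allow rules, so a deny rule cannot be added.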

213

MCQmedium

A data engineer is designing a data lake on Amazon S3 and needs to ensure that objects are automatically encrypted at rest using server-side encryption with AWS KMS. Which bucket policy statement achieves this?

A.Deny PutObject requests where the x-amz-server-side-encryption header is not set to aws:kms.

B.Deny PutObject requests that do not include the x-amz-server-side-encryption header.

C.Deny PutObject requests where the x-amz-server-side-encryption header is not set to AES256.

D.Allow PutObject requests only if the x-amz-server-side-encryption header is set to AES256.

AnswerA

Enforces SSE-KMS encryption.

Why this answer

Option A is correct because it enforces server-side encryption with AWS KMS (SSE-KMS) by denying any PutObject request that does not include the `x-amz-server-side-encryption` header set to `aws:kms`. This bucket policy ensures that all objects written to the S3 bucket are automatically encrypted at rest using AWS KMS, meeting the requirement for mandatory encryption with a specific key management service.

Exam trap

The trap here is that candidates often confuse the encryption header values (`aws:kms` vs `AES256`) and mistakenly choose an option that enforces SSE-S3 (AES256) instead of SSE-KMS, or they pick a Deny statement that only checks for the presence of the header without validating its specific value.

How to eliminate wrong answers

Option B is wrong because it only denies PutObject requests that omit the `x-amz-server-side-encryption` header; it does not check the header's value, so a request with the header set to `AES256` (SSE-S3) would still be allowed and the requirement for KMS encryption would not be met. Option C is wrong because it denies PutObject requests where the header is not set to `AES256`, which would enforce SSE-S3 instead of SSE-KMS, directly contradicting the requirement for AWS KMS encryption. Option D is wrong because it allows PutObject requests only if the header is set to `AES256`, which again targets SSE-S3, not SSE-KMS, and an Allow statement alone does not block requests that other policies permit, leaving a gap for unencrypted uploads.
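
As a rough illustration of option A, the sketch below writes the Deny statement as a Python dictionary and checks a few requests against it. The bucket name, the `Sid`, and the `is_denied` helper are hypothetical, and the helper is a toy stand-in for IAM policy evaluation, not the real engine. It assumes that a missing header counts as "not equal", which is how negated string condition operators are documented to behave.

```python
BUCKET = "example-data-lake"  # hypothetical bucket name

POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyNonKmsUploads",  # assumed statement ID
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
            "Condition": {
                "StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}
            },
        }
    ],
}


def is_denied(headers: dict, policy: dict = POLICY) -> bool:
    """Toy check of the Deny statement against a PutObject request's headers."""
    for stmt in policy["Statement"]:
        if stmt["Effect"] != "Deny":
            continue
        for cond_key, required in stmt["Condition"]["StringNotEquals"].items():
            header = cond_key.split(":", 1)[1]  # strip the "s3:" prefix
            # A missing header is treated as "not equal", so the Deny applies.
            if headers.get(header) != required:
                return True
    return False


if __name__ == "__main__":
    print(is_denied({}))                                           # no header
    print(is_denied({"x-amz-server-side-encryption": "AES256"}))   # SSE-S3
    print(is_denied({"x-amz-server-side-encryption": "aws:kms"}))  # SSE-KMS
```

Only the request that names `aws:kms` gets past the Deny, which is the behavior option A describes.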

214

MCQmedium

A company is using an Amazon RDS for MySQL database for an e-commerce application. During a sales event, the database experiences high read traffic, causing slow query performance. The company wants to reduce the read load on the primary database without changing the application code. Which solution meets these requirements?

A.Enable Multi-AZ on the RDS instance.

B.Create an Amazon RDS read replica and direct read traffic to it.

C.Increase the instance size of the RDS database.

D.Deploy Amazon ElastiCache to cache query results.

AnswerB

Read replicas handle read-only traffic, reducing load on the primary.

Why this answer

Option B is correct because an RDS read replica offloads read queries from the primary DB without application changes. Option A is wrong because Multi-AZ is for high availability, not read scaling. Option C is wrong because increasing instance size helps but may require downtime and is less efficient. Option D is wrong because ElastiCache requires code changes to cache queries.

215

MCQmedium

A data engineer is troubleshooting a slow-running query on an Amazon Redshift cluster. The query involves joining two large tables. The engineer notices that the query plan shows a large number of distribution and broadcast operations. Which design change would most likely improve query performance?

A.Change the distribution style of both tables to ALL

B.Change the distribution style of both tables to KEY on the join column

C.Change the distribution style of both tables to EVEN

D.Add a sort key on the join column

AnswerB

KEY distribution on the join column ensures matching rows are on the same node, reducing redistribution.

Why this answer

Option B is correct because changing the distribution style of both tables to KEY on the join column ensures that rows with the same join key value are co-located on the same node. This eliminates the need for expensive broadcast or redistribution operations during the join, as Redshift can perform the join locally on each slice without moving data across the network.

Exam trap

The trap here is that candidates often confuse distribution and sort keys, thinking a sort key on the join column will reduce data movement, when in fact only distribution key alignment eliminates broadcast/redistribution operations in the query plan.

How to eliminate wrong answers

Option A is wrong because setting both tables to ALL distribution replicates the entire table to every node, which increases storage and maintenance overhead, and does not address the root cause of excessive data movement during joins; it can also degrade performance for large tables due to increased load and memory pressure. Option C is wrong because EVEN distribution distributes rows round-robin across nodes, which does not co-locate join keys and forces Redshift to redistribute or broadcast rows during the join, exacerbating the problem. Option D is wrong because adding a sort key on the join column improves the efficiency of range-restricted scans and merge joins but does not reduce the number of distribution or broadcast operations; the query plan's large number of such operations indicates a distribution mismatch, not a sorting issue.
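
The co-location argument can be illustrated with a toy model. This is only a sketch: the slice count, the sample tables, and the use of `key % N_SLICES` in place of Redshift's internal hash are all assumptions made for illustration.

```python
N_SLICES = 4  # assumed slice count for the toy model


def distribute_even(rows):
    """Round-robin placement, standing in for DISTSTYLE EVEN."""
    return {row_id: i % N_SLICES for i, (row_id, _key) in enumerate(rows)}


def distribute_key(rows):
    """Placement by join key, standing in for DISTSTYLE KEY."""
    return {row_id: key % N_SLICES for row_id, key in rows}


def colocated_fraction(left, right, place):
    """Fraction of matching join pairs whose two rows sit on the same slice."""
    left_slices, right_slices = place(left), place(right)
    pairs = [(l, r) for l, lk in left for r, rk in right if lk == rk]
    same = sum(1 for l, r in pairs if left_slices[l] == right_slices[r])
    return same / len(pairs)


# Hypothetical tables of (row_id, customer_id) joined on customer_id.
orders = [(f"o{i}", i % 10) for i in range(40)]
customers = [(f"c{k}", k) for k in range(10)]

if __name__ == "__main__":
    print("EVEN:", colocated_fraction(orders, customers, distribute_even))
    print("KEY: ", colocated_fraction(orders, customers, distribute_key))
```

With KEY placement every matching pair lands on the same slice, so the join needs no data movement in this model. With round-robin placement some pairs are split across slices and would have to be redistributed or broadcast.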

216

Matchingmedium

Match each AWS Glue component to its role.

Concepts

Matches

Scans data sources and populates catalog

Central metadata repository

Transform and load data

Orchestrates multiple jobs and crawlers

Interactive development environment

Why these pairings

Glue components work together for ETL.

217

MCQmedium

A company uses Amazon S3 to store historical stock market data as CSV files. They run daily Amazon Athena queries to generate reports. Recently, the finance team reported that queries are timing out and costs have increased significantly. The data engineering team notices that the S3 bucket contains thousands of small files (average 100 KB) due to a misconfigured ingestion pipeline. They need to improve query performance and reduce costs without changing the existing reporting schedule. The team has access to AWS Glue and can create new tables. Which solution should they implement?

A.Partition the data by date and create a new Athena table with partitions.

B.Use S3 Select to filter rows within each file before Athena processes them.

C.Increase the Athena query timeout to 30 minutes.

D.Use AWS Glue ETL to read the CSV files, convert them to Parquet, and write them back to S3 in fewer, larger files.

AnswerD

Consolidates small files and uses columnar format to reduce scan size.

Why this answer

Option D is correct because converting the thousands of small CSV files into fewer, larger Parquet files using AWS Glue ETL directly addresses the root cause of poor Athena performance and high costs. Parquet is a columnar format that reduces the amount of data scanned per query, and larger files minimize the overhead of S3 LIST and GET operations, improving throughput. This solution does not change the reporting schedule and leverages existing Glue capabilities to create new optimized tables.

Exam trap

The trap here is that candidates often assume partitioning (Option A) is a universal performance fix, but they overlook that partitioning does not address the 'small files problem' which is a distinct performance killer in Athena due to S3 request overhead and file open costs.

How to eliminate wrong answers

Option A is wrong because partitioning by date does not solve the problem of thousands of tiny files; while partitioning can help prune scanned data, the overhead of reading many small files per partition still causes high latency and cost due to excessive S3 API calls. Option B is wrong because S3 Select operates at the object level to filter rows within a single file, but it does not consolidate files or change the file format; Athena would still need to process thousands of small files, and S3 Select cannot be used directly within Athena queries to replace table scans. Option C is wrong because increasing the query timeout does not reduce the amount of data scanned or the number of S3 requests; it merely allows the query to run longer without addressing the performance bottleneck or cost issue.
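
A quick back-of-the-envelope calculation shows why file count matters. The scenario gives only the 100 KB average file size; the 1 GiB dataset size and the 128 MiB target file size below are assumptions chosen for illustration.

```python
import math

KB, MB, GB = 1024, 1024**2, 1024**3


def object_count(total_bytes: int, file_bytes: int) -> int:
    """Number of files, and so S3 GET requests, needed to read the dataset once."""
    return math.ceil(total_bytes / file_bytes)


dataset = 1 * GB                          # assumed dataset size
small = object_count(dataset, 100 * KB)   # average file size from the scenario
large = object_count(dataset, 128 * MB)   # assumed file size after compaction

if __name__ == "__main__":
    print(f"100 KB files:  {small} objects")
    print(f"128 MiB files: {large} objects")
```

Under these assumptions the same data drops from roughly ten thousand objects to single digits, before any savings from the columnar format are counted.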

218

MCQmedium

A company is migrating an on-premises Hadoop cluster to AWS. The cluster processes large files in CSV format using Apache Spark. Which data store should be used as the primary storage for the data lake to optimize cost and performance?

A.Amazon EMR File System (EMRFS) backed by HDFS

B.Amazon RDS for MySQL

C.Amazon EBS volumes attached to the EMR cluster

D.Amazon S3

AnswerD

S3 provides unlimited storage, high durability, and integrates with EMR via EMRFS.

Why this answer

Option D is correct because Amazon S3 is the best choice for data lake storage due to its durability, scalability, and cost-effectiveness. Option A is wrong because EMRFS is a connector for S3, not a separate storage. Option B is wrong because RDS is relational and not designed for large file storage. Option C is wrong because EBS is block storage for EC2 instances, not suitable for large-scale data lakes.

219

Multi-Selecthard

Which THREE factors should a data engineer consider when choosing between Amazon RDS and Amazon DynamoDB for a new application? (Choose three.)

Select 3 answers

A.Whether the workload requires serverless scaling.

B.Whether the data model is relational or key-value.

C.Whether the data must be encrypted at rest by default.

D.Whether the application requires VPC isolation.

E.Whether the application needs to scale horizontally for high throughput.

AnswersA, B, E

DynamoDB is serverless; RDS requires manual scaling.

Why this answer

Options A, B, and E are correct. A: DynamoDB is serverless; RDS requires provisioning. B: relational vs. key-value (NoSQL) data model. E: DynamoDB scales horizontally; RDS scales vertically. C is not a primary factor. D is irrelevant because both can use VPC.

220

MCQeasy

A data engineer needs to set up a new Amazon RDS for MySQL database for a web application. The application experiences variable read traffic and requires low read latency. The engineer needs to minimize downtime during maintenance and provide read scalability. Which configuration meets these requirements?

A.Multi-AZ db.r5.large instance with two Read Replicas

B.Multi-AZ db.r5.large instance

C.Single-AZ db.r5.large instance

D.Single-AZ db.r5.xlarge instance

AnswerA

Multi-AZ provides failover, and Read Replicas provide read scalability.

Why this answer

Option A is correct because a Multi-AZ deployment provides high availability and automatic failover to minimize downtime during maintenance, while adding two Read Replicas offloads read traffic from the primary instance, reducing read latency and enabling read scalability. The db.r5.large instance size is sufficient for the variable read workload, and Read Replicas can be promoted to standalone instances if needed.

Exam trap

The trap here is that candidates often assume Multi-AZ alone provides read scalability, but Multi-AZ only provides high availability and failover, not read offloading—Read Replicas are required for read scaling.

How to eliminate wrong answers

Option B is wrong because a Multi-AZ instance alone provides high availability and failover but does not offer read scalability or reduce read latency for variable read traffic, as all reads still hit the primary instance. Option C is wrong because a Single-AZ instance lacks high availability, meaning any maintenance or failure causes downtime, and it provides no read scalability. Option D is wrong because a Single-AZ db.r5.xlarge instance, while larger, still lacks high availability and read scalability; scaling vertically does not address variable read traffic efficiently and does not minimize downtime during maintenance.

221

MCQhard

A data engineer is migrating an on-premises Oracle database to Amazon RDS for Oracle. The database is 5 TB in size and has a 1 Gbps network connection. The migration must be completed within 48 hours. Which service should be used?

A.AWS DataSync.

B.Amazon S3 Transfer Acceleration.

C.AWS Snowball Edge.

D.AWS Database Migration Service (DMS).

AnswerD

Online migration over network, capable of migrating 5 TB within 48 hours.

Why this answer

AWS DMS is the correct choice because it is designed for migrating databases to AWS with minimal downtime, and it can handle a 5 TB Oracle database over a 1 Gbps network within 48 hours. DMS supports ongoing replication to keep the source and target in sync, and it can use Oracle-specific features like supplemental logging and change data capture (CDC) to reduce migration time. The 1 Gbps connection provides sufficient bandwidth to transfer 5 TB in under 12 hours at full utilization, leaving ample time for setup and validation.

Exam trap

The trap here is that candidates might choose Snowball Edge (option C) thinking 5 TB is too large for a 1 Gbps connection within 48 hours, but they overlook that the bandwidth is sufficient (5 TB at 1 Gbps takes ~11 hours), making an online migration via DMS the correct and more practical choice.

How to eliminate wrong answers

Option A is wrong because AWS DataSync is designed for moving large volumes of file data (e.g., NFS, SMB) to Amazon S3 or EFS, not for database migrations; it cannot handle Oracle-specific schema, stored procedures, or ongoing replication. Option B is wrong because Amazon S3 Transfer Acceleration is a feature for speeding up uploads to S3 buckets over long distances using AWS edge locations, but it does not migrate databases or support Oracle database engines. Option C is wrong because AWS Snowball Edge is a physical device for offline data transfer when network bandwidth is insufficient (e.g., less than 1 Gbps or limited time), but here the 1 Gbps connection is adequate to transfer 5 TB within 48 hours, making an online migration via DMS more efficient and less complex.
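
The bandwidth estimate can be checked with a short calculation. This sketch assumes decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits per second) and ignores protocol overhead; the `utilization` parameter is a hypothetical knob for modelling a link that is not fully available.

```python
def transfer_hours(size_tb: float, link_gbps: float, utilization: float = 1.0) -> float:
    """Hours to move size_tb terabytes over a link_gbps link (decimal units)."""
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 3600


if __name__ == "__main__":
    print(f"full link: {transfer_hours(5, 1):.1f} h")
    print(f"half link: {transfer_hours(5, 1, utilization=0.5):.1f} h")
```

At full utilization the transfer takes about 11 hours, and even at half utilization it stays well inside the 48-hour window, which is why an offline device is not needed here.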

222

MCQhard

A data engineer is designing a data warehouse using Amazon Redshift. The workload includes complex queries that join large tables. The engineer notices that queries are slow due to disk-based operations. Which configuration change would MOST improve query performance?

A.Define appropriate sort keys on the large tables.

B.Increase the number of slices per node by choosing a different node type.

C.Choose an appropriate distribution style (e.g., KEY or ALL) for the tables.

D.Enable compression on all columns.

AnswerC

Proper distribution minimizes data movement across nodes, reducing disk I/O for joins.

Why this answer

Option C is correct because choosing an appropriate distribution style (KEY or ALL) minimizes data movement between nodes during query execution. In Amazon Redshift, disk-based operations often result from large volumes of data being redistributed across the network for joins. By colocating related data on the same slices via KEY distribution or replicating small tables with ALL distribution, you reduce the need for broadcast or redistribution, which directly alleviates disk-based spills and improves query performance.

Exam trap

The trap here is that candidates often confuse sort keys (which improve scan efficiency) with distribution keys (which reduce data movement), leading them to choose sort keys when the real bottleneck is disk-based operations from join-related data shuffling.

How to eliminate wrong answers

Option A is wrong because sort keys primarily optimize data skipping and range-restricted scans, not the data movement or disk spills caused by large joins. Option B is wrong because increasing the number of slices per node (by choosing a different node type) does not inherently reduce disk-based operations; it may even increase network shuffling if distribution is not optimized, and the bottleneck is often data redistribution, not slice count. Option D is wrong because compression reduces storage size and I/O for scans, but it does not address the root cause of disk-based operations during joins, which is excessive data movement and intermediate result spills.

223

MCQhard

An IAM policy is attached to a user who tries to upload an object to the S3 bucket example-bucket using the AWS CLI without specifying the --server-side-encryption flag. What will happen?

A.The upload fails with an AccessDenied error.

B.The upload succeeds and the object is encrypted with SSE-S3 by default.

C.The upload fails because the user does not have permission to use KMS.

D.The upload succeeds because the policy allows s3:PutObject.

AnswerA

The condition is not satisfied, so the upload is denied.

Why this answer

The IAM policy denies s3:PutObject unless the request includes the `x-amz-server-side-encryption` header with a value of `AES256`. Since the user did not specify `--server-side-encryption` in the AWS CLI command, the request lacks this required header, causing S3 to evaluate the policy and return an AccessDenied error. The upload fails before any default encryption setting on the bucket is applied.

Exam trap

AWS often tests the misconception that bucket default encryption automatically satisfies an IAM policy requiring encryption headers, but in reality, the policy condition is evaluated first and the request is denied if the header is missing, regardless of the bucket's default encryption setting.

How to eliminate wrong answers

Option B is wrong because the bucket's default encryption (SSE-S3) only applies when the PutObject request does not include an encryption header and the policy does not explicitly require one; here the policy requires the header, so the request is denied before default encryption can take effect. Option C is wrong because the policy does not mention KMS at all; the error is due to the missing `x-amz-server-side-encryption` header, not any KMS permission issue. Option D is wrong because the policy condition `s3:x-amz-server-side-encryption` is not satisfied, so the `s3:PutObject` action is effectively denied despite the user having the action allowed in the policy.

224

MCQeasy

A data engineer is designing a data lake on Amazon S3. The data is ingested from multiple sources and needs to be partitioned by year, month, day, and event type for efficient querying with Amazon Athena. Which S3 key prefix structure is most appropriate?

A.s3://bucket/events/2024-01-01/event_type=data.parquet

B.s3://bucket/2024/01/01/event_type/events/data.parquet

C.s3://bucket/event_type=events/year=2024/month=01/day=01/data.parquet

D.s3://bucket/event_type=events/year=2024/month=01/day=01/data.parquet

AnswerC

This uses Hive-style partitioning with partition column names, which Athena supports.

Why this answer

Option C uses Hive-style partitioning (event_type=events/year=2024/month=01/day=01), which Athena and other query engines natively support. This structure allows Athena to perform partition pruning, reading only the relevant directories based on WHERE clause filters, significantly reducing data scanned and improving query performance.

Exam trap

AWS often tests the distinction between Hive-style partitioning (key=value) and flat or date-only prefixes, where candidates mistakenly choose a structure that does not support partition pruning or is incompatible with Athena's partition discovery.

How to eliminate wrong answers

Option A is wrong because it embeds the date as a single prefix (2024-01-01) and puts event_type in the file name, which does not create separate partition directories; Athena cannot prune partitions efficiently without explicit partition columns. Option B is wrong because its path segments (2024/01/01/event_type/events) are not in key=value form, so the partition columns are not named in the path and Athena cannot discover the partitions automatically. Option D is identical to C as printed and is therefore equally valid; the answer key lists C (the first occurrence).
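
A minimal sketch of how an ingestion job might build such keys is shown below. The function name `partition_key` and its parameters are hypothetical; only the key=value layout comes from the answer above.

```python
from datetime import date


def partition_key(event_type: str, day: date, filename: str) -> str:
    """Build a Hive-style S3 key: one key=value path segment per partition column."""
    return (
        f"event_type={event_type}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
        f"{filename}"
    )


if __name__ == "__main__":
    print(partition_key("events", date(2024, 1, 1), "data.parquet"))
```

Zero-padding the month and day keeps the prefixes sorted lexicographically and matches the `month=01/day=01` form used in option C.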

225

Multi-Selecthard

A company is using Amazon S3 for a data lake. The data engineer needs to ensure that all new objects are automatically encrypted with a customer-managed KMS key and that the bucket policy enforces encryption. Which THREE steps should be taken? (Choose THREE.)

Select 3 answers

A.Configure default encryption on the bucket to use SSE-KMS.

B.Add a bucket policy that denies PutObject if the x-amz-server-side-encryption header is not set to aws:kms.

C.Create a customer-managed KMS key.

D.Use a lifecycle policy to apply encryption to existing objects.

E.Enable AWS CloudTrail to monitor encryption.

AnswersA, B, C

Ensures new objects are encrypted with SSE-KMS by default.

Why this answer

Option A is correct because configuring default encryption on the S3 bucket to use SSE-KMS ensures that any object uploaded without an explicit encryption header is automatically encrypted with the specified customer-managed KMS key. This satisfies the requirement for automatic encryption of all new objects.

Exam trap

The trap here is that candidates often confuse lifecycle policies (which manage object transitions) with encryption enforcement, or they think CloudTrail can enforce encryption rather than just audit it.
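
For step A, the default-encryption setting can be sketched as a plain dictionary. The helper name and the key ARN are placeholders. The structure is intended to match what boto3's `put_bucket_encryption` accepts as its `ServerSideEncryptionConfiguration` argument, but treat it as a sketch and verify it against the current API reference before use.

```python
def default_encryption_config(kms_key_arn: str) -> dict:
    """Build a default-encryption configuration that selects SSE-KMS."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,  # the customer-managed key
                }
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder ARN for illustration only.
    arn = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"
    print(default_encryption_config(arn))
```

Default encryption covers uploads that carry no encryption header, while the bucket policy in step B rejects uploads that ask for anything other than `aws:kms`.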
