
AWS Certified Data Engineer Associate (DEA-C01): Questions 376–450

1786 questions total · 24 pages · All types, answers revealed


Page 6 of 24

376

Multi-Select · easy

A data engineer is designing a serverless data ingestion pipeline that uses Amazon Kinesis Data Firehose to deliver data to Amazon S3. The data must be transformed using AWS Lambda before being written to S3. Which two steps are required to enable this transformation? (Select TWO.)

Select 2 answers

A. Set up an S3 event notification to trigger the Lambda function on object creation.

B. Configure a Lambda function as a data transformation source in the Firehose delivery stream.

C. Ensure the Lambda function returns the transformed data in the format required by Firehose.

D. Subscribe the Lambda function to the CloudWatch Logs log group for the Firehose stream.

E. Have the Lambda function write the transformed data directly to the S3 bucket.

Answers: B, C

This enables Firehose to invoke Lambda for transformation.

Why this answer

Option B is correct because Amazon Kinesis Data Firehose can be configured to invoke a Lambda function for data transformation. Firehose passes incoming records to the function, which processes them and returns the transformed records; Firehose then delivers those records to the S3 destination. Option C is correct because the function must return each record in the format Firehose expects: the original recordId, a result status (Ok, Dropped, or ProcessingFailed), and base64-encoded data. Otherwise, Firehose treats the records as failed transformations.

Exam trap

The trap here is that candidates often confuse post-delivery transformations (using S3 event notifications) with in-stream transformations (using Firehose's built-in Lambda integration), leading them to select Option A instead of the correct Firehose-specific configuration.
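The record contract from Option C can be sketched as a minimal Python handler. It assumes the incoming records are JSON objects. The added `processed` field is a hypothetical transformation for illustration only.

```python
import base64
import json


def lambda_handler(event, context):
    """Minimal Firehose data-transformation handler (illustrative sketch).

    Firehose sends a batch in event["records"]; each record carries a
    "recordId" and base64-encoded "data". Every record must be returned with
    the same recordId, a "result" of "Ok", "Dropped", or "ProcessingFailed",
    and base64-encoded "data".
    """
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["processed"] = True  # hypothetical transformation
            # Newline-delimit records so the S3 objects stay line-oriented.
            data = (json.dumps(payload) + "\n").encode("utf-8")
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(data).decode("utf-8"),
            })
        except (ValueError, TypeError):
            # Malformed input: hand the original data back as failed so
            # Firehose can deliver it to its error output.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Marking a bad record as ProcessingFailed, instead of raising an exception, keeps one malformed record from failing the whole batch.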


377

Multi-Select · medium

A data engineer is designing a data lake on Amazon S3. The data lake must support both batch and streaming ingestion. Which TWO AWS services can ingest data directly into S3? (Choose TWO.)

Select 2 answers

A. Amazon RDS

B. AWS Glue

C. Amazon EMR

D. Amazon DynamoDB

E. Amazon Kinesis Data Firehose

Answers: B, E

AWS Glue can ingest batch data and write to S3.

Why this answer

AWS Glue is correct because Glue ETL jobs can read batch data from many sources and write the processed output directly to S3. Glue crawlers only catalog data; they do not move it. Amazon Kinesis Data Firehose is correct because it is a fully managed service that captures streaming data, optionally transforms it, and loads it directly into S3 without custom code.

Exam trap

The trap here is that candidates often confuse services that process data already in S3 (like EMR) with services built to ingest data into S3. They may also pick RDS or DynamoDB, which are databases rather than ingestion services; their S3 export features copy only their own data.


378

MCQ · hard

A data engineer runs the command shown in the exhibit to check the bucket policy. A user from another AWS account is trying to download an object using HTTP (not HTTPS). What will happen?

A. The download will succeed because the principal is not specified

B. The download will fail with an access denied error

C. The download will succeed if the object is encrypted at rest

D. The download will succeed because the policy only denies write operations

Answer: B

The policy denies access when using HTTP.

Why this answer

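Option B is correct because the bucket policy denies all actions when aws:SecureTransport is false, that is, when the request is made over HTTP. An explicit Deny overrides any Allow, so the cross-account download fails. Option A is wrong because an unspecified or wildcard principal creates no exception; the Deny on HTTP applies to every requester.

Option C is wrong because the policy does not reference encryption at rest. Option D is wrong because the Deny covers all actions, including reads such as s3:GetObject, not only write operations.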


379

MCQ · hard

A company uses Amazon DynamoDB with on-demand capacity. They notice higher than expected costs due to a sudden spike in read traffic from a reporting job. The reporting job scans the entire table daily. What is the most cost-effective way to reduce costs while maintaining the same reporting output?

A. Enable DynamoDB Accelerator (DAX) for caching.

B. Use a Global Secondary Index (GSI) with a sort key that matches the reporting query pattern.

C. Set a TTL attribute to automatically expire old data.

D. Reduce the read capacity units (RCU) in the table.

Answer: B

A GSI allows efficient querying instead of scanning, reducing read costs.

Why this answer

Option B is correct because using a Global Secondary Index (GSI) with a sort key tailored to the reporting query pattern allows the reporting job to query only the relevant items instead of scanning the entire table. This reduces the read units consumed per run, directly lowering costs under on-demand capacity, which bills per read request unit consumed. The reporting output remains identical because the GSI returns the same data filtered by the query pattern.

Exam trap

The trap here is that candidates may confuse DAX as a general cost-saver for all read patterns, but DAX only helps with repeated, cached reads, not with unique full-table scans that read different data each time.

How to eliminate wrong answers

Option A is wrong because DynamoDB Accelerator (DAX) is an in-memory cache that reduces read latency and cost for repeated reads, but the reporting job scans the entire table daily, meaning each scan reads unique data that is not cached from previous runs, so DAX would not reduce costs. Option C is wrong because setting a TTL attribute automatically expires old data after a specified time, which reduces storage costs but does not affect the read cost of the daily scan; the reporting job still scans all remaining items. Option D is wrong because the table uses on-demand capacity, which has no provisioned read capacity units (RCU) to reduce; on-demand capacity scales automatically and bills per read request unit consumed, so reducing RCU is not applicable.
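A rough cost estimate shows why a Query is so much cheaper than a Scan. This is a simplified sketch. It assumes eventually consistent reads, which are the default for Scan and Query, and applies the rule that DynamoDB sums the sizes of the items read and rounds up to the next 4 KB. It ignores per-page rounding. The 10 GiB table and the 50 MiB of matching items are hypothetical figures.

```python
import math


def read_request_units(bytes_read: int, strongly_consistent: bool = False) -> float:
    """Approximate read request units for one Scan or Query.

    Each 4 KB (rounded up) costs one unit for a strongly consistent read
    and half a unit for an eventually consistent read.
    """
    units = math.ceil(bytes_read / 4096)
    return units * (1.0 if strongly_consistent else 0.5)


table_bytes = 10 * 1024**3      # hypothetical 10 GiB table scanned daily
matching_bytes = 50 * 1024**2   # hypothetical 50 MiB returned by a GSI query

print(read_request_units(table_bytes))     # full Scan: 1310720.0
print(read_request_units(matching_bytes))  # targeted Query: 6400.0
```

Under these assumptions, the daily Query uses about 200 times fewer read request units than the Scan. The GSI itself adds write and storage cost, which this sketch ignores.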


380

MCQ · easy

A data engineer notices that an Amazon RDS for PostgreSQL instance's CPU utilization is consistently above 90% during business hours. The database is used for reporting queries. Which action should be taken FIRST to improve performance?

A. Enable Multi-AZ deployment for automatic failover.

B. Enable Performance Insights and review slow queries.

C. Create a read replica to offload reporting queries.

D. Increase the instance size to a larger instance class.

Answer: B

Identifying and optimizing slow queries reduces CPU usage.

Why this answer

Option B is correct because the first step in diagnosing high CPU utilization on an RDS for PostgreSQL instance used for reporting queries is to identify the root cause. Enabling Performance Insights provides a detailed view of database load, wait events, and SQL query performance, allowing the data engineer to pinpoint slow or inefficient queries that are consuming CPU resources. Without this diagnostic data, any other action would be premature and could lead to unnecessary cost or complexity.

Exam trap

The trap here is that candidates often jump to scaling solutions (like increasing instance size or adding a read replica) without first diagnosing the root cause, but AWS emphasizes observability and optimization before capacity changes.

How to eliminate wrong answers

Option A is wrong because enabling Multi-AZ deployment improves availability and failover, not performance; it does not reduce CPU utilization or address query performance issues. Option C is wrong because creating a read replica offloads read traffic but does not fix the underlying inefficient queries that are causing high CPU on the source instance; the replica would also suffer from the same workload if queries are poorly optimized. Option D is wrong because increasing the instance size may temporarily mask the problem by providing more CPU capacity, but it does not resolve the root cause of inefficient queries and incurs higher costs without guaranteeing sustained performance improvement.


381

MCQ · medium

A data pipeline using AWS Glue ETL jobs is failing intermittently with the error 'Rate exceeded' when writing to an Amazon Redshift cluster. Which action is MOST effective to resolve this issue?

A. Increase the timeout of the Glue ETL job to allow more time for retries.

B. Disable workload management (WLM) concurrency scaling in Redshift.

C. Enable auto-tuning on the Redshift cluster and use concurrency scaling.

D. Change the output file format from Parquet to CSV to reduce write size.

Answer: C

Auto-tuning with concurrency scaling dynamically adds capacity to handle increased write requests.

Why this answer

Option C is correct because enabling Redshift auto-tuning with concurrency scaling lets the cluster add capacity automatically to absorb write spikes. Option A is wrong because increasing the Glue job timeout does not address the rate limiting. Option B is wrong because disabling concurrency scaling in WLM would make the issue worse.

Option D is wrong because changing the data format does not affect write rate limits.


382

MCQ · medium

Refer to the exhibit. A data engineer applied this bucket policy to an S3 bucket. What is the effect of this policy?

A. Allows only HTTPS requests to get objects

B. Blocks HTTP requests to get objects

C. Allows only HTTP requests to get objects

D. Blocks all access to the bucket

Answer: B

The Deny effect with condition aws:SecureTransport false blocks HTTP requests.

Why this answer

The policy denies s3:GetObject when the request is made over HTTP rather than HTTPS, which enforces HTTPS for object retrievals. Option A is wrong because a Deny statement grants nothing; HTTPS requests still need a separate Allow to succeed.

Option C is wrong because the policy blocks HTTP requests rather than allowing them. Option D is wrong because the policy does not block all access; it only blocks insecure transport.
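The exhibit is not reproduced here, but a policy of this shape can be sketched as a Python dictionary that serializes to the bucket policy JSON. The bucket name is hypothetical. The `action` parameter defaults to `s3:GetObject` to match this question; a common variant uses `s3:*` to require HTTPS for every request.

```python
import json


def deny_http_policy(bucket: str, action: str = "s3:GetObject") -> dict:
    """Bucket policy that rejects the given S3 action over plain HTTP."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": action,
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            # aws:SecureTransport is "false" when the request used HTTP.
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }


print(json.dumps(deny_http_policy("example-bucket"), indent=2))
```

Because this statement only denies, it grants nothing by itself. HTTPS requests still need an Allow from an IAM policy or another bucket policy statement.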


383

MCQ · easy

A company wants to grant read-only access to an S3 bucket for a data analyst. The analyst should be able to list objects and read object content. Which IAM policy effect and action combination is correct?

A. Effect: Allow, Actions: s3:GetObject, s3:DeleteObject

B. Effect: Allow, Actions: s3:ListAllMyBuckets, s3:GetObject

C. Effect: Allow, Actions: s3:PutObject, s3:GetObject

D. Effect: Allow, Actions: s3:ListBucket, s3:GetObject

Answer: D

Provides read-only access to list and read objects.

Why this answer

Option D is correct because s3:ListBucket allows listing the objects in the bucket, and s3:GetObject allows reading them. Option C is incorrect because s3:PutObject grants write access. Option A is incorrect because s3:DeleteObject grants delete access, which read-only access must exclude.

Option B is incorrect because s3:ListAllMyBuckets lists all buckets in the account, not the contents of a bucket.
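Option D leaves out one detail: the two actions apply to different resources. s3:ListBucket applies to the bucket ARN, and s3:GetObject applies to object ARNs. The following minimal sketch of the identity-based policy uses a hypothetical bucket name.

```python
import json


def read_only_bucket_policy(bucket: str) -> dict:
    """Identity-based policy: list one bucket and read its objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket applies to the bucket ARN itself.
                "Sid": "ListObjects",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # GetObject applies to object ARNs, hence the /* suffix.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


print(json.dumps(read_only_bucket_policy("analytics-data"), indent=2))
```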


384

MCQ · easy

A data engineer needs to store semi-structured JSON data that is accessed infrequently but requires immediate retrieval when needed. The data must be durable and cost-effective. Which Amazon S3 storage class should be used?

A. S3 Standard-IA

B. S3 Glacier

C. S3 Standard

D. S3 One Zone-IA

Answer: A

Standard-IA is for infrequent access with immediate retrieval.

Why this answer

S3 Standard-IA is designed for infrequently accessed data that still needs rapid access when requested. It costs less to store than S3 Standard and keeps data across multiple Availability Zones. Option A is correct. Option C: S3 Standard is designed for frequently accessed data and costs more to store.

Option D: S3 One Zone-IA stores data in a single Availability Zone, so the data is lost if that AZ is destroyed. Option B: S3 Glacier has retrieval delays (minutes to hours).


385

MCQ · easy

A company is using AWS Lake Formation to manage permissions on a data lake. They want to grant a data scientist the ability to query tables in the 'analytics' database using Amazon Athena, but prevent them from accessing the underlying S3 data directly. What is the best way to achieve this?

A. Grant the data scientist an IAM policy with s3:GetObject on the S3 bucket.

B. Grant SELECT permission on the 'analytics' database tables in Lake Formation.

C. Create an IAM policy that allows Athena queries only.

D. Add the data scientist to a Lake Formation data lake location with read access.

Answer: B

Lake Formation fine-grained permissions allow querying via Athena without direct S3 access.

Why this answer

Option B is correct because Lake Formation can grant SELECT permission on the tables. Athena then reads the underlying data with temporary credentials that Lake Formation vends, so the data scientist needs no S3 permissions of their own. Option A is incorrect because granting s3:GetObject on the bucket would allow direct access to the data. Option D is incorrect because data location permissions let a principal create catalog resources that point to an S3 location; they do not grant access to query tables.

Option C is incorrect because an IAM policy that allows Athena queries neither grants access to the tables nor restricts direct S3 access.
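The grant in Option B can be sketched as the request for the Lake Formation GrantPermissions API. The role ARN below is hypothetical. `TableWildcard` grants SELECT on every table in the database; naming a specific table instead narrows the grant.

```python
def grant_select_request(principal_arn: str, database: str) -> dict:
    """Keyword arguments for Lake Formation GrantPermissions (SELECT on all tables)."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        # An empty TableWildcard matches every table in the database.
        "Resource": {"Table": {"DatabaseName": database, "TableWildcard": {}}},
        "Permissions": ["SELECT"],
    }


# Hypothetical data scientist role in a placeholder account.
request = grant_select_request(
    "arn:aws:iam::111122223333:role/DataScientist", "analytics")
print(request)
```

With boto3, the dictionary would be passed as `boto3.client("lakeformation").grant_permissions(**request)`. The data scientist's IAM policy still needs Athena and Glue Data Catalog permissions plus `lakeformation:GetDataAccess`, but no `s3:GetObject` on the bucket.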


386

MCQ · hard

A company uses a DynamoDB table with on-demand capacity for a gaming application. During a new game launch, the table experienced throttling errors. The engineer checks CloudWatch metrics and sees that the 'ConsumedWriteCapacityUnits' exceeded the 'ProvisionedWriteCapacityUnits' (on-demand uses the table's previous peak). The application is writing at 50,000 WCU but the table's peak was 30,000 WCU. What should the engineer do to resolve throttling?

A. Add a DynamoDB Accelerator (DAX) cluster in front of the table.

B. Increase the number of partitions by splitting the partition key.

C. Contact AWS Support to pre-warm the table for higher throughput.

D. Switch the table to provisioned capacity and set WCU to 50,000.

Answer: C

Pre-warming increases the table's initial throughput limit to handle spikes.

Why this answer

Option C is correct because on-demand capacity scales automatically from the table's previous peak but needs time to ramp up, so a sudden jump far beyond that peak can be throttled. Pre-warming the table raises its initial throughput limit ahead of the launch. Option D is wrong because it abandons on-demand mode instead of addressing its scaling limit.

Option B is wrong because increasing the partition count does not directly increase the throughput limit. Option A is wrong because DAX improves read performance, not write throughput.


387

Multi-Select · medium

A company uses Amazon RDS for MySQL with Multi-AZ deployment. The primary instance fails, and automatic failover occurs. After failover, the data engineer notices that the new primary instance has a different DNS endpoint. Which TWO statements are true about this scenario? (Choose TWO.)

Select 2 answers

A. The standby instance is created in the same Availability Zone as the failed primary.

B. A manual DNS update is required to connect to the new primary.

C. The DNS CNAME record is updated to point to the new primary.

D. The endpoint changes to the standby instance's endpoint.

E. The applications can continue using the same database endpoint.

Answers: C, E

RDS updates the CNAME automatically.

Why this answer

Option C is correct because when Amazon RDS performs automatic failover in a Multi-AZ deployment, it updates the DNS CNAME record for the DB instance endpoint to point to the new primary (formerly the standby). Option E is correct for the same reason: the endpoint name stays the same, so applications using it are redirected to the new primary without manual intervention. Option A is wrong because a Multi-AZ standby runs in a different Availability Zone from the primary.

Exam trap

The trap here is that candidates may think the endpoint changes to the standby's endpoint (Option D) or that a manual DNS update is needed (Option B), when in fact the original endpoint remains the same and RDS handles the DNS update automatically via CNAME.


388

Multi-Select · hard

A company uses Amazon Kinesis Data Firehose to deliver data to an S3 bucket. The data contains personally identifiable information (PII) that must be redacted before storage. Which THREE actions can achieve this requirement? (Choose THREE.)

Select 3 answers

A. Use AWS Glue ETL to read from the S3 bucket and write redacted data to another S3 bucket.

B. Use Amazon Athena to query the data and redact PII on the fly.

C. Use Amazon Macie to discover and automatically redact PII before storage.

D. Use AWS Database Migration Service (AWS DMS) to replicate data and apply transformations.

E. Use an AWS Lambda function as a transformation in the Firehose delivery stream.

Answers: A, C, E

Glue can process and redact PII in a batch job after data is in S3.

Why this answer

Option A is correct because AWS Glue ETL can read data from the source S3 bucket, apply transformations (including redacting PII fields using custom scripts or built-in transforms), and write the cleaned data to a target S3 bucket. This decouples the redaction from the ingestion pipeline and allows for complex, schema-aware transformations using Apache Spark or Python.

Exam trap

The trap here is that candidates may assume Macie can automatically redact PII during ingestion, but Macie is a discovery and classification service, not a data transformation service; it cannot modify data in flight or in place without additional automation.
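Option E redacts records in flight, before Firehose writes them to S3. The following minimal sketch assumes JSON records. The field names in `PII_FIELDS` and the SSN-style regular expression are hypothetical and would need to match the real schema.

```python
import base64
import json
import re

# Hypothetical schema details: adjust to the fields your records contain.
PII_FIELDS = {"email", "phone", "ssn"}
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(value):
    """Recursively mask PII fields and SSN-like strings in parsed JSON."""
    if isinstance(value, dict):
        return {k: "[REDACTED]" if k in PII_FIELDS else redact(v)
                for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    if isinstance(value, str):
        return SSN_PATTERN.sub("[REDACTED]", value)
    return value


def lambda_handler(event, context):
    """Firehose transformation that redacts each record before delivery."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
        except ValueError:
            # Firehose writes ProcessingFailed records unchanged to the
            # bucket's error prefix, so that prefix needs protection too.
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue
        data = (json.dumps(redact(payload)) + "\n").encode("utf-8")
        output.append({"recordId": record["recordId"],
                       "result": "Ok",
                       "data": base64.b64encode(data).decode("utf-8")})
    return {"records": output}
```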


389

MCQ · hard

Refer to the exhibit. An IAM policy is attached to an IAM role used by an application. The application needs to decrypt objects in an S3 bucket using a customer managed KMS key. What is the effect of this policy?

A. The application cannot perform any KMS operations.

B. The application can decrypt objects from any service.

C. The application can decrypt objects only when accessing them through S3.

D. The application can encrypt but not decrypt objects.

Answer: C

The kms:ViaService condition allows decryption only when the request comes through S3.

Why this answer

The IAM policy grants the `kms:Decrypt` permission with a `kms:ViaService` condition key set to `s3.amazonaws.com`. This condition restricts the decryption operation to only when the request is made through the S3 service. Therefore, the application can decrypt objects only when accessing them through S3, not via direct KMS API calls or other services.

Exam trap

AWS often tests the `kms:ViaService` condition key to trap candidates who assume that granting `kms:Decrypt` alone allows decryption from any source, ignoring the service-specific restriction.

How to eliminate wrong answers

Option A is wrong because the policy explicitly allows `kms:Decrypt` under the condition, so the application can perform KMS decryption operations when invoked via S3. Option B is wrong because the `kms:ViaService` condition restricts decryption to S3 only, preventing decryption from any other service or direct KMS API calls. Option D is wrong because the policy grants `kms:Decrypt` permission, not `kms:Encrypt`, so the application can decrypt but not encrypt objects.
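The exhibit is not reproduced here, but a policy statement of the kind described can be sketched as follows. The key ARN and Region are hypothetical. In AWS-documented examples, the kms:ViaService value names a Regional service endpoint, such as `s3.us-east-1.amazonaws.com`.

```python
import json


def decrypt_via_s3_policy(key_arn: str, region: str) -> dict:
    """IAM policy allowing kms:Decrypt only when S3 calls KMS for the caller."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "kms:Decrypt",
            "Resource": key_arn,
            "Condition": {
                # kms:ViaService matches the Regional endpoint of the
                # AWS service that made the request on the caller's behalf.
                "StringEquals": {"kms:ViaService": f"s3.{region}.amazonaws.com"}
            },
        }],
    }


# Hypothetical key in a placeholder account.
print(json.dumps(decrypt_via_s3_policy(
    "arn:aws:kms:us-east-1:111122223333:key/example-key-id", "us-east-1"),
    indent=2))
```

A direct Decrypt call from the application carries no kms:ViaService value, so the condition does not match and this statement does not allow the request.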


390

MCQ · medium

Your team uses Amazon Kinesis Data Analytics to process real-time streaming data from an Amazon Kinesis Data Stream. The application calculates windowed aggregations and writes results to an Amazon S3 bucket using a delivery stream. Recently, the application has been failing with a 'LimitExceededException' when writing to the delivery stream. You have checked the CloudWatch metrics and see that the IncomingBytes and IncomingRecords for the delivery stream are well below the provisioned limits. The delivery stream has a buffer size of 5 MB and a buffer interval of 60 seconds. The application generates about 500 records per second, each about 1 KB. What is the most likely cause and correct action?

A. Increase the number of shards in the Kinesis Data Stream to reduce the load on the application.

B. Modify the application to use PutRecordBatch with smaller batch sizes to stay within the 4 MB per-call limit.

C. Reduce the buffer interval of the delivery stream to 30 seconds to flush data more frequently.

D. Increase the buffer size of the delivery stream to 10 MB to accommodate larger writes.

Answer: B

The LimitExceededException on Firehose is often due to exceeding the 4 MB per PutRecordBatch call. Reducing batch size fixes it.

Why this answer

Option B is correct because Kinesis Data Analytics writes to Firehose through PutRecord or PutRecordBatch calls, and each call has limits. PutRecord accepts one record of up to 1,000 KiB. PutRecordBatch accepts up to 500 records and 4 MB per call. If the application uses PutRecordBatch and the total payload exceeds 4 MB, it gets a LimitExceededException.

Increasing the buffer size or interval does not affect the per-call limit. Option A is wrong because the stream is not the source of the error. Option C is wrong because the buffer settings are not causing the per-call limit.

Option D is wrong because the data size is small.
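Splitting records on the client side keeps each PutRecordBatch call within its limits. The following is a minimal sketch. The constants reflect the Firehose quotas described above (500 records and 4 MiB per call, 1,000 KiB per record), and the sample payloads are hypothetical.

```python
from typing import Iterable, Iterator, List

MAX_RECORDS_PER_CALL = 500
MAX_BYTES_PER_CALL = 4 * 1024 * 1024   # 4 MiB per PutRecordBatch request
MAX_BYTES_PER_RECORD = 1000 * 1024      # 1,000 KiB per record


def batches_for_put_record_batch(records: Iterable[bytes]) -> Iterator[List[bytes]]:
    """Group payloads so no batch exceeds the record-count or size limit."""
    batch: List[bytes] = []
    batch_bytes = 0
    for data in records:
        if len(data) > MAX_BYTES_PER_RECORD:
            raise ValueError("record exceeds the 1,000 KiB Firehose limit")
        # Start a new batch when adding this record would break a limit.
        if batch and (len(batch) == MAX_RECORDS_PER_CALL
                      or batch_bytes + len(data) > MAX_BYTES_PER_CALL):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(data)
        batch_bytes += len(data)
    if batch:
        yield batch


sizes = [len(b) for b in batches_for_put_record_batch([b"x" * 1024] * 1200)]
print(sizes)  # [500, 500, 200]
```

Each batch would then be sent as `Records=[{"Data": d} for d in batch]` in a `put_record_batch` call. The response's `FailedPutCount` should be checked so that individually failed records can be retried.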


391

MCQ · easy

A data engineer is using AWS Glue to run an ETL job that reads data from Amazon DynamoDB and writes to Amazon Redshift. The job fails with a 'ThroughputExceededException' error. What is the most likely cause?

A. The Glue job has a timeout setting that is too low

B. The Redshift cluster's concurrency scaling is insufficient

C. The DynamoDB table's read capacity is insufficient for the Glue job's read rate

D. The S3 bucket where Glue writes temporary data does not have proper permissions

Answer: C

Glue reads from DynamoDB and may exceed provisioned read capacity, causing throttling.

Why this answer

Option C is correct because DynamoDB throttles requests that exceed the table's read or write capacity, and the Glue job may be reading faster than the table's provisioned read capacity allows. Option B is wrong because Redshift concurrency scaling does not cause this error.

Option A is wrong because a Glue job timeout produces a different error. Option D is wrong because missing S3 permissions produce an access-denied error instead.
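One way to apply the fix is to cap how much of the table's read capacity the Glue job consumes. The following is a minimal sketch of the DynamoDB source options. The table name is hypothetical. According to the AWS Glue connection documentation, `dynamodb.throughput.read.percent` accepts values from 0.1 to 1.5, with a default of 0.5.

```python
def dynamodb_source_options(table_name: str, read_percent: float = 0.5) -> dict:
    """connection_options for an AWS Glue DynamoDB source with a read-rate cap."""
    if not 0.1 <= read_percent <= 1.5:
        raise ValueError("read_percent must be between 0.1 and 1.5")
    return {
        "dynamodb.input.tableName": table_name,
        # Fraction of the table's read capacity the job may consume.
        "dynamodb.throughput.read.percent": str(read_percent),
    }


print(dynamodb_source_options("iot-events", 0.25))
```

In the Glue script, these options would be passed to `glueContext.create_dynamic_frame.from_options(connection_type="dynamodb", connection_options=...)`. A lower percentage means fewer throttled reads but a longer job run. Raising the table's read capacity is the other way to resolve the error.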


392

MCQ · hard

A data pipeline uses AWS Glue ETL to process data from an S3 bucket and write results to a Redshift cluster. The job fails with a 'DiskFull' error on the Glue worker nodes. What is the best way to resolve this issue?

A. Increase the number of Glue DPUs or use G.1X worker type.

B. Decrease the number of partitions in the output.

C. Use a different file format like Parquet to reduce storage.

D. Increase the job timeout setting.

Answer: A

More DPUs or larger workers provide additional disk and memory.

Why this answer

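Option A is correct because adding DPUs spreads the data across more workers, and larger worker types provide more local disk for the shuffle and spill data that fills worker disks (G.1X, or G.2X for even more disk and memory per worker). Option D is wrong because increasing the timeout adds no disk space. Option B is wrong because reducing the output partitions does not reduce the intermediate data held on the workers. Option C is wrong because switching to Parquet might reduce the data size but does not address the root cause.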


393

MCQ · hard

A data engineer is troubleshooting a Kinesis Data Firehose delivery stream that is experiencing high error rates when writing to an S3 bucket. The error logs indicate 'AccessDenied' errors. The S3 bucket policy allows access from the Firehose service, but the errors persist. What is the most likely cause?

A. The S3 bucket has a lifecycle policy that is deleting objects too quickly

B. The IAM role assumed by Firehose does not have the s3:PutObject permission

C. The S3 bucket has default encryption enabled

D. The S3 bucket uses an AWS KMS key for encryption and Firehose does not have kms:Decrypt permission

Answer: B

Firehose requires the IAM role to have S3 write permissions.

Why this answer

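Option B is correct because Firehose writes to S3 by assuming the IAM role configured on the delivery stream. If that role lacks permissions such as s3:PutObject, AccessDenied occurs even when the bucket policy allows the Firehose service. Option C is wrong because default encryption on its own does not block writes. Option D applies only if the bucket uses SSE-KMS, and writing then requires kms:GenerateDataKey rather than kms:Decrypt.

Option A is unrelated because a lifecycle policy does not affect write permissions.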
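The role that the delivery stream assumes needs a trust policy for the Firehose service and S3 permissions on the destination bucket. The following minimal sketch is modeled on the S3 permissions listed in the Firehose documentation. The bucket name is hypothetical. A bucket that uses SSE-KMS would also require KMS permissions, such as kms:GenerateDataKey.

```python
import json


def firehose_trust_policy() -> dict:
    """Trust policy letting Kinesis Data Firehose assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "firehose.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def firehose_s3_permissions(bucket: str) -> dict:
    """Permissions policy for delivering to one S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "s3:AbortMultipartUpload",
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:PutObject",  # the permission missing in this scenario
            ],
            # Bucket-level and object-level ARNs are both required.
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        }],
    }


print(json.dumps(firehose_s3_permissions("example-delivery-bucket"), indent=2))
```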


394

MCQ · easy

A company uses Amazon RDS for MySQL with encryption at rest enabled. The security team requires that all database audit logs be stored in Amazon S3 for at least 7 years. Which AWS service should the data engineer use to collect and store the logs?

A. Amazon S3 with S3 Object Lock enabled for write-once-read-many (WORM) protection.

B. Amazon Kinesis Data Firehose to stream logs directly to Amazon S3.

C. Amazon CloudWatch Logs with a subscription filter to Amazon S3.

D. AWS CloudTrail to capture database queries and store in S3.

Answer: C

RDS audit logs can be sent to CloudWatch Logs, and then exported to S3.

Why this answer

Option C is correct because RDS for MySQL can publish audit logs to CloudWatch Logs, and a subscription filter can forward them through a Kinesis Data Firehose delivery stream to S3 for long-term storage. Log groups can also be exported to S3 with an export task. Option B is wrong because Kinesis Data Firehose has no direct integration with RDS audit logs. Option A is wrong because S3 stores the logs but does not collect them.

Option D is wrong because CloudTrail records API calls made to the RDS service, not database audit logs.
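The path in Option C can be sketched as the request for the CloudWatch Logs PutSubscriptionFilter API. The instance identifier, ARNs, and filter name are hypothetical. The delivery stream would be configured separately to write to S3, and the role must allow CloudWatch Logs to put records into it.

```python
def audit_subscription_request(db_instance_id: str, firehose_arn: str,
                               role_arn: str) -> dict:
    """Keyword arguments for CloudWatch Logs PutSubscriptionFilter."""
    return {
        # RDS publishes MySQL audit logs to this log group name.
        "logGroupName": f"/aws/rds/instance/{db_instance_id}/audit",
        "filterName": "rds-audit-to-firehose",  # hypothetical name
        "filterPattern": "",                    # empty pattern matches every event
        "destinationArn": firehose_arn,
        "roleArn": role_arn,
    }


print(audit_subscription_request(
    "reporting-db",
    "arn:aws:firehose:us-east-1:111122223333:deliverystream/audit-logs",
    "arn:aws:iam::111122223333:role/CWLtoFirehose"))
```

With boto3, the request would be passed to `boto3.client("logs").put_subscription_filter(**request)`. The seven-year retention itself is enforced on the S3 side, for example with lifecycle rules or Object Lock.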


395

MCQ · hard

A company uses Amazon DynamoDB as a session store for a web application. During peak hours, the application experiences high latency and throttling on the DynamoDB table. The table has a read capacity of 5000 RCU and write capacity of 2000 WCU. The application reads and writes session data using the session ID as the partition key. What is the most cost-effective solution to reduce throttling?

A. Enable Auto Scaling on the table to automatically adjust capacity.

B. Increase the read capacity units (RCU) and write capacity units (WCU) to 10000 each.

C. Enable DynamoDB global tables to distribute read traffic.

D. Implement DynamoDB Accelerator (DAX) to cache frequent reads.

Answer: D

DAX reduces read load on the table, mitigating throttling cost-effectively.

Why this answer

Option D is correct because DynamoDB Accelerator (DAX) caches reads, which reduces read throttling cost-effectively. Option B is incorrect because raising RCU and WCU to 10,000 would reduce throttling but at a much higher cost. Option C is incorrect because global tables add complexity and cost without directly solving throttling.

Option A is incorrect because Auto Scaling adjusts capacity based on usage but does not react immediately, and it can increase costs if not tuned.


396

MCQ · medium

Refer to the exhibit. This KMS key policy is attached to a customer managed key. A data engineer finds that the DataEngineer role can encrypt but cannot decrypt data. What is the most likely cause?

A. The key policy does not include kms:Decrypt in the IAM policy section

B. The role does not have an IAM policy allowing kms:Decrypt

C. The key policy does not allow kms:Decrypt

D. The key policy does not allow kms:GenerateDataKey

Answer: B

When the key policy delegates permissions to IAM, the role also needs an IAM policy that allows kms:Decrypt.

Why this answer

The key policy allows kms:Decrypt and kms:GenerateDataKey, so the most likely gap is on the IAM side: the role lacks an IAM policy that allows kms:Decrypt. KMS evaluates the key policy together with IAM policies. IAM policies apply to a key only when its key policy enables them, typically through a statement that grants kms:* to the account root principal.

Option A is wrong because the key policy does allow decrypt. Option C is wrong because the key policy allows it. Option D is wrong because kms:GenerateDataKey is needed for encryption, which already works.


397

MCQ · medium

A company uses Amazon Redshift for data warehousing. The security team requires that all data stored in Redshift be encrypted at rest using a customer-managed KMS key. How should the data engineer configure this?

A. Enable encryption using a KMS key when creating the Redshift cluster

B. Configure S3 SSE-KMS on the underlying S3 storage

C. Use the AWS KMS console to encrypt the Redshift cluster after creation

D. Set a cluster parameter group with encryption enabled

Answer: A

Enabling encryption at launch is the most direct approach; an existing cluster can be encrypted only by modifying it in Redshift, which migrates the data.

Why this answer

Option A is correct. Redshift supports encryption at rest with AWS KMS, and you enable it when launching the cluster by choosing the customer managed key. Option B is wrong because Redshift does not use S3 SSE-KMS for its own storage.

Option C is wrong because cluster encryption is a Redshift setting, not something applied from the KMS console. An existing cluster can be encrypted by modifying the cluster in Redshift, which migrates its data to encrypted storage. Option D is wrong because cluster parameter groups do not control encryption.
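Option A can be sketched as the request for the Redshift CreateCluster API. The identifiers, node type, and node count are hypothetical placeholders. In practice, the admin password would come from a secret store rather than from code.

```python
def encrypted_cluster_request(cluster_id: str, kms_key_id: str,
                              admin_password: str) -> dict:
    """Keyword arguments for Redshift CreateCluster with KMS encryption."""
    return {
        "ClusterIdentifier": cluster_id,
        "ClusterType": "multi-node",
        "NodeType": "ra3.xlplus",        # hypothetical sizing
        "NumberOfNodes": 2,
        "MasterUsername": "awsuser",
        "MasterUserPassword": admin_password,
        # Encrypt data at rest with the customer managed KMS key.
        "Encrypted": True,
        "KmsKeyId": kms_key_id,
    }
```

With boto3, the request would be passed to `boto3.client("redshift").create_cluster(**request)`.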


398

Multi-Select · medium

A data engineer is troubleshooting an AWS Glue ETL job that fails with the error: 'An error occurred while calling o123.pyWriteDynamicFrame. Access Denied when writing to S3 bucket: my-bucket'. The job uses a Glue service role named 'GlueServiceRole'. Which TWO actions should the engineer take to resolve the issue? (Choose TWO.)

Select 2 answers

A. Disable S3 Block Public Access on the bucket.

B. Grant the GlueServiceRole permission to write to the AWS Glue Data Catalog.

C. Check if the S3 bucket policy denies access from the GlueServiceRole.

D. Verify that the IAM policy attached to GlueServiceRole includes s3:PutObject on the bucket.

E. Ensure the Glue job is in the same VPC as the S3 bucket.

Answers: C, D

An explicit Deny in the bucket policy overrides an Allow in the IAM policy.

Why this answer

Option C is correct because the error message indicates an access denied when writing to S3, which can be caused by a bucket policy that explicitly denies the Glue service role's access, even if the IAM policy allows it. Option D is correct because the IAM policy attached to GlueServiceRole must include the s3:PutObject permission on the specific bucket to allow the Glue job to write data.

Exam trap

The trap here is that candidates may confuse S3 access errors with network or VPC issues. S3 buckets do not reside inside a VPC, and access to S3 is governed by IAM and bucket policies rather than by VPC placement.


399

Multi-Select · medium

A data engineer needs to store event data from IoT devices that arrives in bursts. The data is key-value and requires single-digit millisecond read and write latency. The engineer also needs to run complex analytical queries on the data for reporting. Which TWO services should be used together? (Choose TWO.)

Select 2 answers

A. Amazon DynamoDB

B. Amazon ElastiCache for Redis

C. Amazon Redshift

D. Amazon S3

E. Amazon RDS for MySQL

Answers: A, C

Provides low-latency access for key-value data.

Why this answer

Amazon DynamoDB is correct because it provides single-digit millisecond read and write latency at any scale, making it ideal for IoT event data arriving in bursts, and its key-value data model matches the requirement. Amazon Redshift is correct because it is a columnar data warehouse built for complex analytical queries. Data can be loaded into Redshift from DynamoDB (for example, with the Redshift COPY command) for reporting.

Exam trap

The trap here is that candidates often choose ElastiCache for Redis because of its low latency, forgetting that it is not a durable data store for analytical queries, or they pick S3 thinking it can serve as a primary database, ignoring its lack of single-digit millisecond latency for key-value access.


400

MCQ · medium

A data engineering team needs to encrypt data at rest in an Amazon S3 bucket that stores sensitive customer information. The team must use an AWS Key Management Service (AWS KMS) customer managed key with automatic rotation enabled. Which configuration meets these requirements?

A. Use default encryption with SSE-KMS and specify the customer managed key ID.

B. Use default encryption with SSE-S3.

C. Use default encryption with SSE-C and provide a customer-provided key.

D. Use default encryption with SSE-KMS and leave the key ID empty to use the AWS managed key.

Answer: A

SSE-KMS with a customer managed key allows automatic rotation and customer control.

Why this answer

Option A is correct because it enables SSE-KMS with a customer managed key, for which automatic key rotation can be enabled. Option B is wrong because SSE-S3 uses keys that S3 manages, with no customer control. Option C is wrong because with SSE-C the customer supplies and manages the key, and S3 cannot rotate it automatically.

Option D is wrong because the customer cannot control an AWS managed key.
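The configuration in Option A involves two settings: the bucket's default encryption and automatic rotation for the key. The following minimal sketch shows the bucket setting. The key ARN is hypothetical.

```python
import json


def sse_kms_default_encryption(kms_key_arn: str) -> dict:
    """ServerSideEncryptionConfiguration for S3 PutBucketEncryption."""
    return {
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                # Naming the key selects the customer managed key
                # instead of the AWS managed aws/s3 key.
                "KMSMasterKeyID": kms_key_arn,
            }
        }]
    }


print(json.dumps(sse_kms_default_encryption(
    "arn:aws:kms:us-east-1:111122223333:key/example-key-id"), indent=2))
```

With boto3, the configuration would be applied with `s3.put_bucket_encryption(Bucket=..., ServerSideEncryptionConfiguration=config)`, and rotation would be turned on with `kms.enable_key_rotation(KeyId=...)`.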


401

Multi-Select · medium

Which TWO options are valid ways to reduce storage costs for an Amazon S3 data lake that stores historical data rarely accessed after 30 days? (Choose TWO.)

Select 2 answers

A. Enable S3 Transfer Acceleration for all uploads.

B. Create a lifecycle policy to transition objects to S3 Standard-IA after 30 days.

C. Create a lifecycle policy to delete objects after 30 days.

D. Create a lifecycle policy to transition objects to S3 Glacier Deep Archive after 90 days.

E. Enable S3 Versioning to preserve all object versions.

Answers: B, D

Standard-IA reduces storage cost for infrequent access.

Why this answer

S3 Lifecycle policies can transition objects to S3 Standard-IA after 30 days (lower cost for infrequent access) and later to S3 Glacier Deep Archive for long-term archival, so Options B and D are correct. Option C is wrong because deleting objects after 30 days would lose the historical data.

Option E is wrong because enabling versioning keeps every object version and increases storage costs. Option A is wrong because S3 Transfer Acceleration speeds up uploads; it does not reduce storage costs.
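
Options B and D can be combined in one lifecycle rule. This sketch only builds the `LifecycleConfiguration` dictionary that boto3's `put_bucket_lifecycle_configuration` accepts; the rule ID, prefix, and bucket name are hypothetical.

```python
import json


def archive_lifecycle(prefix: str = "") -> dict:
    """Lifecycle rule: Standard-IA after 30 days, Glacier Deep Archive after 90 days."""
    return {
        "Rules": [
            {
                "ID": "archive-historical-data",      # hypothetical rule name
                "Filter": {"Prefix": prefix},         # empty prefix = whole bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


lifecycle = archive_lifecycle("historical/")
print(json.dumps(lifecycle, indent=2))

# With credentials (not executed here):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=lifecycle)
```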

402

MCQeasy

A data engineer is troubleshooting a failed AWS Glue ETL job that reads from Amazon S3 and writes to Amazon Redshift. The job fails with the error: 'ERROR: Cannot insert a duplicate key into unique index'. The Redshift table has a primary key on the 'id' column. The data in S3 contains multiple records with the same 'id'. The engineer needs to ensure that only the latest record for each 'id' is loaded into Redshift. The data has a 'timestamp' column. Which approach should the engineer take?

A.Use the 'dropDuplicates' transformation in the Glue ETL script, ordering by 'timestamp' descending to keep the latest record for each 'id'.

B.Set the write mode to 'overwrite' in the Glue job to replace the entire Redshift table.

C.Load data into a staging table in Redshift and then use a MERGE operation to insert only new records.

D.Disable primary key constraints on the Redshift table before loading.

AnswerA

This removes duplicate IDs while preserving the most recent record.

Why this answer

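Option A is correct because deduplicating on 'id' in the Glue ETL script and keeping the row with the latest 'timestamp' leaves exactly one record per 'id' before the write. Note that Spark's dropDuplicates() takes no ordering argument and keeps an arbitrary row per key, so the "keep the latest" rule is usually implemented with a window function (rank rows per 'id' by 'timestamp' descending and keep rank 1). Option D is wrong because removing the constraint would let the duplicate ids into the table instead of keeping only the latest record. Option B is wrong because 'overwrite' mode replaces the entire table and still writes every duplicate from S3.

Option C is wrong because the staging data still contains several rows per 'id', and a MERGE that only inserts new records neither selects the latest version nor updates existing rows; it also adds a staging table and extra SQL steps.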
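
The "keep the latest record per id" rule is easiest to see in plain Python. In a Glue PySpark script the same logic is commonly written with `Window.partitionBy("id").orderBy(F.col("timestamp").desc())`, `F.row_number()`, and a filter on rank 1. The sketch assumes ISO-8601 UTC timestamp strings, which sort correctly as text; the sample rows are invented.

```python
from typing import Dict, Iterable, List


def keep_latest(records: Iterable[dict], key: str = "id", ts: str = "timestamp") -> List[dict]:
    """Return one record per key, keeping the one with the greatest timestamp."""
    latest: Dict[object, dict] = {}
    for rec in records:
        current = latest.get(rec[key])
        # Replace the stored record only when this one is strictly newer.
        if current is None or rec[ts] > current[ts]:
            latest[rec[key]] = rec
    return list(latest.values())


rows = [
    {"id": 1, "timestamp": "2024-01-01T10:00:00Z", "status": "old"},
    {"id": 1, "timestamp": "2024-01-02T09:00:00Z", "status": "new"},
    {"id": 2, "timestamp": "2024-01-01T12:00:00Z", "status": "only"},
]
print(sorted(r["status"] for r in keep_latest(rows)))  # ['new', 'only']
```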

403

MCQmedium

A data engineer applies the above bucket policy to an S3 bucket containing sensitive data. The goal is to allow only encrypted (HTTPS) requests. However, a user reports being able to access an object using an HTTP (non-HTTPS) request. What is the most likely reason?

A.The policy uses Allow instead of Deny

B.The resource ARN does not include the bucket itself

C.The condition key aws:SecureTransport is used with BoolIfExists instead of Bool

D.The principal is set to "*", which allows anonymous access

AnswerC

BoolIfExists also matches when the condition key is missing from the request, so the condition is not a strict HTTPS test.

Why this answer

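Option C is correct. The policy uses "BoolIfExists" instead of "Bool"; ...IfExists operators also evaluate to true when the condition key is absent from the request context, so the statement no longer acts as a strict check on aws:SecureTransport. The commonly documented pattern is an explicit Deny with "Bool": {"aws:SecureTransport": "false"}. Option A is wrong because the effect is not the problem; the condition operator is.

Option D is wrong because a principal of "*" simply makes the statement apply to everyone, which is what an HTTPS-only rule needs. Option B is wrong because the resource ARN in the policy is correct.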
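
For comparison, the widely documented HTTPS-only pattern is an explicit Deny with a strict `Bool` test on `aws:SecureTransport`, applied to both the bucket ARN and the object ARNs. This sketch builds that policy as a Python dictionary; the bucket name is hypothetical, and the result would be passed as JSON to `put_bucket_policy`.

```python
import json


def deny_insecure_transport_policy(bucket: str) -> dict:
    """Bucket policy that denies any S3 request not sent over TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover bucket-level and object-level operations.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                # Strict Bool test: matches requests that did not use HTTPS.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


policy = deny_insecure_transport_policy("example-sensitive-bucket")
print(json.dumps(policy, indent=2))
```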

404

Drag & Dropmedium

Arrange the steps to implement a data lake on Amazon S3 with AWS Lake Formation.

Why this order

Start by creating the S3 bucket. Then register it in Lake Formation, set up administrators, define the schema in Glue Catalog, and finally grant access to users.

405

MCQmedium

A company uses Amazon RDS for MySQL to store transactional data. The database contains sensitive financial information. The company's security policy requires that all data at rest be encrypted using a customer-managed KMS key. The database was originally launched without encryption at rest. The security team now needs to enable encryption without significant downtime. What should they do?

A.Create a snapshot of the database, copy the snapshot with encryption enabled, and restore a new DB instance from the encrypted snapshot.

B.Enable encryption by modifying the DB instance's storage type to 'encrypted'.

C.Use the AWS DMS (Database Migration Service) to migrate data to a new encrypted RDS instance.

D.Modify the DB instance and enable encryption under the 'Storage' settings.

AnswerA

This is the standard procedure to enable encryption on an existing RDS instance.

Why this answer

Option A is correct because encryption cannot be enabled on an existing RDS instance; you create a snapshot, copy it with encryption enabled under the customer managed key, and restore a new, encrypted instance from the copy. Option B is wrong because RDS does not support enabling encryption in place, and 'encrypted' is not a storage type. Option D is wrong because modifying the DB instance does not offer a setting that turns on storage encryption.

Option C is wrong because a DMS migration also ends with an encrypted instance but adds a replication setup that the standard snapshot-copy procedure does not need.
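
The procedure in option A maps onto three RDS API calls. This sketch takes any object with the boto3 RDS client's method names, so it can be exercised with the recording stub included here; identifiers and the key ARN are hypothetical. Writes that reach the old instance after the snapshot is taken are not carried over, so the cutover still needs a short write freeze.

```python
def encrypt_rds_instance(rds, instance_id: str, kms_key_id: str) -> str:
    """Snapshot -> encrypted copy -> restore. Returns the new instance identifier."""
    snap = f"{instance_id}-unencrypted-snap"
    enc_snap = f"{instance_id}-encrypted-snap"
    new_instance = f"{instance_id}-encrypted"

    rds.create_db_snapshot(DBSnapshotIdentifier=snap, DBInstanceIdentifier=instance_id)
    rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=snap)

    # Copying with KmsKeyId produces an encrypted snapshot under that key.
    rds.copy_db_snapshot(
        SourceDBSnapshotIdentifier=snap,
        TargetDBSnapshotIdentifier=enc_snap,
        KmsKeyId=kms_key_id,
    )
    rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=enc_snap)

    # An instance restored from an encrypted snapshot is itself encrypted.
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=new_instance, DBSnapshotIdentifier=enc_snap
    )
    return new_instance


class _RecordingWaiter:
    """Stand-in for a boto3 waiter: records the wait instead of polling AWS."""

    def __init__(self, log, name):
        self.log, self.name = log, name

    def wait(self, **kwargs):
        self.log.append(("wait:" + self.name, kwargs))


class FakeRDS:
    """Stand-in for boto3.client("rds") that records calls instead of sending them."""

    def __init__(self):
        self.log = []

    def get_waiter(self, name):
        return _RecordingWaiter(self.log, name)

    def __getattr__(self, name):
        # Any other method call is recorded together with its keyword arguments.
        def record(**kwargs):
            self.log.append((name, kwargs))
            return {}
        return record


KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"  # hypothetical
fake = FakeRDS()
new_id = encrypt_rds_instance(fake, "finance-db", KEY_ARN)
print(new_id)
print([name for name, _ in fake.log])
```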

406

Multi-Selecthard

A data engineer is designing an ETL pipeline that uses AWS Glue to process data from an Amazon DynamoDB table and write results to an S3 bucket in Parquet format. The pipeline must handle schema changes in the source DynamoDB table. Which THREE steps should the engineer take to ensure the pipeline handles schema evolution? (Choose THREE.)

Select 3 answers

A.Use Glue's 'recast' transformation to handle type changes.

B.Set the Glue crawler to update the table's schema in the Data Catalog.

C.Convert the Parquet output to CSV to avoid schema constraints.

D.Partition the data by date and delete old partitions.

E.Use Spark's 'mergeSchema' option when writing to S3.

AnswersA, B, E

recast can change data types to match the target schema.

Why this answer

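Options A, B, and E are correct. Configuring the Glue crawler to update the table schema keeps the Data Catalog in sync with new or changed DynamoDB attributes. 'recast' handles type changes by casting columns to the target schema (in the Glue DynamicFrame API, type conflicts are commonly resolved with ResolveChoice and a cast:<type> action). Spark's 'mergeSchema' option reconciles Parquet files written with different schemas; for Parquet, Spark applies it when the files are read back. Option D is wrong because partitioning by date and deleting old partitions is unrelated to schema evolution.

Option C is wrong because converting to CSV discards the typed schema instead of managing its evolution.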

407

MCQeasy

A company stores sensitive data in Amazon S3. To meet compliance requirements, they need to ensure that any data older than 1 year is automatically moved to a lower-cost storage class. Which S3 feature should they use?

A.S3 Replication

B.S3 Lifecycle policies

C.S3 Glacier

D.S3 Intelligent-Tiering

AnswerB

Lifecycle policies can transition objects to lower-cost storage classes based on age.

Why this answer

Option B is correct because S3 Lifecycle policies automate transitioning objects between storage classes based on object age. Option C is wrong because S3 Glacier is a storage class, not a feature that automates transitions. Option D is wrong because S3 Intelligent-Tiering moves objects based on access patterns, not a fixed age such as one year.

Option A is wrong because S3 Replication copies objects across buckets; it does not change their storage class over time.

408

MCQmedium

A data engineer is configuring an S3 bucket to host sensitive data. The security policy requires that all objects be encrypted with a key that is generated and managed by the customer, and that the key be stored in AWS KMS. Which encryption option should be used?

A.Server-Side Encryption with S3-Managed Keys (SSE-S3)

B.Client-Side Encryption

C.Server-Side Encryption with Customer-Provided Keys (SSE-C)

D.Server-Side Encryption with AWS KMS-Managed Keys (SSE-KMS)

AnswerD

SSE-KMS allows using customer-managed keys in KMS.

Why this answer

Option D is correct because SSE-KMS encrypts objects under a KMS key, which can be a customer managed key that the customer creates and controls in AWS KMS. Option A is wrong because SSE-S3 uses keys managed by Amazon S3. Option C is wrong because SSE-C uses customer-provided keys that are not stored in KMS.

Option B is wrong because client-side encryption happens before upload and is not managed by S3.

409

Multi-Selecteasy

A data engineer is setting up a data pipeline to ingest streaming data from an IoT fleet. The data must be processed in near real-time and stored in Amazon S3 for analytics. Which THREE AWS services should the engineer consider using?

Select 3 answers

A.Amazon EMR

B.Amazon Kinesis Data Firehose

C.AWS Lambda

D.AWS Glue

E.Amazon Kinesis Data Streams

AnswersB, C, E

Delivers streaming data to S3.

Why this answer

Option E is correct because Kinesis Data Streams ingests streaming data. Option B is correct because Kinesis Data Firehose can deliver data to S3. Option C is correct because Lambda can process records in near real time.

Option D is incorrect because Glue is primarily a batch ETL service and is a less direct fit for this near-real-time path. Option A is incorrect because EMR is a big data processing platform, not a streaming ingestion service.
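
Lambda receives Kinesis Data Streams records base64-encoded under `Records[].kinesis.data`. A minimal handler sketch, assuming JSON payloads; the enrichment and the S3 write are left out, and the sample event is trimmed to the fields the code reads (real events carry more, such as the partition key and sequence number).

```python
import base64
import json


def handler(event, context=None):
    """Decode Kinesis Data Streams records delivered to a Lambda function."""
    readings = []
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        readings.append(json.loads(payload))
    # A real function would enrich `readings` here and pass them on toward S3,
    # for example through a Firehose delivery stream.
    return {"processed": len(readings), "readings": readings}


def _kinesis_record(obj):
    """Build a minimal test record in the Lambda Kinesis event shape."""
    data = base64.b64encode(json.dumps(obj).encode("utf-8")).decode("ascii")
    return {"kinesis": {"data": data}}


sample_event = {"Records": [_kinesis_record({"device": "sensor-1", "temp": 21.5}),
                            _kinesis_record({"device": "sensor-2", "temp": 19.0})]}
result = handler(sample_event)
print(result["processed"])  # 2
```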

410

MCQmedium

A data engineer is setting up a data pipeline that ingests streaming data from Amazon Kinesis Data Streams into an S3 data lake using Amazon Kinesis Data Firehose. The data contains personally identifiable information (PII). The security team requires that all data be encrypted at rest in S3 using an AWS KMS customer managed key (CMK) that is specific to the application. Additionally, the data must be encrypted in transit between all services. The engineer creates the KMS key and configures Firehose to use server-side encryption with the key for the S3 destination. However, Firehose delivery fails with an error indicating that the KMS key is not accessible. What is the most likely cause?

A.The KMS key policy does not grant the firehose.amazonaws.com service principal the required permissions.

B.The Kinesis data stream is not encrypted at rest.

C.The Firehose delivery stream is not in the same region as the KMS key.

D.The S3 bucket policy does not grant the Firehose delivery stream access to write objects.

AnswerA

Firehose must be allowed to use the key via the key policy.

Why this answer

Kinesis Data Firehose needs permission to use the KMS key. The key policy must grant the Firehose service principal (firehose.amazonaws.com) permission to call kms:GenerateDataKey and kms:Decrypt. Without this, Firehose cannot encrypt the data.
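
A key-policy statement of the kind described above, sketched as a Python dictionary. The principal is a parameter because setups differ: the explanation names the Firehose service principal, while many deployments instead grant the delivery stream's IAM role (the role ARN below is hypothetical). In a key policy, `"Resource": "*"` refers to the key the policy is attached to, and a real key policy must also keep the statement that gives account administrators access to the key.

```python
import json


def kms_key_policy_statement(principal: dict) -> dict:
    """Key-policy statement letting `principal` generate and decrypt data keys."""
    return {
        "Sid": "AllowFirehoseUseOfKey",
        "Effect": "Allow",
        "Principal": principal,
        "Action": ["kms:GenerateDataKey", "kms:Decrypt"],
        "Resource": "*",  # in a key policy, "*" means this key
    }


# Form named in the explanation: the Firehose service principal.
svc = kms_key_policy_statement({"Service": "firehose.amazonaws.com"})
# Common alternative: the delivery stream's IAM role (hypothetical ARN).
role = kms_key_policy_statement(
    {"AWS": "arn:aws:iam::111122223333:role/example-firehose-delivery-role"}
)
print(json.dumps({"Version": "2012-10-17", "Statement": [svc]}, indent=2))
```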

411

MCQmedium

A data engineer is monitoring a Redshift cluster that is experiencing slow query performance. The cluster has 4 dc2.large nodes. The engineer notices that disk space usage is at 85% across all nodes. Which action would MOST likely improve query performance?

A.Change the table design to use DISTKEY and SORTKEY.

B.Enable compression on all columns.

C.Increase the number of nodes to 8.

D.Run the VACUUM command to reclaim space.

AnswerC

Adding nodes increases disk capacity and I/O throughput, reducing disk pressure and improving query performance.

Why this answer

Option C is correct because adding nodes adds disk capacity and spreads the data across more slices, which improves I/O parallelism. Option D is wrong because VACUUM reclaims space from deleted rows but does not help when disk usage is high because of data volume. Option A is wrong because DISTKEY and SORTKEY changes are table-design changes that do not add disk capacity.

Option B is wrong because compression is typically already applied at load time (COPY applies automatic compression when loading into an empty table).
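
A rough capacity check for option C, assuming 160 GB of SSD storage per dc2.large node and data spread evenly across nodes (skewed tables will not halve so neatly):

```python
DC2_LARGE_STORAGE_GB = 160  # SSD storage per dc2.large node


def disk_usage_after_resize(nodes_before: int, usage_before: float, nodes_after: int,
                            per_node_gb: float = DC2_LARGE_STORAGE_GB) -> tuple:
    """Return (GB of data, fraction of disk used after the resize),
    assuming the data volume is unchanged and evenly distributed."""
    data_gb = nodes_before * per_node_gb * usage_before
    return data_gb, data_gb / (nodes_after * per_node_gb)


data_gb, usage = disk_usage_after_resize(4, 0.85, 8)
print(f"{data_gb:.0f} GB of data -> {usage:.1%} disk usage on 8 nodes")
```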

412

Multi-Selecteasy

A data engineer needs to monitor and log changes to IAM policies in an AWS account. Which TWO AWS services can be used together to achieve this?

Select 2 answers

A.Amazon GuardDuty

B.AWS Config

C.VPC Flow Logs

D.AWS CloudTrail

E.Amazon CloudWatch Logs

AnswersD, E

CloudTrail records all IAM API calls.

Why this answer

Options D and E are correct. AWS CloudTrail logs IAM API calls, and Amazon CloudWatch Logs can store and monitor those logs. Option B is incorrect because AWS Config tracks resource configuration but does not record the API calls directly.

Option A is incorrect because Amazon GuardDuty is a threat detection service. Option C is incorrect because VPC Flow Logs capture network traffic.
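
Once CloudTrail delivers events to CloudWatch Logs, a metric filter or alarm can flag IAM policy changes. The filtering logic is sketched here in plain Python against CloudTrail-style records; the event names are IAM API actions that modify policies, the list is illustrative rather than exhaustive, and the sample records are invented.

```python
IAM_POLICY_EVENTS = frozenset({
    "CreatePolicy", "DeletePolicy", "CreatePolicyVersion", "DeletePolicyVersion",
    "PutUserPolicy", "PutRolePolicy", "PutGroupPolicy",
    "DeleteUserPolicy", "DeleteRolePolicy", "DeleteGroupPolicy",
    "AttachUserPolicy", "AttachRolePolicy", "AttachGroupPolicy",
    "DetachUserPolicy", "DetachRolePolicy", "DetachGroupPolicy",
})


def iam_policy_changes(records):
    """Return CloudTrail records for IAM API calls that change policies."""
    return [
        r for r in records
        if r.get("eventSource") == "iam.amazonaws.com"
        and r.get("eventName") in IAM_POLICY_EVENTS
    ]


sample = [
    {"eventSource": "iam.amazonaws.com", "eventName": "AttachRolePolicy"},
    {"eventSource": "iam.amazonaws.com", "eventName": "ListRoles"},
    {"eventSource": "s3.amazonaws.com", "eventName": "PutObject"},
    {"eventSource": "iam.amazonaws.com", "eventName": "PutUserPolicy"},
]
changes = iam_policy_changes(sample)
print([r["eventName"] for r in changes])  # ['AttachRolePolicy', 'PutUserPolicy']
```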

413

MCQmedium

A data engineer uses Amazon EMR to run a Spark job that reads from S3 and writes to HDFS on the cluster. The job fails with an 'OutOfMemoryError: Java heap space' error in the executors. Which parameter adjustment should be made to resolve this?

A.Increase spark.default.parallelism

B.Increase spark.sql.shuffle.partitions

C.Increase spark.executor.memory

D.Increase spark.driver.memory

AnswerC

This directly increases the heap size available to each executor.

Why this answer

Option C is correct because increasing spark.executor.memory allocates more heap space to each executor. Option D is wrong because spark.driver.memory affects the driver, not the executors. Option B is wrong because spark.sql.shuffle.partitions changes how shuffle data is split; it can ease per-task memory pressure indirectly but does not enlarge the executor heap.

Option A is wrong because spark.default.parallelism controls task parallelism, not memory.
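
Raising spark.executor.memory also raises the YARN container size, because Spark adds memory overhead (spark.executor.memoryOverhead) on top of the heap, so the larger executors must still fit on the instance. A sketch of that arithmetic, assuming Spark's default overhead of max(10% of executor memory, 384 MiB) and ignoring YARN's rounding to its minimum allocation and any off-heap or PySpark memory settings:

```python
MIN_OVERHEAD_MIB = 384   # Spark's floor for executor memory overhead
OVERHEAD_FACTOR = 0.10   # default overhead factor (assumed unchanged)


def executor_container_mib(executor_memory_mib: int) -> int:
    """Approximate memory YARN must grant one executor container:
    heap (spark.executor.memory) plus the default memory overhead."""
    overhead_mib = max(int(executor_memory_mib * OVERHEAD_FACTOR), MIN_OVERHEAD_MIB)
    return executor_memory_mib + overhead_mib


for heap in (2048, 4096, 8192):
    print(f"spark.executor.memory={heap}m -> container about {executor_container_mib(heap)} MiB")
```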

414

Multi-Selecteasy

Which TWO AWS services can be used to transform data in transit during ingestion? (Choose 2.)

Select 2 answers

A.Amazon S3 Transfer Acceleration

B.Amazon Kinesis Data Firehose with Lambda transformation

C.AWS Glue ETL

D.Amazon Athena

E.AWS Data Pipeline

AnswersB, C

Firehose can call Lambda to transform records.

Why this answer

AWS Glue ETL can transform data before writing it to the target, and Kinesis Data Firehose can invoke a Lambda function to transform records before delivery, so both transform data as it moves from source to destination.

415

MCQhard

A company uses AWS DMS to migrate a 2 TB Oracle database to Amazon RDS for PostgreSQL. The migration completes successfully, but data validation shows some tables have missing rows. The task is configured for ongoing replication using change data capture (CDC). What is the MOST likely cause of the missing rows?

A.Source database archive log retention period too short

B.Large objects (LOBs) not supported by the target

C.Source tables missing primary keys

D.Insufficient storage on the DMS replication instance

AnswerC

Without primary keys, DMS cannot track changes for CDC, leading to missing rows.

Why this answer

Option C is correct because if a table lacks a primary key, DMS cannot uniquely identify rows for CDC, leading to missed changes. Option A is less likely because purged archive logs typically make the CDC task fail with missing-log errors rather than silently drop rows from specific tables. Option D is wrong because insufficient storage on the replication instance would cause task errors or stalls, not rows missing from some tables.

Option B is wrong because DMS supports large objects with proper configuration.

416

MCQeasy

A company uses AWS Database Migration Service (DMS) to migrate an on-premises PostgreSQL database to Amazon RDS for PostgreSQL. The migration is ongoing and uses change data capture (CDC). The engineer notices that the target database is falling behind the source by several hours. What is the MOST likely cause?

A.The target table has disabled parallel apply.

B.A large full load is being transferred, which delays the start of CDC.

C.The replication instance is under-provisioned and needs to be larger.

D.Change data capture has been disabled on the source database.

AnswerB

Full load must complete before CDC begins, causing a backlog.

Why this answer

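Option B is correct because a large full load delays the start of CDC: changes captured during the load are cached and applied only after the full load of each table completes, so the target falls behind. Option C can contribute to latency, but a long-running full load delays CDC regardless of instance size, which makes it the more likely cause here. Option D is wrong because if CDC were disabled on the source, replication would stop entirely rather than lag by several hours.

Option A is wrong because parallel apply is an optional tuning feature that can speed up replication; not using it does not by itself explain a lag of several hours.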

417

MCQmedium

A company uses Amazon EMR to run Spark jobs on data stored in S3. After upgrading the EMR cluster to a new release, one of the Spark jobs fails with 'OutOfMemoryError' in the executor. Which configuration change is MOST likely to resolve this issue?

A.Increase the number of core nodes in the EMR cluster.

B.Decrease spark.sql.shuffle.partitions to reduce overhead.

C.Increase spark.driver.memory in the Spark configuration.

D.Increase spark.executor.memory to allocate more memory per executor.

AnswerD

More memory per executor prevents OutOfMemoryError.

Why this answer

Option D is correct because increasing spark.executor.memory gives each executor more memory. Option C is wrong because increasing driver memory helps the driver, not the executors. Option A is wrong because adding core nodes does not change how much memory each executor gets.

Option B is wrong because fewer shuffle partitions make each partition larger, which increases memory pressure per task.

418

MCQhard

A company stores IoT sensor data in an Amazon S3 bucket. The data is ingested every minute and each object is about 10 KB. The data must be stored for at least 7 years for compliance. Which lifecycle policy configuration minimizes storage costs?

A.Transition to S3 One Zone-IA after 30 days, then to S3 Glacier Deep Archive after 365 days, and expire after 2555 days.

B.Transition to S3 Glacier Flexible Retrieval after 90 days and expire after 2555 days.

C.Transition to S3 Glacier Deep Archive after 30 days and expire after 2555 days.

D.Transition to S3 Standard-IA after 30 days, then to S3 Glacier Deep Archive after 365 days, and expire after 2555 days.

AnswerC

This minimizes cost by moving to the cheapest storage class early and retaining for 7 years.

Why this answer

Option C is correct because it moves the data into S3 Glacier Deep Archive, the lowest-cost storage class, after only 30 days and expires it after 2,555 days (7 × 365), which meets the 7-year retention requirement.

Option D is incorrect because keeping the data in S3 Standard-IA for most of the first year costs more than moving it to Deep Archive at 30 days. Option B is incorrect because S3 Glacier Flexible Retrieval costs more per GB than Deep Archive, and the data stays in S3 Standard for the first 90 days. Option A is incorrect because S3 One Zone-IA stores data in a single Availability Zone, a poor fit for compliance data, and it also delays the move to Deep Archive. Options A and D are further penalized by the 128 KB minimum billable object size of the IA storage classes, since each object here is only about 10 KB.
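
The stem's detail that each object is only about 10 KB matters for the arithmetic. This sketch counts the objects retained under the 2,555-day expiration, assuming one object per minute, and compares the raw data with the billed size, assuming the per-object overhead S3 documents for the Glacier storage classes (about 32 KB billed at the archive rate plus 8 KB at the S3 Standard rate). Prices are deliberately left out. Under these assumptions the size billed at the archive rate is several times the data itself, which is why batching small objects into larger ones before archiving is often worthwhile. Recent S3 Lifecycle defaults may also skip transitioning objects smaller than 128 KB unless the rule sets an explicit size filter; verify this for the bucket in question.

```python
MINUTES_PER_DAY = 24 * 60
RETENTION_DAYS = 7 * 365          # 2555, the expiration used in every option
OBJECT_KB = 10                    # approximate object size from the question
# Assumed per-object overhead for Glacier storage classes (see lead-in).
ARCHIVE_OVERHEAD_KB = 32          # billed at the archive storage rate
STANDARD_OVERHEAD_KB = 8          # billed at the S3 Standard rate

objects = MINUTES_PER_DAY * RETENTION_DAYS      # assumes one object per minute
data_kb = objects * OBJECT_KB
archive_billed_kb = objects * (OBJECT_KB + ARCHIVE_OVERHEAD_KB)
standard_billed_kb = objects * STANDARD_OVERHEAD_KB

print(f"objects retained:       {objects:,}")
print(f"raw data:               {data_kb / 1024**2:.1f} GiB")
print(f"billed at archive rate: {archive_billed_kb / 1024**2:.1f} GiB")
print(f"billed at Standard:     {standard_billed_kb / 1024**2:.1f} GiB")
```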

419

MCQmedium

A company uses AWS KMS to encrypt data at rest in Amazon S3. The security team requires that all encryption keys be automatically rotated every year. Which key type should be used to meet this requirement without manual intervention?

A.Use a customer managed key with automatic rotation enabled.

B.Use an imported key material because it supports automatic rotation.

C.Use a KMS key generated by S3 on each object upload.

D.Use an AWS managed key (aws/s3).

AnswerD

AWS managed keys are automatically rotated every year without any manual intervention.

Why this answer

AWS managed keys (such as aws/s3) are rotated automatically every year, and this rotation cannot be turned off or configured. Customer managed keys also support automatic rotation, but it must be enabled on each key; because the question specifies 'without manual intervention', the AWS managed key is the intended answer.

Option A is wrong because automatic rotation of a customer managed key must be explicitly enabled. Option B is wrong because KMS keys with imported key material do not support automatic rotation.

Option C is wrong because S3 does not create a new KMS key for each upload; with SSE-KMS it requests a data key under an existing KMS key, or uses an S3 Bucket Key.

420

MCQmedium

A data engineer needs to migrate an on-premises Apache Hadoop cluster to AWS. The cluster stores data in HDFS and runs MapReduce jobs. The company wants to minimize operational overhead and leverage serverless technologies where possible. Which AWS service should the data engineer use to replace HDFS storage?

A.Amazon EBS

B.Amazon EMR

C.Amazon S3

D.Amazon Redshift

AnswerC

S3 is the recommended storage for Hadoop on AWS, replacing HDFS with durable object storage.

Why this answer

Amazon S3 is the recommended storage layer for Hadoop workloads on AWS (EMR reads and writes it directly through EMRFS), providing durable, scalable object storage with minimal operational overhead. Option B is wrong because EMR is a compute service, not storage. Option A is wrong because EBS is block storage attached to EC2 instances, not a standalone distributed storage layer for Hadoop.

Option D is wrong because Redshift is a data warehouse, not a replacement for HDFS.

421

MCQeasy

A data engineer has set up an AWS Lambda function that processes files uploaded to an S3 bucket. The function is triggered by S3 event notifications. However, the function is not being invoked when a file is uploaded. The engineer checks the Lambda function's CloudWatch Logs and finds no execution logs. What should the engineer check FIRST?

A.Check the Lambda function's code for errors.

B.Verify that the Lambda function's IAM role has permissions to read from S3.

C.Verify that the S3 bucket has an event notification configured for the Lambda function.

D.Check if the Lambda function is attached to a VPC.

AnswerC

Without event notification, S3 will not invoke the function.

Why this answer

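Option C is correct because the S3 bucket must have an event notification configured to trigger the Lambda function (and the function's resource-based policy must allow s3.amazonaws.com to invoke it). Option A is wrong because code errors would appear in the logs after an invocation, and there are no logs at all. Option B is wrong because the IAM role affects what the function can do once it runs, not whether it is triggered.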

Option D is wrong because VPC configuration affects network access, not whether the function is triggered.
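
Two pieces must be in place for S3 to invoke the function: the bucket's notification configuration and a resource-based permission on the function that lets s3.amazonaws.com invoke it. This sketch builds the first as the dictionary boto3's `put_bucket_notification_configuration` accepts; the function ARN, bucket name, and `.csv` suffix filter are hypothetical, and the AWS calls are shown as comments.

```python
import json


def lambda_notification_config(function_arn: str, suffix: str = "") -> dict:
    """Notification rule that invokes `function_arn` for every new object
    (optionally only for keys ending in `suffix`)."""
    rule = {
        "LambdaFunctionArn": function_arn,
        "Events": ["s3:ObjectCreated:*"],
    }
    if suffix:
        rule["Filter"] = {"Key": {"FilterRules": [{"Name": "suffix", "Value": suffix}]}}
    return {"LambdaFunctionConfigurations": [rule]}


FUNCTION_ARN = "arn:aws:lambda:us-east-1:111122223333:function:example-processor"
notification = lambda_notification_config(FUNCTION_ARN, ".csv")
print(json.dumps(notification, indent=2))

# With credentials (not executed here), grant S3 permission first, then attach:
#   boto3.client("lambda").add_permission(
#       FunctionName=FUNCTION_ARN, StatementId="allow-s3-invoke",
#       Action="lambda:InvokeFunction", Principal="s3.amazonaws.com",
#       SourceArn="arn:aws:s3:::example-bucket")
#   boto3.client("s3").put_bucket_notification_configuration(
#       Bucket="example-bucket", NotificationConfiguration=notification)
```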

422

Multi-Selectmedium

Which TWO AWS services can be used to ingest streaming data into Amazon S3 with minimal code? (Choose two.)

Select 2 answers

A.AWS Lambda

B.Amazon Kinesis Data Firehose

C.Amazon Managed Streaming for Apache Kafka (MSK) with S3 sink connector

D.AWS Database Migration Service (DMS)

E.AWS DataSync

AnswersB, C

Firehose is serverless and delivers streaming data to S3 without code.

Why this answer

Option B (Kinesis Data Firehose) and Option C (Amazon MSK with an S3 sink connector) are correct. Option A (Lambda) requires custom code. Option D (DMS) is for database migration and replication.

Option E (DataSync) is for batch file transfers.

423

Multi-Selectmedium

A company is using Amazon Kinesis Data Streams to ingest clickstream data from a website. The data is consumed by an AWS Lambda function that enriches records and writes to Amazon S3. The Lambda function is experiencing high error rates due to records exceeding the 256 KB payload limit. Which TWO actions should the team take to resolve this issue?

Select 2 answers

A.Increase the Lambda function timeout.

B.Enable compression on the producer side before sending records to Kinesis.

C.Use the Kinesis Producer Library (KPL) to aggregate multiple small records into a single larger record.

D.Switch from Kinesis Data Streams to Kinesis Data Firehose.

E.Increase the number of shards in the Kinesis stream.

AnswersB, C

Compression reduces record size below the 256 KB limit.

Why this answer

The correct answers are B and C. Compressing records on the producer side reduces record size. The Kinesis Producer Library (KPL) aggregates multiple user records into a single Kinesis record, which the consumer must de-aggregate, while staying within limits.

Option E (increasing shards) adds throughput but does not reduce record size. Option D (switching to Kinesis Data Firehose) changes the architecture but does not solve the payload limit. Option A (increasing the Lambda timeout) does not address record size.
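
Producer-side compression only helps if every consumer decompresses the payload, so both sides must agree on the format. A sketch with the standard library's gzip module, using invented clickstream events and the 256 KB limit cited in the question:

```python
import gzip
import json

PAYLOAD_LIMIT_BYTES = 256 * 1024  # limit cited in the question


def compress_batch(events):
    """Serialize events to JSON and gzip them (producer side)."""
    raw = json.dumps(events).encode("utf-8")
    return raw, gzip.compress(raw)


def decompress_batch(blob):
    """Reverse of compress_batch (consumer side)."""
    return json.loads(gzip.decompress(blob))


events = [{"user_id": i % 100, "page": "/home", "event": "click"} for i in range(10_000)]
raw, packed = compress_batch(events)
print(f"raw: {len(raw):,} bytes, gzip: {len(packed):,} bytes, limit: {PAYLOAD_LIMIT_BYTES:,}")
```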

424

MCQmedium

A data engineer needs to transform CSV files arriving in an S3 bucket into Parquet format and store them in another S3 bucket. The transformation is simple and on-demand, triggered by data arrival. Which solution is the MOST cost-effective and requires the least operational overhead?

A.Use Amazon EMR with Spark streaming

B.Use Amazon Athena to create a new table with Parquet format

C.Use AWS Glue ETL jobs scheduled to run every hour

D.Use S3 Events to trigger an AWS Lambda function that transforms the data

AnswerD

Lambda is event-driven, cost-effective, and serverless.

Why this answer

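Option D is correct because S3 Events can trigger a Lambda function to perform the transformation as soon as a new object is created. Option C is wrong because an hourly Glue schedule is not triggered by data arrival, and Glue jobs have startup time and cost more for small, simple conversions. Option A is wrong because EMR requires cluster management.

Option B is wrong because Athena is a query service; a CTAS query can write Parquet, but it must be run separately rather than triggered by each new file.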
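
One detail that often trips up S3-triggered functions: object keys arrive URL-encoded in the event (a space becomes `+`). This sketch covers only the event-handling part of option D; the CSV-to-Parquet conversion itself would need a library such as pyarrow packaged with the function, which is assumed and not shown. The output naming scheme and sample event are hypothetical.

```python
from urllib.parse import unquote_plus


def csv_objects_from_event(event):
    """Return (bucket, source_key, target_key) for each new .csv object."""
    jobs = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys are URL-encoded
        if key.lower().endswith(".csv"):
            target = key[: -len(".csv")] + ".parquet"
            jobs.append((bucket, key, target))
    return jobs


sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "raw-bucket"}, "object": {"key": "sales/day+1.csv"}}},
        {"s3": {"bucket": {"name": "raw-bucket"}, "object": {"key": "sales/readme.txt"}}},
    ]
}
print(csv_objects_from_event(sample_event))
```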

425

MCQhard

A company uses Amazon Kinesis Data Firehose to deliver log data to Amazon S3. The data is transformed by a Lambda function that adds a timestamp field. Recently, the Firehose delivery stream has been failing with 'Lambda invocation failed' errors. The Lambda function's CloudWatch Logs show that the function is timing out. What is the MOST likely cause?

A.The Lambda function lacks permission to write to CloudWatch Logs.

B.The Firehose buffer size is too large, causing the Lambda function to receive too many records.

C.The Lambda function timeout is set to 1 minute, which is adequate.

D.The Lambda function is running out of memory.

AnswerB

Large buffer size leads to large batches that exceed Lambda timeout.

Why this answer

Option B is correct because Firehose sends batches of records to the Lambda function; if the buffer is large, each invocation receives more records and may exceed the function's timeout. Option A is wrong because the function is writing to CloudWatch Logs (that is where the timeouts appear), so it has that permission. Option D is wrong because the logs show timeouts, not out-of-memory errors.

Option C is wrong because a 1-minute timeout, although typical, may be insufficient for large batches, so it cannot be called adequate here.

426

MCQmedium

A company uses Amazon Redshift for its data warehouse. A data engineer notices that queries are running slowly and the system's disk space is nearly full. The engineer runs the STV_PARTITIONS view and sees that many slices have high 'tossed' counts. What does this indicate, and what should the engineer do?

A.The tossed rows are permanent and cannot be reclaimed; the engineer should perform a deep copy to a new table.

B.The tossed rows indicate that the sort key is not optimal; redefining the sort key will reduce tossed rows.

C.The tossed rows are due to data skew; redistribute the table on a different distribution key.

D.The tossed rows are deleted rows that need to be reclaimed by running VACUUM.

AnswerD

VACUUM removes deleted rows and reclaims disk space, improving query performance.

Why this answer

Option D is correct because 'tossed' rows indicate data that was deleted or updated and is waiting for VACUUM. A high tossed count means wasted space, and running VACUUM reclaims that space.

Option B is wrong because 'tossed' does not indicate sort key issues. Option C is wrong because 'tossed' is not about distribution. Option A is wrong because the space can be reclaimed; a deep copy is a more drastic alternative, and VACUUM is the standard action.

427

MCQmedium

A data engineering team is using Amazon DynamoDB to store user session data for a web application. The application experiences sudden spikes in traffic, causing throttling on the DynamoDB table. The team wants to minimize throttling without over-provisioning read/write capacity. Which solution should the team implement?

A.Enable DynamoDB Time to Live (TTL) to automatically delete expired items.

B.Disable auto scaling and manually set a high provisioned capacity.

C.Use Amazon RDS read replicas to offload read traffic.

D.Enable DynamoDB Accelerator (DAX) caching layer.

AnswerD

DAX caches frequently read items, reducing read capacity consumption and throttling.

Why this answer

Option D is correct because DynamoDB Accelerator (DAX) provides a fully managed in-memory cache that reduces read load on the table, thus minimizing throttling for read-heavy workloads. Option B is incorrect because disabling auto scaling would worsen throttling or force over-provisioning. Option C is incorrect because read replicas are an RDS feature, not a DynamoDB one.

Option A is incorrect because TTL only expires old items; it does not reduce throttling.

428

Drag & Dropmedium

Order the steps to troubleshoot a failed AWS Glue job that reads from JDBC and writes to S3.

Why this order

Start with logs to identify errors, then check connectivity, IAM permissions, test connection, and review script.

429

MCQhard

A financial services company ingests real-time stock trade data from multiple exchanges into Amazon Kinesis Data Streams. Each trade record is a JSON object containing fields: trade_id, symbol, price, quantity, and timestamp. The data is consumed by an AWS Lambda function that performs data validation and enrichment, then writes the processed records to an Amazon DynamoDB table for low-latency querying. Recently, the Lambda function has been timing out and failing to process all records. The Lambda function is configured with a 5-second timeout and 128 MB memory. The average record size is 2 KB, and the stream receives about 1000 records per second. The Lambda function's concurrency limit is 1000. Which set of actions should the data engineer take to resolve the issue without losing data?

A.Increase the Lambda function timeout to 60 seconds and memory to 1024 MB. Set the batch size to 100 records and enable parallelization factor of 10.

B.Increase the number of shards in the Kinesis data stream to 20 and keep the Lambda configuration unchanged.

C.Replace the Lambda function with a Kinesis Data Firehose delivery stream that writes directly to DynamoDB using a Lambda transformation.

D.Increase the Lambda function timeout to 60 seconds and memory to 1024 MB. Set the batch size to 100 records.

AnswerA

This combination increases processing capacity and prevents timeouts.

Why this answer

Option A is correct because raising the timeout and memory removes the per-invocation bottleneck (Lambda allocates CPU in proportion to memory), while a batch size of 100 with a parallelization factor of 10 lets each shard run up to 10 batches concurrently. Lambda concurrency for the stream is the number of shards multiplied by the parallelization factor, so throughput scales with both. For example, if a 100-record batch takes about 1 second, 10 concurrent batches per shard sustain about 1,000 records per second per shard, enough for the stated load. As long as shards × 10 stays within the concurrency limit of 1,000, the consumers keep up with ingestion and records are processed before they expire from the stream.

Exam trap

The trap here is that candidates often overlook the parallelization factor, assuming that a larger batch size and more Lambda resources alone will suffice. Without parallelization, each shard processes only one batch at a time, which creates a throughput bottleneck; if consumers fall behind for longer than the stream's retention period (24 hours by default), records expire and are lost.

How to eliminate wrong answers

Option B is wrong because simply increasing shards to 20 does not resolve the Lambda timeout issue; the function still has only 5 seconds and 128 MB, so it will continue to time out even with more shards. Option C is wrong because Kinesis Data Firehose cannot write directly to DynamoDB; its destinations include S3, Redshift, Amazon OpenSearch Service, Splunk, and HTTP endpoints, and a Lambda transformation would still need sufficient timeout and memory. Option D is wrong because raising timeout and memory without a parallelization factor leaves each shard processing one batch at a time; if a 100-record batch takes about 1 second, that is only about 100 records per second per shard, well short of the 1,000 records/sec load.
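
The capacity reasoning can be checked with a little arithmetic. This sketch assumes the per-shard write limits of 1,000 records/s and 1 MB/s (taken as 1 MiB here) and a hypothetical processing time of 1 second per 100-record batch; the real duration depends on the function's work. Under those assumptions, the 1,000 records/s of about 2 KB each need at least 2 shards on the write side, and a parallelization factor of 10 gives the consumers 10 times the throughput of one batch per shard.

```python
import math

SHARD_RECORDS_PER_S = 1000        # per-shard write limit (records)
SHARD_BYTES_PER_S = 1024 * 1024   # per-shard write limit, 1 MB/s taken as 1 MiB


def min_shards(records_per_s: int, record_bytes: int) -> int:
    """Fewest shards that satisfy both per-shard write limits."""
    by_count = records_per_s / SHARD_RECORDS_PER_S
    by_bytes = records_per_s * record_bytes / SHARD_BYTES_PER_S
    return math.ceil(max(by_count, by_bytes))


def consumer_capacity(shards, parallelization_factor, batch_size, seconds_per_batch):
    """Return (concurrent Lambda invocations, records/s they can process)."""
    concurrency = shards * parallelization_factor
    return concurrency, concurrency * batch_size / seconds_per_batch


shards = min_shards(1000, 2 * 1024)   # 1,000 records/s of about 2 KB each
for factor in (1, 10):
    conc, rate = consumer_capacity(shards, factor, batch_size=100, seconds_per_batch=1.0)
    print(f"shards={shards} factor={factor}: {conc} invocations, {rate:,.0f} records/s")
```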

430

MCQhard

A company has an S3 bucket with versioning enabled and a bucket policy that denies access if the request does not include encryption. A data engineer notices that some objects are not encrypted. What is the most likely cause?

A.The bucket policy does not evaluate requests from the same account.

B.The policy only applies to new uploads; existing objects remain unencrypted.

C.Default encryption was not enabled at the bucket level.

D.Versioning was enabled after the objects were uploaded.

AnswerB

Bucket policies do not retroactively encrypt existing objects.

Why this answer

Option B is correct because a bucket policy that denies unencrypted requests only affects new uploads; it does not encrypt objects that already exist. Option A is wrong because the bucket policy evaluates all requests, including those from the same account. Option D is wrong because versioning does not affect encryption.

Option C is wrong because default encryption also applies only to new objects, so it would not explain the existing unencrypted ones either.

431

MCQeasy

A data engineer notices that a nightly AWS Glue ETL job has been failing for the past three days with the error 'Unable to locate credentials'. The job uses an IAM role for execution. What is the most likely cause of this error?

A.The IAM role does not have an access key attached.

B.The S3 bucket name in the job parameters is misspelled.

C.The IAM role's trust policy does not include glue.amazonaws.com as a trusted entity.

D.The JDBC connection string contains an incorrect password.

AnswerC

Without the trust policy, Glue cannot assume the role and gets 'Unable to locate credentials'.

Why this answer

The error 'Unable to locate credentials' indicates that the AWS Glue job cannot obtain AWS credentials to authenticate API calls. Since the job uses an IAM role for execution, the most likely cause is that the trust policy of that IAM role does not include 'glue.amazonaws.com' as a trusted entity. Without this trust relationship, AWS Glue cannot assume the role and thus has no credentials to sign requests.

Exam trap

AWS often tests the distinction between IAM role trust policies (who can assume the role) and IAM role permission policies (what actions the role can perform), and candidates mistakenly focus on permission policies when the error is about credential acquisition.

How to eliminate wrong answers

Option A is wrong because IAM roles do not use access keys; they use temporary security credentials obtained via the AWS Security Token Service (STS). Option B is wrong because a misspelled S3 bucket name would cause a 'NoSuchBucket' or 'Access Denied' error, not a credentials-related error. Option D is wrong because an incorrect JDBC password would result in a connection failure or authentication error from the database, not an 'Unable to locate credentials' error from AWS.
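For reference, a trust policy that lets AWS Glue assume the role looks like the dict below. The checker is a simplified sketch (it ignores `Condition` and `NotPrincipal`) that only shows what "glue.amazonaws.com as a trusted entity" means in the policy document.

```python
GLUE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "glue.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}


def _as_list(value):
    # IAM allows either a single string or a list for these fields.
    return [value] if isinstance(value, str) else list(value)


def service_can_assume(trust_policy: dict, service: str) -> bool:
    """True if an Allow statement lets `service` call sts:AssumeRole."""
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for stmt in statements:
        principal = stmt.get("Principal", {})
        if stmt.get("Effect") != "Allow" or not isinstance(principal, dict):
            continue
        actions = _as_list(stmt.get("Action", []))
        services = _as_list(principal.get("Service", []))
        if "sts:AssumeRole" in actions and service in services:
            return True
    return False


print(service_can_assume(GLUE_TRUST_POLICY, "glue.amazonaws.com"))
```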


432

Multi-Selectmedium

A data engineering team uses AWS Glue to extract, transform, and load (ETL) data from Amazon RDS for MySQL to Amazon S3. The job runs daily and processes incremental data. The team notices that the job is taking longer than expected. Which TWO actions can improve the job performance? (Choose two.)

Select 2 answers

A.Change the worker type to Standard (single node).

B.Use pushdown predicates to filter data at the source.

C.Add more transformations to the ETL script to clean data.

D.Increase the number of DPUs for the Glue job.

E.Disable compression on the output data to reduce CPU usage.

AnswersB, D

Reduces the data scanned and transferred from RDS.

Why this answer

D is correct because increasing the number of DPUs (Data Processing Units) allocates more compute and memory to the Glue job, so Spark can process more data in parallel. B is correct because pushdown predicates filter data at the source, reducing the amount of data read and transferred. E is wrong because writing uncompressed output to S3 increases the amount of data written and stored.

A is wrong because a single Standard worker reduces parallelism. C is wrong because adding more transformations increases processing time.
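A sketch of the filter an incremental job could push toward the source so MySQL returns only changed rows. The `updated_at` column and `orders` table are hypothetical, and how the string reaches Glue is an assumption: for a partitioned Data Catalog table it would go in the `push_down_predicate` argument of `create_dynamic_frame.from_catalog`, while for a JDBC read it could be embedded in the query the reader runs.

```python
import re
from datetime import datetime, timezone

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def incremental_predicate(column: str, since: datetime) -> str:
    """SQL filter selecting rows changed since the last run."""
    if not _IDENT.match(column):
        raise ValueError(f"unsafe column name: {column!r}")
    return f"{column} >= '{since:%Y-%m-%d %H:%M:%S}'"


def incremental_query(table: str, column: str, since: datetime) -> str:
    """Full SELECT for the incremental window (hypothetical table/column)."""
    if not _IDENT.match(table):
        raise ValueError(f"unsafe table name: {table!r}")
    return f"SELECT * FROM {table} WHERE {incremental_predicate(column, since)}"


last_run = datetime(2024, 10, 1, tzinfo=timezone.utc)
print(incremental_query("orders", "updated_at", last_run))
# SELECT * FROM orders WHERE updated_at >= '2024-10-01 00:00:00'
```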


433

MCQhard

An Amazon RDS for PostgreSQL instance is experiencing high CPU utilization and slow query performance. The data engineer suspects that a specific query is causing the problem. The engineer wants to identify the query and analyze its execution plan. Which steps should the engineer take?

A.Enable CloudWatch Logs for the RDS instance and search for slow query logs.

B.Enable the performance_schema in the PostgreSQL parameter group and query the performance_schema.events_statements_summary_by_digest table.

C.Enable Enhanced Monitoring and analyze the CPU metrics.

D.Use RDS Performance Insights to identify the top queries.

AnswerD

Performance Insights shows which SQL statements contribute the most database load.

Why this answer

Option D is correct because RDS Performance Insights breaks down database load by SQL statement, so the engineer can identify the query driving CPU utilization; running EXPLAIN (ANALYZE) on that statement then shows its execution plan. Option B is incorrect because performance_schema and events_statements_summary_by_digest are MySQL features; PostgreSQL has no performance_schema, and its closest equivalent is the pg_stat_statements extension. Option C is incorrect because Enhanced Monitoring provides OS-level metrics, not query details.

Option A is incorrect because PostgreSQL slow-query logging captures nothing until parameters such as log_min_duration_statement are set, and the logs alone do not show execution plans.
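PostgreSQL has no performance_schema; comparable per-statement statistics come from the pg_stat_statements extension, and plans come from EXPLAIN. The sketch below ranks pg_stat_statements-style rows and wraps the worst statement in EXPLAIN. Assumptions: PostgreSQL 13+ column names (older versions use `total_time`), the extension has been created with `CREATE EXTENSION pg_stat_statements` and is loaded through `shared_preload_libraries` (check the parameter group), and the sample rows are made up.

```python
# Query an engineer might run against the instance (PostgreSQL 13+ columns).
TOP_SQL = """
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
"""


def top_queries(rows: list, n: int = 5) -> list:
    """Rank pg_stat_statements-style rows by cumulative execution time."""
    return sorted(rows, key=lambda r: r["total_exec_time"], reverse=True)[:n]


def explain_statement(query: str) -> str:
    """Wrap a statement in EXPLAIN (ANALYZE, BUFFERS) to see its actual plan.

    ANALYZE really executes the statement, and pg_stat_statements normalizes
    constants to $1, $2, ..., so substitute real values before running it.
    """
    return f"EXPLAIN (ANALYZE, BUFFERS) {query.rstrip().rstrip(';')};"


sample = [  # made-up rows for illustration
    {"query": "SELECT * FROM orders WHERE status = $1", "calls": 120,
     "total_exec_time": 9500.0, "mean_exec_time": 79.2},
    {"query": "SELECT id FROM users WHERE id = $1", "calls": 5000,
     "total_exec_time": 400.0, "mean_exec_time": 0.08},
]
worst = top_queries(sample, 1)[0]
print(explain_statement(worst["query"]))
# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE status = $1;
```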


434

MCQmedium

A data engineer is configuring Amazon S3 Lifecycle policies to transition objects between storage classes. The data is accessed frequently for the first 30 days, then rarely for the next 90 days, after which it must be archived. The engineer wants to minimize costs while ensuring immediate retrieval for the first 30 days. Which lifecycle policy should the engineer implement?

A.Transition to Glacier Flexible Retrieval after 30 days, then delete after 120 days

B.Transition to One Zone-IA after 30 days, then to Glacier Deep Archive after 120 days

C.Transition to Glacier Deep Archive after 30 days, then delete after 120 days

D.Transition to Standard-IA after 30 days, then to Glacier Deep Archive after 120 days

AnswerD

Standard-IA is cost-effective for rarely accessed data; Glacier Deep Archive is cheapest for archiving.

Why this answer

Option D is correct because it transitions objects from S3 Standard (immediate retrieval, frequent access) to S3 Standard-IA (lower cost for infrequent access, immediate retrieval) after 30 days, then to S3 Glacier Deep Archive (lowest-cost archival storage) after 120 days. This matches the access pattern: frequent for 30 days, rare for 90 days, then archived, while minimizing cost and maintaining immediate retrieval for the first 30 days.

Exam trap

The trap here is that candidates often choose Glacier Deep Archive too early (e.g., after 30 days) to minimize cost, forgetting that the data must be immediately retrievable for the first 30 days and rarely accessed but still retrievable for the next 90 days, which requires a storage class with immediate retrieval (Standard-IA) before archiving.

How to eliminate wrong answers

Option A is wrong because Glacier Flexible Retrieval requires a restore (minutes to hours) before data can be read, which is a poor fit for data that is still accessed during days 31 to 120, and the policy deletes the data at 120 days instead of archiving it. Option B is wrong because One Zone-IA, although cheaper than Standard-IA, stores data in a single Availability Zone, so the data is lost if that zone is destroyed; it suits only data that can be re-created. Option C is wrong because Glacier Deep Archive restores take 12 to 48 hours, so the rarely accessed data would not be promptly retrievable during days 31 to 120, and deleting it at 120 days both violates the archival requirement and incurs Deep Archive's 180-day minimum storage charge.
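Option D expressed as a lifecycle configuration, as a sketch assuming the request shape of boto3's `put_bucket_lifecycle_configuration`; the rule ID and bucket name are hypothetical. Transition days count from object creation.

```python
def archive_lifecycle(prefix: str = "") -> dict:
    """Standard for days 0-30, Standard-IA for days 30-120, then Deep Archive."""
    return {
        "Rules": [{
            "ID": "standard-ia-then-deep-archive",  # hypothetical rule name
            "Filter": {"Prefix": prefix},  # empty prefix = whole bucket
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 120, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }]
    }


config = archive_lifecycle()
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=config)
```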


435

Multi-Selectmedium

A data engineer is designing a data ingestion pipeline for IoT sensor data. The sensors send JSON messages every second, and the data must be stored in Amazon S3 in near real-time (within 5 minutes). The engineer also needs to transform the data by adding a timestamp and filtering out malformed records. Which TWO services should be used together?

Select 2 answers

A.AWS Glue

B.Amazon Athena

C.Amazon Simple Queue Service (SQS)

D.AWS IoT Core

E.Amazon Kinesis Data Firehose

AnswersD, E

IoT Core can receive sensor messages and route to Firehose.

Why this answer

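AWS IoT Core can ingest the sensor messages and route them to Kinesis Data Firehose through a rule action. Firehose buffers the data and writes it to S3 within the 5-minute window when its buffering interval is 300 seconds or less, and it can invoke a Lambda function to add the timestamp and filter out malformed records during delivery. Option C is wrong because SQS is not needed. Option A is wrong because Glue is for batch ETL, not near-real-time delivery.

Option B is wrong because Athena is for querying data, not ingesting it.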
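The Lambda transformation step could look like the handler below: a sketch assuming the Firehose record-transformation contract (each output record returns its `recordId`, a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, and base64 `data`). The `ingested_at` field name is hypothetical. Malformed records are marked `ProcessingFailed` so Firehose writes them to the error output instead of the main prefix; returning `Dropped` would discard them instead.

```python
import base64
import json
import time


def lambda_handler(event, context):
    """Add an ingestion timestamp and reject records that are not JSON objects."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            if not isinstance(payload, dict):
                raise ValueError("expected a JSON object")
        except ValueError:  # covers JSONDecodeError and bad base64
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue
        payload["ingested_at"] = int(time.time() * 1000)  # epoch milliseconds
        data = (json.dumps(payload) + "\n").encode("utf-8")  # newline-delimited
        output.append({"recordId": record["recordId"],
                       "result": "Ok",
                       "data": base64.b64encode(data).decode("ascii")})
    return {"records": output}
```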


436

Multi-Selectmedium

A data engineer is designing a data pipeline that ingests streaming data from IoT devices into Amazon S3 using Amazon Kinesis Data Firehose. The data must be transformed from JSON to Parquet format before storage. Which TWO actions should the data engineer take to achieve this?

Select 2 answers

A.Enable Firehose's built-in Parquet conversion without any additional configuration.

B.Use Amazon Kinesis Data Analytics to convert the data format.

C.Configure Firehose to convert the data to Apache Avro format.

D.Create a Glue Data Catalog table defining the schema and configure Firehose to use the table for Parquet conversion.

E.Create an AWS Lambda function to transform the data to Parquet and use it as a Firehose transformation.

AnswersD, E

Firehose can use the schema from Glue Data Catalog to convert to Parquet.

Why this answer

Kinesis Data Firehose can convert JSON to Parquet using a schema from a Glue Data Catalog table. Option D is correct because Firehose record format conversion reads that schema from the Data Catalog table. Option E is correct because Firehose can invoke an AWS Lambda function to transform records before delivery.

Option A is wrong because Firehose cannot convert to Parquet without a schema; it needs a Glue Data Catalog table. Option B is wrong because Kinesis Data Analytics is for real-time analytics, not format conversion. Option C is wrong because Firehose record format conversion outputs Apache Parquet or Apache ORC, not Avro, and the requirement is Parquet.
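A sketch of the record format conversion settings from Option D, assuming the `DataFormatConversionConfiguration` shape used inside `ExtendedS3DestinationConfiguration` when calling `create_delivery_stream`. The database, table, role ARN, and Region values are hypothetical.

```python
def parquet_conversion_config(database: str, table: str,
                              role_arn: str, region: str) -> dict:
    """JSON in (OpenX JSON SerDe), Parquet out, schema from the Glue table."""
    return {
        "Enabled": True,
        "SchemaConfiguration": {
            "DatabaseName": database,
            "TableName": table,
            "RoleARN": role_arn,  # role Firehose uses to read the catalog
            "Region": region,
        },
        "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
        "OutputFormatConfiguration": {"Serializer": {"ParquetSerDe": {}}},
    }


conversion = parquet_conversion_config(
    "iot_db", "sensor_readings",  # hypothetical catalog names
    "arn:aws:iam::123456789012:role/example-firehose-role",
    "us-east-1",
)
```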


437

MCQhard

A company runs a data lake on Amazon S3 with AWS Glue and Amazon Athena. The security team recently ran a report using Amazon Macie and found that multiple S3 objects containing PII are publicly accessible. The data engineer is tasked with remediating this issue immediately. The S3 bucket is configured with a bucket policy that grants public read access to all objects. The data engineer needs to ensure that no objects are publicly accessible while maintaining the ability for authorized IAM users and roles to access the data via Athena. The bucket must also remain accessible to the Glue crawler. What is the MOST effective course of action?

A.Use Amazon Macie to automatically remediate the public access by updating the object ACLs.

B.Remove the bucket policy granting public access and attach an IAM policy to the Glue and Athena roles to allow access to the bucket.

C.Set the bucket ACL to private and add a bucket policy that allows access to the Glue crawler and Athena.

D.Enable S3 Block Public Access on the bucket and use a bucket policy to allow access from the Glue and Athena service principals.

AnswerB

This removes public access while allowing authorized access.

Why this answer

Option B is correct because removing the bucket policy statement that grants public access ends the exposure, while IAM policies attached to the roles used by the Glue crawler and by Athena users keep authorized access working. Option A is incorrect because Macie discovers and reports sensitive data; it does not enforce access control. Option D is incorrect because Glue and Athena read S3 with the IAM credentials of the crawler role or the querying principal, not as service principals, so a bucket policy naming service principals would not grant the needed access; Block Public Access is still a good additional safeguard.

Option C is incorrect because ACLs are legacy, and setting the bucket ACL to private does not remove the bucket policy statement that grants public read access.


438

MCQhard

A company has a requirement to store audit logs for 7 years for compliance. The logs are stored in S3 and must be immutable. Which S3 feature should be used?

A.Use a bucket policy that denies s3:DeleteObject

B.Enable MFA Delete on the bucket

C.Enable S3 Versioning and set a lifecycle policy

D.Enable S3 Object Lock in compliance mode

AnswerD

Compliance mode prevents any user, including root, from overwriting or deleting objects.

Why this answer

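S3 Object Lock prevents objects from being deleted or overwritten for a specified retention period, and in compliance mode no user, including the root user, can shorten or remove that retention. Option C is wrong because versioning keeps old versions but does not prevent them from being deleted, and a lifecycle policy only transitions or expires objects. Option B is wrong because MFA Delete only adds an MFA requirement to permanent deletes; the root user with the MFA device can still delete objects.

Option A is wrong because a bucket policy can be edited or removed by anyone with permission to change it, so it does not make objects immutable. Option D is correct.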
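A sketch of a default seven-year compliance-mode retention, assuming the `ObjectLockConfiguration` shape accepted by boto3's `put_object_lock_configuration`. Object Lock works on versioned buckets; the bucket name in the comment is hypothetical.

```python
def compliance_lock_config(years: int = 7) -> dict:
    """Default retention applied to every new object version in the bucket."""
    if years < 1:
        raise ValueError("retention must be at least one year in this sketch")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": years}},
    }


lock = compliance_lock_config()
# boto3.client("s3").put_object_lock_configuration(
#     Bucket="example-audit-logs", ObjectLockConfiguration=lock)
```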


439

MCQmedium

Refer to the exhibit. A data engineer has an IAM policy attached to an IAM role used by an AWS Glue job. The Glue job reads from S3 bucket 'example-bucket' and writes to an S3 bucket 'output-bucket'. The job fails with an 'Access Denied' error when writing to 'output-bucket'. What is the MOST likely cause?

A.The policy does not allow s3:PutObject on any bucket.

B.The policy does not allow s3:PutObject on 'output-bucket'.

C.The policy does not allow s3:GetObject on 'output-bucket'.

D.The policy has a condition that restricts s3:PutObject to 'example-bucket'.

AnswerB

The s3:PutObject permission covers only example-bucket/*.

Why this answer

Option B is correct. The policy allows s3:PutObject only on 'example-bucket/*', not on 'output-bucket/*', and the job needs that permission on the output bucket.

Option A is incorrect because s3:PutObject is allowed on example-bucket, just not on output-bucket. Option D is incorrect because there is no condition that restricts PutObject to example-bucket; the restriction comes from the Resource element. Option C is incorrect because s3:GetObject is needed only for reading, and the failure occurs when writing.
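The exhibit is not reproduced here, so the policy below is a hypothetical reconstruction matching the explanation (PutObject allowed only on `example-bucket/*`). The evaluator is deliberately simplified: explicit Deny wins, otherwise any matching Allow, with only `*` and `?` wildcards and no Conditions.

```python
from fnmatch import fnmatchcase

POLICY = {  # hypothetical reconstruction, not the original exhibit
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"],
         "Resource": "arn:aws:s3:::example-bucket/*"},
    ],
}


def _as_list(value):
    return value if isinstance(value, list) else [value]


def is_allowed(policy: dict, action: str, resource: str) -> bool:
    """Simplified IAM evaluation for one action on one resource ARN."""
    allowed = False
    for stmt in policy["Statement"]:
        if any(fnmatchcase(action, a) for a in _as_list(stmt["Action"])) and \
           any(fnmatchcase(resource, r) for r in _as_list(stmt["Resource"])):
            if stmt["Effect"] == "Deny":
                return False  # an explicit deny always wins
            allowed = True
    return allowed


print(is_allowed(POLICY, "s3:PutObject", "arn:aws:s3:::output-bucket/part-0000.parquet"))
# False: no statement covers output-bucket
```

The fix is to add `arn:aws:s3:::output-bucket/*` to the Resource list of the statement that allows `s3:PutObject`.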


440

Multi-Selecthard

A data engineer needs to ensure that an S3 bucket policy follows the principle of least privilege. Which of the following are valid conditions to restrict access based on the requester's identity? (Choose THREE.)

Select 3 answers

A.aws:PrincipalOrgID

B.s3:x-amz-server-side-encryption

C.aws:Referer

D.aws:SourceIp

E.aws:userId

AnswersA, D, E

Restricts to accounts in an AWS Organization.

Why this answer

Options A, D, and E are correct. aws:PrincipalOrgID restricts access to principals in a specific AWS Organization. aws:SourceIp restricts access based on the requester's IP address. aws:userId restricts access to the requester's unique principal ID. Option C is wrong because aws:Referer checks the HTTP Referer header, which any client can set, so it does not establish identity. Option B is wrong because s3:x-amz-server-side-encryption controls encryption, not identity.
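As an example of the first condition key, the sketch below builds a bucket policy that allows reads only to principals in one AWS Organization. The bucket name and organization ID are hypothetical placeholders.

```python
def org_only_read_policy(bucket: str, org_id: str) -> dict:
    """Allow s3:GetObject only when the caller belongs to `org_id`."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowReadFromOrgPrincipalsOnly",
            "Effect": "Allow",
            "Principal": "*",  # narrowed by the condition below
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
            "Condition": {"StringEquals": {"aws:PrincipalOrgID": org_id}},
        }],
    }


org_policy = org_only_read_policy("example-bucket", "o-exampleorgid")
```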


441

MCQhard

A company uses Amazon S3 to store sensitive financial data. The security team requires that all objects be encrypted at rest using AWS KMS with a customer-managed key. Additionally, they want to audit all KMS decrypt calls for compliance. Which configuration should be used to meet these requirements?

A.Enable default encryption on the bucket with SSE-KMS using an AWS managed key.

B.Use SSE-S3 with a bucket policy that denies uploads without encryption.

C.Use SSE-KMS with a customer-managed KMS key and enable CloudTrail data events for the key.

D.Use SSE-C with client-managed keys and log S3 API calls.

AnswerC

SSE-KMS with customer-managed key and CloudTrail auditing meets requirements.

Why this answer

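Option C is correct because SSE-KMS with a customer-managed key lets the company control the key policy, rotation, and lifecycle, meeting the requirement for customer-managed keys. Logging KMS activity in CloudTrail captures every Decrypt call made with the key, providing the audit trail needed for compliance; strictly, CloudTrail records KMS API calls as management events rather than data events, so what matters is that the trail does not exclude KMS events.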

Exam trap

The trap here is that candidates may confuse enabling default encryption on the bucket (which can use SSE-KMS with an AWS managed key) with the requirement for a customer-managed key and CloudTrail auditing, or they may think SSE-S3 or SSE-C can satisfy the audit requirement without KMS-specific logging.

How to eliminate wrong answers

Option A is wrong because it uses an AWS managed key, not a customer-managed key, so the security team cannot control key rotation or access policies. Option B is wrong because SSE-S3 uses server-side encryption with S3-managed keys, which does not provide customer-managed key control, and the bucket policy only enforces encryption, not auditing of decrypt calls. Option D is wrong because SSE-C requires the client to manage the encryption keys, which does not meet the requirement for AWS KMS, and logging S3 API calls alone does not capture KMS decrypt events.
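A sketch of both halves of Option C, with hypothetical ARNs: bucket default encryption with a customer-managed key (the `ServerSideEncryptionConfiguration` shape used by boto3's `put_bucket_encryption`), and a filter that picks Decrypt calls for that key out of CloudTrail records. The filter assumes the usual CloudTrail layout in which KMS events list the key ARN in their `resources` array.

```python
KEY_ARN = "arn:aws:kms:us-east-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab"


def kms_default_encryption(key_arn: str) -> dict:
    """Default SSE-KMS encryption using a customer-managed key."""
    if not key_arn.startswith("arn:aws:kms:"):
        raise ValueError("expected a KMS key ARN")
    return {
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": key_arn,
            },
        }]
    }


def decrypt_events(records: list, key_arn: str) -> list:
    """CloudTrail records for kms:Decrypt calls that used `key_arn`."""
    return [
        r for r in records
        if r.get("eventSource") == "kms.amazonaws.com"
        and r.get("eventName") == "Decrypt"
        and any(res.get("ARN") == key_arn for res in r.get("resources", []))
    ]


trail = [  # made-up records for illustration
    {"eventSource": "kms.amazonaws.com", "eventName": "Decrypt",
     "resources": [{"type": "AWS::KMS::Key", "ARN": KEY_ARN}]},
    {"eventSource": "kms.amazonaws.com", "eventName": "GenerateDataKey",
     "resources": [{"type": "AWS::KMS::Key", "ARN": KEY_ARN}]},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject"},
]
print(len(decrypt_events(trail, KEY_ARN)))  # 1
```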


442

MCQeasy

A startup is building a mobile application that requires a database to store user profiles and preferences. The database must scale automatically with minimal administration. Which AWS service should they use?

A.Amazon Redshift

B.Amazon Aurora

C.Amazon DynamoDB

D.Amazon RDS for PostgreSQL

AnswerC

DynamoDB scales automatically with on-demand capacity.

Why this answer

Option C is correct because Amazon DynamoDB is a fully managed NoSQL database that scales automatically. Option D (RDS for PostgreSQL) requires manually resizing the instance. Option A (Redshift) is for data warehousing.

Option B (Aurora) is relational and requires some capacity management.
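A sketch of an on-demand table for the user profiles, assuming the keyword arguments of boto3's DynamoDB `create_table`; the table name and `user_id` key are hypothetical. `PAY_PER_REQUEST` is the on-demand billing mode, so no read or write capacity has to be provisioned.

```python
def user_profiles_table(name: str = "UserProfiles") -> dict:
    """create_table arguments for an on-demand table keyed by user_id."""
    return {
        "TableName": name,
        "AttributeDefinitions": [{"AttributeName": "user_id", "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": "user_id", "KeyType": "HASH"}],
        "BillingMode": "PAY_PER_REQUEST",  # on-demand capacity
    }


table_args = user_profiles_table()
# boto3.client("dynamodb").create_table(**table_args)
```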


443

Multi-Selecthard

Which TWO of the following are best practices for Amazon Redshift table design? (Choose TWO.)

Select 2 answers

A.Choose sort keys based on query patterns

B.Use INSERT statements for large data loads

C.Avoid compression encoding to reduce CPU overhead

D.Specify distribution keys to minimize data movement

E.Set distribution style to ALL for all tables

AnswersA, D

Sort keys improve query performance for range-filtered queries.

Why this answer

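Options A and D are correct. Choosing sort keys based on query patterns and distribution keys that minimize data movement are best practices. Option C is wrong because compression encoding should be applied, not avoided; it reduces storage and disk I/O.

Option B is wrong because COPY, which loads data in parallel, is the preferred method for large data loads rather than INSERT statements. Option E is wrong because ALL distribution copies the entire table to every node, which suits only small, slowly changing tables and should not be the default.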
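To make the two correct options concrete, the helper below generates Redshift DDL with an explicit distribution key, a compound sort key, and per-column encodings. It is a sketch: the `sales` table, its columns, and the chosen encodings are hypothetical examples, not tuned recommendations.

```python
def redshift_ddl(table: str, columns: list, distkey: str, sortkeys: list) -> str:
    """CREATE TABLE with DISTKEY, COMPOUND SORTKEY, and ENCODE per column.

    columns: list of (name, type, encoding) tuples.
    """
    names = {c[0] for c in columns}
    for key in [distkey, *sortkeys]:
        if key not in names:
            raise ValueError(f"unknown column: {key}")
    cols = ",\n    ".join(f"{n} {t} ENCODE {e}" for n, t, e in columns)
    return (f"CREATE TABLE {table} (\n    {cols}\n)\n"
            f"DISTSTYLE KEY\nDISTKEY ({distkey})\n"
            f"COMPOUND SORTKEY ({', '.join(sortkeys)});")


ddl = redshift_ddl(
    "sales",
    [("sale_id", "BIGINT", "AZ64"),
     ("customer_id", "INTEGER", "AZ64"),  # join column -> distribution key
     ("sale_date", "DATE", "AZ64"),       # range filters -> sort key
     ("region", "VARCHAR(32)", "ZSTD"),
     ("amount", "DECIMAL(12,2)", "AZ64")],
    distkey="customer_id",
    sortkeys=["sale_date"],
)
print(ddl)
```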


444

MCQeasy

Refer to the exhibit. A company uses S3 Event Notifications to trigger an AWS Lambda function whenever a new object is uploaded to an S3 bucket. The Lambda function processes the file and moves it to a different bucket. Recently, the function has been failing intermittently. The engineer checks the Lambda CloudWatch logs and sees the above event. What is the MOST likely cause of the intermittent failures?

A.The event JSON is malformed; the 'EventSource' should be 's3.amazonaws.com'.

B.The S3 bucket name contains a hyphen, which is not allowed.

C.The S3 event notifications are not guaranteed to be delivered exactly once, causing duplicate processing.

D.The event is missing the 'object:versionId' field.

AnswerC

S3 event notifications are delivered at least once, so the same event can reach the function more than once.

Why this answer

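Option C is correct because S3 Event Notifications are delivered at least once and can occasionally be duplicated or arrive out of order. If a duplicate event reaches the function after an earlier invocation has already moved the object, the second invocation fails because the object no longer exists, which shows up as intermittent failures. Option A is wrong because S3 event records identify their source as 'aws:s3', not 's3.amazonaws.com', so the EventSource value is not the problem. Option B is wrong because hyphens are allowed in S3 bucket names.

Option D is wrong because versionId appears in the event only when the bucket has versioning enabled, so its absence does not indicate a malformed event.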


445

MCQmedium

A data engineer is designing a pipeline that ingests JSON logs from an application into Amazon S3. The logs contain a timestamp field. The pipeline must partition the data by date in S3 (e.g., year=2024/month=10/day=01). Which approach minimizes transformation effort?

A.Use Amazon Kinesis Data Firehose with dynamic partitioning

B.Use AWS Glue crawlers to infer schema and create partitions

C.Use AWS Lambda to process each object and copy to the appropriate prefix

D.Use Amazon Athena to create partitions on the existing data

AnswerA

Firehose can dynamically partition data based on the timestamp and deliver to S3 partitioned prefixes.

Why this answer

Option A is correct because Kinesis Data Firehose dynamic partitioning can extract the date from each record's timestamp field and deliver the records to partitioned S3 prefixes such as year=2024/month=10/day=01 without custom code.

Option C is wrong because Lambda would require custom code to read, route, and copy each object. Option D is wrong because Athena is a query service; it can register partitions but does not reorganize the data into date prefixes. Option B is wrong because Glue crawlers catalog the existing prefix layout as metadata; they do not move data into new partitions.
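A sketch of the dynamic partitioning settings, assuming the `ExtendedS3DestinationConfiguration` fields for a JQ metadata-extraction processor and `partitionKeyFromQuery` prefix expressions. The `timestamp` field name and the `logs/` and `errors/` prefixes are hypothetical. Because JQ cannot run here, `expected_prefix` reproduces locally the prefix the query should yield for one record.

```python
from datetime import datetime

# Hypothetical field: each log record carries an ISO-8601 "timestamp".
JQ_QUERY = "{year: .timestamp[0:4], month: .timestamp[5:7], day: .timestamp[8:10]}"

PARTITIONED_S3_SETTINGS = {
    "Prefix": ("logs/year=!{partitionKeyFromQuery:year}/"
               "month=!{partitionKeyFromQuery:month}/"
               "day=!{partitionKeyFromQuery:day}/"),
    "ErrorOutputPrefix": "errors/!{firehose:error-output-type}/",
    "DynamicPartitioningConfiguration": {"Enabled": True},
    "ProcessingConfiguration": {
        "Enabled": True,
        "Processors": [{
            "Type": "MetadataExtraction",
            "Parameters": [
                {"ParameterName": "MetadataExtractionQuery", "ParameterValue": JQ_QUERY},
                {"ParameterName": "JsonParsingEngine", "ParameterValue": "JQ-1.6"},
            ],
        }],
    },
}


def expected_prefix(record: dict) -> str:
    """Local equivalent of the prefix the JQ query should produce."""
    ts = datetime.fromisoformat(record["timestamp"].replace("Z", "+00:00"))
    return f"logs/year={ts:%Y}/month={ts:%m}/day={ts:%d}/"


print(expected_prefix({"timestamp": "2024-10-01T08:15:00Z"}))
# logs/year=2024/month=10/day=01/
```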


446

Multi-Selecthard

A company is migrating a large Oracle data warehouse to Amazon Redshift. Which THREE considerations are important for optimizing the Redshift cluster?

Select 3 answers

A.Purchasing reserved instances for the cluster.

B.Using columnar storage format.

C.Defining appropriate sort keys for the tables.

D.Applying compression encoding to columns.

E.Choosing the right distribution style (KEY, ALL, EVEN).

AnswersC, D, E

Sort keys improve query performance by letting Redshift skip blocks that fall outside a query's filter range.

Why this answer

Distribution style, sort keys, and compression encoding are key to Redshift performance. Option B is wrong because columnar storage is inherent to Redshift, not something to configure.

Option A is wrong because reserved instances reduce cost; they do not optimize performance.


447

MCQhard

An e-commerce company uses AWS Glue to process clickstream data from its website. The data is stored in Amazon S3 in partitioned Parquet format by date and hour. A recent increase in traffic has caused the Glue job to fail with 'Java heap space' errors. The job runs with 10 DPUs and uses Spark's default configurations. The data engineer needs to resolve the memory issue without modifying the ETL script. What should the data engineer do?

A.Decrease the Spark configuration 'spark.sql.shuffle.partitions' to 50.

B.Change the worker type to G.1X.

C.Increase the Spark configuration 'spark.sql.shuffle.partitions' to 500.

D.Increase the number of DPUs to 20.

AnswerC

More shuffle partitions reduce the size of data per partition, mitigating memory issues.

Why this answer

Option C is correct. Increasing spark.sql.shuffle.partitions spreads shuffled data across more, smaller partitions, reducing the memory each task needs and preventing heap space errors. Option D is wrong because adding DPUs adds executors but does not shrink oversized shuffle partitions, so it may not help if the problem is within a single executor.

Option A is wrong because reducing partitions makes each partition larger and increases memory pressure. Option B is wrong because changing the worker type to G.1X may not help if the issue is shuffle-related.
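The reasoning behind Option C is simple arithmetic. A sketch assuming a hypothetical 100 GB shuffle stage spread evenly across partitions (real data is often skewed); Spark's default `spark.sql.shuffle.partitions` is 200. How the value is passed to the job without editing the script, such as through a job parameter, is an assumption not covered here.

```python
def mb_per_shuffle_partition(shuffle_gb: float, partitions: int) -> float:
    """Average data per shuffle partition in MB, assuming even distribution."""
    if partitions <= 0:
        raise ValueError("partitions must be positive")
    return shuffle_gb * 1024 / partitions


print(mb_per_shuffle_partition(100, 200))  # 512.0  (Spark default)
print(mb_per_shuffle_partition(100, 500))  # 204.8  (Option C)
print(mb_per_shuffle_partition(100, 50))   # 2048.0 (Option A: worse)
```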


448

MCQeasy

A data engineer is using AWS Glue to perform ETL on data stored in an S3 bucket. The source data is in CSV format with a header row, and the target is a set of Parquet files partitioned by date. The engineer notices that the Glue job is reading all files in the source prefix, including temporary files that should be ignored. What is the MOST efficient way to exclude these temporary files?

A.Change the source format from CSV to Parquet.

B.Set up an S3 event notification to trigger a Lambda function that moves temporary files.

C.Use an S3 prefix exclusion pattern in the Glue job's source path.

D.Create a custom classifier in the Glue Data Catalog.

AnswerC

Glue supports S3 include/exclude patterns to filter files.

Why this answer

Using an S3 exclusion pattern in the Glue job's S3 source path is the most efficient way to skip the temporary files. Option A is wrong because changing the format to Parquet does not exclude files. Option D is wrong because a custom classifier is for schema inference.

Option B is wrong because a Lambda function that moves files adds complexity for a problem Glue can solve with configuration.
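A sketch of the S3 source options with an exclusion pattern, assuming Glue's `exclusions` connection option, which takes a JSON-encoded string of glob patterns. The path and the `.tmp` suffix are hypothetical; the Glue call itself is shown only as a comment because it runs only inside a Glue job.

```python
import json


def s3_source_options(paths: list, exclude_globs: list) -> dict:
    """connection_options for an S3 source with excluded glob patterns."""
    return {"paths": paths, "exclusions": json.dumps(exclude_globs)}


opts = s3_source_options(["s3://example-bucket/raw/"], ["**.tmp"])
# Inside a Glue job:
# dyf = glueContext.create_dynamic_frame.from_options(
#     connection_type="s3", connection_options=opts,
#     format="csv", format_options={"withHeader": True})
```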


449

Multi-Selecthard

A company ingests streaming data from social media feeds into Amazon Kinesis Data Streams. The data is consumed by an AWS Lambda function that transforms and writes to Amazon S3. Recently, the Lambda function started timing out and dropping records. The data volume has tripled. Which actions should the data engineer take to resolve this? (Choose TWO.)

Select 2 answers

A.Increase the number of shards in the Kinesis data stream

B.Replace Lambda with Amazon Kinesis Data Firehose for the transformation

C.Increase the Lambda function timeout to 15 minutes

D.Set a reserved concurrency on the Lambda function

E.Increase the memory allocated to the Lambda function

AnswersA, E

More shards increase throughput capacity.

Why this answer

Options A and E are correct because adding shards increases the stream's throughput and the number of batches Lambda processes in parallel, and increasing Lambda memory also increases its CPU allocation, which speeds up processing and reduces timeouts. Option D (reserved concurrency) caps the function's concurrency and may cause throttling. Option C (Lambda timeout) may not be sufficient without more resources.

Option B (Data Firehose) is a different service; replacing Lambda with Firehose could be an alternative architecture but is not a direct fix for the current one.
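Sizing the shard increase is arithmetic against the per-shard write limits of 1,000 records per second and 1 MB per second (provisioned mode). A sketch with hypothetical load figures; 1 MB is approximated as 1,024 KB.

```python
import math


def shards_needed(records_per_sec: float, avg_record_kb: float) -> int:
    """Minimum shards for a write load, by record count and by bytes."""
    by_records = math.ceil(records_per_sec / 1000)
    by_bytes = math.ceil(records_per_sec * avg_record_kb / 1024)
    return max(1, by_records, by_bytes)


# Hypothetical: load tripled from 1,000 to 3,000 records/s of about 2 KB each.
print(shards_needed(3000, 2))  # 6 (byte limit dominates)
```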


450

Multi-Selectmedium

A data engineer is designing a disaster recovery strategy for an Amazon RDS for PostgreSQL database that is used in a data pipeline. The database must have a Recovery Point Objective (RPO) of less than 1 minute and a Recovery Time Objective (RTO) of less than 5 minutes. Which TWO actions should the engineer take?

Select 2 answers

A.Take frequent manual snapshots and copy them to another Region.

B.Enable automated backups with point-in-time recovery.

C.Enable Multi-AZ deployment with a standby instance.

D.Create a read replica in a different Availability Zone.

E.Use cross-Region replication with Amazon Aurora Global Database.

AnswersB, C

Allows restoring the database to a specific point in time within the backup retention period.

Why this answer

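Options B and C are correct. Multi-AZ replicates synchronously to a standby instance and fails over automatically, typically within one to two minutes, so committed transactions are not lost and the RTO is met. Automated backups with point-in-time recovery complement it by allowing restores to a specific time within the retention period, for example after an accidental deletion that Multi-AZ would replicate to the standby. Option D is wrong because read replicas replicate asynchronously and are not promoted automatically on failure.

Option A is wrong because manual snapshots are taken periodically, so the RPO equals the interval between snapshots, and restoring one takes longer than 5 minutes. Option E is wrong because Aurora Global Database is an Aurora feature, so using it would require migrating off RDS for PostgreSQL.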
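A sketch of both correct actions as one modification request, assuming the keyword arguments of boto3's `rds.modify_db_instance`; the instance identifier is hypothetical. A backup retention period greater than zero enables automated backups and point-in-time recovery, and RDS accepts up to 35 days.

```python
def dr_settings(instance_id: str, retention_days: int = 7) -> dict:
    """Enable Multi-AZ and automated backups on an existing instance."""
    if not 1 <= retention_days <= 35:
        raise ValueError("RDS backup retention must be 1-35 days")
    return {
        "DBInstanceIdentifier": instance_id,
        "MultiAZ": True,                          # synchronous standby in another AZ
        "BackupRetentionPeriod": retention_days,  # > 0 enables PITR
        "ApplyImmediately": True,
    }


rds_args = dr_settings("example-pipeline-db")
# boto3.client("rds").modify_db_instance(**rds_args)
```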
