CCNA Data Operations Support Questions


151
Multi-Selectmedium

A company uses Amazon Kinesis Data Firehose to deliver streaming data to Amazon S3. The delivery stream is failing with 'Insufficient capacity' errors. Which THREE actions should the data engineer take to resolve this issue? (Choose THREE.)

Select 3 answers
A.Enable S3 bucket versioning to handle concurrent writes.
B.Increase the buffer size and buffer interval in the Firehose delivery stream configuration.
C.Configure a CloudWatch alarm to monitor the error rate.
D.Request a service quota increase for Kinesis Data Firehose.
E.Increase the number of shards in the source Kinesis data stream.
AnswersB, D, E

Larger buffers reduce the frequency of writes, lowering capacity needs.

Why this answer

Options B, D, and E are correct. B: Increasing the buffer size and interval lets Firehose batch more records, reducing the number of PUT requests. E: Increasing the number of shards in the source Kinesis data stream provides more write capacity.

D: Requesting a service quota increase for Firehose raises the default limits. Option A is wrong because S3 bucket versioning does not affect Firehose capacity. Option C is wrong because CloudWatch alarms only alert; they do not resolve capacity issues.

152
MCQmedium

A company uses Amazon EMR to process large datasets stored in Amazon S3. The data engineer notices that EMR tasks are failing with 'DiskOutOfSpace' errors. The cluster uses m5.xlarge instances with 1 EBS volume of 64 GB. What is the MOST cost-effective solution to resolve this issue?

A.Use a mix of on-demand and spot instances for core nodes.
B.Increase the EBS storage volume size for each instance and use spot instances for task nodes.
C.Switch to D2 instances which have more instance store volume.
D.Increase the number of task instances to distribute the workload.
AnswerB

More disk space solves the issue; spot instances reduce cost.

Why this answer

Option B is correct because increasing EBS storage per instance provides more disk space, and using Spot Instances for task nodes reduces cost. Option D is wrong because adding more task instances may not address the per-instance disk space issue and increases cost. Option A is wrong because changing the purchasing mix of the core nodes does not add per-instance disk space; Spot Instances reduce cost but may be interrupted.

Option C is wrong because switching to D2 instances is more expensive and may not be needed.

153
Multi-Selectmedium

Which THREE are best practices for managing data in Amazon S3 for a data lake? (Choose three.)

Select 3 answers
A.Enable S3 Versioning to protect against accidental deletions.
B.Configure lifecycle policies to transition data to colder storage tiers.
C.Enable S3 Snapshot for point-in-time recovery.
D.Disable S3 server access logging to reduce costs.
E.Use bucket policies to restrict access based on IAM roles.
AnswersA, B, E

Versioning provides data protection.

Why this answer

Enabling S3 Versioning is a best practice for data lakes because it protects against accidental deletions or overwrites by preserving all versions of an object, including deletions (which are recorded as delete markers). This allows you to recover previous object states and is essential for data governance and auditability in a data lake environment.

Exam trap

The trap here is that candidates may confuse S3 Versioning with a non-existent 'S3 Snapshot' feature, or mistakenly think disabling server access logging is a cost-saving best practice, when in fact it undermines security auditing.

154
Drag & Dropmedium

Arrange the steps to set up a streaming ETL pipeline using Amazon Kinesis Data Firehose to Amazon S3.


Why this order

Create the Firehose delivery stream first, then configure the source, set the S3 destination, enable the optional Lambda transformation, and finally test the stream.

155
MCQeasy

A data engineer needs to monitor the number of records processed by an AWS Glue ETL job. Which CloudWatch metric should the engineer use?

A.glue.driver.aggregate.elapsedTime
B.glue.driver.aggregate.numRecords
C.glue.driver.aggregate.bytesRead
D.glue.driver.aggregate.recordsRead
AnswerD

This metric tracks the number of records read by completed Spark tasks.

Why this answer

Option D is correct because AWS Glue publishes 'glue.driver.aggregate.recordsRead', which counts the records read from all data sources by completed Spark tasks. Option A is wrong because 'glue.driver.aggregate.elapsedTime' measures time. Option C is wrong because 'glue.driver.aggregate.bytesRead' measures bytes.

Option B is wrong because 'glue.driver.aggregate.numRecords' is not a standard metric; a 'numRecords' metric exists only under 'glue.driver.streaming' for streaming jobs.

156
Multi-Selectmedium

A company runs a data lake on Amazon S3 with AWS Glue for ETL. The data is stored in Parquet format and partitioned by date. The data engineer notices that queries using Amazon Athena are scanning large amounts of data even when filtering on the partition column. Which TWO actions would improve query performance? (Choose TWO)

Select 2 answers
A.Use a different file format like Avro
B.Ensure that the WHERE clause uses the partition column correctly
C.Convert the data from Parquet to CSV for better compression
D.Increase the number of partitions by adding a second partition column
E.Enable predicate pushdown in Athena
AnswersB, E

Enables partition pruning.

Why this answer

Options B and E are correct. Partition pruning is most effective when the partition column is used correctly in the WHERE clause, and columnar formats like Parquet combined with predicate pushdown reduce the data scanned. Option D is wrong because adding more partitions may not help if the filter is not applied.

Option C is wrong because converting to CSV would increase the data scanned. Option A is wrong because a different file format may not help if partitioning is not leveraged.
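To make partition pruning concrete, here is a minimal sketch in plain Python. It is a toy model, not Athena itself: the partition sizes and the `bytes_scanned` helper are hypothetical, and the point is only that a predicate on the partition column lets the engine skip whole partitions.

```python
from datetime import date

# Hypothetical partition sizes (in bytes) for a table partitioned by date.
PARTITIONS = {
    date(2024, 1, 1): 400,
    date(2024, 1, 2): 500,
    date(2024, 1, 3): 600,
}


def bytes_scanned(partitions, predicate=None):
    """Sum the size of the partitions an engine would have to read.

    Without a predicate on the partition column every partition is read.
    With one, only the matching partitions are read (partition pruning).
    """
    if predicate is None:
        return sum(partitions.values())
    return sum(size for day, size in partitions.items() if predicate(day))


full_scan = bytes_scanned(PARTITIONS)
pruned = bytes_scanned(PARTITIONS, lambda day: day == date(2024, 1, 2))
print(full_scan, pruned)  # 1500 500
```

A filter that wraps the partition column in a function or compares it against a mismatched type can defeat pruning in a real engine, which is the situation the question describes.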

157
Multi-Selecthard

A company runs a data processing pipeline using Amazon EMR with Spark. The pipeline reads from S3, processes data, and writes to S3. Recently, the job started failing with 'S3AccessDeniedException' even though the EMR role has appropriate S3 permissions. Which TWO actions should the data engineer take to resolve this issue? (Choose TWO.)

Select 2 answers
A.Enable S3 versioning on the bucket to allow multiple access methods.
B.Verify that the EMR service role has the necessary S3 permissions in IAM.
C.Disable S3 Block Public Access settings on the bucket.
D.Check the S3 bucket policy for explicit deny statements that may override the IAM role.
E.Ensure the EMR cluster is launched in a VPC with an S3 VPC endpoint.
AnswersB, D

The role the cluster uses to reach S3 (by default the EC2 instance profile role, EMR_EC2_DefaultRole) must have the required permissions.

Why this answer

Options B and D are correct. D: S3 bucket policies can deny access to specific IP addresses or VPC endpoints; checking them can reveal explicit denies. B: EMR roles may need an IAM policy for S3 access; verifying the policy ensures correct permissions.

Option C is wrong because S3 Block Public Access does not affect IAM-based access. Option E is wrong because S3 endpoints in the VPC are not required if using the public internet. Option A is wrong because S3 versioning does not affect access permissions.
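The interaction between an IAM allow and a bucket-policy deny can be sketched with a deliberately simplified evaluator. This is an illustration only: it uses exact string matching (no wildcards, principals, or conditions), and the bucket name and `evaluate` function are hypothetical.

```python
def evaluate(statements, action, resource):
    """Return 'Allow' or 'Deny' for a request.

    Simplified model: an explicit Deny in any matching statement wins;
    otherwise at least one Allow is required; otherwise the request is
    implicitly denied.
    """
    matched = [
        s["Effect"]
        for s in statements
        if action in s["Action"] and resource in s["Resource"]
    ]
    if "Deny" in matched:
        return "Deny"
    return "Allow" if "Allow" in matched else "Deny"


OBJECT_ARN = "arn:aws:s3:::example-bucket/data.parquet"  # hypothetical

iam_role_policy = [
    {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": [OBJECT_ARN]}
]
bucket_policy = [
    {"Effect": "Deny", "Action": ["s3:GetObject"], "Resource": [OBJECT_ARN]}
]

role_only = evaluate(iam_role_policy, "s3:GetObject", OBJECT_ARN)
combined = evaluate(iam_role_policy + bucket_policy, "s3:GetObject", OBJECT_ARN)
print(role_only, combined)  # Allow Deny
```

The role policy alone allows the read, but once the bucket policy's explicit deny is part of the evaluation the request is denied, which is why inspecting the bucket policy is worthwhile even when the role looks correct.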

158
MCQeasy

A data engineer is troubleshooting a failed AWS Glue ETL job. The job reads from an S3 bucket and writes to an RDS MySQL database. The job fails with an 'Access Denied' error when trying to write to RDS. What is the most likely cause?

A.The IAM role associated with the Glue job does not have the necessary permissions to write to the RDS instance.
B.The Glue job is running in a VPC without a route to the internet.
C.The S3 bucket policy does not allow the Glue job to read the data.
D.The RDS instance is encrypted with a KMS key that the Glue job cannot access.
AnswerA

IAM role needs RDS write permissions.

Why this answer

The error 'Access Denied' when writing to RDS indicates that the AWS Glue job's IAM role lacks the necessary permissions (e.g., rds-db:connect, or specific database-level GRANTs) to perform write operations on the RDS MySQL instance. AWS Glue uses the attached IAM role to authenticate and authorize actions against AWS services, and without proper IAM policies allowing access to the RDS resource, the write attempt is denied.

Exam trap

The trap here is that candidates often confuse network connectivity issues (Option B) with authorization errors, but 'Access Denied' is a specific HTTP 403 error indicating lack of permissions, not a network problem.

How to eliminate wrong answers

Option B is wrong because a missing route to the internet would cause a network connectivity timeout or 'connection refused' error, not an 'Access Denied' error, which is an authorization failure. Option C is wrong because the error occurs when writing to RDS, not when reading from S3; an S3 bucket policy issue would produce an S3-specific 'Access Denied' error during the read phase. Option D is wrong because if the RDS instance is encrypted with a KMS key that the Glue job cannot access, the error would typically be a 'KMS access denied' or 'encryption key unavailable' error, not a generic 'Access Denied' for writing to RDS.

159
Multi-Selectmedium

Which TWO actions should a data engineer take to optimize Amazon S3 query performance for Amazon Athena when dealing with large Parquet files? (Choose 2.)

Select 2 answers
A.Store data in a single large file without partitioning
B.Use GZIP compression on the Parquet files
C.Split large files into many small files
D.Optimize file sizes to be around 64 MB to 256 MB
E.Partition the data by frequently filtered columns
AnswersD, E

Optimal file size improves parallelism and performance.

Why this answer

D and E are correct: partitioning reduces the data scanned, and because the data is already in Parquet, optimizing file size (e.g., 64 MB) further improves performance. B (compression) is already assumed in Parquet. C (many small files) actually hurts performance.

A (a single large file without partitioning) is not an optimization.

160
MCQhard

Refer to the exhibit. A data engineer is reviewing the configuration of an Amazon Redshift cluster. The engineer wants to ensure that the cluster can be restored to a point in time up to 35 days in the past. Based on the exhibit, what change is needed?

A.Increase the automated snapshot retention period to 35 days.
B.Change the cluster subnet group to a custom one.
C.Enable encryption on the cluster.
D.Increase the number of nodes to 6.
AnswerA

Current retention is 1 day.

Why this answer

Option A is correct. The AutomatedSnapshotRetentionPeriod is set to 1 day, which allows only 1 day of point-in-time recovery. To support 35 days, this value must be increased to 35.

Option C is incorrect because the cluster is already encrypted. Option D is incorrect because the number of nodes does not affect snapshot retention. Option B is incorrect because the subnet group does not affect retention.

161
MCQeasy

A data engineer is troubleshooting an AWS Glue ETL job that uses a Python shell script to extract data from an Amazon RDS for PostgreSQL database and load it into an Amazon Redshift table. The job runs successfully, but the data engineer notices that the row count in Redshift is consistently lower than the row count in PostgreSQL. The job uses a SELECT * query without any filtering. The data engineer suspects that some rows are being dropped during the transfer. The job uses the psycopg2 library to connect to PostgreSQL and the psycopg2 connection is configured with autocommit=True. The Redshift table has no constraints that would reject rows. What is the most likely cause of the missing rows?

A.The SELECT * query includes columns with data types that are not supported by psycopg2.
B.The SSL/TLS connection to PostgreSQL is dropping packets.
C.The autocommit=True setting is causing incomplete transactions.
D.The Redshift table has a distribution key that causes some rows to be silently discarded.
AnswerA

Unsupported data types may cause rows to be skipped or nullified.

Why this answer

Option A is correct. A Python shell job runs on a single executor, so rows are not being lost to data split across executors, and a Redshift distribution key only determines where a row is stored; it does not discard rows. Of the options given, the most plausible cause is that the SELECT * query returns columns whose data types or special characters are not handled correctly, so the affected rows are dropped during the transfer.

Option C is wrong because autocommit=True should not cause data loss. Option B is wrong because SSL/TLS is about encryption, not row count. Option D is wrong because a distribution key does not discard rows and the Redshift table has no constraints that would reject them.

162
Multi-Selecthard

Which THREE are valid considerations when troubleshooting data loss in an AWS Glue ETL job? (Choose three.)

Select 3 answers
A.Job bookmarks may be skipping new data if not configured properly.
B.Server-side encryption is disabled on the S3 bucket.
C.The job timeout is set too low.
D.Dynamic frame transformations may drop rows with errors.
E.The mapping of source columns to target columns may be incorrect.
AnswersA, D, E

Bookmarks control reprocessing.

Why this answer

Options A, D, and E are correct. Job bookmarks can skip data, incorrect mapping can lose columns, and dynamic frame transformations can drop rows if errors occur. Option B is wrong because disabling encryption does not cause data loss.

Option C is wrong because the timeout setting does not cause data loss.

163
Multi-Selecteasy

A data engineer is setting up a data pipeline using Amazon Kinesis Data Firehose to deliver data to Amazon S3. The data must be transformed using an AWS Lambda function before delivery. Which THREE steps are required to configure this?

Select 3 answers
A.Create a Lambda@Edge function in the same Region.
B.Create an AWS Lambda function that transforms the data.
C.Attach an IAM role to the Firehose delivery stream that grants permission to invoke the Lambda function.
D.Configure an S3 event notification to trigger the Lambda function when new data arrives.
E.Configure the Kinesis Data Firehose delivery stream to use the Lambda function as a data transformation source.
AnswersB, C, E

A Lambda function is needed to perform the transformation logic.

Why this answer

Options B, C, and E are correct. The Lambda function must be created (B), Firehose must be configured to invoke the Lambda function (E), and the IAM role must allow Firehose to invoke Lambda (C). Option A is wrong because Lambda@Edge is for CloudFront.

Option D is wrong because S3 events are not used for Firehose transformation.
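As an illustration of the transformation step, the following is a minimal sketch of a Lambda handler in the shape Firehose expects: each incoming record carries a `recordId` and base64-encoded `data`, and each returned record echoes the `recordId` with a `result` and re-encoded `data`. The `message` field and the uppercase transformation are assumptions made up for the example.

```python
import base64
import json


def lambda_handler(event, context):
    """Uppercase a hypothetical 'message' field in each Firehose record."""
    output = []
    for record in event["records"]:
        # Firehose delivers each payload base64-encoded.
        payload = json.loads(base64.b64decode(record["data"]))
        payload["message"] = payload.get("message", "").upper()
        # Append a newline so records are separated in the S3 object.
        data = (json.dumps(payload) + "\n").encode("utf-8")
        output.append(
            {
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(data).decode("utf-8"),
            }
        )
    return {"records": output}


# Local dry run with a hand-built event (no AWS access needed).
sample_event = {
    "records": [
        {
            "recordId": "1",
            "data": base64.b64encode(b'{"message": "hi"}').decode("utf-8"),
        }
    ]
}
result = lambda_handler(sample_event, None)
```

Every input `recordId` must appear in the response; a record can also be marked `Dropped` or `ProcessingFailed` instead of `Ok`.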

164
MCQmedium

A company uses AWS DMS to migrate data from an on-premises Oracle database to Amazon Aurora MySQL. After the migration, the data in Aurora is inconsistent with the source. The engineer needs to ensure ongoing replication with minimal downtime. Which solution should the engineer implement?

A.Use AWS Schema Conversion Tool (SCT) to convert the schema
B.Export the data from Oracle and import into Aurora using mysqldump
C.Configure a DMS task with change data capture (CDC)
D.Perform a full load migration again
AnswerC

CDC captures ongoing changes.

Why this answer

Option C is correct because using DMS with change data capture (CDC) captures ongoing changes and replicates them with minimal downtime. Option D is wrong because a full load only captures a snapshot. Option A is wrong because the AWS Schema Conversion Tool does not handle data replication.

Option B is wrong because exporting and importing does not provide ongoing replication.

165
Multi-Selecthard

A company is using Amazon EMR to run Spark jobs. The jobs are failing due to memory issues. Which THREE configurations can help mitigate out-of-memory errors?

Select 3 answers
A.Configure instance store volumes for intermediate shuffle data.
B.Use instances with more vCPUs to process more tasks in parallel.
C.Tune Spark memory configurations like spark.executor.memory and spark.memory.fraction.
D.Increase the instance type to one with more memory per node.
E.Enable Spark dynamic allocation to adjust executors based on workload.
AnswersC, D, E

Proper tuning optimizes memory usage.

Why this answer

Options C, D, and E are correct. Tuning Spark memory settings, moving to instances with more memory, and enabling dynamic allocation all help manage memory. Option B is wrong because more vCPUs per node increase parallelism, potentially worsening memory pressure.

Option A is wrong because instance store volumes are for temporary storage, not memory.
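To see how the two settings named in the question interact, the sketch below applies the rule from the Spark tuning guide: the unified execution and storage region is `spark.memory.fraction` of the executor heap minus a fixed 300 MiB reservation. The helper name and the sample heap sizes are assumptions for illustration.

```python
RESERVED_MIB = 300  # fixed reservation described in the Spark tuning guide


def unified_memory_mib(executor_memory_mib, memory_fraction=0.6):
    """Approximate size of Spark's unified execution/storage region.

    executor_memory_mib corresponds to spark.executor.memory and
    memory_fraction to spark.memory.fraction (default 0.6).
    """
    return (executor_memory_mib - RESERVED_MIB) * memory_fraction


small = unified_memory_mib(4096)  # 4 GiB heap
large = unified_memory_mib(8192)  # 8 GiB heap
print(round(small, 1), round(large, 1))
```

Doubling the heap more than doubles the usable region because the 300 MiB reservation is fixed, which is one reason a larger instance type is often a more effective fix than tuning the fraction alone.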

166
MCQmedium

A data engineer attempts to suspend versioning on an S3 bucket but receives the error shown. The engineer needs to suspend versioning to reduce storage costs. What should the engineer do FIRST?

A.Disable MFA Delete by using the AWS CLI with the --mfa parameter and then suspend versioning.
B.Use the AWS Management Console to suspend versioning, as it bypasses MFA Delete.
C.Delete the bucket and recreate it without versioning.
D.Add a bucket policy to allow versioning suspension.
AnswerA

MFA Delete must be disabled first; this requires the root account and MFA device.

Why this answer

Option A is correct because MFA Delete must be disabled before versioning can be suspended. Option B is incorrect because the console does not bypass MFA Delete; while it is enabled, versioning cannot be suspended directly. Option C is incorrect because deleting and recreating the bucket is unnecessary once MFA Delete is disabled.

Option D is incorrect because the error is about MFA Delete, not permissions.

167
Multi-Selecthard

A data engineer is troubleshooting a slow-running Amazon Redshift query. The query joins several large tables and performs aggregations. The engineer runs EXPLAIN and sees a 'DS_DIST_ALL' step. Which TWO actions will MOST likely improve query performance? (Choose TWO.)

Select 2 answers
A.Run the VACUUM command on all tables.
B.Use the CNAME command to rename the tables.
C.Change the distribution style of the tables to DISTSTYLE KEY on the join columns.
D.Increase the number of nodes in the Redshift cluster.
E.Define appropriate SORTKEYs on the tables based on the query predicates.
AnswersC, E

Reduces data redistribution across nodes.

Why this answer

Option C is correct because DS_DIST_ALL indicates a cross-node redistribution; using DISTSTYLE KEY on the join columns can reduce data movement. Option E is correct because a SORTKEY can speed up joins and aggregations by reducing data scans. Option D is wrong because increasing the node count may help but is not a targeted fix.

Option A is wrong because VACUUM reclaims space and sorts, but does not address distribution issues. Option B is wrong because CNAME is a DNS concept, not a Redshift feature.

168
Multi-Selectmedium

A company uses Amazon Kinesis Data Firehose to deliver streaming data to Amazon S3. The data is in JSON format, and each record is approximately 5 KB. The company has set the buffer interval to 60 seconds and the buffer size to 5 MB. However, the data engineer observes that the delivery to S3 is delayed by up to 5 minutes during peak traffic. The engineer wants to reduce the delivery latency to under 1 minute. Which TWO actions should the engineer take? (Choose TWO.)

Select 2 answers
A.Enable GZIP compression for the delivery stream.
B.Reduce the buffer size to 1 MB.
C.Increase the buffer size to 50 MB.
D.Convert the data format to Apache Parquet before delivery.
E.Reduce the buffer interval to 10 seconds.
AnswersB, E

A smaller buffer size triggers delivery sooner when the size threshold is reached.

Why this answer

Option E is correct because reducing the buffer interval to 10 seconds forces Firehose to deliver data more frequently, reducing latency. Option B is correct because reducing the buffer size to 1 MB also triggers delivery sooner when the size threshold is met. Option C is wrong because increasing the buffer size would increase latency, as it takes longer to fill.

Option D is wrong because converting to Parquet requires additional processing and does not directly reduce latency; it may increase it. Option A is wrong because enabling GZIP compression reduces volume but does not reduce delivery latency; it may actually increase processing time.
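The buffering rule behind this question is that a buffer is flushed when either the size or the interval threshold is reached, whichever comes first. The sketch below models only that rule; it ignores retries, transformation time, and backlog, and the function name and ingest rates are assumptions for illustration.

```python
def seconds_until_flush(buffer_mb, interval_s, ingest_mb_per_s):
    """Time until a buffer is flushed: size or interval, whichever is first."""
    if ingest_mb_per_s <= 0:
        # With no incoming data only the interval can trigger a flush.
        return float(interval_s)
    return min(float(interval_s), buffer_mb / ingest_mb_per_s)


# Steady ingest of 0.5 MB/s (roughly 100 records/s at 5 KB each).
original = seconds_until_flush(buffer_mb=5, interval_s=60, ingest_mb_per_s=0.5)
tuned = seconds_until_flush(buffer_mb=1, interval_s=10, ingest_mb_per_s=0.5)
# Near-idle stream: the interval becomes the bound.
idle_original = seconds_until_flush(5, 60, 0.01)
idle_tuned = seconds_until_flush(1, 10, 0.01)
print(original, tuned, idle_original, idle_tuned)  # 10.0 2.0 60.0 10.0
```

Lowering both thresholds shortens the wait at high and low traffic alike, at the cost of more, smaller objects written to S3.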

169
MCQhard

A company runs a data warehouse on Amazon Redshift. The data engineer notices that some queries are running slowly. Upon reviewing the system tables, the engineer finds that the 'svv_table_info' shows high 'unsorted' percentage for several large tables. What is the MOST effective action to improve query performance?

A.Run the ANALYZE command on the tables.
B.Run the VACUUM command on the tables.
C.Change the distribution style of the tables to ALL.
D.Increase the number of nodes in the Redshift cluster.
AnswerB

VACUUM sorts the data, improving query performance.

Why this answer

Option B is correct because VACUUM sorts the data and reclaims space, improving query performance. Option A is wrong because ANALYZE updates statistics but does not sort. Option D is wrong because increasing the number of nodes may help but is not the most direct fix for unsorted data.

Option C is wrong because changing the distribution style would require recreating the table.

170
Multi-Selectmedium

A data engineer is troubleshooting a slow-running Amazon Athena query. The query scans a large amount of data. Which TWO actions can improve query performance? (Choose TWO.)

Select 2 answers
A.Convert the data to Parquet or ORC format.
B.Enable encryption at rest.
C.Increase the Athena query timeout.
D.Partition the table on frequently filtered columns.
E.Use SELECT * to retrieve all columns.
AnswersA, D

Columnar formats reduce I/O and improve compression.

Why this answer

Partitioning the table and converting to columnar formats like Parquet reduce the amount of data scanned, improving performance. Option E is wrong because using SELECT * scans all columns. Option C is wrong because increasing the timeout does not improve performance.

Option B is wrong because enabling encryption at rest is not a performance optimization.

171
MCQhard

A data engineer is troubleshooting a slow Amazon Redshift query. The query scans a large table with interleaved sort keys. The engineer notices that the query plan shows a sequential scan instead of a range-restricted scan. What is the MOST likely reason?

A.The table has not been vacuumed and reindexed after large data loads.
B.The table has a poor distribution key (DISTKEY) causing data skew.
C.The table uses compression encodings that prevent range-restricted scans.
D.The workload management (WLM) queue is configured with too few query slots.
AnswerA

Without VACUUM REINDEX, interleaved sort keys lose effectiveness, causing sequential scans.

Why this answer

Option A is correct because after a significant number of rows are inserted, the interleaved sort order degrades, requiring a VACUUM REINDEX to restore it. Option B is incorrect because DISTKEY affects distribution, not sort key efficiency. Option C is incorrect because compression is for storage, not sorting.

Option D is incorrect because WLM configuration affects concurrency, not scan type.

172
MCQhard

A data engineer is troubleshooting an AWS Glue job that writes data to an Amazon S3 bucket in Parquet format. The job runs successfully but the output files are smaller than the configured 'groupFiles' size. The engineer has set 'groupFiles' to 'inPartition' and 'groupSize' to 1 GB. The input data is 10 GB in a single partition. What is the most likely reason for the small files?

A.The 'groupFiles' parameter is deprecated in the current Glue version.
B.The 'groupFiles' parameter only affects the input read phase, not the output write phase.
C.The 'groupFiles' parameter is misspelled or set incorrectly.
D.The engineer must also set 'repartition' to 1 to merge output files.
AnswerB

Grouping coalesces small input files during reading but does not control output file size.

Why this answer

Option B is correct because 'groupFiles' coalesces small input files when they are read. When the job writes output, the output file size is determined by the number of Spark partitions, not by grouping. The grouping feature only applies to reading input files.

Option C is wrong because the setting is correct. Option A is wrong because the parameter still takes effect; it simply applies at read time, not write time. Option D is wrong because grouping does not require repartitioning.

173
MCQmedium

A data engineer sees the CloudWatch log entry in the exhibit for a Lambda function that processes data from an Amazon SQS queue. What is the MOST likely cause of the timeout?

A.The Lambda function's reserved concurrency is set too low.
B.The Lambda function is running out of memory.
C.The Lambda function's timeout is too short for the processing required.
D.The SQS queue's visibility timeout is set too low.
AnswerC

The function timed out at exactly the 30-second limit.

Why this answer

Option C is correct because the log shows the function used only 64 MB of the allocated 128 MB memory, indicating that memory is not the issue. The function timed out after 30 seconds, and the duration is 30001.23 ms, which is just over the configured timeout of 30 seconds. Therefore, increasing the timeout will resolve the issue.

Option B is incorrect because memory usage is low and the function is not hitting its memory limit. Option A is incorrect because a low reserved concurrency throttles invocations rather than causing an invocation to time out. Option D is incorrect because SQS visibility timeout does not affect Lambda execution timeout directly; the Lambda timeout is separate.
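The same reasoning can be applied mechanically to a Lambda REPORT log line. The sketch below is a hedged example: the exhibit is not reproduced here, so the sample line is a hypothetical reconstruction from the values quoted in the explanation (30001.23 ms, 128 MB, 64 MB), and the request ID and `diagnose` helper are made up.

```python
import re

# Matches the first "Duration", then "Memory Size" and "Max Memory Used".
REPORT_PATTERN = re.compile(
    r"Duration: (?P<duration>[\d.]+) ms.*?"
    r"Memory Size: (?P<memory_size>\d+) MB.*?"
    r"Max Memory Used: (?P<max_used>\d+) MB"
)


def diagnose(report_line, timeout_s):
    """Classify a REPORT line as hitting the timeout and/or memory limit."""
    match = REPORT_PATTERN.search(report_line)
    if match is None:
        raise ValueError("not a recognizable REPORT line")
    duration_ms = float(match["duration"])
    return {
        "hit_timeout": duration_ms >= timeout_s * 1000,
        "hit_memory_limit": int(match["max_used"]) >= int(match["memory_size"]),
    }


# Hypothetical log line assembled from the figures in the explanation.
sample_line = (
    "REPORT RequestId: 00000000-0000-0000-0000-000000000000\t"
    "Duration: 30001.23 ms\tBilled Duration: 30000 ms\t"
    "Memory Size: 128 MB\tMax Memory Used: 64 MB"
)
verdict = diagnose(sample_line, timeout_s=30)
print(verdict)  # {'hit_timeout': True, 'hit_memory_limit': False}
```

A duration at or just above the configured timeout combined with memory use well under the allocation points to the timeout setting rather than memory.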

174
MCQhard

A company runs a data pipeline on Amazon EMR that processes terabytes of data daily. The pipeline reads from Amazon S3, performs transformations using Spark, and writes results back to S3. Recently, the data engineer noticed that the EMR cluster's spot instances are frequently reclaimed, causing job failures and delays. The cluster uses a mix of On-Demand and Spot instances. The engineer wants to minimize job interruptions while keeping costs low. The current configuration uses a single EMR cluster with a core node group of 10 On-Demand instances and a task node group of 20 Spot instances. The job failures occur during the shuffle phase when tasks on Spot instances are lost. The engineer has no control over when spot instances are reclaimed. Which action will MOST effectively reduce job failures while maintaining cost efficiency?

A.Increase the number of On-Demand instances in the core node group to 20.
B.Configure the task node group to use only Spot instances and increase the bid price to the On-Demand price.
C.Change the task node group to use only On-Demand instances.
D.Enable EMR managed scaling to automatically add On-Demand instances when Spot instances are reclaimed.
AnswerD

Managed scaling dynamically adjusts the cluster capacity, adding On-Demand instances to maintain cluster stability during Spot interruptions.

Why this answer

Option D is correct because enabling managed scaling allows EMR to automatically add On-Demand instances to replace reclaimed Spot instances, ensuring capacity without manual intervention. Option B is incorrect because using all Spot instances increases risk. Option C is incorrect because using all On-Demand instances increases cost significantly.

Option A is incorrect because increasing the On-Demand count in the core node group may help but does not dynamically adjust; managed scaling is more effective.

175
MCQhard

Refer to the exhibit. A data engineer runs the command on an object in S3. The engineer expected the object to have a tag 'type=raw' but sees no metadata. What is the likely cause?

A.Object tags are not returned by head-object; use get-object-tagging instead
B.The S3 bucket is in a different AWS Region
C.The bucket policy blocks reading tags
D.The object was created without tags because of lifecycle rules
AnswerA

Tags are separate from metadata.

Why this answer

Option A is correct because object tags are not returned by head-object; they require a separate get-object-tagging call. Option B is wrong because the command works. Option C is wrong because bucket policies affect access, not tag visibility.

Option D is wrong because lifecycle rules do not remove tags.
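The split between metadata and tags can be sketched with the two boto3 S3 client calls involved, `head_object` and `get_object_tagging`. To keep the example runnable without AWS access, it is exercised against a hypothetical in-memory stub; the stub, the bucket name, and the key are assumptions for illustration.

```python
def describe_object(s3_client, bucket, key):
    """Collect user metadata and tags, which S3 exposes through separate calls."""
    head = s3_client.head_object(Bucket=bucket, Key=key)
    tagging = s3_client.get_object_tagging(Bucket=bucket, Key=key)
    return {
        "metadata": head.get("Metadata", {}),
        "tags": {tag["Key"]: tag["Value"] for tag in tagging["TagSet"]},
    }


class StubS3Client:
    """Hypothetical stand-in that mimics the two response shapes used above."""

    def head_object(self, Bucket, Key):
        # No user metadata was set, so the Metadata map is empty.
        return {"ContentLength": 123, "Metadata": {}}

    def get_object_tagging(self, Bucket, Key):
        return {"TagSet": [{"Key": "type", "Value": "raw"}]}


info = describe_object(StubS3Client(), "example-bucket", "raw/file.json")
print(info)  # {'metadata': {}, 'tags': {'type': 'raw'}}
```

With a real client created by `boto3.client("s3")`, the same function would issue two requests, and the caller would need permission for both reading the object and reading its tags.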

176
MCQeasy

A company runs an Amazon EMR cluster that processes data from S3 and writes results back to S3. The cluster uses Spot Instances for task nodes. Some tasks are failing due to Spot Instance interruptions. What is the BEST way to handle this without manual intervention?

A.Enable automatic node replacement in the EMR cluster
B.Manually relaunch the cluster after failures
C.Configure the application to checkpoint to S3 every few minutes
D.Use only On-Demand instances for task nodes
AnswerA

EMR can automatically replace Spot Instances that are interrupted.

Why this answer

Option A is correct because EMR's automatic node replacement detects interruptions and launches new instances. Option B is manual. Option D uses On-Demand instances, which increases cost.

Option C adds checkpointing but does not automatically replace instances.

177
MCQhard

A data engineer is monitoring an Amazon Redshift cluster and notices that some queries are experiencing high disk usage and slow performance. The engineer wants to identify the queries that are causing the most disk spills to temporary files. Which system table should the engineer query to get this information?

A.SVL_QUERY_SUMMARY
B.SYS_QUERY_DETAIL
C.STL_SCAN
D.STV_TBL_PERM
AnswerA

SVL_QUERY_SUMMARY includes bytes spilled to disk per query step.

Why this answer

Option A is correct because the SVL_QUERY_SUMMARY system view provides information about disk spills for each query step, including the number of bytes spilled to disk. Option C is incorrect because STL_SCAN is for table scans, not spills. Option D is incorrect because STV_TBL_PERM shows permanent table storage, not temporary spills.

Option B is incorrect because SYS_QUERY_DETAIL provides general query details but not spill information.

178
MCQhard

A data engineer is designing a data pipeline that ingests JSON files from an S3 bucket, transforms them using AWS Glue, and loads into Amazon Redshift. The data is updated daily, and the pipeline must handle late-arriving data from the previous day. Which approach minimizes reprocessing?

A.Use AWS Glue job bookmarks to process only new files based on S3 event notifications.
B.Stream data using Amazon Kinesis Data Firehose to Redshift.
C.Enable S3 versioning and process only the latest version of each object.
D.Schedule a full reload of all data from S3 to Redshift each day.
AnswerA

Job bookmarks track processed files and skip them, so only new files (including late-arriving) are processed.

Why this answer

Option A is correct because incrementally processing only new files avoids reprocessing. Option D (full reload) is inefficient. Option C (versioning) handles overwrites but not late-arriving data.

Option B (Kinesis Data Firehose) is for streaming, not batch with late data.
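The idea of incremental processing can be illustrated with a small conceptual model. This is not how AWS Glue implements job bookmarks internally; it is a sketch that keeps a set of already-processed keys, and the function names and S3 keys are hypothetical.

```python
def run_job(listed_keys, bookmark):
    """Process only keys not seen before and record them in the bookmark."""
    new_keys = sorted(key for key in listed_keys if key not in bookmark)
    bookmark.update(new_keys)
    return new_keys


bookmark = set()

# Day 1 run: two files are present.
day1_listing = ["dt=2024-01-01/a.json", "dt=2024-01-01/b.json"]
first_run = run_job(day1_listing, bookmark)

# Day 2 run: a new day's file plus a late-arriving file for day 1.
day2_listing = day1_listing + [
    "dt=2024-01-01/late.json",
    "dt=2024-01-02/a.json",
]
second_run = run_job(day2_listing, bookmark)
print(second_run)  # ['dt=2024-01-01/late.json', 'dt=2024-01-02/a.json']
```

The late file under the previous day's prefix is picked up on the second run while the two files already loaded are skipped, so nothing is reprocessed.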

179
MCQmedium

A data engineer reviews the above error log from an AWS Glue ETL job. The job uses a G.1X worker type (16 GB memory). The job processes a 30 GB CSV file from S3. What should the engineer do to resolve the memory error?

A.Convert the input file from CSV to Parquet format.
B.Set 'spark.executor.memory' to 12g in the job parameters.
C.Decrease the number of workers to 1 to reduce memory overhead.
D.Increase the number of workers from 2 to 4.
AnswerB

Increasing executor memory to 12 GB gives the task more headroom within the 16 GB container.

Why this answer

Option B is correct because the error indicates that the executor ran out of memory (10 GB used of 10 GB limit). Increasing the Spark executor memory to 12 GB (since G.1X has 16 GB total, leaving room for overhead) will prevent the container from being killed. Option D is wrong because increasing the number of workers does not increase per-executor memory.

Option C is wrong because reducing the number of workers reduces total memory but does not fix per-executor limits. Option A is wrong because converting to Parquet reduces file size but does not change the memory limit per executor.
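The "leaving room for overhead" remark can be checked with a little arithmetic. The sketch assumes Spark's default off-heap overhead rule (the larger of 384 MiB or 10% of executor memory) and treats the 16 GB worker as the container ceiling; whether a given Glue version honors a custom `spark.executor.memory` is not verified here.

```python
def container_request_mib(executor_memory_mib, overhead_factor=0.10,
                          min_overhead_mib=384):
    """Executor heap plus Spark's default memory overhead, in MiB."""
    overhead = max(min_overhead_mib, executor_memory_mib * overhead_factor)
    return executor_memory_mib + overhead


WORKER_MIB = 16 * 1024  # 16 GB worker, as stated in the question

request_12g = container_request_mib(12 * 1024)
request_15g = container_request_mib(15 * 1024)
fits_12g = request_12g <= WORKER_MIB
fits_15g = request_15g <= WORKER_MIB
print(fits_12g, fits_15g)  # True False
```

Under these assumptions a 12 GB heap plus overhead stays below 16 GB, while a 15 GB heap would not, which is why the setting cannot simply be raised to the full worker size.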

180
MCQhard

A data engineer is troubleshooting an AWS Glue ETL job that suddenly started failing with 'An error occurred while calling o103.pyWriteDynamicFrame. Unknown error'. The job writes data to an Amazon Redshift table. Which step should the engineer take FIRST?

A.Recreate the Redshift table with a different distribution style.
B.Test the job with a small sample dataset to isolate the issue.
C.Update the Redshift JDBC driver version in the Glue job.
D.Review the job's CloudWatch Logs for detailed error messages.
AnswerD

CloudWatch Logs contain the stack trace and root cause.

Why this answer

Option D is correct because the error is generic; checking CloudWatch Logs provides more details. Option A is wrong because the error is not specific to table structure. Option B is wrong because testing a small dataset may not reproduce the issue.

Option C is wrong because it assumes a JDBC driver issue without evidence.

181
MCQhard

A data engineer is troubleshooting an AWS Step Functions workflow that calls a Lambda function to process data. The workflow sometimes fails with a 'StateMachineExecutionLimitExceeded' error. What is the MOST likely cause?

A.Number of concurrent executions exceeds the account limit
B.Execution time exceeds the maximum allowed duration
C.Lambda function memory limit exceeded
D.Lambda function concurrency limit reached
AnswerA

Step Functions enforces a quota on concurrent (open) executions per account; exceeding it causes this error.

Why this answer

Option A is correct because Step Functions has a default execution limit per account; hitting it causes this error. Option B is wrong because execution time limits cause 'ExecutionTimedOut' errors. Option C is wrong because the error is specific to state machine executions, not Lambda memory.

Option D is wrong because Lambda concurrency limits cause throttling errors, not state machine execution limits.

182
MCQeasy

A data engineer needs to grant an IAM user read-only access to an S3 bucket named 'data-lake-bucket'. Which IAM policy statement should be attached to the user?

A.{"Effect":"Allow","Action":"s3:GetObject","Resource":"arn:aws:s3:::data-lake-bucket/*"}
B.{"Effect":"Allow","Action":"s3:ListBucket","Resource":"arn:aws:s3:::data-lake-bucket"}
C.{"Effect":"Allow","Action":"s3:PutObject","Resource":"arn:aws:s3:::data-lake-bucket/*"}
D.{"Effect":"Allow","Action":"s3:*","Resource":"arn:aws:s3:::data-lake-bucket/*"}
AnswerA

Read-only access to objects.

Why this answer

Option A is correct because it grants read-only access by allowing only the s3:GetObject action, which permits downloading objects from the bucket. The resource ARN includes the wildcard /* to cover all objects within 'data-lake-bucket', ensuring the user can read but not list or modify data.

Exam trap

The trap here is that candidates expect 'read-only access' to include s3:ListBucket, which is typically needed for practical read-only use. The question asks only for read access, not listing, so s3:GetObject alone suffices for the stated requirement.

How to eliminate wrong answers

Option B is wrong because s3:ListBucket alone only allows listing objects in the bucket, not reading their contents; without s3:GetObject, the user cannot download or view object data. Option C is wrong because s3:PutObject grants write access, which violates the read-only requirement. Option D is wrong because s3:* grants full administrative access to all S3 actions on the bucket, far exceeding read-only permissions.
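
For practical read-only use, the two permissions discussed above are often combined in one policy. The following is a minimal sketch, assuming the bucket name from the question; the helper name `read_only_policy` is hypothetical and not part of any AWS SDK.

```python
import json


def read_only_policy(bucket: str) -> dict:
    """Build an identity-based policy that allows listing and reading, but no writes."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # s3:ListBucket is evaluated against the bucket ARN itself.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # s3:GetObject is evaluated against object ARNs (bucket/*).
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


policy = read_only_policy("data-lake-bucket")
print(json.dumps(policy, indent=2))
```

Note that the two statements target different resource ARNs: the bucket for listing and the objects for reading.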

183
MCQhard

A financial services company runs a critical data pipeline using AWS Step Functions to orchestrate multiple AWS Lambda functions and AWS Glue jobs. The pipeline processes transaction data and must complete within 15 minutes to meet a service-level agreement (SLA). Recently, the pipeline has been failing intermittently with a 'StateMachineExecutionLimitExceeded' error. The Step Functions state machine is configured with a Standard type. The company has a single state machine that runs on demand. The error occurs when multiple requests are submitted simultaneously. What should the team do to prevent this error?

A.Increase the state machine execution timeout to 30 minutes.
B.Switch the state machine type to Express Workflow to handle higher throughput.
C.Request a service quota increase for concurrent executions of Standard Workflows.
D.Increase the Lambda function reserved concurrency to 100.
AnswerC

The error is due to hitting the account-level limit for concurrent Standard Workflow executions; a quota increase resolves it.

Why this answer

Option C is correct because the error 'StateMachineExecutionLimitExceeded' indicates that the quota for concurrent (open) Standard Workflow executions has been exceeded, which matches failures that occur only when multiple requests are submitted simultaneously. Standard Workflows allow a maximum execution duration of 1 year, so duration is not the constraint here.

To handle spikes, the team should request a quota increase. Option A is wrong because increasing the timeout does not affect execution limits. Option B is wrong because Express Workflows are designed for high-volume event-driven workloads and have different limits (e.g., a 5-minute maximum duration), which cannot accommodate a pipeline that may run for up to 15 minutes.

Option D is wrong because Lambda concurrency limits are separate from Step Functions execution limits.

184
MCQeasy

A data engineer configured an AWS Glue job that reads from an S3 bucket and writes to an Amazon Redshift table. The job runs successfully, but the data in Redshift is missing some records that exist in S3. The engineer suspects the issue may be related to the job's bookmarks. What should the engineer do to ensure all records are processed?

A.Update the IAM role to grant additional S3 read permissions.
B.Reset the job bookmark using the AWS Glue API.
C.Increase the number of workers in the Glue job.
D.Disable job bookmarks in the Glue job configuration.
AnswerD

Disabling bookmarks forces reprocessing of all data.

Why this answer

Option D is correct because disabling job bookmarks forces the Glue job to reprocess all data, which will include the missing records. Option A is wrong because updating the IAM role does not affect bookmarks. Option B is wrong because resetting the bookmark (the ResetJobBookmark API) forces a single reprocessing run but leaves bookmarks enabled, so the suspected bookmark behavior can recur on later runs.

Option C is wrong because increasing the number of workers does not address the bookmark issue.

185
Multi-Selecthard

A company uses Amazon Kinesis Data Analytics for Apache Flink to process streaming data. The application is experiencing high latency and checkpoint failures. Which THREE actions should the data engineer take to improve performance and reliability? (Choose three.)

Select 3 answers
A.Increase the parallelism of the Flink application
B.Configure the application to use event time processing instead of processing time
C.Increase the checkpoint interval to reduce the frequency of checkpoints
D.Decrease the parallelism to reduce resource contention
E.Disable checkpointing to avoid checkpoint failures
AnswersA, B, C

Higher parallelism improves throughput.

Why this answer

Options A, B, and C are correct. Option A: Increasing parallelism improves throughput. Option B: Using event time helps with out-of-order data.

Option C: Increasing checkpoint interval reduces checkpoint failures. Option D is wrong because decreasing parallelism reduces throughput. Option E is wrong because disabling checkpointing hurts reliability.

186
MCQhard

A data pipeline using Amazon Kinesis Data Streams is experiencing high consumer lag. The stream has 10 shards. The consumer is an AWS Lambda function that processes each record and writes to Amazon DynamoDB. What is the MOST likely cause of the lag?

A.The Lambda function's reserved concurrency is set too low
B.The DynamoDB table's write capacity is throttling writes
C.The number of shards is insufficient for the data volume
D.The Lambda function is not authorized to read from Kinesis
AnswerA

Low concurrency limits parallel processing of shards.

Why this answer

Option A is correct because if the Lambda function's concurrency limit is reached, it cannot process all shards simultaneously, causing lag. Option B is wrong because DynamoDB write capacity could be a bottleneck but is less likely than Lambda concurrency. Option C is wrong because increasing shards would increase parallelism but is not the root cause.

Option D is wrong because a function that is not authorized to read from Kinesis would process no records at all rather than fall behind.

187
Multi-Selectmedium

A data engineer is setting up Amazon CloudWatch alarms for an Amazon Redshift cluster. The engineer wants to be alerted when the disk space usage exceeds 80% for more than 5 minutes and when the CPU utilization exceeds 90% for more than 10 minutes. Which TWO CloudWatch metrics and conditions should the engineer use? (Choose two.)

Select 2 answers
A.Metric: CPUUtilization; Condition: > 90 for 10 minutes
B.Metric: DatabaseConnections; Condition: > 500 for 5 minutes
C.Metric: NetworkReceiveThroughput; Condition: > 1 GB for 10 minutes
D.Metric: WLMQueueLength; Condition: > 100 for 5 minutes
E.Metric: PercentageDiskSpace; Condition: > 80 for 5 minutes
AnswersA, E

This alarm triggers on CPU usage.

Why this answer

Options A and E are correct. Option A: CPUUtilization metric with threshold 90. Option E: PercentageDiskSpace metric with threshold 80.

Option B is wrong because DatabaseConnections is not about CPU or disk. Option C is wrong because NetworkReceiveThroughput is not relevant. Option D is wrong because WLMQueueLength measures queued queries, not disk or CPU.
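
The two alarm conditions can be sketched as parameter sets shaped for the CloudWatch `put_metric_alarm` call. This is an illustration under assumptions: the helper `redshift_alarm`, the alarm names, and the cluster identifier `my-cluster` are hypothetical, and the disk metric is written as `PercentageDiskSpaceUsed`, the name CloudWatch publishes in the `AWS/Redshift` namespace (the question abbreviates it). No AWS call is made here.

```python
def redshift_alarm(name, metric, threshold, minutes, cluster_id):
    """Return keyword arguments shaped for CloudWatch put_metric_alarm."""
    period = 300  # one datapoint per 5 minutes, in seconds
    return {
        "AlarmName": name,
        "Namespace": "AWS/Redshift",
        "MetricName": metric,
        "Dimensions": [{"Name": "ClusterIdentifier", "Value": cluster_id}],
        "Statistic": "Average",
        "Period": period,
        # Number of consecutive periods that must breach the threshold.
        "EvaluationPeriods": (minutes * 60) // period,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
    }


alarms = [
    redshift_alarm("disk-over-80", "PercentageDiskSpaceUsed", 80, 5, "my-cluster"),
    redshift_alarm("cpu-over-90", "CPUUtilization", 90, 10, "my-cluster"),
]

for alarm in alarms:
    print(alarm["AlarmName"], alarm["MetricName"], alarm["EvaluationPeriods"])
```

Each dictionary could be passed to a boto3 CloudWatch client as `client.put_metric_alarm(**alarm)`.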

188
MCQmedium

A data engineer is tasked with reducing costs for an Amazon Redshift cluster. The cluster is used for both ETL workloads and BI reporting. The engineer observes that the cluster is over-provisioned during off-peak hours. Which action would be MOST effective in reducing costs while maintaining performance during peak hours?

A.Switch to RA3 node types for managed storage.
B.Enable concurrency scaling to automatically add cluster capacity during peak hours.
C.Purchase Reserved Instances for the cluster.
D.Reduce the number of nodes in the cluster.
AnswerB

Concurrency scaling adds transient clusters only when needed, reducing cost during off-peak hours.

Why this answer

Option B is correct because concurrency scaling adds additional capacity on demand and is cost-effective for variable workloads. Option A is incorrect because RA3 nodes are for managed storage, not cost reduction for variable workloads. Option C is incorrect because reserved instances require upfront payment and are for steady-state, not variable.

Option D is incorrect because reducing node count may impact performance during peak hours.

189
MCQeasy

A data engineer is monitoring an Amazon EMR cluster and notices that one core node is running out of disk space. The cluster is running a Spark job that processes large Parquet files. What should the engineer do to prevent the issue?

A.Terminate the core node and replace it with a larger instance type
B.Use Spark's in-memory processing to avoid writing intermediate data to disk
C.Enable Snappy compression for intermediate data
D.Increase the number of core nodes
AnswerC

Compression reduces disk usage for intermediate data.

Why this answer

Option C is correct because enabling Snappy compression for intermediate data reduces the volume of data written to disk during Spark shuffle operations, directly addressing the disk space issue on the core node. Snappy provides a good balance between compression ratio and speed, minimizing I/O overhead while conserving storage. This is a standard tuning practice in Amazon EMR for Spark jobs that process large Parquet files.

Exam trap

The trap here is that candidates may confuse increasing cluster capacity (options A or D) with optimizing data handling, whereas the exam tests the understanding that compression of intermediate data directly reduces disk usage without requiring hardware changes.

How to eliminate wrong answers

Option A is wrong because terminating the core node and replacing it with a larger instance type is disruptive and does not prevent the recurrence of disk space issues; it only temporarily increases capacity without addressing the root cause of excessive intermediate data. Option B is wrong because Spark's in-memory processing cannot fully avoid writing intermediate data to disk during shuffle operations, as spill-to-disk is inherent when memory is insufficient; relying solely on in-memory processing does not prevent disk exhaustion. Option D is wrong because increasing the number of core nodes distributes the storage load but does not reduce the amount of intermediate data written per node; it may delay but not prevent disk space issues if the data volume per node remains high.
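
The compression settings discussed above map to a few Spark configuration properties. The sketch below only formats them as `spark-submit` arguments; the helper `to_submit_args` is hypothetical, and whether these values differ from a given EMR release's defaults is an assumption to verify against that release.

```python
# Spark properties that control compression of shuffle and spill data.
shuffle_compression_conf = {
    "spark.shuffle.compress": "true",        # compress map output files
    "spark.shuffle.spill.compress": "true",  # compress data spilled during shuffles
    "spark.io.compression.codec": "snappy",  # codec used for the above
}


def to_submit_args(conf: dict) -> list:
    """Turn a property dict into a flat list of --conf key=value arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


print(" ".join(to_submit_args(shuffle_compression_conf)))
```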

190
MCQhard

A data engineer creates an IAM policy as shown in the exhibit. The engineer then attaches this policy to an IAM role used by an application that uploads objects to the S3 bucket 'my-bucket'. When the application uploads an object without specifying server-side encryption, what happens?

A.The object is uploaded with SSE-S3 encryption by default.
B.The upload fails with a 403 Access Denied error.
C.The object is uploaded without encryption.
D.The object is uploaded with SSE-C encryption.
AnswerB

The condition is not met, so the request is denied.

Why this answer

Option B is correct because the policy requires the s3:x-amz-server-side-encryption header to be set to AES256. If the request does not include that header, the condition is not met, and the request is denied with a 403 Access Denied error. Option A is wrong because the object is not uploaded.

Option C is wrong because the condition is not met if the header is absent; the request is denied. Option D is wrong because the condition requires AES256, not SSE-C.

191
MCQhard

A data engineer is responsible for a data pipeline that uses Amazon S3 as a data lake, AWS Glue for ETL, and Amazon Athena for ad-hoc queries. The pipeline ingests CSV files from an external partner via SFTP into an S3 bucket. The files are then processed by a Glue job that converts them to Parquet and writes to a separate S3 bucket partitioned by date. The Glue job runs daily and is triggered by a scheduled CloudWatch Events rule. Recently, the data engineer noticed that some days the Glue job fails because of memory errors, and on those days the Athena queries that rely on the data return incomplete results. The engineer needs to ensure that the pipeline is resilient and that Athena queries always see a complete view of the data, even if the Glue job fails mid-run. The engineer also needs to minimize re-processing of data. Which course of action should the engineer take?

A.Increase the number of workers and the worker type to G.2X to handle the memory errors, and enable job retries.
B.Replace the Glue job with an AWS Lambda function that processes the CSV files and writes Parquet to S3, and use S3 Event Notifications to trigger the function.
C.Modify the Glue job to use job bookmarks for incremental processing and write the Parquet output to a temporary location, then use an S3 copy operation to move the data into the final partitioned location only after the job completes successfully.
D.Use Athena partition projection to automatically discover partitions and set up a retry mechanism using AWS Step Functions.
AnswerC

Bookmarks prevent reprocessing; atomic move ensures Athena sees complete data.

Why this answer

Option C is correct. Using Glue job bookmarks enables incremental processing and the ability to resume from the last successful checkpoint. Staging the data in a temporary location and moving it into the final partitioned location only after the job succeeds ensures that Athena sees only complete data.

Option A is wrong because increasing worker capacity does not prevent partial writes. Option B is wrong because using Lambda for conversion is less scalable and error-prone. Option D is wrong because partition projection does not solve the atomicity issue.

192
MCQhard

A company ingests IoT sensor data into an S3 bucket. Daily, a Lambda function reads new objects, processes them, and writes results to a DynamoDB table. Recently, the Lambda function started timing out after 15 minutes. The data volume has increased, and the function processes records one by one. Which solution would improve performance without significant cost increase?

A.Replace Lambda with an AWS Glue ETL job.
B.Increase the Lambda function timeout to 30 minutes.
C.Use S3 Batch Operations to invoke the Lambda function in parallel for each object.
D.Increase the DynamoDB write capacity units.
AnswerC

S3 Batch Operations processes objects concurrently, drastically reducing processing time.

Why this answer

Option C is correct because using S3 Batch Operations invokes a Lambda function for each object in parallel, handling large volumes efficiently. Option A is wrong because Glue jobs have a startup overhead and may cost more. Option B is wrong because 15 minutes is already the maximum Lambda timeout, and a longer timeout would not address the root cause of sequential processing.

Option D is wrong because increasing DynamoDB write capacity does not speed up the Lambda processing.

193
MCQmedium

A data engineer is troubleshooting an AWS Glue ETL job that fails intermittently with the error 'Rate exceeded.' The job reads from an Amazon RDS for MySQL source and writes to Amazon S3. What is the MOST likely cause of this error?

A.The Glue job is using Amazon Kinesis Data Streams as a source, which has a shard throughput limit.
B.The number of Glue job workers or parallel queries is exceeding the maximum connections or IOPS of the RDS instance.
C.The Amazon S3 bucket has a bucket policy that limits the number of objects written per second.
D.The IAM role attached to the Glue job does not have sufficient permissions to read from RDS.
AnswerB

This is the typical cause of rate exceeded errors when reading from RDS.

Why this answer

Option B is correct because the 'Rate exceeded' error in AWS Glue when reading from RDS typically indicates that the number of connections or queries per second exceeds the RDS instance's maximum limits. Option A is wrong because the job reads from Amazon RDS, not from Amazon Kinesis Data Streams. Option C is wrong because Amazon S3 does not have a rate exceeded error for writes; it would be a 503 SlowDown error.

Option D is wrong because insufficient IAM permissions would cause an access denied error, not rate exceeded.

194
Matchingmedium

Match each AWS data compression format to its typical use case.

Concepts
Matches

General-purpose, good compression ratio

Fast compression/decompression for real-time

Columnar storage with built-in compression

Optimized for Hive and large-scale analytics

High compression ratio, slower speed

Why these pairings

Compression formats affect storage and performance in AWS.

195
MCQeasy

A data engineer is configuring an S3 bucket for a data lake. The engineer runs the command shown in the exhibit. What does the output indicate about the bucket?

A.Versioning is enabled on the bucket.
B.The bucket retains only the latest version of each object.
C.Versioning is suspended on the bucket.
D.MFA Delete is enabled for the bucket.
AnswerA

Status: Enabled means versioning is active.

Why this answer

Option A is correct because Status: Enabled means versioning is enabled. Option B is wrong because the output does not indicate which versions are retained. Option C is wrong because versioning is enabled, not suspended.

Option D is wrong because MFADelete is disabled.

196
Multi-Selectmedium

A data engineer is designing a data pipeline that ingests streaming data from an IoT device fleet. The data must be processed in near real-time and stored in Amazon S3 for long-term analytics. Which TWO AWS services should the engineer use together to achieve this?

Select 2 answers
A.Amazon Athena
B.AWS Glue
C.Amazon Kinesis Data Firehose
D.Amazon Kinesis Data Streams
E.Amazon Simple Queue Service (SQS)
AnswersC, D

Delivers streaming data to S3.

Why this answer

Option C is correct because Kinesis Data Firehose can deliver data from the stream to S3. Option D is correct because Kinesis Data Streams ingests streaming data. Option A is wrong because Athena is a query service, not a delivery service.

Option B is wrong because Glue is for batch ETL, not real-time. Option E is wrong because SQS is for message queues, not real-time streaming.

197
MCQmedium

A data engineer is designing a data lake on Amazon S3. The data is accessed frequently for the first 30 days, then rarely after that. Which lifecycle policy is MOST cost-effective?

A.Transition to S3 Standard-Infrequent Access (Standard-IA) after 30 days.
B.Transition to S3 One Zone-IA after 30 days.
C.Transition to S3 Glacier Deep Archive after 30 days.
D.Keep in S3 Standard for 90 days, then delete.
AnswerA

Standard-IA is cost-effective for infrequently accessed data with low latency.

Why this answer

Option A is correct because transitioning to S3 Standard-IA after 30 days reduces costs for infrequently accessed data while retaining low latency. Option B is wrong because One Zone-IA is less durable. Option C is wrong because Glacier Deep Archive has retrieval times not suitable for rare but possible access.

Option D is wrong because keeping data in S3 Standard is more expensive, and deleting it after 90 days removes data that may still be needed.
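
A lifecycle rule for this access pattern can be sketched as the configuration document that S3 accepts for a bucket lifecycle. This is a minimal sketch: the helper `standard_ia_after` and the rule ID are hypothetical, and the empty prefix (apply to all objects) is an assumption.

```python
def standard_ia_after(days: int, prefix: str = "") -> dict:
    """Build a lifecycle configuration that moves objects to Standard-IA after `days`."""
    return {
        "Rules": [
            {
                "ID": f"to-standard-ia-after-{days}-days",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # empty prefix matches every object
                "Transitions": [
                    {"Days": days, "StorageClass": "STANDARD_IA"},
                ],
            }
        ]
    }


config = standard_ia_after(30)
print(config["Rules"][0]["Transitions"])
```

With boto3, a document of this shape would be passed as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`.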

198
MCQhard

A data engineer is troubleshooting an AWS Glue ETL job that fails with an 'Access Denied' error when trying to write to an S3 bucket. The IAM role used by the job has the policy shown in the exhibit. The bucket 'my-bucket' uses S3 default encryption with AWS KMS. What is the most likely missing permission?

A.s3:GetObjectVersion
B.glue:GetObject
C.s3:ListBucketMultipartUploads
D.s3:PutObjectAcl
E.kms:GenerateDataKey and kms:Decrypt
AnswerE

KMS permissions are necessary to encrypt and decrypt objects when default encryption uses KMS.

Why this answer

Option E is correct because when S3 default encryption uses KMS, the IAM role must have kms:GenerateDataKey and kms:Decrypt permissions on the KMS key. The policy in the exhibit does not include any KMS actions. Option A is wrong because s3:GetObjectVersion applies to reading specific object versions, not to writing.

Option B is wrong because glue:GetObject is not a valid action. Option C is wrong because s3:ListBucketMultipartUploads only lists in-progress multipart uploads. Option D is wrong because s3:PutObjectAcl sets object ACLs; s3:PutObject is already granted for the write itself.
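
The missing permission can be sketched as one extra statement appended to the role's policy. The exhibit is not reproduced here, so the starting policy, the helper `with_kms_access`, and the key ARN below are all hypothetical placeholders.

```python
import copy


def with_kms_access(policy: dict, key_arn: str) -> dict:
    """Return a copy of `policy` with a statement allowing use of one KMS key."""
    updated = copy.deepcopy(policy)  # leave the caller's policy untouched
    updated["Statement"].append(
        {
            "Effect": "Allow",
            "Action": ["kms:GenerateDataKey", "kms:Decrypt"],
            "Resource": key_arn,
        }
    )
    return updated


# Hypothetical stand-in for the exhibit: S3 write access only, no KMS actions.
s3_only = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::my-bucket/*",
        }
    ],
}

fixed = with_kms_access(s3_only, "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID")
print(fixed["Statement"][-1]["Action"])
```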

199
MCQmedium

A company uses Amazon Kinesis Data Streams to ingest real-time clickstream data. The data is consumed by a Lambda function that writes to an S3 bucket. Recently, the Lambda function started timing out. Which step should be taken to resolve this issue?

A.Increase the Lambda function timeout
B.Set the Lambda reserved concurrency to 1
C.Decrease the batch size in the event source mapping
D.Increase the number of shards in the Kinesis stream
AnswerA

Allows the function to run longer without timing out.

Why this answer

Option A is correct because increasing the Lambda timeout gives the function more time to process records. Option B is wrong because it may reduce concurrency but does not extend processing time. Option C is wrong because decreasing batch size reduces records per invocation but does not extend timeout.

Option D is wrong because increasing shards increases throughput but does not fix the timeout.

200
MCQeasy

A data engineer notices that an Amazon Kinesis Data Firehose delivery stream is failing to deliver data to an Amazon S3 bucket. The CloudWatch metrics show 'DeliveryToS3.Success' is 0 and 'S3.BucketExists' is 1. What is the MOST likely cause?

A.The S3 bucket has an ACL that denies access to Firehose.
B.The Firehose delivery stream Lambda transformation function is failing.
C.The IAM role for Firehose lacks s3:PutObject permission.
D.The S3 bucket does not exist.
AnswerC

Write permission is required for delivery.

Why this answer

The metric 'S3.BucketExists' is 1, confirming the S3 bucket exists, so the issue is not bucket existence. With 'DeliveryToS3.Success' at 0, the failure is in the write operation. The IAM role assumed by Firehose must have the s3:PutObject permission to deliver data; lacking it would cause all delivery attempts to fail silently, matching the observed metrics.

Exam trap

The trap here is that candidates may confuse 'S3.BucketExists' with successful delivery, or assume a missing bucket is the issue when the metric clearly shows the bucket exists, leading them to overlook the IAM permission gap.

How to eliminate wrong answers

Option A is wrong because S3 bucket ACLs are not evaluated when the IAM role grants the s3:PutObject permission via a bucket policy or identity-based policy; ACLs are legacy and Firehose uses IAM for authorization. Option B is wrong because a failing Lambda transformation function would cause 'DeliveryToS3.Success' to be 0 only if the transformation is mandatory, but the metric 'S3.BucketExists' would still be 1, and the failure would be logged as 'Lambda.ExecutionErrors' or similar, not directly as a delivery failure. Option D is wrong because 'S3.BucketExists' is 1, which explicitly indicates the bucket exists, so the bucket not existing cannot be the cause.

201
MCQeasy

A team uses Amazon Kinesis Data Analytics to process streaming data. They notice that the application's output is delayed. Which AWS service can be used to monitor the application's performance and identify bottlenecks?

A.AWS CloudTrail
B.Amazon CloudWatch
C.Amazon Athena
D.AWS X-Ray
AnswerB

CloudWatch monitors Kinesis Data Analytics with metrics like MillisBehindLatest and CPU utilization.

Why this answer

Option B is correct because CloudWatch provides metrics and logs for Kinesis Data Analytics applications. Option A is wrong because CloudTrail records API calls, not performance metrics. Option C is wrong because Athena is an interactive query service.

Option D is wrong because X-Ray traces requests but is not the primary service for monitoring Kinesis Data Analytics performance.

202
MCQhard

A company is running a critical Amazon RDS for MySQL database. They need to implement a backup strategy that allows point-in-time recovery (PITR) with a recovery time objective (RTO) of 15 minutes and a recovery point objective (RPO) of 5 minutes. Which solution meets these requirements?

A.Enable automated backups with 1-day retention and enable Multi-AZ deployment
B.Use cross-Region automated backups and promote the replica
C.Take manual snapshots every 5 minutes and restore from the latest snapshot
D.Enable a Read Replica and promote it during a disaster
AnswerA

Automated backups provide transaction logs every 5 minutes; Multi-AZ failover is fast.

Why this answer

Option A is correct because automated backups with 5-minute transaction logs meet RPO and a Multi-AZ failover meets RTO. Option B is wrong because cross-Region replication adds latency. Option C is wrong because manual snapshots cannot achieve 5-minute RPO.

Option D is wrong because Read Replicas are not for failover.

203
MCQmedium

A company stores sensitive data in Amazon S3 and requires that all data be encrypted at rest. The data is accessed by multiple AWS services. Which solution meets the encryption requirement with the LEAST operational overhead?

A.Use server-side encryption with AWS KMS (SSE-KMS)
B.Use client-side encryption with AWS KMS
C.Use server-side encryption with customer-provided keys (SSE-C)
D.Enable S3 default encryption with SSE-S3
AnswerD

Least overhead as AWS manages the keys.

Why this answer

Option D is correct because enabling S3 default encryption with SSE-S3 provides server-side encryption automatically without any key management overhead. Option A is wrong because SSE-KMS requires managing KMS keys. Option B is wrong because client-side encryption requires managing keys.

Option C is wrong because SSE-C requires managing your own keys.

204
MCQeasy

A data engineer needs to monitor Amazon DynamoDB table metrics to detect throttled requests. Which CloudWatch metric should the engineer set an alarm on?

A.ReadThrottleEvents
B.SuccessfulRequestLatency
C.ThrottledRequests
D.ConsumedWriteCapacityUnits
AnswerC

This metric directly indicates requests that were throttled.

Why this answer

Option C is correct because `ThrottledRequests` is the specific Amazon CloudWatch metric that tracks the number of requests to a DynamoDB table that are throttled due to exceeding the provisioned throughput capacity. This metric directly reflects throttling events, making it the appropriate choice for setting an alarm to detect throttled requests.

Exam trap

The trap here is that candidates confuse `ThrottledRequests` with `ReadThrottleEvents` or `WriteThrottleEvents`, which each cover only one side (reads or writes), leading them to select a plausible-sounding but narrower option.

How to eliminate wrong answers

Option A is wrong because `ReadThrottleEvents` counts only read requests that exceed the provisioned read capacity, so an alarm on it would miss throttled writes; `ThrottledRequests` covers throttled requests of any type. Option B is wrong because `SuccessfulRequestLatency` measures the latency of successful requests, not throttling events, and is used for performance monitoring rather than detecting throttled requests. Option D is wrong because `ConsumedWriteCapacityUnits` tracks the amount of write capacity consumed, not throttling events, and is used for capacity planning, not for alerting on throttled requests.

205
Multi-Selectmedium

A data engineer is designing a data pipeline using AWS Step Functions to orchestrate multiple AWS Glue ETL jobs. The pipeline must handle failures and retries. Which TWO configurations should the engineer use to ensure the pipeline is resilient? (Choose two.)

Select 2 answers
A.Configure a dead-letter queue (DLQ) for the state machine
B.Configure the state machine to use a 'Catch' rule to handle specific errors and transition to a fallback state
C.Set the 'Retry' interval to a fixed value instead of exponential backoff
D.Define a 'Timeout' for each state to prevent the pipeline from hanging indefinitely
E.Use a 'Parallel' state to run multiple Glue jobs simultaneously
AnswersB, D

Catch rules handle errors gracefully.

Why this answer

Options B and D are correct. Option B: Adding a Catch rule to the state machine handles errors. Option D: Setting a timeout prevents stuck executions.

Option A is wrong because a DLQ is for Lambda, not state machines. Option C is wrong because a fixed retry interval is less resilient than exponential backoff for transient failures. Option E is wrong because a Parallel state is for concurrency, not resilience.
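
The Catch and Timeout settings, together with a Retry that uses exponential backoff, can be sketched as an Amazon States Language definition built in Python. This is an illustration under assumptions: the state names, the job name `daily-etl`, and the numeric values are hypothetical.

```python
import json


def glue_task(job_name: str, next_state: str, fallback_state: str) -> dict:
    """Build a Task state that runs a Glue job with Retry, Catch and a timeout."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::glue:startJobRun.sync",
        "Parameters": {"JobName": job_name},
        "TimeoutSeconds": 900,  # fail the state instead of hanging indefinitely
        "Retry": [
            {
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 10,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,  # waits 10 s, 20 s, 40 s between attempts
            }
        ],
        # After retries are exhausted, route any error to the fallback state.
        "Catch": [{"ErrorEquals": ["States.ALL"], "Next": fallback_state}],
        "Next": next_state,
    }


definition = {
    "StartAt": "RunEtl",
    "States": {
        "RunEtl": glue_task("daily-etl", "Done", "NotifyFailure"),
        "Done": {"Type": "Succeed"},
        "NotifyFailure": {"Type": "Fail", "Error": "EtlFailed"},
    },
}

print(json.dumps(definition, indent=2))
```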

206
MCQhard

A company uses AWS Database Migration Service (DMS) to migrate an on-premises Oracle database to Amazon RDS for PostgreSQL. The migration completes successfully, but the data engineer notices that some tables have fewer rows in the target than the source. Which DMS setting should be checked to ensure full data migration?

A.The LOB mode is set to 'Limited LOB mode' instead of 'Full LOB mode'.
B.The task logs show that some rows failed to apply due to data type conversion errors.
C.The 'Enable validation' option is turned off.
D.The 'Parallel Apply' feature is disabled, slowing down the migration.
AnswerB

Failed rows would be logged and can be reviewed.

Why this answer

Option B is correct because if some rows failed to apply due to data type conversion errors, those rows would be logged as errors and not written to the target, resulting in fewer rows. AWS DMS task logs capture these failures, and checking them is the direct way to identify rows that were skipped or rejected during migration. This is the most common cause of row count mismatches after a successful DMS task.

Exam trap

The trap here is that candidates often assume row count mismatches are always due to LOB settings or validation being off, but the most direct cause is data type conversion errors logged in the task logs, which DMS does not surface in the task status summary.

How to eliminate wrong answers

Option A is wrong because LOB mode settings (Limited vs. Full) affect how large objects are handled, not the total row count; even in Limited LOB mode, all rows are migrated, but LOB columns may be truncated if the LOB exceeds the max size. Option C is wrong because 'Enable validation' is a post-migration check that compares source and target data, but turning it off does not cause rows to be lost during migration; it only prevents validation reports from being generated.

Option D is wrong because 'Parallel Apply' affects the speed of applying changes to the target, not the completeness of data; disabling it may slow down the migration but does not cause rows to be omitted.

207
MCQeasy

A company uses Amazon Redshift for data warehousing. They notice that query performance has degraded over time. Which maintenance operation should be performed to improve performance?

A.Run the VACUUM command
B.Drop and recreate the table
C.Run the REINDEX command
D.Run the ANALYZE command
AnswerA

VACUUM re-sorts rows and reclaims disk space, improving query performance.

Why this answer

Option A is correct because VACUUM re-sorts rows and reclaims space, improving performance. Option D is wrong because ANALYZE updates statistics but does not physically reorganize data. Option C is wrong because REINDEX rebuilds indexes, but Redshift uses sort keys, not indexes.

Option B is wrong because dropping and recreating the table deletes its data.

208
MCQhard

A company uses Amazon EMR to run Spark jobs on a transient cluster. The jobs are submitted via a step in the cluster. The cluster is configured to auto-terminate after the last step completes. However, the cluster is not terminating even though the step shows as 'COMPLETED'. What could be the cause?

A.The cluster's root device size is too large.
B.The step failed with an error, but the status shows 'COMPLETED' due to a reporting bug.
C.The cluster is configured as a long-running cluster.
D.The step's 'ActionOnFailure' parameter is set to 'CONTINUE' and 'KeepClusterAliveOnFailure' is true.
AnswerD

These settings prevent auto-termination.

Why this answer

Option D is correct because if 'KeepClusterAliveOnFailure' is set to true for a step, the cluster will not terminate even if the step succeeds. Option B is wrong because 'COMPLETED' indicates success. Option A is wrong because the root volume size does not affect termination.

Option C is wrong because the cluster is transient, not long-running.

209
Multi-Selecthard

A data engineer is designing a data pipeline that ingests JSON data from Amazon Kinesis Data Streams and processes it using AWS Lambda. The Lambda function writes the processed data to an Amazon S3 bucket. The engineer needs to ensure at-most-once processing semantics. Which TWO configurations should the engineer implement? (Choose two.)

Select 2 answers
A.Use S3 PutObject with a unique object key (e.g., include a UUID) and overwrite set to false.
B.Use DynamoDB for checkpointing to track processed records.
C.Set the Lambda function's batch size to 100 to process records in larger batches.
D.Enable function-level retries in the Lambda function for transient errors.
E.Set the Lambda function's batch size to 1 and the batch window to 0.
AnswersA, E

Unique keys prevent overwriting and duplicates.

Why this answer

Options A and E are correct. E: Setting the Lambda batch size to 1 and the batch window to 0 delivers one record per invocation, so a failure affects only that record (at-most-once means the record is not retried). A: Writing each S3 object under a unique key prevents one write from overwriting another.

Option C is wrong because increasing the batch size increases the chance of partial failures. Option D is wrong because enabling retries violates at-most-once. Option B is wrong because checkpointing is for at-least-once processing.

210
MCQhard

Refer to the exhibit. An IAM policy is attached to an IAM user. The user is trying to upload an object to 's3://data-lake-bucket/confidential/report.pdf' using the AWS CLI. The upload fails with an AccessDenied error. What is the reason for the failure?

A.The policy does not include 's3:PutObject' action.
B.The resource ARN in the Allow statement does not cover the specific object.
C.The user does not have permission to access the bucket at all.
D.An explicit Deny statement overrides the Allow statement for the 'confidential/' prefix.
AnswerD

Explicit Deny always takes precedence over Allow.

Why this answer

Option D is correct because an explicit Deny overrides any Allow. The Deny statement blocks all S3 actions on the 'confidential/' prefix, even though the Allow statement grants PutObject. Option A is wrong because the policy does allow PutObject on the bucket.

Option B is wrong because the resource in the Allow statement is specified correctly. Option C is wrong because the user has permissions on other parts of the bucket.
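
The precedence rule can be sketched with a small evaluator. The exhibit is not reproduced in this text, so the policy below is a hypothetical stand-in, and the function is a deliberate simplification that looks only at Effect, Action, and Resource with wildcard matching. It is not the full IAM evaluation logic.

```python
from fnmatch import fnmatchcase

# Hypothetical policy: the real exhibit is not shown, so this is only a stand-in.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject"],
            "Resource": ["arn:aws:s3:::data-lake-bucket/*"],
        },
        {
            "Effect": "Deny",
            "Action": ["s3:*"],
            "Resource": ["arn:aws:s3:::data-lake-bucket/confidential/*"],
        },
    ],
}


def _matches(patterns, value):
    """Return True if value matches any of the wildcard patterns."""
    return any(fnmatchcase(value, pattern) for pattern in patterns)


def is_allowed(policy, action, resource):
    """Simplified check: explicit Deny wins, then Allow, otherwise implicit deny."""
    allowed = False
    for stmt in policy["Statement"]:
        if _matches(stmt["Action"], action) and _matches(stmt["Resource"], resource):
            if stmt["Effect"] == "Deny":
                return False  # an explicit Deny overrides any Allow
            allowed = True
    return allowed
```

With this stand-in policy, a PutObject to the 'confidential/' prefix is rejected while the same action elsewhere in the bucket is permitted.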

211
MCQhard

A data engineer is optimizing an Amazon Redshift cluster that runs a nightly ETL workload. The engineer notices that the query performance degrades over the week and improves after a VACUUM operation. Which action should the engineer take to automate this maintenance and minimize impact on performance?

A.Run VACUUM manually only when performance degrades significantly.
B.Disable auto vacuum and run a manual VACUUM every night after the ETL.
C.Schedule a VACUUM command using a query scheduler like Amazon EventBridge.
D.Drop and recreate the tables weekly to avoid unsorted data.
AnswerC

Automates the maintenance task.

Why this answer

Option C is correct because automating VACUUM using a scheduled query (e.g., via Amazon EventBridge or Redshift's built-in scheduler) ensures regular maintenance without manual intervention. Option B is wrong because disabling auto vacuum and running VACUUM by hand does not automate the maintenance. Option D is wrong because dropping and recreating tables is disruptive.

Option A is wrong because a manual VACUUM run only after performance degrades is not automated.

212
MCQeasy

A data engineer is troubleshooting a failed AWS Glue ETL job that reads from an S3 bucket and writes to an Amazon Redshift table. The job logs show a permission error. Which IAM policy change would resolve the issue?

A.Enable encryption on the S3 bucket using AWS KMS
B.Add s3:GetObject permission to the Glue job's IAM role
C.Add redshift:DataAPI access to the Glue job's IAM role
D.Attach an IAM role with redshift:GetClusterCredentials to the Redshift cluster
AnswerC

Glue needs permission to write to Redshift via the Data API or JDBC.

Why this answer

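Option C is correct because the Glue job needs permission to write to Redshift. Option B is wrong because the job already reads from S3. Option D is wrong because a role attached to the Redshift cluster does not grant permissions to the Glue job's own IAM role.

Option A is wrong because KMS encryption is not mentioned in the scenario and would not fix a permission error.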

213
MCQeasy

A company uses Amazon S3 to store log files from multiple applications. The logs are written in JSON format. A data engineer wants to use Amazon Athena to query these logs. The logs are stored in a bucket with the following structure: 's3://logs/app1/date=2021-01-01/'. The engineer creates an Athena table with partitions. However, when querying, Athena returns zero results for partitions that exist. The engineer has run MSCK REPAIR TABLE to add partitions. What is the most likely cause of the issue?

A.The MSCK REPAIR TABLE command failed silently.
B.The partition key name in the table definition does not match the S3 folder naming convention.
C.The log files are in JSON format and Athena does not support JSON.
D.The log files need to be copied to a different bucket in the same region.
AnswerB

The folder prefix must match the partition key name; otherwise, MSCK REPAIR cannot detect partitions.

Why this answer

Option B is correct because Athena relies on the partition metadata stored in the Glue Data Catalog. If the partition folder structure does not match the table's partition definition (e.g., the folder is named 'date=2021-01-01' but the table's partition key is named 'dt'), MSCK REPAIR will not register the partitions. The partition key name must match the folder prefix.

Option C is wrong because Athena supports JSON format. Option A is wrong because MSCK REPAIR does add partitions if the structure matches. Option D is wrong because the data is already in the S3 bucket; there is no need to copy it.
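
A quick way to spot this kind of mismatch is to compare the key names in the S3 path with the partition keys declared on the table. The helper names below are hypothetical, and the sketch assumes Hive-style 'key=value' folder names as in the question.

```python
def partition_keys_from_prefix(s3_uri):
    """Extract Hive-style partition key names (key=value) from an S3 URI."""
    path = s3_uri.split("://", 1)[-1]
    return [seg.split("=", 1)[0] for seg in path.split("/") if "=" in seg]


def mismatched_keys(s3_uri, table_partition_keys):
    """Return folder keys that the table definition does not declare."""
    declared = {key.lower() for key in table_partition_keys}
    return [
        key
        for key in partition_keys_from_prefix(s3_uri)
        if key.lower() not in declared
    ]


# The folder uses 'date', but this hypothetical table declares 'dt'.
problem = mismatched_keys("s3://logs/app1/date=2021-01-01/", ["dt"])
```

An empty result means every folder key has a matching partition column; any name returned points at a folder key the table does not know about.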

214
MCQmedium

A data pipeline uses AWS Glue to process data from Amazon S3 and write results to Amazon Redshift. The pipeline fails intermittently with the error 'S3ServiceException: Access Denied'. The IAM role used by Glue has permissions to read from the S3 bucket. What is the most likely cause of this error?

A.The S3 bucket is in a different AWS Region than the Glue job
B.S3 Server Access Logging is enabled and blocking requests
C.The S3 bucket policy denies access to the Glue job's IAM role
D.The S3 bucket has S3 Transfer Acceleration enabled
AnswerC

A bucket policy can explicitly deny access, overriding IAM allow.

Why this answer

Option C is correct because S3 bucket policies can explicitly deny access even if the IAM role allows it. Option D is wrong because S3 Transfer Acceleration is not related to access denied errors. Option A is wrong because a bucket in a different Region does not by itself cause an Access Denied error.

Option B is wrong because S3 Server Access Logging does not affect access permissions.

215
MCQeasy

A company runs a data pipeline on AWS Glue that processes streaming data from Amazon Kinesis Data Streams and writes results to an Amazon Redshift cluster. The pipeline has been running smoothly, but recently the Glue job started failing with 'ResourceNotFoundException' for the Redshift table. What should the data engineer check first?

A.Verify that the target Redshift table exists and was not dropped or renamed.
B.Ensure the Redshift table schema matches the Glue job output.
C.Check the IAM role permissions for the Glue job to access Redshift.
D.Review security group rules for the Redshift cluster.
AnswerA

ResourceNotFoundException indicates the table is missing.

Why this answer

Option A is correct because the error indicates the table does not exist or was deleted. Option C is wrong because IAM role issues would cause Access Denied, not ResourceNotFoundException. Option D is wrong because network issues such as security group rules would cause a timeout or connection refused.

Option B is wrong because schema changes could cause a type mismatch but not ResourceNotFoundException.

216
MCQeasy

A company stores sensitive data in Amazon S3 and needs to ensure that data is encrypted at rest. Which AWS service can be used to manage the encryption keys?

A.AWS Key Management Service (KMS)
B.AWS Secrets Manager
C.AWS Identity and Access Management (IAM)
D.AWS Certificate Manager (ACM)
AnswerA

KMS is the service for managing encryption keys.

Why this answer

Option A is correct because AWS KMS is the managed service for creating and controlling encryption keys used to encrypt data. Option C is wrong because IAM manages access, not keys. Option D is wrong because ACM manages SSL/TLS certificates, not keys for encrypting data at rest.

Option B is wrong because Secrets Manager is for secrets like passwords.

217
MCQmedium

A data engineer is troubleshooting a data pipeline that uses Amazon Kinesis Data Firehose to deliver data to Amazon S3. The engineer notices that the S3 bucket contains many small files (less than 1 MB). This is causing performance issues in downstream processing. What is the BEST way to reduce the number of small files?

A.Increase the buffer size to at least 128 MB in the Firehose delivery stream configuration.
B.Use an AWS Lambda function to transform the data before delivery.
C.Change the compression format from GZIP to Snappy.
D.Decrease the buffer interval in the Firehose delivery stream configuration.
AnswerA

Larger buffer size leads to fewer, larger files.

Why this answer

Option A is correct because increasing the buffer size (e.g., to 128 MB) causes Firehose to deliver fewer, larger files. Option D is incorrect because reducing the buffer interval would create more small files. Option C is incorrect because changing the compression algorithm does not affect the number of files; it changes only how much storage each file uses.

Option B is incorrect because using a Lambda transformation does not inherently change buffering behavior.
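
The effect of the buffer settings can be estimated with simple arithmetic, assuming Firehose flushes as soon as either the size threshold or the interval is reached. The function name and the sample throughput of 0.5 MB/s are hypothetical, and the model ignores compression and per-record overhead.

```python
def estimated_object_size_mb(ingest_mb_per_s, buffer_size_mb, buffer_interval_s):
    """Estimate delivered object size when the first threshold hit triggers a flush."""
    seconds_to_fill = buffer_size_mb / ingest_mb_per_s
    flush_after_s = min(seconds_to_fill, buffer_interval_s)
    return ingest_mb_per_s * flush_after_s


# Small size threshold: the 1 MB buffer fills in 2 s, so objects stay tiny.
small = estimated_object_size_mb(0.5, buffer_size_mb=1, buffer_interval_s=60)

# Larger size threshold, same interval: the 60 s interval now triggers the flush.
medium = estimated_object_size_mb(0.5, buffer_size_mb=128, buffer_interval_s=60)

# Larger size threshold and a longer interval: the buffer can fill completely.
large = estimated_object_size_mb(0.5, buffer_size_mb=128, buffer_interval_s=300)
```

Under this model a larger buffer size produces bigger objects only up to the point where the interval becomes the limiting threshold, so the two settings are usually tuned together.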

218
MCQhard

Your company runs a data pipeline that ingests data from AWS Database Migration Service (DMS) into Amazon S3 in Parquet format. An AWS Glue ETL job then transforms the data and loads it into an Amazon Redshift cluster. The Glue job uses a JDBC connection to Redshift. Recently, the Glue job started failing with a 'communication failure' error when writing to Redshift. The Redshift cluster is in a VPC with public accessibility disabled. The Glue job runs in a VPC with a subnet that has a route to a NAT gateway. The security group for Redshift allows inbound traffic from the Glue job's security group. The Glue job's IAM role has the necessary permissions. What is the most likely cause?

A.The Glue job's IAM role does not have the redshift:DescribeClusters permission.
B.The Redshift cluster's public accessibility is disabled, but the Glue job is trying to connect over the internet.
C.The Glue job and Redshift cluster are in different VPCs that are not peered or connected via VPC Transit Gateway.
D.The NAT gateway is not configured to allow traffic to the Redshift cluster's subnet.
AnswerC

Without VPC peering or transit gateway, the Glue job cannot reach the Redshift cluster.

Why this answer

Option C is correct because even though the security group allows inbound traffic, the Glue job's VPC may not have a route to the Redshift cluster's VPC if they are in different VPCs. Option A is wrong because IAM permissions are not the issue. Option B is wrong because the Redshift cluster is in a VPC and not publicly accessible.

Option D is wrong because the NAT gateway is for outbound internet, not for connecting to Redshift within the same VPC.

219
MCQmedium

A data engineering team notices that an AWS Glue ETL job, which processes hourly data from an S3 bucket, is taking progressively longer to run. The job reads Parquet files partitioned by date and hour. Which action is MOST likely to improve the job's performance?

A.Enable pushdown predicate filtering on the job's data source.
B.Convert Parquet files to CSV to improve read performance.
C.Increase the number of DPUs for the job.
D.Switch from Spark to Python shell for simpler processing.
AnswerA

Pushdown predicates filter data at the source, reducing data scanned.

Why this answer

Option A is correct because enabling pushdown predicate filtering reduces the amount of data read by pruning partitions early. Option D is wrong because switching to Python shell gives up the distributed processing that Glue Spark jobs already provide. Option C is wrong because increasing the number of DPUs can help but is not the most direct fix for the described symptom.

Option B is wrong because converting to CSV would increase I/O and processing time.
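
As a minimal sketch of partition pruning, the snippet below builds a predicate string for one hourly run. The partition column names 'dt' and 'hr' and the database and table names in the comment are hypothetical; the question does not name them.

```python
from datetime import datetime, timezone


def hourly_predicate(run_time):
    """Build a partition filter that selects only the hour being processed."""
    return f"dt = '{run_time:%Y-%m-%d}' AND hr = '{run_time:%H}'"


predicate = hourly_predicate(datetime(2021, 1, 1, 7, tzinfo=timezone.utc))

# In a Glue Spark job the string would be passed roughly like this
# (database and table names are placeholders):
# glue_context.create_dynamic_frame.from_catalog(
#     database="example_db",
#     table_name="example_table",
#     push_down_predicate=predicate,
# )
```

Because the filter is applied to partition metadata before any files are listed or read, the job touches only the matching date and hour folders instead of the whole, ever-growing dataset.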

220
MCQmedium

A data engineer needs to set up a cross-account access for an S3 bucket so that users in Account B can read objects. The bucket in Account A has a bucket policy that grants access. What additional step is required?

A.Enable S3 object ACLs on the bucket.
B.Create an IAM role in Account B and attach a policy that allows s3:GetObject for the bucket.
C.Disable S3 Block Public Access settings on the bucket.
D.Set up an S3 Lifecycle policy to replicate objects to Account B.
AnswerB

Users in Account B need an IAM role or user with explicit permissions to access the bucket.

Why this answer

Option B is correct because cross-account access requires both a bucket policy in the source account and an IAM user or role in the target account with permissions. Option C (disabling Block Public Access) is not needed because the bucket policy is not public. Option A (ACLs) is a legacy mechanism and not recommended.

Option D (a Lifecycle policy) is unrelated to access control.

221
MCQmedium

A company uses Amazon S3 to store raw data files. An AWS Glue crawler creates metadata in the Data Catalog. The data engineer discovers that the crawler is not detecting new partitions after new data is added to the S3 bucket. What is the MOST likely cause?

A.The IAM role used by the crawler does not have kms:Decrypt permission for the KMS key that encrypts the new partitions.
B.The crawler configuration has 'Crawl all folders' disabled.
C.The S3 bucket has too many objects, exceeding the crawler's limit.
D.The crawler does not have S3 event notifications enabled.
AnswerA

Without decrypt permission, the crawler cannot read the data.

Why this answer

Option A is correct because if the S3 bucket uses an SSE-KMS encrypted prefix, the crawler may not have permission to decrypt; the KMS key policy or IAM role must allow kms:Decrypt. Option D is wrong because S3 events are not required for crawling. Option C is wrong because the crawler can handle many objects.

Option B is wrong because, although the crawler configuration may need to be set to detect partitions, the most likely cause given encryption is a missing permission.

222
Multi-Selectmedium

A data engineer is monitoring an Amazon Kinesis Data Analytics for Apache Flink application that processes streaming data. The application is falling behind (increasing 'MillisBehindLatest') and the CPU utilization of the Flink task managers is consistently above 80%. Which THREE actions should the engineer take to improve performance? (Choose THREE.)

Select 3 answers
A.Increase the number of shards in the Kinesis data stream.
B.Decrease the checkpoint interval to reduce state size.
C.Enable auto-scaling for the Flink application.
D.Decrease the number of task managers to reduce CPU contention.
E.Increase the Flink application's parallelism.
AnswersA, C, E

More shards allow higher ingestion rate.

Why this answer

Increasing the number of shards in the Kinesis data stream (Option A) directly increases the ingestion capacity and parallelism source for the Flink application. With more shards, the application can read data from more partitions concurrently, reducing the backlog indicated by 'MillisBehindLatest'. This is a fundamental scaling action for Kinesis-based Flink applications.

Exam trap

The trap here is that candidates often confuse decreasing checkpoint intervals with improving performance, not realizing that more frequent checkpoints increase CPU and I/O overhead, making the lag worse.

223
Multi-Selecteasy

A data engineer is setting up a new Amazon Redshift cluster for a data warehouse. The engineer wants to ensure data durability and high availability. Which THREE features should the engineer consider? (Choose three.)

Select 3 answers
A.S3 Cross-Region Replication for Redshift data.
B.Cross-Region snapshot copy.
C.Multi-node cluster with data replication.
D.Multi-AZ deployment for automatic failover.
E.Automated snapshots to Amazon S3.
AnswersB, C, E

Cross-Region copies protect against region failures.

Why this answer

Options B, C, and E are correct. E: Automated snapshots provide point-in-time recovery. B: Cross-Region snapshot copy protects against regional failures.

C: Multi-node clusters provide data replication within the cluster. Option D is wrong because Multi-AZ is not a feature of Redshift; it uses automatic failover within a single AZ. Option A is wrong because S3 Cross-Region Replication is for S3, not Redshift.

224
Multi-Selecteasy

A data engineer is designing a data pipeline that processes streaming data. The pipeline must be able to handle duplicate records and ensure exactly-once processing semantics. Which THREE AWS services or features should the engineer consider? (Choose three.)

Select 3 answers
A.Amazon EMR with Apache Flink for exactly-once semantics.
B.Amazon Kinesis Data Firehose with automatic retries.
C.Amazon Kinesis Data Streams with sequence numbers for deduplication.
D.Amazon DynamoDB Streams for change data capture.
E.Amazon Kinesis Data Analytics for Apache Flink with idempotent sinks.
AnswersA, C, E

Flink on EMR provides exactly-once processing via checkpointing.

Why this answer

Option C is correct because Kinesis Data Streams provides sequence numbers for records, enabling deduplication. Option E is correct because Kinesis Data Analytics for Apache Flink relies on idempotent sinks to achieve exactly-once results. Option A is correct because Apache Flink on Amazon EMR provides exactly-once processing through checkpointing.

Option B is incorrect because Kinesis Data Firehose delivers at-least-once, not exactly-once. Option D is incorrect because DynamoDB Streams is at-least-once.
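
The idea of an idempotent sink can be sketched in a few lines. The class below is a hypothetical in-memory stand-in, not an AWS API, and it assumes that the pair of shard ID and sequence number identifies a record uniquely.

```python
class IdempotentSink:
    """In-memory stand-in for a sink keyed by a unique record identifier."""

    def __init__(self):
        self._rows = {}

    def write(self, shard_id, sequence_number, payload):
        """Store the record once; return True only if it was new."""
        key = (shard_id, sequence_number)
        if key in self._rows:
            return False  # replayed record: ignore it
        self._rows[key] = payload
        return True

    def __len__(self):
        return len(self._rows)


sink = IdempotentSink()
# The second delivery of sequence number "101" simulates an at-least-once replay.
deliveries = [
    ("shard-0", "101", {"v": 1}),
    ("shard-0", "102", {"v": 2}),
    ("shard-0", "101", {"v": 1}),
]
results = [sink.write(*delivery) for delivery in deliveries]
```

Upstream delivery stays at-least-once; the result looks exactly-once only because writing the same key twice leaves the stored data unchanged.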

225
MCQmedium

A data engineer is troubleshooting an AWS Glue ETL job that fails with a 'java.lang.OutOfMemoryError: Java heap space' error. The job processes a 50 GB Parquet file from an S3 bucket. The job uses a G.1X DPU (16 GB memory) and default parameters. Which action should the engineer take to resolve the issue?

A.Change the worker type to G.2X (32 GB memory).
B.Increase the number of workers from 2 to 4.
C.Increase the 'batch size' parameter in the DynamicFrame reader.
D.Convert the input data from Parquet to JSON format.
AnswerA

Doubling the memory per worker resolves the heap space error without changing the number of workers.

Why this answer

Option A is correct because changing the worker type to G.2X (32 GB) provides more memory per worker, directly addressing the heap space error. Option B is wrong because increasing the worker count without changing the worker type may not help if each worker still has insufficient memory. Option C is wrong because increasing batch size could increase memory pressure.

Option D is wrong because converting to JSON typically increases file size and memory usage.
