DEA-C01 · topic practice

Data Operations and Support practice questions

Practise Data Operations and Support questions for the AWS Certified Data Engineer Associate (DEA-C01) exam: original exam-style scenarios with answer choices, explanations, and analysis of common mistakes.

Courseiva uses original exam-style practice questions designed for learning and revision. The goal is to understand the concepts, recognise exam patterns, and improve through explanations — not memorise copied exam dumps.

Reviewed by Johnson Ajibi · MSc IT Security
20 questions · Domain: Data Operations and Support

What the exam tests

What to know about Data Operations and Support

Data Operations and Support questions test whether you can apply the concept in context, not just recognise a definition.

  • How the topic appears in realistic exam-style scenarios.
  • Which detail in the question changes the correct answer.
  • How to eliminate plausible but wrong options.
  • How to connect the question back to the wider exam objective.

Watch out for

Common Data Operations and Support exam traps

  • Answering from memory before reading the full scenario.
  • Missing a constraint such as cost, availability, security, scope or command context.
  • Choosing a broad answer when the question asks for the most specific fix.
  • Ignoring why the wrong options are tempting.

Practice set

Data Operations and Support questions

20 questions · select your answer, then reveal the explanation

A data engineer notices that an AWS Glue job processing data from an Amazon S3 bucket frequently fails with 'OutOfMemoryError'. The job reads CSV files, applies transformations, and writes Parquet to another S3 bucket. The job has 10 workers of type G.1X. Which change is MOST likely to resolve the issue?

A company uses Amazon Kinesis Data Streams to ingest clickstream data. The data is consumed by a custom consumer application that writes to Amazon S3 every 5 minutes. The consumer is falling behind and processing lag is increasing. Which action is MOST effective to reduce the lag?

A data team runs a daily AWS Glue ETL job that processes data from an Amazon Redshift cluster and writes results to Amazon S3. The job completes successfully but takes 2 hours longer than expected. The job uses the JDBC connection to Redshift. The Redshift cluster is 4 dc2.large nodes. The Glue job has 10 workers of type G.1X. Which change would MOST likely reduce the job duration?

A company uses Amazon DynamoDB as a source for an AWS Glue job. The job reads a large table using the DynamoDB export to S3 feature. The job is failing with 'ThrottlingException' from DynamoDB. What should the data engineer do to resolve this issue WITHOUT changing the job's logic?

A data engineer is monitoring an Amazon Kinesis Data Analytics application that uses a SQL query to aggregate streaming data. The application is falling behind and the millisBehindLatest metric is increasing. Which action should the engineer take to improve performance?

A data engineer is troubleshooting an AWS Glue job that reads from an Amazon RDS for PostgreSQL database using a JDBC connection. The job fails with the error 'java.sql.SQLException: No suitable driver'. Which TWO actions should the engineer take to resolve this issue? (Select TWO.)

A company uses Amazon S3 to store raw data and runs AWS Glue ETL jobs to transform it into Parquet. The data is then queried using Amazon Athena. Queries are slow and expensive due to high scan volumes. Which THREE design changes can improve query performance and reduce costs? (Select THREE.)

A data engineer runs a Spark job on Amazon EMR that reads data from Amazon S3 and writes results back to S3. The job fails with an 'S3AccessDenied' error. The engineer verifies that the IAM role attached to the EMR cluster has s3:GetObject and s3:PutObject permissions on the relevant buckets. What is the MOST likely cause of the error?

An AWS Glue job that processes streaming data from Amazon Kinesis Data Streams is failing intermittently with 'Failed to checkpoint' errors. The job uses checkpointing to an Amazon S3 bucket every 60 seconds. Which action should the engineer take to resolve the issue?

A company uses AWS DMS to migrate data from an on-premises Oracle database to Amazon Redshift. The migration is successful, but after a few days, data in Redshift becomes inconsistent with the source due to ongoing changes. The company needs to keep Redshift synchronized with minimal latency. Which approach should the data engineer use?

A data engineer notices that an Amazon Kinesis Data Firehose delivery stream is failing to deliver data to an Amazon S3 bucket. The CloudWatch metrics show 'DeliveryToS3.Success' is 0 and 'S3.BucketExists' is 1. What is the MOST likely cause?

A company runs a batch ETL job on Amazon EMR every night. Recently, the job started failing with 'Out of Memory' errors in the Spark executors. The data volume has grown 20% in the past month. The cluster uses uniform instance groups with 5 core nodes of r5.xlarge (4 vCPU, 32 GB RAM). Which change should the data engineer implement to resolve the issue with minimal cost increase?

A data engineer is troubleshooting an AWS Glue ETL job that fails with the error: 'An error occurred while calling o123.pyWriteDynamicFrame. Access Denied when writing to S3 bucket: my-bucket'. The job uses a Glue service role named 'GlueServiceRole'. Which TWO actions should the engineer take to resolve the issue? (Choose TWO.)

A data engineer is monitoring an Amazon Kinesis Data Analytics for Apache Flink application that processes streaming data. The application is falling behind (the 'millisBehindLatest' metric is increasing) and the CPU utilization of the Flink task managers is consistently above 80%. Which THREE actions should the engineer take to improve performance? (Choose THREE.)

A data engineer is troubleshooting a nightly AWS Glue ETL job that reads from an Amazon RDS for MySQL table and writes to an Amazon S3 bucket in Parquet format. The job runs successfully most days, but occasionally fails with the error 'ERROR: An error occurred while calling o67.pyWriteDynamicFrame. The transaction log for the database is full due to 'LOG_BACKUP'.' What is the MOST likely cause of this error?

A company runs a data pipeline that ingests clickstream data from a web application into Amazon Kinesis Data Streams. A Lambda function processes records from the stream and writes them to an Amazon S3 bucket in JSON format. The pipeline has been running smoothly, but for the past hour, the Lambda function has been failing with 'Rate exceeded' errors, and the Kinesis stream shows elevated 'IteratorAgeMilliseconds' metrics. The Lambda function has a reserved concurrency of 100, and the Kinesis stream has 10 shards. The average record size is 5 KB, and the data rate is approximately 15 MB per second. Which combination of actions should a data engineer take to resolve the issue and prevent recurrence? (Choose TWO.)
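The figures in this scenario can be sanity-checked against the per-shard write limits of Kinesis Data Streams in provisioned mode (1 MB per second and 1,000 records per second per shard). The sketch below is only that arithmetic, assuming 1 MB = 1,000 KB for simplicity; the function name `shards_needed` is hypothetical and not part of any AWS SDK.

```python
import math

# Per-shard write limits for Kinesis Data Streams (provisioned mode).
SHARD_WRITE_MB_PER_SEC = 1.0
SHARD_WRITE_RECORDS_PER_SEC = 1000


def shards_needed(data_rate_mb_per_sec, avg_record_kb):
    """Estimate the minimum shard count for a given ingest rate.

    Hypothetical helper: takes the larger of the byte-based and
    record-based requirements. Assumes 1 MB = 1,000 KB.
    """
    records_per_sec = data_rate_mb_per_sec * 1000 / avg_record_kb
    by_bytes = math.ceil(data_rate_mb_per_sec / SHARD_WRITE_MB_PER_SEC)
    by_records = math.ceil(records_per_sec / SHARD_WRITE_RECORDS_PER_SEC)
    return max(by_bytes, by_records)


# Scenario values: 15 MB per second of 5 KB records, 10 shards provisioned.
required = shards_needed(15, 5)
current_shards = 10
print(required, required > current_shards)
```

Under these assumptions the byte limit, not the record limit, is the binding constraint, and the ingest rate exceeds what 10 shards can accept.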

A company uses AWS Glue to run ETL jobs that process data from Amazon S3 and write results to Amazon Redshift. The Glue job uses the JDBC connection to Redshift. Recently, the job has been failing intermittently with the error: 'java.sql.SQLException: [Amazon](500310) Invalid operation: INSERT has more expressions than target columns;' The Glue job writes to a staging table in Redshift before performing a merge into the final table. The staging table schema matches the source data. The error occurs only on some days and affects different columns each time. The data engineer suspects that the source data occasionally contains extra columns due to a schema drift in the upstream data producer. Which approach should the data engineer take to handle this issue robustly?
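To make the failure mode concrete: the INSERT breaks when a source record carries more fields than the staging table has columns. One possible way to guard against that is to project every record onto an explicit list of target columns before loading. The sketch below is plain Python rather than Glue code, and `align_to_target`, the column names, and the sample row are all hypothetical illustrations.

```python
def align_to_target(record, target_columns):
    """Project one source record onto the target table's columns.

    Hypothetical helper: unexpected fields are dropped and reported,
    and columns missing from the record are filled with None.
    """
    aligned = {col: record.get(col) for col in target_columns}
    extras = sorted(set(record) - set(target_columns))
    return aligned, extras


# Hypothetical staging-table columns and a drifted source row.
target = ["user_id", "event", "ts"]
row = {"user_id": 7, "event": "click", "ts": "2024-01-01", "campaign": "x"}

aligned, extras = align_to_target(row, target)
print(aligned)  # only the three target columns remain
print(extras)   # the drifted column, available for logging or alerting
```

Reporting the dropped fields instead of discarding them silently keeps the drift visible to whoever owns the upstream producer.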

A data engineer is troubleshooting a Glue ETL job that reads from an S3 bucket and writes to a Redshift table. The job fails with a 'MemoryError' when processing a large dataset. Which TWO actions should the engineer take to resolve this issue? (Choose TWO.)

A data engineer applies the S3 bucket policy shown in the exhibit below to an S3 bucket used by a Glue ETL job. The Glue job writes objects to the bucket. Which of the following is true about the behavior of the policy?

Exhibit

Refer to the exhibit.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::data-lake-bucket/*",
      "Condition": {
        "StringEquals": {
          "s3:x-amz-server-side-encryption": "AES256"
        }
      }
    },
    {
      "Effect": "Deny",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::data-lake-bucket/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "aws:kms"
        }
      }
    }
  ]
}
```
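The interaction between the two statements in the exhibit can be traced with a small model. This is a simplified sketch, not the IAM evaluation engine: it covers only `s3:PutObject` against these two statements, ignores identity-based policies and any other policy that may apply, and assumes the documented behaviour that a negated operator such as `StringNotEquals` also matches when the condition key is absent. The function name `evaluate_put_object` is hypothetical.

```python
def evaluate_put_object(sse_header):
    """Model how the exhibit's two statements treat one PutObject request.

    sse_header is the x-amz-server-side-encryption value, or None if
    the request does not send the header. Hypothetical helper.
    """
    # Statement 1: Allow when the header equals "AES256".
    allow_matches = sse_header == "AES256"
    # Statement 2: Deny when the header is not "aws:kms".
    # Assumption: an absent key also satisfies StringNotEquals.
    deny_matches = sse_header != "aws:kms"

    # An explicit Deny always overrides an Allow.
    if deny_matches:
        return "explicit deny"
    if allow_matches:
        return "allow"
    return "no decision from this policy"


for header in ("AES256", "aws:kms", None):
    print(header, "->", evaluate_put_object(header))
```

In this model the "allow" branch is never reached for PutObject, because every request that satisfies the Allow condition also satisfies the Deny condition.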

A company runs a nightly batch processing pipeline using AWS Glue ETL jobs. The pipeline reads data from an Amazon S3 bucket, transforms it, and writes results to an Amazon Redshift cluster. Recently, the data volume has increased significantly, and some Glue jobs are failing with the error 'java.lang.OutOfMemoryError: Java heap space'. The data engineer needs to modify the job configuration to prevent these failures without changing the code. The job currently uses 10 DPUs and processes data in a single Spark DataFrame. Which of the following is the MOST effective solution?

Focused Data Operations and Support sessions

Start a Data Operations and Support only practice session

Every question in these sessions is drawn from the Data Operations and Support domain — nothing else.

Related practice questions

Related DEA-C01 topic practice pages

Move into related areas when this topic feels solid.

Frequently asked questions

What does the DEA-C01 exam test about Data Operations and Support?
Data Operations and Support questions test whether you can apply the concept in context, not just recognise a definition.

How should I use these practice questions?
Select your answer before revealing the explanation. Then read why each option is right or wrong; this active recall approach builds retention far faster than re-reading notes.

Can I practise just Data Operations and Support questions in a focused session?
Yes. The session launcher on this page draws every question from the Data Operations and Support domain. Use a 10-question session first to gauge your baseline, then move to 20 or 30 once the weak spots are clear.

Where can I practise other DEA-C01 topics?
Use the topic links above to move to related areas, or go back to the DEA-C01 question bank to see all topics.

Are these real exam questions or dumps?
These are original practice questions written to test the same concepts the DEA-C01 exam covers. They are not copied from any real exam or dump site.