DEA-C01 · topic practice

Data Ingestion and Transformation practice questions

Practise Data Ingestion and Transformation questions for the AWS Certified Data Engineer Associate (DEA-C01) exam: original exam-style scenarios with answer choices, explanations, and analysis of common mistakes.

Courseiva uses original exam-style practice questions designed for learning and revision. The goal is to understand the concepts, recognise exam patterns, and improve through explanations, not to memorise copied exam dumps.

Reviewed by Johnson Ajibi · MSc IT Security
20 questions · Domain: Data Ingestion and Transformation

What the exam tests

What to know about Data Ingestion and Transformation

Data Ingestion and Transformation questions test whether you can apply the concept in context, not just recognise a definition.

  • How the topic appears in realistic exam-style scenarios.
  • Which detail in the question changes the correct answer.
  • How to eliminate plausible but wrong options.
  • How to connect the question back to the wider exam objective.

Watch out for

Common Data Ingestion and Transformation exam traps

  • Answering from memory before reading the full scenario.
  • Missing a constraint such as cost, availability, security, scope or command context.
  • Choosing a broad answer when the question asks for the most specific fix.
  • Ignoring why the wrong options are tempting.

Practice set

Data Ingestion and Transformation questions

20 questions · select your answer, then reveal the explanation

A data engineer needs to ingest streaming data from an IoT fleet into Amazon S3 for near-real-time analytics. The data volume is approximately 5 GB per hour, and each event is less than 1 KB. Which AWS service should be used as the ingestion endpoint?
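A quick way to read a scenario like this is to turn the hourly volume into a per-second rate before comparing services. The sketch below does only that arithmetic; it assumes decimal units (1 GB = 10^9 bytes, 1 KB = 1000 bytes), and the function name is illustrative rather than part of any AWS SDK.

```python
# Back-of-envelope ingest rate for the scenario: 5 GB per hour, events under 1 KB.
GB = 10**9  # assumption: decimal gigabyte


def ingest_rate(bytes_per_hour, max_event_bytes):
    """Return (bytes per second, minimum events per second)."""
    bytes_per_sec = bytes_per_hour / 3600
    # Events are at most max_event_bytes, so this is a lower bound on the count.
    min_events_per_sec = bytes_per_sec / max_event_bytes
    return bytes_per_sec, min_events_per_sec


rate, events = ingest_rate(5 * GB, 1000)
print(f"{rate / 1e6:.2f} MB/s, at least {events:.0f} events/s")
```

The result is a sustained stream of many small records rather than a few large files, which is the detail the answer options should be checked against.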

A company uses AWS Glue ETL jobs to transform data from Amazon S3 to Amazon Redshift. The job reads JSON files, applies schema mapping, and writes to a Redshift table. Recently, the job started failing with memory errors. The data volume has increased tenfold. Which approach should a data engineer take to resolve this issue with minimal code changes?

A financial services company processes real-time stock trade data. They use Amazon Kinesis Data Streams with a shard count of 5, each shard receiving about 500 records per second. The consumer application uses the Kinesis Client Library (KCL) with DynamoDB for checkpointing. Lately, some records are being processed multiple times. What is the most likely cause?
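Whatever the cause in a given scenario, KCL consumers get at-least-once delivery, so record processing is usually written to tolerate repeats. The following is a minimal sketch of that idea, not KCL code: the class name is hypothetical, and it tracks seen records in memory where a real consumer would need a durable store.

```python
class IdempotentProcessor:
    """Skips records that were already handled, keyed by shard and sequence number."""

    def __init__(self):
        self._seen = set()   # assumption: in-memory only, for illustration
        self.results = []

    def process(self, shard_id, sequence_number, payload):
        key = (shard_id, sequence_number)
        if key in self._seen:
            return False     # duplicate delivery, nothing to do
        self._seen.add(key)
        self.results.append(payload)
        return True


processor = IdempotentProcessor()
processor.process("shardId-000000000000", "1001", {"symbol": "ABC", "qty": 10})
# The same record delivered again, for example after a worker restart:
processor.process("shardId-000000000000", "1001", {"symbol": "ABC", "qty": 10})
print(len(processor.results))
```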

A data engineering team needs to transform CSV files stored in Amazon S3 into Parquet format using AWS Glue. The files are partitioned by date and are updated hourly. Which AWS Glue feature should be used to automatically detect the schema and partition structure?

An e-commerce company ingests clickstream data from their website into Amazon S3. The data is in JSON format, and each file is about 10 MB. They need to transform the data into a columnar format for analytics and load it into Amazon Redshift nightly. The transformation should be cost-effective and require minimal operational overhead. Which approach meets these requirements?

A company uses AWS Database Migration Service (DMS) to continuously replicate data from an on-premises Oracle database to Amazon S3 in Parquet format. The replication is used for near-real-time analytics. Recently, the DMS task started failing with an error indicating insufficient memory. The source database is large (2 TB). What should a data engineer do to resolve this issue while minimizing changes to the existing architecture?

A data engineer needs to ingest data from multiple SaaS applications (Salesforce, Marketo) into Amazon S3 for a data lake. The data volumes are moderate and the sync needs to be scheduled daily. Which AWS service is most appropriate for this task?

A company uses AWS Lambda to process records from an Amazon Kinesis Data Stream. Each record is about 50 KB. The Lambda function transforms the data and writes to Amazon DynamoDB. Recently, the Lambda function has been experiencing throttling and high error rates. The Kinesis stream has 10 shards. What is the most cost-effective solution to improve processing throughput?

A data engineer is designing a data ingestion pipeline for a social media analytics platform. The pipeline must ingest tweets in real-time, perform sentiment analysis, and store results in Amazon S3. The sentiment analysis is compute-intensive and must be done as the data arrives. The estimated throughput is 10,000 tweets per second. Which architecture is most suitable?

A data engineer needs to transfer 50 TB of historical data from an on-premises HDFS cluster to Amazon S3. The network bandwidth is limited to 100 Mbps. The transfer must be completed within one week. Which service should be used?
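The deciding detail here is whether the link can physically move the data in time. A rough check, assuming decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second) and a link that is fully available to the transfer:

```python
def transfer_days(data_bytes, link_bits_per_sec, utilisation=1.0):
    """Days needed to move data_bytes over a link at the given utilisation."""
    seconds = data_bytes * 8 / (link_bits_per_sec * utilisation)
    return seconds / 86400


days = transfer_days(50 * 10**12, 100 * 10**6)
print(f"{days:.1f} days")  # prints "46.3 days"
```

Even at full utilisation the network transfer takes far longer than the one-week limit, so the bandwidth constraint rules out any option that relies on the existing link alone.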

A company uses Amazon Kinesis Data Firehose to deliver streaming data to Amazon S3. The data is in JSON format. The delivery stream is configured with a buffer size of 5 MB and a buffer interval of 60 seconds. However, the data engineer notices that S3 objects are being created with sizes much smaller than 5 MB. What is a likely cause?
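Firehose flushes a buffer when either the size hint or the interval hint is reached, whichever comes first. The sketch below models only that rule to show how the incoming rate drives object size. It is a simplification: it ignores compression and format conversion, assumes the buffer size is counted in MiB, and uses illustrative function names.

```python
MIB = 1024 * 1024  # assumption: buffer size counted in MiB


def expected_object_bytes(rate_bytes_per_sec, buffer_bytes, buffer_seconds):
    """Approximate bytes buffered when the first of the two limits is hit."""
    return min(buffer_bytes, rate_bytes_per_sec * buffer_seconds)


def fill_rate_needed(buffer_bytes, buffer_seconds):
    """Incoming rate at which the size limit is hit exactly at the interval."""
    return buffer_bytes / buffer_seconds


slow = expected_object_bytes(10 * 1024, 5 * MIB, 60)
fast = expected_object_bytes(200 * 1024, 5 * MIB, 60)
print(f"10 KiB/s  -> {slow / MIB:.2f} MiB per object")
print(f"200 KiB/s -> {fast / MIB:.2f} MiB per object")
print(f"rate needed to fill the buffer: {fill_rate_needed(5 * MIB, 60) / 1024:.1f} KiB/s")
```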

A data engineering team is ingesting streaming data from IoT devices into Amazon Kinesis Data Streams. The data is then consumed by an AWS Lambda function that transforms each record and writes it to Amazon S3. Recently, the Lambda function started failing with '503 Slow Down' errors when writing to S3. The team has already increased the Lambda function's memory and timeout. Which action should the team take to resolve the issue?

An e-commerce company uses AWS Glue to run ETL jobs that transform clickstream data from Amazon S3. The job reads Parquet files, performs aggregations, and writes the results to Amazon Redshift. The job runs successfully but takes longer than expected. The data volume is increasing. Which design change would MOST improve the job's performance?

A data engineer needs to ingest JSON data from an on-premises relational database into Amazon S3 every hour. Which AWS service should be used to set up a scheduled, incremental data transfer?

A company is using AWS Glue to process data from Amazon S3. The Glue job reads CSV files and writes Parquet files to a different S3 bucket. The job occasionally fails with 'java.lang.OutOfMemoryError: Java heap space'. The data size varies. Which change should the engineer make to avoid this error?

A data engineering team uses Amazon Kinesis Data Analytics for Apache Flink to process streaming data. They notice that the application's checkpointing is failing intermittently, causing data reprocessing. The application uses a large state. Which configuration change should the team make to improve checkpoint reliability?

A data engineer is designing a serverless data ingestion pipeline that uses Amazon Kinesis Data Firehose to deliver data to Amazon S3. The data must be transformed using AWS Lambda before being written to S3. Which two steps are required to enable this transformation? (Select TWO.)
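For context, a Firehose transformation Lambda receives a batch of base64-encoded records and must return one entry per incoming `recordId` with a `result` of `Ok`, `Dropped` or `ProcessingFailed`. The handler below is a minimal sketch of that contract; the `processed` field it adds is a hypothetical transformation chosen only for illustration.

```python
import base64
import json


def lambda_handler(event, context):
    """Decode each Firehose record, transform it, and return it re-encoded."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["processed"] = True  # hypothetical transformation
            line = json.dumps(payload) + "\n"  # newline-delimited JSON in S3
            data = base64.b64encode(line.encode("utf-8")).decode("utf-8")
            output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
        except (ValueError, TypeError):
            # Undecodable or non-object payloads are reported as failed, unchanged.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

The function only takes effect once the delivery stream is configured to invoke it and the stream's role is allowed to do so.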

A company uses AWS Glue to perform ETL on data stored in Amazon S3. The Glue job reads CSV files, converts them to Parquet, and partitions by date. The job runs daily and processes about 500 GB of data. The team wants to optimize costs and performance. Which three actions should the team take? (Select THREE.)

A company uses AWS Glue to process streaming data from Amazon Kinesis Data Streams. The job reads JSON records and writes Parquet to Amazon S3. Recently, the job started failing with 'Out of Memory' errors. Which change is MOST likely to resolve the issue?

Question 20 · hard · multiple choice

A data engineer is designing a data ingestion pipeline for IoT sensor data. The data arrives as JSON via AWS IoT Core, and must be stored in Amazon S3 in partitioned Parquet format. The pipeline must handle late-arriving data (up to 1 hour) and ensure exactly-once processing. Which combination of services should the engineer use?
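One detail worth keeping in mind for late-arriving data is that partitions are usually derived from the event's own timestamp, not from the arrival time. The sketch below shows that idea with a Hive-style prefix; the function name and the `year=/month=/day=/hour=` layout are assumptions for illustration, not something the scenario prescribes.

```python
from datetime import datetime, timezone


def partition_prefix(event_epoch_seconds):
    """Build a Hive-style partition prefix from the event time (UTC)."""
    ts = datetime.fromtimestamp(event_epoch_seconds, tz=timezone.utc)
    return ts.strftime("year=%Y/month=%m/day=%d/hour=%H")


# An event generated at 09:55 UTC that reaches the pipeline 45 minutes later
# still belongs to the hour=09 partition.
event_time = datetime(2024, 1, 15, 9, 55, tzinfo=timezone.utc).timestamp()
print(partition_prefix(event_time))  # prints "year=2024/month=01/day=15/hour=09"
```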

Focused Data Ingestion and Transformation sessions

Start a Data Ingestion and Transformation only practice session

Every question in these sessions is drawn from the Data Ingestion and Transformation domain — nothing else.

Related practice questions

Related DEA-C01 topic practice pages

Move into related areas when this topic feels solid.

Frequently asked questions

What does the DEA-C01 exam test about Data Ingestion and Transformation?
Data Ingestion and Transformation questions test whether you can apply the concept in context, not just recognise a definition.
How should I use these practice questions?
Select your answer before revealing the explanation. Then read why each option is right or wrong — this active recall approach builds retention far faster than re-reading notes.
Can I practise just Data Ingestion and Transformation questions in a focused session?
Yes — the session launcher on this page draws every question from the Data Ingestion and Transformation domain. Use a 10-question session first to gauge your baseline, then move to 20 or 30 once the weak spots are clear.
Where can I practise other DEA-C01 topics?
Use the topic links above to move to related areas, or go back to the DEA-C01 question bank to see all topics.
Are these real exam questions or dumps?
These are original practice questions written to test the same concepts the DEA-C01 exam covers. They are not copied from any real exam or dump site.