How many hard-difficulty questions are on this page?

This page has 20 hard-difficulty scenario questions for the DEA-C01 exam, each with detailed explanations and wrong-answer analysis.

How should I approach DEA-C01 scenario questions?

Read the full scenario before looking at the answer options. Identify the constraint or requirement in the scenario, then eliminate options that are generally true but wrong for this specific case. Scenario questions reward careful reading over pattern matching.


Scenario-based practice

Hard Difficulty Questions

Practise with original exam-style scenarios for the AWS Certified Data Engineer Associate (DEA-C01) exam. They cover every exam domain and include detailed explanations, wrong-answer analysis, and common exam traps.


Scenario questions: 20

Exam code: DEA-C01

Vendor: Amazon Web Services

Scenario guide

How to approach hard difficulty questions

These are the questions most candidates get wrong. They require connecting multiple concepts, reading tricky output, or knowing edge-case behaviour that isn't on most study cards. Practising them trains you to operate under uncertainty — a necessary skill on the real exam.

Quick answer

Hard-difficulty questions test whether you can apply the concept in context, not just recognise a definition.

How the topic appears in realistic exam-style scenarios.

Which detail in the question changes the correct answer.

How to eliminate plausible but wrong options.

How to connect the question back to the wider exam objective.

Practice scenarios

Question 1 (hard, multiple choice)

An e-commerce company uses AWS Glue to run ETL jobs that transform clickstream data from Amazon S3. The job reads Parquet files, performs aggregations, and writes the results to Amazon Redshift. The job runs successfully but takes longer than expected. The data volume is increasing. Which design change would MOST improve the job's performance?

A. Write the aggregated results to a single large file instead of multiple partitions.
Why wrong: Writing a single file collapses the output to one partition, which removes write parallelism and adds shuffle overhead.
B. Convert the Parquet files to CSV to simplify the schema.
Why wrong: CSV is a row-based text format. It gives up Parquet's columnar reads and compression, so the job would scan more data.
C. Replace the Redshift target with Amazon Redshift Spectrum.
Why wrong: Spectrum queries data in place on S3. It does not load transformed data into Redshift.
D. Increase the number of Glue worker nodes (DPUs) for the job.
Why correct: More workers let the job run more tasks in parallel, which reduces runtime as the data volume grows.
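
As a rough illustration of option D, the sketch below starts a Glue job run with a larger worker count through boto3's `start_job_run`, using its `WorkerType` and `NumberOfWorkers` parameters. The helper names, the job name, and the worker count in the usage note are hypothetical, and the set of worker types is only a subset assumed for this example. The right values depend on the job and on account limits.

```python
from typing import Any, Dict

# Assumption: a subset of Glue Spark worker types, enough for this sketch.
SPARK_WORKER_TYPES = {"G.1X", "G.2X", "G.4X", "G.8X"}


def scaled_job_run_args(job_name: str, worker_type: str,
                        number_of_workers: int) -> Dict[str, Any]:
    """Build keyword arguments for glue.start_job_run with an explicit worker count."""
    if worker_type not in SPARK_WORKER_TYPES:
        raise ValueError(f"unexpected worker type: {worker_type}")
    if number_of_workers < 1:
        raise ValueError("number_of_workers must be positive")
    return {
        "JobName": job_name,
        "WorkerType": worker_type,
        "NumberOfWorkers": number_of_workers,
    }


def start_scaled_run(glue_client: Any, job_name: str, worker_type: str,
                     number_of_workers: int) -> str:
    """Start one job run with the given capacity and return its run ID.

    glue_client is expected to be boto3.client("glue") or a compatible stub.
    """
    response = glue_client.start_job_run(
        **scaled_job_run_args(job_name, worker_type, number_of_workers)
    )
    return response["JobRunId"]
```

With real credentials, a call might look like `start_scaled_run(boto3.client("glue"), "clickstream-aggregation", "G.1X", 20)`. Passing the client in as a parameter keeps the argument-building logic testable without an AWS connection.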

Related DEA-C01 topic practice pages

Data Ingestion and Transformation practice questions

Data Operations and Support practice questions

Data Security and Governance practice questions

Data Store Management practice questions

DEA-C01 fundamentals practice questions

DEA-C01 scenario practice questions

DEA-C01 troubleshooting practice questions

Practice scenarios

A company uses Amazon DynamoDB with on-demand capacity. They notice higher than expected costs due to a sudden spike in read traffic from a reporting job. The reporting job scans the entire table daily. What is the most cost-effective way to reduce costs while maintaining the same reporting output?

A data engineer attaches the above IAM policy to an IAM user. The user tries to download an object from my-bucket using the AWS CLI without specifying SSE headers. The object is stored with SSE-S3. Will the download succeed?

Exhibit

A company uses Amazon RDS for MySQL as a source for AWS DMS to replicate data to S3. The replication task is failing with 'OutOfMemory' errors on the DMS instance. The source table has 10 million rows with large BLOB columns. Which THREE changes would most likely resolve the issue?

Exhibit

A company is migrating a legacy data warehouse to Amazon Redshift. They need to choose a distribution style to minimize data movement during joins. Which THREE factors should they consider?

A data engineer is designing a data lake on Amazon S3. The data must be immutable and support high-throughput streaming ingestion. Which THREE features should the engineer consider? (Select THREE.)

A data engineer is troubleshooting an AWS Glue job that reads from an Amazon RDS for PostgreSQL database using a JDBC connection. The job fails with the error 'java.sql.SQLException: No suitable driver'. Which TWO actions should the engineer take to resolve this issue? (Select TWO.)

A data engineer is troubleshooting an AWS Lake Formation permissions issue. A user is able to query an Amazon Athena table but cannot see the underlying S3 data in the AWS Glue Data Catalog. The user has been granted SELECT permission on the table in Lake Formation. What is the most likely cause?

A data engineering team uses Amazon Redshift for analytics. They notice that queries on a large fact table are slow. The table is distributed using DISTSTYLE ALL. Which design change would most likely improve query performance?