Data Engineering · Hard · Multiple choice · Objective-mapped

Quick Answer

The correct partitioning strategy is to partition by year, then month, then day, then hour. This hierarchical structure optimizes Amazon Athena query performance through partition pruning: when a query filters on a time range, Athena scans only the matching S3 prefixes, which reduces both the data scanned and the cost. For the AWS Certified Machine Learning Specialty (MLS-C01) exam, this question tests your understanding of how data lake partitioning affects query efficiency for both batch and real-time analytics, a common scenario when building ML pipelines on S3. The traps are partitioning by source type, which can create many small files and degrade performance, and using a single prefix, which defeats the purpose entirely. Memory tip: "Think of a calendar: year is the big folder, then month, then day, then hour. Always drill down from the largest to the smallest time grain."

MLS-C01 Data Engineering Practice Question

This MLS-C01 practice question tests your understanding of data engineering. Compare every option against the stated constraints before choosing: the best answer satisfies all requirements, not just the most obvious one. After answering, read the explanation and wrong-answer breakdown below to reinforce the concept and understand why each distractor is designed to mislead on exam day.

A data engineer is designing a data lake on Amazon S3. The data comes from various sources and must be stored in a way that supports both batch and real-time analytics. The engineer needs to partition the data to optimize query performance in Amazon Athena. Which partitioning strategy is MOST appropriate?

A. Partition by a hash of the record ID to distribute data evenly
Why wrong: Hash partitioning does not help with time-based queries.
B. Do not partition; use a single prefix for all data
Why wrong: Without partitioning, Athena scans the entire dataset, causing poor performance.
✓ C. Partition by year, then month, then day, then hour
Why correct: Hierarchical time partitioning is standard for time-series data and works well with Athena.
D. Partition by source type and then by date
Why wrong: Source type may lead to uneven data distribution and too many partitions.
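
To make the difference between a single prefix and time partitions concrete, here is a toy model in Python. It is only a sketch of the idea behind partition pruning, not how Athena works internally. The object keys, the `logs/` root, the `key=value` folder naming and the byte sizes are all made up for illustration.

```python
def bytes_scanned(objects: dict[str, int], prefix: str = "") -> int:
    """Sum the sizes of all objects whose key starts with the given prefix."""
    return sum(size for key, size in objects.items() if key.startswith(prefix))


# Hypothetical objects and sizes (in bytes), laid out by year/month/day/hour.
objects = {
    "logs/year=2024/month=01/day=15/hour=08/part-0.parquet": 400,
    "logs/year=2024/month=01/day=15/hour=09/part-0.parquet": 500,
    "logs/year=2024/month=01/day=16/hour=09/part-0.parquet": 600,
    "logs/year=2024/month=02/day=01/hour=00/part-0.parquet": 700,
}

# No usable partition filter: every object is read.
full_scan = bytes_scanned(objects)

# Filter on one day: only objects under that day's prefix are read.
one_day = bytes_scanned(objects, "logs/year=2024/month=01/day=15/")

print(full_scan, one_day)
```

In this toy data the one-day filter reads two of the four objects. The same reasoning explains why a hash of the record ID does not help: a time filter cannot be mapped to a small set of hash prefixes.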

Correct answer & explanation

✓

Partition by year, then month, then day, then hour

Option C is correct. Partitioning by year/month/day/hour allows efficient querying for both batch (daily) and real-time (hourly) use cases, and is a common practice. Option A (hash of the record ID) does not help with time-based queries. Option B (a single prefix with no partitioning) forces Athena to scan the entire dataset. Option D (source type, then date) may cause uneven data distribution and small files.
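
A minimal Python sketch of how a year/month/day/hour layout could be generated is shown below. It assumes Hive-style `key=value` folder names, which Athena supports, and an `events/` root prefix. Both are illustrative choices, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def partition_prefix(ts: datetime, root: str = "events/") -> str:
    """Return the year/month/day/hour prefix for a timestamp."""
    return f"{root}year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}/"


def prefixes_for_range(start: datetime, end: datetime, root: str = "events/") -> list[str]:
    """List the hourly prefixes that cover the half-open range [start, end)."""
    current = start.replace(minute=0, second=0, microsecond=0)
    prefixes = []
    while current < end:
        prefixes.append(partition_prefix(current, root))
        current += timedelta(hours=1)
    return prefixes


# A "real-time" style query over the last three hours touches three prefixes.
recent = prefixes_for_range(datetime(2024, 1, 15, 22), datetime(2024, 1, 16, 1))
for prefix in recent:
    print(prefix)
```

A daily batch query would cover 24 of these hourly prefixes, while an hourly query covers one, which is why the same layout serves both use cases.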

Key principle: Answer the scenario, not the keyword: identify the specific constraint before choosing the most familiar-sounding option.

Answer analysis

Option-by-option breakdown

For each option: why learners choose it and why it is or isn't the right answer here.

✗
Partition by a hash of the record ID to distribute data evenly
Why it's wrong here
Hash partitioning does not help with time-based queries.
✗
Do not partition; use a single prefix for all data
Why it's wrong here
Without partitioning, Athena scans the entire dataset, causing poor performance.
✓
Partition by year, then month, then day, then hour
Why this is correct
Hierarchical time partitioning is standard for time-series data and works well with Athena.
Related concept
Read the scenario before looking for a memorised answer.
✗
Partition by source type and then by date
Why it's wrong here
Source type may lead to uneven data distribution and too many partitions.

Common exam traps

Common exam trap: answer the scenario, not the keyword

Many certification questions include familiar terms but test a specific constraint. Read the exact wording before choosing an answer that is generally true but wrong for this case.

Detailed technical explanation

How to think about this question

This question should be treated as a scenario, not a definition check. Identify the problem, the constraint and the best action. Then compare each option against those facts.

Key Concepts to Remember

Read the scenario before looking for a memorised answer.
Find the constraint that changes the correct option.
Eliminate answers that are true in general but not in this case.
Use explanations to understand the rule behind the answer.

Exam Day Tips

Underline the problem statement mentally.
Watch for words such as best, first, most likely and least administrative effort.
Review why wrong options are wrong, not only why the correct option is correct.

Key takeaway

Answer the scenario, not the keyword: identify the specific constraint before choosing the most familiar-sounding option.

Real-world example

How this comes up in practice

Consider a team that lands application and device logs in S3 and queries them with Athena. A nightly batch job reads one full day of data, while a near-real-time dashboard reads only the most recent hour. With a year/month/day/hour layout, both queries filter on the partition values, so Athena reads only the matching prefixes instead of the whole dataset. Questions like this test whether you understand how partition layout determines how much data a query scans.

What to study next

Got this wrong? Here's your next step.

Identify which MLS-C01 exam domain this question belongs to, then review the specific concept being tested. Practise related questions in that domain and focus on understanding why each wrong answer is tempting — not just why the correct answer is right.

FAQ

Questions learners often ask

What does this MLS-C01 question test?

This question tests Data Engineering: specifically, how partitioning an S3 data lake affects Amazon Athena query performance.

What is the correct answer to this question?

The correct answer is C: partition by year, then month, then day, then hour. This layout allows efficient querying for both batch (daily) and real-time (hourly) use cases, and is a common practice. Option A (hash of the record ID) does not help with time-based queries. Option B (a single prefix with no partitioning) forces Athena to scan the entire dataset. Option D (source type, then date) may cause uneven data distribution and small files.

What is the key concept behind this question?

Read the scenario before looking for a memorised answer.

About these practice questions

Courseiva creates original exam-style practice questions with explanations and wrong-answer analysis. It does not publish real exam questions, exam dumps, or protected exam content.

Same concept, more angles

1 more way this is tested on MLS-C01

These questions test the same concept from different angles. Work through them to make sure you can recognise it however the exam phrases it.

Variation 1. A data engineer is designing a data lake on Amazon S3. The data comes from various sources, including IoT devices, web logs, and transactional databases. The engineer needs to organize the data in a way that supports efficient querying using Amazon Athena and allows for easy management of access permissions. Which S3 bucket structure is the most appropriate?

easy

A. Store all data in a single prefix without any partitioning.
✓ B. Use a prefix structure like s3://bucket/source/year/month/day/.
C. Store all data in separate S3 buckets for each source and date.
D. Use a prefix structure like s3://bucket/date/source/.

Why B: Option B is correct because partitioning by source, then year, month and day allows Athena to prune partitions, reducing scan costs and improving performance, while keeping each source under its own prefix. Option A is wrong because storing all data in a flat structure forces full scans. Option C is wrong because separate buckets are unnecessary: prefix-based access controls can be applied at the source level within a single partitioned structure. Option D is wrong because using date as the first partition level makes it harder to manage permissions by source.
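
The sketch below shows, in Python, why a source-first prefix makes per-source permissions simple. The bucket name `example-bucket`, the source name `iot` and the helper names are hypothetical. The only AWS-specific detail relied on is the standard S3 object ARN form `arn:aws:s3:::bucket/key`, where a trailing `*` matches everything under a prefix.

```python
from datetime import date


def object_prefix(bucket: str, source: str, day: date) -> str:
    """Build an s3://bucket/source/year/month/day/ prefix."""
    return f"s3://{bucket}/{source}/{day:%Y}/{day:%m}/{day:%d}/"


def source_resource_arn(bucket: str, source: str) -> str:
    """Build one ARN pattern that matches every object under a source prefix."""
    return f"arn:aws:s3:::{bucket}/{source}/*"


print(object_prefix("example-bucket", "iot", date(2024, 3, 7)))
print(source_resource_arn("example-bucket", "iot"))
```

With source as the first level, a single resource pattern covers all dates for that source. With a date-first layout such as s3://bucket/date/source/, no single leading prefix isolates one source, which is the permissions drawback the explanation attributes to Option D.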

This MLS-C01 practice question is part of Courseiva's free Amazon Web Services certification practice question bank. Courseiva provides original exam-style practice questions with explanations, topic-based practice, mock exams, readiness tracking, and study analytics to help learners prepare for the MLS-C01 exam.