You are responsible for monitoring a batch prediction pipeline that runs daily. Recently, the pipeline started failing intermittently with out-of-memory errors. The input data volume has not changed. What is the most likely cause?
Option A: a code change that loads the entire dataset into memory before processing. This could cause OOM errors for large datasets.
Why this answer
Option A is correct because a code change that loads the entire dataset into memory before processing can directly cause out-of-memory (OOM) errors even when the input data volume is unchanged. Batch prediction pipelines typically stream data or process it in chunks so that peak memory stays bounded by the chunk size rather than the dataset size. A change that bypasses this pattern and loads all data at once can push peak usage up to the heap or container memory limit. Once the job runs that close to the limit, small variations in data characteristics or in concurrent load on the same machine decide whether a given run fits, which is why the failures are intermittent rather than constant.
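The difference between the two loading patterns can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual code: `read_records`, `predict_batch`, and the chunk size are hypothetical stand-ins for a real data source and model.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def read_records(n: int) -> Iterator[float]:
    """Hypothetical data source: yields one record at a time."""
    for i in range(n):
        yield float(i)


def predict_batch(batch: List[float]) -> List[float]:
    """Hypothetical model: stands in for real inference."""
    return [2.0 * x for x in batch]


def predict_all_at_once(records: Iterable[float]) -> List[float]:
    # Materializes every record first, so peak memory grows with the
    # full dataset size. This is the pattern that risks OOM.
    data = list(records)
    return predict_batch(data)


def chunked(records: Iterable[float], size: int) -> Iterator[List[float]]:
    # Pulls at most `size` records from the source per iteration.
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def predict_in_chunks(records: Iterable[float], chunk_size: int) -> Iterator[float]:
    # Only one chunk is held in memory at a time, so peak memory is
    # bounded by chunk_size rather than by the dataset size.
    for chunk in chunked(records, chunk_size):
        yield from predict_batch(chunk)
```

Both functions produce the same predictions; only the peak memory footprint differs, which is why such a regression can go unnoticed until a run lands close to the memory limit.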
Exam trap
The trap here is assuming that OOM errors are always caused by increased data volume or resource scaling issues. The question explicitly states that data volume is unchanged, which forces you to consider code-level changes that alter how much memory the pipeline uses.
How to eliminate wrong answers
Option B is wrong because a larger model after retraining adds a roughly fixed memory cost every time the model is loaded or used for inference. If that cost exceeded the available memory, the pipeline would fail consistently on every run, not intermittently.
Option C is wrong because fewer worker machines would reduce the total available memory for every run. With the input data volume unchanged, each run would face the same shortfall, so the OOM errors would be consistent rather than intermittent.
Option D is wrong because the question explicitly states that the input data volume has not changed, so an increase in data size cannot be the cause.