A data engineer needs to split time-series data for training a forecasting model. The data is sorted by timestamp. The engineer wants to avoid leakage where future data influences training. Which data splitting approach should they use?
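Use a chronological (time-based) split: train on the earlier contiguous block and test on the later one. This preserves temporal order and avoids leakage.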
Why this answer
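For time-series data, the safe approach is to train on an earlier contiguous block and test on a later one, preserving temporal order. A random split would mix future observations into the training set, which is exactly the leakage the engineer wants to avoid. Standard k-fold cross-validation has the same problem; cross-validating a time series requires a technique such as forward chaining, where each fold trains only on data that precedes its test block.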
A stratified split preserves class proportions and is meant for classification; it does not preserve temporal order.
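
As an illustration of the two time-aware approaches mentioned above, here is a minimal Python sketch. It assumes the rows are already sorted by timestamp, as the question states. The function names `chronological_split` and `forward_chaining_folds`, the 80/20 default, and the equal-sized block scheme are illustrative assumptions, not part of the original question.

```python
from typing import Iterator, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(
    rows: Sequence[T], train_fraction: float = 0.8
) -> Tuple[Sequence[T], Sequence[T]]:
    """Split rows (assumed sorted by timestamp) into an earlier training
    block and a later test block. No shuffling is performed."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def forward_chaining_folds(
    n_rows: int, n_folds: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, test_indices) pairs for forward chaining.

    The data is cut into n_folds + 1 blocks (hypothetical scheme). Fold k
    trains on the first k blocks and tests on the block that follows, so
    every training index precedes every test index. The last fold's test
    block absorbs any leftover rows.
    """
    block = n_rows // (n_folds + 1)
    if block == 0:
        raise ValueError("not enough rows for the requested number of folds")
    for k in range(1, n_folds + 1):
        train_end = block * k
        test_end = n_rows if k == n_folds else train_end + block
        yield range(0, train_end), range(train_end, test_end)


# Usage: integers stand in for timestamp-ordered rows.
rows = list(range(10))
train, test = chronological_split(rows, train_fraction=0.8)
folds = list(forward_chaining_folds(n_rows=len(rows), n_folds=3))
```

In both functions the training data always ends where the test data begins, which is what rules out future information reaching the model. In practice, scikit-learn's `TimeSeriesSplit` provides a comparable expanding-window scheme.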