A data scientist is training a gradient boosting model on a large dataset (100 GB) stored in Amazon S3. The training job uses the SageMaker built-in XGBoost algorithm on a single ml.p3.2xlarge instance. The job fails with an out-of-memory error. Which solution should the data scientist adopt to resolve the memory issue?
Distributed training splits the data across instances, so each instance holds only a fraction of the dataset in memory.
Why this answer
Option C is correct because increasing the number of instances enables distributed training, which reduces the memory pressure on each instance. Option A may still be unable to load the full dataset. Option B switches to a different algorithm with different memory characteristics, which is unnecessary because XGBoost can handle large datasets when training is distributed.
Option D reduces the number of features, which may degrade model quality. Option E switches to a different algorithm that is not necessarily more memory-efficient.
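The reasoning behind Option C can be sketched roughly as follows. This is a minimal illustration, not a definitive sizing method: the helper functions, the bucket URI, the volume size, and the 30 GB "usable memory per instance" figure are all assumptions made for the example (real XGBoost memory use depends on data format and in-memory overhead). The dictionary keys follow the SageMaker `CreateTrainingJob` request shape, where `S3DataDistributionType` set to `ShardedByS3Key` gives each instance a subset of the S3 objects instead of a full copy.

```python
import math


def shard_size_gb(dataset_gb: float, instance_count: int) -> float:
    """Approximate data held per instance when the input is split evenly."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return dataset_gb / instance_count


def min_instances(dataset_gb: float, usable_gb_per_instance: float) -> int:
    """Smallest instance count whose even shard fits the usable memory."""
    return math.ceil(dataset_gb / usable_gb_per_instance)


def build_training_config(s3_uri: str, instance_type: str, instance_count: int) -> dict:
    """Hypothetical helper: builds the resource and input parts of a
    CreateTrainingJob request. Other required fields are omitted."""
    return {
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 200,  # assumed value for illustration
        },
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": s3_uri,
                        # Shard objects across instances rather than
                        # replicating the whole dataset to each one.
                        "S3DataDistributionType": "ShardedByS3Key",
                    }
                },
            }
        ],
    }


DATASET_GB = 100  # from the question
USABLE_GB = 30    # assumption: memory budget for data on one instance

count = min_instances(DATASET_GB, USABLE_GB)
config = build_training_config(
    "s3://example-bucket/train/",  # hypothetical bucket and prefix
    "ml.p3.2xlarge",
    count,
)
print(count, shard_size_gb(DATASET_GB, count))
```

Note that sharding by S3 key only helps if the dataset is stored as multiple objects under the prefix, with at least as many objects as instances. With the default `FullyReplicated` distribution, every instance would receive the entire dataset, and adding instances alone would not lower per-instance memory use.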