MLS-C01 Machine Learning Implementation and Operations • 40 Questions
40 MLS-C01 Machine Learning Implementation and Operations practice questions with answers and explanations.
A company is using Amazon SageMaker to train a deep learning model. The training job fails with the error 'CUDA out of memory'. The training instance is an ml.p3.2xlarge with 16 GB of GPU memory. The model architecture and batch size are appropriate for this instance size. What is the most likely cause of this error?