A company uses Amazon SageMaker to build a text classification model by fine-tuning a pre-trained BERT model. The dataset contains 10,000 labeled documents. The model is overfitting: training accuracy is 99%, but validation accuracy is only 85%. Which TWO of the following are most likely to help reduce overfitting? (Choose TWO.)
Dropout is a regularization technique that randomly zeroes a fraction of units during training, so the network cannot rely on any single unit and is less prone to overfitting.
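To make the mechanism concrete, here is a minimal pure-Python sketch of "inverted" dropout, the variant most frameworks use. The function name `inverted_dropout` and its parameters are illustrative assumptions, not part of BERT or SageMaker.

```python
import random


def inverted_dropout(activations, rate, training=True, rng=None):
    """Zero each activation with probability `rate`; scale survivors by 1/(1 - rate).

    Hypothetical helper for illustration only; real frameworks apply the same
    idea to tensors.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # Dropout is disabled at inference time, so activations pass through.
        return list(activations)
    if rng is None:
        rng = random.Random()
    # Scaling the kept units keeps the expected activation unchanged.
    keep_scale = 1.0 / (1.0 - rate)
    return [a * keep_scale if rng.random() >= rate else 0.0 for a in activations]


hidden = [0.5, -1.2, 0.8, 2.0]
train_out = inverted_dropout(hidden, rate=0.3, rng=random.Random(0))
eval_out = inverted_dropout(hidden, rate=0.3, training=False)
```

Raising `rate` drops more units on each training step, which is what "increasing dropout" means in the answer.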
Why this answer
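Increasing dropout during fine-tuning adds regularization, which should narrow the gap between training and validation accuracy. Decreasing the learning rate keeps the fine-tuned weights closer to the pre-trained BERT weights, so the model is less likely to memorize the training set. Changing the batch size has only an indirect effect on generalization: larger batches reduce gradient noise, which usually weakens rather than strengthens implicit regularization, so it is not as effective as dropout.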
Adding more layers increases model capacity, which tends to worsen overfitting on a dataset of only 10,000 documents. Using a larger pre-trained model increases capacity for the same reason.
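The two adjustments favored above, more dropout and a lower learning rate, can be sketched as a small change to a hyperparameter dictionary. This is only a sketch under assumptions. The key names `hidden_dropout_prob` and `attention_probs_dropout_prob` mirror the Hugging Face `BertConfig` fields. The baseline values, step sizes, and the helper `regularize_hyperparameters` are hypothetical, and the way the dictionary reaches the training script is not specified by the question.

```python
def regularize_hyperparameters(baseline, dropout_step=0.1, lr_factor=0.5, max_dropout=0.5):
    """Return a copy of `baseline` with more dropout and a smaller learning rate.

    Hypothetical helper; the step sizes are illustrative, not recommended values.
    """
    adjusted = dict(baseline)
    for key in ("hidden_dropout_prob", "attention_probs_dropout_prob"):
        # Raise dropout, but cap it so the model can still learn.
        adjusted[key] = min(round(baseline[key] + dropout_step, 4), max_dropout)
    # A smaller learning rate keeps weights closer to the pre-trained values.
    adjusted["learning_rate"] = baseline["learning_rate"] * lr_factor
    return adjusted


# Illustrative baseline; these values are assumptions, not taken from the question.
baseline = {
    "hidden_dropout_prob": 0.1,
    "attention_probs_dropout_prob": 0.1,
    "learning_rate": 5e-5,
    "epochs": 3,
}
candidate = regularize_hyperparameters(baseline)
```

The layer count and model size are deliberately left untouched, since increasing either would add capacity rather than regularization.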