MLA-C01

Full exam simulation

1

Data Preparation for Machine Learning

hard

A data scientist is preparing text data for a natural language processing (NLP) task. The corpus contains many rare words and typos. To reduce dimensionality and improve generalization, the data scientist applies stemming and removes stop words. After training, however, the model performs poorly on domain-specific terms. What is the most likely cause?
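To make the preprocessing step in this scenario concrete, here is a minimal sketch of stemming plus stop-word removal. Everything in it is an assumption for illustration: `naive_stem` is a toy suffix stripper (not the Porter algorithm or any library's stemmer), and `STOP_WORDS` and `SUFFIXES` are tiny hypothetical lists. It only shows what such a pipeline can do to tokens that carry domain meaning.

```python
# Toy illustration only: a naive suffix-stripping stemmer (NOT Porter)
# and a tiny, hypothetical stop-word list.
STOP_WORDS = {"the", "a", "of", "is", "it", "on", "not", "in"}
SUFFIXES = ("ization", "ing", "ed", "es", "s")  # checked in this order


def naive_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list:
    """Lowercase, split on whitespace, drop stop words, then stem."""
    tokens = text.lower().split()
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


# Distinct terms collapse onto one stem, so the model can no longer
# tell them apart.
print(naive_stem("tokenization"), naive_stem("tokens"))  # token token

# Lowercasing turns the acronym "IT" into the stop word "it" (dropped),
# "AIDS" is stemmed to "aid", and the negation "not" disappears.
print(preprocess("AIDS is not in the IT logs"))  # ['aid', 'log']
```

In this sketch the reduced vocabulary comes at the cost of merging or deleting tokens that are meaningful in a specialized domain. A real stemmer behaves differently in detail, but the same kind of information loss is possible.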
