A data scientist is working with a dataset that contains both numerical and categorical features. Which algorithm is commonly used for regression tasks in Amazon SageMaker?
Linear Learner supports both regression and classification, and it can use categorical features once they have been encoded numerically.
Why this answer
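Linear Learner is the correct choice because it is a supervised built-in algorithm in Amazon SageMaker that handles both regression and classification. The task type is set with the predictor_type hyperparameter (regressor, binary_classifier, or multiclass_classifier). The algorithm accepts only numeric input, so categorical features must first be encoded (for example, one-hot encoded); after that, a dataset mixing numerical and categorical features can be used directly. Linear Learner also trains several candidate models in parallel and keeps the best one, works with SageMaker automatic model tuning, and supports distributed training across multiple instances.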
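To make "encode first, then fit a linear model" concrete, here is a minimal local sketch in plain Python. It makes no SageMaker SDK calls and is not how Linear Learner is implemented internally. It assumes a tiny hypothetical dataset with one numerical column (size) and one categorical column (city), one-hot encodes the city, and fits a linear regression by stochastic gradient descent. The function names one_hot_encode, fit_linear_sgd, and predict are hypothetical helpers for illustration only.

```python
import random

def one_hot_encode(rows, cat_index):
    """Replace the categorical column at cat_index with 0/1 indicator columns."""
    categories = sorted({row[cat_index] for row in rows})
    encoded = []
    for row in rows:
        numeric = [float(v) for i, v in enumerate(row) if i != cat_index]
        indicators = [1.0 if row[cat_index] == c else 0.0 for c in categories]
        encoded.append(numeric + indicators)
    return encoded, categories

def predict(w, b, x):
    """Linear model output: dot(w, x) + b, a continuous value."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def fit_linear_sgd(X, y, lr=0.01, epochs=3000, seed=0):
    """Fit y ~ w.x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a new random order each epoch
        for i in order:
            err = predict(w, b, X[i]) - y[i]
            # Gradient of 0.5 * err**2 is err * x for w and err for b
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b

# Hypothetical data: (size, city) -> price, generated as 2*size + 10 for
# city "A" and 2*size + 20 for city "B".
rows = [(1.0, "A"), (2.0, "A"), (3.0, "B"), (4.0, "B"), (2.0, "B"), (3.0, "A")]
y = [12.0, 14.0, 26.0, 28.0, 24.0, 16.0]

X, categories = one_hot_encode(rows, cat_index=1)  # X rows: [size, is_A, is_B]
w, b = fit_linear_sgd(X, y)

# Unseen size 5 in city "A": the fitted model should return a continuous
# value close to 2*5 + 10 = 20.
print(categories, round(predict(w, b, [5.0, 1.0, 0.0]), 2))
```

The point of the sketch is the shape of the workflow: the categorical column becomes numeric indicator columns, and the linear model then predicts a continuous target. In SageMaker, the encoded matrix would instead be supplied to Linear Learner as numeric training data (CSV or recordIO-protobuf) with predictor_type set to regressor, and the algorithm would handle the optimization itself.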
Exam trap
The trap here is that candidates may confuse unsupervised clustering algorithms (like K-Means) with supervised regression algorithms, or mistakenly think that NLP-focused algorithms (like BlazingText) are appropriate for general regression tasks with mixed data types.
How to eliminate wrong answers
Option A is wrong because K-Means is an unsupervised clustering algorithm: it groups unlabeled records into clusters and does not learn to predict a continuous target value.
Option C is wrong because BlazingText is optimized for natural language processing tasks such as word embeddings and text classification, not for general regression on mixed numerical and categorical data.
Option D is wrong because it repeats option B (Linear Learner) and does not name a distinct algorithm; since the question lists the same answer twice, only B is counted as correct.