A media company wants to automatically organize a large collection of news articles into several topic-based categories (e.g., politics, sports, technology) without using any predefined labels. They plan to use Azure Machine Learning. Which type of machine learning task should they use?
Clustering is an unsupervised learning method that automatically groups similar data points together. Without labels, it can discover topic-based clusters in the news articles based on content similarity.
Why this answer
Clustering is the correct choice because the media company wants to group unlabeled news articles into topic-based categories using inherent similarities in the data rather than predefined labels. Azure Machine Learning provides clustering algorithms such as K-Means, which partition the dataset into a specified number of distinct clusters. This makes clustering well suited to unsupervised learning tasks where the goal is to discover natural groupings.
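To make the idea concrete, here is a minimal sketch of K-Means-style clustering in plain Python. It does not use any Azure Machine Learning API. The sample headlines, the function names (vectorize, farthest_point_seeds, kmeans), and the bag-of-words features are all assumptions made for illustration, and the deterministic farthest-point seeding stands in for the random initialization a real implementation would typically use.

```python
import math
from collections import Counter


def vectorize(docs):
    """Turn each document into a bag-of-words count vector over a shared vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([float(counts[word]) for word in vocab])
    return vectors


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_point_seeds(vectors, k):
    """Pick k starting centroids deterministically (an illustrative stand-in
    for random initialization): begin with the first vector, then repeatedly
    add the vector farthest from all seeds chosen so far."""
    seeds = [vectors[0]]
    while len(seeds) < k:
        seeds.append(max(vectors, key=lambda v: min(distance(v, s) for s in seeds)))
    return seeds


def kmeans(vectors, initial_centroids, max_iter=20):
    """Alternate between assigning points to the nearest centroid and
    recomputing each centroid as the mean of its members."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda j: distance(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical, unlabeled "articles" (no topic labels are supplied anywhere).
articles = [
    "election vote parliament election",
    "football match goal football",
    "parliament vote election law",
    "goal match football league",
    "software chip software launch",
    "chip launch software update",
]

vectors = vectorize(articles)
labels = kmeans(vectors, farthest_point_seeds(vectors, k=3))
print(labels)  # [0, 1, 0, 1, 2, 2]
```

The output is a list of arbitrary cluster IDs, not topic names: the algorithm groups the two politics-like, two sports-like, and two technology-like articles together, but a person still has to inspect each cluster to name it. Note also that the number of clusters (k=3 here) is chosen up front rather than learned.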
Exam trap
Candidates often confuse clustering with classification because both group data into categories. The difference is that clustering is unsupervised (it needs no labels), while classification requires labeled training data.
How to eliminate wrong answers
Option A is wrong because regression is a supervised learning task used to predict continuous numerical values (e.g., an article's view count), not to group articles into discrete categories. Option B is wrong because classification is a supervised learning task that requires labeled training data to assign predefined categories, and the scenario explicitly states that no predefined labels are used. Option D is wrong because anomaly detection identifies rare or unusual data points that deviate from the norm; it does not organize data into multiple topic-based groups.