This chapter covers the two fundamental categories of machine learning: supervised and unsupervised learning. Understanding the distinction is critical for the AI-900 exam, as roughly 15-20% of questions test your ability to identify which approach applies to a given scenario. You will learn the internal mechanisms, key algorithms, and how to choose the right method based on data characteristics and business goals. Mastery of this topic directly supports exam objectives under 'Identify common machine learning types' and 'Describe core machine learning concepts.'
Supervised learning is like a teacher giving students a textbook with example problems and their exact answers (labeled data). The student studies these examples to learn the mapping from problem to answer. During the final exam (prediction), the student sees a new problem and must produce the correct answer based on learned patterns. The teacher can immediately grade the student's work because the correct answers are known (ground truth).
Unsupervised learning is like giving students a pile of mixed puzzle pieces (unlabeled data) and asking them to group similar pieces together without any picture on the box. Students might group by color, shape, or edge type. There is no teacher to say which grouping is correct; the goal is to find inherent structure.
In machine learning, supervised algorithms (like linear regression and decision trees) require labeled datasets where each input has a known output. Unsupervised algorithms (like k-means clustering and PCA) find hidden patterns without any labels. The mechanistic difference: supervised learning minimizes a loss function comparing predictions to known labels; unsupervised learning optimizes a criterion like within-cluster sum of squares or reconstruction error without any external reference.
What is Supervised Learning?
Supervised learning is a machine learning paradigm where the model is trained on a labeled dataset—each training example has an input (features) and a known output (label). The goal is to learn a mapping function f: X → Y that can predict labels for new, unseen inputs. The model is evaluated by comparing its predictions against the true labels using metrics like accuracy, precision, recall, or mean squared error.
Why Supervised Learning Exists
Supervised learning solves problems where historical data with known outcomes is available. Common applications include spam detection (email → spam/not spam), medical diagnosis (symptoms → disease), and price prediction (house features → price). The exam expects you to recognize that supervised learning requires labeled data and is used for prediction tasks.
How Supervised Learning Works Internally
The training process involves feeding the model input-output pairs. The model makes a prediction, computes a loss (error) between prediction and true label, and updates its internal parameters to minimize that loss. For example, in linear regression, the model learns coefficients that minimize the sum of squared errors. In decision trees, it learns splitting rules that maximize information gain. For iteratively trained models such as neural networks, this predict, measure, update cycle repeats over multiple epochs (full passes through the training data) until the loss stops improving.
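The predict, measure, update cycle described above can be sketched with plain gradient descent on a one-feature linear regression. This is a minimal illustration, not how any particular library trains: the toy data, the learning rate, and the number of epochs are all assumptions chosen so the loop converges quickly.

```python
# Toy labeled data following y = 2x + 1 (illustrative values, not from the chapter)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = 0.0, 0.0        # parameters the model will learn
learning_rate = 0.05   # assumed step size


def mse(w, b):
    """Mean squared error between predictions and the true labels."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


initial_loss = mse(w, b)

for epoch in range(2000):
    # Gradients of the MSE loss with respect to w and b
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    # Move each parameter a small step against its gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mse(w, b)
print(f"w={w:.3f}, b={b:.3f}, loss {initial_loss:.3f} -> {final_loss:.6f}")
```

The labels `ys` are what make this supervised: the loss can only be computed because every input has a known correct output.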
Key Supervised Learning Algorithms
Linear Regression: Predicts continuous values. Assumes a linear relationship between inputs and output.
Logistic Regression: Predicts binary outcomes (0/1). Uses sigmoid function to output probabilities.
Decision Trees: Tree-like model with decision nodes and leaf nodes. Handles non-linear relationships.
Random Forest: Ensemble of decision trees, reduces overfitting.
Support Vector Machines (SVM): Finds hyperplane that best separates classes.
Neural Networks: Deep learning models with multiple layers, capable of learning complex patterns.
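As a small illustration of the logistic regression entry in the list above, the sigmoid step can be sketched in a few lines. The weight, bias, and threshold here are made-up values for demonstration only; a real model would learn the weight and bias from labeled data.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters for a one-feature model (not learned from data)
weight, bias = 1.5, -3.0


def predict_proba(x):
    """Probability that x belongs to the positive class."""
    return sigmoid(weight * x + bias)


def predict_label(x, threshold=0.5):
    """Turn the probability into a binary outcome (0 or 1)."""
    return 1 if predict_proba(x) >= threshold else 0


for x in (0.0, 2.0, 4.0):
    print(x, round(predict_proba(x), 3), predict_label(x))
```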
What is Unsupervised Learning?
Unsupervised learning deals with unlabeled data—no target variable. The model identifies patterns, structures, or groupings inherent in the data. Common tasks include clustering (grouping similar data points), dimensionality reduction (reducing number of features), and association (finding rules like 'customers who buy X also buy Y').
Why Unsupervised Learning Exists
Unsupervised learning is used when labels are unavailable or expensive to obtain. It helps explore data, discover hidden segments, or preprocess data for supervised learning. Examples: customer segmentation, anomaly detection, and feature compression.
How Unsupervised Learning Works Internally
Clustering algorithms like k-means assign each data point to the nearest cluster centroid, then update centroids based on the mean of assigned points. The process repeats until centroids stabilize. Dimensionality reduction techniques like PCA find orthogonal axes that capture maximum variance. There is no loss function comparing to a ground truth; instead, algorithms optimize intrinsic criteria like within-cluster sum of squares (WCSS) or explained variance.
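The assign-then-update loop of k-means can be sketched in plain Python. This is a simplified illustration under stated assumptions: the six points are made up, k is fixed at 2, and the starting centroids are simply the first and last points (real implementations use smarter initialization).

```python
import math

# Toy unlabeled 2-D points forming two visible groups (illustrative values)
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Assumed initialization: first and last points as starting centroids
centroids = [points[0], points[-1]]


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


for _ in range(100):
    # Assignment step: each point joins its nearest centroid
    assignments = [nearest(p, centroids) for p in points]
    # Update step: each centroid moves to the mean of its assigned points
    new_centroids = []
    for i in range(len(centroids)):
        members = [p for p, a in zip(points, assignments) if a == i]
        if members:
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centroids.append(centroids[i])  # keep an empty cluster's centroid
    if new_centroids == centroids:  # centroids stabilized
        break
    centroids = new_centroids

# Within-cluster sum of squares: the intrinsic criterion k-means reduces
wcss = sum(math.dist(p, centroids[a]) ** 2 for p, a in zip(points, assignments))
print(assignments, centroids, round(wcss, 3))
```

No labels appear anywhere in the loop; the only feedback is the distance between points and centroids.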
Key Unsupervised Learning Algorithms
K-Means Clustering: Partitions data into k clusters based on distance to centroids.
Hierarchical Clustering: Builds a tree of clusters (dendrogram) without specifying k.
DBSCAN: Density-based clustering that finds arbitrarily shaped clusters and noise.
Principal Component Analysis (PCA): Linear dimensionality reduction that transforms features into principal components.
t-SNE: Non-linear dimensionality reduction for visualization.
Apriori Algorithm: Finds association rules in transactional data.
Comparison of Supervised vs Unsupervised
Data: Supervised uses labeled data; unsupervised uses unlabeled data.
Goal: Supervised predicts labels; unsupervised finds hidden structure.
Evaluation: Supervised metrics compare to true labels; unsupervised metrics like silhouette score or inertia assess cluster quality.
Complexity: Supervised requires careful, often costly labeling; unsupervised can work with raw unlabeled data, but its results are harder to validate.
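The evaluation row of the comparison above can be made concrete with scikit-learn. This is a rough sketch on made-up data: accuracy needs the true labels `y`, while silhouette score and inertia are computed from the features alone. For brevity the classifier is scored on its own training data, which a real evaluation would avoid by using a held-out test set.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, silhouette_score

# Toy data: two well-separated groups (illustrative values)
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
              [6.0, 6.0], [6.2, 5.9], [5.8, 6.1]])
y = np.array([0, 0, 0, 1, 1, 1])  # labels exist only in the supervised case

# Supervised: compare predictions against the true labels
clf = LogisticRegression().fit(X, y)
accuracy = accuracy_score(y, clf.predict(X))

# Unsupervised: no labels, so measure how compact and separated the clusters are
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
silhouette = silhouette_score(X, kmeans.labels_)
inertia = kmeans.inertia_  # within-cluster sum of squares

print(f"accuracy={accuracy:.2f}, silhouette={silhouette:.2f}, inertia={inertia:.3f}")
```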
When to Use Which?
Use supervised when you have labeled data and a clear prediction task.
Use unsupervised when you lack labels or want to explore data.
Semi-supervised learning combines both, using a small labeled set with a larger unlabeled set.
Reinforcement learning is a third paradigm (not covered here) where an agent learns by interacting with an environment.
Azure AI Services and Supervised/Unsupervised
In Azure, supervised learning is used in services like Azure Machine Learning (automated ML, regression, classification) and Azure Cognitive Services (e.g., Custom Vision, Language Understanding). Unsupervised learning is available via Azure Machine Learning (clustering, PCA) and Azure Synapse Analytics (anomaly detection). The exam may ask you to identify which service uses which type.
Exam-Relevant Details
Know that supervised learning requires labeled data; unsupervised does not.
Common supervised tasks: regression (predict numeric), classification (predict category).
Common unsupervised tasks: clustering (group items), dimensionality reduction (simplify features).
Specific algorithms: linear regression, logistic regression, decision trees, random forest (supervised); k-means, PCA (unsupervised).
The exam will present scenarios: 'You have historical sales data with prices and want to predict the price of a new sale. Which approach?' → supervised regression. 'You have customer data with no predefined segments and want to find groups. Which approach?' → unsupervised clustering.
Trap Patterns
Trap: Confusing regression with classification. Regression predicts continuous values; classification predicts discrete labels.
Trap: Assuming clustering is supervised because it produces groups. But groups are not predefined; they are discovered.
Trap: Thinking dimensionality reduction is supervised. PCA is unsupervised; it does not use labels.
Trap: Overlooking that semi-supervised learning exists. The exam may test this as a hybrid.
Code Example: Supervised vs Unsupervised in Python
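# Supervised: Linear Regression
import numpy as np
from sklearn.linear_model import LinearRegression
# Toy data for illustration: one feature, labels follow y = 2x
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([2.0, 4.0, 6.0, 8.0])
X_test = np.array([[5.0], [6.0]])
model = LinearRegression()
model.fit(X_train, y_train)  # y_train holds the labels
predictions = model.predict(X_test)
# Unsupervised: K-Means Clustering
from sklearn.cluster import KMeans
# Toy data for illustration: three loose groups of 2-D points
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8], [9.0, 9.0], [9.1, 8.9]])
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(X)  # no y: features only
labels = kmeans.predict(X)  # assigns a cluster ID to each row
Summary of Key Differences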
Supervised: Input → Label, Goal → Predict, Evaluation → Compare to truth.
Unsupervised: Input only, Goal → Discover structure, Evaluation → Intrinsic metrics.
Data requirement: Supervised needs labeled data; unsupervised does not.
Complexity: Supervised learning adds the cost of collecting and checking labels; compute cost depends on the algorithm and data size, not on whether labels are used.
Azure Machine Learning Studio
In Azure Machine Learning designer, supervised modules include 'Train Model', 'Score Model', 'Evaluate Model'. Unsupervised modules include 'K-Means Clustering', 'PCA'. The exam may ask you to select the correct module for a task.
Define the Problem Type
Determine if you have labeled data (known outcomes) or unlabeled data. If labels exist and you want to predict them, it's supervised. If no labels and you want to find patterns, it's unsupervised. This step dictates the entire approach. For example, predicting house prices uses supervised regression; grouping customers by purchasing behavior uses unsupervised clustering.
Select the Appropriate Algorithm
Based on the problem type, choose an algorithm. For supervised classification, options include logistic regression, decision trees, SVM, or neural networks. For regression, linear regression, random forest regression. For unsupervised clustering, k-means, hierarchical, DBSCAN. For dimensionality reduction, PCA or t-SNE. Consider data size, linearity, and interpretability.
Prepare the Data
For supervised learning, split data into training, validation, and test sets. Ensure labels are correct and, for classification, check that the classes are reasonably balanced. For unsupervised learning, no label-based split is needed, but normalize features so that large-scale features do not dominate distance calculations. Handle missing values appropriately in both cases. In Azure Machine Learning designer, 'Split Data' creates the training and test sets for supervised learning, and 'Clean Missing Data' handles missing values for either approach.
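The preparation step can be sketched with scikit-learn. The data below is randomly generated purely for illustration (the feature meanings, the 60/20/20 split, and the random seeds are assumptions); the point is the order of operations: split first, then fit the scaler on the training portion only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Made-up features on very different scales: age in years, income in dollars
X = np.column_stack([rng.uniform(18, 70, 100), rng.uniform(20_000, 150_000, 100)])
y = rng.integers(0, 2, 100)  # made-up binary labels

# Supervised: roughly 60% train, 20% validation, 20% test
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=0)

# Fit the scaler on training data only, then reuse it for the other sets
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test)

# Unsupervised: there are no labels to split on, so scale the whole feature matrix
X_scaled = StandardScaler().fit_transform(X)

print(len(X_train), len(X_val), len(X_test))
```

Fitting the scaler on the training set alone keeps information about the validation and test sets from leaking into training.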
Train the Model
Feed training data to the algorithm. In supervised, the model learns from input-output pairs. In unsupervised, the model finds patterns without guidance. For example, linear regression minimizes mean squared error; k-means iteratively assigns points to the nearest centroid and recomputes the centroids. In Azure Machine Learning designer, use 'Train Model' for supervised algorithms; for clustering, configure 'K-Means Clustering' and connect it to 'Train Clustering Model'.
Evaluate the Model
Supervised evaluation uses metrics like accuracy, precision, recall, and F1-score for classification, and R-squared, MAE, and RMSE for regression. Compare predictions to true labels on the test set. Unsupervised evaluation is trickier; use silhouette score (range -1 to 1) to assess clustering quality, or the elbow method to choose k for k-means. In Azure Machine Learning designer, use 'Evaluate Model' for supervised models; for clustering, use 'Assign Data to Clusters' to label each row with its cluster and inspect the result visually.
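To show what the supervised metrics above actually measure, here is a small hand computation in plain Python. The true labels and predictions are made-up values; in practice a library such as scikit-learn would compute these for you.

```python
import math

# Classification: made-up true labels and model predictions (1 = positive class)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)   # of the predicted positives, how many were right
recall = tp / (tp + fn)      # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

# Regression: made-up actual and predicted numeric values
actual = [3.0, 5.0, 7.0]
predicted = [2.5, 5.0, 8.0]

mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

print(accuracy, precision, recall, f1, mae, round(rmse, 4))
```

Every one of these metrics needs the true values, which is why none of them can be applied directly to a clustering result.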
Enterprise Scenario 1: Credit Card Fraud Detection
A financial institution wants to detect fraudulent transactions in real-time. They have historical transaction data with labels (fraud/legitimate). This is a supervised classification problem. They use Azure Machine Learning with a boosted decision tree algorithm trained on millions of transactions. The model is deployed as a web service on Azure Kubernetes Service. Key considerations: class imbalance (fraud is rare) requires techniques like SMOTE or weighted loss. The model must have low latency (<10ms) to approve transactions. If misconfigured (e.g., using unsupervised clustering), the model would group transactions without knowing which cluster is fraud, leading to poor detection.
Enterprise Scenario 2: Customer Segmentation for Retail
An e-commerce company wants to segment customers based on purchase history, demographics, and browsing behavior. No predefined segments exist, so they use unsupervised clustering with k-means. They use Azure Machine Learning to preprocess data (normalize features) and run k-means with k=5 (determined by elbow method). The resulting segments are used for targeted marketing. Common pitfalls: choosing wrong k, not scaling features (distance-based algorithms are sensitive), or using supervised classification when no labels exist. Performance considerations: large datasets (>1M customers) require distributed computing; Azure's parallel execution helps.
Enterprise Scenario 3: Anomaly Detection in Manufacturing
A factory monitors sensor readings from equipment to detect anomalies. Historical data includes both normal and faulty operation, but fault labels are incomplete. They use unsupervised anomaly detection (e.g., one-class SVM or isolation forest) to flag unusual patterns. Azure Anomaly Detector (Cognitive Services) provides a pre-built unsupervised model. If they mistakenly used supervised learning with incomplete labels, the model would miss novel faults. The system processes thousands of sensor streams per second, requiring real-time inference. Scaling: using Azure Stream Analytics with the model as a function.
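The isolation forest approach named in the manufacturing scenario might look roughly like this in scikit-learn. Everything about the data is an assumption for illustration: the two sensor features (temperature and vibration), their typical ranges, and the single extreme reading appended at the end. This is a local sketch, not the Azure service.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Made-up sensor readings: 200 normal samples of (temperature, vibration)
normal = rng.normal(loc=[70.0, 0.5], scale=[2.0, 0.05], size=(200, 2))
# One made-up extreme reading standing in for a fault
faulty = np.array([[110.0, 2.0]])
readings = np.vstack([normal, faulty])

# No fault labels are passed in: the model learns what "usual" looks like
detector = IsolationForest(random_state=0).fit(readings)
flags = detector.predict(readings)  # 1 = looks normal, -1 = flagged as anomaly

print("last reading flagged:", flags[-1] == -1)
print("share of normal readings kept:", (flags[:-1] == 1).mean())
```

Because no labels are involved, the detector can flag a kind of fault it has never seen before, which is the advantage the scenario describes.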
The AI-900 exam tests supervised vs unsupervised learning primarily in objective 'Identify common machine learning types' (AI-900 objective 2.1). Expect 2-3 questions that present a scenario and ask which approach to use. Common wrong answers:
Choosing supervised for clustering tasks: Candidates see 'group customers' and think classification. But classification requires labeled groups; clustering is unsupervised. Trap: 'segment customers' → unsupervised, not supervised.
Choosing unsupervised for regression: Candidates see 'predict price' and wrongly assume no labels are needed. Regression is supervised because it needs historical prices as labels.
Confusing classification with regression: Both are supervised. Classification predicts categories; regression predicts numbers. Trap: 'predict if email is spam' → classification; 'predict temperature' → regression.
Thinking all AI services are supervised: Azure Cognitive Services include both supervised (Custom Vision) and unsupervised (Anomaly Detector). The exam may ask which service uses unsupervised learning.
Exact terms to know: 'labeled data', 'unlabeled data', 'regression', 'classification', 'clustering', 'dimensionality reduction'. Specific algorithms: 'linear regression', 'logistic regression', 'decision tree', 'k-means', 'PCA'. Edge cases: semi-supervised learning (combination) is not a main focus but could appear as a distractor.
How to eliminate wrong answers: If the scenario mentions 'historical data with known outcomes', it's supervised. If it says 'find hidden patterns without labels', it's unsupervised. If the task is 'predict a number', it's regression (supervised). If 'group items', it's clustering (unsupervised).
Supervised learning uses labeled data (input-output pairs) to predict outcomes; unsupervised uses unlabeled data to find patterns.
Common supervised tasks: regression (predict continuous values) and classification (predict discrete categories).
Common unsupervised tasks: clustering (group similar items) and dimensionality reduction (reduce features).
Key supervised algorithms: linear regression, logistic regression, decision trees, random forest, SVM.
Key unsupervised algorithms: k-means clustering, hierarchical clustering, PCA, t-SNE.
The exam expects you to choose supervised when historical data with labels is available; unsupervised when no labels exist.
Azure Machine Learning provides both supervised and unsupervised modules; Azure Cognitive Services include both types.
Common misconception to avoid: clustering is not classification. Classification is supervised; clustering is unsupervised.
These come up on the exam all the time. Here's how to tell them apart.
Supervised Learning
Requires labeled data with known outputs
Goal is to predict labels for new data
Evaluation uses metrics comparing to true labels
Common tasks: regression, classification
Examples: spam detection, price prediction
Unsupervised Learning
Works with unlabeled data
Goal is to discover hidden patterns or structure
Evaluation uses intrinsic metrics like silhouette score
Common tasks: clustering, dimensionality reduction
Examples: customer segmentation, anomaly detection
Mistake
Supervised learning requires more data than unsupervised learning.
Correct
Not necessarily. Supervised learning can work with small labeled datasets if the model is simple; unsupervised may need large data to discover meaningful patterns. The key difference is labeled vs unlabeled, not quantity.
Mistake
Unsupervised learning is always easier because you don't need labels.
Correct
Unsupervised learning is often harder to evaluate and tune because there is no ground truth. It requires domain expertise to interpret results. Supervised learning has clear metrics.
Mistake
Clustering is a form of classification.
Correct
Classification is supervised (labels known); clustering is unsupervised (labels unknown). They are fundamentally different in how they use data.
Mistake
Dimensionality reduction is supervised.
Correct
PCA is unsupervised—it does not use labels. Some dimensionality reduction techniques like linear discriminant analysis (LDA) are supervised, but PCA is not. The exam focuses on PCA as unsupervised.
Mistake
All machine learning is either supervised or unsupervised.
Correct
There is also reinforcement learning (agent learns from rewards) and semi-supervised learning (mix of labeled and unlabeled). The exam may include these as additional types.
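The dimensionality reduction point in the list above (PCA is unsupervised, LDA is supervised) shows up directly in how the two are called in scikit-learn. A brief sketch using the iris dataset bundled with the library; the choice of dataset and of two components is an assumption made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 150 flowers, 4 features, 3 species labels

# PCA: unsupervised, so only the features are passed in
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# LDA: supervised, so the labels y are required to find the projection
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

print(X_pca.shape, X_lda.shape)
print("variance kept by PCA:", pca.explained_variance_ratio_.sum())
```

Both reduce four features to two, but only LDA ever sees the species labels.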
The main difference is that supervised learning uses labeled data (each input has a known output), while unsupervised learning uses unlabeled data (no output labels). Supervised learns a mapping from inputs to outputs; unsupervised finds hidden patterns or groupings. For the exam, if a scenario mentions 'historical data with known outcomes', choose supervised. If it says 'find groups without predefined categories', choose unsupervised.
Unsupervised learning is not designed for prediction of a specific target. It can be used for anomaly detection (predicting if a point is unusual) or clustering (assigning a cluster label), but these are not prediction in the supervised sense. For numeric or categorical prediction with known labels, use supervised learning.
Azure Machine Learning's Automated ML can train supervised models for regression or classification. Azure Custom Vision (Cognitive Services) is supervised: you provide labeled images to train a classifier. Azure Language Understanding (LUIS) is supervised: you provide intents and utterances.
Azure Machine Learning's K-Means Clustering module is unsupervised. Azure Anomaly Detector (Cognitive Services) uses unsupervised learning to detect anomalies without labeled data. Azure Synapse Analytics offers unsupervised anomaly detection for time series.
Semi-supervised learning uses a small amount of labeled data and a large amount of unlabeled data. It combines supervised and unsupervised techniques. For example, you might use clustering to propagate labels from a few labeled points. The exam may include it as a hybrid approach.
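Label propagation, one semi-supervised technique, can be sketched with scikit-learn's `LabelPropagation`. The eight points and the kernel width `gamma=1.0` are made-up choices for illustration; by the library's convention, unlabeled points are marked with `-1`.

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

# Two made-up groups of points; only one point in each group has a label
X = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5],
              [5.0, 5.0], [5.5, 5.0], [5.0, 5.5], [5.5, 5.5]])
y = np.array([0, -1, -1, -1, 1, -1, -1, -1])  # -1 marks an unlabeled point

# Labels spread from the labeled points to nearby unlabeled ones
model = LabelPropagation(kernel="rbf", gamma=1.0).fit(X, y)
inferred = model.transduction_  # a label for every point, including unlabeled ones

print(inferred)
```

Two labels were enough here only because the groups are far apart; on messier data more labeled points would be needed.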
Regression predicts a continuous numeric value (e.g., price, temperature). Classification predicts a discrete category (e.g., spam/not spam, dog/cat). If the output is a number that can be any value in a range, it's regression. If it's a label from a fixed set, it's classification.
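The distinction can be seen by training two models on the same input feature with different kinds of target. This is a toy sketch: the home sizes, prices, and size categories are invented, and the choice of `LinearRegression` and `DecisionTreeClassifier` is just one reasonable pairing.

```python
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# Made-up homes: one feature (size in square metres)
sizes = [[50], [80], [100], [120], [150]]
prices = [100_000, 160_000, 200_000, 240_000, 300_000]    # numeric target
categories = ["small", "small", "large", "large", "large"]  # categorical target

regressor = LinearRegression().fit(sizes, prices)                        # regression
classifier = DecisionTreeClassifier(random_state=0).fit(sizes, categories)  # classification

new_home = [[140]]
predicted_price = regressor.predict(new_home)[0]      # any value in a continuous range
predicted_category = classifier.predict(new_home)[0]  # one label from a fixed set

print(round(predicted_price), predicted_category)
```

Both models are supervised; what differs is the type of the label they were trained on.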
The elbow method helps choose the optimal number of clusters (k) for k-means. It plots the within-cluster sum of squares (WCSS) against k. The 'elbow' point where WCSS starts decreasing slowly is considered optimal. In Azure, you can use 'K-Means Clustering' and tune k manually.
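The elbow method can be sketched locally with scikit-learn, where WCSS is exposed as the `inertia_` attribute of a fitted `KMeans` model. The synthetic data (three blobs from `make_blobs`) and the range of k values tried are assumptions for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic unlabeled data with three underlying groups (assumed for the demo)
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.6, random_state=0)

ks = range(1, 7)
wcss = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)  # within-cluster sum of squares for this k

# Look for the k after which the drop in WCSS becomes small
for k, value in zip(ks, wcss):
    print(k, round(value, 1))
```

Plotting `wcss` against `ks` gives the curve in which the elbow is read off by eye; the choice remains a judgment call rather than a computed answer.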