AI-900 · Chapter 39 of 100 · Objective 2.2

Linear vs Logistic Regression

Two fundamental supervised learning algorithms, linear regression and logistic regression, are core to understanding machine learning on Azure. Both fall under the 'Machine Learning' domain (objective 2.2), which accounts for roughly 15-20% of AI-900 exam questions. You will learn the precise mechanics of each algorithm, their differences, when to use each, and how they are implemented in Azure Machine Learning. Mastering this distinction is critical because the exam frequently tests your ability to select the correct algorithm for a given scenario.

Updated Jul 20, 2026

Reviewed by Johnson Ajibi · Senior Network & Security Engineer · MSc IT Security

House Price vs. Spam Email: Two Prediction Models

How can a real estate agent predict the selling price of a house? The agent collects data on house size, number of bedrooms, location, and age, plots these against known sale prices, and draws the straight line that best fits the points. This line works like a ruler: for any new house, the agent reads the price off the line. This is linear regression: predicting a continuous number (price).

Now think of a spam filter. You have emails labeled 'spam' or 'not spam', and you want to predict the probability that a new email is spam. A straight line is a poor fit because the label is binary (0 or 1) and a line can run below 0 or above 1. Instead, you use a curve that squashes any real number into a probability between 0 and 1. The curve then acts like a switch: if the probability is above 0.5, classify the email as spam; otherwise, classify it as not spam. That is logistic regression: predicting a category (spam/not spam).

The key difference: linear regression outputs a value on an unbounded scale; logistic regression outputs a probability bounded by 0 and 1, which is then thresholded to make a binary decision.

How It Actually Works

What is Regression?

Regression is a type of supervised machine learning where the model learns from labeled data to predict a continuous numeric output. The term 'regression' comes from statistics and literally means 'to go back' — the goal is to understand how a dependent variable changes when independent variables are varied. In AI-900, you must know that regression predicts numbers (like price, temperature, sales).

Linear Regression: The Mechanics

Linear regression assumes a linear relationship between input features (X) and the target variable (y). The model is represented as:

y = β₀ + β₁X₁ + β₂X₂ + ... + βₙXₙ + ε

Where:

y is the target value (the model's prediction is the right-hand side without ε)

β₀ is the intercept (bias)

β₁...βₙ are the coefficients (weights) for the features X₁...Xₙ

ε is the error term (residual)

The algorithm 'learns' by finding the line (or hyperplane in multiple dimensions) that minimizes the sum of squared errors between predicted and actual values. This is called Ordinary Least Squares (OLS). In Azure Machine Learning, you can train a linear regression model with the 'LinearRegression' class from scikit-learn (through the Python SDK) or with the built-in 'Linear Regression' module in the designer.
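For a single feature, the least-squares line has a closed-form solution that fits in a few lines of plain Python. This is a minimal sketch: the function name `fit_ols` and the toy house data are illustrative assumptions, not part of any Azure or scikit-learn API.

```python
def fit_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); this minimizes the sum of squared errors
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: house size (hundreds of square feet) vs. price (thousands).
sizes = [10, 15, 20, 25, 30]
prices = [150, 200, 250, 300, 350]

intercept, slope = fit_ols(sizes, prices)
predicted = intercept + slope * 22  # price estimate for a house of size 22
```

Because the toy prices were generated as 50 + 10 × size, the fit recovers an intercept of 50 and a slope of 10, and the estimate for size 22 is 270.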

How Linear Regression Works Internally

Initialize: Start with initial coefficients (often zeros or small random values).

Predict: For each training example, compute the predicted value using the current coefficients.

Calculate Error: Compute the difference between predicted and actual value (residual).

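Update Coefficients: Use gradient descent to adjust the coefficients in the direction that reduces the mean squared error (MSE). The closed-form normal equation is an alternative that solves for the coefficients in a single step, with no iteration.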

Repeat: Iterate until convergence (change in MSE is below a threshold, e.g., 0.0001).
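The steps above can be sketched as a small batch gradient descent loop. This is an illustration under stated assumptions: one feature, a hypothetical function name `fit_gd`, and a learning rate and tolerance chosen for this toy data rather than taken from any Azure default.

```python
def fit_gd(xs, ys, lr=0.05, tol=1e-10, max_iter=100_000):
    """Fit y = b0 + b1 * x by batch gradient descent on mean squared error."""
    b0, b1 = 0.0, 0.0                      # initialize coefficients at zero
    n = len(xs)
    prev_mse = float("inf")
    for _ in range(max_iter):
        # predict, then compute the residual for every training example
        errors = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        mse = sum(e * e for e in errors) / n
        if abs(prev_mse - mse) < tol:      # converged: MSE barely changed
            break
        prev_mse = mse
        # gradients of MSE with respect to b0 and b1
        grad_b0 = 2 * sum(errors) / n
        grad_b1 = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        b0 -= lr * grad_b0
        b1 -= lr * grad_b1
    return b0, b1


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]             # generated from y = 1 + 2x
b0, b1 = fit_gd(xs, ys)
```

On this data the loop settles close to the true intercept of 1 and slope of 2. A learning rate that is too large would make the updates diverge, which is one reason feature scaling matters for gradient-based training.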

Key Components of Linear Regression

Coefficients (Weights): Each feature gets a weight that indicates its impact on the prediction. A positive weight means the feature increases the predicted value; negative means it decreases.

Intercept: The predicted value when all features are zero. Often not directly interpretable.

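R-squared (R²): A metric indicating how much of the variance in the data the model explains. It usually falls between 0 and 1, where 1 means a perfect fit; it can be negative when the model performs worse than simply predicting the mean.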

Mean Absolute Error (MAE): Average absolute difference between predicted and actual values.

Root Mean Squared Error (RMSE): Square root of the average squared differences. Penalizes large errors more.
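The three metrics above are simple to compute by hand. The sketch below uses a hypothetical helper `regression_metrics` and made-up prices; it also shows why RMSE reacts more strongly than MAE to a single large error.

```python
import math


def regression_metrics(actual, predicted):
    """Return (mae, rmse, r2) for paired lists of actual and predicted values."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    mean_actual = sum(actual) / n
    ss_res = sum(r * r for r in residuals)                 # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total variation
    r2 = 1 - ss_res / ss_tot
    return mae, rmse, r2


actual = [200.0, 250.0, 300.0, 350.0]

# Four errors of 10 each.
mae_even, rmse_even, r2_even = regression_metrics(
    actual, [210.0, 240.0, 310.0, 340.0])

# One error of 40, the rest exact: same MAE, larger RMSE.
mae_spike, rmse_spike, r2_spike = regression_metrics(
    actual, [200.0, 250.0, 300.0, 390.0])
```

Both prediction sets have an MAE of 10, but the RMSE rises from 10 to 20 when the error is concentrated in one prediction.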

Assumptions of Linear Regression

Linearity: The relationship between features and target is linear.

Independence: Observations are independent of each other.

Homoscedasticity: Constant variance of errors across all levels of the independent variable.

Normality: Errors are normally distributed (important for hypothesis testing but not for prediction).

Logistic Regression: The Mechanics

Despite its name, logistic regression is a classification algorithm, not a regression algorithm. It predicts the probability that an instance belongs to a particular class. The output is a value between 0 and 1, which is then thresholded (typically at 0.5) to make a binary decision.

The model uses the logistic function (sigmoid) to transform a linear combination of features into a probability:

P(y=1) = 1 / (1 + e^-(β₀ + β₁X₁ + ... + βₙXₙ))

Where e is Euler's number (~2.718). The expression inside the exponent is the same linear equation as in linear regression, but the sigmoid squashes it into the open interval (0, 1): the output approaches 0 and 1 without ever reaching them.
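A quick way to see the squashing effect is to evaluate the sigmoid at a few inputs. This is a minimal sketch using only the standard library; the input values are arbitrary.

```python
import math


def sigmoid(z):
    """Map any real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Large negative inputs approach 0, large positive inputs approach 1,
# and an input of 0 maps to exactly 0.5.
for z in (-4, -1, 0, 1, 4):
    print(z, round(sigmoid(z), 3))
```

The value at z = 0 is why 0.5 is the natural default threshold: it corresponds to a linear combination of exactly zero.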

How Logistic Regression Works Internally

Initialize: Set coefficients to small random values.

Compute Linear Combination: Calculate z = β₀ + β₁X₁ + ... + βₙXₙ.

Apply Sigmoid: Compute probability p = 1/(1+e^(-z)).

Calculate Loss: Use log loss (cross-entropy) instead of squared error: Loss = -[y log(p) + (1-y) log(1-p)].

Update Coefficients: Use gradient descent to minimize log loss.

Repeat: Until convergence.
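The training loop above can be sketched for a single feature as follows. The function names, the learning rate, the fixed number of epochs, and the spam data are all illustrative assumptions; production libraries use more sophisticated solvers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(ys, ps):
    """Average cross-entropy between labels ys (0 or 1) and probabilities ps."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(ys, ps)) / len(ys)


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit P(y=1) = sigmoid(b0 + b1 * x) by batch gradient descent on log loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        ps = [sigmoid(b0 + b1 * x) for x in xs]
        # with a sigmoid and log loss, the gradient reduces to (p - y) times the input
        grad_b0 = sum(p - y for p, y in zip(ps, ys)) / n
        grad_b1 = sum((p - y) * x for p, y, x in zip(ps, ys, xs)) / n
        b0 -= lr * grad_b0
        b1 -= lr * grad_b1
    return b0, b1


# Hypothetical data: number of suspicious links in an email vs. spam label.
links = [0, 1, 2, 3, 4, 5, 6, 7]
is_spam = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(links, is_spam)
probs = [sigmoid(b0 + b1 * x) for x in links]
loss = log_loss(is_spam, probs)
```

Before training, every probability is 0.5 and the log loss is ln 2 (about 0.693); after training, the loss is lower and the predicted probability rises with the number of links.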

Key Components of Logistic Regression

Coefficients: Interpreted as log-odds ratios. A unit increase in feature X increases the log-odds of the event by the coefficient value.

Threshold: Default 0.5, but can be tuned to balance precision and recall.

Log Loss (Cross-Entropy): The loss function used; lower is better.

Accuracy, Precision, Recall, F1-score: Metrics for evaluating classification performance.

Confusion Matrix: Shows true positives, false positives, true negatives, false negatives.
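The classification metrics listed above all derive from the four confusion-matrix counts. A minimal sketch, with a hypothetical helper `classification_metrics` and made-up labels:

```python
def classification_metrics(actual, predicted):
    """Return confusion-matrix counts and derived metrics for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(actual, predicted)
```

Here the model finds 3 of the 4 positives and raises 1 false alarm, giving an accuracy of 0.8 and a precision, recall, and F1 of 0.75 each.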

Assumptions of Logistic Regression

Binary Outcome: The dependent variable is binary (though multinomial logistic regression exists for multiple classes).

Independence: Observations are independent.

No Multicollinearity: Features should not be highly correlated.

Linearity of Logit: The log-odds of the outcome is linearly related to the features.

Differences Between Linear and Logistic Regression

Output: Linear regression outputs a continuous value (e.g., 250,000). Logistic regression outputs a probability (e.g., 0.85).

Loss Function: Linear regression uses mean squared error; logistic regression uses log loss.

Decision Boundary: Linear regression has no decision boundary; logistic regression has a linear decision boundary (in the feature space) because the log-odds is linear.

Assumptions: The classical assumptions of linear regression include normally distributed errors and homoscedasticity; logistic regression makes neither assumption.

Evaluation Metrics: Linear regression uses R², MAE, RMSE; logistic regression uses accuracy, precision, recall, F1, AUC-ROC.

When to Use Each on Azure

Use linear regression when predicting a numeric value, e.g., 'What will be the price of this house?' or 'What will be the temperature tomorrow?'.

Use logistic regression when predicting a binary category, e.g., 'Will this customer churn?' (yes/no) or 'Is this email spam?' (spam/not spam).

In Azure Machine Learning, both algorithms are available in the designer (drag-and-drop) and via the Python SDK. You can also use automated ML (AutoML) which automatically selects the best algorithm based on your data and task type.

Training and Evaluation in Azure

For linear regression:

Use the 'Linear Regression' module in the designer.

Connect training data, select the label column (numeric), and run.

Evaluate using 'Evaluate Model' module, which outputs R², MAE, RMSE, etc.

For logistic regression:

Use the 'Two-Class Logistic Regression' module in the designer.

Connect training data, select the label column (binary), and run.

Evaluate using 'Evaluate Model' module, which outputs accuracy, precision, recall, F1, and AUC.

Common Pitfalls

Using linear regression for classification: The outputs can fall below 0 or above 1, so they cannot be read as probabilities, and the model is not optimized for separating classes.

Using logistic regression for regression: It will output probabilities, not continuous values.

Not normalizing features: Gradient descent converges faster when features are on similar scales.

Ignoring multicollinearity in logistic regression: It can inflate standard errors of coefficients.
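The first pitfall is easy to reproduce. The sketch below fits a least-squares line to a binary label and then predicts for inputs outside the training range; the helper name and the study-hours data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope


# Hypothetical data: hours studied vs. pass (1) or fail (0).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

intercept, slope = fit_line(hours, passed)
low = intercept + slope * 0     # prediction for 0 hours: below 0
high = intercept + slope * 10   # prediction for 10 hours: above 1
```

Neither prediction is a valid probability, which is the practical reason to switch to logistic regression when the label is binary.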

Advanced Topics (Not on AI-900 but useful)

Regularization (L1/L2) to prevent overfitting.

Polynomial features to capture non-linear relationships in linear regression.

Multinomial logistic regression for more than two classes.

Walk-Through

Define the Problem Type

First, determine whether your target variable is continuous (numeric) or categorical (binary). This is the single most important step. If you are predicting a number like price, use linear regression. If you are predicting a category like spam/not spam, use logistic regression. In Azure Machine Learning, you specify the task type when creating a training pipeline. The designer modules are separated into 'Regression' and 'Classification' categories. Misidentifying the problem type is a common exam trap.

Prepare the Data

Clean and preprocess your dataset. For both algorithms, handle missing values (e.g., using the 'Clean Missing Data' module in Azure). Put features on a similar scale using normalization (e.g., Min-Max or Z-score); this helps whenever the model is trained with gradient descent or regularization, which applies to both linear and logistic regression. Split data into training and test sets (typically 80/20 or 70/30). In Azure, use the 'Split Data' module. Also, encode categorical features as numbers using one-hot encoding ('Convert to Indicator Values' module).
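The scaling and splitting steps can be sketched in plain Python. This is only an illustration of what the designer modules do conceptually; the function names, the seed, and the data are assumptions.

```python
import random


def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score_scale(values):
    """Center values on 0 with unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and hold out test_fraction of them for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


square_feet = [800, 1200, 1500, 2000, 3000]
scaled = min_max_scale(square_feet)
standardized = z_score_scale(square_feet)
train, test = train_test_split(list(range(10)))   # 8 training rows, 2 test rows
```

In a real pipeline the scaling parameters (minimum, maximum, mean, standard deviation) should be computed on the training split only and then reused on the test split, so that no information leaks from the test data.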

Select and Configure the Algorithm

In Azure Machine Learning designer, drag the appropriate module: 'Linear Regression' for regression tasks, or 'Two-Class Logistic Regression' for binary classification. Configure hyperparameters. For linear regression, you can set the solution method (Ordinary Least Squares or Online Gradient Descent) and the L2 regularization weight. For logistic regression, set the optimization tolerance and the L1 and L2 regularization weights. In the SDK, you instantiate the model with parameters like `LogisticRegression(C=1.0, solver='lbfgs')`. The exam does not require memorizing hyperparameters, but you should know that these algorithms have configurable settings.

Train the Model

Connect the training data and the algorithm module to the 'Train Model' module. Specify the label column (the column you want to predict). Run the training pipeline. Azure processes the data and updates the model coefficients using gradient descent (or closed-form for linear regression with small datasets). The training step produces a trained model object. For linear regression, the model learns the optimal line. For logistic regression, it learns the optimal sigmoid curve. The training time depends on dataset size and number of features.

Evaluate the Model

Connect the trained model and the test data to the 'Score Model' module, then pass the scored results to the 'Evaluate Model' module. For linear regression, evaluation metrics include R-squared (R²), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Relative Absolute Error. For logistic regression, metrics include accuracy, precision, recall, F1-score, and AUC-ROC, and 'Evaluate Model' also shows a confusion matrix and ROC curve. Compare these metrics to a baseline (e.g., predicting the mean for regression, or the majority class for classification) to gauge improvement.
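The two baselines mentioned above are worth computing before trusting any metric. A minimal sketch with hypothetical helper names and made-up data:

```python
def majority_class_accuracy(labels):
    """Accuracy of always predicting the most common 0/1 label."""
    positives = sum(labels)
    negatives = len(labels) - positives
    return max(positives, negatives) / len(labels)


def r2_score(actual, predicted):
    """Coefficient of determination for paired actual and predicted values."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Classification baseline: 1% positives, so "always predict 0" scores 0.99.
labels = [1] * 10 + [0] * 990
baseline_accuracy = majority_class_accuracy(labels)

# Regression baseline: always predicting the mean gives an R² of 0.
prices = [200.0, 250.0, 300.0, 350.0]
mean_price = sum(prices) / len(prices)
baseline_r2 = r2_score(prices, [mean_price] * len(prices))
```

A classifier reporting 99% accuracy on this data has learned nothing beyond the majority class, which is why imbalanced problems are usually judged on precision, recall, or AUC instead.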

What This Looks Like on the Job

Enterprise Scenario 1: Predicting Housing Prices

A real estate company wants to build a model to estimate house prices based on features like square footage, number of bedrooms, location, and year built. They use Azure Machine Learning with linear regression. The dataset contains 100,000 records. They preprocess by imputing missing values and normalizing numeric features. The model achieves an R² of 0.85, meaning it explains 85% of the variance in price. They deploy the model as a web service via Azure Container Instances. A common issue is multicollinearity between features (e.g., number of bedrooms and square footage are correlated), which can be addressed by removing one of them or using regularization. Misconfiguration: using logistic regression would produce probabilities, not prices, which is useless.

Enterprise Scenario 2: Credit Card Fraud Detection

A bank wants to detect fraudulent transactions in real time. They have a dataset of transactions labeled as 'fraud' or 'legitimate'. They use logistic regression because the output is binary. The dataset is highly imbalanced (only 1% fraud). They handle this by using class weights or oversampling (SMOTE). The model is trained on 1 million transactions and achieves an AUC-ROC of 0.95. They deploy the model to Azure Kubernetes Service for low-latency scoring. A common pitfall is not adjusting the decision threshold; with so few fraud cases, the default 0.5 may miss too much fraud (false negatives). They tune the threshold to 0.3 to catch more fraud at the cost of more false alarms. Misconfiguration: using linear regression would output a continuous score that is not interpretable as a probability.
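The effect of lowering the threshold from 0.5 to 0.3 can be sketched with a handful of scored transactions. The probabilities and labels below are made up for illustration, and `recall_and_false_positives` is a hypothetical helper.

```python
def recall_and_false_positives(probabilities, labels, threshold):
    """Return (recall, false-positive count) when flagging p >= threshold as fraud."""
    predicted = [1 if p >= threshold else 0 for p in probabilities]
    tp = sum(1 for y, yhat in zip(labels, predicted) if y == 1 and yhat == 1)
    fn = sum(1 for y, yhat in zip(labels, predicted) if y == 1 and yhat == 0)
    fp = sum(1 for y, yhat in zip(labels, predicted) if y == 0 and yhat == 1)
    return tp / (tp + fn), fp


# Hypothetical fraud probabilities from a trained model, with the true labels.
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.45, 0.60, 0.80, 0.90, 0.15]
labels = [0,    0,    0,    0,    1,    0,    1,    1,    1,    0]

recall_05, fp_05 = recall_and_false_positives(scores, labels, 0.5)
recall_03, fp_03 = recall_and_false_positives(scores, labels, 0.3)
```

At a threshold of 0.5 the model catches 3 of 4 fraud cases with no false alarms; at 0.3 it catches all 4 but flags 2 legitimate transactions. Which trade-off is acceptable depends on the relative cost of a missed fraud and a false alarm.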

Enterprise Scenario 3: Customer Churn Prediction

A telecom company wants to predict which customers are likely to cancel their subscription. They use logistic regression with features like contract length, monthly charges, and customer service calls. The model is trained on historical data and evaluated using precision and recall because false negatives (missing churners) are costly. They achieve 80% precision and 75% recall. They deploy the model to Azure Functions for batch scoring. A common issue is that logistic regression assumes linearity in the log-odds, which may not hold; they add interaction terms or polynomial features to capture non-linear relationships. Misconfiguration: using linear regression would produce negative probabilities for some customers, which is nonsensical.

How AI-900 Actually Tests This

What AI-900 Tests on This Topic

The AI-900 exam (objective 2.2) tests your ability to distinguish between regression and classification tasks. You will be given a scenario and asked to choose the appropriate algorithm: linear regression for numeric predictions, logistic regression for binary classification. The exam does not require deep mathematical knowledge but expects you to understand the fundamental difference in output type.

Common Wrong Answers and Why

Choosing linear regression for a classification problem: Candidates see 'regression' in the name and think it can predict categories. Wrong — linear regression outputs continuous values, not probabilities. The exam loves to present a scenario like 'predict whether a customer will buy a product' and offer linear regression as a distractor.

Choosing logistic regression for a numeric prediction: Candidates know logistic regression is for classification but forget it outputs probabilities, not actual numbers. For example, predicting the exact dollar amount of a sale — logistic regression cannot do that.

Confusing logistic regression with linear regression because both have 'regression': The exam expects you to know that logistic regression is a classification algorithm despite its name.

Selecting clustering algorithms (like K-means) for regression tasks: The exam may include unsupervised learning algorithms as wrong answers. Always check if the scenario requires labeled data (supervised) or unlabeled (unsupervised).

Specific Numbers and Terms on the Exam

The exam may ask about R-squared (R²) as a metric for linear regression. Know that R² normally ranges from 0 to 1, and can be negative if the model is worse than predicting the mean.

AUC-ROC is a metric for logistic regression; AUC of 0.5 means random, 1.0 perfect.

The sigmoid function is the activation function in logistic regression.

Mean Squared Error (MSE) is a common loss function for linear regression.

The term 'binary classification' is used for logistic regression.
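One way to make the AUC figures above concrete is to compute AUC as the fraction of (positive, negative) pairs in which the positive example receives the higher score. This is a minimal sketch of that pairwise definition; the function name and scores are illustrative, and libraries typically use a faster rank-based method.

```python
def auc_roc(scores, labels):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]
perfect = auc_roc([0.1, 0.2, 0.8, 0.9], labels)         # every positive outranks every negative
uninformative = auc_roc([0.5, 0.5, 0.5, 0.5], labels)   # identical scores carry no signal
mixed = auc_roc([0.1, 0.4, 0.35, 0.8], labels)          # 3 of 4 pairs ranked correctly
```

The three cases evaluate to 1.0, 0.5, and 0.75, matching the reading that 0.5 is no better than chance and 1.0 is perfect separation.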

Edge Cases and Exceptions

Multinomial logistic regression: The exam may mention logistic regression for multiple classes (e.g., predicting dog, cat, bird). This is still classification, not regression.

Linear regression with a binary target: Technically possible but not recommended; the exam will expect you to choose logistic regression.

When to use regression vs. classification for time series: If predicting a future numeric value (e.g., stock price), use regression. If predicting up/down movement, use classification.

How to Eliminate Wrong Answers

Look at the output type: If the question asks for a 'number', 'amount', 'price', 'temperature' — it's regression. If it asks for 'yes/no', 'true/false', 'category' — it's classification.

Check if the algorithm name matches the task: 'Linear Regression' for continuous output, 'Logistic Regression' for binary output.

Beware of algorithms that sound similar but are wrong: 'Linear Regression' is not for classification; 'Logistic Regression' is not for continuous values.

Key Takeaways

Linear regression predicts continuous numeric values; logistic regression predicts the probability of a binary category.

Logistic regression uses the sigmoid function to map any real number to a probability between 0 and 1.

The loss function for linear regression is Mean Squared Error; for logistic regression it is Log Loss (Cross-Entropy).

Common evaluation metrics: R² for linear regression; accuracy, precision, recall, F1, AUC for logistic regression.

In Azure Machine Learning, use the 'Linear Regression' module for regression tasks and 'Two-Class Logistic Regression' for binary classification.

The AI-900 exam tests your ability to select the correct algorithm based on the type of target variable (numeric vs. categorical).

Never use linear regression for classification or logistic regression for numeric prediction.

Both algorithms are supervised learning methods requiring labeled data.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Linear Regression

Predicts continuous numeric values (e.g., price, temperature).

Uses a linear equation to model the relationship.

Loss function: Mean Squared Error (MSE).

Evaluation metrics: R², MAE, RMSE.

Assumes normality of errors and homoscedasticity.

Logistic Regression

Predicts binary categorical outcomes (e.g., spam/not spam).

Uses a logistic (sigmoid) function to output probabilities.

Loss function: Log Loss (Cross-Entropy).

Evaluation metrics: Accuracy, Precision, Recall, F1, AUC-ROC.

Does not require normality or homoscedasticity.

Watch Out for These

Mistake

Linear regression can be used for classification by thresholding the output.

Correct

While you could threshold a linear regression output (e.g., predict 1 if >0.5), the values are not bounded between 0 and 1, and the model does not optimize for classification accuracy. Logistic regression is specifically designed to output probabilities and uses log loss, making it more appropriate.

Mistake

Logistic regression is a regression algorithm because it has 'regression' in its name.

Correct

Logistic regression is a classification algorithm. The name comes from the logistic function and the fact that it models the log-odds of the outcome as a linear combination of features. Despite the name, it predicts probabilities and is used for classification tasks.

Mistake

Both linear and logistic regression require the same assumptions about data distribution.

Correct

Linear regression assumes normality of errors, homoscedasticity, and linearity. Logistic regression does not require normality or homoscedasticity; it only assumes linearity of the log-odds and independence of observations.

Mistake

The coefficients in logistic regression represent the change in probability for a unit change in the feature.

Correct

Coefficients represent the change in log-odds, not probability. The effect on probability depends on the current probability level. For example, a coefficient of 1 adds 1 to the log-odds for each unit increase in the feature (multiplying the odds by e ≈ 2.72); that moves a probability of 0.5 to about 0.73, but a probability of 0.9 only to about 0.96.
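The dependence on the starting probability can be checked numerically. This is a minimal sketch with a hypothetical helper `shift_probability` and a coefficient of 1 chosen purely for illustration.

```python
import math


def shift_probability(p, coefficient, delta_x=1.0):
    """Return the new probability after the feature increases by delta_x."""
    log_odds = math.log(p / (1 - p))        # convert the probability to log-odds
    log_odds += coefficient * delta_x       # the coefficient adds to the log-odds
    return 1 / (1 + math.exp(-log_odds))    # convert back to a probability


from_half = shift_probability(0.5, coefficient=1.0)   # about 0.73
from_high = shift_probability(0.9, coefficient=1.0)   # about 0.96
odds_multiplier = math.exp(1.0)                       # odds are multiplied by e
```

The same additive change in log-odds raises the probability by about 0.23 when starting from 0.5 but only by about 0.06 when starting from 0.9, because the sigmoid flattens near its extremes.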

Mistake

Linear regression always produces better predictions than logistic regression for numeric data.

Correct

Linear regression is designed for numeric predictions, but if the relationship is non-linear, other models (e.g., polynomial regression, tree-based models) may perform better. Logistic regression cannot be used for numeric predictions at all.

Frequently Asked Questions

What is the main difference between linear and logistic regression?

The main difference is the type of output: linear regression predicts a continuous numeric value (e.g., house price), while logistic regression predicts a binary category (e.g., spam or not spam) by outputting a probability between 0 and 1. Linear regression uses a straight line, logistic regression uses an S-shaped curve (sigmoid).

Can I use linear regression for classification?

Technically, you can threshold the output, but it is not recommended. Linear regression outputs values that can be less than 0 or greater than 1, which are not valid probabilities. Also, the loss function (MSE) is not appropriate for classification. Logistic regression is designed for this purpose.

Why is logistic regression called regression if it's for classification?

The name comes from the logistic function and the fact that it models the log-odds of the outcome as a linear combination of features (a regression on the log-odds). Despite the name, it is used for classification.

What metrics should I use to evaluate linear regression?

Common metrics include R-squared (R²), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Relative Absolute Error. R² indicates how well the model explains variance in the data.

What metrics should I use to evaluate logistic regression?

Common metrics include accuracy, precision, recall, F1-score, and AUC-ROC. The confusion matrix provides true/false positives and negatives. AUC-ROC measures the model's ability to distinguish between classes across all thresholds.

How do I choose between linear and logistic regression in Azure Machine Learning?

Look at your label column: if it is numeric (e.g., price), choose Linear Regression; if it is binary (e.g., yes/no), choose Two-Class Logistic Regression. In the designer, these modules are in the 'Regression' and 'Classification' categories respectively.

What is the sigmoid function and why is it used in logistic regression?

The sigmoid function, S(z) = 1/(1+e^(-z)), maps any real number to a value between 0 and 1, which can be interpreted as a probability. It is used because it is differentiable and produces an S-shaped curve that naturally models probability.
