AI-900 | Chapter 35 of 100 | Objective 2.1

Overfitting, Underfitting, and Model Complexity

This chapter covers overfitting, underfitting, and model complexity, which are critical concepts for building reliable machine learning models. For the AI-900 exam, you need to understand how these issues arise, how to identify them, and how to mitigate them. Exam questions on these topics often present a scenario and ask you to choose the correct technique to address a specific problem. Mastering this chapter will help you distinguish between high bias and high variance, select remedies such as regularization, and use diagnostic tools such as cross-validation.

25 min read
Intermediate
Updated May 31, 2026

The Tailor and the Ill-Fitting Suit

Imagine a tailor is asked to make a custom suit for a client. The tailor takes the client's measurements; these are the training data. If the tailor makes something extremely simple, like a one-size-fits-all poncho, it will not fit the client's shape at all. This is underfitting: the model is too simple to capture the patterns in the data, resulting in high bias and poor performance on both training and new data. Conversely, if the tailor obsessively sews in every wrinkle and fold of the client's exact posture on the day of measurement, the suit will fit perfectly in that one pose but will be stiff and uncomfortable when the client moves normally. This is overfitting: the model memorizes the noise and outliers in the training data, performing well on training data but poorly on new data. The tailor's goal is a suit that captures the general shape (the underlying pattern) without being too loose or too tight. In machine learning, this balance is achieved by controlling model complexity through techniques like regularization and choosing an appropriate algorithm, and by checking the result with cross-validation. Just as a tailor starts from a standard pattern and adjusts it with key measurements, a good model generalizes from training data to unseen data by learning the signal, not the noise.

How It Actually Works

What Are Overfitting and Underfitting?

Overfitting and underfitting are the two fundamental problems that occur when a machine learning model fails to generalize well from training data to unseen data. Generalization is the ability of a model to perform accurately on new, never-before-seen data. A model that generalizes well has learned the underlying patterns in the data without memorizing noise. Overfitting happens when a model is too complex—it learns the training data too well, including its random fluctuations and outliers. Underfitting happens when a model is too simple—it fails to capture the underlying trend in the data.

Why Do They Matter?

In practice, a model that overfits has high accuracy on training data but poor accuracy on validation or test data. That makes it unreliable for real-world predictions, because new inputs will not share the quirks it memorized. Underfitting, on the other hand, results in poor performance on both training and test data because the model never learned the relevant patterns. The goal of model training is to find the sweet spot: a model complex enough to capture the true signal but simple enough to ignore noise.

How Does Model Complexity Affect Fit?

Model complexity refers to the capacity of a model to learn intricate patterns. In linear models, complexity can be increased by adding polynomial features. In decision trees, complexity increases with tree depth. In neural networks, complexity increases with more layers and neurons. As complexity increases, the model's ability to fit training data improves, but beyond a certain point, it starts fitting noise—leading to overfitting. The relationship between model complexity and error is often visualized with a U-shaped curve for test error: initially, as complexity increases, test error decreases (underfitting region), but after an optimal point, test error increases (overfitting region).
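The complexity-versus-error pattern can be reproduced with scikit-learn's validation_curve, which scores one model at several complexity settings using cross-validation. This is a minimal sketch, assuming a synthetic noisy sine dataset and a DecisionTreeRegressor whose max_depth serves as the complexity knob. The scores are R² values, so higher is better and the U shape appears upside down.

```python
import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeRegressor

# Synthetic nonlinear data (an assumption for illustration): y = sin(x) + noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, size=300)

depths = np.arange(1, 16)
# For each depth, fit with 5-fold CV; each result has shape (len(depths), 5)
train_scores, val_scores = validation_curve(
    DecisionTreeRegressor(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5)

train_mean = train_scores.mean(axis=1)  # keeps rising as depth grows
val_mean = val_scores.mean(axis=1)      # rises, peaks, then falls off
best_depth = depths[np.argmax(val_mean)]

for d, tr, va in zip(depths, train_mean, val_mean):
    print(f"max_depth={d:2d}  train R^2={tr:.3f}  validation R^2={va:.3f}")
print("Depth with best validation score:", best_depth)
```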

Bias-Variance Tradeoff

The bias-variance tradeoff is the theoretical foundation for understanding overfitting and underfitting. Bias is the error due to overly simplistic assumptions in the learning algorithm. High bias can cause underfitting: the model misses relevant relations between features and target outputs. Variance is the error due to sensitivity to small fluctuations in the training set. High variance can cause overfitting: the model learns random noise instead of the intended outputs. For squared-error loss, a model's expected error is the sum of the squared bias, the variance, and the irreducible error (noise in the data itself that no model can remove). A good model keeps both bias and variance low, but they are often in tension: making a model more complex typically reduces bias but increases variance, and vice versa.
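For squared-error loss, the decomposition can be written as below, assuming the data follow y = f(x) + ε with zero-mean noise of variance σ², and with expectations taken over random training sets and noise. Note that it is the squared bias that enters the sum.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible error}}
```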

Detecting Overfitting and Underfitting

Learning Curves: Plot training and validation error as a function of training set size (a learning curve) or of a complexity setting such as tree depth (often called a validation curve). If training error is low but validation error is high, the model is overfitting. If both errors are high, the model is underfitting. If both errors converge to a low value, the model is well-fitted.

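Cross-Validation: Use k-fold cross-validation (commonly k=5 or 10) to evaluate model performance. A mean validation score well below the training score indicates overfitting, and large variation in scores across folds suggests the model is sensitive to which data it was trained on (high variance).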

Performance Metrics: Compare training and test accuracy. A large gap (e.g., training accuracy 99%, test accuracy 70%) indicates overfitting.
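A quick way to apply the train-versus-test comparison is to score the same model family at two complexity settings. This sketch assumes synthetic data from make_classification with 20% of labels assigned at random (flip_y=0.2), so a tree that fits the training set perfectly must be memorizing noise.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y=0.2 assigns 20% of labels at random (label noise)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for depth in (None, 3):  # None means no depth limit, scikit-learn's default
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    train_acc = clf.score(X_train, y_train)
    test_acc = clf.score(X_test, y_test)
    results[depth] = (train_acc, test_acc)
    print(f"max_depth={depth}: train={train_acc:.2f}  test={test_acc:.2f}  "
          f"gap={train_acc - test_acc:.2f}")
```

Expect the unlimited tree to reach 100% training accuracy with a much larger gap than the depth-3 tree.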

How to Prevent Overfitting

Regularization: Add a penalty term to the loss function to discourage large coefficients. L1 regularization (Lasso) can shrink some coefficients to zero, effectively performing feature selection. L2 regularization (Ridge) penalizes the square of coefficients, keeping them small but not zero. Elastic Net combines both.

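Pruning: For decision trees, limit tree depth or require a minimum number of samples per leaf (pre-pruning), or remove branches that add little predictive power after the tree is grown (post-pruning; scikit-learn supports cost-complexity pruning through the ccp_alpha parameter).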

Early Stopping: In iterative algorithms like gradient descent, stop training when validation error starts to increase.

Dropout: In neural networks, randomly drop a fraction of neurons during training (e.g., 20-50%) to prevent co-adaptation.

Data Augmentation: Increase the size and diversity of training data by applying transformations (e.g., rotations, flips for images).

Reduce Model Complexity: Choose a simpler model (e.g., linear instead of polynomial) or reduce the number of features.

Ensemble Methods: Bagging (e.g., Random Forest) reduces variance by averaging multiple models. Boosting (e.g., Gradient Boosting) reduces bias but can overfit if not regularized.
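Deep-learning frameworks provide early-stopping callbacks, but the logic is simple enough to show directly. The following is a minimal NumPy sketch, not any library's implementation: full-batch gradient descent on a linear model, a hypothetical helper named train_with_early_stopping, a patience counter, and a synthetic over-parameterized dataset (50 features, 20 training rows) chosen so that later epochs can start fitting noise.

```python
import numpy as np

def train_with_early_stopping(X_tr, y_tr, X_val, y_val,
                              lr=0.01, max_epochs=5000, patience=10):
    """Full-batch gradient descent for linear regression (MSE loss).

    Stops once validation MSE has not improved for `patience` consecutive
    epochs and returns the weights from the best validation epoch.
    """
    w = np.zeros(X_tr.shape[1])
    best_w, best_val = w.copy(), np.inf
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        # Gradient of the mean squared error on the training set
        grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
        w = w - lr * grad
        val_mse = np.mean((X_val @ w - y_val) ** 2)
        if val_mse < best_val:
            best_val, best_w = val_mse, w.copy()
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation error stopped improving
    return best_w, best_val, epoch + 1

# Over-parameterized toy problem (an assumption): 50 features, 2 informative
rng = np.random.default_rng(0)
true_w = np.zeros(50)
true_w[:2] = [3.0, -2.0]
X_tr = rng.normal(size=(20, 50))
y_tr = X_tr @ true_w + rng.normal(0, 1.0, size=20)
X_val = rng.normal(size=(200, 50))
y_val = X_val @ true_w + rng.normal(0, 1.0, size=200)

best_w, best_val, epochs_run = train_with_early_stopping(X_tr, y_tr, X_val, y_val)
print(f"Stopped after {epochs_run} epochs; best validation MSE = {best_val:.3f}")
```

Returning the best weights rather than the last ones is a common design choice, because the final epochs before stopping have, by definition, not improved validation error.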

How to Prevent Underfitting

Increase Model Complexity: Use a more complex algorithm (e.g., switch from linear regression to polynomial regression) or add more features.

Feature Engineering: Create new features that capture relevant patterns.

Reduce Regularization: Lower the regularization strength to allow the model to fit the data better.

Increase Training Time: Allow more iterations for algorithms that converge slowly.

Use a More Powerful Algorithm: For example, switch from logistic regression to a neural network.
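The effect of adding capacity to an underfitting model can be seen with a single engineered feature. This sketch assumes synthetic quadratic data: a plain LinearRegression on x alone cannot represent the curve, while the same estimator given an extra x² column can.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Quadratic ground truth (an assumption for illustration): y = x^2 + noise
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = x ** 2 + rng.normal(0, 0.5, size=200)

# A straight line cannot follow a parabola: low R^2 even on the training data
X_linear = x.reshape(-1, 1)
r2_linear = LinearRegression().fit(X_linear, y).score(X_linear, y)

# Adding an x^2 feature gives the same linear model enough capacity
X_quad = np.column_stack([x, x ** 2])
r2_quad = LinearRegression().fit(X_quad, y).score(X_quad, y)

print(f"Training R^2 with x only:    {r2_linear:.3f}")
print(f"Training R^2 with x and x^2: {r2_quad:.3f}")
```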

Key Values and Defaults

In scikit-learn's Ridge and Lasso, the regularization strength is controlled by the alpha parameter (default alpha=1.0). A higher alpha means stronger regularization: less risk of overfitting, but more risk of underfitting.

For decision trees in scikit-learn, max_depth defaults to None (no limit), which can lead to overfitting. Values between 3 and 10 are a common starting range when tuning max_depth.

For Random Forest, n_estimators=100 by default. More trees reduce variance but increase computation.

In neural networks, dropout rate is typically 0.2 to 0.5. A rate of 0 means no dropout.

Early stopping patience is often set to 5-10 epochs.
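Because defaults can change between releases, it is safer to read them from the installed version than to rely on memory. This sketch uses get_params(); on recent scikit-learn versions it should report the values listed above (RandomForestClassifier's n_estimators default changed from 10 to 100 in version 0.22).

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Lasso, Ridge
from sklearn.tree import DecisionTreeClassifier

# Read defaults from the installed scikit-learn rather than trusting memory
defaults = {
    "Ridge alpha": Ridge().get_params()["alpha"],
    "Lasso alpha": Lasso().get_params()["alpha"],
    "DecisionTreeClassifier max_depth": DecisionTreeClassifier().get_params()["max_depth"],
    "RandomForestClassifier n_estimators": RandomForestClassifier().get_params()["n_estimators"],
}
for name, value in defaults.items():
    print(f"{name}: {value}")
```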

Configuration and Verification Commands

In Python with scikit-learn:

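import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import Ridge, Lasso
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score, learning_curve

# Synthetic classification data so the snippet runs end to end
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Ridge regression with alpha=0.1 (L2 penalty, for regression targets)
ridge = Ridge(alpha=0.1)

# Lasso regression with alpha=0.1 (L1 penalty, can zero out coefficients)
lasso = Lasso(alpha=0.1)

# Decision tree limited to depth 5 and at least 10 samples per leaf
tree = DecisionTreeClassifier(max_depth=5, min_samples_leaf=10, random_state=0)
model = tree

# 5-fold cross-validation: mean accuracy and roughly +/- 2 standard deviations
scores = cross_val_score(model, X, y, cv=5)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

# Learning curve: training vs. cross-validation score as training size grows
train_sizes, train_scores, test_scores = learning_curve(
    model, X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 10))
plt.plot(train_sizes, train_scores.mean(axis=1), label='Training score')
plt.plot(train_sizes, test_scores.mean(axis=1), label='Cross-validation score')
plt.xlabel('Training set size')
plt.ylabel('Accuracy')
plt.legend()
plt.show()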

How It Interacts with Related Technologies

Azure Machine Learning: Automated ML applies safeguards against overfitting such as cross-validation, regularization, and limits on model complexity, and you can enable early termination of runs that stop improving. Its featurization setting can automatically impute missing values and encode categorical features, so preprocessing is applied consistently across training runs.

Azure Databricks: Use MLflow to track experiments and compare model performance. You can log parameters like regularization strength and cross-validation scores.

Hyperparameter Tuning: Tools like Azure Machine Learning's hyperparameter tuning (HyperDrive in the v1 SDK, sweep jobs in the v2 SDK) or scikit-learn's GridSearchCV help find complexity settings (e.g., alpha, max_depth) that minimize validation error.

Feature Selection: Techniques like Recursive Feature Elimination (RFE) or L1 regularization automatically reduce complexity by removing irrelevant features.
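A minimal GridSearchCV sketch for the hyperparameter tuning described above, assuming a synthetic classification dataset and an illustrative grid of tree-complexity settings; the grid values are not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)

# Candidate complexity settings (illustrative values only)
param_grid = {"max_depth": [2, 3, 5, 8, None], "min_samples_leaf": [1, 5, 20]}

search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
# Scores every combination with 5-fold CV, then refits the best one on all data
search.fit(X, y)
print("Best parameters:", search.best_params_)
print(f"Best mean CV accuracy: {search.best_score_:.3f}")
```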

Common Pitfalls

Assuming more data always solves overfitting: While more data can help, if the model is too complex, it may still overfit. Regularization is often needed.

Ignoring data leakage: Letting information from validation or test data influence training (for example, fitting a scaler on the full dataset before splitting) can artificially inflate performance estimates and mask overfitting.

Over-relying on a single metric: Accuracy can be misleading on imbalanced datasets. Use precision, recall, F1-score, or AUC-ROC.

Setting regularization too high: This can cause underfitting. Always monitor both training and validation performance.
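For the data-leakage pitfall, one common safeguard in scikit-learn is to put preprocessing inside a Pipeline so it is refit on the training folds of every split. This sketch assumes synthetic data with a StandardScaler plus LogisticRegression; any fitted preprocessing step would be handled the same way.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Leaky pattern to avoid: StandardScaler().fit_transform(X) before splitting,
# which lets validation-fold statistics influence training.
# Safer pattern: the scaler lives inside the pipeline, so cross_val_score
# refits it on the training folds only, for each split.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=5)
print(f"Mean CV accuracy: {scores.mean():.3f}")
```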

Summary

Overfitting and underfitting are common challenges in machine learning. The key is to monitor model performance on unseen data, adjust model complexity through regularization or feature engineering, and use validation techniques to ensure generalization. On the AI-900 exam, expect scenario-based questions where you must identify whether a model is overfitting or underfitting and choose the appropriate corrective action.

Walk-Through

1

Split Data into Sets

The first step is to divide the available data into training, validation, and test sets. A common split is 60% training, 20% validation, 20% test. The training set is used to fit the model. The validation set is used to tune hyperparameters (like regularization strength) and detect overfitting during development. The test set is held back until the final model is chosen to provide an unbiased estimate of generalization error. Never use the test set for hyperparameter tuning; doing so leads to overfitting to the test set itself. In k-fold cross-validation, the data is split into k folds, and the model is trained k times, each time using k-1 folds for training and 1 fold for validation. This gives a more robust estimate of performance and helps detect overfitting by showing variance across folds.
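A 60/20/20 split can be made with two calls to scikit-learn's train_test_split: first hold out the test set, then divide what remains. This sketch uses placeholder arrays of 1,000 rows; with real data you would pass your own X and y (and consider stratify=y for classification).

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1000).reshape(-1, 1)  # placeholder features: 1,000 rows
y = np.arange(1000) % 2             # placeholder labels

# Carve off the 20% test set first and leave it untouched until the very end
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Split the remaining 80% into 60% train / 20% validation (0.25 * 0.8 = 0.2)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```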

2

Train a Baseline Model

Start with a simple model, such as linear regression or a shallow decision tree. A simple baseline often has high bias (it underfits), which gives you a reference point for performance. For example, if you train a linear regression on a nonlinear dataset, both training and validation error will be high, confirming that the model is too simple. The baseline lets you measure how much improvement you gain by increasing complexity. It also serves as a sanity check: if a more complex model performs worse than the baseline, you may have an implementation bug, a data-preparation problem, or severe overfitting. In practice, the baseline should be fast to train and easy to interpret.

3

Increase Model Complexity Gradually

Systematically increase model complexity by adding features, increasing polynomial degree, or using a more powerful algorithm. For each complexity level, train the model and record both training and validation errors. For example, start with a degree-1 polynomial, then degree 2, degree 3, and so on. As complexity increases, training error will decrease. Initially, validation error will also decrease (underfitting region). At some point, validation error starts to increase even as training error continues to drop; this is the onset of overfitting. The optimal complexity is the point where validation error is lowest, just before it starts rising. Use learning curves to visualize this. In Azure Machine Learning, automated ML can try many configurations and end the experiment early when the validation score stops improving.
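The degree-by-degree procedure in this step can be sketched with a PolynomialFeatures + LinearRegression pipeline. The dataset (a noisy sine curve with 60 points, half held out for validation) and the degree range 1 to 12 are assumptions for illustration; exact numbers depend on the data, but training MSE keeps falling while validation MSE eventually rises.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic nonlinear data (an assumption for illustration)
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(x).ravel() + rng.normal(0, 0.3, size=60)
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.5, random_state=0)

results = {}
for degree in range(1, 13):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(x_train))
    val_mse = mean_squared_error(y_val, model.predict(x_val))
    results[degree] = (train_mse, val_mse)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  validation MSE={val_mse:.3f}")

# Pick the degree with the lowest validation error
best_degree = min(results, key=lambda d: results[d][1])
print("Lowest validation MSE at degree", best_degree)
```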

4

Apply Regularization

Once overfitting is detected, apply regularization to constrain the model. For linear models, use L1 (Lasso) or L2 (Ridge) regularization. For tree-based models, limit depth (`max_depth`), minimum samples per leaf (`min_samples_leaf`), or maximum features (`max_features`). For neural networks, add dropout layers or L2 weight decay. Start with a small regularization strength (e.g., `alpha=0.1` for Ridge) and increase it step by step while validation performance keeps improving; stop once it starts to get worse. Too much regularization causes underfitting (both errors high). The goal is to find the regularization parameter that minimizes validation error. In scikit-learn, `RidgeCV` and `LassoCV` can automatically search over a range of alphas using cross-validation.
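RidgeCV and LassoCV can be used roughly as follows. This sketch assumes synthetic regression data with 50 features of which 5 are informative, and a log-spaced grid of candidate alphas; passing cv=5 to RidgeCV replaces its default efficient leave-one-out scheme with 5-fold CV.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV

# Synthetic regression data: 50 features, only 5 informative
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)

alphas = np.logspace(-3, 3, 13)  # candidate strengths from 0.001 to 1000

# Both estimators pick the alpha with the best cross-validated score
ridge = RidgeCV(alphas=alphas, cv=5).fit(X, y)
lasso = LassoCV(alphas=alphas, cv=5, random_state=0).fit(X, y)

print("Ridge alpha chosen:", ridge.alpha_)
print("Lasso alpha chosen:", lasso.alpha_)
print("Lasso non-zero coefficients:", np.count_nonzero(lasso.coef_), "of", X.shape[1])
```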

5

Evaluate with Cross-Validation

Perform k-fold cross-validation (typically k=5 or 10) on the final model to get a robust estimate of generalization performance. If the cross-validation scores vary widely (e.g., 90% accuracy on one fold, 60% on another), the model may be overfitting to particular subsets of the data, which suggests that more regularization or a simpler model is needed. Also compare the mean cross-validation score to the training score: a large gap suggests overfitting. If the gap is small but both scores are low, the model is underfitting. Use the cross-validation results to select the best hyperparameters. In Azure Machine Learning, automated ML lets you set the number of cross-validation folds (the `n_cross_validations` setting).
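To compare the mean cross-validation score with the training score in one pass, scikit-learn's cross_validate accepts return_train_score=True. This sketch assumes synthetic data and an unconstrained DecisionTreeClassifier, which typically shows a large train-validation gap.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, flip_y=0.1, random_state=0)

cv_results = cross_validate(DecisionTreeClassifier(random_state=0), X, y,
                            cv=5, return_train_score=True)
train_mean = cv_results["train_score"].mean()
val_mean = cv_results["test_score"].mean()  # "test_score" = held-out fold score
val_std = cv_results["test_score"].std()    # spread across the 5 folds

print(f"Mean train accuracy:      {train_mean:.3f}")
print(f"Mean validation accuracy: {val_mean:.3f} (std across folds {val_std:.3f})")
print(f"Train-validation gap:     {train_mean - val_mean:.3f}")
```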

6

Test on Holdout Set

After finalizing the model using validation and cross-validation, evaluate it once on the held-out test set. This gives the final estimate of how the model will perform on new data. The test set should never be used for any decision-making during model development. If test performance is significantly worse than validation performance, you may have overfit to the validation set (through data leakage or excessive tuning). In that case, collect fresh data for a new test set or adopt a more rigorous validation strategy such as nested cross-validation. The test set result is what you report as the model's accuracy. For the AI-900 exam, remember that the test set is only used once, at the very end.
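Nested cross-validation wraps a tuning search inside an outer scoring loop, so the reported score never comes from data that influenced tuning. A minimal scikit-learn sketch, assuming synthetic data and a LogisticRegression whose C parameter is tuned (in scikit-learn, smaller C means stronger regularization); the grid values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

X, y = make_classification(n_samples=400, n_features=20, random_state=0)

inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)  # used for tuning C
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)  # used only for scoring

tuned_model = GridSearchCV(LogisticRegression(max_iter=1000),
                           {"C": [0.01, 0.1, 1.0, 10.0]}, cv=inner_cv)

# Each outer fold reruns the whole grid search on its own training part,
# then scores the tuned model on the outer fold it never saw
nested_scores = cross_val_score(tuned_model, X, y, cv=outer_cv)
print(f"Nested CV accuracy: {nested_scores.mean():.3f} +/- {nested_scores.std():.3f}")
```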

What This Looks Like on the Job

Enterprise Scenario 1: Credit Risk Prediction

A bank wants to build a model to predict whether a loan applicant will default. They have historical data with 100 features, including income, credit score, and debt-to-income ratio. A data scientist trains a deep neural network with three hidden layers. On the training set, accuracy is 99.9%, but on a validation set, accuracy drops to 75%. This is classic overfitting: the model memorized idiosyncrasies of the training data, and in production it would fail on new applicants. The solution: reduce model complexity by switching to a simpler model, logistic regression with L2 regularization, and perform feature selection to remove irrelevant features. They also use 5-fold cross-validation with a grid search to choose the regularization strength. After regularization, training accuracy drops to 85% but validation accuracy rises to 82%, indicating better generalization. The model is deployed with a monitoring pipeline that tracks performance drift monthly.

Enterprise Scenario 2: Image Classification for Manufacturing

A factory uses computer vision to detect defects on assembly lines. They have a small dataset of 500 labeled images. They train a convolutional neural network (CNN) with many layers. The model achieves 100% training accuracy but only 60% on test images. Overfitting occurs because the model has too many parameters relative to the number of samples. To mitigate, they apply data augmentation (random rotations, flips, brightness changes) to effectively increase dataset size to 5000 images. They also add dropout layers with rate 0.5 and use early stopping with patience of 5 epochs. Validation accuracy improves to 85%. They also use transfer learning from a pre-trained model (ResNet-50) which reduces overfitting further because the pre-trained features are already general. The final model is deployed on Azure IoT Edge devices, and performance is monitored via Azure Machine Learning's model monitoring.
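Simple image augmentations like those in this scenario can be sketched with NumPy array operations. This is an illustration only, assuming square grayscale images scaled to [0, 1] and a hypothetical augment helper; production pipelines usually rely on the augmentation layers of their deep-learning framework, and each transform must preserve the label (a rotated defect is still a defect).

```python
import numpy as np

def augment(image, rng):
    """Return label-preserving variants of one square grayscale image (H x W)."""
    return [
        np.fliplr(image),                                  # horizontal flip
        np.flipud(image),                                  # vertical flip
        np.rot90(image, k=int(rng.integers(1, 4))),        # rotate 90, 180 or 270 degrees
        np.clip(image * rng.uniform(0.8, 1.2), 0.0, 1.0),  # brightness change
    ]

rng = np.random.default_rng(0)
images = rng.random((500, 32, 32))  # stand-in for 500 labeled images
augmented = [v for img in images for v in augment(img, rng)]
print(len(images) + len(augmented))  # originals plus variants
```

Four variants per image turns 500 images into 2,500 training examples; reaching the scenario's 5,000 would require more transforms or repeated random draws.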

Enterprise Scenario 3: Predictive Maintenance

A utility company predicts equipment failure using sensor data (temperature, vibration, pressure) from turbines. The data has a time series component. They initially use a random forest with 1000 trees and no depth limit. Training accuracy is 99%, but test accuracy is 70%. Overfitting is due to the model capturing noise in the sensor readings. They tune hyperparameters: set max_depth=10, min_samples_leaf=5, and max_features='sqrt'. They also use rolling window cross-validation to respect temporal order. After tuning, test accuracy rises to 85%. They deploy the model using Azure Machine Learning pipelines and set up retraining triggers when performance degrades beyond a threshold. The key lesson: for time series, standard k-fold cross-validation can leak future information into training, causing overfitting. Always use time-based splits.
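scikit-learn's TimeSeriesSplit implements time-ordered splitting. This sketch uses 12 placeholder observations to show that every training index precedes every test index. By default the training window expands with each fold; passing max_train_size caps it, which gives a rolling window closer to the one described in this scenario.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 12 placeholder observations in time order (oldest first)
X = np.arange(12).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=4)
splits = list(tscv.split(X))
for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    # Training indices always come before test indices: no future data in training
    print(f"fold {fold}: train={train_idx.tolist()} test={test_idx.tolist()}")
```

With these settings the first fold trains on indices 0 to 3 and tests on 4 and 5, and the last fold trains on 0 to 9 and tests on 10 and 11.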

How AI-900 Actually Tests This

What AI-900 Tests on This Topic (Objective 2.1)

The AI-900 exam expects you to identify overfitting and underfitting from given performance metrics and choose appropriate mitigation techniques. Specifically, you should know:
- Objective 2.1: Describe core machine learning concepts, including overfitting and underfitting.
- Common scenario: A model has high accuracy on training data but low accuracy on test data. You must recognize this as overfitting and suggest regularization or simplifying the model, with cross-validation to confirm the diagnosis.
- Another scenario: A model has low accuracy on both training and test data. This is underfitting; you should increase model complexity or add features.

Top 3 Wrong Answers and Why Candidates Choose Them

1.

"Add more training data" for overfitting: While more data can help, it is not guaranteed to fix overfitting if the model is too complex. The exam expects you to know that regularization or reducing complexity is the direct fix. Candidates choose this because they think more data always helps, but the question often specifies that data is limited.

2.

"Use a more complex model" for overfitting: This would worsen overfitting. Candidates confuse overfitting with underfitting. They see poor test performance and think the model needs more power, but the high training accuracy indicates overfitting.

3.

"Reduce the number of features" for underfitting: Reducing features can help with overfitting, but for underfitting, the model needs more information. Candidates choose this because they associate feature reduction with improving models generally.

Specific Numbers and Terms That Appear on the Exam

The term "regularization" appears frequently as a technique to prevent overfitting.

Cross-validation is often mentioned with k=5 or k=10.

The bias-variance tradeoff is a key concept: high bias = underfitting, high variance = overfitting.

Learning curves: if training error is low and validation error is high, it's overfitting.

Decision tree hyperparameters: max_depth and min_samples_leaf are common.

Edge Cases and Exceptions

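Imbalanced datasets: A model that always predicts the majority class can score high accuracy while missing the minority class entirely. On a dataset that is 95% one class, such a model reaches 95% accuracy with zero recall on the other class. The exam might show a scenario with 95% accuracy but poor recall; recognize that accuracy is misleading here and that the model has not learned the minority class.

Time series data: Standard (shuffled) k-fold cross-validation can leak future information into training. The exam may expect you to choose time-based splitting instead.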

Small datasets: Overfitting is more likely. The exam might suggest using a simpler model or regularization.
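The imbalanced-data trap is easy to reproduce with a baseline that always predicts the majority class. This sketch assumes a synthetic label vector that is 5% positive and uses scikit-learn's DummyClassifier.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# 1,000 examples, 5% positive (synthetic labels for illustration)
y = np.array([1] * 50 + [0] * 950)
X = np.zeros((len(y), 1))  # features are irrelevant for this baseline

clf = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = clf.predict(X)  # always predicts class 0

print("Accuracy:", accuracy_score(y, y_pred))               # 0.95
print("Recall (positive class):", recall_score(y, y_pred))  # 0.0
```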

How to Eliminate Wrong Answers

If training accuracy is high and test accuracy is low, eliminate any answer that increases model complexity or removes regularization. Look for answers that say "add regularization" or "simplify model."

If both training and test accuracy are low, eliminate answers that add regularization or simplify. Look for "increase model complexity" or "add features."

If the question mentions "variance" or "sensitivity to training data," the answer likely involves reducing variance (e.g., regularization, ensemble).

If the question mentions "bias" or "systematic error," the answer likely involves reducing bias (e.g., more complex model).

Key Takeaways

Overfitting: high training accuracy, low test accuracy; caused by a model that is too complex or by too little data.

Underfitting: low training and test accuracy; caused by a model that is too simple or by regularization that is too strong.

Regularization (L1, L2) is the primary technique to prevent overfitting.

Cross-validation (k=5 or 10) helps detect overfitting by evaluating the model on multiple subsets.

Bias-variance tradeoff: high bias = underfitting, high variance = overfitting.

For decision trees, limit max_depth and min_samples_leaf to avoid overfitting.

Early stopping in iterative algorithms prevents overfitting by halting when validation error increases.

Data augmentation increases effective dataset size, reducing overfitting in image and text models.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

L1 Regularization (Lasso)

Adds penalty equal to absolute value of coefficients.

Can shrink some coefficients exactly to zero, performing feature selection.

Useful when you suspect many features are irrelevant.

Loss function: MSE + λ * Σ|coefficient|.

The penalty is not differentiable at zero, so Lasso is typically solved with coordinate descent (the solver scikit-learn's Lasso uses).

L2 Regularization (Ridge)

Adds penalty equal to square of coefficients.

Shrinks coefficients toward zero but, in general, not exactly to zero.

Useful when all features are relevant but need to control magnitude.

Loss function: MSE + λ * Σ(coefficient^2).

The loss is differentiable and has a closed-form solution: w = (XᵀX + λI)⁻¹Xᵀy (the normal equations with an added λI term).
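The difference between the two penalties is visible in the fitted coefficients. This sketch assumes synthetic data with 20 features, 3 of them informative, and illustrative alpha values; with different data or alphas the exact counts will differ.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# 20 features, only 3 informative (synthetic, for illustration)
X, y = make_regression(n_samples=200, n_features=20, n_informative=3,
                       noise=5.0, random_state=0)

ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

# L2 shrinks coefficients but typically leaves all of them non-zero;
# L1 drives many uninformative coefficients exactly to zero
print("Ridge zero coefficients:", int(np.sum(ridge.coef_ == 0)))
print("Lasso zero coefficients:", int(np.sum(lasso.coef_ == 0)))
```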

Watch Out for These

Mistake

Overfitting only happens with complex models like neural networks.

Correct

Overfitting can occur with any model, including linear regression, if you include too many polynomial features or irrelevant predictors. The key is the ratio of model capacity to data size.

Mistake

Adding more training data always solves overfitting.

Correct

More data can help, but if the model is excessively complex, it may still overfit. Regularization is often required. Also, if the new data is not representative, it may not help.

Mistake

Underfitting is rare because we can always make models more complex.

Correct

Underfitting occurs when the model is too simple to capture the underlying pattern. It is common when using linear models on nonlinear data or when regularization is too strong.

Mistake

A model with 100% training accuracy is always overfitted.

Correct

Not necessarily. If the data is noise-free and the model is perfectly suited, 100% training accuracy is possible. However, it is suspicious in real-world noisy data and usually indicates overfitting.

Mistake

Cross-validation prevents overfitting.

Correct

Cross-validation helps detect overfitting by evaluating on multiple validation sets, but it does not prevent it. You still need to use regularization or simplify the model based on cross-validation results.


Frequently Asked Questions

How do I know if my model is overfitting or underfitting?

Compare training and validation performance. If training accuracy is high but validation accuracy is low, it's overfitting. If both are low, it's underfitting. Plot learning curves: if the training error is much lower than validation error, overfitting; if both are high and close, underfitting.

What is the difference between L1 and L2 regularization?

L1 (Lasso) adds penalty equal to absolute value of coefficients, can shrink some to zero (feature selection). L2 (Ridge) adds penalty equal to square of coefficients, shrinks them but not to zero. L1 is useful when many features are irrelevant; L2 when all features are relevant.

Can cross-validation prevent overfitting?

Cross-validation helps detect overfitting by evaluating on multiple validation folds, but it does not prevent it. You must still apply regularization or reduce complexity based on CV results.

What is the bias-variance tradeoff?

Bias is error from overly simplistic assumptions (underfitting). Variance is error from sensitivity to training data (overfitting). Increasing model complexity reduces bias but increases variance. The goal is to find the optimal complexity that minimizes total error.

How does early stopping work?

During iterative training (e.g., gradient descent), monitor validation error after each epoch. If validation error stops improving for a set number of epochs (patience), stop training. This prevents the model from overfitting to training data.

What is data augmentation and how does it help overfitting?

Data augmentation creates modified copies of training data (e.g., rotated images, synonym replacement in text). It effectively increases dataset size, reducing overfitting by exposing the model to more variations.

Should I use a simpler model if I have little data?

Yes. With small datasets, complex models are prone to overfitting. Use a simpler model (e.g., logistic regression instead of a neural network) and apply regularization. Alternatively, use transfer learning.
