AI-900 | Chapter 9 of 100 | Objective 2.4

Automated ML (AutoML)

This chapter covers Automated Machine Learning (AutoML) in Azure Machine Learning, a key topic in the Machine Learning domain of the AI-900 exam, objective 2.4 (Identify automated machine learning solutions). AutoML automates the process of selecting algorithms, preprocessing data, and tuning hyperparameters to produce a high-quality model with minimal human intervention. Approximately 5-10% of AI-900 exam questions touch this area, focusing on when to use AutoML, its benefits, and how it fits into the ML lifecycle.

25 min read
Intermediate
Updated May 31, 2026

AutoML as a Master Chef with a Recipe Lab

Imagine you're a master chef trying to create the perfect chocolate chip cookie recipe. You have dozens of ingredients (flour, sugar, butter, eggs, vanilla, etc.) and countless proportions to try. Manually, you'd mix a batch, bake it, taste it, adjust, and repeat—taking days. Instead, you hire a team of sous-chefs (the training jobs) who each try a different combination: one uses 2 cups flour, another 3 cups; one bakes at 350°F, another at 375°F. They all bake simultaneously. Meanwhile, a head chef (the automated ML orchestrator) monitors their progress, notes which cookies taste best, and adjusts future experiments—for example, if high butter yields crispier cookies, she tells more sous-chefs to try high butter. After many rounds, she selects the best recipe. She also ensures no two sous-chefs waste time on exactly the same recipe (parallelism control), and she stops any that are clearly failing early (early termination). The final recipe is the best model. This mirrors AutoML: it parallelizes training, tunes hyperparameters via Bayesian optimization or grid search, and uses early termination to save compute.

How It Actually Works

What is Automated ML and Why Does It Exist?

Automated Machine Learning (AutoML) is the process of automating the time-consuming, iterative tasks of machine learning model development. In traditional ML, a data scientist must manually select an algorithm (e.g., decision tree, logistic regression, neural network), preprocess data (handle missing values, scale features), engineer features, and tune hyperparameters. This is a trial-and-error process that can take days or weeks. AutoML automates these steps, enabling non-experts to build effective models and allowing experts to focus on higher-level tasks.

For AI-900, you need to know that AutoML is part of Azure Machine Learning and can be used for classification, regression, and time-series forecasting. It automatically tries multiple algorithms and preprocessing steps, and produces a final model along with metrics and explanations.

How AutoML Works Internally

AutoML operates through a series of orchestrated steps:

1. Data Input: You provide a tabular dataset (CSV, Parquet, or from Azure SQL, etc.) and specify the target column (the column to predict). The data must be labeled (supervised learning).

2. Task Type Selection: You specify the problem type: classification, regression, or time-series forecasting. As a rule of thumb, a categorical target calls for classification and a numeric target calls for regression.

3. Featurization: AutoML automatically applies data transformations to prepare the data. This includes:

- Handling missing values (e.g., mean imputation for numeric, most frequent for categorical)
- Encoding categorical features (one-hot encoding, label encoding)
- Scaling numeric features (standardization, normalization)
- Text feature extraction (TF-IDF, bag-of-words) if text columns exist
- Feature engineering (e.g., creating date parts from datetime columns)

You can also provide custom featurization settings.

4. Algorithm Selection: AutoML selects from a curated list of algorithms based on the task type. For classification, it includes logistic regression, decision tree, random forest, gradient boosting (LightGBM, XGBoost), SVM, and neural networks. For regression, it uses similar algorithms plus linear regression. For forecasting, it includes ARIMA, Exponential Smoothing, Prophet, and gradient boosting with time-series features.

5. Hyperparameter Tuning and Search Space: AutoML uses Bayesian optimization to search over hyperparameters. It defines a search space for each algorithm (e.g., learning rate in [0.001, 0.1], number of estimators in [10, 500]). The Bayesian optimizer uses past runs to choose the next set of hyperparameters that are likely to improve performance.

6. Training and Evaluation: AutoML runs many training jobs (trials), optionally in parallel. Parallelism is controlled by max_concurrent_iterations (max_concurrent_trials in SDK v2), which defaults to 1, so raise it to match your cluster size. Each job trains a model with a specific algorithm and hyperparameter set. Models are evaluated using cross-validation or a validation split (if you provide a separate validation dataset). The primary metric is chosen based on task: accuracy, AUC_weighted, average_precision_score_weighted, norm_macro_recall, or precision_score_weighted for classification; normalized_root_mean_squared_error, normalized_mean_absolute_error, r2_score, or spearman_correlation for regression.

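7. Early Termination: To save compute, AutoML can end a run early. For tabular tasks this is a single on/off setting (enable_early_stopping in SDK v1, enable_early_termination in SDK v2) that stops the experiment when the score is no longer improving. Named policies such as BanditPolicy, MedianStoppingPolicy, and TruncationSelectionPolicy are early termination policies for hyperparameter sweep jobs in Azure Machine Learning. As an example of how they work, a BanditPolicy with a slack factor of 0.2 stops any run whose metric falls below best / 1.2 when maximizing, which is roughly 17% below the best run.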

8. Model Ensemble: At the end, AutoML optionally creates an ensemble model that combines multiple top-performing models via voting or stacking. This often improves performance.

9. Output: The result is the best model (or ensemble) along with metrics, feature importance, and explanations. You can then deploy it as a real-time endpoint or batch inference pipeline.
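The loop at the heart of the steps above can be sketched in plain Python. This is a toy illustration, not Azure's implementation: the two candidate "algorithms" (a majority-class predictor and a one-feature threshold rule), the dataset, and every function name are assumptions invented for the example. It shows only the pattern of scoring each candidate with k-fold cross-validation on the primary metric and keeping the best one.

```python
from statistics import mean

def k_fold_indices(n_rows, k):
    """Yield (train_idx, valid_idx) pairs for k roughly equal folds."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for i in range(k):
        valid = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, valid

def fit_majority(X, y):
    """Candidate 1: always predict the most common training label."""
    label = max(set(y), key=y.count)
    return lambda row: label

def fit_threshold(X, y):
    """Candidate 2: predict 1 when feature 0 exceeds the best training cutoff."""
    best_cut, best_acc = None, -1.0
    for cut in sorted({row[0] for row in X}):
        hits = [1.0 if (row[0] > cut) == bool(label) else 0.0
                for row, label in zip(X, y)]
        if mean(hits) > best_acc:
            best_cut, best_acc = cut, mean(hits)
    return lambda row: int(row[0] > best_cut)

def cv_accuracy(fit, X, y, k=3):
    """Primary metric: mean accuracy across the k validation folds."""
    scores = []
    for train, valid in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(mean(1.0 if model(X[i]) == y[i] else 0.0 for i in valid))
    return mean(scores)

def select_best(candidates, X, y, k=3):
    """Score every candidate and return (name, score) of the top scorer."""
    results = {name: cv_accuracy(fit, X, y, k) for name, fit in candidates.items()}
    best = max(results, key=results.get)
    return best, results[best]

# Tiny made-up dataset: the label is 1 when the single feature is large.
X = [[v] for v in range(1, 13)]
y = [0] * 6 + [1] * 6
candidates = {"majority": fit_majority, "threshold": fit_threshold}
best_name, best_score = select_best(candidates, X, y)
print(best_name, round(best_score, 2))
```

A real AutoML run adds featurization, a much larger set of algorithms, and a smarter search strategy than trying every candidate once, but the select-by-validated-metric idea is the same.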

Key Components, Values, Defaults, and Timers

Primary Metric: The metric to optimize. Defaults: accuracy for classification, and normalized_root_mean_squared_error for regression and forecasting. Alternatives include AUC_weighted, average_precision_score_weighted, norm_macro_recall, and precision_score_weighted for classification, and r2_score, normalized_mean_absolute_error, and spearman_correlation for regression.

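Max Total Iterations: The total number of training jobs (iterations in SDK v1, max_trials in SDK v2). Default: 1000 if not set, so choose a lower value to bound cost.

Max Concurrent Iterations: Number of jobs running in parallel (max_concurrent_trials in SDK v2). Default: 1. Set it no higher than the number of nodes in your compute cluster.

Experiment Timeout: Maximum time for the entire AutoML run. Default: 6 days if not set.

Iteration Timeout (minutes): Maximum time for a single iteration. Default: 43200 (1 month) if not set, so it is worth setting explicitly.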

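Early Termination Policy: For tabular AutoML, early termination is an on/off setting (enable_early_stopping in SDK v1, enable_early_termination in SDK v2) that ends the experiment when the score stops improving. Hyperparameter sweep jobs offer named policies: BanditPolicy (configured with a slack factor or slack amount, an evaluation_interval, and a delay_evaluation), MedianStoppingPolicy (stops a run whose primary metric is worse than the median across runs), and TruncationSelectionPolicy (stops the bottom X% of runs).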

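Cross-validation: Number of folds for cross-validation (n_cross_validations). If you set neither a fold count nor validation data, AutoML picks a default from the dataset size: 10 folds below 1,000 rows, 3 folds from 1,000 to 20,000 rows, and a 10% holdout split above 20,000 rows. If you provide a validation dataset, cross-validation is not used.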

Featurization: auto (default) or off. You can also specify custom featurization settings.

Allowed Models: List of algorithms to include. You can restrict to specific ones (e.g., only LightGBM and RandomForest) to reduce search space.

Blocked Models: Exclude specific algorithms.
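The bandit rule named in the Early Termination Policy entry above can be sketched as a single comparison. This is an illustration for a metric that is being maximized, assuming the cutoff is best / (1 + slack_factor); the function name and the sample scores are hypothetical.

```python
def bandit_should_stop(run_metric, best_metric, slack_factor=0.2):
    """Return True when a run falls outside the allowed slack of the best run.

    Assumes the primary metric is being maximized.
    """
    cutoff = best_metric / (1 + slack_factor)
    return run_metric < cutoff

best = 0.80  # best primary metric reported so far (made-up value)
for name, score in [("run_a", 0.78), ("run_b", 0.70), ("run_c", 0.60)]:
    action = "stop" if bandit_should_stop(score, best) else "keep"
    print(f"{name}: {score:.2f} -> {action}")
```

With a slack factor of 0.2 the cutoff works out to best / 1.2, roughly 17% below the best score, so only run_c is stopped in this example.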

Configuration and Verification Commands

AutoML is configured via the Azure Machine Learning Python SDK, Azure CLI, or Azure Machine Learning Studio UI. The SDK approach is most flexible.

Python SDK Example (v2):

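from azure.ai.ml import MLClient, Input, automl
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

# Connect to the workspace (replace the placeholders with your own values)
ml_client = MLClient(
    DefaultAzureCredential(), "<subscription-id>", "<resource-group>", "<workspace-name>"
)

# SDK v2 AutoML expects tabular training data as an MLTable folder (placeholder path)
training_data = Input(type=AssetTypes.MLTABLE, path="./training-mltable-folder")

# Create AutoML classification job
training_job = automl.classification(
    training_data=training_data,
    target_column_name="target",
    primary_metric="accuracy",
    compute="cpu-cluster",
    experiment_name="my-automl-experiment",
)

# In SDK v2, limits are set on the job rather than passed to the factory function
training_job.set_limits(max_trials=100, max_concurrent_trials=4, enable_early_termination=True)

# Submit job
returned_job = ml_client.jobs.create_or_update(training_job)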

Azure CLI:

az ml job create --file automl-job.yml

Verification (check status):

# After submission
job = ml_client.jobs.get("my-automl-job-name")
print(job.status)  # Running, Completed, Failed

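To view results: open the job in Azure Machine Learning studio (the link is available as returned_job.studio_url) to compare the trials, their metrics, and the best model.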
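A single status check is rarely enough, so a small polling loop is a common pattern. The sketch below is generic Python rather than an Azure API: `get_status` is a hypothetical callable standing in for something like `lambda: ml_client.jobs.get(name).status`, and the set of terminal states is an assumption based on the statuses listed above plus `Canceled`.

```python
import time

TERMINAL_STATES = {"Completed", "Failed", "Canceled"}  # assumed terminal statuses

def wait_for_job(get_status, poll_seconds=30.0, max_polls=120, sleep=time.sleep):
    """Poll get_status() until a terminal state is seen or max_polls is reached."""
    status = get_status()
    polls = 1
    while status not in TERMINAL_STATES and polls < max_polls:
        sleep(poll_seconds)
        status = get_status()
        polls += 1
    return status

# Stand-in for a real job: reports Running twice, then Completed.
fake_statuses = iter(["Running", "Running", "Completed"])
final = wait_for_job(lambda: next(fake_statuses), poll_seconds=0.0)
print(final)
```

Passing `sleep` and `get_status` as parameters keeps the loop easy to test without a workspace. The SDK also provides `ml_client.jobs.stream(name)`, which blocks while streaming the job's logs.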

How AutoML Interacts with Related Technologies

Azure Machine Learning Compute: AutoML runs on a compute cluster (CPU or GPU). You must specify a compute target. For large datasets, GPU can speed up neural network training.

Azure Machine Learning Pipelines: AutoML can be a step in a larger pipeline, e.g., after data preprocessing, before model deployment.

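Azure Machine Learning Data Assets: Input data must be registered as a data asset or provided as a path. In SDK v2, tabular training data for AutoML is passed as an MLTable (a folder containing an MLTable definition file).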

Model Registry: The best model can be automatically registered in the model registry for versioning.

ONNX: The best model can be converted to ONNX format for deployment on edge devices.

Explainability: AutoML can produce feature importance explanations using SHAP or MimicExplainer.

Azure DevOps: AutoML can be integrated into CI/CD pipelines for retraining models with fresh data.

Common Pitfalls and Exam Traps

Data size: AutoML works best with datasets of at least 1000 rows. Small datasets may lead to overfitting.

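Imbalanced classes: AutoML flags class imbalance through its data guardrails and can apply sample weighting, but resampling is left to you. If you want oversampling (such as SMOTE) or downsampling, apply it before training, and pick a primary metric such as AUC_weighted that is less sensitive to imbalance.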

Time-series forecasting: Must specify time column, time series ID (for multiple series), and forecast horizon. AutoML automatically creates lag and rolling window features.

Ensemble vs. single model: The ensemble model is often better but more complex. The exam may ask when to use ensemble.

Cost: More iterations and concurrent runs increase cost. The exam may ask about trade-offs.

Data leakage: Ensure your training data does not include future information (especially in forecasting). AutoML cannot detect data leakage; it's your responsibility.

Walk-Through

1. Prepare and Register Dataset

First, you must have your training data in a tabular format (CSV, Parquet, etc.) and register it as an Azure ML data asset. Use the Azure ML Studio UI or SDK. In SDK v2, AutoML expects an MLTable folder: `from azure.ai.ml.entities import Data; from azure.ai.ml.constants import AssetTypes; data = Data(name='training-data', path='./training-mltable-folder', type=AssetTypes.MLTABLE); ml_client.data.create_or_update(data)` (the name and path are placeholders). Ensure the data includes the target column and no missing values in the target. For time-series, include a timestamp column.

2. Create Compute Cluster

AutoML requires a compute target for training. Create a CPU or GPU cluster via Azure ML Studio or SDK. Example: `from azure.ai.ml.entities import AmlCompute; cluster = AmlCompute(name='cpu-cluster', size='Standard_DS3_v2', min_instances=0, max_instances=10); ml_client.compute.begin_create_or_update(cluster)`. The cluster scales automatically to handle concurrent iterations.

3. Configure AutoML Job

Define the AutoML job using the SDK or CLI. Specify the task type (classification, regression, forecasting), target column, primary metric, and limits such as the maximum number of trials and concurrent trials. Optionally set allowed models, featurization settings, and early termination. In SDK v2 the limits are set on the job object rather than passed to the factory function. Example: `job = automl.classification(training_data=my_data, target_column_name='target', primary_metric='accuracy', compute='cpu-cluster')`, then `job.set_limits(max_trials=50, max_concurrent_trials=4, enable_early_termination=True)`.

4. Submit and Monitor Run

Submit the job via `ml_client.jobs.create_or_update(job)`. Monitor progress in Azure ML Studio under Jobs. You can see which algorithms are being tried, their metrics, and duration. Use `job.status` to check if running. The run may take minutes to hours depending on data size and iterations.

5. Review Results and Deploy

Once completed, retrieve the best model: `best_model = job.outputs.best_model`. View metrics and explanations. Optionally register the model by wrapping the best model's path in a `Model` entity and calling `ml_client.models.create_or_update(model)`. Deploy as a real-time endpoint using `ml_client.online_endpoints.begin_create_or_update(endpoint)` or use batch inference. The exam may ask about the deployment step.

What This Looks Like on the Job

Enterprise Scenario 1: Churn Prediction for Telecom Company

A telecom company wants to predict customer churn (binary classification) using historical data with 100,000 rows and 50 features. The data includes demographics, usage patterns, and customer service interactions. The data science team is small and lacks deep ML expertise. They use AutoML in Azure ML with max_total_iterations=100, max_concurrent_iterations=30, and primary_metric='AUC_weighted'. The AutoML run completes in 2 hours on a Standard_DS3_v2 cluster. The best model is a LightGBM with AUC of 0.92, outperforming their manual logistic regression (0.85). They deploy the model as a real-time endpoint for monthly scoring. A common issue: initially they set max_concurrent_iterations too high (50), causing resource contention and a slower overall run; they reduced it to 30. Another issue: they left early termination off, so many poor runs ran to completion, wasting compute.

Enterprise Scenario 2: Demand Forecasting for Retail

A retail chain needs to forecast daily sales for 1000 products across 50 stores for the next 30 days. They use AutoML for time-series forecasting. They provide data with columns: date, store_id, product_id, sales (target). They set forecast_horizon=30, time_column_name='date', time_series_id_column_names=['store_id','product_id']. AutoML automatically creates features like lag (1,2,7,30 days), rolling windows (7-day mean), and seasonal indicators. The run uses 200 iterations and completes in 4 hours. The best model is a Prophet ensemble with normalized RMSE of 0.12. Production: they retrain weekly with new data. A pitfall: they initially did not set the time_series_id_column_names correctly, causing all series to be treated as one, resulting in poor forecasts. After correction, accuracy improved significantly.
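To make the reported metric concrete, here is a minimal sketch of normalized RMSE, assuming the common definition of RMSE divided by the range of the actual values (max minus min). The sales figures are made up for illustration and do not come from the scenario.

```python
import math

def normalized_rmse(actual, predicted):
    """RMSE divided by the range (max - min) of the actual values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    spread = max(actual) - min(actual)
    if spread == 0:
        raise ValueError("actual values have zero range")
    return math.sqrt(mse) / spread

actual = [100.0, 120.0, 80.0, 150.0, 50.0]     # made-up daily sales
predicted = [110.0, 110.0, 90.0, 140.0, 60.0]  # made-up forecasts
print(round(normalized_rmse(actual, predicted), 3))  # 0.1
```

Every forecast here is off by 10 units and the actual values span 100 units, so the normalized RMSE is 0.1. Normalizing makes scores comparable across products that sell at very different volumes.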

Enterprise Scenario 3: Credit Risk Assessment

A bank wants to classify loan applications as high or low risk. They have 50,000 records with imbalanced classes (5% high risk). They use AutoML with the 'classification' task and set primary_metric='AUC_weighted' to cope with the imbalance. Because AutoML flags imbalance but does not resample the data itself, they apply SMOTE oversampling in a preprocessing step before training. The best model is a gradient boosting classifier with AUC of 0.97. They deploy the model as a batch inference pipeline to score daily applications. A common misconfiguration: they initially used 'accuracy' as the primary metric, leading to a model that always predicts 'low risk' (95% accuracy but useless). Switching to AUC_weighted fixed this.
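The accuracy trap in this scenario is easy to verify with arithmetic. The sketch below uses made-up counts that match the 5% high-risk rate (50 of 1,000 applications) and scores a model that always predicts low risk.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model caught."""
    positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in positives) / len(positives)

# 1 = high risk, 0 = low risk; 5% of 1,000 applications are high risk.
y_true = [1] * 50 + [0] * 950
always_low = [0] * 1000

print(accuracy(y_true, always_low))  # 0.95
print(recall(y_true, always_low))    # 0.0
```

The model looks strong on accuracy while catching none of the high-risk applications, which is why a metric that accounts for both classes is the better optimization target here.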

How AI-900 Actually Tests This

AI-900 Exam Focus on AutoML

Objective 2.4: Identify automated machine learning solutions. The exam tests your ability to understand what AutoML is, when to use it, and its benefits. Specific sub-objectives include:

Recognize scenarios where AutoML is appropriate (e.g., when you have labeled data but lack ML expertise, or when you need to quickly prototype).

Understand that AutoML automates algorithm selection, hyperparameter tuning, and data preprocessing.

Know that AutoML supports classification, regression, and time-series forecasting.

Understand the concept of primary metric and that you can choose based on business need.

Know that AutoML can produce an ensemble model.

Common Wrong Answers and Why

1. "AutoML eliminates the need for data preparation" – Wrong. Data must still be cleaned, missing values handled (though AutoML can impute), and features selected. AutoML automates featurization but not data collection or cleaning.

2. "AutoML only works with deep learning" – Wrong. AutoML includes classical algorithms like logistic regression, decision trees, and gradient boosting, not just neural networks.

3. "AutoML always produces the best possible model" – Wrong. It produces a good model within constraints, but may not be optimal if search space is limited or data is poor.

4. "AutoML can be used for unsupervised learning" – Wrong. AutoML requires labeled data (supervised learning).

Specific Numbers and Terms on the Exam

Default primary metric for classification: accuracy; for regression: normalized_root_mean_squared_error.

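Default max iterations: 1000. Default max concurrent iterations: 1.

Early termination: an on/off setting for tabular AutoML. BanditPolicy (with a slack factor such as 0.2) is an early termination policy for hyperparameter sweep jobs.

Cross-validation folds: chosen from dataset size when not set (10 folds below 1,000 rows, 3 folds from 1,000 to 20,000 rows).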

The term "featurization" is used for automated data preprocessing.

The term "ensemble" for combining models.

Edge Cases and Exceptions

Small datasets: AutoML may overfit. Use cross-validation and limit iterations.

Missing values in target column: AutoML will fail. Ensure target column has no missing values.

Time-series forecasting: Must specify the time column and forecast horizon, plus the time series ID columns when the data contains multiple series. AutoML cannot infer these automatically.

Categorical features with high cardinality: AutoML may struggle; consider grouping rare categories.

How to Eliminate Wrong Answers

If the question asks about automating model selection and tuning, look for AutoML.

If the question mentions "no machine learning experience" or "quickly build a model", AutoML is likely the answer.

If the question involves unsupervised learning (clustering without labels), AutoML is not appropriate.

If the question emphasizes manual control over every step, AutoML is not the answer.

Key Takeaways

AutoML automates algorithm selection, hyperparameter tuning, and data preprocessing for supervised learning tasks.

It supports classification, regression, and time-series forecasting.

Default max iterations is 1000; default max concurrent iterations is 1. Set both explicitly to control cost and run time.

Default primary metric for classification is accuracy; for regression is normalized_root_mean_squared_error.

Early termination for tabular AutoML is an on/off setting; BanditPolicy with a slack factor is an early termination policy for hyperparameter sweep jobs.

AutoML requires labeled data and a compute target (CPU/GPU cluster).

The best model can be an ensemble of multiple models for improved performance.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

AutoML

Automates algorithm selection, hyperparameter tuning, and featurization

Requires minimal ML expertise to use

Faster time to model for standard problems

Less control over individual steps

Best for rapid prototyping and non-experts

Manual ML

Requires manual algorithm selection, hyperparameter tuning, and feature engineering

Requires deep ML expertise

Slower, but allows fine-grained control

Full control over every step

Best for research and custom architectures

AutoML Classification

Predicts categorical outcomes (e.g., yes/no, red/green/blue)

Primary metric: accuracy, AUC, F1, etc.

Algorithms include logistic regression, decision tree, random forest, gradient boosting, SVM

Flags imbalanced classes and can apply sample weighting; resampling such as SMOTE is done before training

Cross-validation typically uses stratified folds

AutoML Regression

Predicts continuous numerical outcomes (e.g., price, temperature)

Primary metric: RMSE, MAE, R2, normalized RMSE

Algorithms include linear regression, elastic net, gradient boosting, random forest, neural network

No special handling for imbalance

Cross-validation uses standard k-fold

Watch Out for These

Mistake

AutoML can work with any type of data, including images and text.

Correct

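AutoML's classification, regression, and forecasting tasks work on tabular (structured) data and can featurize text columns within a table, but they do not handle raw images. Images and free text need different tooling: Azure Machine Learning has separate AutoML task types for computer vision and NLP, and Azure also offers services such as Custom Vision and Text Analytics.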

Mistake

AutoML always chooses the most complex model like deep neural networks.

Correct

AutoML evaluates many algorithms, including simple ones like logistic regression. It selects the best based on performance, not complexity. Often simpler models perform well.

Mistake

AutoML requires you to specify the algorithm to use.

Correct

AutoML automatically tries multiple algorithms. You can restrict allowed models, but it is not required. The default is to try all applicable algorithms.

Mistake

AutoML only works on small datasets (less than 1000 rows).

Correct

AutoML works on datasets of any size, but performs best with at least 1000 rows. Very small datasets may lead to overfitting or poor model selection.

Mistake

AutoML cannot handle missing values in the data.

Correct

AutoML automatically imputes missing values using strategies like mean, median, or most frequent. You can also provide custom imputation. It does not fail on missing data.
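A minimal sketch of two of the imputation strategies named above, in plain Python. It illustrates the idea only and is not AutoML's implementation; the column values and function names are invented for the example, and `None` stands in for a missing value.

```python
from collections import Counter
from statistics import mean

def impute_mean(values):
    """Replace None in a numeric column with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def impute_most_frequent(values):
    """Replace None in a categorical column with the most common observed value."""
    observed = [v for v in values if v is not None]
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]

ages = [30, None, 50, 40]
plans = ["basic", "premium", None, "basic"]
print(impute_mean(ages))            # [30, 40, 50, 40]
print(impute_most_frequent(plans))  # ['basic', 'premium', 'basic', 'basic']
```

Imputation fills gaps in feature columns only. Rows with a missing target value still need to be removed or fixed before training.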


Frequently Asked Questions

What is the difference between AutoML and hyperparameter tuning?

Hyperparameter tuning is a part of AutoML. AutoML automates the entire pipeline: data preprocessing (featurization), algorithm selection, and hyperparameter tuning. Hyperparameter tuning alone only adjusts parameters of a fixed algorithm. AutoML tries multiple algorithms as well, making it more comprehensive.

Can AutoML handle time-series forecasting?

Yes, AutoML has built-in time-series forecasting capabilities. You must specify the time column, time series ID (if multiple series), and forecast horizon. AutoML automatically creates lag features, rolling window aggregates, and seasonal indicators. It includes algorithms like ARIMA, Prophet, and gradient boosting.
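Lag and rolling-window features are simple to picture in code. The sketch below is plain Python for a single series, with invented sales numbers and function names; it is not how AutoML computes them internally. The rolling mean here uses only values before the current day, which avoids leaking the value being predicted.

```python
def lag_feature(series, lag):
    """Value from `lag` steps earlier; None where history is too short."""
    return [series[i - lag] if i >= lag else None for i in range(len(series))]

def rolling_mean(series, window):
    """Mean of the previous `window` values, excluding the current one."""
    out = []
    for i in range(len(series)):
        if i < window:
            out.append(None)  # not enough history yet
        else:
            past = series[i - window:i]
            out.append(sum(past) / window)
    return out

sales = [10, 12, 11, 15, 14, 18]  # made-up daily sales for one product
print(lag_feature(sales, 1))      # [None, 10, 12, 11, 15, 14]
print(rolling_mean(sales, 3))
```

With multiple series, the same features are computed separately per time series ID so that one product's history never feeds another product's features.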

How long does an AutoML run typically take?

It depends on data size, number of iterations, and compute resources. A typical run with 100 iterations on a medium-sized dataset (10k rows, 20 features) may take 1-3 hours on a Standard_DS3_v2 cluster. GPU clusters can be faster for neural networks. You can set a timeout to limit duration.

Do I need to clean my data before using AutoML?

AutoML can handle missing values and scale features, but you should still clean data: remove duplicates, correct errors, and ensure target column has no missing values. AutoML cannot fix fundamentally bad data. Data cleaning is still your responsibility.

What is the ensemble model in AutoML?

An ensemble model combines multiple top-performing models (e.g., via voting or stacking) to produce a single prediction. AutoML can automatically create an ensemble at the end of the run, which often yields better accuracy than any single model. It is optional and can be disabled.
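Voting is the simpler of the two combination methods to sketch. The example below averages class probabilities from three hypothetical models (soft voting); the model outputs and equal weights are assumptions for illustration, and AutoML's own ensembling may choose members and weights differently.

```python
def soft_vote(probabilities, weights=None):
    """Average per-class probabilities across models and pick the top class."""
    n_models = len(probabilities)
    weights = weights or [1.0] * n_models
    total = sum(weights)
    classes = probabilities[0].keys()
    combined = {
        c: sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in classes
    }
    winner = max(combined, key=combined.get)
    return winner, combined

# Predicted probabilities for one customer from three made-up models.
model_outputs = [
    {"churn": 0.60, "stay": 0.40},
    {"churn": 0.45, "stay": 0.55},
    {"churn": 0.70, "stay": 0.30},
]
label, scores = soft_vote(model_outputs)
print(label, {c: round(s, 3) for c, s in scores.items()})
```

One of the three models leans toward "stay", but the averaged probability still favors "churn". Stacking goes a step further by training a second model on the base models' predictions.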

Can I use AutoML for deep learning?

AutoML can include neural network models for tabular data, but its tabular tasks are not designed for deep learning on raw images or text. For image classification, use Azure Custom Vision or the AutoML computer vision tasks in Azure Machine Learning. For text, use Azure Text Analytics or the AutoML NLP tasks.

What happens if I set max_total_iterations too low?

If max_total_iterations is too low (e.g., 5), AutoML may not explore enough algorithms or hyperparameters, resulting in a suboptimal model. A budget of around 100 iterations is a reasonable starting point. For large datasets, you may need more iterations to converge.
