AI-900 · Chapter 92 of 100 · Objective 2.4

MLOps Concepts: Model Registry and Monitoring

AI-900 Domain 2 (Machine Learning), Objective 2.4 (Describe MLOps) covers two core MLOps concepts: Model Registry and Monitoring. These topics appear in approximately 5-10% of exam questions. Understanding how to version, track, and monitor models in production is essential for operationalizing AI. This chapter explains the mechanisms, components, and best practices for model registry and monitoring in Azure Machine Learning.

25 min read

Intermediate

Updated Jul 20, 2026

Reviewed by Johnson Ajibi· Senior Network & Security Engineer · MSc IT Security

The Library Card Catalog and Returns Desk

How does a large public library manage thousands of books (machine learning models)? The library has a card catalog system — this is the Model Registry. Each book (model) gets a unique catalog entry with its title, author, version, publication date, and location on the shelf. Librarians (data scientists) can search the catalog to find a specific edition (model version) and check out a copy for use. When a book is returned (model is deployed and monitored), the librarian inspects it for damage or missing pages (model performance metrics) and records the condition in the catalog. If a book is damaged (model drift), the librarian either repairs it (retrain) or removes it from circulation (archive). The returns desk is the Monitoring system: it tracks how often each book is borrowed (inference requests), how long it takes to process each checkout (latency), and whether any pages are torn (prediction errors). If a book is returned with a torn page (concept drift), the librarian flags it and sends it to the bindery (retraining pipeline). This entire system ensures that only the best, most up-to-date books are available to patrons, and that any problems are caught early before they affect readers.

How It Actually Works

What is MLOps and Why Does It Matter?

MLOps (Machine Learning Operations) is the practice of applying DevOps principles to machine learning workflows. It aims to streamline the lifecycle of ML models from development to production, ensuring reliability, scalability, and governance. The AI-900 exam tests your understanding of two key MLOps components: Model Registry and Monitoring. These are essential for managing model versions, tracking performance, and detecting issues like data drift and model decay.

Model Registry: Centralized Model Management

A Model Registry is a central repository that stores, versions, and manages machine learning models. In Azure Machine Learning, the Model Registry is part of the workspace and allows you to register models with metadata, tags, and descriptions. Each registered model gets a unique ID and version number. The registry supports:

Versioning: Every time you register a model with the same name, Azure Machine Learning increments the version number automatically. You can also specify a version manually.

Metadata: You can add custom tags (e.g., 'accuracy': '0.95', 'dataset': 'sales_2023') and descriptions to help organize models.

Lifecycle management: Models can be archived or promoted to different stages (e.g., 'Staging', 'Production') using tags or Azure ML pipelines.

Search and discovery: You can search for models by name, tag, or description using the Azure ML SDK, CLI, or studio UI.

How Model Registry Works Internally

When you register a model using the Azure ML SDK, the following steps occur:

1. The model file (e.g., .pkl, .pt, .h5) is uploaded to the workspace's default blob storage container.

2. A new entry is created in the registry database (Azure Cosmos DB) with the model name, version, tags, and a pointer to the blob storage location.

3. The model is assigned a unique ID (e.g., 'my_model:1').

4. If the model name already exists, the version is incremented, unless you specify a version explicitly.
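The versioning behavior described above can be illustrated with a small in-memory stand-in. This is a sketch only: `ToyRegistry`, `ModelEntry`, and the `blob://` URIs are hypothetical names invented for illustration and are not part of the Azure ML SDK.

```python
from dataclasses import dataclass, field


@dataclass
class ModelEntry:
    name: str
    version: int
    artifact_uri: str            # pointer to where the model file lives
    tags: dict = field(default_factory=dict)
    archived: bool = False


class ToyRegistry:
    """In-memory stand-in for a model registry (illustrative only)."""

    def __init__(self):
        self._entries = {}       # name -> list of ModelEntry, oldest first

    def register(self, name, artifact_uri, tags=None):
        versions = self._entries.setdefault(name, [])
        # The next version is one more than the number already stored
        entry = ModelEntry(name, len(versions) + 1, artifact_uri, dict(tags or {}))
        versions.append(entry)   # earlier versions are kept, never overwritten
        return entry

    def get(self, name, version):
        return self._entries[name][version - 1]

    def find_by_tag(self, key, value):
        return [e for vs in self._entries.values() for e in vs
                if e.tags.get(key) == value and not e.archived]


registry = ToyRegistry()
v1 = registry.register("sales-forecast-model", "blob://models/sales/1/model.pkl",
                       {"dataset": "sales_2023"})
v2 = registry.register("sales-forecast-model", "blob://models/sales/2/model.pkl",
                       {"dataset": "sales_2024"})
print(f"{v1.name}:{v1.version}", f"{v2.name}:{v2.version}")
```

Registering the same name twice yields versions 1 and 2, and both entries remain retrievable, which is the property that makes rollback possible.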

Example code using Azure ML SDK v2:

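from azure.ai.ml import MLClient
from azure.ai.ml.entities import Model
from azure.identity import DefaultAzureCredential

# Placeholders: replace with your own Azure identifiers
subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
workspace_name = "<workspace-name>"

# Connect to the workspace using the default Azure credential chain
ml_client = MLClient(DefaultAzureCredential(), subscription_id, resource_group, workspace_name)

# Describe the model to register; no version is given, so the registry assigns one
model = Model(
    path="./model.pkl",
    name="sales-forecast-model",
    description="Model for forecasting monthly sales",
    tags={"accuracy": "0.92", "dataset": "sales_2023"}
)

# Upload the file and create (or update) the registry entry
registered_model = ml_client.models.create_or_update(model)
print(f"Registered model: {registered_model.name}:{registered_model.version}")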

Monitoring: Keeping Models Healthy in Production

Monitoring in ML refers to tracking the performance and behavior of deployed models over time. Azure Machine Learning provides monitoring capabilities through:

Data Drift Monitoring: Detects changes in input data distribution compared to the training data. It uses statistical measures such as the Population Stability Index (PSI) or the Kolmogorov-Smirnov test.

Model Performance Metrics: Tracks accuracy, precision, recall, etc., if ground truth labels are available.

Inference Metrics: Monitors request latency, error rates, and throughput.

Model Retraining Triggers: Can automatically trigger retraining when drift or performance degradation is detected.

Key Components of Azure ML Monitoring

Datasets: You create baseline datasets (training data) and target datasets (production data) to compare.

Monitor Schedule: You define how often to run the monitoring job (e.g., daily, weekly).

Alert Rules: You set thresholds for drift metrics (e.g., PSI > 0.2) to trigger alerts or actions.

Drift Report: Azure ML generates a report showing drift magnitude, top contributing features, and visualizations.
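One way to picture how these four components fit together is as plain data structures. The sketch below is illustrative only; `DriftMonitorConfig`, `DriftReport`, and `should_alert` are hypothetical names and do not correspond to Azure ML classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriftMonitorConfig:
    baseline_dataset: str      # reference data, usually the training set
    target_dataset: str        # production data being checked
    schedule: str              # e.g. "daily" or "weekly"
    drift_threshold: float     # alert when the drift metric exceeds this


@dataclass(frozen=True)
class DriftReport:
    overall_drift: float       # aggregated drift magnitude for one run
    top_features: tuple        # feature names, most-drifted first


def should_alert(config, report):
    """Apply the alert rule: fire when drift is above the threshold."""
    return report.overall_drift > config.drift_threshold


config = DriftMonitorConfig("training-data", "production-data", "daily", 0.2)
report = DriftReport(overall_drift=0.27, top_features=("region", "basket_value"))
print(should_alert(config, report))
```

Here the run's drift of 0.27 is above the 0.2 threshold, so the alert rule fires.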

Configuring Monitoring in Azure ML

To set up data drift monitoring:

1. Register the baseline dataset (training data) as a dataset in Azure ML.

2. Register the production data as a dataset (e.g., from a data store).

3. Create a monitor using the Azure ML SDK or studio.

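Example CLI command (CLI v2, where the baseline, target, schedule, and thresholds are described in a YAML file; the file name and angle-bracket values here are placeholders):

az ml schedule create --file ./monitor.yaml --resource-group <resource-group> --workspace-name <workspace-name>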

How Monitoring Works Internally

When a monitoring job runs:

1. It samples the target dataset (production data) and compares it to the baseline dataset.

2. It computes drift metrics for each feature (e.g., PSI for categorical, Wasserstein distance for numerical).

3. It aggregates metrics into an overall drift score.

4. If the score exceeds the threshold, an alert is generated.

5. The results are stored in the Azure ML workspace and can be viewed in the studio.
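The steps above can be sketched in plain Python. This is a simplified illustration, not Azure ML's implementation: the function names, feature names, and thresholds are assumptions, the Wasserstein calculation only handles equal-sized samples, and instead of a single aggregated score it flags each feature against its own threshold, because PSI and Wasserstein distance are on different scales.

```python
import math
from collections import Counter


def psi_categorical(baseline, target, eps=1e-6):
    """Population Stability Index between two lists of category labels."""
    categories = set(baseline) | set(target)
    base_counts, target_counts = Counter(baseline), Counter(target)
    total = 0.0
    for cat in categories:
        # eps avoids log(0) when a category is missing from one sample
        p = max(base_counts[cat] / len(baseline), eps)
        q = max(target_counts[cat] / len(target), eps)
        total += (q - p) * math.log(q / p)
    return total


def wasserstein_1d(baseline, target):
    """1-D Wasserstein distance for two equal-sized numeric samples."""
    if len(baseline) != len(target):
        raise ValueError("this simplified version needs equal sample sizes")
    pairs = zip(sorted(baseline), sorted(target))
    return sum(abs(a - b) for a, b in pairs) / len(baseline)


def run_drift_check(baseline, target, thresholds):
    """Score each feature, then list those above their threshold."""
    scores = {}
    for feature, base_values in baseline.items():
        if isinstance(base_values[0], str):
            scores[feature] = psi_categorical(base_values, target[feature])
        else:
            scores[feature] = wasserstein_1d(base_values, target[feature])
    drifted = [f for f, s in scores.items() if s > thresholds[f]]
    return scores, drifted


baseline = {"region": ["north"] * 50 + ["south"] * 50,
            "basket_value": [10.0, 20.0, 30.0, 40.0]}
target = {"region": ["north"] * 20 + ["south"] * 80,
          "basket_value": [11.0, 21.0, 31.0, 41.0]}
scores, drifted = run_drift_check(baseline, target,
                                  {"region": 0.2, "basket_value": 5.0})
print(drifted)
```

In this toy data the regional mix shifts sharply while basket values move by only one unit, so only `region` is flagged.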

Interaction with Related Technologies

Model Registry and Monitoring integrate with:

Azure DevOps / GitHub Actions: For CI/CD pipelines that register models and deploy them.

Azure Container Instances / Kubernetes: For model deployment.

Azure Data Factory: For data ingestion and preparation.

Azure Monitor: For infrastructure-level monitoring (e.g., CPU, memory).

Default Values and Timers

Model version: Starts at 1 and increments by 1 for each registration with the same name.

Monitoring schedule: Minimum interval is 1 hour for real-time endpoints, 1 day for batch endpoints.

Drift threshold: A PSI threshold of 0.2 is a common industry rule of thumb; the value is configurable and should be tuned to your data.

Alert action: Can send email, trigger Azure Function, or run a pipeline.

Best Practices

Use tags to track model lineage (e.g., training script, dataset version, hyperparameters).

Archive outdated models instead of deleting them.

Set up monitoring for all production models.

Use multiple drift metrics (e.g., PSI and KS test) for robust detection.

Automate retraining with Azure ML pipelines when drift is detected.

Walk-Through

Register the Model

First, you train a model using Azure ML or any other environment. Then you register the model, using the `ml_client.models.create_or_update()` method in Python SDK v2 or the Azure ML studio UI, which uploads it to the workspace and creates a versioned entry in the Model Registry. The model file is stored in the workspace's default blob storage, and metadata is stored in Azure Cosmos DB. You can add tags and descriptions to make the model discoverable. After registration, you can deploy the model to an endpoint.

Deploy the Model to an Endpoint

After registration, you deploy the model to a real-time endpoint (ACI or AKS) or batch endpoint. The deployment creates a containerized web service that accepts inference requests. You specify the model version, compute target, and environment. The deployment is tracked in the Azure ML workspace. You can also enable logging and monitoring during deployment. The endpoint URL is used by applications to send data and receive predictions.

Set Up Data Drift Monitor

In the Azure ML studio, navigate to the Monitoring section. Create a new monitor by selecting a baseline dataset (usually the training data) and a target dataset (production data). Define the schedule (e.g., daily) and the alert threshold for drift (e.g., PSI > 0.2). Optionally, configure actions like sending an email or triggering a retraining pipeline. The monitor runs on the specified schedule and compares the distributions of features between baseline and target datasets.

Review Drift Reports and Alerts

After each monitoring run, Azure ML generates a drift report. You can view it in the studio. The report shows the overall drift magnitude, top contributing features, and visualizations like histograms and PSI values. If the drift exceeds the threshold, an alert is triggered. You can investigate which features are drifting and decide whether to retrain the model. The report also includes recommendations for retraining.

Trigger Retraining Pipeline on Drift

When drift is detected and an alert fires, you can manually or automatically trigger a retraining pipeline. In Azure ML, you can set up an Azure Function or Logic App that listens for the alert and starts a pipeline. The pipeline uses the latest production data (or a combination of old and new data) to retrain the model. The new model is then registered with an incremented version and deployed, replacing the old model. This closes the MLOps loop.
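The loop described above can be reduced to a small orchestration function. This is a minimal sketch under stated assumptions: `close_the_loop` and the `fake_*` helpers are hypothetical stand-ins for a real training pipeline, registry call, and deployment step.

```python
def close_the_loop(drift_score, threshold, retrain, register, deploy):
    """Retrain, register, and deploy only when drift exceeds the threshold."""
    if drift_score <= threshold:
        return None                      # model is still healthy; do nothing
    model = retrain()                    # e.g. submit a training pipeline
    version = register(model)            # registry assigns the next version
    deploy(version)                      # new version replaces the old one
    return version


# Stand-ins so the sketch runs without any cloud services
registered = []
deployed = []


def fake_retrain():
    return "retrained-model"


def fake_register(model):
    registered.append(model)
    return len(registered)               # version = count of registrations


result = close_the_loop(0.35, 0.2, fake_retrain, fake_register, deployed.append)
print(result, deployed)
```

Passing the steps in as functions keeps the decision logic separate from the services that carry it out, so the same check could sit behind an Azure Function, a Logic App, or a scheduled job.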

What This Looks Like on the Job

Enterprise Scenario 1: E-Commerce Recommendation System

A large e-commerce company uses a machine learning model to recommend products to users. The model is trained on historical purchase data and user interactions. Over time, user behavior changes (e.g., seasonal trends, new products). Without monitoring, the model's accuracy degrades, leading to poor recommendations and lost revenue. The company sets up Azure ML Model Registry to version models and track performance. They deploy the model to an AKS cluster and enable data drift monitoring on the input features (e.g., user demographics, browsing history). The monitor runs daily. When drift is detected (e.g., a sudden shift in popular categories), an alert triggers an Azure ML pipeline that retrains the model with the latest data. The new model is automatically registered and deployed via a blue-green deployment strategy, ensuring zero downtime.

Enterprise Scenario 2: Financial Fraud Detection

A bank uses a model to detect fraudulent transactions. The model must be highly accurate to avoid false positives that annoy customers. The bank registers each model version with tags indicating the training dataset date and performance metrics (e.g., AUC, precision). They deploy the model to a real-time endpoint with high availability. Monitoring includes both data drift and model performance drift (using ground truth labels from fraud investigations that come in after a delay). The monitoring schedule is hourly due to the high volume of transactions. When performance metrics drop below a threshold (e.g., recall < 0.9), an alert is sent to the data science team. The team can then roll back to a previous model version using the Model Registry. This scenario highlights the importance of versioning and rollback capabilities.
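The rollback decision in this scenario can be sketched as follows. The function names, the version numbers, and the sample labels are illustrative assumptions; only the 0.9 recall threshold comes from the scenario.

```python
def recall(y_true, y_pred):
    """Share of actual fraud cases (label 1) that the model flagged."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(1 for t in y_true if t == 1)
    # With no fraud cases in the sample, recall is undefined; treat as healthy
    return true_pos / actual_pos if actual_pos else 1.0


def choose_version(current, previous, y_true, y_pred, min_recall=0.9):
    """Return the version to serve: roll back if recall is too low."""
    return current if recall(y_true, y_pred) >= min_recall else previous


# Ground truth arrives later from fraud investigations (1 = fraud)
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
print(recall(y_true, y_pred), choose_version(4, 3, y_true, y_pred))
```

The model caught two of the four fraud cases, so recall is 0.5 and the function selects the previous version, which the registry still holds.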

Common Pitfalls

Not archiving old models: Deleting models can break reproducibility. Always archive them.

Setting drift thresholds too low: Causes false alarms. Use industry standards (PSI > 0.2) and adjust based on domain.

Ignoring monitoring for batch endpoints: Batch models also need monitoring, especially if they run on schedule.

Not monitoring all features: Drift in even one important feature can degrade performance. Monitor all features used in the model.

How AI-900 Actually Tests This

What AI-900 Tests on MLOps Concepts

Objective 2.4: Describe MLOps. The exam expects you to understand the purpose and basic components of Model Registry and Monitoring. You will NOT be asked to write code or configure monitors. Instead, you must recognize the correct definitions and use cases.

Specific topics tested:

Model Registry: Central repository for versioning and managing models. Key terms: 'register', 'version', 'tag', 'archive'.

Monitoring: Tracking model performance and data drift. Key concepts: 'data drift', 'concept drift', 'baseline dataset', 'target dataset', 'drift threshold'.

Automated retraining: Triggered by monitoring alerts.

Common Wrong Answers and Why They Are Chosen

'Model Registry stores only the latest model version.' This is wrong because the registry stores all versions. Candidates confuse it with a simple file store that overwrites.

'Monitoring only tracks inference latency.' Wrong – monitoring also tracks data drift, model accuracy, and input distributions. Candidates think of application monitoring only.

'Data drift is the same as concept drift.' Wrong – data drift is changes in input distribution; concept drift is changes in the relationship between inputs and outputs. The exam tests the distinction.

'You must manually trigger retraining when drift is detected.' Wrong – Azure ML can automate retraining via pipelines. Candidates may not know about automation.

Specific Numbers and Terms That Appear on the Exam

PSI (Population Stability Index): A common metric for data drift. Threshold often 0.2.

Baseline dataset: The training data used as reference.

Target dataset: The production data being monitored.

Versioning: Starts at 1, increments automatically.

Tags: Key-value pairs for metadata.

Edge Cases and Exceptions

No ground truth labels: Monitoring can still detect data drift even if labels are unavailable.

Multiple models in one endpoint: Each model should be registered separately.

Batch vs. real-time: Monitoring works for both, but schedule intervals differ (1 hour for real-time, 1 day for batch).

How to Eliminate Wrong Answers

If an answer says 'only one version' or 'overwrites', it's wrong – registry keeps all versions.

If an answer says 'monitoring only checks latency', it's wrong – it checks data drift and performance too.

If an answer says 'must manually retrain', it's wrong – automation is possible.

If an answer confuses data drift and concept drift, it's wrong – know the difference.

Key Takeaways

Model Registry is a central repository that stores all versions of ML models with metadata and tags.

Model versioning in Azure ML starts at 1 and increments automatically when registering a model with the same name.

Monitoring in MLOps includes data drift detection (using PSI, KS test) and model performance tracking.

Data drift monitoring compares a baseline dataset (training data) to a target dataset (production data) on a schedule.

A PSI threshold of 0.2 is the commonly cited rule of thumb for data drift alerting.

Monitoring can trigger automated retraining pipelines in Azure ML.

Data drift and concept drift are distinct; data drift is input distribution change, concept drift is input-output relationship change.

Model Registry supports archiving outdated models instead of deleting them.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Model Registry

Supports versioning with automatic increment.

Stores metadata (tags, descriptions) for searchability.

Integrates with Azure ML pipelines and deployments.

Allows archiving and lifecycle management.

Provides a unique ID for each model version.

Simple File Storage

Overwrites files with the same name; no versioning.

No metadata support; relies on folder structure.

No native integration with ML workflows.

No lifecycle management; manual deletion.

No unique IDs; uses file paths.

Data Drift Monitoring

Detects changes in input feature distributions.

Uses statistical tests like PSI or KS test.

Does not require ground truth labels.

Alerts on drift threshold (e.g., PSI > 0.2).

Commonly used for early warning of model degradation.

Model Performance Monitoring

Detects changes in model accuracy, precision, recall, etc.

Requires ground truth labels (often delayed).

Measures actual prediction quality.

Alerts on performance metric thresholds.

Used to confirm model degradation after drift is detected.

Watch Out for These

Mistake

Model Registry stores only the latest version of a model.

Correct

The Model Registry stores every version of a model. Each registration creates a new version entry, and all versions are retained unless explicitly archived or deleted.

Mistake

Monitoring in MLOps only tracks inference latency and error rates.

Correct

Monitoring also tracks data drift (changes in input distribution) and model performance drift (changes in accuracy, precision, etc.) if ground truth labels are available.

Mistake

Data drift and concept drift are the same thing.

Correct

Data drift refers to changes in the distribution of input features. Concept drift refers to changes in the relationship between inputs and outputs (the underlying function). They are different and require different detection methods.

Mistake

Once a model is deployed, you don't need to monitor it if it was accurate during testing.

Correct

Models can degrade over time due to changes in real-world data. Continuous monitoring is essential to detect drift and maintain performance.

Mistake

You must manually trigger retraining when drift is detected.

Correct

Azure ML can automate retraining by triggering a pipeline when an alert fires. Manual intervention is optional but not required.

Frequently Asked Questions

What is the difference between data drift and concept drift?

Data drift refers to changes in the distribution of input features over time, while concept drift refers to changes in the underlying relationship between inputs and outputs. For example, if customer age distribution shifts (data drift) or if the same age group starts buying different products (concept drift). Both can degrade model performance, but they require different detection methods.
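A tiny synthetic example can make the distinction concrete. The records, the age-40 rule, and the function names below are invented for illustration. In this toy case the data drift happens not to hurt accuracy, although in practice it often does.

```python
from statistics import mean

# Each record is (customer_age, bought_premium_plan)
training = [(25, 0), (30, 0), (45, 1), (50, 1)]
data_drift = [(45, 1), (50, 1), (55, 1), (60, 1)]      # older customers, same rule
concept_drift = [(25, 1), (30, 1), (45, 0), (50, 0)]   # same ages, rule flipped


def model(age):
    """Rule learned from the training data: customers 40+ buy premium."""
    return 1 if age >= 40 else 0


def mean_age(records):
    """Summary of the input distribution (no labels needed)."""
    return mean(age for age, _ in records)


def accuracy(records):
    """Share of correct predictions (needs ground truth labels)."""
    return mean(1 if model(age) == label else 0 for age, label in records)


for name, records in [("training", training),
                      ("data drift", data_drift),
                      ("concept drift", concept_drift)]:
    print(name, mean_age(records), accuracy(records))
```

The data drift set changes the mean age, which input monitoring can see without labels. The concept drift set has exactly the same ages as the training data, so input monitoring sees nothing, and only the drop in accuracy reveals the problem.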

How does Azure ML Model Registry version models?

When you register a model with a name that already exists, Azure ML automatically increments the version number by 1. You can also specify a custom version. Each version is stored separately with its own metadata and file location. The registry never overwrites previous versions.

What is a baseline dataset in monitoring?

A baseline dataset is the reference dataset (usually the training data) used to compare against production data. It represents the expected distribution of features. Monitoring jobs compute drift metrics by comparing the baseline to the target dataset (production data).

Can monitoring work without ground truth labels?

Yes. Data drift monitoring does not require labels. It only needs input features. However, model performance monitoring (e.g., accuracy) requires labels. If labels are delayed, you can still detect drift in inputs as an early warning.

How often can data drift monitoring run?

The minimum schedule interval is 1 hour for real-time endpoints and 1 day for batch endpoints. You can configure longer intervals like weekly or monthly.

How do I trigger retraining automatically when drift is detected?

In Azure ML, you can set up an alert action that runs a pipeline. For example, when a drift alert fires, it can trigger an Azure ML pipeline that retrains the model with new data and registers the new version.

What is PSI and why is 0.2 a common threshold?

PSI (Population Stability Index) is a statistical measure of how much a distribution has changed between two samples. As a rule of thumb, a PSI below 0.1 indicates no significant change, 0.1-0.2 indicates moderate change, and greater than 0.2 indicates significant change. The threshold of 0.2 is a widely used industry convention for flagging data drift.
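As a worked example, PSI over binned data is the sum of (actual share - expected share) × ln(actual share / expected share) across bins. The sketch below applies that formula and the bands above; the bin shares are made-up numbers, and the function assumes every bin has a non-zero share in both samples.

```python
import math


def psi(expected_fractions, actual_fractions):
    """PSI over matching bins; each argument lists the share of rows per bin."""
    return sum((a - e) * math.log(a / e)
               for e, a in zip(expected_fractions, actual_fractions))


def interpret(psi_value):
    """Map a PSI value to the rule-of-thumb bands described above."""
    if psi_value < 0.1:
        return "no significant change"
    if psi_value <= 0.2:
        return "moderate change"
    return "significant change"


baseline_shares = [0.25, 0.25, 0.25, 0.25]     # training data, four bins
production_shares = [0.10, 0.20, 0.30, 0.40]   # production data, same bins
value = psi(baseline_shares, production_shares)
print(round(value, 3), interpret(value))
```

Here the four bin terms add up to roughly 0.228, which lands in the "significant change" band and would trip a 0.2 threshold.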
