AI-900Chapter 48 of 100Objective 2.4

Azure ML Notebooks and Compute Clusters

This chapter covers Azure Machine Learning notebooks and compute clusters, two core components for developing and scaling ML models. For the AI-900 exam, understanding when to use a compute instance vs. a compute cluster is crucial, as roughly 10-15% of questions touch on compute targets and their configurations. You will learn the mechanics of each, how to set them up, and how they interact with other Azure ML resources. This knowledge is foundational for the 'Machine Learning' domain objective 2.4.

25 min read
Intermediate
Updated May 31, 2026

Azure ML Notebooks as a Shared Lab Bench

Imagine a university chemistry lab with 200 students. Each student has their own lab bench (a compute instance) where they can mix chemicals (run code) and write notes (document experiments). The lab also has a shared whiteboard (the notebook) that everyone can see and edit simultaneously. When a student needs to run a complex reaction that requires more heat or time, they can submit their experiment to a central processing facility (a compute cluster) that has specialized equipment (GPU nodes) and runs it on their behalf. The student doesn't need to own that equipment; they just submit the job and pick up the results later. The shared whiteboard allows students to collaborate on the same experiment steps, but each student works at their own bench. The central facility can handle many experiments at once, scaling up to use multiple pieces of equipment. If a student's bench is too small for a big experiment, they use the central facility. The lab manager (Azure ML) allocates resources, ensures no two students interfere, and cleans up after experiments. This mirrors how Azure ML provides notebooks for interactive development on compute instances, and compute clusters for running large-scale training jobs in parallel.

How It Actually Works

What are Azure ML Notebooks and Compute Clusters?

Azure ML provides two primary compute targets for running code: compute instances and compute clusters. Both are managed cloud-based virtual machines (VMs) that allow you to run Jupyter notebooks, Python scripts, or training jobs without managing infrastructure. The key difference is their purpose and lifecycle.

Compute Instance: A fully managed cloud workstation for data scientists. It is an always-on VM that you can use for interactive development, experimentation, and small-scale training. Think of it as your personal development environment.

Compute Cluster: A scalable set of VMs that automatically scales up and down based on job demand. It is designed for running large training jobs in parallel, often using multiple nodes with GPUs. It is not interactive; you submit jobs to it.

How Compute Instances Work Internally

When you create a compute instance in Azure ML: 1. You select a VM size (e.g., Standard_DS3_v2 with 4 vCPUs, 14 GB RAM) or a GPU-enabled size (e.g., Standard_NC6). 2. Azure ML provisions the VM in your workspace's associated virtual network (or a managed VNet). 3. The VM is configured with a pre-built environment including Python, Jupyter, and common ML libraries (scikit-learn, PyTorch, TensorFlow, etc.). 4. You can access the instance via JupyterLab, Jupyter Notebook, or RStudio (if configured). 5. The instance persists until you explicitly stop or delete it. You are billed for the VM uptime, even when idle. 6. You can install additional packages using pip or conda within the instance's terminal. 7. The instance has a built-in SSH key authentication for secure access.

How Compute Clusters Work Internally

A compute cluster is an Azure Machine Learning managed compute target that enables distributed training and batch inference. Its behavior is governed by: - Scale settings: You define the minimum and maximum number of nodes. The cluster autoscales between these bounds based on job queue length. - Idle time: After a job completes, the cluster waits a configurable idle time (default 120 seconds) before scaling down. This prevents premature deallocation if jobs are submitted in quick succession. - VM priority: You can choose between dedicated VMs (guaranteed capacity, higher cost) and low-priority VMs (cheaper but can be preempted). - Node provisioning: When a job is submitted, the cluster provisions the required number of nodes (up to max) from the selected VM size. Nodes are added one by one; provisioning takes a few minutes. - Job scheduling: Jobs are queued and assigned to nodes by Azure ML's scheduler. Each job can request a specific number of nodes and GPUs. - Networking: Nodes are connected via InfiniBand (for GPU clusters) for high-speed inter-node communication, enabling efficient distributed training.

Key Configuration Values and Defaults

- Compute Instance: - Default idle shutdown: None (always on unless you set an idle shutdown policy). - You can set an auto-shutdown schedule to save costs (e.g., stop at 6 PM daily). - SSH port: 22 (allowed through NSG). - Jupyter port: 8888. - Compute Cluster: - Default idle seconds before scale down: 120. - Minimum nodes: 0 (cluster can scale to zero). - Maximum nodes: up to 1000 (subject to quota). - VM priority: Dedicated (default) or Low Priority. - Node provisioning timeout: 30 minutes (if nodes fail to provision, job fails).

Configuration and Verification Commands

To create a compute instance using Azure CLI:

az ml compute create --name my-instance --size Standard_DS3_v2 --type ComputeInstance --workspace-name my-workspace --resource-group my-rg

To create a compute cluster:

az ml compute create --name my-cluster --size Standard_DS3_v2 --type AmlCompute --min-nodes 0 --max-nodes 4 --idle-seconds-before-scaledown 300 --workspace-name my-workspace --resource-group my-rg

To list compute targets:

az ml compute list --workspace-name my-workspace --resource-group my-rg

Interaction with Related Technologies

Datastores: Both compute instances and clusters can access data from datastores (Azure Blob, Azure Data Lake, etc.) via mounted paths or direct download.

Environments: Compute clusters use Azure ML environments (conda dependencies, Docker images) to run jobs. Compute instances have a pre-built environment but you can customize it.

Pipelines: Compute clusters are the primary compute target for Azure ML pipelines. You can define pipeline steps that run on a cluster.

AutoML: Automated ML runs on compute clusters by default, though you can use a compute instance for small datasets.

Performance Considerations

Compute Instance: Suitable for datasets that fit in memory (e.g., <10 GB for a Standard_DS3_v2). Use GPU instances for deep learning prototyping.

Compute Cluster: For large-scale training, use GPU clusters with InfiniBand. The cluster's autoscaling can take 2-5 minutes to add nodes, so batch jobs may incur startup latency.

Cost: Compute instances cost more per hour than cluster nodes because they are always on. Use auto-shutdown to avoid charges.

Quota: Your subscription has a regional vCPU quota. You must request quota increases for large clusters.

Walk-Through

1

Create a Compute Instance

In Azure Machine Learning studio, navigate to Compute > Compute Instances. Click 'New' and select a VM size. For most ML tasks, a DS3_v2 (4 vCPU, 14 GB RAM) is sufficient. Name the instance and optionally configure SSH access. The provisioning takes 2-5 minutes. Once ready, you can open JupyterLab or Jupyter Notebook directly from the studio. This is your interactive environment for data exploration and model prototyping.

2

Develop a Notebook on Instance

Open JupyterLab and create a new notebook. Write code to load data from a datastore, preprocess it, and train a simple model. The notebook runs on the compute instance's kernel. You can install additional packages using pip in a terminal. The instance has internet access by default (unless restricted by a VNet). Save the notebook in the workspace's default storage. This step is for interactive development only.

3

Create a Compute Cluster

Navigate to Compute > Compute Clusters. Click 'New'. Select a VM size (e.g., Standard_NC6 for GPU). Set the minimum nodes to 0 and maximum to 4. Set idle seconds before scale down to 120 or higher. Choose Dedicated or Low Priority. The cluster is created immediately but has zero nodes. When you submit a job, it will scale up.

4

Submit a Training Job to Cluster

From your notebook (or a script), create an estimator or a command job that references your training script. Set the compute target to the cluster name. Submit the job. Azure ML queues the job and the cluster begins provisioning nodes. The job runs on one or more nodes depending on the node count requested. You can monitor the job in the studio under Experiments.

5

Monitor Autoscaling and Job Status

While the job runs, the cluster may add more nodes if other jobs are queued. In the studio, under Compute > Compute Clusters, you can see the current node count and job queue length. After the job completes, the cluster waits for the idle timeout (e.g., 120 seconds) then scales down nodes one by one. You can also manually set the cluster to 0 nodes to force scale down.

What This Looks Like on the Job

Enterprise Scenario 1: Data Science Team Prototyping

A financial services company has a team of 10 data scientists developing fraud detection models. Each data scientist gets their own compute instance (Standard_DS3_v2) for daily work. They use Jupyter notebooks to explore transaction data and test feature engineering. The instances are configured with auto-shutdown at 7 PM to save costs. The team shares notebooks via Azure ML's notebook versioning and Git integration. When a model is ready for production training, the lead data scientist submits a training job to a compute cluster (Standard_NC6 with 4 nodes) that runs for 2 hours. The cluster autoscales to 4 nodes, trains the model using distributed XGBoost, and scales down after idle. The company saves 60% cost compared to always-on VMs.

Enterprise Scenario 2: Large-Scale Deep Learning

A healthcare AI startup trains medical image segmentation models using PyTorch. They use a compute cluster with 8 Standard_NC24rs_v3 VMs (each with 4 GPUs, InfiniBand). The cluster is configured with min 0, max 8, and idle timeout of 300 seconds (5 minutes) because jobs are submitted frequently. They use low-priority VMs to reduce costs by 80%, accepting the risk of preemption. When a preemption occurs, Azure ML automatically resubmits the job. The cluster's InfiniBand network enables efficient all-reduce communication, reducing training time from 3 days to 6 hours. Misconfiguration example: if the idle timeout is set too low (e.g., 60 seconds), the cluster scales down between jobs, causing unnecessary provisioning delays.

Common Pitfalls in Production

Quota exceeded: Submitting a job requesting 8 nodes of NC24rs_v3 (32 vCPUs) when the quota is 24 vCPUs causes the job to fail. Always check quota in the Azure portal.

Idle timeout too short: Setting idle seconds to 30 causes the cluster to scale down immediately after a job, then scale up for the next job, wasting time. Recommended: 300 seconds for batch workloads.

Using compute instance for large training: A data scientist tries to train a deep learning model on a DS3_v2 instance. The instance runs out of memory and crashes. They should use a GPU cluster instead.

How AI-900 Actually Tests This

AI-900 Objective 2.4: Identify compute targets for ML workloads

The AI-900 exam tests your ability to choose the right compute target for a given scenario. Key objective codes: 'Identify compute options for ML' and 'Describe when to use compute instances vs. clusters.'

Common Wrong Answers and Why

1.

'Use a compute cluster for interactive development.' This is wrong because clusters are not interactive; they run jobs asynchronously. Candidates confuse clusters with instances.

2.

'Compute instances can scale automatically based on load.' Wrong. Compute instances are single VMs; they do not autoscale. Only clusters do.

3.

'You can run Jupyter notebooks directly on a compute cluster.' Partially true: you can submit a notebook as a job, but you cannot interactively edit it. The exam expects you to know that notebooks are for instances.

4.

'Compute instances are cheaper than clusters.' Wrong. Instances are always on, so they cost more per hour. Clusters scale to zero and use low-priority VMs for savings.

Specific Numbers and Terms on the Exam

Default idle time for cluster scale down: 120 seconds.

VM priority options: Dedicated (guaranteed) and Low Priority (preemptible).

Compute instance sizes: 'Standard_DS3_v2' is a common example.

GPU sizes: 'Standard_NC6' (one GPU) and 'Standard_NC24rs_v3' (4 GPUs with InfiniBand).

Minimum nodes: 0 (cluster scales to zero).

Edge Cases and Exceptions

Low-priority VMs: The exam may ask what happens when a low-priority VM is preempted. Answer: Azure ML automatically resubmits the job (if the job is configured for retry).

VNet integration: Both instances and clusters can be deployed inside a VNet for secure access. This is a common scenario.

Compute instance SSH: The exam might ask how to connect to a compute instance. Answer: via SSH using the public key configured during creation.

How to Eliminate Wrong Answers

If the scenario says 'interactive development' or 'prototyping', the answer is compute instance.

If the scenario says 'large dataset' or 'distributed training', the answer is compute cluster.

If the question mentions 'autoscaling', it is a cluster, not an instance.

If the question mentions 'cost savings' and 'batch jobs', look for low-priority cluster.

Key Takeaways

Compute instances are for interactive development; compute clusters are for batch training.

Compute clusters autoscale between min (default 0) and max nodes based on job queue.

Default idle timeout before scale down is 120 seconds.

Low-priority VMs reduce cost but can be preempted; jobs are automatically resubmitted.

Compute instances can be stopped to save costs; use auto-shutdown schedules.

GPU VM sizes like Standard_NC6 are used for deep learning.

Both compute targets can be deployed inside a VNet for secure access.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Compute Instance

Always-on single VM for interactive development

Access via Jupyter, JupyterLab, or RStudio

No autoscaling; manual start/stop

Billed for uptime regardless of activity

Suitable for prototyping and small datasets

Compute Cluster

Scalable set of VMs for batch training

Submit jobs via SDK/CLI; no interactive access

Autoscales between min and max nodes based on job queue

Only billed when nodes are running; scales to zero

Suitable for large-scale and distributed training

Watch Out for These

Mistake

Compute instances can be used for running production training jobs at scale.

Correct

Compute instances are single VMs designed for interactive development, not scalable training. Use compute clusters for production-scale jobs.

Mistake

Compute clusters are always running and incur cost even when idle.

Correct

Compute clusters can scale to zero nodes when idle (if min nodes is set to 0). You only pay for running nodes.

Mistake

You can edit notebooks directly on a compute cluster.

Correct

Compute clusters do not provide a Jupyter interface. You submit jobs to them. Notebook editing is done on a compute instance.

Mistake

Low-priority VMs are always available and never preempted.

Correct

Low-priority VMs can be preempted when Azure needs capacity. They are cheaper but not reliable for long-running critical jobs.

Mistake

The default idle timeout for compute clusters is 60 seconds.

Correct

The default idle timeout is 120 seconds. You can configure it from 60 to 3600 seconds.

Do You Actually Know This?

Reveal each answer, then mark whether you got it right. Score 60%+ to unlock the next chapter.

Frequently Asked Questions

What is the difference between a compute instance and a compute cluster in Azure ML?

A compute instance is a single, always-on VM for interactive development with Jupyter notebooks. A compute cluster is a scalable pool of VMs that autoscales based on job demand and is used for running training jobs asynchronously. Use instances for prototyping, clusters for production training.

Can I run a Jupyter notebook on a compute cluster?

You can submit a notebook as a job to a compute cluster, but you cannot interactively edit it. For interactive notebook editing, use a compute instance.

How does autoscaling work for compute clusters?

The cluster monitors the job queue. When a job is submitted, it provisions nodes up to the maximum. After a job completes, it waits for the idle timeout (default 120 seconds) before scaling down. It can scale to zero nodes if min is 0.

What happens if a low-priority VM in a compute cluster is preempted?

Azure ML automatically resubmits the job that was running on the preempted VM. The job will be retried on a new node. This is transparent to the user.

How can I save costs on compute instances?

Set an auto-shutdown schedule to stop the instance during non-working hours. You can also manually stop it when not in use. You are only billed for uptime.

What VM sizes are recommended for deep learning?

GPU-enabled VM sizes like Standard_NC6 (1 GPU), Standard_NC12 (2 GPUs), or Standard_NC24rs_v3 (4 GPUs with InfiniBand) are recommended for deep learning training.

Can I install custom packages on a compute instance?

Yes, you can open a terminal on the compute instance and use pip or conda to install packages. However, for reproducibility in jobs, use Azure ML environments.

Terms Worth Knowing

Ready to put this to the test?

You've just covered Azure ML Notebooks and Compute Clusters — now see how well it sticks with free AI-900 practice questions. Full explanations included, no account needed.

Done with this chapter?