This chapter covers Azure ML Designer, a drag-and-drop visual interface for building, testing, and deploying machine learning models without writing code. For the AI-900 exam, understanding Designer's role as a low-code/no-code tool is critical, as it appears in roughly 10-15% of questions related to automated ML and visual tools. This chapter explains what Designer is, how it works internally, its key components, and how it compares to other Azure ML tools. You will learn exactly what the exam tests and how to avoid common traps.
Imagine you are a chef designing a new dish. Instead of writing out every step in text, you have a magnetic board with pre-printed cards: one for "chop onions," another for "sauté in butter," another for "simmer for 20 minutes." Each card represents a cooking operation with specific settings (heat level, time). You arrange these cards in sequence on the board, connecting the output of one card to the input of the next. For example, the "chop onions" card outputs chopped onions, which feeds into the "sauté" card. The board allows you to split flows (e.g., half the onions go to sauté, half to garnish) and merge flows (e.g., combine sautéed onions with broth). Once you've arranged all cards, you press a button and the kitchen executes the recipe automatically, producing the final dish. If you want to test a variation, you can duplicate the board, swap one card (e.g., use "roast" instead of "sauté"), and compare results. The cards are reusable, and the board saves your recipe for later. This is exactly how Azure ML Designer works: pre-built modules (like "Train Model" or "Split Data") are dragged onto a canvas, connected to form a pipeline, and then submitted to run on compute resources. Each module has configurable parameters (e.g., number of folds in cross-validation). The pipeline can be cloned and modified for experimentation. Just as a chef doesn't need to know how to build a stove from scratch, a data scientist doesn't need to code every algorithm — they focus on the recipe, not the infrastructure.
What is Azure ML Designer?
Azure ML Designer is a visual, drag-and-drop interface within Azure Machine Learning that allows users to create machine learning pipelines without writing code. It provides a canvas where you can select pre-built modules for data ingestion, transformation, training, scoring, and evaluation, then connect them to form an end-to-end workflow. The Designer is part of the Azure ML studio and is designed for users who prefer visual programming or need to rapidly prototype ML solutions.
Why It Exists
Not all data scientists or analysts are proficient in coding. Designer lowers the barrier to entry for building ML models. It also speeds up development by providing a library of validated, reusable modules. For organizations with mixed skill sets, Designer enables collaboration between citizen data scientists (who use the visual interface) and professional data scientists (who may write custom code in notebooks). The exam tests your understanding of when to use Designer versus other tools like Automated ML or the Python SDK.
How It Works Internally
When you build a pipeline in Designer, you are essentially creating a directed acyclic graph (DAG) of operations. Each module is a Python script or binary that runs on a compute target. The Designer serializes the graph into a JSON representation, which is then submitted to the Azure ML service. The service orchestrates the execution: it provisions the compute (if not already running), downloads the data, runs each module in dependency order, and passes intermediate datasets between steps via the Azure ML datastore. Outputs from one module become inputs to the next. The entire pipeline can be run on a schedule or triggered by events.
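The dependency-ordered execution described above can be illustrated with a small sketch. It is not Designer's actual serialization format or scheduler; the module names and graph below are hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for the service's orchestration logic.

```python
import json
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a module, each value is the set of
# modules whose outputs it consumes (its upstream dependencies).
pipeline = {
    "Sales Dataset": set(),
    "Split Data": {"Sales Dataset"},
    "Two-Class Boosted Decision Tree": set(),
    "Train Model": {"Split Data", "Two-Class Boosted Decision Tree"},
    "Score Model": {"Train Model", "Split Data"},
    "Evaluate Model": {"Score Model"},
}


def serialize(graph):
    """Turn the graph into a JSON string (sets become sorted lists)."""
    return json.dumps({node: sorted(deps) for node, deps in graph.items()}, indent=2)


def execution_order(graph):
    """Return one valid order in which every module runs after its inputs."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(serialize(pipeline))
    for step, module in enumerate(execution_order(pipeline), start=1):
        print(step, module)
```

Because the graph is acyclic, at least one valid order always exists. If a cycle were introduced, `static_order()` would raise `graphlib.CycleError`, which mirrors why a pipeline has to be a DAG.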
Key Components
Canvas: The visual workspace where you drag and drop modules.
Modules: Pre-built components for data operations (e.g., Split Data, Apply SQL Transformation), model training (e.g., Train Model, Two-Class Boosted Decision Tree), scoring (e.g., Score Model), and evaluation (e.g., Evaluate Model). Each module has configurable parameters.
Datasets: Data sources that you register in Azure ML. You can drag a dataset onto the canvas to start a pipeline.
Compute Targets: The underlying compute resources (e.g., Compute Instance, Compute Cluster) that execute the pipeline. You must attach a compute target to run the pipeline.
Pipelines: The entire graph of connected modules. Pipelines can be saved, cloned, and submitted as runs.
Pipeline Draft: An editable, in-progress version of your pipeline. Designer saves the draft in the workspace as you work; submitting the draft creates a pipeline run.
Inference Pipelines: After training, you can create a real-time inference pipeline (for online scoring) or a batch inference pipeline (for batch scoring). The Designer automatically generates these from your training pipeline.
Configuration and Defaults
Compute Target: Must be specified before running. Default is no compute; you must select an existing compute or create a new one (e.g., a Compute Cluster with a specific VM size).
Module Parameters: Each module has defaults. For example, the Split Data module defaults to a 0.5 split ratio (50% training, 50% testing) and a random seed of 0. You can change these.
Outputs: Intermediate datasets are stored in the default datastore (workspaceblobstore). You can configure them to be stored elsewhere.
Run History: Each pipeline run is logged with metrics, outputs, and logs. You can view them in the Designer or in the Experiments section.
How It Interacts with Related Technologies
Automated ML: Designer can use Automated ML as a module. You can drag an "Automated ML" module onto the canvas, configure it, and it will automatically try multiple algorithms and hyperparameters.
Python SDK: Pipelines built in Designer can be exported as Python code (using the azureml-pipeline SDK) for further customization or integration into CI/CD workflows. Conversely, pipelines built with the SDK can be imported into Designer for visualization.
Azure ML Datasets: Designer uses registered datasets. You can also use datastores directly.
MLflow: Designer pipelines can log metrics to MLflow for tracking.
Deployment: After training, you can deploy the best model as a web service (real-time endpoint) or batch endpoint directly from Designer.
Step-by-Step Workflow
Create or open a pipeline draft: In Azure ML studio, go to Designer and click "New pipeline draft."
Add dataset: Drag a registered dataset from the asset library onto the canvas.
Add modules: Drag modules from the module palette (left pane) onto the canvas. Connect them by dragging from the output port of one module to the input port of another.
Configure modules: Click on a module to set its parameters in the right pane.
Set compute target: In the settings pane, select a compute target. If none exists, create one.
Submit the pipeline: Click "Submit" to run the pipeline. You will be prompted to name the experiment and optionally set a description.
Monitor the run: In the Run History, you can see the status (Running, Completed, Failed). Click on a module to view its logs and outputs.
Create inference pipeline: After a successful run, click "Create inference pipeline" to generate a real-time or batch inference pipeline.
Deploy: For real-time inference, deploy the model to an Azure Container Instance (ACI) or Azure Kubernetes Service (AKS). For batch inference, deploy to a batch endpoint.
Common Exam Traps
Trap 1: Thinking Designer requires coding. Reality: Designer does not require code, but it is not strictly code-free: you can add custom Python or R scripts via the "Execute Python Script" or "Execute R Script" modules.
Trap 2: Confusing Designer with Automated ML. Designer is a visual pipeline builder; Automated ML is a feature that automatically tries multiple algorithms. They can be used together.
Trap 3: Believing Designer only works with small datasets. Designer can handle large datasets if the compute target has sufficient memory.
Trap 4: Assuming pipelines run on local machine. Pipelines always run on a remote compute target in Azure.
Trap 5: Forgetting that you must save a pipeline before you can clone or share it.
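To make Trap 1 concrete, here is a minimal sketch of the kind of script you might paste into an "Execute Python Script" module. The module calls a function named `azureml_main` that receives up to two pandas DataFrames from its input ports. The column names (`units_sold`, `unit_price`, `revenue`) are made up for illustration.

```python
import pandas as pd


# Entry point that the Execute Python Script module looks for.
# dataframe1 / dataframe2 correspond to the module's two dataset input ports.
def azureml_main(dataframe1=None, dataframe2=None):
    # Hypothetical transformation: add a column derived from two existing ones.
    result = dataframe1.copy()
    result["revenue"] = result["units_sold"] * result["unit_price"]
    # The module expects a sequence of DataFrames as the return value.
    return (result,)


if __name__ == "__main__":
    # Local smoke test with made-up data; inside Designer the module supplies the input.
    sample = pd.DataFrame({"units_sold": [3, 5], "unit_price": [2.0, 4.0]})
    (out,) = azureml_main(sample)
    print(out)
```

Everything outside `azureml_main` is only there so the script can be tried locally before it is pasted into the module.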
Specific Values and Defaults to Know for the Exam
Split Data default split ratio: 0.5 (50/50)
Train Model module requires an untrained model (the algorithm module) and a dataset; it does not have a default algorithm.
The maximum number of parallel runs in a Compute Cluster: depends on the cluster configuration (default is 1, but can be set higher).
The default datastore is workspaceblobstore.
Inference pipelines can be real-time (deployed as a web service) or batch (deployed as a batch endpoint).
The "Apply SQL Transformation" module uses SQLite syntax.
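The SQLite point is easier to remember with a concrete query. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the module's SQL engine. The table name `t1` follows the module's convention for its first input, while the columns and sample rows are hypothetical.

```python
import sqlite3

# In-memory database standing in for the module's first input port (t1).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (device_id TEXT, reading_time TEXT, temperature REAL)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        ("pump-1", "2024-01-01 10:05:00", 70.0),
        ("pump-1", "2024-01-01 10:45:00", 74.0),
        ("pump-1", "2024-01-01 11:10:00", 80.0),
    ],
)

# strftime() is SQLite's date-formatting function; T-SQL would use
# functions such as DATEPART or FORMAT instead.
HOURLY_AVG = """
SELECT device_id,
       strftime('%Y-%m-%d %H:00', reading_time) AS hour,
       AVG(temperature) AS avg_temperature
FROM t1
GROUP BY device_id, hour
ORDER BY hour
"""

rows = conn.execute(HOURLY_AVG).fetchall()
for row in rows:
    print(row)
```

A query written with T-SQL-only functions would fail in this engine, which is the distinction the exam item is probing.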
Verification Commands (Using Azure CLI)
While Designer is visual, you can list and manage pipeline runs from the Azure CLI. The commands below assume the current ml extension (CLI v2), where pipeline runs are managed as jobs:
# List all jobs in the workspace (pipeline runs appear as jobs)
az ml job list --workspace-name <workspace> --resource-group <rg>
# Show details of a specific pipeline job
az ml job show --name <job_name> --workspace-name <workspace> --resource-group <rg>
# Cancel a running job
az ml job cancel --name <job_name> --workspace-name <workspace> --resource-group <rg>
Create a New Pipeline Draft
In the Azure ML studio, navigate to the Designer section. Click on 'New pipeline draft' to create a blank canvas. You will be prompted to give it a name and optionally a description. The canvas is where you will build your ML workflow. At this stage, no compute is attached yet, and no modules are placed. The draft is saved automatically in the workspace.
Add Dataset to Canvas
From the asset library (left pane), expand 'Datasets' and drag a registered dataset onto the canvas. The dataset appears as a module with an output port. You can also add an 'Import Data' module to read data from a datastore or an external source. The dataset must be tabular (CSV, TSV, etc.) for most modules. If the dataset is not registered, you must register it first via the Data section.
Add and Connect Modules
From the module palette (left pane), drag modules onto the canvas. Common modules include 'Split Data', 'Train Model', 'Score Model', and 'Evaluate Model'. Connect them by clicking and dragging from the output port (small circle) of one module to the input port of another. Each module has specific input and output ports; you cannot connect mismatched types. For example, 'Train Model' expects a dataset and a model algorithm, and outputs a trained model.
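The rule that mismatched ports cannot be connected can be pictured as a simple type check. The sketch below is only a mental model, not Designer's implementation; the `Port` class, the `connect` function, and the type labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    module: str      # module that owns the port
    name: str        # port label shown on the canvas
    data_type: str   # e.g. "dataset", "untrained_model", "trained_model"


class PortTypeError(ValueError):
    """Raised when an output port is wired to an incompatible input port."""


def connect(output: Port, target: Port) -> tuple:
    """Return an edge if the port types match, otherwise raise PortTypeError."""
    if output.data_type != target.data_type:
        raise PortTypeError(
            f"{output.module}.{output.name} produces {output.data_type}, "
            f"but {target.module}.{target.name} expects {target.data_type}"
        )
    return (output, target)


# Hypothetical ports for two modules.
split_out = Port("Split Data", "Results dataset1", "dataset")
train_data_in = Port("Train Model", "Dataset", "dataset")
train_model_in = Port("Train Model", "Untrained model", "untrained_model")

if __name__ == "__main__":
    connect(split_out, train_data_in)          # compatible: dataset -> dataset
    try:
        connect(split_out, train_model_in)     # incompatible: dataset -> model
    except PortTypeError as err:
        print("rejected:", err)
```

This also shows why 'Train Model' needs two separate inputs: one port accepts only a dataset and the other accepts only an untrained algorithm.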
Configure Module Parameters
Click on a module to select it. The right pane shows its parameters. For example, for 'Split Data', you can set the split ratio (default 0.5), random seed, and stratification column. For 'Two-Class Boosted Decision Tree', you can set number of leaves, learning rate, etc. You must configure each module appropriately. Incorrect parameters can cause runtime errors. Always check the module documentation for required settings.
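As an illustration of what the split ratio and random seed parameters control, here is a plain-Python sketch of a seeded row split. It is not the 'Split Data' module's implementation; `split_rows` and its parameter names are assumptions chosen to mirror the defaults mentioned above.

```python
import random


def split_rows(rows, fraction=0.5, seed=0):
    """Shuffle rows with a fixed seed and split them into two lists.

    fraction is the share of rows sent to the first output.
    """
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    first, second = split_rows(range(10))
    print(len(first), len(second))
```

Raising `fraction` to 0.7 would send roughly 70% of the rows to the first output, which is a common choice when the first output feeds 'Train Model'.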
Set Compute Target and Submit
In the top menu, click 'Submit'. A dialog appears where you must select or create an experiment name and a compute target. If no compute target exists, click 'Create' and choose a Compute Cluster or Compute Instance. The compute target must have enough memory for the dataset. After submission, the pipeline is serialized and sent to the Azure ML service, which orchestrates the execution. You can monitor progress in the 'Run History' tab.
Enterprise Scenario 1: Retail Demand Forecasting
A large retail chain wants to forecast product demand across 500 stores. The data team uses Azure ML Designer to build a pipeline: they drag a historical sales dataset, use 'Clean Missing Data' to handle nulls, 'Split Data' to create training and test sets, and 'Train Model' with a 'Boosted Decision Tree Regression' algorithm. They then 'Score Model' and 'Evaluate Model' to check accuracy. The pipeline runs on a compute cluster with 8 nodes to handle the large dataset. Once satisfied, they create a batch inference pipeline and deploy it to a batch endpoint that runs weekly. A common mistake is using a too-small compute cluster, causing out-of-memory errors. The team learned to right-size the VM (e.g., Standard_D8s_v3) and set a minimum node count to avoid cold starts.
Enterprise Scenario 2: Credit Risk Assessment
A bank wants to automate credit approval using a binary classification model. Compliance requires interpretability, so the team uses Designer with 'Two-Class Logistic Regression' for its linear nature. They drag customer data, apply 'Normalize Data' to scale features, and use 'Permutation Feature Importance' to explain predictions. The pipeline is deployed as a real-time inference pipeline on AKS for low-latency scoring. The challenge was managing versioning: multiple data scientists iterated on pipelines, but they used the 'Clone' feature to fork experiments. Misconfiguration occurred when the inference pipeline was not updated after retraining the model, leading to stale predictions. The team now uses pipeline endpoints to automatically update the deployment when the pipeline is retrained.
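The idea behind permutation feature importance, as used in this scenario, can be sketched in a few lines: shuffle one feature column, re-score the model, and see how much accuracy drops. This is a simplified illustration, not the Designer module's code; the toy model, feature names, and data are hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    hits = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return hits / len(rows)


def permutation_importance(model, rows, labels, feature, seed=0):
    """Accuracy drop after shuffling one feature column across rows."""
    baseline = accuracy(model, rows, labels)
    column = [row[feature] for row in rows]
    random.Random(seed).shuffle(column)
    shuffled_rows = [{**row, feature: value} for row, value in zip(rows, column)]
    return baseline - accuracy(model, shuffled_rows, labels)


# Hypothetical "trained model": approves applicants with income of at least 50.
def toy_model(row):
    return 1 if row["income"] >= 50 else 0


rows = [{"income": i, "shoe_size": 36 + (i % 10)} for i in range(0, 100, 5)]
labels = [toy_model(row) for row in rows]

if __name__ == "__main__":
    for feature in ("income", "shoe_size"):
        print(feature, permutation_importance(toy_model, rows, labels, feature))
```

A feature the model ignores (here `shoe_size`) scores exactly zero, while a feature the model relies on loses accuracy when shuffled. That ranking is what gives compliance reviewers a view of which inputs drive decisions.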
Enterprise Scenario 3: Predictive Maintenance
A manufacturing company uses IoT sensor data to predict equipment failure. Their Designer pipeline reads sensor data that has landed in Azure Blob Storage, uses 'Apply SQL Transformation' to aggregate readings per hour, and trains a 'Multiclass Decision Forest' to classify failure types. They run the pipeline on a schedule every hour using the 'Schedule' feature. A critical issue was handling imbalanced data; they added the 'SMOTE' module (a built-in Designer module) to oversample minority classes. The pipeline's run times grew as the data volume increased, so they raised the compute cluster's max nodes from 2 to 10. The lesson: always monitor pipeline run times and adjust compute resources accordingly.
What AI-900 Tests on Azure ML Designer
The AI-900 exam objectives under 'Machine Learning' (Objective 2.4) include: 'Describe the capabilities of Azure Machine Learning Designer'. Specifically, you must know:
Designer is a drag-and-drop, no-code/low-code tool for building ML pipelines.
It uses pre-built modules for data preparation, training, scoring, and evaluation.
Pipelines can be deployed as real-time or batch inference endpoints.
Designer integrates with Automated ML.
It is part of Azure Machine Learning studio.
Common Wrong Answers and Why Candidates Choose Them
Wrong: 'Designer requires Python coding.' Candidates confuse Designer with the SDK. Reality: Designer is visual, but you can add custom scripts.
Wrong: 'Designer only works with small datasets.' Candidates think visual tools are limited. Reality: It scales with compute.
Wrong: 'Designer is the same as Automated ML.' Candidates see both as 'no-code' but miss that Designer is a pipeline builder, while AutoML is a feature that automates algorithm selection.
Wrong: 'Designer pipelines run on your local machine.' Candidates assume it's like Excel. Reality: They run on Azure compute.
Specific Numbers and Terms on the Exam
The default split ratio is 0.5.
The default datastore is workspaceblobstore.
Inference pipeline types: real-time (web service) and batch.
Compute targets: Compute Instance, Compute Cluster, AKS, ACI.
Modules: 'Train Model', 'Score Model', 'Evaluate Model', 'Split Data', 'Clean Missing Data'.
Edge Cases and Exceptions
You can use 'Execute Python Script' to add custom code, but the exam may present this as a trick: 'Designer is code-free' is false because of this module.
Designer can import pipelines created with the SDK, but not vice versa without conversion.
The 'Apply SQL Transformation' uses SQLite syntax, not T-SQL.
How to Eliminate Wrong Answers
If a question asks about a no-code tool for building ML pipelines, eliminate options that mention writing code (unless they specify custom script modules). If it asks about automated algorithm selection, it's Automated ML, not Designer. If it asks about deploying a model for real-time scoring, think of inference pipelines in Designer. Remember: Designer is about building the pipeline visually; automated ML is about finding the best model algorithm.
Azure ML Designer is a no-code/low-code visual tool for building ML pipelines.
Pipelines consist of connected modules for data prep, training, scoring, and evaluation.
Pipelines run on remote compute targets (Compute Instance, Cluster, AKS, ACI).
Default split ratio in Split Data module is 0.5 (50/50).
Default datastore is workspaceblobstore.
Designer can create real-time and batch inference pipelines for deployment.
You can add custom Python or R scripts using 'Execute Python Script' or 'Execute R Script' modules.
Designer integrates with Automated ML via the 'Automated ML' module.
Pipelines can be scheduled, cloned, and exported as Python code.
The exam expects you to distinguish Designer from Automated ML and the SDK.
Azure ML Designer and Automated ML come up together on the exam all the time. Here's how to tell them apart.
Azure ML Designer
Visual drag-and-drop interface for building pipelines.
User manually selects modules and connects them.
Best for custom workflows with specific data transformations.
Requires user to choose the algorithm (though AutoML module can be used).
Can deploy as real-time or batch inference pipelines.
Automated ML
Automatically tries multiple algorithms and hyperparameters.
User only provides data and target metric; no manual pipeline building.
Best for quickly finding the best model for a given dataset.
No visual pipeline; results are presented as a list of models.
Deployment is separate; you deploy the best model from the experiment.
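The contrast in the comparison above can be made concrete with a toy sweep: instead of a person wiring up one algorithm by hand, a loop scores several candidates and reports the best by a chosen metric. This is only a conceptual sketch and says nothing about how Automated ML is implemented; the candidate models and data are hypothetical.

```python
def make_threshold_model(threshold):
    """Return a classifier that predicts 1 when the feature reaches threshold."""
    return lambda x: 1 if x >= threshold else 0


def accuracy(model, features, labels):
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in zip(features, labels)) / len(labels)


def sweep(candidates, features, labels):
    """Score every candidate and return (best_name, best_score, leaderboard)."""
    leaderboard = sorted(
        ((name, accuracy(model, features, labels)) for name, model in candidates.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    best_name, best_score = leaderboard[0]
    return best_name, best_score, leaderboard


# Hypothetical validation data: the label is 1 when the feature is 6 or more.
features = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
candidates = {f"threshold_{t}": make_threshold_model(t) for t in (2, 4, 6, 8)}

if __name__ == "__main__":
    best_name, best_score, leaderboard = sweep(candidates, features, labels)
    print("best:", best_name, best_score)
    for name, score in leaderboard:
        print(name, score)
```

The ranked list of candidates is the analogue of the list of models that Automated ML presents. In Designer you would instead place one specific algorithm module on the canvas yourself.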
Mistake
Azure ML Designer requires no coding at all.
Correct
While Designer is a no-code/low-code tool, it includes modules like 'Execute Python Script' and 'Execute R Script' that allow custom code. So it is not strictly code-free; it supports code when needed.
Mistake
Designer can only handle small datasets.
Correct
Designer can process large datasets if the compute target has sufficient memory and disk. The scale is limited by the compute resources, not by Designer itself.
Mistake
Designer pipelines run on your local machine.
Correct
Pipelines always run on a remote compute target in Azure, such as a Compute Instance or Compute Cluster. The Designer canvas is just a visual editor.
Mistake
Designer is the same as Automated ML.
Correct
Designer is a visual pipeline builder; Automated ML is a feature that automatically tries multiple algorithms and hyperparameters. They can be used together (AutoML module in Designer), but they are distinct.
Mistake
You cannot deploy models from Designer.
Correct
Designer allows you to create inference pipelines (real-time or batch) and deploy them directly to endpoints (ACI, AKS, or batch endpoints). Deployment is a key feature.
Azure ML Designer is a drag-and-drop visual interface for building machine learning pipelines. It allows you to manually select and connect pre-built modules for data transformation, training, scoring, and evaluation. Automated ML, on the other hand, automatically tries multiple algorithms and hyperparameters to find the best model for your data. You can use both together by adding an Automated ML module to a Designer pipeline. For the exam, remember that Designer is about visual pipeline construction, while Automated ML is about automated model selection.
Yes, Designer supports custom code. It includes 'Execute Python Script' and 'Execute R Script' modules that allow you to run your own code. This means Designer is not strictly no-code; it is a low-code tool. However, the primary interface is visual, and most common tasks can be done without coding. On the exam, if a question says Designer is 'code-free,' that statement is false because of these modules.
After training a model in a Designer pipeline, you can create an inference pipeline by clicking 'Create inference pipeline' in the designer toolbar. You can choose between a real-time inference pipeline (for online scoring) and a batch inference pipeline (for batch scoring). Then you deploy the inference pipeline to a compute target: real-time to ACI or AKS, batch to a batch endpoint. The deployment is done directly from the Designer interface.
Designer works with Compute Instances, Compute Clusters, Azure Kubernetes Service (AKS) clusters, and Azure Container Instances (ACI). For training, Compute Cluster is common for scalability. For real-time inference, ACI (for dev/test) or AKS (for production) are typical. For batch inference, you can use a batch endpoint backed by a Compute Cluster. The compute target must be attached to your workspace before use.
Yes, Designer pipelines can be scheduled. After submitting a pipeline run, you can schedule it to run periodically. In the pipeline draft, click 'Schedule' and set the frequency (e.g., daily, weekly) and start time. The schedule uses Azure Logic Apps or the built-in scheduler. This is useful for retraining models on new data or running batch scoring on a regular basis.
The default split ratio of the Split Data module is 0.5, meaning 50% of the data goes to the first output (typically training) and 50% to the second output (testing). You can change this to any value between 0 and 1. The exam may test this default value.
Designer itself does not impose a size limit; the limitation is the compute target's memory and disk. For large datasets, use a Compute Cluster with high-memory VMs (e.g., Standard_E16s_v3). The pipeline processes data in parallel if the compute cluster has multiple nodes. Also, use the 'Select Columns in Dataset' module early to reduce data size.