AI-900 Describe fundamental principles of machine learning on Azure Questions

1

MCQ · medium

What is the difference between a binary classification model and a multi-class classification model?

A. Binary classification uses numeric outputs; multi-class uses categorical outputs

B. Binary classification predicts two outcomes; multi-class predicts three or more outcomes

C. Binary is for images; multi-class is for text

D. Binary classification is always more accurate than multi-class

Answer: B

Binary = exactly two classes (positive/negative); multi-class = three or more distinct class labels.

Why this answer

Option B is correct because binary classification models are designed to predict exactly two possible outcomes (e.g., spam/not spam), while multi-class classification models predict three or more mutually exclusive classes (e.g., classifying images of cats, dogs, and birds). In Azure Machine Learning, binary classification algorithms like Logistic Regression output a single probability score, whereas multi-class algorithms like Multinomial Logistic Regression or One-vs-Rest meta-estimators output a probability distribution across all classes.
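The difference in output shape can be sketched in a few lines of plain Python. This is an illustrative sketch, not Azure Machine Learning code: the raw scores, class names, and the 0.5 threshold are made-up assumptions, and `sigmoid` and `softmax` stand in for whatever a trained model would compute.

```python
import math

def sigmoid(score):
    """Map one raw score to P(positive class) for a binary classifier."""
    return 1.0 / (1.0 + math.exp(-score))

def softmax(scores):
    """Map one raw score per class to a probability distribution."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Binary: a single probability; the other class is implied as 1 - p.
p_spam = sigmoid(0.8)  # 0.8 is a hypothetical raw model score
binary_label = "spam" if p_spam >= 0.5 else "not spam"

# Multi-class: one probability per class; predict the most likely class.
classes = ["cat", "dog", "bird"]
probs = softmax([2.0, 0.5, -1.0])  # hypothetical raw scores, one per class
multi_label = classes[probs.index(max(probs))]

print(binary_label, multi_label)
```

The binary model needs only one number because the second class is its complement; the multi-class model must spread probability across every label.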

Exam trap

The trap here is that candidates confuse the number of output classes with the type of data or output format, leading them to pick Option A or C, when the core distinction is simply the count of possible prediction outcomes.

How to eliminate wrong answers

Option A is wrong because both binary and multi-class classification models can output categorical labels or numeric probabilities; the distinction is not about output type but the number of classes. Option C is wrong because classification tasks are not inherently tied to data modality—binary classification can be applied to text (e.g., sentiment analysis) and multi-class to images (e.g., object recognition). Option D is wrong because accuracy depends on the dataset and problem complexity, not the number of classes; multi-class problems often have lower baseline accuracy due to more classes, but neither type is universally more accurate.

2

MCQ · medium

What is 'Bayesian optimisation' in hyperparameter tuning?

A. A statistical method for updating model confidence as new training data arrives

B. A smart hyperparameter search that uses past trial results to select promising configurations

C. An automatic method for adjusting learning rate during training based on gradient information

D. A probabilistic approach to labelling uncertain training examples

Answer: B

Bayesian optimisation builds a surrogate model of performance vs. hyperparameters — selecting next configs based on where improvement is likely.

Why this answer

Bayesian optimisation is a smart hyperparameter search method that builds a probabilistic model (typically a Gaussian process) of the objective function based on past trial results. It uses an acquisition function (e.g., Expected Improvement) to balance exploration and exploitation, selecting the most promising hyperparameter configurations to evaluate next. Because each new trial is informed by the earlier ones, it usually needs fewer trials than grid or random search, which matters most when every trial is expensive to run.
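To make the acquisition step concrete, here is a minimal sketch of Expected Improvement for a maximisation problem. It assumes a surrogate model has already produced a predicted mean and standard deviation for each candidate; the candidate learning rates and their predictions below are invented for illustration, and fitting the surrogate itself is not shown.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mean, std, best_so_far):
    """Expected Improvement when higher scores are better.

    mean, std: the surrogate's predicted score and uncertainty for a candidate.
    best_so_far: the best score observed in earlier trials.
    """
    if std <= 0.0:
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    return (mean - best_so_far) * normal_cdf(z) + std * normal_pdf(z)

# Hypothetical surrogate predictions: learning rate -> (mean, std) of validation accuracy.
candidates = {
    0.001: (0.80, 0.01),  # well explored and below the current best
    0.01: (0.84, 0.02),   # slightly above the current best, fairly certain
    0.1: (0.78, 0.10),    # below the current best on average, but very uncertain
}
best_so_far = 0.83

scores = {lr: expected_improvement(m, s, best_so_far) for lr, (m, s) in candidates.items()}
next_lr = max(scores, key=scores.get)
print(next_lr)
```

With these made-up numbers the highly uncertain candidate scores highest, which shows the exploration side of the trade-off: a region the surrogate knows little about can be worth a trial even when its predicted mean is lower.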

Exam trap

The trap here is that candidates confuse Bayesian optimisation with Bayesian inference for model parameters (Option A) or with adaptive learning rate algorithms (Option C), because both involve 'Bayesian' or 'optimisation' terminology but serve entirely different purposes.

How to eliminate wrong answers

Option A is wrong because it describes online learning or Bayesian updating of model parameters with new data, not hyperparameter tuning. Option C is wrong because it describes adaptive learning rate methods such as Adam or RMSProp, which adjust step sizes during training based on gradient statistics; that is not a search over hyperparameter space. Option D is wrong because it describes active learning or uncertainty sampling, which selects data points for human annotation, not hyperparameter optimisation.

3

MCQ · easy

A data scientist wants to train a model that predicts whether a customer will respond to a marketing offer (yes or no). The dataset includes features such as age, income, past purchase history, and the labeled outcome (responded or not responded) for previous customers. Which type of machine learning is this?

A. Supervised learning

B. Unsupervised learning

C. Reinforcement learning

D. Semi-supervised learning

Answer: A

Correct. The model is trained on labeled data (known outcomes) to predict a discrete class, making it a supervised classification task.

Why this answer

This is supervised learning because the dataset includes labeled outcomes (responded or not responded) for previous customers, which the model uses to learn a mapping from input features (age, income, past purchase history) to the correct output. The goal is to predict a categorical label (yes/no), making it a classification task within supervised learning.

Exam trap

The trap here is that candidates might confuse supervised learning with unsupervised learning, thinking that because the dataset has many features (age, income, etc.) it must be unsupervised clustering, but the presence of labeled outcomes clearly indicates supervised classification.

How to eliminate wrong answers

Option B is wrong because unsupervised learning does not use labeled data; it finds hidden patterns or clusters in unlabeled data, which is not the case here as the dataset includes the target outcome. Option C is wrong because reinforcement learning involves an agent learning through trial-and-error interactions with an environment to maximize a reward signal, not from a static labeled dataset. Option D is wrong because semi-supervised learning uses a mix of labeled and unlabeled data, but the question explicitly states the dataset includes labeled outcomes for previous customers, implying all data is labeled.

4

MCQ · medium

What does it mean for an ML model to 'generalize'?

A. Making the model work for all programming languages and platforms

B. The model's ability to perform well on new, unseen data by learning underlying patterns rather than memorizing training examples

C. Making the model output descriptions in plain language for non-technical users

D. Training a model that works for all possible tasks without specialization

Answer: B

Generalization = good performance on new data. Models that memorize training data (overfit) fail to generalize to real-world inputs.

Why this answer

Generalization in machine learning refers to the model's ability to accurately predict outcomes on new, unseen data by learning the true underlying patterns from the training data, rather than simply memorizing the training examples (overfitting). A model that generalizes well will maintain high performance on a validation or test dataset that was not used during training, which is a core requirement for deploying reliable ML solutions in Azure Machine Learning.

Exam trap

The trap here is that candidates confuse 'generalization' with 'general-purpose' or 'multi-platform' support, leading them to choose options A or D, when the correct focus is solely on the model's performance on unseen data within its trained domain.

How to eliminate wrong answers

Option A is wrong because generalization is about performance on new data, not about compatibility with programming languages or platforms; portability across runtimes (for example, exporting a model to ONNX) is a separate deployment concern. Option C is wrong because generalization does not involve generating plain-language descriptions; that describes model interpretability or explainability tools (e.g., Azure ML's model explanations), not the core concept of generalization. Option D is wrong because generalization does not mean a single model works for all tasks; it means the model performs well on unseen data for its specific task, and a model trained for one domain (e.g., image classification) cannot generalize to unrelated tasks (e.g., sentiment analysis).

5

MCQ · hard

A data science team trains a regression model to predict house prices. They evaluate the model using Mean Absolute Error (MAE). After deployment, they notice that the model occasionally produces large errors (e.g., underpredicting a luxury home by $500,000) while most predictions are within $20,000. The business is more concerned about the impact of these large errors than the average small error. Which additional metric should the team use to better capture the penalty for large errors?

A. Root Mean Squared Error (RMSE)

B. R-squared

C. Mean Absolute Percentage Error (MAPE)

D. F1 score

Answer: A

RMSE squares errors, so large errors contribute much more to the metric, reflecting their negative impact.

Why this answer

Root Mean Squared Error (RMSE) is the correct additional metric because it squares the residuals before averaging, which heavily penalizes large errors like the $500,000 underprediction. MAE weights each error in direct proportion to its size, so a single large miss is diluted by many small ones; RMSE weights errors quadratically, so large misses dominate the score. That makes it a better fit for a business that cares more about catastrophic failures than typical small errors.
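A small worked example shows the effect. The error values below are invented to mirror the scenario (nine predictions off by $20,000 and one off by $500,000); they are not taken from any real dataset.

```python
import math

def mae(errors):
    """Mean Absolute Error: average size of the errors."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root Mean Squared Error: square, average, then take the root."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

typical = [20_000] * 10                  # every prediction off by $20,000
with_outlier = [20_000] * 9 + [500_000]  # one luxury home missed by $500,000

for name, errors in (("typical", typical), ("with outlier", with_outlier)):
    print(f"{name}: MAE={mae(errors):,.0f}  RMSE={rmse(errors):,.0f}")
```

When all errors are equal, the two metrics agree. With the single outlier, MAE rises to $68,000 while RMSE rises to roughly $159,000, so the gap between the two is itself a signal that a few large errors are present.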

Exam trap

The trap here is that candidates gravitate to MAPE or R-squared because they sound like intuitive summaries of 'average error' or overall fit, and overlook that RMSE's squared term is what makes large errors count disproportionately, which is the exact business concern described.

How to eliminate wrong answers

Option B (R-squared) is wrong because it measures the proportion of variance explained by the model, not the magnitude or penalty of individual errors, so it does not specifically penalize large errors. Option C (Mean Absolute Percentage Error, MAPE) is wrong because it expresses error as a percentage, which can be misleading when actual values are small or zero, and it still averages errors without squaring, so it does not disproportionately penalize large absolute errors. Option D (F1 score) is wrong because it is a classification metric that combines precision and recall, and it is not applicable to regression tasks like house price prediction.

6

MCQ · medium

A retail company wants to analyze customer purchase histories to identify natural groups of customers with similar buying patterns. They do not have predefined categories. Which type of machine learning should they use?

A. Reinforcement learning

B. Supervised classification

C. Unsupervised clustering

D. Supervised regression

Answer: C

Unsupervised clustering finds natural groupings in unlabeled data, making it the correct choice for identifying customer segments based on purchase behavior.

Why this answer

Unsupervised clustering is the correct approach because the company wants to discover natural groupings in customer purchase histories without predefined labels. Clustering algorithms, such as K-Means or DBSCAN, partition data into clusters based on feature similarity, enabling the identification of customer segments with similar buying patterns without any prior training on labeled examples.
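As a rough illustration of how K-Means finds groups without labels, the sketch below runs the assign-then-update loop on one made-up feature (monthly spend). The data, the choice of three clusters, and the starting centroids are all assumptions for the example; real customer data would have many features and would normally use a library implementation.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Lloyd's algorithm on 1-D data with caller-supplied starting centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

monthly_spend = [12, 15, 18, 95, 100, 110, 480, 500, 520]  # made-up values
centroids, clusters = kmeans_1d(monthly_spend, centroids=[0, 100, 600])
print(clusters)
```

No label such as "low", "medium", or "high spender" is ever supplied; the three groups emerge from the distances between values alone.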

Exam trap

The trap here is that candidates may confuse unsupervised clustering with supervised classification because both involve grouping, but classification requires predefined labels while clustering discovers groups from unlabeled data.

How to eliminate wrong answers

Option A is wrong because reinforcement learning involves an agent learning to make decisions by interacting with an environment to maximize cumulative reward, which is not applicable to grouping static historical purchase data. Option B is wrong because supervised classification requires labeled training data with predefined categories, but the problem explicitly states there are no predefined categories. Option D is wrong because supervised regression predicts a continuous numeric value (e.g., future spending amount) rather than discovering natural groups in data.

7

MCQ · medium

What is 'model lineage' in Azure Machine Learning?

A. The family tree of model architectures showing which models inspired the design

B. A tracked history of the dataset, code, hyperparameters, and compute used to produce a model

C. The geographic lineage of training data showing which regions it was collected from

D. The sequence of model versions deployed to production over time

Answer: B

Lineage enables complete reproduction and audit — tracking every input that produced each model version for debugging and compliance.

Why this answer

Model lineage in Azure Machine Learning is a tracked history that captures the complete lifecycle of a model, including the dataset, code, hyperparameters, and compute environment used to produce it. This is essential for reproducibility, auditability, and governance, as it allows data scientists to trace exactly how a model was trained and which artifacts were involved. Azure ML automatically logs this lineage through its run history and model registry, ensuring every model version is linked to its training run.

Exam trap

The trap here is that candidates confuse model lineage with simple versioning or deployment history, overlooking that it specifically includes the complete provenance of data, code, and compute used during training, not just the sequence of model versions.

How to eliminate wrong answers

Option A is wrong because model lineage is not about the conceptual family tree of architectures or design inspiration; it is a concrete, automated record of the specific resources used in a training run. Option C is wrong because model lineage does not track geographic origins of training data; while data provenance may include location, lineage focuses on the exact datasets, code, and parameters used, not regional collection details. Option D is wrong because model lineage is broader than just deployment version sequences; it encompasses the entire training lifecycle, including the data, code, and compute, not merely the order of production deployments.

8

MCQ · medium

What is hyperparameter tuning in machine learning?

A. Adjusting the training data labels to improve model accuracy

B. Searching for the best training configuration settings (learning rate, layers, etc.) to optimize model performance

C. Reducing the number of features used by the model

D. Updating model weights based on new production data

Answer: B

Hyperparameter tuning finds optimal training settings (learning rate, depth, etc.) that produce the best performing model on validation data.

Why this answer

Hyperparameter tuning is the process of systematically searching for the best combination of hyperparameters—such as learning rate, number of layers, batch size, or regularization strength—that control the training process itself, rather than being learned from data. In Azure Machine Learning, this is often automated using tools like HyperDrive, which runs multiple child runs with different hyperparameter configurations to find the set that maximizes model performance on a validation set.
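The idea of running several trials and keeping the best configuration can be sketched with a plain grid search. This is not HyperDrive code: the tiny dataset, the one-weight model, and the grid values are assumptions chosen so the example runs with the standard library only.

```python
from itertools import product

# Made-up data that roughly follows y = 2 * x.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
valid = [(5.0, 10.1), (6.0, 11.9)]

def fit(learning_rate, epochs):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in train) / len(train)
        w -= learning_rate * grad
    return w

def mse(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

# Hyperparameters are fixed before each training run; the weight w is learned.
grid = {"learning_rate": [0.001, 0.01, 0.2], "epochs": [5, 50]}

results = []
for lr, ep in product(grid["learning_rate"], grid["epochs"]):
    w = fit(lr, ep)                       # one trial per configuration
    results.append((mse(w, valid), lr, ep))

best_mse, best_lr, best_epochs = min(results)
print(best_lr, best_epochs)
```

Each configuration is judged on the validation set, not the training set. In this toy setup the smallest learning rate trains too slowly and the largest makes the updates diverge, which is the kind of behaviour tuning is meant to catch.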

Exam trap

The trap here is that candidates confuse hyperparameter tuning with model training itself (weight updates) or with data preparation steps (label correction, feature reduction), because all involve 'adjusting' something to improve accuracy, but only hyperparameter tuning searches over algorithm configuration settings that are set before training begins.

How to eliminate wrong answers

Option A is wrong because adjusting training data labels (e.g., relabeling or correcting mislabeled data) is a data quality or data preprocessing step, not hyperparameter tuning; hyperparameters are settings that govern the training algorithm, not the data itself. Option C is wrong because reducing the number of features (dimensionality reduction or feature selection) is a data preprocessing technique to simplify the model or avoid overfitting, not a search over training configuration settings. Option D is wrong because updating model weights based on new production data describes online learning or model retraining (often via continuous integration/continuous deployment pipelines), not the pre-training search for optimal hyperparameters.

9

MCQ · easy

A retail company wants to predict the exact number of units of a product that will be sold next month. They have historical sales data and information about promotions and holidays. The target variable is the number of units sold, which is a continuous value. Which type of machine learning task should they perform?

A. Binary classification

B. Multiclass classification

C. Regression

D. Clustering

Answer: C

Regression is designed to predict continuous numerical values, such as the exact number of units sold.

Why this answer

Regression is the correct choice because the target variable—number of units sold—is a continuous numeric value. Regression algorithms, such as linear regression or decision forest regression, are designed to predict a numeric quantity from historical features like sales data, promotions, and holidays. In Azure Machine Learning, regression models output a real number, making them ideal for this forecasting scenario.
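For intuition, a one-feature least-squares fit is enough to show a regression model returning a number rather than a class. The promotion-day and sales figures below are invented and exactly linear; a real forecast would use more features and a library or Azure Machine Learning component.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

promo_days = [0, 2, 4, 6, 8]            # hypothetical promotion days per month
units_sold = [100, 120, 140, 160, 180]  # hypothetical units sold in those months

slope, intercept = fit_line(promo_days, units_sold)
forecast = slope * 5 + intercept        # predicted units for a month with 5 promo days
print(forecast)
```

The prediction for five promotion days falls between the observed values for four and six days, something a classifier restricted to fixed categories could not express.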

Exam trap

The trap here is that candidates confuse predicting a numeric quantity (regression) with classification, especially when the target is a small whole-number count; the key distinction is that the target is a number on a scale, not one of a fixed set of categories.

How to eliminate wrong answers

Option A is wrong because binary classification predicts one of two discrete categories (e.g., yes/no), not a continuous number like units sold. Option B is wrong because multiclass classification predicts one of three or more discrete classes (e.g., product categories), not a continuous value. Option D is wrong because clustering groups unlabeled data into clusters based on similarity, without predicting a specific numeric target; it is an unsupervised learning task, not a supervised prediction of a continuous variable.

10

MCQ · hard

A data scientist trains a regression model to predict housing prices. The model uses polynomial features up to degree 5. It achieves an R-squared of 0.95 on the training set but only 0.60 on the test set. Which problem is the model most likely experiencing?

A. Underfitting

B. Overfitting

C. Data leakage

D. Multicollinearity

Answer: B

Correct: The model performs well on training data but poorly on new data, indicating overfitting.

Why this answer

The model performs exceptionally well on the training data (R-squared 0.95) but poorly on the test data (R-squared 0.60), which is the classic symptom of overfitting. Using polynomial features up to degree 5 introduces high model complexity, causing the model to learn noise and specific patterns in the training set that do not generalize to unseen data.

Exam trap

The trap here is that candidates may confuse overfitting with underfitting, but the key indicator is the large gap between high training performance and low test performance, not uniformly low performance.

How to eliminate wrong answers

Option A is wrong because underfitting would show poor performance on both training and test sets, not a large gap between them. Option C is wrong because data leakage would typically cause artificially high performance on both sets, not a significant drop on the test set. Option D is wrong because multicollinearity affects coefficient stability and interpretability but does not inherently cause a large training-test performance gap; it can exist even in a well-generalized model.

11

MCQ · medium

A data scientist trains a deep neural network on a small dataset. The model achieves 100% accuracy on the training data but only 60% accuracy on a validation set. Which technique is most appropriate to address this issue?

A. Increase the number of training epochs

B. Add more hidden layers

C. Apply regularization

D. Increase the learning rate

Answer: C

Regularization adds constraints to the model to prevent overfitting by discouraging overly complex patterns.

Why this answer

The model's perfect training accuracy (100%) paired with poor validation accuracy (60%) is a classic sign of overfitting, where the model has memorized the training data rather than learning generalizable patterns. Regularization techniques (e.g., L1/L2 regularization, dropout) penalize large weights or randomly drop neurons during training, which forces the network to learn simpler, more robust features and reduces overfitting on small datasets.
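The weight-shrinking effect of an L2 penalty is easiest to see on a one-weight linear model rather than a neural network. The sketch below uses the closed-form minimiser of squared error plus an L2 term; the data and penalty values are assumptions for illustration only.

```python
def ridge_weight(xs, ys, l2_penalty):
    """Closed-form minimiser of sum((w*x - y)^2) + l2_penalty * w^2 for y = w * x."""
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs) + l2_penalty
    return numerator / denominator

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # made-up data lying exactly on y = 2 * x

# A larger penalty pulls the learned weight further toward zero.
weights = {lam: ridge_weight(xs, ys, lam) for lam in (0.0, 1.0, 10.0)}
print(weights)
```

With no penalty the fit recovers the weight 2.0 exactly; as the penalty grows, the weight shrinks. In a large network the same pressure discourages the extreme weights that let a model memorize a small training set.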

Exam trap

The trap here is that candidates often confuse overfitting with underfitting and incorrectly choose options that increase model complexity (more layers or epochs) or speed up training (higher learning rate), rather than recognizing that regularization is the standard technique to combat overfitting.

How to eliminate wrong answers

Option A is wrong because increasing the number of training epochs would allow the model to further memorize the training data, worsening overfitting and potentially decreasing validation accuracy even more. Option B is wrong because adding more hidden layers increases model capacity and complexity, which exacerbates overfitting on a small dataset rather than mitigating it. Option D is wrong because increasing the learning rate can cause the optimizer to overshoot minima, leading to unstable training or divergence, and does not address the core issue of overfitting.

12

MCQ · easy

What is 'training data' vs 'test data' in machine learning?

A. Training data is collected first; test data is older data from an archive

B. Training data fits the model; test data provides an unbiased estimate of real-world performance

C. Training data is labelled by humans; test data is labelled automatically by the model

D. Test data is always larger than training data to ensure reliable evaluation

Answer: B

Training data teaches the model; test data (never seen during training) gives the honest measure of generalisation.

Why this answer

Option B is correct because training data is used to fit the model's parameters (e.g., weights in a neural network or split criteria in a decision tree), while test data is held back and used only after training to evaluate the model's performance on unseen data. This separation provides an unbiased estimate of how the model will generalize to real-world data, which is how overfitting is detected. In Python workflows, including those run on Azure Machine Learning, the split is commonly made with scikit-learn's `train_test_split` function; automated ML can also hold out validation data for you.
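A hold-out split can be sketched in plain Python to show what any splitting helper has to guarantee. The function name `split_train_test`, the 80/20 ratio, and the seed are illustrative assumptions, not part of any library.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then hold out a fraction as test data."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(100))  # stand-in for 100 labelled examples
train_rows, test_rows = split_train_test(rows)

print(len(train_rows), len(test_rows))
```

The two properties that matter are that no row appears in both sets and that the test rows are never used while fitting the model.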

Exam trap

The trap here is that candidates confuse the purpose of the split (chronological order or labeling method) with the fundamental principle that test data must remain unseen during training to provide an unbiased performance estimate.

How to eliminate wrong answers

Option A is wrong because training data is not necessarily collected first; the chronological order of data collection is irrelevant—the key distinction is how the data is used during the model development lifecycle. Option C is wrong because both training and test data can be labeled by humans (e.g., in supervised learning), and test data is never labeled automatically by the model; the model's predictions on test data are compared against ground-truth labels to compute performance metrics. Option D is wrong because test data is typically smaller than training data (common splits are 70-80% training, 20-30% test) to ensure the model has enough data to learn patterns while still reserving a representative sample for evaluation.

13

MCQ · medium

A data scientist trains a model to predict house prices. The model achieves 99% accuracy on the training data but only 80% accuracy on new test data. Which technique is most likely to help improve the model's generalization?

A. Reduce the amount of training data

B. Apply regularization to the model

C. Remove some features from the dataset

D. Increase the number of layers in the neural network

Answer: B

Regularization (e.g., L1 or L2) discourages overly complex models by penalizing large coefficients, which helps reduce overfitting and improves performance on unseen data.

Why this answer

The model is overfitting: it has memorized the training data (99% accuracy) but fails to generalize to new data (80% accuracy). Regularization (e.g., L1 or L2) penalizes large weights, reducing the model's complexity and forcing it to learn simpler patterns that generalize better. This directly addresses the variance problem without discarding useful information.

Exam trap

The trap here is that candidates often confuse overfitting with underfitting and choose to increase model complexity (Option D) or reduce data (Option A), when the correct response is to simplify the model via regularization.

How to eliminate wrong answers

Option A is wrong because reducing training data would make the overfitting worse, as the model would have even fewer examples to learn from, increasing variance. Option C is wrong because removing features arbitrarily could discard important predictive signals; feature selection should be done carefully (e.g., via correlation analysis or regularization like Lasso), not as a blunt fix for overfitting. Option D is wrong because increasing the number of layers in a neural network increases model capacity, which would exacerbate overfitting rather than reduce it.

14

MCQ · medium

What is 'model versioning' and why is it essential in MLOps?

A. Updating the Python version used to run ML training scripts

B. Tracking each iteration of a trained model for rollback, A/B testing, auditing, and reproducibility

C. Releasing new features of the Azure ML service as versioned API updates

D. Managing multiple versions of training data used by different model experiments

Answer: B

Model versioning enables safe updates (rollback when new version fails), experimentation (A/B test), and regulatory compliance (audit trail).

Why this answer

Model versioning is the practice of tracking each iteration of a trained model, including its hyperparameters, training data snapshot, and evaluation metrics. In MLOps, it is essential because it enables rollback to a previous model if a new version performs poorly, supports A/B testing by comparing multiple model versions in production, provides an audit trail for compliance, and ensures reproducibility by capturing the exact code, data, and environment used to train each version.

Exam trap

The trap here is that candidates confuse model versioning with data versioning or environment versioning, but the question specifically asks about tracking the trained model artifact itself for rollback, A/B testing, auditing, and reproducibility.

How to eliminate wrong answers

Option A is wrong because updating the Python version used to run ML training scripts is a dependency management task, not model versioning; model versioning focuses on tracking the model artifact and its metadata, not the runtime language version. Option C is wrong because releasing new features of the Azure ML service as versioned API updates is a platform-level operation managed by Microsoft, not a practice performed by data scientists or MLOps engineers to manage their own models. Option D is wrong because managing multiple versions of training data is a data versioning concern, which is a separate but complementary practice to model versioning; model versioning specifically tracks the trained model artifact and its associated metadata, not the data itself.

15

MCQ · hard

What is 'gradient boosting' and how does it differ from random forests?

A. Gradient boosting uses deep neural networks; random forests use shallow trees

B. Gradient boosting trains trees sequentially to correct prior errors; random forests train trees independently in parallel

C. Random forests always outperform gradient boosting for structured data

D. Gradient boosting requires GPUs; random forests work only on CPUs

Answer: B

Gradient boosting: each tree corrects the residuals left by the previous trees. Random forests: independent trees whose predictions are averaged. Boosting often reaches higher accuracy; forests are easier to train in parallel.

Why this answer

Gradient boosting is an ensemble technique that builds trees sequentially, where each new tree is fitted to the errors (residuals) of the trees before it, which amounts to following the negative gradient of a loss function. In contrast, random forests build multiple decision trees independently, and potentially in parallel, using bootstrapped samples and random feature selection, then average their predictions (or take a majority vote for classification). This sequential error-correction process is the key difference, making option B correct.
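The sequential error-correction loop can be sketched with one-split trees (stumps) and squared loss, where the negative gradient is simply the residual. Everything here is a simplified assumption: one input feature, made-up data, a fixed learning rate of 0.5, and no subsampling or regularisation as production libraries would add.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump: (threshold, left value, right value)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def predict(base, stumps, learning_rate, x):
    total = base
    for threshold, left_value, right_value in stumps:
        total += learning_rate * (left_value if x <= threshold else right_value)
    return total

def boost(xs, ys, rounds, learning_rate=0.5):
    base = sum(ys) / len(ys)  # round 0: predict the mean everywhere
    stumps = []
    for _ in range(rounds):
        # Each new stump is fitted to what the current ensemble still gets wrong.
        residuals = [y - predict(base, stumps, learning_rate, x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return base, stumps

def mse(xs, ys, base, stumps, learning_rate=0.5):
    return sum((y - predict(base, stumps, learning_rate, x)) ** 2
               for x, y in zip(xs, ys)) / len(xs)

xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 3.1, 3.0, 6.2, 5.9]  # made-up targets

errors = [mse(xs, ys, *boost(xs, ys, rounds)) for rounds in (0, 1, 5, 20)]
print(errors)
```

Because every stump depends on the residuals of the ensemble so far, the rounds cannot be run in parallel, whereas the trees of a random forest can. Note that the falling error here is training error; on real data, too many rounds can overfit.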

Exam trap

The trap here is that candidates may confuse ensemble methods and assume gradient boosting uses deep learning (like neural networks) or that random forests are always superior, when the core distinction lies in sequential vs. parallel tree construction and the underlying optimization approach.

How to eliminate wrong answers

Option A is wrong because gradient boosting does not use deep neural networks; it usually combines many shallow decision trees, while random forests tend to grow deeper trees, and both rely on decision trees rather than neural networks. Option C is wrong because random forests do not always outperform gradient boosting for structured data; in practice, gradient boosting (e.g., XGBoost, LightGBM) often achieves higher accuracy on structured/tabular data due to its sequential optimization, though it can overfit if not tuned. Option D is wrong because gradient boosting does not require GPUs; it runs efficiently on CPUs, as do random forests. Some libraries offer optional GPU acceleration for both, but neither method is restricted to particular hardware.

16

MCQ · medium

A data scientist trains a machine learning model on historical sales data to predict future sales volume. The model achieves 99% accuracy on the training dataset but only 75% accuracy on a separate test dataset. What is the most likely issue with this model?

A. Underfitting

B. Overfitting

C. High bias

D. High variance

Answer: B

Overfitting occurs when the model performs very well on training data but poorly on test data due to memorizing training examples instead of learning general patterns.

Why this answer

The model's 99% accuracy on the training set versus 75% on the test set indicates it has memorized the training data, including noise and outliers, rather than learning generalizable patterns. This classic symptom of overfitting occurs when the model is too complex relative to the amount or variability of the training data, causing poor performance on unseen data.

Exam trap

The trap here is that candidates confuse 'high variance' with 'overfitting' as separate concepts, when in fact high variance is the statistical cause of overfitting, but the exam expects 'overfitting' as the direct answer describing the model's behavior.

How to eliminate wrong answers

Option A is wrong because underfitting would show low accuracy on both training and test sets, not high training accuracy with a significant drop. Option C is wrong because high bias typically leads to underfitting, where the model fails to capture patterns even in the training data, resulting in low training accuracy. Option D is closely related rather than flatly wrong: high variance is the statistical property (sensitivity to fluctuations in the training data) that causes overfitting. The question asks for the 'most likely issue,' and overfitting names the observed behavior directly, so B is the better answer.

17

MCQmedium

A robotics team is training a robot to navigate a maze. The robot receives a positive reward (+10) when it reaches the exit and a negative reward (-1) every time it bumps into a wall. The robot learns to maximize its cumulative reward over multiple trials. Which type of machine learning is being used?

A.Reinforcement learning

B.Supervised learning

C.Unsupervised learning

D.Semi-supervised learning

AnswerA

The robot receives rewards based on its actions and learns to maximize them, which is the core principle of reinforcement learning.

Why this answer

The robot learns by interacting with its environment, receiving rewards (positive for reaching the exit, negative for bumping into walls), and adjusting its behavior to maximize cumulative reward over time. This trial-and-error learning process, where an agent learns a policy through feedback from its actions, is the defining characteristic of reinforcement learning.

Exam trap

The trap here is that candidates may confuse reinforcement learning with supervised learning because both involve 'learning from feedback,' but they fail to recognize that reinforcement learning uses evaluative feedback (rewards) rather than instructive feedback (labeled examples).

How to eliminate wrong answers

Option B (Supervised learning) is wrong because the robot does not have a labeled dataset of correct actions for each state; it learns from reward signals, not from input-output pairs. Option C (Unsupervised learning) is wrong because the robot is not discovering hidden patterns or clusters in unlabeled data; it is actively optimizing a reward function through interaction. Option D (Semi-supervised learning) is wrong because the robot does not combine a small amount of labeled data with a large amount of unlabeled data; it relies solely on reward feedback from its environment.
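
The reward-driven loop described above can be sketched with tabular Q-learning, one common reinforcement learning algorithm. The question does not say which algorithm the robot uses, so Q-learning is an assumption here, and the maze is simplified to a five-cell corridor with a wall on the left and the exit on the right. The rewards (+10 for the exit, -1 for hitting a wall) follow the question; the learning rate, discount factor, and exploration rate are illustrative.

```python
import random

N_STATES = 5          # corridor cells 0..4; the exit is cell 4
ACTIONS = (-1, +1)    # move left or move right
EXIT = N_STATES - 1


def step(state: int, action: int) -> tuple[int, float, bool]:
    """Apply an action; return (next_state, reward, done)."""
    nxt = state + action
    if nxt < 0:
        return state, -1.0, False   # bumped into the left wall
    if nxt == EXIT:
        return nxt, 10.0, True      # reached the exit
    return nxt, 0.0, False


def train(episodes: int = 300, alpha: float = 0.5, gamma: float = 0.9,
          epsilon: float = 0.2, seed: int = 0) -> list[list[float]]:
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward reward + discounted best next value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q


q_table = train()
policy = ["left" if left > right else "right" for left, right in q_table[:EXIT]]
print(policy)
```

No labeled "correct action" is ever supplied: the agent only sees the reward returned by `step`, which is the evaluative feedback that separates reinforcement learning from supervised learning.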

18

MCQmedium

What is 'ensemble learning' in machine learning?

A.Training a single very large model on an ensemble of diverse datasets

B.Combining predictions from multiple models to produce a better overall prediction

C.Using a musical ensemble to record training audio data

D.Deploying a model to multiple Azure regions simultaneously

AnswerB

Ensemble methods (Random Forest, Gradient Boosting, Stacking) combine multiple model outputs, often achieving better accuracy than any single model.

Why this answer

Ensemble learning improves predictive performance by combining the outputs of multiple individual models (e.g., decision trees, neural networks) to reduce variance, bias, or noise. This technique leverages the 'wisdom of the crowd' principle, where the aggregated prediction often outperforms any single model, as seen in methods like Random Forest (bagging) or Gradient Boosting (boosting).

Exam trap

The trap here is that candidates confuse 'ensemble' with 'large dataset' or 'deployment scale,' leading them to pick options that describe data diversity or infrastructure redundancy rather than the core concept of combining multiple models.

How to eliminate wrong answers

Option A is wrong because training a single very large model, however diverse its training data, still yields one model; ensemble learning requires multiple models whose predictions are combined. Option C is wrong because it confuses the term 'ensemble' with a musical group, which has no relevance to machine learning algorithms or model aggregation. Option D is wrong because deploying a model to multiple Azure regions is a geo-redundancy or load-balancing strategy, not a technique for improving prediction accuracy through model combination.

19

MCQmedium

A data scientist trains a classification model to predict whether an email is 'phishing' or 'legitimate'. The model achieves 99% accuracy on the training data but only 68% accuracy on the test data. Which action is most likely to help improve the model's generalization performance?

A.Increase the number of training epochs significantly.

B.Apply regularization techniques such as L1 or L2 regularization.

C.Remove some of the training data to make the dataset smaller.

D.Add more layers and neurons to the neural network.

AnswerB

Regularization adds a penalty for large weights, discouraging overly complex models. This helps reduce overfitting and improves performance on unseen data.

Why this answer

The model's high training accuracy (99%) paired with much lower test accuracy (68%) is a classic sign of overfitting, where the model has memorized the training data rather than learning generalizable patterns. Regularization techniques like L1 (Lasso) or L2 (Ridge) add a penalty to the loss function that discourages overly complex models by shrinking the weights of less important features, directly reducing overfitting and improving generalization on unseen data.

Exam trap

The trap here is that candidates often confuse high training accuracy with good model performance and incorrectly assume that more training or more complexity will fix the issue, when in fact the problem is overfitting and calls for regularization or a simpler model.

How to eliminate wrong answers

Option A is wrong because increasing the number of training epochs significantly would allow the model to continue learning from the training data, likely worsening overfitting by further memorizing noise and details specific to the training set. Option C is wrong because removing training data reduces the amount of information available for learning, which typically makes overfitting worse rather than improving generalization. Option D is wrong because adding more layers and neurons increases model capacity and complexity, which exacerbates overfitting when the model already has sufficient capacity to memorize the training data.
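
To make the weight-shrinking effect of an L2 penalty concrete, here is a minimal sketch for a one-feature linear model without an intercept. Minimizing the sum of squared errors plus `lam * w**2` has the closed-form solution used below. The data points and the name `ridge_weight` are made up for illustration.

```python
def ridge_weight(xs, ys, lam):
    """Closed-form weight for y ~ w * x with an L2 penalty lam * w**2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Setting the derivative of sum((y - w*x)**2) + lam * w**2 to zero
    # gives w = sxy / (sxx + lam).
    return sxy / (sxx + lam)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
for lam in (0.0, 1.0, 10.0):
    print(lam, round(ridge_weight(xs, ys, lam), 3))
```

As `lam` grows, the fitted weight moves toward zero, which is how the penalty discourages a model from fitting the training data too closely.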

20

MCQeasy

What is 'Azure Machine Learning workspace' and what does it contain?

A.A physical office space at Microsoft where ML engineers develop Azure AI services

B.The top-level Azure resource that organises all ML artefacts including models, experiments, and compute for a project

C.A virtual desktop environment pre-configured with ML tools for data scientists

D.A shared document repository for storing ML project documentation and reports

AnswerB

The workspace is the ML project container — holding experiments, models, datasets, compute, and pipelines for team collaboration.

Why this answer

Option B is correct because an Azure Machine Learning workspace is the top-level Azure resource that serves as a centralized hub for all machine learning activities. It contains essential artifacts such as datasets, experiments, models, pipelines, compute targets (e.g., compute clusters, inference clusters), and endpoints, enabling end-to-end ML lifecycle management within a single project.

Exam trap

The trap here is that candidates confuse the workspace with a virtual machine or desktop environment (like Azure Data Science Virtual Machine) because both are used in ML workflows, but the workspace is a logical resource container, not a compute environment.

How to eliminate wrong answers

Option A is wrong because an Azure Machine Learning workspace is not a physical office space; it is a cloud-based Azure resource that organizes ML artifacts and compute resources, not a physical location at Microsoft. Option C is wrong because it describes a virtual desktop environment (like Azure Data Science Virtual Machine), not the workspace itself; the workspace is a management layer that can orchestrate compute resources but is not a pre-configured desktop. Option D is wrong because while documentation can be stored in associated storage accounts, the workspace is not merely a document repository; it is a comprehensive resource for managing ML experiments, models, and compute, with documentation being only a minor aspect.

21

MCQeasy

What does 'deep learning' refer to in machine learning?

A.Machine learning that requires an internet connection to function

B.Machine learning using neural networks with many layers to learn hierarchical representations

C.A technique for training models on extremely large datasets only

D.Machine learning that digs deeply into structured databases

AnswerB

Deep learning uses deep (many-layered) neural networks that learn increasingly complex representations from raw data.

Why this answer

Deep learning is a subset of machine learning that uses neural networks with multiple layers (deep neural networks) to automatically learn hierarchical representations of data. Each layer extracts increasingly abstract features, enabling the model to capture complex patterns without manual feature engineering. This is why option B is correct.

Exam trap

The trap here is that candidates confuse 'deep learning' with simply 'more data' or 'complex databases,' when the core differentiator is the use of multi-layered neural networks for hierarchical feature learning.

How to eliminate wrong answers

Option A is wrong because deep learning does not require an internet connection to function; models can be trained and run for inference locally on hardware like GPUs. Option C is wrong because deep learning can be applied to datasets of various sizes, not only extremely large ones, though larger datasets often improve performance. Option D is wrong because deep learning is not about digging into structured databases; 'deep' refers to the number of neural network layers, and the technique is most often applied to unstructured data like images, text, and audio.

22

MCQhard

A data scientist has a small dataset with only 200 labeled samples. They want to get a reliable estimate of model performance without using a separate validation set that would reduce the training data. Which technique should the data scientist use in Azure Machine Learning to obtain this reliable estimate?

A.Hold-out validation

B.k-fold cross-validation

C.Data augmentation

D.Principal Component Analysis (PCA)

AnswerB

k-fold cross-validation iteratively trains on different subsets of data and validates on the held-out fold, using all data for both purposes and yielding a stable performance estimate.

Why this answer

B is correct because k-fold cross-validation splits the small dataset into k folds, trains the model on k-1 folds, and validates on the remaining fold, repeating this process k times. This provides a reliable performance estimate by using all 200 samples for both training and validation without requiring a separate hold-out set, which is critical for small datasets in Azure Machine Learning.

Exam trap

The trap here is that candidates might confuse data augmentation (Option C) as a validation technique, but it is a data preprocessing method to expand the dataset, not a method for obtaining a reliable performance estimate.

How to eliminate wrong answers

Option A is wrong because hold-out validation reserves a portion of the data (e.g., 20-30%) as a separate validation set, which reduces the training data and can lead to unreliable estimates on a small dataset of only 200 samples. Option C is wrong because data augmentation is a technique to artificially increase the size of the training dataset by creating modified copies of existing samples, but it does not directly provide a reliable estimate of model performance; it is used to improve model generalization, not to evaluate it. Option D is wrong because Principal Component Analysis (PCA) is a dimensionality reduction technique used to reduce the number of features, not a method for estimating model performance or validating a model.
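
The fold rotation described above can be sketched in plain Python. This is a minimal illustration, not the Azure Machine Learning implementation; `k_fold_indices`, `cross_validate`, and the `fit_and_score` callback are hypothetical names introduced for the example.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k folds of near-equal size
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


def cross_validate(n_samples, k, fit_and_score):
    """Average the score returned by fit_and_score(train_idx, val_idx) over k folds."""
    scores = [fit_and_score(tr, va) for tr, va in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# With 200 samples and k = 5, every round trains on 160 and validates on 40.
for train_idx, val_idx in k_fold_indices(200, 5):
    print(len(train_idx), len(val_idx))
```

Every one of the 200 samples is used for validation once and for training four times, so no data is permanently set aside.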

23

MCQmedium

What is 'cross-validation' and when should it be used in machine learning?

A.Validating that a model works correctly across different Azure regions

B.Dividing data into k folds and training k times to get a more reliable performance estimate

C.Comparing two different models' predictions on the same test set

D.Checking whether training labels are consistent across different human annotators

AnswerB

Cross-validation reduces evaluation variance by using all data for both training and validation — especially valuable with limited data.

Why this answer

Cross-validation is a resampling technique used to evaluate machine learning models by partitioning the original dataset into k equal-sized folds. The model is trained on k-1 folds and validated on the remaining fold, repeating this process k times so each fold serves as the validation set once. This provides a more robust and less biased estimate of model performance compared to a single train-test split, especially when data is limited.

Exam trap

The trap here is that candidates confuse cross-validation with simple train/test splitting or model comparison, but the key is recognizing cross-validation as a repeated resampling method to obtain a reliable performance estimate, not a one-time validation or inter-annotator agreement check.

How to eliminate wrong answers

Option A is wrong because cross-validation is a statistical method for model evaluation, not a geographic or regional validation of Azure service deployment. Option C is wrong because cross-validation is a single-model evaluation technique using multiple train/validation splits, not a comparison between two different models on the same test set. Option D is wrong because cross-validation assesses model performance across data partitions, not the consistency of human annotators (which is inter-rater reliability, measured by Cohen's kappa or similar metrics).

24

MCQmedium

What is the purpose of splitting data into training, validation, and test sets in machine learning?

A.To increase the total amount of data available for training

B.To evaluate model performance honestly on data it hasn't seen during training

C.To make training faster by using smaller datasets

D.To comply with data privacy regulations

AnswerB

Separate validation and test sets give honest performance estimates — the model never trains on these sets, so performance isn't inflated.

Why this answer

Option B is correct because splitting data into training, validation, and test sets is essential for honestly evaluating a model's performance on unseen data. The training set teaches the model patterns, the validation set is used to tune hyperparameters and detect overfitting during development, and the test set provides a final, unbiased estimate of how the model will perform on new, real-world data. This separation ensures that the model's accuracy metrics reflect its generalization ability rather than memorization of the training data.

Exam trap

The trap here is that candidates often confuse the purpose of splitting with increasing data quantity or speeding up training, not realizing that the core reason is to obtain an unbiased estimate of model performance on unseen data.

How to eliminate wrong answers

Option A is wrong because splitting data does not increase the total amount of data; it partitions existing data, and in fact reduces the amount available for training compared to using all data for training. Option C is wrong because any speed-up from training on less data is a side effect, not the purpose of splitting; the goal is evaluation, and training on a smaller subset can degrade model quality if the subset is not representative. Option D is wrong because data splitting is a model evaluation technique, not a compliance measure; data privacy regulations like GDPR require anonymization, consent, or data minimization, not train/validation/test splits.
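
A three-way split can be sketched as follows. The 70/15/15 proportions and the function name `train_val_test_split` are assumptions for this example; the right proportions depend on the dataset.

```python
import random


def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle rows and cut them into three disjoint lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]                      # touched once, for the final estimate
    val = rows[n_test:n_test + n_val]         # used while tuning hyperparameters
    train = rows[n_test + n_val:]             # the only rows the model learns from
    return train, val, test


train, val, test = train_val_test_split(range(1000))
print(len(train), len(val), len(test))
```

The three lists never overlap, which is what keeps the validation and test scores honest.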

25

MCQmedium

What is 'ensemble learning' in machine learning and why does it improve performance?

A.Combining predictions from multiple models to improve accuracy and robustness

B.Training a single very large model on the full dataset without any data splitting

C.Selecting the best model from a group of candidates after evaluation

D.Running the same model multiple times with different random seeds to test stability

AnswerA

Ensemble methods (bagging, boosting, stacking) aggregate diverse models — typically outperforming any individual model.

Why this answer

Ensemble learning combines predictions from multiple models (e.g., bagging, boosting, stacking) to reduce variance or bias and improve robustness. By aggregating diverse models, it often achieves higher accuracy than any single model, as errors from individual models are averaged out or corrected. This is a core technique in Azure Machine Learning, where ensembles like Random Forest or Gradient Boosting are commonly used.

Exam trap

The trap here is confusing ensemble learning with model selection (C) or stability testing (D), as candidates often think picking the 'best' model or running multiple trials is the same as combining predictions.

How to eliminate wrong answers

Option B is wrong because training a single very large model on the full dataset without splitting does not involve multiple models or combination of predictions; it risks overfitting and lacks the error-canceling benefit of ensembles. Option C is wrong because selecting the best model from a group after evaluation is model selection, not ensemble learning—ensembles combine predictions rather than pick one. Option D is wrong because running the same model multiple times with different random seeds tests stability or reproducibility, but it does not combine predictions from distinct models to improve performance; it is a diagnostic technique, not an ensemble method.
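
The error-canceling effect can be illustrated with a simple majority vote over three classifiers. The labels and predictions below are hand-made so that the models make mistakes on different samples; real models are rarely this conveniently uncorrelated, so treat the perfect ensemble score as a best case.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by majority vote."""
    combined = []
    for votes in zip(*predictions_per_model):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


y_true  = [1, 0, 1, 1, 0, 1, 0, 0]
model_a = [1, 0, 1, 0, 0, 1, 0, 1]   # wrong on samples 3 and 7
model_b = [1, 1, 1, 1, 0, 0, 0, 0]   # wrong on samples 1 and 5
model_c = [0, 0, 1, 1, 0, 1, 1, 0]   # wrong on samples 0 and 6

ensemble = majority_vote([model_a, model_b, model_c])
for name, preds in [("a", model_a), ("b", model_b), ("c", model_c), ("vote", ensemble)]:
    print(name, accuracy(y_true, preds))
```

Each model alone scores 0.75, but because no two models err on the same sample, the vote outvotes every individual mistake.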

26

MCQmedium

A data scientist trains a binary classification model to detect fraudulent transactions. The dataset contains 99% legitimate transactions (negative class) and 1% fraudulent transactions (positive class). The model predicts 'legitimate' for every transaction in the test set and achieves 99% accuracy. Which metric would best reveal that the model is failing to identify any fraudulent transactions?

A.Accuracy

B.Precision

C.Recall

D.F1-score

AnswerC

Recall for the fraud class is 0 since no fraudulent transactions are identified; this directly shows the model's failure to catch any positive cases.

Why this answer

Recall (also known as sensitivity or true positive rate) measures the proportion of actual positive cases (fraudulent transactions) that the model correctly identifies. With 99% accuracy but zero true positives, the recall is 0%, which immediately reveals the model's complete failure to detect fraud. In Azure Machine Learning, the model's evaluation metrics would show a recall of 0.0 for the positive class, highlighting this issue despite high accuracy.

Exam trap

Microsoft often tests the trap that high accuracy implies a good model, especially with imbalanced data, leading candidates to overlook that recall (or sensitivity) is the critical metric for detecting minority class failures.

How to eliminate wrong answers

Option A is wrong because accuracy only measures overall correctness (99% here) and is misleading when classes are imbalanced; it does not reveal the model's inability to detect the minority class. Option B is wrong because precision measures the proportion of predicted positives that are actually positive, but since the model never predicts any positive, precision is undefined (0/0) or reported as 0, which does not directly expose the failure to find any actual fraud. Option D is wrong because F1-score is the harmonic mean of precision and recall; with recall = 0, F1-score is 0, but recall itself is the more direct and interpretable metric for identifying that no fraudulent transactions were caught.
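
The numbers in this scenario can be reproduced with a few lines of Python. The dataset size of 1,000 and the helper name `classification_metrics` are assumptions for the example; reporting 0.0 for an undefined ratio is one common convention, not the only one.

```python
def classification_metrics(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    # Report 0.0 when a ratio is undefined (0/0); this is a convention, not a rule.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1] * 10 + [0] * 990      # 1% fraud (positive class = 1)
y_pred = [0] * 1000                # model always predicts "legitimate"
print(classification_metrics(y_true, y_pred))
```

Accuracy comes out at 0.99 while recall is 0.0: all ten fraudulent transactions are missed, which is exactly what recall is designed to expose.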

27

Drag & Dropmedium

Drag and drop the steps to train a custom vision model in Azure Custom Vision into the correct order.

Why this order

Training a custom vision model requires uploading tagged images, training, evaluating, and publishing.

28

MCQmedium

A data scientist trains a linear regression model to predict house prices. The model's training error is very high, and its test error is nearly as high. Which term best describes this situation?

A.Underfitting

B.Overfitting

C.High bias

D.High variance

AnswerA

Underfitting is characterized by a model that does not learn the training data well, leading to high error on both training and test sets.

Why this answer

Underfitting occurs when a model is too simple to capture the underlying patterns in the data, resulting in high training error and similarly high test error. In this linear regression scenario, the model fails to learn the relationship between features and house prices, leading to poor performance on both training and test sets.

Exam trap

The trap here is that candidates often confuse 'high bias' with 'underfitting' as the best descriptor, but the question asks for the term that best describes the situation, and 'underfitting' is the direct behavioral term while 'high bias' is a contributing cause.

How to eliminate wrong answers

Option B (Overfitting) is wrong because overfitting would show very low training error but high test error, not high training error. Option C (High bias) is incorrect because high bias is a cause of underfitting, not a separate term describing the situation itself. Option D (High variance) is wrong because high variance is associated with overfitting, where the model is too sensitive to training data, not with high training error.

29

MCQhard

What is 'differential privacy' and how is it relevant to AI model training?

A.The difference in model accuracy between a private deployment and a public API

B.A mathematical guarantee that model training reveals negligible information about any individual's data

C.Encrypting model weights so they remain private from users accessing the model API

D.Using different models for different privacy tiers of customers

AnswerB

Differential privacy adds noise during training — providing formal guarantees that models don't memorise or expose individual records.

Why this answer

Differential privacy is a mathematical framework that limits how much the output of a model training process can reveal about whether any specific individual's data was included in the training dataset. It achieves this by adding calibrated noise to the training process or query results, providing a formal privacy guarantee quantified by the epsilon parameter (a smaller epsilon means stronger privacy and more noise). This is directly relevant to AI model training because it allows organizations to train models on sensitive data while protecting individual privacy, which can support compliance with regulations like GDPR and HIPAA.

Exam trap

The trap here is that candidates confuse data privacy techniques (like encryption or access control) with the formal mathematical guarantee of differential privacy, which specifically addresses information leakage from the model's outputs rather than protecting the data at rest or in transit.

How to eliminate wrong answers

Option A is wrong because it describes a comparison of model accuracy between deployment environments, which has nothing to do with the mathematical privacy guarantee of differential privacy. Option C is wrong because encrypting model weights protects the model's intellectual property from API users, but does not prevent the model from memorizing and leaking individual training data points. Option D is wrong because using different models for different privacy tiers is a policy or access control mechanism, not a mathematical technique for limiting information leakage about individuals.
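
The idea of calibrated noise can be sketched with the Laplace mechanism applied to a counting query. This illustrates the "query results" case mentioned above rather than differentially private model training, which applies noise inside the training loop instead. The function names and the sample data are assumptions for the example.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Count matching records, then add noise scaled to sensitivity / epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0   # adding or removing one person changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
ages = [23, 35, 41, 29, 52, 67, 38, 45, 31, 58]   # made-up records
for epsilon in (0.1, 1.0, 10.0):
    noisy = private_count(ages, lambda age: age >= 40, epsilon, rng)
    print(epsilon, round(noisy, 2))               # the true count is 5
```

Smaller values of epsilon give noisier answers and stronger privacy; larger values give answers closer to the true count and weaker privacy.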

30

MCQhard

A data scientist is training a binary classification model to detect fraudulent transactions. The dataset contains only 1% fraudulent transactions. The model achieves 99% accuracy on the test set, but when deployed, it fails to detect most actual fraud cases. Which metric would best reveal this issue?

A.Accuracy

B.Precision

C.Recall

D.F1 score

AnswerC

Recall measures the fraction of actual fraud cases that the model correctly identifies. A low recall reveals the model's failure to detect fraud.

Why this answer

Recall (sensitivity) measures the proportion of actual positive cases correctly identified. In this highly imbalanced dataset (1% fraud), a model can achieve 99% accuracy by simply predicting 'non-fraud' for every transaction, which yields zero true positives. Recall reveals this failure because it focuses solely on how many fraudulent transactions were caught, ignoring the vast majority of non-fraud cases.

Exam trap

The trap here is that candidates see '99% accuracy' and assume the model is performing well, failing to recognize that accuracy is a poor metric for imbalanced datasets, and that recall specifically measures the ability to detect the minority class.

How to eliminate wrong answers

Option A is wrong because accuracy is misleading in imbalanced datasets; a 99% accuracy can be achieved by a trivial classifier that never predicts fraud, hiding the model's inability to detect any positive cases. Option B is wrong because precision measures the proportion of predicted fraud cases that are actually fraud, but if the model rarely predicts fraud, precision may be undefined or artificially high, and it does not capture the failure to identify actual fraud. Option D is wrong because the F1 score is the harmonic mean of precision and recall; it will also be low when recall is poor, but it blends two quantities, whereas the question asks for the metric that best reveals the issue, and recall directly exposes the lack of true positive detections.

31

MCQmedium

What is 'time series forecasting' and what Azure ML tools support it?

A.Forecasting how long model training will take based on dataset size

B.Predicting future values in time-ordered data (sales, demand, energy) using Azure ML AutoML

C.Scheduling ML jobs to run at specific times using Azure ML pipelines

D.Analysing historical model performance over time to detect degradation

AnswerB

Time series forecasting handles temporal patterns — Azure ML AutoML tries many algorithms and handles seasonality for business prediction.

Why this answer

Time series forecasting is a machine learning technique that predicts future values based on historical, time-ordered data, such as sales, demand, or energy consumption. Azure ML AutoML supports this by automatically trying candidate models (e.g., ARIMA, Prophet, or gradient boosting), tuning their hyperparameters, and selecting the one that best captures time-dependent patterns such as seasonality and trends.

Exam trap

The trap here is confusing time series forecasting with unrelated Azure ML features like job scheduling or model monitoring, leading candidates to pick options that describe operational tasks rather than predictive modeling.

How to eliminate wrong answers

Option A is wrong because it describes estimating training duration, which is a performance optimization concern, not a forecasting task on time-ordered data. Option C is wrong because scheduling ML jobs is a pipeline orchestration feature, not a predictive modeling technique. Option D is wrong because analyzing historical model performance for degradation is model monitoring (data drift/concept drift), not forecasting future values.

32

MCQhard

A data scientist evaluates a regression model that predicts house prices. On the test set, the Mean Absolute Error (MAE) is $8,000 and the Root Mean Squared Error (RMSE) is $25,000. What does the large difference between MAE and RMSE indicate about the model's errors?

A.The model is overfitting the training data

B.The model predictions are consistently biased high

C.The model has some predictions with very large errors

D.The model has high variance due to outliers in training data

AnswerC

RMSE penalizes large errors more heavily than MAE. A significantly higher RMSE relative to MAE implies that while most errors are moderate, there are a few predictions with extremely large errors (outliers).

Why this answer

The large difference between MAE ($8,000) and RMSE ($25,000) indicates that the model has some predictions with very large errors. RMSE squares the errors before averaging, which heavily penalizes large deviations, so a significantly higher RMSE relative to MAE suggests the presence of outliers or extreme prediction errors in the test set.

Exam trap

The trap here is that candidates confuse the mathematical behavior of RMSE (which amplifies large errors) with concepts like overfitting or bias, rather than recognizing it as a direct indicator of outlier errors in the predictions.

How to eliminate wrong answers

Option A is wrong because overfitting is characterized by low training error and high test error, not by a specific relationship between MAE and RMSE; the given metrics are both on the test set, so overfitting cannot be inferred from this difference alone. Option B is wrong because consistently biased high predictions would affect both MAE and RMSE similarly (e.g., both would be elevated), not cause a large disparity; bias shifts the mean error but does not disproportionately inflate RMSE over MAE. Option D is wrong because high variance due to outliers in training data is a cause of overfitting or poor generalization, but the question focuses on the test set errors; the large RMSE relative to MAE on the test set directly indicates the presence of large errors in predictions, not the source of variance in training.
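
A small worked example shows why a gap between MAE and RMSE points to a few very large errors. The two error lists below are made up so that they share the same MAE; they are not the question's data.

```python
import math


def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


uniform = [12_500] * 10            # every prediction off by the same amount
skewed = [5_000] * 9 + [80_000]    # nine small misses and one very large one

for name, errors in [("uniform", uniform), ("skewed", skewed)]:
    print(name, mae(errors), round(rmse(errors)))
```

Both lists have an MAE of 12,500, but RMSE stays at 12,500 for the uniform errors and rises to about 25,739 for the skewed ones, because squaring lets the single 80,000 miss dominate the average.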

33

MCQmedium

A data scientist is building a classification model to predict customer churn. The dataset has only 5% churn cases. The model achieves 95% accuracy on the test set, but upon investigation, the data scientist finds the model predicts 'not churn' for nearly every customer. Which metric should the data scientist primarily use to evaluate the model's performance on this imbalanced dataset?

A.Accuracy

B.F1 score

C.Mean Absolute Error (MAE)

D.R-squared

AnswerB

The F1 score is the harmonic mean of precision and recall, making it a robust metric for imbalanced datasets as it accounts for false positives and false negatives.

Why this answer

In an imbalanced dataset with only 5% churn, a model that predicts 'not churn' for every case achieves 95% accuracy by always guessing the majority class. This accuracy is misleading because it fails to identify any churn cases. The F1 score (option B) is the harmonic mean of precision and recall, making it the primary metric for evaluating classification performance on imbalanced data, as it penalizes both false positives and false negatives and is not inflated by correct predictions on the majority class.

Exam trap

The trap here is that candidates often default to accuracy as the primary metric for classification, failing to recognize that on imbalanced datasets, accuracy can be artificially high and misleading, while the F1 score provides a more truthful evaluation of minority class prediction.

How to eliminate wrong answers

Option A is wrong because accuracy is misleading on imbalanced datasets; a model that always predicts the majority class can achieve high accuracy (95%) while failing to detect any minority class (churn) cases. Option C is wrong because Mean Absolute Error (MAE) is a regression metric that measures average absolute differences between predicted and actual continuous values, not classification performance. Option D is wrong because R-squared is a regression metric that indicates the proportion of variance in the dependent variable explained by the model, and it is not applicable to classification tasks or imbalanced class evaluation.

34

MCQ · medium

A data scientist trains a multiclass classification model to identify different species of flowers (Iris setosa, Iris virginica, Iris versicolor). The overall accuracy is 94%, but the accuracy for the Iris virginica class is only 60%. Which additional metric should the data scientist examine to better understand the model's performance on the minority class?

A. Precision

B. Recall

C. F1-score

D. Mean Absolute Error (MAE)

Answer: C

F1-score is the harmonic mean of precision and recall. It provides a single metric that balances both for a specific class, making it ideal for evaluating performance on the underperforming Iris virginica class.

Why this answer

The F1-score is the harmonic mean of precision and recall, providing a single metric that balances both false positives and false negatives. Since the model has high overall accuracy but poor performance on the minority class (Iris virginica), the F1-score is ideal for evaluating the model's effectiveness on that class, as it accounts for class imbalance better than accuracy alone.

Exam trap

The trap here is that candidates often choose precision or recall individually, not realizing that the F1-score is specifically designed to combine both metrics and is the standard choice for evaluating performance on imbalanced classes in classification tasks.

How to eliminate wrong answers

Option A is wrong because precision alone measures the proportion of true positive predictions among all positive predictions, but it does not consider false negatives, so it cannot fully capture the model's weakness on the minority class. Option B is wrong because recall alone measures the proportion of actual positives correctly identified, but it ignores false positives, providing an incomplete picture of performance on the minority class. Option D is wrong because Mean Absolute Error (MAE) is a regression metric that measures average absolute differences between predicted and actual values, and it is not applicable to classification tasks like multiclass flower species identification.

35

Matching · medium

Match each Azure AI service to its pricing model.

Concepts

Matches

Pay per transaction or per API call

Pay per message or channel

Pay per training hour and prediction

Pay per token (input and output)

Pay per storage and queries

Why these pairings

Pricing varies by service and usage.

36

MCQ · medium

What is the difference between 'supervised' and 'unsupervised' learning?

A. Supervised learning requires a human to watch the training process; unsupervised runs automatically

B. Supervised learning trains on labelled data; unsupervised discovers patterns in unlabelled data

C. Supervised learning uses neural networks; unsupervised uses decision trees only

D. Unsupervised learning always produces better results than supervised learning

Answer: B

Supervised = learn from labeled input-output pairs. Unsupervised = find structure in unlabelled data (clusters, anomalies, components).

Why this answer

Option B is correct because supervised learning requires a labeled dataset where each training example has an input-output pair, allowing the model to learn a mapping from inputs to known outputs. In contrast, unsupervised learning works with unlabeled data, and the model must find inherent patterns, groupings, or structures (e.g., clustering or dimensionality reduction) without any predefined labels. This distinction is fundamental to choosing the right machine learning approach in Azure: Azure Machine Learning supports supervised tasks such as regression and classification as well as unsupervised tasks such as clustering (for example, K-Means).

Exam trap

The trap here is that candidates confuse the need for human oversight with the technical definition of supervision, mistakenly thinking 'supervised' means a human must monitor the process, when it actually refers to the presence of labeled training data.

How to eliminate wrong answers

Option A is wrong because supervised learning does not require a human to watch the training process; both supervised and unsupervised learning can run automatically once the data and algorithm are configured. Option C is wrong because both supervised and unsupervised learning can use neural networks (e.g., supervised CNNs for image classification, unsupervised autoencoders for anomaly detection), and neither is restricted to decision trees. Option D is wrong because unsupervised learning does not always produce better results; the quality depends on the problem, data, and evaluation metrics—supervised learning often outperforms when high-quality labeled data is available.

37

MCQ · hard

What is the 'bias-variance tradeoff' in machine learning?

A. The tradeoff between model accuracy and inference speed

B. The tradeoff between underfitting (high bias) and overfitting (high variance) when choosing model complexity

C. The tradeoff between training data quantity and model quality

D. The difference in fairness metrics between biased and unbiased model versions

Answer: B

Bias-variance: simple models underfit (high bias), complex models overfit (high variance) — finding the optimal complexity is the core ML challenge.

Why this answer

Option B is correct because the bias-variance tradeoff describes the inverse relationship between underfitting (high bias, where the model is too simple to capture patterns) and overfitting (high variance, where the model is too complex and captures noise). In Azure Machine Learning, this tradeoff is managed by tuning hyperparameters like regularization strength or tree depth to balance model complexity and generalization.
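A small experiment can make the tradeoff concrete. The sketch below is not Azure-specific: it assumes a hypothetical sine-shaped ground truth with noise and uses k-nearest-neighbour regression, where a small k gives a very flexible model (high variance) and k equal to the training-set size collapses to predicting the mean (high bias).

```python
import math
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error of the k-NN model on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

def make_data(n, rng):
    # Hypothetical ground truth: one sine period plus Gaussian noise.
    points = []
    for _ in range(n):
        x = rng.random()
        points.append((x, math.sin(2 * math.pi * x) + rng.gauss(0, 0.3)))
    return points

rng = random.Random(0)
train = make_data(40, rng)
test = make_data(40, rng)

# k=1 memorizes the training set; k=40 always predicts the training mean.
results = {k: (mse(train, train, k), mse(train, test, k)) for k in (1, 5, 40)}
for k, (train_err, test_err) in results.items():
    print(f"k={k:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

With k=1 the training error is exactly zero because every point is its own nearest neighbour, yet the test error is not. An intermediate k usually generalizes better than either extreme, though the exact numbers depend on the random seed.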

Exam trap

The trap here is that candidates often confuse the term 'bias' in bias-variance tradeoff with ethical or fairness bias, leading them to incorrectly select Option D, which is a separate AI-900 concept about model fairness and responsible AI.

How to eliminate wrong answers

Option A is wrong because it confuses the bias-variance tradeoff with a performance optimization concern (accuracy vs. inference speed), which is unrelated to model complexity and generalization. Option C is wrong because it misrepresents the tradeoff as a data quantity issue; while more data can help reduce variance, the core tradeoff is about model complexity, not data volume. Option D is wrong because it conflates the bias-variance tradeoff with fairness metrics; bias in this context refers to statistical bias in model predictions, not ethical or demographic bias.

38

MCQ · easy

A retail company wants to automatically group customers into segments based on their purchasing history, age, and location without using any predefined labels. The goal is to identify distinct customer profiles for targeted marketing campaigns. Which type of machine learning approach should they use?

A. Supervised learning

B. Unsupervised learning

C. Reinforcement learning

D. Regression

Answer: B

Correct. Unsupervised learning is used when the goal is to find patterns or groupings in data without pre-existing labels. Clustering algorithms like K-means are common for customer segmentation.

Why this answer

Unsupervised learning is the correct approach because the company wants to group customers into segments without predefined labels. The algorithm will discover natural patterns and clusters in the data (purchasing history, age, location) on its own, which is the core characteristic of unsupervised learning.
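As an illustration of clustering without labels, here is a minimal K-means sketch. The customer tuples are hypothetical, and the starting centroids are fixed to two data points purely for reproducibility; real implementations normally choose them randomly and scale the features first.

```python
import math

def nearest_index(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, centroids, iterations=10):
    """Plain K-means: alternate nearest-centroid assignment and centroid update."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest_index(point, centroids)].append(point)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    labels = [nearest_index(point, centroids) for point in points]
    return centroids, labels

# Hypothetical customers: (age, monthly spend). No segment labels are supplied.
customers = [(22, 40), (25, 55), (24, 35), (51, 210), (48, 190), (55, 230)]
centroids, labels = kmeans(customers, centroids=[customers[0], customers[3]])
print(labels)
```

The algorithm only ever sees the feature values; the segment numbers it returns are discovered groupings, not predefined categories.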

Exam trap

The trap here is that candidates often confuse clustering (unsupervised) with classification (supervised), mistakenly thinking that grouping customers always requires predefined labels like 'high value' or 'low value'.

How to eliminate wrong answers

Option A is wrong because supervised learning requires labeled training data with predefined output categories, but the question explicitly states 'without using any predefined labels'. Option C is wrong because reinforcement learning involves an agent learning through trial-and-error interactions with an environment to maximize cumulative reward, which is not relevant to grouping static customer data. Option D is wrong because regression is a supervised learning technique used to predict continuous numerical values, not to group data into discrete segments.

39

MCQ · medium

A data scientist trains a regression model to predict house prices. The model achieves very low error on the training data but significantly higher error on a held-out test set. Which problem does this scenario best describe?

A. Underfitting

B. Overfitting

C. High bias

D. High variance

Answer: B

Correct. Overfitting is characterized by excellent performance on training data but poor performance on new data due to memorization of noise.

Why this answer

The scenario describes overfitting, where the model learns the training data too well, including noise and outliers, resulting in very low training error but poor generalization to new data. In Azure Machine Learning, this is often detected by comparing training and validation metrics; a large gap indicates overfitting. The correct answer is B.

Exam trap

The trap here is that candidates confuse 'high variance' (a statistical property) with the specific problem name 'overfitting', but the question explicitly asks for the problem description, not the underlying cause.

How to eliminate wrong answers

Option A is wrong because underfitting occurs when the model fails to capture patterns in the training data, resulting in high error on both training and test sets, not low training error. Option C is wrong because high bias typically leads to underfitting, where the model is too simple and performs poorly on both training and test data. Option D is wrong because high variance is a characteristic of overfitting, but the question asks for the problem described, not the statistical property; overfitting is the direct term for the scenario.

40

MCQ · medium

What is 'hyperparameter tuning' in Azure Machine Learning?

A. Adjusting the physical voltage supplied to GPU hardware during training

B. Searching for the optimal algorithm settings (learning rate, batch size) that maximise model performance

C. Training the model to predict hyper-specific rare events in the data

D. Compressing model weights to reduce inference latency

Answer: B

Hyperparameter tuning explores the configuration space to find the settings that produce the best model — HyperDrive automates this in Azure ML.

Why this answer

Hyperparameter tuning in Azure Machine Learning is the process of searching for the optimal set of algorithm settings, such as learning rate, batch size, or number of epochs, to maximize model performance. Azure ML provides automated hyperparameter tuning via HyperDrive, which uses techniques like Bayesian sampling, random sampling, or grid search to efficiently explore the hyperparameter space. This is a core step in training a model to achieve the best accuracy or other metrics, not a hardware or compression task.
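The idea of searching a hyperparameter space can be sketched without any Azure SDK. In the example below, `validation_score` is a hypothetical stand-in for "train a model with these settings and return its validation metric", and the search space values are made up for illustration; this is a plain grid search, not HyperDrive itself.

```python
import itertools

def validation_score(learning_rate, batch_size):
    """Hypothetical stand-in for training a model and scoring it on validation data."""
    # Toy surface that peaks at learning_rate=0.01, batch_size=32.
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(batch_size - 32) / 1000

# Assumed search space; a real one would come from the algorithm being tuned.
search_space = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [16, 32, 64],
}

def grid_search(space, score_fn):
    """Try every combination in the space and keep the best-scoring one."""
    names = list(space)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best_params, best_score = grid_search(search_space, validation_score)
print(best_params, best_score)
```

Random and Bayesian sampling follow the same loop but choose which combinations to try differently, which matters when the grid is too large to evaluate exhaustively.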

Exam trap

The trap here is that candidates confuse hyperparameter tuning with hardware tuning (Option A) or model compression (Option D), because both involve 'tuning' or 'adjusting' something, but hyperparameter tuning is strictly about algorithm configuration, not hardware or post-training optimization.

How to eliminate wrong answers

Option A is wrong because adjusting the physical voltage supplied to GPU hardware is a hardware-level operation (e.g., undervolting or overclocking) unrelated to Azure Machine Learning's software-based hyperparameter tuning, which operates on algorithm parameters. Option C is wrong because training a model to predict hyper-specific rare events describes imbalanced classification or anomaly detection, not the systematic search over hyperparameters to optimize model settings. Option D is wrong because compressing model weights to reduce inference latency refers to model quantization or pruning techniques (e.g., ONNX Runtime optimization), which are post-training steps, not part of hyperparameter tuning during training.

41

MCQ · easy

A robotics company is training a drone to fly autonomously through an obstacle course. The drone receives positive rewards for staying on course and avoiding obstacles, and negative rewards for collisions. The system learns by trial and error to maximize its cumulative reward. Which type of machine learning is being used?

A. Supervised learning

B. Unsupervised learning

C. Reinforcement learning

D. Semi-supervised learning

Answer: C

The drone learns by trial and error through positive and negative rewards, which is the core of reinforcement learning.

Why this answer

Reinforcement learning is the correct choice because the drone learns by interacting with its environment, receiving rewards (positive for staying on course, negative for collisions), and adjusting its behavior through trial and error to maximize cumulative reward. This is the defining characteristic of reinforcement learning, where an agent learns a policy from feedback signals rather than from labeled data or hidden patterns.
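The reward-driven loop can be illustrated with a deliberately tiny sketch. It assumes a hypothetical environment with a single decision and two actions (an epsilon-greedy bandit), which is far simpler than a real drone task with states and sequences of actions, but it shows learning from rewards alone rather than from labels.

```python
import random

# Hypothetical, heavily simplified environment: one decision, two actions.
REWARDS = {"steer_into_obstacle": -1.0, "stay_on_course": 1.0}
ACTIONS = list(REWARDS)

def train(steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy agent that estimates the reward of each action by trial and error."""
    rng = random.Random(seed)
    value = {action: 0.0 for action in ACTIONS}  # estimated reward per action
    count = {action: 0 for action in ACTIONS}
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)           # explore: try something at random
        else:
            action = max(ACTIONS, key=value.get)   # exploit: pick the best estimate
        reward = REWARDS[action]                   # feedback from the environment
        count[action] += 1
        # Incremental average of the rewards observed for this action.
        value[action] += (reward - value[action]) / count[action]
    return value

value = train()
best_action = max(ACTIONS, key=value.get)
print(best_action, value)
```

The agent is never told which action is correct; it only observes rewards and shifts its behaviour toward the action that scores higher.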

Exam trap

The trap here is that candidates often confuse reinforcement learning with supervised learning because both involve feedback, but reinforcement learning uses evaluative feedback (rewards) rather than instructive feedback (correct labels), which is the key distinction tested in AI-900.

How to eliminate wrong answers

Option A is wrong because supervised learning requires labeled input-output pairs (e.g., images with obstacle labels) to train a model, but the drone receives only reward signals, not explicit correct actions. Option B is wrong because unsupervised learning finds hidden patterns or clusters in unlabeled data without any reward or feedback, which does not match the trial-and-error reward-based learning described. Option D is wrong because semi-supervised learning combines a small amount of labeled data with a large amount of unlabeled data, but the scenario involves no labeled data at all—only reward signals from interactions.

42

MCQ · hard

A botanist uses Azure Automated Machine Learning to train a model that classifies iris flowers into three species: setosa, versicolor, and virginica. The dataset contains exactly 50 examples of each species, making it perfectly balanced. The botanist wants the primary metric to give equal importance to the classification performance of each species, regardless of their frequency. Which primary metric should the botanist select in Azure AutoML?

A. Accuracy

B. Weighted F1

C. Macro F1

D. Micro F1

Answer: C

Correct because macro F1 averages the F1 scores of all classes without weighting by class size, thereby giving equal importance to the classification performance of each species.

Why this answer

Macro F1 is the correct primary metric because it computes the F1 score independently for each class and then takes the unweighted average, giving equal importance to each species (setosa, versicolor, virginica) regardless of how many examples each one has. This aligns with the botanist's requirement to treat each species equally in performance evaluation.
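The three averaging schemes can be compared directly. The sketch below uses a hypothetical, deliberately imbalanced set of predictions (not the balanced dataset from the question) so that the averages differ; with equal class sizes, macro and weighted F1 would coincide.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """F1 for each class, treating that class as the positive one."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)

def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's number of true instances."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)

def micro_f1(y_true, y_pred):
    """For single-label multiclass data, micro-averaged F1 equals accuracy."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Hypothetical predictions: virginica is rare and half of it is misclassified.
y_true = ["setosa"] * 8 + ["versicolor"] * 8 + ["virginica"] * 4
y_pred = ["setosa"] * 8 + ["versicolor"] * 8 + ["virginica"] * 2 + ["versicolor"] * 2

macro = macro_f1(y_true, y_pred)
weighted = weighted_f1(y_true, y_pred)
micro = micro_f1(y_true, y_pred)
print(per_class_f1(y_true, y_pred), macro, weighted, micro)
```

Macro F1 is the lowest of the three here because the weak virginica score counts as a full third of the average, whereas the weighted and micro versions dilute it by its small share of the data.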

Exam trap

The trap here is that candidates often confuse Macro F1 with Weighted F1. On a perfectly balanced dataset the two yield the same value, so they can look interchangeable, but the question explicitly tests the intent of 'equal importance to each species', which is the definition of macro averaging, not of the support-weighted approach.

How to eliminate wrong answers

Option A is wrong because accuracy treats every correct prediction equally, so it weights instances rather than classes. On this balanced dataset it would not be misleading, but it does not explicitly enforce equal class-level importance. Option B is wrong because Weighted F1 averages the per-class F1 scores weighted by the number of true instances in each class. It matches Macro F1 only when the classes are perfectly balanced (as they are here), whereas the botanist wants equal importance regardless of frequency; Weighted F1 is designed to account for support size, not to ignore it. Option D is wrong because Micro F1 aggregates the contributions of all classes into a global metric, effectively treating each instance equally rather than each class equally, which does not satisfy the requirement.

43

MCQ · medium

A city's traffic department wants to predict the number of cars that will cross a particular bridge each day to plan maintenance schedules. The output of the model should be a numerical value representing the estimated traffic count. Which type of machine learning task is this?

A. Classification

B. Regression

C. Clustering

D. Reinforcement learning

Answer: B

Regression predicts a continuous numeric output, which matches the requirement of estimating a traffic count.

Why this answer

Regression is the correct type of machine learning task because the goal is to predict a continuous numerical value—the number of cars crossing the bridge each day. Unlike classification, which predicts discrete categories, regression models output a real number, making it ideal for forecasting traffic counts.

Exam trap

The trap here is that candidates may confuse regression with classification because both involve prediction, but the key distinction is that regression outputs a continuous number while classification outputs a discrete label.

How to eliminate wrong answers

Option A is wrong because classification predicts discrete class labels (e.g., 'high traffic' vs 'low traffic'), not a continuous numerical value. Option C is wrong because clustering groups unlabeled data into clusters based on similarity, without predicting a specific numeric output. Option D is wrong because reinforcement learning involves an agent learning optimal actions through rewards and penalties in an environment, not predicting a single numeric value from input features.

44

MCQ · medium

What are Azure Machine Learning pipelines, and why are they used?

A. Network pipelines for transferring data between Azure regions at high speed

B. Reusable orchestrated workflows that automate and version-control the full ML training lifecycle

C. CI/CD pipelines in Azure DevOps for deploying application code to production

D. Data pipelines that ingest streaming data from IoT sensors into Azure storage

Answer: B

ML pipelines chain data prep, training, and evaluation steps with caching and scheduling — enabling reproducible, automated ML workflows.

Why this answer

Azure Machine Learning pipelines are reusable orchestrated workflows that automate and version-control the full ML training lifecycle, including data preparation, training, evaluation, and deployment. They enable reproducibility, parallel execution of steps, and easy sharing across teams, which is why option B is correct.

Exam trap

The trap here is that candidates confuse 'pipeline' in the context of ML with generic data or DevOps pipelines, leading them to select options that describe unrelated Azure services like Azure DevOps CI/CD or IoT data ingestion.

How to eliminate wrong answers

Option A is wrong because Azure Machine Learning pipelines are not network pipelines; they are ML-specific workflows, and high-speed data transfer between regions is handled by services like Azure ExpressRoute or Azure Data Box, not ML pipelines. Option C is wrong because CI/CD pipelines in Azure DevOps are for deploying application code, not for orchestrating ML training workflows; ML pipelines focus on the ML lifecycle, not general software deployment. Option D is wrong because data pipelines that ingest streaming data from IoT sensors into Azure storage are typically built with Azure Stream Analytics or Azure IoT Hub, not Azure Machine Learning pipelines, which are designed for ML model training and management.

45

MCQ · medium

A data scientist has a dataset containing information about houses: size (sq ft), number of bedrooms, location, and the actual sale price. The goal is to train a model that predicts the price of a new house based on these features. Which type of machine learning task is this?

A. Classification

B. Regression

C. Clustering

D. Reinforcement Learning

Answer: B

Correct. Regression models can predict a continuous numeric output such as house prices.

Why this answer

This is a regression task because the goal is to predict a continuous numeric value (the sale price) based on input features. Regression models learn the relationship between independent variables (size, bedrooms, location) and a dependent variable (price) to output a real number. In Azure Machine Learning, regression algorithms like Linear Regression, Decision Forest Regression, or Neural Network Regression would be appropriate for this scenario.
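A minimal sketch of what "learning the relationship" means for a single feature is shown below. The sizes and prices are hypothetical and lie exactly on a line to keep the arithmetic easy to follow; ordinary least squares is used here as one simple regression technique, not as the specific algorithm Azure Machine Learning would pick.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: size in sq ft -> sale price in dollars (the label).
sizes = [1000, 1500, 2000, 2500]
prices = [200_000, 275_000, 350_000, 425_000]

slope, intercept = fit_line(sizes, prices)

# Predict a continuous value for an unseen 1,800 sq ft house.
predicted = slope * 1800 + intercept
print(slope, intercept, predicted)
```

The output is a real number on a continuous scale rather than one of a fixed set of categories, which is what separates regression from classification.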

Exam trap

The trap here is confusing regression with classification because both are supervised learning, but regression outputs a continuous number while classification outputs a discrete label.

How to eliminate wrong answers

Option A is wrong because classification predicts discrete categorical labels (e.g., 'expensive' or 'cheap'), not a continuous numeric price. Option C is wrong because clustering groups unlabeled data into clusters based on similarity, without a target variable like sale price. Option D is wrong because reinforcement learning involves an agent learning through rewards and punishments from interactions with an environment, not from a static dataset with labeled examples.

46

MCQ · medium

What is the model registry in Azure Machine Learning?

A. A public marketplace where organisations can buy pre-trained models from third parties

B. A centralised versioned store for tracking and managing trained models and their lineage

C. A database of domain-specific vocabularies used for NLP model training

D. A compliance register documenting AI models used by an organisation for audit purposes

Answer: B

The model registry stores all model versions with metadata and lineage — enabling comparison, rollback, and controlled deployment.

Why this answer

The model registry in Azure Machine Learning is a centralized, versioned store that tracks trained models along with their metadata, lineage, and lifecycle. It enables data scientists to register, version, and manage models, ensuring reproducibility and governance across the ML lifecycle.

Exam trap

The trap here is that candidates confuse the model registry with a marketplace or compliance tool, but the exam specifically tests the registry's role as a versioned repository for managing model artifacts and their lineage.

How to eliminate wrong answers

Option A is wrong because it describes a model marketplace or catalog (like Azure AI Gallery or Hugging Face), not the model registry which is for internal versioning and management. Option C is wrong because it refers to a domain-specific vocabulary database used in NLP, which is unrelated to model tracking; Azure ML uses datasets and tokenizers for such purposes. Option D is wrong because it describes a compliance register for audit purposes, which is a governance artifact, not the model registry's primary function of versioned storage and lineage tracking.

47

MCQ · medium

Which Azure service provides a no-code/low-code drag-and-drop interface for building machine learning pipelines?

A. Azure AI Custom Vision

B. Azure Machine Learning Designer

C. Azure AI Language Studio

D. Azure Databricks

Answer: B

Azure ML Designer provides a visual drag-and-drop interface for building machine learning pipelines without writing code.

Why this answer

Azure Machine Learning Designer is the correct answer because it provides a drag-and-drop, no-code/low-code visual interface for building, testing, and deploying machine learning pipelines. Users can connect pre-built modules for data transformation, model training, and scoring without writing code, making it ideal for rapid prototyping and operationalization of ML workflows.

Exam trap

The trap here is that candidates confuse Azure AI Language Studio (a no-code NLP tool) with a general ML pipeline builder, but Language Studio is domain-specific to text analytics and does not support building arbitrary ML pipelines with drag-and-drop modules.

How to eliminate wrong answers

Option A is wrong because Azure AI Custom Vision is a specialized service for training custom image classification and object detection models, not a general-purpose drag-and-drop ML pipeline builder. Option C is wrong because Azure AI Language Studio is a no-code tool for building natural language processing (NLP) applications like text analysis and conversational AI, not for constructing end-to-end ML pipelines. Option D is wrong because Azure Databricks is a big data analytics and collaborative notebook environment based on Apache Spark, requiring code (Python, Scala, SQL) and lacking a native drag-and-drop pipeline designer.

48

MCQ · medium

A data scientist trains a classification model to predict whether an email is spam or not. The model achieves 95% accuracy on the test set, but upon inspection, it classifies all emails as 'not spam' because the dataset has 95% non-spam emails. What is the most likely issue?

A. Overfitting

B. Underfitting

C. Data imbalance

D. Feature scaling error

Answer: C

Data imbalance, where one class vastly outnumbers the other, can cause a model to predict the majority class exclusively. Accuracy is misleading in such cases; the model has not learned to identify spam.

Why this answer

The model achieves 95% accuracy by simply predicting all emails as 'not spam', which matches the 95% share of the majority class in the dataset. This is a classic symptom of class imbalance, where the model exploits the skewed distribution rather than learning meaningful patterns that distinguish spam from non-spam. In Azure Machine Learning, techniques like SMOTE or stratified sampling are used to mitigate this issue.

Exam trap

The trap here is that candidates see 95% accuracy and assume the model is performing well, failing to recognize that accuracy says little when the dataset is highly imbalanced and the model simply predicts the majority class.

How to eliminate wrong answers

Option A is wrong because overfitting would cause the model to perform well on training data but poorly on unseen test data, whereas here the model performs uniformly poorly on the minority class across both sets. Option B is wrong because underfitting would result in low accuracy on both training and test sets due to insufficient model complexity, not high accuracy driven by majority class bias. Option D is wrong because feature scaling errors affect models sensitive to input ranges (e.g., SVM, neural networks), but the issue here is purely about class distribution, not feature preprocessing.
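Resampling is one common way to mitigate the imbalance described in this question. The sketch below uses plain random oversampling on a hypothetical training set as a simpler stand-in for SMOTE: it duplicates existing minority rows, whereas SMOTE synthesizes new ones.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [row for row, row_label in zip(rows, labels) if row_label == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels

# Hypothetical training set: 95 non-spam emails and 5 spam emails.
rows = [f"email_{i}" for i in range(100)]
labels = ["not spam"] * 95 + ["spam"] * 5

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Resampling should be applied to the training split only, so that the test set still reflects the real class distribution when the model is evaluated.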

49

Drag & Drop · medium

Drag and drop the steps to analyze an image with Azure Computer Vision into the correct order.

Steps

Order

Why this order

Image analysis requires resource setup, API call with features, and understanding the response.

50

MCQ · medium

A data scientist trains a classification model on a dataset of 10,000 labeled emails to distinguish spam from non-spam. The model achieves 99% accuracy on the training data but only 70% accuracy on a held-out test set. Which term best describes this situation?

A. Underfitting

B. Overfitting

C. Bias-variance tradeoff

D. Regularization

Answer: B

Overfitting happens when the model memorizes the training data, including noise, leading to high training accuracy but low test accuracy. This matches the described 99% training vs 70% test accuracy.

Why this answer

The model performs exceptionally well on training data (99% accuracy) but poorly on unseen test data (70% accuracy), which is the classic symptom of overfitting. Overfitting occurs when the model learns noise and specific patterns in the training set rather than generalizing to new data, often due to excessive model complexity or insufficient regularization.

Exam trap

The trap here is that candidates confuse 'high training accuracy' with a good model, failing to recognize that the large gap between training and test performance is the hallmark of overfitting, not underfitting or a tradeoff concept.

How to eliminate wrong answers

Option A is wrong because underfitting would result in poor performance on both training and test data, not high training accuracy with low test accuracy. Option C is wrong because bias-variance tradeoff is a general principle describing the balance between underfitting (high bias) and overfitting (high variance), but it is not the specific term for this situation where the model has memorized the training data. Option D is wrong because regularization is a technique used to prevent overfitting (e.g., L1/L2 regularization in Azure ML), not the name of the problem itself.

51

MCQ · easy

A real estate company has a dataset containing square footage, number of bedrooms, and location for 10,000 houses, along with their sale prices. They want to train a model that predicts the sale price of a new house based on these features. Which type of machine learning should they use?

A. Supervised classification

B. Supervised regression

C. Unsupervised clustering

D. Reinforcement learning

Answer: B

Regression is a type of supervised learning that models the relationship between input features and a continuous numeric target variable, such as sale price.

Why this answer

The goal is to predict a continuous numeric value (sale price) from input features (square footage, bedrooms, location). This is a classic supervised regression problem because the training data includes labeled target values (prices) and the output is a real number, not a category.

Exam trap

The trap here is that candidates confuse 'classification' with any prediction task, forgetting that regression is specifically for continuous numeric outputs, not categorical labels.

How to eliminate wrong answers

Option A is wrong because supervised classification predicts discrete class labels (e.g., 'expensive' or 'cheap'), not a continuous numeric price. Option C is wrong because unsupervised clustering finds hidden patterns or groups in unlabeled data, but here the dataset has labeled sale prices, so clustering is inappropriate. Option D is wrong because reinforcement learning learns optimal actions through trial-and-error interactions with an environment (e.g., game playing or robotics), not from a static dataset of house features and prices.

52

MCQ · easy

What is the role of a label (also called target or ground truth) in supervised machine learning?

A. A category of input features used by the model

B. The correct output or answer associated with each training example that the model learns to predict

C. A text description attached to a model explaining what it does

D. A tag applied to Azure ML resources for organization

Answer: B

Labels are the ground truth answers in training data — the model learns to produce predictions that match the labels.

Why this answer

In supervised machine learning, the label (also called target or ground truth) is the known correct output for each training example. The model uses these labels during training to learn the mapping from input features to outputs, enabling it to make accurate predictions on new, unseen data. This is fundamental to supervised learning, where the algorithm minimizes the error between its predictions and the ground truth labels.

Exam trap

The trap here is confusing the term 'label' in machine learning (ground truth output) with the general concept of a 'label' as a tag or category, leading candidates to mistakenly choose Option A or D.

How to eliminate wrong answers

Option A is wrong because a label is not a category of input features; input features are the independent variables used to make predictions, while the label is the dependent variable the model aims to predict. Option C is wrong because a text description attached to a model is documentation or metadata, not the ground truth used for training. Option D is wrong because a tag applied to Azure ML resources is an organizational metadata label for resource management, not a training data label used in supervised learning.

53

MCQeasy

What is 'dimensionality reduction' and why is it useful in machine learning?

A.Reducing the physical size of AI hardware components for edge deployment

B.Reducing the number of input features while preserving key information for efficient modelling

C.Reducing the model's output to a single dimension for binary decision making

D.Simplifying the Azure ML workspace to have fewer compute resources and experiments

AnswerB

Dimensionality reduction (PCA, UMAP) removes redundant features — enabling faster training, less overfitting, and data visualisation.

Why this answer

Dimensionality reduction is the process of reducing the number of input features (variables) in a dataset while retaining as much of the original information as possible. This is useful in machine learning because it helps combat the 'curse of dimensionality', reduces overfitting, lowers computational cost, and can improve model performance by eliminating noise and redundant features. In Azure Machine Learning, techniques like Principal Component Analysis (PCA) are commonly used for this purpose.

Exam trap

The trap here is that candidates often confuse dimensionality reduction with model output simplification or hardware reduction, because the word 'reduction' is used broadly, but the exam specifically tests the definition as a feature preprocessing technique for input data.

How to eliminate wrong answers

Option A is wrong because it describes physical hardware downsizing for edge deployment, which is unrelated to the mathematical or algorithmic concept of reducing feature dimensions in a dataset. Option C is wrong because it confuses dimensionality reduction with collapsing the model's output to a single dimension for binary classification; dimensionality reduction applies to input features, not the output. Option D is wrong because it refers to simplifying an Azure ML workspace by reducing compute resources and experiments, which is an operational or administrative action, not a data preprocessing or feature engineering technique.
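
As a rough illustration of the idea, the sketch below runs a two-feature PCA by hand in plain Python and reports how much of the total variance a single component keeps. The data values and the helper name `explained_variance_ratio_2d` are made up for this example; a real project would normally rely on a library routine instead.

```python
import math

def explained_variance_ratio_2d(xs, ys):
    """Share of total variance captured by the first principal component
    of a two-feature dataset (closed-form eigenvalue of the 2x2 covariance)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance matrix entries [[var_x, cov_xy], [cov_xy, var_y]]
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Largest eigenvalue of a symmetric 2x2 matrix
    half_trace = (var_x + var_y) / 2
    largest = half_trace + math.sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)
    return largest / (var_x + var_y)

# Hypothetical, strongly correlated features (assumed values for illustration)
feature_a = [1.0, 2.0, 3.0, 4.0, 5.0]
feature_b = [2.1, 3.9, 6.2, 7.8, 10.1]
ratio = explained_variance_ratio_2d(feature_a, feature_b)
print(f"Variance kept by one component: {ratio:.4f}")
```

Because the two assumed features are almost perfectly correlated, one component retains nearly all of the variance, which is the situation where dropping a dimension costs very little information.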

54

MCQhard

What is the bias-variance tradeoff in machine learning?

A.Choosing between model accuracy and computational cost

B.The balance between model simplicity (underfitting) and model complexity (overfitting)

C.Deciding whether to use biased training data or unbiased test data

D.The tradeoff between training speed and model size

AnswerB

High bias = underfitting (too simple); high variance = overfitting (too complex). The tradeoff finds optimal model complexity.

Why this answer

Option B is correct because the bias-variance tradeoff directly addresses the tension between underfitting (high bias, overly simple model) and overfitting (high variance, overly complex model). In Azure Machine Learning, this tradeoff is managed through hyperparameter tuning (e.g., regularization strength, tree depth) to achieve optimal generalization on unseen data.

Exam trap

The trap here is that candidates confuse 'bias' in the bias-variance tradeoff (model bias) with 'bias' in data fairness or ethical AI, leading them to incorrectly select Option C.

How to eliminate wrong answers

Option A is wrong because it confuses the bias-variance tradeoff with a resource allocation decision (accuracy vs. computational cost), which is a separate engineering concern, not a fundamental ML principle. Option C is wrong because it misrepresents bias as data bias (e.g., sampling bias) rather than model bias (systematic error from oversimplification), and the tradeoff involves variance, not test data selection. Option D is wrong because it conflates the tradeoff with operational metrics (training speed and model size), which are unrelated to the statistical concepts of bias and variance.

55

MCQhard

A real estate company trains a model to predict house prices. They evaluate it on a test set of 100 houses. The model predictions have a mean absolute error (MAE) of $5,000 and a root mean squared error (RMSE) of $20,000. What does the large difference between MAE and RMSE indicate about the model's errors?

A.The model has many small errors and a few large errors.

B.The model consistently overestimates prices.

C.The model has a high bias and low variance.

D.The model is perfectly accurate.

AnswerA

RMSE penalizes large errors heavily; a large gap indicates a few outliers with high error, even if most errors are small.

Why this answer

The mean absolute error (MAE) of $5,000 and the root mean squared error (RMSE) of $20,000 differ sharply because RMSE squares each error before averaging, which weights large deviations heavily. An RMSE four times the MAE indicates that most predictions are close (small errors) while a few have very large errors that inflate the RMSE. This pattern is typical of a model that performs well on most houses but fails badly on a few outliers.

Exam trap

The trap here is that candidates assume a large RMSE always means the model is poor overall, but the question tests the understanding that a large gap between RMSE and MAE specifically reveals the presence of outliers with large errors, not uniform inaccuracy.

How to eliminate wrong answers

Option B is wrong because MAE and RMSE measure only the magnitude of errors, not their direction; detecting consistent over- or underestimation requires signed errors, such as the mean error. Option C is wrong because high bias means systematic underfitting, which tends to produce sizeable errors across most predictions rather than a mix of many small errors and a few large ones; the gap between RMSE and MAE describes how the errors are distributed and does not by itself diagnose bias or variance. Option D is wrong because a perfectly accurate model would have both MAE and RMSE equal to $0, not $5,000 and $20,000.
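
The arithmetic behind this can be checked with a small sketch. The error values below are invented for illustration (96 small misses and 4 very large ones across 100 houses); they are not taken from any real dataset, and they are chosen only so that the MAE comes out at $5,000 and the RMSE at roughly $20,000.

```python
import math

def mae(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical test set of 100 houses: 96 small misses and 4 very large ones
skewed_errors = [1_000] * 96 + [101_000] * 4
# For contrast: every prediction is off by the same amount
uniform_errors = [5_000] * 100

skewed_mae, skewed_rmse = mae(skewed_errors), rmse(skewed_errors)
uniform_mae, uniform_rmse = mae(uniform_errors), rmse(uniform_errors)
print(skewed_mae, round(skewed_rmse))
print(uniform_mae, uniform_rmse)
```

Both error lists share the same MAE, but only the skewed list shows a large RMSE. When every error has the same size, MAE and RMSE coincide, so the gap between them is a signal about a handful of large misses.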

56

MCQhard

A data scientist is building a classification model to detect fraudulent transactions. The dataset has 1,000,000 legitimate transactions and only 1,000 fraudulent ones. The model achieves 99.9% accuracy on the test set, but it fails to catch most fraudulent cases. Which metric should the data scientist prioritize to better evaluate the model's performance on this imbalanced dataset?

A.Accuracy

B.Mean Squared Error

C.Recall

D.R-squared

AnswerC

Recall measures the proportion of actual fraudulent transactions that the model correctly identifies, which is the key metric for catching fraud.

Why this answer

Recall measures the proportion of actual positive cases (fraudulent transactions) correctly identified by the model. With only 1,000 fraud cases out of 1,001,000 total transactions, a model that predicts 'legitimate' for every transaction would achieve 99.9% accuracy but 0% recall, making recall the critical metric for imbalanced fraud detection.

Exam trap

The trap here is that candidates often default to accuracy as the universal metric, not recognizing that on imbalanced datasets (like 99.9% majority class), accuracy can be deceptively high while the model fails entirely at its primary task of detecting the minority class.

How to eliminate wrong answers

Option A is wrong because accuracy is misleading on imbalanced datasets; a model can achieve high accuracy by simply predicting the majority class (legitimate) for all cases, failing to detect the minority class (fraud). Option B is wrong because Mean Squared Error (MSE) is a regression metric used to measure average squared differences between predicted and actual continuous values, not applicable to classification tasks like fraud detection. Option D is wrong because R-squared is a regression metric that indicates the proportion of variance in the dependent variable explained by independent variables, irrelevant for evaluating classification model performance on imbalanced data.
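
A minimal sketch of the numbers in this scenario, assuming the worst case described in the explanation: a model that labels every transaction as legitimate. The counts come from the question; the variable names are illustrative.

```python
# Counts taken from the question: 1,000,000 legitimate and 1,000 fraudulent
legitimate, fraudulent = 1_000_000, 1_000

# A model that labels every transaction as legitimate
true_negatives = legitimate
false_negatives = fraudulent
true_positives = 0
false_positives = 0

total = true_positives + true_negatives + false_positives + false_negatives
accuracy = (true_positives + true_negatives) / total
recall = true_positives / (true_positives + false_negatives)
print(f"accuracy={accuracy:.4f}, recall={recall:.1f}")
```

Accuracy rounds to 99.9% while recall is zero, which is why accuracy alone hides the failure on the fraud class.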

57

MCQeasy

A company builds a machine learning model to predict whether a customer will purchase a product. They use a training dataset with 50% purchasers and 50% non-purchasers. The model achieves 90% accuracy on the test set. However, when deployed, the model performs poorly because the actual customer base has only 5% purchasers. What is the most likely cause of this poor performance?

A.The model is overfitted to the training data.

B.The model is underfitted and fails to capture key patterns.

C.Data leakage caused inflated accuracy during testing.

D.The training and deployment data have different distributions.

AnswerD

This is correct. The training set had 50% purchasers, but the production environment only has 5%. The model's assumptions no longer hold, leading to poor real-world performance even though test accuracy was high.

Why this answer

The model was trained on a balanced dataset (50% purchasers, 50% non-purchasers) but deployed on a real-world dataset with only 5% purchasers. This mismatch in class distribution between training and deployment data causes the model to fail, as it learned decision boundaries optimized for balanced classes. This is a classic case of distribution shift, specifically prior probability shift, which invalidates the model's assumptions about the target variable's base rate.

Exam trap

The trap here is that candidates often confuse high accuracy on a balanced test set with real-world readiness, failing to recognize that accuracy is misleading when class distributions shift dramatically between training and production.

How to eliminate wrong answers

Option A is wrong because overfitting would cause poor performance on any test set drawn from the same distribution as training data, but here the test set accuracy was 90%, indicating the model generalized well within the training distribution. Option B is wrong because underfitting would result in low accuracy on both training and test sets, not the observed 90% test accuracy. Option C is wrong because data leakage would inflate test accuracy artificially, but the issue here is not about leakage—it is about the deployment data having a fundamentally different class distribution than the training data.
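
One way to see the effect of the shift in class proportions is to hold the model's behaviour fixed and change only the share of purchasers. The sketch below assumes, purely for illustration, that the model catches 90% of purchasers and correctly rejects 90% of non-purchasers in both settings; the question does not give these rates.

```python
def precision_at_prevalence(sensitivity, specificity, prevalence):
    """Expected precision when the positive class makes up `prevalence` of the data."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Assumed rates, chosen only so that balanced accuracy matches the 90% in the question
sensitivity, specificity = 0.9, 0.9

balanced = precision_at_prevalence(sensitivity, specificity, 0.50)
production = precision_at_prevalence(sensitivity, specificity, 0.05)
print(f"precision at 50% purchasers: {balanced:.2f}")
print(f"precision at 5% purchasers:  {production:.2f}")
```

Under these assumptions, precision falls from about 0.90 on balanced data to about 0.32 when only 5% of customers purchase, so roughly two out of three customers flagged as purchasers would be false alarms.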

58

MCQmedium

A data scientist trains a binary classification model to predict whether a loan applicant will default (positive class) or not (negative class). The training data contains 5% default cases. The model predicts 'no default' for every applicant in the test set and achieves 95% accuracy. Which evaluation metric best reveals that the model is failing to identify any default cases?

A.Precision for the default class

B.Recall for the default class

C.F1-score for the default class

D.Overall accuracy

AnswerB

Recall (sensitivity) for defaults is the fraction of actual defaults that the model correctly identifies. With no defaults predicted, recall = 0%, clearly showing the model's failure.

Why this answer

Recall for the default class (positive class) measures the proportion of actual default cases that the model correctly identifies. With a model that predicts 'no default' for every applicant, recall for the default class is 0% because it fails to identify any true positive cases. This metric directly reveals the model's inability to detect defaults, despite the high overall accuracy of 95%.

Exam trap

The trap here is that candidates often focus on the high overall accuracy (95%) and assume the model is performing well, overlooking how class imbalance can make accuracy a misleading metric, and fail to recognize that recall for the positive class is the appropriate diagnostic tool.

How to eliminate wrong answers

Option A is wrong because precision for the default class would be undefined (division by zero) when no positive predictions are made, but it does not directly reveal the failure to identify defaults; precision focuses on the accuracy of positive predictions, not their completeness. Option C is wrong because the F1-score is the harmonic mean of precision and recall; with recall at 0%, the F1-score would also be 0%, but it is not the best metric to reveal the failure because it combines both metrics and is less intuitive than recall alone for this scenario. Option D is wrong because overall accuracy is 95% due to the class imbalance (5% defaults), which masks the model's complete failure to predict defaults; accuracy is misleading in imbalanced datasets and does not reveal the lack of positive predictions.

59

MCQmedium

What is the purpose of a confusion matrix in evaluating a classification model?

A.To measure how long the model takes to make predictions

B.To show the breakdown of correct and incorrect predictions by class

C.To visualize the distribution of training data

D.To show how confused users are when interacting with AI systems

AnswerB

A confusion matrix reveals true positives, false positives, true negatives, and false negatives, enabling calculation of precision, recall, and F1.

Why this answer

A confusion matrix is a table that compares the actual class labels against the model's predicted class labels, showing the counts of true positives, true negatives, false positives, and false negatives for each class. This breakdown allows you to compute key performance metrics such as accuracy, precision, recall, and F1-score, which are essential for evaluating a classification model's performance. Option B correctly identifies this purpose.

Exam trap

The trap here is that candidates may confuse the term 'confusion' with user confusion or think the matrix measures prediction speed, when in fact it is a structured table for analyzing correct and incorrect predictions per class.

How to eliminate wrong answers

Option A is wrong because prediction time is a performance metric related to latency or throughput, not a classification evaluation tool like a confusion matrix. Option C is wrong because visualizing the distribution of training data is typically done with histograms, bar charts, or scatter plots, not a confusion matrix, which is used for evaluating predictions against actual labels. Option D is wrong because user confusion or sentiment is not a technical metric in machine learning model evaluation; the term 'confusion' in confusion matrix refers to the matrix's ability to show where the model is 'confused' between classes, not human user confusion.
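
A small sketch of how such a table is built from actual and predicted labels. The eight labels below are made up, and the helper `confusion_matrix` is a hand-rolled illustration rather than any particular library's function.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))

# Made-up labels for eight test examples
actual    = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
predicted = ["spam", "spam", "ham",  "ham", "ham", "ham", "spam", "ham"]

matrix = confusion_matrix(actual, predicted)
tp = matrix[("spam", "spam")]   # spam correctly flagged
fn = matrix[("spam", "ham")]    # spam missed
fp = matrix[("ham", "spam")]    # legitimate mail wrongly flagged
tn = matrix[("ham", "ham")]     # legitimate mail correctly passed

print("            pred spam  pred ham")
print(f"actual spam {tp:9d}  {fn:8d}")
print(f"actual ham  {fp:9d}  {tn:8d}")
```

The four counts are exactly the inputs needed for accuracy, precision, recall, and F1.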

60

MCQmedium

A data scientist trains a regression model to predict house prices. The model has a mean absolute error (MAE) of $5,000 on the test set. Which statement best interprets this metric?

A.On average, the model's predictions are $5,000 away from the actual prices.

B.The model is accurate 95% of the time.

C.The model's predictions are within $5,000 of the actual prices for 50% of the houses.

D.The square root of the average squared error is $5,000.

AnswerA

Correct. Mean Absolute Error (MAE) is the average absolute difference between predicted and actual values.

Why this answer

Option A is correct because Mean Absolute Error (MAE) measures the average absolute difference between predicted and actual values. An MAE of $5,000 means that, on average, each prediction deviates from the true house price by $5,000. This is a standard interpretation of MAE in regression metrics.

Exam trap

The trap here is that candidates often confuse MAE with RMSE or misinterpret it as a percentage accuracy or percentile bound, leading them to select options B, C, or D.

How to eliminate wrong answers

Option B is wrong because MAE does not represent accuracy percentage; it is an average error magnitude, not a classification accuracy metric. Option C is wrong because MAE is an average over all predictions, not a median or percentile; it does not imply that 50% of predictions fall within $5,000. Option D is wrong because it describes Root Mean Squared Error (RMSE), not MAE; RMSE is the square root of the average squared error, which penalizes larger errors more heavily.
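
To make the distinction between these readings concrete, here is a sketch with four invented absolute errors whose MAE is $5,000. The numbers are hypothetical and chosen only to show that an average does not imply a percentile or an RMSE value.

```python
# Hypothetical absolute errors (in dollars) for four houses
abs_errors = [6_000, 6_000, 6_000, 2_000]

mae = sum(abs_errors) / len(abs_errors)
share_within_5000 = sum(e <= 5_000 for e in abs_errors) / len(abs_errors)
rmse = (sum(e ** 2 for e in abs_errors) / len(abs_errors)) ** 0.5

print(f"MAE: {mae:.0f}")
print(f"Share of predictions within $5,000: {share_within_5000:.0%}")
print(f"RMSE: {rmse:.0f}")
```

In this example the MAE is $5,000, yet only one prediction in four falls within $5,000 and the RMSE is higher than the MAE, so options C and D describe different quantities.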

61

MCQmedium

A data scientist trains a decision tree model to predict customer churn. The model achieves 99% accuracy on the training data but only 80% on the test data. Which concept best explains this performance difference?

A.Underfitting

B.Overfitting

C.Bias-variance tradeoff

D.Cross-validation

AnswerB

Overfitting means the model learns the training data too well, including noise, leading to poor generalization. The large gap between 99% training and 80% test accuracy is a hallmark of overfitting.

Why this answer

The model's high accuracy on training data (99%) but significantly lower accuracy on test data (80%) indicates that it has partly memorized the training data rather than learning generalizable patterns. This is the classic symptom of overfitting, where the decision tree captures noise and outliers in the training set, leading to weaker performance on unseen data.

Exam trap

The trap here is that candidates may confuse overfitting with underfitting because they see a performance gap, but the key differentiator is that overfitting shows high training accuracy, while underfitting shows low accuracy on both sets.

How to eliminate wrong answers

Option A is wrong because underfitting would result in poor performance on both training and test data, not high training accuracy with lower test accuracy. Option C is wrong because while the bias-variance tradeoff is related to overfitting, it is a broader concept describing the balance between underfitting (high bias) and overfitting (high variance); the specific performance pattern described is directly explained by overfitting. Option D is wrong because cross-validation is a technique used to evaluate model generalization and mitigate overfitting, not a concept that explains the performance difference itself.

62

MCQmedium

What is 'active learning' in Azure Machine Learning data labelling?

A.Having users actively participate in model training by rating AI responses

B.Strategically selecting the most informative examples for human labelling to maximise learning efficiency

C.A training approach where the model actively searches the internet for additional training data

D.Continuous model training that runs actively in the background as new data arrives

AnswerB

Active learning labels uncertain model predictions first — achieving better performance with fewer labels than random selection.

Why this answer

Active learning in Azure Machine Learning data labelling is a technique where the model identifies the data points it is most uncertain about and prioritizes those for human review. This strategic selection maximizes the learning efficiency of the model by ensuring that each labelled example provides the highest possible information gain, reducing the total number of labels needed.

Exam trap

The trap here is that candidates confuse 'active learning' with 'online learning' or 'continuous training' (Option D), because both involve iterative model updates, but active learning is specifically about sample selection efficiency, not the timing of training.

How to eliminate wrong answers

Option A is wrong because it describes a human-in-the-loop feedback mechanism for reinforcement learning or model evaluation, not the data labelling optimization process of active learning. Option C is wrong because active learning does not involve the model searching the internet; it operates on the existing unlabelled dataset to select samples for human annotation. Option D is wrong because it describes continuous or online learning where the model updates incrementally with new data, not the selective sampling strategy used in active learning to reduce labelling effort.
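
The selection step can be sketched with uncertainty sampling, one common active learning strategy. This is a generic illustration, not a description of how Azure Machine Learning implements it; the function name `select_for_labelling` and the probabilities are hypothetical.

```python
def select_for_labelling(probabilities, batch_size):
    """Return indices of the examples whose predicted probability is closest to 0.5."""
    ranked = sorted(range(len(probabilities)), key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:batch_size]

# Made-up model confidence that each unlabelled item belongs to the positive class
predicted_probs = [0.98, 0.52, 0.10, 0.45, 0.85, 0.03]
to_label = select_for_labelling(predicted_probs, batch_size=2)
print(to_label)
```

Items the model already scores near 0 or 1 are left for later, while the borderline items (here the ones scored 0.52 and 0.45) go to human labellers first.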

63

MCQhard

A data scientist trains a binary classification model to detect spam emails. The dataset contains 95% legitimate emails (negative class) and 5% spam (positive class). The model predicts all emails as legitimate. The accuracy is 95%, but the model is useless. Which metric would best indicate the model's failure?

A.Precision

B.Recall

C.F1 score

D.Specificity

AnswerB

Recall (sensitivity) for the positive class is 0 because no spam emails are detected, highlighting the model's complete failure to identify the minority class.

Why this answer

Recall (sensitivity) measures the proportion of actual positive cases correctly identified. With 5% spam and the model predicting all as legitimate, recall is 0% because no spam emails are detected. This directly exposes the model's failure to identify the positive class despite high accuracy.

Exam trap

The trap here is that candidates see 95% accuracy and assume the model is good, failing to recognize that accuracy is meaningless for imbalanced classes without evaluating per-class metrics like recall.

How to eliminate wrong answers

Option A is wrong because precision measures the proportion of positive predictions that are correct; since the model predicts no positives, precision is undefined (division by zero) or 0, but it does not directly show the failure to find actual positives. Option C is wrong because the F1 score is the harmonic mean of precision and recall; with recall at 0, F1 is also 0, but it is a composite metric that obscures the specific failure mode. Option D is wrong because specificity measures the proportion of actual negatives correctly identified; the model correctly identifies all legitimate emails (specificity = 100%), which would misleadingly suggest good performance on the negative class.
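
The four metrics can be compared side by side with a short sketch. It assumes a hypothetical test set of 1,000 emails with the 95%/5% split from the question; the helper `safe_divide` is an illustrative way of marking an undefined metric and is not a library function.

```python
def safe_divide(numerator, denominator):
    """Return None when a metric is undefined (zero denominator)."""
    return numerator / denominator if denominator else None

# Hypothetical test set of 1,000 emails: 950 legitimate, 50 spam,
# scored by a model that predicts "legitimate" for everything
tp, fp, fn, tn = 0, 0, 50, 950

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = safe_divide(tp, tp + fp)      # no positive predictions were made
recall = safe_divide(tp, tp + fn)
specificity = safe_divide(tn, tn + fp)

print(accuracy, precision, recall, specificity)
```

Accuracy and specificity look excellent, precision cannot be computed at all, and recall is the one figure that reads as an unambiguous zero.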

64

MCQmedium

What is the purpose of Azure Machine Learning's automated ML (AutoML) feature?

A.To automatically collect and label training data

B.To automatically try multiple algorithms and hyperparameters to find the best model

C.To automatically deploy trained models to production

D.To automatically monitor models for performance degradation

AnswerB

AutoML runs experiments across many algorithm/hyperparameter combinations and recommends the best performing model.

Why this answer

Azure Machine Learning's automated ML (AutoML) feature automates the process of algorithm selection and hyperparameter tuning. It iterates through various machine learning algorithms and their hyperparameter combinations, evaluating each against a primary metric (e.g., accuracy, AUC_weighted) to identify the best-performing model for the given dataset and task (such as classification, regression, or time-series forecasting). This significantly reduces the manual effort and time required for model development.

Exam trap

The trap here is that candidates confuse AutoML's automated model training and tuning with other Azure ML capabilities like automated deployment or monitoring, leading them to select options C or D.

How to eliminate wrong answers

Option A is wrong because AutoML does not handle data collection or labeling; it requires a prepared dataset with labels already present. Option C is wrong because AutoML focuses on model training and selection, not deployment; deploying the best model to a production endpoint is a separate step using Azure ML's model registration and deployment services. Option D is wrong because AutoML does not include ongoing performance monitoring; monitoring for data drift or performance degradation is handled separately by Azure ML's model monitoring capabilities.

65

MCQeasy

A data scientist uses Azure Machine Learning to train a model that predicts the electricity consumption (in kilowatt-hours) of a building based on features like building age, square footage, and number of occupants. The data scientist wants to evaluate how accurately the model's predictions match the actual consumption values. Which evaluation metric is most appropriate for this regression task?

A.Precision

B.Mean Absolute Error (MAE)

C.F1 score

D.Area Under the ROC Curve (AUC)

AnswerB

MAE is a standard regression metric that measures the average absolute difference between predicted and actual values, making it appropriate for evaluating prediction accuracy.

Why this answer

Mean Absolute Error (MAE) is the most appropriate metric for this regression task because it directly measures the average absolute difference between predicted and actual electricity consumption values. Unlike classification metrics, MAE provides an interpretable error in the same unit (kilowatt-hours) as the target variable, making it ideal for evaluating continuous numerical predictions.

Exam trap

The trap here is that candidates confuse classification metrics (Precision, F1, AUC) with regression metrics, mistakenly applying them to a continuous prediction task because they recall these metrics from other Azure ML scenarios like fraud detection or image classification.

How to eliminate wrong answers

Option A is wrong because Precision is a classification metric that measures the proportion of true positive predictions among all positive predictions, which is irrelevant for a regression task predicting continuous values. Option C is wrong because F1 score is the harmonic mean of Precision and Recall, designed for binary or multiclass classification problems, not for evaluating regression errors. Option D is wrong because Area Under the ROC Curve (AUC) evaluates the trade-off between true positive rate and false positive rate for classification models, and has no meaning for continuous numerical predictions like electricity consumption.

66

MCQmedium

What is the Azure Machine Learning model registry?

A.A marketplace for purchasing pre-built AI models

B.A centralized repository for versioning, tracking, and managing trained ML models

C.A compliance database for AI regulatory requirements

D.A system for monitoring models in production for data drift

AnswerB

The model registry stores trained models with versioning, lineage tracking, and metadata to support controlled deployment and governance.

Why this answer

The Azure Machine Learning model registry is a centralized repository within Azure Machine Learning that enables versioning, tracking, and management of trained machine learning models. It allows data scientists and MLOps engineers to register models with metadata, tags, and descriptions, and to manage multiple versions of the same model, facilitating reproducibility, collaboration, and deployment lifecycle management.

Exam trap

The trap here is that candidates confuse the model registry with model monitoring or deployment features, but the registry is purely a versioning and management store, not a runtime monitoring or purchasing system.

How to eliminate wrong answers

Option A is wrong because the model registry is not a marketplace for purchasing pre-built AI models; that describes an offering such as Azure Marketplace. Option C is wrong because the model registry is not a compliance database for AI regulatory requirements; compliance and governance are handled by services such as Azure Policy or Microsoft Purview. Option D is wrong because the model registry does not monitor models in production for data drift; that is the function of Azure Machine Learning's model monitoring or Azure Monitor, while the registry focuses on versioning and storing model artifacts.

67

MCQmedium

What is 'regularization' in machine learning and why is it used?

A.Normalizing input data to a standard scale before training

B.Adding a complexity penalty to the training objective to reduce overfitting

C.Ensuring models comply with AI regulations in different jurisdictions

D.Standardizing the format of training data from different sources

AnswerB

Regularization (L1/L2) penalizes large model weights during training, encouraging simpler models that generalize better.

Why this answer

Regularization is a technique used to reduce overfitting by adding a penalty term to the loss function during training. This penalty discourages the model from learning overly complex patterns (e.g., large weights) that fit the training data too closely but fail to generalize to new data. In Azure Machine Learning, regularization can be applied via algorithms like Lasso (L1) or Ridge (L2) regression, which directly modify the optimization objective.

Exam trap

The trap here is that candidates confuse regularization with data normalization or standardization because the terms sound alike, but regularization is a penalty on model complexity, not a data transformation step.

How to eliminate wrong answers

Option A is wrong because normalizing input data to a standard scale is called feature scaling or normalization, not regularization; it addresses convergence speed and numerical stability, not overfitting. Option C is wrong because ensuring models comply with AI regulations refers to governance and responsible AI practices, not a mathematical technique to improve model generalization. Option D is wrong because standardizing the format of training data from different sources is data preprocessing or data integration, unrelated to adding a complexity penalty to the training objective.
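
The effect of the penalty can be seen in the simplest possible case: an L2 (Ridge) penalty on a single-feature linear model with no intercept, where the minimiser has a closed form. The data and the helper name `ridge_weight` are made up for this sketch.

```python
def ridge_weight(xs, ys, penalty):
    """Weight w minimising sum((y - w*x)^2) + penalty * w^2 (single feature, no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)

# Made-up data roughly following y = 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

unregularized = ridge_weight(xs, ys, penalty=0.0)
regularized = ridge_weight(xs, ys, penalty=10.0)
print(f"{unregularized:.3f} {regularized:.3f}")
```

With no penalty the fitted weight is about 1.99; with a penalty of 10 it shrinks to about 1.49. A larger penalty pulls the weight further toward zero, which is how regularization trades a little training fit for a simpler model.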

68

MCQmedium

What is 'model interpretability' and which Azure tool helps with it?

A.Understanding what programming language a model was written in

B.Understanding why a model makes specific predictions by identifying influential features — supported by Azure ML's Responsible AI dashboard

C.Translating model documentation into multiple languages

D.Monitoring how quickly a model responds to prediction requests

AnswerB

Interpretability explains model decisions; Azure ML's Responsible AI dashboard with InterpretML shows feature importance and counterfactual analysis.

Why this answer

Model interpretability refers to the ability to understand and explain why a machine learning model makes specific predictions, typically by identifying which input features most influenced the output. Azure Machine Learning's Responsible AI dashboard directly supports this through built-in interpretability components like feature importance plots and error analysis, enabling developers to debug models and build trust. Option B correctly pairs the definition with the specific Azure tool that implements it.

Exam trap

The trap here is that candidates confuse 'interpretability' with general monitoring or documentation tasks, but the AI-900 exam specifically tests the Responsible AI dashboard as the tool for explaining model predictions through feature importance.

How to eliminate wrong answers

Option A is wrong because model interpretability is about understanding prediction logic, not the programming language used to write the model — the language is irrelevant to explaining model behavior. Option C is wrong because translating documentation is a localization task, not a machine learning interpretability function; Azure's Responsible AI dashboard does not perform language translation. Option D is wrong because monitoring prediction response speed is a performance metric (latency), not an interpretability concern; Azure Monitor or Application Insights would track that, not the Responsible AI dashboard.

69

MCQhard

What is the difference between 'precision' and 'recall' as model evaluation metrics?

A.Precision is the speed of prediction; recall is the model's memory usage

B.Precision measures correctness of positive predictions; recall measures coverage of actual positives

C.Precision and recall are both the same metric, just calculated on different datasets

D.Recall is higher than precision whenever the model has seen more training data

AnswerB

Precision = TP/(TP+FP): how often positive predictions are right. Recall = TP/(TP+FN): how many of the actual positives were found.

Why this answer

Option B is correct because precision measures the proportion of positive identifications that were actually correct (true positives / (true positives + false positives)), while recall measures the proportion of actual positives that were correctly identified (true positives / (true positives + false negatives)). In Azure Machine Learning, these metrics are critical for evaluating classification models, especially when dealing with imbalanced datasets, as they provide distinct insights into model performance.
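
The two formulas can be checked by hand with a short sketch. The labels below are made-up illustration data, and `precision_recall` is a hypothetical helper rather than an Azure Machine Learning API.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall); a value is None when its denominator is zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return precision, recall


# Hypothetical labels: 1 = positive, 0 = negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

# TP = 2, FP = 1, FN = 2, so precision = 2/3 and recall = 2/4.
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```

Here two of the three positive predictions are right (precision 2/3), while only two of the four actual positives are found (recall 1/2), which shows the metrics moving independently.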

Exam trap

The trap here is that candidates often confuse precision and recall with unrelated concepts like speed or memory, or assume they are identical metrics, when in fact they measure fundamentally different aspects of classification accuracy.

How to eliminate wrong answers

Option A is wrong because precision is not related to prediction speed; it is a statistical metric of classification accuracy, and recall is not about memory usage but about the model's ability to find all relevant positive instances. Option C is wrong because precision and recall are distinct metrics that measure different aspects of model performance; they are not the same metric calculated on different datasets. Option D is wrong because recall is not inherently higher than precision when more training data is used; the relationship between precision and recall depends on the model's threshold and the distribution of the data, not simply on the volume of training data.

70

MCQmedium

A data scientist has trained a binary classification model to detect fraudulent credit card transactions. The dataset contains 99.9% legitimate transactions and only 0.1% fraudulent ones. The model predicts all transactions as legitimate, achieving 99.9% accuracy on the test set. However, the business requires the model to actually catch as many fraudulent transactions as possible. Which metric would best reveal the model's failure to identify fraud?

A.Accuracy

B.Recall

C.Precision

D.F1 score

AnswerB

Recall measures the fraction of actual fraudulent transactions that the model correctly identifies. Since the model never predicts fraud, recall is 0%, which clearly shows the failure.

Why this answer

Recall (also known as sensitivity) measures the proportion of actual positive cases (fraudulent transactions) that were correctly identified by the model. In this scenario, the model predicts all transactions as legitimate, so it identifies none of the fraudulent transactions (TP = 0), yielding a recall of 0%. This directly reveals the model's complete failure to catch fraud, despite the high accuracy.
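
The arithmetic behind this scenario can be reproduced with a small sketch. The dataset size of 10,000 transactions is an assumption chosen so that 0.1% fraud equals 10 cases; the helper name is hypothetical.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (accuracy, recall); recall is None if there are no actual positives."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual_positives = sum(1 for t in y_true if t == positive)
    accuracy = correct / len(y_true)
    recall = tp / actual_positives if actual_positives else None
    return accuracy, recall


# Assumed dataset: 10,000 transactions, 10 fraudulent (1) and 9,990 legitimate (0).
y_true = [1] * 10 + [0] * 9990
# The model from the question: it predicts "legitimate" for everything.
y_pred = [0] * 10000

accuracy, recall = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={accuracy:.3f} recall={recall:.3f}")
```

Accuracy comes out at 9,990 / 10,000 while recall is 0 / 10, so the same predictions look excellent by one metric and useless by the other.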

Exam trap

The trap here is that candidates see the high accuracy (99.9%) and assume the model is performing well, failing to recognize that accuracy is meaningless in extreme class imbalance and that recall is the metric designed to evaluate the model's ability to find the rare positive class.

How to eliminate wrong answers

Option A is wrong because accuracy is a misleading metric in highly imbalanced datasets; here it is 99.9% simply because the model correctly classifies all legitimate transactions, but it hides the fact that no fraud is detected. Option C is wrong because precision measures the proportion of predicted positive cases that are actually positive; since the model never predicts any positive cases, precision is undefined (0/0), so it does not directly expose the failure to identify fraud. Option D is wrong because the F1 score is the harmonic mean of precision and recall; with recall at 0% and precision undefined, F1 is itself undefined or reported as 0 by convention, and as a composite metric it obscures the specific failure mode. Recall alone is the most direct and simplest indicator of the model's inability to catch fraud.

71

MCQeasy

A retail company has historical data about customers, including age, purchase history, and whether they have churned (yes/no). They want to train a model that predicts if a new customer will churn. Which type of machine learning should they use?

A.Supervised regression

B.Supervised classification

C.Unsupervised clustering

D.Reinforcement learning

AnswerB

Classification predicts a discrete category. Churn prediction is a classic binary classification problem.

Why this answer

The goal is to predict a categorical outcome (churn: yes/no) from historical labeled data. Supervised classification algorithms, such as logistic regression or decision trees, learn from input features (age, purchase history) and the target label (churn status) to assign new customers to one of the discrete classes. This directly matches the requirement for a binary classification model.
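
As an illustration of learning from labeled history, the sketch below uses a 1-nearest-neighbor classifier, a simpler technique swapped in for the logistic regression or decision trees named above. The customer data, feature choice, and function name are all hypothetical.

```python
def predict_churn(train_rows, train_labels, new_row):
    """1-nearest-neighbor: return the label of the closest labeled customer."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = min(
        range(len(train_rows)),
        key=lambda i: squared_distance(train_rows[i], new_row),
    )
    return train_labels[nearest]


# Hypothetical labeled history: (age, purchases in the last year) -> churned?
train_rows = [(25, 1), (31, 2), (47, 15), (52, 20)]
train_labels = ["yes", "yes", "no", "no"]

first = predict_churn(train_rows, train_labels, (29, 3))    # closest to (31, 2)
second = predict_churn(train_rows, train_labels, (50, 18))  # closest to (52, 20)
print(first, second)
```

The output is always one of the discrete labels seen in training ("yes" or "no"), which is what separates classification from regression.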

Exam trap

The trap here is that candidates often confuse regression with classification when the output is a binary yes/no, mistakenly treating any prediction task as regression. Regression predicts continuous numeric values; a discrete categorical outcome such as churn yes/no requires classification.

How to eliminate wrong answers

Option A is wrong because supervised regression predicts a continuous numeric value (e.g., revenue amount), not a discrete category like churn yes/no. Option C is wrong because unsupervised clustering groups data without using labeled outcomes, so it cannot predict a specific target like churn status. Option D is wrong because reinforcement learning learns optimal actions through trial-and-error interactions with an environment, not from static historical labeled data for prediction.

72

MCQmedium

What is an ML pipeline in Azure Machine Learning?

A.The networking infrastructure connecting Azure ML compute nodes

B.A workflow of connected steps for automating the end-to-end ML process

C.A data streaming service for real-time model predictions

D.A GitHub repository for storing ML model code

AnswerB

ML pipelines orchestrate and automate ML steps (data prep, training, evaluation) enabling reusable, schedulable workflows.

Why this answer

Option B is correct because an ML pipeline in Azure Machine Learning is a workflow of connected steps that automates the end-to-end machine learning process, including data preparation, training, evaluation, and deployment. This enables reproducibility, reusability, and orchestration of complex ML tasks without manual intervention.
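
The idea of "connected steps" can be sketched in plain Python. This is a conceptual illustration only and does not use the Azure Machine Learning SDK; the step functions, the `run_pipeline` helper, and the toy data are all assumptions.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Step = Tuple[str, Callable[[State], State]]


def prepare_data(state: State) -> State:
    # Data prep: drop rows that contain a missing value.
    rows = [r for r in state["raw"] if None not in r]
    return {**state, "rows": rows}


def train(state: State) -> State:
    # Training: a trivial "model" that always predicts the mean target.
    targets = [r[1] for r in state["rows"]]
    return {**state, "model": sum(targets) / len(targets)}


def evaluate(state: State) -> State:
    # Evaluation: mean absolute error of that constant prediction.
    errors = [abs(r[1] - state["model"]) for r in state["rows"]]
    return {**state, "mae": sum(errors) / len(errors)}


def run_pipeline(steps: List[Step], state: State) -> State:
    # Run each step in order, passing its output to the next step.
    for name, step in steps:
        state = step(state)
        state["completed"] = state.get("completed", []) + [name]
    return state


steps: List[Step] = [("prepare", prepare_data), ("train", train), ("evaluate", evaluate)]
result = run_pipeline(steps, {"raw": [(1, 10.0), (2, None), (3, 20.0)]})
print(result["completed"], result["model"], result["mae"])
```

Because the steps are declared once as an ordered list, the same workflow can be rerun on new data without manual intervention, which is the property the answer highlights.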

Exam trap

The trap here is that candidates confuse an ML pipeline with the underlying compute infrastructure (Option A) or with real-time serving services (Option C), because Azure ML uses many interconnected services, but the pipeline is specifically the workflow definition, not the hardware or streaming layer.

How to eliminate wrong answers

Option A is wrong because it describes the networking infrastructure (e.g., virtual networks, compute clusters) that supports Azure ML, not the pipeline itself. Option C is wrong because it describes a data streaming service such as Azure Stream Analytics or Event Hubs, not an ML pipeline, which is a workflow of steps rather than a streaming layer. Option D is wrong because a GitHub repository is a version control system for code, whereas an ML pipeline is a defined sequence of steps within Azure ML, often stored as a YAML or Python-based definition.

73

MCQmedium

A data scientist is training a model to classify customer reviews as positive, negative, or neutral. The dataset contains 10,000 reviews, but only 500 of them are negative. The data scientist wants to ensure the model performs well on the minority class (negative reviews). Which technique should the data scientist consider to address the class imbalance?

A.Increase the learning rate

B.Add more features to the model

C.Use a resampling technique like SMOTE or random oversampling of the minority class

D.Use L1 regularization (Lasso)

AnswerC

Resampling techniques balance the class distribution by creating synthetic samples (SMOTE) or duplicating existing minority samples (oversampling). This gives the minority class more influence during training, improving model recall for that class.

Why this answer

Option C is correct because resampling techniques like SMOTE (Synthetic Minority Oversampling Technique) or random oversampling directly address class imbalance by generating synthetic samples or duplicating existing samples from the minority class (negative reviews). This balances the training dataset, preventing the model from being biased toward the majority class (positive/neutral reviews) and improving recall for the minority class.
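
Random oversampling, the simpler of the two techniques, can be sketched with the standard library alone (SMOTE, which synthesizes new points, is not shown). The question gives only the total of 10,000 reviews and 500 negatives, so the 6,000 / 3,500 split between positive and neutral below is an assumption, as is the function name.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for label, count in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == label]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(label)
    return out_samples, out_labels


# Assumed class split: 6,000 positive, 3,500 neutral, 500 negative reviews.
labels = ["positive"] * 6000 + ["neutral"] * 3500 + ["negative"] * 500
samples = list(range(len(labels)))  # review IDs stand in for the review text

balanced_samples, balanced_labels = random_oversample(samples, labels)
print(Counter(labels))
print(Counter(balanced_labels))
```

In practice, resampling is normally applied to the training split only, so that the evaluation data keeps the real class distribution.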

Exam trap

The trap here is that candidates may confuse regularization or feature engineering techniques with data-level imbalance solutions, or assume that simply increasing the learning rate can compensate for a skewed dataset.

How to eliminate wrong answers

Option A is wrong because increasing the learning rate does not address class imbalance; it controls the step size during gradient descent and can cause the model to overshoot minima or fail to converge. Option B is wrong because adding more features does not correct the skewed distribution of classes; it may even introduce noise or overfitting without balancing the dataset. Option D is wrong because L1 regularization (Lasso) is used for feature selection and preventing overfitting by penalizing the absolute size of coefficients, not for handling imbalanced class distributions.

74

MCQmedium

What are Azure Machine Learning environments, and why are they important for reproducibility?

A.The physical Azure data centre locations where model training takes place

B.Versioned software configurations (Python packages, dependencies) ensuring reproducible ML runs

C.Development, staging, and production deployment targets for Azure ML models

D.The security boundaries that isolate different ML projects in the same Azure subscription

AnswerB

Environments define the exact software stack — ensuring training is reproducible regardless of who runs it or when.

Why this answer

Azure Machine Learning environments are versioned software configurations that specify the Python packages, dependencies, and runtime settings needed to execute a training script. They are critical for reproducibility because they ensure that every run uses the exact same software stack, eliminating variability from package version mismatches or missing dependencies.

Exam trap

The trap here is that candidates confuse 'environments' with deployment targets or physical locations, but the AI-900 exam specifically tests that environments are versioned software configurations for reproducibility.

How to eliminate wrong answers

Option A is wrong because Azure Machine Learning environments are not physical data center locations; those are Azure regions, not versioned software configurations. Option C is wrong because development, staging, and production deployment targets are referred to as compute targets or endpoints, not environments. Option D is wrong because security boundaries that isolate projects are managed via workspaces, virtual networks, or RBAC, not environments.

75

MCQeasy

A data scientist has a dataset containing thousands of labeled images of cats and dogs. The data scientist wants to train a model that can automatically classify new unlabeled images as either 'cat' or 'dog'. Which type of machine learning should the data scientist use?

A.Supervised learning

B.Unsupervised learning

C.Reinforcement learning

D.Semi-supervised learning

AnswerA

Correct because the dataset contains labeled images, which is the hallmark of supervised learning. The model learns from the labeled data to predict labels for new data.

Why this answer

The correct answer is A, supervised learning, because the dataset contains labeled images (each image is tagged as 'cat' or 'dog'), and the goal is to train a model to predict the label for new unlabeled images. Supervised learning algorithms, such as convolutional neural networks (CNNs), learn a mapping from input features (pixel values) to output labels using the provided ground-truth labels, enabling accurate classification on unseen data.

Exam trap

The trap here is that candidates may confuse 'semi-supervised learning' with 'supervised learning' when they see a large labeled dataset, but semi-supervised learning is only appropriate when labeled data is scarce, not when thousands of labeled examples are already available.

How to eliminate wrong answers

Option B (unsupervised learning) is wrong because it is used for finding hidden patterns or groupings in unlabeled data, such as clustering, and does not use labeled examples to predict a specific category like 'cat' or 'dog'. Option C (reinforcement learning) is wrong because it involves an agent learning to make sequential decisions by interacting with an environment and receiving rewards or penalties, which is not applicable to static image classification tasks. Option D (semi-supervised learning) is wrong because it combines a small amount of labeled data with a large amount of unlabeled data; here the dataset already contains thousands of labeled images, so there is no need to leverage unlabeled data, and the problem is fully supervised.
