CCNA Describe Fundamental Principles Of Machine Learning On Azure Questions — Page 2 of 3

MCQeasy

What is the 'mean absolute error' (MAE) metric used to evaluate in machine learning?

A.The average confidence percentage of classification predictions

B.The average absolute difference between regression model predictions and actual values

C.The proportion of model predictions that deviate from expected values by more than a threshold

D.How much the model's predictions differ from random chance

AnswerB

MAE = mean of |predicted - actual|. It measures average error magnitude in regression tasks — lower is better.

Why this answer

Mean Absolute Error (MAE) is a regression metric that calculates the average of the absolute differences between predicted and actual values. It measures how close predictions are to the true outcomes, with lower values indicating better model accuracy. In Azure Machine Learning, MAE is commonly used to evaluate regression models like linear regression or decision forests.
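
The definition above can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function name `mean_absolute_error` and the price values are assumptions made up for the example, not part of any Azure Machine Learning API.

```python
def mean_absolute_error(actual, predicted):
    """Average of |predicted - actual| over all samples."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical house prices in thousands (illustrative values only).
actual = [200, 250, 300, 400]
predicted = [210, 240, 330, 390]

mae = mean_absolute_error(actual, predicted)
print(mae)  # absolute errors are 10, 10, 30, 10, so the mean is 15.0
```

Because the errors are averaged in the target's own units, an MAE of 15 here reads as "off by 15 thousand on average".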

Exam trap

The trap here is that candidates confuse MAE with classification metrics like accuracy or confidence, or assume it involves thresholds, when in fact MAE is strictly a regression metric measuring average absolute error without any threshold or comparison to random chance.

How to eliminate wrong answers

Option A is wrong because MAE does not measure confidence percentages; classification confidence is typically evaluated using metrics like log loss or calibration curves. Option C is wrong because MAE averages all absolute errors without applying a threshold; counting how many predictions fall outside a tolerance is a separate, threshold-based measure. Option D is wrong because MAE compares predictions to actual values, not to random chance; comparison against a naive baseline, such as always predicting the mean, is what metrics like R-squared or relative absolute error provide.

MCQhard

A data scientist trains a binary classification model to predict loan defaults. The dataset contains 98% non-default cases and only 2% default cases. The model predicts 'non-default' for every instance, achieving 98% accuracy on the test set. Which metric would best reveal that the model fails to identify any actual defaults?

A.Recall for the default class

B.Precision for the default class

C.F1 score for the default class

D.Accuracy

AnswerA

Recall calculates the proportion of actual defaults that the model correctly identifies. Since no defaults are predicted, recall is 0, clearly exposing the failure.

Why this answer

Recall for the default class measures the proportion of actual default cases that the model correctly identifies. With the model predicting 'non-default' for every instance, recall for the default class is 0%, because it fails to capture any true positives. This directly reveals the model's inability to detect any actual defaults, despite the high overall accuracy.
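
The scenario can be reproduced with a small sketch, assuming a hypothetical test set of 1,000 loans with the same 98/2 split as the question. The variable names are illustrative.

```python
# 1,000 hypothetical loans: 2% default (1), 98% non-default (0).
y_true = [1] * 20 + [0] * 980
# The degenerate model predicts non-default for every instance.
y_pred = [0] * 1000

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

accuracy = correct / len(y_true)
recall = tp / (tp + fn)

print(accuracy)  # 0.98
print(recall)    # 0.0
```

Accuracy looks strong only because the majority class dominates the count; recall for the default class exposes that none of the 20 defaults was caught.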

Exam trap

The trap here is that candidates often choose accuracy (D) because it shows a high number, failing to recognize that class imbalance can make accuracy a poor indicator of model performance, especially for the minority class.

How to eliminate wrong answers

Option B is wrong because precision for the default class would be undefined (0/0) or 0% if the model never predicts default, but it does not directly show that the model misses all actual defaults—it focuses on the accuracy of positive predictions. Option C is wrong because the F1 score is the harmonic mean of precision and recall; if recall is 0%, the F1 score is also 0%, but it is a composite metric that does not isolate the failure to identify defaults as clearly as recall. Option D is wrong because accuracy is 98% due to the class imbalance, masking the model's complete failure on the minority class; accuracy alone cannot reveal the lack of true positive predictions.

Matchingmedium

Match each Azure AI service to its associated API or SDK.

Concepts

Matches

Analyze images and extract information

Understand and analyze text

Convert speech to text and vice versa

Translate text between languages

Access GPT-4, DALL-E, and other models

Why these pairings

Each service provides specific APIs for AI tasks.

MCQeasy

An e-commerce company has a dataset of customer purchase histories with no predefined categories. The data analyst wants to identify natural groupings of customers based on their purchasing behavior to target marketing campaigns. Which type of machine learning should the analyst use?

A.Regression

B.Classification

C.Clustering

D.Reinforcement learning

AnswerC

Clustering is an unsupervised method that groups unlabeled data into clusters based on similarities, ideal for discovering customer segments.

Why this answer

Clustering is the correct choice because it is an unsupervised learning technique used to discover inherent groupings in data without predefined labels. In this scenario, the analyst wants to identify natural customer segments based on purchase behavior, which aligns perfectly with clustering algorithms like K-Means or DBSCAN that partition data into clusters of similar patterns.
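
To illustrate how K-Means finds groups without labels, here is a minimal one-dimensional sketch. The helper `kmeans_1d`, the spend values, and the starting centroids are all assumptions for illustration; a real project would more likely use a library implementation on many features.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on 1-D data: assign to nearest centroid, then re-average."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical monthly spend per customer; no labels are provided.
spend = [12, 15, 14, 95, 102, 99, 48, 52, 50]
centroids, clusters = kmeans_1d(spend, centroids=[0, 50, 100])
print(clusters)  # [[12, 15, 14], [48, 52, 50], [95, 102, 99]]
```

The three groups (low, medium, and high spenders) emerge from the data alone, which is the defining property of unsupervised learning.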

Exam trap

The trap here is that candidates often confuse clustering with classification because both involve grouping, but clustering is unsupervised (no labels) while classification is supervised (requires labeled data).

How to eliminate wrong answers

Option A is wrong because regression is a supervised learning technique used for predicting continuous numerical values (e.g., sales amount), not for discovering natural groupings. Option B is wrong because classification is a supervised learning method that requires labeled training data to assign predefined categories, whereas the dataset has no predefined categories. Option D is wrong because reinforcement learning involves an agent learning optimal actions through trial-and-error interactions with an environment to maximize cumulative reward, which is unrelated to grouping customers based on historical data.

MCQhard

A bank uses a machine learning model to predict credit card fraud. The model's output is a probability score. The business wants to minimize the number of false positives (legitimate transactions incorrectly flagged as fraud) because these cause customer dissatisfaction. At the same time, they must also catch most fraudulent transactions. Which metric should the bank optimize to balance these two goals?

A.Accuracy

B.Precision

C.Recall

D.F1 score

AnswerD

Correct: F1 balances precision and recall, addressing both goals.

Why this answer

The F1 score is the harmonic mean of precision and recall, making it the ideal metric when a balance between minimizing false positives (precision) and catching most fraudulent transactions (recall) is required. In this credit card fraud detection scenario, optimizing F1 ensures the model reduces customer dissatisfaction from false positives while still maintaining high detection of actual fraud.
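
The harmonic-mean behavior can be seen in a short sketch. The function name and the confusion-matrix counts for the two fraud models are hypothetical values chosen for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical model A: flags aggressively, so recall is high but precision is low.
model_a = precision_recall_f1(tp=90, fp=210, fn=10)
# Hypothetical model B: fewer flags, with precision and recall in balance.
model_b = precision_recall_f1(tp=80, fp=20, fn=20)

print(model_a)  # roughly (0.30, 0.90, 0.45)
print(model_b)  # roughly (0.80, 0.80, 0.80)
```

Model A catches more fraud, yet its F1 is much lower because the harmonic mean is dragged down by the weaker of the two component metrics.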

Exam trap

The trap here is that candidates often choose precision or recall alone, not realizing that the F1 score is specifically designed to balance both metrics when the business requires minimizing false positives while still catching most true positives.

How to eliminate wrong answers

Option A is wrong because accuracy measures overall correct predictions (true positives + true negatives divided by total predictions) and can be misleading in imbalanced datasets like fraud detection, where legitimate transactions vastly outnumber fraudulent ones; a model that always predicts 'not fraud' could achieve high accuracy but fail to catch any fraud. Option B is wrong because precision focuses solely on the proportion of flagged transactions that are actually fraudulent (true positives / (true positives + false positives)), which minimizes false positives but does not account for missed fraudulent transactions (false negatives), potentially allowing many frauds to go undetected. Option C is wrong because recall (sensitivity) measures the proportion of actual fraudulent transactions correctly identified (true positives / (true positives + false negatives)), which prioritizes catching fraud but can lead to a high number of false positives, directly conflicting with the business goal of minimizing customer dissatisfaction.

MCQmedium

A data scientist has a dataset with 100 features and 10,000 samples. They want to reduce the number of features while retaining as much variance as possible, to improve model training speed and reduce overfitting. Which technique should they use?

A.Feature scaling

B.Principal Component Analysis (PCA)

C.Regularization

D.Cross-validation

AnswerB

PCA reduces the dimensionality by projecting data onto principal components, retaining the most variance.

Why this answer

Principal Component Analysis (PCA) is an unsupervised dimensionality reduction technique that transforms the original features into a new set of orthogonal components, ordered by the amount of variance they capture. By selecting only the top principal components, the data scientist can significantly reduce the feature count (e.g., from 100 to 20) while retaining the majority of the dataset's variance, which directly improves model training speed and reduces overfitting.
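
The idea of "variance captured per component" can be sketched for the simplest possible case of two features, where the eigenvalues of the covariance matrix have a closed form. This is a scaled-down illustration under assumed data, not the 100-feature scenario in the question; the function name `explained_variance_2d` is hypothetical.

```python
import math


def explained_variance_2d(xs, ys):
    """Share of total variance captured by each principal component (2 features)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Entries of the 2x2 sample covariance matrix [[a, b], [b, c]].
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mid = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    top, second = mid + spread, mid - spread
    total = top + second
    return top / total, second / total


# Two hypothetical, strongly correlated features with a little noise.
xs = [float(i) for i in range(1, 11)]
ys = [2 * x + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]

first, second = explained_variance_2d(xs, ys)
print(first, second)
```

When features are highly correlated, the first component carries nearly all of the variance, so dropping the second loses very little information.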

Exam trap

The trap here is that candidates often confuse regularization (which reduces overfitting by shrinking coefficients) with dimensionality reduction, or they think feature scaling alone can reduce feature count, when PCA is the correct technique for explicitly reducing the number of features while preserving variance.

How to eliminate wrong answers

Option A is wrong because feature scaling (e.g., standardization or normalization) adjusts the range of feature values but does not reduce the number of features; it is often a preprocessing step before applying PCA, not a dimensionality reduction technique itself. Option C is wrong because regularization (e.g., L1 or L2) penalizes model coefficients to prevent overfitting but does not reduce the number of features in the dataset; it works during model training, not as a preprocessing step. Option D is wrong because cross-validation is a model evaluation technique used to assess generalization performance by splitting data into training and validation folds; it does not reduce feature count or variance retention.

MCQmedium

What is 'ONNX' and why is it relevant to Azure AI?

A.An Azure-specific machine learning programming language

B.An open model interchange format enabling models to move between frameworks and edge deployments

C.A database for storing machine learning model training data

D.A Microsoft cloud service for distributed model training

AnswerB

ONNX is an open standard format for model portability: train once, deploy anywhere (cloud, edge, different runtimes).

Why this answer

ONNX (Open Neural Network Exchange) is an open-source model interchange format that allows machine learning models to be transferred between different frameworks (e.g., PyTorch, TensorFlow, scikit-learn) and deployed across various environments, including edge devices. In Azure AI, ONNX is relevant because it enables interoperability and portability, allowing models trained in one framework to be optimized and run efficiently using Azure's ONNX Runtime, which accelerates inference on both cloud and edge hardware.

Exam trap

The trap here is that candidates confuse ONNX with a proprietary Azure service or a programming language, when in fact it is an open, cross-platform model interchange format designed for portability and not tied to any single cloud provider.

How to eliminate wrong answers

Option A is wrong because ONNX is not a programming language; it is a serialized model format, and models on Azure are written in general-purpose languages such as Python or R, none of which is Azure-specific. Option C is wrong because ONNX does not store training data; it stores model architecture and weights, while databases like Azure SQL or Cosmos DB are used for data storage. Option D is wrong because ONNX is not a cloud service for distributed training; Azure offers services like Azure Machine Learning for distributed training, while ONNX is purely an interchange format.

MCQmedium

A data scientist trains a regression model to predict house prices using features like bedrooms, square footage, and location. The model achieves a low error on the training data but performs significantly worse when used to predict prices in a new city with different property characteristics. Which concept best explains this poor performance?

A.Underfitting

B.Overfitting

C.Data leakage

D.Bias-variance tradeoff

AnswerB

Overfitting means the model captures noise and specifics of the training data, leading to poor generalization to new data, especially from a different distribution.

Why this answer

The model performs well on training data but poorly on new data from a different city, which is the classic symptom of overfitting. Overfitting occurs when a model learns noise and specific patterns in the training data that do not generalize to unseen data, especially when the new data has different characteristics (e.g., different property market dynamics). In this case, the model has memorized the training city's price patterns rather than learning generalizable relationships.

Exam trap

The trap here is that candidates may confuse overfitting with the bias-variance tradeoff, but the question specifically asks for the concept that best explains the poor performance on new data, which is overfitting, not the general tradeoff.

How to eliminate wrong answers

Option A is wrong because underfitting would result in high error on both training and test data, not low training error and high test error. Option C is wrong because data leakage means information that would not be available at prediction time (for example, test-set data or the target itself) slips into training, which artificially inflates evaluation results; nothing of the kind is described here, where the issue is generalization to a new city rather than data contamination. Option D is wrong because while bias-variance tradeoff is a related concept, it does not specifically name the phenomenon; overfitting is the direct explanation for low training error and high test error on new data.

MCQmedium

What is 'semi-supervised learning' and when is it useful?

A.Training a model that is partially supervised by one human and partially by another

B.Using small amounts of labelled data alongside large amounts of unlabelled data to train a model

C.A model that receives feedback from users during deployment to improve over time

D.Training that automatically stops halfway through and resumes the next day

AnswerB

Semi-supervised learning leverages unlabelled data (cheap to collect) with scarce labels — useful when annotation is expensive.

Why this answer

Semi-supervised learning combines a small set of labeled data with a large set of unlabeled data to train a model. This approach is useful when labeling data is expensive or time-consuming, but large volumes of unlabeled data are readily available. One common approach (self-training) first fits the model on the labeled subset, then uses its most confident predictions to assign pseudo-labels to the unlabeled data and retrains, iteratively improving accuracy.

Exam trap

The trap here is that candidates confuse semi-supervised learning with active learning or human-in-the-loop workflows, but the key differentiator is the use of both labeled and unlabeled data in the training process, not the number of humans or feedback loops.

How to eliminate wrong answers

Option A is wrong because it describes a human workflow (multiple labelers), not a machine learning paradigm; semi-supervised learning refers to the data labeling strategy, not the number of human supervisors. Option C is wrong because it describes online learning or reinforcement learning, where the model updates from live user feedback, not the semi-supervised combination of labeled and unlabeled data. Option D is wrong because it describes checkpointing or resumable training, which is a fault-tolerance mechanism, not a learning paradigm.

MCQmedium

What is 'overfitting' in machine learning and how does Azure ML help prevent it?

A.When a model is trained on too much data and becomes too accurate

B.When a model learns training data too specifically and fails to generalise to new data

C.When a model's predictions exceed the acceptable numerical range

D.When Azure ML runs training for longer than the allocated compute budget

AnswerB

Overfitting means high training accuracy but poor test accuracy — the model memorised noise instead of learning general patterns.

Why this answer

Overfitting occurs when a machine learning model learns the training data too precisely, including noise and outliers, resulting in poor performance on unseen data. Azure ML helps prevent overfitting through automated machine learning (AutoML) which applies regularization, cross-validation, and early stopping techniques, as well as by enabling easy configuration of train/test splits and hyperparameter tuning.

Exam trap

The trap here is that candidates confuse overfitting with high accuracy or large datasets, but the key is that overfitting is about poor generalization, not just high performance on training data.

How to eliminate wrong answers

Option A is wrong because overfitting is not caused by training on too much data; in fact, more data often reduces overfitting. Option C is wrong because exceeding an acceptable numerical range describes prediction errors or data normalization issues, not overfitting. Option D is wrong because exceeding a compute budget is a resource constraint, not a machine learning concept related to model generalization.

MCQmedium

What is a training job in Azure Machine Learning?

A.A batch prediction job that scores new data against a deployed model

B.A single execution of a training script that produces a trained model and tracked metrics

C.A scheduled report on model performance in production

D.A data preprocessing pipeline that cleans raw datasets

AnswerB

A training job runs training code on Azure ML compute, tracking metrics/logs and producing model artifacts for evaluation.

Why this answer

A training job in Azure Machine Learning is a single execution of a training script that runs on a specified compute target, producing a trained model and logging metrics, parameters, and artifacts. This is the fundamental unit of model training in Azure ML, distinct from batch inference or data preprocessing.

Exam trap

The trap here is confusing the training job with other Azure ML workflow steps like batch inference, monitoring, or data preprocessing, which are separate job types with distinct purposes and outputs.

How to eliminate wrong answers

Option A is wrong because a batch prediction job that scores new data against a deployed model is an inference or scoring job, not a training job. Option C is wrong because a scheduled report on model performance in production is a monitoring or evaluation task, not a training job. Option D is wrong because a data preprocessing pipeline that cleans raw datasets is a data preparation step, which may precede training but is not itself a training job.

MCQeasy

What is 'model deployment' in Azure Machine Learning?

A.Uploading training data to Azure Blob Storage for model training

B.Making a trained model available as a callable endpoint for applications to use

C.Distributing the training job across multiple compute nodes

D.Publishing a model to the Azure Marketplace for other organisations to purchase

AnswerB

Deployment packages the trained model into a REST endpoint — real-time for instant predictions or batch for large-scale scoring.

Why this answer

Model deployment in Azure Machine Learning is the process of taking a trained model and hosting it as a web service endpoint (e.g., via Azure Kubernetes Service or Azure Container Instances) so that applications can send data to it and receive predictions in real time or batch mode. This makes the model operational and accessible for inference, which is the core purpose of deployment.

Exam trap

The trap here is that candidates confuse 'model deployment' with other stages of the ML lifecycle, such as data preparation (Option A) or training optimization (Option C), because they focus on the word 'model' rather than the specific action of making it available for inference.

How to eliminate wrong answers

Option A is wrong because uploading training data to Azure Blob Storage is a data ingestion step, not model deployment; deployment involves hosting the trained model, not storing raw data. Option C is wrong because distributing training across multiple compute nodes is a parallel training or distributed computing technique, not deployment; deployment focuses on serving the model after training. Option D is wrong because publishing a model to the Azure Marketplace is a commercial distribution action, not a technical deployment; Azure Machine Learning deployment creates a callable endpoint, not a marketplace listing.

MCQmedium

What is a neural network?

A.A computer network for distributed AI training across multiple servers

B.A machine learning model architecture with layers of interconnected nodes that learn representations

C.A database for storing trained ML models

D.A rule-based expert system for decision making

AnswerB

Neural networks use layers of weighted connections to learn hierarchical data representations for complex pattern recognition.

Why this answer

A neural network is a machine learning model architecture composed of layers of interconnected nodes (neurons) that process input data through weighted connections and activation functions. These layers learn hierarchical representations of data, enabling the model to capture complex patterns and relationships without explicit rule-based programming. This aligns with option B as the correct definition.
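
The phrase "weighted connections and activation functions" can be made concrete with a tiny forward pass. This is a sketch with hand-picked, hypothetical weights; a real network would learn them from data through backpropagation, and the helper names are illustrative.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each node."""
    return [
        activation(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


# Hypothetical, hand-picked weights; a real network learns these from data.
hidden_w = [[0.5, -0.2], [0.3, 0.8]]
hidden_b = [0.0, -0.1]
output_w = [[1.0, -1.0]]
output_b = [0.2]

x = [1.0, 2.0]
hidden = dense(x, hidden_w, hidden_b, relu)          # 2 inputs -> 2 hidden nodes
output = dense(hidden, output_w, output_b, sigmoid)  # 2 hidden nodes -> 1 output
print(hidden, output)
```

Each layer transforms the previous layer's output, which is how stacked layers build up progressively more abstract representations of the input.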

Exam trap

The trap here is that candidates confuse the term 'network' in 'neural network' with a computer network or distributed system, leading them to incorrectly select option A.

How to eliminate wrong answers

Option A is wrong because a neural network is not a computer network for distributed AI training; distributed training across multiple servers is a technique (e.g., using Azure Machine Learning with Horovod or PyTorch DistributedDataParallel), not the definition of a neural network itself. Option C is wrong because a neural network is a model architecture, not a database; storing trained ML models is done in model registries like Azure Machine Learning model registry or container registries. Option D is wrong because a neural network learns from data via backpropagation and gradient descent, unlike a rule-based expert system that relies on hardcoded if-then rules and does not learn representations.

MCQmedium

What is 'ROC-AUC' and when is it a better metric than accuracy for classification?

A.ROC-AUC is always better than accuracy regardless of the use case

B.A threshold-agnostic metric that measures discrimination ability — better than accuracy for imbalanced classes

C.A metric specifically for measuring multi-class classification across more than two classes

D.An evaluation metric only applicable to models trained on Azure Machine Learning

AnswerB

ROC-AUC evaluates performance across all thresholds, so it is not misled by class imbalance the way accuracy can be.

Why this answer

ROC-AUC (Receiver Operating Characteristic - Area Under the Curve) is a threshold-agnostic metric that measures a model's ability to discriminate between positive and negative classes across all possible classification thresholds. It is a better metric than accuracy when dealing with imbalanced classes because accuracy can be misleadingly high if the model simply predicts the majority class, whereas ROC-AUC evaluates the trade-off between true positive rate and false positive rate independently of class distribution.
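
One way to see why ROC-AUC is threshold-agnostic is its ranking interpretation: it equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative. The sketch below computes it that way on hypothetical scores; the function name `roc_auc` is illustrative and this pairwise loop is meant for small examples only.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical imbalanced data: 2 positives, 8 negatives.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05]

print(roc_auc(labels, scores))       # 15 of 16 pairs ranked correctly: 0.9375
print(roc_auc(labels, [0.0] * 10))   # constant scores carry no ranking signal: 0.5
```

A model that gives every instance the same score would reach 80% accuracy on this data by predicting the majority class, yet its ROC-AUC is only 0.5.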

Exam trap

The trap here is that candidates often assume accuracy is always the best metric, failing to recognize that ROC-AUC summarizes performance across all thresholds and is far less distorted by class imbalance, a scenario commonly tested in AI-900.

How to eliminate wrong answers

Option A is wrong because ROC-AUC is not always better than accuracy; for balanced datasets with equal misclassification costs, accuracy is often simpler and more interpretable. Option C is wrong because ROC-AUC is fundamentally a binary classification metric; while extensions like macro-averaged or micro-averaged ROC-AUC exist for multi-class problems, the standard definition applies to two classes only. Option D is wrong because ROC-AUC is a general machine learning evaluation metric that can be computed for any binary classifier, regardless of the platform (Azure, AWS, on-premises, etc.).

MCQmedium

A hospital deploys a machine learning model to screen patients for a rare disease. Only 0.1% of patients actually have the disease. The model correctly identifies most positive cases but also flags many healthy patients as potentially having the disease. The hospital wants to minimize the number of healthy patients who are incorrectly told they might have the disease. Which metric should the model optimize?

A.Recall

B.Precision

C.F1 score

D.Accuracy

AnswerB

Precision measures the accuracy of positive predictions. Maximizing precision reduces false positives, directly addressing the goal of minimizing unnecessary anxiety for healthy patients.

Why this answer

Precision measures the proportion of positive identifications that are actually correct. In this scenario, the hospital wants to minimize false positives (healthy patients incorrectly told they might have the disease). Optimizing precision directly reduces false positives, which is the stated goal.
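
A quick calculation shows why precision suffers so badly at low prevalence. The patient counts below are assumptions chosen for illustration (100,000 patients screened, with assumed true-positive and false-positive counts); only the 0.1% prevalence comes from the question.

```python
# Hypothetical screening of 100,000 patients at 0.1% prevalence.
total_patients = 100_000
diseased = 100                      # 0.1% of 100,000
healthy = total_patients - diseased

tp = 90       # diseased patients correctly flagged (assumed)
fp = 1_910    # healthy patients incorrectly flagged (assumed)

precision = tp / (tp + fp)
false_positive_rate = fp / healthy

print(precision)            # 90 / 2000 = 0.045
print(false_positive_rate)  # under 2% of healthy patients are flagged
```

Even though fewer than 2% of healthy patients are flagged, they outnumber the true cases so heavily that only 4.5% of positive results are correct, which is why precision is the metric to watch here.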

Exam trap

The trap here is that candidates often default to 'Accuracy' for imbalanced datasets or 'Recall' for medical screening, but the question explicitly asks to minimize false positives, which directly points to Precision as the correct metric.

How to eliminate wrong answers

Option A (Recall) is wrong because recall measures the proportion of actual positives correctly identified; optimizing recall would reduce false negatives (missing diseased patients), but the hospital's priority is minimizing false positives, not false negatives. Option C (F1 score) is wrong because F1 is the harmonic mean of precision and recall; while it balances both, it does not specifically minimize false positives—it trades off between precision and recall, which may still allow many false positives if recall is prioritized. Option D (Accuracy) is wrong because accuracy measures overall correct predictions; with a highly imbalanced dataset (0.1% disease prevalence), a model that always predicts 'no disease' would achieve 99.9% accuracy but would fail to identify any positive cases and would not address the false positive minimization goal.

MCQeasy

A data scientist is preparing a dataset to train a model that predicts customer churn. The dataset includes a column 'CustomerID' which is a unique identifier for each customer. Should the data scientist include the 'CustomerID' column as a feature in the training data?

A.Yes, because it uniquely identifies each customer and helps the model differentiate them.

B.No, because the CustomerID is a random unique identifier with no predictive power for churn.

C.Yes, because the model can learn patterns from the numeric values.

D.No, because the CustomerID column contains too many missing values.

AnswerB

Correct. Unique identifiers are arbitrary and do not correlate with the outcome. They should be removed to avoid overfitting.

Why this answer

Option B is correct because CustomerID is a unique identifier that does not contain any meaningful pattern or relationship with the target variable (churn). Including such a column would introduce noise and risk overfitting, as the model could memorize each ID rather than learning generalizable patterns. In Azure Machine Learning, features should be predictive attributes, not arbitrary labels.

Exam trap

The trap here is that candidates may think unique identifiers are useful for differentiation, but the exam tests the principle that features must have predictive power and that arbitrary IDs introduce noise rather than signal.

How to eliminate wrong answers

Option A is wrong because including a unique identifier like CustomerID does not help the model differentiate customers in a predictive sense; it simply assigns a distinct value per row, which the model could treat as a categorical feature with no generalization. Option C is wrong because even if CustomerID is numeric, its values are arbitrary and have no ordinal or relational meaning to churn; the model would learn spurious correlations. Option D is wrong because the question does not state that the CustomerID column contains missing values, and the core reason for exclusion is lack of predictive power, not data quality issues.

MCQmedium

A data scientist trains a machine learning model to predict house prices based on features like square footage, number of bedrooms, and location. The model achieves a very low error on the training data but performs poorly on a held-out test set. Which term best describes this situation?

A.Underfitting

B.Overfitting

C.High bias

D.High variance

AnswerB

Overfitting means the model has memorized the training data and does not generalize, leading to excellent training metrics but poor test performance.

Why this answer

The model performs exceptionally well on training data but poorly on test data, which is the classic symptom of overfitting. Overfitting occurs when the model learns noise and specific patterns in the training set rather than generalizing to unseen data. In Azure Machine Learning, this can be detected by monitoring the gap between training and validation metrics, and mitigated using techniques like regularization or early stopping.
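
The gap between training and test error can be demonstrated with a deliberately extreme memorizer: a one-nearest-neighbor predictor that simply copies the target of the closest training point. The data below is synthetic and hypothetical (price = 3 × size plus random noise), and the helper names are illustrative.

```python
import random


def one_nn_predict(train, x):
    """Predict by copying the target of the closest training point (pure memorization)."""
    nearest_x, nearest_y = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_y


def mae(data, predict):
    return sum(abs(predict(x) - y) for x, y in data) / len(data)


def make(sizes):
    # Hypothetical data: price = 3 * size plus Gaussian noise.
    return [(s, 3 * s + random.gauss(0, 20)) for s in sizes]


random.seed(0)
train = make(range(50, 150, 2))   # even sizes
test = make(range(51, 151, 2))    # odd sizes the model never saw

train_error = mae(train, lambda x: one_nn_predict(train, x))
test_error = mae(test, lambda x: one_nn_predict(train, x))
print(train_error, test_error)
```

Training error is exactly zero because every training point is its own nearest neighbor, while the test error is clearly larger: the model has memorized the noise rather than the underlying trend.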

Exam trap

The trap here is that candidates confuse 'high variance' (the cause) with 'overfitting' (the observed behavior), but the question asks for the term that best describes the situation, not the underlying statistical property.

How to eliminate wrong answers

Option A is wrong because underfitting describes a model that performs poorly on both training and test data due to insufficient learning capacity, not the high training accuracy seen here. Option C is wrong because high bias typically leads to underfitting, where the model oversimplifies and misses important patterns, resulting in high error on both sets. Option D is wrong because high variance is a cause of overfitting, but the term 'overfitting' itself is the correct descriptor for the situation where the model fits training data too closely and fails on test data.

MCQmedium

Which metric is MOST appropriate for evaluating a regression model's performance?

A.Accuracy

B.Root Mean Squared Error (RMSE)

C.Precision and recall

D.AUC-ROC curve

AnswerB

RMSE measures the average magnitude of prediction errors for regression — lower RMSE means predictions are closer to actual values.

Why this answer

Root Mean Squared Error (RMSE) is the most appropriate metric for evaluating a regression model because it measures the average magnitude of prediction errors in the same units as the target variable, penalizing larger errors more heavily due to squaring. In Azure Machine Learning, regression models like Linear Regression or Decision Forest Regression are evaluated using RMSE to quantify how well the predicted continuous values match actual values.
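
The effect of squaring can be seen by comparing RMSE with MAE on two sets of predictions that have the same total absolute error. The values are hypothetical and the helper names are illustrative.

```python
import math


def rmse(actual, predicted):
    """Square root of the mean squared error, in the target's own units."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


actual = [100, 100, 100, 100]
even_errors = [110, 90, 110, 90]      # four errors of 10
one_big_error = [100, 100, 100, 140]  # one error of 40

print(mae(actual, even_errors), rmse(actual, even_errors))      # 10.0 10.0
print(mae(actual, one_big_error), rmse(actual, one_big_error))  # 10.0 20.0
```

Both prediction sets have the same MAE, but the single large miss doubles the RMSE, which is what "penalizing larger errors more heavily" means in practice.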

Exam trap

The trap here is that candidates often confuse regression and classification metrics, mistakenly applying Accuracy (a classification metric) to regression problems because they think it measures 'correctness' in a general sense, without understanding that regression requires error-based metrics like RMSE.

How to eliminate wrong answers

Option A is wrong because Accuracy is a classification metric that measures the proportion of correct predictions out of total predictions, and it is not suitable for regression tasks where the output is a continuous value rather than a discrete class. Option C is wrong because Precision and recall are classification metrics used to evaluate the performance of binary or multiclass classifiers, focusing on true positives and false positives/negatives, not continuous predictions. Option D is wrong because AUC-ROC curve is a classification metric that plots the true positive rate against the false positive rate at various threshold settings, and it does not apply to regression models which predict continuous outcomes.

MCQmedium

A data scientist is building a binary classification model to predict fraudulent credit card transactions. The dataset is highly imbalanced: only 1% of transactions are fraudulent. The cost of a false negative is very high because missing a fraudulent transaction can lead to significant financial loss. Which evaluation metric should the data scientist prioritize to minimize false negatives?

A.Accuracy

B.Precision

C.Recall

D.F1 Score

AnswerC

Recall measures the fraction of actual positive cases that were correctly predicted. Prioritizing recall helps minimize false negatives, which is the stated goal.

Why this answer

Recall (also known as sensitivity or true positive rate) measures the proportion of actual positive cases (fraudulent transactions) that are correctly identified. In this highly imbalanced scenario where missing a fraud (false negative) is extremely costly, maximizing recall ensures that the model catches as many fraudulent transactions as possible, even if it means some false positives occur. This directly aligns with the goal of minimizing false negatives.
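The trade-off described above (more fraud caught at the price of more false positives) can be sketched by scoring the same predictions at two decision thresholds. The labels, scores, and the helper `recall_and_precision` are invented for this illustration.

```python
def recall_and_precision(labels, scores, threshold):
    """Return (recall, precision) when scores >= threshold count as fraud."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y and p)
    fn = sum(1 for y, p in zip(labels, predicted) if y and not p)
    fp = sum(1 for y, p in zip(labels, predicted) if not y and p)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision

# Hypothetical data: 1 = fraudulent, 0 = legitimate, with model scores.
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]

strict = recall_and_precision(labels, scores, threshold=0.5)
lenient = recall_and_precision(labels, scores, threshold=0.25)
print("strict threshold:", strict)
print("lenient threshold:", lenient)
```

Lowering the threshold raises recall (no fraud is missed in this toy data) while precision falls, which is the compromise a recall-first goal accepts.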

Exam trap

The trap here is that candidates often choose Accuracy because it is the most intuitive metric, failing to recognize that in imbalanced datasets with high false-negative cost, recall is the critical measure to minimize missed positives.

How to eliminate wrong answers

Option A is wrong because accuracy is misleading in imbalanced datasets; a model that predicts 'not fraudulent' for all transactions would achieve 99% accuracy but fail to catch any fraud, resulting in maximum false negatives. Option B is wrong because precision focuses on the proportion of predicted positives that are actually positive, which does not directly address false negatives; optimizing precision can reduce false positives but may increase false negatives by being overly conservative. Option D is wrong because the F1 Score is the harmonic mean of precision and recall, balancing both; while useful, it does not prioritize minimizing false negatives over false positives, which is the specific requirement here.

MCQmedium

A data scientist wants to train a machine learning model to predict the exact market price of a house based on features such as square footage, number of bedrooms, and location. Which type of machine learning task should be used?

A.Classification

B.Regression

C.Clustering

D.Anomaly Detection

AnswerB

Regression predicts a continuous numeric value, which is exactly what is needed for predicting house price.

Why this answer

Predicting the exact market price of a house is a regression task because the target variable (price) is a continuous numeric value. Regression algorithms, such as linear regression or decision tree regression, learn the relationship between input features (e.g., square footage, bedrooms, location) and a continuous output. In Azure Machine Learning, you would select a regression model from the designer or AutoML to solve this problem.

Exam trap

The trap here is that candidates confuse regression with classification because both involve supervised learning, but regression outputs a continuous number while classification outputs a discrete label.

How to eliminate wrong answers

Option A is wrong because classification predicts discrete categorical labels (e.g., 'expensive' or 'cheap'), not a continuous price value. Option C is wrong because clustering groups unlabeled data into clusters based on similarity, without a predefined target variable to predict. Option D is wrong because anomaly detection identifies rare or unusual data points (e.g., fraudulent transactions), not the exact value of a normal house price.

MCQmedium

A data scientist trains a binary classification model to detect fraudulent credit card transactions. The dataset contains 99.5% legitimate transactions and 0.5% fraudulent transactions. The model predicts every transaction as legitimate and achieves 99.5% accuracy on the test set. Which metric would best reveal that the model is failing to identify any fraudulent transactions?

A.Precision

B.Recall

C.F1 score

D.Mean Absolute Error (MAE)

AnswerB

Recall measures the fraction of actual fraudulent transactions that are correctly detected. The model catches none, so recall is 0, clearly showing the model's failure.

Why this answer

Recall (also known as sensitivity) measures the proportion of actual positive cases correctly identified by the model. In this scenario, the model predicts all transactions as legitimate, so it correctly identifies zero fraudulent transactions, giving a recall of 0%. Accuracy alone is misleading because the dataset is highly imbalanced (99.5% legitimate, 0.5% fraudulent), and a 99.5% accuracy can be achieved by simply predicting the majority class.

Recall directly reveals the model's failure to detect any fraud.
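The numbers in this scenario can be reproduced with a few lines of plain Python. The sample size of 1,000 transactions is an assumption chosen to match the 99.5% / 0.5% split.

```python
# Assumed test set: 1,000 transactions, 0.5% fraudulent (1 = fraud, 0 = legitimate).
labels = [1] * 5 + [0] * 995
predictions = [0] * 1000  # the model predicts "legitimate" every time

correct = sum(1 for y, p in zip(labels, predictions) if y == p)
accuracy = correct / len(labels)

true_positives = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
actual_positives = sum(labels)
recall = true_positives / actual_positives

print(f"accuracy={accuracy:.3f} recall={recall:.3f}")
```

Accuracy stays high because the majority class dominates the count of correct predictions, while recall drops to zero because no fraudulent transaction is ever flagged.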

Exam trap

The trap here is that candidates see 99.5% accuracy and assume the model is performing well, failing to recognize that accuracy is a poor metric for imbalanced datasets and that recall specifically measures the model's ability to find positive cases (fraud).

How to eliminate wrong answers

Option A is wrong because precision measures the proportion of predicted positive cases that are actually positive; if the model predicts no positives, precision is undefined (0/0), so it cannot show how many actual fraud cases were missed. Option C is wrong because the F1 score is the harmonic mean of precision and recall; with recall at 0 the F1 score is reported as 0, but as a combined metric it does not isolate the failure to detect fraud as directly as recall does. Option D is wrong because Mean Absolute Error (MAE) is a regression metric that measures the average absolute error between predicted and actual continuous values; it is not applicable to binary classification tasks like fraud detection.

MCQmedium

A data scientist is developing a classification model to detect fraudulent transactions. The dataset is split into training and test sets. The data scientist repeatedly tunes the model's hyperparameters and evaluates performance on the test set until the test accuracy reaches 95%. However, when the model is deployed on new, unseen data, its accuracy drops to 70%. Which concept best explains this performance degradation?

A.Overfitting to the training data

B.Data leakage from the training set to the test set

C.Overfitting to the test set

D.Underfitting the training data

AnswerC

The model's hyperparameters were tuned based on test set performance, causing the model to perform well on that specific test set but poorly on new data. This is overfitting to the test set.

Why this answer

Option C is correct because the data scientist repeatedly tuned hyperparameters based on test set performance, effectively using the test set as part of the training process. This causes the model to become specialized to the test set's specific patterns and noise, so it fails to generalize to new, unseen data. This phenomenon is known as overfitting to the test set, where the test set no longer provides an unbiased estimate of real-world performance.
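A common remedy, not stated in the question itself, is to hold out a separate validation set for tuning and to touch the test set only once at the end. The sketch below shows one way to make such a three-way split; the function name `three_way_split` and the 60/20/20 proportions are assumptions for illustration.

```python
import random

def three_way_split(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into train, validation, and test lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test

rows = list(range(100))  # stand-in for 100 labelled transactions
train, validation, test = three_way_split(rows)

# Tune hyperparameters against `validation`; evaluate on `test` once, at the end.
print(len(train), len(validation), len(test))
```

Because the test rows never influence any tuning decision, the final test score remains an unbiased estimate of performance on new data.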

Exam trap

The trap here is that candidates confuse overfitting to the training data with overfitting to the test set, failing to recognize that repeatedly evaluating on the test set can cause the model to memorize test set patterns rather than generalize.

How to eliminate wrong answers

Option A is wrong because overfitting to the training data would show high training accuracy but lower test accuracy during evaluation, not a drop only after deployment. Option B is wrong because nothing in the scenario indicates that training records leaked into the test set; the inflated 95% score is explained by repeatedly tuning against the test set, which is test set overfitting rather than leakage. Option D is wrong because underfitting would result in poor performance on both training and test sets, not a high test accuracy of 95%.

MCQeasy

What is an endpoint in Azure Machine Learning?

A.A visual dashboard for monitoring model performance

B.A deployed ML model accessible via REST API for making predictions

C.The final training step that produces a saved model file

D.A data storage location for training datasets

AnswerB

An endpoint exposes a trained ML model as a REST API service that applications call to get predictions on new data.

Why this answer

In Azure Machine Learning, an endpoint is a REST API endpoint that exposes a deployed machine learning model for real-time inference. When you deploy a model, for example to a managed online endpoint, Azure Kubernetes Service (AKS), or Azure Container Instances (ACI), Azure ML creates a scoring URI that clients can call with HTTP POST requests containing input data, and the endpoint returns predictions. This lets applications integrate model predictions over HTTPS.
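As a rough sketch of what a client call looks like, the snippet below builds (but does not send) an HTTP POST request using only the Python standard library. The scoring URI, the API key, the bearer-token header, and the `{"data": ...}` payload shape are all placeholders and assumptions; the real values and input schema depend on the specific deployment and its scoring script.

```python
import json
import urllib.request

# Placeholders: replace with the scoring URI and key of an actual deployment.
SCORING_URI = "https://example.invalid/score"
API_KEY = "<your-key>"

def build_scoring_request(rows):
    """Build a POST request carrying input rows as JSON (assumed payload shape)."""
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        SCORING_URI,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

request = build_scoring_request([[1200, 3, 2]])
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
# That call is omitted here because the URI above is not a real endpoint.
```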

Exam trap

The trap here is that candidates confuse the term 'endpoint' with the final step of training or with data storage, because in other Azure services 'endpoint' can refer to a storage endpoint or a training job output, but in Azure ML it specifically means the deployed model's REST API for inference.

How to eliminate wrong answers

Option A is wrong because a visual dashboard for monitoring model performance is called Azure ML Studio's monitoring dashboard or Application Insights integration, not an endpoint. Option C is wrong because the final training step that produces a saved model file is the model registration or training run output, not an endpoint; endpoints are created after deployment. Option D is wrong because a data storage location for training datasets is a datastore (e.g., Azure Blob Storage or Azure Data Lake), not an endpoint.

MCQmedium

What does 'model accuracy' measure in machine learning classification?

A.How quickly the model makes predictions

B.The proportion of correct predictions out of total predictions

C.How much memory the model uses during inference

D.The number of training examples used to build the model

AnswerB

Accuracy = correct predictions / total predictions. It measures overall classification correctness.

Why this answer

Model accuracy in classification measures the ratio of correctly predicted instances to the total number of predictions made. It is calculated as (True Positives + True Negatives) / (Total Predictions). This metric is fundamental in evaluating classification models on Azure Machine Learning, where it is reported in the model evaluation metrics.

Exam trap

The trap here is that candidates often confuse model accuracy with performance metrics like speed or resource usage, or assume it relates to training data size, when in fact accuracy strictly measures the proportion of correct predictions.

How to eliminate wrong answers

Option A is wrong because model accuracy does not measure prediction speed; inference latency is measured in milliseconds or seconds, not as a proportion of correct predictions. Option C is wrong because memory usage during inference is a resource consumption metric, not a measure of prediction correctness; Azure monitors memory via metrics like 'Memory Usage' in container instances. Option D is wrong because the number of training examples is a dataset size characteristic, not a model performance metric; accuracy evaluates how well the model generalizes, not how much data was used.

100

MCQmedium

A medical research team trains a model to detect a rare disease from lab results. The disease occurs in only 1% of patients. The model predicts 'no disease' for every patient and achieves 99% accuracy. Which metric best reveals that the model is failing to identify actual disease cases?

A.Accuracy

B.Precision

C.Recall

D.F1 score

AnswerC

Recall (sensitivity) is the ratio of true positives to all actual positives. Here recall is 0%, clearly showing the model fails to identify any disease cases.

Why this answer

Recall (sensitivity) measures the proportion of actual positive cases correctly identified by the model. With a 99% accuracy but zero true positives (since the model always predicts 'no disease'), recall is 0%, which directly reveals the model's failure to detect any actual disease cases. In Azure Machine Learning, recall is a key metric for imbalanced classification tasks, especially when missing a positive case has severe consequences.

Exam trap

The trap here is that candidates see 99% accuracy and assume the model is performing well, without recognizing that accuracy is meaningless when the class distribution is extremely skewed.

How to eliminate wrong answers

Option A is wrong because accuracy is misleading in highly imbalanced datasets; a model that always predicts the majority class can achieve high accuracy (99%) while completely failing to detect the minority class (disease). Option B is wrong because precision measures the proportion of positive predictions that are actually correct; since the model never predicts positive, precision is undefined (division by zero) and does not reveal the failure to identify actual disease cases. Option D is wrong because the F1 score is the harmonic mean of precision and recall; with recall at 0%, the F1 score is also 0%, but recall alone more directly and intuitively exposes the model's inability to detect any positive cases.

101

MCQeasy

A data scientist trains a model to predict house prices using features like number of bedrooms, square footage, and location. The model achieves a mean absolute error (MAE) of $5,000 on the training data but $25,000 on the test data. Which problem is the model most likely experiencing?

A.Underfitting

B.Overfitting

C.Multicollinearity

D.Class imbalance

AnswerB

Overfitting happens when the model learns the training data too well, including noise, resulting in low training error but high test error. The large MAE difference confirms this.

Why this answer

The model performs well on training data (MAE $5,000) but poorly on test data (MAE $25,000), which is the classic symptom of overfitting. Overfitting occurs when the model learns noise and specific patterns in the training data too well, failing to generalize to unseen data. In Azure Machine Learning, this can be detected by comparing training vs. validation metrics and is often mitigated using regularization techniques or simplifying the model.
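An extreme case of this pattern is a model that simply memorises its training rows. The sketch below uses a one-nearest-neighbour lookup on invented house data (not the figures from the question) to show a training MAE of zero alongside a much larger test MAE.

```python
def predict_nearest(train, sqft):
    """Return the price of the training house with the closest square footage."""
    nearest = min(train, key=lambda row: abs(row[0] - sqft))
    return nearest[1]

def mae(rows, train):
    """Mean absolute error of the memorising model on the given rows."""
    errors = [abs(price - predict_nearest(train, sqft)) for sqft, price in rows]
    return sum(errors) / len(errors)

# Hypothetical (square footage, price) pairs.
train = [(1000, 205000), (1500, 290000), (2000, 410000), (2500, 495000)]
test = [(1200, 240000), (1800, 360000), (2300, 460000)]

train_mae = mae(train, train)  # each training row finds itself, so the error is 0
test_mae = mae(test, train)    # unseen rows get the price of a different house
print(train_mae, test_mae)
```

The gap between the two values, rather than either value alone, is the signal to look for.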

Exam trap

The trap here is that candidates confuse overfitting with underfitting because they see a low training error, but the key is the large gap between training and test error, which is the hallmark of overfitting, not underfitting.

How to eliminate wrong answers

Option A is wrong because underfitting would show poor performance on both training and test data (e.g., high MAE on both), not a large gap between them. Option C is wrong because multicollinearity refers to high correlation between independent variables, which can affect coefficient stability but does not directly cause a large train-test performance gap; it would typically inflate variance in predictions but not produce such a stark contrast. Option D is wrong because class imbalance is a problem for classification tasks (e.g., predicting categories), not for regression tasks like predicting house prices, and it would manifest as poor performance on minority classes, not a train-test MAE gap.

102

MCQmedium

A data scientist trains a model to predict the exact number of cars that will cross a bridge each day for maintenance planning. The model uses historical traffic data as input. Which type of machine learning task is this?

A.Classification

B.Regression

C.Clustering

D.Reinforcement learning

AnswerB

Regression predicts a continuous numeric value, which matches the goal of estimating the exact number of cars.

Why this answer

The model predicts a continuous numerical value (the exact number of cars) based on historical traffic data. Regression is the correct machine learning task for predicting continuous numeric outcomes, such as counts, prices, or temperatures, making option B correct.

Exam trap

The trap here is that candidates confuse predicting a numeric count with classification, mistakenly thinking 'number of cars' is a category, but regression is the task for predicting any numeric quantity, including counts.

How to eliminate wrong answers

Option A is wrong because classification predicts discrete categories or labels (e.g., 'high traffic' or 'low traffic'), not a continuous number. Option C is wrong because clustering groups unlabeled data into clusters based on similarity, without predicting a specific numeric value. Option D is wrong because reinforcement learning involves an agent learning optimal actions through rewards and penalties in an environment, not predicting a numeric output from historical data.

103

MCQmedium

What is 'model evaluation' and what metrics are used for different ML task types?

A.Accuracy is the only metric needed for all ML task types

B.Different tasks use different metrics: F1 for classification, RMSE for regression, mAP for detection

C.Model evaluation is only needed before deployment, not after

D.The only reliable evaluation is user feedback after the model is deployed in production

AnswerB

Evaluation metrics are task-specific — F1/AUC for classification, RMSE/R² for regression, mAP for detection, BLEU for translation.

Why this answer

Model evaluation is the process of assessing how well a trained machine learning model performs on unseen data. Different ML task types require different metrics because they measure distinct aspects of performance: for classification tasks, F1-score balances precision and recall; for regression tasks, RMSE (Root Mean Squared Error) quantifies prediction error in the same units as the target; for object detection tasks, mAP (mean Average Precision) evaluates both localization and classification accuracy. Option B correctly identifies these task-specific metrics.

Exam trap

The trap here is that candidates often assume accuracy is a universal metric, but the AI-900 exam specifically tests that different ML tasks (classification, regression, detection) require specialized metrics like F1, RMSE, and mAP to properly evaluate model performance.

How to eliminate wrong answers

Option A is wrong because accuracy is not sufficient for all ML tasks—it fails on imbalanced classification datasets where a model can achieve high accuracy by always predicting the majority class, and it is meaningless for regression or detection tasks. Option C is wrong because model evaluation is an ongoing process that should occur both before deployment (to validate performance on test data) and after deployment (to monitor for data drift, concept drift, and performance degradation in production). Option D is wrong because user feedback is subjective, delayed, and not a quantitative metric; it cannot replace objective evaluation metrics like precision, recall, or RMSE, which provide reproducible and statistically sound performance measurements.

104

MCQmedium

What is 'regularisation' in machine learning and what problem does it solve?

A.Standardising input features to the same scale before training

B.Adding a penalty to the loss function to discourage overly complex models and reduce overfitting

C.Applying government regulations to ensure AI models comply with data privacy laws

D.Converting irregular training data shapes into a uniform format for the algorithm

AnswerB

Regularisation (L1/L2) penalises large weights, preventing overfitting by favouring simpler models that generalise better.

Why this answer

Regularisation is a technique used in machine learning to prevent overfitting by adding a penalty term to the loss function. This penalty discourages the model from learning overly complex patterns, such as large or numerous coefficients, which helps the model generalise better to unseen data. In Azure Machine Learning, regularisation parameters like L1 (Lasso) or L2 (Ridge) can be configured in algorithms such as linear regression or neural networks to control model complexity.
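The effect of the penalty term can be seen in a small hand-rolled example. Below, two weight vectors fit the toy data equally well, and an L2 penalty is what separates them. The data, the function names, and the penalty strength `lam` are assumptions for illustration, not Azure Machine Learning parameters.

```python
def mse(weights, rows, targets):
    """Mean squared error of a linear model with no intercept."""
    errors = [
        (sum(w * x for w, x in zip(weights, row)) - t) ** 2
        for row, t in zip(rows, targets)
    ]
    return sum(errors) / len(errors)

def ridge_loss(weights, rows, targets, lam):
    """MSE plus an L2 penalty: lam times the sum of squared weights."""
    penalty = lam * sum(w * w for w in weights)
    return mse(weights, rows, targets) + penalty

# Toy data with two identical features, so many weight vectors fit it exactly.
rows = [[1.0, 1.0], [2.0, 2.0]]
targets = [2.0, 4.0]

small_weights = [1.0, 1.0]
large_weights = [5.0, -3.0]

for w in (small_weights, large_weights):
    print(w, mse(w, rows, targets), ridge_loss(w, rows, targets, lam=0.1))
```

Both vectors have zero unpenalised error, but the penalised loss is lower for the smaller weights, which is how regularisation steers training toward simpler models.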

Exam trap

The trap here is that candidates confuse regularisation with data preprocessing steps like normalisation or reshaping, because both involve modifying data or model parameters, but regularisation specifically targets overfitting by penalising complexity, not by altering input data format or scale.

How to eliminate wrong answers

Option A is wrong because standardising input features to the same scale is called feature scaling or normalisation, not regularisation; it addresses gradient descent convergence, not overfitting. Option C is wrong because applying government regulations for data privacy is a compliance or governance concern, not a machine learning regularisation technique; it relates to policies like GDPR, not model training. Option D is wrong because converting irregular training data shapes into a uniform format refers to data preprocessing or reshaping, which is unrelated to the penalty-based regularisation that controls model complexity.

105

MCQhard

A data scientist is evaluating a binary classification model that predicts whether a transaction is fraudulent. The test set contains 1,000 transactions: 990 legitimate and 10 fraudulent. The model's predictions are shown in the confusion matrix below.

Confusion matrix:

                     Predicted Legitimate   Predicted Fraudulent
Actual Legitimate    942                    48
Actual Fraudulent    2                      8

Which metric should the data scientist prioritize if the business goal is to minimize the number of fraudulent transactions that are missed (false negatives)?

A.Precision

B.Recall

C.Accuracy

D.Specificity

AnswerB

Recall measures the ability to find all actual positive cases. A high recall ensures that very few fraudulent transactions are missed, directly aligning with the goal of minimizing false negatives.

Why this answer

Recall (sensitivity) measures the proportion of actual positives correctly identified, calculated as TP/(TP+FN). With 2 false negatives (missed fraudulent transactions), recall is 8/(8+2)=0.80. Minimizing missed fraud directly corresponds to maximizing recall, making it the correct priority for this business goal.
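The four metrics offered as options can all be computed from the confusion matrix in the question. The following plain-Python sketch treats "fraudulent" as the positive class; the variable names are illustrative.

```python
# Counts taken from the confusion matrix in the question (fraud = positive class).
tp = 8    # actual fraudulent, predicted fraudulent
fn = 2    # actual fraudulent, predicted legitimate (missed fraud)
fp = 48   # actual legitimate, predicted fraudulent (false alarm)
tn = 942  # actual legitimate, predicted legitimate

recall = tp / (tp + fn)
precision = tp / (tp + fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)
specificity = tn / (tn + fp)

print(f"recall={recall:.2f} precision={precision:.3f} "
      f"accuracy={accuracy:.2f} specificity={specificity:.3f}")
```

Only recall has the false negative count in its denominator, which is why it is the metric that moves when missed fraud changes.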

Exam trap

The trap here is that candidates often pick Accuracy because it seems intuitive, but the severe class imbalance (99% legitimate) makes accuracy a poor metric, while Recall directly addresses the business requirement of minimizing missed fraud.

How to eliminate wrong answers

Option A (Precision) is wrong because precision measures the proportion of predicted positives that are actually positive (TP/(TP+FP)), which focuses on avoiding false alarms, not on catching all fraud. Option C (Accuracy) is wrong because accuracy is (TP+TN)/(total) = (8+942)/1000 = 0.95, which is misleadingly high due to class imbalance (990 legitimate vs 10 fraudulent) and does not reflect the cost of missing fraud. Option D (Specificity) is wrong because specificity measures the proportion of actual negatives correctly identified (TN/(TN+FP)), which is about correctly classifying legitimate transactions, not about minimizing missed fraudulent transactions.

106

MCQmedium

A data scientist has a dataset containing customer transaction records with features such as age, income, and purchase history, but no labels. The goal is to identify natural groupings of customers for a targeted marketing campaign. Which type of machine learning should be used?

A.Classification

B.Regression

C.Clustering

D.Reinforcement learning

AnswerC

Clustering is an unsupervised learning method that groups similar data points without requiring labels, making it ideal for discovering natural groupings.

Why this answer

Clustering is the correct choice because the dataset has no labels, and the goal is to discover natural groupings of customers based on feature similarity. Unsupervised learning algorithms like K-Means or DBSCAN partition data into clusters where intra-cluster similarity is high and inter-cluster similarity is low, enabling targeted marketing without pre-existing categories.
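To illustrate how K-Means finds groups without labels, here is a minimal two-dimensional sketch in plain Python. The customer data (age, income in thousands) and the starting centroids are invented, and a production workflow would normally use a library implementation rather than this hand-written loop.

```python
def kmeans(points, centroids, iterations=10):
    """Very small K-Means: assign points to the nearest centroid, then re-average."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (age, income in thousands). No labels are provided.
points = [(25, 30), (27, 32), (24, 28), (55, 90), (58, 95), (60, 88)]
initial_centroids = [(25, 30), (55, 90)]  # assumed starting guesses

centroids, clusters = kmeans(points, initial_centroids)
print(clusters)
```

The algorithm is told only how many groups to look for; which customers belong together is discovered from feature similarity alone.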

Exam trap

The trap here is that candidates confuse clustering with classification because both involve grouping, but classification requires pre-labeled categories while clustering discovers them from unlabeled data.

How to eliminate wrong answers

Option A is wrong because classification requires labeled data to predict discrete class labels, but this dataset has no labels. Option B is wrong because regression predicts continuous numerical values (e.g., income amount) from labeled data, not groupings. Option D is wrong because reinforcement learning involves an agent learning from rewards and punishments in an environment, not from static unlabeled data.

107

MCQeasy

What is 'Azure Machine Learning Responsible AI dashboard's error analysis'?

A.A log of all Python exceptions and errors that occurred during model training

B.Identifying data subgroups where the model makes disproportionately more errors than average

C.Counting the total number of incorrect predictions across the full test set

D.Reviewing error messages from failed Azure ML pipeline runs to diagnose infrastructure issues

AnswerB

Error analysis surfaces model blind spots — finding where accuracy is significantly lower (by age group, region, or feature range).

Why this answer

Azure Machine Learning Responsible AI dashboard's error analysis is specifically designed to identify data subgroups where the model performs poorly, often revealing bias or systematic failures. It uses a decision tree-based approach to partition the dataset and highlight cohorts with disproportionately high error rates, enabling targeted mitigation. This goes beyond simple aggregate metrics to uncover hidden disparities in model performance.
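The core idea of comparing cohort error rates to the overall error rate can be sketched with a simple group-by. This is a deliberate simplification with invented data and a hypothetical helper `error_rate_by_cohort`; it is not the dashboard's own tree-based algorithm or API.

```python
from collections import defaultdict

def error_rate_by_cohort(records):
    """Map each cohort name to its share of incorrect predictions."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for cohort, is_correct in records:
        totals[cohort] += 1
        if not is_correct:
            errors[cohort] += 1
    return {cohort: errors[cohort] / totals[cohort] for cohort in totals}

# Hypothetical test results: (age group, whether the prediction was correct).
records = (
    [("18-30", True)] * 9 + [("18-30", False)]
    + [("60+", True)] * 5 + [("60+", False)] * 5
)

overall_error = sum(1 for _, ok in records if not ok) / len(records)
by_cohort = error_rate_by_cohort(records)
worst = max(by_cohort, key=by_cohort.get)
print(overall_error, by_cohort, worst)
```

The overall error rate hides the fact that one cohort accounts for most of the mistakes, which is the kind of disparity error analysis is meant to surface.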

Exam trap

The trap here is that candidates confuse 'error analysis' with basic error counting or debugging, when the key is its focus on subgroup-level disparity detection, not aggregate or infrastructure errors.

How to eliminate wrong answers

Option A is wrong because error analysis does not log Python exceptions or training errors; it focuses on model prediction errors on test data, not code-level failures. Option C is wrong because counting total incorrect predictions is a basic aggregate metric (e.g., error rate), not the subgroup-level analysis that error analysis provides. Option D is wrong because error analysis evaluates model predictions, not infrastructure or pipeline run errors; diagnosing failed runs is a separate operational concern.

108

MCQmedium

What is 'imbalanced classification' handling using 'SMOTE'?

A.A technique for collecting more real minority class examples from external data sources

B.Generating synthetic minority class examples by interpolating between existing examples

C.Removing majority class examples until all classes have equal representation

D.Setting model confidence thresholds to classify more examples as the minority class

AnswerB

SMOTE creates plausible synthetic minority examples — helping classifiers learn from rare classes without just duplicating existing ones.

Why this answer

SMOTE (Synthetic Minority Over-sampling Technique) is a data augmentation method that creates synthetic examples for the minority class by interpolating between existing minority class instances. It selects a minority example, finds its k-nearest neighbors from the same class, and generates new samples along the line segments connecting the example to those neighbors. This balances the class distribution without duplicating existing data or discarding majority class examples.
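The interpolation step described above can be sketched in plain Python. This is a simplified illustration of the idea, with an invented function name `smote_like` and made-up points; real projects would typically rely on an established library implementation.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points between minority points and their neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the chosen point (excluding itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical minority-class examples with two features each.
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (3.0, 3.0)]
new_points = smote_like(minority, n_new=6)
print(new_points)
```

Each synthetic point lies on a line segment between two existing minority examples, so the new data stays inside the region the minority class already occupies.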

Exam trap

The trap here is that candidates confuse SMOTE with undersampling or threshold tuning, but SMOTE is specifically a synthetic oversampling technique that creates new data points, not a method for removing data or adjusting model parameters.

How to eliminate wrong answers

Option A is wrong because SMOTE does not involve collecting real examples from external sources; it generates synthetic data from the existing minority class. Option C is wrong because that describes random undersampling, not SMOTE, which oversamples the minority class rather than removing majority examples. Option D is wrong because adjusting confidence thresholds is a post-training decision boundary technique, not a data-level method like SMOTE for handling imbalanced classification.

109

MCQmedium

What is a 'compute instance' in Azure Machine Learning?

A.A scalable cluster for running distributed training jobs across many nodes

B.A managed cloud workstation for interactive ML development with pre-installed tools

C.A virtual machine that automatically scales to run batch predictions

D.A serverless execution environment for ML inference requests

AnswerB

Compute instances are managed single-node VMs with ML frameworks pre-installed — providing individual data scientist development environments.

Why this answer

Option B is correct because a compute instance in Azure Machine Learning is a fully managed cloud workstation that provides a pre-configured environment with popular ML tools like Jupyter Notebooks, TensorFlow, and PyTorch. It is designed for interactive development, allowing data scientists to train and experiment with models without managing infrastructure.

Exam trap

The trap here is that candidates confuse 'compute instance' with 'compute cluster' because both are compute targets, but the instance is for single-user interactive work while the cluster is for multi-node distributed jobs.

How to eliminate wrong answers

Option A is wrong because a scalable cluster for running distributed training jobs across many nodes describes an Azure Machine Learning compute cluster, not a compute instance. Option C is wrong because automatically scaling compute for batch predictions describes batch scoring, which Azure Machine Learning runs through batch endpoints on a compute cluster, not a compute instance. Option D is wrong because a serverless execution environment for ML inference requests describes Azure Machine Learning serverless inference endpoints or Azure Functions, not a compute instance.

110

MCQmedium

A data scientist is training a regression model to predict house prices using features like square footage, number of bedrooms, and location. After evaluating the model on a test set, the data scientist wants to select a metric that measures the average magnitude of prediction errors in the same units as the target variable (price). Which evaluation metric should the data scientist use?

A.Root Mean Squared Error (RMSE)

B.Accuracy

C.F1 Score

D.Precision

AnswerA

RMSE measures the average magnitude of prediction errors in the original units, making it suitable for regression.

Why this answer

Root Mean Squared Error (RMSE) is the correct metric because it measures the average magnitude of prediction errors in the same units as the target variable (price). RMSE is computed as the square root of the average squared differences between predicted and actual values, which brings the error metric back to the original unit (e.g., dollars), making it directly interpretable for regression tasks like house price prediction.
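The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not Azure Machine Learning code; the function name `rmse` and the house prices are made-up values chosen only to show that the result comes out in dollars.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical house prices in dollars (illustrative values only).
actual_prices = [200_000, 350_000, 500_000]
predicted_prices = [210_000, 340_000, 530_000]

# The result is in the same unit as the target: dollars.
error_in_dollars = rmse(actual_prices, predicted_prices)
print(f"RMSE: ${error_in_dollars:,.2f}")
```

Because the square root undoes the squaring, the printed value can be read directly as a typical prediction error in dollars.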

Exam trap

The trap here is that candidates often confuse regression metrics with classification metrics, mistakenly selecting Accuracy or F1 Score because they are familiar from other contexts, without recognizing that the question explicitly asks for a metric measuring error magnitude in the same units as the target variable, which only RMSE (or MAE) satisfies.

How to eliminate wrong answers

Option B (Accuracy) is wrong because accuracy is a classification metric that measures the proportion of correct predictions out of total predictions, not applicable to regression tasks predicting continuous values like house prices. Option C (F1 Score) is wrong because F1 Score is a harmonic mean of precision and recall used for evaluating classification models, particularly for imbalanced datasets, and does not measure prediction error magnitude in regression. Option D (Precision) is wrong because precision is a classification metric that measures the proportion of true positive predictions among all positive predictions, irrelevant for regression error analysis.

111

MCQmedium

What is 'online learning' (incremental learning) in machine learning?

A.Training ML models through an online learning management system

B.Continuously updating model weights on new data as it arrives rather than batch retraining

C.Requiring an internet connection during model training for cloud compute access

D.A training approach where users can interact with and correct the model in real time

AnswerB

Online learning adapts to streaming data in real time — useful for high-velocity data but risks forgetting old patterns.

Why this answer

Online learning (incremental learning) is a machine learning technique where the model is updated continuously as new data arrives, rather than retraining from scratch on the entire dataset. This suits streaming data and cases where retraining on all historical data is computationally prohibitive. Do not confuse it with Azure Machine Learning's online endpoints: those serve real-time inference requests for an already trained model and do not update model weights as data arrives.
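As a rough sketch of what an incremental update looks like, the example below adjusts a one-feature linear model one example at a time using stochastic gradient descent. The names `sgd_update` and `stream`, the learning rate, and the synthetic data (which follows y = 2x + 1) are all assumptions made for illustration; a real system would read from an actual data stream.

```python
def sgd_update(weight, bias, x, y, learning_rate=0.1):
    """Apply one incremental update from a single (x, y) example."""
    prediction = weight * x + bias
    error = prediction - y
    # Gradient of 0.5 * error**2 with respect to weight and bias.
    weight -= learning_rate * error * x
    bias -= learning_rate * error
    return weight, bias

def stream(n_examples=5000):
    """Hypothetical stand-in for live data: points on the line y = 2x + 1."""
    for step in range(n_examples):
        x = (step % 10) / 10.0
        yield x, 2.0 * x + 1.0

weight, bias = 0.0, 0.0
for x, y in stream():
    # The model is adjusted as each example arrives; no example is stored
    # and the model is never retrained from scratch.
    weight, bias = sgd_update(weight, bias, x, y)

print(f"learned weight={weight:.3f}, bias={bias:.3f}")
```

Each example is used once and then discarded, which is what distinguishes this from batch retraining on the full history.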

Exam trap

The trap here is confusing 'online learning' with 'requiring an internet connection' (Option C) or with 'interactive human correction' (Option D), when the term specifically refers to incremental data ingestion and model weight updates.

How to eliminate wrong answers

Option A is wrong because it describes a learning management system (LMS) for human education, not a machine learning training paradigm. Option C is wrong because while cloud compute may be used, online learning does not require an internet connection; it refers to incremental data processing, not network connectivity. Option D is wrong because it describes interactive or active learning where humans correct the model, which is a different concept from automated incremental weight updates based on new data.

112

MCQmedium

A data scientist is training a model to predict whether a patient has a rare disease (1% prevalence). The model predicts 'no disease' for all patients and achieves 99% accuracy, but fails to identify any actual cases. Which metric would best reveal this failure?

A.Precision

B.Recall

C.F1 score

D.Mean absolute error

AnswerB

Recall (sensitivity) would be 0% because the model predicts no positives, making it clear that it misses all actual disease cases.

Why this answer

Recall (sensitivity) measures the proportion of actual positive cases correctly identified. With 1% disease prevalence and a model that predicts 'no disease' for all patients, recall is 0% because zero true positives are found. Accuracy (99%) is misleading here because the model fails to detect any rare disease cases, and recall directly exposes this failure.
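The gap between accuracy and recall in this scenario is easy to reproduce. The sketch below assumes a hypothetical cohort of 1,000 patients at 1% prevalence and a model that always predicts 'no disease'; the helper names are illustrative only.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the true label."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)

def recall(actual, predicted, positive=1):
    """Fraction of actual positives that the model found."""
    true_pos = sum(
        1 for a, p in zip(actual, predicted) if a == positive and p == positive
    )
    actual_pos = sum(1 for a in actual if a == positive)
    return true_pos / actual_pos if actual_pos else float("nan")

# Hypothetical cohort: 10 patients with the disease (1), 990 without (0).
actual = [1] * 10 + [0] * 990
predicted = [0] * 1000  # the model always predicts "no disease"

print("accuracy:", accuracy(actual, predicted))
print("recall:", recall(actual, predicted))
```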

Exam trap

The trap here is that candidates see 99% accuracy and assume the model is performing well, failing to recognize that accuracy is a poor metric for imbalanced datasets and that recall specifically measures the model's ability to catch rare positive cases.

How to eliminate wrong answers

Option A is wrong because precision measures the proportion of positive predictions that are correct; since the model never predicts positive, precision is undefined or 0, but precision does not directly reveal the failure to find actual cases. Option C is wrong because the F1 score is the harmonic mean of precision and recall; with recall at 0%, the F1 score is also 0, but it does not isolate the failure as clearly as recall does. Option D is wrong because mean absolute error (MAE) is a regression metric used for continuous values, not for binary classification tasks like disease prediction.

113

MCQeasy

What is 'Azure Machine Learning compute' and what types are available?

A.The mathematical computations performed by the model during training

B.The managed cloud infrastructure (VMs, clusters) used to run ML training and inference workloads

C.The number of floating-point operations a model performs per second

D.A billing calculator that estimates the cost of running machine learning workloads

AnswerB

Azure ML compute provides managed infrastructure — compute instances for dev, clusters for training, inference clusters for serving.

Why this answer

Azure Machine Learning compute is a managed cloud infrastructure that provides on-demand virtual machines (VMs) and clusters for running machine learning training and inference workloads. It abstracts away the underlying hardware management, allowing you to dynamically scale compute resources up or down based on job requirements, and supports both CPU and GPU instances for different model types.

Exam trap

The trap here is confusing the abstract concept of 'compute' (the infrastructure) with the mathematical computations or performance metrics, leading candidates to pick A or C instead of recognizing it as a managed cloud resource.

How to eliminate wrong answers

Option A is wrong because it describes the mathematical operations (e.g., matrix multiplications) performed during model training, which is a computational process, not the infrastructure that runs it. Option C is wrong because it refers to FLOPS (floating-point operations per second), a performance metric for measuring computational throughput, not the managed compute service itself. Option D is wrong because it describes the Azure Pricing Calculator or TCO calculator, which estimates costs but does not execute ML workloads.

114

MCQmedium

A data scientist is training a regression model to predict house prices. The data scientist wants to evaluate the model using a metric that penalizes large prediction errors significantly more than small errors. Which evaluation metric should the data scientist choose?

A.Mean Absolute Error (MAE)

B.Root Mean Squared Error (RMSE)

C.R-squared (R²)

D.Mean Absolute Percentage Error (MAPE)

AnswerB

RMSE squares the errors before averaging and then takes the square root. The squaring step causes larger errors to have a disproportionately higher impact on the metric, making it sensitive to outliers and large deviations.

Why this answer

Root Mean Squared Error (RMSE) is the correct choice because it squares the residuals before averaging, which heavily penalizes large prediction errors (outliers) more than small errors. This aligns with the requirement to penalize large errors significantly more than small ones, as the squaring operation amplifies the impact of larger deviations.
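A small comparison makes the effect of squaring visible. In this sketch (function names and error values are illustrative assumptions), both error lists have the same total absolute error, so MAE cannot tell them apart, while RMSE doubles when the error is concentrated in one large miss.

```python
import math

def mae(errors):
    """Mean absolute error over a list of residuals."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error over a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

even_errors = [10, 10, 10, 10]  # four moderate misses
one_big_error = [0, 0, 0, 40]   # same total absolute error, one large miss

for name, errors in [("even", even_errors), ("one big", one_big_error)]:
    print(f"{name}: MAE={mae(errors)}, RMSE={rmse(errors)}")
```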

Exam trap

The trap here is that candidates often confuse MAE with RMSE, thinking both penalize errors equally, but the squaring operation in RMSE is the key differentiator that makes it penalize large errors disproportionately.

How to eliminate wrong answers

Option A is wrong because Mean Absolute Error (MAE) treats all errors equally by taking the absolute value of residuals, so it does not penalize large errors more than small ones. Option C is wrong because R-squared (R²) measures the proportion of variance explained by the model, not the magnitude of prediction errors, and it does not directly penalize large errors. Option D is wrong because Mean Absolute Percentage Error (MAPE) uses absolute percentage errors, which are not squared and thus do not disproportionately penalize large errors; it also has issues with division by zero when actual values are zero.

115

MCQmedium

Which type of machine learning uses labeled training data where the correct output is provided for each input?

A.Unsupervised learning

B.Reinforcement learning

C.Supervised learning

D.Transfer learning

AnswerC

Supervised learning uses labeled training data — each input has a corresponding correct output label for the algorithm to learn from.

Why this answer

Supervised learning is the correct answer because it explicitly uses labeled training data where each input example is paired with the correct output label. The algorithm learns to map inputs to outputs by minimizing the error between its predictions and the provided labels, enabling tasks like classification and regression.

Exam trap

The trap here is that candidates often confuse 'supervised learning' with 'reinforcement learning' because both involve feedback, but reinforcement learning uses delayed rewards from actions rather than direct labeled examples.

How to eliminate wrong answers

Option A is wrong because unsupervised learning uses unlabeled data and finds hidden patterns or groupings without any correct output provided. Option B is wrong because reinforcement learning learns through trial-and-error interactions with an environment using rewards and penalties, not from pre-labeled input-output pairs. Option D is wrong because transfer learning is a technique that reuses a pre-trained model on a new but related task, not a distinct learning paradigm that uses labeled training data directly.

116

MCQeasy

A data scientist is building a model to predict the exact temperature in degrees Celsius based on humidity and atmospheric pressure. The model will output a single numeric value for each input. Which type of machine learning task is this?

A.Classification

B.Regression

C.Clustering

D.Object detection

AnswerB

Regression predicts a continuous numeric value, such as temperature, based on input features.

Why this answer

This is a regression task because the goal is to predict a continuous numeric value (temperature in degrees Celsius) from input features (humidity and atmospheric pressure). Regression models output a real number, unlike classification which predicts discrete categories. In Azure Machine Learning, regression algorithms like Linear Regression or Decision Forest Regression are used for such tasks.
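To show what 'outputs a real number' means in practice, here is a minimal least-squares fit with a single feature. It is a simplified sketch: the scenario uses two features, the humidity and temperature readings below are invented, and `fit_line` is a hypothetical helper rather than an Azure Machine Learning API.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented readings: relative humidity (%) and temperature (degrees Celsius).
humidity = [30, 50, 70, 90]
temperature = [30.0, 25.0, 20.0, 15.0]

slope, intercept = fit_line(humidity, temperature)

# The prediction is a continuous number, not a category such as "hot" or "cold".
predicted_temperature = slope * 60 + intercept
print(f"predicted temperature at 60% humidity: {predicted_temperature:.1f} C")
```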

Exam trap

The trap here is that candidates may confuse predicting a numeric value with classification, but classification outputs discrete labels (e.g., 'high temperature' vs 'low temperature'), not a precise continuous number like degrees Celsius.

How to eliminate wrong answers

Option A is wrong because classification predicts discrete class labels (e.g., 'hot' or 'cold'), not a continuous numeric value. Option C is wrong because clustering groups unlabeled data into clusters based on similarity, without predicting a specific numeric output. Option D is wrong because object detection identifies and locates objects within images or video, producing bounding boxes and labels, not a single numeric temperature value.

117

MCQmedium

What is 'transfer learning' and how is it different from training from scratch?

A.Transfer learning and training from scratch produce identical results

B.Transfer learning fine-tunes a pre-trained model on a new task — requiring far less data and compute than training from scratch

C.Transfer learning copies model weights between Azure subscriptions

D.Transfer learning is used only when the original training data is unavailable

AnswerB

Transfer learning reuses learned representations from pre-training, requiring only a fraction of the data and compute of training from scratch.

Why this answer

Transfer learning starts with a model already trained on a large dataset (e.g., ImageNet) and fine-tunes it on a smaller, task-specific dataset. This approach requires significantly less data and computational resources compared to training from scratch, where all model weights are randomly initialized and learned from the ground up. It is especially effective when the new task is similar to the original training task, allowing the pre-trained features to be reused.

Exam trap

The trap here is that candidates may confuse transfer learning with simply reusing a model without any retraining, or think it only applies when original data is missing, rather than understanding it as a resource-efficient fine-tuning strategy.

How to eliminate wrong answers

Option A is wrong because transfer learning and training from scratch do not produce identical results; transfer learning typically converges faster and may achieve higher accuracy with limited data, while training from scratch requires more data and compute to reach comparable performance. Option C is wrong because transfer learning is a machine learning technique involving model weights, not a mechanism for copying model weights between Azure subscriptions; Azure subscriptions are unrelated to the concept. Option D is wrong because transfer learning can be used even when original training data is available; it is chosen to save resources and improve performance, not solely due to data unavailability.

118

MCQmedium

A data scientist trains a model to predict customer churn. The dataset includes features like age, income, and number of support calls. The model performs well on historical data but poorly on new data from a different customer segment. Which technique is most likely to help improve generalization?

A.Feature engineering

B.Cross-validation

C.Increasing model complexity

D.Using a larger learning rate

AnswerB

Cross-validation checks that the model performs consistently across different data splits, which helps select a model that generalizes better to new data such as other customer segments.

Why this answer

Cross-validation (Option B) is the best choice among the options because it evaluates the model on multiple held-out subsets of the data instead of a single split. By partitioning the data into folds and training and validating iteratively, it exposes a model that has overfit to one portion of the historical data and gives a more robust estimate of performance on unseen data. Cross-validation does not change the model by itself; it improves generalization indirectly by guiding model selection and hyperparameter tuning toward models that perform consistently across folds. It cannot, however, account for a customer segment that is entirely absent from the data being split.

Exam trap

The trap here is that candidates often choose 'Feature engineering' (Option A) thinking it always improves model performance, but they miss that the core issue is overfitting to a specific segment, which cross-validation directly mitigates by validating across data splits.

How to eliminate wrong answers

Option A is wrong because feature engineering (creating or transforming input variables) can improve model performance but does not inherently address generalization across different data segments; it may even exacerbate overfitting if features are tailored to the historical segment. Option C is wrong because increasing model complexity (e.g., adding more layers or parameters) typically worsens generalization by increasing the risk of overfitting to the training data, making the model less adaptable to new segments. Option D is wrong because using a larger learning rate can cause the model to converge too quickly to a suboptimal solution or diverge, but it does not directly improve generalization and may harm performance on both historical and new data.

119

MCQmedium

What is 'k-fold cross-validation' specifically and how is k=10 different from k=5?

A.k=10 always produces a better model than k=5 because it uses more training data

B.k=10 provides more reliable performance estimates at 2x the compute cost vs k=5

C.k=5 and k=10 produce identical results because the total data is the same

D.k=10 requires 10 times more labelled data than k=5

AnswerB

More folds = less variance in the performance estimate, but more training runs — k=10 is more reliable but computationally costlier than k=5.

Why this answer

k-fold cross-validation splits the dataset into k roughly equal folds, trains on k-1 folds, validates on the remaining fold, and repeats the process k times so that each fold serves as the validation set exactly once. With k=10, each model is trained on 90% of the data and validated on 10%, while k=5 uses 80% for training and 20% for validation. The key difference is that k=10 generally gives a more reliable performance estimate, because each model sees more training data and the score is averaged over more folds, but it requires approximately twice the computational cost (10 training runs vs. 5).
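The fold arithmetic can be checked with a small index generator. This is a sketch under simplifying assumptions: `k_fold_indices` is a hypothetical helper, it does not shuffle (real workflows usually shuffle or stratify first), and it only produces index lists rather than training anything.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds_5 = list(k_fold_indices(100, 5))
folds_10 = list(k_fold_indices(100, 10))

for name, folds in [("k=5", folds_5), ("k=10", folds_10)]:
    train, validation = folds[0]
    print(f"{name}: {len(folds)} training runs, "
          f"{len(train)} train / {len(validation)} validation rows per run")
```

The number of folds equals the number of training runs, which is where the roughly doubled compute cost of k=10 comes from.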

Exam trap

The trap here is confusing model performance improvement with estimate reliability; candidates often think more folds always yield a better model, but cross-validation is about evaluating performance, not training the final model.

How to eliminate wrong answers

Option A is wrong because k=10 does not always produce a better model; it provides a more reliable estimate of model performance, but the actual model quality depends on the algorithm and data, not the cross-validation fold count. Option C is wrong because k=5 and k=10 produce different results due to different training/validation splits and variance in estimates; they are not identical. Option D is wrong because k-fold cross-validation does not require more labelled data; it uses the same dataset, just partitioned differently.

120

MCQhard

A data scientist trains a binary classification model to detect fraudulent transactions. The dataset contains only 2% fraudulent transactions. The model achieves 98% overall accuracy, but it fails to detect any fraudulent transactions, classifying all transactions as legitimate. Which metric would most clearly reveal this failure?

A.Precision

B.Recall

C.F1 score

D.Specificity

AnswerB

Recall (true positive rate) is 0 when no fraudulent transactions are identified, exposing the model's failure.

Why this answer

Recall (also known as sensitivity or true positive rate) measures the proportion of actual positive cases (fraudulent transactions) that were correctly identified by the model. In this scenario, the model classifies all transactions as legitimate, so it detects zero fraudulent transactions, yielding a recall of 0%. Despite 98% overall accuracy, the recall metric clearly exposes the model's complete failure to identify any fraud.
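All four answer options come from the same confusion-matrix counts, so it can help to compute them side by side. The sketch below assumes 1,000 hypothetical transactions at 2% fraud and a model that labels everything legitimate; `confusion_counts` and `safe_ratio` are illustrative helpers, and `None` is used here to mark an undefined ratio.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def safe_ratio(numerator, denominator):
    """Return None when the ratio is undefined (zero denominator)."""
    return numerator / denominator if denominator else None

# Hypothetical data: 20 fraudulent (1) and 980 legitimate (0) transactions.
actual = [1] * 20 + [0] * 980
predicted = [0] * 1000  # the model classifies everything as legitimate

tp, fp, tn, fn = confusion_counts(actual, predicted)
metrics = {
    "accuracy": (tp + tn) / (tp + fp + tn + fn),
    "precision": safe_ratio(tp, tp + fp),    # undefined: no positive predictions
    "recall": safe_ratio(tp, tp + fn),       # zero: every fraud case is missed
    "specificity": safe_ratio(tn, tn + fp),  # perfect, which hides the failure
}
print(metrics)
```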

Exam trap

The trap here is that candidates often assume high overall accuracy (98%) implies good model performance, failing to recognize that accuracy is a poor metric for imbalanced datasets and that recall is the metric that directly exposes the model's inability to detect the minority class.

How to eliminate wrong answers

Option A is wrong because precision measures the proportion of predicted positive cases that are actually positive; if the model predicts no positives, precision is undefined (division by zero) or 0/0, which does not clearly reveal the failure to detect fraud. Option C is wrong because the F1 score is the harmonic mean of precision and recall; with recall at 0%, the F1 score will also be 0%, but it does not directly highlight the failure as intuitively as recall alone. Option D is wrong because specificity measures the proportion of actual negative cases (legitimate transactions) correctly identified; the model correctly classifies all legitimate transactions, so specificity would be 100%, masking the failure to detect fraud.

121

MCQhard

A data scientist is training a binary classification model to detect rare equipment failures from sensor data. The dataset contains 99.5% normal operation readings and only 0.5% failure readings. The model currently predicts all readings as 'normal' and achieves 99.5% accuracy on the test set. The business requires the model to identify at least 80% of actual failures. Which data-level technique should the data scientist use to most directly address the class imbalance?

A.Oversample the minority class (failure examples)

B.Undersample the majority class (normal examples)

C.Use precision as the optimization metric

D.Reduce the complexity of the model

AnswerA

Correct. Oversampling increases the number of minority class examples in the training set, helping the model learn to identify failures better.

Why this answer

Oversampling the minority class (failure examples) directly addresses the severe class imbalance by creating synthetic copies or duplicates of the rare failure instances. This balances the training dataset, allowing the model to learn patterns associated with failures rather than always predicting the majority class. With a balanced dataset, the model can be trained to meet the business requirement of identifying at least 80% of actual failures, even though overall accuracy may decrease.
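A minimal sketch of the duplication variant (random oversampling) follows. The function `oversample_minority`, the seed, and the sensor rows are assumptions for illustration; synthetic approaches such as SMOTE generate new points instead of copying existing ones and are not shown here.

```python
import random

def oversample_minority(rows, labels, minority_label, seed=0):
    """Duplicate random minority rows until both classes have equal counts."""
    rng = random.Random(seed)
    minority = [r for r, y in zip(rows, labels) if y == minority_label]
    if not minority:
        raise ValueError("no minority examples to resample")
    majority_count = sum(1 for y in labels if y != minority_label)
    extra_needed = max(0, majority_count - len(minority))
    extra = [rng.choice(minority) for _ in range(extra_needed)]
    new_rows = list(rows) + extra
    new_labels = list(labels) + [minority_label] * extra_needed
    return new_rows, new_labels

# Hypothetical sensor readings: 995 normal (0) and 5 failure (1) examples.
rows = [[float(i)] for i in range(1000)]
labels = [0] * 995 + [1] * 5

balanced_rows, balanced_labels = oversample_minority(rows, labels, minority_label=1)
print("failures before:", labels.count(1), "after:", balanced_labels.count(1))
```

Resampling should be applied to the training split only; the test set needs to keep the original class distribution so that recall is measured against realistic data.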

Exam trap

The trap here is that candidates may think high accuracy (99.5%) is always good, but in imbalanced datasets, accuracy is misleading; the question tests whether you recognize that data-level techniques like oversampling are needed to force the model to learn the minority class, not just optimize metrics or simplify the model.

How to eliminate wrong answers

Option B is wrong because undersampling the majority class discards the vast majority of normal operation data, which can lead to loss of valuable information and reduced model generalization, especially when the majority class is 99.5% of the data. Option C is wrong because using precision as the optimization metric does not directly address the class imbalance at the data level; it is a model evaluation metric that can be used after rebalancing, but it does not change the underlying skewed distribution. Option D is wrong because reducing model complexity does not fix the class imbalance; it may help prevent overfitting but will not enable the model to learn from the rare failure class when it is vastly underrepresented in the training data.

122

MCQeasy

What is feature engineering in machine learning?

A.Designing the hardware chips for running ML models

B.Selecting, transforming, and creating input variables from raw data to improve model performance

C.Selecting which neural network layers to include in a model

D.Writing code to deploy ML models as REST APIs

AnswerB

Feature engineering prepares and transforms raw data into informative inputs that help ML models learn better patterns.

Why this answer

Feature engineering is the process of selecting, transforming, and creating input variables (features) from raw data to improve the performance of machine learning models. This step is critical because the quality and relevance of features directly impact a model's ability to learn patterns and generalize to new data. In Azure Machine Learning, feature engineering is often performed with the built-in featurization in automated ML or with custom Python scripts using libraries such as pandas and scikit-learn.

Exam trap

The trap here is that candidates confuse feature engineering with model architecture design (Option C) or deployment (Option D), because all three are part of the ML lifecycle but serve distinct purposes—feature engineering focuses solely on input data transformation, not on model structure or serving.

How to eliminate wrong answers

Option A is wrong because designing hardware chips for running ML models is a hardware engineering task, not a data preprocessing or feature creation activity; it relates to specialized processors like GPUs or FPGAs, not to feature engineering. Option C is wrong because selecting neural network layers is part of model architecture design (e.g., choosing the number of layers in a deep learning model), which occurs after feature engineering and focuses on model structure, not input variable manipulation. Option D is wrong because writing code to deploy ML models as REST APIs is a deployment and MLOps activity, typically using tools like Azure Kubernetes Service or Azure Functions, and has nothing to do with transforming raw data into features.

123

MCQmedium

What is the F1 score in machine learning evaluation?

A.The first evaluation metric calculated before training a model

B.The harmonic mean of precision and recall that balances both metrics

C.The proportion of predictions correct on the test set

D.A measure of how fast the model produces predictions

AnswerB

F1 = 2*(P*R)/(P+R). It balances precision (positive reliability) and recall (detection rate) into one metric.

Why this answer

Option B is correct because the F1 score is defined as the harmonic mean of precision and recall, calculated as 2 * (precision * recall) / (precision + recall). This metric provides a single score that balances both false positives and false negatives, making it especially useful when classes are imbalanced. In Azure Machine Learning, the F1 score is a standard evaluation metric for classification models, reported in automated ML runs and designer modules.
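The formula is short enough to try directly. In this sketch the function name `f1_score` and the sample precision and recall values are illustrative; returning 0.0 when both inputs are zero is a common convention assumed here, not something the formula itself defines.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention for the undefined 0/0 case
    return 2 * precision * recall / (precision + recall)

# A model with high precision but poor recall.
precision, recall = 0.9, 0.1
arithmetic_mean = (precision + recall) / 2
f1 = f1_score(precision, recall)

print(f"arithmetic mean: {arithmetic_mean:.2f}, F1: {f1:.2f}")
```

The harmonic mean is pulled toward the weaker of the two values, so a model cannot earn a high F1 score by doing well on only one of precision or recall.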

Exam trap

The trap here is that candidates confuse the F1 score with accuracy (Option C) because both are single-number metrics, but the F1 score specifically addresses the trade-off between precision and recall, not just overall correctness.

How to eliminate wrong answers

Option A is wrong because the F1 score is an evaluation metric computed after the model has been trained and has made predictions, not before training; it cannot be calculated without predictions to compare against the true labels. Option C is wrong because it describes accuracy (the proportion of correct predictions), not the F1 score, which specifically balances precision and recall. Option D is wrong because it describes inference speed or latency, which is a performance metric unrelated to the statistical evaluation of classification quality.

124

MCQmedium

What is overfitting in machine learning?

A.When a model performs well on training data but poorly on new, unseen data

B.When a model is trained with too little data

C.When a model takes too long to train

D.When a model performs poorly on both training and test data

AnswerA

Overfitting means the model memorized training data specifics (including noise) and fails to generalize to new examples.

Why this answer

Overfitting occurs when a machine learning model learns the training data too well, including its noise and outliers, resulting in high accuracy on training data but poor generalization to new, unseen data. This is a fundamental concept in ML because the goal is to create models that perform well on real-world data, not just the data they were trained on. In Azure Machine Learning, techniques like regularization, cross-validation, and early stopping are used to detect and mitigate overfitting.

Exam trap

The trap here is that candidates confuse overfitting with underfitting (Option D) or mistakenly think overfitting is caused solely by insufficient data (Option B), when in fact overfitting is about the model's inability to generalize due to excessive complexity or noise memorization.

How to eliminate wrong answers

Option B is wrong because too little training data is a possible cause of overfitting, not its definition; overfitting is specifically about the model memorizing the training data and failing to generalize. Option C is wrong because training time is a performance metric, not a definition of overfitting; a model can overfit quickly or slowly depending on complexity and data size. Option D is wrong because poor performance on both training and test data describes underfitting (high bias), where the model is too simple to capture underlying patterns, not overfitting.

125

MCQeasy

A data scientist wants to group customers into segments based on purchasing behavior without using any labeled examples. Which type of machine learning is this?

A.Supervised learning

B.Unsupervised learning

C.Reinforcement learning

D.Semi-supervised learning

AnswerB

Unsupervised learning identifies inherent groupings (clusters) in data without pre-existing labels, which matches the scenario of segmenting customers by purchasing behavior.

Why this answer

Unsupervised learning is the correct choice because the data scientist has no labeled examples and wants to discover hidden patterns or groupings in the data. Clustering algorithms, such as K-Means or DBSCAN, are used to segment customers based solely on their purchasing behavior features, without any predefined categories.
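To illustrate grouping without labels, here is a deliberately small K-Means sketch on a single feature. The function `k_means_1d`, the monthly spend values, and the starting centroids are assumptions for illustration; real customer segmentation would use several features and a library implementation.

```python
def k_means_1d(values, centroids, iterations=10):
    """Very small K-Means for one numeric feature.

    Returns (centroids, clusters). No labels are used at any point.
    """
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        # Assignment step: attach each value to its nearest centroid.
        for v in values:
            nearest = min(
                range(len(centroids)), key=lambda i: abs(v - centroids[i])
            )
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster,
        # keeping the old centroid if the cluster is empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Invented monthly spend per customer; no segment labels are provided.
monthly_spend = [10, 12, 11, 95, 100, 105]
centroids, clusters = k_means_1d(monthly_spend, centroids=[0, 50])
print("segment centers:", centroids)
print("segments:", clusters)
```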

Exam trap

The trap here is that candidates may confuse 'no labeled examples' with semi-supervised learning, but the key distinction is that semi-supervised learning still requires at least some labeled data, while this scenario uses none.

How to eliminate wrong answers

Option A is wrong because supervised learning requires labeled training data with known outcomes, which is not available here. Option C is wrong because reinforcement learning involves an agent learning from rewards and penalties in an environment, not from static data without labels. Option D is wrong because semi-supervised learning uses a small amount of labeled data alongside a larger unlabeled dataset, but the question explicitly states no labeled examples are used.

126

MCQmedium

A data scientist trains a model on historical data and achieves high accuracy on both the training set and a held-out test set. However, when the model is deployed in production, it performs poorly on new, unseen data. Which issue is most likely the cause?

A.Overfitting

B.Underfitting

C.Data leakage

D.Concept drift

AnswerC

Data leakage causes the model to learn patterns that include information not available at inference time, leading to overly optimistic evaluation and poor real-world performance.

Why this answer

Data leakage occurs when information from outside the training dataset is inadvertently used to train the model, causing it to learn patterns that do not generalize to new data. In this scenario, the high accuracy on both training and test sets but poor production performance indicates that the test set was contaminated with information from the future or from the target variable, making the model appear accurate during validation but fail in real-world deployment.

Exam trap

The trap here is that candidates confuse high accuracy on both training and test sets with overfitting, but the key differentiator is that overfitting would show a significant gap between training and test accuracy, whereas data leakage produces deceptively high accuracy on both sets.

How to eliminate wrong answers

Option A is wrong because overfitting would show high training accuracy but low test accuracy, not high accuracy on both sets. Option B is wrong because underfitting would result in poor performance on both training and test sets, not high accuracy. Option D is wrong because concept drift refers to a change in the underlying data distribution over time after deployment, not a static mismatch between training and production data at the time of deployment.

127

MCQmedium

A hospital has a dataset with historical patient records, each labeled as either 'readmitted within 30 days' or 'not readmitted'. The hospital wants to train a model to predict which current patients are likely to be readmitted. Which type of machine learning task is this?

A.Supervised regression

B.Supervised classification

C.Unsupervised clustering

D.Reinforcement learning

AnswerB

Classification is used when the target variable is a category, and the data is labeled. Here, the output is one of two classes – readmitted or not readmitted.

Why this answer

This is a supervised classification task because the dataset contains labeled historical patient records (readmitted or not readmitted), and the goal is to predict a discrete category (binary outcome) for new patients. In Azure Machine Learning, this would use a classification algorithm like logistic regression or decision tree to assign each patient to one of the two classes.

Exam trap

The trap here is that candidates confuse regression with classification when the target variable is a binary outcome, mistakenly thinking 'readmitted or not' is a numeric value rather than a categorical label.

How to eliminate wrong answers

Option A is wrong because supervised regression predicts a continuous numeric value (e.g., number of days until readmission), not a discrete category. Option C is wrong because unsupervised clustering groups data without labeled outcomes, but here the labels are provided (readmitted/not readmitted). Option D is wrong because reinforcement learning involves an agent learning from rewards/penalties in an environment, not from labeled historical data.

128

MCQmedium

What is 'Azure Machine Learning's Responsible AI dashboard' and what does it include?

A.A compliance checklist confirming a model meets Microsoft's responsible AI certification requirements

B.A unified tool for error analysis, interpretability, fairness, data exploration, and causal inference

C.A monitoring dashboard showing responsible AI policy violations in production

D.A report auto-generated and submitted to regulators when a model is deployed

AnswerB

The Responsible AI dashboard integrates multiple analysis lenses — from where the model fails to why and who it affects differently.

Why this answer

Option B is correct because the Responsible AI dashboard in Azure Machine Learning is a unified, integrated tool that combines multiple components for building and evaluating AI systems responsibly. It includes error analysis, model interpretability, fairness assessment, data exploration, and causal inference capabilities, all accessible through a single interface. This dashboard helps data scientists and developers understand model behavior, identify potential biases, and make informed decisions throughout the ML lifecycle.

Exam trap

The trap here is that candidates confuse the Responsible AI dashboard with a compliance or monitoring tool, when in fact it is an interactive analysis and debugging suite for understanding model behavior before deployment.

How to eliminate wrong answers

Option A is wrong because the Responsible AI dashboard is not a compliance checklist or certification tool; it does not confirm that a model meets Microsoft's responsible AI certification requirements, as no such formal certification exists within Azure ML. Option C is wrong because the dashboard is designed for pre-deployment analysis and model understanding, not for monitoring production policy violations; production monitoring is handled by separate tools like Azure Monitor and Model Data Collector. Option D is wrong because the dashboard does not auto-generate or submit reports to regulators; it is an interactive tool for internal analysis, not a regulatory compliance reporting mechanism.

129

MCQeasy

What is 'automated machine learning' (AutoML) in Azure Machine Learning?

A.A system that automatically retrains models on a fixed daily schedule

B.Automatically iterating through algorithms and hyperparameters to find the best model for a dataset

C.Automatically labelling training data using existing model predictions

D.A robot that physically connects GPU hardware for distributed training

AnswerB

AutoML eliminates manual algorithm selection and tuning by systematically exploring the model search space and surfacing the best option.

Why this answer

Automated machine learning (AutoML) in Azure Machine Learning automates the process of selecting the best machine learning algorithm and tuning its hyperparameters for a given dataset. It iterates through multiple combinations of algorithms and hyperparameter values, evaluating each model's performance to identify the optimal solution without manual intervention. This is why option B is correct.

Exam trap

The trap here is that candidates confuse AutoML with simple scheduled retraining (option A) or with automated data labeling (option C), but the core definition of AutoML is specifically about automating the algorithm selection and hyperparameter tuning process.

How to eliminate wrong answers

Option A is wrong because AutoML does not simply retrain models on a fixed daily schedule; that describes a scheduled retraining pipeline, not the automated algorithm and hyperparameter search process. Option C is wrong because automatically labeling training data using existing model predictions is known as 'pseudo-labeling' or 'self-training', not AutoML. Option D is wrong because AutoML is a software-based optimization process, not a physical robot that connects GPU hardware for distributed training.

130

MCQhard

What is 'causal inference' and how does it differ from correlation-based machine learning?

A.Causal inference uses larger training datasets; correlation-based ML uses smaller ones

B.Causal inference determines whether X actually causes Y; ML finds correlations that predict outcomes

C.Causal inference is exclusively used in medical research; ML is used in business applications

D.ML models always establish causal relationships; causal inference is needed only when data quality is poor

AnswerB

Correlation ≠ causation — causal inference uses explicit causal reasoning, counterfactuals, and interventions rather than just predictive patterns.

Why this answer

Option B is correct because causal inference specifically aims to determine whether a change in variable X directly causes a change in variable Y, often through controlled experiments or techniques like do-calculus. In contrast, correlation-based machine learning identifies statistical patterns and associations between variables to make predictions, but does not establish a cause-and-effect relationship. This distinction is fundamental in Azure Machine Learning when choosing between predictive modeling (e.g., regression) and causal analysis (e.g., using the DoWhy library).

Exam trap

The trap here is that candidates often confuse correlation with causation, assuming that a strong predictive relationship in ML implies a causal link, when in fact causal inference requires additional experimental or quasi-experimental methods to establish causality.

How to eliminate wrong answers

Option A is wrong because the size of the training dataset is not a defining difference between causal inference and correlation-based ML; both can use large or small datasets depending on the problem. Option C is wrong because causal inference is not exclusively used in medical research; it is applied in economics, social sciences, and business (e.g., A/B testing on Azure). Option D is wrong because ML models do not always establish causal relationships; they typically find correlations, and causal inference is needed when you want to understand the effect of an intervention, not just when data quality is poor.

131

MCQmedium

What is 'batch inference' vs 'real-time inference' in Azure Machine Learning?

A.Batch inference is more accurate; real-time is faster but less accurate

B.Real-time processes individual requests immediately; batch processes large datasets at scheduled intervals

C.Batch requires GPU compute; real-time uses CPU only

D.Real-time inference is only available in Azure; batch works on-premises too

AnswerB

Real-time = instant individual predictions; Batch = large-scale periodic scoring. Choice depends on whether immediate results are needed.

Why this answer

Option B is correct because batch inference processes large datasets asynchronously at scheduled intervals, making it suitable for offline or periodic predictions, while real-time inference handles individual requests immediately with low latency for interactive applications. Azure Machine Learning supports both: real-time endpoints for synchronous scoring and batch endpoints for asynchronous, high-throughput processing.

Exam trap

The trap here is that candidates confuse 'batch' with 'less accurate' or 'real-time' with 'GPU-only', when in fact the core distinction is synchronous vs asynchronous processing, not performance or hardware constraints.

How to eliminate wrong answers

Option A is wrong because accuracy is not inherently tied to inference mode; both batch and real-time inference use the same trained model, so accuracy is identical. Option C is wrong because neither batch nor real-time inference is restricted to a specific compute type; both can use CPU or GPU depending on the model and workload requirements. Option D is wrong because real-time inference is not exclusive to Azure; it can be deployed on-premises or in other cloud environments, and batch inference also works on-premises via Azure Arc or local deployments.

132

MCQeasy

What are 'Azure Machine Learning notebooks' and who typically uses them?

A.Digital note-taking applications for recording meeting minutes during ML project planning

B.Interactive Jupyter notebook environments for data exploration and model prototyping by data scientists

C.Automated logging notebooks that record all model training metrics without code

D.Read-only document viewers for reviewing completed ML experiment results

AnswerB

Azure ML notebooks provide cloud-hosted Jupyter — data scientists write Python/R for analysis, visualisation, and experimentation.

Why this answer

Azure Machine Learning notebooks are interactive Jupyter notebook environments hosted within Azure Machine Learning studio. They allow data scientists to write and execute Python code for data exploration, visualization, and model prototyping directly in the cloud, with built-in access to compute instances and datasets. Option B correctly identifies both the technology (Jupyter notebooks) and the primary user role (data scientists).

Exam trap

The trap here is that candidates may confuse Azure Machine Learning notebooks with generic documentation tools (Option A) or assume they are passive logs (Option C), overlooking that they are active, code-driven development environments specifically designed for data scientists.

How to eliminate wrong answers

Option A is wrong because Azure Machine Learning notebooks are not digital note-taking applications for meeting minutes; they are code-centric environments for interactive development, not documentation. Option C is wrong because notebooks are not automated logging tools that record metrics without code; logging in Azure ML requires explicit code (e.g., using `mlflow` or `run.log()`) within the notebook cells. Option D is wrong because notebooks are fully interactive read-write environments, not read-only document viewers; they allow editing and execution of code, not just review of completed results.

133

MCQmedium

What is data drift in the context of deployed machine learning models?

A.When training data is accidentally deleted from storage

B.When production data distribution changes from the training data distribution over time

C.When a model's weights change during inference

D.When data is moved between different Azure storage accounts

AnswerB

Data drift occurs when real-world data patterns shift away from what the model was trained on, degrading prediction accuracy.

Why this answer

Data drift refers to the phenomenon where the statistical properties of the input data a deployed model receives in production change over time, diverging from the distribution of the data used during training. This shift can cause the model's predictions to become less accurate or unreliable, even if the model itself remains unchanged. In Azure Machine Learning, data drift is monitored using dataset monitors that compare production data distributions against the training baseline.

Exam trap

The trap here is that candidates confuse data drift with other operational issues like data loss or storage changes, rather than recognizing it as a statistical shift in the input data distribution that degrades model accuracy over time.

How to eliminate wrong answers

Option A is wrong because accidental deletion of training data is a data management or storage issue, not a change in data distribution affecting model performance. Option C is wrong because a model's weights do not change during inference; weights are fixed after training, and any change would require retraining or fine-tuning. Option D is wrong because moving data between Azure storage accounts is a data migration operation unrelated to the statistical properties of the data used for predictions.
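
As a rough sketch of what comparing a production distribution against a training baseline can mean, the snippet below scores one numeric feature by how far its production mean has moved, measured in baseline standard deviations. The function name, the sample values, and the threshold are all assumptions made for illustration; this is not the calculation Azure Machine Learning's dataset monitors perform.

```python
import statistics

def mean_shift_score(baseline, production):
    """Absolute shift of the production mean, in units of the baseline standard deviation."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return abs(statistics.mean(production) - mu) / sigma

DRIFT_THRESHOLD = 1.0  # hypothetical alerting threshold

training_ages = [25, 30, 35, 40, 45, 50, 55]   # feature values seen at training time
stable_batch = [27, 33, 38, 41, 44, 52, 50]    # production batch resembling training data
drifted_batch = [55, 60, 62, 65, 70, 72, 75]   # production batch from an older population

stable_score = mean_shift_score(training_ages, stable_batch)
drifted_score = mean_shift_score(training_ages, drifted_batch)

print("stable batch drifted:", stable_score > DRIFT_THRESHOLD)
print("drifted batch drifted:", drifted_score > DRIFT_THRESHOLD)
```

A single mean comparison misses changes in spread or shape, so practical monitors usually combine several distribution-level statistics per feature.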

134

MCQmedium

A media company wants to automatically organize a large collection of news articles into several topic-based categories (e.g., politics, sports, technology) without using any predefined labels. They plan to use Azure Machine Learning. Which type of machine learning task should they use?

A.Regression

B.Classification

C.Clustering

D.Anomaly detection

AnswerC

Clustering is an unsupervised learning method that automatically groups similar data points together. Without labels, it can discover topic-based clusters in the news articles based on content similarity.

Why this answer

Clustering is the correct choice because the media company wants to group unlabeled news articles into topic-based categories based on inherent similarities in the data, without using predefined labels. Azure Machine Learning provides clustering algorithms like K-Means that automatically partition the dataset into distinct clusters, making it ideal for unsupervised learning tasks where the goal is to discover natural groupings.

Exam trap

The trap here is that candidates often confuse clustering with classification because both involve grouping data into categories, but clustering is unsupervised (no labels) while classification requires labeled training data.

How to eliminate wrong answers

Option A is wrong because regression is a supervised learning task used to predict continuous numerical values (e.g., article view count), not to group articles into discrete categories. Option B is wrong because classification is a supervised learning task that requires labeled training data to assign predefined categories, but the scenario explicitly states no predefined labels are used. Option D is wrong because anomaly detection is used to identify rare or unusual data points that deviate from the norm, not to organize data into multiple topic-based groups.
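
A minimal K-Means sketch shows how grouping can emerge without labels. It assumes each article has already been turned into a two-number feature vector (the values below are made up), and the starting centroids are hand-picked so the run is deterministic; real implementations usually choose them randomly or with a scheme such as k-means++.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster; keep the old one if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return labels, centroids

# Hypothetical article features, e.g. (sports-term score, politics-term score).
articles = [(1.0, 0.9), (0.8, 1.2), (1.1, 1.0), (5.0, 5.2), (5.3, 4.9), (4.8, 5.1)]

labels, centroids = kmeans(articles, centroids=[articles[0], articles[3]])
print(labels)
```

The cluster indices carry no meaning of their own; naming a cluster "sports" or "politics" is a separate step a person performs after inspecting its contents.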

135

MCQmedium

A data scientist trains a regression model on a dataset with 100 features and 10,000 samples. The model achieves a low training error but a much higher error on a held-out test set. Which approach is most likely to improve the model's generalization performance?

A.Increase the complexity of the model by adding more layers or parameters

B.Add more training data

C.Reduce the number of features or apply regularization

D.Use a different train-test split ratio like 80-20 instead of 70-30

AnswerC

Reducing features simplifies the model, making it less prone to overfitting. Regularization also penalizes large coefficients. This is a direct and effective method to improve generalization.

Why this answer

The model exhibits high variance (overfitting), as indicated by low training error but high test error. Reducing the number of features or applying regularization (e.g., L1/L2) directly constrains model complexity, forcing it to learn more general patterns rather than memorizing noise. This is the standard approach to improve generalization in regression models.

Exam trap

The trap here is that candidates often assume adding more training data is always the best fix for overfitting, but the question specifically describes a model with 100 features and only 10,000 samples—feature reduction or regularization is the more direct and efficient solution.

How to eliminate wrong answers

Option A is wrong because increasing model complexity (more layers/parameters) would exacerbate overfitting, making the gap between training and test error even larger. Option B is wrong because while more training data can help reduce overfitting, it is not the most direct or effective fix here—the primary issue is excessive model complexity relative to the data, and adding data may not help if the model is already too flexible. Option D is wrong because changing the train-test split ratio (e.g., from 70-30 to 80-20) does not address the root cause of overfitting; it only slightly alters the amount of data used for training and evaluation, which is unlikely to significantly reduce the variance gap.

136

MCQmedium

A data scientist trains a machine learning model on a dataset of housing prices. The model achieves 98% accuracy on the training data but only 72% accuracy on a separate test set. What is the most likely problem with this model?

A.Underfitting

B.Overfitting

C.Data leakage

D.Class imbalance

AnswerB

Overfitting causes the model to memorize training data, leading to high training accuracy but poor generalization to new data.

Why this answer

The model's high accuracy on training data (98%) but significantly lower accuracy on test data (72%) is a classic symptom of overfitting, where the model learns noise and specific patterns in the training set rather than generalizing to new, unseen data. In Azure Machine Learning, this often occurs when the model is too complex (e.g., deep decision trees or high-degree polynomial features) relative to the amount of training data, and regularization techniques like L1/L2 regularization or early stopping are not applied.

Exam trap

The trap here is that candidates often confuse high training accuracy with a good model, overlooking the critical test accuracy drop that signals overfitting, and may incorrectly select underfitting because they focus only on the low test score.

How to eliminate wrong answers

Option A is wrong because underfitting would show low accuracy on both training and test sets, not high training accuracy with a large drop. Option C is wrong because data leakage would typically cause artificially high performance on both training and test sets (if leakage is present in test data) or inconsistent results, but the specific pattern of high training and low test accuracy is not characteristic of leakage. Option D is wrong because class imbalance primarily affects models by biasing predictions toward the majority class, often leading to poor recall on minority classes, but it does not inherently cause a large gap between training and test accuracy.

137

MCQeasy

A data scientist trains a binary classification model to distinguish between images of cats and dogs. On the test set, the model achieves 95% accuracy, but a deeper inspection reveals that the test set contains 95% cats and 5% dogs, and the model predicts 'cat' for every single image. Which metric should the data scientist prioritize to get a more realistic evaluation of the model's performance on this imbalanced dataset?

A.Precision

B.Recall

C.F1-score

D.Accuracy

AnswerC

The F1-score combines precision and recall into a single metric that penalizes extreme values. For this model, the F1-score for the minority class (dogs) would be very low, revealing the poor performance that accuracy hides.

Why this answer

The F1-score is the harmonic mean of precision and recall, providing a single metric that balances both when classes are imbalanced. In this scenario, accuracy is misleadingly high (95%) because the model always predicts the majority class (cat), matching the share of cats in the test set without actually learning to distinguish cats from dogs. The F1-score penalizes the model for its poor recall on the minority class (dogs), giving a more realistic evaluation of its performance.

Exam trap

The trap here is that candidates see a high accuracy figure and assume the model is performing well, failing to recognize that accuracy is misleading on imbalanced datasets where the model can score highly by simply predicting the majority class.

How to eliminate wrong answers

Option A is wrong because precision measures the proportion of positive identifications that were actually correct, but here the model never predicts 'dog', so precision for the dog class is undefined (0/0) and precision for the cat class is 95%, which still appears high and does not reveal the model's failure to identify dogs. Option B is wrong because recall for the dog class would be 0% (since no dogs are correctly identified), but recall alone does not account for false positives, and the data scientist needs a balanced metric that combines both precision and recall. Option D is wrong because accuracy is the ratio of correct predictions to total predictions, and with a 95% majority class, a model that always predicts 'cat' achieves 95% accuracy without any discriminative ability, making it a poor metric for imbalanced datasets.
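
The numbers in this scenario can be checked with a small sketch. It assumes a test set of 95 cats and 5 dogs and a model that always answers 'cat'; the helper function and the convention of reporting 0.0 for a 0/0 ratio are choices made here for illustration.

```python
def binary_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall and F1, treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Convention assumed here: report 0.0 when a ratio would be 0/0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = ["cat"] * 95 + ["dog"] * 5
y_pred = ["cat"] * 100  # the model predicts 'cat' for every image

dog_metrics = binary_metrics(y_true, y_pred, positive="dog")
cat_metrics = binary_metrics(y_true, y_pred, positive="cat")
print(dog_metrics)
print(cat_metrics)
```

Scoring the minority class separately is what exposes the failure: the dog-class F1 collapses to zero while the overall accuracy stays high.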

138

MCQmedium

A data scientist has a dataset containing images of handwritten digits (0-9) where each image is labeled with the correct digit. The goal is to train a model that can predict the digit from a new image. Which type of machine learning approach should be used?

A.Regression

B.Classification

C.Clustering

D.Reinforcement learning

AnswerB

Classification is a supervised learning technique used to predict categorical outcomes from labeled data. Recognizing digits fits this approach.

Why this answer

This is a supervised learning problem where the model must predict a discrete class label (digit 0-9) from input images. Classification algorithms, such as logistic regression or neural networks, are designed to map inputs to categorical outputs, making B the correct choice.

Exam trap

The trap here is that candidates may confuse regression with classification when the output is a number (0-9), but regression is for continuous values, not discrete labels, even if the labels are numeric.

How to eliminate wrong answers

Option A is wrong because regression predicts continuous numerical values (e.g., price or temperature), not discrete categories like digits. Option C is wrong because clustering is an unsupervised learning technique that groups unlabeled data based on similarity, but here the dataset has labeled images. Option D is wrong because reinforcement learning involves an agent learning through rewards and penalties in an interactive environment, which is not applicable to static labeled image classification.

139

MCQmedium

A data scientist trains a regression model to predict house prices. The model performs poorly on both the training data and the test data, showing high error in both sets. Which concept best describes this situation?

A.Overfitting

B.Underfitting

C.Data leakage

D.Feature scaling

AnswerB

Underfitting means the model is too simplistic to learn the data patterns, causing poor performance on both training and test sets.

Why this answer

Underfitting occurs when a model is too simple to capture the underlying patterns in the data, resulting in high error on both the training and test sets. In this regression scenario, the model fails to learn the relationship between features and house prices, leading to poor performance across all data splits.

Exam trap

The trap here is that candidates confuse underfitting with overfitting because both involve poor performance, but the key distinction is that underfitting shows high error on both training and test sets, while overfitting shows low training error and high test error.

How to eliminate wrong answers

Option A is wrong because overfitting would show low error on training data and high error on test data, not high error on both. Option C is wrong because data leakage involves information from outside the training set influencing the model, which typically causes overly optimistic performance, not uniformly high error. Option D is wrong because feature scaling normalizes input ranges to improve convergence in algorithms like gradient descent, but it does not directly cause high error on both training and test sets.

140

MCQmedium

What are 'Azure Machine Learning datasets' and why are they important?

A.The raw data files stored in Azure Blob Storage before any processing

B.Versioned, registered data references enabling reproducibility, sharing, and lineage tracking in Azure ML

C.Synthetic datasets automatically generated by Azure ML to supplement small training sets

D.Pre-labelled benchmark datasets provided by Microsoft for testing Azure ML models

AnswerB

Datasets decouple data management from model code — enabling reproducible experiments with tracked, shared, versioned data.

Why this answer

Azure Machine Learning datasets are versioned, registered data references that encapsulate metadata such as location, schema, and creation time, enabling reproducibility, sharing, and lineage tracking across experiments. They do not store the raw data files themselves but provide a pointer to the data source (e.g., Azure Blob Storage, Azure Data Lake), ensuring that every training run uses the exact same data snapshot, which is critical for auditability and collaboration.

Exam trap

The trap here is that candidates confuse a dataset with the raw data files themselves, assuming it is just a storage container, rather than understanding it as a versioned, registered metadata reference that enables reproducibility and lineage.

How to eliminate wrong answers

Option A is wrong because Azure ML datasets are not the raw data files themselves; they are metadata references that point to the data, and the raw files can be stored in various locations, not just Azure Blob Storage. Option C is wrong because Azure ML does not automatically generate synthetic datasets; synthetic data generation would require custom code or third-party tools, and datasets are for referencing existing data. Option D is wrong because Azure ML datasets are user-created references to their own data, not pre-labelled benchmark datasets provided by Microsoft for testing.

141

MCQmedium

A data scientist is training a regression model to predict house prices in Azure Machine Learning. The model uses features like square footage, number of bedrooms, and location (zip code). The data scientist notices that the model has a very low error on the training data but a high error on the test data. Which technique should the data scientist apply during model training to reduce overfitting by penalizing large coefficients?

A.Use a smaller test set.

B.Apply feature scaling only.

C.Use a regularization algorithm like Lasso (L1).

D.Increase the number of training epochs.

AnswerC

Regularization adds a penalty for large coefficients (L1 shrinkage), which forces some coefficients to zero and reduces model complexity, effectively combating overfitting.

Why this answer

Option C is correct because Lasso (L1) regularization adds a penalty proportional to the sum of the absolute values of the coefficients, which can shrink some coefficients exactly to zero, effectively performing feature selection and reducing overfitting. This directly addresses the problem of large coefficients causing the model to fit noise in the training data, leading to high test error.

Exam trap

The trap here is that candidates often confuse regularization with feature scaling or training duration, not realizing that only regularization directly penalizes large coefficient magnitudes to combat overfitting.

How to eliminate wrong answers

Option A is wrong because using a smaller test set reduces the reliability of the error estimate and does not address overfitting during training; it may even increase variance in the evaluation metric. Option B is wrong because feature scaling only normalizes the range of input features, which helps gradient descent converge but does not penalize large coefficients or reduce overfitting. Option D is wrong because increasing the number of training epochs can lead to further overfitting by allowing the model to memorize the training data more, not reduce it.
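
The shrink-to-zero behaviour of an L1 penalty can be seen in a one-feature sketch. It assumes the objective (1/2n) * sum((y - w*x)^2) + alpha * |w| with no intercept, for which the minimiser has a closed form via the soft-thresholding operator. The data and the `alpha` values are invented for illustration.

```python
def soft_threshold(value, alpha):
    """Move `value` towards zero by `alpha`, clamping to exactly zero inside [-alpha, alpha]."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0

def lasso_1d(xs, ys, alpha):
    """Minimise (1/2n)*sum((y - w*x)^2) + alpha*|w| for a single feature, no intercept."""
    n = len(xs)
    rho = sum(x * y for x, y in zip(xs, ys)) / n  # average feature-target product
    z = sum(x * x for x in xs) / n                # average squared feature value
    return soft_threshold(rho, alpha) / z

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
strong_target = [-4.0, -2.0, 0.0, 2.0, 4.0]  # exactly y = 2x
weak_target = [0.1, -0.1, 0.0, 0.1, -0.1]    # almost unrelated to x

w_unpenalised = lasso_1d(xs, strong_target, alpha=0.0)
w_strong = lasso_1d(xs, strong_target, alpha=0.5)
w_weak = lasso_1d(xs, weak_target, alpha=0.5)
print(w_unpenalised, w_strong, w_weak)
```

The informative feature keeps a slightly shrunken coefficient, while the weakly related one is set to exactly zero, which is the feature-selection effect attributed to Lasso.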

142

MCQmedium

What is 'stochastic gradient descent' (SGD) and how does it work?

A.A random sampling method for selecting training data without replacement

B.An optimisation algorithm that updates weights using gradients computed on random data mini-batches

C.A technique for randomly selecting which model architecture to use for AutoML

D.Randomly descending through decision tree branches to make predictions

AnswerB

SGD computes cheap gradient estimates from mini-batches — trading noise for speed, enabling training on large datasets.

Why this answer

Stochastic Gradient Descent (SGD) is an optimization algorithm used to train machine learning models by iteratively updating model weights. It computes the gradient of the loss function on a randomly selected mini-batch of training data (not the entire dataset), which introduces noise but significantly speeds up convergence and reduces memory usage. This mini-batch approach is the core of SGD and distinguishes it from batch gradient descent.

Exam trap

The trap here is that candidates read 'stochastic' as describing a standalone data sampling method (Option A) or random model selection (Option C), when in fact SGD is an optimisation algorithm whose randomness comes from estimating gradients on random mini-batches.

How to eliminate wrong answers

Option A is wrong because SGD is an optimisation algorithm, not a data sampling method; drawing random mini-batches (in practice usually by shuffling the data each epoch) is only how it obtains cheap gradient estimates. Option C is wrong because SGD is not used to select model architectures; AutoML uses techniques like Bayesian optimization, grid search, or reinforcement learning for architecture search, not gradient descent. Option D is wrong because SGD is not a decision tree traversal method; decision trees use greedy splitting criteria (e.g., Gini impurity, information gain) to make predictions, not gradient-based weight updates.
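
A minimal mini-batch SGD loop for a one-feature linear model is sketched below. The learning rate, batch size, epoch count, and the noise-free data generated from y = 3x + 1 are all assumptions chosen so the example converges quickly; it is not a production optimiser.

```python
import random

def sgd_linear_regression(xs, ys, lr=0.1, batch_size=4, epochs=300, seed=0):
    """Fit y ~ w*x + b by mini-batch SGD on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the squared error, averaged over this mini-batch only.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

xs = [i / 19 for i in range(20)]   # 20 evenly spaced inputs in [0, 1]
ys = [3 * x + 1 for x in xs]       # noise-free targets from y = 3x + 1

w, b = sgd_linear_regression(xs, ys)
print(round(w, 3), round(b, 3))
```

Each update touches only four samples, so its cost does not grow with the dataset size; the price is a noisier gradient than a full pass over the data would give.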

143

MCQmedium

A data scientist trains a regression model to predict daily electricity consumption (in kWh) for a commercial building. The business team needs a metric that heavily penalizes large prediction errors (outliers) more than small errors. Which metric should the data scientist report to best meet this requirement?

A.Mean Absolute Error (MAE)

B.Root Mean Squared Error (RMSE)

C.R-squared

D.Mean Absolute Percentage Error (MAPE)

AnswerB

RMSE squares the errors before averaging, which gives disproportionately higher weight to large errors. This makes it the correct choice when the goal is to penalize outliers more heavily.

Why this answer

Root Mean Squared Error (RMSE) is the correct metric because it squares the residuals before averaging, which disproportionately amplifies the impact of large errors (outliers) compared to small errors. This aligns directly with the business requirement to heavily penalize large prediction errors in the regression model for daily electricity consumption.

Exam trap

The trap here is that candidates often default to MAE because it is robust and easy to interpret, but the question explicitly requires heavy penalization of outliers, which RMSE (or MSE) achieves by squaring the errors.

How to eliminate wrong answers

Option A is wrong because Mean Absolute Error (MAE) treats all errors linearly, so each error counts only in proportion to its size and large errors receive no extra penalty. Option C is wrong because R-squared reports the proportion of variance explained relative to a mean-only baseline; it is unitless and does not express error magnitude in kWh, so it does not communicate the penalty on large errors that the business team asked for. Option D is wrong because Mean Absolute Percentage Error (MAPE) uses percentage-based errors, which can be unstable when actual values are near zero, and it does not square the errors, so large errors are not amplified relative to small ones.

144

MCQmedium

A data scientist is training a classification model on a dataset with 100 features and only 500 labeled samples. The model achieves 99% accuracy on the training data but only 68% accuracy on a held-out test set, indicating overfitting. Which technique is most appropriate to directly address this problem?

A. Increase the amount of training data by collecting more samples

B. Reduce the number of features used for training

C. Increase the complexity of the model by adding more layers

D. Train for more epochs

Answer: B

Reducing the number of features (e.g., via feature selection or PCA) decreases model complexity, making the model less likely to overfit. This is a standard remedy when the sample size is small relative to the number of features.

Why this answer

Option B is correct because reducing the number of features directly combats overfitting by decreasing model complexity and the risk of learning noise from irrelevant or redundant features. With only 500 samples and 100 features, the model has a high variance problem; feature selection or dimensionality reduction (e.g., using Azure Machine Learning's Filter-Based Feature Selection or PCA) simplifies the hypothesis space, improving generalization to the test set.
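
As a rough illustration of filter-based feature selection, the sketch below ranks features by the absolute Pearson correlation between each column and the label and keeps the top k. This is one simple scoring choice, not the implementation of the Azure Machine Learning component; the function names, feature names, and data are all hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(var_x * var_y)

def select_top_k(columns, labels, k):
    """Return the names of the k features with the largest |correlation| to the label."""
    scores = {name: abs(pearson(values, labels)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy data: feature name -> column of values.
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "income":   [1.0, 2.0, 1.5, 8.0, 9.0, 8.5],        # tracks the label closely
    "noise":    [5.0, 1.0, 4.0, 2.0, 5.0, 1.0],        # nearly unrelated to the label
    "constant": [3.0, 3.0, 3.0, 3.0, 3.0, 3.0],        # never varies
    "age":      [30.0, 25.0, 41.0, 50.0, 38.0, 60.0],  # moderately related
}

selected = select_top_k(columns, labels, k=2)
print(selected)  # ['income', 'age']
```

The scores should be computed on the training split only; scoring features on the full dataset leaks information from the test set into the selection.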

Exam trap

The trap here is that candidates may assume more data (Option A) is always the best fix for overfitting, but the question explicitly tests the ability to choose a technique that directly addresses the high-dimensional, low-sample scenario without requiring additional data collection.

How to eliminate wrong answers

Option A is wrong because collecting more samples is often impractical or impossible in real-world scenarios (e.g., rare event data), and the question asks for the most appropriate technique to directly address overfitting given the existing constraints—not a data collection strategy. Option C is wrong because increasing model complexity (e.g., adding more layers) would exacerbate overfitting by further increasing variance, not reduce it. Option D is wrong because training for more epochs typically leads to overfitting on the training data, as the model continues to memorize noise rather than learning generalizable patterns.

145

MCQmedium

What is 'label imbalance' in a classification dataset and how does it affect model training?

A. When labels in the training data contain spelling errors

B. When one class greatly outnumbers others, causing models to be biased toward the majority class

C. When training labels are applied inconsistently by different human annotators

D. When a model produces predictions that don't match any of the training labels

Answer: B

Label imbalance makes models ignore rare classes — requiring resampling, class weighting, or better metrics than accuracy.

Why this answer

Option B is correct because label imbalance refers to a situation in classification datasets where one class (the majority class) has significantly more samples than other classes (minority classes). This causes the model to become biased toward predicting the majority class, as it minimizes overall loss by ignoring minority classes, leading to poor generalization and low recall for underrepresented classes.
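
Class weighting is one common way to counter this bias. The sketch below computes "balanced" weights with the widely used heuristic n_samples / (n_classes * class_count), so that errors on the rare class cost more during training. The label names and the 95/5 split are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical dataset: 95 majority-class rows and 5 minority-class rows.
labels = ["negative"] * 95 + ["positive"] * 5

weights = balanced_class_weights(labels)
print(weights["positive"])  # 10.0
print(weights["negative"])  # roughly 0.526
```

With these weights, each minority-class example counts about 19 times as much as a majority-class example, so the total weight of the two classes is equal.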

Exam trap

The trap here is that candidates confuse label imbalance with data quality issues like label noise or annotation errors, leading them to pick options A or C instead of recognizing it as a class distribution problem.

How to eliminate wrong answers

Option A is wrong because spelling errors in labels are a data quality issue, not a class distribution imbalance; they relate to data cleaning, not the relative frequency of classes. Option C is wrong because inconsistent labeling by annotators is an inter-annotator agreement problem, which affects label noise and reliability, not the proportional count of samples per class. Option D is wrong because predictions that don't match training labels describe a model's inability to map to known classes (e.g., out-of-distribution detection), not an imbalance in the training data's class distribution.

146

MCQhard

A data scientist is using Azure Automated Machine Learning to build a binary classification model for a highly imbalanced dataset (95% negative, 5% positive). The data scientist wants AutoML to select the best model based on a metric that is robust to class imbalance. Which primary metric should the data scientist configure in the AutoML settings?

A. Accuracy

B. AUC_weighted

C. F1_score

D. Log_loss

Answer: B

AUC_weighted calculates the area under the ROC curve and weights it by the prevalence of each class. It is robust to class imbalance and recommended for imbalanced datasets in AutoML.

Why this answer

AUC_weighted is the correct primary metric for imbalanced binary classification because it computes the area under the ROC curve for each class and averages the results, weighting each class by its support. ROC AUC is calculated from how well the predicted scores rank positives above negatives, not from the share of correct predictions at a fixed threshold, so a model that effectively ignores the minority (5% positive) class scores near 0.5 instead of 95%. AUC_weighted is a recommended metric in Azure Automated Machine Learning when the dataset is skewed, as it penalizes models that ignore the minority class.
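
The contrast with accuracy can be reproduced with a minimal sketch. It computes plain binary ROC AUC as a ranking statistic, which is an assumption made for illustration and not AutoML's own AUC_weighted implementation; the 95/5 labels mirror the scenario in the question and the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 95% negative, 5% positive, as in the question.
y_true = [0] * 95 + [1] * 5

# A degenerate model that gives every instance the same score and predicts negative.
constant_scores = [0.0] * 100
constant_preds = [0] * 100

acc = accuracy(y_true, constant_preds)
auc = roc_auc(y_true, constant_scores)
print(acc, auc)  # 0.95 0.5
```

The degenerate model looks strong on accuracy but is no better than chance on AUC, which is why accuracy is a poor selection criterion here.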

Exam trap

The trap here is that candidates often choose Accuracy because it is the most intuitive metric, failing to recognize that on imbalanced datasets it can be misleadingly high and does not reflect minority class performance.

How to eliminate wrong answers

Option A is wrong because Accuracy measures the overall proportion of correct predictions, which on a 95% negative / 5% positive dataset will be high even if the model predicts all negatives (trivial 95% accuracy), failing to capture performance on the minority class. Option C is wrong because F1 summarizes precision and recall at a single decision threshold, and its micro- or support-weighted averages are dominated by the majority class; F1_score is also not among the classification primary metrics listed in the AutoML documentation (accuracy, AUC_weighted, average_precision_score_weighted, norm_macro_recall, precision_score_weighted). Option D is wrong because Log_loss (logarithmic loss) measures the cross-entropy between predicted probabilities and true labels, is sensitive to overall probability calibration, and does not inherently account for class imbalance; a model that assigns every instance a low positive-class probability can still achieve a low log loss because the majority class dominates the average.

147

MCQmedium

What is model monitoring in Azure Machine Learning and why is it important?

A. Checking how many API calls the model endpoint receives per hour

B. Tracking model prediction quality and data distribution changes in production to detect degradation

C. Monitoring the GPU memory usage during model training

D. Reviewing model architecture choices for optimization

Answer: B

Model monitoring detects data drift, prediction drift, and performance degradation — enabling timely retraining decisions.

Why this answer

Model monitoring in Azure Machine Learning is the continuous tracking of a deployed model's performance in production, focusing on prediction quality (e.g., accuracy, precision, recall) and data distribution shifts (data drift) to detect degradation over time. This is critical because models can become stale as real-world data evolves, leading to poor business decisions or compliance failures. Azure ML's Model Data Collector and monitoring dashboards automatically capture input data and predictions, alerting data scientists when drift or performance drops below defined thresholds.

Exam trap

The trap here is that candidates confuse operational metrics (like API call count or GPU usage) with model-specific performance monitoring, leading them to pick options that describe infrastructure monitoring rather than model quality tracking.

How to eliminate wrong answers

Option A is wrong because checking API call volume is a metric for endpoint usage or load, not model monitoring; it does not assess prediction quality or data drift. Option C is wrong because monitoring GPU memory during training is part of training infrastructure optimization, not production model monitoring. Option D is wrong because reviewing model architecture is a design-time activity, not a post-deployment monitoring task.

148

MCQmedium

What is 'model monitoring' in Azure Machine Learning after deployment?

A. Watching the training loss curve during model training to detect overfitting

B. Tracking deployed model performance and data drift over time to detect degradation

C. A dashboard showing the compute costs of running model inference in production

D. Monitoring the uptime and latency of the model serving endpoint

Answer: B

Model monitoring detects when production data drifts from training distributions — alerting to silent accuracy degradation requiring retraining.

Why this answer

Model monitoring in Azure Machine Learning refers to the ongoing process of tracking a deployed model's performance metrics (such as accuracy or precision) and detecting data drift (changes in input data distribution) or concept drift (changes in the relationship between inputs and outputs) over time. This is critical because models can degrade in production even if they performed well during training, due to shifts in real-world data. Azure ML provides built-in monitoring capabilities, including drift detection and alerting, to ensure models remain reliable.
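
To make data drift concrete, here is a minimal sketch of the Population Stability Index (PSI), one common statistic for comparing a training-time feature distribution with a production one. It is offered as an illustration under stated assumptions, not as the method Azure ML uses internally; the bin edges, sample values, and function name are hypothetical, and values outside the bin edges are simply ignored.

```python
import math

def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between two samples over shared bin edges."""
    def proportions(values):
        counts = [0] * (len(edges) - 1)
        for v in values:
            for i in range(len(edges) - 1):
                last = i == len(edges) - 2
                # Bins are [low, high); the final bin also includes its upper edge.
                if edges[i] <= v < edges[i + 1] or (last and v == edges[-1]):
                    counts[i] += 1
                    break
        total = sum(counts)
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / total, eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical values of one input feature at training time and in production.
training = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
production = [50, 52, 55, 58, 60, 61, 62, 63, 64, 65]
edges = [20, 35, 50, 65]

no_drift = psi(training, training, edges)
drift = psi(training, production, edges)
print(no_drift)      # 0.0
print(drift > 0.25)  # True
```

A frequently cited rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the alert threshold is a judgment call that depends on the feature and the application.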

Exam trap

The trap here is that candidates confuse infrastructure monitoring (uptime/latency) or cost tracking with model-specific monitoring (performance and drift), which is the core focus of 'model monitoring' in Azure ML.

How to eliminate wrong answers

Option A is wrong because watching the training loss curve during model training is part of training diagnostics, not post-deployment monitoring; it detects overfitting during training, not production degradation. Option C is wrong because a dashboard showing compute costs is a cost management feature, not model monitoring; it tracks resource usage, not model performance or data drift. Option D is wrong because monitoring endpoint uptime and latency is infrastructure monitoring (DevOps/MLOps concern), not model monitoring; it ensures availability but does not detect performance degradation or drift in the model's predictions.

149

MCQhard

What is 'neural architecture search' (NAS) and how does it relate to AutoML?

A. Searching the web for neural network architectures published in research papers

B. Automating the discovery of optimal neural network architectures using computational search

C. Querying a database of pre-built neural networks to find the closest match for a task

D. A legal search process for patenting new AI model architectures

Answer: B

NAS searches the space of possible architectures computationally, and can surface network designs that human experts would not reach by hand-tuning alone.

Why this answer

Neural Architecture Search (NAS) is an automated process that uses computational search methods—such as reinforcement learning, evolutionary algorithms, or gradient-based optimization—to discover optimal neural network architectures for a given task. It is a key component of AutoML because AutoML aims to automate the entire machine learning pipeline, including model selection and hyperparameter tuning, and NAS specifically automates the design of the neural network topology itself.

Exam trap

The trap here is that candidates confuse NAS with simply searching for existing models online or in a database, rather than understanding it as an automated, generative search process that creates new architectures.

How to eliminate wrong answers

Option A is wrong because NAS does not involve searching the web for published research papers; it is a computational search over a defined architecture space, not a web crawl. Option C is wrong because NAS does not query a static database of pre-built networks; it dynamically generates and evaluates candidate architectures during the search process. Option D is wrong because NAS is a technical optimization method, not a legal or patent-related search process.

150

MCQmedium

A bike-sharing company wants to predict the number of rentals per hour. Their model's predictions are usually close but occasionally have large errors due to unexpected events like sudden rain. They want a metric that heavily penalizes these large errors to ensure the model is not overly confident. Which evaluation metric should they primarily use?

A. Mean Absolute Error (MAE)

B. Mean Squared Error (MSE)

C. Classification Accuracy

D. R-squared

Answer: B

MSE squares each error, so large errors contribute disproportionately to the total. This aligns with the requirement to penalize large errors heavily.

Why this answer

Mean Squared Error (MSE) is the correct choice because it squares the residuals, which heavily penalizes large errors. Since the bike-sharing company wants to discourage occasional large prediction errors (e.g., due to sudden rain), MSE's quadratic penalty ensures that a model with even a few large misses receives a much worse score, so model selection favors models that avoid them.
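
A short sketch shows how much a single large miss dominates MSE compared with MAE. The hourly errors are hypothetical, with the last value standing in for a sudden-rain hour.

```python
# Hypothetical hourly rental errors (predicted - actual); the last hour had sudden rain.
errors = [2, -3, 1, -2, 40]

abs_total = sum(abs(e) for e in errors)
sq_total = sum(e ** 2 for e in errors)

mae = abs_total / len(errors)
mse = sq_total / len(errors)

# Fraction of each metric's total that comes from the one large error.
share_of_mae = abs(errors[-1]) / abs_total
share_of_mse = errors[-1] ** 2 / sq_total

print(mae, mse)                # MAE is 9.6, MSE is about 323.6
print(round(share_of_mae, 3))  # 0.833
print(round(share_of_mse, 3))  # 0.989
```

The rain hour accounts for about 83% of the absolute error but almost 99% of the squared error, so MSE is driven almost entirely by that one miss.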

Exam trap

The trap here is that candidates often choose MAE because it is simpler and more interpretable, but they miss the explicit requirement to 'heavily penalize large errors,' which only MSE (or RMSE) accomplishes through squaring.

How to eliminate wrong answers

Option A is wrong because Mean Absolute Error (MAE) uses absolute differences and does not disproportionately penalize large errors; it treats all errors linearly, so occasional large errors would not be heavily weighted. Option C is wrong because Classification Accuracy is a metric for classification tasks (e.g., predicting categories), not for regression tasks like predicting a continuous number of rentals per hour. Option D is wrong because R-squared measures the proportion of variance explained by the model relative to a mean-only baseline; it is a unitless goodness-of-fit score and not a direct measure of error magnitude, and it can look high even with a few large errors if the overall variance of the target is large.
