CCNA MLA Model Development Questions (Page 2 of 2)

Multi-Select (easy)

A company wants to use SageMaker Clarify to analyze bias in their training data and model predictions. Which TWO types of bias can Clarify detect? (Choose TWO.)

Select 2 answers

A. Algorithmic bias

B. Pre-training bias

C. Inference bias

D. Deployment bias

E. Post-training bias

Answers: B, E

Clarify detects bias both before training (in the dataset) and after training (in the model's predictions).

Why this answer

SageMaker Clarify can detect pre-training bias (in the data) and post-training bias (in the model predictions).
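Two of the metrics Clarify reports illustrate the split between the stages: Class Imbalance (CI) and Difference in Proportions of Labels (DPL) are computed on the dataset before training, and Difference in Positive Proportions in Predicted Labels (DPPL) is the post-training counterpart computed on the model's predictions. The following is a minimal sketch of those formulas, assuming facet a is the advantaged group, facet d the disadvantaged group, and 1 the favorable label; the data is made up, and this is not Clarify's own code.

```python
from typing import Sequence

def class_imbalance(n_a: int, n_d: int) -> float:
    """Pre-training CI = (n_a - n_d) / (n_a + n_d), using facet sizes only."""
    return (n_a - n_d) / (n_a + n_d)

def positive_rate(labels: Sequence[int]) -> float:
    """Fraction of rows with the favorable label (1)."""
    return sum(labels) / len(labels)

def dpl(labels_a: Sequence[int], labels_d: Sequence[int]) -> float:
    """Pre-training DPL = q_a - q_d on the observed labels."""
    return positive_rate(labels_a) - positive_rate(labels_d)

def dppl(preds_a: Sequence[int], preds_d: Sequence[int]) -> float:
    """Post-training DPPL = q'_a - q'_d on the model's predicted labels."""
    return positive_rate(preds_a) - positive_rate(preds_d)

# Hypothetical data: 1 = favorable outcome.
labels_a = [1, 1, 1, 0, 1, 0, 1, 1]  # facet a: 8 rows, 6 favorable
labels_d = [1, 0, 0, 0]              # facet d: 4 rows, 1 favorable
preds_a = [1, 1, 1, 1, 1, 0, 1, 1]   # model predictions for facet a
preds_d = [0, 0, 0, 1]               # model predictions for facet d

print(class_imbalance(len(labels_a), len(labels_d)))  # (8 - 4) / 12, about 0.333
print(dpl(labels_a, labels_d))                        # 0.75 - 0.25 = 0.5
print(dppl(preds_a, preds_d))                         # 0.875 - 0.25 = 0.625
```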


MCQ (medium)

A data scientist is using SageMaker Automatic Model Tuning to optimize hyperparameters for an XGBoost model. They want to maximize AUC. Which search strategy is MOST appropriate for efficient exploration?

A. Random search

B. Grid search

C. Bayesian optimization

D. Hyperband

Answer: C


MCQ (medium)

A company is training a large computer vision model using SageMaker. The training dataset is 500 GB and the model has 1 billion parameters. The team needs to minimize training time. Which distributed training strategy should they use?

A. Pipeline parallelism

B. Sharded data parallelism

C. Model parallelism

D. Data parallelism

Answer: C

Model parallelism partitions the model layers across GPUs, enabling training of large models that don't fit on one GPU.

Why this answer

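Model parallelism splits the model layers across multiple GPUs, which is necessary when the model is too large to fit on a single GPU. Data parallelism replicates the full model on each GPU and splits the data, so it is limited by the memory of a single GPU. This answer assumes that the 1-billion-parameter model, together with its gradients and optimizer state, does not fit in one GPU's memory; if it does fit, data parallelism is usually the faster way to process a 500 GB dataset.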
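Whether a 1-billion-parameter model fits on one GPU can be estimated with back-of-the-envelope arithmetic. The sketch below assumes mixed-precision training with the Adam optimizer (a common rule of thumb of 16 bytes per parameter for weights, gradients, and optimizer state) and reserves an arbitrary 20% of GPU memory for activations; real usage depends on batch size, input size, and framework overhead.

```python
def training_state_gb(num_params: int, bytes_per_param: int = 16) -> float:
    """Approximate GB for weights + gradients + Adam state, excluding activations.

    16 bytes/param assumes mixed precision with Adam: 2 (fp16 weights)
    + 2 (fp16 gradients) + 4 (fp32 master weights) + 8 (two fp32 Adam moments).
    """
    return num_params * bytes_per_param / 1e9

def fits_on_gpu(num_params: int, gpu_memory_gb: float,
                bytes_per_param: int = 16, activation_reserve: float = 0.2) -> bool:
    """True if the model state fits after reserving a fraction of memory for activations."""
    usable_gb = gpu_memory_gb * (1 - activation_reserve)
    return training_state_gb(num_params, bytes_per_param) <= usable_gb

one_billion = 1_000_000_000
print(training_state_gb(one_billion))  # 16.0
print(fits_on_gpu(one_billion, 16))    # False on a 16 GB GPU
print(fits_on_gpu(one_billion, 40))    # True on a 40 GB GPU
```

Under these assumptions the answer hinges on the hardware: the training state does not fit on a 16 GB GPU, but it does fit on a 40 GB one, where plain or sharded data parallelism becomes an option.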


Multi-Select (medium)

A company is training a large NLP model on SageMaker and wants to reduce costs by using Spot Instances. Which TWO configurations should they implement to handle Spot interruptions gracefully?

Select 2 answers

A. Use a single large instance to reduce interruption probability

B. Set `use_spot_instances=True` and `max_wait` in the estimator

C. Increase the `max_run` parameter to allow longer training

D. Use `keep_alive_period` to keep the instance alive after training

E. Enable checkpointing to save model state periodically

Answers: B, E

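Managed Spot Training waits for Spot capacity and restarts an interrupted job; checkpointing determines how much progress survives the restart.

Why this answer

Checkpointing saves progress so training can resume from the last checkpoint. Setting `use_spot_instances=True` (with `max_wait` greater than or equal to `max_run`) enables Managed Spot Training, which handles interruptions automatically. Using a single instance does not reduce the chance of interruption, and increasing `max_run` only extends the allowed training time. The related estimator setting `keep_alive_period_in_seconds` keeps warm-pool instances available for later training jobs; it does not help an interrupted job resume.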
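The resume logic that checkpointing enables looks roughly like this. SageMaker copies files written to the local checkpoint directory (by default `/opt/ml/checkpoints`) to the estimator's `checkpoint_s3_uri` and restores them when a Spot job restarts, but the training script is responsible for writing and reloading them. The sketch below is plain Python with a hypothetical JSON state file and a simulated interruption, not SageMaker SDK code.

```python
import json
import os
import tempfile
from typing import Optional

def train(total_steps: int, checkpoint_dir: str, interrupt_at: Optional[int] = None) -> int:
    """Toy training loop that resumes from the latest checkpoint when one exists.

    `interrupt_at` simulates a Spot reclaim by stopping right after that global step.
    Returns the last completed step.
    """
    ckpt_path = os.path.join(checkpoint_dir, "state.json")
    step, weight = 0, 0.0
    if os.path.exists(ckpt_path):  # resume instead of starting from scratch
        with open(ckpt_path) as f:
            state = json.load(f)
        step, weight = state["step"], state["weight"]
    while step < total_steps:
        weight += 0.1  # stand-in for one optimizer update
        step += 1
        # Write to a temp file and then rename it, so an interruption mid-write
        # cannot leave a corrupted checkpoint behind.
        tmp_path = ckpt_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"step": step, "weight": weight}, f)
        os.replace(tmp_path, ckpt_path)
        if interrupt_at is not None and step == interrupt_at:
            return step  # simulated interruption
    return step

# A temporary directory stands in for the local checkpoint path.
with tempfile.TemporaryDirectory() as ckpt_dir:
    first_run = train(10, ckpt_dir, interrupt_at=4)  # reclaimed after step 4
    second_run = train(10, ckpt_dir)                 # resumes at step 4, ends at 10
print(first_run, second_run)  # 4 10
```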


MCQ (medium)

A company is using SageMaker Automatic Model Tuning to optimize a regression model. They want to minimize the root mean squared error (RMSE). The tuner has completed 20 jobs, and the RMSE has plateaued. Which action should the data scientist take to potentially improve the results?

A. Increase the maximum number of training jobs

B. Increase the number of parallel training jobs

C. Decrease the range of hyperparameters to focus on promising areas

D. Switch the objective metric to mean absolute error (MAE)

Answer: C

Narrowing the search space concentrates trials in regions that previously yielded lower RMSE, potentially finding better values.

Why this answer

Reducing the search space helps the tuner focus on more promising regions. Increasing the maximum number of jobs or the parallelism may keep sampling the same plateau, and switching the objective metric to MAE changes what is being optimized rather than lowering RMSE.


Multi-Select (medium)

A data scientist wants to fine-tune a Llama 2 7B model using SageMaker for a text summarization task. The dataset is 10 GB. The budget is limited, so cost efficiency is important. Which THREE steps should the data scientist take? (Choose THREE.)

Select 3 answers

A. Use SageMaker Debugger to reduce training time

B. Use the SageMaker built-in BlazingText algorithm

C. Use LoRA to reduce the number of trainable parameters

D. Use managed spot training

E. Use the SageMaker HuggingFace estimator

Answers: C, D, E

LoRA enables efficient fine-tuning with much lower memory requirements.

Why this answer

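LoRA reduces the number of trainable parameters, enabling fine-tuning on smaller, cheaper instances. The Hugging Face estimator is the standard way to train Hugging Face models on SageMaker. Managed Spot Training reduces compute cost.

SageMaker Debugger helps diagnose training problems but does not by itself reduce training time or cost, and BlazingText (Word2Vec and text classification) cannot fine-tune an LLM.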
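The parameter savings from LoRA follow from simple arithmetic: for a d × k weight matrix, LoRA trains two matrices B (d × r) and A (r × k) instead of the full update, so the trainable count drops from d·k to r(d + k). The sketch assumes a 4096 × 4096 projection (the hidden size of a 7B-class model such as Llama 2 7B) and rank r = 8; both are illustrative choices.

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Trainable parameters when the update to a d x k weight is B (d x r) @ A (r x k)."""
    return r * (d + k)

d = k = 4096                             # assumed projection size for a 7B-class model
full = d * k                             # trainable weights under full fine-tuning
lora = lora_trainable_params(d, k, r=8)  # trainable weights with a rank-8 adapter
print(full, lora, f"{lora / full:.2%}")  # 16777216 65536 0.39%
```

Repeated over every adapted projection, this is why the adapters and their optimizer state fit on a much smaller GPU than full fine-tuning needs.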


Multi-Select (medium)

A data scientist is using SageMaker Experiments to track multiple training runs for a PyTorch model. They want to compare metrics across runs and identify the best hyperparameters. Which TWO capabilities should they use? (Choose TWO.)

Select 2 answers

A. SageMaker Experiments list and search API to query runs by metric

B. SageMaker SDK's experiment logging capabilities

C. SageMaker Autopilot

D. SageMaker Clarify

E. SageMaker Model Monitor

Answers: A, B

The list and search API allows filtering and comparing runs based on metrics.

Why this answer

SageMaker Experiments tracks hyperparameters and metrics for each run, the SDK's logging capabilities record custom parameters and metrics, and the Experiments list and search capabilities compare runs.

Autopilot is for AutoML, not for custom PyTorch training, Clarify is for bias detection and explainability, and Model Monitor is for deployed models.


Multi-Select (medium)

A team is using SageMaker Automatic Model Tuning to optimize hyperparameters for an XGBoost model. They want to find the best configuration as quickly as possible, with a maximum of 50 training jobs. Which TWO strategies should they choose? (Choose TWO.)

Select 2 answers

A. Use the same objective metric but with different strategies

B. Use Hyperband with early stopping

C. Use random search

D. Use grid search

E. Use Bayesian optimization

Answers: B, E

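Hyperband allocates resources to promising configurations and stops poor ones early, which makes it efficient when many jobs are launched.

Why this answer

Bayesian optimization uses the results of completed jobs to choose the next hyperparameters, so it typically needs fewer jobs than random search. Hyperband can be faster still, with the trade-off that a configuration that starts slowly may be stopped before it would have caught up. Random search ignores earlier results and is less efficient.

Grid search exhaustively evaluates combinations, which is impractical within a budget of 50 jobs.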


MCQ (easy)

A data scientist wants to train a binary classification model using Amazon SageMaker. The dataset has 10,000 rows and 50 features. Which SageMaker built-in algorithm is MOST appropriate for this task?

A. XGBoost

B. DeepAR

C. K-Means

D. Linear Learner

Answer: A

XGBoost is a gradient boosting algorithm that works well for classification and regression on tabular data.

Why this answer

XGBoost handles non-linear relationships in tabular data well for both classification and regression. Linear Learner also supports binary classification but fits only a linear decision boundary, K-Means is for clustering, and DeepAR is for time series forecasting.


MCQ (easy)

A data scientist is using the SageMaker built-in XGBoost algorithm for a regression problem. Which metric is most appropriate as the objective metric for hyperparameter tuning?

A. NDCG

B. RMSE

C. AUC

D. F1

Answer: B

RMSE is appropriate for regression tasks.

Why this answer

For regression tasks, RMSE is a common objective metric. AUC is for classification, F1 is for classification, and NDCG is for ranking.


Multi-Select (easy)

A company uses SageMaker Autopilot to build a regression model predicting house prices. After the experiment completes, the company wants to understand why the model makes certain predictions. Which TWO SageMaker features can provide this explainability? (Choose TWO.)

Select 2 answers

A. SageMaker Clarify

B. SageMaker Autopilot explainability report

C. SageMaker Model Monitor

D. SageMaker Debugger

E. SageMaker Experiments

Answers: A, B

Clarify provides feature importance and SHAP values for model explainability.

Why this answer

SageMaker Autopilot automatically generates an explainability report for its best candidate, and SageMaker Clarify can be run separately for additional analysis. Model Monitor is for drift detection on deployed models, not explainability.

Debugger is for debugging training jobs, and Experiments is for tracking runs.
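The SHAP values behind Clarify's feature-importance reports are Shapley values: each feature's average marginal contribution to a prediction over all subsets of the other features. Clarify approximates them with Kernel SHAP; the sketch below computes them exactly by enumeration, which is only practical for a handful of features. It assumes that a feature outside a subset is replaced by a baseline value, and the two models are toy functions.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence

def shapley_values(f: Callable[[list], float], x: Sequence[float],
                   baseline: Sequence[float]) -> list:
    """Exact Shapley values for prediction f(x); absent features take baseline values."""
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0..n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def linear(z):
    return 3 * z[0] - 2 * z[1] + 0.5 * z[2] + 1

def interaction(z):
    return z[0] * z[1]

print(shapley_values(linear, [2, 1, 4], [0, 0, 0]))  # about [6.0, -2.0, 2.0]
print(shapley_values(interaction, [1, 1], [0, 0]))   # [0.5, 0.5]: credit is split
```

The values always sum to f(x) minus f(baseline), which is what makes them readable as a breakdown of a single prediction.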


MCQ (medium)

A company is using SageMaker to train a model for image classification. The training dataset contains 100,000 labeled images. The team wants to use a pre-trained model to reduce training time. Which SageMaker feature should they use?

A. SageMaker Debugger

B. SageMaker Model Monitor

C. SageMaker built-in Image Classification algorithm

D. SageMaker JumpStart

Answer: D

JumpStart offers pre-trained models for transfer learning.

Why this answer

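SageMaker JumpStart provides pre-trained models that can be fine-tuned on custom datasets, reducing training time and data requirements. The built-in Image Classification algorithm can also start from pre-trained weights (`use_pretrained_model`), but JumpStart is the feature built around selecting and fine-tuning pre-trained models.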


Multi-Select (hard)

An ML engineer is fine-tuning a foundation model using RLHF on SageMaker. Which THREE components are essential for this workflow? (Select THREE.)

Select 3 answers

A. A reward model trained on the preference data

B. A large validation dataset for final evaluation

C. The PPO (Proximal Policy Optimization) algorithm for model updates

D. A preference dataset with human rankings

E. A PEFT technique like LoRA

Answers: A, C, D

The reward model scores outputs for the PPO algorithm.

Why this answer

RLHF requires a preference dataset of human rankings, a reward model trained on that data, and the PPO algorithm to update the foundation model (the base foundation model itself is also required). A PEFT technique such as LoRA is commonly used to make the fine-tuning more efficient, but it is not essential to RLHF.

A validation dataset is useful but is not specific to RLHF.
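The two training signals in this workflow can be sketched as scalar functions. The reward model is commonly trained with a pairwise loss, -log σ(r_chosen - r_rejected), that pushes the preferred response's score above the rejected one's, and PPO updates the policy with a clipped surrogate objective that limits how far each update moves from the previous policy. These are the standard textbook forms, not SageMaker APIs, and the clip range of 0.2 is an assumed default.

```python
import math

def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def reward_pair_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise reward-model loss: -log(sigmoid(r_chosen - r_rejected))."""
    return softplus(-(r_chosen - r_rejected))

def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample PPO surrogate to maximize: min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A).

    `ratio` is pi_new(action) / pi_old(action) for a sampled action.
    """
    clipped = min(max(ratio, 1 - eps), 1 + eps)
    return min(ratio * advantage, clipped * advantage)

print(reward_pair_loss(2.0, 0.0) < reward_pair_loss(0.0, 2.0))  # True: correct ranking costs less
print(ppo_clip_objective(1.5, advantage=1.0))   # about 1.2: the gain is capped
print(ppo_clip_objective(0.5, advantage=-1.0))  # about -0.8: the pessimistic value is kept
```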


MCQ (easy)

A team wants to fine-tune a pre-trained Hugging Face transformer model for text classification using SageMaker. They have a custom training script. Which SageMaker estimator should they use?

A. SageMaker generic estimator with a custom container

B. SageMaker Hugging Face estimator

C. SageMaker PyTorch estimator

D. SageMaker TensorFlow estimator

Answer: B

The Hugging Face estimator is specifically designed for Hugging Face models, managing the Transformers library and tokenizers.

Why this answer

The Hugging Face estimator is the recommended way to run Hugging Face models on SageMaker, as it automatically handles the environment and dependencies.


MCQ (easy)

A data scientist is using the SageMaker built-in XGBoost algorithm for a binary classification task. Which objective metric is MOST appropriate for SageMaker Automatic Model Tuning to maximize?

A. validation:mae

B. validation:rmse

C. validation:ndcg

D. validation:auc

Answer: D

AUC is a common binary classification metric and is available in XGBoost.


MCQ (hard)

A team is fine-tuning a foundation model using LoRA for a text summarization task. They want to reduce memory footprint during training. Which technique should they combine with LoRA?

A. Data parallelism

B. Gradient checkpointing

C. Mixed precision

D. QLoRA

Answer: D


Multi-Select (hard)

A data scientist is using SageMaker Experiments to track multiple training runs. They want to compare runs based on the objective metric and visualize performance. Which THREE steps should they perform? (Choose THREE.)

Select 3 answers

A. Deploy the best model to an endpoint

B. Use SageMaker Studio Experiments UI to list and compare trials

C. Log hyperparameters and metrics using the SageMaker SDK

D. Create a SageMaker Experiment

E. Enable SageMaker Model Monitor for each run

Answers: B, C, D

The UI provides visualization and comparison.

Why this answer

To track and compare runs, you create an experiment, log parameters and metrics, and then use the Experiments UI or SDK to list and compare trials.


MCQ (easy)

A data scientist wants to quickly build a binary classification model without writing any code. Which SageMaker feature is MOST suitable?

A. SageMaker Debugger

B. SageMaker Model Monitor

C. SageMaker Ground Truth

D. SageMaker Autopilot

Answer: D


MCQ (hard)

A machine learning engineer is using SageMaker Debugger to detect if a neural network has dead ReLU units during training. Which built-in rule should they enable?

A. DeadRelu

B. Overfit

C. ExplodingGradients

D. LossNotDecreasing

Answer: A

The DeadRelu rule specifically detects dead ReLU units.

Why this answer

The 'DeadRelu' rule in Debugger monitors the fraction of ReLU activations that are zero and alerts if too many neurons are dead.
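The quantity behind a dead-ReLU check is the fraction of units that output zero for every input in a batch: a unit whose pre-activation is always negative receives no gradient and stops learning. The sketch computes that fraction for hypothetical pre-activation values; Debugger's DeadRelu rule works on tensors saved during training, and its thresholds are configured through the rule rather than in code like this.

```python
def dead_relu_fraction(pre_activations: list) -> float:
    """Fraction of ReLU units that output zero for every sample in the batch.

    pre_activations[s][u] is the input to ReLU unit u for sample s.
    """
    num_units = len(pre_activations[0])
    dead = sum(
        1 for u in range(num_units)
        if all(max(0.0, sample[u]) == 0.0 for sample in pre_activations)
    )
    return dead / num_units

# Hypothetical pre-activations: 3 samples x 4 units.
batch = [
    [0.5, -1.0, -0.2, 2.0],
    [1.5, -3.0, 0.1, -1.0],
    [-0.5, -0.7, -0.4, 0.3],
]
print(dead_relu_fraction(batch))  # 0.25: only unit 1 never activates
```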


MCQ (medium)

A company is fine-tuning a large language model using LoRA with a Hugging Face estimator in SageMaker. They want to reduce memory usage during training. Which instance type is most cost-effective for this workload?

A. ml.p4d.24xlarge

B. ml.g5.xlarge

C. ml.c5.2xlarge

D. ml.trn1.2xlarge

Answer: B

G5 instances are cost-effective for fine-tuning with LoRA, providing good performance at lower cost.

Why this answer

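LoRA reduces the number of trainable parameters, so fine-tuning fits on smaller GPUs. An ml.g5.xlarge provides one NVIDIA A10G GPU with 24 GB of memory at a much lower hourly price than an ml.p4d.24xlarge (eight NVIDIA A100 GPUs). An ml.c5.2xlarge has no GPU, and ml.trn1 instances use AWS Trainium chips, which require the AWS Neuron SDK.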


MCQ (easy)

Which SageMaker feature automatically generates model cards, feature importance, and bias reports without requiring manual coding?

A. SageMaker Autopilot

B. SageMaker Experiments

C. SageMaker Clarify

D. SageMaker Model Monitor

Answer: A

Autopilot automatically produces its reports, such as feature importance, as part of the AutoML workflow without manual coding.

Why this answer

SageMaker Clarify computes bias metrics and feature importance, but you must configure and run a Clarify job yourself. SageMaker Experiments tracks experiments.

SageMaker Model Monitor is for monitoring deployed models. Autopilot is the correct answer because it automates the entire pipeline, including its reports and explanations.


MCQ (medium)

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A. Use a larger foundation model with a longer context window and paste all documents into each prompt

B. Fine-tune a base LLM on the policy documents monthly

C. Train a custom model from scratch on the policy documents each month

D. Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

Answer: D

RAG retrieves relevant document chunks at query time, so the chatbot answers from the latest documents once they are re-indexed, without any model retraining.

Why this answer

RAG (Retrieval-Augmented Generation) allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining. The other options either require expensive retraining for each update or lack document grounding.
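The retrieval step of RAG can be sketched without any AWS service: embed the question, rank the indexed chunks by cosine similarity, and place the top matches in the prompt. The sketch substitutes a bag-of-words count vector for a real embedding model and a Python list for the vector store; the policy text is hypothetical.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase word counts (a real system calls an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list, k: int = 1) -> list:
    """Return the k chunks most similar to the query (the 'vector store' is a list)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query: str, chunks: list, k: int = 1) -> str:
    context = "\n".join(retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical policy chunks; when a document changes, only its chunks are re-indexed.
policy_chunks = [
    "Employees may carry over up to five vacation days into the next year.",
    "Expense reports must be submitted within 30 days of purchase.",
    "Remote work requires manager approval every quarter.",
]
print(build_prompt("How many vacation days can I carry over?", policy_chunks))
```

Updating the knowledge base means re-embedding changed chunks; the model weights never change.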


MCQ (hard)

A data scientist is training a model using SageMaker and wants to use spot instances to reduce costs. The training job is checkpointed every 5 minutes. However, the job gets interrupted frequently and never completes. What is the MOST likely cause?

A. The checkpoint interval is too long relative to the interruption frequency

B. The checkpoint S3 URI is incorrect

C. The instance type is too small for the training job

D. The job is configured with too few max retries

Answer: A

If interruptions occur more often than checkpoints are written, progress is lost and the job may never complete.

Why this answer

Spot Instances can be reclaimed with a two-minute warning. If the time between interruptions is shorter than the checkpoint interval, each run is interrupted before it saves new progress, so every restart resumes from the same point and the job never finishes. Using a smaller instance type reduces cost but not the interruption frequency.

An incorrect checkpoint path causes checkpoint saves to fail. Too few maximum retries would stop the job after a few interruptions rather than leave it restarting without completing.
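A deterministic toy model makes the failure mode concrete. It assumes, for illustration only, that the instance is reclaimed after the same number of minutes of runtime every time and that checkpoints are written at fixed intervals from the start of each run; real Spot interruptions are irregular.

```python
def spot_job_finishes(total_min: int, interrupt_every_min: int,
                      checkpoint_every_min: int, max_restarts: int = 100) -> bool:
    """Return True if a job needing `total_min` minutes of work ever completes."""
    saved = 0  # minutes of progress persisted in the latest checkpoint
    for _ in range(max_restarts + 1):
        if total_min - saved <= interrupt_every_min:
            return True  # the remaining work finishes before the next reclaim
        # Only whole checkpoint intervals completed before the reclaim survive.
        saved += (interrupt_every_min // checkpoint_every_min) * checkpoint_every_min
    return False

print(spot_job_finishes(120, interrupt_every_min=30, checkpoint_every_min=5))  # True
print(spot_job_finishes(120, interrupt_every_min=4, checkpoint_every_min=5))   # False
```

With a reclaim every 4 minutes and a checkpoint every 5, no run ever reaches its first checkpoint, so saved progress stays at zero no matter how many restarts are allowed.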


Multi-Select (medium)

A machine learning engineer is using SageMaker Autopilot for AutoML. Which TWO outputs does Autopilot produce?

Select 2 answers

A. A hyperparameter tuning job summary

B. An ensemble of candidate models

C. A data labeling pipeline

D. A single optimal model

E. An explainability report

Answers: B, E


100

MCQ (medium)

A team is fine-tuning a Hugging Face BERT model for text classification using SageMaker. They want to use the Hugging Face estimator for convenience. Which parameter must be set to use a custom training script?

A. framework_version

B. instance_type

C. hyperparameters

D. entry_point

Answer: D

entry_point points to the custom training script.

Why this answer

The entry_point parameter specifies the path to the training script. The instance_type is for hardware selection. The hyperparameters dictionary passes parameters to the script.

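The framework_version parameter selects the framework version in estimators such as PyTorch; the Hugging Face estimator instead uses transformers_version together with pytorch_version or tensorflow_version.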


101

MCQ (medium)

A machine learning engineer is training a TensorFlow model using SageMaker with distributed training. They need to implement data parallelism across multiple GPUs. Which SageMaker feature should they use to distribute the training?

A. SageMaker Distributed Data Parallelism

B. SageMaker Automatic Model Tuning

C. SageMaker Debugger

D. SageMaker Model Parallelism

Answer: A

This library implements data parallelism for SageMaker training.

Why this answer

SageMaker's distributed data parallelism library handles splitting data across GPUs and synchronizing gradients, optimized for TensorFlow and PyTorch.
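Why splitting the data is valid can be shown in a few lines: with equal-sized shards and a mean loss, averaging the per-worker gradients reproduces the full-batch gradient, so every replica applies the same update. The sketch uses a one-parameter linear model and simulates the workers in a single process; the SageMaker library performs the averaging with an optimized all-reduce across GPUs instead.

```python
def mse_grad(w: float, xs: list, ys: list) -> float:
    """d/dw of mean squared error for the one-parameter model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def data_parallel_grad(w: float, xs: list, ys: list, num_workers: int) -> float:
    """Each worker computes a gradient on its own equal-sized shard; the gradients
    are then averaged, which is what the all-reduce step leaves on every GPU."""
    if len(xs) % num_workers != 0:
        raise ValueError("this sketch assumes equal-sized shards")
    shard = len(xs) // num_workers
    worker_grads = [
        mse_grad(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(num_workers)
    ]
    return sum(worker_grads) / num_workers

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]  # true slope is 2.0
w = 0.5
full_batch = mse_grad(w, xs, ys)
averaged = data_parallel_grad(w, xs, ys, num_workers=4)
print(full_batch, averaged)  # the two gradients match
```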


102

MCQ (medium)

A team is fine-tuning a large language model (LLM) using SageMaker and wants to reduce memory footprint during training. Which technique should they use?

A. Use LoRA (Low-Rank Adaptation) with fp32 precision

B. Use QLoRA (Quantized Low-Rank Adaptation) with 4-bit quantization

C. Use SageMaker Model Parallelism with tensor parallelism

D. Full fine-tuning on a p3.16xlarge instance

Answer: B

QLoRA uses 4-bit quantization to drastically lower memory usage while preserving performance.

Why this answer

QLoRA (Quantized Low-Rank Adaptation) combines 4-bit quantization with low-rank adapters, significantly reducing GPU memory usage while maintaining model quality.
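The memory saving from 4-bit quantization is easy to estimate for the frozen base weights. The sketch assumes a 7-billion-parameter model and counts weights only; it ignores quantization constants, the LoRA adapters and their optimizer state, and activations, so real usage is higher.

```python
BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "4-bit": 0.5}

def weight_memory_gb(num_params: int, precision: str) -> float:
    """Memory for the frozen base weights only, in decimal GB."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

for precision in BYTES_PER_PARAM:
    print(precision, weight_memory_gb(7_000_000_000, precision))
# fp32 28.0, fp16/bf16 14.0, int8 7.0, 4-bit 3.5
```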


103

MCQ (easy)

A data scientist is using SageMaker Automatic Model Tuning to find the best hyperparameters for a model. They want to reduce the total tuning time for a given number of training jobs. Which tuning strategy should they choose?

A. Hyperband

B. Grid search

C. Random search

D. Bayesian optimization

Answer: A

Hyperband uses early stopping to prune bad trials, reducing total tuning time for the same number of jobs.

Why this answer

Hyperband is an early stopping strategy that allocates resources to promising configurations and stops poor performers early, reducing total tuning time compared to random search or Bayesian optimization without early stopping.


104

Multi-Select (medium)

A data scientist is evaluating a binary classification model for loan default prediction. Which THREE metrics should they consider to thoroughly assess model performance, especially for imbalanced classes?

Select 3 answers

A. R²

B. Recall

C. RMSE

D. F1 score

E. AUC

Answers: B, D, E

Recall (true positive rate) is critical for default prediction to identify as many defaults as possible.

Why this answer

For imbalanced classification, accuracy can be misleading. AUC (area under the ROC curve) is robust to class imbalance, F1 balances precision and recall, and recall (true positive rate) is important for catching defaults. RMSE and R² are regression metrics.


105

MCQ (medium)

A data scientist is using SageMaker Automatic Model Tuning with Hyperband. They want to stop poorly performing trials early to save resources. Which strategy does Hyperband use?

A. Grid search

B. Random search

C. Successive Halving

D. Bayesian optimization

Answer: C

Hyperband uses Successive Halving to allocate more resources to promising trials.

Why this answer

Hyperband uses early stopping by allocating resources to promising configurations and stopping poorly performing ones. Bayesian optimization chooses configurations with an acquisition function, and random search does not stop trials early.

Grid search exhaustively evaluates all combinations.
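Successive Halving can be sketched in a few lines: train every configuration on a small budget, keep the best 1/eta, multiply the budget by eta, and repeat. Hyperband runs several such brackets with different starting budgets; the sketch shows a single bracket with hypothetical configurations and a simulated validation score that preserves their ranking, an assumption real training curves do not guarantee.

```python
def successive_halving(configs: dict, min_budget: int = 1, eta: int = 2):
    """Run one bracket of Successive Halving.

    `configs` maps a name to a hidden true quality (higher is better). The simulated
    validation score rises with budget but keeps the configurations' ranking.
    Returns (winner, total epochs spent).
    """
    def observed_score(quality: float, budget: int) -> float:
        return quality * budget / (budget + 1)

    survivors = list(configs)
    budget, spent = min_budget, 0
    while len(survivors) > 1:
        scores = {name: observed_score(configs[name], budget) for name in survivors}
        spent += budget * len(survivors)      # every survivor trains for `budget` epochs
        keep = max(1, len(survivors) // eta)  # keep the top 1/eta
        survivors = sorted(survivors, key=scores.get, reverse=True)[:keep]
        budget *= eta                         # survivors get eta times more budget
    return survivors[0], spent

# Hypothetical configurations with made-up true qualities.
configs = {"a": 0.70, "b": 0.82, "c": 0.75, "d": 0.60,
           "e": 0.90, "f": 0.65, "g": 0.78, "h": 0.55}
print(successive_halving(configs))  # ('e', 24)
```

Under these assumptions, the eight configurations cost 24 epochs in total, against 32 epochs to train all eight for the 4 epochs the finalists received.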


106

Multi-Select (medium)

A data scientist is training a large language model using SageMaker and wants to reduce training costs. The training job is expected to run for several days. Which TWO actions should the data scientist take to minimize costs? (Choose TWO.)

Select 2 answers

A. Enable managed spot training

B. Use the most powerful GPU instance available to finish faster

C. Select ml.g5.xlarge instead of ml.p3.2xlarge

D. Increase the number of instances to reduce time

E. Disable checkpointing to save storage costs

Answers: A, C

Spot instances are much cheaper than on-demand; SageMaker managed spot automatically handles interruptions.

Why this answer

Using Spot Instances can save up to 90% compared with On-Demand pricing, and Managed Spot Training handles interruptions automatically. Choosing a cheaper instance type, such as ml.g5.xlarge instead of ml.p3.2xlarge, also reduces cost.

Using the most powerful GPU instance or adding instances raises the hourly cost and does not necessarily reduce the total cost; select the cheapest instance that meets the requirements. Checkpointing is needed for Spot resilience, so it should not be disabled.
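The savings figure for a Managed Spot Training job is derived from two fields of the training job description, `TrainingTimeInSeconds` and `BillableTimeInSeconds`, as (1 - billable / training) × 100. The sketch below reproduces that arithmetic with hypothetical numbers; fetching the real values (for example with boto3's `describe_training_job`) is left out so the example runs offline.

```python
def spot_savings_pct(training_seconds: float, billable_seconds: float) -> float:
    """Managed Spot Training savings: (1 - billable time / training time) * 100."""
    return (1 - billable_seconds / training_seconds) * 100

# Hypothetical values shaped like the DescribeTrainingJob response fields.
job = {"TrainingTimeInSeconds": 3600, "BillableTimeInSeconds": 1080}
savings = spot_savings_pct(job["TrainingTimeInSeconds"], job["BillableTimeInSeconds"])
print(round(savings, 1))  # 70.0
```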


107

MCQ (easy)

A data scientist uses SageMaker Experiments to track hyperparameters and metrics. Which component is used to organize related trials?

A. Experiment

B. Artifact

C. Trial component

D. Trial

Answer: A

An experiment contains multiple trials (runs) that share a common goal.


108

MCQ (medium)

A company wants to use SageMaker Autopilot for a regression problem. They require an explainability report that shows feature importance globally. Which Autopilot feature should they enable?

A. AutoML candidate generation

B. Ensembling mode

C. Hyperparameter optimization

D. Explainability report generation

Answer: D

Autopilot can generate explainability reports including global feature importance.


109

MCQ (easy)

A machine learning engineer needs to reduce costs when training a large model on SageMaker. They are willing to accept potential interruptions and have checkpointing enabled. Which instance purchasing option should they use?

A. Spot instances

B. Reserved instances

C. Dedicated hosts

D. On-demand instances

Answer: A

Spot instances offer large discounts but can be interrupted; with checkpointing, training can resume, saving costs.

Why this answer

Spot Instances offer significant cost savings (up to 90%) compared with On-Demand, but AWS can reclaim them with a two-minute warning. Checkpointing allows training to resume from the last saved state, which makes Spot Instances suitable.


110

Multi-Select (hard)

A machine learning engineer is evaluating a binary classification model that predicts customer churn. The model achieves 95% accuracy, but the engineer suspects class imbalance is causing a misleading metric. Which THREE evaluation steps should the engineer perform to properly assess the model? (Choose THREE.)

Select 3 answers

A. Calculate RMSE

B. Calculate precision, recall, and F1-score

C. Compute Mean Absolute Error (MAE)

D. Plot the ROC curve and compute AUC

E. Compute the confusion matrix

Answers: B, D, E

Precision, recall, and F1 are class-imbalance-aware metrics.

Why this answer

Accuracy is misleading for imbalanced datasets. Confusion matrix, precision/recall/F1, and AUC-ROC are robust metrics. RMSE is for regression.

MAE is also for regression.
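The accuracy trap in this question is easy to reproduce: on a hypothetical dataset where 5% of customers churn, a model that never predicts churn reaches 95% accuracy while its recall and F1 are zero. The sketch computes the confusion matrix and the derived metrics in plain Python; it reports precision, recall, and F1 as 0.0 when their denominator is zero, a convention chosen here.

```python
def classification_metrics(y_true: list, y_pred: list) -> dict:
    """Confusion-matrix counts plus accuracy, precision, recall, and F1 (positive class = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels: 5 churners (1) among 100 customers.
y_true = [1] * 5 + [0] * 95
always_stay = [0] * 100                            # predicts "no churn" for everyone
flags_some = [1, 1, 1, 1, 0] + [1] * 5 + [0] * 90  # catches 4 churners, 5 false alarms

print(classification_metrics(y_true, always_stay))  # accuracy 0.95, recall 0.0, f1 0.0
print(classification_metrics(y_true, flags_some))   # accuracy 0.94, recall 0.8, f1 about 0.571
```

The second model has lower accuracy but is far more useful, which is exactly what the confusion matrix and F1 reveal and accuracy hides.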


111

MCQ (medium)

A company uses SageMaker Experiments to track training runs. They want to compare different hyperparameter configurations and identify the best run. Which SageMaker Experiments component should they use to organize related runs?

A. Trial

B. Experiment + Trial

C. Experiment

D. Trial Component

Answer: B


112

MCQ (medium)

A data scientist is training an object detection model using the SageMaker built-in Object Detection algorithm. They want to visualize the bounding boxes on validation images after training. Which approach should they use?

A. Use SageMaker Debugger to capture output tensors

B. Write a custom inference script that saves images with bounding boxes

C. Enable SageMaker Model Monitor

D. Use SageMaker Clarify

Answer: B

A custom script can run inference and save annotated images.


113

Multi-Select (medium)

A data scientist wants to bring a custom PyTorch model to SageMaker. Which THREE methods are valid?

Select 3 answers

A. Use SageMaker Autopilot

B. Use the built-in Image Classification algorithm

C. Use Script mode with the PyTorch Estimator

D. Create a custom Docker container and use the BYOC framework

E. Use the PyTorch Estimator with a script

Answers: C, D, E


114

MCQ (hard)

An ML engineer is fine-tuning a large language model using LoRA on SageMaker. The training is converging slowly, and GPU utilization is low. The engineer suspects the bottleneck is data loading. Which action should the engineer take to improve GPU utilization?

A. Increase the batch size to maximize GPU memory usage

B. Enable checkpointing and use spot instances

C. Use SageMaker Pipe mode to stream data from S3 directly to the training instances

D. Reduce model parallelism to decrease communication overhead

Answer: C

Pipe mode reduces I/O latency by streaming data, which can improve GPU utilization.

Why this answer

Low GPU utilization during training is often caused by a data pipeline bottleneck. SageMaker Pipe mode streams data directly from S3, reducing I/O wait times. Increasing the batch size does not fix a data-loading bottleneck and can cause out-of-memory (OOM) errors.

Using Spot Instances and saving checkpoints helps with interruptions, not utilization. Reducing model parallelism may help if communication is the bottleneck, but the scenario points to data loading.


115

MCQ (medium)

A team is training a large language model and needs to split the model layers across multiple GPUs due to memory constraints. Which distributed training strategy should they use?

A. Data parallelism

B. Hyperparameter tuning

C. Autopilot

D. Model parallelism

Answer: D


116

MCQ (medium)

A data scientist is training a linear learner model using SageMaker and notices that the loss is not decreasing. They suspect the issue is exploding gradients. Which SageMaker Debugger rule should they enable to monitor this?

A. LossNotDecreasing

B. ExplodingGradients

C. VanishingGradients

D. Overfit

Answer: B

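This rule monitors for gradients that become excessively large.

Why this answer

The ExplodingGradients rule, as this question names it, flags gradients that explode during training, which is the correct detection for this problem. In the SageMaker Python SDK, the corresponding built-in rule is ExplodingTensor (`rule_configs.exploding_tensor()`), which flags tensors containing NaN or infinite values.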


117

MCQ (medium)

A data scientist is training an XGBoost model on a large dataset using a SageMaker Training Job. They want to minimize costs without sacrificing model performance. Which instance type and training strategy should they choose?

A. Use a single ml.g4dn.xlarge Spot instance with no distributed training

B. Use a single ml.m5.large On-Demand instance with model parallelism

C. Use multiple ml.trn1.2xlarge On-Demand instances with data parallelism

D. Use a single ml.p3.2xlarge On-Demand instance with data parallelism

Answer: A

Spot Instances drastically reduce cost, and a single instance avoids distributed-training overhead for XGBoost.

Why this answer

Managed Spot Training can reduce costs by up to 90% compared with On-Demand, and SageMaker handles interruptions automatically. XGBoost does not need model parallelism, and the other options all use On-Demand pricing. A single ml.g4dn.xlarge (one NVIDIA T4 GPU, usable with XGBoost's GPU training) provides sufficient compute for moderately sized datasets.


118

MCQ (medium)

A team is training a large deep learning model on SageMaker using a single ml.p3.16xlarge instance. Training is taking too long. They want to reduce time by distributing across multiple GPUs, but the model does not fit in a single GPU's memory. Which distributed training strategy should they use?

A. Data parallelism using SageMaker distributed data parallelism

B. Switch to a smaller instance type and use horizontal scaling

C. Use multiple training jobs with hyperparameter tuning

D. Model parallelism using SageMaker distributed model parallelism

Answer: D

Model parallelism partitions the model layers across GPUs, allowing training of models that exceed single GPU memory.

Why this answer

Model parallelism splits the model across multiple GPUs, which is needed when the model does not fit in a single GPU. Data parallelism replicates the model on each GPU and splits data, which requires the model to fit in each GPU's memory.


119

MCQ (easy)

Which SageMaker feature provides AutoML capabilities, including automatic data preprocessing, model selection, and hyperparameter tuning?

A. SageMaker Data Wrangler

B. SageMaker Automatic Model Tuning

C. SageMaker Autopilot

D. SageMaker Experiments

Answer: C

Autopilot automates the entire ML workflow.

Why this answer

SageMaker Autopilot automates the ML pipeline from data to model, including preprocessing, algorithm selection, and tuning.


120

MCQ (easy)

Which SageMaker built-in algorithm is designed for time series forecasting?

A. Linear Learner

B. Factorization Machines

C. DeepAR

D. BlazingText

Answer: C


121

Multi-Select (medium)

A data scientist is evaluating a binary classification model. They have the confusion matrix and want to assess the model's performance comprehensively. Which THREE metrics should they consider? (Select THREE.)

Select 3 answers

A. Precision

B. RMSE

C. Recall

D. F1 score

E. R²

Answers: A, C, D

Precision measures the accuracy of positive predictions.


122

MCQ (medium)

A machine learning engineer is training a model using SageMaker and wants to set up monitoring to detect if gradients become too large, which could destabilize training. Which SageMaker Debugger built-in rule should they enable?

A. DeadRelu

B. LossNotDecreasing

C. Overfit

D. ExplodingGradients

Answer: D

ExplodingGradients rule detects when gradients become too large.

Why this answer

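Debugger's exploding-gradient check (named ExplodingGradients in this question; in the SageMaker Python SDK the built-in rule is ExplodingTensor, `rule_configs.exploding_tensor()`) alerts when gradient tensors blow up, for example to NaN or infinite values, so training can be stopped or fixed before it destabilizes further.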
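The kind of check behind an exploding-gradient rule can be sketched in plain Python: flag gradients that contain NaN or infinite values, or whose global L2 norm exceeds a threshold. The threshold value and function name are assumptions for illustration; Debugger's built-in rules evaluate tensors saved during training rather than calling code like this.

```python
import math

def check_gradients(grads: list, max_norm: float = 1e3) -> list:
    """Return a list of problems found in a flat list of gradient values.

    Non-finite values (NaN or inf) are always flagged; otherwise the global
    L2 norm is compared against `max_norm`, an arbitrary threshold chosen here.
    """
    if any(not math.isfinite(g) for g in grads):
        return ["non-finite gradient values"]
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        return [f"gradient norm {norm:.1f} exceeds {max_norm}"]
    return []

print(check_gradients([0.01, -0.02, 0.03]))  # []
print(check_gradients([3000.0, 4000.0]))     # ['gradient norm 5000.0 exceeds 1000.0']
print(check_gradients([0.1, float("nan")]))  # ['non-finite gradient values']
```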


123

MCQ (medium)

A data scientist needs to evaluate a binary classification model. The dataset is highly imbalanced (5% positive class). Which metric is MOST appropriate for assessing model performance?

A. Precision

B. Accuracy

C. Recall

D. AUC

Answer: D

AUC measures ranking quality and is insensitive to class imbalance.

Why this answer

AUC (Area Under the ROC Curve) is robust to class imbalance as it evaluates the model's ability to rank positive vs negative examples. Precision, recall, and F1 can be misleading if not threshold-optimized.
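The ranking interpretation of AUC can be computed directly: it equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counting one half. The labels and scores below are hypothetical; the second call shows that duplicating every negative changes the class ratio but not the ranking, so AUC does not move.

```python
def roc_auc(y_true: list, scores: list) -> float:
    """Probability that a random positive is scored above a random negative
    (ties count one half). Pairwise O(P*N) version for small examples."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y = [1, 1, 0, 0, 0, 0]
s = [0.9, 0.4, 0.5, 0.3, 0.2, 0.1]
print(roc_auc(y, s))  # 0.875: 7 of the 8 positive/negative pairs are ranked correctly

# Twice as many negatives with the same scores: the class ratio changes, AUC does not.
y_more_neg = y + [0, 0, 0, 0]
s_more_neg = s + [0.5, 0.3, 0.2, 0.1]
print(roc_auc(y_more_neg, s_more_neg))  # 0.875
```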


124

MCQ (easy)

A data scientist wants to use SageMaker Autopilot to automatically build a regression model. The dataset contains 200 features and 50,000 rows. Which output does SageMaker Autopilot provide?

A. Only the best model without any metrics

B. A leaderboard of candidate models with metrics and explainability reports

C. A single optimal model with no further tuning

D. A Python script for manual training

Answer: B

Autopilot generates a leaderboard and can produce explainability reports.

Why this answer

SageMaker Autopilot automatically explores various algorithms and preprocessing steps, then provides a leaderboard of candidate models with metrics.


125

Multi-Select (medium)

A machine learning engineer is deploying a custom PyTorch model using SageMaker script mode. The training script requires specific dependencies not included in the default PyTorch container. Which TWO actions can the engineer take to ensure the dependencies are available? (Select TWO.)

Select 2 answers

A. Build a custom container that extends the SageMaker PyTorch container and push it to Amazon ECR

B. Include a requirements.txt file in the source directory

C. Use a lifecycle configuration to install dependencies

D. Specify a custom Docker image in the PyTorch estimator

E. Add the dependencies to the estimator's source_dir argument as a separate container

Answers: A, B

Extending the container is a valid approach for additional dependencies.

Why this answer

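Option A is correct: extending the SageMaker PyTorch container with a Dockerfile and pushing it to Amazon ECR makes the dependencies part of the image. Option B is correct: a requirements.txt file in source_dir is installed automatically when the training job starts. Option D is incomplete on its own: the estimator can take a custom image URI, but that image must first be built with the dependencies, which is what option A describes.

Option C is incorrect because lifecycle configurations apply to notebook instances and Studio, not training jobs. Option E is incorrect because source_dir holds code, not a container.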


126

MCQ (medium)

A data scientist is using SageMaker to train an XGBoost model for a regression problem. After training, they evaluate the model on a test set and get an RMSE of 10 and an R² of 0.85. Which additional metric would give the MOST insight into the model's average prediction error magnitude?

A. AUC

B. Confusion matrix

C. F1 score

D. Mean Absolute Error (MAE)

Answer: D

MAE provides the average absolute difference between predictions and actuals, directly indicating average error magnitude.

Why this answer

MAE (mean absolute error) gives the average absolute prediction error in the same units as the target, so it is easy to interpret. RMSE is also in the target's units but squares errors before averaging, so it weights large errors more heavily, and R² indicates the proportion of variance explained. MAE most directly answers the question about average error magnitude.
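The difference between MAE and RMSE shows up when errors are uneven. In the hypothetical predictions below, both sets have an RMSE of 2.0, but MAE separates consistent small errors from a single large miss.

```python
import math

def mae(y_true: list, y_pred: list) -> float:
    """Mean absolute error: the average size of the errors, in the target's units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true: list, y_pred: list) -> float:
    """Root mean squared error: errors are squared first, so large misses dominate."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

y_true = [100.0, 100.0, 100.0, 100.0]
even_errors = [102.0, 98.0, 102.0, 98.0]     # every prediction off by 2
one_big_miss = [100.0, 100.0, 100.0, 104.0]  # three exact, one off by 4

print(mae(y_true, even_errors), rmse(y_true, even_errors))    # 2.0 2.0
print(mae(y_true, one_big_miss), rmse(y_true, one_big_miss))  # 1.0 2.0
```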


127

MCQ (hard)

A company is training a large Transformer model on SageMaker and wants to use model parallelism to fit the model into memory. The model has 10 billion parameters. Which instance type is MOST cost-effective for this task while supporting SageMaker's model parallelism?

A. ml.trn1.32xlarge

B. ml.c5.18xlarge

C. ml.g4dn.12xlarge

D. ml.p4d.24xlarge

Answer: D

P4d instances have high GPU memory and support model parallelism for large models.

Why this answer

The ml.p4d.24xlarge instance provides eight NVIDIA A100 GPUs with 40 GB of memory each and is designed for large-scale distributed training, including SageMaker's model parallelism. ml.trn1 instances use AWS Trainium chips and require the AWS Neuron SDK. ml.g4dn instances are for inference and light training. ml.c5 instances are compute-optimized but have no GPUs.


128

MCQ (medium)

A data scientist wants to fine-tune a large language model for a question-answering task. They want to reduce memory usage during training by using a low-rank approximation of the weight updates. Which technique should they use?

A. Full fine-tuning

B. Instruction tuning

C. LoRA

D. RLHF

Answer: C

LoRA uses low-rank decomposition to update weights efficiently, reducing memory usage.

Why this answer

LoRA (Low-Rank Adaptation) adds low-rank matrices to model weights, significantly reducing memory footprint while achieving competitive performance. QLoRA adds quantization for further reduction.
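The low-rank update can be sketched with plain lists: the frozen weight W is left untouched, and the learned product B A, scaled by alpha / r, is added to its output. The matrices are tiny hypothetical values. Initializing B to zero, as the LoRA paper does, makes the adapter a no-op at the start of training, and the merge step shows why a trained adapter adds no inference latency once folded into W.

```python
def matvec(M: list, v: list) -> list:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W: list, A: list, B: list, x: list, alpha: float, r: int) -> list:
    """h = W x + (alpha / r) * B (A x). W (d x k) is frozen; only A (r x k) and B (d x r) train."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]

def merge(W: list, A: list, B: list, alpha: float, r: int) -> list:
    """Fold the adapter into the base weights: W' = W + (alpha / r) * B A."""
    scale = alpha / r
    return [[W[i][j] + scale * sum(B[i][t] * A[t][j] for t in range(r))
             for j in range(len(W[0]))] for i in range(len(W))]

W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # frozen 3 x 3 weight
A = [[1.0, 0.0, 0.0]]           # r = 1, so A is 1 x 3
B_init = [[0.0], [0.0], [0.0]]  # B starts at zero, so the adapter is a no-op
B = [[1.0], [2.0], [0.0]]       # B after some hypothetical training
x = [1.0, 1.0, 1.0]

print(lora_forward(W, A, B_init, x, alpha=2.0, r=1))  # [1.0, 1.0, 1.0]
print(lora_forward(W, A, B, x, alpha=2.0, r=1))       # [3.0, 5.0, 1.0]
print(matvec(merge(W, A, B, alpha=2.0, r=1), x))      # [3.0, 5.0, 1.0]
```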
