CCNA Mla Model Development Questions — Page 1 of 2

MCQmedium

A fraud detection model is being trained on imbalanced data. The team wants to ensure the model's precision is optimized. Which objective metric should be used in automatic model tuning?

A.F1

B.AUC

C.Precision

D.Recall

AnswerC

MCQmedium

A team is training a PyTorch model using SageMaker. They have a custom training script that requires specific Python packages not included in the SageMaker default PyTorch container. Which approach should they use?

A.Use the built-in PyTorch estimator and specify a requirements.txt in the source directory

B.Build a custom Docker container from scratch and push it to Amazon ECR

C.Use the SageMaker XGBoost estimator and modify the script to use PyTorch

D.Use SageMaker Autopilot to automatically handle dependencies

AnswerA

SageMaker automatically installs packages listed in requirements.txt in the source directory.

Why this answer

Using a SageMaker PyTorch estimator with a requirements.txt file allows installing additional packages on top of the official container. This is simpler than building a custom container.

MCQmedium

A data scientist is using SageMaker Autopilot to automatically build a binary classification model on a balanced dataset. They want to understand the relationship between the input features and the model predictions. Which feature in SageMaker Autopilot should they use?

A.Explainability reports

B.Data visualizations

C.Model tuning results

D.Model candidate definitions

AnswerA

Explainability reports provide feature importance and SHAP values, showing how features impact predictions.

Why this answer

SageMaker Autopilot generates explainability reports, including feature importance and model insights, via the 'Explainability' feature. This provides the relationship between features and predictions.

MCQmedium

A company uses SageMaker Clarify to detect bias during training. They want to ensure that the trained model does not rely on a sensitive attribute like gender. Which Clarify feature should they configure?

A.Clarify bias config with post-training bias metrics

B.Clarify with SageMaker Model Monitor

C.SHAP analysis

D.Clarify processing job with pre-training bias metrics

E.Bias report generation

AnswerA

Post-training bias metrics can be configured to check for bias in model predictions.

MCQmedium

A team is training a PyTorch model using SageMaker with a custom training script. They want to track hyperparameters and metrics across multiple experiments. Which service should they use?

A.SageMaker Clarify

B.SageMaker Experiments

C.SageMaker Model Monitor

D.SageMaker Debugger

AnswerB

Experiments is designed to track and compare machine learning runs.

Why this answer

SageMaker Experiments is the native service for tracking machine learning experiments, including hyperparameters and metrics. SageMaker Debugger is for debugging training jobs. SageMaker Model Monitor is for inference monitoring.

SageMaker Clarify is for bias analysis.

Multi-Selectmedium

A company is using SageMaker built-in XGBoost algorithm for a multiclass classification problem. They want to evaluate the model's performance. Which TWO metrics are appropriate for multiclass classification? (Select TWO.)

Select 2 answers

A.RMSE

B.Precision (macro or weighted)

C.Recall (macro or weighted)

D.NDCG (Normalized Discounted Cumulative Gain)

E.AUC (Area Under the ROC Curve)

AnswersB, C

Precision can be averaged across classes.

Why this answer

Option B (Precision) and Option C (Recall) apply to multiclass problems through macro or weighted averaging. Option A (RMSE) is a regression metric. Option E (AUC) is primarily a binary classification metric.

Option D (NDCG) is a ranking metric.
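
The macro and weighted averaging named in the options can be sketched in plain Python. This is a minimal illustration, not the code SageMaker or XGBoost runs; the function names and the sample labels are made up for the example.

```python
from collections import Counter

def per_class_precision(y_true, y_pred):
    """Return {class: precision}, treating each class as the positive class in turn."""
    classes = sorted(set(y_true) | set(y_pred))
    result = {}
    for c in classes:
        predicted_c = sum(1 for p in y_pred if p == c)
        correct_c = sum(1 for t, p in zip(y_true, y_pred) if p == c and t == c)
        result[c] = correct_c / predicted_c if predicted_c else 0.0
    return result

def macro_precision(y_true, y_pred):
    """Unweighted mean of per-class precision: every class counts equally."""
    scores = per_class_precision(y_true, y_pred)
    return sum(scores.values()) / len(scores)

def weighted_precision(y_true, y_pred):
    """Mean of per-class precision weighted by each class's share of the true labels."""
    scores = per_class_precision(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support.get(c, 0) for c in scores) / len(y_true)

# Hypothetical three-class labels, for illustration only.
y_true = ["a", "a", "a", "a", "b", "b", "c", "c"]
y_pred = ["a", "a", "a", "b", "b", "b", "c", "a"]

macro = macro_precision(y_true, y_pred)
weighted = weighted_precision(y_true, y_pred)
```

Macro averaging treats rare and frequent classes alike, while weighted averaging lets frequent classes dominate, so the two can diverge on imbalanced data.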

MCQhard

A company uses SageMaker Autopilot to build a binary classification model. The generated leaderboard shows an ensemble model as the best candidate. The team needs a model that can be deployed for real-time inference with latency < 10ms. What should they do?

A.Use SageMaker Inference Recommender to profile the ensemble model and optimize it

B.Deploy the ensemble model as a SageMaker endpoint; ensemble models are optimized for low latency

C.Retrain the ensemble model with fewer base estimators using a custom container

D.Select the best single model from the leaderboard (non-ensemble candidate) and deploy it

AnswerD

Single models usually have lower latency; evaluate if it meets the latency requirement.

Why this answer

Autopilot ensemble models may have high latency due to combining multiple models. The team should evaluate non-ensemble candidates (single model) from the leaderboard to meet latency requirements.

MCQhard

A company is using SageMaker to train a model with a custom container. The training script requires a specific version of a Python library that is not included in the default SageMaker containers. How should they provide this library?

A.Use SageMaker Script Mode and specify the library in a requirements.txt

B.Use SageMaker's lifecycle configuration to install the library on the training instance

C.Use pip install in the training script before model training

D.Extend a SageMaker framework container and install the library using a Dockerfile

AnswerD

Extending a SageMaker container via Dockerfile allows you to add the required library, then push to ECR.

Why this answer

Using a custom container (BYOC) allows bundling all dependencies, including specific library versions, into a Docker image that SageMaker can run.

MCQmedium

A data scientist is using SageMaker Autopilot for a regression problem. They want to see which data preprocessing steps Autopilot applied. Which source can they use to find this information?

A.Candidate definition notebook

B.Model leaderboard

C.Autopilot job description in AWS CloudTrail

D.Data exploration report

E.Explainability report

AnswerA

The candidate definition notebook includes code for all preprocessing steps.

Why this answer

The candidate definition notebook contains the generated code for data preprocessing and model training. The data exploration report includes statistics but not the exact preprocessing steps. The model leaderboard only shows metrics.

The explainability report shows feature importance. The Autopilot job description in CloudTrail shows API calls but not the steps.

MCQhard

A data scientist is using SageMaker built-in Image Classification algorithm on a dataset with 1000 classes. The training is very slow. They want to speed it up without sacrificing accuracy. Which instance type and training configuration is MOST appropriate?

A.Use ml.trn1.32xlarge instances with data parallelism

B.Use ml.m5.24xlarge instances with data parallelism

C.Use ml.c5.18xlarge instances with model parallelism

D.Use ml.p3.16xlarge instances with data parallelism

AnswerD

ml.p3.16xlarge has powerful GPUs and data parallelism can speed up training.

Why this answer

For image classification, GPU instances like ml.p3 or ml.g4dn are suitable. ml.p3.16xlarge provides 8 V100 GPUs. ml.m5 is CPU only. ml.c5 is CPU. ml.trn1 is for training, but for this built-in algorithm, GPU instances are standard.

MCQmedium

A team is training a large language model on SageMaker using PyTorch with data parallelism. The model is too large to fit on a single GPU. Which distributed training strategy should they use to split the model across multiple GPUs?

A.Model parallelism

B.Tensor parallelism

C.Data parallelism

D.Pipeline parallelism

AnswerA

Model parallelism partitions the model across GPUs, allowing training of models that exceed single GPU memory.

Why this answer

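Model parallelism splits the model itself across devices, which is necessary when the model is too large for one GPU. Tensor parallelism and pipeline parallelism are specific forms of model parallelism, so the general term is the best answer here. SageMaker's model parallelism library supports this.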

MCQmedium

A team is fine-tuning a foundation model using LoRA. They want to reduce memory usage during training. Which technique should they combine LoRA with to further reduce memory?

A.Instruction tuning

B.Pruning

C.RLHF

D.QLoRA

AnswerD

QLoRA quantizes the base model to 4-bit, reducing memory further.

Why this answer

QLoRA combines LoRA with quantization (e.g., 4-bit) to drastically reduce memory. Instruction tuning is a method, not a memory reduction technique. RLHF is a training process.

Pruning reduces model size but is not typically combined with LoRA in this context.

Multi-Selectmedium

A data scientist wants to use SageMaker Clarify to analyze bias during training of a binary classification model. Which TWO types of bias metrics can SageMaker Clarify compute? (Select TWO.)

Select 2 answers

A.Feature importance

B.Post-training bias metrics (e.g., Difference in Positive Proportions in Predicted Labels (DPPL), Accuracy Difference (AD))

C.SHAP values

D.Pre-training bias metrics (e.g., Class Imbalance, DPL)

E.Confusion matrix

AnswersB, D

Pre-training metrics are computed on the training data; post-training metrics are computed on model predictions.

Why this answer

SageMaker Clarify computes pre-training bias (e.g., class imbalance) and post-training bias (e.g., difference in positive proportions across groups).
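
As a rough sketch of what these metrics measure, the snippet below computes Class Imbalance (CI), Difference in Proportions of Labels (DPL), and its post-training counterpart on predicted labels (DPPL) for two groups. The formulas follow the commonly documented definitions, CI = (n_a - n_d) / (n_a + n_d) and DPL = q_a - q_d; the function names and the sample data are assumptions made for illustration, and this is not the Clarify implementation.

```python
def class_imbalance(n_a, n_d):
    """CI: normalized difference in group sizes, in the range [-1, 1]."""
    return (n_a - n_d) / (n_a + n_d)

def difference_in_proportions(outcomes_a, outcomes_d):
    """Difference in the share of positive (1) outcomes between two groups.

    With observed labels this is DPL (pre-training);
    with predicted labels it is DPPL (post-training).
    """
    q_a = sum(outcomes_a) / len(outcomes_a)
    q_d = sum(outcomes_d) / len(outcomes_d)
    return q_a - q_d

# Hypothetical data: group "a" has 6 rows, group "d" has 4 rows.
labels_a = [1, 1, 1, 0, 1, 0]   # observed labels for group a
labels_d = [1, 0, 0, 0]         # observed labels for group d
preds_a = [1, 1, 0, 0, 1, 0]    # model predictions for group a
preds_d = [0, 0, 0, 1]          # model predictions for group d

ci = class_imbalance(len(labels_a), len(labels_d))
dpl = difference_in_proportions(labels_a, labels_d)   # needs only the dataset
dppl = difference_in_proportions(preds_a, preds_d)    # needs a trained model
```

The pre-training values depend only on the dataset, which is why they can be checked before any model exists.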

Multi-Selecthard

A company is training a deep learning model for object detection using SageMaker. The training is very slow and the GPU memory is insufficient for the batch size. The team wants to scale across multiple GPUs efficiently. Which THREE actions should they take? (Choose THREE.)

Select 3 answers

A.Use SageMaker distributed model parallelism

B.Use SageMaker distributed data parallelism

C.Use managed spot instances

D.Use a SageMaker distributed training configuration with the SageMaker SDK

E.Enable SageMaker Debugger to identify bottlenecks

AnswersA, B, D

Model parallelism partitions the model across GPUs if the model is too large for one GPU.

Why this answer

Distributed data parallelism replicates the model and splits batches across GPUs. SageMaker distributed library optimizes this. Model parallelism splits the model when memory is insufficient.

Spot instances reduce cost but not speed or memory. Debugger does not speed up training.
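
The idea behind data parallelism (replicate the model, split the batch, average the gradients) can be shown with a toy one-parameter model. This is only a sketch under simplifying assumptions: equal-sized shards, a linear model y = w * x with mean squared error, and hypothetical function names. It does not use the SageMaker distributed library.

```python
def mse_gradient(w, batch):
    """Gradient of mean squared error for the model y = w * x over a batch of (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def data_parallel_gradient(w, batch, num_workers):
    """Split the batch into equal shards, compute a gradient per shard, then average.

    Assumes len(batch) is divisible by num_workers.
    """
    shard_size = len(batch) // num_workers
    shards = [batch[i * shard_size:(i + 1) * shard_size] for i in range(num_workers)]
    local_gradients = [mse_gradient(w, shard) for shard in shards]  # one per "GPU"
    return sum(local_gradients) / num_workers  # stands in for the all-reduce step

# Hypothetical batch of (x, y) pairs.
batch = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
w = 0.5

full_batch = mse_gradient(w, batch)
two_workers = data_parallel_gradient(w, batch, num_workers=2)
```

With equal shards the averaged gradient matches the full-batch gradient, so each worker only needs memory for its own slice of the batch.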

Multi-Selectmedium

A data scientist is using SageMaker to train a custom PyTorch model for image classification. They want to use SageMaker Debugger to detect training issues. Which TWO built-in rules are most relevant for detecting common training problems? (Select TWO.)

Select 2 answers

A.DataDistribution

B.Overfit

C.ExplodingGradients

D.ImageQuality

E.ConfusionMatrix

AnswersB, C

The Overfit rule detects overfitting by comparing training and validation loss.

Why this answer

ExplodingGradients detects gradients becoming too large, and Overfit detects when validation loss diverges from training loss. Both are common issues.
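
To make the overfitting signal concrete, here is a simplified check that flags a run when validation loss rises while training loss keeps falling for several consecutive evaluations. The function name, the `patience` parameter, and the loss values are hypothetical; this is not the logic of the built-in Debugger rule, only an illustration of the pattern it looks for.

```python
def looks_overfit(train_losses, val_losses, patience=3):
    """Return True if validation loss worsens while training loss improves
    for `patience` consecutive steps."""
    streak = 0
    for i in range(1, len(train_losses)):
        train_improved = train_losses[i] < train_losses[i - 1]
        val_worsened = val_losses[i] > val_losses[i - 1]
        streak = streak + 1 if (train_improved and val_worsened) else 0
        if streak >= patience:
            return True
    return False

# Hypothetical loss curves.
train = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3]
val_diverging = [1.1, 0.9, 0.8, 0.85, 0.9, 0.95]   # starts rising after step 2
val_healthy = [1.1, 0.9, 0.8, 0.7, 0.6, 0.5]       # keeps falling

overfit_flag = looks_overfit(train, val_diverging)
healthy_flag = looks_overfit(train, val_healthy)
```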

MCQhard

A machine learning engineer is using Amazon SageMaker Debugger to monitor a training job for a deep neural network. They receive a rule alert indicating 'exploding gradients'. Which action should they take to address this issue?

A.Use a smaller batch size

B.Reduce the learning rate

C.Increase the number of layers to absorb gradients

D.Increase the learning rate

AnswerB

Reducing the learning rate decreases the size of weight updates, helping to prevent gradients from exploding.

Why this answer

Exploding gradients occur when gradients become too large, causing instability. Reducing the learning rate mitigates this. Increasing batch size can also help by smoothing gradients, but reducing learning rate is a direct solution.
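
A toy example shows why the learning rate matters. Minimizing f(w) = w**2 with plain gradient descent multiplies w by (1 - 2 * lr) on every step, so a large rate makes the updates grow instead of shrink. This is a simplified analogy, not a deep network, and the function name and values are chosen for illustration.

```python
def gradient_descent(lr, steps=20, w0=1.0):
    """Run plain gradient descent on f(w) = w**2 and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2 * w      # derivative of w**2
        w -= lr * grad    # each step scales w by (1 - 2 * lr)
    return w

diverged = gradient_descent(lr=1.1)    # |1 - 2.2| = 1.2 > 1, so |w| grows every step
converged = gradient_descent(lr=0.1)   # |1 - 0.2| = 0.8 < 1, so |w| shrinks toward 0
```

In the diverging case the gradient 2 * w grows along with w, which is the same runaway behavior the rule alert points to.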

MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Use a larger foundation model with a longer context window and paste all documents into each prompt

B.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

C.Fine-tune a base LLM on the policy documents monthly

D.Train a custom model from scratch on the policy documents each month

AnswerB

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

RAG (Retrieval-Augmented Generation) allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining. The other options either require expensive retraining for each update or lack document grounding.
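
The retrieve-then-prompt flow can be sketched without any model at all. The example below swaps a real embedding model and vector store for a simple bag-of-words cosine similarity, purely to keep it self-contained; the chunk texts, function names, and prompt wording are all hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, top_k=1):
    """Rank stored chunks by similarity to the query and return the best top_k."""
    q = Counter(tokenize(query))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:top_k]

def build_prompt(query, chunks):
    """Assemble the prompt that would be sent to the LLM (the call itself is omitted)."""
    context = "\n".join(retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical policy chunks; updating this list is all a monthly refresh requires.
policy_chunks = [
    "Employees may work remotely up to three days per week.",
    "Expense reports must be submitted within 30 days of purchase.",
    "Annual leave requests require manager approval two weeks in advance.",
]
query = "When must expense reports be submitted?"
prompt = build_prompt(query, policy_chunks)
```

Because the documents live in the index rather than in the model weights, replacing a chunk changes the answers immediately, with no retraining step.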

MCQmedium

A data scientist is using SageMaker Experiments to track multiple training runs. They want to compare different hyperparameter configurations and visualize the impact on model accuracy. What should they use to track hyperparameters?

A.SageMaker Debugger

B.SageMaker Autopilot

C.SageMaker Experiments

D.SageMaker Model Monitor

AnswerC

Experiments track hyperparameters, metrics, and artifacts for comparison.

Why this answer

SageMaker Experiments allows you to log hyperparameters as parameters. They can be viewed and compared across runs in the SageMaker Studio UI.

MCQhard

An ML team is using SageMaker Automatic Model Tuning to optimize hyperparameters for a neural network. They want to prioritize exploration of the hyperparameter space early in the tuning process. Which strategy should they choose?

A.Grid search

B.Bayesian optimization

C.Random search

D.Hyperband

AnswerB

Bayesian optimization uses a probabilistic model to guide search, balancing exploration and exploitation.

Why this answer

Bayesian optimization balances exploration and exploitation, but early in the process it tends to explore more. Random search explores uniformly without adaptation. Hyperband focuses on early stopping.

Grid search is exhaustive. Bayesian optimization is the best choice for systematic exploration.

MCQmedium

A company is using SageMaker Autopilot to automatically build a regression model on a dataset. They want to understand which features are most important for the model's predictions. Which feature of Autopilot can provide this insight?

A.Autopilot candidate definition notebook

B.Autopilot model leaderboard

C.Autopilot data exploration report

D.Autopilot explainability report

AnswerD

Explainability report provides feature importance and partial dependence plots for the best model.

Why this answer

SageMaker Autopilot can generate explainability reports that include feature importance, either through SHAP or other methods, depending on the model type.

MCQmedium

A data scientist wants to track hyperparameters, metrics, and artifacts for multiple training runs in SageMaker. They need to compare runs and identify the best performing model. Which SageMaker feature should they use?

A.SageMaker Model Monitor

B.SageMaker Debugger

C.SageMaker Autopilot

D.SageMaker Experiments

AnswerD

Experiments provides experiment management to log parameters, metrics, and artifacts and compare across runs.

Why this answer

SageMaker Experiments allows tracking and comparing runs, including hyperparameters, metrics, and artifacts.

MCQeasy

A company needs to perform time-series forecasting on historical sales data. Which SageMaker built-in algorithm is BEST suited for this task?

A.BlazingText

B.Linear Learner

C.XGBoost

D.DeepAR

AnswerD

DeepAR is a built-in algorithm for time-series forecasting.

MCQhard

During a SageMaker training job, the loss stops decreasing and the validation accuracy plateaus early. SageMaker Debugger rules are enabled. Which rule is MOST likely to identify this issue?

A.Weight distribution rule

B.Exploding gradients rule

C.Overfit rule

D.Dead relu rule

AnswerC

Overfit rule monitors validation vs training metrics to detect overfitting.

Why this answer

The overfit rule detects when validation accuracy plateaus or decreases while training accuracy continues to improve, which is a sign of overfitting. Exploding gradients detects gradient spikes, dead relu detects dead neurons, and weight distribution checks weight distributions but not directly overfitting.

MCQeasy

Which SageMaker built-in algorithm is designed for time series forecasting?

A.BlazingText

B.DeepAR

C.IP Insights

D.XGBoost

AnswerB

DeepAR is used for time series forecasting.

Why this answer

DeepAR is a built-in algorithm specifically for time series forecasting. BlazingText is for text, XGBoost is for tabular data, and IP Insights is for anomaly detection in IP traffic.

MCQeasy

Which SageMaker feature allows you to automatically tune hyperparameters using Bayesian optimization?

A.SageMaker Autopilot

B.SageMaker Experiments

C.SageMaker Debugger

D.SageMaker Automatic Model Tuning

AnswerD

AMT performs hyperparameter optimization.

Why this answer

SageMaker Automatic Model Tuning (AMT) supports Bayesian optimization, random search, and Hyperband. Debugger is for monitoring. Experiments is for tracking.

Autopilot is for AutoML.

MCQhard

A machine learning engineer is using SageMaker Automatic Model Tuning to optimize hyperparameters for a regression model. The objective metric is RMSE. The training job is costly, and the engineer wants to find a good configuration quickly. Which tuning strategy should they use?

A.Bayesian optimization

B.Hyperband

C.Random search

D.Grid search

AnswerA

Bayesian optimization uses past evaluations to inform future hyperparameter choices, balancing exploration and exploitation.

Why this answer

Bayesian optimization builds a probabilistic model of the objective function and selects hyperparameters to try next based on past results, making it more efficient than random search. Hyperband is a bandit-based approach that may be faster but can be less stable.

MCQmedium

A team is training a PyTorch model using SageMaker and wants to use their own custom training container with a specific PyTorch version. Which approach should they use?

A.Use the SageMaker built-in PyTorch estimator and set the framework_version

B.Use SageMaker Bring Your Own Container (BYOC) with a custom Docker image

C.Use SageMaker Script Mode with a PyTorch script

D.Use SageMaker Autopilot to automatically select the container

AnswerB

BYOC allows full control over the container, including custom PyTorch versions.

Why this answer

BYOC (Bring Your Own Container) allows teams to package their own environment, including custom PyTorch versions, into a Docker container and use it with SageMaker.

MCQeasy

A company wants to detect anomalies in login events from a large user base, focusing on unusual patterns that may indicate compromised accounts. Which SageMaker built-in algorithm is most suitable for this task?

A.IP Insights

B.K-Means

C.DeepAR

D.Factorization Machines

AnswerA

IP Insights uses a neural network to learn patterns in IP addresses and can identify anomalous login events.

Why this answer

IP Insights is designed for anomaly detection in IP address usage, learning typical login patterns and flagging unusual ones. The other algorithms are not specialized for this use case.

MCQmedium

A machine learning engineer runs a training job and notices the loss is NaN after a few steps. Which SageMaker Debugger rule can help identify this issue?

A.Overfit

B.Exploding gradients

C.Dead ReLU

D.Class imbalance

AnswerB

Multi-Selecthard

A data science team is using SageMaker Experiments to track hyperparameters and metrics for a model training project. They need to compare multiple trials and identify the best model. Which THREE actions are part of a typical workflow? (Select THREE.)

Select 3 answers

A.Log hyperparameters and metrics using the SageMaker SDK

B.Generate confusion matrices for each trial automatically

C.Use the SageMaker SDK to list trials and compare metrics

D.Create an experiment in SageMaker Experiments

E.Automatically deploy the best trial to an endpoint

AnswersA, C, D

Logging is essential for tracking trials.

Why this answer

Option D is correct: creating an experiment is the first step. Option A is correct: logging hyperparameters and metrics during training. Option C is correct: using the SDK to list and compare trials.

Option E is incorrect: SageMaker Experiments does not automatically deploy models. Option B is incorrect: SageMaker Experiments does not automatically generate confusion matrices; that must be done manually.

MCQeasy

A data scientist wants to train a binary classification model using Amazon SageMaker with a built-in algorithm that performs well on tabular data. Which algorithm should they choose?

A.Image Classification

B.DeepAR

C.XGBoost

D.BlazingText

AnswerC

XGBoost is a gradient boosting algorithm that works well for classification and regression on tabular data.

Why this answer

XGBoost is a popular built-in algorithm in SageMaker for classification and regression on tabular data. Linear Learner is also for tabular data but XGBoost often performs better for complex patterns.

MCQhard

A team is training a large model on SageMaker using the SageMaker distributed training library with model parallelism. They need to choose the most cost-effective instance type. Which instance family offers the best balance of performance and cost for large model training?

A.ml.g4dn

B.ml.p3

C.ml.trn1

D.ml.c5

AnswerC

Multi-Selectmedium

A machine learning engineer is preparing a training job on SageMaker with a custom Docker container. Which TWO actions are required to use the container with SageMaker? (Choose TWO.)

Select 2 answers

A.Push the container image to Amazon ECR

B.Use a SageMaker Estimator with image_uri parameter pointing to the ECR image

C.Upload the container image to Amazon S3

D.Enable SageMaker Debugger to monitor the custom container

E.Register the container in SageMaker Model Registry

AnswersA, B

ECR is the registry for Docker images used by SageMaker.

Why this answer

To use a custom container, you must push it to Amazon ECR and specify the registry path in the estimator. The container must also implement the SageMaker training contract (like /opt/ml), but that is part of building the image.
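
The registry path passed as image_uri follows the standard ECR naming pattern for commercial AWS Regions. The helper below only builds that string; the function name, account ID, Region, repository, and tag are placeholders chosen for illustration.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Build an ECR image URI of the form
    <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"

# Placeholder values, not a real account or repository.
image_uri = ecr_image_uri("123456789012", "us-east-1", "my-training-image", tag="v1")
# image_uri is the value an Estimator's image_uri parameter would point to.
```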

Multi-Selecthard

A team is fine-tuning a large language model using reinforcement learning from human feedback (RLHF) in SageMaker. Which THREE components are essential for the RLHF pipeline? (Select THREE.)

Select 3 answers

A.Policy network (the LLM being fine-tuned)

B.Value network

C.Feature store

D.Hyperparameter tuner

E.Reward model

AnswersA, B, E

The policy network generates responses and is updated during RL.

Multi-Selectmedium

A team wants to evaluate a binary classification model for credit risk. They need to understand the trade-off between false positives and false negatives. Which TWO metrics should they use? (Select TWO.)

Select 2 answers

A.Recall

B.Precision

C.NDCG

D.AUC-ROC

E.RMSE

AnswersA, B

Recall focuses on false negatives.

Why this answer

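Precision and recall are complementary: precision falls as false positives increase, and recall falls as false negatives increase. AUC-ROC summarizes ranking quality across all thresholds in a single number, so it does not show the trade-off at a chosen operating point. RMSE is for regression.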

NDCG is for ranking.
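
The trade-off becomes visible when the decision threshold moves. The sketch below computes precision and recall at a given threshold; the function name, labels, and scores are invented for the example.

```python
def precision_recall(y_true, scores, threshold):
    """Return (precision, recall) when scores >= threshold are predicted positive."""
    tp = fp = fn = 0
    for truth, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1   # false positive: lowers precision
        elif predicted == 0 and truth == 1:
            fn += 1   # false negative: lowers recall
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical credit-risk labels (1 = default) and model scores.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.6, 0.55, 0.4, 0.35, 0.2, 0.1]

strict = precision_recall(y_true, scores, threshold=0.7)    # few positives predicted
lenient = precision_recall(y_true, scores, threshold=0.3)   # many positives predicted
```

In this sample the strict threshold produces no false positives but misses half of the true positives, while the lenient threshold catches all of them at the cost of more false positives.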

MCQmedium

A company wants to use SageMaker Autopilot to automatically build a binary classification model. Which output does Autopilot provide to help understand model decisions?

A.A leaderboard of models with only accuracy metrics

B.An explainability report with feature importance

C.A confusion matrix for each candidate model

D.A SHAP values summary plot for each trial

AnswerB

Autopilot generates an explainability report as part of its output.

Why this answer

SageMaker Autopilot generates an explainability report with feature importance. It does not provide a confusion matrix by default; users must evaluate separately. The feature importance is based on SHAP values, but it is reported for the best candidate rather than as a summary plot for each trial.

Leaderboard is for ranking trials, not explainability.

Multi-Selecthard

A company is using SageMaker to train a large model using data parallelism with the SageMaker distributed data parallelism library. They notice that the training throughput is not scaling linearly with the number of GPUs. Which THREE factors could be causing this?

Select 3 answers

A.I/O bottleneck from reading data from Amazon S3

B.Using different instance types across the cluster

C.Model size too large for the GPUs

D.Inefficient loss scaling strategy

E.Communication overhead from gradient synchronization

AnswersA, D, E

Slow data loading can starve GPUs, reducing scaling efficiency.

Why this answer

Communication overhead from gradient synchronization, I/O bottlenecks from reading data, and an inefficient loss scaling strategy can all limit scaling. Model size alone is not a scaling issue if it fits on GPUs. Instance type differences affect speed but not scaling linearity directly.

Multi-Selecthard

A company is fine-tuning a foundation model using RLHF (Reinforcement Learning from Human Feedback) on SageMaker. They want to reduce memory usage and training time. Which THREE techniques should they consider? (Select THREE.)

Select 3 answers

A.Use a smaller foundation model (e.g., 7B instead of 70B parameters)

B.Use PPO (Proximal Policy Optimization) for the RL step

C.Use SageMaker Data Parallelism with sharded data

D.Use full fine-tuning on a larger instance

E.Use LoRA or QLoRA to reduce the number of trainable parameters

AnswersA, B, E

Smaller models require less memory and train faster.

Why this answer

LoRA/QLoRA reduces trainable parameters, PPO is the standard RLHF algorithm, and using smaller foundation models reduces memory and compute requirements.

MCQeasy

Which SageMaker built-in algorithm should be used for forecasting time series data with seasonal patterns?

A.IP Insights

B.BlazingText

C.DeepAR

D.Factorization Machines

AnswerC

DeepAR is specifically designed for time series forecasting.

Why this answer

DeepAR is a supervised learning algorithm for time series forecasting that handles seasonality and trends.

MCQhard

A data scientist uses SageMaker Automatic Model Tuning (AMT) with Bayesian optimization to tune an XGBoost model. The objective metric is validation:auc, but the tuning job converges to a plateau early. Which action is MOST effective to improve exploration?

A.Increase the number of max parallel jobs

B.Decrease the number of hyperparameters being tuned

C.Increase the exploration_weight parameter in the tuning configuration

D.Switch the tuning strategy from Bayesian to Random Search

AnswerC

A higher exploration_weight (default 0.3) makes Bayesian optimization explore more before exploiting.

Why this answer

Increasing the exploration/exploitation weight (exploration_weight) in Bayesian optimization encourages the algorithm to try more diverse hyperparameter combinations, avoiding premature convergence.

MCQeasy

A machine learning engineer wants to reduce training costs by using excess EC2 capacity. Which instance purchasing option should they choose for SageMaker training jobs?

A.Reserved Instances

B.On-Demand Instances

C.Dedicated Instances

D.Spot Instances

AnswerD

MCQeasy

A data scientist needs to train a binary classification model on a large tabular dataset stored in Amazon S3. The team wants to minimize training time and cost while using a built-in SageMaker algorithm. Which algorithm should they use?

A.BlazingText

B.DeepAR

C.Linear Learner

D.XGBoost

AnswerC

Linear Learner is built for large-scale classification and regression, providing fast training and built-in distributed training support.

Why this answer

Linear Learner is a built-in SageMaker algorithm designed for binary classification and regression, and it scales efficiently on large datasets. XGBoost is better for structured data with non-linear relationships, DeepAR is for time series, and BlazingText is for text.

MCQhard

A machine learning engineer is using SageMaker Debugger to monitor training jobs. They want to capture tensors every 100 steps but only for the first 500 steps. Which configuration should they set in the Debugger hook?

A.collection_configs with save_interval=500 and end_step=100

B.collection_configs with start_step=100 and end_step=500

C.collection_configs with save_interval=100 and end_step=500

D.Use SageMaker Debugger rules to filter steps

AnswerC

This configures saving every 100 steps and stopping after step 500.
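
To see why the two values cannot be swapped, the snippet below simulates which step numbers would be saved. It is not the Debugger library: the function name is hypothetical, and the boundary handling (steps counted from 0, end_step treated as exclusive) is an assumption made for the illustration.

```python
def saved_steps(total_steps, save_interval, end_step=None, start_step=0):
    """List the steps at which tensors would be saved.

    Assumption: a step is saved when it is a multiple of save_interval,
    at or after start_step, and strictly before end_step.
    """
    return [
        step
        for step in range(total_steps)
        if step >= start_step
        and (end_step is None or step < end_step)
        and step % save_interval == 0
    ]

option_c = saved_steps(1000, save_interval=100, end_step=500)
# → [0, 100, 200, 300, 400]
option_a = saved_steps(1000, save_interval=500, end_step=100)
# → [0]
```

With the values swapped as in option A, only the first step is captured, which is not what the engineer asked for.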

MCQmedium

A team is training a large language model using SageMaker with multiple GPUs. They need to reduce training time by splitting the model across devices due to memory constraints. Which distributed training strategy should they use?

A.SageMaker Distributed Data Parallel (SMDDP)

B.Data parallelism

C.SageMaker Distributed Model Parallel (SMDMP)

D.Model parallelism

AnswerD

Model parallelism splits the model across devices, reducing memory per device.

MCQmedium

A team wants to use a custom PyTorch training script in SageMaker. They need to install additional Python packages not included in the base PyTorch container. Which approach should they take?

A.Use SageMaker Script Mode with a custom Dockerfile

B.Build a custom container with Docker

C.Install packages using a lifecycle configuration

D.Use the SageMaker PyTorch estimator with a requirements.txt file

AnswerD

The PyTorch estimator automatically installs packages from requirements.txt.

MCQhard

A team is fine-tuning a foundation model using LoRA in SageMaker. They want to reduce memory usage during training. Which instance type is optimized for cost-effective fine-tuning with LoRA?

A.ml.g5.2xlarge

B.ml.p3.2xlarge

C.ml.c5.2xlarge

D.ml.trn1.2xlarge

AnswerA

g5 instances offer a good balance of performance and cost for fine-tuning with LoRA.

MCQmedium

A company uses SageMaker Clarify to detect bias in their training data. They find that the model has a high disparate impact for a protected attribute. What should they do to mitigate this bias during training?

A.Use SageMaker Clarify’s built-in bias mitigation algorithm during training

B.Remove the protected attribute from the dataset

C.Increase the model complexity to capture more patterns

D.Preprocess the data using techniques like reweighing or resampling to reduce bias

AnswerD

Bias mitigation often involves preprocessing steps such as reweighing or resampling.

Why this answer

SageMaker Clarify can generate bias reports, but it does not include a built-in mitigation algorithm; mitigation techniques such as reweighing or resampling are applied separately. Removing the protected attribute may not eliminate indirect bias, because correlated features can act as proxies.

Increasing model complexity does not address bias that is already present in the training data.

MCQeasy

A machine learning engineer wants to automatically track hyperparameters, metrics, and artifacts for multiple training runs. Which SageMaker feature should they use?

A.SageMaker Debugger

B.SageMaker Model Monitor

C.SageMaker Experiments

D.SageMaker Clarify

AnswerC

Experiments track hyperparameters, metrics, and artifacts for each training run.

Why this answer

SageMaker Experiments is purpose-built for tracking and comparing training runs, capturing parameters, metrics, and artifacts.


MCQmedium

A financial services company trains multiple models on SageMaker and needs to track hyperparameters, metrics, and artifacts for each experiment. Which SageMaker feature should they use to organize and compare experiments?

A.SageMaker Model Registry

B.SageMaker Pipelines

C.SageMaker Experiments

D.SageMaker Debugger

AnswerC

SageMaker Experiments is designed to track and compare training runs, including hyperparameters and metrics.

Why this answer

SageMaker Experiments provides experiment management, allowing users to track parameters, metrics, and artifacts, and compare runs. SageMaker Studio offers an interface but the core feature is Experiments.


MCQeasy

Which SageMaker built-in algorithm is specifically designed for time series forecasting?

A.Image Classification

B.BlazingText

C.DeepAR

D.XGBoost

AnswerC

DeepAR is designed for time series forecasting.

Why this answer

DeepAR is a supervised learning algorithm for forecasting scalar time series using recurrent neural networks. The other algorithms are for different tasks: XGBoost for classification/regression, BlazingText for NLP, and Image Classification for computer vision.


MCQhard

A company is fine-tuning a large language model using LoRA on SageMaker. They want to reduce GPU memory usage during training. Which configuration change would help?

A.Use QLoRA (quantized LoRA) with 4-bit quantization

B.Enable gradient accumulation

C.Increase the sequence length

D.Increase the batch size

AnswerA

QLoRA combines LoRA with quantization, significantly reducing memory footprint while maintaining performance.

Why this answer

LoRA reduces the number of trainable parameters, and QLoRA further reduces memory by quantizing the frozen base model to 4-bit precision. Increasing batch size or sequence length typically increases memory usage. Gradient accumulation lowers activation memory only when paired with a smaller micro-batch, and it does not shrink the memory held by the base model weights.

QLoRA is specifically designed for memory reduction.
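
The arithmetic behind this can be sketched as follows. The 7-billion-parameter model size, the 4096 x 4096 layer shape, and rank 8 are illustrative assumptions, and the estimate covers weight storage only (no activations, optimizer state, or quantization overhead).

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """LoRA adds two low-rank matrices of shapes (d_out x rank) and (rank x d_in)."""
    return rank * (d_in + d_out)


def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param // 8 / 1e9


full_layer = 4096 * 4096                              # parameters in one dense layer
lora_layer = lora_trainable_params(4096, 4096, rank=8)

base_params = 7_000_000_000                           # hypothetical base model size
fp16_gb = weight_memory_gb(base_params, 16)
int4_gb = weight_memory_gb(base_params, 4)

print(full_layer, lora_layer)
print(fp16_gb, int4_gb)
```

Under these assumptions the adapter trains well under one percent of the layer's parameters, while 4-bit storage cuts the frozen weights to a quarter of their 16-bit size.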


MCQhard

A data scientist trains a binary classification model using SageMaker and obtains an AUC of 0.95 on the test set. However, the precision-recall curve shows low precision for high recall thresholds. The business requires a model that performs well on the minority class. Which metric should the team primarily optimize during hyperparameter tuning?

A.Accuracy

B.F1-score on the validation set

C.AUC (Area Under the ROC Curve)

D.Log loss

AnswerB

F1 combines precision and recall, directly addressing the minority class performance requirement.

Why this answer

For imbalanced datasets, the F1-score balances precision and recall, making it a better objective than AUC, which can be misleading when class imbalance exists.
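
A small worked example of how F1 combines the two quantities, using hypothetical confusion-matrix counts:

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical minority-class results: 80 hits, 20 false alarms, 80 misses.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=80)
print(p, r, round(f1, 4))
```

Because the harmonic mean is pulled toward the smaller value, a model cannot score a high F1 by excelling at only one of precision or recall.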


MCQhard

A team is using SageMaker to train a distributed model with data parallelism. They notice that the training loss is not decreasing as expected and suspect a bug in the data loading pipeline. Which SageMaker Debugger feature can help them inspect the data distributions during training?

A.SageMaker Model Monitor

B.Built-in rules such as overfit detection

C.Custom rules to monitor input tensors

D.SageMaker Processing jobs

AnswerC

By creating a custom rule or using tensor captures, Debugger can save input tensors for analysis of data distributions.

Why this answer

SageMaker Debugger can capture tensors (including inputs and outputs) during training. By saving input tensors, the team can inspect data distributions. Rules fire on issues like overfitting or dead relu, but tensor captures allow direct data inspection.

CollectionConfig selects which tensor collections to save, and SaveConfig controls how often they are saved. Model Monitor and Processing jobs do not capture tensors from inside a running training job.


MCQhard

A company needs to detect bias in a pre-trained model before deployment. They want to compute metrics like disparate impact and equal opportunity difference. Which AWS service should they use?

A.SageMaker Clarify

B.Amazon Rekognition

C.SageMaker Model Monitor

D.SageMaker Debugger

AnswerA
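
For intuition, disparate impact is commonly defined as the ratio of positive-prediction rates between a disadvantaged and an advantaged group. The sketch below assumes that definition and uses hypothetical predictions; it is a hand calculation, not a call into Clarify.

```python
from typing import Sequence


def positive_rate(preds: Sequence[int]) -> float:
    """Share of predictions equal to 1."""
    return sum(preds) / len(preds)


def disparate_impact(preds_disadvantaged: Sequence[int],
                     preds_advantaged: Sequence[int]) -> float:
    """Ratio of positive-prediction rates; 1.0 means parity under this definition."""
    return positive_rate(preds_disadvantaged) / positive_rate(preds_advantaged)


# Hypothetical binary predictions for ten members of each group.
advantaged = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0]      # 6 of 10 predicted positive
disadvantaged = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]   # 3 of 10 predicted positive

di = disparate_impact(disadvantaged, advantaged)
print(di)
```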


MCQmedium

A team is training a large language model using PyTorch on SageMaker. They need to reduce training time. The model has 10 billion parameters. Which distributed training strategy should they use?

A.Data parallelism with Horovod

B.Single GPU training

C.Use a larger instance type without parallelism

D.Model parallelism with SageMaker distributed

AnswerD

Model parallelism partitions the model across GPUs, enabling training of large models.

Why this answer

For models too large to fit into a single GPU's memory, model parallelism is required. Data parallelism replicates the full model on each GPU, which can cause out-of-memory errors at this scale.
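
A back-of-the-envelope estimate shows why replication is a problem at this size. The sketch assumes fp32 weights and gradients plus two fp32 Adam moment buffers (16 bytes per parameter), ignores activations, and uses a hypothetical 40 GB device.

```python
import math

# Assumed per-parameter training state: fp32 weight + fp32 gradient + two fp32 Adam moments.
BYTES_PER_PARAM = 4 + 4 + 8


def training_state_gb(num_params: int) -> float:
    """Approximate memory for weights, gradients, and optimizer state in GB (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM / 1e9


def min_devices(num_params: int, device_memory_gb: float) -> int:
    """Smallest number of devices whose combined memory holds the training state."""
    return math.ceil(training_state_gb(num_params) / device_memory_gb)


params = 10_000_000_000                 # the 10-billion-parameter model from the question
state_gb = training_state_gb(params)
devices = min_devices(params, 40)       # hypothetical 40 GB accelerator

print(state_gb, devices)
```

Under these assumptions the training state alone exceeds one device several times over, so each data-parallel replica would not fit.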


MCQmedium

A.Use a larger foundation model with a longer context window and paste all documents into each prompt

B.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

C.Fine-tune a base LLM on the policy documents monthly

D.Train a custom model from scratch on the policy documents each month

AnswerB

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

RAG allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining.
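
The retrieval step can be illustrated with a toy example. This sketch swaps in a bag-of-words cosine similarity for a real embedding model and vector store, and the policy snippets are invented for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real system would use an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks, k: int = 1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical indexed document chunks; updating this list needs no retraining.
chunks = [
    "Remote work requires manager approval",
    "Travel expenses are reimbursed within 30 days",
    "Annual leave is 25 days per year",
]

question = "how are travel expenses reimbursed"
top = retrieve(question, chunks)
prompt = "Answer using only this context:\n" + "\n".join(top) + "\nQuestion: " + question
print(prompt)
```

Replacing an entry in `chunks` changes what the model is shown at query time, which is why the knowledge stays current without touching model weights.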


Multi-Selectmedium

A machine learning engineer wants to reduce costs for hyperparameter tuning jobs that run for several hours. The jobs are fault-tolerant and can be interrupted. Which TWO actions should they take? (Select TWO.)

Select 2 answers

A.Use on-demand instances for reliability

B.Enable SageMaker Managed Spot Training

C.Use spot instances for the training jobs

D.Use ml.c5 instances instead of ml.p3

E.Increase the number of parallel training jobs

AnswersB, C

Managed Spot Training automates the use of spot instances and handles interruptions.


MCQmedium

A team is building a fraud detection model using SageMaker and wants to detect anomalies in user login events. Which SageMaker built-in algorithm is specifically designed for anomaly detection in event-based data?

A.Factorization Machines

B.IP Insights

C.Random Cut Forest

D.K-Means

AnswerB

IP Insights is designed for anomaly detection on IP addresses and events.

Why this answer

IP Insights is a built-in algorithm for learning representations of IP addresses and detecting anomalous login patterns, commonly used for fraud detection.


MCQmedium

A data scientist is using SageMaker Autopilot to automatically build a binary classification model. The dataset is imbalanced. Which action will Autopilot take by default to address class imbalance?

A.Perform random undersampling of the majority class

B.Ignore the imbalance and proceed with raw data

C.Apply SMOTE oversampling

D.Use class balancing weights

AnswerD

Autopilot automatically uses class weights when imbalance is detected.

Why this answer

Autopilot automatically applies techniques to handle imbalanced data, such as class balancing weights, when it detects imbalance. It does not require manual configuration.

SMOTE and undersampling are not built-in defaults.


Multi-Selecthard

A company wants to use SageMaker to fine-tune a foundation model for a text generation task using RLHF (Reinforcement Learning from Human Feedback). Which THREE components are required in the RLHF pipeline?

Select 3 answers

A.A LoRA adapter for parameter-efficient fine-tuning

B.A pre-trained base model

C.A classifier to distinguish generated text from real text

D.A reward model trained on human preferences

E.A reinforcement learning algorithm such as PPO

AnswersB, D, E

The base model is the starting point for RLHF fine-tuning.

Why this answer

RLHF typically requires: a pre-trained base model to start, a reward model trained on human preferences, and a reinforcement learning algorithm (like PPO) to update the base model. A LoRA adapter is optional but not required. A classifier is not the same as a reward model.


MCQmedium

A data scientist needs to run a hyperparameter tuning job for a PyTorch model using SageMaker. They want to use Hyperband for efficient resource allocation. Which tuning strategy should they select in the HyperparameterTuner?

A.Bayesian optimization

B.Hyperband

C.Random search

D.Grid search

AnswerB

Hyperband uses adaptive resource allocation and early stopping to efficiently explore the hyperparameter space.

Why this answer

SageMaker Automatic Model Tuning supports Grid, Random, Bayesian, and Hyperband strategies. Hyperband is an early-stopping-based method that allocates resources adaptively. The 'Hyperband' strategy should be selected explicitly.
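
The adaptive allocation can be illustrated with successive halving, the building block that Hyperband repeats with different starting budgets. This is a simplified sketch and not SageMaker's implementation; the loss function and the candidate learning rates are hypothetical.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Keep the best 1/eta of the configs each round while multiplying the budget by eta."""
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        # Lower loss is better; evaluate every survivor at the current budget.
        scored = sorted(survivors, key=lambda c: evaluate(c, budget))
        keep = max(1, len(survivors) // eta)
        survivors = scored[:keep]
        budget *= eta
    return survivors[0]


def toy_loss(learning_rate, budget):
    """Hypothetical loss: best near lr = 0.1, improving as the budget grows."""
    return abs(learning_rate - 0.1) + 1.0 / budget


best = successive_halving([0.001, 0.01, 0.1, 0.5], toy_loss)
print(best)
```

Weak configurations are dropped after a small budget, so most of the compute goes to the candidates that still look promising.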


MCQhard

A practitioner is using SageMaker Automatic Model Tuning with Hyperband strategy. They want to stop underperforming trials early to save compute. Which Hyperband parameter controls the aggressiveness of early stopping?

A.strategy

B.max_jobs

C.max_parallel_jobs

D.early_stopping_type

AnswerD

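Hyperband stops underperforming trials through its own internal early-stopping mechanism. Of the listed options, 'early_stopping_type' is the only setting that concerns early stopping; 'strategy' selects the search method, and 'max_jobs' and 'max_parallel_jobs' only bound the number of trials. Note that the SageMaker API documentation states that the generic early stopping type should be set to Off when the Hyperband strategy is used.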


MCQmedium

A company is using SageMaker Debugger to monitor a training job for a deep learning model. They want to detect when gradients become extremely large, which may cause training instability. Which built-in rule should they use?

A.DeadRelu

B.ExplodingGradients

C.VanishingGradients

D.Overfit

AnswerB

ExplodingGradients detects gradients becoming too large.

Why this answer

The ExplodingGradients rule monitors gradient norms and raises an alert if they exceed a threshold.
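
The kind of check described here can be sketched in a few lines. This is an illustration of a norm-versus-threshold test, not Debugger's actual rule logic, and the threshold value is an arbitrary assumption.

```python
import math
from typing import Sequence


def grad_norm(grads: Sequence[float]) -> float:
    """L2 norm of a flat list of gradient values."""
    return math.sqrt(sum(g * g for g in grads))


def is_exploding(grads: Sequence[float], threshold: float = 1e3) -> bool:
    """Flag a step whose gradient norm exceeds the (assumed) threshold."""
    return grad_norm(grads) > threshold


healthy_step = [3.0, 4.0]          # norm 5.0
unstable_step = [3e3, 4e3]         # norm 5000.0

print(grad_norm(healthy_step), is_exploding(healthy_step))
print(grad_norm(unstable_step), is_exploding(unstable_step))
```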


MCQhard

A team is fine-tuning a foundation model using reinforcement learning from human feedback (RLHF) on SageMaker. They have a dataset of human preferences. Which SageMaker capability is most suitable for the reward model training step?

A.SageMaker JumpStart

B.SageMaker Ground Truth

C.SageMaker Autopilot

D.SageMaker Training with a custom PyTorch container

AnswerD

A custom training job can implement the reward model training using PyTorch.

Why this answer

RLHF typically involves training a reward model on human preference data. SageMaker can be used to train any custom model, including a reward model, using its training jobs with a PyTorch or TensorFlow estimator.
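
One common way to train a reward model on preference pairs is a pairwise logistic loss, -log(sigmoid(r_chosen - r_rejected)). The sketch below assumes that formulation and uses plain floats in place of model outputs; a real training script would compute the same quantity on tensors.

```python
import math


def pairwise_preference_loss(r_chosen: float, r_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)), written as log(1 + exp(-margin))."""
    margin = r_chosen - r_rejected
    return math.log1p(math.exp(-margin))


# Hypothetical reward scores for a preferred and a rejected response.
agrees = pairwise_preference_loss(r_chosen=2.0, r_rejected=0.0)
disagrees = pairwise_preference_loss(r_chosen=0.0, r_rejected=2.0)
undecided = pairwise_preference_loss(r_chosen=0.0, r_rejected=0.0)

print(agrees, undecided, disagrees)
```

The loss shrinks as the reward model scores the human-preferred response higher than the rejected one, which is the signal the later reinforcement learning step relies on.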


MCQmedium

A data scientist is training an XGBoost model on a large tabular dataset using SageMaker. The training job is taking too long. The scientist wants to reduce training time while maintaining model quality. Which action should the scientist take?

A.Use SageMaker distributed data parallelism across multiple instances

B.Enable SageMaker managed spot training

C.Switch to Hyperband for hyperparameter tuning

D.Convert the XGBoost model to a Linear Learner model

AnswerA

Distributed data parallelism speeds up training by splitting data across multiple instances.

Why this answer

Managed spot training can significantly reduce cost, but it may cause interruptions and does not shorten training. The best approach to reduce training time is to use distributed data parallelism with multiple instances. Using a larger instance type can also speed up training, but distributed training is more scalable.

Using Hyperband is for hyperparameter tuning, not for reducing training time directly. Converting to a different algorithm is not necessary.


MCQmedium

A team is fine-tuning a Hugging Face transformer model on SageMaker. They need to use a custom training script with the Hugging Face Estimator. Which SageMaker feature does this represent?

A.Built-in algorithm

B.SageMaker Autopilot

C.SageMaker Debugger

D.Script mode

AnswerD


MCQmedium

A data scientist suspects that a deep learning model is overfitting. They enable SageMaker Debugger and want to detect overfitting automatically. Which built-in rule should they use?

A.ExplodingGradients

B.PoorWeightInitialization

C.Overfit

D.DeadRelu

AnswerC

The Overfit rule alerts when validation loss stops decreasing while training loss continues to decrease.

Why this answer

The overfit rule in SageMaker Debugger monitors training and validation loss divergence, a key indicator of overfitting.
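
The divergence signal can be sketched as a simple check over recent loss values. This is an illustration of the idea, not the Debugger rule's implementation; the `patience` window and the loss curves are hypothetical.

```python
from typing import Sequence


def looks_overfit(train_losses: Sequence[float],
                  val_losses: Sequence[float],
                  patience: int = 3) -> bool:
    """True if training loss fell for `patience` steps while validation loss did not improve."""
    if len(train_losses) < patience + 1 or len(val_losses) < patience + 1:
        return False
    recent_train = train_losses[-(patience + 1):]
    recent_val = val_losses[-(patience + 1):]
    train_falling = all(b < a for a, b in zip(recent_train, recent_train[1:]))
    val_not_improving = all(b >= a for a, b in zip(recent_val, recent_val[1:]))
    return train_falling and val_not_improving


train = [1.0, 0.8, 0.6, 0.5, 0.4]
val_diverging = [1.0, 0.9, 0.9, 0.95, 1.0]
val_healthy = [1.0, 0.9, 0.8, 0.7, 0.6]

print(looks_overfit(train, val_diverging), looks_overfit(train, val_healthy))
```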


Multi-Selecteasy

A company wants to use SageMaker built-in algorithms for a time series forecasting task. Which TWO algorithms are appropriate for this task? (Choose TWO.)

Select 2 answers

A.DeepAR

B.PCA

C.Linear Learner

D.K-Means

E.XGBoost

AnswersA, C

DeepAR is a built-in algorithm for time series forecasting.

Why this answer

DeepAR is specifically designed for time series forecasting. Linear Learner can also be used for forecasting with engineered features. XGBoost can be used for forecasting but is not a built-in algorithm specifically for time series.

K-Means is clustering. PCA is dimensionality reduction.


MCQeasy

Which SageMaker built-in algorithm is best suited for detecting anomalous login attempts based on IP addresses and user behavior?

A.XGBoost

B.IP Insights

C.PCA

D.K-Means

AnswerB

IP Insights is designed to detect anomalous IP usage.

Why this answer

IP Insights is a built-in algorithm for learning IP address usage patterns and detecting anomalous behavior. The other algorithms are for different purposes: XGBoost for classification, K-Means for clustering, and PCA for dimensionality reduction.


Multi-Selectmedium

A machine learning engineer wants to use SageMaker Clarify to analyze bias in their training data and model predictions. They want to detect bias before training. Which TWO types of analysis can SageMaker Clarify perform on the data?

Select 2 answers

A.Pre-training bias metrics (e.g., class imbalance, feature skew)

B.Feature importance (SHAP values) on the dataset

C.Model monitoring for data drift

D.Explainability report for the model

E.Post-training bias metrics (e.g., difference in accuracy across groups)

AnswersA, B

Clarify can compute pre-training bias metrics on the dataset.

Why this answer

SageMaker Clarify can compute pre-training bias metrics like class imbalance and feature correlation, and post-training metrics like accuracy difference. It also generates explainability reports. Model monitoring is separate.


MCQmedium

An ML engineer is debugging a training job that is consistently failing due to an out-of-memory error. The engineer is using SageMaker's built-in XGBoost algorithm. Which Debugger rule can help identify the issue?

A.Exploding gradients

B.Overfit

C.Dead relu

D.OOM rule

AnswerA

Exploding gradients can cause memory spikes leading to OOM; Debugger can capture this.

Why this answer

The 'Exploding gradients' rule detects when gradients become too large, which is a common cause of training instability but not necessarily OOM. The 'Overfit' rule detects overfitting. The 'Dead relu' rule is for ReLU activation.

None of these rules directly addresses OOM, and Debugger has no dedicated OOM rule; the engineer should instead monitor memory utilization via CloudWatch or adjust the instance type. Among the options, 'Exploding gradients' is the closest fit because large gradients can lead to memory spikes.


Multi-Selectmedium

A data scientist is using SageMaker to train a model and wants to reduce training costs without sacrificing performance. Which TWO actions should the scientist take? (Select TWO.)

Select 2 answers

A.Use a larger instance type to finish faster

B.Use SageMaker managed spot training

C.Enable SageMaker Debugger hooks to monitor training

D.Enable SageMaker Model Monitor for the training job

E.Use distributed training across multiple smaller instances

AnswersB, E

Spot training reduces cost significantly.

Why this answer

Using spot instances can reduce costs up to 90%. SageMaker managed spot training handles interruptions automatically. Using distributed training across multiple smaller instances can be cost-effective compared to a single large instance.

Model Monitor is for deployed endpoints, not training jobs. Debugger hooks do not reduce cost.
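
As a worked example of how the saving is usually expressed, the sketch below compares billable seconds against total training seconds. The two durations are hypothetical, and the formula is an assumption about how the percentage is reported rather than a quoted API.

```python
def spot_savings_percent(training_seconds: int, billable_seconds: int) -> float:
    """Percentage saved when only `billable_seconds` of `training_seconds` are charged."""
    return 100 * (training_seconds - billable_seconds) / training_seconds


# Hypothetical job: one hour of training, 18 minutes billed.
savings = spot_savings_percent(training_seconds=3600, billable_seconds=1080)
print(savings)
```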


MCQmedium

A data scientist is using SageMaker Experiments to track multiple training runs. They want to compare the F1 scores across runs. Which component should they use to log the F1 score?

A.Parameter

B.Hyperparameter

C.Artifact

D.Metric

AnswerD

Metrics are used to track performance values like F1.

Why this answer

In SageMaker Experiments, metrics are logged using the SageMaker SDK's log_metric method or by reporting through the training job's metric definitions. Hyperparameters are logged separately. Artifacts are for model files or datasets.


Multi-Selecthard

A company is fine-tuning a large language model using reinforcement learning from human feedback (RLHF). Which THREE components are typically required?

Select 3 answers

A.A discriminative classifier

B.A reference model

C.A reward model

D.A policy model (the LLM)

E.A value function

AnswersB, C, D


MCQhard

A financial services firm is training a fraud detection model using SageMaker. The dataset is highly imbalanced (0.1% fraudulent transactions). The model currently achieves 99.9% accuracy but only catches 5% of fraud cases. Which metric should the team prioritize to evaluate model performance?

A.Accuracy

B.Precision

C.Recall

D.F1-score

AnswerC

Recall focuses on capturing positive cases, which is critical in fraud detection.

Why this answer

Recall (true positive rate) measures the proportion of actual positives correctly identified. For fraud detection, catching fraud is critical; accuracy is misleading due to class imbalance.
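
The numbers in the question can be reproduced with a short calculation. The total of 100,000 transactions and the absence of false positives are assumptions made only to keep the arithmetic simple.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)


total = 100_000                 # assumed number of transactions
fraud = total // 1000           # 0.1% fraudulent -> 100 cases
caught = fraud * 5 // 100       # model catches 5% of fraud -> 5 cases

tp, fn = caught, fraud - caught
fp, tn = 0, total - fraud       # assume no legitimate transaction is flagged

acc = accuracy(tp, tn, fp, fn)
rec = recall(tp, fn)
print(acc, rec)
```

Accuracy stays above 99.9% even though 95 of the 100 fraud cases are missed, which is why recall is the more informative metric here.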
