Knowledge + Practice

CCNA Machine Learning Implementation and Operations Questions

75 of 351 questions · Page 3/5 · Machine Learning Implementation and Operations · Answers revealed


151

MCQ · medium

A data scientist is using Amazon SageMaker to train a custom image classification model using a PyTorch script. The training job runs successfully but the model accuracy is lower than expected. The scientist wants to debug the training process by inspecting gradients and layer outputs. Which SageMaker feature should be used to capture this internal state during training?

A.Use SageMaker Experiments to track hyperparameters and metrics.

B.Use SageMaker Debugger to capture tensors and gradients.

C.Use SageMaker Profiler to profile system bottlenecks.

D.Use SageMaker Model Monitor to detect data drift.

Answer: B

SageMaker Debugger provides real-time monitoring of training metrics and internal state such as gradients.

Why this answer

SageMaker Debugger captures internal model state such as gradients and tensors during training, enabling analysis and debugging. SageMaker Experiments (A) tracks trials and metrics but does not capture internal state. SageMaker Profiler (C) focuses on system performance, not model internals.

SageMaker Model Monitor (D) detects data drift after deployment.

152

MCQ · hard

An ML team is deploying a model to a SageMaker endpoint for real-time inference. The model is large (2 GB) and requires GPU for low-latency inference. The team wants to minimize cost while maintaining a response time of under 200 ms. Which instance configuration and SageMaker feature would be best?

A.Use a GPU instance (ml.p3.2xlarge) with SageMaker Elastic Inference.

B.Use a batch transform job on GPU instances.

C.Use a serverless inference endpoint with a CPU instance.

D.Use a multi-model endpoint on a GPU instance.

Answer: A

Elastic Inference provides GPU acceleration at lower cost than a full GPU instance.

Why this answer

Option A is correct because SageMaker Elastic Inference attaches fractional GPU acceleration to an endpoint instance, balancing cost and performance. Option B is wrong because batch transform is for offline inference. Option C is wrong because serverless inference does not support GPUs and has cold starts.

Option D is wrong because multi-model endpoints are for hosting multiple models on the same instance, not primarily for latency.

153

MCQ · hard

A data scientist submits a SageMaker training job with the provided configuration. The job fails immediately with the error 'Algorithm not found: 382416733822.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.2-1'. What is the most likely cause?

A.The training region is different from the image region.

B.The ECR repository URI is incorrect or the image does not exist.

C.The input data format is incorrect.

D.The IAM role does not have permission to pull the image.

Answer: B

The URI may have the wrong account ID or tag.

Why this answer

Option B is correct because the 'Algorithm not found' error points at the image URI: the account ID, repository, or tag may be wrong, or the image may not exist. Option A is wrong because the region is specified in the image URI. Option C is wrong because the input data format is correct.

Option D is wrong because the role is specified.

154

MCQ · medium

A media company uses SageMaker to train a recommendation model. The training data is stored in an S3 bucket with versioning enabled. The data pipeline updates the training data daily by overwriting objects with new data. Recently, the model's performance degraded, and the team suspects that the training data was corrupted on a specific day. They want to train the model using the data from a previous version. How can the team retrieve the previous version of the training data?

A.Restore the bucket from S3 Glacier, which contains the previous version.

B.Use the S3 GET Object Version API to download the specific version of each object.

C.Use S3 Select to query the previous version of the data.

D.Enable S3 replication to a different bucket and use the replicated data.

Answer: B

S3 versioning stores multiple versions; GET Object with a version ID retrieves the desired version.

Why this answer

Option B is correct because S3 versioning allows retrieving any previous version of an object by specifying the version ID. The team can list versions, identify the correct version, and use it for training. Option A is incorrect because S3 Glacier is for archival, not for accessing previous versions of current objects.

Option C is incorrect because S3 Select is for querying data within an object, not for version retrieval. Option D is incorrect because S3 replication does not help in retrieving previous versions from the same bucket.
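The list-then-fetch flow described for question 154 might look roughly like the sketch below. It assumes a boto3 S3 client is passed in as `s3_client`; the helper names, bucket, and key are hypothetical, and only the first page of `list_object_versions` results is inspected.

```python
from datetime import datetime


def latest_version_before(s3_client, bucket, key, cutoff: datetime):
    """Return the VersionId of the newest version of `key` written before `cutoff`.

    Only the first page of results is inspected; a real bucket with many
    versions would need pagination.
    """
    response = s3_client.list_object_versions(Bucket=bucket, Prefix=key)
    candidates = [
        version
        for version in response.get("Versions", [])
        if version["Key"] == key and version["LastModified"] < cutoff
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda v: v["LastModified"])["VersionId"]


def download_version(s3_client, bucket, key, version_id):
    """Fetch the bytes of one specific object version."""
    response = s3_client.get_object(Bucket=bucket, Key=key, VersionId=version_id)
    return response["Body"].read()
```

In practice the client would come from `boto3.client("s3")`, and the cutoff would be the timestamp of the suspected corruption.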

155

Multi-Select · hard

A data scientist is deploying a model on Amazon SageMaker. The model requires inference on images, and the data scientist wants to use a GPU instance for low latency. However, the data scientist is unsure about the instance type to choose for the endpoint. Which TWO factors should the data scientist consider when selecting the instance type? (Choose TWO.)

Select 2 answers

A.The time taken to train the model

B.The number of vCPUs on the instance

C.The cost per inference for the instance type

D.The AWS Region of the S3 bucket storing the model

E.The GPU memory available on the instance

Answers: C, E

Cost is a key consideration.

Why this answer

Options C and E are correct. C: Cost per inference is important for operational efficiency. E: GPU memory must be sufficient to hold the model and a batch of images.

Option A (training time) is not relevant for inference. Option B (number of vCPUs) is less relevant for GPU inference. Option D (S3 bucket location) does not affect instance choice.
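As a rough illustration of the two factors in question 155, the arithmetic can be sketched as below. All numbers passed to these helpers (prices, throughput, memory sizes, the 90% headroom factor) are hypothetical assumptions, not AWS figures.

```python
def cost_per_thousand_inferences(hourly_price_usd, inferences_per_second):
    """Convert an instance's hourly price into a cost per 1,000 inferences."""
    inferences_per_hour = inferences_per_second * 3600
    return hourly_price_usd / inferences_per_hour * 1000


def fits_in_gpu_memory(model_gb, per_image_gb, batch_size, gpu_memory_gb, headroom=0.9):
    """Check that the model plus one batch of images fits within a share of GPU memory.

    `headroom` is an assumed safety margin for framework overhead.
    """
    needed_gb = model_gb + per_image_gb * batch_size
    return needed_gb <= gpu_memory_gb * headroom
```

For example, a hypothetical instance at 3.60 USD per hour serving 100 inferences per second works out to 0.01 USD per 1,000 inferences.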

156

MCQ · hard

A company is using Amazon SageMaker Ground Truth to build a training dataset for an image classification model. The company has a large number of unlabeled images stored in Amazon S3. The data science team wants to use a private workforce consisting of internal employees to label the images. The team creates a labeling job with a private workforce. After starting the job, the team notices that the labeling tasks are not being assigned to any workers. The workers have been added to the private workforce and have received their login credentials. What is the MOST likely cause of this issue?

A.The labeling job is configured for a different task type than image classification.

B.The S3 bucket containing the images has incorrect permissions, preventing workers from viewing the images.

C.The workers do not have the required IAM permissions to access the labeling portal.

D.The workers have not been added to the work team that is assigned to the labeling job.

Answer: D

Correct: Workers must be part of the work team to receive tasks.

Why this answer

For a private workforce, workers must be added to a work team that is associated with the labeling job. Option D is correct. Option A (task type) does not affect assignment.

Option B (S3 permissions) would affect data access, not task visibility. Option C (IAM permissions) would prevent login, not task assignment.

157

MCQ · hard

A data scientist is deploying a real-time inference endpoint using SageMaker. The model is a large NLP model requiring GPU for low latency. The endpoint must be highly available across two Availability Zones. Which deployment configuration meets these requirements?

A.Deploy a single model endpoint on an ml.c5.xlarge instance with auto-scaling

B.Use SageMaker batch transform on GPU instances

C.Deploy a multi-model endpoint on an ml.p3.2xlarge instance with auto-scaling and at least two instances in different AZs

D.Deploy a single model endpoint on an ml.p3.2xlarge instance with one instance

Answer: C

GPU, auto-scaling, and multi-AZ provide low latency and high availability.

Why this answer

Option C is correct: a multi-model endpoint on a GPU instance type with auto-scaling and at least two instances in different AZs ensures high availability. Option A uses a CPU instance. Option B (batch transform) is not real-time.

Option D (single instance) lacks HA.

158

MCQ · medium

A company's ML model training on Amazon SageMaker is taking longer than expected. The training job uses a single ml.p3.2xlarge instance. Which change is most likely to reduce training time?

A.Increase the instance's EBS volume size

B.Use distributed training with multiple GPU instances

C.Enable Managed Spot Training

D.Switch to a compute-optimized instance with more vCPUs

Answer: B

Parallelizes work across GPUs.

Why this answer

Distributed training across multiple GPUs can parallelize computation. Option A is wrong because increasing instance storage does not affect compute speed. Option C is wrong because Managed Spot Training may be interrupted and does not speed up training.

Option D is wrong because more vCPUs without a GPU may not help for deep learning.

159

MCQ · easy

Refer to the exhibit. An ML engineer creates a CloudFormation stack with this template. The stack creation succeeds, but when the engineer tries to invoke the endpoint, it returns a ModelError. The CloudWatch logs show that the container exited with error. What is the MOST likely cause?

A.The execution role does not have permissions to pull the Docker image from ECR.

B.The initial instance count is set to 2, which is insufficient for the model size.

C.The endpoint is not deployed in a VPC and cannot access the S3 bucket.

D.The EndpointConfig references the model but the model is not yet created.

Answer: A

The role must have ECR permissions to pull the image; if missing, the container fails to start.

Why this answer

The template does not specify a VPC configuration for the endpoint. By default, SageMaker endpoints are not in a VPC and cannot access resources in a VPC unless configured. However, the model artifact is in S3 (s3://my-bucket/model.tar.gz), which is accessible without a VPC.

A ModelError is commonly caused by a container image that is not compatible with the instance type or by a missing model file. Given the template, though, the likely issue is that the execution role (SageMakerRole) does not have permissions to access the ECR image or the S3 bucket. The error is not about instance count (B), VPC (C), or the endpoint config (D).

160

MCQ · medium

A company is using Amazon SageMaker to train a deep learning model. The training job uses a script that reads data from Amazon S3 using the SageMaker SDK's `s3_input` method. The training job runs on a single ml.p3.2xlarge instance. The data scientist notices that the GPU utilization is very low during training, often below 20%. The training dataset is large, approximately 50 GB, stored as TFRecord files in S3. What is the MOST likely cause of low GPU utilization?

A.The training script is using a CPU-only version of TensorFlow.

B.The data loading pipeline is not optimized, causing the GPU to wait for data.

C.The batch size is too large, causing the GPU to run out of memory.

D.The instance type does not have enough GPU memory for the model.

Answer: B

Correct: Inefficient data loading leads to GPU starvation.

Why this answer

Low GPU utilization often indicates that the data loading pipeline is a bottleneck. The training script may not be using efficient data loading techniques like prefetching and parallel data extraction. Option B is correct.

Option A (CPU-only framework) is not the cause. Option C (batch size) could be a factor but is less likely given the TFRecord format. Option D (instance type) is unlikely because ml.p3.2xlarge has a capable GPU.

161

MCQ · easy

A company is using SageMaker to train a linear regression model on a dataset that fits into memory on a single instance. The training job is taking longer than expected. The data scientist wants to reduce training time without changing the algorithm. Which approach is most effective?

A.Disable automatic model tuning.

B.Use a larger instance type with more vCPUs.

C.Use SageMaker's distributed training with multiple instances.

D.Reduce the number of epochs.

Answer: C

Parallel processing reduces training time.

Why this answer

Option C is correct because SageMaker's managed training can automatically distribute the data across multiple instances for parallel training, reducing time. Option A is wrong because disabling automatic model tuning would lose its optimization benefits without speeding up the training job itself. Option B is wrong because increasing instance size might help but may not be as cost-effective as distributed training.

Option D is wrong because fewer epochs shorten the run only by cutting optimization short, which risks a less accurate model.

162

MCQ · hard

A company deploys a SageMaker model for inference. After a few days, response times increase significantly. CloudWatch metrics show high CPU utilization and memory usage. The model is a large ensemble. What is the most cost-effective solution?

A.Configure SageMaker automatic scaling based on CPU utilization

B.Use CloudWatch alarms to notify the team, who manually launch additional endpoints

C.Migrate the model to AWS Lambda with provisioned concurrency

D.Replace the current instance type with a larger one

Answer: A

Auto scaling dynamically adjusts instance count to handle load cost-effectively.

Why this answer

Option A is correct: automatic scaling adds instances based on demand, handling spikes cost-effectively. Option B (alarms plus manual launches) does not automatically adjust. Option C (migrate to Lambda) may not support large models.

Option D (increase instance size) is less cost-effective than scaling out.
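A CPU-based target-tracking policy of the kind question 162 describes could be registered through Application Auto Scaling roughly as follows. This is a sketch under assumptions: the client is expected to behave like `boto3.client("application-autoscaling")`, and the capacity limits, target value, cooldowns, and policy name are illustrative placeholders that would need tuning.

```python
def configure_cpu_autoscaling(autoscaling_client, endpoint_name, variant_name,
                              min_instances=1, max_instances=4,
                              target_cpu_percent=60.0):
    """Register an endpoint variant for scaling and attach a CPU target-tracking policy."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"

    # Tell Application Auto Scaling which resource may scale, and within what bounds.
    autoscaling_client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        MinCapacity=min_instances,
        MaxCapacity=max_instances,
    )

    # Track average CPU utilization of the variant (assumed target value).
    autoscaling_client.put_scaling_policy(
        PolicyName=f"{endpoint_name}-cpu-target-tracking",  # hypothetical name
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": target_cpu_percent,
            "CustomizedMetricSpecification": {
                "MetricName": "CPUUtilization",
                "Namespace": "/aws/sagemaker/Endpoints",
                "Dimensions": [
                    {"Name": "EndpointName", "Value": endpoint_name},
                    {"Name": "VariantName", "Value": variant_name},
                ],
                "Statistic": "Average",
                "Unit": "Percent",
            },
            "ScaleInCooldown": 300,
            "ScaleOutCooldown": 60,
        },
    )
    return resource_id
```

Passing the client in as a parameter keeps the function easy to exercise with a stub before pointing it at a real endpoint.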

163

MCQ · hard

A data scientist needs to run a hyperparameter tuning job for a deep learning model. Which SageMaker feature should they use?

A.SageMaker Hyperparameter Tuning Job

B.SageMaker Experiments

C.SageMaker Automatic Model Tuning

D.SageMaker Processing

Answer: A

This is the correct feature for hyperparameter optimization.

Why this answer

Option A is correct because SageMaker Hyperparameter Tuning Jobs are designed for this purpose. Option B is wrong because SageMaker Experiments tracks runs; it does not tune. Option C is wrong because Automatic Model Tuning is the same capability as hyperparameter tuning, but the correct term is 'Hyperparameter Tuning Job'.

Option D is wrong because SageMaker Processing is for data processing.

164

MCQ · hard

A team is using Amazon SageMaker Autopilot to automatically build models. The dataset has 50 features and 1 million rows. After training, Autopilot generates multiple candidates. The team wants to deploy the model with the highest accuracy. What is the best practice to select and deploy the model?

A.Deploy all candidates behind a multi-model endpoint and route traffic based on request features

B.Select the model with the highest validation accuracy after performing additional hyperparameter tuning

C.Manually review each candidate's architecture and select the one with the simplest design

D.Deploy the candidate with the highest objective metric value from the Autopilot leaderboard

Answer: D

Autopilot ranks candidates by objective metric.

Why this answer

SageMaker Autopilot's best candidate is determined by the objective metric. Option A is wrong because deploying all candidates wastes resources. Option B is wrong because the highest accuracy may not generalize; using the best objective metric is standard.

Option C is wrong because manual selection is subjective.

165

Multi-Select · hard

A data scientist is deploying a model on Amazon SageMaker for real-time inference. The model is a PyTorch model that requires custom inference code. The data scientist needs to handle variable-length inputs and optimize inference latency. Which THREE steps should the data scientist take? (Choose THREE.)

Select 3 answers

A.Enable SageMaker batch transform to process requests in batches.

B.Use the SageMaker PyTorch container without any modifications.

C.Set the endpoint to use multiple variants for A/B testing.

D.Use TorchScript to compile the model for optimized inference.

E.Provide a custom inference script (inference.py) that defines how to load the model and process requests.

Answers: A, D, E

Batching reduces latency for multiple requests.

Why this answer

The answer key marks Option A as correct on the grounds that SageMaker batch transform processes requests in batches, which can improve throughput and reduce per-request latency for variable-length inputs by grouping similar-sized inputs together. However, batch transform is designed for offline, asynchronous processing, and the question specifies real-time inference, so Option A does not fit the scenario. The steps that fit real-time inference with variable-length inputs and optimized latency are B, D, and E; treat Option A as a trap.

Exam trap

The exam often tests the distinction between batch transform (offline, asynchronous) and real-time inference (synchronous, low-latency), leading candidates to mistakenly select batch transform for real-time scenarios.

166

MCQ · medium

A company is deploying a real-time inference endpoint using Amazon SageMaker. The model is a large deep learning model that requires GPU inference. The company wants to minimize latency and cost. Which instance type and deployment strategy should be used?

A.Use a serverless inference endpoint with a GPU instance.

B.Use a real-time endpoint with a GPU instance and enable multi-model endpoints.

C.Use a batch transform job with a GPU instance.

D.Use an asynchronous inference endpoint with a GPU instance.

Answer: B

Multi-model endpoints reduce cost by sharing GPU across models.

Why this answer

Option B is correct because SageMaker real-time endpoints with multi-model endpoints allow hosting multiple models on a single GPU instance, reducing cost while maintaining low latency. Option A is wrong because Serverless Inference does not support GPUs. Option C is wrong because batch transform is not real-time.

Option D is wrong because asynchronous inference is not real-time.

167

MCQ · easy

A company uses Amazon SageMaker to deploy a model for real-time inference. The model is a linear regression model that was trained using the SageMaker built-in Linear Learner algorithm. The endpoint is configured with an ml.m5.large instance. After deployment, the company notices that the endpoint returns incorrect predictions. The training data was normalized, but the inference requests send raw feature values without normalization. What should the company do to fix the issue?

A.Retrain the model using raw data without normalization.

B.Change the endpoint instance type to a GPU instance to handle the raw data.

C.Create a SageMaker inference pipeline that includes a preprocessing step to normalize the input data before passing it to the model.

D.Use a batch transform job to preprocess the data before sending it to the endpoint.

Answer: C

Correct: This ensures real-time raw data is normalized before inference.

Why this answer

The model expects normalized input. The inference pipeline must include a preprocessing step to normalize the data. Using a SageMaker inference pipeline with a preprocessing container (e.g., scikit-learn) before the model container is the correct approach.

Option C is correct. Option A (retrain with raw data) is a viable alternative but would require retraining and may reduce model performance. Option B (change instance type) does not address the data mismatch.

Option D (batch transform job) is for batch inference, not real-time.
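The mismatch in question 167 comes down to applying the training-time statistics at inference. The plain-Python sketch below mimics the two pipeline stages (preprocess, then predict); it is not the SageMaker inference pipeline API, and the function names, weights, and data are hypothetical.

```python
def fit_scaler(rows):
    """Compute per-feature mean and standard deviation from the training rows."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    # Fall back to 1.0 for constant features to avoid dividing by zero.
    stds = [
        (sum((x - m) ** 2 for x in col) / n) ** 0.5 or 1.0
        for col, m in zip(columns, means)
    ]
    return means, stds


def normalize(row, means, stds):
    """Stage 1: apply the training-time statistics to a raw request."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def predict(normalized_row, weights, bias):
    """Stage 2: a linear model that expects normalized features."""
    return sum(w * x for w, x in zip(weights, normalized_row)) + bias


def pipeline_predict(raw_row, scaler, weights, bias):
    """Chain preprocessing and prediction, as an inference pipeline would."""
    means, stds = scaler
    return predict(normalize(raw_row, means, stds), weights, bias)
```

Calling `predict` directly on raw values skips stage 1, which is the bug the question describes.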

168

Multi-Select · easy

A machine learning engineer is setting up a training job in Amazon SageMaker. Which THREE components are required to define a training job? (Choose three.)

Select 3 answers

A.VPC configuration for network isolation.

B.Hyperparameters for the algorithm.

C.Output data configuration (e.g., model artifact path).

D.An algorithm or custom container image.

E.Input data configuration (e.g., S3 path).

Answers: C, D, E

Specifies where to save output.

Why this answer

Options C, D, and E are correct. An algorithm or container, input data configuration, and output data configuration are required. Option A is wrong because an IAM role is required, not a VPC (though VPC is common).

Option B is wrong because hyperparameters are optional.
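To make the components from question 168 concrete, here is a sketch of a request body shaped like the low-level CreateTrainingJob API. The helper name, channel name, instance type, volume size, and runtime limit are illustrative assumptions; hyperparameters are included only when supplied, and no VPC configuration appears.

```python
def build_training_job_request(job_name, image_uri, role_arn,
                               train_s3_uri, output_s3_uri,
                               hyperparameters=None, input_mode="File"):
    """Assemble a training job request from the components the question lists."""
    request = {
        "TrainingJobName": job_name,
        # Algorithm or custom container image.
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": input_mode,
        },
        "RoleArn": role_arn,
        # Input data configuration (S3 path).
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        # Output data configuration (model artifact path).
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # Placeholder compute settings, chosen only for illustration.
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }
    if hyperparameters:
        # The API expects hyperparameter values as strings.
        request["HyperParameters"] = {k: str(v) for k, v in hyperparameters.items()}
    return request
```

A dictionary like this would typically be passed as keyword arguments to `create_training_job` on a boto3 SageMaker client.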

169

MCQ · medium

A machine learning engineer is deploying a model using AWS Lambda for real-time inference. The model is a scikit-learn RandomForestClassifier with 100 trees, serialized as a pickle file of 150 MB. The Lambda function has 3 GB memory allocated. However, the inference requests are timing out after 30 seconds. What is the most likely cause?

A.scikit-learn is not compatible with AWS Lambda.

B.The Lambda function does not have enough memory to load the model.

C.The model is loaded from S3 on every invocation, causing high latency.

D.The Lambda function timeout is set too low; increase it to 5 minutes.

Answer: C

Lambda should load the model outside the handler to reuse across invocations, but even then, cold starts with a large model are slow.

Why this answer

Option C is correct because loading the model from S3 on every Lambda invocation introduces significant latency. Each invocation must download the 150 MB pickle file from S3 over the network, deserialize it, and then run inference, which easily exceeds the 30-second timeout. The model should be loaded once outside the handler (in global scope) and reused across invocations to avoid this overhead.

Exam trap

The exam often tests the misconception that the Lambda timeout is always the root cause of slow inference, when the real issue is inefficient resource initialization (like loading large models from S3 on every call) that can be fixed by architectural changes rather than simply increasing the timeout.

How to eliminate wrong answers

Option A is wrong because scikit-learn is fully compatible with AWS Lambda when included in the deployment package or as a Lambda layer. Option B is wrong because 3 GB of memory is more than sufficient to load a 150 MB model; memory is not the bottleneck here. Option D is wrong because increasing the timeout to 5 minutes would mask the underlying issue of inefficient model loading, not solve it; the real problem is the per-invocation S3 download latency, not the timeout value itself.

170

MCQ · easy

A machine learning team is using SageMaker to build a model. They need to track hyperparameter tuning experiments, compare results, and visualize metrics. Which SageMaker feature should they use?

A.SageMaker Experiments

B.SageMaker Ground Truth

C.SageMaker Model Monitor

D.SageMaker Hyperparameter Tuning

E.SageMaker Debugger

Answer: A

Experiments provides tracking, comparison, and visualization.

Why this answer

Option A is correct because SageMaker Experiments provides experiment tracking, comparison, and visualization. Option B (Ground Truth) is for labeling. Option C (Model Monitor) is for monitoring after deployment.

Option D (Hyperparameter Tuning) only tunes; it does not track. Option E (Debugger) is for debugging.

171

MCQ · medium

A company is building a recommendation system using Amazon SageMaker. The training data includes user-item interactions stored in a DataFrame with over 100 million rows. The data scientist wants to perform feature engineering, including one-hot encoding of categorical features with high cardinality. Which approach is MOST cost-effective and scalable?

A.Use Amazon EMR with Spark and store the processed data in HDFS.

B.Use SageMaker Processing with a Spark container to distribute the encoding job.

C.Use a SageMaker notebook instance with scikit-learn to perform the encoding in memory.

D.Use AWS Glue ETL jobs to perform the encoding and store the result in S3.

Answer: B

SageMaker Processing with Spark provides distributed processing and is cost-effective for large datasets.

Why this answer

Option B is correct because SageMaker Processing with a Spark job can scale horizontally and is cost-effective for large datasets. Option A is wrong because EMR is more complex and costly for a simple job. Option C is wrong because scikit-learn on a single instance may not handle 100M rows.

Option D is wrong because Glue is serverless but may be more expensive for large processing.

172

MCQ · hard

A team notices that a SageMaker training job using TensorFlow is running slower than expected. The training data is in S3 in TFRecord format. Which action is most likely to improve training throughput?

A.Use Pipe mode for data ingestion

B.Use distributed training with more instances

C.Increase the batch size in the training script

D.Switch from Pipe mode to File mode

Answer: A

Pipe mode streams data, reducing I/O wait time.

Why this answer

Option A is correct because SageMaker Pipe mode streams data directly from S3, eliminating download latency. Option B is wrong because adding instances adds complexity but does not improve per-instance throughput. Option C is wrong because increasing batch size may cause memory issues.

Option D is wrong because File mode is the default but is slower.

173

Matching · medium

Match each SageMaker built-in metric to its meaning.


Concepts

Matches

Fraction of correct predictions on validation set

Root mean square error on validation set

Area under ROC curve on validation set

Logistic loss on validation set

Harmonic mean of precision and recall on validation set

Why these pairings

These metrics are used to evaluate model performance.
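As a reference for the descriptions in question 173, four of the listed metrics can be computed by hand as in the sketch below. These are textbook definitions written in plain Python, not SageMaker's own implementation, and the function names are illustrative.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def log_loss(y_true, probabilities, eps=1e-15):
    """Logistic loss for binary labels and predicted probabilities of class 1."""
    total = 0.0
    for t, p in zip(y_true, probabilities):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Each function would be applied to validation-set labels and predictions to obtain the corresponding validation metric.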


174

MCQ · easy

A machine learning engineer is deploying a model to an Amazon SageMaker endpoint. The model requires GPU for inference. Which instance type should be selected?

A.ml.p3.2xlarge

B.ml.m5.large

C.ml.c5.xlarge

D.ml.r5.large

Answer: A

GPU instance suitable for inference.

Why this answer

Option A is correct because P3 instances (e.g., ml.p3.2xlarge) provide GPU capabilities. Options B, C, and D are CPU-only instances.

175

MCQ · medium

A company is using SageMaker to train a linear learner algorithm. The training log shows that the algorithm converges but the final loss is still high. Which change is most likely to improve the model?

A.Reduce the early stopping tolerance

B.Increase the maximum runtime

C.Add feature crosses or polynomial features

D.Increase the number of training instances

Answer: C

Linear models benefit from feature engineering to capture non-linear relationships.

Why this answer

Option C is correct: feature engineering can help linear models capture non-linear patterns. Option A (early stopping tolerance) only changes when training stops; the algorithm already converges, so it does not address underfitting. Option B (max runtime) does not affect model quality.

Option D (increasing instances) may not help if the model is underfitting.
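The feature engineering mentioned for question 175 can be illustrated with a small helper that appends squared terms and pairwise products to each row. This is a generic degree-2 expansion written for illustration, not part of the Linear Learner algorithm, and the function name is hypothetical.

```python
from itertools import combinations


def add_polynomial_and_cross_features(row):
    """Return the original features, then their squares, then pairwise products."""
    squares = [x * x for x in row]
    crosses = [a * b for a, b in combinations(row, 2)]
    return list(row) + squares + crosses
```

A target such as y = x1 * x2 cannot be fit by a linear model on the raw features, but it becomes a linear function of the expanded row because the product appears as its own feature.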

176

MCQ · medium

A SageMaker training job fails with the failure reason shown in the exhibit. What is the most likely cause?

A.The training instance ran out of memory

B.The S3 bucket with training data is not accessible

C.The SageMaker service limit for the instance type has been exceeded

D.There is an error in the custom training script

Answer: D

ExecuteUserScriptError with ExitCode 1 indicates a script error.

Why this answer

Option D is correct: ExitCode 1 from UserScript indicates an error in the training script. Option A (insufficient memory) would show an OutOfMemory error. Option B (S3 access) would show AccessDenied.

Option C (instance limit) would show ResourceLimitExceeded.

177

MCQ · easy

A machine learning engineer needs to store and version datasets for reproducibility. Which AWS service is designed for this purpose?

A.AWS CodeCommit

B.Amazon S3

C.SageMaker Feature Store

D.Amazon Redshift

Answer: C

Feature Store is designed for feature storage, versioning, and retrieval.

Why this answer

SageMaker Feature Store stores, manages, and versions features for ML. Option C is correct. Option A is for code versioning.

Option B is a general object store. Option D is for data warehousing.

178

MCQ · medium

An IAM policy attached to a SageMaker execution role is shown in the exhibit. When a data scientist tries to create a training job that writes logs to CloudWatch Logs, the job fails. What is the MOST likely reason?

A.The policy does not specify the SageMaker API version

B.The S3 bucket policy denies access to the training job

C.The policy lacks permissions for CloudWatch Logs actions

D.The policy has an implicit deny for SageMaker actions

Answer: C

Training jobs need logs:CreateLogGroup, logs:CreateLogStream, logs:PutLogEvents.

Why this answer

Option C is correct because the policy does not include CloudWatch Logs permissions. Option A is incorrect because IAM policies do not specify a service API version. Option B is incorrect because the S3 actions are allowed.

Option D is incorrect because the SageMaker actions are allowed for all resources and there is no deny statement.
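Since the exhibit for question 178 is not reproduced here, the sketch below uses a hypothetical starting policy and shows one way to add the three CloudWatch Logs actions named in the answer. The resource ARN pattern is an assumption and should be narrowed to the account and region in real use.

```python
import json


def cloudwatch_logs_statement():
    """Statement granting the log actions the answer says training jobs need."""
    return {
        "Effect": "Allow",
        "Action": [
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:PutLogEvents",
        ],
        # Assumed pattern covering SageMaker log groups and their streams.
        "Resource": "arn:aws:logs:*:*:log-group:/aws/sagemaker/*",
    }


def add_logs_permissions(policy):
    """Return a copy of `policy` with the CloudWatch Logs statement appended."""
    updated = json.loads(json.dumps(policy))  # deep copy via JSON round trip
    updated.setdefault("Statement", []).append(cloudwatch_logs_statement())
    return updated
```

The resulting dictionary can be serialized with `json.dumps` and attached to the execution role through whatever tooling manages the role's policies.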

179

MCQ · medium

A company is using Amazon SageMaker to train a deep learning model. The training job is failing with an error 'CUDA out of memory'. The training instance is an ml.p3.2xlarge with 16 GB GPU memory. The model architecture and batch size are appropriate for this instance size. Which change is most likely to resolve this error?

A.Reduce the number of epochs.

B.Increase the number of GPUs by using a distributed training instance type.

C.Enable automatic mixed precision (AMP) training to reduce memory usage.

D.Use a smaller instance type to force lower memory usage.

Answer: C

AMP uses FP16 where possible, which reduces the memory needed for activations and often resolves out-of-memory errors.

Why this answer

Option C is correct because enabling automatic mixed precision (AMP) training reduces GPU memory usage by storing tensors in half precision (FP16) where possible, while keeping critical operations in full precision (FP32). This directly addresses the 'CUDA out of memory' error on an ml.p3.2xlarge instance (16 GB GPU memory) without changing the model architecture or batch size, which are already appropriate.

Exam trap

The trap here is that candidates may incorrectly assume the solution is to reduce epochs (Option A) or scale out to more GPUs (Option B), when the root cause is memory exhaustion per GPU, which is best addressed by mixed precision training to shrink the memory footprint without altering the model or batch size.

How to eliminate wrong answers

Option A is wrong because reducing the number of epochs does not affect peak GPU memory usage during training; it only changes the total training time, not the memory footprint per batch. Option B is wrong because adding GPUs through data-parallel distributed training (e.g., ml.p3.16xlarge) replicates the full model on every GPU, so each GPU still has to hold the weights, gradients, and activations for its batch. Option D is wrong because a smaller instance type does not provide more GPU memory, so it cannot resolve the out-of-memory error.

180

MCQ · hard

A company uses Amazon SageMaker to train a text classification model. The training data is stored in S3 and contains sensitive personally identifiable information (PII). The company must ensure that the data is encrypted at rest in S3 and that the encryption key is managed by the company's own hardware security module (HSM). Which configuration should be used?

A.Use S3 server-side encryption with S3-managed keys (SSE-S3)

B.Use client-side encryption with the encryption key stored in the HSM

C.Use S3 server-side encryption with customer-provided keys (SSE-C) and store the keys in the HSM

D.Use S3 server-side encryption with AWS KMS managed keys (SSE-KMS) with a customer managed key

AnswerC

SSE-C allows customers to provide their own keys, which can be stored in an HSM.

Why this answer

Option C is correct because SSE-C allows customers to provide their own encryption keys, which can be stored in an HSM. Option A (SSE-S3) uses keys that AWS manages. Option D (SSE-KMS with a customer managed key) keeps the key in AWS KMS, not in the company's HSM.

Option B (client-side encryption) requires the application to perform and manage encryption itself.
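As a rough illustration of what "customer-provided key" means in practice, the sketch below builds the three request headers that S3 expects for SSE-C. The key here comes from `os.urandom` purely as a stand-in; in the scenario above it would be retrieved from the company's HSM, and that retrieval step is not shown.

```python
import base64
import hashlib
import os


def sse_c_headers(key: bytes) -> dict:
    """Build the SSE-C request headers for a 256-bit customer-provided key."""
    if len(key) != 32:
        raise ValueError("SSE-C expects a 256-bit (32-byte) key")
    key_b64 = base64.b64encode(key).decode("ascii")
    # S3 uses the MD5 digest of the key only as an integrity check on the key.
    md5_b64 = base64.b64encode(hashlib.md5(key).digest()).decode("ascii")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": key_b64,
        "x-amz-server-side-encryption-customer-key-MD5": md5_b64,
    }


# Stand-in key for illustration only; a real key would come from the HSM.
example_key = os.urandom(32)
headers = sse_c_headers(example_key)
```

Because S3 does not store the key, the same headers must accompany every later read of the object, which is the operational cost of this option.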

181

MCQmedium

A machine learning team is deploying a model using Amazon SageMaker. The model inference code runs on GPUs and requires a custom container. The team wants to minimize cold start latency. Which SageMaker hosting option should they use?

A.Use a multi-model endpoint with GPU instances.

B.Use a serverless inference endpoint.

C.Use a real-time endpoint with multiple production variants for redundancy.

D.Use a real-time endpoint with a single production variant using a GPU instance.

AnswerD

Real-time endpoints with GPU instances minimize cold start latency for custom containers.

Why this answer

Real-time endpoints keep their instances provisioned and the model loaded, so a single production variant on a GPU instance is the standard choice for low-latency inference. Multi-model endpoints host multiple models on the same endpoint and load them on demand, so the first request to a model that is not yet in memory incurs a cold start; they offer no benefit for a single model. Serverless inference does not support GPU.

Multiple production variants are for A/B testing and do not reduce cold start latency. Batch transform is for offline inference.

182

MCQeasy

A company wants to use SageMaker to host multiple models behind a single endpoint to reduce costs. Which SageMaker feature should they use?

A.SageMaker Elastic Inference

B.SageMaker inference pipeline

C.SageMaker batch transform

D.SageMaker multi-container endpoints

E.SageMaker Multi-Model Endpoints

AnswerE

Multi-Model Endpoints host multiple models on the same endpoint.

Why this answer

Option E is correct because SageMaker Multi-Model Endpoints allow multiple models on the same endpoint. Option D (multi-container endpoints) is for different containers, not multiple models. Option C (batch transform) is offline.

Option B (inference pipeline) chains containers. Option A (Elastic Inference) accelerates inference.

183

MCQhard

A data scientist is training a model using Amazon SageMaker. The training dataset is 500 GB and is stored in S3. The data scientist wants to use Pipe input mode to stream data directly from S3 to the training container. However, the training job fails with an error indicating that the container cannot read the data. What is the most likely cause?

A.The training instance does not have enough memory

B.The IAM role does not have s3:GetObject permission

C.The data is compressed and Pipe mode cannot handle compressed data

D.The S3 bucket is in a different Region

E.The training algorithm does not support Pipe mode

AnswerE

Not all algorithms support Pipe input; they need to read from a pipe.

Why this answer

Option E is correct because Pipe mode requires the training algorithm to read its input from a FIFO pipe as a sequential stream rather than from random-access files. Option D (bucket Region) is not the issue here because the bucket is accessible. Option A (instance memory) is not specific to Pipe mode.

Option B (IAM role) is needed, but if permissions are correct, the issue is algorithm support. Option C (compression) is not a problem for Pipe mode.
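The difference between file access and pipe access can be sketched as follows. This is a simplified illustration, not SageMaker's own reader: `read_records` and the fixed record size are hypothetical, and `io.BytesIO` stands in for the named pipe that a Pipe-mode container would open. The point is that the reader only moves forward and never seeks or asks for the total length.

```python
import io
from typing import BinaryIO, Iterator


def read_records(stream: BinaryIO, record_size: int) -> Iterator[bytes]:
    """Yield fixed-size records from a forward-only stream (no seek, no length)."""
    while True:
        record = stream.read(record_size)
        if not record:
            # End of stream: the writer closed its end of the pipe.
            break
        yield record


# BytesIO is only a stand-in for the FIFO; three 4-byte records.
fake_pipe = io.BytesIO(b"aaaabbbbcccc")
records = list(read_records(fake_pipe, 4))
```

Training code that instead lists a directory, reopens files, or seeks backwards would fail against a pipe, which is the kind of incompatibility the explanation above describes.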

184

Multi-Selecthard

A company is training a deep learning model on SageMaker using multiple GPUs. The training is slow due to inefficient data loading. Which TWO actions can improve I/O performance?

Select 2 answers

A.Use instance store volumes for data.

B.Increase the instance count to a single large instance.

C.Use Pipe mode input for training data.

D.Use Amazon EBS volumes attached to training instances.

E.Use Amazon EFS as a shared file system.

AnswersC, E

Pipe mode streams data directly from S3, reducing I/O bottlenecks.

Why this answer

Options C and E are correct. Using Pipe mode streams data directly from S3, avoiding local disk writes. Using Amazon EFS can also provide shared, high-throughput storage.

Option D is incorrect because SageMaker does not support EBS volumes as input directly; you would need to use FSx or EFS. Option A is incorrect because instance store volumes are ephemeral and not suitable for persistent data. Option B is incorrect because a single large instance may not improve I/O parallelism.

185

MCQmedium

Refer to the exhibit. An IAM policy is attached to an IAM role used by a SageMaker training job. The training job fails with an access denied error when trying to write model artifacts to an S3 bucket. What is the most likely cause?

A.The IAM role does not have permission to write to the S3 bucket

B.The training job is trying to write to a different S3 bucket

C.The IAM role does not have permission to read the training data

D.The IAM role does not have permission to create training jobs

AnswerA

The policy lacks s3:PutObject, so writing model artifacts is denied.

Why this answer

Option A is correct because the policy only allows s3:GetObject, not s3:PutObject, so the training job cannot write artifacts. Option D is wrong because the policy allows sagemaker:CreateTrainingJob. Option C is wrong because the policy allows s3:GetObject for the training data.

Option B is wrong because the policy does not restrict the S3 bucket; it allows GetObject on a specific path.
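The exhibit is not reproduced here, so the policy below is a hypothetical stand-in with the same shape the explanation describes (training-job creation and S3 reads allowed, no S3 writes). The checker is a deliberately simplified sketch: it looks only at Allow statements and action names, and ignores Deny statements, resources, and conditions, all of which real IAM evaluation takes into account.

```python
from fnmatch import fnmatchcase

# Hypothetical policy; bucket name and path are placeholders.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["sagemaker:CreateTrainingJob", "s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/training/*",
        }
    ],
}


def is_action_allowed(policy: dict, action: str) -> bool:
    """Simplified check: True if any Allow statement lists the action."""
    for statement in policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        # IAM action names are case-insensitive and may contain wildcards.
        if any(fnmatchcase(action.lower(), pattern.lower()) for pattern in actions):
            return True
    return False


can_read = is_action_allowed(policy, "s3:GetObject")
can_write = is_action_allowed(policy, "s3:PutObject")
```

With this stand-in policy, reading is allowed and writing is not, which matches the access-denied failure when the job uploads model artifacts.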

186

Multi-Selectmedium

A data scientist is deploying a machine learning model on Amazon SageMaker for real-time inference. The model requires low-latency predictions and must be able to handle up to 1000 requests per second. Which TWO actions should the data scientist take to ensure the endpoint can meet the performance requirements? (Choose 2.)

Select 2 answers

A.Use a multi-model endpoint to host multiple models on the same instance.

B.Enable data capture to Amazon S3 for model monitoring and retraining.

C.Use a serverless inference endpoint to automatically scale.

D.Configure an auto scaling policy for the endpoint based on invocation metrics.

E.Deploy the model on a single large instance (e.g., ml.p3.16xlarge).

AnswersB, D

Data capture logs predictions for monitoring and retraining, which is a best practice.

Why this answer

Option B is correct because enabling data capture to S3 allows model monitoring and retraining. Option D is correct because auto scaling adjusts instances based on load. Option C is wrong because serverless inference has a cold start and max concurrency limits unsuitable for 1000 TPS.

Option E is wrong because increasing instance size alone may not be cost-effective and auto scaling is better. Option A is wrong because multi-model endpoints share resources and may cause contention.

187

MCQeasy

A machine learning engineer needs to deploy a TensorFlow model to a SageMaker endpoint. The model expects a specific input format. The engineer has the model artifacts stored in an S3 bucket. Which step is REQUIRED to deploy the model?

A.Register the model in SageMaker Model Registry.

B.Create a SageMaker training job to re-train the model.

C.Save the model as a SavedModel format.

D.Create a SageMaker Model object using the TensorFlow serving image.

AnswerD

A SageMaker Model object is required to specify the container and artifact location for deployment.

Why this answer

Creating a SageMaker Model object (Option D) is required to specify the image and artifact location, which is then used to deploy an endpoint. Option C (saving as SavedModel) is already done. Option A (registering in Model Registry) is optional.

Option B (creating a training job) is not needed for deployment.
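A Model object is essentially a pairing of a container image with an artifact location and an execution role. The sketch below only builds the request body; the helper name and every value passed to it are placeholders, and the dictionary uses the field names of the SageMaker CreateModel API so that it could, in principle, be passed to `boto3.client("sagemaker").create_model(**request)`.

```python
def build_create_model_request(
    model_name: str, image_uri: str, model_data_url: str, role_arn: str
) -> dict:
    """Assemble a CreateModel request body (no AWS call is made here)."""
    if not model_data_url.startswith("s3://"):
        raise ValueError("model artifacts are expected at an s3:// location")
    return {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": image_uri,               # serving container image
            "ModelDataUrl": model_data_url,   # model.tar.gz in S3
        },
        "ExecutionRoleArn": role_arn,
    }


# All values below are hypothetical placeholders.
request = build_create_model_request(
    model_name="example-tf-model",
    image_uri="<tensorflow-serving-image-uri>",
    model_data_url="s3://example-bucket/models/model.tar.gz",
    role_arn="arn:aws:iam::111122223333:role/ExampleSageMakerRole",
)
```

An endpoint configuration then refers to the model by `ModelName`, which is why this step cannot be skipped even when the artifacts already exist.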

188

MCQhard

A SageMaker endpoint creation fails with the above CloudWatch Logs excerpt. What is the MOST likely cause?

A.The S3 bucket containing the model artifacts has incorrect permissions

B.The inference script has a syntax error

C.The instance type does not have enough memory to load the model

D.The model file is too large and takes longer than 300 seconds to load

AnswerD

The timeout indicates the model loading exceeds the default 300 seconds.

Why this answer

Option D is correct because the model file may be too large to load within the default timeout. Option B (inference script error) would cause a different error. Option A (incorrect S3 permissions) would appear earlier.

Option C (instance memory) would cause resource issues but not a timeout on loading.

189

MCQhard

A company is deploying a real-time inference endpoint using SageMaker. The model has a high memory footprint and requires GPU acceleration. Which instance type and configuration should be used to minimize cost while meeting latency requirements?

A.ml.p3.2xlarge with 1 GPU

B.ml.g4dn.xlarge with 1 GPU

C.ml.c5.xlarge with no GPU

D.ml.p3.16xlarge with 8 GPUs

AnswerA

Good balance of GPU and memory for high-memory models at reasonable cost.

Why this answer

Option A is correct because ml.p3.2xlarge provides GPU acceleration with sufficient memory for a high-memory model and is cost-effective for real-time inference. Option C is incorrect because ml.c5.xlarge does not have a GPU. Option D is incorrect because ml.p3.16xlarge is too large and expensive.

Option B is incorrect because ml.g4dn.xlarge has less memory and may not be suitable for a high memory footprint.

190

Multi-Selecteasy

Which TWO AWS services can be used to deploy a trained model for serverless inference? (Select TWO.)

Select 2 answers

A.AWS Lambda with a container image

B.Amazon SageMaker Serverless Inference

C.Amazon SageMaker batch transform

D.Amazon Elastic Container Service (ECS) with Fargate

E.Amazon EC2 instances

AnswersA, B

Serverless compute for small models.

Why this answer

SageMaker Serverless Inference automatically scales. Lambda can serve models if they fit within its limits. Option C is wrong because SageMaker batch transform is not serverless real-time.

Option D is wrong because ECS is not serverless (requires management). Option E is wrong because EC2 is not serverless.

191

MCQmedium

An ML team is using Amazon SageMaker to train a model. They notice that the training job is taking longer than expected and the CloudWatch metrics show high GPU utilization but low CPU utilization. Which action is MOST likely to improve training speed?

A.Use SageMaker Pipe mode to stream data from S3 to reduce I/O bottleneck

B.Switch to a CPU-only instance to avoid GPU overhead

C.Increase the number of training instances

D.Use a larger GPU instance with more GPU memory

AnswerA

Pipe mode reduces time spent on data loading, allowing GPU to be more utilized.

Why this answer

Option A is correct because high GPU utilization indicates the GPU is busy, but low CPU may indicate a bottleneck in data loading; using Pipe mode can reduce I/O wait. Option C (increase instance count) may help if the job is parallelizable but not if the bottleneck is data loading. Option D (larger GPU instance with more GPU memory) does not address data loading.

Option B (use a CPU-only instance) would slow down training.

192

MCQhard

A company is using SageMaker to train a model with a large dataset that is stored in S3. The training job is taking a long time due to high I/O latency. The team has already converted the data to RecordIO format. What should they do next to reduce I/O latency?

A.Use SageMaker fast file mode

B.Use multiple training instances

C.Use Amazon FSx for Lustre as the training data source

D.Shuffle the data before training

E.Use Pipe mode to stream the RecordIO data

AnswerE

Pipe mode avoids disk I/O by streaming data directly from S3.

Why this answer

Option E is correct because using Pipe mode with RecordIO streams data directly, reducing I/O. Option D (shuffle) does not reduce I/O. Option C (FSx for Lustre) provides a high-performance file system but adds complexity.

Option A (fast file mode) is still File mode. Option B (multiple instances) may increase throughput but not per-instance I/O.

193

MCQmedium

A company runs a machine learning pipeline on Amazon SageMaker. The pipeline consists of three steps: data preprocessing (using a custom container), training (using a built-in algorithm), and model evaluation (using a custom container). The pipeline is orchestrated using AWS Step Functions. Recently, the pipeline has been failing intermittently at the model evaluation step with a 'TimeoutError'. The evaluation step runs a Python script that loads the trained model and a test dataset from S3, computes metrics, and writes results back to S3. The step is configured with a timeout of 600 seconds. The test dataset size has grown over time. The data science team suspects that the timeout is due to the increased data size. They want a solution that minimizes changes to the existing infrastructure and avoids increasing the timeout arbitrarily. Which approach should the team take?

A.Increase the timeout to 1200 seconds and use a larger instance type for the evaluation step.

B.Increase the timeout to 1800 seconds to accommodate the larger dataset.

C.Modify the evaluation script to process the test dataset in parallel batches, and use multiprocessing to distribute the workload within the same container.

D.Switch the evaluation step to use the 'ml.m5.4xlarge' instance type for more memory and compute.

AnswerC

Reduces wall-clock time without increasing timeout or instance size.

Why this answer

Option C is correct because it addresses the root cause—the evaluation script's inability to process the growing dataset within the 600-second timeout—by parallelizing the workload within the same container. This approach minimizes infrastructure changes (no instance type or timeout increase) and leverages Python's multiprocessing to reduce wall-clock time, directly tackling the 'TimeoutError' without arbitrary timeout extensions.

Exam trap

The trap here is that candidates often default to scaling up infrastructure (larger instances or higher timeouts) instead of optimizing the code, which is a classic 'throw hardware at the problem' misconception that the MLS-C01 exam tests by rewarding efficient, cost-conscious solutions.

How to eliminate wrong answers

Option A is wrong because increasing both timeout and instance type is an over-engineered solution that introduces unnecessary cost and complexity, and it does not address the underlying inefficiency in processing the dataset sequentially. Option B is wrong because simply increasing the timeout to 1800 seconds is a temporary band-aid that does not fix the performance bottleneck; as the dataset continues to grow, the timeout will need to be increased again, leading to an unsustainable pattern. Option D is wrong because switching to a larger instance type (ml.m5.4xlarge) only provides more memory and compute but does not change the sequential processing logic; the script will still take the same amount of time (or only marginally less) and may still hit the timeout if the dataset is large enough.
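The batch-parallel approach could look roughly like the sketch below. It is an illustration under stated assumptions, not the team's script: examples are reduced to `(label, prediction)` pairs, accuracy stands in for whatever metrics the real script computes, and all function names are hypothetical. Each worker scores one batch and the partial counts are merged at the end.

```python
from multiprocessing import Pool
from typing import List, Sequence, Tuple

Example = Tuple[int, int]  # (label, prediction); a stand-in for real test rows


def make_batches(items: Sequence[Example], batch_size: int) -> List[Sequence[Example]]:
    """Split the dataset into consecutive batches of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def score_batch(batch: Sequence[Example]) -> Tuple[int, int]:
    """Return (correct, total) for one batch; runs independently per worker."""
    correct = sum(1 for label, prediction in batch if label == prediction)
    return correct, len(batch)


def merge(partials: Sequence[Tuple[int, int]]) -> float:
    """Combine per-batch counts into overall accuracy."""
    correct = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return correct / total if total else 0.0


def evaluate_parallel(items: Sequence[Example], batch_size: int, workers: int) -> float:
    """Score batches in separate processes inside the same container."""
    with Pool(processes=workers) as pool:
        partials = pool.map(score_batch, make_batches(items, batch_size))
    return merge(partials)
```

A script would call `evaluate_parallel(data, batch_size, workers)` from under an `if __name__ == "__main__":` guard. The speed-up depends on the evaluation being CPU-bound and on the instance having spare cores; if most of the time goes to downloading the dataset from S3, parallel scoring alone helps less.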

194

MCQeasy

A startup is using SageMaker to train a model using the built-in XGBoost algorithm. The training job runs successfully but the resulting model performs poorly on the test data. The data scientist suspects overfitting. The training data is relatively small (10,000 rows). Which action should be taken to reduce overfitting?

A.Decrease the number of trees (num_round) to 50

B.Increase the learning rate to 0.3

C.Increase the number of trees (num_round) to 500

D.Use a larger instance type

AnswerA

Fewer trees reduce overfitting.

Why this answer

Option A is correct because reducing the number of trees (or early stopping) reduces overfitting. Option C is wrong because increasing trees increases overfitting. Option B is wrong because increasing learning rate may cause divergence.

Option D is wrong because using a larger instance does not affect overfitting.
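The early-stopping idea mentioned above can be sketched as a small selection rule over per-round validation losses. The function and the loss values are hypothetical and only illustrate the logic; the SageMaker XGBoost algorithm exposes a comparable `early_stopping_rounds` hyperparameter, which would normally be used instead of hand-written code.

```python
from typing import Sequence


def best_round(validation_losses: Sequence[float], patience: int) -> int:
    """Return the 1-based boosting round to keep.

    Training is considered stopped once the validation loss has not
    improved for `patience` consecutive rounds.
    """
    best_index = 0
    best_loss = float("inf")
    for index, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss = loss
            best_index = index
        elif index - best_index >= patience:
            break
    return best_index + 1


# Made-up validation losses: improvement stalls after round 4.
losses = [0.60, 0.50, 0.45, 0.44, 0.46, 0.47, 0.48, 0.43]
chosen = best_round(losses, patience=3)
```

With a patience of 3, the rule stops before it ever sees the late dip in the last round, which shows the trade-off: a small patience limits overfitting and training time but can miss later improvements.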

195

MCQmedium

Refer to the exhibit. An IAM policy is attached to a SageMaker notebook instance. Which action will the notebook be able to perform?

A.Create a training job

B.Create a model

C.Read data from S3

D.Invoke a SageMaker endpoint

AnswerD

The policy explicitly allows sagemaker:InvokeEndpoint.

Why this answer

Option D is correct because the policy allows sagemaker:InvokeEndpoint. Option A is wrong because sagemaker:CreateTrainingJob is not allowed. Option C is wrong because s3:GetObject is not allowed.

Option B is wrong because sagemaker:CreateModel is not allowed.

196

MCQmedium

A data scientist is using Amazon SageMaker to train a model and wants to use a custom Docker container for training. The container requires access to a private Amazon ECR repository. Which IAM role configuration is needed?

A.Attach an IAM policy to the SageMaker execution role that allows ecr:GetDownloadUrlForLayer, ecr:BatchGetImage, and ecr:GetAuthorizationToken for the ECR repository.

B.Use the AWS account owner's IAM role as the SageMaker execution role.

C.Create a new IAM user with ECR access and store credentials in SageMaker.

D.Add a bucket policy to the ECR repository allowing access from the SageMaker execution role.

AnswerA

These permissions allow SageMaker to pull the container image.

Why this answer

Option A is correct because the SageMaker execution role must have an IAM policy that allows ecr:GetDownloadUrlForLayer, ecr:BatchGetImage, and ecr:GetAuthorizationToken for the ECR repository. Option D is wrong because ECR repositories do not have bucket policies; bucket policies belong to S3. Option B is wrong because SageMaker does not use the account owner's role.

Option C is wrong because SageMaker assumes an execution role rather than using stored IAM user credentials; the execution role is the appropriate entity to grant permissions to.
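A policy of the kind described above might be assembled as in the following sketch. The repository ARN is a placeholder, and `ecr:BatchCheckLayerAvailability` is an addition that is not named in the question but is commonly granted alongside the other pull actions. Note that `ecr:GetAuthorizationToken` does not support resource-level permissions, so it sits in its own statement with `"Resource": "*"`.

```python
import json


def ecr_pull_policy(repository_arn: str) -> dict:
    """Build an identity policy that lets a role pull images from one repository."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:BatchGetImage",
                    # Not in the question; commonly included for image pulls.
                    "ecr:BatchCheckLayerAvailability",
                ],
                "Resource": repository_arn,
            },
            {
                # The token action cannot be scoped to a repository.
                "Effect": "Allow",
                "Action": "ecr:GetAuthorizationToken",
                "Resource": "*",
            },
        ],
    }


# Placeholder account ID, Region, and repository name.
pull_policy = ecr_pull_policy(
    "arn:aws:ecr:us-east-1:111122223333:repository/example-training-image"
)
policy_json = json.dumps(pull_policy, indent=2)
```

The resulting JSON would be attached to the SageMaker execution role, not to an IAM user.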

197

MCQmedium

A machine learning engineer is deploying a custom XGBoost model for real-time inference on Amazon SageMaker. The model was trained using the SageMaker XGBoost built-in algorithm. The endpoint is deployed with an ml.m5.large instance and is receiving around 50 requests per second. The engineer notices that the endpoint's latency is around 200 ms, but the requirement is under 100 ms. The model's serialized format is a .tar.gz file. The engineer wants to reduce inference latency without modifying the model or retraining. What should the engineer do?

A.Configure SageMaker Debugger to optimize the inference code.

B.Use SageMaker Elastic Inference to attach an accelerator.

C.Use SageMaker Neo to compile the model for the target instance.

D.Use SageMaker Batch Transform instead of a real-time endpoint.

AnswerC

Neo optimizes model for faster inference on specific hardware.

Why this answer

Option C is correct because SageMaker Neo optimizes trained models for target hardware, reducing latency. Option D is wrong because SageMaker Batch Transform is for batch, not real-time. Option A is wrong because SageMaker Debugger monitors training, not inference.

Option B is wrong because Elastic Inference attaches GPU acceleration for deep learning, not XGBoost which is tree-based.

198

Multi-Selecthard

A company is deploying a machine learning model using Amazon SageMaker. The model must be updated frequently without downtime. Which TWO strategies can achieve this? (Choose two.)

Select 2 answers

A.Update the model artifact on the existing endpoint.

B.Delete the existing endpoint and create a new one.

C.Use blue/green deployment with endpoint variants.

D.Use rolling update with multiple instances.

E.Use canary deployment by gradually shifting traffic.

AnswersC, E

Traffic is shifted gradually.

Why this answer

Options C and E are correct. Blue/green deployment and canary deployment both allow zero-downtime updates by shifting traffic to a new fleet while the old one keeps serving. Option B is wrong because deleting and recreating the endpoint causes downtime.

Option A is wrong because updating the model directly on the endpoint is not supported without creating a new endpoint. Option D is wrong because SageMaker does not support rolling updates natively.

199

MCQhard

A company is using Amazon SageMaker to train a model on data stored in S3. The training job needs to access data from an S3 bucket in a different AWS account. The data owner has granted cross-account access via a bucket policy. However, the training job fails with an AccessDenied error. What is the MOST likely cause?

A.The data is encrypted with SSE-KMS and the SageMaker role lacks KMS permissions.

B.The SageMaker execution role does not have the necessary permissions to access the S3 bucket.

C.The S3 bucket is not configured with public access.

D.The S3 bucket is in a different region and requires a VPC endpoint.

AnswerB

The IAM role used by SageMaker must be allowed in the bucket policy.

Why this answer

Option B is correct because SageMaker training jobs use an execution role; that role must be granted cross-account access via the bucket policy, and the role's own IAM policy must also allow the S3 actions. Option D is wrong because S3 does not require VPC endpoints for cross-account access. Option C is wrong because the data does not need to be public.

Option A is wrong because the scenario gives no indication that the data is encrypted with SSE-KMS.

200

MCQhard

A company uses SageMaker Ground Truth to label images for object detection. After labeling, they notice that the bounding boxes are often misaligned with the objects. Which action should they take to improve label quality?

A.Use a pre-built annotation tool that enforces bounding box alignment

B.Use automated labeling with a pre-trained model

C.Increase the number of workers per task

D.Adjust the confidence threshold for the model

AnswerA

Tool constraints improve consistency.

Why this answer

Option A is correct because using a pre-built annotation tool reduces variability. Option C is wrong because increasing workers does not guarantee consistency. Option B is wrong because automated labeling may not be accurate initially.

Option D is wrong because adjusting confidence threshold is for post-processing, not labeling.

201

MCQhard

Refer to the exhibit. A SageMaker endpoint is returning 5xx errors. The logs show the above error. Which change will most likely resolve the issue?

A.Reduce the batch size in the inference script

B.Enable Auto Scaling on the endpoint

C.Compress the model artifact

D.Use a larger instance type with more memory

AnswerD

More memory solves OutOfMemoryError.

Why this answer

Option D is correct because a larger instance type with more memory addresses the OutOfMemoryError. Option A is wrong because batch size is not relevant for inference (single request). Option C is wrong because the error is memory, not model file size.

Option B is wrong because Auto Scaling adds instances but does not change the memory available on each instance.

202

MCQeasy

Refer to the exhibit. A data scientist runs the AWS CLI command to create a SageMaker training job. The training job fails because the input data is not accessible. Which step should the data scientist take to fix the issue?

A.Attach an IAM policy to SageMakerRole that grants s3:GetObject on the bucket

B.Add a VpcConfig to the training job

C.Modify the bucket policy to allow s3:GetObject for any principal

D.Increase VolumeSizeInGB to 50

AnswerA

The role needs explicit S3 read permissions.

Why this answer

Option A is correct because the IAM role must have s3:GetObject permission for the S3 bucket. Option C is wrong because the bucket policy is separate, and opening it to any principal would expose the data. Option D is wrong because VolumeSizeInGB is for local storage, not S3.

Option B is wrong because VPC configuration is not the issue.

203

Multi-Selectmedium

A data scientist is using SageMaker to train a model and wants to track experiments, including hyperparameters and metrics. Which TWO actions should the scientist take to set up experiment tracking? (Choose TWO.)

Select 2 answers

A.Use the SageMaker Experiments Python SDK to create an experiment and log runs.

B.Enable SageMaker Model Monitor to track training metrics.

C.Configure CloudWatch Logs to store experiment data.

D.Create a trial component in the experiment to log hyperparameters and metrics.

E.Enable SageMaker Studio to automatically capture experiments.

AnswersA, D

Directly supports experiment tracking.

Why this answer

Option A is correct because the SageMaker Experiments Python SDK provides the primary interface for creating and managing experiments, allowing the data scientist to log runs, hyperparameters, and metrics in a structured way. This SDK directly integrates with SageMaker training jobs and notebook executions to capture experiment metadata.

Exam trap

The exam often tests the distinction between monitoring (Model Monitor) and experiment tracking (Experiments SDK), and the trap here is that candidates mistake CloudWatch Logs or Model Monitor for valid tools for structured experiment metadata capture when they are not designed for that purpose.

204

MCQeasy

A data scientist is using Amazon SageMaker to train a model using a built-in algorithm. The training job is taking a long time, and the data scientist wants to improve performance by using a larger instance type with more vCPUs. The training job is currently using an ml.m5.large instance. The data scientist changes the instance type to ml.m5.4xlarge and resubmits the training job. However, the training time does not decrease significantly. What is the MOST likely reason?

A.The algorithm is single-threaded and cannot use multiple vCPUs.

B.The built-in algorithm is not designed to scale with additional vCPUs.

C.The training job is I/O bound, and increasing vCPUs does not help.

D.The training dataset is too small to benefit from more vCPUs.

AnswerB

Correct: Some algorithms are not parallelized and do not benefit from more vCPUs.

Why this answer

The built-in algorithm may not be able to use additional vCPUs effectively if it is not parallelized, so Option B is correct. Option D is incorrect because data size does not prevent parallelism.

Option A is incorrect because the algorithm is not inherently single-threaded; it depends on the implementation. Option C is less likely because nothing in the scenario points to an I/O bottleneck.

205

MCQmedium

A company has deployed a model on SageMaker for real-time inference. The endpoint is experiencing high latency during traffic spikes. Which action should the company take to reduce latency?

A.Use a larger instance type for the endpoint

B.Attach SageMaker Elastic Inference to the endpoint

C.Enable SageMaker endpoint auto-scaling

D.Use SageMaker Neo to compile the model

E.Switch to SageMaker batch transform

AnswerC

Auto-scaling adds instances during spikes, reducing latency.

Why this answer

Option C is correct because enabling auto-scaling adds instances during spikes, reducing latency. Option A (larger instance) may help but is not cost-effective. Option E (batch transform) is for offline inference, not real-time requests.

Option B (Elastic Inference) accelerates inference but does not handle spikes. Option D (SageMaker Neo) optimizes the model for its target hardware but does not add capacity during spikes.

206

Multi-Selecteasy

A company wants to monitor SageMaker endpoints for data drift. Which TWO services can be used together to detect and alert on drift?

Select 2 answers

A.SageMaker Data Wrangler

B.SageMaker Model Monitor

C.AWS CodePipeline

D.Amazon CloudWatch Alarms

E.Amazon CloudWatch Logs

AnswersB, D

Model Monitor detects drift by analyzing captured endpoint data on a schedule.

Why this answer

SageMaker Model Monitor detects drift and CloudWatch Alarms can send alerts, so Options B and D are correct. Option C (CodePipeline) is for code delivery, Option E (CloudWatch Logs) is for logs, and Option A (Data Wrangler) is for data preparation.

207

MCQhard

A company is using SageMaker to host a model that performs real-time fraud detection. The model receives high request volumes with occasional spikes. The company wants to ensure that the endpoint can handle spikes without throttling while minimizing cost. Which scaling strategy should be used?

A.Use a target tracking scaling policy with a target value of 70% for the SageMakerVariantInvocationsPerInstance metric.

B.Use a simple scaling policy with a step adjustment based on the InvocationsPerInstance metric.

C.Manually adjust the instance count based on monitoring dashboards.

D.Use a scheduled scaling action to add instances during peak hours.

AnswerA

Automatically scales based on utilization.

Why this answer

A target tracking scaling policy with the SageMakerVariantInvocationsPerInstance metric is the correct choice because it automatically adjusts the instance count to maintain a target utilization (e.g., 70%), handling spikes without manual intervention while minimizing cost by scaling down during low traffic. This is the recommended approach for real-time endpoints with variable traffic, as it aligns with AWS best practices for dynamic scaling.

Exam trap

The trap here is that candidates often confuse simple scaling (step adjustments) with target tracking, assuming any metric-based policy works, but target tracking is specifically designed for maintaining a utilization target and is the only option that handles irregular spikes without manual or scheduled intervention.

How to eliminate wrong answers

Option B is wrong because simple scaling policies with step adjustments require predefined thresholds and cooldown periods, which can lead to over-provisioning or under-provisioning during sudden spikes, lacking the smooth, proportional response of target tracking. Option C is wrong because manually adjusting instance count based on dashboards is reactive, error-prone, and cannot handle rapid spikes without causing throttling or waste, defeating the goal of cost minimization. Option D is wrong because scheduled scaling only works for predictable traffic patterns, not for occasional spikes that occur at irregular times, leading to either throttling during unscheduled surges or unnecessary cost during off-peak hours.
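A target tracking setup of this kind is configured through Application Auto Scaling. The sketch below only builds the two request bodies (for `register_scalable_target` and `put_scaling_policy`) and makes no AWS call; the helper name, endpoint and variant names, capacities, cooldowns, and target value are all illustrative assumptions. One point worth noting is that `SageMakerVariantInvocationsPerInstance` is an invocation count per instance, so the target is expressed as a number of invocations rather than a percentage.

```python
def target_tracking_requests(
    endpoint_name: str,
    variant_name: str,
    min_instances: int,
    max_instances: int,
    target_invocations_per_instance: float,
) -> tuple:
    """Build request bodies for registering a variant and attaching a policy."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"
    scalable_target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }
    scaling_policy = {
        "PolicyName": f"{endpoint_name}-invocations-target",  # hypothetical name
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleOutCooldown": 60,   # illustrative: react quickly to spikes
            "ScaleInCooldown": 300,   # illustrative: scale in more cautiously
        },
    }
    return scalable_target, scaling_policy


# Placeholder endpoint and variant names.
target, policy = target_tracking_requests("fraud-endpoint", "AllTraffic", 2, 10, 70)
```

A short scale-out cooldown paired with a longer scale-in cooldown is one common way to absorb spikes without oscillating, though the right values depend on how long new instances take to come into service.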

208

Multi-Selecteasy

Which TWO actions are best practices for securing a SageMaker notebook instance? (Select TWO.)

Select 2 answers

A.Disable direct internet access for the notebook instance.

B.Enable root access for users to install packages.

C.Launch the notebook instance in a private subnet in a VPC.

D.Store data in the notebook's local storage for performance.

E.Use a shared IAM user for all data scientists.

AnswersA, C

Disabling internet access prevents data exfiltration.

Why this answer

Best practices include using VPC to isolate the notebook, enabling encryption at rest, using IAM roles with least privilege, and disabling direct internet access. Direct internet access should be disabled to prevent data exfiltration. Root access should be disabled for notebook instances.

209

MCQhard

A data scientist is trying to create a training job named 'test-model' using an IAM role with the attached policy. The creation fails with an AccessDenied error. What is the most likely cause?

A.The Resource is set to '*' and should be specific.

B.The Deny statement uses 'StringNotEquals' which should be 'StringEquals'.

C.The IAM role does not have permission to assume the SageMaker execution role.

D.The Deny statement uses a wildcard '*' in the condition value, which is not supported for StringNotEquals.

AnswerD

Wildcards are not supported in StringNotEquals conditions, causing unexpected denial.

Why this answer

The Deny statement uses 'StringNotEquals' with a wildcard '*' in the condition value, which the 'StringNotEquals' condition operator does not support. 'StringNotEquals' performs exact string matching, so the '*' is treated as a literal character rather than a wildcard. The training job name 'test-model' therefore never equals the value in the policy, the condition always evaluates to true, and the Deny statement applies to every request. Because an explicit Deny overrides any Allow, the job creation fails with AccessDenied. Wildcard matching requires 'StringNotLike' instead.

Exam trap

The trap here is that candidates may assume 'StringNotEquals' supports wildcards like 'StringNotLike' does, or they may focus on the Resource wildcard (Option A) as the obvious cause, missing the subtle condition operator mismatch.

How to eliminate wrong answers

Option A is wrong because setting the Resource to '*' is generally acceptable for service-linked roles or broad permissions, and the error is specifically about an AccessDenied due to a policy condition issue, not resource specificity. Option B is wrong because 'StringNotEquals' is a valid condition operator; the issue is not the operator itself but the use of a wildcard in its value, which is unsupported. Option C is wrong because the IAM role's ability to assume the SageMaker execution role is a separate permission (sts:AssumeRole) and not directly related to the training job creation failure caused by the Deny statement's condition syntax.
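The difference between the two operators can be sketched with a small simulation. This is not IAM's evaluation engine: the policy from the question is not shown, so the condition value "test-*" is a hypothetical stand-in, and Python's fnmatchcase only approximates IAM's '*' and '?' wildcards.

```python
from fnmatch import fnmatchcase


def string_not_equals(actual, policy_value):
    # Exact, case-sensitive comparison: '*' is just another character.
    return actual != policy_value


def string_not_like(actual, policy_value):
    # Wildcard comparison: '*' matches any run of characters.
    return not fnmatchcase(actual, policy_value)


job_name = "test-model"
condition_value = "test-*"  # hypothetical value; the real policy is not shown

# True means the condition matched, so the Deny statement applies.
deny_with_not_equals = string_not_equals(job_name, condition_value)
deny_with_not_like = string_not_like(job_name, condition_value)

print(deny_with_not_equals, deny_with_not_like)  # True False
```

Under these assumptions, the StringNotEquals version denies a job name that the author intended to allow, while the StringNotLike version does not.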


210

MCQeasy

A machine learning engineer is deploying a model that was trained on a large dataset stored in Amazon S3. The model needs to be retrained daily with new data. Which approach is the MOST cost-effective for storing the training data while allowing quick access for retraining?

A.Store all data in S3 Standard

B.Use S3 Glacier Deep Archive

C.Use S3 Intelligent-Tiering

D.Use S3 One Zone-IA

AnswerC

Intelligent-Tiering automatically optimizes costs for data with changing access patterns.

Why this answer

Option C is correct because S3 Intelligent-Tiering automatically moves data between access tiers to optimize costs while still providing low-latency access. Option A is wrong because S3 Standard is more expensive for data that is not frequently accessed. Option B is wrong because S3 Glacier Deep Archive is for long-term archival, not for data that needs to be accessed daily.

Option D is wrong because S3 One Zone-IA is cheaper but may not be suitable for critical data and still costs more than Intelligent-Tiering for varying access patterns.


211

Multi-Selectmedium

A data scientist is using Amazon SageMaker to train a model and wants to track experiments, including parameters and metrics. Which THREE actions should be taken? (Choose three.)

Select 3 answers

A.Use SageMaker Studio to manually record experiments.

B.Use Amazon CloudWatch Logs to store experiment data.

C.Create an experiment in SageMaker Experiments.

D.Use the SageMaker SDK to log parameters and metrics in the training script.

E.Use the SageMaker SDK to create a trial and trial component.

AnswersC, D, E

Experiments organize runs.

Why this answer

Options C, D, and E are correct. Creating an experiment in SageMaker Experiments sets up the tracking; creating a trial and trial component with the SDK organizes each run; logging from the training script with the SDK records the parameters and metrics. Option B is wrong because CloudWatch Logs is for monitoring and is not designed for experiment tracking.

Option A is wrong because SageMaker Studio is the interface, not the tracking mechanism itself.


212

MCQhard

A machine learning team is using SageMaker Processing jobs to run feature engineering on large datasets. The job takes a long time to complete. Which change would most likely reduce the processing time?

A.Increase the number of instances in the processing cluster

B.Switch to local mode to avoid network overhead

C.Change the processing script from Python to PySpark

D.Use a larger instance type, e.g., from r5.xlarge to r5.24xlarge

AnswerA

More instances allow parallel processing, reducing overall time.

Why this answer

Option A is correct: increasing the instance count allows distributed processing, which reduces the overall time. Option D (larger instance type) helps but is less scalable. Option B (local mode) is for testing, not production.

Option C (changing the script from Python to PySpark) may not improve performance on its own.
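To see why more instances shorten the job, consider how input files can be split across the cluster. The sketch below assigns S3 keys to instances round-robin; this is only an illustration of the idea, not SageMaker's actual assignment logic, and the key names are made up. In the SageMaker Python SDK, sharding is requested by setting s3_data_distribution_type="ShardedByS3Key" on a ProcessingInput.

```python
def shard_keys(keys, instance_count):
    """Assign keys to instances round-robin (illustrative only)."""
    shards = [[] for _ in range(instance_count)]
    for index, key in enumerate(sorted(keys)):
        shards[index % instance_count].append(key)
    return shards


# Hypothetical input files for a feature engineering job.
keys = [f"features/part-{i:04d}.csv" for i in range(10)]

for instance_count in (1, 2, 4):
    sizes = [len(shard) for shard in shard_keys(keys, instance_count)]
    # Fewer files per instance means less work per instance.
    print(instance_count, sizes)
```

The benefit only materializes if the processing script handles each shard independently; a script that needs the full dataset on every instance gains nothing from sharding.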


213

MCQmedium

A company uses SageMaker to host a model for real-time predictions. The model is updated weekly. To minimize downtime during model updates, what should the company do?

A.Create a new endpoint configuration with the new model and update the endpoint to use the new configuration

B.Create a second endpoint with the new model and use an Application Load Balancer to route traffic

C.Update the existing endpoint configuration with the new model URL

D.Delete the existing endpoint and create a new one with the updated model

AnswerA

SageMaker supports blue/green deployment by updating endpoint to new configuration, minimizing downtime.

Why this answer

Option A is correct: creating a new endpoint configuration with the new model and updating the endpoint to use it triggers a blue/green deployment, which minimizes downtime. Option D (delete and recreate) causes downtime. Option C (updating the existing endpoint configuration) is not a workable path, because an endpoint configuration cannot be modified after it is created.

Option B (a second endpoint behind an Application Load Balancer) works but is more complex than SageMaker's built-in blue/green deployment.
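The update path (new endpoint configuration, then an endpoint update) can be sketched as the two request payloads involved. The resource names, instance type, and instance count below are hypothetical, and the code only builds dictionaries rather than calling AWS.

```python
def new_endpoint_config(config_name, model_name, instance_type, instance_count):
    """Arguments for the SageMaker create_endpoint_config API call."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": instance_count,
            }
        ],
    }


def update_request(endpoint_name, config_name):
    """Arguments for the SageMaker update_endpoint API call."""
    return {"EndpointName": endpoint_name, "EndpointConfigName": config_name}


# Hypothetical names for this week's model release.
config = new_endpoint_config("demo-config-v2", "demo-model-v2", "ml.m5.large", 2)
update = update_request("demo-endpoint", config["EndpointConfigName"])
print(config)
print(update)
```

With boto3, these would be passed as sm.create_endpoint_config(**config) followed by sm.update_endpoint(**update), where sm = boto3.client("sagemaker"). The existing endpoint keeps serving traffic while the new instances are provisioned.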


214

MCQhard

A company is using Amazon SageMaker Ground Truth to create a labeled dataset for object detection. The labeling job is taking longer than expected. The team notices that many workers are spending a lot of time on images with no objects. Which labeling strategy should they use to reduce costs and time?

A.Use a private workforce instead of public.

B.Create a pre-labeling task where workers only identify if an object exists, then send only positive images for full labeling.

C.Use automated data labeling with a pre-trained model to filter empty images.

D.Increase the number of workers per dataset object.

AnswerB

This two-stage approach reduces work on empty images.

Why this answer

A two-stage workflow addresses the problem directly: a fast first pass in which workers only indicate whether an image contains an object, followed by full object detection labeling on the positive images only. Workers then stop spending time drawing nothing on empty images.

Ground Truth also supports automated data labeling and label verification tasks, but the two-stage approach in Option B is the one that removes empty images from the expensive labeling step.


215

Multi-Selectmedium

Which TWO actions can reduce inference latency for a SageMaker real-time endpoint? (Choose 2.)

Select 2 answers

A.Choose a larger instance type with more compute capacity.

B.Add more instances behind the endpoint.

C.Use batch transform instead.

D.Compile the model using SageMaker Neo.

E.Switch to asynchronous inference.

AnswersA, D

More compute reduces per-request latency.

Why this answer

Using a larger instance with more CPU/memory (Option A) and compiling the model with SageMaker Neo (Option D) both reduce latency. Option B (adding instances) increases throughput but not per-request latency. Option C (batch transform) is for offline inference.

Option E (asynchronous inference) is for a different use case: queued requests that do not need an immediate response.


216

MCQmedium

A company is building a recommendation system using Amazon SageMaker. The data is stored in a large S3 bucket with millions of small CSV files. The team wants to train a factorization machines model. Which data ingestion strategy will be MOST efficient?

A.Use a SageMaker Processing job with a Spark container to read the files and write a single RecordIO file.

B.Use Amazon Athena to query the data and output to a single CSV.

C.Point the training job directly to the S3 bucket containing the CSV files.

D.Use SageMaker Data Wrangler to create a data flow and export to a training dataset.

AnswerA

Spark can efficiently combine many small files into a single format optimized for training.

Why this answer

Using SageMaker Processing with Spark (Option A) can efficiently read many small files and convert them to a single RecordIO file, which is optimal for SageMaker training. Option C (pointing the training job directly at the bucket) would be slow due to the many small files. Option B (Athena) is for SQL queries, not data conversion.

Option D (Data Wrangler) is for smaller datasets and manual analysis.


217

MCQeasy

A company wants to serve predictions from a model using a REST API with low latency. Which SageMaker deployment option is most appropriate?

A.SageMaker Notebook instance

B.SageMaker real-time endpoint

C.SageMaker Processing job

D.SageMaker Batch Transform

AnswerB

Real-time endpoints provide low-latency REST API.

Why this answer

Option B is correct because SageMaker real-time endpoints provide a low-latency REST API. Option A is wrong because a SageMaker Notebook instance is for development. Option C is wrong because SageMaker Processing is for data processing.

Option D is wrong because SageMaker Batch Transform is for offline predictions.


218

MCQhard

A financial services company is deploying a machine learning model for credit risk assessment. The model must have an inference latency under 200ms and must be able to handle up to 1000 transactions per second (TPS). The company wants to minimize costs. The model is a gradient boosting model implemented in XGBoost. Which SageMaker deployment option should the team choose?

A.Use SageMaker Batch Transform to process transactions in batches.

B.Use SageMaker asynchronous inference for queued requests.

C.Deploy the model on a SageMaker real-time endpoint with multiple instances behind a load balancer.

D.Use SageMaker Serverless Inference for automatic scaling.

AnswerC

Real-time endpoints provide sub-second latency and can scale to 1000 TPS.

Why this answer

SageMaker real-time endpoints are designed for low-latency, high-throughput inference. They can scale horizontally to handle TPS requirements. SageMaker Batch Transform (A) is for offline processing.

SageMaker Serverless Inference (D) has cold starts and may not meet latency requirements under high load. SageMaker asynchronous inference (B) queues requests for near-real-time workloads and has higher latency.
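The sizing behind "multiple instances" can be worked through with simple arithmetic. The 200 ms and 1000 TPS figures come from the question; the per-instance throughput of 250 TPS and the 70% utilization headroom are assumptions for illustration and would need to be measured with a load test.

```python
import math


def in_flight_requests(tps, latency_ms):
    # Little's law: concurrent requests = arrival rate x time in system.
    return tps * latency_ms / 1000


def instances_needed(tps, per_instance_tps, headroom_percent=70):
    # Plan to run each instance at only headroom_percent of its measured capacity.
    return math.ceil(tps * 100 / (per_instance_tps * headroom_percent))


concurrency = in_flight_requests(1000, 200)
fleet_size = instances_needed(1000, 250)  # 250 TPS per instance is assumed

print(concurrency)  # 200.0
print(fleet_size)   # 6
```

Under these assumptions the endpoint must sustain about 200 concurrent requests, and six instances cover the peak with headroom.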


219

MCQmedium

A company is using Amazon SageMaker to host a real-time inference endpoint for a natural language processing model. The endpoint is configured with an ml.m5.large instance. After deployment, the company observes that the inference latency is higher than expected, and the endpoint is experiencing CPU utilization near 100% during peak hours. The model is a PyTorch model that uses a transformer architecture. The company wants to reduce latency without increasing cost significantly. Which approach should the company take?

A.Configure the endpoint with Auto Scaling to add more instances during peak hours.

B.Switch to batch transform for inference.

C.Attach an Elastic Inference accelerator to the existing instance.

D.Change the endpoint instance type to ml.g4dn.xlarge to use GPU acceleration.

AnswerD

Correct: GPU instances accelerate transformer inference, reducing latency.

Why this answer

The issue is high CPU utilization causing latency. Using a GPU instance (ml.g4dn.xlarge) can accelerate inference for transformer models through parallel processing, reducing latency. Option D is correct.

Option C (Elastic Inference) may help but is less effective than a full GPU for transformer models; it also adds complexity. Option A (Auto Scaling) helps with traffic but does not reduce per-request latency. Option B (batch transform) is for offline inference, not real-time.


220

MCQhard

A machine learning engineer is deploying a model to an Amazon SageMaker endpoint. The model is a PyTorch model that requires a custom inference script. The engineer notices that the endpoint is returning 500 errors after deployment. Which step should the engineer take to debug the issue?

A.Redeploy the endpoint with a different instance type.

B.Check the CloudWatch metrics for the endpoint.

C.Modify the inference script and update the endpoint.

D.View the CloudWatch Logs for the endpoint.

AnswerD

Logs contain stack traces and error messages.

Why this answer

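Option D is correct because CloudWatch Logs contain detailed error messages and stack traces from the inference container. Option A is wrong because changing the instance type does not address an error raised by the inference script. Option C is wrong because modifying the script before identifying the cause is guesswork and overwrites the deployed script.

Option B is wrong because CloudWatch metrics show counts and rates, not the error details.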
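A minimal sketch of how the endpoint logs could be queried, assuming boto3 is available in the real environment. The code below only builds the arguments for the CloudWatch Logs filter_log_events call; the endpoint name and the search terms are hypothetical.

```python
def endpoint_log_query(endpoint_name, pattern="?ERROR ?Traceback ?Exception", limit=50):
    """Arguments for CloudWatch Logs filter_log_events on an endpoint's log group."""
    return {
        # SageMaker endpoints write container logs to this log group.
        "logGroupName": f"/aws/sagemaker/Endpoints/{endpoint_name}",
        # '?term' patterns match events containing any of the listed terms.
        "filterPattern": pattern,
        "limit": limit,
    }


query = endpoint_log_query("demo-endpoint")  # hypothetical endpoint name
print(query)
```

With boto3, the call would be boto3.client("logs").filter_log_events(**query), and each returned event's message field carries the log line, including any Python traceback from the inference script.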


221

MCQmedium

A company is using Amazon SageMaker to train a model on a large dataset stored in S3. The training job is taking a long time due to slow data loading. Which action can the data scientist take to reduce data loading time?

A.Use Pipe mode to stream data from S3.

B.Use File mode and copy data to Amazon EBS.

C.Use a larger instance type with more memory.

D.Enable data augmentation during training.

AnswerA

Pipe mode streams data directly, reducing load time.

Why this answer

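Option A is correct because Pipe mode streams data directly from S3 to the training container, so training can start without first downloading the full dataset. Option B is wrong because File mode copies the full dataset to the instance's storage volume before training starts, which may be slower. Option C is wrong because more memory does not change how data is transferred from S3.

Option D is wrong because data augmentation adds work during training and does not speed up data loading.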


222

MCQmedium

A team is using SageMaker to train a model using the built-in XGBoost algorithm. The training job is taking longer than expected. The team suspects that the data is not being loaded efficiently. Which data format should they use to minimize training time?

A.Pipe mode with CSV

B.File mode with Parquet

C.Pipe mode with RecordIO-Protobuf

D.File mode with CSV

AnswerC

Streaming with efficient binary format.

Why this answer

Option C is correct because Pipe mode streams data from S3 directly to the algorithm, reducing initialization time, and RecordIO-Protobuf is an efficient binary format. Option D is wrong because File mode downloads the data first. Option A is wrong because CSV is not as efficient as RecordIO.

Option B is wrong because File mode with Parquet still downloads the full dataset before training starts.
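For reference, the input mode and content type are set in the training job request. The fragment below sketches the relevant parts of a create_training_job request as a plain dictionary; the image URI and S3 prefix are placeholders, and a real request needs further fields (role, output location, resources, stopping condition).

```python
def pipe_mode_training_spec(image_uri, s3_prefix):
    """Partial create_training_job arguments for streaming RecordIO-Protobuf input."""
    return {
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "Pipe",  # stream from S3 instead of downloading first
        },
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "ContentType": "application/x-recordio-protobuf",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": s3_prefix,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
    }


# Placeholder values; substitute the real algorithm image and data location.
spec = pipe_mode_training_spec("<algorithm-image-uri>", "s3://example-bucket/train/")
print(spec["AlgorithmSpecification"]["TrainingInputMode"])
```

Switching "Pipe" to "File" in the same field is the only change needed to compare the two modes on the same dataset.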


223

Multi-Selecthard

A company is deploying a machine learning model to an Amazon SageMaker endpoint. The model receives requests with sensitive data that must be encrypted in transit and at rest. Additionally, the company needs to control access to the endpoint using AWS IAM. Which THREE steps should the company take to meet these requirements? (Choose THREE.)

Select 3 answers

A.Enable HTTPS for the endpoint

B.Configure the endpoint to use a VPC

C.Store the model artifacts in Amazon S3 with SSE-S3 encryption

D.Enable encryption at rest for the endpoint's ML storage volume

E.Attach an IAM policy to the endpoint to allow only authorized principals

AnswersA, D, E

HTTPS encrypts data in transit.

Why this answer

Options A, D, and E are correct. Enabling encryption in transit using HTTPS, enabling encryption at rest for the endpoint's attached storage, and attaching an IAM policy to the endpoint to allow only authorized users are necessary. Option B is wrong because VPC settings do not encrypt data at rest.

Option C is wrong because the model data in S3 should be encrypted with SSE-KMS, not SSE-S3, to meet stricter security requirements.


224

MCQeasy

A company is deploying a PyTorch model on a SageMaker endpoint for real-time inference. The model is stored as a .pth file in an S3 bucket. The data scientist wants to use the SageMaker PyTorch inference toolkit. Which file is REQUIRED in the model artifacts to serve the model?

A.A file named model.tar.gz that contains the model and any dependencies.

B.A file named inference.py that defines the model loading and prediction logic.

C.A file named model.pth containing the model state dictionary.

D.A file named requirements.txt listing the dependencies.

AnswerC

The PyTorch inference toolkit loads model.pth by default.

Why this answer

Option C is correct because the SageMaker PyTorch inference toolkit expects a file named model.pth in the model artifacts. Option B is wrong because inference.py is optional and only needed for custom code. Option D is wrong because requirements.txt is optional.

Option A is wrong because model.tar.gz is the archive that packages the artifacts in S3, not a file required inside them.
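The relationship between the archive and the files inside it can be sketched with the standard library. The layout below (model.pth at the top level, an optional code/inference.py) is an assumption based on the common packaging convention for the PyTorch container, and the file contents are placeholders rather than a real model.

```python
import io
import tarfile


def build_model_archive(model_bytes, inference_source=None):
    """Return the bytes of a model.tar.gz holding model.pth and optional custom code."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:

        def add(name, data):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

        add("model.pth", model_bytes)  # the serialized model
        if inference_source is not None:
            # Optional custom loading and prediction logic.
            add("code/inference.py", inference_source.encode("utf-8"))
    return buffer.getvalue()


archive = build_model_archive(
    b"placeholder weights",  # stands in for the output of torch.save
    "def model_fn(model_dir):\n    ...\n",
)

with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
    names = tar.getnames()

print(names)  # ['model.pth', 'code/inference.py']
```

The resulting bytes would be uploaded to S3 as model.tar.gz, and the model's S3 URI would point at that object.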


225

MCQmedium

A data scientist is using SageMaker to train a model. The training job is failing with a 'ResourceLimitExceeded' error. Which action should be taken to resolve this issue?

A.Use a different AWS Region.

B.Request a service limit increase for the instance type.

C.Reduce the training dataset size.

D.Switch to a different instance type with lower resource requirements.

AnswerB

The error indicates the instance limit is reached; requesting an increase resolves it.

Why this answer

Option B is correct because the error indicates that the account's maximum number of instances for the instance type has been reached. Requesting a limit increase for the specific instance type is the appropriate resolution. Option A is incorrect because it does not address the limit issue.

Option D is incorrect because switching to a different instance type may not be desired and does not resolve the underlying limit. Option C is incorrect because the error is about resource limits, not data size.

