CCNA Machine Learning Implementation and Operations Questions

226
Multi-Selecteasy

Which TWO SageMaker features can be used to monitor and debug training jobs? (Choose 2.)

Select 2 answers
A.SageMaker Debugger
B.SageMaker Model Monitor
C.SageMaker Ground Truth
D.Amazon CloudWatch Logs
E.SageMaker Clarify
AnswersA, D

Debugger captures real-time training metrics and tensors.

Why this answer

SageMaker Debugger (Option A) captures tensors and metrics during training. CloudWatch Logs (Option D) provide training script logs. Options B (Model Monitor), C (Ground Truth), and E (Clarify) are for inference monitoring, labeling, and explainability respectively.

227
MCQhard

A machine learning team is using Amazon SageMaker to train a model with a custom algorithm packaged in a Docker container. The training job fails with the error 'Error: Unable to locate sagemaker-training toolkit.' What is the MOST likely cause?

A.The container does not have internet access to download dependencies
B.The instance type is incompatible with the container
C.The training role does not have permissions to access the container repository
D.The container does not include the SageMaker Training Toolkit
AnswerD

The toolkit must be installed in the Docker image.

Why this answer

SageMaker requires the container to include the SageMaker Training Toolkit (or the common library) for integration. Option D is correct. Option A is wrong because the error is about the toolkit, not network access.

Option B is wrong because an incompatible instance type would not produce a missing-toolkit error. Option C is wrong because the error clearly points to the missing toolkit, not to repository permissions.

228
Multi-Selecthard

A machine learning team is building a real-time inference pipeline using Amazon SageMaker. The team has multiple models that need to be served, but usage patterns are unpredictable and traffic spikes occur several times a day. The team wants to minimize costs while maintaining low latency. Which THREE actions should the team take?

Select 3 answers
A.Enable provisioned concurrency on the endpoint to reduce cold starts.
B.Use SageMaker inference with Spot Instances to reduce cost.
C.Use a SageMaker multi-model endpoint to serve multiple models on the same instance.
D.Configure automatic scaling on the endpoint to handle traffic spikes.
E.Use SageMaker Batch Transform for all inference requests.
AnswersB, C, D

Spot Instances are cheaper but can be interrupted; for cost savings, sometimes acceptable.

Why this answer

SageMaker multi-model endpoints (C) allow serving multiple models on a single instance, reducing cost. SageMaker automatic scaling (D) adjusts capacity based on demand, handling spikes. Using Spot Instances (B) for inference can reduce cost but may cause interruptions; for real-time, On-Demand is safer.

Provisioned concurrency (A) is for Lambda, not SageMaker. Batch Transform (E) is for offline inference.

229
MCQmedium

A company wants to monitor a deployed model for data drift. Which AWS service should they use?

A.Amazon SageMaker Ground Truth
B.Amazon CloudWatch Logs
C.Amazon SageMaker Clarify
D.Amazon SageMaker Model Monitor
AnswerD

Model Monitor checks for data and model drift.

Why this answer

Option D is correct because SageMaker Model Monitor is designed for drift detection. Option B is wrong because CloudWatch Logs is for logs and metrics, not drift. Option C is wrong because SageMaker Clarify is for bias detection.

Option A is wrong because SageMaker Ground Truth is for labeling.

230
MCQhard

A company is using AWS Glue to run ETL jobs that transform data for machine learning. The jobs are failing with 'Out of Memory' errors. The data size is growing, and the company needs a cost-effective solution. Which approach should be taken?

A.Switch to Spark on Amazon EMR.
B.Increase the number of workers in the job configuration.
C.Optimize the job by filtering data earlier.
D.Use a larger worker type like G.2X.
AnswerB

Increases parallelism, reducing memory per worker.

Why this answer

Increasing the number of workers in the AWS Glue job configuration distributes the data processing load across more Spark executors, directly addressing the 'Out of Memory' error by providing more aggregate memory without changing the worker type. This is a cost-effective approach because it scales horizontally, often at a lower cost than moving to a larger worker type, and it leverages the existing Glue infrastructure without migrating to EMR.

Exam trap

The trap here is that candidates often assume 'Out of Memory' errors must be solved by increasing memory per worker (vertical scaling) or by switching to a more powerful service, but the most cost-effective and direct solution in AWS Glue is to increase the number of workers (horizontal scaling) to distribute the memory load.

How to eliminate wrong answers

Option A is wrong because switching to Spark on Amazon EMR would require significant architectural changes and operational overhead, and it is not inherently more cost-effective than adjusting Glue worker count for the same memory issue. Option C is wrong because filtering data earlier is a best practice for performance optimization but does not directly resolve an 'Out of Memory' error caused by insufficient total memory across workers; it reduces data volume but may not prevent memory exhaustion if the cluster is undersized. Option D is wrong because using a larger worker type like G.2X increases memory per worker but is typically more expensive than adding more workers of the same type, and it may not be the most cost-effective horizontal scaling solution for growing data.

231
MCQhard

A machine learning engineer is deploying a model on SageMaker and needs to ensure that the endpoint can handle a sudden spike in traffic. The engineer expects traffic to increase by 10x during a promotional event. Which scaling strategy should be used?

A.Use a single large instance type instead of multiple smaller instances.
B.Manually increase the instance count before the event.
C.Use only dynamic scaling based on the average latency metric.
D.Use scheduled scaling to add instances before the event, combined with dynamic scaling for the remaining duration.
AnswerD

Scheduled scaling pre-warms the endpoint to handle the spike.

Why this answer

Option D is correct because scheduled scaling can add instances before the expected spike, while dynamic scaling handles variation during the rest of the event. Option C is wrong because dynamic scaling alone may not react quickly enough to a sudden 10x spike. Option B is wrong because manual scaling requires human intervention.

Option A is wrong because a single large instance provides no redundancy and may not handle the spike.
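The scheduled part of such a strategy can be sketched as follows. This is a minimal illustration, assuming the endpoint variant is already registered as a scalable target with Application Auto Scaling; the endpoint name, variant name, date, and capacities are hypothetical. The helper only builds the arguments, so it can be inspected without AWS credentials.

```python
from datetime import datetime, timezone


def build_scheduled_action(endpoint_name, variant_name, start, min_capacity, max_capacity):
    """Build keyword arguments for Application Auto Scaling's put_scheduled_action."""
    start_utc = start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    return {
        "ServiceNamespace": "sagemaker",
        "ScheduledActionName": f"{endpoint_name}-prewarm",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        # One-time schedule, expressed in UTC.
        "Schedule": f"at({start_utc})",
        # Raising MinCapacity forces a scale-out at the scheduled time;
        # dynamic scaling can still move between Min and Max afterwards.
        "ScalableTargetAction": {"MinCapacity": min_capacity, "MaxCapacity": max_capacity},
    }


# Hypothetical values: pre-warm 30 minutes before a promotion.
action = build_scheduled_action(
    "promo-endpoint",
    "AllTraffic",
    datetime(2030, 1, 1, 8, 30, tzinfo=timezone.utc),
    min_capacity=20,
    max_capacity=40,
)
print(action["Schedule"])  # at(2030-01-01T08:30:00)

# To apply it (requires AWS credentials and a registered scalable target):
# import boto3
# boto3.client("application-autoscaling").put_scheduled_action(**action)
```

A second scheduled action after the event would normally lower the minimum again so the extra instances are not billed indefinitely.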

232
MCQhard

A company operates a real-time fraud detection system using an Amazon SageMaker endpoint. The model is a gradient boosting model trained on historical transaction data. The endpoint is deployed on an ml.c5.2xlarge instance with auto-scaling enabled based on average latency. Recently, during a flash sale event, the endpoint started returning HTTP 503 errors. The CloudWatch metrics show that the CPU utilization is at 70%, and the average latency has increased from 50 ms to 200 ms. The auto-scaling policy is configured to add one instance when average latency exceeds 100 ms for 5 consecutive minutes, and remove one instance when latency drops below 50 ms for 5 minutes. The current number of instances is 2. The flash sale lasted 30 minutes. What should the company do to prevent this issue in future flash sales?

A.Enable request throttling to drop excess requests
B.Change the instance type to ml.c5.4xlarge to handle higher load
C.Pre-warm the endpoint by setting a minimum number of instances that can handle the expected peak load before the flash sale
D.Change the model to a simpler model with lower latency
AnswerC

This ensures capacity is available from the start.

Why this answer

Option C is correct because the auto-scaling policy is reactive and too slow (5-minute evaluation period) to handle rapid traffic spikes. Pre-warming the endpoint by increasing the number of instances before the flash sale ensures capacity is available. Option B (increase instance size) may help but is more expensive and still reactive.

Option D (use a simpler model) does not address the core issue. Option A (enable throttling) would still result in errors.

233
Multi-Selecthard

Which THREE steps should be taken to secure a SageMaker notebook instance that accesses sensitive data? (Select THREE.)

Select 3 answers
A.Enable encryption at rest for the notebook's EBS volume
B.Grant root access to the notebook instance for flexibility
C.Place the notebook instance inside a VPC with no internet access
D.Allow direct internet access from the notebook for downloading packages
E.Use an IAM role with least privilege permissions for the notebook
AnswersA, C, E

Protects stored data.

Why this answer

Using a VPC with no internet access keeps traffic private. IAM roles enforce least privilege access. Encryption at rest protects data on the notebook.

Option B is wrong because root access should be disabled, not enabled. Option D is wrong because direct internet access should be disabled for security.

234
MCQhard

A data scientist is using Amazon SageMaker for hyperparameter tuning. The tuning job uses a Bayesian optimization strategy. After 10 training jobs, the objective metric (validation accuracy) has plateaued at 0.85. The data scientist wants to explore more diverse hyperparameter combinations. What should the data scientist do?

A.Decrease the exploration weight in the tuning job configuration.
B.Switch to random search strategy.
C.Increase the exploration weight in the tuning job configuration.
D.Increase the number of parallel training jobs.
AnswerC

Increasing exploration weight prompts the algorithm to try more diverse combinations.

Why this answer

In Bayesian optimization, the exploration weight controls the trade-off between exploring new hyperparameter regions and exploiting known good regions. Increasing this weight encourages the acquisition function to sample more diverse combinations, which can help escape a plateau. Option C is correct because it directly addresses the need for greater diversity in the search space.

Exam trap

The exam often tests the misconception that increasing parallel jobs or switching to random search is the best way to increase diversity, when in fact Bayesian optimization's exploration weight is the precise control for this purpose.

How to eliminate wrong answers

Option A is wrong because decreasing the exploration weight would make the tuning job more exploitative, focusing on known good regions and reducing diversity, which is the opposite of what is needed. Option B is wrong because switching to random search would abandon the benefits of Bayesian optimization's informed sampling, potentially wasting resources on random trials without leveraging prior results. Option D is wrong because increasing the number of parallel training jobs does not inherently increase exploration diversity; it only speeds up the tuning process but may lead to less informed decisions if the Bayesian model cannot keep up with parallel evaluations.
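The exploration/exploitation trade-off described here can be illustrated with a generic upper-confidence-bound acquisition function. This is only a sketch of the general idea, not SageMaker's implementation; the candidate names and the predicted means and standard deviations are invented for the example.

```python
def ucb_pick(candidates, exploration_weight):
    """Return the name of the candidate with the highest mean + weight * std score."""
    best = max(candidates, key=lambda c: c["mean"] + exploration_weight * c["std"])
    return best["name"]


# Hypothetical surrogate-model predictions for two hyperparameter settings.
candidates = [
    {"name": "near_current_best", "mean": 0.85, "std": 0.01},  # well-explored region
    {"name": "unexplored_region", "mean": 0.80, "std": 0.05},  # uncertain region
]

low = ucb_pick(candidates, exploration_weight=0.5)
high = ucb_pick(candidates, exploration_weight=2.0)
print(low)   # near_current_best
print(high)  # unexplored_region
```

With a small weight the score is dominated by the predicted mean, so the search stays near the plateau at 0.85; with a larger weight the uncertainty term dominates and the less-explored setting is tried next.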

235
MCQeasy

A company is using SageMaker to deploy a model for real-time inference. The model requires low latency, and the company wants to test the endpoint before production. Which approach should be used to validate endpoint performance?

A.Use CloudWatch Synthetics to create a canary.
B.Perform offline batch evaluation on a test dataset.
C.Deploy to production and monitor using CloudWatch.
D.Use SageMaker's built-in shadow testing or load testing features.
AnswerD

Allows traffic simulation and latency measurement.

Why this answer

Option D is correct because SageMaker provides built-in shadow testing and load testing features that send traffic to the endpoint to test its performance before production. Option C is wrong because waiting for production traffic does not allow pre-production validation. Option A is wrong because a CloudWatch Synthetics canary is meant for scheduled availability checks, not for load testing a model endpoint.

Option B is wrong because offline evaluation does not test real-time inference.

236
Multi-Selectmedium

Which TWO factors should be considered when choosing between Amazon SageMaker's real-time endpoints and serverless inference? (Select TWO.)

Select 2 answers
A.GPU requirement
B.Inference traffic pattern (intermittent vs steady)
C.Integration with AWS Lambda
D.Availability of built-in algorithms
E.Model size in GB
AnswersA, B

Serverless inference does not support GPU instances.

Why this answer

Serverless inference is ideal for intermittent workloads that can tolerate cold starts. Real-time endpoints are better for predictable, low-latency, and high-throughput requirements. GPU support is only available on real-time endpoints.

Memory limits apply to both but serverless has a maximum of 6 GB. Concurrent requests can be handled by both, but serverless scales to zero.

237
MCQeasy

A data scientist needs to version control datasets used for machine learning experiments. Which AWS service should the data scientist use?

A.AWS Lake Formation
B.Amazon SageMaker Feature Store
C.Amazon SageMaker Model Registry
D.Amazon S3 with versioning enabled
AnswerD

S3 versioning provides dataset version control.

Why this answer

Option D is correct because Amazon S3 with versioning enabled keeps every version of each dataset object. Option A is wrong because AWS Lake Formation does not version datasets. Option B is wrong because SageMaker Feature Store is for storing and serving features.

Option C is wrong because SageMaker Model Registry versions models, not datasets.

238
MCQeasy

A data scientist is using Amazon SageMaker to train a model. The training job is taking longer than expected. The data scientist notices that the GPU utilization is low. Which action would most likely improve GPU utilization?

A.Change to a CPU-based instance
B.Increase the batch size
C.Decrease the batch size
D.Use a larger instance type
E.Enable data augmentation
AnswerB

Larger batch sizes keep GPU busy.

Why this answer

Option B is correct because increasing the batch size allows the GPU to process more data per step, improving utilization. Option C (decrease batch size) would decrease utilization. Option D (use a larger instance type) may help but is more costly.

Option E (enable data augmentation) increases data, not utilization. Option A (change to a CPU-based instance) would make it worse.

239
MCQhard

A company has a real-time inference endpoint on Amazon SageMaker that uses a custom container. The endpoint is experiencing high latency and occasional 502 errors. The logs from the container show that the model inference time is low, but the overall response time is high. Which step is MOST likely to reduce the latency?

A.Switch to batch transform to process requests in batches
B.Use a larger instance type for the endpoint
C.Optimize the model to reduce inference time
D.Increase the number of instances and enable auto-scaling
AnswerD

More instances can handle more concurrent requests, reducing queuing and latency.

Why this answer

Option D is correct because increasing the endpoint's instance count and enabling auto-scaling can distribute the load and reduce queuing delays. Option C is wrong because the inference time is already low, so optimizing the model further won't help much. Option B is wrong because increasing instance size may help but is less cost-effective than scaling out.

Option A is wrong because batch transform is for offline inference, not real-time.

240
MCQmedium

A model deployed on a SageMaker endpoint is producing predictions that are consistently biased against a certain demographic. Which step should the team take FIRST to address this issue?

A.Enable SageMaker Model Monitor to track prediction quality
B.Switch to a different algorithm that is less prone to bias
C.Use SageMaker Clarify to analyze bias in the training data and predictions
D.Retrain the model with balanced data
AnswerC

Clarify can detect and explain bias, guiding corrective actions.

Why this answer

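The first step is to analyze the data and model for bias. SageMaker Clarify can detect bias. Option C is correct.

Option D (retraining with balanced data) is a solution to apply after analysis. Option B changes the model before the source of the bias is known. Option A (Model Monitor) is a general monitoring practice, not a first diagnostic step.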

241
MCQmedium

Refer to the exhibit. A data scientist is deploying a PyTorch model on a SageMaker endpoint. When the endpoint is invoked, the above error appears in CloudWatch logs. What is the MOST likely cause?

A.The endpoint instance type does not support the required CUDA version.
B.The endpoint instance does not have enough memory to load the model.
C.The input tensor shape does not match the model's expected input shape.
D.The model artifact was not properly saved or is missing from the S3 location.
AnswerD

If the model file is missing or corrupted, load_model returns None.

Why this answer

The error occurs in the model_fn function, which loads the model. The error 'NoneType' object has no attribute 'shape' suggests that the model object is None, meaning the model was not loaded correctly. The most likely cause is that the model file (model.tar.gz) does not contain the expected model artifact, or the model file is missing.

Incorrect input tensor shape (C) would cause errors during inference, not loading. Insufficient memory (B) would cause out-of-memory errors. An unsupported CUDA version on the endpoint instance type (A) would not cause a NoneType error.
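One way to make this failure mode easier to diagnose is to fail loudly at load time instead of returning None. The sketch below is an assumption-laden illustration: the exhibit is not reproduced here, the file name `model.pth` and the helper `load_model_checked` are hypothetical, and the loader is passed in so the example runs without PyTorch. In a real inference script, `model_fn(model_dir)` would call something like this with the framework's own load function.

```python
import os


def load_model_checked(model_dir, loader, filename="model.pth"):
    """Load a model artifact, raising a clear error if it is missing or unloadable."""
    path = os.path.join(model_dir, filename)
    if not os.path.isfile(path):
        # List what was actually extracted to help spot a wrong name or layout.
        found = sorted(os.listdir(model_dir)) if os.path.isdir(model_dir) else []
        raise FileNotFoundError(f"Expected model artifact at {path}; found {found}")
    model = loader(path)
    if model is None:
        raise RuntimeError(f"Loader returned None for {path}")
    return model
```

With a check like this, a missing or misnamed artifact surfaces as an explicit error in the CloudWatch logs when the container starts, rather than as an attribute error on None at the first invocation.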

242
Multi-Selecteasy

A data scientist is using Amazon SageMaker to train a large neural network on a GPU instance. The training is taking longer than expected. The scientist wants to reduce training time without changing the model architecture. Which TWO approaches should the scientist consider?

Select 2 answers
A.Use SageMaker Automatic Model Tuning to find optimal hyperparameters.
B.Use SageMaker Managed Spot Training to reduce cost.
C.Use SageMaker's distributed training with multiple GPU instances.
D.Switch to a larger GPU instance type with more CUDA cores.
E.Enable SageMaker Debugger to capture training metrics.
AnswersC, D

Distributed training parallelizes computation, reducing wall-clock time.

Why this answer

Using multiple GPU instances with SageMaker distributed training (C) can accelerate training. Using SageMaker Managed Spot Training (B) reduces cost but not time. Using SageMaker Debugger (E) helps debugging but not speed.

SageMaker Automatic Model Tuning (A) is for hyperparameter optimization. Using a larger GPU instance (D) with more memory and compute can directly reduce training time.

243
MCQhard

A company uses Amazon SageMaker to host a model for fraud detection. The model uses a custom XGBoost container. The endpoint receives about 100 requests per second, each with 50 features. The team notices that the model's predictions are occasionally incorrect for a subset of requests. Which approach should the team take to debug the issue?

A.Use SageMaker Debugger to capture tensors during inference.
B.Scale the endpoint to more instances to reduce load.
C.Enable SageMaker Model Monitor to capture and analyze inference data.
D.Enable detailed CloudWatch Logs for the endpoint.
AnswerC

Model Monitor captures input data and predictions, enabling analysis of data quality and drift.

Why this answer

Enabling SageMaker Model Monitor with data capture (Option C) allows the team to review actual input data and predictions to detect data drift or anomalies. Option D (CloudWatch Logs) only shows container logs, not per-request payloads. Option A (Debugger) is for training, not inference.

Option B (scale to more instances) addresses capacity, not accuracy.

244
Multi-Selecteasy

Which TWO services can be used to perform hyperparameter tuning in Amazon SageMaker? (Choose two.)

Select 2 answers
A.Amazon SageMaker Automatic Model Tuning
B.Amazon SageMaker Experiments
C.AWS Glue
D.Amazon SageMaker Ground Truth
E.Amazon EMR
AnswersA, B

This is the native hyperparameter tuning service.

Why this answer

Options A and B are correct. SageMaker Automatic Model Tuning is the native tuning service. SageMaker Experiments can track tuning jobs.

Option C (AWS Glue) is for ETL. Option D (SageMaker Ground Truth) is for labeling. Option E (Amazon EMR) is for big data processing.

245
MCQhard

A company runs a real-time fraud detection model on a SageMaker endpoint. The model is a TensorFlow neural network trained on transactional data. The endpoint uses a single ml.p3.2xlarge instance. Recently, the application’s latency has increased from 50ms to 500ms on average. The CloudWatch metrics show that CPU utilization is at 90%, GPU utilization is at 30%, and memory utilization is at 40%. The number of requests per second has remained stable. The ML team suspects the model is not fully utilizing the GPU. What action should the team take to reduce latency without changing the instance type?

A.Switch to SageMaker Batch Transform to process requests in batches
B.Change the endpoint to a compute-optimized instance like ml.c5.large
C.Use SageMaker Neo to compile the model for the target instance
D.Increase the number of instances behind the endpoint and use a load balancer
AnswerC

Neo optimizes model to better utilize GPU.

Why this answer

Optimizing the model for inference using SageMaker Neo can reduce latency by better leveraging the GPU. Option D is wrong because increasing instances only helps throughput, not per-request latency. Option A is wrong because SageMaker Batch Transform is for offline inference.

Option B is wrong because a CPU instance would not improve GPU utilization, and it changes the instance type, which the question rules out.

246
MCQmedium

A data scientist is using Amazon SageMaker Ground Truth to create a labeled dataset for object detection. The team has limited budget and wants to minimize labeling costs while ensuring high-quality labels. Which approach is MOST cost-effective?

A.Use only a private workforce of domain experts to label all data.
B.Use a public workforce and have each data point labeled by three workers.
C.Use active learning to automatically label high-confidence data and send only uncertain data to a private workforce.
D.Use the built-in automated labeling feature without human review.
AnswerC

Active learning reduces labeling cost while ensuring quality.

Why this answer

Option C is correct because active learning selects the most uncertain samples for human labeling, reducing the number of labels needed while maintaining quality. Option D is wrong because using only automated labeling without human review may introduce errors. Option A is wrong because having domain experts label all data is expensive.

Option B is wrong because having three workers label every data point is costly.

247
MCQmedium

An ML team uses Amazon SageMaker to train a deep learning model. The training job runs on a single ml.p3.2xlarge instance and is taking 10 hours. The team wants to reduce the training time to under 2 hours without changing the model architecture. Which approach is MOST effective?

A.Use SageMaker distributed training with multiple ml.p3.2xlarge instances.
B.Use SageMaker Managed Spot Training to reduce cost.
C.Switch to a single ml.p3.16xlarge instance with more GPUs.
D.Enable SageMaker Debugger to identify bottlenecks.
AnswerA

Distributed training partitions the model or data across instances, reducing wall-clock time.

Why this answer

Using SageMaker's distributed training with multiple GPU instances (A) can significantly reduce training time by parallelizing the workload. Changing to a larger instance (C) may help but not as much as multiple instances. Using SageMaker Debugger (D) does not speed up training.

Using Managed Spot Training (B) saves cost but not time.

248
MCQmedium

A company is using Amazon Rekognition to detect objects in images stored in S3. They want to reduce costs by processing images only when they are uploaded. Which AWS service should be used to trigger Rekognition automatically?

A.Amazon CloudWatch Events
B.Amazon Simple Notification Service (SNS)
C.AWS Lambda
D.AWS Step Functions
AnswerC

Lambda can be triggered by S3 event and call Rekognition.

Why this answer

Option C is correct because S3 events can trigger Lambda, which calls the Rekognition API. Option A is incorrect because CloudWatch Events can trigger on a schedule, not directly on S3 uploads. Option D is incorrect because Step Functions can orchestrate the workflow but cannot be triggered directly by S3 uploads without Lambda.

Option B is incorrect because SNS is passive and needs a subscriber.
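The S3-to-Lambda-to-Rekognition flow can be sketched as a Lambda handler like the one below. It is a minimal illustration, assuming the function is subscribed to the bucket's object-created events and its role may read the objects and call Rekognition; the `rekognition` parameter is an addition for this example so the handler can be exercised with a stub instead of a real client.

```python
from urllib.parse import unquote_plus


def handler(event, context, rekognition=None):
    """Call Rekognition DetectLabels for each object named in an S3 event."""
    if rekognition is None:
        import boto3  # imported lazily so the sketch can be exercised without AWS

        rekognition = boto3.client("rekognition")

    results = {}
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        response = rekognition.detect_labels(
            Image={"S3Object": {"Bucket": bucket, "Name": key}},
            MaxLabels=10,
        )
        results[key] = [label["Name"] for label in response["Labels"]]
    return results
```

Because the function runs only when an upload event arrives, nothing is billed for polling or for reprocessing images that have not changed.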

249
Multi-Selecthard

A machine learning team is using SageMaker Pipelines to orchestrate a multi-step workflow. The pipeline fails with a 'ThrottlingException' when submitting a training job. Which TWO actions can reduce the likelihood of throttling?

Select 2 answers
A.Use SageMaker Model Registry to version models
B.Implement retry logic with exponential backoff in the pipeline
C.Increase the number of parallel training jobs
D.Reduce the number of concurrent pipeline steps
E.Request a service quota increase for training jobs
AnswersB, D

Exponential backoff reduces request rate after throttling.

Why this answer

Throttling occurs due to API rate limits. Implementing exponential backoff and reducing concurrent submissions help. Options B and D are correct.

Option C increases load, Option A is for model registry, Option E is not possible.
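Retry with exponential backoff can be sketched in a few lines. This is a generic illustration rather than SageMaker Pipelines' own retry mechanism; `ThrottlingError` is a stand-in exception defined for the example, and the delay values are arbitrary.

```python
import random
import time


class ThrottlingError(Exception):
    """Stand-in for a throttling error raised by an API client (hypothetical)."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn, retrying on ThrottlingError with exponentially growing, jittered delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottlingError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The cap doubles each attempt: base, 2*base, 4*base, ... up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from retrying at the same instant.
            sleep(random.uniform(0, cap))
```

Each failed attempt waits longer on average than the previous one, which lowers the request rate exactly when the service is signalling that it is overloaded.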

250
MCQmedium

A data scientist needs to deploy a PyTorch model for real-time inference. Which AWS service is best suited for this task?

A.Amazon SageMaker Batch Transform
B.Amazon ECS with Fargate
C.AWS Lambda with custom container
D.Amazon SageMaker real-time endpoint
AnswerD

SageMaker provides managed real-time endpoints with auto-scaling and built-in model hosting.

Why this answer

Option D is correct because Amazon SageMaker provides built-in support for deploying PyTorch models with real-time endpoints. Option C is wrong because AWS Lambda has limited memory and runtime duration. Option A is wrong because SageMaker Batch Transform is for offline inference.

Option B is wrong because ECS requires manual setup for model serving.

251
MCQeasy

A data scientist is using Amazon SageMaker to train an XGBoost model on a dataset with missing values. The dataset has both numeric and categorical features. Which preprocessing step is MOST appropriate before training?

A.Impute missing numeric values with the mean and categorical values with the mode, then train without encoding
B.Remove all rows with missing values and train on the remaining data
C.One-hot encode categorical features and let XGBoost handle missing values natively
D.Label encode categorical features and use the built-in missing value handling of XGBoost
AnswerC

XGBoost handles missing values by default; one-hot encoding is appropriate for categorical data.

Why this answer

Option C is correct because XGBoost can handle missing values natively, so imputation may not be necessary, and one-hot encoding is needed for categorical features. Option A (mean and mode imputation) may be acceptable but is not necessary, and it leaves the categorical features unencoded. Option B (remove rows) loses data.

Option D (label encoding) may create ordinal relationships.

252
Multi-Selectmedium

A data scientist is using Amazon SageMaker to deploy a model for real-time inference. The endpoint receives a large number of requests with variable traffic patterns. The team wants to minimize cost while ensuring low latency. Which THREE actions should the team take? (Choose THREE.)

Select 3 answers
A.Use a multi-model endpoint to host multiple models on the same instance.
B.Enable auto-scaling for the endpoint based on the invocation count.
C.Set the initial variant weight to 1 and increase the number of instances.
D.Use a single large instance to handle all traffic.
E.Create a production variant with a smaller instance type.
AnswersA, B, E

Multi-model endpoints reduce cost by sharing resources.

Why this answer

Options A, B, and E are correct. Option A: Using a multi-model endpoint allows sharing instances among models. Option B: Enabling auto-scaling adjusts capacity based on traffic.

Option E: Using a production variant with a smaller instance type reduces cost. Option C is wrong because increasing the number of instances up front may increase cost. Option D is wrong because a single large instance may be over-provisioned.

253
Multi-Selecthard

Which TWO of the following are valid configurations for SageMaker Training Job resource limits? (Select TWO.)

Select 2 answers
A.Maximum number of instances
B.Maximum wait time in seconds
C.Maximum run time in seconds
D.Minimum number of instances
E.Maximum number of spot instances
AnswersA, C

You can limit the number of instances used by the training job.

Why this answer

Options A and C are correct. SageMaker training jobs can have a maximum number of instances (A) and a maximum run time (C). Option D is wrong because there is no minimum instance count limit.

Option E is wrong because the spot instance limit is set separately. Option B is wrong because there is no maximum wait time for training jobs.

254
MCQeasy

A data scientist needs to store and version machine learning models, along with metadata such as hyperparameters and metrics. Which AWS service is designed for this purpose?

A.Amazon S3 with versioning enabled
B.Amazon SageMaker Model Registry
C.Amazon DynamoDB
D.Amazon Elastic Container Registry (ECR)
AnswerB

It provides model versioning, metadata, and approval workflows.

Why this answer

Option B is correct: SageMaker Model Registry is for cataloging and versioning models. Option A (S3) is object storage but not specialized. Option D (ECR) stores container images.

Option C (DynamoDB) is a NoSQL database, not purpose-built for ML models.

255
MCQmedium

A company is using Amazon SageMaker to deploy a model for real-time predictions. The model requires access to a DynamoDB table to look up features. The SageMaker endpoint is configured with a VPC and subnet. However, the endpoint cannot connect to DynamoDB. What is the most likely reason?

A.The security group does not allow outbound traffic to DynamoDB
B.The IAM role for the endpoint does not have dynamodb:GetItem permission
C.The VPC does not have a VPC endpoint for DynamoDB or a NAT gateway
D.The DynamoDB table is in a different AWS Region
E.The CloudWatch logs show no errors
AnswerC

Without a route to DynamoDB, the endpoint cannot connect.

Why this answer

Option C is correct because a VPC endpoint for DynamoDB (Gateway endpoint) is needed for private connectivity, or the subnet must have a NAT gateway for internet access. Option A (security group) may be an issue but the most common cause is lack of routing. Option B (IAM role) is necessary but if the role has permissions, the issue is network.

Option D (DynamoDB table) is not relevant. Option E (CloudWatch) is for logging.

256
MCQhard

A machine learning team is building a fraud detection system using Amazon SageMaker. The training data is highly imbalanced (99% legitimate, 1% fraudulent). They need to maximize the recall of the fraud class while keeping precision above 90%. Which approach should they take?

A.Undersample the majority class to create a balanced dataset and train a Random Forest
B.Train a model using the original data, then adjust the decision threshold on the validation set to maximize recall while precision > 90%
C.Train an XGBoost model with scale_pos_weight parameter set to 99
D.Use SMOTE to oversample the fraud class and then train a logistic regression
AnswerB

Threshold tuning directly optimizes recall with a precision constraint.

Why this answer

Option B is correct because adjusting the model threshold after training to favor recall while monitoring precision is the most direct way to meet the business requirement. Option D (SMOTE) can help but may not guarantee precision. Option C (weighted loss) is good but less direct than threshold tuning.

Option A (random undersampling) may discard too much data.
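Threshold tuning under a precision constraint can be sketched as a simple sweep over candidate thresholds. The function below is an illustration in plain Python, not a library API; the scores and labels are a made-up validation set, and a real one would need far more fraud examples for the estimates to be stable.

```python
def best_threshold(scores, labels, min_precision=0.90):
    """Return (threshold, precision, recall) with the highest recall among
    thresholds whose precision is at least min_precision, or None if none qualify."""
    total_pos = sum(labels)
    best = None
    for t in sorted(set(scores)):
        preds = [s >= t for s in scores]
        tp = sum(1 for p, y in zip(preds, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(preds, labels) if p and y == 0)
        if tp + fp == 0 or total_pos == 0:
            continue  # precision or recall is undefined at this threshold
        precision = tp / (tp + fp)
        recall = tp / total_pos
        if precision >= min_precision and (best is None or recall > best[2]):
            best = (t, precision, recall)
    return best


# Hypothetical validation scores (model's fraud probability) and true labels.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

print(best_threshold(scores, labels))  # (0.8, 1.0, 0.75)
```

On this toy data, lowering the threshold to 0.60 would catch every fraud case but drop precision to 0.8, so the sweep stops at 0.80, the lowest threshold that still satisfies the constraint.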

257
MCQeasy

A data scientist wants to deploy a PyTorch model for real-time inference. Which SageMaker deployment option provides the lowest latency for single-digit millisecond responses?

A.SageMaker Real-Time Inference endpoint
B.SageMaker Asynchronous Inference
C.SageMaker Serverless Inference
D.SageMaker Batch Transform
AnswerA

Real-Time endpoints provide the lowest latency for online inference.

Why this answer

Option A is correct because SageMaker Real-Time Inference endpoints keep instances running and are designed for low-latency, real-time predictions. Option C (Serverless Inference) is wrong because it can have cold starts and higher latency. Option D (Batch Transform) is for batch processing.

Option B (Asynchronous Inference) queues requests and has higher latency.

258
MCQeasy

A company wants to perform automated hyperparameter tuning for a model. Which Amazon SageMaker feature should be used?

A.Amazon SageMaker Clarify
B.Amazon SageMaker Ground Truth
C.Amazon SageMaker Debugger
D.Amazon SageMaker automatic model tuning
AnswerD

Purpose-built for hyperparameter optimization.

Why this answer

Option D is correct because SageMaker automatic model tuning (hyperparameter tuning) automates hyperparameter optimization. Option C is wrong because Debugger is for monitoring and debugging training. Option B is wrong because Ground Truth is for labeling.

Option A is wrong because Clarify is for bias detection and explainability.

259
Multi-Selecthard

A company is deploying a machine learning model using Amazon SageMaker. To reduce costs, they want to use SageMaker Managed Spot Training. Which THREE conditions must be met for the training job to use spot instances? (Choose THREE.)

Select 3 answers
A.The model must be deployed to a serverless endpoint
B.The training job must be able to handle interruptions gracefully
C.The chosen instance type must be available in the spot market
D.The training script must save checkpoints to an S3 bucket periodically
E.The training job must be configured to run in a VPC
AnswersB, C, D

Spot instances can be reclaimed; the job must be fault-tolerant.

Why this answer

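Options B, C, and D are correct. Spot capacity can be reclaimed at any time, so the training job must tolerate interruptions (B), the instance type must be available as spot capacity (C), and the script must save checkpoints to S3 so that an interrupted job can resume (D). Option E is not required; spot training works with or without a VPC.

Option A is not required; Managed Spot Training applies to training, not to how the model is deployed for inference.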
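
A minimal sketch of the checkpoint-and-resume pattern that makes a job interruption-tolerant. It uses a temporary directory and a toy loss update as stand-ins; in a SageMaker job the checkpoint directory would typically be the local path that SageMaker syncs to S3 (commonly `/opt/ml/checkpoints`), which is an assumption here rather than something this example configures.

```python
import json
import os
import tempfile


def train(checkpoint_dir, total_epochs, stop_after=None):
    """Toy training loop that resumes from the latest checkpoint if one exists."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    path = os.path.join(checkpoint_dir, "state.json")
    state = {"epoch": 0, "loss": 1.0}
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)  # resume instead of starting over
    while state["epoch"] < total_epochs:
        if stop_after is not None and state["epoch"] >= stop_after:
            return state  # simulate a spot interruption
        state["epoch"] += 1
        state["loss"] *= 0.9  # stand-in for a real training step
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, path)  # atomic rename avoids half-written checkpoints
    return state


ckpt_dir = tempfile.mkdtemp()
interrupted = train(ckpt_dir, total_epochs=5, stop_after=2)
resumed = train(ckpt_dir, total_epochs=5)
print(interrupted["epoch"], resumed["epoch"])
```

The second call picks up at the saved epoch rather than epoch 0, which is the behavior a restarted spot job relies on.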

260
MCQhard

An ML team is using SageMaker Autopilot to automatically build a binary classification model. The dataset has 500,000 rows and 200 columns, with a severe class imbalance (1% positive). Which configuration should the team set to address the imbalance?

A.Specify the 'objective' as 'F1' or 'AUC' to optimize for imbalanced data.
B.Set the 'problem_type' to 'MulticlassClassification' to handle imbalance.
C.Use the 'AutoML' job with 'EnsembleMode' and 'SMOTE' sampling.
D.Configure the data split to use stratified sampling based on the target.
AnswerA

F1 and AUC are better metrics for imbalanced classification.

Why this answer

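Option A is correct because a SageMaker Autopilot AutoML job lets you specify an objective metric such as F1 or AUC, which suits imbalanced data better than accuracy. Option B is wrong because changing the problem type to multiclass does nothing to address imbalance in a binary target.

Option C is wrong because SMOTE is not supported by Autopilot. Option D is wrong because stratified splitting only preserves the class ratio across splits; it does not change what the model optimizes.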

261
Drag & Dropmedium

Drag and drop the steps to evaluate a trained model using SageMaker Model Monitor in the correct order.

Why this order

Model Monitor requires, in order: enabling data capture on the endpoint, creating a baseline, creating a monitoring schedule, and reviewing the generated reports.

262
MCQhard

A company wants to serve a scikit-learn model via SageMaker. The inference code requires a custom preprocessing step that is not in the default scikit-learn container. What is the simplest way to deploy?

A.Create a custom Docker image extending the SageMaker scikit-learn container
B.Package the code in a Lambda layer and use SageMaker hosting
C.Use SageMaker Batch Transform with a custom processing script
D.Use SageMaker Neo to compile the model and add preprocessing
AnswerA

Extending the container with the custom preprocessing is straightforward and supported.

Why this answer

Option A is correct: extending the SageMaker scikit-learn container with a Dockerfile is the simplest. Option B (Lambda layer) may have compatibility issues. Option C (SageMaker Batch Transform) is for batch, not real-time.

Option D (SageMaker Neo) optimizes for hardware, not custom code.

263
MCQmedium

A company is using Amazon SageMaker to deploy a real-time inference endpoint for a computer vision model. The endpoint receives bursts of traffic with up to 500 requests per second, but the load is unpredictable. Which scaling strategy is MOST cost-effective while maintaining low latency?

A.Manually provision enough instances to handle peak load
B.Use provisioned concurrency on SageMaker Serverless Inference
C.Use a multi-model endpoint to reduce the number of instances
D.Configure automatic scaling with a target tracking policy and add a buffer to handle bursts
AnswerD

Autoscaling with a target tracking policy adjusts instances based on demand, and a buffer helps absorb sudden spikes.

Why this answer

Option D is correct because SageMaker can add instances in response to increased load, and a buffer helps absorb sudden spikes. Option B (provisioned concurrency) applies to Serverless Inference, which is subject to concurrency limits and may not sustain bursts of 500 requests per second. Option A (manual provisioning for peak) is not cost-effective for unpredictable traffic.

Option C (multi-model endpoints) is for serving multiple models, not for scaling.
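
One way to size the buffer is the rule of thumb commonly given for the `SageMakerVariantInvocationsPerInstance` metric: target = per-instance max requests per second × safety factor × 60. The sketch below assumes a hypothetical load-test result of 100 requests per second per instance and a safety factor of 0.5; both numbers and the function names are illustrative.

```python
import math


def invocations_per_instance_target(max_rps_per_instance, safety_factor=0.5):
    """Per-minute invocation target that leaves headroom for bursts."""
    return max_rps_per_instance * safety_factor * 60


def instances_needed(peak_rps, max_rps_per_instance, safety_factor=0.5):
    """Instances required at peak if each runs at the buffered rate."""
    usable_rps = max_rps_per_instance * safety_factor
    return math.ceil(peak_rps / usable_rps)


# Hypothetical: a load test showed one instance sustains 100 requests/second.
target_value = invocations_per_instance_target(100)
peak_instances = instances_needed(500, 100)
print(target_value, peak_instances)
```

A smaller safety factor gives a larger buffer at higher cost; the right value depends on how quickly new instances come into service.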

264
MCQhard

A company deploys a SageMaker endpoint for real-time inference. After a week, the response latency increases from 50 ms to 500 ms. CPU utilization is at 30%. What is the most likely cause?

A.The model has a memory leak
B.The instance type is underpowered for the inference load
C.The inference code makes a call to a downstream service that is throttling requests
D.The SageMaker endpoint is experiencing a network outage
AnswerC

Downstream throttling can increase latency without high CPU on the endpoint.

Why this answer

Increased latency with low CPU suggests a bottleneck elsewhere, often throttling by a downstream service such as a database. Option C is correct. Option B (underpowered instance) would show high CPU.

Option D (network outage) would cause errors rather than slow responses. Option A (memory leak) would show high memory usage.

265
MCQhard

Refer to the exhibit. A data scientist is reviewing CloudWatch logs for a SageMaker real-time endpoint. The log shows that a prediction took 15 ms. The endpoint is configured with an ml.c5.large instance and the model is a small scikit-learn model. The latency requirement is under 10 ms. Which action would most likely reduce the latency?

A.Use a larger instance type
B.Add more instances to the endpoint
C.Change the model to a TensorFlow model
D.Enable SageMaker Batch Transform
E.Increase the batch size for inference
AnswerA

More CPU power reduces latency.

Why this answer

Option A is correct because a larger instance (for example, ml.c5.xlarge) provides more CPU resources, reducing inference time. Option E (increase batch size) is not applicable to real-time single requests. Option D (enable SageMaker Batch Transform) is for offline inference.

Option B (add more instances) does not reduce per-request latency. Option C (switch to a TensorFlow model) is wrong because the framework is not likely the issue.

266
MCQhard

A machine learning engineer is using Amazon SageMaker to train a deep learning model. The training job is failing with a 'ResourceLimitExceeded' error. The engineer checks the account limits and sees that the current limit for the instance type is 2, and they are already using 2 instances for other jobs. Which approach would resolve the issue MOST cost-effectively?

A.Request a service limit increase for the current instance type
B.Use a different instance type that is available and has sufficient capacity
C.Use a managed spot training instead of on-demand
D.Stop the other training jobs to free up resources
AnswerB

Different instance types have separate limits and may be available immediately.

Why this answer

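Option B is correct because quotas are set per instance type, so a different instance type has its own limit and may be usable immediately. Option A requires waiting for the limit increase to be approved. Option C may not resolve the error, because spot training usage is also subject to quotas.

Option D frees capacity only by interrupting work that is already running.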

267
MCQeasy

A company is using Amazon SageMaker to train a model. The training data is stored in an S3 bucket. The data scientist wants to use the Pipe mode for training to stream data directly from S3 instead of downloading it first. Which of the following is a prerequisite for using Pipe mode?

A.The training data must be compressed using Gzip.
B.The S3 bucket must have public read access.
C.The training data must be stored as a single large file.
D.The training data must be in RecordIO-protobuf or TFRecord format.
AnswerD

Pipe mode streams data from S3 as the job reads it; RecordIO and TFRecord are supported.

Why this answer

Option D is correct because Pipe mode requires the training data to be in a format that supports streaming, such as RecordIO or TFRecord. Option B is wrong because SageMaker reads S3 data through the execution role, so the bucket does not need public read access. Option C is wrong because Pipe mode does not require data to be in a single file.

Option A is wrong because Pipe mode does not require the data to be Gzip-compressed.

268
Multi-Selecteasy

Which TWO of the following are benefits of using SageMaker Managed Spot Training? (Select TWO.)

Select 2 answers
A.No need to checkpoint the model
B.Potential for significant cost savings
C.Faster training times
D.Guaranteed instance availability
E.Lower training cost compared to on-demand instances
AnswersB, E

Savings can be up to 90%.

Why this answer

Options B and E are correct. Managed Spot Training uses spare EC2 capacity, which can significantly reduce training costs (B) and is priced below on-demand instances (E). Option A is wrong because spot instances can be interrupted, so checkpointing is still needed.

Option C is wrong because spot training may take longer due to interruptions. Option D is wrong because spot instances are not guaranteed to be available.

269
MCQeasy

A machine learning engineer needs to deploy a model that requires custom inference code with dependencies. Which SageMaker deployment option should be used?

A.Use a SageMaker notebook instance as an endpoint.
B.Create a custom Docker container and deploy to SageMaker endpoint.
C.Use a built-in SageMaker algorithm.
D.Use a SageMaker batch transform job.
AnswerB

Custom container provides flexibility for custom code and dependencies.

Why this answer

Option B is correct because a custom container allows full control over dependencies and inference code. Option C is incorrect because a built-in algorithm may not support custom code. Option A is incorrect because a SageMaker notebook instance is for development, not deployment.

Option D is incorrect because a batch transform job is for batch inference, not real-time.

270
MCQmedium

A data scientist is training a model using Amazon SageMaker and wants to automatically stop training when the model stops improving. Which feature should be used?

A.Use SageMaker Debugger to monitor the loss metric.
B.Configure a CloudWatch alarm on the training job's CPU utilization.
C.Use SageMaker Hyperparameter Tuning with random search.
D.Enable early stopping in the training job configuration.
AnswerD

Stops training if improvement plateaus.

Why this answer

Option D is correct because early stopping halts training automatically once the objective metric (for example, validation loss or accuracy) stops improving, which saves compute time and limits overfitting. In SageMaker it is enabled through the training configuration: some built-in algorithms expose it as a hyperparameter (for example, `early_stopping_rounds` in XGBoost), and hyperparameter tuning jobs expose it as `TrainingJobEarlyStoppingType`. Note that the training job's `StoppingCondition` only caps runtime (`MaxRuntimeInSeconds`); it does not monitor metrics.

Exam trap

The trap here is that candidates confuse SageMaker Debugger's monitoring capabilities with automatic stopping. Debugger rules such as `LossNotDecreasing` detect the condition, but halting the job requires additional configuration (an action attached to the rule), whereas early stopping is a direct setting in the training configuration.

How to eliminate wrong answers

Option A is wrong because SageMaker Debugger is designed for debugging and profiling training jobs (for example, capturing tensors and monitoring system bottlenecks); it can emit alerts, but monitoring the loss does not by itself stop the job. Option B is wrong because a CloudWatch alarm on CPU utilization monitors infrastructure health, not model performance metrics like loss or accuracy, so it cannot determine when the model stops improving. Option C is wrong because random search is a strategy for exploring hyperparameter combinations, not a mechanism to stop an individual training job; early stopping can be enabled within a tuning job, but it is a separate setting.
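
The idea behind metric-based early stopping can be sketched as a patience check. This is a generic illustration, not SageMaker's internal logic; `should_stop`, `run`, and the loss values are made up for the example.

```python
def should_stop(losses, patience=3, min_delta=0.0):
    """True when none of the last `patience` losses beat the earlier best."""
    if len(losses) <= patience:
        return False
    best_before = min(losses[:-patience])
    return all(loss >= best_before - min_delta for loss in losses[-patience:])


def run(loss_stream, patience=3):
    """Consume per-epoch losses and return the epoch at which training stops."""
    history = []
    for epoch, loss in enumerate(loss_stream, start=1):
        history.append(loss)
        if should_stop(history, patience):
            return epoch
    return len(history)


# Hypothetical validation losses: improvement stalls after epoch 4.
stopped_at = run([0.9, 0.7, 0.6, 0.55, 0.56, 0.57, 0.55, 0.40])
print(stopped_at)
```

The last value in the stream is never seen, which shows the trade-off: a short patience saves compute but can stop before a late improvement.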

271
MCQhard

A machine learning team is deploying a real-time inference endpoint on Amazon SageMaker for a model that requires low latency (<100 ms). The model is a PyTorch model with custom pre- and post-processing logic. The team uses a SageMaker Model with a custom inference container. After deployment, they observe that the endpoint takes over 500 ms for the first request, but subsequent requests are fast (~50 ms). What is the MOST likely cause?

A.The instance type is too small to handle the model size.
B.The model is too large and exceeds the instance memory.
C.The container has a cold start delay because the model needs to be loaded into memory from Amazon S3 on the first request.
D.The endpoint is not configured with auto-scaling.
AnswerC

Cold start occurs when no idle instances are available; model loading from S3 adds latency.

Why this answer

Option C is correct because the first request often suffers from cold start latency due to container initialization and model loading. Option A is wrong because the issue is transient and not related to instance type. Option B is wrong because a model that exceeded instance memory would fail on every request, not only slow down the first one.

Option D is wrong because the issue is not about auto-scaling but about initialization.

272
Multi-Selectmedium

Which TWO options are valid ways to reduce inference latency for a model deployed on a SageMaker real-time endpoint? (Select TWO.)

Select 2 answers
A.Use SageMaker batch transform instead of real-time endpoint
B.Deploy the model to multiple instances behind a load balancer
C.Enable SageMaker Neo to compile the model for the target instance
D.Use a GPU instance type for the endpoint
E.Increase the endpoint's invocation timeout
AnswersC, D

Neo optimizes model for faster inference.

Why this answer

Using a GPU instance (Option D) can accelerate model computations. Enabling SageMaker Neo compilation (Option C) optimizes the model for the target hardware. Option E is wrong because increasing the timeout does not reduce latency.

Option A is wrong because batch transform is not real-time. Option B is wrong because a higher instance count does not directly reduce latency per request.

273
MCQhard

A machine learning team is using SageMaker to train a model with a custom Docker container. The training script runs locally but fails on SageMaker with a 'Permission denied' error when writing to /opt/ml/model. What is the likely cause?

A.The container's user does not have write permission to /opt/ml/model
B.The Docker image is too large
C.The training script is trying to read from /opt/ml/input/data instead of /opt/ml/input/data/training
D.The training data is not in the correct S3 bucket
AnswerA

SageMaker mounts /opt/ml/model as a volume; the user must have write access.

Why this answer

SageMaker expects the container to write the model artifact to /opt/ml/model. If the user in the container lacks write permissions, it fails. Option A is correct.

Option B (image size) is unrelated. Option D (wrong S3 bucket) would cause different errors. Option C is about reading data, not writing the model.

274
Multi-Selectmedium

A company uses SageMaker to run training jobs on a schedule. The training data is stored in an S3 bucket that receives new data every hour. Which TWO approaches can the company use to trigger a training job when new data arrives?

Select 2 answers
A.Use an SQS queue to buffer S3 events and poll from a training instance
B.Set up an Amazon EventBridge rule that triggers on S3 Object Created events and targets a Lambda function
C.Use AWS Step Functions with a scheduled execution
D.Use CloudWatch Logs to monitor S3 access logs and trigger a Lambda function
E.Configure an S3 event notification to invoke a Lambda function that starts the training job
AnswersB, E

EventBridge can capture S3 events and invoke Lambda.

Why this answer

Options B and E are correct. Option B: an EventBridge rule can match S3 Object Created events and target a Lambda function. Option E: an S3 event notification can invoke a Lambda function that starts the training job.

Option D is wrong because CloudWatch Logs are for logs, not events. Option A is wrong because SQS is not directly integrated with SageMaker and would require a running instance to poll the queue. Option C is wrong because a scheduled Step Functions execution runs on a timer, not when data arrives.
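
A sketch of the Lambda side of this pattern, assuming the standard S3 event notification payload shape. The `start_job` callable is a hypothetical hook: in a real function it would wrap a call such as boto3's SageMaker `create_training_job` together with the remaining required parameters (algorithm, role, output location, resources), which are omitted here. The bucket, key, and job-name prefix are placeholders.

```python
import time
from urllib.parse import unquote_plus


def build_job_request(event, now=None):
    """Turn an S3 event notification into a job name and an input S3 URI."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
    prefix = key.rsplit("/", 1)[0] if "/" in key else ""
    stamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime(now))
    return {
        "TrainingJobName": f"retrain-{stamp}",  # hypothetical naming scheme
        "S3Uri": f"s3://{bucket}/{prefix}",
    }


def handler(event, context, start_job=print):
    """Lambda entry point; start_job is injected so the sketch runs offline."""
    request = build_job_request(event)
    start_job(request)
    return request


# Hypothetical event, trimmed to the fields the handler reads.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-bucket"},
                "object": {"key": "incoming/2024/data+file.csv"}}}
    ]
}
started = []
handler(sample_event, None, start_job=started.append)
```

Deriving the job name from a timestamp keeps names unique across hourly uploads; a production handler would also need to handle events that carry multiple records.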

275
MCQmedium

A company is using Amazon SageMaker to deploy a model for real-time inference. The endpoint uses an ml.c5.xlarge instance. The company wants to reduce costs without affecting performance. The current traffic pattern shows a daily peak of 500 requests per second for 2 hours, and the rest of the day sees fewer than 50 requests per second. The model has a cold start time of about 30 seconds. What should the company do?

A.Switch to a serverless inference endpoint.
B.Configure an auto scaling policy that scales down during low traffic and keep a minimum of 1 instance.
C.Use a single ml.c5.xlarge instance and rely on it.
D.Use SageMaker Batch Transform for all predictions.
AnswerB

Auto scaling reduces instances during low traffic, and minimum instance prevents cold starts.

Why this answer

Option B is correct because a scaling policy that scales in during low traffic reduces cost, and keeping a minimum of one instance ensures low latency during quiet periods without cold starts. Option A is wrong because serverless endpoints have cold starts and may not handle 500 TPS. Option D is wrong because Batch Transform is not for real-time.

Option C is wrong because one instance during peak may cause latency.
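
The auto scaling setup can be sketched as the two request payloads that Application Auto Scaling expects for a SageMaker endpoint variant (`register_scalable_target` and `put_scaling_policy` in boto3). The function only builds the dictionaries so it runs without AWS access; the endpoint name, variant name, capacity limits, target value, and cooldowns are all hypothetical.

```python
def scaling_config(endpoint_name, variant_name, min_instances, max_instances, target_value):
    """Build keyword arguments for registering a target and a tracking policy."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    # MinCapacity=1 keeps one warm instance so quiet periods see no cold start.
    scalable_target = dict(common, MinCapacity=min_instances, MaxCapacity=max_instances)
    scaling_policy = dict(
        common,
        PolicyName=f"{endpoint_name}-invocations-target",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": target_value,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 300,   # assumed values, tune per workload
            "ScaleOutCooldown": 60,
        },
    )
    return scalable_target, scaling_policy


scalable_target, scaling_policy = scaling_config("fraud-endpoint", "AllTraffic", 1, 10, 3000.0)
print(scalable_target["ResourceId"])
```

With credentials available, the two dictionaries would be passed as keyword arguments to the corresponding `application-autoscaling` client calls.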

276
MCQmedium

A machine learning team is deploying a model using Amazon SageMaker. They need to automatically retrain the model every week with new data and update the endpoint without downtime. Which approach should they use?

A.Use SageMaker Ground Truth to label new data and trigger retraining
B.Use SageMaker batch transform to periodically generate predictions and replace the model
C.Use AWS Lambda to trigger retraining on a schedule and deploy a new endpoint
D.Use SageMaker automatic model tuning with a schedule and update the endpoint using CreateEndpointConfig and UpdateEndpoint
E.Use SageMaker Pipelines to automate retraining and deploy a new endpoint with blue/green deployment
AnswerD

This allows retraining and zero-downtime update.

Why this answer

Option D is correct because SageMaker automatic model tuning (hyperparameter tuning jobs) can be scheduled, and updating the endpoint with a new model can be done with CreateEndpointConfig and UpdateEndpoint for zero-downtime deployment. Option C (Lambda + retraining) is possible but not the most integrated. Option E (SageMaker Pipelines) can orchestrate retraining, but the endpoint update may still be needed.

Option B (batch transform) is for inference, not retraining. Option A (SageMaker Ground Truth) is for labeling.

277
MCQhard

A company is using SageMaker to host a model for real-time inference. They notice that the endpoint's latency increases over time. The model is stateless and the inference code does not log any errors. What is the MOST likely cause?

A.Memory leak in the inference container
B.Gradual increase in request payload size
C.Endpoint auto scaling is adding new instances
D.Model is accumulating state from previous requests
AnswerA

Memory leaks cause slowdown over time.

Why this answer

Memory leaks cause gradual performance degradation. Option A is correct. Option B is wrong because nothing in the scenario indicates that request payloads are growing.

Option C is wrong because auto scaling would add instances, not degrade existing ones. Option D is wrong because the model is stateless.

278
MCQmedium

A machine learning engineer is deploying a real-time inference endpoint using Amazon SageMaker. The model is a large deep learning model that requires low latency (under 100 ms) and high throughput (1000 requests per second). Which SageMaker deployment option is MOST suitable?

A.Deploy the model on a single endpoint with automatic scaling based on CPU utilization.
B.Use SageMaker Serverless Inference with provisioned concurrency.
C.Use SageMaker Inference Recommender to find the optimal instance type and endpoint configuration.
D.Use a multi-model endpoint to load multiple copies of the model on the same instance.
AnswerC

Inference Recommender runs load tests and suggests the best instance and configuration to meet latency and throughput targets.

Why this answer

Option C is correct because SageMaker Inference Recommender provides automated testing and recommendations for instance type and configuration to meet latency and throughput requirements. Option D is wrong because multi-model endpoints are designed for multiple small models, not optimized for a single large model's throughput. Option B is wrong because Serverless Inference has a maximum concurrency and may not achieve 1000 TPS with low latency.

Option A is wrong because a single endpoint may not handle the load; auto scaling helps but does not guarantee the optimal instance choice.

279
MCQhard

A machine learning engineer is using AWS Step Functions to orchestrate a SageMaker training job followed by a Lambda function for post-processing. The training job completes successfully, but the Lambda function fails with a timeout error. What is the MOST likely cause?

A.The Lambda function's IAM role lacks permissions to access the training output
B.The Lambda function execution time exceeds the maximum timeout limit
C.The Step Functions state machine has a misconfigured retry policy
D.The SageMaker training job output data is too large for Lambda to process
AnswerB

Lambda timeout is 15 minutes max.

Why this answer

Lambda has a maximum execution timeout of 15 minutes. If post-processing takes longer, the function times out. Option B is correct.

Option D is wrong because output size alone does not produce a timeout error; the function fails only when processing runs past its timeout. Option C is wrong because Step Functions itself is not the cause. Option A is wrong because the failure is a timeout, not a permission issue.

280
MCQmedium

A company is deploying a machine learning model to production on Amazon SageMaker. The model requires low-latency inference (under 10 ms) for real-time predictions. The data scientist has trained a model using XGBoost and wants to minimize cost while meeting latency requirements. Which SageMaker hosting option should be used?

A.Use a real-time endpoint with a single model
B.Use a serverless inference endpoint
C.Use a real-time endpoint with multi-model hosting
D.Use a batch transform job
E.Use an asynchronous inference endpoint
AnswerA

Real-time endpoints provide low-latency inference.

Why this answer

Option A is correct because SageMaker real-time endpoints provide low-latency inference suitable for real-time predictions. Option D (batch transform) is for offline predictions, not real-time. Option B (serverless inference) has cold starts and may not guarantee under 10 ms.

Option E (asynchronous inference) is for near-real-time workloads with higher latency. Option C (multi-model endpoint) can reduce cost by sharing resources, but may introduce higher latency due to model loading.

281
MCQmedium

A team is training a large NLP model using SageMaker. The training job fails with an OutOfMemory error. The instance type is ml.p3.2xlarge, which has 61 GiB of instance memory and a single 16 GB GPU. Which action should the team take to resolve the issue without changing the model architecture?

A.Switch to a regression model
B.Increase the number of epochs
C.Enable SageMaker Managed Warm Pools
D.Reduce the batch size in the training script
AnswerD

Smaller batch size reduces GPU memory consumption per step.

Why this answer

Reducing the batch size decreases GPU memory usage per iteration. Option D is correct. Option A changes the problem to regression.

Option B lengthens training without reducing memory usage. Option C (warm pools) is unrelated to GPU memory.

282
Multi-Selecthard

Which TWO approaches can reduce inference latency on a SageMaker real-time endpoint? (Choose 2.)

Select 2 answers
A.Attach an Elastic Inference accelerator
B.Increase the batch size
C.Enable SageMaker Model Monitor
D.Use a GPU instance type
E.Compile the model using SageMaker Neo
AnswersA, E

Provides GPU acceleration at lower cost.

Why this answer

Options A and E are correct. An Elastic Inference accelerator (Option A) adds GPU acceleration at lower cost than a full GPU instance, and compiling the model with SageMaker Neo (Option E) optimizes it for the target hardware. Option D is wrong because a GPU does not always reduce latency; it can add overhead.

Option B is wrong because larger batch sizes increase latency. Option C is wrong because Model Monitor tracks data and model quality; it does not speed up inference.

283
MCQmedium

A company's ML model is deployed on a SageMaker endpoint. The model's predictions are used in a customer-facing application that requires low latency. Over time, the model's performance degrades due to data drift. What is the most suitable approach to detect this drift automatically?

A.Set up a CloudWatch alarm on the endpoint's invocation latency
B.Periodically retrain the model using all historical data
C.Use Amazon S3 events to trigger a Lambda function that compares distributions
D.Enable Amazon SageMaker Model Monitor to continuously check for data drift
AnswerD

Built-in drift detection.

Why this answer

SageMaker Model Monitor (Option D) can detect data drift automatically. Option A is wrong because CloudWatch alarms are for infrastructure metrics, not drift. Option C is wrong because S3 events trigger on object changes, not drift.

Option B is wrong because periodically retraining on all historical data is inefficient and does not detect drift.

284
MCQmedium

A company is building a fraud detection model. The dataset is highly imbalanced (99% legitimate, 1% fraud). The data scientist trains a model using Amazon SageMaker's built-in XGBoost algorithm. The model achieves 99% accuracy but only catches 10% of fraud cases. Which technique should the data scientist apply to improve recall for the minority class?

A.Use random under-sampling of the majority class.
B.Set the scale_pos_weight hyperparameter in XGBoost.
C.Use mean squared error as the objective function.
D.Use SMOTE to oversample the minority class.
AnswerB

This adjusts the weight of positive class to handle imbalance.

Why this answer

Option B is correct because setting scale_pos_weight increases the weight of the positive (fraud) class to offset the imbalance. Option D is wrong because SMOTE creates synthetic samples, whereas XGBoost has built-in handling. Option A is wrong because under-sampling loses data.

Option C is wrong because mean squared error is a regression objective.
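
A common starting value for `scale_pos_weight` is the ratio of negative to positive examples. The sketch below computes it for a toy label list that mirrors the 99% / 1% split in the question; the label list and the surrounding hyperparameter dictionary are illustrative, and the value usually still needs tuning against validation recall and precision.

```python
def scale_pos_weight(labels):
    """Ratio of negative to positive examples, a usual starting point."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("no positive examples in labels")
    return negatives / positives


# Toy dataset: 1% fraud (label 1), 99% legitimate (label 0).
labels = [1] * 10 + [0] * 990
hyperparameters = {
    "objective": "binary:logistic",
    "scale_pos_weight": scale_pos_weight(labels),
}
print(hyperparameters)
```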

285
MCQeasy

A machine learning team is using AWS Glue to prepare data for training. They notice that the ETL job takes a long time to process large datasets. Which change is most likely to improve performance?

A.Increase the number of DPUs for the Glue job.
B.Decrease the number of workers in the Glue job.
C.Disable Spark shuffle operations.
D.Reduce the dataset size by sampling.
AnswerA

More DPUs increase parallelism and speed up processing.

Why this answer

Option A is correct because increasing the number of DPUs (Data Processing Units) in AWS Glue can parallelize processing and reduce job duration. Option B is incorrect as it may reduce parallelism. Option C is incorrect because Spark shuffle is necessary; avoiding it may not be feasible.

Option D is incorrect because reducing data size is not always possible.

286
Multi-Selectmedium

A company is deploying a SageMaker model for real-time inference. The endpoint must be highly available and cost-effective. Which TWO actions should the company take? (Select TWO.)

Select 2 answers
A.Use managed spot training for inference
B.Deploy the endpoint with at least two instances in different Availability Zones
C.Use GPU instances for all models even if not required
D.Configure automatic scaling based on latency or request count
E.Use a single large instance to handle peak load
AnswersB, D

Multi-AZ deployment provides high availability.

Why this answer

Options B and D are correct. B: Multiple instances across AZs ensure HA. D: Auto scaling adjusts capacity based on demand, improving cost.

E (single instance) lacks HA. A (managed spot training) applies to training, not to real-time inference. C (GPU) is not necessarily cost-effective.

287
MCQmedium

A machine learning engineer needs to deploy a model that performs real-time inference with strict latency requirements of under 100 milliseconds. The model is a large ensemble of 10 deep learning models. Which SageMaker deployment strategy is MOST appropriate?

A.Use batch transform and cache predictions.
B.Deploy each model as a separate endpoint and route traffic using Application Load Balancer.
C.Use a SageMaker Inference Pipeline with serial inference within a single endpoint.
D.Use a multi-model endpoint to host all models.
AnswerC

Inference Pipelines allow chaining containers in a single endpoint, reducing latency.

Why this answer

Option C is correct because SageMaker Inference Pipelines chain multiple containers within a single endpoint, which avoids extra network hops between models. Option D is wrong because multi-model endpoints are for hosting multiple independent models, not ensembles.

Option A is wrong because batch transform is for offline inference. Option B is wrong because calling ten separate endpoints adds network hops and operational overhead.

288
MCQeasy

A machine learning engineer needs to deploy a model that performs real-time fraud detection. The model must be highly available and scalable. Which AWS service should be used to host the model?

A.AWS Lambda
B.Amazon ECS with a custom container
C.Amazon SageMaker batch transform
D.Amazon SageMaker real-time endpoint
AnswerD

Purpose-built for real-time inference with auto-scaling.

Why this answer

Option D is correct because Amazon SageMaker real-time endpoints are designed for low-latency, scalable, and highly available model hosting. Option A is wrong because AWS Lambda has limited execution time and is not suitable for heavy inference. Option B is wrong because Amazon ECS can host containers but requires more management; SageMaker is purpose-built.

Option C is wrong because SageMaker batch transform is for offline predictions.

289
MCQmedium

An ML engineer is deploying a model to a SageMaker endpoint for real-time inference. The model requires a custom inference script that preprocesses input data and postprocesses predictions. Which SageMaker feature should be used to implement this custom logic?

A.Use SageMaker Ground Truth to transform inference requests
B.Use SageMaker Processing jobs to preprocess data before inference
C.Use a built-in SageMaker algorithm with the default inference code
D.Create a SageMaker model with a custom inference script that includes pre- and post-processing functions
AnswerD

Custom inference scripts allow full control over request handling.

Why this answer

Option D is correct because a custom inference script is packaged with the inference code and used by SageMaker to handle requests. Option C (built-in algorithm) does not allow custom logic. Option B (SageMaker Processing) is for batch jobs.

Option A (SageMaker Ground Truth) is for labeling.

290
MCQeasy

A company is using Amazon SageMaker to train a model. The training data is stored in an S3 bucket in a different AWS account. Which IAM policy configuration is required to allow SageMaker to access the data?

A.Add a bucket policy that allows s3:GetObject for the SageMaker execution role's ARN.
B.Add a bucket policy allowing access from the SageMaker execution role ARN, and ensure the SageMaker execution role has an IAM policy allowing s3:GetObject on the bucket.
C.Create an IAM user in the data owner's account and use its credentials in SageMaker.
D.Use the data owner's IAM role as the SageMaker execution role.
AnswerB

Both policies are needed for cross-account access.

Why this answer

Option B is correct because cross-account access requires the SageMaker execution role to have an IAM policy allowing access to the S3 bucket, and the S3 bucket policy must grant access to that role. Option A is wrong because a bucket policy alone is not sufficient; the execution role also needs its own IAM policy allowing s3:GetObject on the bucket. Option D is wrong because the data owner's role cannot be used directly as the SageMaker execution role.

Option C is wrong because SageMaker does not use the data owner's IAM user credentials.
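
The two-sided requirement can be sketched as a pair of policy documents plus a small check that both must allow the action. The account ID, role name, and bucket name are hypothetical, and the `allows` helper is a deliberately simplified illustration, not how IAM evaluates policies internally.

```python
# Hypothetical identifiers, for illustration only.
ROLE_ARN = "arn:aws:iam::111111111111:role/SageMakerExecutionRole"
BUCKET = "example-training-data"

# Resource-based policy attached to the bucket in the data owner's account.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": ROLE_ARN},
        "Action": ["s3:GetObject"],
        "Resource": [f"arn:aws:s3:::{BUCKET}/*"],
    }],
}

# Identity-based policy attached to the execution role in the SageMaker account.
role_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": [f"arn:aws:s3:::{BUCKET}/*"],
    }],
}


def allows(policy, action):
    """Simplified check: does any statement allow this action?"""
    return any(
        stmt["Effect"] == "Allow" and action in stmt["Action"]
        for stmt in policy["Statement"]
    )


def cross_account_access_ok(action):
    """Cross-account access needs an Allow on BOTH sides."""
    return allows(bucket_policy, action) and allows(role_policy, action)
```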

291
MCQhard

A data scientist is using SageMaker to train a model with a custom algorithm. The training script uses TensorFlow and runs on GPU instances. The training job fails with 'CUDA_ERROR_OUT_OF_MEMORY'. What is the most likely cause?

A.The S3 bucket is in a different region
B.The batch size is too large for the GPU memory
C.The GPU driver is outdated
D.The training script has a memory leak on CPU
E.The instance type does not have enough CPU cores
AnswerB

Large batch sizes can exceed GPU memory, causing out-of-memory errors.

Why this answer

Option B is correct because the error indicates GPU memory exhaustion, often due to the batch size being too large. Option E (too few CPU cores) would not cause CUDA errors. Option A (S3 bucket in a different region) is unrelated.

Option D (CPU memory leak) is not the issue. Option C (outdated GPU driver) would give a different error.
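
A rough back-of-the-envelope sketch shows why batch size matters: activation memory grows linearly with the number of samples per batch. The numbers below (a 16 GiB GPU, 4 GiB of fixed overhead for weights and optimizer state, 50 million float32 activations per sample) are hypothetical, and real frameworks add workspace and fragmentation overhead that this estimate ignores.

```python
def activation_memory_gib(batch_size, activations_per_sample, bytes_per_value=4):
    """Approximate activation memory in GiB; scales linearly with batch size."""
    return batch_size * activations_per_sample * bytes_per_value / 2**30


def largest_fitting_batch(gpu_gib, fixed_gib, activations_per_sample, start=1024):
    """Halve the batch size until the estimate fits in GPU memory."""
    batch = start
    while batch >= 1:
        needed = fixed_gib + activation_memory_gib(batch, activations_per_sample)
        if needed <= gpu_gib:
            return batch
        batch //= 2
    return 0  # even a single sample does not fit


# Hypothetical workload: 16 GiB GPU, 4 GiB fixed, 50M activations per sample.
best = largest_fitting_batch(gpu_gib=16, fixed_gib=4, activations_per_sample=50_000_000)
```

Under these assumed numbers a batch of 128 needs far more than 16 GiB, while halving down to 64 brings the estimate just under the limit.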

292
MCQeasy

A data scientist is training a TensorFlow model on a single GPU instance. The training is taking too long. Which AWS service should be used to reduce training time by distributing the workload across multiple GPUs?

A.Amazon SageMaker
B.AWS Glue
C.Amazon EMR
D.AWS Batch
AnswerA

SageMaker provides built-in distributed training libraries for multi-GPU training.

Why this answer

Option A is correct: Amazon SageMaker supports distributed training across multiple GPUs using the SageMaker distributed training libraries. Option D is wrong because AWS Batch is for batch computing, not specifically optimized for GPU training.

Option C is wrong because Amazon EMR is for big data processing. Option B is wrong because AWS Glue is for ETL jobs.

293
MCQhard

A company wants to automate the retraining of a model weekly using new data. The training script is in a SageMaker notebook. Which implementation is most maintainable?

A.Set up a cron job on an EC2 instance to run the training script
B.Schedule the notebook to run via a SageMaker Lifecycle Configuration script
C.Convert the notebook to a Python script, create a Docker container, and use SageMaker Pipelines with a schedule
D.Use AWS CloudFormation to provision a training job on a schedule
AnswerC

Pipelines provide a robust, scheduled workflow for training.

Why this answer

Option C is correct: convert the notebook to a Python script, package it in a Docker container, and schedule SageMaker Pipeline runs. Option B (running the notebook via a Lifecycle Configuration script) is brittle. Option D (CloudFormation) is for provisioning infrastructure, not for running training.

Option A (cron job on EC2) is less managed.

294
MCQhard

A company is using SageMaker to host a model that makes predictions on streaming data from Amazon Kinesis. The model must provide predictions with sub-second latency. Which approach should the company use?

A.Use SageMaker asynchronous inference with a Kinesis trigger
B.Use a SageMaker real-time endpoint and invoke it from an AWS Lambda function that is triggered by Kinesis
C.Use Amazon Kinesis Data Analytics with a built-in ML model
D.Use SageMaker batch transform to process batches of records from Kinesis
AnswerB

Real-time endpoint plus Lambda provides sub-second latency.

Why this answer

Option B is correct: a SageMaker real-time endpoint, invoked from a Lambda function triggered by Kinesis, provides sub-second latency. Option D (batch transform) is not real-time. Option A (asynchronous inference) has higher latency.

Option C (Kinesis Data Analytics) does not use SageMaker.
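
A minimal sketch of the Lambda side of this pattern follows. It decodes each Kinesis record (Kinesis delivers the payload base64-encoded) and forwards it to the endpoint with the `sagemaker-runtime` client's `invoke_endpoint` call. The endpoint name and the JSON payload format are assumptions, and the optional `runtime` argument is a hypothetical hook added here so the handler can be exercised without AWS credentials.

```python
import base64
import json

ENDPOINT_NAME = "example-endpoint"  # hypothetical endpoint name


def handler(event, context, runtime=None):
    """Invoke the real-time endpoint once per Kinesis record."""
    if runtime is None:
        import boto3  # imported lazily; provided by the Lambda Python runtime
        runtime = boto3.client("sagemaker-runtime")

    predictions = []
    for record in event["Records"]:
        # Kinesis record data arrives base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        response = runtime.invoke_endpoint(
            EndpointName=ENDPOINT_NAME,
            ContentType="application/json",  # assumes the model accepts JSON
            Body=payload,
        )
        predictions.append(json.loads(response["Body"].read()))
    return predictions
```

Each invocation is synchronous, which is what keeps latency low; the trade-off is that the endpoint must be sized for peak record throughput.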

295
MCQmedium

A company is deploying a model to an Amazon SageMaker endpoint for real-time inference. The model requires a GPU for low-latency predictions. Which instance type should be chosen?

A.ml.c5.xlarge
B.ml.r5.2xlarge
C.ml.g4dn.xlarge
D.ml.m5.large
AnswerC

GPU instance suitable for inference.

Why this answer

SageMaker GPU instances (ml.p3, ml.p4, ml.g4dn, ml.g5) are designed for GPU inference, so Option C (ml.g4dn.xlarge) is correct. Option A is wrong because ml.c5 is CPU only.

Option D is wrong because ml.m5 is CPU only. Option B is wrong because ml.r5 is CPU only.

296
MCQeasy

A DevOps engineer created a SageMaker notebook instance using the Terraform configuration shown. The notebook instance is in a VPC with a public subnet. However, the notebook instance cannot access the internet. What is the most likely cause?

A.The role_arn is incorrect or missing permissions.
B.The instance type ml.t2.medium does not support internet access.
C.The subnet does not have a route to an internet gateway.
D.The direct_internet_access parameter is set to 'Enabled' but should be 'Disabled'.
AnswerC

Without a route to an internet gateway, the notebook cannot access the internet despite the setting.

Why this answer

Option C is correct because a SageMaker notebook instance in a VPC with a public subnet requires a route to an internet gateway (IGW) in the subnet's route table to access the internet. Without that route, traffic from the notebook cannot reach the internet, even if `direct_internet_access` is enabled. The Terraform configuration likely omitted the route to the IGW, causing the connectivity failure.

Exam trap

The trap here is that candidates often confuse `direct_internet_access` with the network routing requirement and assume the parameter alone controls internet access. The parameter determines whether SageMaker provides internet access through a network interface it manages; traffic that goes through the customer VPC still needs a subnet route to an internet gateway.

How to eliminate wrong answers

Option A is wrong because the `role_arn` being incorrect or missing permissions would cause API failures (e.g., unable to create the notebook or access SageMaker resources), not a lack of internet connectivity from the notebook instance itself. Option B is wrong because the instance type `ml.t2.medium` fully supports internet access; SageMaker notebook instances of any type can reach the internet when properly configured. Option D is wrong because setting `direct_internet_access` to 'Enabled' is the correct setting for allowing internet access; setting it to 'Disabled' would intentionally block internet access, which is the opposite of what is needed.
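
The routing condition itself is easy to check programmatically. The sketch below assumes a route table shaped like an entry from the EC2 `DescribeRouteTables` response (a `Routes` list with `DestinationCidrBlock` and `GatewayId` fields); the sample tables and gateway ID are hypothetical.

```python
def has_internet_route(route_table):
    """Return True if the table has a default route to an internet gateway."""
    for route in route_table.get("Routes", []):
        is_default = route.get("DestinationCidrBlock") == "0.0.0.0/0"
        via_igw = route.get("GatewayId", "").startswith("igw-")
        if is_default and via_igw:
            return True
    return False


# Hypothetical route tables for illustration.
local_only = {"Routes": [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
]}
with_igw = {"Routes": [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0123456789abcdef0"},
]}
```

A subnet whose table looks like `local_only` is public in name only: nothing in it sends outbound traffic toward the internet.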

297
MCQhard

Refer to the exhibit. An ML engineer attaches this IAM policy to a user. The user wants to invoke the SageMaker endpoint my-endpoint from an EC2 instance with public IP 52.1.1.1. What will happen?

A.The invocation fails because the user does not have permission to create an endpoint.
B.The invocation is denied because the Deny statement applies to all resources.
C.The invocation is allowed because the source IP is not in the denied ranges.
D.The invocation is denied because the user is not in a VPC.
AnswerC

The Deny condition does not match the public IP, so Allow prevails.

Why this answer

The policy has an Allow for InvokeEndpoint on the specific endpoint, but also a Deny with a condition that denies InvokeEndpoint if the source IP is within private IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). The user's IP is 52.1.1.1, which is a public IP, not in the deny condition. Therefore, the Allow takes effect, and the invocation is permitted.

Note: Deny always overrides Allow, but the condition does not match, so Deny is not applied.
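
The evaluation order can be sketched in a few lines. The exhibit is not reproduced here, so the policy below is reconstructed from the explanation above (one Allow on `sagemaker:InvokeEndpoint`, one Deny conditioned on the three private ranges); treat it as an assumption, and the `evaluate` function as a simplified illustration rather than the full IAM evaluation logic.

```python
import ipaddress

# Private ranges named in the explanation; the Deny applies only inside them.
DENIED_RANGES = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]
ALLOWED_ACTION = "sagemaker:InvokeEndpoint"


def ip_in_ranges(ip, cidrs):
    """True if the address falls inside any of the CIDR blocks."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidrs)


def evaluate(action, source_ip):
    """Explicit Deny wins if its condition matches; otherwise Allow; else implicit deny."""
    if action == ALLOWED_ACTION and ip_in_ranges(source_ip, DENIED_RANGES):
        return "Deny"
    if action == ALLOWED_ACTION:
        return "Allow"
    return "ImplicitDeny"


decision = evaluate("sagemaker:InvokeEndpoint", "52.1.1.1")
```

The same call from an address such as 10.1.2.3 would hit the Deny condition, and an action the policy never mentions falls through to an implicit deny.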

298
Multi-Selecthard

You are deploying a custom Docker container for a SageMaker model that requires a specific NVIDIA CUDA version. Which THREE steps must you take to ensure the container runs correctly on SageMaker?

Select 3 answers
A.Define a health check endpoint
B.Use SageMaker Batch Transform
C.Include the SageMaker inference toolkit in the container
D.Choose a GPU instance type for the endpoint
E.Set the container's entry point to the inference script
AnswersC, D, E

Required for SageMaker to interface with the container.

Why this answer

Options C, D, and E are correct. Option C: the container must include the SageMaker inference toolkit. Option E: the container's entry point must be set to the inference script.

Option D: the endpoint must run on a GPU instance. Option B is wrong because batch transform is not required. Option A is wrong because a health check is recommended but not mandatory.

299
MCQhard

Refer to the exhibit. A data scientist is training a PyTorch model on a SageMaker ml.p3.2xlarge instance (16 GB GPU memory). The training fails with the shown error. Which change should the scientist make to resolve the error?

A.Reduce the batch size in the training script.
B.Increase the number of instances to 2.
C.Use SageMaker Managed Spot Training.
D.Increase the number of epochs.
AnswerA

Smaller batch size reduces GPU memory consumption.

Why this answer

Reducing the batch size (Option A) reduces GPU memory usage per iteration and can resolve OOM errors. Option B (increase instance count) does not reduce per-GPU memory. Option C (use Spot) does not help.

Option D (increase epochs) will still fail.

300
MCQhard

A data scientist is training a model using Amazon SageMaker with a custom Docker container. The training job fails with an error: 'Resource exhausted: Out of memory'. The training data is stored in S3. What should the data scientist do to resolve this issue?

A.Increase the instance memory by selecting a larger instance type.
B.Increase the EBS volume size attached to the training instance.
C.Use Pipe mode for data loading instead of File mode.
D.Reduce the batch size in the training script.
AnswerA

Larger instance provides more memory.

Why this answer

Option A is correct because 'Out of memory' indicates the instance does not have enough memory. Increasing the instance memory resolves the issue. Option C is wrong because Pipe mode streams data directly from S3 and can reduce memory usage, but the error is about memory exhaustion, not data loading.

Option B is wrong because EBS volume size does not affect memory. Option D is wrong because reducing batch size might help but is not the primary fix; increasing instance memory directly addresses the issue.
