Knowledge + Practice

CCNA Machine Learning Implementation and Operations Questions

51 of 351 questions · Page 5/5 · Machine Learning Implementation and Operations · Answers revealed


301

Multi-Select · hard

A company is deploying a machine learning model on Amazon SageMaker. The model needs to be updated frequently with new versions. The team wants to minimize downtime and test the new model version before routing all traffic to it. Which TWO strategies should be used together?

Select 2 answers

A. Use a rolling update strategy.

B. Use a multi-model endpoint.

C. Use Amazon SageMaker A/B testing.

D. Use Amazon SageMaker canary deployment.

E. Use Amazon SageMaker blue/green deployment.

Answers: D, E

Canary deployment sends a small percentage of traffic to the new version.

Why this answer

Options D (canary deployment) and E (blue/green deployment) are correct. Blue/green deploys the new version alongside the old one, and canary routing sends a small percentage of traffic to the new version for testing before the full shift. Option C is wrong because A/B testing in SageMaker is typically done with production variants and does not inherently include canary routing; canary is a specific feature.

Option A is wrong because a rolling update replaces capacity in batches rather than holding the new version to a small test share of traffic first. Option B is wrong because multi-model endpoints host multiple models but do not facilitate traffic shifting for updates.
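As a rough illustration of how canary routing and blue/green fit together, the sketch below builds the `DeploymentConfig` structure accepted by the SageMaker `UpdateEndpoint` API. The endpoint name, configuration name, alarm name, and all numeric values are hypothetical placeholders, and the actual call is left as a comment because it needs AWS credentials.

```python
# Sketch only: a blue/green update that shifts a canary share of traffic first.
# All names and numbers below are illustrative assumptions.
deployment_config = {
    "BlueGreenUpdatePolicy": {
        "TrafficRoutingConfiguration": {
            "Type": "CANARY",
            # Send 10% of capacity to the new fleet, then wait before shifting the rest.
            "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": 10},
            "WaitIntervalInSeconds": 300,
        },
        # Keep the old fleet around briefly after the full shift.
        "TerminationWaitInSeconds": 120,
    },
    # Roll back automatically if this (hypothetical) CloudWatch alarm fires.
    "AutoRollbackConfiguration": {"Alarms": [{"AlarmName": "demo-endpoint-errors"}]},
}

update_request = {
    "EndpointName": "demo-endpoint",
    "EndpointConfigName": "demo-config-v2",
    "DeploymentConfig": deployment_config,
}
# With credentials configured:
# boto3.client("sagemaker").update_endpoint(**update_request)
```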

302

MCQ · easy

A data scientist wants to use Amazon SageMaker to train a deep learning model on a large dataset stored in S3. The training job is expected to take several hours. Which storage option should be used to minimize data loading time and cost?

A. Attach an Amazon EBS volume with the dataset pre-loaded

B. Use File mode to copy data to the training instance's local storage

C. Use Pipe mode to stream data directly from S3 during training

D. Mount an Amazon EFS file system to the training instance

Answer: C

Pipe mode streams data on the fly, reducing startup time and cost.

Why this answer

Option C is correct because Pipe mode streams data directly from S3 without downloading it first, minimizing time and cost. Option B (File mode) downloads the entire dataset before training starts, increasing time and cost. Option D (Amazon EFS) is unnecessary and adds complexity.

Option A (Amazon EBS) is wrong because a pre-loaded EBS volume is not a supported input data source for SageMaker training jobs.
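A minimal sketch of what selecting Pipe mode can look like at the API level: the structure below is one entry of the `InputDataConfig` list passed to `CreateTrainingJob`. The channel name and S3 URI are assumptions made for illustration.

```python
# Sketch only: one input channel that streams from S3 in Pipe mode.
# "training" and the S3 URI are hypothetical.
input_data_config = [
    {
        "ChannelName": "training",
        "InputMode": "Pipe",  # "File" would download the data before training starts
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-bucket/train/",
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }
]
```

The training code then has to read from a stream rather than from files on disk, so the container must support Pipe mode as well.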

303

MCQ · easy

A company uses SageMaker to train a model, but the training job fails due to insufficient memory. What is the most cost-effective way to resolve this?

A. Use a larger instance type with more memory

B. Use Spot Instances to reduce cost

C. Reduce the batch size in the training script

D. Switch to distributed training across multiple instances

Answer: A

Larger instance types provide more memory, directly solving the issue.

Why this answer

Option A is correct because increasing instance memory addresses the issue directly. Option C is wrong because reducing batch size may not solve memory issues if the model itself is large. Option B is wrong because Spot Instances have no memory advantage.

Option D is wrong because distributed training adds complexity and cost.

304

MCQ · hard

A financial services company uses Amazon SageMaker to train a fraud detection model. The training data is stored in an S3 bucket encrypted with AWS KMS. The SageMaker training job is configured to use a custom Docker container that reads data from S3 and writes model artifacts back to S3. The training job fails with the error: 'Unable to write model artifact to s3://my-bucket/output/model.tar.gz. Access Denied.' The IAM role used by the training job has the following permissions: s3:GetObject and s3:PutObject on the bucket, and kms:Decrypt on the KMS key. The training job is not using a VPC. What is the MOST likely cause of the failure?

A. The S3 bucket is in a different region than the training job

B. The IAM role does not have kms:GenerateDataKey permission on the KMS key

C. The S3 bucket requires S3 Batch Operations for writing artifacts

D. The IAM role does not have s3:PutObject permission on the output bucket

Answer: B

Required for writing encrypted objects.

Why this answer

Option B is correct because SageMaker training jobs need kms:GenerateDataKey permission to write objects encrypted with a KMS key. Option D is wrong because the role already has s3:PutObject. Option A is wrong because nothing in the scenario points to a region mismatch, and the failure is a permissions error. The job is not in a VPC, so VPC endpoint policies are not a factor either.

Option C is wrong because S3 Batch Operations is irrelevant.
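The permission set described above can be sketched as an IAM policy document. The bucket name comes from the question; the account ID, region, and key ID in the KMS ARN are placeholders, not values from the source.

```python
import json

# Sketch only: execution-role permissions for reading and writing a KMS-encrypted bucket.
# The KMS key ARN is a placeholder.
execution_role_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::my-bucket/*",
        },
        {
            "Effect": "Allow",
            # Decrypt covers reads; GenerateDataKey is what encrypted writes need.
            "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
            "Resource": "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
        },
    ],
}

print(json.dumps(execution_role_policy, indent=2))
```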

305

MCQ · medium

Refer to the exhibit. A SageMaker training job uses an IAM role with this policy. The training job writes output to s3://my-bucket/output/. Which statement about the policy is true?

A. The Allow statement allows all PutObject requests regardless of encryption

B. The training job can write output objects only if server-side encryption with SSE-S3 is used

C. The Deny statement blocks all PutObject requests

D. The GetObject permission requires the object to be encrypted with SSE-S3

Answer: B

Deny requires AES256 encryption.

Why this answer

Option B is correct because the Deny statement blocks PutObject requests that do not use SSE-S3 (AES256). Option C is wrong because a Deny with a condition applies only when the condition matches, so PutObject is still allowed when the object is encrypted with AES256. Option A is wrong because Deny overrides Allow.

Option D is wrong because GetObject is allowed without an encryption requirement.
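The exhibit itself is not reproduced here, so the policy below is only a guess at the general shape such a policy usually takes: an Allow on the output prefix plus a conditional Deny keyed on the `s3:x-amz-server-side-encryption` header. The `put_allowed` helper is a hypothetical, heavily simplified stand-in for IAM evaluation, written just to show the "explicit Deny wins, but only when its condition matches" logic.

```python
# Sketch only: an assumed policy shape, not the actual exhibit.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::my-bucket/output/*",
        },
        {
            "Effect": "Deny",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::my-bucket/output/*",
            "Condition": {
                "StringNotEquals": {"s3:x-amz-server-side-encryption": "AES256"}
            },
        },
    ],
}


def put_allowed(sse_header):
    """Simplified model: the Deny matches whenever the header is not AES256."""
    required = policy["Statement"][1]["Condition"]["StringNotEquals"][
        "s3:x-amz-server-side-encryption"
    ]
    denied = sse_header != required  # a missing header (None) also matches the Deny
    return not denied
```

Under this simplified model, `put_allowed("AES256")` is true, while `put_allowed(None)` and `put_allowed("aws:kms")` are false.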

306

Multi-Select · easy

A data scientist is using Amazon SageMaker to train a model. The training data is stored in an S3 bucket encrypted with AWS KMS. Which TWO actions are necessary to allow SageMaker to access the data?

Select 2 answers

A. Ensure the SageMaker execution role has s3:GetObject permission.

B. Enable S3 Transfer Acceleration.

C. Set up a VPC endpoint for S3.

D. Add a bucket policy allowing SageMaker access.

E. Grant the SageMaker execution role kms:Decrypt permission.

Answers: A, E

Required to read objects.

Why this answer

Option A is correct because the execution role needs permission to read the objects. Option E is correct because SageMaker needs permission to decrypt them. Option D is wrong because a bucket policy is separate from the role's permissions and is not required here.

Option B is wrong because Transfer Acceleration is not required. Option C is wrong because a VPC endpoint is not required.

307

Multi-Select · easy

A data scientist is using Amazon SageMaker to build a custom training algorithm. The algorithm requires a specific library that is not included in the default SageMaker containers. The scientist wants to create a custom container that includes this library. Which TWO steps are required? (Choose TWO.)

Select 2 answers

A. Upload the Docker image to an Amazon S3 bucket

B. Create an AWS Lambda layer with the library

C. Build a Docker image with the required library

D. Register the container in the SageMaker Model Registry

E. Push the Docker image to Amazon ECR

Answers: C, E

Docker is used to create custom containers.

Why this answer

Options C and E are correct. The custom container must be built using Docker, and it must be pushed to Amazon ECR to be used by SageMaker. Option D is wrong because the container does not need to be registered in SageMaker Model Registry.

Option A is wrong because the container image is stored in ECR, not S3. Option B is wrong because AWS Lambda is for serverless functions, not for running training containers.

308

MCQ · medium

An IAM policy is attached to a SageMaker notebook instance. The data scientist wants to use the notebook to train a model using data from S3 bucket 'my-bucket'. However, the training job fails with an access denied error. What is the MOST likely cause?

A. The notebook instance role does not have iam:PassRole permission to pass the SageMaker execution role

B. The sagemaker:CreateTrainingJob permission is not allowed on the specific resource

C. The S3 bucket resource ARN is incorrectly formatted

D. The s3:GetObject permission is missing for the bucket

Answer: A

SageMaker needs the notebook role to pass an execution role to training jobs.

Why this answer

Option A is correct because the notebook instance role needs permission to pass the execution role to SageMaker (iam:PassRole). The policy allows S3 and SageMaker actions but not PassRole. Option D is incorrect because GetObject and PutObject are allowed.

Option B is incorrect because SageMaker actions are allowed. Option C is incorrect because the resource is correct.
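For illustration, a statement granting the missing permission might look like the sketch below. The role ARN is a placeholder; the `iam:PassedToService` condition is an optional tightening that limits the role to being passed to SageMaker only.

```python
# Sketch only: statement to add to the notebook instance role's policy.
# The account ID and role name are hypothetical.
pass_role_statement = {
    "Effect": "Allow",
    "Action": "iam:PassRole",
    "Resource": "arn:aws:iam::111122223333:role/SageMakerExecutionRole",
    "Condition": {
        "StringEquals": {"iam:PassedToService": "sagemaker.amazonaws.com"}
    },
}
```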

309

MCQ · hard

A company is using Amazon SageMaker to deploy a model for real-time inference. The model endpoint is behind an Application Load Balancer (ALB) for A/B testing. The data scientist notices that the endpoint is returning HTTP 503 errors intermittently. The CloudWatch metrics show that the endpoint's Invocations metric is within limits, but the ModelLatency metric has high variance. What is the most likely cause?

A. The model container is using a custom inference code that has a bug.

B. The ALB health check is misconfigured and marking instances unhealthy.

C. The endpoint instance type does not have enough memory for the model.

D. The endpoint is configured with too few instances; increase the instance count.

Answer: C

Insufficient memory can cause the model to fail to respond, leading to 503 errors.

Why this answer

Option C is correct because high variance in ModelLatency combined with intermittent 503 errors strongly indicates that the model container is running out of memory under load. When memory is insufficient, the inference process may be killed by the kernel (OOM killer) or the container may be throttled, causing sporadic failures that manifest as 503s even though the Invocations metric (request count) appears within limits. The latency spikes occur because the container struggles to allocate memory for each request, leading to timeouts or crashes.

Exam trap

The trap here is that candidates confuse 'Invocations within limits' with 'sufficient capacity,' overlooking that memory exhaustion can cause failures even when request rate is low, and they incorrectly attribute 503s solely to scaling issues (Option D) rather than resource constraints on each instance.

How to eliminate wrong answers

Option A is wrong because a bug in custom inference code would typically cause consistent errors (e.g., 500s) or incorrect predictions, not intermittent 503s with high latency variance; the 503 status specifically points to resource exhaustion or overload, not application logic bugs. Option B is wrong because a misconfigured ALB health check would cause the ALB to mark instances as unhealthy and stop routing traffic to them, resulting in persistent 503s for all requests, not intermittent errors with high latency variance; the health check failure would be visible in ALB metrics, not ModelLatency. Option D is wrong because too few instances would cause the Invocations metric to exceed the instance's capacity, leading to throttling and 503s, but the question states Invocations is within limits; increasing instance count would not fix memory exhaustion on each instance, which is the root cause.


310

MCQ · easy

A data scientist needs to perform hyperparameter optimization for a model. Which AWS service provides built-in hyperparameter tuning jobs?

A. Amazon EMR

B. AWS Step Functions

C. Amazon SageMaker

D. AWS Batch

Answer: C

SageMaker has automatic model tuning.

Why this answer

Amazon SageMaker provides built-in hyperparameter tuning (automatic model tuning). Option C is correct. Option D is wrong because AWS Batch is not for tuning.

Option A is wrong because Amazon EMR is for big data. Option B is wrong because AWS Step Functions orchestrates workflows but does not perform tuning.

311

Multi-Select · easy

Which TWO services can be used to orchestrate a machine learning pipeline?

Select 2 answers

A. Amazon SageMaker Pipelines

B. Amazon SageMaker Ground Truth

C. AWS Step Functions

D. Amazon Redshift

E. AWS Glue

Answers: A, C

SageMaker Pipelines is designed for ML pipeline orchestration.

Why this answer

Options A and C are correct. Option B is wrong because Ground Truth is for labeling. Option E is wrong because AWS Glue is for data transformation.

Option D is wrong because Amazon Redshift is for data warehousing.

312

MCQ · medium

An ML team deploys a real-time inference endpoint on Amazon SageMaker. Users report high latency. The model is a PyTorch model using a custom container. Which combination of changes should the team implement to reduce latency? (Choose the best answer.)

A. Switch to asynchronous inference endpoint.

B. Use SageMaker Elastic Inference to attach an accelerator.

C. Compile the model using SageMaker Neo.

D. Use SageMaker Inference Recommender to benchmark different instance families and select the best.

Answer: D

Inference Recommender automates benchmarking to find the optimal configuration for low latency.

Why this answer

Option D is correct: SageMaker Inference Recommender generates multiple endpoint configurations to find the optimal instance type and model compilation settings. Option C (compilation only) helps but may not find the best instance. Option B (Elastic Inference) is deprecated.

Option A (Asynchronous Inference) is for near-real-time workloads, not true low latency.

313

MCQ · easy

A company wants to track and compare metrics from multiple machine learning experiments. Which Amazon SageMaker feature should be used?

A. SageMaker Experiments

B. SageMaker Ground Truth

C. SageMaker Model Monitor

D. SageMaker Debugger

Answer: A

Specifically designed for experiment tracking and comparison.

Why this answer

SageMaker Experiments helps track and compare metrics. Option A is correct. Option C is wrong because Model Monitor is for monitoring drift.

Option D is wrong because Debugger is for debugging training. Option B is wrong because Ground Truth is for labeling.

314

MCQ · hard

A SageMaker training job log shows the exhibit. The training job fails immediately after starting. The training data is supposed to be provided via Pipe mode from S3. What is the most likely cause?

A. The input data channel is not properly configured

B. The instance type does not have enough memory

C. The S3 bucket has insufficient permissions

D. The training script is using File mode instead of Pipe mode

E. The hyperparameters are incorrectly specified

Answer: A

The training job is looking for data at /opt/ml/input/data/training, but Pipe mode should provide a pipe.

Why this answer

Option A is correct because the error indicates that the data channel path is incorrect or not configured. Option D is wrong because if File mode were used, the path would exist. Option C is wrong because S3 permissions would cause a different error.

Option B is wrong because an undersized instance would not cause this error. Option E is wrong because hyperparameters would not cause a missing file error.
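The path mismatch mentioned in the hint can be made concrete with a small helper. In File mode a channel appears as a directory, while in Pipe mode SageMaker exposes a named pipe whose name carries an epoch suffix. The function `channel_path` is a hypothetical helper written for this note, not part of any SDK.

```python
def channel_path(channel, input_mode, epoch=0):
    """Return where a training container would look for a channel's data.

    File mode: a directory named after the channel.
    Pipe mode: a FIFO named <channel>_<epoch>, one per epoch.
    """
    base = "/opt/ml/input/data"
    if input_mode == "Pipe":
        return f"{base}/{channel}_{epoch}"
    return f"{base}/{channel}"


# A script that opens the File mode path while the channel is configured
# for Pipe mode would hit a missing-path error.
file_path = channel_path("training", "File")
pipe_path = channel_path("training", "Pipe")
```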

315

MCQ · hard

A company has deployed a machine learning model on Amazon SageMaker for real-time inference. The endpoint uses a single ml.c5.xlarge instance. Recently, the traffic has increased, and the endpoint is returning HTTP 503 (Service Unavailable) errors during peak hours. The CloudWatch metrics show that the CPU utilization is consistently above 90% during peak times, and the Invocations metric shows that requests are being throttled. The data science team has already optimized the model to reduce inference time by 20%, but the errors persist. The company needs to resolve the issue without increasing costs significantly. Which course of action should be taken?

A. Change the instance type to a larger size, such as ml.c5.2xlarge

B. Switch to batch transform to process requests in batches

C. Use spot instances to reduce costs and add more instances

D. Configure auto-scaling for the endpoint to add instances based on CPU utilization

Answer: D

Auto-scaling adds instances only when needed, handling peak traffic and reducing costs during low traffic.

Why this answer

Option D is correct because adding auto-scaling based on CPU utilization or invocations will dynamically adjust the number of instances to handle the load, reducing errors without incurring extra costs during low traffic. Option A is wrong because moving to a larger instance type will increase costs even during low traffic. Option B is wrong because batch transform is for offline processing, not real-time inference.

Option C is wrong because using spot instances could lead to interruptions and does not solve the capacity issue.
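A minimal sketch of endpoint auto-scaling through Application Auto Scaling follows. It uses the predefined invocations-per-instance metric for brevity; scaling on CPU utilization, as the answer describes, would instead need a customized metric specification. The endpoint name, variant name, capacities, target value, and cooldowns are all illustrative assumptions.

```python
# Sketch only: request payloads for Application Auto Scaling.
# Endpoint and variant names are hypothetical.
resource_id = "endpoint/demo-endpoint/variant/AllTraffic"

scalable_target = {
    "ServiceNamespace": "sagemaker",
    "ResourceId": resource_id,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "MinCapacity": 1,  # baseline during quiet periods
    "MaxCapacity": 4,  # ceiling during peaks
}

scaling_policy = {
    "PolicyName": "demo-target-tracking",
    "ServiceNamespace": "sagemaker",
    "ResourceId": resource_id,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 70.0,  # assumed target invocations per instance per minute
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 300,
    },
}
# With credentials configured:
# aas = boto3.client("application-autoscaling")
# aas.register_scalable_target(**scalable_target)
# aas.put_scaling_policy(**scaling_policy)
```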

316

MCQ · hard

A company is using Amazon SageMaker to host a model that performs real-time inference. The model receives around 100 requests per second with occasional spikes up to 500 requests per second. The current endpoint uses 2 ml.m5.large instances. During spikes, latency increases significantly, and some requests time out. What is the MOST cost-effective solution to handle the spikes without losing requests?

A. Replace the instances with a single larger instance type, such as ml.m5.4xlarge

B. Use an Amazon SQS queue to buffer incoming requests and process them asynchronously

C. Use AWS Lambda with a provisioned concurrency to handle the spikes

D. Configure SageMaker managed scaling with a target tracking policy and add a buffer based on the average spike duration

Answer: D

Managed scaling with a buffer allows proactive scaling to handle spikes.

Why this answer

Option D is correct because adding a buffer to the autoscaling policy allows the endpoint to scale proactively before the spike fully hits, while managed scaling adjusts instances based on demand. Option A (increase instance size) is less cost-effective than scaling out. Option B (SQS) adds latency.

Option C (Lambda) may not be suitable for real-time inference.

317

MCQ · medium

A data scientist is using SageMaker to train a deep learning model. The training script uses TensorFlow and runs on a single p3.2xlarge instance. The scientist wants to reduce training time by using multiple GPUs. What should the scientist do?

A. Increase the instance count to 4 without changing the script.

B. Modify the training script to use Horovod for distributed training.

C. Switch to PyTorch framework.

D. Use SageMaker Managed Spot Training.

Answer: B

Horovod enables multi-GPU and multi-instance distributed training.

Why this answer

Modifying the training script to use Horovod (Option B) is required for distributed training across multiple GPUs. Option A (increasing the instance count without changing the script) does not help, because the extra GPUs are used only if the script supports distributed training. Option D (using Spot) does not add GPUs.

Option C (changing to PyTorch) is unnecessary.

318

MCQ · medium

During training of a deep learning model on a GPU instance in SageMaker, the training job fails with an insufficient memory error. Which step should be taken first to resolve this issue?

A. Add dropout layers

B. Use a smaller learning rate

C. Use gradient clipping

D. Reduce the batch size

Answer: D

Smaller batch size reduces GPU memory footprint.

Why this answer

Option D is correct because reducing the batch size directly decreases GPU memory usage. Option B is wrong because a smaller learning rate changes how the model converges, not the memory used per step. Option C is wrong because gradient clipping addresses exploding gradients, not memory.

Option A is wrong because dropout reduces overfitting, not memory.

319

MCQ · easy

An engineer sees the error in the exhibit when trying to deploy a model from a model registry in SageMaker. What is the MOST likely cause?

A. The IAM role lacks permission to access the model registry

B. The model package version does not exist in the registry

C. The model package is still in 'Approved' status

D. The SageMaker endpoint is already deployed with the same model

Answer: B

The ARN includes a version number; the error says 'Could not find'.

Why this answer

Option B is correct because the error indicates that the model package ARN does not exist. Option A would show a different error. Option C is not indicated.

Option D would show a different error.

320

Multi-Select · medium

A company is using Amazon SageMaker to run a hyperparameter tuning job. The tuning job uses Bayesian optimization. Which THREE statements about Bayesian optimization are correct? (Choose THREE.)

Select 3 answers

A. It can only handle a maximum of 5 hyperparameters

B. It works well for continuous hyperparameters

C. It selects hyperparameter combinations based on previous trial results

D. It often finds optimal hyperparameters in fewer trials than random search

E. It requires more trials than grid search to find optimal values

Answers: B, C, D

Bayesian optimization handles continuous parameters naturally.

Why this answer

Options B, C, and D are correct. Bayesian optimization is suitable for continuous hyperparameters (B), it uses past results to choose hyperparameters (C), and it often finds good values in fewer trials than random search (D). Option E is false because Bayesian optimization typically requires fewer trials than grid search.

Option A is false because it can handle many hyperparameters, though it may need more trials.
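To make the tuning setup concrete, here is a sketch of the `HyperParameterTuningJobConfig` structure used by `CreateHyperParameterTuningJob` with the Bayesian strategy and one continuous range. The metric name, parameter name, ranges, and limits are assumptions chosen for illustration.

```python
# Sketch only: Bayesian tuning over one continuous hyperparameter.
# Metric name, parameter name, and limits are hypothetical.
tuning_job_config = {
    "Strategy": "Bayesian",
    "HyperParameterTuningJobObjective": {
        "Type": "Minimize",
        "MetricName": "validation:loss",
    },
    "ResourceLimits": {
        "MaxNumberOfTrainingJobs": 20,
        # Low parallelism lets later trials learn from more completed trials.
        "MaxParallelTrainingJobs": 2,
    },
    "ParameterRanges": {
        "ContinuousParameterRanges": [
            {
                "Name": "learning_rate",
                "MinValue": "0.0001",  # range bounds are passed as strings
                "MaxValue": "0.1",
                "ScalingType": "Logarithmic",
            }
        ]
    },
}
```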

321

MCQ · hard

A company uses Amazon SageMaker to train a model. The training job uses a custom Docker container. The job fails with the error 'CannotStartContainerError: API error (500).' Which of the following is the most likely cause?

A. The Docker image is built for a different CPU architecture.

B. The training script has a syntax error.

C. The S3 input data is missing.

D. The output path is not writable.

Answer: A

Incompatible architecture prevents container from running.

Why this answer

Option A is correct because an image built for an incompatible CPU architecture can fail at container start. Option C is wrong because missing S3 input data would cause a different error. Option B is wrong because the error occurs during container start, before the training script runs.

Option D is wrong because the error mentions the container, not the file system.

322

MCQ · medium

A company wants to deploy a machine learning model that performs real-time inference with sub-second latency. The model is a deep neural network with 500 MB of weights. The inference endpoint must scale to zero when not in use to minimize cost. Which AWS service should the company use?

A. Deploy the model as an AWS Lambda function with provisioned concurrency.

B. Use Amazon SageMaker Serverless Inference to host the model.

C. Host the model on Amazon ECS with Fargate and use a target tracking scaling policy.

D. Create an Amazon SageMaker real-time endpoint with automatic scaling policies.

Answer: B

SageMaker Serverless Inference automatically scales to zero when idle, reducing costs, and can handle sub-second latency for suitable workloads. It also supports large model sizes.

Why this answer

Amazon SageMaker Serverless Inference is designed for workloads with intermittent traffic patterns, automatically scaling to zero when idle and scaling up for real-time requests. It supports models up to 1 GB in size and provides sub-second latency for inference, making it ideal for this 500 MB deep neural network. This service eliminates the need to manage underlying infrastructure while meeting the latency and cost requirements.

Exam trap

The trap here is that candidates often confuse SageMaker Serverless Inference with SageMaker real-time endpoints, assuming automatic scaling can reduce costs to zero, but real-time endpoints always require a minimum instance count, whereas Serverless Inference truly scales to zero.

How to eliminate wrong answers

Option A is wrong because AWS Lambda has a maximum deployment package size of 250 MB (unzipped, including layers) and a 15-minute execution timeout, making it unsuitable for a 500 MB model and real-time inference with sub-second latency. Option C is wrong because Amazon ECS with Fargate does not natively scale to zero; it requires at least one running task to handle requests, and target tracking scaling policies maintain a baseline capacity, incurring costs even when idle. Option D is wrong because Amazon SageMaker real-time endpoints with automatic scaling policies cannot scale to zero; they maintain a minimum number of instances to ensure availability, leading to ongoing costs when not in use.

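As a sketch of how a serverless variant differs from an instance-backed one, the endpoint configuration below uses `ServerlessConfig` in place of an instance type and count. The configuration name, model name, memory size, and concurrency are illustrative assumptions; the memory size in particular would need to be chosen to fit the model.

```python
# Sketch only: endpoint configuration for SageMaker Serverless Inference.
# Names and sizes are hypothetical.
serverless_endpoint_config = {
    "EndpointConfigName": "demo-serverless-config",
    "ProductionVariants": [
        {
            "VariantName": "AllTraffic",
            "ModelName": "demo-model",
            # No InstanceType or InitialInstanceCount: capacity is managed
            # by the service and drops to zero when the endpoint is idle.
            "ServerlessConfig": {"MemorySizeInMB": 2048, "MaxConcurrency": 5},
        }
    ],
}
# With credentials configured:
# boto3.client("sagemaker").create_endpoint_config(**serverless_endpoint_config)
```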

323

MCQ · hard

An ML engineer is deploying a model on a SageMaker endpoint and wants to ensure that only authorized users and services can invoke the endpoint. The company uses AWS IAM for access control and requires that the endpoint be invoked only from within a specific VPC. What combination of actions should the engineer take? (Choose the single best answer.)

A. Use AWS CloudFront to restrict access based on IP addresses.

B. Use API Gateway in front of the SageMaker endpoint and attach a resource policy to API Gateway.

C. Create a VPC endpoint for Amazon SageMaker and attach a policy that only allows invocation from the VPC. Use IAM roles to restrict which users can invoke the endpoint.

D. Configure network ACLs on the VPC subnet to allow only the endpoint's security group.

Answer: C

VPC endpoint with policy ensures only traffic from the VPC can reach SageMaker API, and IAM controls user permissions.

Why this answer

SageMaker endpoints do not support resource-based policies directly, so access is restricted with a VPC endpoint policy together with IAM. Option C is correct: create a VPC endpoint for SageMaker and attach a policy that restricts invocation to the VPC, combined with IAM roles that allow only specific principals.

Option D (network ACLs) is not sufficient for authentication. Option B (API Gateway) adds unnecessary complexity. Option A (CloudFront) is for CDN, not access control.

324

MCQ · medium

A company uses Amazon SageMaker to train machine learning models. The training data contains personally identifiable information (PII). The company needs to ensure that the data is encrypted in transit between S3 and SageMaker. Which configuration is REQUIRED?

A. Use SageMaker in a VPC with VPC endpoints for S3

B. Enable S3 server-side encryption

C. Set the S3 bucket policy to require SSL

D. Use a custom Docker image with TLS configured

Answer: A

VPC endpoints ensure traffic stays within AWS network and uses HTTPS.

Why this answer

Option A is correct because SageMaker uses HTTPS by default for S3 access. Option B is for data at rest. Option C is optional.

Option D is not required for encryption in transit.

325

MCQ · hard

A company is using Amazon SageMaker to train a model with a custom algorithm. The training script reads data from an S3 bucket using boto3. The training job fails with an 'AccessDenied' error when trying to access the S3 bucket. The IAM role attached to the SageMaker notebook instance has full S3 access. What is the most likely cause?

A. The S3 bucket has a bucket policy that denies access from the SageMaker service.

B. The SageMaker execution role used for the training job does not have S3 access permissions.

C. The training script is using an incorrect S3 bucket name.

D. The SageMaker training job is not configured to use the S3 VPC endpoint.

Answer: B

The training job uses its own execution role, which must be granted S3 access.

Why this answer

The IAM role attached to the SageMaker notebook instance is used for interactive development, but training jobs run under a separate SageMaker execution role. Even if the notebook role has full S3 access, the training job's execution role must also have explicit S3 permissions. The 'AccessDenied' error indicates that the execution role lacks the necessary s3:GetObject or s3:ListBucket actions for the S3 bucket.

Exam trap

The trap here is that candidates confuse the IAM role attached to the SageMaker notebook instance with the execution role used by the training job, assuming they are the same or that permissions propagate automatically.

How to eliminate wrong answers

Option A is wrong because a bucket policy that denies SageMaker access would typically produce a different error (e.g., 'AccessDenied' with a specific denial message), and the question states the role has full S3 access, so a bucket policy conflict is less likely than a missing execution role permission. Option C is wrong because an incorrect bucket name would result in a 'NoSuchBucket' or '404' error, not an 'AccessDenied' error. Option D is wrong because a missing S3 VPC endpoint would cause a network timeout or connectivity error, not an IAM permission error, and SageMaker can access S3 over the public internet by default.


326

MCQ · hard

A SageMaker endpoint has a CloudWatch alarm configured as shown in the exhibit. The alarm fires when the p99 latency exceeds 500 ms for two consecutive minutes. Which action should the data scientist take to reduce latency?

A. Increase the number of instances behind the endpoint

B. Increase the batch size in the inference request

C. Use SageMaker asynchronous inference instead of real-time

D. Switch to GPU instances even if the model does not require GPU

Answer: A

More instances distribute load, reducing latency.

Why this answer

Option A is correct: increasing the instance count reduces load per instance, which reduces latency. Option D (GPU) may not help if the model is not compute-bound. Option B (batch size) depends on the model.

Option C (async inference) changes the architecture.
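The exhibit is not included here, so the following is only an assumed reconstruction of an alarm matching the description (p99 above 500 ms for two consecutive one-minute periods), expressed as the parameters of CloudWatch `PutMetricAlarm`. One detail worth noting is that SageMaker reports `ModelLatency` in microseconds, so the threshold has to be converted. Alarm, endpoint, and variant names are hypothetical.

```python
# Sketch only: an alarm shaped like the one the question describes.
threshold_ms = 500

latency_alarm = {
    "AlarmName": "demo-endpoint-p99-latency",
    "Namespace": "AWS/SageMaker",
    "MetricName": "ModelLatency",
    "Dimensions": [
        {"Name": "EndpointName", "Value": "demo-endpoint"},
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    "ExtendedStatistic": "p99",
    "Period": 60,            # one-minute periods
    "EvaluationPeriods": 2,  # two consecutive breaches
    "Threshold": threshold_ms * 1000,  # ModelLatency is in microseconds
    "ComparisonOperator": "GreaterThanThreshold",
}
# With credentials configured:
# boto3.client("cloudwatch").put_metric_alarm(**latency_alarm)
```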

327

MCQ · hard

A data scientist is using SageMaker Autopilot to automatically build a classification model. The dataset is highly imbalanced (1% positive class). Which configuration should the scientist set to handle the class imbalance?

A. Set the problem_type to 'BinaryClassification' and enable 'balance_class_weights'.

B. Use the 'AutoMLJobObjective' with 'F1' metric.

C. Set the 'sample_weight' attribute in the input data.

D. Manually downsample the majority class before training.

Answer: B

Optimizing for F1 helps address class imbalance by balancing precision and recall.

Why this answer

SageMaker Autopilot does not automatically correct class imbalance, so the choice of objective metric matters. Optimizing for a metric such as 'F1' or 'AUC' is a common technique for imbalanced data.

The correct answer is to use the 'AutoMLJobObjective' with the 'F1' metric, because Autopilot will then focus on optimizing F1, which is sensitive to imbalance.

328

MCQ · easy

A SageMaker endpoint configuration is shown in the exhibit. The company wants to deploy the model to a real-time endpoint. What is missing from this configuration to successfully create the endpoint?

A. The model name is missing

B. The endpoint name is not specified in the configuration

C. The initial instance count is missing

D. The accelerator type is missing

E. The data capture configuration is missing

Answer: B

Endpoint name is provided when creating the endpoint, not in the config.

Why this answer

Option B is correct because the endpoint configuration requires at least one production variant, which is present, but the endpoint name is specified when creating the endpoint, not in the config. Option A is wrong because the model name is specified. Option C is wrong because the instance count is set.

Option D is wrong because the accelerator type is optional. Option E is wrong because the data capture config is optional.

329

MCQhard

An engineer runs the AWS CLI command in the exhibit to create a SageMaker endpoint configuration. The endpoint is created successfully, but when invoked, the inference response is slow. The engineer wants to test with a different instance type. Which action should the engineer take?

A.Create a new endpoint configuration and use it to create a new endpoint

B.Modify the existing endpoint directly using the update-endpoint API with a new instance type parameter

C.Delete the endpoint and create a new one with the desired instance type

D.Update the endpoint configuration with the new instance type and then update the endpoint

AnswerD

You can update the endpoint configuration and then call update-endpoint to apply changes.

Why this answer

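Option D is correct because applying an endpoint configuration with the new instance type and then updating the endpoint changes the instance type of the existing endpoint. An endpoint configuration cannot be edited in place, so "updating" it means creating a new endpoint configuration and passing it to update-endpoint. Option A creates a separate new endpoint, which is less efficient. Option B is wrong because update-endpoint does not take an instance type parameter; it takes an endpoint configuration.

Option C is wrong because the endpoint can be updated and does not need to be deleted.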
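The update flow can be sketched with boto3-style calls. This is a minimal sketch, assuming `client` is a SageMaker client (for example one returned by `boto3.client("sagemaker")`); the function name, the variant name "AllTraffic", and the endpoint, configuration, and model names are hypothetical placeholders.

```python
def switch_instance_type(client, endpoint_name, new_config_name,
                         model_name, instance_type, instance_count=1):
    """Create a new endpoint configuration and apply it to an existing endpoint.

    `client` is assumed to expose the SageMaker API operations
    create_endpoint_config and update_endpoint, as a boto3 client does.
    """
    # Step 1: describe the desired hosting setup in a new configuration.
    client.create_endpoint_config(
        EndpointConfigName=new_config_name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",  # hypothetical variant name
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": instance_count,
        }],
    )
    # Step 2: point the existing endpoint at the new configuration.
    return client.update_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=new_config_name,
    )
```

The endpoint keeps its name, so callers that invoke it do not need to change.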

Practice this question →

330

MCQhard

A company is using Amazon SageMaker to train a large natural language processing model. The training job uses a GPU instance and is expected to take several hours. The data scientist wants to monitor GPU utilization in real-time. Which approach is MOST effective?

A.Use SageMaker Managed Spot Training to reduce cost and monitor utilization via spot instance status

B.Modify the training script to periodically log GPU utilization to a file in S3

C.Use SageMaker Debugger to capture GPU utilization tensors

D.Enable CloudWatch metrics for the training job and view GPU utilization in the CloudWatch console

AnswerD

SageMaker automatically publishes GPU metrics to CloudWatch.

Why this answer

Option D is correct because SageMaker publishes CloudWatch metrics for GPU utilization. Option B uses a custom solution when a built-in one exists. Option C is for debugging, not real-time monitoring.

Option A is for managed spot training, not monitoring.

Practice this question →

331

MCQmedium

A company is deploying a real-time inference endpoint using Amazon SageMaker. The model is a large deep learning model that requires low latency. The team is concerned about cost. Which SageMaker hosting option should the team use?

A.Use a SageMaker batch transform job.

B.Use a SageMaker Serverless Inference endpoint.

C.Use a single-instance endpoint with a large instance type.

D.Use a SageMaker multi-model endpoint.

AnswerD

Multi-model endpoints share resources and reduce cost per model.

Why this answer

Option D is correct because multi-model endpoints share resources and reduce cost. Option C is wrong because a single-instance endpoint is better suited to testing, and a large instance type does not address the cost concern. Option B is wrong because serverless endpoints can have cold starts.

Option A is wrong because batch transform is not real-time.

Practice this question →

332

MCQeasy

An ML team wants to perform batch inference on a large dataset stored in Amazon S3 using a pre-trained model. The team needs to process the data in parallel across multiple instances to reduce processing time. Which approach should they use?

A.Use SageMaker Processing to run a custom inference script.

B.Use SageMaker Batch Transform with multiple instances.

C.Use SageMaker Training to run inference as a training job.

D.Use SageMaker Ground Truth to process the data.

AnswerB

Batch Transform splits the input data and runs inference in parallel.

Why this answer

SageMaker Batch Transform (B) is designed for batch inference and automatically distributes the dataset across instances for parallel processing. SageMaker Processing (A) is for data preprocessing, not inference. SageMaker Training (C) is for training models.

SageMaker Ground Truth (D) is for labeling.
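As an illustration of running Batch Transform on several instances, the sketch below builds the request for the CreateTransformJob API. It is a minimal sketch under assumptions: the function name is hypothetical, the input is assumed to be line-delimited CSV, and every job name, model name, and S3 URI passed in is a placeholder.

```python
def build_transform_request(job_name, model_name, input_s3_uri,
                            output_s3_uri, instance_type, instance_count):
    """Build a CreateTransformJob request that uses multiple instances."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",  # process every object under the prefix
                    "S3Uri": input_s3_uri,
                }
            },
            "ContentType": "text/csv",  # assumption: CSV input
            "SplitType": "Line",        # one record per line
        },
        "TransformOutput": {"S3OutputPath": output_s3_uri},
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,  # more than 1 enables parallel instances
        },
    }
```

With a boto3 SageMaker client, the result could be submitted as `client.create_transform_job(**request)`.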

Practice this question →

333

MCQmedium

A team uses SageMaker to train a deep learning model. They notice the training job is using only a fraction of the GPU memory. Which configuration change would most improve GPU utilization?

A.Increase the batch size in the training script

B.Decrease the batch size to reduce memory fragmentation

C.Use a single GPU instead of multiple GPUs

D.Enable SageMaker Managed Spot Training

AnswerA

Larger batch sizes consume more GPU memory and improve utilization.

Why this answer

Option A is correct: increasing batch size uses more GPU memory and improves utilization. Option B (reducing batch size) would decrease utilization. Option C (using single GPU) would not help.

Option D (spot training) does not affect utilization.

Practice this question →

334

MCQeasy

A data scientist needs to run a one-time SQL query on a large dataset in S3 to create a training dataset. The query involves aggregations and joins. Which service is most suitable?

A.AWS Glue ETL job

B.Amazon Athena

C.Amazon EMR with Spark SQL

D.Amazon RDS with data loaded into it

AnswerB

Athena is serverless and allows ad-hoc SQL queries directly on S3 data.

Why this answer

Option B (Amazon Athena) is correct for serverless SQL queries on S3. Option C (Amazon EMR with Spark SQL) is overkill for a one-time query. Option A (AWS Glue ETL job) is for ETL jobs, not ad-hoc queries.

Option D (Amazon RDS) requires moving data.

Practice this question →

335

MCQmedium

A data scientist uses SageMaker to train a model. The training job takes 10 hours, but the team needs to reduce costs. Which approach is MOST cost-effective?

A.Enable Managed Spot Training

B.Use SageMaker Automatic Model Tuning

C.Use a larger instance type to finish faster

D.Use SageMaker Distributed Training with more instances

AnswerA

Spot instances offer significant discounts, reducing cost.

Why this answer

Option A is correct: Spot instances can reduce costs by up to 90%, so Managed Spot Training is the most cost-effective choice.

Option C increases the hourly cost. Option D may reduce training time but not necessarily cost. Option B is for hyperparameter tuning.
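The spot-related settings of a training job can be sketched as a request fragment, together with a simple cost estimate. This is a minimal sketch: the two function names are hypothetical, the hourly rate and discount are made-up inputs (not published prices), and the fragment would still need to be merged into a full CreateTrainingJob request.

```python
def spot_training_settings(max_runtime_s, max_wait_s, checkpoint_s3_uri):
    """Return the CreateTrainingJob fields that enable Managed Spot Training."""
    # The wait time covers both waiting for capacity and running the job,
    # so it cannot be shorter than the runtime limit.
    if max_wait_s < max_runtime_s:
        raise ValueError("max_wait_s must be >= max_runtime_s")
    return {
        "EnableManagedSpotTraining": True,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_runtime_s,
            "MaxWaitTimeInSeconds": max_wait_s,
        },
        # Checkpoints let an interrupted job resume instead of restarting.
        "CheckpointConfig": {"S3Uri": checkpoint_s3_uri},
    }


def estimated_cost(hours, hourly_rate, spot_discount=0.0):
    """Estimate training cost; spot_discount is a fraction between 0 and 1."""
    return hours * hourly_rate * (1.0 - spot_discount)
```

For the 10-hour job in the question, `estimated_cost(10, rate)` and `estimated_cost(10, rate, discount)` give the on-demand and spot estimates for whatever rate and discount actually apply.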

Practice this question →

336

MCQeasy

A team is using Amazon SageMaker to train a linear regression model on a dataset with 10 features. After training, they notice the model has high bias. Which action is MOST likely to reduce bias?

A.Increase the regularization parameter lambda

B.Add L2 regularization

C.Use a smaller training dataset

D.Add polynomial features to capture non-linear relationships

AnswerD

Adding features increases model complexity, reducing bias.

Why this answer

Option D is correct because high bias indicates underfitting, which can be reduced by adding more features or increasing model complexity. Option A increases regularization, which increases bias. Option B reduces the risk of overfitting, not bias.

Option C reduces data, potentially increasing bias.
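The effect of a polynomial feature on bias can be shown with a small, dependency-free example. This is a sketch on synthetic data, not the SageMaker Linear Learner: it fits single-feature least squares by hand, once on the raw feature x and once on the derived feature x², against a target that is exactly quadratic.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(xs, ys, slope, intercept):
    """Mean squared error of the fitted line on the given data."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]  # synthetic non-linear target

# Raw feature: a straight line cannot follow the curve (underfitting).
slope_raw, intercept_raw = fit_line(xs, ys)
linear_error = mse(xs, ys, slope_raw, intercept_raw)

# Polynomial feature: the same linear method, applied to x squared.
squared = [x * x for x in xs]
slope_sq, intercept_sq = fit_line(squared, ys)
poly_error = mse(squared, ys, slope_sq, intercept_sq)

print(linear_error, poly_error)
```

The learning method is unchanged between the two fits; only the feature representation differs, which is what removes the systematic error.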

Practice this question →

337

MCQhard

A company is using SageMaker Ground Truth to label images for a computer vision model. After launching the labeling job, they notice that the labeling throughput is lower than expected. What should they do to increase throughput?

A.Use a private workforce with more workers.

B.Change the labeling task to use a single annotator per image.

C.Reduce the number of workers assigned to each task.

D.Increase the time allowed for each labeling task.

AnswerA

More workers increase labeling parallelism and throughput.

Why this answer

Option A is correct because using a private workforce with more workers increases parallelism. Option C is incorrect because reducing the number of workers decreases throughput. Option B is incorrect because using a single annotator per task may not speed up labeling.

Option D is incorrect because increasing task time per image slows down throughput.

Practice this question →

338

MCQmedium

A team is training a large language model using PyTorch on multiple GPUs. The training is taking too long due to inefficient data loading. Which AWS service can help accelerate data loading by caching data close to the GPU instances?

A.Amazon FSx for Lustre

B.Amazon EBS Snapshots for fast restore

C.Amazon S3 Transfer Acceleration

D.Amazon CloudFront

AnswerA

High-performance file system with sub-millisecond latency.

Why this answer

Option A is correct because FSx for Lustre provides a high-performance file system optimized for ML workloads. Option B is wrong because Amazon EBS snapshots are not for caching. Option C is wrong because S3 Transfer Acceleration speeds up uploads, not data loading during training.

Option D is wrong because CloudFront is a CDN for web content.

Practice this question →

339

MCQeasy

A company wants to use Amazon Rekognition to detect objects in images stored in an S3 bucket. The images are uploaded by users. Which IAM policy statement is necessary to allow Rekognition to read from the bucket?

A.s3:PutObject

B.s3:DeleteObject

C.s3:GetObject

D.s3:ListBucket

AnswerC

GetObject allows Rekognition to read images.

Why this answer

Rekognition needs s3:GetObject permission (Option C) to read images. Option A (s3:PutObject) is for writing. Option D (s3:ListBucket) is for listing.

Option B (s3:DeleteObject) is for deletion.

Practice this question →

340

MCQhard

A company uses Amazon SageMaker to train machine learning models. The data science team has developed a training script that uses TensorFlow. They want to run the training job on a GPU instance (ml.p3.2xlarge) and store the model artifact in Amazon S3. The training job completes successfully, but the model artifact is not saved to S3. The team has confirmed that the S3 bucket policy allows write access from the SageMaker execution role. The training script uses the TensorFlow estimator with the following configuration:

```
tensorflow_estimator = TensorFlow(
    entry_point='train.py',
    role='arn:aws:iam::123456789012:role/SageMakerExecutionRole',
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    output_path='s3://my-bucket/output',
    framework_version='2.3',
    py_version='py37',
)
```

The train.py script saves the model using `model.save('/opt/ml/model')`. What is the MOST likely reason the model artifact is not being saved to S3?

A.The training script must save the model to /opt/ml/model/saved_model instead of /opt/ml/model.

B.The SageMaker execution role does not have the s3:PutObject permission for the S3 bucket.

C.The output_path parameter is incorrectly formatted; it should include a trailing slash.

D.The TensorFlow estimator requires the model_dir parameter to be set to the S3 output path.

AnswerB

Correct: The role needs s3:PutObject to write to S3.

Why this answer

The TensorFlow estimator's output_path specifies where the model artifact should be uploaded after training. However, SageMaker automatically uploads the contents of /opt/ml/model to S3 at the end of training. The script is saving to the correct directory.

The issue is likely that the training script is not saving the model correctly or the training fails before saving. But given the job completes successfully, the most common cause is that the SageMaker execution role does not have permission to write to the S3 bucket. The bucket policy allows write access, but the IAM role may lack the necessary S3 permissions.

Option B is correct because the role needs s3:PutObject permission on the bucket. Option C is incorrect because the output_path is correctly specified. Option A is incorrect because the script saves to the right directory.

Option D is incorrect because the estimator does not have a 'model_dir' parameter that overrides the default; the default is /opt/ml/model.

Practice this question →

341

MCQhard

A team wants to automate the retraining of a model weekly using new data that arrives in S3. Which combination of services should they use?

A.AWS Lambda and S3 events

B.Amazon SageMaker Processing jobs

C.AWS Step Functions and AWS Glue

D.Amazon SageMaker Pipelines and S3 events

AnswerD

SageMaker Pipelines is designed for ML workflows and can be triggered by S3 events.

Why this answer

Option D is correct because SageMaker Pipelines provides a managed workflow for training and retraining. Option C is wrong because Step Functions alone is not ML-specific. Option A is wrong because Lambda is not designed for long-running training.

Option B is wrong because SageMaker Processing is for data processing, not full pipeline automation.

Practice this question →

342

Multi-Selectmedium

A data scientist is training a model using Amazon SageMaker and wants to reduce the training time. The training job uses a single GPU instance. Which THREE actions can reduce training time?

Select 3 answers

A.Use distributed training across multiple GPU instances.

B.Use Pipe input mode instead of File input mode.

C.Use a larger instance type with more GPU memory and compute.

D.Increase the amount of training data.

E.Reduce the batch size.

AnswersA, B, C

Distributed training parallelizes the workload.

Why this answer

Options A, B, and C are correct. Using multiple GPUs (distributed training), using Pipe input mode, and using a larger instance type with more compute power can reduce training time. Option D is wrong because using more training data increases training time.

Option E is wrong because reducing batch size can actually increase training time due to more iterations.

Practice this question →

343

MCQmedium

Refer to the exhibit. A company is using an IAM role with the attached policy to deploy a SageMaker model. The data scientist can create training jobs and models, but when trying to create an endpoint, they receive an access denied error. What is the missing permission?

A.cloudwatch:PutMetricData

B.iam:PassRole

C.ec2:CreateNetworkInterface

D.sagemaker:InvokeEndpoint

E.kms:Decrypt

AnswerC

SageMaker creates an ENI in the VPC for the endpoint.

Why this answer

Option C is correct because to create an endpoint, SageMaker needs permission to call ec2:CreateNetworkInterface to set up the elastic network interface in the customer's VPC. Option D (sagemaker:InvokeEndpoint) is for invoking, not creating. Option B (iam:PassRole) is needed to pass the execution role to SageMaker, but the error is about creating the endpoint, not passing the role.

Option E (kms:Decrypt) is for encrypted data. Option A (cloudwatch:PutMetricData) is for publishing metrics.

Practice this question →

344

Multi-Selectmedium

A data science team is deploying a machine learning model using Amazon SageMaker. The model requires GPU inference and must handle variable traffic with low latency. Which TWO options should the team implement to meet these requirements? (Choose TWO.)

Select 2 answers

A.Use a SageMaker multi-model endpoint with a GPU instance to serve multiple models.

B.Deploy to a SageMaker real-time endpoint using a CPU instance and attach an Elastic Inference accelerator.

C.Use AWS Lambda with an attached GPU function for inference.

D.Host the model on a SageMaker batch transform job with GPU instances.

E.Deploy the model to a SageMaker real-time endpoint using a GPU instance type.

AnswersA, E

Correct: Multi-model endpoint on GPU provides GPU inference and efficient resource utilization for variable traffic.

Why this answer

Option E (SageMaker real-time endpoint with a GPU instance) ensures GPU inference and low latency for real-time predictions. Option A (SageMaker multi-model endpoint) reduces cost by sharing a GPU instance across multiple models, which is efficient for variable traffic. The other options either do not run on a GPU instance (B, C) or are not suitable for real-time low-latency inference (D).

Practice this question →

345

MCQmedium

A data scientist is using Amazon SageMaker to train a model on a large dataset (10 TB) stored in S3 in Parquet format. The training job uses an ml.p3.16xlarge instance with multiple GPUs. The data scientist notices that the GPU utilization is low (around 30%) and the training is slow. The dataset consists of hundreds of thousands of small Parquet files. The data scientist suspects that the I/O is bottlenecked. What should the data scientist do to improve GPU utilization and training speed?

A.Increase the batch size

B.Consolidate the small Parquet files into larger files (e.g., 1 GB each)

C.Use a smaller instance type to reduce cost

D.Use Pipe input mode to stream data directly

AnswerB

Larger files reduce I/O overhead.

Why this answer

Option B is correct because consolidating small files into larger files reduces the overhead of reading many files from S3, improving I/O throughput and keeping GPUs busy. Option D (use Pipe mode) may help but does not address the file size issue. Option A (increase batch size) may improve utilization but the I/O bottleneck remains.

Option C (use a smaller instance) would not improve speed.
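The consolidation step can be planned before any data is rewritten. The sketch below only groups existing objects into batches of roughly a target size; it does not read or write Parquet, and the actual merge would be done by a separate job. The function name and the `(key, size)` input format are assumptions made for illustration.

```python
def plan_consolidation(files, target_bytes):
    """Group (key, size_in_bytes) pairs into batches of about target_bytes.

    Files are taken in the order given; a batch is closed as soon as adding
    the next file would push it past the target size.
    """
    batches = []
    current, current_size = [], 0
    for key, size in files:
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(key)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Each returned batch lists the small files that would be merged into one larger output file, for example with a target of about 1 GB as suggested in the question.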

Practice this question →

346

MCQeasy

A data scientist is using AWS Glue to prepare training data. The job reads from an S3 bucket, performs transformations, and writes to another S3 bucket. The job is failing due to insufficient memory. Which solution should the data scientist use to fix this?

A.Use AWS Glue's job bookmark feature.

B.Increase the number of DPU (Data Processing Units) for the job.

C.Use Amazon Athena instead of AWS Glue.

D.Use a columnar file format like Parquet.

AnswerB

More DPUs mean more workers, which provide more memory.

Why this answer

Option B is correct because increasing the number of DPUs adds more workers and therefore more memory. Option D is wrong because changing the file format does not increase memory. Option A is wrong because job bookmarks track already-processed data and do not add memory.

Option C is wrong because Athena is a query service and does not fix the failing Glue job.

Practice this question →

347

MCQmedium

A data scientist has this IAM policy attached to their IAM role. They are trying to run a SageMaker training job that reads data from 'my-bucket' and writes output to 'my-bucket'. The job fails. What is the most likely reason?

A.The sagemaker:CreateTrainingJob action is not allowed on specific resources

B.Missing s3:ListBucket permission on the bucket

C.Missing iam:PassRole permission

D.The training job requires permissions to write to CloudWatch Logs

AnswerC

The caller needs iam:PassRole permission to pass the execution role to SageMaker for the training job.

Why this answer

Option C is correct: the policy does not grant permission to pass the execution role to SageMaker (iam:PassRole). Option B is incorrect because s3:GetObject and s3:PutObject are present. Option A is incorrect because the actions are allowed.

Option D is irrelevant.
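A policy that grants the missing permission could look like the sketch below. It is a minimal sketch, not the policy from the exhibit: the function name is hypothetical, the role ARN is a placeholder, and restricting the statement with the `iam:PassedToService` condition key is one common way to limit the role to SageMaker.

```python
import json


def pass_role_policy(execution_role_arn):
    """Build an identity policy that lets the caller pass one role to SageMaker."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": execution_role_arn,
            # Only allow the role to be handed to the SageMaker service.
            "Condition": {
                "StringEquals": {"iam:PassedToService": "sagemaker.amazonaws.com"}
            },
        }],
    }


# Placeholder ARN for illustration only.
policy = pass_role_policy("arn:aws:iam::123456789012:role/SageMakerExecutionRole")
print(json.dumps(policy, indent=2))
```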

Practice this question →

348

MCQmedium

A machine learning engineer is building a pipeline using Amazon SageMaker Pipelines. The pipeline has multiple steps including data preprocessing, training, and evaluation. Which statement about SageMaker Pipelines is correct?

A.Steps in a pipeline must run sequentially.

B.Pipelines support caching of step outputs.

C.Pipelines can only use built-in algorithms.

D.Pipelines cannot have conditional branches.

AnswerB

Caching speeds up re-runs.

Why this answer

Option B is correct because SageMaker Pipelines supports caching of step outputs to avoid re-execution. Option D is wrong because steps can be conditional. Option C is wrong because pipelines can include custom scripts.

Option A is wrong because pipelines support parallel execution.

Practice this question →

349

MCQeasy

A machine learning engineer is building a pipeline to preprocess data and train a model using Amazon SageMaker. The data is stored in Amazon S3 and the preprocessing step is computationally intensive. The engineer wants to minimize costs while ensuring that the preprocessing step does not fail due to instance termination. Which instance type should be used for the preprocessing step?

A.Reserved instances

B.On-demand instances

C.A larger instance type to speed up processing

D.Spot instances

AnswerB

On-demand instances are reliable and not terminated, ensuring the step completes.

Why this answer

Option B is correct because using on-demand instances guarantees that the instance will not be terminated during the preprocessing step. Option D is wrong because spot instances can be terminated, causing failures. Option A is wrong because reserved instances require a long-term commitment.

Option C is wrong because a larger instance type increases costs unnecessarily.

Practice this question →

350

Drag & Dropmedium

Drag and drop the steps to train a model using Amazon SageMaker built-in algorithm in the correct order.

Why this order

Training involves data preparation, job creation, algorithm selection, input/output paths, and execution.

Practice this question →

351

MCQhard

A data scientist is using Amazon SageMaker to train a TensorFlow model on a dataset that includes sensitive personal information (PII). The data is stored in Amazon S3 with server-side encryption using AWS KMS (SSE-KMS). The training job fails with an Access Denied error when trying to read from S3. The data scientist has already verified that the SageMaker execution role has s3:GetObject permissions on the S3 bucket. What additional configuration is needed?

A.Add kms:Decrypt permission to the SageMaker execution role.

B.Add kms:Encrypt permission to the SageMaker execution role.

C.Add a bucket policy that grants s3:GetObject to the SageMaker role.

D.Configure a VPC endpoint for S3 and attach a policy.

AnswerA

SSE-KMS requires decrypt permission to read objects.

Why this answer

Option A is correct because SageMaker needs kms:Decrypt permission to read SSE-KMS encrypted objects. Option B is wrong because SageMaker does not need kms:Encrypt for reading. Option C is wrong because S3 bucket policy is not needed if role has permissions.

Option D is wrong because VPC endpoint policy is not the issue.
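The permissions involved can be sketched as an IAM policy document for the execution role. This is a minimal sketch under assumptions: the function name is hypothetical, the bucket name and KMS key ARN are placeholders supplied by the caller, and access to the key also depends on the key policy, which the sketch does not cover.

```python
def sse_kms_read_policy(bucket_name, kms_key_arn):
    """Build a policy allowing reads of SSE-KMS encrypted objects in one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read the objects themselves.
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                # Decrypt them with the key used for server-side encryption.
                "Effect": "Allow",
                "Action": "kms:Decrypt",
                "Resource": kms_key_arn,
            },
        ],
    }
```

Both statements are needed: s3:GetObject alone still produces Access Denied when the object is encrypted with a KMS key the role cannot use.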

Practice this question →
