Knowledge + Practice

CCNA Deployment and Orchestration of ML Workflows Questions

75 of 124 questions · Page 1/2 · Deployment and Orchestration of ML Workflows · Answers revealed

1

Multi-Select · hard

A company is building a CI/CD pipeline for ML models using AWS CodePipeline and SageMaker. The pipeline should include steps to automatically retrain, evaluate, and deploy models. Which THREE components are essential for this pipeline? (Choose three.)

Select 3 answers

A. SageMaker Pipelines to orchestrate training and evaluation steps.

B. Amazon S3 bucket to store training data and model artifacts.

C. Amazon CloudWatch to log API calls.

D. SageMaker Model Registry to store and version models.

E. AWS Lambda function to trigger evaluation.

Answers: A, B, D

Pipelines define the sequence of steps and conditional logic for retraining and evaluation.

Why this answer

SageMaker Pipelines is essential because it provides a native orchestration service to define, automate, and manage the end-to-end ML workflow, including training, evaluation, and conditional deployment steps. It integrates directly with other SageMaker components and CodePipeline, enabling a seamless CI/CD pipeline without requiring custom orchestration logic.

Exam trap

The trap here is that candidates often confuse monitoring services like CloudWatch with essential pipeline components, or assume that a serverless function like Lambda is required for evaluation when SageMaker Pipelines already provides native evaluation capabilities.

2

MCQ · medium

A team used the above config to create an endpoint. However, the endpoint fails to invoke because of a "ModelError". What is the most likely cause?

A. The instance type is not available in the region.

B. The IAM role does not have permission to access the S3 bucket.

C. The model data URL points to a non-existent file.

D. The ECR image URI is incorrect for the region.

Answer: B

Without s3:GetObject, the endpoint cannot load the model artifact.

Why this answer

The most likely cause of a ModelError when invoking a SageMaker endpoint is that the IAM role associated with the endpoint does not have the necessary permissions to access the S3 bucket containing the model artifacts. SageMaker downloads the model data from S3 during endpoint creation, and if the role lacks s3:GetObject permission on the bucket, the model fails to load, resulting in a ModelError.
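
As a rough illustration of the permission discussed above, the following Python sketch builds an IAM policy document that grants `s3:GetObject` on the model artifacts. The bucket name `example-model-bucket` and the `models/` prefix are hypothetical placeholders, and a real execution role would usually need further permissions that are not shown here.

```python
import json


def model_artifact_read_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy document that allows reading model artifacts from S3."""
    # Object-level ARN: matches every key under the (optional) prefix.
    object_arn = f"arn:aws:s3:::{bucket}/{prefix}*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [object_arn],
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and prefix, for illustration only.
    policy = model_artifact_read_policy("example-model-bucket", "models/")
    print(json.dumps(policy, indent=2))
```

The resulting document could be attached to the execution role as an inline or managed policy; scoping the resource to a prefix rather than the whole bucket keeps the grant narrow.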

Exam trap

AWS often tests the distinction between errors that occur during model creation (e.g., invalid S3 URI, missing file) versus errors that occur at invocation time (ModelError), leading candidates to incorrectly choose Option C when the actual cause is a permissions issue that prevents the model from being loaded.

How to eliminate wrong answers

Option A is wrong because an unavailable instance type would cause an 'InsufficientInstanceCapacity' or 'ResourceLimitExceeded' error, not a ModelError. Option C is wrong because a missing model data file typically surfaces as a validation failure when the model or endpoint is created, not as a runtime ModelError at invocation. Option D is wrong because an incorrect ECR image URI would cause the failure during model or endpoint creation (for example, an image-not-found or access-denied error), not a ModelError at invocation time.

3

MCQ · hard

A company uses SageMaker endpoint with production variants for canary deployments. The team wants to gradually shift traffic from the old model variant (variant A) to the new model variant (variant B) over a period of 10 minutes. After the shift, if the new variant's error rate increases by more than 5%, they want to roll back automatically. Which solution meets these requirements with minimal manual intervention?

A. Use AWS Cloud Map to register the new variant and perform a slow rollout.

B. Deploy variant B as a separate endpoint and use Route 53 weighted routing to shift traffic.

C. Use the SageMaker UpdateEndpoint API with a linear traffic shift from variant A to variant B over 10 minutes, and configure a CloudWatch alarm on the new variant's error rate that triggers a Lambda function to revert the traffic weights.

D. Use AWS CodeDeploy with a deployment group to shift traffic and automatically roll back if CloudWatch alarms trigger.

Answer: C

This approach automates both the gradual shift and the rollback based on error rates.

Why this answer

Option C is correct because the SageMaker UpdateEndpoint API supports a linear traffic shift between production variants, allowing you to gradually route traffic from variant A to variant B over a specified time period (here, 10 minutes). By attaching a CloudWatch alarm on the new variant's error rate that triggers a Lambda function to revert the traffic weights, you achieve automatic rollback with minimal manual intervention when the error rate exceeds the 5% threshold.
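
The gradual shift and the rollback described above can be sketched in Python. Note that this sketch swaps in the UpdateEndpointWeightsAndCapacities operation (boto3 `update_endpoint_weights_and_capacities`) to adjust variant weights step by step, rather than the UpdateEndpoint call named in the answer. The variant names, the endpoint name and the five-step schedule are assumptions made for illustration.

```python
from __future__ import annotations


def linear_shift_schedule(total_seconds: int, steps: int) -> list[tuple[int, float, float]]:
    """Return (offset_seconds, weight_a, weight_b) for each step of a linear shift."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    schedule = []
    for i in range(1, steps + 1):
        weight_b = i / steps                    # share moved to the new variant
        offset = total_seconds * i // steps     # seconds after the shift starts
        schedule.append((offset, round(1.0 - weight_b, 4), round(weight_b, 4)))
    return schedule


def weights_request(endpoint_name: str, weight_a: float, weight_b: float) -> dict:
    """Keyword arguments for update_endpoint_weights_and_capacities."""
    return {
        "EndpointName": endpoint_name,
        "DesiredWeightsAndCapacities": [
            # "variant-a" / "variant-b" are hypothetical variant names.
            {"VariantName": "variant-a", "DesiredWeight": weight_a},
            {"VariantName": "variant-b", "DesiredWeight": weight_b},
        ],
    }


def rollback_request(endpoint_name: str) -> dict:
    """Request a Lambda function could send when the error-rate alarm fires."""
    return weights_request(endpoint_name, 1.0, 0.0)


if __name__ == "__main__":
    # Ten minutes split into five equal steps (an assumed granularity).
    for offset, w_a, w_b in linear_shift_schedule(600, 5):
        print(offset, weights_request("demo-endpoint", w_a, w_b))
    # With boto3, each dict would be passed as
    # boto3.client("sagemaker").update_endpoint_weights_and_capacities(**request)
```

Keeping the request builders free of network calls makes the schedule easy to check before it is wired to a scheduler and to the alarm-triggered Lambda function.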

Exam trap

The trap here is that candidates often assume AWS CodeDeploy (Option D) can manage SageMaker endpoints because it supports canary deployments for other services, but SageMaker has its own native traffic shifting and rollback mechanisms that are not integrated with CodeDeploy.

How to eliminate wrong answers

Option A is wrong because AWS Cloud Map is a service for service discovery and does not provide traffic shifting or canary deployment capabilities for SageMaker endpoints. Option B is wrong because deploying variant B as a separate endpoint and using Route 53 weighted routing would shift traffic at the DNS level, which introduces latency due to DNS caching and does not integrate with SageMaker's native variant monitoring or automatic rollback mechanisms. Option D is wrong because AWS CodeDeploy does not natively support SageMaker endpoints; it is designed for EC2, Lambda, and ECS deployments, and cannot directly manage traffic shifting or rollback for SageMaker production variants.

4

MCQ · hard

During a blue/green deployment of a SageMaker endpoint, the team notices that traffic is not being fully shifted to the new variant after the update. The endpoint has two variants with equal initial weights (50% each). The team wants to shift 100% traffic to the new variant. What is the most likely cause?

A. The new variant is using a different instance type that is not supported in the same endpoint

B. The new variant's model container is failing health checks, so traffic is not routed to it

C. The new variant's weight was set to 100 but the maximum weight per variant is 50

D. The endpoint's load balancer is misconfigured and not forwarding traffic to the new variant

Answer: B

SageMaker performs health checks; if the new variant fails, it stays in 'Creating' state and no traffic is routed.

Why this answer

Option B is correct because SageMaker endpoints route traffic only to variants that pass health checks. If the new variant's model container fails health checks (e.g., due to a misconfigured inference script or incompatible dependencies), SageMaker will not send any traffic to it, regardless of the weight setting. This explains why traffic is not fully shifted to the new variant despite the intended move to 100%.

Exam trap

The trap here is that candidates assume weight settings alone control traffic distribution, overlooking that SageMaker enforces health checks as a prerequisite for routing traffic to any variant.

How to eliminate wrong answers

Option A is wrong because SageMaker endpoints support different instance types across variants; a different instance type does not prevent traffic routing. Option C is wrong because variant weights are relative values: each variant's share of traffic is its weight divided by the sum of all variant weights, so there is no per-variant cap of 50 and a single variant can receive 100% of the traffic. Option D is wrong because SageMaker manages request routing for the endpoint internally; there is no customer-accessible load balancer to misconfigure.

5

MCQ · easy

A company has a model that receives low traffic but needs to handle sudden spikes. Which deployment option is most cost-effective?

A. SageMaker Serverless Inference

B. SageMaker Real-Time Endpoint with Auto Scaling

C. SageMaker Multi-Model Endpoint

D. SageMaker Batch Transform

Answer: A

Serverless scales to zero during idle periods and handles spikes, minimizing cost.

Why this answer

SageMaker Serverless Inference is the most cost-effective option for low-traffic models with sudden spikes because it automatically scales to zero when not in use and scales out automatically to handle bursts (the first requests after an idle period may see cold-start latency), charging only for the compute time consumed per inference request. This eliminates the cost of idle provisioned infrastructure, making it ideal for unpredictable or intermittent traffic patterns.
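
A minimal Python sketch of what a serverless endpoint configuration might look like is shown here. It builds the keyword arguments for the SageMaker `create_endpoint_config` call with a `ServerlessConfig` block in place of an instance type and count. The configuration name, model name, memory size and concurrency limit are assumed values, not recommendations.

```python
# Memory sizes accepted by ServerlessConfig, in MB.
ALLOWED_MEMORY_MB = {1024, 2048, 3072, 4096, 5120, 6144}


def serverless_endpoint_config(
    config_name: str,
    model_name: str,
    memory_mb: int = 2048,
    max_concurrency: int = 5,
) -> dict:
    """Keyword arguments for create_endpoint_config using Serverless Inference."""
    if memory_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"memory_mb must be one of {sorted(ALLOWED_MEMORY_MB)}")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                # No InstanceType or InitialInstanceCount: capacity is
                # allocated per request instead of being provisioned.
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(serverless_endpoint_config("demo-serverless-config", "demo-model"))
    # With boto3: boto3.client("sagemaker").create_endpoint_config(**request)
```

`MaxConcurrency` caps how many requests the endpoint handles at once, so it is the main knob for bounding both spike handling and cost.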

Exam trap

AWS often tests the misconception that auto-scaling (Option B) is the most cost-effective for spikes, but the trap is that auto-scaling still requires a baseline of provisioned instances that incur cost even when idle, whereas serverless inference scales to zero and charges only for active compute time.

How to eliminate wrong answers

Option B (SageMaker Real-Time Endpoint with Auto Scaling) is wrong because it requires always-on provisioned instances, incurring costs even during idle periods, and auto-scaling has a lag that may not handle sudden spikes as quickly as serverless. Option C (SageMaker Multi-Model Endpoint) is wrong because it still uses provisioned instances that run continuously, and while it shares resources across models, it does not scale to zero or handle sudden spikes without pre-provisioned capacity. Option D (SageMaker Batch Transform) is wrong because it is designed for offline, asynchronous batch processing on a complete dataset, not for real-time inference with low latency or handling live traffic spikes.

6

MCQ · hard

Refer to the exhibit. A SageMaker Pipeline fails with 'Invalid output reference' at the TrainingStep. What is the most likely cause?

A. The TuningStep output name is misspelled

B. The pipeline role lacks permissions

C. The TrainingStep expects a single artifact but TuningStep produces multiple

D. The instance type is incompatible

Answer: C

Tuning step outputs multiple models; directly passing to training step causes ambiguity.

Why this answer

Option C is correct because in SageMaker Pipelines, a `TrainingStep` that expects a single artifact as input will fail with 'Invalid output reference' if the preceding `TuningStep` produces multiple artifacts (e.g., from multiple training jobs). The pipeline cannot resolve which specific artifact to pass, causing the error.

Exam trap

AWS often tests the subtle distinction between output reference errors caused by naming mismatches versus those caused by cardinality mismatches, where candidates mistakenly focus on permissions or spelling instead of the pipeline's inability to handle multiple artifacts.

How to eliminate wrong answers

Option A is wrong because a misspelled output name would cause a different error (e.g., 'Property not found'), not 'Invalid output reference', and the pipeline would fail at the step referencing the name, not at the TrainingStep. Option B is wrong because insufficient pipeline role permissions typically result in an 'AccessDenied' or 'UnauthorizedOperation' error, not 'Invalid output reference'. Option D is wrong because an incompatible instance type causes an 'InsufficientInstanceCapacity' or 'ResourceLimitExceeded' error during step execution, not an output reference validation error.

7

Multi-Select · easy

A company is adopting Amazon SageMaker Pipelines to automate their ML workflow. They want to choose three key benefits that SageMaker Pipelines provides over traditional manual scripts and ad-hoc steps. Which THREE benefits are correct?

Select 3 answers

A. Model lineage tracking from raw data to trained model artifacts.

B. Automated deployment of models to endpoints upon pipeline completion.

C. Event-driven execution when new data arrives in S3.

D. Automatic scaling of compute resources based on data volume.

E. Reproducible execution through a directed acyclic graph (DAG) of steps with re-run capabilities.

Answers: A, C, E

Pipelines automatically capture lineage metadata.

Why this answer

Option A is correct because SageMaker Pipelines automatically captures and tracks the lineage of every artifact, including datasets, processing jobs, training jobs, and model versions. This lineage is stored in SageMaker's metadata store, enabling full traceability from raw data to the final model artifact, which is critical for auditability and compliance in ML workflows.

Exam trap

AWS often tests the distinction between orchestration features (like SageMaker Pipelines) and infrastructure management features (like auto-scaling), leading candidates to confuse pipeline benefits with SageMaker's broader managed service capabilities.

8

MCQ · medium

A company is using SageMaker Model Registry to manage model versions. They want to automatically deploy the latest approved model to production after retraining. Which approach is best?

A. Manually deploy the approved model using the SageMaker console

B. Use AWS Lambda to update the endpoint whenever a new model version is created

C. Create a SageMaker Pipeline that includes a model approval step and deployment step

D. Schedule a CloudWatch Event to invoke a SageMaker update endpoint API daily

Answer: C

SageMaker Pipelines can model the entire workflow including conditional deployment based on approval.

Why this answer

Option C is correct because a SageMaker Pipeline can orchestrate the entire workflow from retraining to deployment, including a model approval step that gates deployment to production only when the model is approved. This automates the process end-to-end, ensuring that only approved models are deployed, which aligns with the requirement to automatically deploy the latest approved model after retraining.

Exam trap

The trap here is that candidates may choose Option B because it sounds automated, but they overlook the critical requirement for model approval before deployment, which Lambda alone cannot enforce without additional logic.

How to eliminate wrong answers

Option A is wrong because manual deployment via the SageMaker console does not automate the process and violates the requirement for automatic deployment after retraining. Option B is wrong because using AWS Lambda to update the endpoint whenever a new model version is created would deploy models without waiting for approval, bypassing the model approval step and potentially deploying unapproved models. Option D is wrong because scheduling a CloudWatch Event to invoke a SageMaker update endpoint API daily does not tie deployment to model approval or retraining events; it deploys on a fixed schedule regardless of model status.

9

Multi-Select · hard

A company is using an AWS Step Functions state machine to orchestrate a multi-step ML deployment. The workflow includes: training a model, evaluating it, registering the model, and deploying to a staging endpoint. They need to implement an approval gate before deploying to production. Which THREE components are necessary to achieve this? (Choose three.)

Select 3 answers

A. An AWS CodePipeline pipeline with approval stage

B. A task in the state machine that pauses and waits for manual approval via SNS or Lambda

C. Model Registry to store the approved model version after evaluation

D. An Amazon SNS topic for notification of approval status

E. An API call to SageMaker to create or update the production endpoint

Answers: B, C, E

Step Functions can use 'Wait for Task Token' to implement human approval.

Why this answer

Option B is correct because Step Functions can use a task with a callback pattern (`.waitForTaskToken`) to pause the workflow and wait for external manual approval. When combined with an SNS topic or Lambda function that sends a task success or failure signal back to Step Functions, this creates a reliable approval gate. This pattern allows the state machine to halt execution until a human approves or rejects the deployment, which is essential for production deployment control.
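
The callback pattern described above might look roughly like the following Python sketch, which builds an Amazon States Language task state that pauses on `.waitForTaskToken` and the arguments an approver would later send back. The Lambda function name `notify-approver`, the state name `DeployToProduction`, the `$.ModelPackageArn` input field and the one-day timeout are all hypothetical.

```python
import json


def approval_gate_state(notifier_function_name: str, next_state: str) -> dict:
    """Task state that pauses the workflow until a task token is returned."""
    return {
        "Type": "Task",
        # The .waitForTaskToken suffix makes Step Functions wait for a callback.
        "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
        "Parameters": {
            "FunctionName": notifier_function_name,
            "Payload": {
                # $$.Task.Token is the context-object path to the token.
                "taskToken.$": "$$.Task.Token",
                # Assumed input field carrying the registered model version.
                "modelPackageArn.$": "$.ModelPackageArn",
            },
        },
        "TimeoutSeconds": 86400,  # assumed: give reviewers one day
        "Next": next_state,
    }


def approval_decision(task_token: str, approved: bool) -> tuple[str, dict]:
    """Return the Step Functions client method name and its keyword arguments."""
    if approved:
        return "send_task_success", {
            "taskToken": task_token,
            "output": json.dumps({"approved": True}),
        }
    return "send_task_failure", {
        "taskToken": task_token,
        "error": "Rejected",
        "cause": "Reviewer rejected the deployment",
    }


if __name__ == "__main__":
    state = approval_gate_state("notify-approver", "DeployToProduction")
    print(json.dumps(state, indent=2))
    # With boto3: getattr(boto3.client("stepfunctions"), method)(**kwargs)
```

The notifier only delivers the token to a reviewer; the workflow resumes when something calls SendTaskSuccess or SendTaskFailure with that token.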

Exam trap

AWS often tests the distinction between a notification-only service (like SNS) and a service that can actively pause and resume a workflow (like Step Functions with task tokens), leading candidates to mistakenly select SNS as a sufficient approval gate component.

10

MCQ · hard

A company is running multiple SageMaker endpoints for different models, each serving a separate business unit. The total cost is growing rapidly. The ML engineering team wants to reduce costs without sacrificing performance or isolation. They are considering either consolidating models into a Multi-Model Endpoint (MME) or onto a Multi-Container Endpoint (MCE). The models vary in size from 100 MB to 5 GB, and traffic patterns are unpredictable. Which recommendation is MOST appropriate?

A. Use a Multi-Model Endpoint with a single large instance type to host all models, and enable SageMaker inference pipelines if pre-processing is needed.

B. Use Multi-Container Endpoints to deploy multiple models on a single instance.

C. Migrate all models to AWS Lambda functions for serverless inference.

D. Keep individual endpoints but switch to Graviton-based instances for cost savings.

Answer: A

Multi-Model Endpoints load models on demand, allowing many small models to share an instance, reducing cost. They support isolation through model directories and can be combined with inference pipelines.

Why this answer

A Multi-Model Endpoint (MME) is the most appropriate choice because it allows hosting multiple models on a single instance while keeping them isolated in separate memory spaces, which reduces cost by sharing the underlying infrastructure. MME dynamically loads and unloads models based on traffic, making it ideal for unpredictable patterns and model sizes ranging from 100 MB to 5 GB. Inference pipelines can be added for pre-processing without breaking the multi-model architecture, preserving performance and isolation.
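
To make the on-demand loading concrete, here is a small Python sketch of how a caller might address one model on a Multi-Model Endpoint. It builds the keyword arguments for the SageMaker runtime `invoke_endpoint` call, where `TargetModel` selects the artifact to load. The endpoint name, the per-business-unit artifact path and the JSON payload shape are assumptions for illustration.

```python
import json


def mme_invoke_request(endpoint_name: str, target_model: str, payload: dict) -> dict:
    """Keyword arguments for invoke_endpoint against a Multi-Model Endpoint."""
    return {
        "EndpointName": endpoint_name,
        # Path of the model artifact relative to the endpoint's S3 model prefix.
        "TargetModel": target_model,
        "ContentType": "application/json",
        "Body": json.dumps(payload).encode("utf-8"),
    }


if __name__ == "__main__":
    # Hypothetical endpoint and artifact names, one artifact per business unit.
    request = mme_invoke_request(
        "shared-mme-endpoint", "unit-a/model.tar.gz", {"features": [1.0, 2.0]}
    )
    print(request)
    # With boto3: boto3.client("sagemaker-runtime").invoke_endpoint(**request)
```

Because the model is chosen per request, adding a new business unit's model only requires uploading another artifact under the shared prefix.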

Exam trap

The trap here is that candidates confuse Multi-Model Endpoints with Multi-Container Endpoints, assuming both provide similar isolation and cost benefits, but MCE is designed for microservices-like architectures where all containers must be active, not for dynamic model loading based on traffic.

How to eliminate wrong answers

Option B is wrong because Multi-Container Endpoints (MCE) run multiple containers on the same instance but all containers are always active, which does not optimize for unpredictable traffic and can lead to higher memory usage and cost. Option C is wrong because AWS Lambda has a maximum deployment package size of 250 MB (unzipped, including layers) and a 15-minute timeout, making it unsuitable for models up to 5 GB and real-time inference with unpredictable latency. Option D is wrong because keeping individual endpoints, even with Graviton instances, does not address the core issue of cost from multiple endpoints; it only reduces per-instance cost marginally, while consolidation is needed for significant savings.

11

MCQ · hard

A team uses SageMaker Pipelines for CI/CD. The training step fails due to insufficient memory. How can the team fix this without rewriting code?

A. Modify the training algorithm to use less memory

B. Reduce the batch size in the training script

C. Increase the instance type in the pipeline step configuration

D. Enable managed spot training

Answer: C

Changing instance type is a configuration change, not a code change.

Why this answer

Option C is correct because SageMaker Pipelines allows you to specify the instance type for each training step in the pipeline definition. By increasing the instance type (e.g., from ml.m5.large to ml.m5.xlarge or a memory-optimized instance like ml.r5.large), you allocate more memory to the training container without modifying the training script or algorithm. This directly resolves the out-of-memory error while preserving the existing code.

Exam trap

AWS often tests the distinction between infrastructure-level fixes (changing instance type in pipeline config) and code-level fixes (modifying script or algorithm), trapping candidates who think reducing batch size or enabling spot instances solves memory issues without considering the 'no code rewrite' constraint.

How to eliminate wrong answers

Option A is wrong because modifying the training algorithm to use less memory requires rewriting code, which violates the constraint of fixing the issue without rewriting code. Option B is wrong because reducing the batch size in the training script also requires modifying the training code, and while it may reduce memory usage, it does not meet the 'without rewriting code' condition. Option D is wrong because enabling managed spot training does not increase memory; it only reduces cost by using spare EC2 capacity and can cause interruptions, but it does not address insufficient memory for the training step.

12

MCQ · easy

A data science team needs to deploy a frequently updated PyTorch model for real-time inference. The model is retrained weekly and versioned using SageMaker Model Registry. Which deployment strategy minimizes downtime and allows easy rollback?

A. Deploy the model on an EC2 instance behind an Application Load Balancer and manually update the instance with the new model version.

B. Deploy the model using AWS Lambda with a container image and trigger via API Gateway.

C. Configure SageMaker endpoints with multiple production variants and use canary deployment to shift traffic gradually.

D. Use SageMaker hosting with a single production variant and update the endpoint with a new model configuration each week.

Answer: C

Canary deployment allows gradual traffic shift, minimizing downtime and enabling rollback.

Why this answer

Option C is correct because SageMaker endpoints with multiple production variants enable canary deployment, which shifts traffic gradually from the old model to the new one. This minimizes downtime by keeping both variants active during the transition and allows easy rollback by simply redirecting all traffic back to the previous variant if issues arise.

Exam trap

The trap here is that candidates often assume a single production variant with endpoint updates is sufficient, overlooking the downtime and rollback limitations, while the canary deployment pattern with multiple variants directly addresses the requirements for minimal downtime and easy rollback.

How to eliminate wrong answers

Option A is wrong because manually updating an EC2 instance behind an ALB introduces downtime during the update process and lacks automated rollback capabilities, making it unsuitable for a frequently updated model requiring minimal downtime. Option B is wrong because AWS Lambda has a maximum invocation duration of 15 minutes and is designed for stateless, short-lived functions, not for hosting real-time inference workloads that require persistent, low-latency serving. Option D is wrong because updating a single production variant each week replaces the model for all traffic at once; it does not support gradual traffic shifting, and rolling back means redeploying the previous version.

13

Multi-Select · medium

A company is deploying a machine learning model using SageMaker hosting. They need to support multiple versions of the model for A/B testing. Which TWO actions are required to set up the A/B test? (Choose two.)

Select 2 answers

A. Enable shadow variants to capture traffic for the new model without affecting users

B. Set up a batch transform job to compare performance offline

C. Configure the endpoint to route a percentage of traffic to each variant using initial variant weight

D. Register both models in SageMaker Model Registry

E. Create an endpoint with two production variants, each serving a different model version

Answers: C, E

Traffic splitting is achieved via variant weights.

Why this answer

Option C is correct because SageMaker endpoints use each production variant's `InitialVariantWeight` to distribute traffic. A variant's share of inference requests is its weight divided by the sum of all variant weights, so setting the weights routes a specific percentage of traffic to each model version and enables A/B testing on a single endpoint.
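
A short Python sketch can make the weight arithmetic concrete. It builds a two-variant `ProductionVariants` list of the kind passed to `create_endpoint_config` and computes each variant's traffic share as its weight over the sum of weights. The variant names, model names, instance type and the 9:1 split are assumptions chosen for illustration.

```python
def ab_test_variants(
    model_a: str,
    model_b: str,
    weight_a: float,
    weight_b: float,
    instance_type: str = "ml.m5.large",
) -> list:
    """ProductionVariants list with one variant per model version."""
    return [
        {
            "VariantName": name,
            "ModelName": model,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
            "InitialVariantWeight": weight,
        }
        for name, model, weight in (
            ("variant-a", model_a, weight_a),
            ("variant-b", model_b, weight_b),
        )
    ]


def traffic_shares(variants: list) -> dict:
    """Share of requests per variant: weight divided by the sum of weights."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


if __name__ == "__main__":
    variants = ab_test_variants("model-v1", "model-v2", weight_a=9, weight_b=1)
    print(traffic_shares(variants))
```

Because only the ratio matters, weights of 9 and 1 give the same split as 0.9 and 0.1.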

Exam trap

The trap here is that candidates confuse shadow variants (which are for passive monitoring) with production variants (which are for active traffic splitting), leading them to select Option A instead of understanding that A/B testing requires explicit traffic routing via variant weights.

14

MCQ · hard

A healthcare company is deploying a model for predicting patient outcomes. The model must be deployed across multiple AWS accounts to meet compliance requirements. Each account has its own Amazon SageMaker endpoint. The company wants to centralize monitoring of model performance without exposing data across accounts. Which solution should the company use?

A. Establish VPC peering between accounts and call the endpoints from a central monitoring service.

B. Replicate the inference data to a central S3 bucket in the management account using cross-account replication, then run Model Monitor centrally.

C. Use SageMaker Model Monitor in each account and publish custom metrics to a central CloudWatch account using cross-account observability.

D. Create a shared SageMaker Model Registry across accounts and aggregate monitoring.

Answer: C

This allows centralized monitoring without moving data across accounts.

Why this answer

Option C is correct because it uses SageMaker Model Monitor in each account to detect data drift and model degradation locally, then publishes custom metrics to a central CloudWatch account via cross-account observability. This approach centralizes monitoring without moving raw inference data across accounts, satisfying the compliance requirement of not exposing data.

Exam trap

The trap here is confusing data replication (which exposes raw data) with metric aggregation (which exposes only statistical summaries), leading candidates to pick Option B despite its compliance violation.

How to eliminate wrong answers

Option A is wrong because VPC peering enables network connectivity but does not provide a mechanism to centralize monitoring metrics or avoid exposing inference data across accounts; it would require direct data transfer to a central service, violating compliance. Option B is wrong because replicating inference data to a central S3 bucket using cross-account replication exposes raw data across accounts, which directly violates the requirement to not expose data. Option D is wrong because a shared SageMaker Model Registry aggregates model metadata and versions, not real-time monitoring metrics or data drift detection; it does not provide centralized performance monitoring.

15

MCQ · medium

A machine learning engineer is configuring auto-scaling for a SageMaker real-time endpoint. The endpoint is expected to have steady traffic during business hours and low traffic at night. The engineer wants to minimize costs by scaling in during low traffic, but the model container has a long start-up time (about 5 minutes). Which scaling policy should the engineer use to prevent request drops during sudden traffic spikes?

A. Use a step scaling policy based on invocations per minute with a step that adds two instances at a time.

B. Use a target tracking scaling policy based on average invocations per minute with a warm-up of 300 seconds.

C. Use a scheduled scaling action to add instances before business hours and remove them after.

D. Use a simple scaling policy based on average CPU utilization with a cooldown period of 5 minutes.

Answer: B

Target tracking with a warm-up period ensures that newly launched instances are not included in the metric until they are ready, preventing traffic loss.

Why this answer

Option B is correct because target tracking scaling policies in SageMaker automatically adjust capacity to maintain a target metric value, and the warm-up time of 300 seconds accounts for the 5-minute container start-up latency. This prevents request drops during sudden traffic spikes by ensuring new instances are fully initialized before they receive traffic, while still allowing the endpoint to scale in during low traffic to minimize costs.
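
A target tracking setup along these lines could be sketched in Python as follows. The sketch builds the keyword arguments for the Application Auto Scaling calls `register_scalable_target` and `put_scaling_policy`. Mapping the 300-second warm-up onto the `ScaleOutCooldown` field is an assumption of this sketch, as are the endpoint name, variant name, policy name, capacity limits and target value.

```python
def target_tracking_requests(
    endpoint_name: str,
    variant_name: str,
    target_invocations_per_instance: float,
    min_instances: int = 1,
    max_instances: int = 4,
    scale_out_cooldown: int = 300,  # assumed stand-in for the 5-minute start-up
    scale_in_cooldown: int = 600,   # assumed: scale in more conservatively
) -> tuple[dict, dict]:
    """Return (register_scalable_target kwargs, put_scaling_policy kwargs)."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    register = {**common, "MinCapacity": min_instances, "MaxCapacity": max_instances}
    policy = {
        **common,
        "PolicyName": f"{endpoint_name}-invocations-target",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleOutCooldown": scale_out_cooldown,
            "ScaleInCooldown": scale_in_cooldown,
        },
    }
    return register, policy


if __name__ == "__main__":
    register, policy = target_tracking_requests("demo-endpoint", "variant-a", 70)
    print(register)
    print(policy)
    # With boto3:
    #   client = boto3.client("application-autoscaling")
    #   client.register_scalable_target(**register)
    #   client.put_scaling_policy(**policy)
```

A longer scale-in cooldown than scale-out cooldown is a common way to avoid removing instances that took minutes to start, though the right values depend on the workload.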

Exam trap

The trap here is that candidates often choose a step scaling policy (Option A) because they think adding multiple instances at once handles spikes faster, but they overlook the critical need for a warm-up period to account for container start-up latency, which target tracking with warm-up explicitly addresses.

How to eliminate wrong answers

Option A is wrong because step scaling policies add instances in fixed increments (e.g., two at a time) without considering the long start-up time; this can lead to over-provisioning or under-provisioning during sudden spikes, and the lack of a warm-up period means new instances may not be ready to handle incoming requests, causing drops. Option C is wrong because scheduled scaling actions only handle predictable traffic patterns (e.g., business hours) and cannot react to sudden, unplanned traffic spikes, leaving the endpoint vulnerable to request drops. Option D is wrong because simple scaling policies based on average CPU utilization with a cooldown period of 5 minutes do not account for the model container's start-up latency; the cooldown prevents further scaling actions during the start-up period, but the policy itself cannot pre-warm instances, so traffic spikes during the cooldown can still cause request drops.

16

MCQ · easy

A machine learning engineer is deploying a model using AWS Lambda for inference. The model is a small scikit-learn classifier with a size of 50 MB. The Lambda function is invoked by an API Gateway REST API. The engineer notices that cold starts are causing high latency. Which action would most effectively reduce cold start latency without increasing costs significantly?

A. Store the model in Amazon EFS and load it at runtime.

B. Increase the Lambda function memory to the maximum of 10,240 MB.

C. Configure provisioned concurrency for the Lambda function.

D. Package the model in a container image and deploy using Lambda container support.

Answer: C

Provisioned concurrency keeps instances initialized and ready to respond immediately.

Why this answer

Option C is correct because provisioned concurrency pre-initializes the Lambda execution environment, keeping it warm and ready to handle requests immediately. This eliminates the cold start overhead for the first request, directly reducing latency without incurring the ongoing costs of a larger memory allocation or the complexity of EFS/container management.
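
As a hedged illustration, the Python sketch below builds the keyword arguments for the Lambda `put_provisioned_concurrency_config` call. Provisioned concurrency applies to a published version or an alias, so the sketch rejects `$LATEST`. The function name `sklearn-inference`, the alias `live` and the concurrency of 2 are hypothetical values.

```python
def provisioned_concurrency_request(
    function_name: str, qualifier: str, concurrent_executions: int
) -> dict:
    """Keyword arguments for put_provisioned_concurrency_config."""
    if qualifier == "$LATEST":
        # Provisioned concurrency targets a published version or an alias.
        raise ValueError("qualifier must be a published version or an alias")
    if concurrent_executions < 1:
        raise ValueError("concurrent_executions must be at least 1")
    return {
        "FunctionName": function_name,
        "Qualifier": qualifier,
        "ProvisionedConcurrentExecutions": concurrent_executions,
    }


if __name__ == "__main__":
    # Hypothetical function and alias, for illustration only.
    request = provisioned_concurrency_request("sklearn-inference", "live", 2)
    print(request)
    # With boto3: boto3.client("lambda").put_provisioned_concurrency_config(**request)
```

Keeping the provisioned count small, sized to typical concurrent demand, limits the added cost while still keeping initialized environments ready.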

Exam trap

The trap here is that candidates often confuse 'reducing cold start latency' with 'reducing compute time' or 'improving model loading speed', leading them to choose options like increasing memory or using EFS, which do not address the fundamental issue of environment initialization.

How to eliminate wrong answers

Option A is wrong because Amazon EFS adds network latency for each invocation to load the model, which can actually increase cold start time and does not address the root cause of cold starts. Option B is wrong because increasing memory to the maximum (10,240 MB) increases cost significantly (Lambda pricing scales linearly with memory) and does not eliminate cold starts; it only reduces compute time for the same workload. Option D is wrong because deploying as a container image does not inherently reduce cold start latency; container images can actually increase cold start time due to image pull overhead unless combined with provisioned concurrency.

17

MCQ · easy

A machine learning team needs to deploy a model that was built using scikit-learn. They want to use SageMaker for hosting. Which approach should they take?

A.Create a Jupyter notebook that loads the model and runs predictions on the SageMaker notebook instance

B.Create a custom Docker container with scikit-learn and deploy it on SageMaker

C.Launch a SageMaker training job with the model and use the training instance as an endpoint

D.Package the model artifacts and use the SageMaker built-in scikit-learn container for inference

Answer: D

Built-in container supports scikit-learn models; simply point to model artifacts.

Why this answer

Option D is correct because SageMaker provides a pre-built, optimized Docker container for scikit-learn that supports inference. By packaging the model artifacts (e.g., a joblib or pickle file) and deploying them using the built-in container, the team avoids the overhead of custom container creation while ensuring compatibility with SageMaker's hosting infrastructure, including automatic scaling and load balancing.

Exam trap

The trap here is that candidates often overcomplicate the solution by assuming a custom Docker container is always required for scikit-learn, overlooking the fact that SageMaker provides a fully managed, built-in container specifically for this framework.

How to eliminate wrong answers

Option A is wrong because a Jupyter notebook on a notebook instance is designed for interactive development and testing, not for production hosting; it lacks the necessary endpoint management, scaling, and availability features of SageMaker hosting. Option B is wrong because while a custom Docker container is a valid approach, it is unnecessary when SageMaker provides a built-in scikit-learn container that already includes the required dependencies and is optimized for inference, making this option over-engineered and more complex than needed. Option C is wrong because a SageMaker training job is ephemeral and intended for model training, not for serving inference requests; using a training instance as an endpoint is not supported, as training instances lack the persistent endpoint infrastructure (e.g., HTTPS endpoints, auto-scaling groups) required for production hosting.
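
With the built-in scikit-learn container, the team typically supplies only the model artifact and a small inference script. A minimal sketch of such a script follows, assuming the artifact was saved with pickle under the hypothetical file name model.pkl; the container looks for hooks named model_fn and predict_fn.

```python
import os
import pickle


def model_fn(model_dir: str):
    """Called once at container start: load the model from the unpacked artifact."""
    # "model.pkl" is an assumed file name; it must match what was packaged.
    with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
        return pickle.load(f)


def predict_fn(input_data, model):
    """Called per request with the deserialized payload and the loaded model."""
    return model.predict(input_data)
```

Only unpickle artifacts the team produced itself, since loading a pickle executes code embedded in the file.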

18

Multi-Select · easy

A company wants to deploy a model on SageMaker serverless inference. Which TWO of the following are limitations of serverless endpoints compared to real-time endpoints? (Choose two.)

Select 2 answers

A.Cold starts can cause increased latency for infrequent requests

B.Cannot deploy multiple containers in the same endpoint

C.No support for GPU instances

D.Maximum memory configuration is 6 GB

E.No automatic scaling – must be configured manually

Answers: C, D

Serverless endpoints only support CPU.

Why this answer

Option C is correct because SageMaker serverless inference does not support GPU instances; it only runs on CPU-based instances. This is a fundamental limitation for workloads requiring GPU acceleration, such as deep learning models. In contrast, real-time endpoints support both CPU and GPU instance types.

Exam trap

The trap here is that candidates may confuse cold starts (option A) as a limitation unique to serverless endpoints, but the question asks for limitations compared to real-time endpoints, and cold starts are inherent to serverless, not a comparative limitation; the two correct answers are the specific technical constraints of no GPU support and the 6 GB memory cap.
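
Both constraints show up in the endpoint configuration itself. The sketch below builds a serverless production variant for create_endpoint_config and rejects memory sizes above the 6 GB cap; the model name and variant name are hypothetical.

```python
# Serverless memory sizes come in 1 GB steps up to the 6 GB cap.
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def serverless_variant(model_name: str, memory_mb: int, max_concurrency: int) -> dict:
    """Build one ProductionVariants entry for a serverless endpoint configuration."""
    if memory_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"memory_mb must be one of {ALLOWED_MEMORY_MB}")
    return {
        "VariantName": "AllTraffic",  # hypothetical variant name
        "ModelName": model_name,
        # No InstanceType here: a serverless variant cannot request a GPU instance.
        "ServerlessConfig": {
            "MemorySizeInMB": memory_mb,
            "MaxConcurrency": max_concurrency,
        },
    }


variant = serverless_variant("fraud-model", 4096, 10)
```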

19

MCQ · easy

A data science team deploys a PyTorch model on Amazon SageMaker for real-time inference. The model requires GPU for low latency. Which instance type is MOST cost-effective while meeting the GPU requirement?

A.ml.m5.2xlarge

B.ml.p4d.24xlarge

C.ml.p3.2xlarge

D.ml.c5.2xlarge

Answer: C

ml.p3.2xlarge provides a GPU at a cost-effective price point.

Why this answer

Option C (ml.p3.2xlarge) is correct because it provides a GPU (NVIDIA V100) necessary for low-latency PyTorch inference on SageMaker, while being the most cost-effective among GPU options. The ml.p3.2xlarge offers a single GPU with sufficient compute for many real-time inference workloads, avoiding the higher cost of larger instances like ml.p4d.24xlarge.

Exam trap

The trap here is that candidates may assume any GPU instance is equally cost-effective, overlooking that ml.p4d.24xlarge is overprovisioned for typical inference, while CPU-only instances like ml.m5 and ml.c5 are tempting but fail the explicit GPU requirement.

How to eliminate wrong answers

Option A (ml.m5.2xlarge) is wrong because it is a general-purpose CPU instance with no GPU, failing to meet the GPU requirement for low-latency PyTorch inference. Option B (ml.p4d.24xlarge) is wrong because, while it provides powerful GPUs (NVIDIA A100), it is significantly more expensive than necessary for typical real-time inference, making it not the most cost-effective choice. Option D (ml.c5.2xlarge) is wrong because it is a compute-optimized CPU instance with no GPU, which cannot satisfy the GPU requirement for low-latency inference.

20

Multi-Select · medium

A company uses SageMaker to orchestrate a training pipeline with multiple steps including preprocessing, training, and evaluation. They want to ensure that each step can be reused and tracked. Which three SageMaker features support this? (Select THREE.)

Select 3 answers

A.SageMaker Pipelines

B.SageMaker Experiments

C.SageMaker Processing Jobs

D.SageMaker Clarify

E.SageMaker Model Monitor

Answers: A, B, C

Pipelines orchestrate multiple steps and support reuse.

Why this answer

SageMaker Pipelines is correct because it provides a directed acyclic graph (DAG) of steps that can be defined, parameterized, and reused across different runs. Each step (preprocessing, training, evaluation) is a distinct, versioned component that can be independently tracked and re-executed, enabling modular orchestration of ML workflows.

Exam trap

The trap here is that candidates confuse SageMaker Clarify and Model Monitor as pipeline orchestration tools, when they are actually separate services for model governance and production monitoring, not for step reuse and tracking.

21

MCQ · easy

A company wants to use SageMaker to deploy a model that requires GPU acceleration for inference but also needs to keep costs low when traffic is low. Which SageMaker feature should they use?

A.SageMaker Debugger

B.SageMaker Managed Spot Training

C.SageMaker Elastic Inference

D.SageMaker Model Monitor

Answer: C

Elastic Inference attaches GPU acceleration to any SageMaker instance, reducing cost.

Why this answer

SageMaker Elastic Inference (EI) allows you to attach a fraction of a GPU to a SageMaker endpoint for inference, providing GPU acceleration at a lower cost than using a full GPU instance. This is ideal for scenarios with variable traffic because you can scale the EI accelerator independently of the instance, and pay only for the accelerator when it's used, keeping costs low during low-traffic periods.

Exam trap

The trap here is that candidates often confuse SageMaker Managed Spot Training (cost savings for training) with inference cost optimization, or assume that GPU acceleration for inference requires a full GPU instance like ml.p3.2xlarge, overlooking Elastic Inference as a fractional GPU solution.

How to eliminate wrong answers

Option A is wrong because SageMaker Debugger is a tool for monitoring and debugging training jobs (e.g., detecting vanishing gradients), not for accelerating inference or reducing inference costs. Option B is wrong because SageMaker Managed Spot Training is a feature for reducing training costs by using spot instances, not for inference or GPU acceleration at the endpoint. Option D is wrong because SageMaker Model Monitor is used to detect data drift and quality issues in deployed models, not to provide GPU acceleration or cost savings for inference.

22

MCQ · easy

A company wants to use SageMaker to serve real-time predictions with a model that has a large memory footprint. They need to ensure the endpoint can handle traffic spikes. Which scaling policy should they use?

A.Simple scaling policy

B.Scheduled scaling policy

C.Target tracking policy

D.Step scaling policy

Answer: C

Target tracking automatically adjusts capacity to maintain a target metric value.

Why this answer

Target tracking scaling policy is the correct choice because it automatically adjusts the number of instances in the SageMaker endpoint based on a target metric, such as InvocationsPerInstance or ModelLatency, to handle traffic spikes without manual intervention. This policy is ideal for real-time inference with large memory models because it dynamically scales resources up or down to maintain the target metric, ensuring consistent performance during unpredictable traffic bursts.

Exam trap

The trap here is that candidates often confuse step scaling with target tracking, assuming step scaling is more responsive for spikes, but target tracking is actually the recommended and simpler approach for handling unpredictable traffic in SageMaker real-time endpoints.

How to eliminate wrong answers

Option A is wrong because simple scaling policy only triggers a single adjustment based on a CloudWatch alarm breach and then waits for a cooldown period, which cannot handle rapid traffic spikes effectively and may lead to under- or over-provisioning. Option B is wrong because scheduled scaling policy adjusts capacity at predetermined times, which is unsuitable for unpredictable traffic spikes that do not follow a fixed schedule. Option D is wrong because step scaling policy requires defining multiple step adjustments with thresholds, which is more complex to configure and may not react as smoothly to sudden spikes compared to target tracking, which continuously adjusts to maintain a target metric.

23

MCQ · medium

Refer to the exhibit. A data scientist creates a SageMaker Pipeline definition using the JSON shown. The pipeline runs successfully, but the scientist notices that the training step did not use the parameter 'TrainingInstanceCount' defined in Parameters. Why did this happen?

A.The pipeline encountered a runtime error and fell back to default values.

B.The parameter name has a typo; it should be 'TrainingInstanceCount' not 'TrainingInstanceCount'.

C.The steps do not reference the Parameters; the values are hardcoded in the step definitions.

D.The training image is not compatible with the specified instance type.

Answer: C

Parameters must be explicitly referenced in steps to take effect.

Why this answer

Option C is correct because the SageMaker Pipeline definition shows that the training step's `InstanceCount` field is hardcoded to `1` in the step definition, rather than referencing the `TrainingInstanceCount` parameter using the `Parameters` object (e.g., `Parameters.TrainingInstanceCount`). In SageMaker Pipelines, parameters defined in the `Parameters` section must be explicitly referenced within the step definitions using the `Parameters` object; otherwise, the pipeline uses the hardcoded values and ignores the parameters entirely.

Exam trap

AWS often tests the misconception that simply defining a parameter in the `Parameters` section automatically applies it to all steps, when in reality each step must explicitly reference the parameter using the `Parameters` object.

How to eliminate wrong answers

Option A is wrong because the pipeline ran successfully, and a runtime error would have caused the pipeline to fail, not fall back to default values; SageMaker Pipelines does not silently fall back to defaults on error. Option B is wrong because the parameter name 'TrainingInstanceCount' is spelled identically in both the Parameters section and the step definition, so there is no typo. Option D is wrong because the training image compatibility with the instance type would cause a runtime error during execution, not cause the parameter to be ignored; the pipeline would fail if the image were incompatible.
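
The exhibit is not reproduced here, so the sketch below uses a hypothetical, simplified pipeline definition to show the difference. It assumes the definition JSON expresses a parameter reference as {"Get": "Parameters.<name>"} and reports which declared parameters no step ever references.

```python
def referenced_parameters(node) -> set:
    """Collect parameter names referenced as {"Get": "Parameters.<name>"}."""
    found = set()
    if isinstance(node, dict):
        ref = node.get("Get")
        if isinstance(ref, str) and ref.startswith("Parameters."):
            found.add(ref.split(".", 1)[1])
        for value in node.values():
            found |= referenced_parameters(value)
    elif isinstance(node, list):
        for item in node:
            found |= referenced_parameters(item)
    return found


def unused_parameters(definition: dict) -> set:
    """Return declared parameters that no step references."""
    declared = {p["Name"] for p in definition.get("Parameters", [])}
    return declared - referenced_parameters(definition.get("Steps", []))


PARAMETERS = [{"Name": "TrainingInstanceCount", "Type": "Integer", "DefaultValue": 2}]


def training_step(instance_count) -> dict:
    # Hypothetical, trimmed-down training step.
    return {
        "Name": "Train",
        "Type": "Training",
        "Arguments": {"ResourceConfig": {"InstanceCount": instance_count}},
    }


# Hardcoded value: the declared parameter is silently ignored.
hardcoded = {"Parameters": PARAMETERS, "Steps": [training_step(1)]}
# Explicit reference: the parameter now controls the instance count.
parameterized = {
    "Parameters": PARAMETERS,
    "Steps": [training_step({"Get": "Parameters.TrainingInstanceCount"})],
}
```

Running unused_parameters over a definition before submitting it is one way to catch the situation in the question, where the pipeline succeeds but the parameter has no effect.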

24

MCQ · medium

A company uses Amazon SageMaker to train and deploy machine learning models. The security team requires that all data in transit between the training job and S3 be encrypted, and that no data traverses the public internet. Which configuration should the company use?

A.Create a VPC with S3 VPC endpoints, attach a VPC-only policy to the SageMaker execution role, and enable KMS encryption for training jobs.

B.Use an S3 bucket with SSE-S3 encryption and restrict bucket access to a VPC.

C.Enable default encryption on the S3 bucket and use HTTPS for all SageMaker endpoints.

D.Create a VPC with a NAT gateway, and configure SageMaker to use the VPC and enforce HTTPS.

Answer: A

S3 VPC endpoints keep traffic within the AWS network, TLS encrypts it in transit, and KMS encrypts data at rest.

Why this answer

Option A is correct because it keeps traffic between SageMaker and S3 inside the AWS network and encrypted. With the training job running in a VPC that has an S3 VPC endpoint, requests to S3 are routed privately and never traverse the public internet. Attaching a VPC-only policy to the SageMaker execution role restricts the training job to the VPC endpoint. TLS encrypts the data in transit, and enabling KMS encryption for the training job encrypts data at rest.

Exam trap

The trap here is that candidates often confuse encryption in transit (HTTPS) with keeping traffic off the public internet, not realizing that HTTPS can still traverse the public internet unless a VPC endpoint or Direct Connect is used.

How to eliminate wrong answers

Option B is wrong because SSE-S3 only encrypts data at rest in S3, not data in transit; it also does not prevent data from traversing the public internet. Option C is wrong because default bucket encryption protects data at rest and HTTPS protects it in transit, but neither keeps traffic off the public internet; HTTPS can still route over the public internet. Option D is wrong because a NAT gateway is used for outbound internet access, which would send traffic over the public internet, violating the requirement that no data traverses the public internet; HTTPS alone does not enforce private network routing.
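
One common way to enforce both requirements on the S3 side is a bucket policy, which is a different mechanism from the execution-role policy named in option A. The sketch below is an assumption-laden example: the bucket name and endpoint ID are hypothetical, and it relies on the aws:SourceVpce and aws:SecureTransport condition keys.

```python
import json


def vpce_only_bucket_policy(bucket: str, vpce_id: str) -> dict:
    """Deny S3 access that bypasses the VPC endpoint or does not use TLS."""
    resources = [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                # Requests that did not arrive through this endpoint are denied.
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            },
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                # Requests sent over plain HTTP are denied.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
        ],
    }


# Hypothetical bucket name and endpoint ID.
policy_json = json.dumps(vpce_only_bucket_policy("ml-training-data", "vpce-0abc123"), indent=2)
```

A blanket deny like this also blocks console and administrative access from outside the VPC, so it is usually tested on a non-critical bucket first.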

25

Multi-Select · medium

A machine learning engineer is deploying a model using SageMaker and needs to ensure that the endpoint can automatically scale based on traffic patterns. Which TWO actions should the engineer take? (Choose two.)

Select 2 answers

A.Define a scaling policy using Application Auto Scaling for the SageMaker endpoint variant.

B.Set up an Amazon CloudWatch alarm to trigger scaling based on the InvocationsPerInstance metric.

C.Enable SageMaker Model Monitor to detect data drift.

D.Configure a multi-model endpoint to serve multiple models.

E.Use SageMaker batch transform to handle variable traffic.

Answers: A, B

Auto Scaling policies adjust capacity based on CloudWatch metrics.

Why this answer

Option A is correct because SageMaker endpoints use Application Auto Scaling to automatically adjust the number of instances based on traffic. You define a scaling policy (e.g., target tracking, step scaling) that references a CloudWatch metric. Option B is correct because the InvocationsPerInstance metric is a standard SageMaker endpoint metric that reflects the load per instance, and a CloudWatch alarm on this metric can trigger the scaling policy to add or remove instances as traffic changes.

Exam trap

The trap here is confusing monitoring and scaling: candidates often pick Model Monitor (Option C) because it sounds like it monitors traffic, but it is for data drift, not scaling; similarly, batch transform (Option E) is mistaken for a scaling solution when it is a separate inference mode.
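
For reference, scaling a SageMaker endpoint variant through Application Auto Scaling takes two calls: register the variant as a scalable target, then attach a policy. The sketch below builds the arguments for both using a target tracking policy; the endpoint name, variant name, capacities, and cooldown values are hypothetical.

```python
def target_tracking_requests(endpoint: str, variant: str, min_instances: int,
                             max_instances: int, invocations_per_instance: float):
    """Build arguments for register_scalable_target and put_scaling_policy."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint}/variant/{variant}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    register = {**common, "MinCapacity": min_instances, "MaxCapacity": max_instances}
    policy = {
        **common,
        "PolicyName": f"{endpoint}-{variant}-invocations",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleOutCooldown": 60,   # assumed values, tune per workload
            "ScaleInCooldown": 300,
        },
    }
    return register, policy


register, policy = target_tracking_requests("my-endpoint", "AllTraffic", 1, 4, 100)
```

With boto3, these would be passed as boto3.client("application-autoscaling").register_scalable_target(**register) followed by put_scaling_policy(**policy).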

26

Multi-Select · hard

A machine learning team is building a CI/CD pipeline to train and deploy models using Amazon SageMaker. They want to ensure that the deployment step only proceeds if the model evaluation metrics exceed a certain threshold. Which THREE components should the team include in the pipeline? (Choose THREE.)

Select 3 answers

A.An AWS Lambda function for manual approval.

B.An AWS CodeBuild project to compile the model artifacts.

C.The SageMaker Model Registry to approve and store the model after evaluation.

D.A SageMaker endpoint deployment step that runs only after approval.

E.A condition step that checks if the evaluation metric exceeds the threshold.

Answers: C, D, E

Model Registry can store model versions and track approval status.

Why this answer

Option C is correct because the SageMaker Model Registry is the central component for approving and storing model versions after evaluation. It enables governance by allowing you to set approval statuses (e.g., Approved, Rejected) and track model lineage, ensuring only validated models proceed to deployment.

Exam trap

AWS often tests the misconception that manual approval via Lambda is required for gating deployments, but the correct approach uses SageMaker Model Registry's built-in approval mechanism combined with a condition step in the pipeline.
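
The gating logic itself is simple. The sketch below expresses it in plain Python rather than with the SageMaker Pipelines SDK: it compares an evaluation metric with a threshold and builds the arguments for the UpdateModelPackage API, which sets the approval status in the Model Registry. The metric name, threshold, and ARN are hypothetical.

```python
def approval_status(metrics: dict, metric_name: str, threshold: float) -> str:
    """Approve only when the metric strictly exceeds the threshold."""
    return "Approved" if metrics[metric_name] > threshold else "Rejected"


def approval_request(model_package_arn: str, metrics: dict,
                     metric_name: str, threshold: float) -> dict:
    """Build keyword arguments for SageMaker's UpdateModelPackage API."""
    return {
        "ModelPackageArn": model_package_arn,
        "ModelApprovalStatus": approval_status(metrics, metric_name, threshold),
    }


# Hypothetical ARN and evaluation result.
request = approval_request(
    "arn:aws:sagemaker:us-east-1:123456789012:model-package/fraud/3",
    {"accuracy": 0.91},
    "accuracy",
    0.90,
)
```

In an actual pipeline, the comparison would live in the condition step, and the deployment step would run only for packages whose status is Approved.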

27

MCQ · hard

A company is deploying a deep learning model for real-time inference using Amazon SageMaker. The model is a CPU-intensive XGBoost model that performs well with CPU. However, the team wants to minimize latency further by using hardware acceleration. They are considering Amazon Elastic Inference (EI) or moving to a GPU instance. The model is not optimized for GPU, so significant code changes would be required. Which approach is the MOST cost-effective way to reduce latency without changing the model code?

A.Use a GPU instance (ml.p3.2xlarge) and optimize the model with SageMaker Neo compilation.

B.Attach an Elastic Inference accelerator (e.g., ml.eia2.medium) to the existing CPU endpoint.

C.Use SageMaker Neo to compile the model for CPU with INT8 quantization.

D.Migrate the model to AWS Lambda with a custom runtime and use AVX instructions.

Answer: B

Elastic Inference provides cost-effective acceleration for XGBoost and other models without code changes.

Why this answer

Option B is correct because Amazon Elastic Inference (EI) allows you to attach a low-cost GPU-powered acceleration to an existing SageMaker CPU endpoint without any code changes. Since the XGBoost model is CPU-optimized and not GPU-native, EI provides hardware acceleration for the inference computation (specifically matrix operations) while keeping the model execution on the CPU, thus reducing latency without requiring model modifications.

Exam trap

The trap here is that candidates assume GPU instances are always the best for hardware acceleration, but the question explicitly states the model is not GPU-optimized and requires significant code changes, making Elastic Inference the only viable option that reduces latency without code modifications.

How to eliminate wrong answers

Option A is wrong because using a GPU instance (ml.p3.2xlarge) would require significant code changes to leverage GPU acceleration, as XGBoost is not natively GPU-optimized for inference; SageMaker Neo compilation does not automatically adapt the model to run on GPU hardware without code changes. Option C is wrong because SageMaker Neo compilation for CPU with INT8 quantization reduces model size and improves throughput, but it does not provide hardware acceleration (like GPU or EI) to reduce latency; it optimizes for CPU execution, not hardware-accelerated inference. Option D is wrong because AWS Lambda does not support attaching Elastic Inference accelerators, and using AVX instructions is a CPU-level optimization that does not provide the hardware acceleration needed to reduce latency beyond CPU capabilities; moreover, Lambda has a 15-minute timeout and is not designed for real-time inference with large models.

28

MCQ · hard

A company is deploying a ML model for real-time fraud detection using SageMaker. The model must process requests within 50 ms and scale to handle up to 10,000 requests per second during peak hours. The data includes PII, so all traffic must stay within a VPC. The team has configured the SageMaker endpoint with a VPC and an internet gateway for model downloads. During a load test, the endpoint fails to achieve the required throughput. Which change would most likely resolve the issue?

A.Remove the VPC configuration and use public endpoints to reduce network overhead.

B.Use VPC endpoints (interface endpoint for SageMaker and gateway endpoint for S3) to keep traffic within AWS backbone.

C.Add a NAT gateway to allow the SageMaker endpoint to access the internet efficiently.

D.Increase the instance count and use a larger instance type to handle the throughput.

Answer: B

VPC endpoints reduce latency and keep traffic within AWS network, improving throughput.

Why this answer

The correct answer is B because the endpoint is currently using an internet gateway for model downloads, which forces traffic out to the public internet and back, adding latency and risking throughput failures. By using VPC interface endpoints for SageMaker and gateway endpoints for S3, all traffic stays within the AWS backbone network, reducing network overhead and meeting the 50 ms latency requirement. This also keeps PII traffic within the VPC, satisfying security constraints.

Exam trap

The trap here is that candidates often assume throughput issues are always solved by scaling compute resources (Option D), when the real bottleneck is network architecture—specifically, the unnecessary internet gateway hop that adds latency and reduces throughput.

How to eliminate wrong answers

Option A is wrong because removing the VPC configuration would expose PII traffic to the public internet, violating security requirements, and public endpoints can still suffer from internet-related latency and bandwidth limitations. Option C is wrong because a NAT gateway is used to allow outbound internet access from private subnets, but the issue is not about internet access—it's about reducing latency by keeping traffic on the AWS backbone; a NAT gateway would add another hop and increase latency. Option D is wrong because increasing instance count and size addresses compute capacity but does not fix the network bottleneck caused by routing traffic through an internet gateway; the throughput failure is likely due to network latency, not insufficient compute resources.

29

Multi-Select · medium

Which TWO of the following are best practices for deploying machine learning models on SageMaker? (Select TWO.)

Select 2 answers

A.Store model artifacts in Amazon EBS volumes attached to the endpoint instances

B.Use separate production and staging endpoints to test new models before full rollout

C.Manually track model versions using tags because SageMaker Model Registry is not available for deployment

D.Disable CloudWatch Logs to reduce costs during inference

E.Enable data capture on endpoints to log predictions for auditing and model monitoring

Answers: B, E

Testing on staging before production is a best practice.

Why this answer

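Option B and Option E are correct. Testing a new model on a staging endpoint before production limits the impact of a bad release, and data capture logs requests and predictions for auditing and model monitoring. Option A is wrong because model artifacts should be stored in S3, not EBS. Option C is wrong because you should use the SageMaker Model Registry for versioning.

Option D is wrong because CloudWatch Logs are enabled by default and should not be disabled; they are needed to troubleshoot inference.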

30

Multi-Select · hard

A company deploys a SageMaker endpoint that is InService, but inference requests are returning 503 Service Unavailable errors when traffic is high. The endpoint uses three ml.m5.large instances with target tracking scaling based on CPU utilization. The team has confirmed the model container is healthy. Which TWO possible issues could cause 503 errors?

Select 2 answers

A.The model output is too large for the response buffer.

B.The instance memory is insufficient for the model, causing the container to run out of memory under load.

C.The Auto Scaling group has a cooldown period that prevents adding new instances quickly during traffic spikes.

D.The execution role does not have permission to invoke SageMaker endpoint.

E.The inference request size exceeds the maximum payload size limit.

Answers: B, C

Out-of-memory conditions can cause the container to become unresponsive, leading to 503 errors.

Why this answer

Option B is correct because if the model is large and the instances run out of memory under load, the container becomes unresponsive and new requests are rejected. Option C is correct because scaling cooldown periods delay the addition of instances, so the endpoint may not have enough capacity during a traffic spike. Option D is wrong because missing execution role permissions cause 403 or 500 errors, not 503.

Option A is wrong because a 503 indicates server overload, not a problem with the size of a response. Option E is wrong because an oversized request payload is a client-side problem that returns a 4xx error.

31

MCQ · medium

A team is using AWS Step Functions to orchestrate a machine learning workflow that includes data preprocessing, training, and model evaluation. The team wants to run the workflow whenever new data arrives in an S3 bucket. Which approach should they use to trigger the Step Functions workflow?

A.Configure the S3 bucket to send an event notification directly to the Step Functions state machine.

B.Use S3 event notifications to send a message to an Amazon SQS queue, and have a Lambda function poll the queue to start the execution.

C.Use a CloudWatch Logs metric filter to trigger the Step Functions execution.

D.Configure the S3 bucket to send events to Amazon EventBridge, and create an EventBridge rule that targets the Step Functions state machine.

Answer: D

EventBridge can directly invoke Step Functions based on S3 events, providing a simple serverless trigger.

Why this answer

Option D is correct because Amazon S3 can send event notifications directly to Amazon EventBridge, and EventBridge rules can target AWS Step Functions state machines as a target. This provides a fully managed, serverless integration that allows the Step Functions workflow to be triggered automatically whenever new data arrives in the S3 bucket, without needing intermediate polling or custom code.

Exam trap

The trap here is that candidates may assume S3 can directly invoke Step Functions (Option A) because they know S3 can trigger Lambda, but they overlook that Step Functions is not a supported direct destination for S3 event notifications.

How to eliminate wrong answers

Option A is wrong because S3 event notifications cannot directly target a Step Functions state machine; S3 event notifications support only Lambda, SQS, SNS, and EventBridge as destinations. Option B is wrong because while it would work, it introduces unnecessary complexity and latency by requiring a Lambda function to poll an SQS queue, which is not the simplest or most efficient approach when EventBridge provides direct integration. Option C is wrong because CloudWatch Logs metric filters are designed to monitor log data and trigger alarms or metrics, not to trigger Step Functions executions; they cannot directly invoke a state machine.
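
The EventBridge route in option D can be sketched as two requests: a rule whose event pattern matches "Object Created" events from the bucket, and a target pointing at the state machine. The bucket name, ARNs, and rule name below are hypothetical.

```python
import json


def s3_object_created_rule(bucket: str, state_machine_arn: str, role_arn: str,
                           rule_name: str = "start-ml-workflow"):
    """Build arguments for EventBridge put_rule and put_targets."""
    pattern = {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": [bucket]}},
    }
    put_rule = {
        "Name": rule_name,
        "EventPattern": json.dumps(pattern),
        "State": "ENABLED",
    }
    put_targets = {
        "Rule": rule_name,
        "Targets": [{
            "Id": "ml-workflow",
            "Arn": state_machine_arn,
            # Role that EventBridge assumes to call states:StartExecution.
            "RoleArn": role_arn,
        }],
    }
    return put_rule, put_targets


put_rule, put_targets = s3_object_created_rule(
    "incoming-training-data",
    "arn:aws:states:us-east-1:123456789012:stateMachine:ml-workflow",
    "arn:aws:iam::123456789012:role/eventbridge-start-workflow",
)
```

The bucket also has to opt in to EventBridge delivery, for example with put_bucket_notification_configuration and an empty EventBridgeConfiguration, before any events reach the rule.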

32

MCQ · medium

An ML team uses AWS Step Functions to orchestrate a multi-step inference pipeline: data preprocessing, model inference, and postprocessing. The pipeline runs on demand for single records. The team notices that the pipeline occasionally fails due to timeouts in the preprocessing step. They want to implement retries with exponential backoff and a maximum retry count of 3 for that step. How should they configure this?

A.Implement retry logic inside the preprocessing Lambda function code.

B.Modify the Step Functions state machine definition to add a Retry field on the preprocessing state with a maximum retry count of 3 and an exponential backoff rate of 2.0.

C.Wrap the preprocessing step in a SageMaker Pipeline step with retry policy.

D.Add a Catch in the state machine to rerun the entire pipeline if preprocessing fails.

Answer: B

Step Functions Retry field automatically implements exponential backoff and retry logic.

Why this answer

Option B is correct because AWS Step Functions natively supports retry logic with exponential backoff directly in the state machine definition. By adding a `Retry` field on the preprocessing state with `MaxAttempts: 3` and `BackoffRate: 2.0`, the service automatically retries the step on specified errors (e.g., `States.Timeout` or `Lambda.ServiceException`) with exponentially increasing wait times, without requiring custom code or external orchestration.

Exam trap

The trap here is that candidates often assume retry logic must be coded inside the Lambda function (Option A) or that a Catch block (Option D) is the correct way to handle failures, but Step Functions provides a declarative Retry mechanism that is more robust and easier to maintain for orchestrated workflows.

How to eliminate wrong answers

Option A is wrong because implementing retry logic inside the Lambda function code would not leverage Step Functions' built-in exponential backoff and would require custom sleep logic, increasing complexity and violating the separation of concerns between orchestration and business logic. Option C is wrong because SageMaker Pipeline steps are designed for batch training and model building workflows, not for orchestrating a lightweight inference pipeline with single-record processing; wrapping a preprocessing Lambda in a SageMaker Pipeline step adds unnecessary overhead and does not natively support the simple retry policy needed here. Option D is wrong because adding a `Catch` to rerun the entire pipeline on preprocessing failure would restart all steps (including inference and postprocessing), wasting compute time and resources, whereas a targeted retry on only the preprocessing step is more efficient and aligns with the requirement.
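
A Retry block for the preprocessing state might look like the following, written here as a Python dictionary that serializes to Amazon States Language. The Lambda ARN, timeout, initial interval, and next state name are hypothetical; the helper shows how the wait grows with a backoff rate of 2.0.

```python
import json

preprocessing_state = {
    "Type": "Task",
    # Hypothetical Lambda function ARN.
    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:preprocess",
    "TimeoutSeconds": 30,
    "Retry": [{
        "ErrorEquals": ["States.Timeout", "Lambda.ServiceException"],
        "IntervalSeconds": 2,   # wait before the first retry
        "MaxAttempts": 3,       # at most three retries
        "BackoffRate": 2.0,     # each wait is double the previous one
    }],
    "Next": "Inference",  # hypothetical name of the following state
}


def retry_delays(retrier: dict) -> list:
    """Seconds waited before each retry attempt, ignoring jitter."""
    return [
        retrier["IntervalSeconds"] * retrier["BackoffRate"] ** attempt
        for attempt in range(retrier["MaxAttempts"])
    ]


state_json = json.dumps({"Preprocess": preprocessing_state}, indent=2)
delays = retry_delays(preprocessing_state["Retry"][0])  # [2.0, 4.0, 8.0]
```

Only the preprocessing state is retried; the inference and postprocessing states run once the task finally succeeds.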

33

MCQ · hard

A financial services company deploys multiple models on a single Amazon SageMaker endpoint using a multi-model endpoint (MME). The models are stored in Amazon S3. Each model is approximately 500 MB and is loaded on demand. Users report high latency for cold-start scenarios. What should the company do to reduce cold-start latency?

A.Reduce the instance size to increase the number of instances per unit cost.

B.Increase the number of instances in the endpoint's auto-scaling group.

C.Deploy each model on a separate endpoint to avoid concurrent loading.

D.Configure the endpoint to use a larger 'ModelCacheSize' parameter.

Answer: D

Increasing the model cache size allows more models to be cached in memory, reducing load time.

Why this answer

Option D is correct because increasing the 'ModelCacheSize' parameter allows the SageMaker multi-model endpoint to keep more models loaded in memory, reducing the frequency of cold starts where a model must be downloaded from S3 and loaded into memory. This directly addresses the latency issue by caching models that are frequently accessed, avoiding repeated loading overhead.

Exam trap

The trap here is that candidates often confuse scaling the number of instances (Option B) with improving per-request latency, but horizontal scaling does not reduce the time to load a model from S3 into memory on a given instance.

How to eliminate wrong answers

Option A is wrong because reducing instance size decreases available memory and compute resources, which can increase cold-start latency and degrade performance for loading 500 MB models. Option B is wrong because increasing the number of instances in the auto-scaling group scales the endpoint horizontally but does not reduce cold-start latency for individual model loads; it only helps with overall request throughput. Option C is wrong because deploying each model on a separate endpoint eliminates the multi-model endpoint's shared caching benefit and increases operational complexity and cost, while still requiring cold-start loading on each endpoint.

34

MCQmedium

An engineer runs: aws sagemaker describe-endpoint --endpoint-name my-endpoint and receives the exhibit output. The engineer wants to update the endpoint to use a new model version stored in ECR with tag ':2'. Which step is necessary to perform the update?

A.Create a new endpoint configuration (my-endpoint-config-v2) referencing the new image, then call update-endpoint with the new config name.

B.Modify the existing endpoint configuration (my-endpoint-config-v1) to use the new image, then update the endpoint.

C.Use the update-endpoint command directly with the new image ARN.

D.Delete the endpoint and recreate it with the new model image.

AnswerA

Standard process: create new endpoint config, then update endpoint to use it.

Why this answer

Option A is correct because SageMaker endpoint configurations are immutable; you cannot modify an existing endpoint configuration in place. To update an endpoint to use a new model version, you must create a new endpoint configuration (e.g., my-endpoint-config-v2) that references a model built on the new ECR image tag ':2', then call update-endpoint with the new configuration name. SageMaker then provisions the new capacity and shifts traffic to it (all at once by default, or gradually if a canary or linear deployment policy is configured) while the existing fleet keeps serving requests, so the update does not require downtime.

Exam trap

The trap here is that candidates assume endpoint configurations are mutable like a text file, but AWS SageMaker enforces immutability — you must create a new configuration for any change, even a simple image tag update.

How to eliminate wrong answers

Option B is wrong because SageMaker endpoint configurations are immutable after creation; you cannot modify an existing configuration (my-endpoint-config-v1) to reference a new image — you must create a new configuration. Option C is wrong because the update-endpoint command does not accept a direct image ARN; it only accepts an endpoint configuration name, and the model image is specified within that configuration. Option D is wrong because deleting and recreating the endpoint would cause downtime and is unnecessary; SageMaker supports rolling updates via update-endpoint with a new configuration, which avoids service interruption.
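
The sequence described above can be sketched with boto3. This is a minimal sketch, not the exhibit's actual setup: the naming scheme, the "AllTraffic" variant name, the instance type, and every ARN or URI passed in are assumptions for illustration. Because the container image is set on the model that an endpoint configuration references, the sketch also creates a new model.

```python
def build_update_requests(endpoint_name, new_version, image_uri, model_data_url,
                          role_arn, instance_type="ml.m5.large", instance_count=1):
    """Build the three request payloads for rolling an endpoint to a new image.

    All names derived here are illustrative conventions, not SageMaker requirements.
    """
    model_name = f"{endpoint_name}-model-v{new_version}"
    config_name = f"{endpoint_name}-config-v{new_version}"
    create_model = {
        "ModelName": model_name,
        "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
        "ExecutionRoleArn": role_arn,
    }
    create_config = {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": instance_count,
        }],
    }
    update_endpoint = {"EndpointName": endpoint_name, "EndpointConfigName": config_name}
    return create_model, create_config, update_endpoint


def apply_update(requests):
    """Send the payloads to SageMaker (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above runs without boto3
    sm = boto3.client("sagemaker")
    create_model, create_config, update_endpoint = requests
    sm.create_model(**create_model)
    sm.create_endpoint_config(**create_config)
    sm.update_endpoint(**update_endpoint)
```

The existing configuration (my-endpoint-config-v1) is left untouched, which also makes it available for a later rollback.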

35

MCQmedium

A company uses SageMaker Pipelines to automate model retraining. The pipeline runs daily but sometimes fails due to data quality issues. What is the best design to handle this?

A.Add a data quality check step with Conditional to skip training if data fails.

B.Use SageMaker Debugger to monitor training.

C.Use SageMaker Model Registry to track model versions.

D.Increase the instance size for the training step.

AnswerA

A conditional step checks data quality and only proceeds to training if criteria are met, preventing failures.

Why this answer

Option A is correct because SageMaker Pipelines supports a data quality check step that can be integrated with a ConditionStep. If the data quality check fails, the ConditionStep can skip the training step entirely, preventing the pipeline from failing due to bad data. This design ensures the pipeline completes successfully (or exits gracefully) without wasting compute resources on training with invalid data.

Exam trap

The trap here is that candidates may confuse monitoring tools (Debugger) or model management (Model Registry) with pipeline orchestration and conditional logic, failing to recognize that a ConditionStep is the correct mechanism to gate execution based on data quality.

How to eliminate wrong answers

Option B is wrong because SageMaker Debugger is designed to monitor training jobs for issues like overfitting, vanishing gradients, or hardware bottlenecks, not to prevent pipeline failures caused by data quality issues before training starts. Option C is wrong because SageMaker Model Registry is used for cataloging, versioning, and approving model artifacts, not for handling data quality checks or pipeline failure prevention. Option D is wrong because increasing the instance size for the training step addresses performance or memory constraints, not data quality issues; it would not prevent the pipeline from failing if the input data is invalid.
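
As a rough illustration of what a data quality check step might compute before the condition is evaluated, the sketch below scores a batch of records by missing-value ratio. The field names, the 5% threshold, and the report layout are assumptions; in a real pipeline this logic would typically run inside a processing step that writes the report to a file the condition step can read.

```python
def check_data_quality(rows, required_fields, max_missing_ratio=0.05):
    """Return a small report saying whether the batch is clean enough to train on.

    rows: list of dicts (one per record); required_fields: keys that must be set.
    The threshold and report layout are illustrative assumptions.
    """
    total = len(rows) * len(required_fields)
    missing = sum(
        1 for row in rows for field in required_fields
        if row.get(field) in (None, "")
    )
    # An empty batch is treated as a failed check rather than a clean one.
    ratio = missing / total if total else 1.0
    return {"data_quality": {"missing_ratio": ratio, "passed": ratio <= max_missing_ratio}}
```

A condition on the `passed` value can then route the pipeline to training or to a graceful exit.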

36

MCQhard

A SageMaker endpoint is failing with the exhibited error. What is the most likely cause of this error?

A.The Docker container does not have the necessary IAM role to read the model artifacts.

B.The model archive uploaded to S3 does not contain the 'classes.txt' file.

C.The inference script is referencing the wrong path for the model directory.

D.The SageMaker endpoint does not have internet access to download the model from S3.

AnswerB

If the file is missing from the tar.gz, the endpoint cannot find it.

Why this answer

The error indicates that 'classes.txt' is missing from /opt/ml/model. Most likely, the file was not included in the model archive or the archive was not extracted properly.
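
One way to catch this before deployment is to inspect the archive locally. The following is a minimal sketch using only the Python standard library; the default file name comes from the error above, and the assumption is that required files sit at the root of the archive.

```python
import io
import posixpath
import tarfile


def missing_from_archive(fileobj, required=("classes.txt",)):
    """Return the required file names that are absent from a .tar.gz archive.

    fileobj is a binary file-like object, e.g. open("model.tar.gz", "rb").
    Member names are normalized so "./classes.txt" matches "classes.txt".
    """
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        names = {posixpath.normpath(name) for name in tar.getnames()}
    return sorted(set(required) - names)
```

Running this against the artifact before uploading it to S3 gives a quick check that everything the inference script expects under /opt/ml/model is actually packaged.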

37

MCQmedium

A company is deploying multiple models on a single endpoint to reduce costs. They need to update one model without affecting others. Which solution?

A.Use multiple single-model endpoints behind an Application Load Balancer

B.Use SageMaker Batch Transform for some models

C.Use SageMaker Multi-Model Endpoint

D.Use a SageMaker Endpoint with multiple production variants

AnswerC

Multi-model endpoints host multiple models and allow updating one model independently.

Why this answer

SageMaker Multi-Model Endpoint (MME) allows hosting multiple models on a single endpoint, sharing the underlying compute instance. When you need to update one model, you upload the new model artifact to the endpoint's S3 prefix, preferably under a new name (e.g., `model-v2.tar.gz`) rather than overwriting the old file, because a previously loaded copy may still be cached on the instances. The endpoint loads the new artifact the first time a request targets it, without affecting the other models currently cached or in use.

Exam trap

AWS often tests the distinction between multi-model endpoints (for hosting many models on one endpoint) and production variants (for routing traffic between versions of the same model), leading candidates to incorrectly choose option D when they need to update one model independently.

How to eliminate wrong answers

Option A is wrong because using multiple single-model endpoints behind an Application Load Balancer does not reduce costs (it increases them by requiring separate endpoints) and updating one model still requires managing each endpoint individually. Option B is wrong because SageMaker Batch Transform is designed for offline, asynchronous batch predictions on a dataset, not for real-time inference or updating models on a live endpoint. Option D is wrong because multiple production variants are used for A/B testing, canary deployments, or routing traffic between different versions of the same model, not for hosting and independently updating multiple distinct models on a single endpoint.
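
To make the per-model routing concrete, here is a small sketch of how a caller selects a model on a multi-model endpoint through the `TargetModel` argument of `invoke_endpoint`. The endpoint name, artifact name, and content type are placeholders.

```python
def build_mme_invocation(endpoint_name, target_model, payload, content_type="text/csv"):
    """Build invoke_endpoint arguments for one model on a multi-model endpoint.

    target_model is the artifact's path relative to the S3 prefix the endpoint
    was created with, for example "fraud-v2.tar.gz" (a hypothetical name).
    """
    return {
        "EndpointName": endpoint_name,
        "TargetModel": target_model,
        "ContentType": content_type,
        "Body": payload,
    }


def invoke(request):
    """Send the request (needs boto3 and AWS credentials)."""
    import boto3  # lazy import keeps the builder usable without boto3
    runtime = boto3.client("sagemaker-runtime")
    return runtime.invoke_endpoint(**request)["Body"].read()
```

Switching callers from one artifact name to another is then the whole "update" for that model; no endpoint or endpoint configuration change is involved.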

38

MCQmedium

A data scientist runs this pipeline but the Train step fails with "ResourceLimitExceeded". What is the most likely cause?

A.The account has a limit of 0 for ml.p3.2xlarge instances.

B.The volume size is too small for training.

C.The Preprocess step did not complete successfully.

D.The training image is not accessible.

AnswerA

A zero limit or insufficient quota results in ResourceLimitExceeded.

Why this answer

The 'ResourceLimitExceeded' error indicates that the requested instance type (ml.p3.2xlarge) exceeds the account's service quota for that specific instance family. In AWS SageMaker, each account has a default limit of 0 for certain GPU instance types like ml.p3.2xlarge unless a quota increase has been requested and approved. This error occurs at the Train step because SageMaker attempts to launch the training job with an instance type that is not allowed by the current quota.

Exam trap

AWS often tests the distinction between resource limits (quotas) and other failure modes; the trap here is that candidates may confuse 'ResourceLimitExceeded' with a generic 'insufficient capacity' error, but the error specifically refers to account-level service quotas, not AWS resource availability.

How to eliminate wrong answers

Option B is wrong because volume size limits (e.g., EBS volume size) do not cause a 'ResourceLimitExceeded' error; they would result in an 'InsufficientVolumeCapacity' or 'VolumeLimitExceeded' error. Option C is wrong because if the Preprocess step had failed, the pipeline would stop at that step and the Train step would not be attempted, so the error would be a different one (e.g., 'StepFailure'). Option D is wrong because an inaccessible training image would produce an 'ImageNotFoundException' or 'AccessDeniedException', not a 'ResourceLimitExceeded' error.
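
A quota like this can be checked ahead of time through the Service Quotas API. The sketch below assumes that SageMaker quota names follow the pattern "<instance type> for training job usage"; verify the exact wording in your account before relying on it.

```python
def find_quota(quotas, instance_type, usage="training job usage"):
    """Return the quota value for an instance type, or None if it is not listed.

    quotas: list of dicts with "QuotaName" and "Value" keys, as returned by
    list_service_quotas. The name pattern matched here is an assumption.
    """
    wanted = f"{instance_type} for {usage}"
    for quota in quotas:
        if quota["QuotaName"] == wanted:
            return quota["Value"]
    return None


def fetch_sagemaker_quotas():
    """Collect all SageMaker quotas (needs boto3 and AWS credentials)."""
    import boto3  # lazy import so find_quota runs without boto3
    client = boto3.client("service-quotas")
    quotas = []
    paginator = client.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode="sagemaker"):
        quotas.extend(page["Quotas"])
    return quotas
```

A value of 0.0 for the training instance type would be consistent with the failure in this question.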

39

MCQhard

A company deploys a model using SageMaker and enables data capture for monitoring. After a week, they notice that the captured data is not being written to the specified S3 bucket. The endpoint is running and invocations are successful. What is the most likely cause?

A.The IAM role used for the endpoint does not have s3:PutObject permission for the capture bucket.

B.The capture bucket is in a different region.

C.The endpoint is using a multi-model endpoint which does not support data capture.

D.The DataCaptureConfig parameter in the endpoint configuration is missing the "CaptureOptions" field.

AnswerA

Without write permission, data capture fails silently.

Why this answer

The most likely cause is that the IAM role associated with the SageMaker endpoint lacks the `s3:PutObject` permission for the target S3 bucket. Without this permission, the endpoint cannot write the captured inference data to S3, even though invocations succeed because the model itself does not require S3 write access to serve predictions.

Exam trap

The trap here is that candidates often assume data capture fails due to endpoint misconfiguration (like missing CaptureOptions) or regional restrictions, when in fact the root cause is almost always an IAM permissions issue with the S3 bucket.

How to eliminate wrong answers

Option B is wrong because SageMaker data capture supports cross-region S3 buckets; the bucket can be in a different region as long as the endpoint has network access and proper permissions. Option C is wrong because multi-model endpoints fully support data capture; there is no restriction that prevents capture on multi-model endpoints. Option D is wrong because `CaptureOptions` is a required field of `DataCaptureConfig`; an endpoint configuration that omits it is rejected when it is created, so it cannot explain a running endpoint that serves invocations but silently writes nothing to S3.
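
For reference, a policy statement granting the write permission discussed above could look like the following sketch. The bucket and prefix are placeholders, and a production policy would likely be scoped further.

```python
def data_capture_write_policy(bucket, prefix):
    """Build an IAM policy document allowing writes under one S3 prefix.

    bucket and prefix are hypothetical values supplied by the caller.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": f"arn:aws:s3:::{bucket}/{prefix.strip('/')}/*",
        }],
    }
```

The document would be attached to the execution role that the endpoint's model uses.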

40

MCQmedium

Refer to the exhibit. A SageMaker endpoint is logging this error when processing inference requests that require database access. What is the most likely cause?

A.Data capture is not enabled

B.The endpoint instance type is too small

C.The model is not compatible with the instance

D.The endpoint lacks a VPC configuration with proper security groups

AnswerD

Without VPC configuration, the endpoint cannot reach resources inside a VPC.

Why this answer

The error indicates that the SageMaker endpoint cannot connect to the database when processing inference requests. This is most likely because the endpoint is not configured with a VPC that includes proper security groups and network ACLs to allow outbound traffic to the database. Without a VPC configuration, the endpoint uses the default SageMaker network, which lacks the necessary routing and security rules to reach resources in a private subnet.

Exam trap

AWS often tests the misconception that network connectivity errors are caused by instance size or model compatibility, when the actual issue is a missing or misconfigured VPC configuration for the SageMaker endpoint.

How to eliminate wrong answers

Option A is wrong because data capture is a feature for logging inference request/response payloads, not for enabling network connectivity to a database. Option B is wrong because an undersized instance type would cause performance issues like latency or out-of-memory errors, not a network connectivity failure to a database. Option C is wrong because model-instance compatibility issues typically manifest as runtime errors (e.g., 'CUDA error' or 'model loading failed'), not as a failure to establish a database connection.
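
In the SageMaker API, the VPC settings are supplied on the model through the `VpcConfig` argument of `create_model`. The sketch below builds such a request; all IDs, names, and ARNs are placeholders, and the security groups are assumed to allow outbound traffic to the database.

```python
def build_vpc_model_request(model_name, image_uri, model_data_url, role_arn,
                            subnet_ids, security_group_ids):
    """Build create_model arguments that place the model's containers in a VPC."""
    if not subnet_ids or not security_group_ids:
        raise ValueError("VpcConfig needs at least one subnet and one security group")
    return {
        "ModelName": model_name,
        "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
        "ExecutionRoleArn": role_arn,
        "VpcConfig": {
            "Subnets": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        },
    }
```

The resulting dictionary would be passed as `boto3.client("sagemaker").create_model(**request)`.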

41

MCQeasy

A company has trained a custom model using PyTorch on Amazon SageMaker. The model achieves high accuracy, but the inference latency on a real-time endpoint is above the required 100ms SLA. The model is a large neural network with many layers. The company wants to reduce latency without significantly impacting accuracy. Which approach should the machine learning engineer take?

A.Reduce the batch size used during inference.

B.Use SageMaker Neo to compile the model for the target hardware.

C.Increase the instance size of the endpoint.

D.Implement a cache for frequent inference requests.

AnswerB

Neo applies hardware-specific optimizations that reduce latency without retraining.

Why this answer

SageMaker Neo compiles trained models into an optimized binary for the target hardware (e.g., CPU, GPU, or Inferentia). It applies graph-level optimizations, operator fusion, and quantization-aware tuning to reduce inference latency while preserving model accuracy. This directly addresses the need to lower latency below 100ms without retraining or sacrificing significant accuracy.

Exam trap

AWS often tests the misconception that simply scaling up hardware (Option C) or batching (Option A) is the primary solution for latency issues, when in fact model compilation (Option B) is the targeted optimization for inference speed without accuracy loss.

How to eliminate wrong answers

Option A is wrong because reducing batch size typically increases latency per request (due to lower hardware utilization) and does not address the fundamental computational bottleneck of a large neural network. Option C is wrong because increasing instance size may reduce latency but at higher cost and without optimizing the model itself; it does not guarantee meeting the 100ms SLA and can introduce unnecessary expense. Option D is wrong because caching only helps for repeated identical requests, not for unique or dynamic inference inputs, and does not reduce the per-inference computation time for the model.
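
A Neo compilation is started with the `create_compilation_job` API. The sketch below builds a request for a PyTorch model; the input name "input0", the input shape, the framework version, and the target device are assumptions that must match the actual model and endpoint hardware.

```python
import json


def build_neo_compilation_request(job_name, role_arn, model_s3_uri, output_s3_uri,
                                  input_shape, target_device="ml_c5",
                                  framework_version="1.13"):
    """Build create_compilation_job arguments for a PyTorch model.

    DataInputConfig is passed as a JSON string mapping input names to shapes.
    """
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            "DataInputConfig": json.dumps({"input0": list(input_shape)}),
            "Framework": "PYTORCH",
            "FrameworkVersion": framework_version,
        },
        "OutputConfig": {"S3OutputLocation": output_s3_uri, "TargetDevice": target_device},
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }
```

The compiled artifact written to the output location is then deployed in place of the original model artifact.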

42

MCQhard

A team is deploying a TensorFlow model on a SageMaker real-time endpoint with automatic scaling. They set the scaling policy to target an average CPU utilization of 50%. However, during traffic spikes, the endpoint experiences high latency and 503 errors. The instance type is ml.c5.large. What should the team do to resolve this while minimizing cost?

A.Pre-warm the endpoint by keeping a fixed number of additional instances

B.Increase the scale-in cooldown period to avoid frequent downsizing

C.Change the instance type to a larger one like ml.c5.xlarge to handle the spikes

D.Add a scaling policy based on the number of concurrent requests per instance

AnswerD

Concurrent requests metric often provides faster and more accurate scaling for ML endpoints.

Why this answer

Option D is correct because scaling based on CPU utilization alone is often insufficient for inference workloads where latency is the primary concern. By adding a scaling policy based on the number of concurrent requests per instance, the team can proactively scale out before CPU saturation occurs, reducing latency and eliminating 503 errors. SageMaker's automatic scaling supports multiple target tracking metrics, and using concurrent requests per instance aligns more closely with the actual demand on the model serving container.

Exam trap

The trap here is that candidates assume larger instances (Option C) are the only way to handle spikes, but the exam tests understanding that scaling policies based on the right metric (concurrent requests) can be more cost-effective and responsive than simply scaling up instance size.

How to eliminate wrong answers

Option A is wrong because pre-warming with a fixed number of additional instances increases cost without adapting to variable traffic patterns, and it does not address the root cause of scaling delays during spikes. Option B is wrong because increasing the scale-in cooldown period only delays instance termination, which does not help during rapid traffic increases; it may even worsen resource waste. Option C is wrong because moving to a larger instance type (ml.c5.xlarge) increases cost per instance and still relies on CPU-based scaling, which may still lag behind sudden spikes; it does not solve the fundamental issue of scaling responsiveness.
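
Endpoint scaling policies are managed through Application Auto Scaling. The following sketch builds a target-tracking policy on a request-based metric; the default predefined metric name, the cooldowns, and the policy name are assumptions to confirm against the current Application Auto Scaling documentation.

```python
def build_scaling_requests(endpoint_name, variant_name, min_capacity, max_capacity,
                           target_value,
                           metric_type="SageMakerVariantConcurrentRequestsPerModelHighResolution"):
    """Build register_scalable_target and put_scaling_policy arguments."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    target = {**common, "MinCapacity": min_capacity, "MaxCapacity": max_capacity}
    policy = {
        **common,
        "PolicyName": f"{endpoint_name}-{variant_name}-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_value),
            "PredefinedMetricSpecification": {"PredefinedMetricType": metric_type},
            "ScaleOutCooldown": 60,   # react quickly to spikes
            "ScaleInCooldown": 300,   # scale in more conservatively
        },
    }
    return target, policy


def apply_scaling(target, policy):
    """Register the target and attach the policy (needs boto3 and credentials)."""
    import boto3  # lazy import keeps the builder usable without boto3
    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)
```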

43

MCQeasy

A company wants to automate its machine learning pipeline using AWS CodePipeline and Amazon SageMaker. The pipeline should train a model, evaluate it, and if the evaluation passes, register the model in the SageMaker Model Registry. Which service should the company use to orchestrate the training and evaluation steps?

A.AWS CodePipeline

B.AWS Glue Workflows

C.AWS Step Functions

D.Amazon SageMaker Pipelines

AnswerD

SageMaker Pipelines natively supports ML steps like training, evaluation, and model registration.

Why this answer

Amazon SageMaker Pipelines is the correct choice because it is a purpose-built, fully managed service for creating end-to-end machine learning workflows directly within the SageMaker ecosystem. It natively integrates with SageMaker training jobs, processing jobs for evaluation, and the Model Registry for conditional registration, allowing the entire pipeline—train, evaluate, and conditionally register—to be defined as a directed acyclic graph (DAG) of steps without needing to stitch together separate services.

Exam trap

The trap here is that candidates may confuse AWS Step Functions (a general-purpose orchestrator) with SageMaker Pipelines (a specialized ML orchestrator), overlooking that SageMaker Pipelines provides built-in SageMaker step types and native Model Registry integration, which Step Functions lacks without custom Lambda functions.

How to eliminate wrong answers

Option A is wrong because AWS CodePipeline is a CI/CD service designed for software delivery pipelines (e.g., building, testing, deploying applications), not for orchestrating ML training and evaluation steps that require direct integration with SageMaker resources like training jobs or the Model Registry. Option B is wrong because AWS Glue Workflows are used for orchestrating ETL (extract, transform, load) jobs and data preparation tasks within AWS Glue, not for managing ML training or model evaluation workflows. Option C is wrong because while AWS Step Functions can orchestrate SageMaker API calls, it requires custom integration code and does not provide native, declarative support for SageMaker-specific steps like training, tuning, or model registration, making it less efficient and more error-prone than SageMaker Pipelines for this use case.

44

MCQhard

A streaming media company uses Amazon SageMaker to host a recommendation model at a real-time endpoint. The model is updated weekly, and the team deploys new model versions using SageMaker's blue/green deployments. Recently, after a deployment, the new endpoint variant began returning HTTP 503 errors (Service Unavailable) for approximately 5 minutes before stabilizing. The deployment uses a linear transition with a 10-minute window. The old variant continues to serve traffic during the transition. The team notices that the error rate spikes right after the new variant becomes active. The endpoint is configured with two instances for each variant. Instance logs show that the new model container is taking longer than expected to load and initialize (e.g., downloading model artifacts from S3 and loading into memory). The team needs to resolve this issue without changing the model or container image. Which combination of actions should the team take to eliminate the 503 errors?

A.Switch from a linear transition to a canary transition with a 10% traffic weight for the new variant for 5 minutes before moving to 100%.

B.Increase the number of instances per variant to 4, raise the endpoint's 'ModelDataDownloadTimeoutInSeconds' and 'ContainerHealthCheckTimeoutInSeconds' values, and add an 'InferenceExecutionConfig' with 'Mode' set to 'Serial' to allow the container a longer warm-up period.

C.Decrease the number of instances per variant from 2 to 1 to reduce the amount of model artifact downloads and speed up initialization.

D.Reduce the linear transition window from 10 minutes to 2 minutes so that the new variant becomes active faster and stabilizes quickly.

AnswerB

Increasing instances provides more capacity, and increasing timeout settings ensures that SageMaker waits longer for the container to become healthy before routing traffic, preventing 503 errors during initialization.

Why this answer

Option B is the correct course of action. Raising the model download and container health check timeouts gives each new instance enough time to download the model artifacts from S3 and load them into memory before SageMaker treats the container as ready, so traffic is not routed to instances that are still initializing. Increasing the number of instances per variant provides more capacity, helping the new variant handle traffic without errors.

Option A is incorrect because a canary transition would still route traffic to the new variant before its instances are ready. Option C is incorrect because reducing instances reduces capacity and may increase errors. Option D is incorrect because a faster transition would worsen the issue.
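
As a sketch of where such timeouts live, the two settings are fields of a production variant in `create_endpoint_config`. Note that the API spells the health check field `ContainerStartupHealthCheckTimeoutInSeconds`, which differs slightly from the option text. The default values and the 60 to 3600 second range check below are assumptions.

```python
def build_variant(model_name, instance_type, instance_count,
                  download_timeout_s=1200, health_check_timeout_s=600):
    """Build one ProductionVariants entry with extended startup timeouts."""
    for value in (download_timeout_s, health_check_timeout_s):
        if not 60 <= value <= 3600:
            raise ValueError("timeouts are assumed to be limited to 60..3600 seconds")
    return {
        "VariantName": "AllTraffic",
        "ModelName": model_name,
        "InstanceType": instance_type,
        "InitialInstanceCount": instance_count,
        "ModelDataDownloadTimeoutInSeconds": download_timeout_s,
        "ContainerStartupHealthCheckTimeoutInSeconds": health_check_timeout_s,
    }
```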

45

MCQhard

A financial services company deploys a fraud detection model on a SageMaker real-time endpoint. The inference logic includes a pre-processing step that requires access to a DynamoDB table for user metadata. The model container is a custom Docker image. How should the team grant the endpoint access to DynamoDB?

A.Store IAM credentials in the container image as environment variables

B.Attach an IAM instance profile to the underlying EC2 instance

C.Create an IAM role with DynamoDB read access and assign it to the SageMaker endpoint as the execution role

D.Retrieve temporary credentials from AWS Secrets Manager within the container code

AnswerC

SageMaker assumes the execution role to access other AWS services.

Why this answer

Option C is correct because SageMaker endpoints require an IAM execution role to be assigned at creation time. This role defines the permissions the endpoint's container has when making AWS API calls, such as reading from DynamoDB. By attaching a policy with DynamoDB read access to this execution role, the endpoint securely obtains temporary credentials via the AWS STS service, eliminating the need to hardcode or manage long-term credentials.

Exam trap

The trap here is that candidates confuse SageMaker endpoints with EC2-based deployments and incorrectly think they need to manage instance profiles or embed credentials, when in fact SageMaker abstracts the underlying compute and uses an execution role for all API access.

How to eliminate wrong answers

Option A is wrong because storing IAM credentials as environment variables in a container image is a security anti-pattern; credentials would be baked into the image, exposed in the container's environment, and not automatically rotated. Option B is wrong because SageMaker endpoints do not run on EC2 instances that you manage; they run on SageMaker-managed infrastructure, so attaching an instance profile to an underlying EC2 instance is not applicable. Option D is wrong because while Secrets Manager can store credentials, the container code would still need permissions to access Secrets Manager itself, and the standard, simpler approach is to use the endpoint's execution role rather than managing temporary credentials manually.
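
The execution role in this scenario needs two documents: a trust policy that lets SageMaker assume the role, and a permissions policy for the table. A minimal sketch follows; the table ARN is a placeholder and the chosen read actions are an assumption about what the pre-processing step needs.

```python
def sagemaker_trust_policy():
    """Trust policy allowing the SageMaker service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def dynamodb_read_policy(table_arn):
    """Read-only permissions on a single DynamoDB table."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:BatchGetItem", "dynamodb:Query"],
            "Resource": table_arn,
        }],
    }
```

Inside the container, an AWS SDK client created without explicit credentials would pick up the role's temporary credentials through the default credential chain.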

46

Multi-Selecthard

A company is running a SageMaker endpoint serving multiple models. They need to monitor for data drift and model quality. Which THREE actions are necessary? (Choose three.)

Select 3 answers

A.Deploy a shadow endpoint for comparison

B.Enable data capture on the endpoint

C.Use SageMaker Debugger for monitoring

D.Create a SageMaker Model Monitor schedule

E.Configure baseline constraints from training data

AnswersB, D, E

Data capture logs inference requests for monitoring.

Why this answer

Option B is correct because enabling data capture on the SageMaker endpoint is a prerequisite for monitoring data drift and model quality. Data capture automatically records input requests and output responses from the endpoint, which SageMaker Model Monitor later analyzes against a baseline to detect drift. Without data capture, there is no data to compare against the baseline constraints.

Exam trap

The trap here is that candidates confuse SageMaker Debugger (for training) with SageMaker Model Monitor (for inference), leading them to select Debugger instead of the correct monitoring schedule and baseline configuration.
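
Data capture is switched on through the `DataCaptureConfig` block of an endpoint configuration. The sketch below builds that block; the S3 URI is a placeholder and capturing both input and output at 100% sampling is an assumption about what monitoring needs.

```python
def build_data_capture_config(destination_s3_uri, sampling_percentage=100):
    """Build the DataCaptureConfig argument for create_endpoint_config."""
    if not destination_s3_uri.startswith("s3://"):
        raise ValueError("destination must be an s3:// URI")
    if not 0 <= sampling_percentage <= 100:
        raise ValueError("sampling percentage must be between 0 and 100")
    return {
        "EnableCapture": True,
        "InitialSamplingPercentage": sampling_percentage,
        "DestinationS3Uri": destination_s3_uri,
        # Capture both the request payload and the model's response.
        "CaptureOptions": [{"CaptureMode": "Input"}, {"CaptureMode": "Output"}],
    }
```

The monitoring schedule and the baseline constraints then operate on the records written to that S3 location.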

47

MCQeasy

An ML team wants to deploy a model that was trained using XGBoost in SageMaker. They want to use the built-in XGBoost algorithm container for inference. Which inference option requires the least custom code?

A.Create a custom Docker container with XGBoost and deploy to an endpoint

B.Deploy to a real-time endpoint using the built-in XGBoost container

C.Attach Elastic Inference to a generic container

D.Use SageMaker Python SDK to download the model and run local inference

AnswerB

The built-in container handles inference automatically.

Why this answer

Option B is correct because the built-in XGBoost container in SageMaker is pre-configured with the XGBoost serving stack, including the necessary inference code and dependencies. Deploying a model trained with XGBoost to a real-time endpoint using this container requires no custom inference script or Docker image, only the model artifact and endpoint configuration. This minimizes custom code to just the SageMaker SDK calls for creating the model and endpoint.

Exam trap

AWS often tests the misconception that Elastic Inference can accelerate any ML model, but it is specifically designed for deep learning models and does not apply to tree-based algorithms like XGBoost.

How to eliminate wrong answers

Option A is wrong because creating a custom Docker container with XGBoost introduces unnecessary custom code and maintenance overhead, whereas the built-in container already provides the same functionality. Option C is wrong because Elastic Inference is an acceleration technology for deep learning models (e.g., TensorFlow, PyTorch) and is not compatible with XGBoost, which is a gradient boosting framework; attaching it to a generic container would not reduce custom code and would be architecturally incorrect. Option D is wrong because using the SageMaker Python SDK to download the model and run local inference moves the inference workload outside of SageMaker's managed infrastructure, requiring custom orchestration code and defeating the purpose of a managed deployment.

48

MCQhard

A financial services company is deploying a credit risk model using SageMaker. They require that the model always uses the latest approved version from the Model Registry. They also need to maintain a detailed audit trail of all model version transitions (e.g., from PendingApproval to Approved). The deployment should be fully automated and must roll back immediately if the new model's error rate exceeds the old model's error rate by more than 2% during a canary deployment. Which solution meets these requirements with the least custom code?

A.Use AWS CodePipeline with a deployment action that uses AWS CloudFormation to update the endpoint. Add a manual approval step for rollback.

B.Use SageMaker Pipelines with a conditional step to deploy the model after approval, and include a canary deployment using a weighted endpoint variant. Use CloudWatch alarms to trigger automatic rollback.

C.Create an AWS Lambda function that is triggered by Model Registry events, deploys the model to a staging endpoint, runs a canary test, and if successful, updates the production endpoint.

D.Use an Amazon EKS cluster with a custom inference container and use ArgoCD for automated deployments.

AnswerB

Pipelines natively integrate with Model Registry, conditional logic, and CloudWatch for automated canary and rollback.

Why this answer

Option B is correct because SageMaker Pipelines natively supports conditional execution and canary deployments using endpoint weight variants, which together enable automated rollback triggered by CloudWatch alarms when the error rate exceeds the 2% threshold. This approach requires minimal custom code by leveraging built-in SageMaker capabilities for model registry integration, deployment, and monitoring.

Exam trap

The trap here is that candidates may overcomplicate the solution by choosing custom Lambda or Kubernetes options, missing that SageMaker Pipelines provides a fully managed, code-minimal way to orchestrate canary deployments with automated rollback via CloudWatch alarms.

How to eliminate wrong answers

Option A is wrong because it relies on a manual approval step for rollback, which violates the requirement for fully automated rollback; CloudFormation alone does not provide canary deployment or automatic error-rate comparison. Option C is wrong because it requires custom Lambda code to handle model registry events, canary testing, and endpoint updates, which contradicts the 'least custom code' requirement; SageMaker Pipelines already provides these capabilities natively. Option D is wrong because using Amazon EKS with ArgoCD introduces unnecessary complexity and custom infrastructure, and does not integrate directly with SageMaker Model Registry or provide built-in canary deployment with error-rate-based rollback.
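
One way alarm-driven rollback can be expressed is through the `DeploymentConfig` argument of `update_endpoint`, which accepts a canary traffic-routing policy and a list of CloudWatch alarms. The sketch below builds that structure; the alarm name, canary size, and wait times are assumptions, and the alarm comparing the new and old error rates would have to be defined separately in CloudWatch.

```python
def build_canary_deployment_config(alarm_names, canary_percent=10, wait_seconds=300):
    """Build a DeploymentConfig with canary routing and alarm-based rollback."""
    if not alarm_names:
        raise ValueError("at least one CloudWatch alarm is needed for auto-rollback")
    return {
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "CANARY",
                "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": canary_percent},
                "WaitIntervalInSeconds": wait_seconds,
            },
            # Keep the old fleet around briefly after the full traffic shift.
            "TerminationWaitInSeconds": 120,
        },
        "AutoRollbackConfiguration": {
            "Alarms": [{"AlarmName": name} for name in alarm_names],
        },
    }
```

If any listed alarm fires during the canary window, SageMaker shifts traffic back to the old fleet.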

49

Multi-Selecthard

A team is deploying a model using SageMaker Pipelines. They have defined a pipeline with steps: preprocessing, training, evaluation, and conditional registration. The evaluation step produces a JSON file with metrics. If accuracy > 0.9, the model is registered; else, the pipeline fails. Which TWO statements about this pipeline are correct? (Choose TWO.)

Select 2 answers

A.The evaluation step must output a JSON file in a specific format to be used by the condition step.

B.The condition step can reference the accuracy value using a pipeline parameter or property file.

C.The conditional step should be implemented as a separate Lambda function called from the pipeline.

D.The pipeline will automatically retry the training step if the condition fails.

E.The model registration step should be placed before the condition step to ensure the model is always registered.

AnswersA, B

SageMaker Pipelines expects the evaluation metrics in a JSON file for condition evaluation.

Why this answer

Option A is correct because the SageMaker Pipelines condition step expects the evaluation step to output a JSON file with a specific format, typically containing a metrics dictionary. The condition step then uses a property file to extract the accuracy value from that JSON, enabling the conditional logic to evaluate whether accuracy > 0.9.

Exam trap

The trap here is that candidates may confuse the built-in ConditionStep with a Lambda-based custom step, or assume that pipeline failure triggers automatic retries, when in fact SageMaker Pipelines requires explicit retry policies and does not retry on condition failures.
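
The wiring between the evaluation output and the condition can be sketched as follows. The report layout ("metrics.accuracy.value") and the step names are assumptions; `build_condition_step` uses the SageMaker Python SDK and is shown for orientation only, since it needs the SDK installed and real step objects to run.

```python
def build_evaluation_report(accuracy):
    """Report an evaluation script might write as JSON (layout is an assumption)."""
    return {"metrics": {"accuracy": {"value": float(accuracy)}}}


def passes_threshold(report, threshold=0.9):
    """Local equivalent of the pipeline condition: accuracy > threshold."""
    return report["metrics"]["accuracy"]["value"] > threshold


def build_condition_step(eval_step_name, property_file, register_step, threshold=0.9):
    """Create the ConditionStep; property_file is the PropertyFile declared
    on the evaluation step for its JSON output."""
    from sagemaker.workflow.condition_step import ConditionStep
    from sagemaker.workflow.conditions import ConditionGreaterThan
    from sagemaker.workflow.fail_step import FailStep
    from sagemaker.workflow.functions import JsonGet

    accuracy = JsonGet(
        step_name=eval_step_name,
        property_file=property_file,
        json_path="metrics.accuracy.value",
    )
    return ConditionStep(
        name="CheckAccuracy",
        conditions=[ConditionGreaterThan(left=accuracy, right=threshold)],
        if_steps=[register_step],
        else_steps=[FailStep(name="AccuracyTooLow",
                             error_message="Accuracy did not exceed the threshold")],
    )
```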

50

MCQmedium

A team has a large number of models that need to be deployed for batch inference weekly. They want to minimize cost and management overhead. Which approach is MOST efficient?

A.Use SageMaker Pipelines to run inference as part of the pipeline.

B.Use SageMaker Batch Transform with separate jobs for each model.

C.Create a single SageMaker endpoint for all models and update the model periodically.

D.Deploy each model to a separate SageMaker endpoint and delete after use.

AnswerB

Batch Transform jobs are ephemeral and cost-effective for batch workloads.

Why this answer

SageMaker Batch Transform is the most efficient approach for weekly batch inference because it automatically provisions and terminates compute resources for each job, minimizing cost and management overhead. Running separate jobs for each model allows independent scaling and avoids the complexity of managing persistent endpoints or multi-model hosting for batch workloads.

Exam trap

AWS often tests the distinction between batch and real-time inference, where candidates mistakenly choose persistent endpoints (Option C or D) for batch workloads, overlooking that Batch Transform is purpose-built for cost-efficient, ephemeral batch processing.

How to eliminate wrong answers

Option A is wrong because SageMaker Pipelines is an orchestration service for building and managing ML workflows, not optimized for running batch inference; using it for inference would add unnecessary complexity and cost without the automatic resource teardown of Batch Transform. Option C is wrong because a single endpoint for all models would require frequent model updates and cannot efficiently handle batch inference at scale, leading to idle costs and management overhead. Option D is wrong because deploying each model to a separate endpoint and deleting after use incurs significant provisioning delays and cost for endpoint creation/teardown, whereas Batch Transform handles this automatically with managed instances.


51

MCQeasy

A company uses Amazon SageMaker to train and deploy machine learning models. They need to run batch predictions on 10 TB of data stored in Amazon S3 every night. The model is a PyTorch neural network that fits in GPU memory. The predictions are not time-sensitive, but the job must complete within 8 hours. Which approach would be the MOST cost-effective?

A.Use SageMaker processing job with a script to load the model and run inference.

B.Create a real-time endpoint and send all data as a large batch.

C.Use multiple ml.c5.4xlarge instances in a batch transform job with custom partitioning.

D.Use SageMaker batch transform with a single ml.p3.2xlarge instance.

AnswerD

A single GPU instance can handle the workload within 8 hours, minimizing cost. Batch transform is designed for high-throughput inference.

Why this answer

Option D is the most cost-effective because SageMaker batch transform with a single ml.p3.2xlarge instance provides GPU acceleration for the PyTorch neural network, which fits in GPU memory, and can process 10 TB of data within 8 hours. Batch transform automatically handles data partitioning and inference, eliminating the need for custom orchestration, and the single instance avoids the overhead and cost of multiple instances. The ml.p3.2xlarge offers a balance of GPU compute and cost, making it ideal for non-time-sensitive nightly batch jobs.

Exam trap

AWS often tests the misconception that multiple CPU instances are more cost-effective than a single GPU instance for batch inference, but the trap here is that GPU acceleration dramatically reduces processing time and instance count for neural networks, making a single GPU instance cheaper overall than a cluster of CPU instances.

How to eliminate wrong answers

Option A is wrong because SageMaker processing jobs are designed for data preprocessing and postprocessing, not optimized for running inference on large datasets; they lack built-in inference features like automatic data partitioning and model loading, leading to higher development effort and potential inefficiency. Option B is wrong because real-time endpoints are intended for low-latency, synchronous requests and are not designed for batch processing; sending 10 TB of data as a large batch would run into request size limits and timeouts, and because a real-time endpoint is billed per instance-hour for as long as it exists, it would also accrue idle cost outside the nightly window. Option C is wrong because using multiple ml.c5.4xlarge instances (CPU-only) for a GPU-optimized PyTorch neural network would be significantly slower and more expensive per inference compared to a single GPU instance, as CPU instances lack the parallel processing power needed for neural network inference, and custom partitioning adds unnecessary complexity.
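To make the 8-hour constraint concrete, the short calculation below works out the sustained throughput the job needs. It assumes decimal units (1 TB = 10^12 bytes); whether one ml.p3.2xlarge actually sustains this rate depends on the model and payload, and the question simply takes it as given.

```python
DATASET_BYTES = 10 * 10**12    # 10 TB, decimal units assumed
WINDOW_SECONDS = 8 * 60 * 60   # 8-hour completion window from the question

required_bytes_per_second = DATASET_BYTES / WINDOW_SECONDS
required_mb_per_second = required_bytes_per_second / 10**6

print(f"Sustained throughput needed: {required_mb_per_second:.1f} MB/s")
# prints "Sustained throughput needed: 347.2 MB/s"
```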


52

MCQeasy

A data science team needs to deploy a PyTorch model for real-time inference with low latency. The model requires GPU acceleration. Which SageMaker endpoint configuration should they use?

A.Create a multi-model endpoint using ml.m5.large instances

B.Create a serverless endpoint with memory set to 6144 MB

C.Create a batch transform job using an ml.c5.xlarge instance

D.Create a real-time endpoint using an ml.p3.2xlarge instance

AnswerD

Real-time endpoints support GPU instances for low-latency inference.

Why this answer

Option D is correct because real-time SageMaker endpoints with GPU instances like ml.p3.2xlarge are specifically designed for low-latency, synchronous inference with GPU acceleration. PyTorch models requiring GPU must use instance types that support NVIDIA CUDA, and the ml.p3 family provides the necessary GPU compute for real-time predictions.

Exam trap

The trap here is that candidates may confuse batch transform jobs or serverless endpoints with real-time inference, overlooking the explicit GPU requirement and the need for persistent, low-latency compute resources.

How to eliminate wrong answers

Option A is wrong because multi-model endpoints using ml.m5.large instances are CPU-based and lack GPU acceleration, making them unsuitable for PyTorch models that require GPU for low-latency inference. Option B is wrong because serverless endpoints do not support GPU acceleration; they are limited to CPU compute and cannot meet the GPU requirement. Option C is wrong because batch transform jobs are designed for asynchronous, offline inference on large datasets, not for real-time, low-latency predictions.
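A minimal sketch of the endpoint configuration behind option D follows. It only builds the request body for the SageMaker `create_endpoint_config` API; the model name, configuration name, and variant name are hypothetical, and nothing is sent to AWS.

```python
def build_endpoint_config(model_name: str, config_name: str) -> dict:
    """Build keyword arguments for a GPU-backed real-time endpoint configuration."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",       # hypothetical variant name
                "ModelName": model_name,
                "InstanceType": "ml.p3.2xlarge",   # GPU instance from option D
                "InitialInstanceCount": 1,
            }
        ],
    }


# Hypothetical names, used only for illustration.
request = build_endpoint_config("pytorch-gpu-model", "pytorch-gpu-config")

# With AWS credentials configured, the request could be submitted as:
#   boto3.client("sagemaker").create_endpoint_config(**request)
```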


53

MCQmedium

An ML team uses SageMaker Model Registry to manage model versions. They want to automatically deploy a model to a staging endpoint when a new version is approved. Which AWS service can orchestrate this?

A.Amazon EventBridge

B.AWS Lambda

C.SageMaker Pipelines

D.AWS Step Functions

AnswerD

Step Functions can coordinate multiple steps including approval and deployment.

Why this answer

AWS Step Functions is the correct choice because it can orchestrate a workflow that starts when a Model Registry event (e.g., model version approval) is routed to it through Amazon EventBridge and then deploys the model to a staging endpoint using SageMaker API calls. Step Functions provides built-in integration with SageMaker via service integrations, allowing you to chain approval checks, model creation, and endpoint deployment without custom code.

Exam trap

The trap here is that candidates often pick SageMaker Pipelines (Option C) because it is associated with model workflows, but Pipelines is for training and registration, not for post-approval deployment orchestration, which requires a state machine like Step Functions.

How to eliminate wrong answers

Option A is wrong because Amazon EventBridge can detect the approval event but cannot directly orchestrate the deployment workflow; it would need to invoke another service like Step Functions or Lambda to perform the deployment steps. Option B is wrong because AWS Lambda can execute deployment logic but lacks native workflow orchestration features like retries, branching, or state management, making it less suitable for multi-step orchestration. Option C is wrong because SageMaker Pipelines is designed for building, training, and registering ML models, not for orchestrating post-approval deployment to a staging endpoint; it does not natively trigger on Model Registry approval events.
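The approval event that starts such a workflow can be described with an EventBridge event pattern. The sketch below shows one possible pattern as a Python dictionary, plus a deliberately simplified matcher to show which events it would select. The model package group name is hypothetical, and the matcher covers only exact-value matching, not the full EventBridge pattern syntax.

```python
import json

# Pattern for "a model package version was approved" (group name is hypothetical).
approved_model_pattern = {
    "source": ["aws.sagemaker"],
    "detail-type": ["SageMaker Model Package State Change"],
    "detail": {
        "ModelPackageGroupName": ["fraud-models"],
        "ModelApprovalStatus": ["Approved"],
    },
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: every pattern key must match; a list means 'any of'."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# JSON form, as it would be supplied when creating an EventBridge rule.
pattern_json = json.dumps(approved_model_pattern, indent=2)
```

The rule's target would then be the Step Functions state machine that performs the staging deployment.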


54

Multi-Selecthard

An organization is deploying a large language model on SageMaker and needs to optimize inference costs while maintaining low latency. Which three strategies should they consider? (Select THREE.)

Select 3 answers

A.Use SageMaker Inference Recommender to find optimal instance and configuration.

B.Enable SageMaker Model Parallelism for inference.

C.Use SageMaker Elastic Inference to attach GPU acceleration.

D.Deploy the model to a multi-model endpoint.

E.Use SageMaker Batch Transform for real-time requests.

AnswersA, C, D

Inference Recommender provides cost-performance recommendations.

Why this answer

A is correct because SageMaker Inference Recommender runs load tests against your model to recommend the most cost-effective instance type and configuration (e.g., instance count, container parameters) that meets your latency and throughput requirements. This eliminates guesswork and ensures you are not over-provisioning or under-provisioning resources, directly optimizing inference costs.

Exam trap

AWS often tests the distinction between training parallelism (Model Parallelism) and inference optimization, leading candidates to incorrectly select Model Parallelism for inference cost savings.


55

Multi-Selectmedium

A company wants to deploy a PyTorch model on SageMaker for real-time inference. Which two steps are required? (Select TWO.)

Select 2 answers

A.Upload the training data to an S3 bucket.

B.Register the model in the SageMaker Model Registry.

C.Package the model artifacts into a tar.gz file.

D.Create a SageMaker endpoint configuration with the desired instance type.

E.Set up a SageMaker Notebook instance.

AnswersC, D

SageMaker expects model artifacts in a tar.gz format.

Why this answer

Option C is correct because SageMaker requires model artifacts to be packaged as a single tar.gz file (containing the model weights, serialized PyTorch model, and any dependencies) for deployment. This compressed archive is uploaded to S3 and referenced when creating the model object for real-time inference.

Exam trap

The trap here is that candidates often confuse the optional Model Registry step (B) as mandatory for deployment, or mistakenly think uploading training data (A) is needed for inference, when in fact only the model artifact packaging (C) and endpoint configuration (D) are the two required steps for real-time inference on SageMaker.
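The packaging step in option C can be sketched with Python's standard `tarfile` module. The `model.pth` file name and directory layout are placeholders for this example; a real archive would hold the serialized model and any inference code the container expects.

```python
import tarfile
import tempfile
from pathlib import Path


def package_model(model_dir: Path, archive_path: Path) -> Path:
    """Pack the contents of model_dir into a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as archive:
        for item in sorted(model_dir.iterdir()):
            # arcname keeps files at the archive root instead of under model_dir
            archive.add(item, arcname=item.name)
    return archive_path


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "model"
    model_dir.mkdir()
    (model_dir / "model.pth").write_bytes(b"placeholder weights")

    archive = package_model(model_dir, Path(tmp) / "model.tar.gz")
    with tarfile.open(archive, "r:gz") as check:
        packaged_names = check.getnames()
```

The resulting `model.tar.gz` is what gets uploaded to S3 and referenced when the model object is created.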


56

MCQeasy

An ML engineer needs to deploy a model as an AWS Lambda function for serverless inference. The model is a scikit-learn pipeline serialized as a pickle file. What is the best way to include the model in the Lambda deployment?

A.Create a Lambda layer with the model file and use it in the function

B.Use API Gateway to proxy requests to the model stored in S3

C.Store the model in S3 and download it on every invocation

D.Mount an EFS file system containing the model

AnswerA

A layer allows the model to be included without increasing the function code size.

Why this answer

Option A is correct because Lambda layers allow you to package and include large dependencies, such as a serialized scikit-learn pipeline, separately from your function code. Layers are extracted into the /opt directory and are available across function invocations without cold-start overhead from downloading, making them the most efficient and best-practice approach for bundling static model artifacts in serverless inference.

Exam trap

The trap here is that candidates may think downloading from S3 on every invocation (Option C) is acceptable for serverless, but they overlook the severe cold-start latency and cost implications, or they confuse API Gateway's role as a proxy (Option B) without realizing it still needs a compute backend.

How to eliminate wrong answers

Option B is wrong because API Gateway is a front-end service for creating RESTful APIs; it cannot proxy requests directly to a model stored in S3 — you would still need a compute layer (like Lambda) to load the model and run inference. Option C is wrong because downloading the model from S3 on every invocation introduces significant latency and cost, and may cause timeouts or throttling under load; models should be loaded once and reused across invocations. Option D is wrong because mounting an EFS file system adds complexity, cost, and potential cold-start delays, and is overkill for a static pickle file that can be included directly in a layer; EFS is better suited for large, dynamic datasets that need concurrent access across multiple functions.
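The "load once and reuse" point can be sketched as a handler that reads the pickle from the layer's `/opt` directory on first use and caches it for later invocations in the same execution environment. The `/opt/model.pkl` path, the `MODEL_PATH` variable, and the `{"features": [...]}` event shape are assumptions for illustration.

```python
import os
import pickle

# Layers are extracted under /opt; the exact file name here is hypothetical.
MODEL_PATH = os.environ.get("MODEL_PATH", "/opt/model.pkl")

_model = None  # cached across invocations while the execution environment is warm


def get_model(path: str = MODEL_PATH):
    """Load the pickled pipeline once and return the cached object afterwards."""
    global _model
    if _model is None:
        with open(path, "rb") as handle:
            _model = pickle.load(handle)
    return _model


def lambda_handler(event, context):
    """Run one prediction; the event is assumed to look like {"features": [...]}."""
    model = get_model()
    predictions = model.predict([event["features"]])
    return {"prediction": [float(p) for p in predictions]}
```

Only the first invocation in each execution environment pays the deserialization cost; later ones reuse the cached object.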


57

MCQeasy

A data scientist needs to version and manage multiple models for a team of five. The team frequently experiments with different algorithms and hyperparameters. They need a centralized registry to store, deploy, and compare model versions. Which AWS service should the data scientist use?

A.Store each model artifact in Amazon S3 with manual versioning in the key name.

B.Use AWS Config to track model version changes.

C.Use AWS CodeArtifact to store model packages.

D.Use Amazon SageMaker Model Registry.

AnswerD

Model Registry provides centralized version control, metadata, and approval statuses (PendingManualApproval, Approved, Rejected).

Why this answer

Amazon SageMaker Model Registry is the correct choice because it provides a centralized repository specifically designed for cataloging, versioning, approving, and deploying machine learning models. It integrates natively with SageMaker pipelines and endpoints, enabling the team to compare model versions, manage metadata (e.g., hyperparameters, metrics), and promote models through stages (e.g., from staging to production) with approval workflows.

Exam trap

The trap here is that candidates confuse AWS CodeArtifact (a package manager for code libraries) with a model registry, overlooking that SageMaker Model Registry is purpose-built for ML model versioning, metadata tracking, and deployment orchestration.

How to eliminate wrong answers

Option A is wrong because manual versioning in S3 key names lacks built-in model metadata tracking, approval workflows, and deployment integration, making it error-prone and unscalable for a team of five. Option B is wrong because AWS Config is a service for auditing and evaluating resource configurations (e.g., compliance rules), not for versioning or managing ML model artifacts. Option C is wrong because AWS CodeArtifact is a package management service for software libraries (e.g., Python packages, Maven artifacts), not for storing and versioning trained ML model artifacts or their metadata.


58

MCQhard

Your team manages a SageMaker real-time endpoint for a financial services application that requires low latency for fraud detection. The model is a 1 GB XGBoost model. The endpoint is deployed on two ml.m5.xlarge instances with target tracking auto-scaling based on average CPU utilization at 70%. During peak hours, the endpoint receives a sudden burst of traffic that increases from 500 requests per second to 2000 requests per second within 30 seconds. Many requests start failing with 503 errors. The CPU utilization metric shows that the instances are at 90% before the scaling policy launches new instances. However, by the time the new instances are added (approximately 3 minutes), the burst has subsided. You need to prevent these failures during future bursts while keeping costs reasonable. Which action would be MOST effective?

A.Reduce the target tracking scaling metric to 45% CPU utilization and set a warm-up time of 120 seconds.

B.Change the scaling policy to step scaling with a lower cooldown (60 seconds) and add an alarm on invocation count.

C.Replace the two m5.xlarge instances with one m5.2xlarge instance and keep the same scaling policy.

D.Implement scheduled scaling to add two instances 5 minutes before the expected peak hour.

AnswerA

Lowering the threshold triggers scaling earlier, and warm-up ensures new instances are ready before receiving traffic.

Why this answer

Option A is correct because reducing the target tracking scaling metric to 45% CPU utilization triggers scaling actions earlier, before the burst pushes CPU to 90%. Setting a warm-up time of 120 seconds ensures new instances are fully initialized and ready to serve traffic, preventing the 503 errors caused by the 3-minute lag in instance availability.

Exam trap

AWS often tests the misconception that changing the scaling type or shortening cooldowns alone can absorb a sudden burst, when the real bottleneck is the time new instances need to become fully operational; the policy has to trigger earlier so that capacity is ready before utilization saturates.

How to eliminate wrong answers

Option B is wrong because step scaling with a lower cooldown (60 seconds) does not address the root cause: the scaling action still takes ~3 minutes to launch new instances, and reducing cooldown only affects how quickly subsequent scaling actions can occur, not the initial delay. Option C is wrong because replacing two m5.xlarge instances (4 vCPUs each) with one m5.2xlarge instance (8 vCPUs) leaves total compute capacity unchanged at 8 vCPUs, so it adds no headroom for bursts, and it removes the redundancy of running on two instances. Option D is wrong because scheduled scaling adds instances 5 minutes before the expected peak hour, but the burst is unpredictable and occurs within 30 seconds, so scheduled scaling cannot react to sudden, unplanned traffic spikes.


59

MCQmedium

A machine learning engineer has configured a SageMaker Model Monitor schedule for data quality monitoring as shown in the exhibit. The schedule is set to run hourly. However, the engineer notices that the monitoring jobs are not producing output in the specified S3 bucket. What is the most likely cause?

A.The output_path is incorrectly placed; it should be under the MonitoringOutputConfig.

B.The DataAnalysisStartTime and DataAnalysisEndTime are set to a past date, so no data is analyzed.

C.The MonitoringType should be 'ModelQuality' to enable data quality monitoring.

D.The cron expression is incorrectly formatted for an hourly schedule.

AnswerB

The monitoring job looks for data within the specified time range; if it's in the past and no data exists, no output is produced.

Why this answer

Option B is correct because the DataAnalysisStartTime and DataAnalysisEndTime parameters define the time window for which SageMaker Model Monitor analyzes data. When both point to a fixed window that has already passed, the monitoring job finds no new data to analyze within that window, resulting in no output being written to the S3 bucket. The schedule runs hourly, but the analysis window is fixed to a historical period, so each execution produces no results.

Exam trap

The trap here is that candidates often overlook the significance of the DataAnalysisStartTime and DataAnalysisEndTime parameters, assuming they are optional or default to the current time, when in fact they strictly define the data range and can cause silent failures if set to a past date.

How to eliminate wrong answers

Option A is wrong because the output_path is correctly placed under the MonitoringOutputConfig in the exhibit; SageMaker Model Monitor requires the output location to be specified within MonitoringOutputConfig, not as a separate top-level parameter. Option C is wrong because the MonitoringType should be 'DataQuality' for data quality monitoring, not 'ModelQuality'; 'ModelQuality' is used for model quality monitoring (e.g., accuracy, precision), which is a different monitoring type. Option D is wrong because the cron expression 'cron(0 * * * ? *)' is correctly formatted for an hourly schedule at the start of each hour; there is no syntax error in the expression.


60

MCQeasy

A company deploys a deep learning model to a real-time SageMaker endpoint. After deployment, users report high inference latency. Which action is the MOST effective first step to reduce latency?

A.Switch to a larger instance type with more GPU memory.

B.Compile the model using SageMaker Neo to optimize for the target instance.

C.Enable SageMaker Model Monitor to capture inference data.

D.Increase the number of instances in the endpoint to handle more requests.

AnswerB

Neo optimizes the model for the specific hardware, reducing inference latency with minimal accuracy loss.

Why this answer

SageMaker Neo compiles the trained model to optimize it for the target instance hardware, reducing inference latency without requiring additional resources. This is the most effective first step because it directly addresses model execution efficiency, often yielding significant speedups for deep learning models.

Exam trap

The trap here is that candidates often confuse latency reduction with throughput improvement, incorrectly choosing horizontal scaling (Option D) or vertical scaling (Option A) as the first step, when model optimization via compilation is the most direct and cost-effective approach.

How to eliminate wrong answers

Option A is wrong because switching to a larger instance type with more GPU memory may reduce latency if the model is memory-bound, but it is not the most effective first step—it increases cost and does not address software-level inefficiencies. Option C is wrong because SageMaker Model Monitor is used for capturing inference data to detect data drift and model quality issues, not for reducing latency. Option D is wrong because increasing the number of instances (horizontal scaling) improves throughput and handles more concurrent requests, but it does not reduce the latency of individual inference requests; it may even add network overhead.


61

MCQmedium

A company uses SageMaker Pipelines to train and register models. They want to automate the deployment of approved models from the model registry to a staging endpoint. Which service should they use to orchestrate the deployment workflow?

A.AWS Step Functions

B.AWS CloudFormation

C.Amazon EventBridge

D.AWS CodePipeline

AnswerA

Step Functions can orchestrate SageMaker API calls and integrate with Model Registry.

Why this answer

AWS Step Functions is the correct choice because it is a serverless orchestration service designed to coordinate multiple AWS services into flexible, event-driven workflows. For SageMaker Pipelines, Step Functions can trigger model deployment from the registry to a staging endpoint by chaining actions like invoking a Lambda function for approval checks, calling SageMaker's CreateEndpoint API, and handling rollback logic on failure.

Exam trap

AWS often tests the distinction between orchestration (Step Functions) and event routing (EventBridge) or CI/CD (CodePipeline), leading candidates to pick EventBridge because they confuse event-driven triggers with the need for sequential workflow coordination.

How to eliminate wrong answers

Option B (AWS CloudFormation) is wrong because it is an Infrastructure as Code (IaC) service for provisioning and managing AWS resources declaratively, not for orchestrating event-driven deployment workflows with conditional logic and error handling. Option C (Amazon EventBridge) is wrong because it is a serverless event bus for routing events between services, but it lacks built-in workflow orchestration capabilities like sequencing, branching, and human approval steps required for deployment pipelines. Option D (AWS CodePipeline) is wrong because it is a CI/CD service focused on source code build, test, and deploy stages, but it does not natively integrate with SageMaker model registry approval workflows or provide the granular orchestration needed for ML model deployment from registry to endpoint.


62

MCQmedium

A media company uses SageMaker to host a real-time video recommendation model. The model is deployed on a single ml.c5.xlarge endpoint. During a major live event, traffic surges to 10 times the normal load, and the endpoint becomes unresponsive, causing high latency and errors. The team had set up an Application Auto Scaling target tracking policy based on CPU utilization with a target of 70%. However, scaling did not trigger quickly enough. After the event, the team reviews CloudWatch metrics and notices that CPU utilization never exceeded 70% during the surge, but memory utilization peaked at 95%. The model is memory-bound. The team wants to ensure the endpoint scales automatically before performance degrades during future events. What should the team do?

A.Change the target tracking metric to memory utilization and set a target of 70%

B.Increase the target CPU utilization to 90% so that scaling triggers at higher load

C.Change the endpoint instance type to ml.c5.4xlarge to provide more memory per instance

D.Create a scheduled scaling policy to add instances during the known event time

AnswerA

Memory is the bottleneck; scaling on memory utilization will trigger before memory runs out.

Why this answer

Option A is correct because the model is memory-bound, and the current CPU-based target tracking policy failed to trigger scaling since CPU utilization never exceeded 70% during the surge. By switching to a memory utilization metric with a target of 70%, scaling will activate based on the actual resource constraint (memory), preventing performance degradation before the endpoint becomes unresponsive.

Exam trap

The trap here is that candidates assume CPU utilization is always the correct metric for scaling, but the question explicitly states the model is memory-bound, so the scaling policy must match the actual bottleneck to be effective.

How to eliminate wrong answers

Option B is wrong because increasing the CPU target to 90% does not address the root cause: CPU utilization never exceeded 70% during the surge, so the policy would still not trigger scaling. Option C is wrong because changing to a larger instance type (ml.c5.4xlarge) provides more memory per instance but does not enable automatic scaling; the endpoint would still be a single instance and could become overwhelmed under similar traffic spikes. Option D is wrong because scheduled scaling only helps when a surge arrives at a known time and size; the team wants the endpoint to scale automatically on actual load before performance degrades, and a schedule would not react to surges that start early, run long, or arrive unannounced.
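A memory-based target tracking policy along the lines of option A could be expressed as the request body below for Application Auto Scaling's `put_scaling_policy`. It is a sketch under assumptions: the endpoint, variant, and policy names are hypothetical, and it assumes the endpoint's `MemoryUtilization` metric is referenced through a customized metric specification in the `/aws/sagemaker/Endpoints` namespace. Nothing is sent to AWS.

```python
def memory_target_tracking(endpoint_name: str, variant_name: str,
                           target_percent: float = 70.0) -> dict:
    """Build keyword arguments for a target tracking policy on memory utilization."""
    return {
        "PolicyName": "memory-target-tracking",  # hypothetical policy name
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_percent,  # 70% target from option A
            "CustomizedMetricSpecification": {
                "MetricName": "MemoryUtilization",
                "Namespace": "/aws/sagemaker/Endpoints",
                "Dimensions": [
                    {"Name": "EndpointName", "Value": endpoint_name},
                    {"Name": "VariantName", "Value": variant_name},
                ],
                "Statistic": "Average",
                "Unit": "Percent",
            },
        },
    }


# Hypothetical endpoint and variant names.
policy = memory_target_tracking("video-recs", "AllTraffic")

# After registering the variant as a scalable target, this could be applied with:
#   boto3.client("application-autoscaling").put_scaling_policy(**policy)
```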


63

MCQeasy

A data science team uses SageMaker notebooks to develop models. They want to automate the process of training and registering models whenever new data arrives in an S3 bucket. The team has limited DevOps experience and needs a solution that requires minimal maintenance. Which approach should the team use?

A.Configure an S3 event notification to trigger an AWS Step Functions state machine that runs a SageMaker Pipeline.

B.Use AWS Glue to detect new data and trigger a SageMaker training job via a Lambda function.

C.Write a Python script that runs on a scheduled EC2 instance to check S3 for new data and trigger training.

D.Use Amazon EventBridge to schedule a SageMaker training job every hour, regardless of whether new data exists.

AnswerA

Step Functions orchestrates training and model registration serverlessly, triggered by new data.

Why this answer

Option A is correct because an S3 event notification, delivered through Amazon EventBridge, can start an AWS Step Functions state machine, which orchestrates a SageMaker Pipeline to automate model training and registration when new data arrives. This serverless approach requires minimal maintenance and aligns with the team's limited DevOps experience, as Step Functions handles retries, error handling, and workflow coordination without custom infrastructure.

Exam trap

The trap here is that candidates often choose a scheduled approach (Option D) or a Lambda-based trigger (Option B) because they seem simpler, but the exam tests the ability to select the fully managed, event-driven orchestration (Step Functions + SageMaker Pipeline) that minimizes operational burden while ensuring conditional execution based on new data.

How to eliminate wrong answers

Option B is wrong because AWS Glue is primarily an ETL service, not designed to detect new S3 objects; using it for this purpose adds unnecessary complexity and cost, and the Lambda trigger for training jobs would still require custom orchestration. Option C is wrong because running a Python script on a scheduled EC2 instance introduces manual maintenance overhead (patching, scaling, monitoring) and violates the 'minimal maintenance' requirement. Option D is wrong because scheduling a training job every hour with EventBridge ignores the condition of new data, leading to wasteful training runs and potential model versioning issues when no new data exists.


64

Multi-Selectmedium

A company uses SageMaker Pipelines for model training and wants to incorporate model evaluation before deployment into production. Which THREE components are essential? (Choose three.)

Select 3 answers

A.A model registry approval step

B.A batch transform step for evaluation

C.A condition step in the pipeline

D.A human review step

E.A SageMaker Processing step for evaluation

AnswersA, C, E

Approval step creates a model version with approval status to gate deployment.

Why this answer

A model registry approval step is essential because it gates the deployment of a model based on its evaluation results. In SageMaker Pipelines, you register the model to the Model Registry after training, and the approval status (e.g., Approved or Rejected) determines whether downstream deployment steps execute. This ensures only models meeting quality thresholds are promoted to production.

Exam trap

The trap here is that candidates confuse batch transform (used for inference) with model evaluation (which requires a Processing step to compute metrics), and they overlook that a condition step is the core decision-making component, not a human review step.


65

MCQmedium

A company uses Amazon SageMaker Pipelines to automate its ML workflow. The pipeline includes a training step and a model evaluation step. If the evaluation step fails, the pipeline should stop and notify the team. How should the company configure the pipeline?

A.Define a ConditionStep that checks the evaluation metric and fail the pipeline if the metric is below a threshold.

B.Use Amazon SageMaker Model Monitor to detect failures in the evaluation step.

C.Create an AWS Step Function state machine that monitors the pipeline and stops it on failure.

D.Configure an Amazon CloudWatch alarm on the evaluation step's execution time to stop the pipeline.

AnswerA

A ConditionStep can be used to evaluate metrics and fail the pipeline if conditions are not met.

Why this answer

Option A is correct because SageMaker Pipelines natively supports a ConditionStep that can evaluate a metric (e.g., model accuracy) and branch the pipeline execution. By configuring the ConditionStep to check whether the evaluation metric falls below a threshold and routing the failing branch to a FailStep, you stop the pipeline with a failed status; the team can then be notified, for example through an EventBridge rule on pipeline execution status changes that publishes to SNS. This is the idiomatic, pipeline-native way to halt execution on evaluation failure without an external orchestrator.

Exam trap

The trap here is that candidates confuse SageMaker Pipelines' built-in conditional branching (ConditionStep) with external monitoring services like Model Monitor or Step Functions, assuming that pipeline failures must be handled outside the pipeline itself.

How to eliminate wrong answers

Option B is wrong because Amazon SageMaker Model Monitor is designed for detecting data drift and model quality degradation in production endpoints, not for halting a pipeline execution step. Option C is wrong because while AWS Step Functions can orchestrate SageMaker Pipelines, creating a separate state machine to monitor and stop the pipeline adds unnecessary complexity and latency; the pipeline itself should handle conditional failures internally. Option D is wrong because a CloudWatch alarm on execution time would only stop the pipeline based on a timeout, not on the actual evaluation metric result, and it cannot directly fail the pipeline step based on model performance.

66

MCQ · easy

A team wants to use a custom container for inference on SageMaker. The container needs to implement a web server that responds to API requests. Which protocol and port must the container listen on to be compatible with SageMaker hosting?

A. The container must listen on port 8080 and use HTTPS protocol.

B. The container must listen on port 8080 and use HTTP protocol.

C. The container can listen on any port as long as the port is specified in the endpoint configuration.

D. The container must listen on port 8000 and use HTTP protocol.

Answer: B

SageMaker expects HTTP on port 8080 for /invocations and /ping.

Why this answer

SageMaker requires custom inference containers to listen on port 8080 and communicate over HTTP (not HTTPS). The SageMaker hosting service uses a proxy that terminates HTTPS and forwards plain HTTP requests to the container on port 8080. This ensures compatibility with the built-in model serving infrastructure.
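
As an illustration, a minimal sketch of a container web server that answers `GET /ping` and `POST /invocations` over plain HTTP on port 8080, using only the Python standard library. The `predict` function and the `features` request field are hypothetical placeholders for a real model, and a production container would normally use a more robust server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(data: dict) -> float:
    """Placeholder model (assumption): sums the 'features' list."""
    return float(sum(data.get("features", [])))


class InferenceHandler(BaseHTTPRequestHandler):
    def _reply(self, status: int, body: dict) -> None:
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self):
        # Health check: a 200 response signals that the container is ready.
        if self.path == "/ping":
            self._reply(200, {"status": "ok"})
        else:
            self._reply(404, {"error": "not found"})

    def do_POST(self):
        # Inference requests arrive as POST /invocations.
        if self.path != "/invocations":
            self._reply(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        data = json.loads(self.rfile.read(length) or b"{}")
        self._reply(200, {"prediction": predict(data)})

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; real containers should log requests


def make_server(host: str = "0.0.0.0", port: int = 8080) -> HTTPServer:
    """Create the server; plain HTTP on port 8080 by default."""
    return HTTPServer((host, port), InferenceHandler)
```

Inside the container, the entrypoint would call `make_server().serve_forever()`.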

Exam trap

The trap here is that candidates assume SageMaker requires HTTPS for security, but the service actually handles encryption externally, so the container must use plain HTTP on port 8080.

How to eliminate wrong answers

Option A is wrong because SageMaker's proxy handles TLS termination, so the container must use HTTP, not HTTPS; using HTTPS would cause a protocol mismatch and connection failure. Option C is wrong because SageMaker mandates port 8080 for custom containers; the endpoint configuration does not allow overriding this port. Option D is wrong because the required port is 8080, not 8000; port 8000 is not recognized by SageMaker's hosting proxy.

67

Multi-Select · medium

An ML team is running multiple SageMaker endpoints for various models. The monthly cost is higher than expected. Which TWO actions would help reduce costs without negatively impacting performance?

Select 2 answers

A. Consolidate multiple small models into a single Multi-Model Endpoint on a larger instance.

B. Increase the number of minimum instances to handle traffic spikes without scaling.

C. Right-size the instances by analyzing CloudWatch metrics and reducing instance size for underutilized endpoints.

D. Limit the maximum number of concurrent invocations per endpoint.

E. Use scheduled scaling to turn off endpoints during non-business hours.

Answers: A, C

Multi-Model Endpoints reduce cost by sharing an instance among multiple models.

Why this answer

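Option A is correct because SageMaker Multi-Model Endpoints allow you to host multiple small models on a single endpoint behind a common serving container, sharing the underlying instance resources. This reduces the number of endpoints and instances needed, which lowers cost; models are loaded and unloaded dynamically based on traffic, so frequently used models stay in memory. Option C is also correct because right-sizing replaces over-provisioned instances with smaller ones on endpoints whose CloudWatch metrics show low utilization, which cuts cost while leaving enough capacity for the observed load.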
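
A minimal sketch of how a Multi-Model Endpoint is typically wired up with boto3: one container definition in `MultiModel` mode pointing at an S3 prefix, and a `TargetModel` argument on each invocation to pick the artifact. The endpoint name, image URI, S3 prefix, and artifact name are hypothetical.

```python
import json


def multi_model_container(image_uri: str, model_prefix: str) -> dict:
    """Container definition for create_model: one container, many artifacts under a prefix."""
    return {
        "Image": image_uri,
        "ModelDataUrl": model_prefix,  # S3 prefix that holds the model archives
        "Mode": "MultiModel",
    }


def invoke_args(endpoint_name: str, target_model: str, payload: dict) -> dict:
    """Keyword arguments for sagemaker-runtime invoke_endpoint on a multi-model endpoint."""
    return {
        "EndpointName": endpoint_name,
        "TargetModel": target_model,  # artifact path relative to the S3 prefix
        "ContentType": "application/json",
        "Body": json.dumps(payload),
    }


def invoke(endpoint_name: str, target_model: str, payload: dict) -> dict:
    """Call the endpoint; needs boto3 and AWS credentials, so it is not run here."""
    import boto3  # imported lazily so the helpers above work without boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(**invoke_args(endpoint_name, target_model, payload))
    return json.loads(response["Body"].read())
```

For example, `invoke("shared-endpoint", "churn-v3.tar.gz", {"features": [0.2, 0.7]})` would route the request to one specific model hosted on the shared instance.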

Exam trap

The trap here is that candidates may confuse cost reduction with availability or scaling strategies, incorrectly assuming that reducing instance count or limiting concurrency is always beneficial, without considering the impact on performance or the specific capabilities of SageMaker Multi-Model Endpoints.

68

MCQ · hard

A company is deploying a large model (10GB) for real-time inference. The inference latency is too high. What optimization technique can help?

A. Increase the endpoint's memory allocation

B. Switch to a batch transform job

C. Use SageMaker Neo to compile the model for the target instance

D. Reduce the model size by quantization

Answer: C

Neo optimizes the model for inference speed on specific hardware.

Why this answer

SageMaker Neo compiles the model to optimize it for the target instance hardware, reducing inference latency without sacrificing accuracy. This is especially effective for large models (e.g., 10GB) where runtime performance gains come from hardware-specific optimizations like instruction set tuning and memory access pattern improvements.
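
A minimal sketch of a Neo compilation request built for boto3's `create_compilation_job`. The job name, role ARN, S3 locations, input shape, framework, and the `ml_c5` target are hypothetical values chosen for illustration; whether a given model compiles depends on its framework and operators.

```python
import json
from typing import Optional


def neo_compilation_request(
    job_name: str,
    role_arn: str,
    model_s3_uri: str,
    output_s3_uri: str,
    input_shape: dict,
    framework: str = "PYTORCH",
    target_device: str = "ml_c5",
    framework_version: Optional[str] = None,
) -> dict:
    """Build the keyword arguments for sagemaker create_compilation_job."""
    input_config = {
        "S3Uri": model_s3_uri,
        # Neo expects the input tensor names and shapes as a JSON string.
        "DataInputConfig": json.dumps(input_shape),
        "Framework": framework,
    }
    if framework_version is not None:
        input_config["FrameworkVersion"] = framework_version
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": input_config,
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetDevice": target_device,  # compile for the instance family that will serve the model
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }


def start_compilation(request: dict) -> dict:
    """Submit the job; needs boto3 and AWS credentials, so it is not run here."""
    import boto3  # imported lazily so the builder above works without boto3

    return boto3.client("sagemaker").create_compilation_job(**request)
```

The compiled artifact written to the output location is then used as the model data for the endpoint on the matching instance type.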

Exam trap

The trap here is that candidates often assume quantization (Option D) is the only way to reduce latency for large models, but they overlook SageMaker Neo's compilation, which optimizes without accuracy loss and is specifically designed for deployment scenarios.

How to eliminate wrong answers

Option A is wrong because increasing memory allocation may help with out-of-memory errors but does not directly reduce inference latency; latency is more dependent on compute efficiency and model size. Option B is wrong because batch transform jobs are designed for offline, asynchronous processing, not real-time inference, and switching to batch would increase latency due to queuing and processing delays. Option D is wrong because quantization reduces model size and can improve latency, but it may degrade accuracy and is not a SageMaker-specific optimization; SageMaker Neo provides a more targeted, hardware-aware compilation that preserves accuracy while reducing latency.

69

MCQ · medium

A company is deploying a multi-model endpoint using SageMaker to serve multiple models from a single endpoint. They notice that one model consumes excessive memory and impacts others. What is the BEST practice to isolate resource usage?

A. Configure instance type with more memory.

B. Use separate endpoints for each model.

C. Use SageMaker Model Parallelism.

D. Use multi-model endpoint with model cache size limit.

Answer: B

Separate endpoints provide complete isolation of compute resources.

Why this answer

Option B is correct because using separate endpoints for each model ensures complete resource isolation at the instance level. When one model consumes excessive memory, it cannot impact others because each model runs on its own dedicated endpoint with its own compute resources. This is the best practice for isolating resource usage in production environments where memory-intensive models are deployed.

Exam trap

The trap here is that candidates often assume multi-model endpoints are designed for resource isolation, but in reality they share memory and compute, so the correct answer is to use separate endpoints for strict isolation.

How to eliminate wrong answers

Option A is wrong because simply configuring an instance type with more memory does not isolate resource usage; all models on the same multi-model endpoint still share the same memory pool, so a memory spike in one model can still starve others. Option C is wrong because SageMaker Model Parallelism is designed for splitting large models across multiple GPUs for training, not for isolating resource usage during inference on a multi-model endpoint. Option D is wrong because setting a model cache size limit only controls how many models are cached in memory, but does not prevent a single model from consuming excessive memory once loaded; the memory usage of an individual model is not capped by this setting.

70

MCQ · easy

A data scientist wants to automate retraining of a model weekly and deploy the new model automatically after passing validation. Which AWS service combination is best?

A. SageMaker Pipelines + AWS Step Functions

B. Amazon EventBridge + SageMaker training job

C. Amazon SageMaker Autopilot

D. AWS Lambda + SageMaker training job

Answer: A

SageMaker Pipelines manages training and validation, and Step Functions can orchestrate deployment on approval.

Why this answer

SageMaker Pipelines orchestrates the ML workflow including training and validation, and Step Functions can trigger deployment. SageMaker alone lacks native scheduling, and Lambda cannot orchestrate complex workflows.

71

MCQ · medium

A machine learning team is deploying a fraud detection model using SageMaker. They use the SageMaker Model Registry to track model versions. They want to automatically deploy the latest approved model to a production endpoint whenever a new model version is approved. The team uses a CI/CD pipeline with AWS CodePipeline. The pipeline currently includes a source stage (S3), a build stage (CodeBuild), and a deploy stage (manual approval). They want to automate the deployment of approved models. Which solution will meet these requirements with the least operational overhead?

A. Add a custom action to CodePipeline that uses a SageMaker deployment step.

B. Create a Lambda function that triggers on Model Registry approval events and updates the endpoint using the boto3 SDK.

C. Configure an EventBridge rule to trigger a CodePipeline execution when the model approval status changes.

D. Use SageMaker Pipelines to deploy the model directly upon training completion.

Answer: C

EventBridge natively integrates with Model Registry events and triggers the pipeline automatically.

Why this answer

Option C is correct because it directly integrates SageMaker Model Registry approval events with CodePipeline via EventBridge, enabling fully automated deployment of the latest approved model to a production endpoint with minimal operational overhead. This approach avoids custom code or additional pipeline stages, leveraging native AWS event-driven architecture to trigger the pipeline only when a model version is approved.
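
A minimal sketch of such a rule with boto3: an event pattern that matches approved model packages in one model package group, and a target that starts a CodePipeline execution. The rule name, group name, target ID, and ARNs are hypothetical, and the IAM role is assumed to allow EventBridge to start the pipeline.

```python
import json


def approval_event_pattern(model_package_group: str) -> dict:
    """EventBridge pattern for model packages in a group whose status becomes Approved."""
    return {
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Model Package State Change"],
        "detail": {
            "ModelPackageGroupName": [model_package_group],
            "ModelApprovalStatus": ["Approved"],
        },
    }


def create_approval_rule(rule_name: str, model_package_group: str,
                         pipeline_arn: str, role_arn: str) -> None:
    """Create the rule and attach CodePipeline as target; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the pattern builder works without boto3

    events = boto3.client("events")
    events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(approval_event_pattern(model_package_group)),
        State="ENABLED",
    )
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "deploy-pipeline", "Arn": pipeline_arn, "RoleArn": role_arn}],
    )
```

With this in place, approving a model version in the registry starts the existing pipeline without any polling or custom glue code.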

Exam trap

AWS often tests the misconception that you must build a custom Lambda or pipeline action to integrate SageMaker Model Registry with CodePipeline, when in fact EventBridge provides a native, low-overhead solution for event-driven pipeline triggers.

How to eliminate wrong answers

Option A is wrong because adding a custom action to CodePipeline that uses a SageMaker deployment step would require significant custom development and maintenance, increasing operational overhead compared to a native EventBridge trigger. Option B is wrong because creating a Lambda function that reacts to Model Registry approval events and updates the endpoint directly bypasses the existing CodePipeline CI/CD process, losing pipeline visibility, approval gates, and rollback capabilities. Option D is wrong because SageMaker Pipelines is designed for orchestrating training and deployment workflows upon training completion, not for reacting to Model Registry approval events in a CI/CD pipeline, and it would require additional integration to trigger on approval rather than on training.

72

Multi-Select · medium

A company uses Amazon SageMaker to deploy a model for real-time inference. They want to perform A/B testing between two model versions. Which TWO actions should the company take to set up A/B testing? (Choose TWO.)

Select 2 answers

A. Create an endpoint configuration with multiple production variants, each with a different model.

B. Use Amazon CloudWatch Evidently to split traffic between models.

C. Set the initial weight of each production variant to the desired traffic split.

D. Enable auto scaling for each production variant individually.

E. Set the second production variant's weight to 0 and update later to 100.

Answers: A, C

Production variants allow multiple models on the same endpoint.

Why this answer

Option A is correct because in SageMaker, A/B testing between two model versions is achieved by creating an endpoint configuration with multiple production variants, each pointing to a different model. This allows the endpoint to host both models simultaneously and route traffic between them based on assigned weights.
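
A minimal sketch of the two production variants for boto3's `create_endpoint_config`. The variant names, model names, instance type, and the 90/10 split are hypothetical. Variant weights are relative, so each variant's traffic share is its weight divided by the sum of all weights.

```python
def ab_test_variants(model_a: str, model_b: str,
                     weight_a: float = 0.9, weight_b: float = 0.1,
                     instance_type: str = "ml.m5.large") -> list:
    """Build the ProductionVariants list for an A/B test between two models."""
    def variant(name: str, model: str, weight: float) -> dict:
        return {
            "VariantName": name,
            "ModelName": model,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
            "InitialVariantWeight": weight,
        }

    return [variant("VariantA", model_a, weight_a), variant("VariantB", model_b, weight_b)]


def traffic_share(variants: list) -> dict:
    """Fraction of requests each variant receives, derived from the relative weights."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


def create_ab_config(config_name: str, variants: list) -> dict:
    """Create the endpoint configuration; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without boto3

    return boto3.client("sagemaker").create_endpoint_config(
        EndpointConfigName=config_name, ProductionVariants=variants
    )
```

For example, weights of 3 and 1 send 75% of requests to the first variant and 25% to the second.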

Exam trap

The trap here is that candidates confuse the separate service Amazon CloudWatch Evidently with SageMaker's native traffic splitting, or think that auto scaling or zero-weight strategies are prerequisites for A/B testing.

73

MCQ · easy

A company trained a model using SageMaker and wants to deploy it with low latency for real-time inference. Which SageMaker feature is MOST suitable?

A. SageMaker Endpoint with Auto Scaling

B. SageMaker Serverless Inference

C. SageMaker Real-Time Endpoint

D. SageMaker Batch Transform

Answer: C

Real-time endpoints provide low-latency inference suitable for online predictions.

Why this answer

SageMaker Real-Time Endpoint is the most suitable feature for low-latency real-time inference because it provisions dedicated, persistent instances that respond to requests synchronously with predictable latency. This option directly meets the requirement for serving individual predictions with minimal delay, unlike batch or serverless alternatives that introduce higher latency or are designed for asynchronous processing.

Exam trap

The trap here is that candidates confuse 'Auto Scaling' (a scaling mechanism) with a separate deployment option, or they assume 'Serverless' always provides low latency, ignoring the cold start penalty that makes it unsuitable for real-time inference.

How to eliminate wrong answers

Option A is wrong because SageMaker Endpoint with Auto Scaling is not a distinct feature; it is a configuration applied to a Real-Time Endpoint to adjust capacity based on load, but the core requirement for low-latency real-time inference is already met by the Real-Time Endpoint itself, and Auto Scaling does not change the fundamental synchronous nature. Option B is wrong because SageMaker Serverless Inference automatically scales from zero and incurs cold start latency (often seconds) when there is no prior traffic, making it unsuitable for applications requiring consistently low latency for real-time inference. Option D is wrong because SageMaker Batch Transform is designed for asynchronous, offline inference on large datasets where latency is not a concern, processing data in batches and writing results to S3, not for real-time, synchronous requests.

74

MCQ · hard

A financial services company uses Amazon SageMaker to deploy a fraud detection model for real-time inference. The model is deployed on an ml.m5.large instance behind a SageMaker real-time endpoint. The endpoint uses a custom auto scaling policy based on average CPU utilization, with a scale-out threshold of 70% and a scale-in threshold of 30%. During a flash sale event, traffic to the endpoint spikes tenfold within minutes. The endpoint fails to handle the load, resulting in increased latency and timeouts. The data science team needs to improve the scalability of the endpoint to handle sudden traffic spikes. Which solution should the team implement?

A. Implement a SageMaker Model Ensemble with two additional models to balance the load.

B. Replace the custom scaling policy with a target tracking scaling policy based on the number of invocations per instance, with a target value of 1000.

C. Implement a SageMaker Inference Pipeline with a pre-processing step to reduce model input size.

D. Switch to a GPU instance type, such as ml.p3.2xlarge, to increase compute capacity.

Answer: B

Target tracking on request count provides faster reaction to traffic spikes because it directly measures the traffic, whereas CPU utilization is a lagging indicator.

Why this answer

Option B is correct because a target tracking scaling policy based on invocations per instance responds faster to traffic spikes than CPU-based scaling, which lags behind the actual request load. Option A is incorrect because a model ensemble increases the compute load rather than adding capacity. Option C is incorrect because an inference pipeline adds latency rather than reducing it. Option D is incorrect because a GPU instance does not address the lag in the scaling policy.
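
A minimal sketch of a target tracking policy on invocations per instance, built for the Application Auto Scaling API in boto3. The endpoint and variant names, capacity limits, policy name, and cooldowns are hypothetical; the target value of 1000 comes from the question.

```python
def variant_resource_id(endpoint_name: str, variant_name: str) -> str:
    """Resource ID format that Application Auto Scaling uses for an endpoint variant."""
    return f"endpoint/{endpoint_name}/variant/{variant_name}"


def scalable_target(endpoint_name: str, variant_name: str,
                    min_capacity: int = 1, max_capacity: int = 10) -> dict:
    """Keyword arguments for register_scalable_target."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": variant_resource_id(endpoint_name, variant_name),
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }


def target_tracking_policy(endpoint_name: str, variant_name: str,
                           target_invocations: float = 1000.0) -> dict:
    """Keyword arguments for put_scaling_policy, tracking invocations per instance."""
    return {
        "PolicyName": "invocations-per-instance",
        "ServiceNamespace": "sagemaker",
        "ResourceId": variant_resource_id(endpoint_name, variant_name),
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_invocations,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleOutCooldown": 60,   # react quickly to spikes
            "ScaleInCooldown": 300,   # scale in more conservatively
        },
    }


def apply_scaling(endpoint_name: str, variant_name: str) -> None:
    """Register the target and attach the policy; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builders above work without boto3

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**scalable_target(endpoint_name, variant_name))
    client.put_scaling_policy(**target_tracking_policy(endpoint_name, variant_name))
```

Because the tracked metric is the request rate itself, the policy sees a spike as soon as invocations rise, rather than waiting for CPU utilization to climb.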

75

MCQ · hard

A company uses SageMaker Ground Truth to create a labeled dataset, then trains a model using SageMaker Training. They want to automate the pipeline so that whenever a labeling job is completed, it triggers the training job. Which architecture meets this requirement with minimal latency?

A. Use AWS Step Functions to poll the labeling job status and then start training.

B. Configure an S3 event notification on the labeling job output bucket to trigger a Lambda function that starts training.

C. Use Amazon CloudWatch Events (EventBridge) to detect the completed labeling job and trigger a SageMaker Pipeline execution.

D. Set up a scheduled cron job in EventBridge to check for completed labeling jobs every hour and start training if found.

Answer: C

EventBridge directly supports SageMaker events and can start a pipeline execution with minimal latency.

Why this answer

Option C is correct because Amazon EventBridge can natively capture SageMaker job state changes (e.g., `SageMaker Labeling Job State Change` to `Completed`) and directly trigger a SageMaker Pipeline execution. This event-driven approach eliminates polling overhead and provides the lowest latency by reacting immediately when the labeling job finishes.

Exam trap

The trap here is that candidates often assume S3 event notifications are the simplest event-driven trigger, but they overlook the fact that S3 events can fire on intermediate writes (e.g., partial output files) rather than waiting for the labeling job's definitive `Completed` state, leading to data integrity issues.

How to eliminate wrong answers

Option A is wrong because polling the labeling job status with AWS Step Functions introduces unnecessary latency and cost from repeated API calls, and it is not a true event-driven architecture. Option B is wrong because S3 event notifications on the labeling job output bucket may fire before the labeling job is fully complete (e.g., partial writes) and do not guarantee that the job has transitioned to the `Completed` state, risking training on incomplete data. Option D is wrong because a scheduled cron job running every hour introduces up to 60 minutes of latency, which fails the 'minimal latency' requirement and is inefficient compared to an event-driven trigger.
