CCNA Mla Deployment Orchestration Questions

33 of 108 questions · Page 2/2 · Mla Deployment Orchestration topic · Answers revealed

76
Multi-Select · medium

An organization wants to automate ML retraining using an event-driven architecture. Which THREE services should they combine? (Select THREE.)

Select 3 answers
A. SageMaker (training jobs or pipelines)
B. Amazon EventBridge
C. AWS Lambda
D. AWS Glue
E. Amazon CloudWatch Logs
Answers: A, B, C

SageMaker executes the actual retraining.

Why this answer

Amazon SageMaker provides the training jobs and pipelines that execute the ML retraining workflow. Amazon EventBridge acts as the event bus that triggers retraining based on events such as new data arrival or model drift detection. AWS Lambda serves as the lightweight compute layer that can preprocess events, invoke SageMaker APIs, or orchestrate conditional logic before starting a training job.

Exam trap

The trap here is that candidates often confuse AWS Glue as a compute trigger for ML retraining, but Glue is designed for batch ETL and lacks the event-driven, low-latency invocation capabilities required for this architecture.
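
As a rough illustration of the EventBridge piece, the sketch below builds an event pattern that matches new objects landing under an S3 prefix. It assumes the bucket has EventBridge notifications enabled; the bucket name and prefix are hypothetical placeholders, not values from the question.

```python
import json

# Hypothetical names, for illustration only.
TRAINING_BUCKET = "example-training-data"
TRAINING_PREFIX = "incoming/"


def build_s3_object_created_pattern(bucket: str, prefix: str) -> dict:
    """Return an EventBridge event pattern matching new objects under a prefix."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": [bucket]},
            "object": {"key": [{"prefix": prefix}]},
        },
    }


pattern = build_s3_object_created_pattern(TRAINING_BUCKET, TRAINING_PREFIX)
# EventBridge rules take the pattern as a JSON string.
pattern_json = json.dumps(pattern)
```

A rule with this pattern would then name the Lambda function as its target, and the function would call the SageMaker API to start retraining.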

77
MCQ · easy

A company wants to deploy a trained XGBoost model for batch inference on a large dataset stored in S3. The inference job should be cost-effective and does not require real-time responses. Which SageMaker inference option should they use?

A. SageMaker Batch Transform
B. SageMaker real-time endpoint
C. SageMaker Asynchronous Inference
D. SageMaker Serverless Inference
Answer: A

Batch Transform is designed for cost-effective batch inference on S3 data when no real-time response is required.

Why this answer

SageMaker Batch Transform is designed for batch inference on large datasets stored in S3, processing data in chunks and writing results to S3. It is cost-effective for non-real-time scenarios. Real-time endpoints are for low-latency inference.

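Serverless Inference is for on-demand, intermittent request traffic, not batch jobs. Asynchronous Inference handles near-real-time requests with S3 input/output, but it queues individual requests and is not ideal for scoring a large dataset in one job.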
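
A minimal sketch of what a Batch Transform request could look like, built as a plain dictionary in the shape of the CreateTransformJob API. The job name, model name, S3 URIs, instance type, and CSV content type are all assumptions chosen for illustration.

```python
def build_transform_request(job_name, model_name, input_s3_uri, output_s3_uri):
    """Build keyword arguments for the SageMaker CreateTransformJob API."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        # Pack several records into each request, up to MaxPayloadInMB.
        "BatchStrategy": "MultiRecord",
        "MaxPayloadInMB": 6,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            "ContentType": "text/csv",
            "SplitType": "Line",  # treat each line as one record
        },
        "TransformOutput": {"S3OutputPath": output_s3_uri, "AssembleWith": "Line"},
        # Instances exist only for the duration of the job.
        "TransformResources": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 2},
    }


request = build_transform_request(
    "example-xgb-batch",                 # hypothetical job name
    "example-xgb-model",                 # hypothetical model name
    "s3://example-bucket/batch-input/",
    "s3://example-bucket/batch-output/",
)
```

With credentials in place, the request could be submitted with `boto3.client("sagemaker").create_transform_job(**request)`.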

78
MCQ · easy

A machine learning engineer needs to deploy a new version of a model gradually, initially sending 5% of traffic to the new version and 95% to the current version, while monitoring for errors. Which deployment pattern should they use?

A. Blue/green deployment
B. Canary deployment
C. Shadow testing
D. Rolling deployment
Answer: B

Canary deployment gradually shifts traffic, starting with a small percentage like 5%, to test the new version.

Why this answer

Canary deployment is the correct pattern because it allows the ML engineer to route a small percentage of traffic (e.g., 5%) to the new model version while keeping the majority (95%) on the current version. This enables gradual rollout with real-time monitoring for errors, and if issues are detected, traffic can be instantly shifted back to the stable version.

Exam trap

The trap here is that candidates often confuse canary deployment with blue/green deployment, assuming both involve gradual traffic shifting, but blue/green is an all-or-nothing switch, while canary specifically supports incremental percentage-based routing with monitoring.

How to eliminate wrong answers

Option A is wrong because blue/green deployment involves switching all traffic at once from the current environment (blue) to the new environment (green), which does not support gradual traffic shifting or incremental error monitoring. Option C is wrong because shadow testing sends a copy of live traffic to the new model without affecting user-facing responses, but it does not route actual user traffic to the new version, so it cannot be used to gradually shift real traffic percentages. Option D is wrong because rolling deployment updates instances incrementally (e.g., replacing pods one by one), but it does not provide fine-grained traffic splitting like 5% vs 95% and lacks the instant rollback capability of canary deployments.
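
To make the 5%/95% split concrete: on a SageMaker endpoint, each production variant receives traffic in proportion to its weight divided by the sum of all weights. The sketch below computes those shares for two hypothetical variants; the variant names, model names, and instance settings are illustrative assumptions.

```python
def traffic_shares(variants):
    """Return each variant's share of traffic: weight / sum of all weights."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


# Hypothetical production variants in the shape used by CreateEndpointConfig.
variants = [
    {
        "VariantName": "current",
        "ModelName": "example-model-v1",
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 2,
        "InitialVariantWeight": 95.0,
    },
    {
        "VariantName": "canary",
        "ModelName": "example-model-v2",
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 5.0,
    },
]

shares = traffic_shares(variants)
```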

79
MCQ · medium

A team wants to use MLflow on SageMaker to track experiments and manage model lifecycle. They need to register models in the SageMaker Model Registry after training. Which approach allows them to use MLflow for experiment tracking and then register the best model to SageMaker Model Registry?

A. Use MLflow to track experiments and register models in MLflow's native registry, then export to SageMaker
B. Use SageMaker Experiments for tracking, then manually register the model using SageMaker console
C. Use MLflow tracking server on SageMaker, then use the SageMaker MLflow plugin to register the model in SageMaker Model Registry
D. MLflow cannot be used with SageMaker; use SageMaker Experiments instead
Answer: C

The integration enables MLflow tracking and direct registration to SageMaker Model Registry.

Why this answer

Option C is correct because the SageMaker MLflow plugin (sagemaker-mlflow) allows you to use an MLflow tracking server hosted on SageMaker for experiment tracking, and then directly register the best model into the SageMaker Model Registry via the plugin's integration. This avoids manual export steps and keeps the model lifecycle management within SageMaker's native registry, which is required by the team's goal.

Exam trap

The exam often tests the misconception that MLflow and SageMaker are mutually exclusive or require complex workarounds, when in fact SageMaker provides a first-class MLflow integration via the tracking server and plugin.

How to eliminate wrong answers

Option A is wrong because exporting models from MLflow's native registry to SageMaker Model Registry is not a supported direct workflow; it would require manual conversion and re-registration, defeating the purpose of seamless integration. Option B is wrong because using SageMaker Experiments for tracking and then manually registering via the console is a valid but less efficient approach that does not leverage MLflow, which the team explicitly wants to use for experiment tracking. Option D is wrong because MLflow can indeed be used with SageMaker; SageMaker provides native support for hosting an MLflow tracking server and the MLflow plugin for model registry integration.

80
MCQ · hard

A team uses SageMaker real-time endpoints for inference. They want to deploy a new model version and compare its performance with the current version under live traffic without affecting user experience. Which method should they use?

A. A/B testing with production variant traffic splitting
B. Batch transform on a holdout test set
C. Blue/green deployment
D. Shadow testing with SageMaker
Answer: D

Shadow testing duplicates traffic to a shadow variant without serving it to users, allowing safe comparison.

Why this answer

Shadow testing (or shadow deployment) sends a copy of live traffic to the new model variant while the current variant serves the actual response. The shadow variant's performance can be monitored without impacting the user.

81
MCQ · medium

A company wants to deploy a single model that processes images from a production line. The images are uploaded to an S3 bucket every few minutes, and the inference results must be stored back to S3. The team wants to avoid paying for idle compute and prefers a fully managed, on-demand solution. Which SageMaker inference option should they use?

A. SageMaker batch transform
B. SageMaker asynchronous inference
C. SageMaker real-time endpoint with auto scaling
D. SageMaker serverless inference
Answer: B

Asynchronous inference is ideal for near-real-time, event-driven workloads with S3 input/output and scales to zero when idle.

Why this answer

Asynchronous inference is designed for this use case: it processes images from S3 input, writes results to S3 output, scales to zero when idle, and is fully managed. Real-time endpoints are always running and incur cost when idle. Batch transform is not event-driven.

Serverless inference is event-driven but has a payload limit and cold start that may not be suitable for image payloads.
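
A hedged sketch of the asynchronous flow: the endpoint configuration carries an AsyncInferenceConfig with an S3 output path, and each request points at an S3 object instead of sending the payload inline. All names, URIs, and the instance type are hypothetical, and a stand-in client is used so the example runs without AWS credentials.

```python
def build_async_endpoint_config(config_name, model_name, output_s3_uri):
    """Build keyword arguments for CreateEndpointConfig with async inference."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InstanceType": "ml.g4dn.xlarge",
                "InitialInstanceCount": 1,
            }
        ],
        "AsyncInferenceConfig": {
            "OutputConfig": {"S3OutputPath": output_s3_uri},
            "ClientConfig": {"MaxConcurrentInvocationsPerInstance": 4},
        },
    }


def submit_image(runtime_client, endpoint_name, image_s3_uri):
    """Queue one S3 object for inference; return where the result will be written."""
    response = runtime_client.invoke_endpoint_async(
        EndpointName=endpoint_name,
        InputLocation=image_s3_uri,
        ContentType="image/jpeg",
    )
    return response["OutputLocation"]


class StubRuntime:
    """Stand-in for boto3.client("sagemaker-runtime") so the sketch runs offline."""

    def invoke_endpoint_async(self, **kwargs):
        self.last_call = kwargs
        return {"OutputLocation": "s3://example-bucket/async-output/result.out"}


config = build_async_endpoint_config(
    "example-async-config", "example-vision-model", "s3://example-bucket/async-output/"
)
stub = StubRuntime()
output_location = submit_image(
    stub, "example-async-endpoint", "s3://example-bucket/images/frame-001.jpg"
)
```

Scaling to zero is not automatic with this configuration alone; it relies on an auto scaling policy registered for the variant with a minimum capacity of zero.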

82
MCQ · medium

A company wants to deploy a PyTorch model on SageMaker using the NVIDIA Triton Inference Server for GPU acceleration. They have an existing Triton configuration. Which approach should they take?

A. Use SageMaker Neo to compile the model for Triton
B. Package Triton as a custom container and use SageMaker batch transform
C. Use the SageMaker Triton Inference Server container from the Deep Learning Containers
D. Use the standard SageMaker PyTorch container and install Triton at runtime
Answer: C

The SageMaker Triton DLC is pre-configured for Triton and supports PyTorch models.

Why this answer

Option C is correct because AWS provides a pre-built SageMaker Triton Inference Server container as part of the Deep Learning Containers (DLCs), which is optimized for GPU acceleration and supports the existing Triton configuration without modification. This container integrates directly with SageMaker hosting endpoints, enabling seamless deployment of PyTorch models with Triton's features like dynamic batching and model concurrency.

Exam trap

The trap here is that candidates may assume SageMaker Neo is a universal compilation tool for any inference server, but Neo is specifically for hardware-specific optimization and does not support Triton's runtime environment, leading them to incorrectly select Option A.

How to eliminate wrong answers

Option A is wrong because SageMaker Neo compiles a model for a specific hardware target (a cloud instance family or an edge device); it is an optimization step, not a way to serve a model through Triton, and it does nothing to reuse the team's existing Triton configuration. Option B is wrong because while packaging Triton as a custom container is possible, using SageMaker batch transform is not the recommended approach for real-time inference with GPU acceleration; batch transform is for offline processing of whole datasets, not for low-latency serving. Option D is wrong because installing Triton at runtime on the standard PyTorch container is inefficient and error-prone; it adds startup latency, may cause dependency conflicts, and bypasses the pre-optimized, tested Triton container that AWS provides.

83
MCQ · hard

A company needs to update a model in production without any downtime. They currently have a single real-time endpoint serving traffic. Which approach allows them to deploy a new model version and switch traffic gradually while being able to roll back quickly?

A. Use a canary deployment by creating a new production variant with the new model and shifting traffic incrementally
B. Use a multi-model endpoint and replace the model file
C. Stop the endpoint, update the model, and restart the endpoint
D. Update the existing endpoint's model directly using UpdateEndpoint
Answer: A

This allows gradual traffic shift and the old variant can be used for rollback if needed.

Why this answer

SageMaker supports production variants with traffic splitting. By creating a new variant with the new model and shifting traffic gradually, the old variant remains available for rollback. Blue/green deployment with a new endpoint and endpoint configuration swap also allows quick rollback.

The key is to have both variants active during the transition.

84
Multi-Select · hard

A machine learning engineer is deploying a TensorFlow model for real-time inference. The model has high latency on CPU. Which TWO actions can reduce inference latency? (Choose two.)

Select 2 answers
A. Enable SageMaker Model Monitor
B. Switch to a multi-model endpoint
C. Attach Amazon Elastic Inference to the endpoint
D. Use a larger instance type with more vCPUs
E. Compile the model with SageMaker Neo
Answers: C, E

Elastic Inference adds GPU acceleration, reducing latency.

Why this answer

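Compiling with SageMaker Neo optimizes the model for the target hardware. Attaching Elastic Inference provides GPU acceleration without moving to a full GPU instance. Elastic Inference is now deprecated, so treat option C as the expected exam answer rather than a recommendation for new deployments.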

85
MCQ · medium

An ML pipeline uses SageMaker Processing to run a feature engineering script. The script takes a long time and the team wants to speed up pipeline execution. What is the MOST effective approach?

A. Increase the instance count for the Processing step
B. Enable pipeline caching for the Processing step
C. Use a larger instance type with more vCPUs
D. Use a Tuning step instead
Answer: A

More instances allow distributed processing, reducing wall clock time.

Why this answer

Increasing the instance count for the SageMaker Processing step lets the feature engineering script run across multiple nodes, which can dramatically reduce wall-clock time for embarrassingly parallel workloads like feature engineering. Setting instance_count > 1 does not parallelize the work by itself: by default every instance receives a full copy of the S3 input, so the input should be sharded across instances (S3 data distribution type ShardedByS3Key) or the script should use a distributed framework such as PySpark. With sharded input, the same script can run on every instance against its own subset of the data without further code changes, which is why this is the most effective approach.

Exam trap

The exam often tests the distinction between vertical scaling (larger instance) and horizontal scaling (more instances), where candidates mistakenly choose a larger instance type thinking it's always faster, but for distributed workloads like feature engineering, horizontal scaling is more effective and cost-efficient.

How to eliminate wrong answers

Option B is wrong because pipeline caching only avoids re-running the step if the inputs and parameters haven't changed; it does not speed up the execution of the step itself when it must run. Option C is wrong because using a larger instance type with more vCPUs provides vertical scaling, which has diminishing returns due to CPU/memory bottlenecks and does not scale as effectively as horizontal scaling (multiple instances) for large-scale feature engineering. Option D is wrong because a Tuning step is designed for hyperparameter optimization, not for running feature engineering scripts; it would not execute the script and would add unnecessary complexity and cost.
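
As a sketch of horizontal scaling for a Processing job, the request below (in the shape of the CreateProcessingJob API) raises the instance count and asks SageMaker to shard the S3 input across instances rather than replicate it. The job name, role ARN, image URI, and S3 URI are hypothetical placeholders.

```python
def build_processing_request(job_name, role_arn, image_uri, input_s3_uri, instance_count):
    """Build keyword arguments for CreateProcessingJob with sharded S3 input."""
    return {
        "ProcessingJobName": job_name,
        "RoleArn": role_arn,
        "AppSpecification": {"ImageUri": image_uri},
        "ProcessingInputs": [
            {
                "InputName": "raw",
                "S3Input": {
                    "S3Uri": input_s3_uri,
                    "LocalPath": "/opt/ml/processing/input",
                    "S3DataType": "S3Prefix",
                    "S3InputMode": "File",
                    # Each instance gets a subset of the objects under the prefix.
                    # The default, FullyReplicated, copies everything to every instance.
                    "S3DataDistributionType": "ShardedByS3Key",
                },
            }
        ],
        "ProcessingResources": {
            "ClusterConfig": {
                "InstanceCount": instance_count,
                "InstanceType": "ml.m5.xlarge",
                "VolumeSizeInGB": 30,
            }
        },
    }


request = build_processing_request(
    "example-feature-engineering",                       # hypothetical
    "arn:aws:iam::123456789012:role/ExampleRole",        # placeholder ARN
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/example-processing:latest",
    "s3://example-bucket/raw/",
    instance_count=4,
)
```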

86
MCQ · easy

A company uses SageMaker Pipelines to automate their ML workflow. They notice that the pipeline reruns all steps even when the input data has not changed. Which feature should they enable to avoid unnecessary recomputation?

A. Enable pipeline caching
B. Use a Lambda step to check input changes
C. Use a Conditional step to skip steps
D. Set the pipeline execution mode to 'Parallel'
Answer: A

Caching stores step outputs and reuses them when inputs are identical, preventing unnecessary reruns.

Why this answer

Pipeline caching in SageMaker Pipelines automatically reuses the output of a step if the step's arguments (such as parameters and input locations) match those of a previous successful execution within the cache expiration window. This avoids recomputation, making it the correct feature to prevent unnecessary reruns when the inputs remain identical. Note that the comparison is made on the step's arguments rather than on the contents of the data they point to, so new data written to the same S3 path can still produce a cache hit.

Exam trap

The trap here is that candidates confuse caching with conditional branching or parallel execution, assuming that skipping steps via conditions or running steps in parallel will avoid recomputation, when in fact only caching directly reuses prior outputs based on input immutability.

How to eliminate wrong answers

Option B is wrong because a Lambda step is used for custom processing or integration (e.g., invoking external APIs), not for detecting input changes or caching step outputs; it would add complexity without solving the core caching requirement. Option C is wrong because a Conditional step evaluates a condition to branch the pipeline (e.g., skip a step based on a metric), but it does not automatically detect unchanged inputs or cache results; it requires manual logic and still incurs overhead for the condition check. Option D is wrong because setting the pipeline execution mode to 'Parallel' controls whether steps run sequentially or concurrently, but it does not prevent recomputation of steps whose inputs have not changed; it only affects execution order, not caching.
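
The idea behind step caching can be illustrated with a small, self-contained sketch. This is not SageMaker's implementation; it is a toy cache, with hypothetical names, that keys a step's output on a hash of the step name and its arguments and only runs the step on a miss.

```python
import hashlib
import json


class StepCache:
    """Toy cache: reuse a step's output when its name and arguments are unchanged."""

    def __init__(self):
        self._outputs = {}
        self.executions = 0  # how many times a step function actually ran

    @staticmethod
    def _key(step_name, arguments):
        payload = json.dumps({"step": step_name, "args": arguments}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, step_name, arguments, step_fn):
        key = self._key(step_name, arguments)
        if key not in self._outputs:  # cache miss: execute and remember the result
            self.executions += 1
            self._outputs[key] = step_fn(arguments)
        return self._outputs[key]


def make_features(arguments):
    """Pretend processing step: derive an output location from the input location."""
    return arguments["input_uri"] + "features/"


cache = StepCache()
january = {"input_uri": "s3://example-bucket/data/2024-01/"}
first = cache.run("preprocess", january, make_features)
second = cache.run("preprocess", january, make_features)   # same arguments: reused
third = cache.run(
    "preprocess", {"input_uri": "s3://example-bucket/data/2024-02/"}, make_features
)                                                           # new arguments: re-run
```

Because the key is built from arguments only, the toy cache shares the limitation that changed data behind an unchanged URI would go unnoticed.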

87
Multi-Select · medium

A data science team is deploying a PyTorch model for real-time inference with sub-second latency requirements. They need to minimize cost while handling variable traffic. Which TWO approaches should they consider? (Choose TWO.)

Select 2 answers
A. Compile the model with SageMaker Neo
B. Attach Amazon Elastic Inference to a real-time endpoint
C. Use a batch transform job to process requests in batches
D. Use SageMaker serverless inference with a configured max concurrency
E. Use a multi-model endpoint (MME) to host the model
Answers: A, D

Neo optimizes the model for the target hardware, reducing inference latency and often allowing a smaller instance type.

Why this answer

Serverless inference auto-scales to zero when not in use and charges per request, minimizing cost for variable traffic. SageMaker Neo compiles the model for optimal hardware performance, achieving low latency. Multi-model endpoints (MME) are for hosting multiple models, not single-model optimization.

Elastic Inference adds GPU acceleration at lower cost than a full GPU instance, but with Neo compilation the team may not need it. Batch transform is for offline, not real-time.

88
Multi-Select · medium

A company uses SageMaker to deploy a model and wants to perform A/B testing by splitting traffic between two model variants. Which TWO actions should they take? (Select TWO.)

Select 2 answers
A. Configure two production variants on the endpoint, each with an initial weight
B. Use SageMaker Model Registry to approve both variants
C. Use the UpdateEndpointWeightsAndCapacity API to adjust traffic after analysis
D. Deploy each variant to a separate endpoint and use Route53 weighted routing
E. Enable shadow testing on the endpoint
Answers: A, C

This defines the variants and their traffic split.

Why this answer

Option A is correct because SageMaker endpoints support multiple production variants, each with an assigned weight that determines the proportion of traffic routed to that variant. By setting initial weights (e.g., 50/50 or 90/10), you can split traffic between two model variants for A/B testing without deploying separate endpoints.

Exam trap

The trap here is that candidates confuse shadow testing (Option E) with A/B testing, but shadow testing does not split live traffic. It only mirrors requests to the new variant for comparison, while A/B testing requires actual traffic distribution between variants.
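
For the second correct action (Option C), a minimal sketch of adjusting variant weights after the analysis might look like the following. The endpoint and variant names are hypothetical, and a recording stand-in replaces the real SageMaker client so the example runs offline.

```python
def shift_traffic(sagemaker_client, endpoint_name, weights):
    """Set new weights for the named variants via UpdateEndpointWeightsAndCapacity."""
    desired = [
        {"VariantName": name, "DesiredWeight": float(weight)}
        for name, weight in weights.items()
    ]
    sagemaker_client.update_endpoint_weights_and_capacity(
        EndpointName=endpoint_name,
        DesiredWeightsAndCapacities=desired,
    )
    return desired


class StubSageMaker:
    """Stand-in for boto3.client("sagemaker"); records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def update_endpoint_weights_and_capacity(self, **kwargs):
        self.calls.append(kwargs)
        return {"EndpointArn": "arn:example:endpoint"}  # placeholder value


stub = StubSageMaker()
# After the A/B analysis favors variant B, move most traffic to it.
applied = shift_traffic(stub, "example-endpoint", {"variant-a": 10, "variant-b": 90})
```

Weights are relative, so 10 and 90 here would route roughly 10% and 90% of requests to the two variants.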

89
MCQ · medium

A team uses MLflow on SageMaker for experiment tracking. They want to automate the retraining of a model when new training data arrives in an S3 bucket. Which combination of services should they use?

A. SageMaker Pipelines scheduled trigger every hour
B. EventBridge -> Lambda -> SageMaker Training Job
C. S3 Event Notifications -> SQS -> SageMaker Training Job
D. AWS Step Functions with S3 poller
Answer: B

EventBridge captures S3 events, Lambda initiates training, and MLflow can log the run.

Why this answer

EventBridge can receive S3 Object Created events (the bucket must have EventBridge notifications enabled) and trigger a Lambda function that starts a SageMaker training job; the training script can then log the run to MLflow for tracking.

90
MCQ · medium

A machine learning engineer needs to deploy a TensorFlow model that requires a custom inference environment with specific system libraries. The model will be used in a real-time application with variable traffic. They want to minimize cold start latency. Which SageMaker hosting option should they choose?

A. SageMaker real-time endpoint with a custom container
B. SageMaker Serverless Inference with a custom container
C. SageMaker Multi-Model Endpoint with a custom container
D. SageMaker Asynchronous Inference with a custom container
Answer: A

Real-time endpoints are always warm (no cold starts) and support custom containers.

Why this answer

SageMaker real-time endpoints with a custom container are the correct choice because they provide persistent, always-on infrastructure that eliminates cold start latency. By packaging the TensorFlow model with required system libraries in a custom Docker image, the engineer ensures the inference environment is ready immediately, and the endpoint can scale to handle variable traffic with minimal delay.

Exam trap

The trap here is that candidates often confuse 'minimizing cold start latency' with 'scaling to zero' and incorrectly choose Serverless Inference, failing to recognize that Serverless inherently introduces cold starts on first request after idle periods.

How to eliminate wrong answers

Option B is wrong because SageMaker Serverless Inference automatically scales to zero when idle, incurring cold start latency (typically 5–10 seconds) when traffic resumes, which contradicts the requirement to minimize cold start latency. Option C is wrong because SageMaker Multi-Model Endpoints are designed to host multiple models on a single container, but they still require a pre-configured inference environment; they do not inherently reduce cold start latency for a single custom model. Option D is wrong because SageMaker Asynchronous Inference is intended for non-real-time workloads with larger payloads and queuing, and it also experiences cold starts when scaling from zero, making it unsuitable for real-time applications with variable traffic.

91
MCQ · easy

A company wants to deploy a model using a serverless inference endpoint that can automatically scale to zero when not in use and has a configurable maximum concurrency. Which SageMaker inference option meets these requirements?

A. Serverless inference
B. Real-time endpoint with auto-scaling
C. Batch transform
D. Asynchronous inference
Answer: A

Serverless inference scales to zero and has configurable max concurrency and memory.

Why this answer

SageMaker Serverless Inference is the correct choice because it automatically scales to zero when the endpoint is idle, eliminating costs during periods of no traffic, and it allows you to configure a maximum concurrency limit per endpoint to control throughput. This fully managed, pay-per-invoke option is designed for workloads with intermittent or unpredictable traffic patterns, meeting both requirements precisely.

Exam trap

The trap here is that candidates confuse 'auto-scaling' with 'scaling to zero' and incorrectly choose the real-time endpoint with auto-scaling, not realizing that auto-scaling maintains a minimum instance count and cannot reduce to zero.

How to eliminate wrong answers

Option B is wrong because a real-time endpoint with auto-scaling can scale down to a minimum number of instances (e.g., 1) but cannot scale to zero; it always keeps at least one instance running, incurring base costs. Option C is wrong because batch transform is not a real-time inference endpoint; it processes entire datasets offline and does not support automatic scaling to zero or configurable concurrency for live requests. Option D is wrong because asynchronous inference is a queue-based option that runs on provisioned instances rather than a serverless endpoint; it scales to zero only when auto scaling is configured with a minimum capacity of zero, and its concurrency setting (MaxConcurrentInvocationsPerInstance) applies per instance rather than as a serverless endpoint-level limit.
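
The two settings named in the question map onto the ServerlessConfig block of a production variant. The sketch below builds such an endpoint configuration as a plain dictionary; the configuration name, model name, and the chosen memory and concurrency values are illustrative assumptions.

```python
def build_serverless_endpoint_config(config_name, model_name,
                                     memory_mb=2048, max_concurrency=5):
    """Build keyword arguments for CreateEndpointConfig with a serverless variant."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                # No instance type or count: capacity is managed by the service.
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    }


config = build_serverless_endpoint_config("example-serverless-config", "example-model")
```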

92
MCQ · easy

A machine learning engineer wants to automatically trigger a retraining pipeline whenever new training data arrives in an S3 bucket. The pipeline uses SageMaker Pipelines. Which AWS service should be used to detect the S3 event and start the pipeline?

A. SageMaker Pipelines native S3 trigger
B. AWS Step Functions
C. Amazon CloudWatch Logs
D. Amazon EventBridge
Answer: D

EventBridge can react to S3 events and invoke a Lambda function that starts the SageMaker Pipeline execution.

Why this answer

Amazon EventBridge can be configured to listen for S3 events (e.g., PutObject) and then invoke a Lambda function that starts the SageMaker Pipeline execution. Step Functions could orchestrate the pipeline but is not needed to trigger on S3 events. SageMaker Pipelines does not natively listen to S3 events.

CloudWatch Events is the older name for EventBridge.
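
A hedged sketch of the Lambda side of this flow: the function reads the bucket and key from the EventBridge S3 event and starts a pipeline execution, passing the new object's URI as a pipeline parameter. The pipeline name and the `InputDataUri` parameter are hypothetical, and a stand-in client is used so the example runs without AWS access.

```python
PIPELINE_NAME = "example-retraining-pipeline"  # hypothetical pipeline name


def start_retraining(event, sagemaker_client, pipeline_name=PIPELINE_NAME):
    """Start a SageMaker Pipeline execution for the object named in an S3 event."""
    detail = event["detail"]
    input_uri = f"s3://{detail['bucket']['name']}/{detail['object']['key']}"
    response = sagemaker_client.start_pipeline_execution(
        PipelineName=pipeline_name,
        # "InputDataUri" is an assumed parameter defined by the pipeline itself.
        PipelineParameters=[{"Name": "InputDataUri", "Value": input_uri}],
    )
    return response["PipelineExecutionArn"]


class StubSageMaker:
    """Stand-in for boto3.client("sagemaker") so the sketch runs offline."""

    def start_pipeline_execution(self, **kwargs):
        self.last_call = kwargs
        return {"PipelineExecutionArn": "arn:example:pipeline-execution"}  # placeholder


# Trimmed-down example of an S3 "Object Created" event delivered by EventBridge.
sample_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {
        "bucket": {"name": "example-training-data"},
        "object": {"key": "incoming/batch-042.csv"},
    },
}

stub = StubSageMaker()
execution_arn = start_retraining(sample_event, stub)
```

In a deployed function, a `lambda_handler(event, context)` entry point would call `start_retraining(event, boto3.client("sagemaker"))`.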

93
MCQ · medium

A team built a SageMaker Pipeline that includes a training step and a model evaluation step. They want to automatically register a model in SageMaker Model Registry only if the evaluation metric (accuracy) exceeds 0.9. Which pipeline step should be used to implement this conditional logic?

A. RegisterModel step
B. Condition step
C. Processing step
D. Transform step
Answer: B

Condition step evaluates a condition and routes execution to different branches (e.g., register model if accuracy > 0.9).

Why this answer

The Condition step in SageMaker Pipelines allows you to add conditional branching logic, such as evaluating a metric and proceeding only if a condition is met. In this scenario, you would use a ConditionStep to check if the accuracy metric from the evaluation step exceeds 0.9, and then conditionally execute a RegisterModel step to register the model in SageMaker Model Registry.

Exam trap

The exam often tests the misconception that the RegisterModel step itself can conditionally register a model based on metrics, but in SageMaker Pipelines, conditional logic must be implemented explicitly with a ConditionStep.

How to eliminate wrong answers

Option A is wrong because the RegisterModel step is used to register a model in the Model Registry, but it does not have built-in conditional logic; it would register the model unconditionally unless placed inside a ConditionStep. Option C is wrong because a Processing step is used for data processing, feature engineering, or evaluation tasks, not for implementing conditional branching logic. Option D is wrong because a Transform step runs batch inference, not conditional evaluation or registration decisions.

94
Multi-Select · hard

An ML engineer is designing a SageMaker Pipeline for model training and registration. They need to ensure that the pipeline can be re-run with different datasets without manual intervention, and that the steps are only re-executed if inputs have changed. Which THREE features should they configure? (Select THREE.)

Select 3 answers
A. Add a Condition step to manually check for data changes
B. Enable step caching to reuse outputs when inputs are unchanged
C. Configure lineage tracking to record the origin of models
D. Use Parameterized execution to pass different values at runtime
E. Define pipeline parameters for dataset location and hyperparameters
Answers: B, D, E

Why this answer

Pipeline parameters allow passing different inputs. Step caching reuses step outputs when inputs are identical. Using Parameterized execution is synonymous with using parameters.

Lineage tracking is not for skipping steps. Condition steps are for branching, not caching. Model Registry is for versioning.

95
Multi-Select · medium

A team is migrating their ML infrastructure to AWS and wants to use infrastructure as code to manage SageMaker Studio domains, user profiles, and associated resources. Which services can they use for this purpose? (Select THREE.)

Select 3 answers
A. AWS CDK (Cloud Development Kit)
B. SageMaker Python SDK
C. Boto3
D. Terraform by HashiCorp
E. AWS CloudFormation
Answers: A, D, E

CDK allows defining infrastructure using programming languages and synthesizes CloudFormation templates.

Why this answer

AWS CDK (Cloud Development Kit) is correct because it allows you to define AWS infrastructure, including SageMaker Studio domains and user profiles, using familiar programming languages like Python or TypeScript. CDK synthesizes these definitions into CloudFormation templates, enabling infrastructure as code (IaC) for SageMaker resources. This approach provides type safety and high-level abstractions, making it suitable for managing complex ML environments.

Exam trap

The trap here is that candidates often confuse the SageMaker Python SDK (used for ML workflows) or Boto3 (used for general AWS API calls) with infrastructure as code tools, but neither provides declarative, state-managed provisioning of SageMaker Studio resources like CloudFormation, CDK, or Terraform do.
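
To show what "declarative" means here, the sketch below generates a small CloudFormation template containing a Studio domain and one user profile per name. It is a simplified illustration under stated assumptions: the VPC ID, subnet ID, role ARN, and user names are placeholders, and a real template would usually need more settings.

```python
import json


def build_studio_template(domain_name, vpc_id, subnet_ids, execution_role_arn, user_names):
    """Return a CloudFormation template (as a dict) for a Studio domain and user profiles."""
    resources = {
        "StudioDomain": {
            "Type": "AWS::SageMaker::Domain",
            "Properties": {
                "DomainName": domain_name,
                "AuthMode": "IAM",
                "VpcId": vpc_id,
                "SubnetIds": subnet_ids,
                "DefaultUserSettings": {"ExecutionRole": execution_role_arn},
            },
        }
    }
    for index, user in enumerate(user_names):
        resources[f"UserProfile{index}"] = {
            "Type": "AWS::SageMaker::UserProfile",
            "Properties": {
                # Referencing the domain also makes CloudFormation create it first.
                "DomainId": {"Fn::GetAtt": ["StudioDomain", "DomainId"]},
                "UserProfileName": user,
            },
        }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


template = build_studio_template(
    "example-domain",
    "vpc-0example",                                   # placeholder VPC ID
    ["subnet-0example"],                              # placeholder subnet ID
    "arn:aws:iam::123456789012:role/ExampleRole",     # placeholder role ARN
    ["alice", "bob"],
)
template_json = json.dumps(template, indent=2)
```

CDK and Terraform express the same resources in their own syntax; the common thread is that the tool tracks state and reconciles it, which imperative SDK calls do not.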

96
Multi-Select · hard

An ML engineer is designing a SageMaker Pipeline for a computer vision model. The pipeline includes steps for data processing, training, evaluation, and registration. The engineer wants to enable caching to avoid reprocessing when step inputs have not changed. For which steps is caching supported? (Select TWO.)

Select 2 answers
A. Processing step
B. Transform step
C. Condition step
D. Lambda step
E. RegisterModel step
Answers: A, B

Processing steps support caching.

Why this answer

Caching is supported for the following step types: Processing, Training, Tuning, Transform, and AutoML. Condition steps and Lambda steps do not support caching because they are control flow steps.

97
MCQ · hard

A team needs to deploy a PyTorch model that uses custom CUDA kernels. They want to use NVIDIA Triton Inference Server on SageMaker for high-performance serving. Which SageMaker configuration is required to use Triton?

A. Create a custom container from scratch with Triton and deploy on SageMaker
B. Use the SageMaker pre-built Triton Inference Server container available in Amazon ECR
C. Use a Multi-Model Endpoint with Triton
D. Attach an Amazon Elastic Inference accelerator to the endpoint
Answer: B

SageMaker provides a pre-built container with Triton, ready for deployment.

Why this answer

Option B is correct because SageMaker provides a pre-built Triton Inference Server container in Amazon ECR that is optimized for high-performance serving of models, including those with custom CUDA kernels. This container eliminates the need to build a custom image from scratch, ensuring compatibility with SageMaker's deployment infrastructure and reducing operational overhead.

Exam trap

The exam often tests the misconception that custom containers are always required for custom code; the trap here is that SageMaker's pre-built Triton container supports custom CUDA kernels, making option A a redundant and incorrect choice.

How to eliminate wrong answers

Option A is wrong because creating a custom container from scratch is unnecessary and error-prone; SageMaker already offers a pre-built Triton container that handles the integration with SageMaker's hosting environment, including health checks and model loading. Option C is wrong because a Multi-Model Endpoint is a hosting mode for serving many models from one endpoint, not a requirement for using Triton; the team has a single model, which Triton can serve from a standard endpoint using the pre-built container. Option D is wrong because Amazon Elastic Inference accelerators are deprecated and do not support custom CUDA kernels or Triton; they are limited to specific frameworks like TensorFlow and PyTorch without custom ops, and they cannot accelerate custom CUDA code.

98
MCQ · medium

A team has 200 small ML models that need to be served via HTTPS endpoints. Each model is used infrequently, and the team wants to minimize hosting costs. Which SageMaker deployment approach is MOST cost-effective?

A. Use SageMaker Serverless Inference for each model
B. Deploy each model on a separate real-time endpoint
C. Use Batch Transform for all models
D. Use a single multi-model endpoint (MME)
Answer: D

MME dynamically loads models from Amazon S3 onto shared instances, minimizing cost for many infrequently used models.

Why this answer

Multi-model endpoints (MME) allow hosting multiple models on a single endpoint, sharing instances and reducing costs, especially for infrequently used models.
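
With a multi-model endpoint, the caller chooses the model per request through the `TargetModel` parameter, which names an artifact relative to the endpoint's S3 model prefix. The sketch below shows that call shape; the endpoint name, artifact name, and payload are hypothetical, and a stand-in client returns a canned response so the example runs offline.

```python
import io


def invoke_model(runtime_client, endpoint_name, model_artifact, payload):
    """Invoke one specific model hosted on a multi-model endpoint."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        TargetModel=model_artifact,  # e.g. "model-017.tar.gz" under the S3 prefix
        Body=payload,
    )
    return response["Body"].read()


class StubRuntime:
    """Stand-in for boto3.client("sagemaker-runtime"); returns a canned prediction."""

    def invoke_endpoint(self, **kwargs):
        self.last_call = kwargs
        return {"Body": io.BytesIO(b"0.87")}


stub = StubRuntime()
prediction = invoke_model(stub, "example-mme", "model-017.tar.gz", b"1.0,2.0,3.0")
```

The first request for a given model may be slower because the endpoint has to load that artifact from S3 before serving it.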

99
MCQ · medium

A machine learning team uses SageMaker Pipelines to automate retraining. They want to avoid re-running data processing steps if the data has not changed since the last successful pipeline run. Which built-in feature should they enable?

A. Pipeline caching
B. Model lineage tracking
C. Parameterized pipeline executions
D. Step parallelism
Answer: A

Caching reuses step outputs when inputs and configuration haven't changed, avoiding redundant processing.

Why this answer

Pipeline caching is the correct choice because SageMaker Pipelines can cache the outputs of each step based on a hash of the step's input parameters, configuration, and code. If the hash matches a previous successful run, the cached output is reused, avoiding redundant execution of data processing steps when the underlying data hasn't changed.

Exam trap

The trap here is that candidates confuse lineage tracking (Option B) with caching, assuming that tracking data versions automatically prevents re-execution, when in fact lineage only records history without affecting pipeline execution behavior.

How to eliminate wrong answers

Option B is wrong because model lineage tracking (via SageMaker ML Lineage Tracking) records the relationships between data, models, and training jobs, but it does not prevent re-running steps; it only provides auditability and provenance. Option C is wrong because parameterized pipeline executions allow you to pass different input values at runtime, but they do not automatically skip unchanged steps—caching is required for that. Option D is wrong because step parallelism controls the concurrency of step execution within a pipeline, not the reuse of previous outputs.
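
The idea behind step caching can be pictured with a small, self-contained model. This is a simplified sketch of the concept, not SageMaker's actual implementation: the cache key here is a hash of the step name and its arguments, and the step names and S3 URIs are hypothetical.

```python
import hashlib
import json

_cache = {}  # cache key -> stored step output


def cache_key(step_name, arguments):
    """Derive a deterministic key from the step name and its arguments."""
    payload = json.dumps({"step": step_name, "args": arguments}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_step(step_name, arguments, execute):
    """Run a step, or reuse the stored output when the same key was seen before."""
    key = cache_key(step_name, arguments)
    if key in _cache:
        return _cache[key], "cache hit"
    output = execute(arguments)
    _cache[key] = output
    return output, "executed"


calls = []


def process(arguments):
    """Stand-in for an expensive data processing job."""
    calls.append(arguments["input_uri"])
    return "s3://example-bucket/processed/"


first = run_step("Preprocess", {"input_uri": "s3://example-bucket/raw/v1/"}, process)
second = run_step("Preprocess", {"input_uri": "s3://example-bucket/raw/v1/"}, process)
third = run_step("Preprocess", {"input_uri": "s3://example-bucket/raw/v2/"}, process)
```

The second call reuses the stored output because its arguments are identical; the third runs again because the input URI changed.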

100
Multi-Selectmedium

A company is using AWS Step Functions to orchestrate their ML retraining pipeline. They want to trigger retraining when new data arrives, but only if the model's performance has degraded below a threshold. Which THREE AWS services should they use together to achieve this? (Choose three.)

Select 3 answers
A.AWS Step Functions
B.AWS Lambda
C.Amazon EventBridge
D.Amazon CloudWatch Logs
E.SageMaker Model Registry
AnswersA, B, C

Step Functions orchestrates the retraining pipeline.

Why this answer

One workable design: Amazon EventBridge receives the S3 event for the new data and invokes a Lambda function. The Lambda function checks model performance (e.g., via SageMaker Model Monitor results or custom metrics) and starts the Step Functions workflow only if performance has fallen below the threshold. Of the remaining options, CloudWatch Logs stores log data and is not directly involved in the trigger logic, and SageMaker Model Registry versions and tracks models rather than triggering or orchestrating retraining. SageMaker Pipelines could replace Step Functions, but it is not listed as an option and the scenario already specifies Step Functions.
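
The gating logic inside the Lambda function might look roughly like the sketch below. The event fields, the threshold, and the helper names are hypothetical; `start_execution` is passed in so the logic can be exercised without AWS access, and in a real function it would likely be `boto3.client("stepfunctions").start_execution`.

```python
import json


def should_retrain(current_score, threshold):
    """Retrain only when the monitored score has dropped below the threshold."""
    return current_score < threshold


def handle_new_data(event, current_score, threshold, start_execution):
    """Start the retraining workflow if, and only if, performance has degraded."""
    if not should_retrain(current_score, threshold):
        return {"retraining_started": False}
    start_execution(
        stateMachineArn=event["state_machine_arn"],
        input=json.dumps({"data_uri": event["data_uri"]}),
    )
    return {"retraining_started": True}


started = []


def fake_start_execution(**kwargs):
    """Test double that records the request instead of calling Step Functions."""
    started.append(kwargs)


event = {
    "state_machine_arn": "arn:aws:states:us-east-1:123456789012:stateMachine:retrain",
    "data_uri": "s3://example-bucket/new-data/",
}
healthy = handle_new_data(event, 0.93, 0.90, fake_start_execution)
degraded = handle_new_data(event, 0.84, 0.90, fake_start_execution)
```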

101
MCQeasy

A team wants to deploy a model that performs inference on large video files (up to 2 GB each) uploaded to an S3 bucket. The inference can tolerate a few minutes of latency. Which SageMaker inference option is most cost-effective?

A.Batch transform
B.Asynchronous inference
C.Serverless inference
D.Real-time endpoint
AnswerB

Asynchronous inference handles large payloads via S3 and processes them within minutes, with cost-effective scaling.

Why this answer

Asynchronous inference is the most cost-effective option for large video files with a tolerance for a few minutes of latency because it queues incoming requests, processes them in the background, and can scale down to zero instances when idle, eliminating the cost of idle compute. Requests reference an input object in S3 rather than carrying the payload inline, which is why it suits inputs far larger than the real-time and serverless limits. Note that its documented payload limit is 1 GB per request, so the 2 GB figure in the question exceeds what a single request officially supports; the answer stands because none of the other options comes close.

Exam trap

The exam often tests the payload size and timeout limits of Serverless inference (4 MB, 60 seconds) versus Asynchronous inference (1 GB, 15 minutes default timeout) to trick candidates into choosing Serverless for large files, ignoring its hard constraints.

How to eliminate wrong answers

Option A (Batch transform) is wrong because it is designed for offline processing of entire datasets as discrete jobs, not for near-real-time inference triggered by individual file uploads; reacting to S3 events would require additional orchestration, and every job pays instance start-up time before any data is processed. Option C (Serverless inference) is wrong because it has a maximum payload size of 4 MB and a maximum invocation timeout of 60 seconds, making it incapable of processing multi-GB video files. Option D (Real-time endpoint) is wrong because it requires always-on instances that incur costs even when idle, and its payload limit and 60-second timeout are insufficient for processing large video files.
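
To make the S3-based request flow concrete, here is a minimal sketch that builds the arguments for the SageMaker runtime `invoke_endpoint_async` call. The endpoint name, bucket, content type, and helper name are hypothetical, and the one-hour upper bound on the timeout is stated as an assumption about the documented limit.

```python
MAX_INVOCATION_TIMEOUT_SECONDS = 3600  # assumed documented upper bound


def build_async_request(endpoint_name, input_s3_uri, timeout_seconds=900):
    """Build kwargs for sagemaker-runtime invoke_endpoint_async."""
    if not input_s3_uri.startswith("s3://"):
        raise ValueError("asynchronous inference reads its input from S3")
    if not 1 <= timeout_seconds <= MAX_INVOCATION_TIMEOUT_SECONDS:
        raise ValueError("timeout outside the supported range")
    return {
        "EndpointName": endpoint_name,
        "InputLocation": input_s3_uri,  # the request points at the object, it does not embed it
        "ContentType": "video/mp4",
        "InvocationTimeoutSeconds": timeout_seconds,
    }


request = build_async_request("video-async", "s3://example-bucket/uploads/clip.mp4")
# With boto3 available this could be sent as:
#   boto3.client("sagemaker-runtime").invoke_endpoint_async(**request)
```

The response to such a call contains an output location in S3 rather than the prediction itself, which is what lets the caller return immediately.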

102
Multi-Selectmedium

A company uses SageMaker Pipelines to automate their ML workflow. They need to add model versioning and approval workflow. Which THREE steps should they include in their pipeline to achieve this? (Choose THREE.)

Select 3 answers
A.RegisterModel step
B.Training step
C.Condition step
D.Processing step for evaluation
E.Transform step
AnswersA, C, D

This step creates a new model version in the Model Registry.

Why this answer

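The RegisterModel step (A) creates a model package in SageMaker Model Registry, which provides versioning and an approval status for each model version. The Processing step for evaluation (D) computes the metrics that the approval decision depends on, and the Condition step (C) gates registration on those metrics so that only models meeting the threshold are registered. The Training step produces the model artifact but adds no versioning or approval, and the Transform step runs batch inference, which is unrelated to the requirement.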

Exam trap

The trap here is that candidates may think the Training step alone suffices for versioning, but AWS explicitly separates model training from model registration, requiring the RegisterModel step for registry integration.
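
How the three selected steps fit together can be pictured with a plain-Python stand-in. This is a conceptual sketch, not the SageMaker Pipelines SDK: the list `registry` plays the role of the Model Registry, and the function names and accuracy threshold are hypothetical.

```python
def evaluate(predictions, labels):
    """Evaluation step: fraction of predictions that match the labels."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def run_pipeline(predictions, labels, min_accuracy, registry):
    """Evaluate, apply the condition, and register a new version if it passes."""
    accuracy = evaluate(predictions, labels)
    if accuracy >= min_accuracy:  # Condition step
        entry = {
            "version": len(registry) + 1,  # RegisterModel step: next version number
            "accuracy": accuracy,
            "approval_status": "PendingManualApproval",
        }
        registry.append(entry)
        return entry
    return None


registry = []
accepted = run_pipeline([1, 0, 1, 1], [1, 0, 1, 0], 0.70, registry)
rejected = run_pipeline([1, 0, 1, 1], [1, 0, 1, 0], 0.90, registry)
```

A newly registered version starts in a pending state, and a human or an automated rule later flips it to approved before deployment.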

103
Multi-Selecteasy

A company wants to trigger a model retraining pipeline whenever new training data arrives in an S3 bucket. They also need to send a notification to a Slack channel when the retraining completes. Which TWO AWS services should they use to implement this event-driven workflow? (Select TWO.)

Select 2 answers
A.Amazon SQS
B.AWS Lambda
C.AWS CloudTrail
D.Amazon EventBridge
E.SageMaker Model Registry
AnswersB, D

Why this answer

AWS Lambda is correct because it can be triggered directly by S3 events (e.g., s3:ObjectCreated) to invoke the model retraining pipeline. Amazon EventBridge is correct because it can capture completion events from the retraining pipeline (e.g., SageMaker training job state changes) and route them to a target like a Slack webhook via Lambda or SNS, enabling the notification workflow.

Exam trap

The exam often tests the distinction between services that trigger actions (Lambda, EventBridge) versus services that store or audit (SQS, CloudTrail, Model Registry), leading candidates to pick SQS for decoupling or CloudTrail for monitoring, which are not event-driven triggers for this workflow.
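
The completion half of this workflow can be sketched as an EventBridge event pattern plus a small formatter for the Slack message. The `matches` function is a deliberately simplified stand-in for EventBridge's pattern matching (exact values only), and the job name is hypothetical.

```python
# Event pattern for SageMaker training jobs that have finished.
TRAINING_FINISHED_PATTERN = {
    "source": ["aws.sagemaker"],
    "detail-type": ["SageMaker Training Job State Change"],
    "detail": {"TrainingJobStatus": ["Completed", "Failed"]},
}


def matches(pattern, event):
    """Simplified matcher: every pattern key must exist and hold an allowed value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


def slack_message(event):
    """Build the JSON body for a Slack incoming webhook."""
    detail = event["detail"]
    return {
        "text": "Training job {} finished with status {}".format(
            detail["TrainingJobName"], detail["TrainingJobStatus"]
        )
    }


event = {
    "source": "aws.sagemaker",
    "detail-type": "SageMaker Training Job State Change",
    "detail": {"TrainingJobName": "churn-retrain-001", "TrainingJobStatus": "Completed"},
}
in_progress = {**event, "detail": {**event["detail"], "TrainingJobStatus": "InProgress"}}
```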

104
MCQeasy

A company has 50 small PyTorch models that are used infrequently for inference. They want to minimize costs while maintaining the ability to serve all models from a single endpoint. Which SageMaker feature should they use?

A.Multi-container endpoint
B.Batch transform job
C.Real-time endpoint with 50 production variants
D.Multi-model endpoint
AnswerD

MME hosts many models on one endpoint, loading each model on demand. Ideal for many small, infrequently used models.

Why this answer

Multi-model endpoints (MME) allow hosting multiple models on a single endpoint, loading models dynamically based on the target model in the request. This reduces cost for many small, infrequently used models by sharing the underlying instance.

105
MCQeasy

A company wants to deploy a PyTorch model that uses dynamic batching and model ensemble. They need to serve multiple models with different frameworks (PyTorch, TensorFlow) within the same endpoint. Which SageMaker feature should they use?

A.Triton Inference Server on SageMaker
B.Multi-container endpoint
C.Separate endpoints for each framework
D.Multi-model endpoint (MME)
AnswerB

Multi-container endpoints allow up to 15 containers, each with its own framework, running on the same instance.

Why this answer

B is correct because a multi-container endpoint runs multiple containers (e.g., one for PyTorch, one for TensorFlow) behind a single SageMaker endpoint, so models with heterogeneous frameworks and dependencies can be served without separate endpoints while clients still see one inference endpoint. Containers can be invoked individually or chained as a serial inference pipeline, which is how an ensemble can be assembled. Note that the endpoint itself does not batch requests; dynamic batching depends on the model server running inside each container.

Exam trap

The exam often tests the distinction between multi-model endpoints (which share a container) and multi-container endpoints (which run separate containers), so the trap here is assuming that MME can handle different frameworks, when in fact it requires all models to be compatible with the same container environment.

How to eliminate wrong answers

Option A is wrong because Triton Inference Server on SageMaker is optimized for high-performance inference with GPU acceleration and supports multiple frameworks within a single container, but it does not natively support running separate containers for different frameworks within the same endpoint; it is a single-container solution. Option C is wrong because separate endpoints for each framework would require managing multiple endpoints, increasing latency for ensemble requests and complicating orchestration, which contradicts the requirement for a single endpoint. Option D is wrong because a multi-model endpoint (MME) hosts multiple models within a single container, but it does not support running different frameworks (e.g., PyTorch and TensorFlow) simultaneously within the same endpoint, as MME requires all models to share the same container environment and inference code.
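
For orientation, the sketch below assembles the request body for the SageMaker `create_model` API with several containers in direct-invocation mode. The image URIs, S3 paths, role ARN, and helper name are hypothetical placeholders.

```python
MAX_CONTAINERS = 15  # limit quoted in the answer summary above


def build_multi_container_model(model_name, role_arn, containers):
    """Build kwargs for the SageMaker create_model API with several containers."""
    if not 1 <= len(containers) <= MAX_CONTAINERS:
        raise ValueError("a multi-container endpoint takes 1 to 15 containers")
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "Containers": [
            {
                "ContainerHostname": c["hostname"],
                "Image": c["image"],
                "ModelDataUrl": c["model_data"],
            }
            for c in containers
        ],
        # "Direct" lets each request choose a container; "Serial" chains them.
        "InferenceExecutionConfig": {"Mode": "Direct"},
    }


model_request = build_multi_container_model(
    "mixed-frameworks",
    "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    [
        {"hostname": "pytorch", "image": "<pytorch-image-uri>", "model_data": "s3://example-bucket/pt/model.tar.gz"},
        {"hostname": "tensorflow", "image": "<tensorflow-image-uri>", "model_data": "s3://example-bucket/tf/model.tar.gz"},
    ],
)
```

In direct mode, a client picks the container per request by passing `TargetContainerHostname` (for example `"pytorch"`) to `invoke_endpoint`.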

106
Multi-Selectmedium

A data science team uses SageMaker Pipelines to automate their ML workflow. They want to reduce costs by reusing outputs from previous pipeline runs when the input data and code have not changed. Which TWO actions should they take? (Choose two.)

Select 2 answers
A.Set the StepStatus of successful steps to 'Cached'
B.Use parallel execution of pipeline steps
C.Create multiple pipeline versions for each run
D.Disable caching for all steps to avoid unnecessary storage costs
E.Enable step caching in the pipeline definition
AnswersA, E

Enabling step caching is the configuration change; 'Cached' is the status a reused step then reports.

Why this answer

Option E is the action the team actually takes: enabling step caching in the pipeline definition makes SageMaker Pipelines compute a cache key from each step's inputs, code, and parameters and, when the key matches a previous successful run, skip re-execution and reuse the cached output, which reduces compute costs. Option A is worded as an action, but a status of 'Cached' is not something the team sets by hand; it is what a step reports once caching has reused a prior result. The answer key accepts A alongside E because it describes that outcome, and none of the remaining options (parallel execution, extra pipeline versions, disabling caching) leads to output reuse.

Exam trap

The trap here is treating 'Set the StepStatus to Cached' (Option A) as a manual action. In practice the 'Cached' status is an automatic result of enabling step caching (Option E), which is the only configuration change needed to reuse outputs.
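
As a hedged illustration of what "enabling step caching in the pipeline definition" amounts to, the sketch below adds a `CacheConfig` entry to a step in the pipeline definition JSON. The step name and the helper `with_step_caching` are hypothetical, and the exact key names are stated here as an assumption about the definition schema.

```python
import json


def with_step_caching(step_definition, expire_after="P30D"):
    """Return a copy of a step definition with caching switched on."""
    updated = dict(step_definition)
    # ExpireAfter is an ISO 8601 duration; after it elapses the step runs again.
    updated["CacheConfig"] = {"Enabled": True, "ExpireAfter": expire_after}
    return updated


processing_step = {"Name": "PreprocessData", "Type": "Processing", "Arguments": {}}
cached_step = with_step_caching(processing_step)
print(json.dumps(cached_step, indent=2))
```

In the SageMaker Python SDK the equivalent is typically a `CacheConfig(enable_caching=True, expire_after="P30D")` object passed to the step's `cache_config` argument.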

107
Multi-Selectmedium

A machine learning team needs to deploy a PyTorch model that has been compiled with SageMaker Neo to improve inference performance on edge devices. Which TWO statements about SageMaker Neo are correct? (Select TWO.)

Select 2 answers
A.Neo reduces model inference latency through optimization techniques
B.Neo requires the model to be trained on SageMaker
C.Neo compiles models for a specific hardware target, such as Intel or ARM
D.Neo can only compile models trained with SageMaker built-in algorithms
E.Neo automatically scales SageMaker endpoints based on demand
AnswersA, C

Why this answer

SageMaker Neo compiles an already-trained model for a specific hardware target (e.g., ARM, Intel, NVIDIA), and the optimizations it applies reduce inference latency. It does not require the model to have been trained on SageMaker, and it is not limited to built-in algorithms. It does not automatically scale endpoints.
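
A Neo compilation is requested through the SageMaker `create_compilation_job` API. The sketch below only assembles the request; the job name, role ARN, S3 paths, input shape, and target device are hypothetical choices for illustration.

```python
import json


def build_neo_compilation_job(job_name, role_arn, model_s3_uri, output_s3_uri, target_device):
    """Build kwargs for the SageMaker create_compilation_job API."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            # Input name and shape the compiled model expects (assumed image input).
            "DataInputConfig": json.dumps({"input0": [1, 3, 224, 224]}),
            "Framework": "PYTORCH",
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetDevice": target_device,  # the hardware the model is compiled for
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }


job = build_neo_compilation_job(
    "resnet-edge-compile",
    "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    "s3://example-bucket/models/resnet.tar.gz",
    "s3://example-bucket/compiled/",
    "jetson_nano",
)
```

Nothing in the request refers to how or where the model was trained, which is consistent with options B and D being incorrect.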

108
MCQhard

A financial services company needs to deploy a machine learning model for real-time fraud detection. The model must be highly available across multiple Availability Zones and must support automatic scaling based on request volume. The company also needs to perform canary deployments to test new model versions with a small percentage of traffic before full rollout. Which SageMaker feature should they use?

A.SageMaker real-time endpoint with production variants
B.SageMaker Multi-Model Endpoint
C.SageMaker Batch Transform
D.SageMaker Serverless Inference
AnswerA

Real-time endpoints support multi-AZ, auto scaling, and traffic splitting for canary deployments.

Why this answer

SageMaker real-time endpoints with production variants enable canary deployments by routing a small percentage of traffic to a new model version while the majority goes to the current version. This feature also supports multi-AZ deployment for high availability and automatic scaling based on request volume via Application Auto Scaling, meeting all the stated requirements.

Exam trap

The exam often tests the distinction between real-time endpoints with production variants and Multi-Model Endpoints, where candidates mistakenly think Multi-Model Endpoints support canary deployments because they can host multiple models, but they lack traffic splitting and weighted routing capabilities.

How to eliminate wrong answers

Option B is wrong because SageMaker Multi-Model Endpoint hosts multiple models on the same endpoint but does not support canary deployments or traffic shifting between model versions; it is designed for cost-efficient hosting of many models, not staged rollouts. Option C is wrong because SageMaker Batch Transform is for offline, asynchronous inference on large datasets, not real-time fraud detection with low latency and automatic scaling. Option D is wrong because SageMaker Serverless Inference automatically scales to zero and has a cold start latency that is unsuitable for real-time fraud detection requiring consistent sub-second response times, and it does not support canary deployments with traffic splitting.
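
Traffic splitting between production variants comes down to relative weights in the endpoint configuration. The sketch below builds a `create_endpoint_config` request with a small canary share; the model names, instance type, instance counts, and helper names are hypothetical.

```python
def build_canary_config(config_name, current_model, candidate_model, canary_percent):
    """Build kwargs for create_endpoint_config with a weighted canary variant."""
    if not 0 < canary_percent < 100:
        raise ValueError("canary_percent must be between 0 and 100")

    def variant(name, model, weight):
        return {
            "VariantName": name,
            "ModelName": model,
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 2,  # two or more instances can be spread across AZs
            "InitialVariantWeight": float(weight),
        }

    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            variant("current", current_model, 100 - canary_percent),
            variant("canary", candidate_model, canary_percent),
        ],
    }


def traffic_share(config, variant_name):
    """Fraction of requests a variant receives, given the relative weights."""
    variants = config["ProductionVariants"]
    total = sum(v["InitialVariantWeight"] for v in variants)
    weight = next(v["InitialVariantWeight"] for v in variants if v["VariantName"] == variant_name)
    return weight / total


config = build_canary_config("fraud-canary", "fraud-v1", "fraud-v2", 10)
```

Weights are relative, so shifting more traffic to the new version later is a matter of updating the variant weights rather than redeploying the endpoint.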
