AWS Certified Machine Learning Engineer Associate (MLA-C01): Questions 676–750

676

MCQ · easy

A machine learning engineer is preparing a dataset that contains both numerical and categorical features. The categorical features have high cardinality (e.g., zip code with thousands of unique values). Which technique is most appropriate for encoding these high-cardinality categorical features?

A. Label encoding

B. One-hot encoding

C. Frequency encoding

D. Target encoding

Answer: D

Encodes using target mean, handles high cardinality well.

Why this answer

Target encoding is the most appropriate technique for high-cardinality categorical features because it replaces each category with the mean of the target variable for that category, effectively capturing the predictive signal while keeping the feature as a single numeric column. This avoids the dimensionality explosion of one-hot encoding and the arbitrary ordinality of label encoding, making it a common choice in gradient boosting frameworks like XGBoost or LightGBM for datasets with thousands of unique categories.

Exam trap

AWS often tests the misconception that one-hot encoding is always the safest choice for categorical data, but candidates fail to recognize that high cardinality makes it impractical, leading them to overlook target encoding as a more efficient alternative.

How to eliminate wrong answers

Option A is wrong because label encoding assigns arbitrary integer values to categories, which introduces a false ordinal relationship that can mislead tree-based models into treating high-cardinality features as ordered, degrading performance. Option B is wrong because one-hot encoding creates a binary column for each unique category, which with thousands of categories leads to an extremely high-dimensional and sparse feature space, causing memory issues and overfitting. Option C is wrong because frequency encoding replaces categories with their occurrence counts, which loses the relationship between the category and the target variable, often resulting in weaker predictive power compared to target encoding.
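The idea behind target encoding can be sketched in a few lines of pandas. This is a minimal illustration rather than a production recipe: the column names (`zip_code`, `churned`), the `target_encode` helper, and the `smoothing` parameter are assumptions introduced here, and the smoothing formula is one common variant that pulls rare categories toward the global mean.

```python
import pandas as pd


def target_encode(train: pd.DataFrame, column: str, target: str,
                  smoothing: float = 10.0) -> pd.Series:
    """Return a smoothed mean-target value for each category in `column`."""
    global_mean = train[target].mean()
    stats = train.groupby(column)[target].agg(["mean", "count"])
    # Blend each category mean with the global mean; categories with few
    # rows lean more heavily on the global mean.
    return (stats["count"] * stats["mean"] + smoothing * global_mean) / (
        stats["count"] + smoothing
    )


# Hypothetical toy data: one high-cardinality column and a binary target.
df = pd.DataFrame({
    "zip_code": ["98101", "98101", "10001", "10001", "10001", "60601"],
    "churned": [1, 0, 1, 1, 1, 0],
})

mapping = target_encode(df, "zip_code", "churned", smoothing=1.0)

# The feature stays a single numeric column. Categories not seen during
# fitting fall back to the global mean.
df["zip_code_te"] = df["zip_code"].map(mapping).fillna(df["churned"].mean())
print(df)
```

Because the encoding is derived from the target, it is usually fitted on training folds only and then applied to validation and test data, to avoid leaking labels into the features.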

677

MCQ · medium

An ML team is developing a regression model using Amazon SageMaker. They have a 100 GB CSV dataset stored in Amazon S3. The data is contained in a single large file. They launch a SageMaker training job with an ml.p3.8xlarge instance using a custom Docker container. The training script loads the data using pandas' read_csv from S3 directly. The team observes that the training job takes over 24 hours, and CloudWatch metrics show: GPU utilization is consistently above 90%, but CPU utilization is below 30%. Network I/O is moderate, and disk I/O is low. The team has already tried switching to a larger instance type (ml.p3.16xlarge) with no significant improvement. They need to reduce training time. Which action is MOST likely to achieve this?

A. Use SageMaker Pipe Mode to stream data directly from S3 to the algorithm, bypassing the local file system.

B. Split the CSV file into multiple smaller files (e.g., 100 MB each) and update the training script to read from a list of files in S3.

C. Use Amazon SageMaker Managed Spot Training to reduce cost, then use the savings to rent a larger instance.

D. Increase the number of training instances by using a distributed training configuration with Horovod.

Answer: B

This allows SageMaker to parallelize data loading across multiple instances or even multiple processes within one instance, improving I/O throughput.

Why this answer

The bottleneck is data loading. A single large CSV file prevents parallel reads, so splitting it into many smaller files lets the training job load data in parallel, across instances or across processes on one instance, and improves I/O throughput. Option A is less suitable because Pipe mode streams data from S3, but the custom container and training script must be written to consume the stream; splitting the file is the simpler change.

Option D is wrong because adding instances does not help while the data is a single file: each instance still reads the same file. A larger instance type has already been tried without improvement. Option C is wrong because Managed Spot Training reduces cost, not training time.

Local disk (EBS) performance is not the constraint either, since disk I/O is low.
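Splitting one large CSV into smaller parts can be sketched with pandas' chunked reader. This is a local illustration under stated assumptions: the `split_csv` helper, the `part-00000.csv` naming scheme, and the tiny demo file are hypothetical, and a real 100 GB file would more likely be split with a distributed tool and uploaded to S3 afterwards.

```python
import os
import tempfile
from typing import List

import pandas as pd


def split_csv(source_path: str, out_dir: str, rows_per_file: int) -> List[str]:
    """Write `source_path` as several smaller CSV files and return their paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    # chunksize makes read_csv yield DataFrames of at most rows_per_file rows,
    # so the whole file never has to fit in memory at once.
    for i, chunk in enumerate(pd.read_csv(source_path, chunksize=rows_per_file)):
        path = os.path.join(out_dir, f"part-{i:05d}.csv")
        chunk.to_csv(path, index=False)
        paths.append(path)
    return paths


# Demo on a tiny stand-in file (10 rows, split into parts of 4 rows).
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "train.csv")
pd.DataFrame({"x": range(10), "y": range(10)}).to_csv(source, index=False)

parts = split_csv(source, os.path.join(workdir, "parts"), rows_per_file=4)
print(parts)
```

Each part keeps the header row, so any part can be read on its own by a separate worker or process.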

678

MCQ · medium

A company uses Amazon SageMaker Ground Truth to label images for object detection. They want to minimize labeling costs while maintaining high accuracy. Which feature should they enable?

A. Active learning to automatically select samples for labeling

B. Use of Amazon Mechanical Turk for all labeling

C. Pre-built annotation workflows for bounding boxes

D. Automated data labeling with AWS Lambda

Answer: A

Active learning prioritizes samples where the model is uncertain, reducing labeling effort.

Why this answer

Active learning in Ground Truth selects the most informative images for labeling, reducing the number of labels needed while maintaining model quality.

679

MCQ · medium

A trained model needs to be deployed for real-time inference with low latency. Which AWS service is best suited for this?

A. SageMaker Batch Transform

B. SageMaker endpoints

C. SageMaker Hyperparameter Tuning

D. AWS Lambda with model packaged

Answer: B

Endpoints are designed for real-time inference with automatic scaling and low latency.

Why this answer

SageMaker endpoints are designed for real-time inference by provisioning persistent, auto-scaled HTTPS endpoints that return predictions with millisecond latency. They support automatic scaling, A/B testing, and can be deployed behind a VPC for low-latency access, making them the ideal choice for serving a trained model in production.

Exam trap

AWS often tests the distinction between batch and real-time inference, and the trap here is that candidates confuse SageMaker Batch Transform (which processes data in bulk) with a real-time serving solution, or they overestimate Lambda's ability to handle large model payloads and sustained low-latency requests.

How to eliminate wrong answers

Option A is wrong because SageMaker Batch Transform is an asynchronous, batch-processing service that processes large datasets in chunks and returns results to S3, not suitable for real-time, low-latency inference. Option C is wrong because SageMaker Hyperparameter Tuning is a model training optimization process that searches for optimal hyperparameters, not a deployment or inference service. Option D is wrong because AWS Lambda has a maximum execution timeout of 15 minutes and a payload limit of 6 MB, making it impractical for hosting large models or handling sustained real-time inference with low latency; it is better suited for lightweight, event-driven tasks.

680

MCQ · hard

A financial services company is training a large natural language processing (NLP) model using PyTorch on a SageMaker distributed training job. The cluster consists of 4 ml.p3.16xlarge instances (8 GPUs each). The training job runs successfully but takes 72 hours, exceeding the allotted 48-hour window. The team must reduce training time without sacrificing model quality. The model architecture has 1.5 billion parameters and currently uses the SageMaker data parallel library with Horovod for all-reduce. Observing CloudWatch metrics, the team notices that GPU utilization averages only 45% and network throughput is near maximum. Which action will most effectively reduce training time?

A. Enable Elastic Fabric Adapter (EFA) for faster inter-node connectivity.

B. Increase the batch size to improve GPU utilization.

C. Increase the number of instances from 4 to 8 to add more GPUs.

D. Switch to SageMaker model parallel library with pipeline parallelism to reduce communication overhead.

Answer: D

Model parallelism partitions the model across devices, reducing communication volume and improving utilization.

Why this answer

Option D is correct because low GPU utilization combined with near-maximum network throughput indicates that communication overhead is the bottleneck. Model parallelism splits the model across GPUs, which reduces the need for frequent all-reduce operations over large gradients and so improves GPU utilization. Option C is wrong because increasing the instance count would add communication overhead and is unlikely to improve utilization.

Option B is wrong because data parallelism already keeps every GPU busy with its own mini-batch, and increasing the batch size may cause out-of-memory errors. Option A is wrong because, although EFA improves inter-node networking, the network is already near maximum; the bottleneck is the frequency of communication rather than link speed.
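A rough back-of-the-envelope calculation shows why gradient synchronization can saturate the network in this scenario. The sketch below assumes a plain ring all-reduce, in which each worker sends about 2(N-1)/N times the gradient size per step, and assumes 32-bit gradients (4 bytes per parameter); neither assumption comes from the question, and real libraries may use hierarchical or compressed schemes.

```python
def ring_allreduce_bytes_per_gpu(num_params: int, bytes_per_param: int,
                                 num_gpus: int) -> float:
    """Bytes each GPU sends per step in a ring all-reduce.

    Uses the standard estimate 2 * (N - 1) / N * gradient_size.
    """
    gradient_bytes = num_params * bytes_per_param
    return 2 * (num_gpus - 1) / num_gpus * gradient_bytes


# Figures from the question: 1.5 billion parameters, 4 instances x 8 GPUs.
params = 1_500_000_000
gpus = 4 * 8

# Assumption: fp32 gradients, i.e. 4 bytes per parameter.
per_step = ring_allreduce_bytes_per_gpu(params, 4, gpus)
print(f"{per_step / 1e9:.2f} GB sent per GPU per training step")
```

Under these assumptions every GPU exchanges on the order of ten gigabytes per step, which is consistent with GPUs idling while they wait on the network.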

681

MCQ · easy

Refer to the exhibit. The Glue job reads a CSV file and attempts to write to a Parquet table. What is the most likely cause of this error?

A. The 'price' column is missing from some rows

B. The schema inference incorrectly detected the column as String

C. The 'price' column contains non-numeric values in some rows

D. The CSV file is compressed and not properly decompressed

Answer: C

Non-numeric strings like 'N/A' or commas cause conversion errors.

Why this answer

Option C is correct because the error message indicates a 'NumberFormatException' when parsing the 'price' column, which occurs when Spark attempts to convert a string value to a numeric type. Since the Glue job's schema inference likely detected 'price' as a numeric column based on the majority of rows, any row containing a non-numeric value (e.g., 'N/A', 'null', or a currency symbol) will cause this parsing failure during the write to Parquet.

Exam trap

AWS often tests the distinction between schema inference behavior and runtime type conversion errors, where candidates mistakenly attribute the error to missing data or schema detection rather than the actual parsing failure caused by malformed values.

How to eliminate wrong answers

Option A is wrong because missing values in a column would result in a null value, not a NumberFormatException; Spark can handle nulls in numeric columns without throwing a parsing error. Option B is wrong because if the schema inference had incorrectly detected the column as String, the write to Parquet would succeed without any type conversion error; the error occurs only when Spark tries to parse a string as a number. Option D is wrong because compressed CSV files are automatically decompressed by Spark/Glue based on the file extension (e.g., .gz, .bz2), and a decompression issue would produce an IOException or a different error, not a NumberFormatException.
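One way to confirm this diagnosis before rerunning the job is to scan a sample of the file for values that do not parse as numbers. The sketch below uses pandas locally rather than Spark or Glue, and the sample rows and column names are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical sample of the CSV; in practice this would be read from a file.
csv_text = """item,price
widget,19.99
gadget,N/A
gizmo,$5.00
doohickey,7
"""

# Read every column as a string so that no row fails during parsing, and keep
# literal markers such as "N/A" instead of converting them to NaN.
df = pd.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)

# errors="coerce" turns anything that is not a valid number into NaN, which
# makes the offending rows easy to isolate.
parsed = pd.to_numeric(df["price"], errors="coerce")
bad_rows = df[parsed.isna()]
print(bad_rows)
```

Rows flagged this way can then be cleaned, defaulted, or routed to a quarantine location before the column is cast to a numeric type.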

682

MCQ · medium

A team has deployed a real-time inference endpoint. They need to monitor the latency experienced by end users, including network overhead. Which CloudWatch metric should they use?

A. ModelLatency

B. OverheadLatency

C. Latency

D. Invocations

Answer: B

OverheadLatency captures the additional latency from infrastructure, network, and container startup time.

Why this answer

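The OverheadLatency metric measures the time SageMaker adds on top of the model's own processing: the interval from when SageMaker receives the request until it returns the response to the client, minus ModelLatency. Among the options it is the only metric that reflects latency outside the model container, which is why it is the expected answer; ModelLatency covers only the time the model container takes to respond. Neither metric includes the network path between the client and AWS, so the full end-user latency is best estimated from ModelLatency plus OverheadLatency together with client-side measurements.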

Exam trap

The trap here is that candidates confuse ModelLatency (model-only time) with total user-perceived latency, overlooking that OverheadLatency explicitly includes network overhead and is the correct metric for end-user monitoring.

How to eliminate wrong answers

Option A is wrong because ModelLatency measures only the time the inference model takes to process a request, excluding network overhead, so it does not reflect the full end-user experience. Option C is wrong because Latency is not a standard CloudWatch metric for SageMaker endpoints; the correct metric names are ModelLatency and OverheadLatency. Option D is wrong because Invocations counts the number of inference requests, not latency, and provides no timing information.

683

MCQ · medium

A data scientist is using Amazon SageMaker Processing to run a feature engineering job. The job requires installing additional Python libraries not included in the default SageMaker containers. Which approach should the data scientist use to include these libraries?

A. Add the libraries to the `requirements.txt` file in the same S3 bucket as the script

B. Create a custom Docker image with the libraries installed and specify it in the ProcessingInput

C. Use Amazon EFS to store the libraries and mount them to the processing container

D. Use the `pip install` command within the processing script at runtime

Answer: B

A custom image ensures dependencies are available without runtime installation.

Why this answer

Option B is correct because SageMaker Processing jobs run in isolated containers that cannot install packages at runtime via pip without internet access or custom images. Creating a custom Docker image with the required libraries pre-installed ensures the environment is consistent, reproducible, and avoids dependency resolution failures during job execution. This approach aligns with SageMaker's best practice for custom dependencies.

Exam trap

The trap here is that candidates assume SageMaker containers have internet access by default or that a `requirements.txt` in S3 is automatically processed, but in reality, SageMaker Processing jobs often run in isolated subnets without outbound internet, making pip install impossible without a pre-built custom image.

How to eliminate wrong answers

Option A is wrong because a `requirements.txt` file in S3 is not automatically processed by SageMaker Processing; the container does not read it unless explicitly handled in a custom entry point or lifecycle script, and even then, pip install requires network access or a pre-built wheel. Option C is wrong because Amazon EFS is a file system for shared storage, not for distributing Python libraries; mounting EFS to a processing container would require custom network configuration and does not integrate with Python's import system without additional setup. Option D is wrong because `pip install` inside the processing script at runtime will fail if the container lacks internet access (common in VPC-only modes) or if the required build tools are missing, and it violates the principle of immutable infrastructure.

684

MCQ · hard

A data science team is using Amazon SageMaker Pipelines to orchestrate a multi-step workflow that includes data preprocessing, training, and model evaluation. They want to reuse the preprocessed data across multiple pipeline executions without re-running the preprocessing step if the source data hasn't changed. What should they configure?

A. Use SageMaker Training steps with checkpointing

B. Use SageMaker Processing steps with caching

C. Use SageMaker Feature Store to store the preprocessed features

D. Use SageMaker Data Wrangler for the preprocessing

Answer: B

Caching in SageMaker Pipelines reuses step outputs when inputs are identical, avoiding redundant computation.

Why this answer

Option B is correct because SageMaker Pipelines supports step caching, which lets the pipeline skip re-execution of the Processing step when its inputs and parameters have not changed. Caching is enabled by attaching a `CacheConfig` to the step, with caching turned on and an expiration period; when a later execution calls the step with identical arguments, Pipelines reuses the earlier output instead of recomputing it.

Exam trap

The trap here is that candidates may confuse checkpointing (for training resumption) with caching (for step reuse), or assume that Feature Store or Data Wrangler inherently provide caching, when in fact it is the step's explicit `CacheConfig` that enables this behavior in SageMaker Pipelines.

How to eliminate wrong answers

Option A is wrong because SageMaker Training steps with checkpointing are designed to save intermediate model state during training (e.g., for resuming from failures), not to cache or reuse preprocessed data across pipeline executions. Option C is wrong because SageMaker Feature Store is a managed repository for storing, sharing, and managing features for ML models, but it does not automatically cache the output of a preprocessing step; it requires explicit feature ingestion and retrieval, which adds complexity and does not directly address skipping the preprocessing step based on unchanged source data. Option D is wrong because SageMaker Data Wrangler is a visual interface for data preparation and feature engineering, but it does not provide built-in caching for pipeline steps; it can be used within a Processing step, but the caching behavior is a property of the Processing step itself, not of Data Wrangler.
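The behavior of step caching can be illustrated with a small conceptual model in plain Python. This is not the SageMaker SDK and does not reflect how Pipelines is implemented internally; `run_step`, `preprocess`, and the S3 URIs are hypothetical names used only to show the idea that a step is skipped when it is called again with identical arguments.

```python
import hashlib
import json

_cache = {}   # maps an argument fingerprint to a previously computed output
runs = []     # records every time the expensive step actually executes


def run_step(name, arguments, compute):
    """Run `compute(arguments)` unless the same step and arguments were seen before."""
    fingerprint = json.dumps([name, arguments], sort_keys=True).encode()
    key = hashlib.sha256(fingerprint).hexdigest()
    if key in _cache:
        return _cache[key]          # cache hit: reuse the earlier output
    result = compute(arguments)     # cache miss: do the work and remember it
    _cache[key] = result
    return result


def preprocess(arguments):
    """Stand-in for an expensive preprocessing job."""
    runs.append(arguments["input_uri"])
    return arguments["input_uri"].replace("raw", "processed")


args = {"input_uri": "s3://example-bucket/raw/2024-01-01/",
        "instance_type": "ml.m5.xlarge"}

first = run_step("Preprocess", args, preprocess)
second = run_step("Preprocess", dict(args), preprocess)  # identical arguments
third = run_step(
    "Preprocess",
    {**args, "input_uri": "s3://example-bucket/raw/2024-01-02/"},
    preprocess,
)
print(len(runs), first == second)
```

Note that a cache keyed on arguments does not notice data that changes in place under the same URI, so writing new source data to a new prefix is a common way to make a change visible to the cache.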

685

MCQ · medium

A data scientist is training a large model on SageMaker and wants to reduce training time by using multiple GPUs. The model is small enough to fit on a single GPU but training is slow. Which SageMaker feature should be used?

A. Data parallelism using SageMaker's Distributed Data Parallel

B. Use a larger instance with more vCPUs

C. Model parallelism using SageMaker's Model Parallel

D. Use Elastic Inference

Answer: A

Data parallelism distributes the training across multiple GPUs, reducing training time for models that fit on a single GPU.

Why this answer

SageMaker's Distributed Data Parallel (DDP) is the correct choice because it splits the mini-batch across multiple GPUs, allowing each GPU to hold a copy of the model and process a subset of the data simultaneously. This reduces training time for models that fit on a single GPU by leveraging data parallelism, where gradients are synchronized across GPUs after each step.

Exam trap

The trap here is that candidates confuse model parallelism (for large models) with data parallelism (for slow training of small models), or mistakenly think Elastic Inference can accelerate training when it is strictly for inference latency reduction.

How to eliminate wrong answers

Option B is wrong because using a larger instance with more vCPUs does not directly accelerate GPU-bound training; the bottleneck is GPU compute, not CPU cores. Option C is wrong because model parallelism is designed for models that are too large to fit on a single GPU, partitioning layers across devices, which adds communication overhead and is unnecessary when the model fits on one GPU. Option D is wrong because Elastic Inference attaches a separate accelerator for inference only, not for training, and cannot be used to speed up training loops.
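The core mechanism of data parallelism, in which each worker computes gradients on its own shard of the mini-batch and the results are averaged, can be checked numerically. The sketch below uses NumPy and a linear model with mean squared error as a stand-in for a real network; the shapes, the seed, and the four simulated workers are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))   # one mini-batch of 64 examples
y = rng.normal(size=64)
w = np.zeros(3)                # every worker holds the same copy of the weights


def mse_gradient(X, y, w):
    """Gradient of the mean squared error for a linear model y ~ X @ w."""
    return 2 * X.T @ (X @ w - y) / len(y)


# Single-device reference: gradient over the full mini-batch.
full = mse_gradient(X, y, w)

# Data parallel: split the batch into 4 equal shards, compute a local
# gradient per simulated worker, then average them (the all-reduce step).
shards = zip(np.array_split(X, 4), np.array_split(y, 4))
local = [mse_gradient(Xs, ys, w) for Xs, ys in shards]
averaged = np.mean(local, axis=0)

print(np.allclose(full, averaged))
```

With equal-sized shards the averaged gradient matches the full-batch gradient, so the model update is the same while each device processes only a quarter of the data.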

686

MCQ · hard

A data science team at a financial services company is deploying a real-time fraud detection model using Amazon SageMaker. The model is a gradient boosting classifier trained on historical transaction data. The model is deployed to a SageMaker endpoint with an ml.m5.large instance for real-time inference. After deployment, the team observes that the endpoint's latency spikes to over 2 seconds during peak hours (10:00-12:00 and 14:00-16:00), causing timeouts for client applications. The average latency during off-peak hours is 200 ms. The team has enabled auto-scaling with a target average CPU utilization of 70%, but the endpoint still experiences high latency during peak hours. The instance count never scales beyond 2 instances during peaks. The model size is 500 MB, and each request includes 200 features. The team needs to reduce latency to under 500 ms at the 99th percentile during peak hours without increasing costs beyond the current budget. Which course of action should the team take?

A. Configure SageMaker batch transform for the real-time endpoint to process requests asynchronously.

B. Increase the auto-scaling maximum instance count to 10 and set target CPU utilization to 50%.

C. Switch the endpoint instance type to a GPU instance such as ml.g4dn.xlarge to accelerate inference.

D. Enable data compression on the endpoint to reduce payload size and network latency.

Answer: C

GPU instances can accelerate inference for gradient boosting models by parallelizing computations, reducing per-request latency significantly.

Why this answer

Option C is correct because GPU instances like ml.g4dn.xlarge can accelerate compute-intensive workloads such as gradient boosting inference, where many trees are evaluated for every request and the work can be parallelized. By offloading the computation to the GPU, the model can process each request faster, reducing latency from over 2 seconds to under 500 ms at the 99th percentile without increasing the instance count or budget. This directly addresses the root cause (CPU-bound inference during peak hours) while keeping costs stable.

Exam trap

The trap here is that candidates assume auto-scaling or instance count adjustments will solve latency issues, but the real bottleneck is per-instance compute capacity, which GPU acceleration directly addresses without increasing costs.

How to eliminate wrong answers

Option A is wrong because SageMaker batch transform is designed for offline, asynchronous processing of large datasets, not for real-time inference; it would introduce unacceptable delays and cannot meet the sub-500 ms latency requirement. Option B is wrong because increasing the maximum instance count to 10 and lowering CPU target to 50% would significantly increase costs (more instances running) and still not guarantee sub-500 ms latency if each instance is CPU-bound; the current scaling limit of 2 instances suggests the bottleneck is per-instance compute capacity, not scaling policy. Option D is wrong because data compression reduces payload size and network latency, but the primary latency spike is due to compute time (model inference), not network transfer; the 500 MB model and 200 features are already moderate, and compression would offer minimal improvement for the compute-bound bottleneck.

687

MCQ · hard

A company uses SageMaker training jobs that need to access data in an S3 bucket in a different AWS account. The bucket uses a bucket policy that allows access only from a specific VPC. How should they configure the training job?

A. Use AWS DataSync to copy data to the training account's S3.

B. Create an IAM role in the source account and assume it from the training account.

C. Use an S3 VPC endpoint in the training job's VPC and attach a bucket policy that allows the VPC.

D. Use cross-account access with an IAM role and add a bucket policy allowing the training job's VPC.

Answer: D

This combines IAM role assumption and VPC condition to meet both requirements.

Why this answer

Option D is correct because the training job in Account A needs to access an S3 bucket in Account B that is restricted to a specific VPC. This requires both cross-account IAM role trust (so the training job can assume a role in Account B) and a bucket policy that explicitly allows access from the VPC where the training job runs. Without the VPC condition in the bucket policy, the S3 service would deny requests even if the IAM role is valid, because the bucket policy enforces the VPC restriction.

Exam trap

The trap here is that candidates often think a cross-account IAM role alone is sufficient, forgetting that the bucket policy's VPC restriction is a separate, mandatory condition that must be explicitly satisfied, and that the VPC endpoint alone does not grant cross-account permissions.

How to eliminate wrong answers

Option A is wrong because AWS DataSync is a data transfer service for large-scale migrations or syncs, not for real-time access during a SageMaker training job; it would add latency and complexity without solving the VPC-based access restriction. Option B is wrong because simply creating an IAM role in the source account and assuming it from the training account does not satisfy the bucket policy's VPC condition; the bucket policy explicitly requires requests to originate from the specified VPC, and the IAM role alone does not control the network origin. Option C is wrong because using an S3 VPC endpoint in the training job's VPC is necessary but insufficient on its own; the bucket policy must also explicitly allow the VPC (via the `aws:SourceVpc` condition), and cross-account access still requires an IAM role in the source account to grant permissions to the training account's principal.
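The bucket-policy half of this setup can be sketched as a policy document built in Python. The bucket name, account ID, role name, and VPC ID below are placeholders, the `vpc_restricted_policy` helper is hypothetical, and the statement is a simplified example rather than a complete or audited policy.

```python
import json


def vpc_restricted_policy(bucket: str, role_arn: str, vpc_id: str) -> dict:
    """Build a bucket policy allowing one role to read, only from one VPC."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowTrainingRoleFromVpc",
                "Effect": "Allow",
                # Cross-account principal: the role used by the training job.
                "Principal": {"AWS": role_arn},
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                # Network condition: requests must arrive from this VPC.
                "Condition": {"StringEquals": {"aws:SourceVpc": vpc_id}},
            }
        ],
    }


# Placeholder identifiers for illustration only.
policy = vpc_restricted_policy(
    bucket="example-training-data",
    role_arn="arn:aws:iam::111122223333:role/ExampleSageMakerRole",
    vpc_id="vpc-0123456789abcdef0",
)
print(json.dumps(policy, indent=2))
```

The `aws:SourceVpc` key is only present on requests that reach S3 through a VPC endpoint, which is why the training job's VPC needs an S3 endpoint in addition to the IAM permissions.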

688

MCQ · easy

A data scientist is using SageMaker built-in XGBoost algorithm for a binary classification task. Which objective metric is MOST appropriate for SageMaker Automatic Model Tuning to maximize?

A. validation:mae

B. validation:rmse

C. validation:ndcg

D. validation:auc

Answer: D

AUC is a common binary classification metric and is available in XGBoost.

689

MCQ · medium

A machine learning engineer is training a deep learning model on SageMaker and notices that the training loss decreases rapidly in the first few epochs but then plateaus. The validation loss starts increasing after 10 epochs. Which action should the engineer take to improve generalization?

A. Add more layers to the model

B. Use early stopping with validation loss monitoring

C. Increase the learning rate

D. Decrease the batch size

Answer: B

Early stopping halts training when validation loss stops decreasing, reducing overfitting.

Why this answer

Early stopping is the correct action because validation loss increasing after 10 epochs while training loss has flattened is a classic sign of overfitting. By monitoring validation loss and halting training when it stops improving (e.g., using a patience parameter), the engineer prevents the model from memorizing noise in the training data, thereby improving generalization. SageMaker's built-in training job features or an early-stopping callback, such as `EarlyStopping` in Keras or PyTorch Lightning, can implement this directly.

Exam trap

AWS often tests the distinction between underfitting and overfitting symptoms, and the trap here is that candidates mistake a plateauing training loss for a need to increase model complexity or learning rate, when the rising validation loss clearly signals overfitting that early stopping can mitigate.

How to eliminate wrong answers

Option A is wrong because adding more layers increases model capacity, which would exacerbate overfitting and likely cause validation loss to rise even sooner, not improve generalization. Option C is wrong because increasing the learning rate would make training more unstable, potentially causing the loss to diverge or oscillate, and would not address the overfitting indicated by the rising validation loss. Option D is wrong because decreasing the batch size introduces more noise into gradient estimates, which can sometimes help escape local minima but does not directly prevent overfitting; it may even slow convergence and does not target the core issue of validation loss increasing.
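The patience-based rule can be written framework-free in a few lines. This is a minimal sketch: the `early_stopping_epoch` helper, the `patience` and `min_delta` parameters, and the loss values are illustrative assumptions, and a real training loop would also restore the weights saved at the best epoch.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs.
    """
    best = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, waited = loss, epoch, 0   # new best: reset counter
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch                # patience exhausted
    return len(val_losses) - 1, best_epoch              # ran to completion


# Hypothetical validation losses: improving at first, then rising.
losses = [0.90, 0.70, 0.60, 0.55, 0.56, 0.58, 0.61, 0.65]
stopped_at, best_epoch = early_stopping_epoch(losses, patience=3)
print(stopped_at, best_epoch)
```

Here the best validation loss occurs at epoch index 3 and training halts three epochs later, so the final epoch in the list is never run.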

690

MCQ · medium

A company uses an Amazon SageMaker endpoint with auto-scaling. They notice that during traffic bursts, new instances take several minutes to become healthy, causing 503 errors. What is the BEST way to reduce the time to serve requests during scaling events?

A. Set up a scheduled scaling policy to pre-warm instances before known traffic bursts.

B. Decrease the cooldown period for the scaling policy to add instances faster.

C. Use a larger instance type so that fewer instances are needed, and the scaling threshold is triggered less often.

D. Increase the maximum number of instances to allow more capacity.

Answer: C

Larger instances can serve more traffic, reducing scaling events.

Why this answer

Option C is correct because a larger instance type has more compute resources and can handle more requests per instance, so the endpoint needs to scale out less often. Option A is wrong because scheduled scaling can help with known bursts but does not reduce the time a new instance takes to become healthy. Option B is wrong because decreasing the cooldown period could cause thrashing.

Option D is wrong because increasing the maximum instance count does not speed up each instance's startup.

691

MCQ · hard

A company operates an IoT platform that ingests sensor data from thousands of devices. Data is streamed via Amazon Kinesis Data Streams and stored in an S3 bucket using a Kinesis Firehose delivery stream, which writes data in 5-minute windows. The data is then used to train a machine learning model for anomaly detection. Recently, the data science team noticed that the training dataset is always missing the last 5 minutes of events from the end of each day. The S3 objects show that the last delivery stream buffer window is incomplete. The data engineer checked the Kinesis Firehose metrics and found no delivery errors or data loss, but the 'IncomingBytes' and 'IncomingRecords' metrics show consistent data for all periods. The S3 bucket has Lifecycle policies that do not delete objects. The team suspects the issue is related to the data preparation pipeline. Which course of action would correctly resolve the missing data problem?

A. Increase the buffer size to 10 MB and reduce the buffer interval to 60 seconds in the Firehose delivery stream configuration

B. Reprocess the Kinesis stream data from the beginning using a custom application

C. Modify the data preparation pipeline to use AWS Lambda to write data to S3 directly from Kinesis

D. Increase the buffer interval to 600 seconds to allow more time for data to accumulate

Answer: A

Reducing the buffer interval to 60 seconds ensures that data is flushed every minute, preventing incomplete windows from being missed at the end of the day.

Why this answer

Option A is correct because the issue is that the last 5-minute buffer window at the end of each day never completes, so Firehose never delivers that final object to S3. By reducing the buffer interval to 60 seconds and increasing the buffer size to 10 MB, Firehose will flush data more frequently, ensuring that even small residual data at the end of the day is delivered before the stream stops. This directly addresses the incomplete last window without requiring reprocessing or changing the pipeline architecture.

Exam trap

The trap here is that candidates assume the missing data is due to data loss or pipeline errors, but the real issue is that Firehose's buffer window never completes when data stops arriving, so no S3 object is created for that final period.

How to eliminate wrong answers

Option B is wrong because reprocessing the entire Kinesis stream from the beginning is unnecessary and inefficient; the data is not lost, it is simply never delivered due to the buffer window not closing. Option C is wrong because switching to a Lambda-based direct write from Kinesis to S3 would bypass Firehose entirely, adding complexity and potential for data loss or duplication, and does not fix the root cause of the incomplete buffer window. Option D is wrong because increasing the buffer interval to 600 seconds would make the problem worse, as it would extend the time needed for a buffer window to complete, increasing the likelihood of incomplete windows at day boundaries.

692

MCQ · easy

An ML engineer needs to compile a trained TensorFlow model to run efficiently on a target edge device with an ARM CPU. Which AWS service should they use?

A. SageMaker Debugger

B. AWS Inferentia

C. SageMaker Neo

D. Amazon Elastic Inference

Answer: C

Neo optimizes models for target hardware, including ARM CPUs, using its compiler.

Why this answer

SageMaker Neo compiles trained models for specific hardware targets, including ARM CPUs, to optimize inference performance.

693

MCQ · medium

A data scientist is working with a dataset that contains missing values in several numerical columns. The missing data is not completely at random (MNAR). The scientist wants to minimize bias in the imputed values. Which imputation strategy is most appropriate?

A.Delete all rows with missing values

B.Use a model-based imputation like iterative imputer or KNN imputer

C.Replace missing values with the median of each column

D.Replace missing values with the mean of each column

AnswerB

Model-based imputation uses correlations between features to estimate missing values, reducing bias in MNAR settings.

Why this answer

Model-based imputation methods, such as using a regressor to predict missing values based on other features, can capture complex relationships and reduce bias compared to simple mean/median imputation, especially when data is not MCAR.

694

Multi-Selecteasy

A company wants to monitor its Amazon SageMaker real-time endpoint for data quality issues. Which TWO actions should the company take?

Select 2 answers

A.Create a baseline from the training data to compare against live data.

B.Use SageMaker Debugger to analyze training jobs.

C.Set up an AWS Lambda function to preprocess incoming requests.

D.Configure Amazon S3 bucket notifications for model artifacts.

E.Enable data capture on the SageMaker endpoint.

AnswersA, E

A baseline provides the expected statistics and constraints for the data.

Why this answer

Option A is correct because creating a baseline from the training data establishes a statistical profile (e.g., mean, standard deviation, distribution) of expected input features. SageMaker Model Monitor then compares live endpoint data against this baseline to detect data quality drift, such as missing values or feature distribution shifts.

Exam trap

The trap here is that candidates confuse SageMaker Debugger (training-time debugging) with SageMaker Model Monitor (post-deployment data quality), leading them to select Option B instead of recognizing that data capture and baseline creation are the two required actions for endpoint monitoring.

695

MCQmedium

A company uses SageMaker JumpStart to deploy a foundation model for a summarization task. They want to minimize costs while still meeting a latency requirement of under 2 seconds. Which option should they consider?

A.Use SageMaker Inference Recommender to select the cheapest instance that meets latency

B.Deploy the model on a serverless endpoint

C.Enable auto-scaling to handle variable traffic

D.Use the largest GPU instance to ensure fast inference

AnswerA

Inference Recommender benchmarks the model on different instances to find the optimal balance of cost and latency.

Why this answer

SageMaker Inference Recommender runs load tests against your model on various instance types and provides latency and cost metrics. By selecting the cheapest instance that still meets the sub-2-second latency requirement, you directly minimize cost while satisfying the performance constraint. This is the most systematic and cost-effective approach for this scenario.

Exam trap

AWS often tests the misconception that serverless endpoints are always the cheapest option. For latency-sensitive workloads with large models, the cold-start overhead and lack of guaranteed compute resources make them unsuitable, and Inference Recommender is the correct tool for cost-latency trade-off analysis.

How to eliminate wrong answers

Option B is wrong because serverless endpoints have a cold-start latency that can exceed 2 seconds, especially for large foundation models, and they do not guarantee consistent sub-2-second inference under variable traffic. Option C is wrong because auto-scaling handles variable traffic but does not reduce per-invocation cost or latency; it only adjusts capacity, and the chosen instance type still determines base latency and cost. Option D is wrong because using the largest GPU instance is unnecessarily expensive and may provide excess compute capacity that is not needed to meet a 2-second latency requirement, violating the cost-minimization goal.
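
The selection rule behind option A can be sketched in a few lines: keep only the instance types whose measured latency meets the requirement, then take the cheapest. The benchmark figures below are invented for illustration and are not real Inference Recommender output or real prices; the dictionary keys are also hypothetical.

```python
def cheapest_meeting_latency(benchmarks, max_latency_ms):
    """Return the lowest-cost benchmark whose latency is under the limit, or None."""
    eligible = [b for b in benchmarks if b["p95_latency_ms"] < max_latency_ms]
    if not eligible:
        return None
    return min(eligible, key=lambda b: b["cost_per_hour"])


# Hypothetical load-test results (latency and cost values are made up).
benchmarks = [
    {"instance_type": "ml.g4dn.xlarge", "p95_latency_ms": 2600, "cost_per_hour": 0.74},
    {"instance_type": "ml.g5.xlarge", "p95_latency_ms": 1400, "cost_per_hour": 1.41},
    {"instance_type": "ml.g5.2xlarge", "p95_latency_ms": 1100, "cost_per_hour": 1.52},
    {"instance_type": "ml.p4d.24xlarge", "p95_latency_ms": 300, "cost_per_hour": 37.69},
]

choice = cheapest_meeting_latency(benchmarks, max_latency_ms=2000)
print(choice["instance_type"])
```

In this made-up data the cheapest instance misses the 2-second target and the largest one meets it at many times the cost, which is why neither "cheapest overall" nor "largest GPU" is the right rule.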

696

MCQhard

A team is fine-tuning a foundation model using LoRA for a text summarization task. They want to reduce memory footprint during training. Which technique should they combine with LoRA?

A.Data parallelism

B.Gradient checkpointing

C.Mixed precision

D.QLoRA

AnswerD

697

MCQmedium

A financial services company ingests transaction data from multiple sources into an S3 data lake. They want to use AWS Glue to catalog this data and make it queryable by Amazon Athena. The data schema changes frequently as new sources are added. Which AWS Glue feature should they enable to automatically detect and update the schema?

A.AWS Glue DataBrew

B.AWS Glue crawlers with schema update policy set to 'UPDATE'

C.AWS Glue ETL job scheduled to run daily

D.Manual schema definition in the AWS Glue Data Catalog

AnswerB

Crawlers automatically detect new partitions and schema changes, updating the Data Catalog accordingly.

Why this answer

AWS Glue crawlers can automatically scan data in S3, infer schemas, and update the Data Catalog. Schema evolution is supported natively when crawlers are configured to update table definitions. Manual schema definition would not handle frequent changes.
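
As a rough sketch of what "schema update policy set to 'UPDATE'" looks like in practice: in the Glue `CreateCrawler` API the setting lives under `SchemaChangePolicy`, and the update value is spelled `UPDATE_IN_DATABASE`. The helper below only builds the request; the crawler name, role ARN, database, and S3 path are placeholders.

```python
def crawler_request(name, role_arn, database, s3_path):
    """Build keyword arguments for the Glue CreateCrawler API call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        "SchemaChangePolicy": {
            # Update table definitions in the Data Catalog when the schema changes.
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            # Only log objects that disappear instead of deleting their tables.
            "DeleteBehavior": "LOG",
        },
    }


# Placeholder values, not taken from the question.
request = crawler_request(
    name="transactions-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    database="transactions_db",
    s3_path="s3://example-data-lake/transactions/",
)
print(request["SchemaChangePolicy"])
```

With credentials configured, the dictionary could be passed to `boto3.client("glue").create_crawler(**request)`.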

698

Multi-Selecthard

An ML team uses SageMaker to deploy a model for real-time inference. They want to monitor and improve cost efficiency. Which THREE actions should they take? (Select THREE.)

Select 3 answers

A.Use SageMaker Inference Recommender to find the optimal instance type and count

B.Enable auto-scaling to adjust the number of instances based on demand

C.Create a CloudWatch dashboard to monitor endpoint latency

D.Use SageMaker Managed Spot Training for endpoint instances

E.Purchase SageMaker Savings Plans for a discounted rate

AnswersA, B, E

Inference Recommender provides recommendations to avoid over-provisioning.

Why this answer

SageMaker Inference Recommender runs load tests against your model to generate instance type and count recommendations that balance performance and cost. By selecting the optimal configuration, you avoid over-provisioned instances that waste money or under-provisioned ones that degrade user experience, directly improving cost efficiency.

Exam trap

The trap here is that candidates confuse monitoring (Option C) with cost optimization, or they mistakenly apply Spot Training (Option D) to inference endpoints, not realizing that Spot instances are only supported for training and not for real-time inference due to interruption risk.
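
For option B, endpoint auto-scaling is configured through Application Auto Scaling rather than on the endpoint itself. The sketch below builds the two requests involved (registering the variant as a scalable target, then attaching a target-tracking policy). The endpoint name, variant name, policy name, and target value are assumptions chosen for illustration.

```python
def autoscaling_requests(endpoint_name, variant_name, min_instances,
                         max_instances, invocations_per_instance):
    """Build requests for register_scalable_target and put_scaling_policy."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    target = {**common, "MinCapacity": min_instances, "MaxCapacity": max_instances}
    policy = {
        **common,
        "PolicyName": "invocations-target-tracking",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Scale so that each instance handles roughly this many invocations.
            "TargetValue": float(invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    }
    return target, policy


target, policy = autoscaling_requests("example-endpoint", "AllTraffic", 1, 4, 200)
print(target["ResourceId"])
```

With credentials configured, these could be sent with `boto3.client("application-autoscaling")` via `register_scalable_target(**target)` and `put_scaling_policy(**policy)`.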

699

MCQeasy

A data science team deploys a machine learning model to a SageMaker endpoint for real-time inference. They need to monitor the model for feature distribution drift over time to ensure the model's predictions remain accurate. Which AWS service should they use?

A.Amazon CloudWatch Evidently

B.AWS Glue DataBrew

C.SageMaker Clarify

D.SageMaker Model Monitor

E.SageMaker Debugger

AnswerD

Correct. SageMaker Model Monitor monitors data and model quality, including drift detection.

Why this answer

SageMaker Model Monitor is the correct service because it is specifically designed to continuously monitor machine learning models deployed to SageMaker endpoints for data quality issues, including feature distribution drift. It automatically captures inference data, computes statistics against a baseline, and triggers alerts when drift is detected, ensuring the model's predictions remain accurate over time.

Exam trap

The trap here is confusing SageMaker Model Monitor with SageMaker Clarify or Debugger, as candidates often misattribute drift monitoring to Clarify's bias detection or Debugger's training-time analysis, but only Model Monitor handles post-deployment feature drift.

How to eliminate wrong answers

Option A is wrong because Amazon CloudWatch Evidently is a feature flag and A/B testing service, not designed for monitoring feature distribution drift in ML models. Option B is wrong because AWS Glue DataBrew is a visual data preparation tool for cleaning and normalizing data, not for monitoring model drift. Option C is wrong because SageMaker Clarify is used for bias detection and explainability of model predictions, not for continuous drift monitoring. Option E is wrong because SageMaker Debugger is used for debugging training jobs by monitoring system and model metrics during training, not for monitoring inference data drift post-deployment.
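
The "baseline, then compare live data" idea can be illustrated with a toy check on a single feature. This is only a conceptual sketch: Model Monitor's own statistics and constraint checks are richer than a mean comparison, and the function names, threshold, and sample values here are all hypothetical.

```python
from statistics import mean, stdev


def build_baseline(values):
    """Summarise a training-time feature as mean and standard deviation."""
    return {"mean": mean(values), "std": stdev(values)}


def drifted(baseline, live_values, max_shift_in_std=2.0):
    """Flag drift when the live mean moves too far from the baseline mean."""
    shift = abs(mean(live_values) - baseline["mean"])
    return shift > max_shift_in_std * baseline["std"]


# Hypothetical 'age' values seen during training.
training_ages = [31, 35, 29, 40, 38, 33, 36, 30]
baseline = build_baseline(training_ages)

similar_traffic = drifted(baseline, [32, 34, 37, 31])
shifted_traffic = drifted(baseline, [61, 58, 65, 70])
print(similar_traffic, shifted_traffic)
```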

700

MCQeasy

A machine learning engineer needs to split a dataset for binary classification where the positive class represents only 2% of the data. Which data splitting strategy ensures that both training and test sets maintain the same class proportion as the original dataset?

A.Stratified sampling based on the target variable

B.Time-series split respecting the timestamp order

C.Simple random split with an 80/20 ratio

D.K-fold cross-validation without stratification

AnswerA

Stratified sampling ensures each split retains the same class proportion as the full dataset.

Why this answer

Stratified splitting preserves the original class distribution in each split, which is critical for imbalanced datasets.
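
A minimal, standard-library sketch of a stratified split follows: indices are grouped by label, and the same test fraction is taken from each group, so the 2% positive rate carries over to both sets. The function name and the synthetic labels are illustrative assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so each class contributes the same fraction to the test set."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Synthetic labels: 20 positives out of 1,000 rows (2%).
labels = [1] * 20 + [0] * 980
train_idx, test_idx = stratified_split(labels)

train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(train_rate, test_rate)
```

A simple random 80/20 split of the same data could easily put zero, one, or eight positives in the test set, which is what makes evaluation on rare classes unstable.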

701

Multi-Selectmedium

A data scientist is training a binary classification model using Amazon SageMaker. The dataset is highly imbalanced (95% negative class, 5% positive class). The model is evaluated on a held-out test set, and the F1 score is 0.12. The data scientist wants to improve the F1 score. Which two actions should the data scientist take? (Choose two.)

Select 2 answers

A.Reduce the model complexity by decreasing the number of layers in a deep neural network.

B.Apply SMOTE (Synthetic Minority Oversampling Technique) to the training data using a preprocessing script in SageMaker Processing.

C.Increase the decision threshold to reduce false positives.

D.Use recall as the primary evaluation metric instead of F1.

E.Set the `scale_pos_weight` parameter in the SageMaker XGBoost estimator to the ratio of negative to positive samples.

AnswersB, E

Correct: SMOTE generates synthetic samples of the minority class, balancing the dataset and improving F1.

Why this answer

Option B is correct because SMOTE generates synthetic samples for the minority class by interpolating between existing minority instances, which directly addresses the class imbalance by creating a more balanced training set. This increases the model's exposure to positive examples, improving recall and precision, and thus the F1 score. Using SageMaker Processing allows this preprocessing step to be integrated into the ML pipeline efficiently.

Exam trap

The trap here is that candidates often confuse threshold tuning with addressing imbalance directly, not realizing that adjusting the threshold without rebalancing the data or weighting classes typically fails to improve F1 score because it does not change the underlying model's learned distribution.
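
For option E, the value of `scale_pos_weight` is simply the count of negative samples divided by the count of positive samples. A small sketch, assuming labels encoded as 0 and 1 and a synthetic dataset with the 95/5 split from the question:

```python
def scale_pos_weight(labels):
    """Ratio of negative to positive samples for a 0/1 label list."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive samples in labels")
    return negatives / positives


# Synthetic labels matching the 95% / 5% split in the question.
labels = [0] * 950 + [1] * 50

hyperparameters = {
    "objective": "binary:logistic",
    "scale_pos_weight": scale_pos_weight(labels),
}
print(hyperparameters["scale_pos_weight"])
```

The resulting dictionary is the kind of value that would be supplied as hyperparameters to the XGBoost estimator; the ratio should be computed on the training split only.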

702

MCQeasy

A company is using SageMaker Pipelines to automate a multi-step ML workflow. The pipeline includes data preprocessing, training, and model evaluation. The team wants to ensure that if the evaluation step fails, the pipeline stops and sends an alert to the operations team. Which SageMaker Pipelines feature should they use?

A.Configure an Amazon CloudWatch Events rule to monitor the pipeline execution status and stop it if the evaluation step fails

B.Register the model in the Model Registry only if evaluation passes, and configure the pipeline to stop if registration fails

C.Add a Lambda step after the evaluation step that checks the evaluation metrics and sends an SNS notification if the metrics are below a threshold

D.Use a Condition step to check the evaluation result and route to a Fail step if the result indicates failure

AnswerD

Condition step allows branching; a Fail step stops the pipeline and marks the execution as failed, and that status change can drive an alert (for example, an EventBridge rule that publishes to SNS).

Why this answer

Option D is correct because SageMaker Pipelines provides a built-in Condition step that evaluates a boolean expression (e.g., checking if evaluation metrics meet a threshold) and then routes execution to different steps. If the condition fails, you can direct the pipeline to a Fail step, which immediately stops the pipeline and marks it as failed. This is the native, event-driven way to halt a pipeline based on step output without relying on external services.

Exam trap

The trap here is that candidates often confuse external monitoring (CloudWatch) or post-step actions (Lambda) with native pipeline control flow, missing that SageMaker Pipelines has a dedicated Condition step for conditional branching and halting execution.

How to eliminate wrong answers

Option A is wrong because CloudWatch Events rules can monitor pipeline state changes but cannot stop a running pipeline; they can only trigger notifications or invoke other actions after the fact. Option B is wrong because registering a model in the Model Registry is an optional downstream step, not a mechanism to stop the pipeline; if registration fails, the pipeline would still continue to subsequent steps unless explicitly handled. Option C is wrong because a Lambda step can send SNS notifications but does not have the ability to halt the pipeline execution; it would only alert after the step completes, not prevent further steps from running.

703

Multi-Selecteasy

A data science team uses SageMaker to train models. They need to track the lineage of each model, including the dataset used, training job, and hyperparameters. Which TWO SageMaker features can they use together? (Select TWO.)

Select 2 answers

A.SageMaker SDK

B.SageMaker Model Registry

C.SageMaker ML Lineage Tracking

D.SageMaker Pipelines

E.SageMaker Experiments

AnswersA, C

The SDK automatically creates lineage entities when used with SageMaker training jobs.

Why this answer

SageMaker ML Lineage Tracking captures the relationships between artifacts, actions, and contexts. The SageMaker SDK creates these lineage entities automatically when it is used to run training jobs. SageMaker Experiments can also be used to track runs, but lineage tracking is specifically for provenance.

704

Multi-Selecthard

Which THREE steps should be taken to optimize a large-scale distributed training job on SageMaker? (Choose 3.)

Select 3 answers

A.Attach multiple EBS volumes with throughput provisioning.

B.Use GPU instances with high bandwidth and memory (e.g., ml.p4d.24xlarge).

C.Enable batch transform for offline inference after training.

D.Use Elastic Fabric Adapter (EFA) for low-latency inter-node communication.

E.Select the appropriate distributed training strategy (e.g., Horovod, SageMaker data parallel, or model parallel).

AnswersB, D, E

GPU instances are necessary for large model training.

Why this answer

Option B is correct because GPU instances like ml.p4d.24xlarge provide high-bandwidth GPU memory and NVLink inter-GPU connectivity, which are essential for large-scale distributed training. These instances reduce communication bottlenecks and allow larger batch sizes, directly improving throughput and model convergence speed.

Exam trap

The trap here is that candidates confuse storage optimization (EBS) or inference features (batch transform) with training optimization, failing to recognize that distributed training performance hinges on compute, memory, and inter-node communication, not disk I/O or post-training steps.

705

MCQhard

A data science team uses SageMaker Pipelines to orchestrate their ML workflow. They noticed that even when source data hasn't changed, the pipeline re-runs all steps, wasting compute time. What should they enable to avoid redundant runs?

A.Enable pipeline caching by setting the CacheConfig property for each step

B.Configure the pipeline to run on a schedule instead of on-demand

C.Use the Parameter step to pass previous execution ID

D.Use Lambda step to check data changes before running

AnswerA

Caching causes the pipeline to skip steps if inputs and configuration haven't changed, saving time and cost.

Why this answer

Option A is correct because SageMaker Pipelines supports step caching via the `CacheConfig` property. When enabled, the pipeline checks if the step's inputs (including source data, parameters, and code) have changed since the last successful run. If no changes are detected, the step is skipped and the previous output is reused, eliminating redundant compute.

Exam trap

The trap here is that candidates may think caching requires external logic (like a Lambda step) or scheduling, when SageMaker Pipelines has a native `CacheConfig` property that directly addresses redundant runs with minimal configuration.

How to eliminate wrong answers

Option B is wrong because scheduling the pipeline does not prevent redundant runs; it only triggers execution at fixed intervals, which could still re-run all steps even when data hasn't changed. Option C is wrong because passing a previous execution ID via a Parameter step does not enable caching; it merely provides a reference but does not automatically skip unchanged steps. Option D is wrong because using a Lambda step to check data changes before running adds custom logic but is not a built-in mechanism for step-level caching; SageMaker Pipelines already provides `CacheConfig` for this purpose, making a Lambda workaround unnecessary and less efficient.
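
To show why caching removes redundant work, here is a toy model of the idea in plain Python. It is not SageMaker's implementation: it simply keys each step's result on a hash of the step name and its arguments, and skips execution when the same key has been seen before. The class, the `preprocess` function, and the S3 URI are hypothetical.

```python
import hashlib
import json


class StepCache:
    """Toy cache: reuse a step's result when its name and arguments are unchanged."""

    def __init__(self):
        self._results = {}
        self.executions = 0  # counts how many times a step actually ran

    def run(self, step_name, arguments, step_fn):
        payload = json.dumps([step_name, arguments], sort_keys=True)
        key = hashlib.sha256(payload.encode()).hexdigest()
        if key in self._results:
            return self._results[key]  # cache hit: skip the step
        self.executions += 1
        result = step_fn(**arguments)
        self._results[key] = result
        return result


def preprocess(input_uri, test_fraction):
    return f"processed({input_uri}, {test_fraction})"


cache = StepCache()
args = {"input_uri": "s3://example-bucket/raw/", "test_fraction": 0.2}
first = cache.run("Preprocess", args, preprocess)
second = cache.run("Preprocess", args, preprocess)  # identical inputs: reused
third = cache.run("Preprocess", {**args, "test_fraction": 0.3}, preprocess)
print(cache.executions)
```

In a real pipeline the equivalent behaviour is switched on per step through its `CacheConfig` property rather than written by hand.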

706

MCQmedium

A company is building a time series forecasting model using SageMaker DeepAR. The raw data is a CSV with columns: timestamp, item_id, and value. What is the correct data format required for DeepAR training?

A.JSON Lines files with 'start', 'target', and optional fields per time series

B.A wide-format CSV where each column is a different time series

C.Parquet files with a schema containing timestamp, item_id, and value

D.A single CSV file with columns: timestamp, item_id, value

AnswerA

DeepAR's training data format is JSON Lines with start timestamp and target array.

Why this answer

DeepAR requires time series data to be provided in JSON Lines format, where each line represents a single time series with a 'start' timestamp (a string formatted as YYYY-MM-DD HH:MM:SS), a 'target' array of values, and optional fields like 'cat' for categorical features. This structured format allows DeepAR to handle variable-length sequences and missing values natively, which is not possible with simple CSV or wide-format data.

Exam trap

The trap here is that candidates assume DeepAR can accept raw CSV data like other SageMaker built-in algorithms (e.g., XGBoost), but DeepAR is a specialized time series algorithm that requires a specific JSON Lines structure with 'start' and 'target' fields, not a simple tabular format.

How to eliminate wrong answers

Option B is wrong because wide-format CSV (each column as a separate time series) is not supported by DeepAR; it expects each time series to be a separate JSON object, not columns. Option C is wrong because a Parquet file with a long-format schema of timestamp, item_id, and value still lacks the per-series 'start' and 'target' structure that DeepAR expects, whatever the file format. Option D is wrong because a single CSV with timestamp, item_id, and value columns does not provide the 'start' and 'target' structure DeepAR needs; it would require significant preprocessing to group by item_id and convert to the required JSON Lines format.
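
The preprocessing mentioned for option D can be sketched as follows: group the long-format rows by `item_id`, order each group by timestamp, and emit one JSON object per series with 'start' and 'target'. The sketch assumes the observations are evenly spaced with no gaps, and the sample rows are made up.

```python
import csv
import io
import json
from collections import defaultdict


def csv_to_deepar_jsonl(csv_text):
    """Convert long-format CSV (timestamp, item_id, value) to JSON Lines text."""
    series = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        series[row["item_id"]].append((row["timestamp"], float(row["value"])))

    lines = []
    for points in series.values():
        points.sort()  # timestamps in this format sort correctly as strings
        lines.append(json.dumps({
            "start": points[0][0],
            "target": [value for _, value in points],
        }))
    return "\n".join(lines)


# Made-up sample rows, deliberately out of order.
raw = """timestamp,item_id,value
2024-01-01 00:00:00,sku-1,10
2024-01-02 00:00:00,sku-1,12
2024-01-01 00:00:00,sku-2,3
2024-01-03 00:00:00,sku-1,11
2024-01-02 00:00:00,sku-2,4
"""

jsonl = csv_to_deepar_jsonl(raw)
print(jsonl)
```

Each output line is one complete time series, which is why the raw three-column CSV cannot be passed to the algorithm unchanged.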

707

MCQeasy

A data scientist is preparing a dataset for a machine learning model that predicts customer churn. The dataset contains a column 'CustomerID' that is a unique identifier. What should the data scientist do with this column before training the model?

A.Keep the column as a feature because it uniquely identifies each customer.

B.Use the column as the target variable.

C.Remove the column from the feature set.

D.Encode the column using one-hot encoding.

AnswerC

Removing unique identifiers prevents overfitting and is standard practice.

Why this answer

Option C is correct because 'CustomerID' is a unique identifier with no predictive power for churn. Including it as a feature would cause the model to memorize individual customers rather than learn generalizable patterns, leading to overfitting and poor performance on unseen data. In machine learning, such columns should be removed during data preparation to ensure the model learns from meaningful features.

Exam trap

The trap here is that candidates may think unique identifiers are useful for tracking or that they can be encoded as categorical features, but the exam tests the principle that identifiers with no predictive relationship to the target must be removed to avoid overfitting and data leakage.

How to eliminate wrong answers

Option A is wrong because keeping 'CustomerID' as a feature introduces a high-cardinality categorical variable with no correlation to the target, which can cause overfitting and degrade model generalization. Option B is wrong because the target variable for churn prediction should be a binary or categorical label indicating churn status, not a unique identifier that has no relationship to the outcome. Option D is wrong because one-hot encoding a unique identifier like 'CustomerID' would create thousands of sparse binary columns, dramatically increasing dimensionality without adding any predictive value, and is computationally wasteful.

708

MCQmedium

Refer to the exhibit. A SageMaker Processing job fails with the following error log. Which change during data preparation would resolve the issue?

A.In SageMaker Data Wrangler, set the 'age' column type to 'number'

B.Drop rows with missing values in the 'age' column before training

C.Remove the 'age' column from the dataset entirely

D.Modify the preprocessing script to cast 'age' to float using astype(float)

AnswerD

Casting the column ensures numeric operations work.

Why this answer

Option D is correct because the error log indicates a type mismatch when processing the 'age' column, likely due to mixed data types (e.g., strings and numbers) in a column expected to be numeric. By explicitly casting the column to float using astype(float) in the preprocessing script, you ensure consistent numeric type handling, which resolves the failure during SageMaker Processing job execution.

Exam trap

The trap here is that candidates often assume missing value handling (Option B) or column removal (Option C) is the fix, when the actual issue is a data type inconsistency that requires explicit type casting in the preprocessing code.

How to eliminate wrong answers

Option A is wrong because setting the 'age' column type to 'number' in SageMaker Data Wrangler only affects the visual interface and exported recipe, but does not enforce type casting in the actual processing script, so the underlying data type mismatch persists. Option B is wrong because dropping rows with missing values does not address the core issue of mixed data types (e.g., strings like 'N/A' or 'unknown') in the 'age' column; the error is about type conversion, not missing values. Option C is wrong because removing the 'age' column entirely discards potentially valuable feature data and does not solve the type mismatch problem; it is an overly aggressive workaround that reduces model performance.
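
The exhibit is not reproduced here, so the following is only a sketch of the kind of cast option D describes, written in plain Python as a stand-in for pandas `astype(float)`. The sample values are assumptions. Note that a bare `astype(float)` raises on strings such as 'N/A', so the fallback to NaN shown below is an addition for illustration, not part of the option's wording.

```python
import math


def to_float(value):
    """Cast a raw value to float, mapping unparseable entries to NaN."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


# Hypothetical raw 'age' values with mixed types.
ages = ["34", 41, "N/A", None, "27.5"]
clean = [to_float(a) for a in ages]
print(clean)
```

After the cast every entry is a float, so numeric operations no longer fail on string values; the NaN entries can then be handled by a separate imputation step.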

709

MCQmedium

A team uses SageMaker Pipelines to automate retraining. They want to skip the training step if the data has not changed since the last run. Which feature should they enable?

A.Parameterized executions

B.Lineage tracking

C.Step caching

D.Condition step with a custom check

AnswerC

Why this answer

Step caching in SageMaker Pipelines allows you to reuse the output from a previous execution of a step if its input data and configuration parameters have not changed. By enabling caching on the training step, the pipeline automatically skips re-executing that step when the data is identical, saving time and cost. This directly addresses the requirement to skip retraining when data has not changed.

Exam trap

AWS often tests the distinction between step caching (automatic, built-in) and a Condition step (manual, custom logic), leading candidates to overthink and choose the more complex option D when the simpler caching feature is the correct answer.

How to eliminate wrong answers

Option A is wrong because parameterized executions allow you to pass different parameters into a pipeline run, but they do not automatically skip steps based on data changes; they simply enable dynamic input values. Option B is wrong because lineage tracking records the relationships between artifacts and steps for governance and reproducibility, but it does not provide any mechanism to skip step execution. Option D is wrong because while a Condition step can branch pipeline execution based on a custom check, it requires you to implement the logic to compare data versions manually, whereas step caching provides built-in, automatic detection of unchanged inputs.

710

Multi-Selecthard

A data scientist is using SageMaker Experiments to track multiple training runs. They want to compare runs based on the objective metric and visualize performance. Which THREE steps should they perform? (Choose THREE.)

Select 3 answers

A.Deploy the best model to an endpoint

B.Use SageMaker Studio Experiments UI to list and compare trials

C.Log hyperparameters and metrics using the SageMaker SDK

D.Create a SageMaker Experiment

E.Enable SageMaker Model Monitor for each run

AnswersB, C, D

The UI provides visualization and comparison.

Why this answer

To track and compare runs, you create an experiment, log parameters and metrics, and then use the Experiments UI or SDK to list and compare trials.

711

MCQeasy

A company wants to deploy a machine learning model that was trained on-premises using TensorFlow. The model is a TensorFlow SavedModel. The company uses AWS and wants to minimize operational overhead. Which deployment option meets these requirements?

A.Deploy the model on Amazon ECS using a custom Docker image.

B.Deploy the model as an AWS Lambda function with the TensorFlow runtime.

C.Deploy the model using Amazon SageMaker Studio.

D.Deploy the model using Amazon SageMaker with a TensorFlow inference container.

AnswerD

SageMaker provides pre-built TensorFlow containers and manages the endpoint, reducing operational overhead.

Why this answer

Amazon SageMaker provides a fully managed TensorFlow inference container that directly supports TensorFlow SavedModel format, enabling deployment without any custom infrastructure management. This minimizes operational overhead compared to self-managed options like ECS or Lambda, as SageMaker handles scaling, load balancing, and model updates automatically.

Exam trap

AWS often tests the distinction between SageMaker Studio (an IDE) and SageMaker hosting (deployment endpoints), leading candidates to mistakenly select Studio as a deployment option when it is only for development and experimentation.

How to eliminate wrong answers

Option A is wrong because deploying on Amazon ECS with a custom Docker image requires you to build, maintain, and scale the container infrastructure yourself, increasing operational overhead. Option B is wrong because AWS Lambda has a maximum deployment package size limit (250 MB unzipped) and a 15-minute timeout, making it unsuitable for large TensorFlow SavedModels or inference requests that require significant compute. Option C is wrong because Amazon SageMaker Studio is an integrated development environment (IDE) for building, training, and debugging models, not a deployment target; the actual deployment would still require creating an endpoint, which is covered by Option D.

712

Multi-Selecthard

An ML engineer needs to deploy a model that requires GPU acceleration but wants to reduce inference cost by optimizing the model. They are considering SageMaker Neo compilation and Amazon Elastic Inference. Which TWO statements are correct about these services? (Choose two.)

Select 2 answers

A.Amazon Elastic Inference attaches a dedicated GPU accelerator to a CPU instance, reducing cost compared to a full GPU instance

B.SageMaker Neo and Amazon Elastic Inference cannot be used together

C.SageMaker Neo provides a GPU acceleration service similar to Elastic Inference

D.SageMaker Neo optimizes the model by compiling it for the target hardware, reducing inference latency

E.Amazon Elastic Inference compiles the model to run on GPU hardware

AnswersA, D

Elastic Inference provides GPU acceleration at lower cost.

Why this answer

Option A is correct because Amazon Elastic Inference allows you to attach a fraction of a GPU accelerator to an Amazon EC2 CPU instance, providing GPU acceleration at a lower cost than using a full GPU instance. This reduces inference cost by only paying for the GPU compute you need, without the overhead of a dedicated GPU instance.

Exam trap

The trap here is confusing model compilation (SageMaker Neo) with hardware acceleration (Elastic Inference), leading candidates to think they are mutually exclusive or that Elastic Inference performs compilation.

713

Multi-Selecthard

A company is deploying a machine learning model using Amazon SageMaker. The model is a large deep learning model that requires GPU for inference. The company expects unpredictable traffic patterns with occasional bursts. They want to minimize cost while ensuring low latency during bursts. Which TWO actions should they take? (Select TWO.)

Select 2 answers

A.Use a serverless endpoint configuration to automatically scale.

B.Use a multi-model endpoint with a mix of CPU and GPU instances to handle variable traffic.

C.Use Spot instances for the endpoint to reduce cost.

D.Provision multiple on-demand GPU instances behind a load balancer.

E.Use Amazon SageMaker Elastic Inference to attach GPU acceleration to a CPU instance.

AnswersB, E

Multi-model endpoints allow efficient resource utilization and cost savings.

Why this answer

Option B is correct because a multi-model endpoint with a mix of CPU and GPU instances allows the company to host multiple models on the same endpoint, reducing cost by sharing underlying instances. By including GPU instances, the endpoint can handle the GPU-intensive deep learning inference for the large model, while the CPU instances can serve lighter loads or fallback traffic, ensuring low latency during unpredictable bursts without over-provisioning.

Exam trap

The trap here is that candidates often confuse serverless endpoints with GPU support, not realizing that SageMaker serverless endpoints are CPU-only, and they may overlook that multi-model endpoints can mix instance types to balance cost and performance for bursty GPU workloads.

714

Multi-Selectmedium

A company has deployed a SageMaker endpoint for real-time inference. The security team needs to monitor for potential security threats such as unauthorized access attempts and tampering with the model configuration. Which TWO actions should the team take? (Choose TWO.)

Select 2 answers

A.Enable AWS CloudTrail for the SageMaker endpoint API calls

B.Enable AWS Config to monitor endpoint configuration changes

C.Enable SageMaker Data Capture on the endpoint

D.Enable SageMaker Model Monitor for the endpoint

E.Enable Amazon GuardDuty for the endpoint

AnswersA, B

CloudTrail logs all API calls, providing an audit trail for security analysis.

Why this answer

Option A is correct because AWS CloudTrail records all API calls made to SageMaker endpoints, including calls like InvokeEndpoint, CreateEndpoint, and UpdateEndpoint. By enabling CloudTrail, the security team can audit who made requests, from which IP address, and what actions were performed, which is essential for detecting unauthorized access attempts and monitoring API-level security threats.

Exam trap

The trap here is that candidates confuse data monitoring services (Data Capture, Model Monitor) with security monitoring services (CloudTrail, Config), leading them to select options that track model behavior rather than API calls or configuration changes.

715

MCQhard

A machine learning engineer deploys a new model version to a SageMaker endpoint with production variants. They want to gradually shift traffic from the old model to the new model, monitoring for errors, and automatically roll back if the error rate exceeds 5%. Which deployment pattern should they use?

A.Canary deployment with CloudWatch alarms

B.A/B testing with traffic splitting

C.Blue/green deployment

D.Shadow testing

AnswerA

Why this answer

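Canary deployments shift a small share of traffic to the new model first and support automated rollback based on CloudWatch alarms, which covers both the gradual shift and the 5% error-rate trigger. An all-at-once blue/green deployment switches all traffic in a single step. A/B testing is for comparing variants, not for rolling out a replacement. Shadow testing mirrors traffic but doesn't serve the new model's responses to users.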
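
The canary pattern with alarm-based rollback can be sketched as the `DeploymentConfig` structure accepted by the SageMaker `UpdateEndpoint` API, where canary is a traffic-routing type of a blue/green update. This is a minimal sketch: the function name, the alarm name, and the percentage and timing values are illustrative assumptions, and the CloudWatch alarm on the 5% error rate is assumed to exist already.

```python
def canary_deployment_config(alarm_name, canary_percent=10, bake_seconds=300):
    """Build a DeploymentConfig dict: canary traffic shift with auto-rollback."""
    return {
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "CANARY",
                # Share of capacity that receives traffic first.
                "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": canary_percent},
                # How long to observe the canary before shifting the rest.
                "WaitIntervalInSeconds": bake_seconds,
            },
            "TerminationWaitInSeconds": 120,
        },
        # If this alarm fires during the update, traffic rolls back.
        "AutoRollbackConfiguration": {"Alarms": [{"AlarmName": alarm_name}]},
    }


# "endpoint-error-rate-above-5pct" is a hypothetical alarm name.
config = canary_deployment_config("endpoint-error-rate-above-5pct")
print(config["BlueGreenUpdatePolicy"]["TrafficRoutingConfiguration"]["Type"])
```

The resulting dictionary would be passed as the `DeploymentConfig` argument when updating the endpoint to the new endpoint configuration.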

716

MCQeasy

A data scientist wants to quickly build a binary classification model without writing any code. Which SageMaker feature is MOST suitable?

A.SageMaker Debugger

B.SageMaker Model Monitor

C.SageMaker Ground Truth

D.SageMaker Autopilot

AnswerD

717

MCQmedium

A data scientist needs to ingest streaming clickstream data from a website into an S3 data lake for ML training. The data must be processed in near real-time and partitioned by hour. Which AWS service combination should be used?

A.Amazon Kinesis Data Firehose with S3 as destination and dynamic partitioning enabled

B.Amazon S3 Transfer Acceleration with direct uploads from the website

C.Amazon Kinesis Data Streams with a custom consumer writing to S3

D.AWS Glue ETL job reading from Kinesis Data Streams and writing to S3

AnswerA

Firehose delivers streaming data to S3 with automatic partitioning based on time, meeting all requirements.

Why this answer

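Amazon Kinesis Data Firehose can deliver streaming data directly to S3 with automatic partitioning by time interval, which meets the near real-time and hourly partitioning requirements, so Option A is the correct combination. Option B is wrong because S3 Transfer Acceleration only speeds up uploads to S3; it is not a streaming ingestion service and does not partition data by hour. Option C is wrong because Kinesis Data Streams requires a custom consumer to write to S3 and to implement the hourly partitioning, which adds complexity. Option D is wrong because a Glue ETL job adds more operational overhead than Firehose for simple delivery to S3.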
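
To make the hourly partitioning concrete, the sketch below computes the kind of UTC hour-based S3 key prefix (year/month/day/hour) that time-based delivery produces. It is plain Python for illustration only, not Firehose configuration; the function name and the `clickstream` base prefix are assumptions.

```python
from datetime import datetime, timezone


def hourly_prefix(arrival_time, base="clickstream"):
    """Return an S3 key prefix that groups records by UTC arrival hour."""
    utc = arrival_time.astimezone(timezone.utc)
    return f"{base}/{utc:%Y/%m/%d/%H}/"


ts = datetime(2024, 3, 5, 14, 42, tzinfo=timezone.utc)
print(hourly_prefix(ts))  # clickstream/2024/03/05/14/
```

Every record arriving within the same hour lands under the same prefix, which is what lets downstream training jobs read one hour of data at a time.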

718

MCQhard

A team uses SageMaker Pipelines to retrain a model nightly. They want to skip the training step if the new data is unchanged (same checksum as previous run) to save cost and time. Which pipeline configuration achieves this?

A.Enable pipeline caching on the training step

B.Use a Lambda step to check data before running the training step

C.Use a ConditionStep that compares the current data checksum to the previous run's checksum, and branch to a NoOp step if unchanged

D.Set the training step's CacheConfig with a TTL of 24 hours

AnswerC

This allows skipping the training step dynamically based on data content changes.

Why this answer

Option C is correct because SageMaker Pipelines' ConditionStep allows you to evaluate a condition—such as comparing the current data checksum to a stored previous checksum—and branch accordingly. If the checksums match, you can route to a NoOp step (which does nothing) instead of executing the training step, thereby skipping the training and saving cost and time. This is the native, recommended pattern for conditional execution in SageMaker Pipelines.

Exam trap

The trap here is that candidates confuse pipeline caching (which caches based on step input parameters) with conditional branching based on external data state, leading them to pick Option A or D, which do not actually evaluate data checksums.

How to eliminate wrong answers

Option A is wrong because pipeline caching in SageMaker reuses a step's output only if the step's input parameters and source code are unchanged; it does not evaluate external data checksums, so it would not detect unchanged new data. Option B is wrong because a Lambda step can check the data, but it cannot directly skip the training step; you would still need a ConditionStep to branch based on the Lambda's result, making the Lambda step redundant and adding unnecessary complexity. Option D is wrong because CacheConfig with a TTL of 24 hours caches the step's output for that duration regardless of data changes, which would incorrectly skip training even if the data had changed within the TTL window, and it does not compare checksums.
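
The checksum comparison that the ConditionStep in Option C branches on can be sketched in plain Python. This is a minimal illustration, not SageMaker SDK code: the helper names `sha256_of_chunks` and `should_train` are hypothetical, and in a real pipeline the boolean would be produced by an earlier step and evaluated by the condition.

```python
import hashlib


def sha256_of_chunks(chunks):
    """Return the hex SHA-256 digest of an iterable of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def should_train(current_checksum, previous_checksum):
    """Train when there is no previous run or the data has changed."""
    return previous_checksum is None or current_checksum != previous_checksum


previous = sha256_of_chunks([b"id,label\n", b"1,0\n"])
unchanged = sha256_of_chunks([b"id,label\n", b"1,0\n"])
changed = sha256_of_chunks([b"id,label\n", b"1,1\n"])

print(should_train(unchanged, previous))  # False: skip training
print(should_train(changed, previous))    # True: run training
```

Hashing the data content, rather than relying on step arguments, is what distinguishes this check from step caching.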

719

Multi-Selecthard

A team wants to ensure that their SageMaker training jobs cannot access the internet for security reasons. However, they need to download a public PyTorch package for training. Which TWO steps should they take? (Choose TWO.)

Select 2 answers

A.Configure the training job to run in VPC-only mode

B.Use a public subnet for the training job

C.Disable network isolation for the training job

D.Attach a NAT Gateway to the VPC to allow outbound internet

E.Create an S3 VPC interface endpoint to access S3 privately

AnswersA, E

VPC-only blocks internet access.

Why this answer

Option A is correct because running the SageMaker training job in VPC-only mode places it in the team's own VPC, and when the job's subnets have no route to an internet gateway or NAT gateway, the job cannot reach the internet. This satisfies the security requirement of blocking internet access. (Network isolation is a separate, stricter setting that blocks all outbound network calls from the training container.) Option E is correct because an S3 VPC interface endpoint lets the training job download the PyTorch package from S3 privately over AWS PrivateLink, without traversing the internet; this assumes the package has first been staged in an S3 bucket.

Exam trap

The trap here is that candidates often confuse 'no internet access' with 'no network access at all,' and incorrectly assume that disabling network isolation or using a NAT Gateway is necessary for downloading packages, when in fact private connectivity via VPC endpoints is the correct approach.

720

MCQhard

A financial services company must deploy a SageMaker endpoint that processes sensitive customer data. They require that all traffic between the endpoint and the model containers be encrypted, and that the endpoint cannot be accessed from outside a specific VPC. Which combination of settings should they use?

A.Use a private VPC and enable data encryption at rest using KMS

B.Enable inter-container traffic encryption and configure the endpoint with VPC-only mode

C.Enable network isolation mode and inter-container traffic encryption

D.Deploy the endpoint in a private subnet and use a VPC endpoint for SageMaker API

AnswerB

VPC-only mode makes the endpoint only accessible from the VPC, and inter-container traffic encryption encrypts data between containers.

Why this answer

Option B is correct because inter-container traffic encryption ensures that data between the SageMaker endpoint and the model containers is encrypted in transit, typically using TLS. Configuring the endpoint with VPC-only mode restricts all inference traffic to the specified VPC, preventing any access from outside that VPC. This combination directly addresses the requirements for encrypted inter-container traffic and VPC-restricted access.

Exam trap

The trap here is confusing network isolation mode with inter-container traffic encryption and VPC-only mode, as candidates often assume network isolation alone secures all traffic and access, but it does not encrypt inter-container communication or restrict inbound endpoint access to a VPC.

How to eliminate wrong answers

Option A is wrong because enabling data encryption at rest using KMS only protects stored data, not traffic between the endpoint and model containers, and using a private VPC alone does not enforce VPC-only mode for endpoint access. Option C is wrong because network isolation mode prevents the model container from accessing the internet but does not encrypt inter-container traffic nor restrict endpoint access to a specific VPC. Option D is wrong because deploying the endpoint in a private subnet and using a VPC endpoint for the SageMaker API controls API calls but does not encrypt inter-container traffic or enforce VPC-only mode for inference requests.

721

MCQhard

A machine learning engineer is using SageMaker Debugger to detect if a neural network has dead ReLU units during training. Which built-in rule should they enable?

A.DeadRelu

B.Overfit

C.ExplodingGradients

D.LossNotDecreasing

AnswerA

DeadRelu rule specifically detects dead ReLU units.

Why this answer

The 'DeadRelu' rule in Debugger monitors the fraction of ReLU activations that are zero and alerts if too many neurons are dead.
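
The idea behind the rule can be illustrated with a small pure-Python check. This is a sketch of the concept only, not Debugger's implementation: the function name, the sample batch, and the alert threshold are all assumptions chosen for illustration.

```python
def dead_relu_fraction(activations):
    """Fraction of units whose ReLU output is zero for every sample.

    `activations` is a list of rows (one per sample), each row holding
    the post-ReLU output of every unit in the layer.
    """
    num_units = len(activations[0])
    dead = sum(
        1 for unit in range(num_units)
        if all(row[unit] == 0.0 for row in activations)
    )
    return dead / num_units


# Hypothetical batch: 3 samples, 4 units. The 2nd and 4th units never fire.
batch = [
    [0.5, 0.0, 1.2, 0.0],
    [0.0, 0.0, 0.7, 0.0],
    [2.1, 0.0, 0.0, 0.0],
]
ALERT_THRESHOLD = 0.4  # illustrative value, not a Debugger default

fraction = dead_relu_fraction(batch)
print(fraction, fraction > ALERT_THRESHOLD)  # 0.5 True
```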

722

MCQeasy

A machine learning engineer is building a regression model to predict house prices. The feature 'square_footage' has values ranging from 500 to 10,000, while 'num_bedrooms' ranges from 1 to 10. Which preprocessing step is most critical before training a model that uses gradient descent?

A.Standardize both features to have zero mean and unit variance.

B.Apply a logarithmic transformation to both features.

C.Encode the 'num_bedrooms' feature using one-hot encoding.

D.Impute missing values using the mean of the feature.

AnswerA

Standardization brings features to a common scale, crucial for gradient descent.

Why this answer

Gradient descent is sensitive to the scale of features because it updates weights proportionally to the feature values. With 'square_footage' (500–10,000) and 'num_bedrooms' (1–10), the large range difference causes the loss function's contours to be elongated, leading to slow or unstable convergence. Standardizing both features to zero mean and unit variance ensures each feature contributes equally to the gradient updates, enabling faster and more reliable optimization.

Exam trap

AWS often tests the distinction between scaling for gradient-based optimizers versus other preprocessing steps like encoding or transformation, trapping candidates who confuse feature scaling with handling outliers or categorical data.

How to eliminate wrong answers

Option B is wrong because a logarithmic transformation is not the most critical step for gradient descent; it is used to handle skewed distributions or multiplicative relationships, not to address differences in feature scale. Option C is wrong because one-hot encoding is for categorical features, and 'num_bedrooms' is a numeric count, not a nominal category; encoding it would create unnecessary sparsity and discard its natural ordering. Option D is wrong because imputing missing values is a general data cleaning step, but the question does not mention any missing data; the core issue here is feature scaling for gradient descent, not missingness.
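
Standardization as described in Option A subtracts each feature's mean and divides by its standard deviation. The sketch below uses only the standard library; the sample values are invented to match the ranges in the question, and the population standard deviation is an assumption (libraries differ on this choice).

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) variance."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


square_footage = [500, 1500, 3000, 10000]
num_bedrooms = [1, 2, 4, 10]

scaled_sqft = standardize(square_footage)
scaled_beds = standardize(num_bedrooms)

# Both features now live on the same scale.
print([round(v, 2) for v in scaled_sqft])
print([round(v, 2) for v in scaled_beds])
```

In practice the mean and standard deviation would be computed on the training split only and reused for validation and inference data.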

723

MCQmedium

A company has 200 small PyTorch models that are each used infrequently but need to be available for real-time inference. To minimize costs, they want to host all models on a single endpoint. Which SageMaker feature should they use?

A.Multi-model endpoint (MME)

B.Multi-container endpoint

C.Batch Transform job

D.Asynchronous inference endpoint

AnswerA

Why this answer

Multi-model endpoints allow hosting hundreds of models on a single endpoint, automatically loading/unloading models based on traffic. Multi-container endpoints are for different containers, not multiple models. Batch and asynchronous are not real-time.

724

Multi-Selectmedium

An organization wants to automate ML retraining using an event-driven architecture. Which THREE services should they combine? (Select THREE.)

Select 3 answers

A.SageMaker (training jobs or pipelines)

B.Amazon EventBridge

C.AWS Lambda

D.AWS Glue

E.Amazon CloudWatch Logs

AnswersA, B, C

SageMaker executes the actual retraining.

Why this answer

Amazon SageMaker provides the training jobs and pipelines that execute the ML retraining workflow. Amazon EventBridge acts as the event bus that triggers retraining based on events such as new data arrival or model drift detection. AWS Lambda serves as the lightweight compute layer that can preprocess events, invoke SageMaker APIs, or orchestrate conditional logic before starting a training job.

Exam trap

The trap here is that candidates often confuse AWS Glue as a compute trigger for ML retraining, but Glue is designed for batch ETL and lacks the event-driven, low-latency invocation capabilities required for this architecture.
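
The event-driven flow in this question (EventBridge event, Lambda function, SageMaker pipeline) might look like the sketch below. It assumes an S3 "Object Created" event delivered through EventBridge; the pipeline name `retraining-pipeline` and the pipeline parameter `InputDataUri` are hypothetical and would have to match a pipeline that actually exists.

```python
def build_pipeline_request(event, pipeline_name="retraining-pipeline"):
    """Translate an S3 'Object Created' EventBridge event into keyword
    arguments for the SageMaker StartPipelineExecution API."""
    detail = event["detail"]
    s3_uri = f"s3://{detail['bucket']['name']}/{detail['object']['key']}"
    return {
        "PipelineName": pipeline_name,
        "PipelineParameters": [{"Name": "InputDataUri", "Value": s3_uri}],
    }


def handler(event, context):
    """Lambda entry point: start the retraining pipeline for the new object."""
    import boto3  # imported lazily so the helper above has no AWS dependency

    client = boto3.client("sagemaker")
    response = client.start_pipeline_execution(**build_pipeline_request(event))
    return response["PipelineExecutionArn"]


sample_event = {
    "detail-type": "Object Created",
    "detail": {
        "bucket": {"name": "example-bucket"},
        "object": {"key": "data/2024-01-01.csv"},
    },
}
request = build_pipeline_request(sample_event)
print(request["PipelineParameters"][0]["Value"])
```

Keeping the request-building logic separate from the AWS call makes the Lambda function easy to test without credentials.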

725

Multi-Selecteasy

A data scientist is using Amazon SageMaker Data Wrangler to prepare a dataset. The dataset contains a column with missing values, a column with outliers, and a column with text data. The scientist wants to use built-in transforms to handle these issues. Which THREE transforms are available in Data Wrangler for these tasks? (Select THREE.)

Select 3 answers

A.Handle missing values (imputation)

B.SMOTE oversampling

C.One-hot encoding

D.Handle outliers (clipping or Z-score)

E.Text processing (tokenization, TF-IDF)

AnswersA, D, E

Data Wrangler includes transforms to impute missing values using mean, median, etc.

Why this answer

Data Wrangler provides built-in transforms for handling missing values (e.g., imputation), handling outliers (e.g., clipping), and processing text (e.g., tokenization). SMOTE and one-hot encoding are also available in Data Wrangler, but they address class imbalance and categorical features respectively, not the three issues described in the question.

726

Multi-Selectmedium

A machine learning engineer is building a real-time fraud detection pipeline using Amazon Kinesis Data Streams. The data must be prepared (e.g., feature engineering, normalization) before being fed into a SageMaker endpoint. Which TWO steps should the engineer implement to ensure low-latency data preparation?

Select 2 answers

A.Use AWS Lambda functions to apply feature transformations on each record as it arrives.

B.Use SageMaker batch transform jobs scheduled every hour to process the streaming data.

C.Use AWS Glue ETL jobs running on a recurring schedule to transform the data.

D.Use Amazon Kinesis Data Analytics to perform SQL-based transformations on the stream.

E.Use SageMaker Processing jobs to read from Kinesis and write transformed data to S3.

AnswersA, D

Lambda can run custom code (e.g., Python) with low latency on each Kinesis record.

Why this answer

AWS Lambda can process records from Kinesis in near real-time for lightweight transformations like normalization and feature engineering. Amazon Kinesis Data Analytics can perform SQL-based transformations on the stream in real time. SageMaker batch transform is not real-time, Glue ETL is batch-oriented and adds latency, and SageMaker Processing jobs are designed for offline processing.
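
A per-record Lambda transformation for a Kinesis event source might look like the following sketch. The Kinesis event shape (base64-encoded `data` under each record's `kinesis` key) is the standard one; the `amount` field, the scaling bounds, and the function names are assumptions made for illustration.

```python
import base64
import json


def normalize_amount(amount, min_amount=0.0, max_amount=10_000.0):
    """Min-max scale a transaction amount into [0, 1], clipping outliers."""
    clipped = min(max(amount, min_amount), max_amount)
    return (clipped - min_amount) / (max_amount - min_amount)


def handler(event, context=None):
    """Decode each Kinesis record and attach a normalized feature."""
    features = []
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        payload["amount_scaled"] = normalize_amount(payload["amount"])
        features.append(payload)
    return features


raw = json.dumps({"txn_id": "t-1", "amount": 2500.0}).encode()
sample_event = {"Records": [{"kinesis": {"data": base64.b64encode(raw).decode()}}]}
result = handler(sample_event)
print(result)
```

The transformed records could then be sent to the SageMaker endpoint from the same function.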

727

MCQeasy

A company wants to deploy a trained XGBoost model for batch inference on a large dataset stored in S3. The inference job should be cost-effective and does not require real-time responses. Which SageMaker inference option should they use?

A.SageMaker Batch Transform

B.SageMaker real-time endpoint

C.SageMaker Asynchronous Inference

D.SageMaker Serverless Inference

AnswerA

Batch Transform is designed for batch inference on S3 data, cost-effective and no real-time requirement.

Why this answer

SageMaker Batch Transform is designed for batch inference on large datasets stored in S3, processing data in chunks and writing results to S3. It is cost-effective for non-real-time scenarios. Real-time endpoints are for low-latency inference. Serverless Inference is for on-demand requests, not batch. Asynchronous Inference handles near-real-time requests with S3 input/output but is still not ideal for large batch jobs.

728

MCQeasy

A company is building a recommendation system and has trained a matrix factorization model using SageMaker. They want to evaluate the model's performance using precision at k (P@k) and recall at k (R@k). They have a test set of user-item interactions. The data scientist implements a custom evaluation script that computes these metrics, but the precision values are consistently zero. What is the most likely cause?

A.The model outputs are not being ranked correctly.

B.The model is overfitting.

C.The test set contains only positive interactions.

D.The k value is too large.

AnswerC

Correct: Without negative examples, precision is undefined or zero if no test items are in the recommendation list.

Why this answer

Option C is correct because if the test set contains only positive interactions (i.e., every user-item pair in the test set is a ground-truth positive), every recommended item that is not among a user's held-out positives is counted as irrelevant. Precision at k (the fraction of the top-k recommended items that are relevant) is then zero whenever none of the model's top-k recommendations appear among those held-out items, which is common when each user has only a few test interactions. This is a known pitfall when evaluating implicit feedback models without negative samples.

Exam trap

The trap here is that candidates assume precision at k can be computed directly from a test set of positive interactions, overlooking that without negative labels the metric is hard to interpret: the numerator (the number of top-k items found among the test positives) is zero unless the model's top-k overlaps the test positives.

How to eliminate wrong answers

Option A is wrong because even if the model outputs are not ranked correctly, precision at k would not be consistently zero—it would be some non-zero value if any relevant items appear in the top-k, just potentially lower than expected. Option B is wrong because overfitting would typically cause high training performance and poor generalization, but it would not force precision to be exactly zero; some relevant items could still appear in recommendations. Option D is wrong because a k value that is too large would increase recall (more items considered) but would not cause precision to be zero; precision would still be non-zero if any relevant items are among the top-k.
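
For reference, precision at k and recall at k for a single user can be computed as in the sketch below. The function name and the toy item ids are assumptions; the definitions are the standard ones (hits divided by k, and hits divided by the number of relevant items).

```python
def precision_recall_at_k(recommended, relevant, k):
    """Return (precision@k, recall@k) for one user.

    `recommended` is a ranked list of item ids; `relevant` is the set of
    held-out items the user actually interacted with.
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


ranked = ["a", "b", "c", "d", "e"]
overlap = precision_recall_at_k(ranked, {"b", "e", "z"}, k=4)  # one hit in the top 4
no_overlap = precision_recall_at_k(ranked, {"x", "y"}, k=4)    # no hits at all
print(overlap, no_overlap)
```

The second call shows the zero-precision case: none of the top-k items appear among the held-out positives.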

729

MCQmedium

A machine learning engineer is troubleshooting a model that is producing unexpectedly low accuracy in production. The engineer examines the model's training data and finds that the distribution of the target variable in production is significantly different from the training set. What type of drift is the model experiencing?

A.Prior probability shift

B.Concept drift

C.Data drift

D.Covariate shift

AnswerB

Concept drift is a change in the statistical properties of the target variable.

Why this answer

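Option B is the keyed answer because the question treats a change in the distribution of the target variable as concept drift. Option A is the closest distractor: prior probability shift also refers to a change in the target distribution, P(y), whereas in the stricter taxonomy concept drift is a change in the relationship between inputs and target, P(y | x). Option C is wrong because data drift is a general term. Option D is wrong because covariate shift is a change in the input distribution.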

730

MCQmedium

An e-commerce company uses Amazon SageMaker to deploy a real-time inference endpoint for product recommendations. The endpoint receives bursty traffic, with occasional spikes. The company wants to minimize cost while ensuring that latency remains under 100 ms. Which approach should the company take?

A.Use an elastic inference accelerator to reduce latency instead of scaling.

B.Use a scheduled scaling plan based on historical traffic patterns.

C.Deploy the model on one large instance to handle peak load.

D.Deploy the model on a multi-model endpoint with automatic scaling and configure a warm-up period for new instances.

AnswerD

Multi-model endpoint with scaling and warm-up can handle bursts cost-effectively.

Why this answer

Option D is correct because a multi-model endpoint with automatic scaling allows multiple models to share a single endpoint, reducing cost while handling bursty traffic. Configuring a warm-up period ensures new instances are fully initialized before receiving traffic, preventing cold-start latency spikes and keeping inference under 100 ms.

Exam trap

The trap here is that candidates confuse latency optimization techniques (like elastic inference) with scaling strategies, overlooking that bursty traffic requires dynamic scaling with warm-up to prevent cold-start latency spikes.

How to eliminate wrong answers

Option A is wrong because elastic inference accelerators reduce per-inference latency but do not address the need to scale out during traffic spikes; they add cost without solving the bursty traffic problem. Option B is wrong because scheduled scaling based on historical patterns cannot react to unpredictable spikes, leading to either over-provisioning or latency violations during unexpected bursts. Option C is wrong because deploying on one large instance creates a single point of failure and is cost-inefficient for bursty traffic; it either underutilizes resources during low traffic or fails to handle peak load without latency degradation.

731

MCQmedium

A company is fine-tuning a large language model using LoRA with a Hugging Face estimator in SageMaker. They want to reduce memory usage during training. Which instance type is most cost-effective for this workload?

A.ml.p4d.24xlarge

B.ml.g5.xlarge

C.ml.c5.2xlarge

D.ml.trn1.2xlarge

AnswerB

G5 instances are cost-effective for fine-tuning with LoRA, providing good performance at lower cost.

Why this answer

LoRA reduces the number of trainable parameters, allowing training on smaller GPUs. ml.g5 instances are optimized for machine learning inference and training with a good price-performance for fine-tuning.
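
The parameter reduction can be made concrete with a little arithmetic. For a frozen weight matrix of shape d_out x d_in, LoRA trains two low-rank factors of shapes d_out x r and r x d_in, so the trainable count is r * (d_out + d_in). The layer size and rank below are illustrative assumptions, not values from the question.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the two low-rank factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


d_out, d_in, rank = 4096, 4096, 8  # illustrative sizes
full = d_out * d_in
lora = lora_trainable_params(d_out, d_in, rank)

print(full)         # 16777216
print(lora)         # 65536
print(lora / full)  # 0.00390625
```

With these numbers the adapter trains under half a percent of the layer's parameters, which is why gradients and optimizer state need far less GPU memory.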

732

Multi-Selectmedium

A data scientist is preparing a dataset with a categorical feature that has 20 levels. The target variable is continuous. Which THREE encoding methods are appropriate for this scenario? (Select THREE.)

Select 3 answers

A.One-hot encoding

B.Ordinal encoding

C.Target encoding

D.Binary encoding

E.Label encoding

AnswersA, B, C

One-hot encoding creates binary columns for each category; works for any categorical feature.

Why this answer

One-hot encoding, ordinal encoding (if the levels have a natural order), and target encoding are all applicable to a categorical feature with a continuous target. Label encoding is similar to ordinal encoding but usually assigns an arbitrary order; it can still be acceptable, but the question asks for only three options. The three most directly appropriate are one-hot, ordinal, and target encoding.
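
Target encoding with a continuous target replaces each category with the mean target value observed for it. The sketch below is a bare-bones version without smoothing; the function name and the toy data are assumptions.

```python
from collections import defaultdict


def fit_target_encoding(categories, targets):
    """Map each category to the mean target observed for it."""
    sums, counts = defaultdict(float), defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    return {cat: sums[cat] / counts[cat] for cat in sums}


cities = ["NYC", "SF", "NYC", "LA", "SF"]
prices = [300.0, 500.0, 500.0, 200.0, 700.0]

encoding = fit_target_encoding(cities, prices)
encoded = [encoding[c] for c in cities]
print(encoded)  # [400.0, 600.0, 400.0, 200.0, 600.0]
```

To avoid target leakage, the mapping would normally be fitted on training data only (often per fold) and then applied to held-out data.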

733

MCQeasy

A machine learning engineer needs to deploy a new version of a model gradually, initially sending 5% of traffic to the new version and 95% to the current version, while monitoring for errors. Which deployment pattern should they use?

A.Blue/green deployment

B.Canary deployment

C.Shadow testing

D.Rolling deployment

AnswerB

Canary deployment gradually shifts traffic, starting with a small percentage like 5%, to test the new version.

Why this answer

Canary deployment is the correct pattern because it allows the ML engineer to route a small percentage of traffic (e.g., 5%) to the new model version while keeping the majority (95%) on the current version. This enables gradual rollout with real-time monitoring for errors, and if issues are detected, traffic can be instantly shifted back to the stable version.

Exam trap

The trap here is that candidates often confuse canary deployment with blue/green deployment, assuming both involve gradual traffic shifting, but blue/green is an all-or-nothing switch, while canary specifically supports incremental percentage-based routing with monitoring.

How to eliminate wrong answers

Option A is wrong because blue/green deployment involves switching all traffic at once from the current environment (blue) to the new environment (green), which does not support gradual traffic shifting or incremental error monitoring. Option C is wrong because shadow testing sends a copy of live traffic to the new model without affecting user-facing responses, but it does not route actual user traffic to the new version, so it cannot be used to gradually shift real traffic percentages. Option D is wrong because rolling deployment updates instances incrementally (e.g., replacing pods one by one), but it does not provide fine-grained traffic splitting like 5% vs 95% and lacks the instant rollback capability of canary deployments.

734

MCQmedium

A media company uses SageMaker endpoints to serve a model that predicts video engagement. They have two production variants: Variant A (ml.c5.large) for regular traffic and Variant B (ml.c5.xlarge) for burst traffic. They use weighted routing (90% to A, 10% to B). Recently, during peak hours, Variant A's latency has increased, causing many requests to time out. The metrics show that both variants are under similar CPU load, but the number of concurrent requests to Variant A is very high. The team wants to ensure that burst traffic is handled properly without manual intervention. What should they do?

A.Increase the traffic weight to Variant B to 70% and reduce Variant A to 30%.

B.Configure Application Auto Scaling for each variant with a target tracking scaling policy based on the number of concurrent requests per instance.

C.Set a CloudWatch alarm on Variant A's p99 latency and trigger a step scaling policy to add instances.

D.Create a separate endpoint for burst traffic and route peak traffic to it via DNS.

AnswerB

Autoscaling adjusts capacity based on load, preventing timeouts.

Why this answer

Option B is correct because a target tracking scaling policy based on the number of concurrent requests per instance (or InvocationsPerInstance) lets each variant scale with its own load, without manual intervention. Option A (changing the weights) doesn't fix scaling. Option C (a p99 latency alarm) might trigger too late. Option D (a separate endpoint) is not necessary.
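
A per-variant target tracking policy could be expressed as the keyword arguments for Application Auto Scaling's `PutScalingPolicy` call, as sketched below. The endpoint name, the target value, and the cooldowns are illustrative assumptions; the sketch uses the predefined invocations-per-instance metric, and each variant would also need to be registered as a scalable target first.

```python
def target_tracking_policy(endpoint_name, variant_name, target_per_instance):
    """Build keyword arguments for Application Auto Scaling's PutScalingPolicy."""
    return {
        "PolicyName": f"{variant_name}-target-tracking",
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleOutCooldown": 60,   # react quickly to bursts
            "ScaleInCooldown": 300,   # scale in more conservatively
        },
    }


# "engagement-endpoint" is a hypothetical endpoint name.
policies = [
    target_tracking_policy("engagement-endpoint", variant, 100)
    for variant in ("VariantA", "VariantB")
]
print(policies[0]["ResourceId"])
```

Defining one policy per variant is what allows Variant A to add instances under load independently of Variant B.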

735

MCQmedium

A company has a SageMaker endpoint that was deployed successfully and is in service. However, when the team sends test inferences using the InvokeEndpoint API, they receive a 500 internal server error. The endpoint logs in CloudWatch show a stack trace indicating 'OutOfMemoryError: Java heap space'. The model is a large XGBoost model (2 GB) and the endpoint is using an ml.m5.large instance with 8 GB of memory. What is the MOST likely cause and solution?

A.The endpoint needs to have a smaller batch size configured in the real-time inference request.

B.The instance type has insufficient memory for the model size; use a larger instance type like ml.m5.xlarge (16 GB) or ml.m5.2xlarge.

C.The model is a Transformer model and requires a GPU instance; use ml.g4dn.xlarge instead.

D.The SageMaker container is not compatible with XGBoost; switch to a framework container.

AnswerB

A 2 GB model plus runtime overhead (e.g., Java heap for XGBoost) can exceed 8 GB. Increasing instance memory resolves the out-of-memory error.

Why this answer

The OutOfMemoryError in Java heap space indicates that the model (2 GB) plus the runtime overhead of the XGBoost container and Java-based inference code exceed the available memory on the ml.m5.large instance (8 GB total, but not all is available for the Java heap). The most direct fix is to use a larger instance type, such as ml.m5.xlarge (16 GB) or ml.m5.2xlarge, to provide sufficient heap space for the model and inference operations.

Exam trap

The trap here is that candidates may incorrectly attribute the OutOfMemoryError to batch size or container compatibility, rather than recognizing that the instance's memory is insufficient for the model size and Java heap overhead.

How to eliminate wrong answers

Option A is wrong because batch size configuration is not applicable to real-time InvokeEndpoint requests (which are single inference calls), and reducing batch size would not resolve a Java heap space error caused by model size and overhead. Option C is wrong because the model is explicitly stated as XGBoost, not a Transformer model, and XGBoost runs efficiently on CPU instances; GPU instances are not required. Option D is wrong because SageMaker provides native support for XGBoost via built-in containers, and the error is a memory issue, not a compatibility issue with the container.

736

Multi-Selecthard

A data scientist is developing a gradient boosting model and observes that the model is overfitting to the training data. Which three techniques can help reduce overfitting? (Select THREE.)

Select 3 answers

A.Reduce the learning rate

B.Apply early stopping

C.Increase the maximum depth of trees

D.Increase the regularization parameters (e.g., lambda, alpha)

E.Add subsampling of data or features

F.Increase the number of trees

AnswersA, B, E

A lower learning rate makes the model more robust and reduces overfitting.

Why this answer

Reducing the learning rate shrinks the contribution of each tree in the gradient boosting ensemble, forcing the model to take smaller steps toward the target. This slows down the learning process and reduces the risk of overfitting by preventing the model from fitting noise in the training data too aggressively.

Exam trap

AWS often tests the misconception that increasing model complexity (e.g., deeper trees, more trees) always improves performance, when in fact these changes increase overfitting unless accompanied by countermeasures like regularization or reduced learning rate.
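
Early stopping (Option B) can be sketched as a patience rule over the validation loss recorded after each boosting round. This is a generic illustration rather than any library's implementation; the function name, the loss values, and the patience setting are assumptions.

```python
def best_iteration(validation_losses, patience):
    """Index of the lowest validation loss seen before stopping.

    Training stops once `patience` consecutive rounds fail to improve
    on the best loss so far.
    """
    best_idx, best_loss = 0, float("inf")
    for idx, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_idx, best_loss = idx, loss
        elif idx - best_idx >= patience:
            break
    return best_idx


losses = [0.90, 0.70, 0.60, 0.58, 0.59, 0.61, 0.65, 0.40]
print(best_iteration(losses, patience=3))  # 3
```

Training halts after three rounds without improvement, so the later value of 0.40 is never reached; the model from round index 3 is kept.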

737

MCQmedium

A company uses AWS Glue to run ETL jobs that prepare data for machine learning. The data is stored in Amazon S3 in Parquet format. A data engineer notices that the Glue job is running slowly and consuming a lot of resources. What is the MOST cost-effective way to improve the performance of the Glue job?

A.Use the G.1X worker type, which provides more memory per worker compared to the Standard worker type.

B.Use partition pruning on the source data to reduce the amount of data processed.

C.Switch the output format from Parquet to CSV to reduce processing overhead.

D.Use a larger instance type for the Glue job by increasing the number of DPUs.

AnswerA

G.1X offers more memory, reducing memory-related bottlenecks without increasing DPU count.

Why this answer

Using the G.1X worker type, which has more memory per worker, can improve performance without increasing the DPU count, offering better resource utilization. Increasing the number of DPUs (Data Processing Units) can improve parallelism and reduce job runtime, but it increases cost. Switching to CSV may degrade performance. Partition pruning on the source data can reduce the amount of data scanned but may not address resource consumption.

738

MCQhard

A company deploys a SageMaker model using AWS KMS for encryption at rest. They have a compliance requirement to rotate the KMS key every year without causing downtime for the inference endpoint. Which approach should they take?

A.Use AWS Certificate Manager (ACM) for encryption

B.Create a new KMS key and update the endpoint configuration

C.Manually rotate the key by recreating the endpoint

D.Enable automatic key rotation on the existing KMS key

AnswerD

Automatic rotation rotates the key material without changing the key ID, causing no downtime.

Why this answer

AWS KMS supports automatic key rotation, which creates new backing keys annually while retaining the same key ID and metadata. This ensures that the SageMaker endpoint continues to use the same KMS key alias and configuration, so no endpoint update or downtime is required. Automatic rotation satisfies the compliance requirement without any manual intervention or endpoint recreation.

Exam trap

The trap here is that candidates may think rotating a KMS key requires creating a new key and updating the resource (Option B), or that manual recreation is necessary (Option C), when in fact AWS KMS automatic key rotation handles the rotation seamlessly without any endpoint modification or downtime.

How to eliminate wrong answers

Option A is wrong because AWS Certificate Manager (ACM) is for managing SSL/TLS certificates, not for encryption at rest of SageMaker model data; it does not provide KMS key rotation capabilities. Option B is wrong because creating a new KMS key and updating the endpoint configuration would require a deployment update, which can cause a brief interruption or require a rolling update, and it does not leverage the simpler automatic rotation mechanism. Option C is wrong because manually rotating the key by recreating the endpoint would cause downtime during the recreation process, violating the no-downtime requirement.


739

Multi-Selecteasy

Which TWO data storage options are commonly used by Amazon SageMaker Feature Store for offline and online storage?

Select 2 answers

A.Amazon Redshift

B.Amazon RDS

C.Amazon ElastiCache

D.Amazon S3

E.Amazon DynamoDB

AnswersD, E

S3 is the default offline store for large historical feature data.

Why this answer

Amazon SageMaker Feature Store uses Amazon S3 as the default offline storage layer because it provides durable, scalable, and cost-effective object storage for large volumes of historical feature data. Amazon DynamoDB is used as the default online storage layer because it offers low-latency, single-digit millisecond read/write performance required for real-time inference serving.

Exam trap

The trap here is that candidates often confuse Amazon ElastiCache (a caching layer) with the primary online storage service, or assume Amazon Redshift is used for offline storage due to its analytical capabilities, but SageMaker Feature Store specifically integrates DynamoDB for online and S3 for offline storage as first-class options.
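As a rough illustration of how the two stores are configured together, the sketch below builds the request for the SageMaker `create_feature_group` API. The feature names, the helper name, and the S3 URI and role ARN passed in are hypothetical placeholders.

```python
def build_feature_group_request(name, s3_uri, role_arn):
    """Build keyword arguments for sagemaker.create_feature_group.

    Feature names here are made up for illustration only.
    """
    return {
        "FeatureGroupName": name,
        "RecordIdentifierFeatureName": "customer_id",
        "EventTimeFeatureName": "event_time",
        "FeatureDefinitions": [
            {"FeatureName": "customer_id", "FeatureType": "String"},
            {"FeatureName": "event_time", "FeatureType": "String"},
            {"FeatureName": "avg_basket_value", "FeatureType": "Fractional"},
        ],
        # Online store: low-latency reads for real-time inference.
        "OnlineStoreConfig": {"EnableOnlineStore": True},
        # Offline store: historical feature data written to S3.
        "OfflineStoreConfig": {"S3StorageConfig": {"S3Uri": s3_uri}},
        "RoleArn": role_arn,
    }
```

With AWS credentials available, the request could be sent with `boto3.client("sagemaker").create_feature_group(**request)`.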


740

MCQmedium

A team wants to use MLflow on SageMaker to track experiments and manage model lifecycle. They need to register models in the SageMaker Model Registry after training. Which approach allows them to use MLflow for experiment tracking and then register the best model to SageMaker Model Registry?

A.Use MLflow to track experiments and register models in MLflow's native registry, then export to SageMaker

B.Use SageMaker Experiments for tracking, then manually register the model using SageMaker console

C.Use MLflow tracking server on SageMaker, then use the SageMaker MLflow plugin to register the model in SageMaker Model Registry

D.MLflow cannot be used with SageMaker; use SageMaker Experiments instead

AnswerC

The integration enables MLflow tracking and direct registration to SageMaker Model Registry.

Why this answer

Option C is correct because the SageMaker MLflow plugin (sagemaker-mlflow) allows you to use an MLflow tracking server hosted on SageMaker for experiment tracking, and then directly register the best model into the SageMaker Model Registry via the plugin's integration. This avoids manual export steps and keeps the model lifecycle management within SageMaker's native registry, which is required by the team's goal.

Exam trap

AWS often tests the misconception that MLflow and SageMaker are mutually exclusive or require complex workarounds, when in fact SageMaker provides a first-class MLflow integration via the tracking server and plugin.

How to eliminate wrong answers

Option A is wrong because exporting models from MLflow's native registry to SageMaker Model Registry is not a supported direct workflow; it would require manual conversion and re-registration, defeating the purpose of seamless integration. Option B is wrong because using SageMaker Experiments for tracking and then manually registering via the console is a valid but less efficient approach that does not leverage MLflow, which the team explicitly wants to use for experiment tracking. Option D is wrong because MLflow can indeed be used with SageMaker; SageMaker provides native support for hosting an MLflow tracking server and the MLflow plugin for model registry integration.


741

MCQhard

A data science team is building a model to predict fraudulent transactions. The dataset has 1 million legitimate transactions and only 1,000 fraudulent ones. They plan to use Amazon SageMaker to train a model. Which data preparation technique should they apply to address the severe class imbalance before training?

A.Apply data augmentation using image transformations because fraud detection is like image classification.

B.Randomly oversample the fraudulent class to match the legitimate count by duplicating existing fraud records.

C.Use SMOTE (Synthetic Minority Oversampling Technique) to generate synthetic fraudulent samples.

D.Randomly undersample the legitimate class to 1,000 samples to create a balanced dataset.

AnswerC

SMOTE creates synthetic examples by interpolating between existing minority instances, reducing overfitting risk.

Why this answer

SMOTE (Synthetic Minority Oversampling Technique) is the correct choice because it generates synthetic fraudulent samples by interpolating between existing minority class instances in feature space, rather than simply duplicating records. This creates more diverse and realistic training data, reducing overfitting risk while addressing the severe 1:1000 class imbalance. Amazon SageMaker's built-in algorithms and data processing capabilities can easily integrate SMOTE-applied datasets for training.

Exam trap

AWS often tests the misconception that simple random oversampling (Option B) is sufficient. Duplicating fraud records invites overfitting, whereas SMOTE's synthetic samples give better generalization on imbalanced datasets.

How to eliminate wrong answers

Option A is wrong because data augmentation using image transformations (e.g., rotations, flips) is specific to image data and does not apply to tabular fraud detection datasets; it introduces irrelevant noise and breaks feature relationships. Option B is wrong because randomly oversampling the fraudulent class by duplicating existing records leads to overfitting, as the model simply memorizes the exact same fraud patterns without learning generalizable features. Option D is wrong because randomly undersampling the legitimate class to 1,000 samples discards 999,000 legitimate transactions, causing massive information loss and severely degrading model performance on the majority class.
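The interpolation idea behind SMOTE can be sketched in a few lines of standard-library Python. This is a simplified illustration rather than the reference implementation (in practice a library such as imbalanced-learn would typically be used); the function name, the toy `fraud` rows, and the parameter defaults are all assumptions.

```python
import math
import random

def smote_sketch(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority rows by interpolating toward near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest neighbours of the base point, excluding the point itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Toy minority class: (transaction amount, transactions per hour)
fraud = [(120.0, 1.0), (135.0, 2.0), (128.0, 1.5), (140.0, 2.5)]
new_rows = smote_sketch(fraud, n_synthetic=6)
```

Each synthetic row lies on a line segment between two real fraud rows, which is why the new samples are not exact duplicates.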


742

MCQhard

A data scientist is trying to create a SageMaker endpoint configuration with 6 instances of ml.c5.large for a production variant. The creation fails with the error shown in the exhibit. Which action should the data scientist take to resolve this issue?

A.Create two separate endpoint configurations, each with 3 instances, and distribute traffic between them.

B.Request a service quota increase for ml.c5.large for real-time endpoints from the AWS Service Quotas console.

C.Use a different instance type, such as ml.m5.large, which has a higher limit.

D.Delete unused endpoints to free up resources.

AnswerB

Increasing the quota allows provisioning the requested number of instances.

Why this answer

The error indicates that the requested number of instances exceeds the service quota for ml.c5.large for real-time endpoints. AWS enforces default limits on instance counts per instance type per region. Requesting a quota increase via the Service Quotas console is the correct action to raise the limit and allow the deployment of 6 instances.

Exam trap

The trap here is that candidates may confuse service quotas with resource availability, thinking that deleting unused endpoints or splitting configurations will free up capacity, when in fact the quota is a hard limit that must be explicitly increased.

How to eliminate wrong answers

Option A is wrong because creating two separate endpoint configurations does not bypass the service quota; the total instance count across all endpoints still counts against the same quota. Option C is wrong because using a different instance type like ml.m5.large does not inherently have a higher limit; each instance type has its own default quota, and the limit for ml.m5.large may also be insufficient or unknown without checking. Option D is wrong because deleting unused endpoints does not increase the quota for ml.c5.large; it only frees up currently used instances, but the quota itself remains unchanged.
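The same request can be made programmatically instead of through the console. The sketch below assumes `quotas_client` is a boto3 Service Quotas client (`boto3.client("service-quotas")`); the helper name is hypothetical, and the exact quota name to search for (something like "ml.c5.large for endpoint usage") is an assumption that should be checked in the Service Quotas console.

```python
def request_endpoint_quota_increase(quotas_client, quota_name, desired_value):
    """Find a SageMaker quota by name and request a higher value."""
    token = None
    while True:
        kwargs = {"ServiceCode": "sagemaker"}
        if token:
            kwargs["NextToken"] = token
        page = quotas_client.list_service_quotas(**kwargs)
        for quota in page["Quotas"]:
            if quota["QuotaName"] == quota_name:
                if quota["Value"] >= desired_value:
                    return None  # current quota already covers the request
                return quotas_client.request_service_quota_increase(
                    ServiceCode="sagemaker",
                    QuotaCode=quota["QuotaCode"],
                    DesiredValue=float(desired_value),
                )
        token = page.get("NextToken")
        if not token:
            raise LookupError(f"quota not found: {quota_name}")
```

Quota increases are reviewed by AWS, so the endpoint configuration can only be created successfully once the request has been approved.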


743

Multi-Selecthard

A machine learning team is setting up Model Monitor for a deployed model. Which THREE factors should they consider when configuring the monitoring schedule? (Select three.)

Select 3 answers

A.The monitoring job can be configured to send notifications via Amazon SNS.

B.The frequency of monitoring should be at least daily.

C.The monitoring job should analyze a sufficient sample size to be statistically significant.

D.The monitoring job should run on a schedule that aligns with data arrival patterns.

E.The constraints file must be updated after each monitoring run.

AnswersA, C, D

SNS notifications can alert teams when violations are detected.

Why this answer

Option A is correct because Amazon SageMaker Model Monitor can be configured to send notifications via Amazon SNS when monitoring violations are detected. This allows the team to proactively respond to data drift or quality issues without manually polling the monitoring results.

Exam trap

The trap here is that candidates assume monitoring must run daily (Option B) because of common best practices, but the exam tests that the schedule should be based on data arrival patterns, not a fixed minimum frequency.


744

MCQmedium

A machine learning engineer observes that model performance on a SageMaker endpoint has degraded over the past week. Ground truth labels are available with a 2-day delay. The engineer wants to automatically trigger a retraining pipeline when prediction quality drops below an acceptable threshold. Which approach is most appropriate?

A.Use SageMaker Model Monitor - Model Quality Monitor with ground truth, create a CloudWatch alarm on the metric, and trigger an AWS Lambda function to start retraining

B.Manually evaluate the model weekly and retrain as needed

C.Use SageMaker Model Monitor - Data Quality Monitor to detect drift, then trigger retraining

D.Use SageMaker Clarify to monitor bias drift and trigger retraining

AnswerA

Model Quality Monitor evaluates predictions against ground truth; CloudWatch alarm on quality metric triggers retraining.

Why this answer

Option A is correct because SageMaker Model Monitor's Model Quality Monitor is specifically designed to compare model predictions against ground truth labels (available with a 2-day delay) and track metrics like accuracy, precision, recall, or F1 score. You can configure a CloudWatch alarm on a metric such as 'accuracy' dropping below a threshold, which triggers an AWS Lambda function to start the retraining pipeline. This automates the detection of prediction quality degradation and the retraining response without manual intervention.

Exam trap

The trap here is that candidates confuse Data Quality Monitor (which monitors input data drift) with Model Quality Monitor (which monitors prediction accuracy against ground truth), leading them to choose Option C incorrectly.

How to eliminate wrong answers

Option B is wrong because manually evaluating the model weekly is not automated and does not meet the requirement to automatically trigger retraining when prediction quality drops; it introduces latency and human error. Option C is wrong because Data Quality Monitor detects drift in input data distribution (e.g., feature skew), not in prediction quality against ground truth labels, so it cannot directly measure model performance degradation. Option D is wrong because SageMaker Clarify is used for bias detection and explainability, not for monitoring prediction quality or triggering retraining based on performance metrics.
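The Lambda side of Option A might look like the sketch below. It assumes the retraining workflow is a SageMaker Pipeline and that the function is invoked when the CloudWatch alarm fires; `PIPELINE_NAME` and the display name are hypothetical, and the optional `sm_client` argument exists only so the handler can be exercised without AWS access.

```python
PIPELINE_NAME = "model-retraining-pipeline"  # hypothetical pipeline name

def handler(event, context, sm_client=None):
    """Start the retraining pipeline when the model-quality alarm fires."""
    if sm_client is None:
        import boto3  # imported lazily so the sketch can run without AWS access
        sm_client = boto3.client("sagemaker")
    response = sm_client.start_pipeline_execution(
        PipelineName=PIPELINE_NAME,
        PipelineExecutionDisplayName="retrain-on-quality-alarm",
    )
    return {"pipelineExecutionArn": response["PipelineExecutionArn"]}
```

Because ground truth arrives with a 2-day delay, the alarm reflects quality as of roughly two days earlier, which is worth keeping in mind when choosing the threshold and evaluation period.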


745

Multi-Selectmedium

A data scientist is performing feature engineering for a dataset with both numerical and categorical features. The data scientist wants to apply transformations that preserve the interpretability of the features. Which TWO transformations should the data scientist use? (Select TWO)

Select 2 answers

A.Log transformation of skewed numerical features

B.Target encoding of high-cardinality categorical features

C.Standard scaling of numerical features

D.PCA dimensionality reduction

E.One-hot encoding of categorical features

AnswersA, C

Log transformation reduces skewness while keeping feature order.

Why this answer

Log transformation is correct because it reduces skewness in numerical features by compressing the scale of large values, making the distribution more normal while preserving the original feature's interpretability (e.g., a log-transformed income value still relates to income). This is a monotonic transformation, so the order of values is maintained, and the feature remains directly understandable.

Exam trap

AWS often tempts candidates with one-hot encoding (Option E) because it also keeps features interpretable. The answer key, however, expects the two numerical transformations: log transformation and standard scaling. Target encoding (Option B) is the paired categorical distractor.
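Both expected transformations are monotonic, which is what keeps the feature readable afterwards. A small standard-library sketch, using made-up income values, shows that the ordering of values survives both steps:

```python
import math
import statistics

def log_transform(values):
    """Compress large values; log1p also handles zeros safely."""
    return [math.log1p(v) for v in values]

def standard_scale(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

incomes = [20_000, 35_000, 50_000, 120_000, 900_000]  # illustrative, skewed
logged = log_transform(incomes)
scaled = standard_scale(logged)
```

The largest income is still the largest value after both steps; only the scale has changed, so a higher transformed value still means a higher income.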


746

MCQeasy

A team wants to monitor the number of requests and latency of their SageMaker endpoint using a unified dashboard. Which AWS service should they use to create a custom dashboard with these metrics?

A.Amazon CloudWatch Dashboards

B.AWS CloudTrail

C.AWS Config

D.SageMaker Studio

AnswerA

CloudWatch Dashboards can display real-time and historical metrics from SageMaker endpoints in a customizable layout.

Why this answer

Amazon CloudWatch Dashboards allow you to create custom views of CloudWatch metrics, including SageMaker endpoint metrics such as Invocations and ModelLatency. SageMaker itself does not provide a dashboard for these metrics.
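A dashboard is defined by a JSON body. The sketch below builds one with two widgets for a single endpoint; the helper name, endpoint name, variant name, and region are placeholders, while `AWS/SageMaker`, `Invocations`, and `ModelLatency` are the namespace and metric names that SageMaker endpoints publish.

```python
import json

def build_endpoint_dashboard(endpoint_name, variant_name="AllTraffic",
                             region="us-east-1"):
    """Return a CloudWatch dashboard body (JSON string) for one endpoint."""
    dims = ["EndpointName", endpoint_name, "VariantName", variant_name]

    def widget(title, metric, stat, x):
        return {
            "type": "metric",
            "x": x, "y": 0, "width": 12, "height": 6,
            "properties": {
                "title": title,
                "metrics": [["AWS/SageMaker", metric, *dims]],
                "stat": stat,
                "period": 60,
                "region": region,
            },
        }

    body = {
        "widgets": [
            widget("Invocations", "Invocations", "Sum", 0),
            widget("Model latency", "ModelLatency", "Average", 12),
        ]
    }
    return json.dumps(body)
```

The resulting string could then be passed to `boto3.client("cloudwatch").put_dashboard(DashboardName=..., DashboardBody=body)`.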


747

Multi-Selecthard

A data scientist is using Amazon SageMaker Data Wrangler to create a feature engineering pipeline for a dataset with both numeric and categorical features. The scientist wants to apply transformations that are appropriate for a linear model. Which THREE transformations should the scientist apply? (Choose THREE.)

Select 3 answers

A.MinMaxScaler on numeric features

B.Remove features with high pairwise correlation

C.One-hot encoding on categorical features

D.Label encoding on categorical features

E.StandardScaler on numeric features

AnswersB, C, E

High correlation between features can cause multicollinearity, making linear model coefficients unstable and hard to interpret.

Why this answer

Linear models assume features are numeric, scaled, and not highly correlated. StandardScaler ensures all numeric features have comparable scales. One-hot encoding converts categorical features into binary columns without imposing ordinality.

Removing highly correlated features reduces multicollinearity which can destabilize coefficient estimates.
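The correlation filter can be sketched without any libraries. This is an illustration of the idea, not what Data Wrangler runs internally; the function names, the toy feature columns, and the 0.95 threshold are assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def drop_correlated(columns, threshold=0.95):
    """Keep a column only if it is not highly correlated with one already kept."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold
               for other in kept.values()):
            kept[name] = values
    return kept

features = {
    "area_sqft": [800, 1200, 1500, 2000, 2600],
    "area_sqm": [74.3, 111.5, 139.4, 185.8, 241.5],  # same information as sqft
    "age_years": [30, 5, 18, 42, 11],
}
selected = drop_correlated(features)
```

Here `area_sqm` is dropped because it is almost perfectly correlated with `area_sqft`, while `age_years` is kept.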


748

MCQhard

A team uses SageMaker real-time endpoints for inference. They want to deploy a new model version and compare its performance with the current version under live traffic without affecting user experience. Which method should they use?

A.A/B testing with production variant traffic splitting

B.Batch transform on a holdout test set

C.Blue/green deployment

D.Shadow testing with SageMaker

AnswerD

Shadow testing duplicates traffic to a shadow variant without serving it to users, allowing safe comparison.

Why this answer

Shadow testing (or shadow deployment) sends a copy of live traffic to the new model variant while the current variant serves the actual response. The shadow variant's performance can be monitored without impacting the user.
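The routing behaviour can be illustrated with a small in-process simulation. This is a conceptual sketch only: SageMaker performs the traffic duplication on the service side, and the function and argument names below are hypothetical.

```python
def serve_with_shadow(request, production_model, shadow_model, shadow_log):
    """Answer with the production model; record the shadow model's output."""
    response = production_model(request)
    try:
        shadow_log.append({
            "request": request,
            "production": response,
            "shadow": shadow_model(request),
        })
    except Exception as exc:
        # A failing shadow model must never affect the caller.
        shadow_log.append({
            "request": request,
            "production": response,
            "shadow_error": repr(exc),
        })
    return response
```

The caller only ever sees the production response, which is the difference from A/B testing, where a share of users is actually served by the new variant.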


749

MCQmedium

A company uses Amazon SageMaker Ground Truth to create labeled datasets for object detection. The output must be in COCO format for downstream model training. How should the data preparation process be configured?

A.Use a built-in transformation to convert from Ground Truth JSON to COCO after labeling

B.Use a pre-built AWS Lambda function to transform annotations to COCO

C.Write a custom SageMaker Processing script to convert the output to COCO

D.Select 'Object Detection' task type and specify 'COCO' as the output format in the labeling job configuration

AnswerD

Ground Truth supports COCO output for object detection tasks.

Why this answer

Option D is correct because Amazon SageMaker Ground Truth natively supports outputting object detection labeling jobs in COCO format. When you select 'Object Detection' as the task type, the labeling job configuration includes an option to specify 'COCO' as the output format, which automatically structures the labeled data into the required COCO JSON schema without any post-processing.

Exam trap

The trap here is that candidates assume post-processing is always required for format conversion, overlooking that Ground Truth can directly output COCO format when the correct task type and output format are selected in the labeling job configuration.

How to eliminate wrong answers

Option A is wrong because Ground Truth does not provide a built-in transformation to convert its default JSON output to COCO format; the conversion must be handled externally. Option B is wrong because while AWS Lambda can be used for custom transformations, it is not a pre-built solution for this specific conversion; using a Lambda function would require writing custom code and is not the recommended or simplest approach. Option C is wrong because writing a custom SageMaker Processing script is an unnecessary extra step; Ground Truth can directly output COCO format, eliminating the need for any post-labeling transformation.


750

MCQeasy

A data engineer needs to convert a JSON dataset to Parquet format for efficient querying with Amazon Athena. The JSON files are in an S3 bucket. Which service can perform this conversion with minimal coding?

A.Amazon SageMaker Processing

B.Amazon EMR

C.AWS Lambda

D.AWS Glue Studio with a visual job

AnswerD

Glue Studio's drag-and-drop interface enables JSON to Parquet conversion with minimal coding.

Why this answer

AWS Glue Studio with a visual job is the correct choice because it provides a no-code, drag-and-drop interface to create ETL jobs that can read JSON from S3 and write it as Parquet, with built-in schema inference and transformation capabilities. This minimizes coding effort while leveraging Glue's serverless Spark engine for efficient conversion, making it ideal for preparing data for Athena queries.

Exam trap

The trap here is that candidates often confuse AWS Glue Studio with AWS Glue DataBrew or assume that any AWS service with 'processing' in its name (like SageMaker Processing) is suitable for simple ETL tasks, overlooking the specific no-code visual job capability of Glue Studio.

How to eliminate wrong answers

Option A is wrong because Amazon SageMaker Processing is designed for data preprocessing and model training workflows within the ML pipeline, not for simple file format conversion; it requires writing custom processing scripts and managing infrastructure, which adds unnecessary complexity. Option B is wrong because Amazon EMR is a managed Hadoop/Spark cluster that can perform the conversion, but it requires provisioning and configuring a cluster, writing Spark or Hive code, and managing lifecycle, which is far more coding and operational overhead than a visual job. Option C is wrong because AWS Lambda has a maximum execution time of 15 minutes and a deployment package size limit, making it impractical for converting large JSON datasets to Parquet; it also requires custom Python code with libraries like PyArrow or Pandas, which is not minimal coding.


