Scenario practice questions

A data scientist is preparing a large dataset (50 GB) for training a TensorFlow model on SageMaker. The dataset consists of many small CSV files. Training is slow due to I/O bottlenecks. Which data preparation strategy most effectively accelerates training?

Trap 1: Convert the dataset to Parquet format and use Apache Arrow for…

Parquet is good for analytics but not as efficient for TensorFlow training as TFRecord.

Trap 2: Compress the CSV files and decompress during data loading

Decompression adds overhead and does not solve file fragmentation.

Trap 3: Use a larger instance type with more vCPUs

This does not address I/O bottlenecks from many small files.

A
Convert the dataset to TFRecord format and use tf.data pipeline with prefetching
TFRecord packs many records into a few large files that are read sequentially, which removes the per-file overhead of many small CSVs, and prefetching overlaps data loading with model training.
B
Convert the dataset to Parquet format and use Apache Arrow for loading
Why wrong: Parquet is good for analytics but not as efficient for TensorFlow training as TFRecord.
C
Compress the CSV files and decompress during data loading
Why wrong: Decompression adds overhead and does not solve file fragmentation.
D
Use a larger instance type with more vCPUs
Why wrong: This does not address I/O bottlenecks from many small files.
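The core idea behind the correct answer, packing many small files into a few large ones, can be sketched without any framework. The example below is a minimal illustration using only the Python standard library and writes CSV shards rather than TFRecord files; the function name `consolidate_csv`, the shard naming scheme, and the `rows_per_shard` parameter are assumptions made for illustration. In a TensorFlow pipeline the writer would be `tf.io.TFRecordWriter` and the shards would be read back through `tf.data` with prefetching.

```python
import csv
from pathlib import Path


def consolidate_csv(src_dir: Path, dst_dir: Path, rows_per_shard: int) -> list[Path]:
    """Merge many small CSV files that share a header into fewer, larger shards."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    shards: list[Path] = []
    header = None
    out = None       # currently open shard file
    writer = None    # csv writer bound to `out`
    rows_in_shard = 0

    for src in sorted(src_dir.glob("*.csv")):
        with src.open(newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip empty files
            if header is None:
                header = file_header  # assume every file uses this header
            for row in reader:
                # Start a new shard when none is open or the current one is full.
                if writer is None or rows_in_shard >= rows_per_shard:
                    if out is not None:
                        out.close()
                    path = dst_dir / f"shard-{len(shards):05d}.csv"
                    out = path.open("w", newline="")
                    writer = csv.writer(out)
                    writer.writerow(header)
                    shards.append(path)
                    rows_in_shard = 0
                writer.writerow(row)
                rows_in_shard += 1

    if out is not None:
        out.close()
    return shards
```

With 20 input files of 5 rows each and `rows_per_shard=40`, the function produces 3 shards, so a training job opens 3 files instead of 20.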

Question 2 (easy, multiple choice)

A machine learning engineer at a retail company is monitoring a production model that predicts inventory demand. The model's prediction accuracy has dropped significantly over the past week. The engineer checks the model's input data and notices a new product category was introduced with a different distribution. Which concept is most likely causing the performance degradation?

Trap 1: Concept drift

Concept drift refers to changes in the underlying relationship between features and target, not input distribution.

Trap 2: Data leakage

Data leakage involves the model seeing information it shouldn't, not input distribution change.

Trap 3: Model decay

Model decay is a general term for performance degradation, not the specific cause here.

A
Concept drift
Why wrong: Concept drift refers to changes in the underlying relationship between features and target, not input distribution.
B
Covariate shift
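Covariate shift occurs when the distribution of the input features changes while the relationship between features and target stays the same. A new product category with a different distribution is exactly this kind of input change.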
C
Data leakage
Why wrong: Data leakage involves the model seeing information it shouldn't, not input distribution change.
D
Model decay
Why wrong: Model decay is a general term for performance degradation, not the specific cause here.
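A shift in a categorical input such as product category can be checked by comparing category frequencies between training data and recent production data. The sketch below is one simple way to do that, using total variation distance and a list of categories unseen in training; the function name `category_shift` and the sample values are hypothetical, and a production setup would more likely rely on a monitoring service than on hand-written checks.

```python
from collections import Counter


def category_shift(train: list[str], prod: list[str]) -> tuple[float, list[str]]:
    """Return (total variation distance, categories seen only in production)."""
    train_freq = Counter(train)
    prod_freq = Counter(prod)
    categories = set(train_freq) | set(prod_freq)

    # Total variation distance: half the sum of absolute differences in
    # category share. 0.0 means identical distributions, 1.0 means disjoint.
    tvd = 0.5 * sum(
        abs(train_freq[c] / len(train) - prod_freq[c] / len(prod))
        for c in categories
    )
    unseen = sorted(set(prod_freq) - set(train_freq))
    return tvd, unseen


# Hypothetical data: production traffic contains a category absent from training.
train = ["grocery"] * 50 + ["apparel"] * 50
prod = ["grocery"] * 40 + ["apparel"] * 40 + ["electronics"] * 20
distance, new_categories = category_shift(train, prod)
print(distance, new_categories)
```

A nonzero distance together with a non-empty list of unseen categories points to a change in the inputs rather than in the feature-to-target relationship.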

Question 3 (hard, multiple choice)

A financial services company deploys multiple models on a single Amazon SageMaker endpoint using a multi-model endpoint (MME). The models are stored in Amazon S3. Each model is approximately 500 MB and is loaded on demand. Users report high latency for cold-start scenarios. What should the company do to reduce cold-start latency?

Trap 1: Reduce the instance size to increase the number of instances per…

Smaller instances have less memory, so fewer models stay loaded and more requests hit a cold start.

Trap 2: Increase the number of instances in the endpoint's auto-scaling…

More instances spread the load but each still may have cold starts.

Trap 3: Deploy each model on a separate endpoint to avoid concurrent…

This increases management overhead and cost, and doesn't directly address cold start.

A
Reduce the instance size to increase the number of instances per unit cost.
Why wrong: Smaller instances have less memory, so fewer models stay loaded and more requests hit a cold start.
B
Increase the number of instances in the endpoint's auto-scaling group.
Why wrong: More instances spread the load but each still may have cold starts.
C
Deploy each model on a separate endpoint to avoid concurrent loading.
Why wrong: This increases management overhead and cost, and doesn't directly address cold start.
D
Configure the endpoint to use a larger 'ModelCacheSize' parameter.
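Caching more models in memory means fewer requests must wait for a model to be downloaded from S3 and loaded. On a multi-model endpoint the number of models that stay loaded is bounded by instance memory, so in practice this means giving each instance more memory for cached models. Note that the SageMaker API exposes model caching as `ModelCacheSetting` (Enabled or Disabled) under `MultiModelConfig`.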
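The effect of cache capacity on cold starts can be illustrated with a toy simulation. The sketch below assumes a least-recently-used eviction policy and a hypothetical request trace; it is not SageMaker's implementation, and `count_cold_starts` and the model names are made up for illustration.

```python
from collections import OrderedDict


def count_cold_starts(requests: list[str], capacity: int) -> int:
    """Count requests that must load a model, given an LRU cache of `capacity` models."""
    cache: OrderedDict[str, bool] = OrderedDict()
    cold_starts = 0
    for model in requests:
        if model in cache:
            cache.move_to_end(model)       # mark as most recently used
        else:
            cold_starts += 1               # model has to be loaded first
            cache[model] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used model
    return cold_starts


# Hypothetical trace: three models requested in rotation.
trace = ["m1", "m2", "m3", "m1", "m2", "m3"]
print(count_cold_starts(trace, capacity=2))
print(count_cold_starts(trace, capacity=3))
```

With room for only two models, every request in this trace is a cold start (6 loads); with room for three, each model is loaded once (3 loads) and later requests are served from memory.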

Question 4 (easy, multiple choice)

A company wants to use SageMaker to deploy a model that requires GPU acceleration for inference but also needs to keep costs low when traffic is low. Which SageMaker feature should they use?

Trap 1: SageMaker Debugger

Debugger is for debugging training jobs.

Trap 2: SageMaker Managed Spot Training

Spot Training is for training jobs, not inference.

Trap 3: SageMaker Model Monitor

Model Monitor is for monitoring inference quality.

A
SageMaker Debugger
Why wrong: Debugger is for debugging training jobs.
B
SageMaker Managed Spot Training
Why wrong: Spot Training is for training jobs, not inference.
C
SageMaker Elastic Inference
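Elastic Inference attaches a fractional amount of GPU acceleration to a SageMaker endpoint instance, so the company pays only for the acceleration it needs instead of a full GPU instance.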
D
SageMaker Model Monitor
Why wrong: Model Monitor is for monitoring inference quality.

Question 5 (hard, multiple choice)

A financial services company operates a real-time inference endpoint for a fraud detection model on Amazon SageMaker. The model was trained on historical transaction data from 2023. Over the past month, the model's precision has dropped from 92% to 78%, while recall remains high at 95%. The data science team suspects data drift and has already enabled SageMaker Model Monitor with data capture and a baseline from the training data. The latest monitoring report indicates no statistically significant drift in any of the input features. The team also verified that the inference code and model artifact have not changed. Despite the stable feature distributions, the model is misclassifying an increasing number of legitimate transactions as fraudulent (false positives). The business is concerned about the impact on customer experience. What is the best course of action?

Trap 1: Replace the model with a more complex algorithm such as a…

Algorithm change is a premature solution; the model's performance drop is likely due to data changes, not model capacity.

Trap 2: Retrain the model using the most recent 30 days of transaction data…

Retraining without understanding the cause may still use incorrect labels or miss a shift in the label distribution; it's better to first investigate ground truth.

Trap 3: Increase the data capture sampling percentage from 10% to 100% for…

Increasing data capture will provide more data but does not address the need for ground truth labels to diagnose the degradation.

A
Replace the model with a more complex algorithm such as a gradient-boosted tree.
Why wrong: Algorithm change is a premature solution; the model's performance drop is likely due to data changes, not model capacity.
B
Retrain the model using the most recent 30 days of transaction data with automated retraining pipelines.
Why wrong: Retraining without understanding the cause may still use incorrect labels or miss a shift in the label distribution; it's better to first investigate ground truth.
C
Increase the data capture sampling percentage from 10% to 100% for more detailed analysis.
Why wrong: Increasing data capture will provide more data but does not address the need for ground truth labels to diagnose the degradation.
D
Investigate recent ground truth labels to check for label drift or changes in the fraud definition.
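Label drift is a change in the distribution of the labels themselves, while a change in the relationship between features and labels (for example, a revised fraud definition) is concept drift. Neither appears in input feature monitoring, which is why the report shows no drift. Collecting and analyzing recent ground truth labels can confirm whether the fraud criteria have shifted.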
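The precision figures in the scenario can be converted into false-positive counts to show the scale of the problem. The sketch below assumes a hypothetical window containing 1,000 actual fraud cases (a number not given in the scenario) and solves precision = TP / (TP + FP) for FP; the function name `implied_false_positives` is also an assumption.

```python
def implied_false_positives(true_positives: float, precision: float) -> float:
    """Solve precision = TP / (TP + FP) for FP."""
    return true_positives * (1.0 - precision) / precision


actual_fraud = 1000          # hypothetical count, not from the scenario
recall = 0.95                # from the scenario
true_positives = actual_fraud * recall

fp_before = implied_false_positives(true_positives, 0.92)
fp_after = implied_false_positives(true_positives, 0.78)

print(round(fp_before), round(fp_after), round(fp_after / fp_before, 2))
```

Under these assumptions, false positives rise from roughly 83 to roughly 268, a bit more than a threefold increase. The ratio does not depend on the assumed fraud count, because recall is unchanged and the true-positive term cancels.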