MLA-C01 Exam Questions and Answers

Domain 1: Data Preparation for Machine Learning

A data scientist is preparing a large dataset for training a machine learning model. The dataset contains missing values in several columns. Which approach is the MOST efficient for handling missing values in a large dataset using AWS services?

A. Use AWS Glue ETL to write a custom Python script that imputes missing values with the mean.

B. Use Amazon SageMaker Data Wrangler to impute missing values using built-in transforms. (Correct)

Data Wrangler provides efficient, scalable, and visual data preparation without custom code.

C. Use pandas in a SageMaker notebook to impute missing values with the median.

D. Remove all rows with missing values from the dataset.

Why: Amazon SageMaker Data Wrangler provides a visual interface and built-in transforms for handling missing values efficiently at scale, without writing custom code. Glue ETL is more code-heavy, and imputation with pandas is not scalable for large datasets. Removing all rows with missing values is not always optimal and may not be efficient.
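To make the idea of imputation concrete, here is a minimal sketch in plain Python of what a median-fill transform does to one column. It is only an illustration: Data Wrangler applies its built-in transforms without code, and the `ages` column and the `impute_median` helper are hypothetical names.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical column with two missing entries.
ages = [34, None, 29, 41, None]
imputed = impute_median(ages)
print(imputed)
```

The fill value is computed from the observed entries only, so the missing rows do not distort it.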

A company is using AWS Glue to prepare data for a machine learning pipeline. The source data is in an Amazon S3 bucket in CSV format. The data scientist wants to convert the data to Parquet format and partition it by date. Which AWS Glue feature should be used to optimize the data for query performance and reduce storage costs?

A. Use Amazon Athena to convert the data to JSON format and store it in S3.

B. Use AWS Glue DynamicFrame to repartition the data and write it as Parquet. (Correct)

DynamicFrame supports efficient partitioning and columnar format conversion.

C. Use AWS Glue to convert the data to Apache Hive format.

D. Use Apache Spark DataFrame to write the data as CSV with Snappy compression.

Why: Option B is correct because writing the data as Parquet from an AWS Glue DynamicFrame produces a columnar, compressed format, which improves query performance through predicate pushdown and column pruning and reduces storage costs through efficient encoding. Partitioning by date is done by passing the date column as a partition key when the data is written, while the DynamicFrame's `repartition()` method controls the number of output files. Writing Parquet directly from Glue also avoids intermediate conversions, making it the most efficient choice for this task.
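As a rough sketch of what the partitioned Parquet write looks like in a Glue script, the helper below builds the keyword arguments for `glueContext.write_dynamic_frame.from_options`. The bucket path, the `date` column name, the `dyf` frame, and the `parquet_write_args` helper are assumptions for illustration; the Glue call itself is shown only as a comment because it runs only inside a Glue job.

```python
def parquet_write_args(output_path, partition_keys):
    """Build keyword arguments for a partitioned Parquet write from Glue."""
    return {
        "connection_type": "s3",
        "connection_options": {
            "path": output_path,
            # Columns listed here become Hive-style prefixes, e.g. date=2024-01-15/
            "partitionKeys": list(partition_keys),
        },
        "format": "parquet",
    }


# Hypothetical bucket and partition column.
args = parquet_write_args("s3://example-bucket/curated/", ["date"])

# Inside a Glue job, with an existing DynamicFrame named dyf (assumed):
# glueContext.write_dynamic_frame.from_options(frame=dyf, **args)
print(args)
```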

A machine learning engineer is preparing a dataset for a binary classification model. The dataset has a severe class imbalance (95% class A, 5% class B). The engineer wants to use Amazon SageMaker to train the model. Which data preparation technique should the engineer apply to the training dataset to address the imbalance and improve model performance?

A. Apply data augmentation to the majority class by adding noise.

B. Apply Synthetic Minority Over-sampling Technique (SMOTE) to generate synthetic samples for the minority class. (Correct)

SMOTE creates synthetic samples, balancing the dataset without losing data.

C. Use a weighted loss function during training to penalize misclassifications of the minority class.

D. Apply random under-sampling to reduce the majority class to match the minority class size.

Why: Option B is correct because SMOTE generates synthetic samples for the minority class by interpolating between existing minority instances, which directly addresses the severe class imbalance (95% class A, 5% class B) by creating a more balanced training dataset. This technique is particularly effective for tabular data in Amazon SageMaker, as it increases the representation of the minority class without simply duplicating existing samples, thereby reducing overfitting and improving the model's ability to learn decision boundaries for the minority class.
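The interpolation step at the heart of SMOTE can be sketched in a few lines of plain Python. This is a simplified illustration, not a full implementation: real SMOTE first selects the neighbor from the k nearest minority-class samples, which is omitted here, and the points `a` and `b` are made-up values.

```python
import random


def smote_sample(point, neighbor, rng):
    """Create one synthetic point on the segment between two minority samples."""
    gap = rng.random()  # uniform in [0, 1)
    return [p + gap * (n - p) for p, n in zip(point, neighbor)]


rng = random.Random(0)   # seeded for repeatability
a = [1.0, 2.0]           # a minority-class sample (made up)
b = [3.0, 6.0]           # one of its minority-class neighbors (made up)
synthetic = smote_sample(a, b, rng)
print(synthetic)
```

Because the new point lies between two real minority samples, it adds variety rather than an exact duplicate. In practice a library such as imbalanced-learn (`imblearn.over_sampling.SMOTE`) would normally be used, and only on the training split.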

A data scientist is preparing a dataset for a machine learning model that predicts customer churn. The dataset contains a column 'CustomerID' that is a unique identifier. What should the data scientist do with this column before training the model?

A. Keep the column as a feature because it uniquely identifies each customer.

B. Use the column as the target variable.

C. Remove the column from the feature set. (Correct)

Removing unique identifiers prevents overfitting and is standard practice.

D. Encode the column using one-hot encoding.

Why: Option C is correct because 'CustomerID' is a unique identifier with no predictive power for churn. Including it as a feature would cause the model to memorize individual customers rather than learn generalizable patterns, leading to overfitting and poor performance on unseen data. In machine learning, such columns should be removed during data preparation to ensure the model learns from meaningful features.

A company uses AWS Glue to run ETL jobs that prepare data for machine learning. The data is stored in Amazon S3 in Parquet format. A data engineer notices that the Glue job is running slowly and consuming a lot of resources. What is the MOST cost-effective way to improve the performance of the Glue job?

A. Use the G.1X worker type, which provides more memory per worker compared to the Standard worker type. (Correct)

G.1X offers more memory, reducing memory-related bottlenecks without increasing DPU count.

B. Use partition pruning on the source data to reduce the amount of data processed.

C. Switch the output format from Parquet to CSV to reduce processing overhead.

D. Use a larger instance type for the Glue job by increasing the number of DPUs.

Why: Option A is correct because the G.1X worker type provides more memory per worker, which relieves memory-related bottlenecks without increasing the DPU count and so gives better resource utilization. Increasing the number of DPUs (Data Processing Units) can improve parallelism and reduce job runtime, but it increases cost. Switching the output to CSV is likely to degrade performance, because CSV is not a columnar format. Partition pruning reduces the amount of data scanned, but it helps only when the job filters on partition columns and may not address the resource consumption described here.

A machine learning team is building a model using a dataset that contains a mix of numerical and categorical features. The categorical features have high cardinality (e.g., zip code with thousands of unique values). The team wants to use Amazon SageMaker for training. Which technique should the team use to encode the high-cardinality categorical features effectively?

A. Apply hash encoding to map categories to a fixed number of buckets.

B. Apply target encoding (mean encoding) to the high-cardinality features. (Correct)

Target encoding reduces dimensionality and captures target-related information.

C. Apply one-hot encoding to all categorical features.

D. Apply label encoding to assign integer values to each category.

Why: For high-cardinality categorical features, target encoding (mean encoding) replaces each category with the mean of the target variable for that category, which captures information without creating a large number of dummy variables. One-hot encoding would create too many features. Label encoding implies ordinal relationships. Hash encoding can cause collisions.
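A minimal sketch of target (mean) encoding in plain Python follows. The zip codes and churn labels are invented for illustration, and the sketch leaves out smoothing and the handling of categories unseen at training time, both of which a production encoder would need.

```python
from collections import defaultdict


def target_encode(categories, targets):
    """Replace each category with the mean target value observed for it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    means = {c: sums[c] / counts[c] for c in sums}
    return [means[c] for c in categories], means


# Invented training rows: zip code and a 0/1 churn label.
zip_codes = ["98101", "98101", "10001", "10001", "10001"]
churned = [1, 0, 1, 1, 0]
encoded, means = target_encode(zip_codes, churned)
print(means)
```

The category means should be computed from the training split only and then applied to validation and test rows; computing them on the full dataset would leak target information.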

Domain 2: ML Model Development

A data scientist is training a binary classification model using imbalanced data where the positive class is only 1% of the dataset. The scientist wants to maximize the recall for the positive class while maintaining reasonable precision. Which evaluation metric is most appropriate to tune during model selection?

A. Log loss

B. Area under the ROC curve (AUC)

C. F1 score (Correct)

F1 score combines precision and recall, making it suitable for imbalanced classes when both matter.

D. Accuracy

Why: The F1 score is the harmonic mean of precision and recall, making it ideal for imbalanced datasets where the positive class is only 1%. By tuning the F1 score, the data scientist directly balances the trade-off between maximizing recall (capturing true positives) and maintaining reasonable precision (avoiding false positives), which aligns with the stated goal.
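The contrast between accuracy and F1 on a 1% positive class can be worked through with a short calculation. The confusion-matrix counts below are invented for a dataset of 1,000 rows with 10 positives; the `precision_recall_f1` helper is a hypothetical name.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Model 1 predicts "negative" for every row: 990 of 1,000 are right.
accuracy_all_negative = 990 / 1000
_, _, f1_all_negative = precision_recall_f1(tp=0, fp=0, fn=10)

# Model 2 finds 8 of the 10 positives at the cost of 12 false alarms.
precision, recall, f1 = precision_recall_f1(tp=8, fp=12, fn=2)

print(accuracy_all_negative, f1_all_negative)
print(precision, recall, f1)
```

The model that never predicts the positive class scores 99% accuracy but an F1 of 0, which is why accuracy is a poor tuning target here.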

A machine learning engineer is training a deep learning model on SageMaker and notices that the training loss decreases rapidly in the first few epochs but then plateaus. The validation loss starts increasing after 10 epochs. Which action should the engineer take to improve generalization?

A. Add more layers to the model

B. Use early stopping with validation loss monitoring (Correct)

Early stopping halts training when validation loss stops decreasing, reducing overfitting.

C. Increase the learning rate

D. Decrease the batch size

Why: Early stopping is the correct action because validation loss that starts increasing after 10 epochs, while training loss has stopped improving, is a classic sign of overfitting. By monitoring validation loss and halting training when it stops improving (e.g., using a patience parameter), the engineer prevents the model from memorizing noise in the training data, thereby improving generalization. This can be implemented in the training script, for example with the `EarlyStopping` callback in Keras or PyTorch Lightning.
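The patience logic behind early stopping can be sketched framework-free. The validation losses below are invented, and `early_stopping_epoch` is a hypothetical helper; framework callbacks implement the same idea with extra options such as restoring the best weights.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    stale = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None


# Invented validation losses: improvement stops after epoch 4.
losses = [0.90, 0.70, 0.60, 0.55, 0.56, 0.58, 0.61, 0.65]
stop_at = early_stopping_epoch(losses, patience=3)
print(stop_at)
```

With a patience of 3, training halts three epochs after the best validation loss, and the checkpoint from the best epoch would normally be the one kept.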

A team is deploying a machine learning model for real-time fraud detection. The model must have inference latency under 10 ms and handle up to 1000 requests per second. The model is a gradient boosting model using XGBoost. Which SageMaker hosting configuration is MOST cost-effective while meeting the requirements?

A. Use SageMaker Batch Transform with multiple instances

B. Use a SageMaker Multi-Model Endpoint (MME) on an ml.c5.4xlarge instance with auto scaling (Correct)

MME allows multiple models to share a container, reducing cost while scaling to meet demand.

C. Deploy on a single ml.c5.xlarge instance with a real-time endpoint

D. Deploy separate real-time endpoints for each model on ml.m5.large instances

Why: Option B is correct because a Multi-Model Endpoint (MME) on a single ml.c5.4xlarge instance allows multiple models to share the same endpoint, reducing cost while still meeting the latency (<10 ms) and throughput (1000 req/s) requirements. The ml.c5.4xlarge provides sufficient compute (16 vCPUs, 32 GB memory) for XGBoost inference, and auto scaling ensures capacity adjusts to handle peak load without over-provisioning.

A data scientist is using Amazon SageMaker to train a linear regression model. After training, the scientist notices that the training and validation errors are both low, but the model performs poorly on new test data. What is the MOST likely cause?

A. There is data leakage from the validation set into the training set (Correct)

Data leakage artificially inflates performance on validation but fails on true unseen data.

B. The features are not scaled properly

C. The model is overfitting the training data

D. The model has high bias

Why: Option A is correct because data leakage from the validation set into the training set would allow the model to learn patterns that are not present in truly unseen data, leading to artificially low training and validation errors but poor generalization to new test data. In SageMaker, this can occur if the dataset is not properly split before feature engineering or if preprocessing (e.g., scaling or imputation) is applied to the entire dataset before splitting, causing the validation set to influence the training process.
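One common way to avoid this kind of leakage is to compute preprocessing statistics from the training rows only and then reuse them unchanged on validation data. The sketch below shows that ordering for standard scaling; the numbers and the helper names are illustrative assumptions.

```python
from statistics import mean, pstdev


def fit_scaler(train_values):
    """Learn scaling statistics from the training split only."""
    sigma = pstdev(train_values)
    return mean(train_values), sigma or 1.0  # guard against a constant column


def transform(values, scaler):
    """Apply previously learned statistics without refitting."""
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]


# Split first, then fit on the training rows.
train = [10.0, 12.0, 14.0]
validation = [30.0]

scaler = fit_scaler(train)
scaled_train = transform(train, scaler)
scaled_validation = transform(validation, scaler)
print(scaler, scaled_validation)
```

Fitting the scaler on `train + validation` instead would let the validation rows influence the training features, which is the leakage the explanation describes.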

A company is using SageMaker to train a neural network for image classification. The training job is taking too long. The team wants to reduce training time without sacrificing model accuracy. Which approach should they recommend?

A. Increase the batch size to the maximum possible

B. Use a GPU-based instance such as ml.p3.2xlarge (Correct)

GPUs accelerate matrix operations in neural networks, reducing training time.

C. Use a learning rate scheduler that reduces the learning rate over time

D. Add more convolutional layers to the model

Why: Option B is correct because GPU-based instances like ml.p3.2xlarge are specifically designed for parallel processing of matrix operations, which are fundamental to neural network training. By offloading compute-intensive tensor operations to GPU cores, training time can be significantly reduced without altering the model architecture or data, thus preserving accuracy.

A machine learning engineer is using SageMaker Automatic Model Tuning (AMT) to optimize hyperparameters for a random forest model. The engineer notices that the tuning job is taking too long and many hyperparameter combinations are being evaluated but not improving the objective metric. Which action should the engineer take to make the tuning more efficient?

A. Switch the strategy from Bayesian to random search

B. Use a smaller instance type for each training job

C. Increase the maximum number of training jobs

D. Enable early stopping for the tuning job (Correct)

Early stopping halts poorly performing trials, reducing wasted computation.

Why: Option D is correct because enabling early stopping in SageMaker Automatic Model Tuning (AMT) terminates poorly performing training jobs before they complete, which reduces wasted compute time and speeds up the tuning process. This is especially effective when using Bayesian optimization, as it allows the algorithm to focus on promising hyperparameter regions and avoid evaluating combinations that are unlikely to improve the objective metric.
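For reference, early stopping in a tuning job is switched on through the `TrainingJobEarlyStoppingType` field of the tuning job configuration passed to the `CreateHyperParameterTuningJob` API. The fragment below is a sketch of that configuration as a Python dictionary; the metric name and the resource limits are placeholder assumptions, not recommended values.

```python
# Sketch of a HyperParameterTuningJobConfig with early stopping enabled.
tuning_job_config = {
    "Strategy": "Bayesian",
    "HyperParameterTuningJobObjective": {
        "Type": "Maximize",
        "MetricName": "validation:accuracy",  # placeholder metric name
    },
    "ResourceLimits": {
        "MaxNumberOfTrainingJobs": 20,   # placeholder limit
        "MaxParallelTrainingJobs": 2,    # placeholder limit
    },
    # "Auto" lets SageMaker stop training jobs that are unlikely to improve
    # on the best objective value found so far; the default is "Off".
    "TrainingJobEarlyStoppingType": "Auto",
}

print(tuning_job_config["TrainingJobEarlyStoppingType"])
```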

Domain 3: Deployment and Orchestration of ML Workflows

A data science team has trained a PyTorch model using Amazon SageMaker and wants to deploy it with a custom inference container that includes a pre-processing step. The team needs to minimize latency and ensure the pre-processing runs only once per request. Which SageMaker real-time inference option should they use?

A. Deploy the model on a multi-model endpoint and include pre-processing in the model code.

B. Use a batch transform job with a pre-processing script.

C. Package pre-processing and inference in a single container with a custom entry point.

D. Create a SageMaker inference pipeline with two containers: one for pre-processing and one for inference. (Correct)

An inference pipeline chains containers sequentially, allowing pre-processing to run once per request with low latency.

Why: Option D is correct because a SageMaker inference pipeline allows you to chain two containers in a single endpoint, where the first container handles pre-processing and the second runs inference. This ensures that pre-processing runs exactly once per request. Latency stays low because the containers are co-located on the same instance, so the request passes from one container to the next without a call to a second endpoint.

A company is deploying a real-time inference endpoint for a natural language processing model using Amazon SageMaker. The model requires GPU acceleration and must handle variable traffic patterns, including sudden spikes. The team wants to minimize costs while maintaining low latency during spikes. Which endpoint configuration strategy should they use?

A. Use a single large GPU instance with provisioned concurrency.

B. Use a serverless endpoint with GPU support.

C. Use a single GPU instance in multiple Availability Zones with an Application Load Balancer.

D. Use a multi-model endpoint on a GPU instance with Auto Scaling based on invocation count. (Correct)

Multi-model endpoints share instances across models, and Auto Scaling adjusts capacity for spikes.

Why: Option D is correct because a multi-model endpoint on a GPU instance with Auto Scaling based on invocation count allows multiple models to share a single GPU, maximizing utilization and reducing cost. Auto Scaling based on invocation count dynamically adjusts the number of instances to handle traffic spikes while maintaining low latency, as it scales out quickly when the invocation count exceeds a threshold.
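Scaling on invocation count is typically expressed as a target-tracking policy in Application Auto Scaling, using the predefined metric `SageMakerVariantInvocationsPerInstance`. The helper below sketches the policy arguments as a dictionary; the endpoint name, variant name, target value, and cooldowns are illustrative assumptions, and the variant would first need to be registered as a scalable target.

```python
def invocation_scaling_policy(endpoint_name, variant_name, target_per_instance):
    """Build arguments for a target-tracking policy on invocations per instance."""
    return {
        "PolicyName": f"{endpoint_name}-invocations",
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleOutCooldown": 60,   # seconds; illustrative value
            "ScaleInCooldown": 300,   # seconds; illustrative value
        },
    }


# Hypothetical endpoint and variant names.
policy = invocation_scaling_policy("nlp-endpoint", "AllTraffic", 100)

# With boto3 this would be passed as:
# boto3.client("application-autoscaling").put_scaling_policy(**policy)
print(policy["ResourceId"])
```

A shorter scale-out cooldown than scale-in cooldown is a common choice for spiky traffic, since it adds capacity quickly and removes it cautiously.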

A machine learning engineer is deploying a model using AWS Lambda for inference. The model is a small scikit-learn classifier with a size of 50 MB. The Lambda function is invoked by an API Gateway REST API. The engineer notices that cold starts are causing high latency. Which action would most effectively reduce cold start latency without increasing costs significantly?

A. Store the model in Amazon EFS and load it at runtime.

B. Increase the Lambda function memory to the maximum of 10,240 MB.

C. Configure provisioned concurrency for the Lambda function. (Correct)

Provisioned concurrency keeps instances initialized and ready to respond immediately.

D. Package the model in a container image and deploy using Lambda container support.

Why: Option C is correct because provisioned concurrency pre-initializes the Lambda execution environment, keeping it warm and ready to handle requests immediately. This eliminates the cold start overhead for the first request, directly reducing latency without incurring the ongoing costs of a larger memory allocation or the complexity of EFS/container management.

A company uses Amazon SageMaker to train and deploy machine learning models. The security team requires that all data in transit between the training job and S3 be encrypted, and that no data traverses the public internet. Which configuration should the company use?

A. Create a VPC with S3 VPC endpoints, attach a VPC-only policy to the SageMaker execution role, and enable KMS encryption for training jobs. (Correct)

S3 VPC endpoints keep traffic within the AWS network, TLS encrypts data in transit, and KMS encrypts data at rest.

B. Use an S3 bucket with SSE-S3 encryption and restrict bucket access to a VPC.

C. Enable default encryption on the S3 bucket and use HTTPS for all SageMaker endpoints.

D. Create a VPC with a NAT gateway, and configure SageMaker to use the VPC and enforce HTTPS.

Why: Option A is correct because it ensures that data in transit between SageMaker and S3 stays within the AWS network and is encrypted. By creating a VPC with S3 VPC endpoints, traffic stays on the AWS private network and never traverses the public internet. Attaching a VPC-only policy to the SageMaker execution role restricts the training job to the VPC endpoints. Data in transit is encrypted by TLS on the connection to S3, and enabling KMS encryption for the training job encrypts the data at rest.
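One way these two network requirements are often enforced on the S3 side is a bucket policy that denies requests made without TLS and requests that do not arrive through a specific VPC endpoint. The sketch below builds such a policy document; the bucket name and endpoint ID are placeholders, and this is an illustration rather than the configuration the question prescribes.

```python
import json


def private_tls_bucket_policy(bucket, vpce_id):
    """Build a bucket policy that requires TLS and a specific VPC endpoint."""
    resources = [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                # Rejects any request that is not made over HTTPS.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                "Sid": "DenyOutsideVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                # Rejects requests that do not come through this endpoint.
                "Condition": {"StringNotEquals": {"aws:sourceVpce": vpce_id}},
            },
        ],
    }


# Placeholder bucket name and VPC endpoint ID.
policy = private_tls_bucket_policy("example-training-data", "vpce-0123456789abcdef0")
print(json.dumps(policy, indent=2))
```

A blanket deny like the second statement also blocks console and administrative access from outside the VPC, so real policies usually scope it more narrowly.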

A team is deploying a deep learning model on a SageMaker real-time endpoint. The model has high memory requirements, and the team wants to minimize instance cost while ensuring the endpoint can handle up to 10 concurrent requests. They plan to use a single ml.p3.2xlarge instance (8 vCPUs, 61 GB memory). Which SageMaker endpoint configuration will allow the endpoint to handle 10 concurrent requests without errors?

A. Disable ModelServerWorkers to reduce overhead.

B. Set the initial instance count to 1 and configure the container to use multiple ModelServerWorkers. (Correct)

Multiple workers allow the instance to handle multiple requests concurrently, up to the CPU/memory limit.

C. Set the initial variant weight to 10.

D. Set the initial instance count to 10 in the production variant.

Why: Option B is correct because SageMaker's ModelServerWorkers (MSWs) allow a single container to handle multiple inference requests concurrently by running multiple worker processes. Each worker handles one request at a time, so with 8 vCPUs on ml.p3.2xlarge the container can run several workers (e.g., 8) in parallel, with any additional requests typically waiting briefly in the model server's queue. This lets the endpoint accept 10 concurrent requests without errors, provided the instance has enough memory for one copy of the model per worker, and it minimizes cost by using a single instance.

A company wants to deploy a machine learning model that was trained on-premises using TensorFlow. The model is a TensorFlow SavedModel. The company uses AWS and wants to minimize operational overhead. Which deployment option meets these requirements?

A. Deploy the model on Amazon ECS using a custom Docker image.

B. Deploy the model as an AWS Lambda function with the TensorFlow runtime.

C. Deploy the model using Amazon SageMaker Studio.

D. Deploy the model using Amazon SageMaker with a TensorFlow inference container. (Correct)

SageMaker provides pre-built TensorFlow containers and manages the endpoint, reducing operational overhead.

Why: Amazon SageMaker provides a fully managed TensorFlow inference container that directly supports TensorFlow SavedModel format, enabling deployment without any custom infrastructure management. This minimizes operational overhead compared to self-managed options like ECS or Lambda, as SageMaker handles scaling, load balancing, and model updates automatically.

Domain 4: ML Solution Monitoring, Maintenance and Security

A machine learning engineer at a retail company is monitoring a production model that predicts inventory demand. The model's prediction accuracy has dropped significantly over the past week. The engineer checks the model's input data and notices a new product category was introduced with a different distribution. Which concept is most likely causing the performance degradation?

A. Concept drift

B. Covariate shift (Correct)

Covariate shift occurs when the distribution of input features changes over time.

C. Data leakage

D. Model decay

Why: B is correct because covariate shift occurs when the distribution of the input features changes while the relationship between features and the target remains the same. In this scenario, the introduction of a new product category with a different distribution alters the input data distribution, causing the model to encounter unseen patterns and degrade in prediction accuracy.
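A simple check for this kind of input shift on a categorical feature is to compare category frequencies between the training baseline and recent production data. The sketch below reports categories never seen in the baseline and the total variation distance between the two distributions; the category names and counts are invented, and the choice of distance measure is an assumption for illustration.

```python
from collections import Counter


def category_shift(baseline, live):
    """Return (unseen categories, total variation distance) for one feature."""
    base = Counter(baseline)
    current = Counter(live)
    unseen = sorted(set(current) - set(base))
    categories = set(base) | set(current)
    # Half the sum of absolute differences between the two frequency tables.
    tvd = 0.5 * sum(
        abs(base[c] / len(baseline) - current[c] / len(live)) for c in categories
    )
    return unseen, tvd


# Invented data: a new "electronics" category appears in production.
baseline = ["grocery"] * 6 + ["apparel"] * 4
live = ["grocery"] * 3 + ["apparel"] * 2 + ["electronics"] * 5

unseen, tvd = category_shift(baseline, live)
print(unseen, tvd)
```

A distance of 0 means the two distributions match and 1 means they do not overlap at all; what threshold should raise an alert depends on the application.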

A data science team is using Amazon SageMaker to train and deploy a binary classification model. They want to continuously monitor the model for data drift in production. Which combination of AWS services and SageMaker features should they use to implement automated drift detection with minimal operational overhead?

A. SageMaker Debugger and Amazon SNS

B. SageMaker Pipelines and AWS Lambda

C. SageMaker Clarify and AWS Config

D. SageMaker Model Monitor and Amazon CloudWatch (Correct)

SageMaker Model Monitor detects drift and sends metrics to CloudWatch for alerting.

Why: SageMaker Model Monitor is the native SageMaker feature designed specifically for continuously monitoring deployed models for data drift, bias drift, and feature attribution drift. It automatically captures inference requests and responses, computes statistics, and publishes metrics to Amazon CloudWatch, which can trigger alarms for drift detection. This combination provides automated drift detection with minimal operational overhead because it requires no custom infrastructure or manual scheduling.

A financial services company uses a custom container on Amazon SageMaker to serve a fraud detection model. The model's inference latency has recently increased, causing timeouts for some requests. The team reviews the SageMaker logs and finds that the container is consuming more memory than allocated. What should the team do to maintain service quality while ensuring cost-effectiveness?

A. Decrease the model's batch size to reduce memory usage

B. Increase the number of instances in the endpoint to distribute the load

C. Implement an auto-scaling policy based on memory utilization

D. Change the instance type to a memory-optimized instance, such as ml.r5.large (Correct)

Switching to a memory-optimized instance provides more memory per instance, resolving the issue cost-effectively.

Why: The correct answer is D because the root cause is that the container is consuming more memory than allocated, leading to increased latency and timeouts. Switching to a memory-optimized instance like ml.r5.large directly addresses the memory constraint by providing more memory per vCPU, which resolves the performance issue without over-provisioning compute resources. This approach is cost-effective because it targets the specific bottleneck (memory) rather than scaling out or changing unrelated parameters.

A machine learning team is building a CI/CD pipeline for model deployment using Amazon SageMaker. They need to ensure that all model artifacts are encrypted at rest and in transit, and that access to the models is controlled via IAM. Which TWO actions should the team take to meet these requirements? (Choose TWO.)

A. Set the SageMaker model's 'EnableNetworkIsolation' parameter to true

B. Enable default encryption on the S3 bucket that stores model artifacts

C. Enable AWS CloudTrail to log all API calls to SageMaker

D. Configure the SageMaker notebook instance to use a KMS key for encryption (Correct)

KMS encrypts data at rest in SageMaker.

E. Use HTTPS endpoints for invoking the SageMaker model (Correct)

HTTPS encrypts data in transit.

Why: Option D is correct because configuring a SageMaker notebook instance to use a KMS key ensures that data at rest on the notebook's storage volume (e.g., EBS) is encrypted. This directly addresses the requirement for encryption at rest for model artifacts during development. Option E is correct because using HTTPS endpoints for invoking the SageMaker model ensures encryption in transit via TLS, protecting data as it moves between clients and the model endpoint.

A healthcare company deploys a model to predict patient readmission risk. The model was trained on historical data and is now showing signs of concept drift. The team needs to implement a monitoring solution that can detect drift and automatically retrain the model when drift is detected. Which THREE steps should the team take to build this solution? (Choose THREE.)

A. Deploy SageMaker Model Monitor to track prediction quality over time (Correct)

Model Monitor can detect drift using ground truth.

B. Disable the existing endpoint to prevent stale predictions during retraining

C. Set up a process to collect ground truth labels from patient outcomes (Correct)

Ground truth is required to detect concept drift.

D. Manually compare the model's predictions against a holdout validation set each week

E. Use AWS Lambda to invoke a SageMaker training job when drift is detected (Correct)

Lambda can automate the retraining trigger.

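Why: A is correct because Amazon SageMaker Model Monitor can continuously track prediction quality metrics (e.g., accuracy, precision) over time by analyzing data captured from the endpoint. This allows the team to detect concept drift by comparing live predictions against a baseline, triggering alerts when performance degrades. It provides a managed, automated way to monitor model quality without manual intervention. C is correct because measuring prediction quality requires ground truth labels, so the team needs a process that collects actual patient outcomes and matches them to the captured predictions. E is correct because an AWS Lambda function can start a SageMaker training job when drift is detected, which automates the retraining trigger.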

A company is using Amazon SageMaker to host a real-time inference endpoint. They want to restrict access to the endpoint to only a specific VPC and require authentication using AWS IAM. Which TWO configuration steps should they take to achieve this? (Choose TWO.)

A. Configure the endpoint to be deployed in a private subnet within the VPC (Correct)

Private subnet restricts traffic to within the VPC.

B. Enable IAM-based authentication for the endpoint (Correct)

IAM auth ensures only authorized users can invoke the endpoint.

C. Attach a resource-based policy to the endpoint that denies all traffic except from the VPC

D. Place the endpoint behind Amazon CloudFront to act as a proxy

E. Use a public subnet and configure a security group to allow only the company's IP range

Why: Option A is correct because deploying the SageMaker endpoint in a private subnet within the VPC ensures that the endpoint is not publicly accessible and can only be reached from within that VPC. This is achieved by using a VPC interface endpoint (AWS PrivateLink) or by placing the endpoint directly in the VPC, which restricts network traffic to the VPC boundary. Option B is correct because IAM-based authentication ensures that only principals authorized by IAM policy can invoke the endpoint.

Frequently asked questions

How many questions are on the MLA-C01 exam?

The MLA-C01 exam has 65 questions (50 scored and 15 unscored) and must be completed in 130 minutes. The passing score is 720 out of 1,000.

What types of questions appear on the MLA-C01 exam?

Scenario-based questions covering exam objectives with detailed answer explanations.

How are MLA-C01 questions organised by domain?

The exam covers 4 domains: Data Preparation for Machine Learning, ML Model Development, Deployment and Orchestration of ML Workflows, ML Solution Monitoring, Maintenance and Security. Questions are weighted by domain — higher-weight domains appear more on your actual exam.

Are these the actual MLA-C01 exam questions?

No. These are original exam-style practice questions written against the official Amazon Web Services MLA-C01 exam objectives. They are not copied from the real exam. Courseiva focuses on genuine understanding, not memorisation of braindumps.

Amazon Web Services · Free Practice Questions · Last reviewed May 2026


Domain 1: Data Preparation for Machine Learning

All Data Preparation for Machine Learning questions

Use AWS Glue ETL to write a custom Python script that imputes missing values with the mean.

Use Amazon SageMaker Data Wrangler to impute missing values using built-in transforms.

Data Wrangler provides efficient, scalable, and visual data preparation without custom code.

Use pandas in a SageMaker notebook to impute missing values with the median.

Remove all rows with missing values from the dataset.

Use Amazon Athena to convert the data to JSON format and store it in S3.

Use AWS Glue DynamicFrame to repartition the data and write it as Parquet.

DynamicFrame supports efficient partitioning and columnar format conversion.

Use AWS Glue to convert the data to Apache Hive format.

Use Apache Spark DataFrame to write the data as CSV with Snappy compression.

Apply data augmentation to the majority class by adding noise.

Apply Synthetic Minority Over-sampling Technique (SMOTE) to generate synthetic samples for the minority class.

SMOTE creates synthetic samples, balancing the dataset without losing data.

Use a weighted loss function during training to penalize misclassifications of the minority class.

Apply random under-sampling to reduce the majority class to match the minority class size.

Keep the column as a feature because it uniquely identifies each customer.

Use the column as the target variable.

Remove the column from the feature set.

Removing unique identifiers prevents overfitting and is standard practice.

Encode the column using one-hot encoding.

Use the G.1X worker type, which provides more memory per worker compared to the Standard worker type.

G.1X offers more memory, reducing memory-related bottlenecks without increasing DPU count.

Use partition pruning on the source data to reduce the amount of data processed.

Switch the output format from Parquet to CSV to reduce processing overhead.

Use a larger instance type for the Glue job by increasing the number of DPUs.

Apply hash encoding to map categories to a fixed number of buckets.

Apply target encoding (mean encoding) to the high-cardinality features.

Target encoding reduces dimensionality and captures target-related information.

Apply one-hot encoding to all categorical features.

Apply label encoding to assign integer values to each category.

Want more Data Preparation for Machine Learning practice?

All ML Model Development questions

Domain 2: ML Model Development

Log loss

Area under the ROC curve (AUC)

F1 score

F1 score combines precision and recall, making it suitable for imbalanced classes when both matter.

Accuracy

Add more layers to the model

Use early stopping with validation loss monitoring

Early stopping halts training when validation loss stops decreasing, reducing overfitting.

Increase the learning rate

Decrease the batch size

Use SageMaker Batch Transform with multiple instances

Use a SageMaker Multi-Model Endpoint (MME) on an ml.c5.4xlarge instance with auto scaling

MME allows multiple models to share a container, reducing cost while scaling to meet demand.

Deploy on a single ml.c5.xlarge instance with a real-time endpoint

Deploy separate real-time endpoints for each model on ml.m5.large instances

There is data leakage from the validation set into the training set

Data leakage artificially inflates validation performance, so the model then fails on truly unseen data.

The features are not scaled properly

The model is overfitting the training data

The model has high bias

Increase the batch size to the maximum possible

Use a GPU-based instance such as ml.p3.2xlarge

GPUs accelerate matrix operations in neural networks, reducing training time.

Use a learning rate scheduler that reduces the learning rate over time

Add more convolutional layers to the model

Switch the strategy from Bayesian to random search

Use a smaller instance type for each training job

Increase the maximum number of training jobs

Enable early stopping for the tuning job

Early stopping terminates poorly performing trials, reducing wasted computation.

Domain 3: Deployment and Orchestration of ML Workflows

Deploy the model on a multi-model endpoint and include pre-processing in the model code.

Use a batch transform job with a pre-processing script.

Package pre-processing and inference in a single container with a custom entry point.

Create a SageMaker inference pipeline with two containers: one for pre-processing and one for inference.

An inference pipeline chains containers sequentially, allowing pre-processing to run once per request with low latency.
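
A two-container inference pipeline is defined by passing an ordered `Containers` list to the SageMaker `CreateModel` API. The sketch below only builds the request; the helper name, model name, role ARN, image URIs, and S3 path are all placeholders invented for the example, and the actual call is left as a comment.

```python
def build_pipeline_model_request(model_name, role_arn, preprocess_image,
                                 inference_image, model_data_url):
    """Build keyword arguments for a CreateModel call with two chained containers."""
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        # Containers run in list order: the output of the first is the input of the second.
        "Containers": [
            {"Image": preprocess_image},
            {"Image": inference_image, "ModelDataUrl": model_data_url},
        ],
    }


# All values below are placeholders, not real resources.
request = build_pipeline_model_request(
    model_name="example-pipeline-model",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    preprocess_image="<account>.dkr.ecr.<region>.amazonaws.com/preprocess:latest",
    inference_image="<account>.dkr.ecr.<region>.amazonaws.com/inference:latest",
    model_data_url="s3://example-bucket/model/model.tar.gz",
)

# With credentials configured, the request could be sent with:
#   boto3.client("sagemaker").create_model(**request)
```

The resulting model is then deployed to an endpoint like any single-container model, and each request passes through both containers in order.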

Use a single large GPU instance with provisioned concurrency.

Use a serverless endpoint with GPU support.

Use a single GPU instance in multiple Availability Zones with an Application Load Balancer.

Use a multi-model endpoint on a GPU instance with Auto Scaling based on invocation count.

Multi-model endpoints share instances across models, and Auto Scaling adjusts capacity for spikes.

Store the model in Amazon EFS and load it at runtime.

Increase the Lambda function memory to the maximum of 10,240 MB.

Configure provisioned concurrency for the Lambda function.

Provisioned concurrency keeps instances initialized and ready to respond immediately.

Package the model in a container image and deploy using Lambda container support.

Create a VPC with S3 VPC endpoints, attach a VPC-only policy to the SageMaker execution role, and enable KMS encryption for training jobs.

S3 VPC endpoints keep traffic within the AWS network, and KMS keys encrypt training volumes and output artifacts at rest.

Use an S3 bucket with SSE-S3 encryption and restrict bucket access to a VPC.

Enable default encryption on the S3 bucket and use HTTPS for all SageMaker endpoints.

Create a VPC with a NAT gateway, and configure SageMaker to use the VPC and enforce HTTPS.

Disable ModelServerWorkers to reduce overhead.

Set the initial instance count to 1 and configure the container to use multiple ModelServerWorkers.

Multiple workers allow the instance to handle multiple requests concurrently, up to the CPU/memory limit.

Set the initial variant weight to 10.

Set the initial instance count to 10 in the production variant.

Deploy the model on Amazon ECS using a custom Docker image.

Deploy the model as an AWS Lambda function with the TensorFlow runtime.

Deploy the model using Amazon SageMaker Studio.

Deploy the model using Amazon SageMaker with a TensorFlow inference container.

SageMaker provides pre-built TensorFlow containers and manages the endpoint, reducing operational overhead.

Domain 4: ML Solution Monitoring, Maintenance and Security

Concept drift

Covariate shift

Covariate shift occurs when the distribution of input features changes over time.

Data leakage

Model decay
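
One simple way to see covariate shift is to compare the empirical distribution of a feature at training time with the same feature in live traffic. The sketch below uses the two-sample Kolmogorov-Smirnov statistic as an illustrative measure; it is not a description of how SageMaker Model Monitor computes drift, and the feature values are invented.

```python
from bisect import bisect_right


def ks_statistic(baseline, live):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    baseline, live = sorted(baseline), sorted(live)
    max_gap = 0.0
    for x in sorted(set(baseline) | set(live)):
        # Fraction of each sample that is <= x.
        cdf_baseline = bisect_right(baseline, x) / len(baseline)
        cdf_live = bisect_right(live, x) / len(live)
        max_gap = max(max_gap, abs(cdf_baseline - cdf_live))
    return max_gap


# Hypothetical "customer age" feature at training time and in production.
baseline_ages = [20, 25, 30, 35, 40]
live_similar = [21, 26, 31, 36, 41]
live_shifted = [50, 55, 60, 65, 70]

gap_similar = ks_statistic(baseline_ages, live_similar)
gap_shifted = ks_statistic(baseline_ages, live_shifted)
```

A value near 0 means the input distributions overlap closely; a value near 1 means the live inputs have moved away from what the model was trained on, even if the relationship between inputs and labels is unchanged.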

SageMaker Debugger and Amazon SNS

SageMaker Pipelines and AWS Lambda

SageMaker Clarify and AWS Config

SageMaker Model Monitor and Amazon CloudWatch

SageMaker Model Monitor detects drift and sends metrics to CloudWatch for alerting.

Decrease the model's batch size to reduce memory usage

Increase the number of instances in the endpoint to distribute the load

Implement an auto-scaling policy based on memory utilization

Change the instance type to a memory-optimized instance, such as ml.r5.large

Switching to a memory-optimized instance provides more memory per instance, resolving the issue cost-effectively.

Set the SageMaker model's 'EnableNetworkIsolation' parameter to true

Enable default encryption on the S3 bucket that stores model artifacts

Enable AWS CloudTrail to log all API calls to SageMaker

Configure the SageMaker notebook instance to use a KMS key for encryption

KMS encrypts data at rest in SageMaker.

Use HTTPS endpoints for invoking the SageMaker model

HTTPS encrypts data in transit.

Deploy SageMaker Model Monitor to track prediction quality over time

Model Monitor can detect drift using ground truth.

Disable the existing endpoint to prevent stale predictions during retraining

Set up a process to collect ground truth labels from patient outcomes

Ground truth is required to detect concept drift.

Manually compare the model's predictions against a holdout validation set each week

Use AWS Lambda to invoke a SageMaker training job when drift is detected

Lambda can automate the retraining trigger.

Configure the endpoint to be deployed in a private subnet within the VPC

A private subnet restricts traffic to within the VPC.

Enable IAM-based authentication for the endpoint

IAM authentication ensures that only authorized users can invoke the endpoint.

Attach a resource-based policy to the endpoint that denies all traffic except from the VPC

Place the endpoint behind Amazon CloudFront to act as a proxy

Use a public subnet and configure a security group to allow only the company's IP range
