MLA-C01 Practice Test 37 — 15 Questions

Question 1

A data engineer is building a data pipeline for a machine learning model that requires both structured and unstructured data. The structured data (customer demographics) is in Amazon RDS, and the unstructured data (customer support chat logs) is in Amazon S3 as JSON files. The engineer needs to combine these datasets into a single training dataset stored in S3 in Parquet format. They must also perform feature engineering such as text vectorization on the chat logs. The pipeline should be serverless and cost-effective. Which approach should they use?

Accepted Answer

Use AWS Glue ETL with a Spark script that reads from RDS (via JDBC) and S3, performs transformations, and writes Parquet. AWS Glue ETL with a Spark script is the correct choice because it can read from both Amazon RDS (via JDBC) and Amazon S3 (JSON), perform complex transformations such as text vectorization, and write the output as Parquet. Glue is serverless, fully managed, and billed per DPU-hour, which suits batch ETL pipelines that combine structured and unstructured data for ML training.

Answer

Use a SageMaker Processing job with a custom Python script that reads from both sources and writes to S3.

Answer

Use Amazon Athena to join the data from RDS and S3, then export the results as Parquet.

Answer

Use Amazon Kinesis Data Analytics to read from RDS and S3 and produce a continuous stream of processed data.

Question 2

A financial services company is deploying a credit risk model using SageMaker. They require that the model always uses the latest approved version from the Model Registry. They also need to maintain a detailed audit trail of all model version transitions (e.g., from PendingApproval to Approved). The deployment should be fully automated and must roll back immediately if the new model's error rate exceeds the old model's error rate by more than 2% during a canary deployment. Which solution meets these requirements with the least custom code?

Accepted Answer

Use SageMaker Pipelines with a conditional step to deploy the model after approval, and include a canary deployment using weighted endpoint variants. Use CloudWatch alarms to trigger automatic rollback. This option is correct because SageMaker Pipelines supports conditional execution, and weighted endpoint variants allow a canary deployment; combined with CloudWatch alarms on the error rate, they enable automated rollback when the new model's error rate exceeds the old model's by more than 2%. This approach requires minimal custom code because it relies on built-in SageMaker capabilities for Model Registry integration, deployment, and monitoring.

Answer

Use AWS CodePipeline with a deployment action that uses AWS CloudFormation to update the endpoint. Add a manual approval step for rollback.

Answer

Create an AWS Lambda function that is triggered by Model Registry events, deploys the model to a staging endpoint, runs a canary test, and if successful, updates the production endpoint.

Answer

Use an Amazon EKS cluster with a custom inference container and use ArgoCD for automated deployments.

Question 3

A machine learning team is deploying a fraud detection model using SageMaker. They use the SageMaker Model Registry to track model versions. They want to automatically deploy the latest approved model to a production endpoint whenever a new model version is approved. The team uses a CI/CD pipeline with AWS CodePipeline. The pipeline currently includes a source stage (S3), a build stage (CodeBuild), and a deploy stage (manual approval). They want to automate the deployment of approved models. Which solution will meet these requirements with the least operational overhead?

Accepted Answer

Configure an EventBridge rule to trigger a CodePipeline execution when the model approval status changes. This option is correct because it connects SageMaker Model Registry approval events to CodePipeline through EventBridge, so the latest approved model is deployed to the production endpoint automatically and with minimal operational overhead. It avoids custom code and additional pipeline stages, and the event-driven rule starts the pipeline only when a model version is approved.

Answer

Add a custom action to CodePipeline that uses a SageMaker deployment step.

Answer

Create a Lambda function that triggers on Model Registry approval events and updates the endpoint using the boto3 SDK.

Answer

Use SageMaker Pipelines to deploy the model directly upon training completion.
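
To illustrate the accepted answer for Question 3, the sketch below builds an EventBridge event pattern that matches Model Registry approval events and checks it against sample events. The model package group name is a hypothetical placeholder, and the `matches` helper is a deliberately simplified stand-in for EventBridge's own matching logic, not a reimplementation of it. The exact event fields should be verified against the AWS documentation before use.

```python
import json


def approval_event_pattern(model_package_group: str) -> dict:
    """Event pattern for 'model version approved' events in one package group."""
    return {
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Model Package State Change"],
        "detail": {
            "ModelPackageGroupName": [model_package_group],
            "ModelApprovalStatus": ["Approved"],
        },
    }


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: each pattern key must exist in the event, and the
    event value must be one of the listed values (nested dicts recurse)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


# "fraud-detection-models" is a hypothetical group name used for illustration.
pattern = approval_event_pattern("fraud-detection-models")
print(json.dumps(pattern, indent=2))
```

A rule with this pattern would use the pipeline as its target, so only approvals (not rejections or pending versions) start a CodePipeline execution.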

Question 4

A retail company uses SageMaker to train a multi-class image classification model with a custom ResNet-50 implemented in TensorFlow. The training data is 500 GB of images stored in S3. The data scientist uses an ml.p3.2xlarge instance with a single GPU. The training takes 10 hours per epoch, and the model does not converge after 5 epochs. The scientist needs to accelerate training and improve model accuracy. The current implementation loads images individually from S3 using TensorFlow's tf.data API. The scientist also notices high I/O wait time. Which combination of actions should the scientist take? (Assume the scientist is aware of best practices. Select a single answer.)

Accepted Answer

Use SageMaker Pipe mode for data ingestion and upgrade to an ml.p3.8xlarge instance. This option is correct because SageMaker Pipe mode streams data directly from S3 to the training container, reducing the I/O bottleneck, and a multi-GPU instance such as ml.p3.8xlarge speeds up computation. Increasing the number of epochs addresses neither I/O nor training speed. Batch transform is for inference, not training. RecordIO is not natively supported by TensorFlow's tf.data without conversion, and EFS adds network latency.

Answer

Increase the number of epochs to 20 and enable early stopping with patience 5.

Answer

Convert images to RecordIO format and store them on Amazon EFS for faster access.

Answer

Deploy the model on a SageMaker endpoint and use batch transform for offline predictions.

Question 5

A team is using SageMaker to run a large-scale distributed training job for a language model. They are using SageMaker's Pipe mode to stream data from S3 to reduce IO. They observe that the training throughput is lower than expected, and the CPU utilization is high while GPU utilization is low. The training script uses PyTorch's DataLoader with num_workers=0. The data preprocessing is minimal. Which change is most likely to improve GPU utilization?

Accepted Answer

Increase the number of data loading workers (num_workers). This option is correct because num_workers=0 forces the main process to load data, causing a CPU bottleneck. Increasing num_workers parallelizes data loading and reduces GPU idle time. Adding GPUs does not address the data loading bottleneck. More vCPUs without more workers do not help. Switching to File mode would increase I/O overhead and worsen the problem.

Answer

Use a larger instance with more vCPUs.

Answer

Increase the number of GPUs per instance.

Answer

Switch from Pipe mode to File mode.

Question 6

A financial services company is training a large natural language processing (NLP) model using PyTorch on a SageMaker distributed training job. The cluster consists of 4 ml.p3.16xlarge instances (8 GPUs each). The training job runs successfully but takes 72 hours, exceeding the allotted 48-hour window. The team must reduce training time without sacrificing model quality. The model architecture has 1.5 billion parameters and currently uses the SageMaker data parallel library with Horovod for all-reduce. Observing CloudWatch metrics, the team notices that GPU utilization averages only 45% and network throughput is near maximum. Which action will most effectively reduce training time?

Accepted Answer

Switch to the SageMaker model parallel library with pipeline parallelism to reduce communication overhead. This option is correct because low GPU utilization combined with near-maximum network throughput points to communication overhead as the bottleneck. Model parallelism splits the model across GPUs, reducing the need for frequent all-reduce of large gradients and thus improving GPU utilization. Increasing the instance count would add communication overhead and is unlikely to improve utilization. Increasing the batch size may cause memory overflow, and data parallelism already keeps all GPUs in use. Enabling EFA improves networking, but the network is already near maximum; the bottleneck is the frequency of communication, not network speed.

Answer

Enable Elastic Fabric Adapter (EFA) for faster inter-node connectivity.

Answer

Increase the batch size to improve GPU utilization.

Answer

Increase the number of instances from 4 to 8 to add more GPUs.
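
As a rough back-of-the-envelope check on the communication overhead discussed in Question 6, the sketch below estimates how many bytes each GPU sends per gradient synchronization. It assumes fp32 gradients and a plain ring all-reduce with no gradient compression or bucketing; these are illustrative assumptions, not a description of how the SageMaker data parallel library or Horovod is configured in the scenario.

```python
def ring_allreduce_bytes_per_worker(num_params: int, bytes_per_param: int,
                                    num_workers: int) -> float:
    """Bytes each worker sends per synchronization with ring all-reduce.

    A ring all-reduce sends 2 * (N - 1) / N times the gradient payload
    from every worker, where N is the number of workers.
    """
    payload = num_params * bytes_per_param
    return 2 * (num_workers - 1) / num_workers * payload


NUM_PARAMS = 1_500_000_000  # from the question: 1.5 billion parameters
NUM_WORKERS = 4 * 8         # from the question: 4 instances x 8 GPUs each
BYTES_PER_PARAM = 4         # assumption: fp32 gradients

per_worker = ring_allreduce_bytes_per_worker(NUM_PARAMS, BYTES_PER_PARAM, NUM_WORKERS)
print(f"Gradient payload per step: {NUM_PARAMS * BYTES_PER_PARAM / 1e9:.1f} GB")
print(f"Sent per worker per step:  {per_worker / 1e9:.3f} GB")
```

Under these assumptions every training step moves several gigabytes per GPU, which is consistent with a saturated network and GPUs that spend much of their time waiting.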

Question 7

A financial services company uses SageMaker to train a fraud detection model. They have imbalanced data with 1% fraud. They trained a Gradient Boosting model using SMOTE for oversampling and achieved 99% accuracy on the test set, but the fraud recall is only 10%. The data scientist is concerned about the model's performance. Which change is most likely to improve fraud recall without sacrificing too much precision?

Accepted Answer

Increase the weight of the fraud class in the loss function. This option is correct because a higher weight on the fraud class penalizes misclassified fraud cases more heavily, which improves recall. Reducing the SMOTE ratio (that is, less oversampling) would likely reduce recall. Using F1-score as an evaluation metric does not change the training objective. Random undersampling may discard important majority-class data, reducing precision.

Answer

Use a different evaluation metric like F1-score during training.

Answer

Reduce the SMOTE sampling ratio to create more synthetic samples.

Answer

Use a random undersampling of the majority class.
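
The numbers in Question 7 can be reproduced with a small worked example. The sketch below uses a hypothetical test set of 10,000 transactions with 1% fraud to show how 99% accuracy and 10% recall coexist, and computes a class weight using the negative-to-positive ratio heuristic (the one commonly used for XGBoost's `scale_pos_weight`). The confusion-matrix counts are invented for illustration, and the question does not say which gradient boosting library is in use.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int):
    """Return (accuracy, recall, precision) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, recall, precision


def positive_class_weight(n_negative: int, n_positive: int) -> float:
    """Heuristic weight for the rare class: ratio of negatives to positives."""
    return n_negative / n_positive


# Hypothetical test set: 100 fraud cases (10 caught, 90 missed) and
# 9,900 legitimate transactions (10 wrongly flagged, 9,890 passed).
accuracy, recall, precision = confusion_metrics(tp=10, fp=10, fn=90, tn=9890)
weight = positive_class_weight(n_negative=9900, n_positive=100)

print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f}")
print(f"suggested fraud-class weight: {weight:.0f}")
```

Accuracy is dominated by the 99% majority class, which is why it stays high even when most fraud is missed; the class weight shifts the loss toward the cases that recall measures.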

Question 8

An ML team is developing a regression model using Amazon SageMaker. They have a 100 GB CSV dataset stored in Amazon S3. The data is contained in a single large file. They launch a SageMaker training job with an ml.p3.8xlarge instance using a custom Docker container. The training script loads the data using pandas' read_csv from S3 directly. The team observes that the training job takes over 24 hours, and CloudWatch metrics show: GPU utilization is consistently above 90%, but CPU utilization is below 30%. Network I/O is moderate, and disk I/O is low. The team has already tried switching to a larger instance type (ml.p3.16xlarge) with no significant improvement. They need to reduce training time. Which action is MOST likely to achieve this?

Accepted Answer

Split the CSV file into multiple smaller files (e.g., 100 MB each) and update the training script to read from a list of files in S3. The bottleneck is data loading, and the single large CSV file prevents parallelism. SageMaker's Pipe mode streams data directly to the algorithm, but custom containers must support it; a simpler and effective approach is to split the data into multiple smaller files, which enables SageMaker's distributed data loading across instances and improves I/O parallelism. Increasing the instance count with a single file does not help because each instance still reads the same file. Changing the instance type has already been tried. Spot instances do not improve speed, and the EBS volume does not matter here.

Answer

Use SageMaker Pipe Mode to stream data directly from S3 to the algorithm, bypassing the local file system.

Answer

Use Amazon SageMaker Managed Spot Training to reduce cost, then use the savings to rent a larger instance.

Answer

Increase the number of training instances by using a distributed training configuration with Horovod.
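
The file-splitting step from the accepted answer for Question 8 could look roughly like the sketch below. It splits by row count rather than by byte size (the 100 MB figure in the answer) to keep the example short, works on local paths, and repeats the header in every part. The function name and the `part-00000.csv` naming scheme are hypothetical.

```python
import csv
from pathlib import Path


def split_csv(src, out_dir, rows_per_file: int):
    """Split one CSV into numbered parts, each starting with the header row.

    Returns the list of part paths in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    handle = None
    writer = None
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        for i, row in enumerate(reader):
            if i % rows_per_file == 0:
                # Start a new part: close the previous one and write the header.
                if handle is not None:
                    handle.close()
                part = out_dir / f"part-{len(parts):05d}.csv"
                handle = open(part, "w", newline="")
                writer = csv.writer(handle)
                writer.writerow(header)
                parts.append(part)
            writer.writerow(row)
    if handle is not None:
        handle.close()
    return parts

# Example call (paths are placeholders):
# split_csv("train.csv", "train_parts", rows_per_file=1_000_000)
```

The resulting parts would then be uploaded under a common S3 prefix so that the training script, or several instances, can read them in parallel.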

Question 9

A gaming company uses a SageMaker endpoint for real-time player churn prediction. The model is updated weekly. After a recent retraining, the team notices that the endpoint's predicted probabilities for churn have shifted dramatically: the average predicted probability dropped from 0.3 to 0.05. The team suspects concept drift (the relationship between features and target changed) rather than data drift. They have SageMaker Model Monitor set up for data drift and quality metrics, but not for bias or explainability. The team needs to confirm concept drift and take corrective action. Which approach should the team take FIRST?

Accepted Answer

Configure SageMaker Model Monitor's model quality monitoring to compare predictions against actual outcomes collected from a week of production traffic. To detect concept drift, the team needs to compare the model's predictions against actual observed outcomes (ground truth). SageMaker Model Monitor's model quality monitoring can track prediction accuracy over time if ground truth is provided, which makes it the correct first step. Retraining with more recent data might help but does not confirm drift. Data drift monitoring checks feature distributions, not concept drift. Computing SHAP values with Clarify explains feature importance; it does not detect drift.

Answer

Immediately retrain the model using the most recent month of data and redeploy to the endpoint

Answer

Use Amazon SageMaker Clarify to compute SHAP values and understand which features are driving the new predictions

Answer

Investigate data drift by reviewing the Model Monitor feature distribution constraints and comparing recent input data to the baseline
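
The idea behind the accepted answer for Question 9, comparing predictions with ground truth, can be sketched in a few lines. The example below contrasts the mean predicted churn probability with the observed churn rate for one window of traffic. The data is invented, and this is only a conceptual illustration; it is not the set of metrics that Model Monitor itself computes.

```python
def prediction_vs_outcome(predicted_probs, actual_labels):
    """Return (mean predicted probability, observed positive rate, gap)."""
    if len(predicted_probs) != len(actual_labels) or not predicted_probs:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_pred = sum(predicted_probs) / len(predicted_probs)
    observed = sum(actual_labels) / len(actual_labels)
    return mean_pred, observed, observed - mean_pred


# Hypothetical week of traffic: the model predicts about 0.05 for everyone,
# but 6 of 20 players actually churned.
week_probs = [0.05] * 20
week_labels = [1] * 6 + [0] * 14

mean_pred, observed, gap = prediction_vs_outcome(week_probs, week_labels)
print(f"mean predicted={mean_pred:.2f} observed={observed:.2f} gap={gap:.2f}")
```

A large, persistent gap between predictions and outcomes, with stable input distributions, is the kind of evidence that points to concept drift rather than data drift.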

Question 10

A startup is using SageMaker to train a deep learning model. They use GPU instances for training. The training job takes about 8 hours. The team notices that the training job sometimes fails with an error message indicating that the instance was terminated because the Amazon EBS volume was underprovisioned. The team is using the default EBS volume size for the training instance. They want to avoid this error without over-provisioning. What should they do?

Accepted Answer

Specify a larger EBS volume size in the training job's resource configuration. This option is correct because increasing the EBS volume size to accommodate the dataset and intermediate checkpoint files prevents the volume-full error. Compute-optimized instances do not fix the storage shortfall. Amazon EFS may add latency, is not directly attached to training instances, and requires a mount. FSx for Lustre is high-performance but requires separate setup and is overkill for this problem.

Answer

Mount an Amazon EFS file system to the training instance and store all data there.

Answer

Switch to compute-optimized (C5) instances to reduce storage usage.

Answer

Configure the training job to use Amazon FSx for Lustre as a scratch file system.
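
For Question 10, sizing the volume "without over-provisioning" amounts to simple arithmetic. The sketch below estimates the required size from the dataset and checkpoint footprint plus some headroom, then places it in a `ResourceConfig` fragment of the kind passed to a training job request. All sizes, the headroom fraction, and the instance type are hypothetical values chosen for illustration.

```python
import math


def estimate_volume_gb(dataset_gb: float, checkpoint_gb: float,
                       checkpoints_kept: int, headroom: float = 0.25) -> int:
    """Estimate EBS size: data plus retained checkpoints, with headroom."""
    needed = dataset_gb + checkpoint_gb * checkpoints_kept
    return math.ceil(needed * (1 + headroom))


def resource_config(instance_type: str, volume_gb: int, instance_count: int = 1) -> dict:
    """Build the ResourceConfig fragment of a training job request."""
    return {
        "InstanceType": instance_type,
        "InstanceCount": instance_count,
        "VolumeSizeInGB": volume_gb,
    }


# Hypothetical footprint: 100 GB dataset, five 4 GB checkpoints kept on disk.
volume_gb = estimate_volume_gb(dataset_gb=100, checkpoint_gb=4, checkpoints_kept=5)
config = resource_config("ml.p3.2xlarge", volume_gb)
print(config)
```

Deriving the size from the actual footprint keeps the volume large enough to avoid the failure while avoiding an arbitrary, oversized allocation.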

Question 11

A media company uses SageMaker endpoints to serve a model that predicts video engagement. They have two production variants: Variant A (ml.c5.large) for regular traffic and Variant B (ml.c5.xlarge) for burst traffic. They use weighted routing (90% to A, 10% to B). Recently, during peak hours, Variant A's latency increases, causing many requests to time out. The metrics show that both variants are under similar CPU load, but the number of concurrent requests to Variant A is very high. The team wants to ensure that burst traffic is handled properly without manual intervention. What should they do?

Accepted Answer

Configure Application Auto Scaling for each variant with a target tracking scaling policy based on the number of concurrent requests per instance. This option is correct because target tracking scaling based on the number of concurrent requests (or InvocationsPerInstance) lets each variant scale according to its own load. Shifting traffic weights does not fix scaling. A p99 latency alarm might trigger too late. A separate endpoint is not necessary.

Answer

Increase the traffic weight to Variant B to 70% and reduce Variant A to 30%.

Answer

Set a CloudWatch alarm on Variant A's p99 latency and trigger a step scaling policy to add instances.

Answer

Create a separate endpoint for burst traffic and route peak traffic to it via DNS.
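
A target tracking policy like the one in the accepted answer for Question 11 is defined per variant. The sketch below builds the parameter set that would be passed to Application Auto Scaling's `put_scaling_policy` call, using the predefined invocations-per-instance metric mentioned in the explanation. The endpoint name, variant name, target value, and cooldowns are hypothetical, and the variant would also need to be registered as a scalable target first.

```python
import json


def target_tracking_request(endpoint_name: str, variant_name: str,
                            target_value: float) -> dict:
    """Parameters for a target tracking policy on one endpoint variant."""
    return {
        "PolicyName": f"{variant_name}-invocations-target-tracking",
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_value,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            # Cooldown values are illustrative assumptions.
            "ScaleInCooldown": 300,
            "ScaleOutCooldown": 60,
        },
    }


# Hypothetical endpoint and variant names; one request per variant.
requests = [
    target_tracking_request("video-engagement", name, target_value=70.0)
    for name in ("VariantA", "VariantB")
]
print(json.dumps(requests[0], indent=2))
```

Because each variant gets its own policy, Variant A can add instances under peak load independently of Variant B.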

Question 12

A financial services company uses an Amazon SageMaker endpoint for real-time credit scoring. The endpoint is deployed with an ml.c5.2xlarge instance. Recently, the data science team has received complaints from users about slow response times. The team monitors the endpoint using CloudWatch metrics. They observe that the InvocationsPerSecond metric averages 50, the ModelLatency metric averages 200 milliseconds, and the CPUUtilization metric averages 95%. The team has also noticed that the endpoint occasionally returns HTTP 503 (Service Unavailable) errors during peak hours. The team needs to reduce latency and eliminate 503 errors while minimizing cost increase. Which solution should the team implement?

Accepted Answer

Create a SageMaker endpoint with multiple instances behind a load balancer and configure automatic scaling based on CPUUtilization or InvocationsPerSecond. CPUUtilization at 95% indicates that the instance is overloaded, causing high latency and 503 errors. Scaling out (adding more instances) distributes the load and reduces latency, and automatic scaling adjusts the number of instances to demand, minimizing cost by scaling in when traffic is low. A larger instance may not be as cost-effective as scaling out. Enabling data capture would not reduce latency. Increasing the timeout does not address the root cause, which is an overloaded instance.

Answer

Enable SageMaker Data Capture to collect inference data for later analysis to identify slow requests

Answer

Replace the endpoint instance type with a more powerful compute-optimized instance, such as ml.c5.4xlarge

Answer

Increase the endpoint invocation timeout from 60 seconds to 120 seconds in the application configuration
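
The metrics in Question 12 support a quick capacity estimate. The sketch below applies Little's law to get the average number of in-flight requests and then estimates how many instances would bring CPU utilization down to a target level. It assumes CPU load scales linearly with traffic and uses a hypothetical 60% target; real endpoints rarely scale perfectly linearly, so treat the result as a starting point.

```python
import math


def in_flight_requests(requests_per_second: float, latency_seconds: float) -> float:
    """Little's law: average concurrency = arrival rate x time in system."""
    return requests_per_second * latency_seconds


def instances_for_target_cpu(current_instances: int, current_cpu_pct: float,
                             target_cpu_pct: float) -> int:
    """Instances needed to reach the target CPU, assuming linear scaling."""
    return math.ceil(current_instances * current_cpu_pct / target_cpu_pct)


# From the question: 50 invocations/s, 200 ms model latency, 95% CPU on 1 instance.
concurrency = in_flight_requests(50, 0.200)
needed = instances_for_target_cpu(1, current_cpu_pct=95, target_cpu_pct=60)

print(f"average in-flight requests: {concurrency:.1f}")
print(f"instances for ~60% CPU: {needed}")
```

An estimate like this gives a sensible minimum instance count, with automatic scaling handling peaks above it.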

Question 13

A healthcare startup has deployed a machine learning model on Amazon SageMaker that predicts patient readmission risks. The model uses sensitive health data stored in an S3 bucket encrypted with AWS KMS. The SageMaker endpoint is configured with an IAM role that has the following policy attached:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": "arn:aws:s3:::healthcare-data/*",
            "Condition": {
                "Bool": {
                    "aws:SecureTransport": "true"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": "kms:Decrypt",
            "Resource": "*"
        }
    ]
}
During a security audit, the team discovers that the IAM role's KMS permission is too permissive because it allows decryption of any KMS key in the account. The team needs to modify the policy to follow the principle of least privilege while still allowing the SageMaker endpoint to read the encrypted data. Which modification should the team make?

Accepted Answer

Change the KMS statement to: "Action": "kms:Decrypt", "Resource": "arn:aws:kms:us-east-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab". The current policy allows kms:Decrypt on any KMS key (*). To follow least privilege, the team should keep the kms:Decrypt action and restrict the Resource to the specific KMS key used to encrypt the S3 bucket. Removing the KMS statement entirely would break the endpoint because it could no longer decrypt the data. Adding a condition to the statement is good practice, but with Resource still set to * it allows decryption with any key whenever the condition is met, which is not least privilege. Replacing kms:Decrypt with kms:DescribeKey does not allow decryption.

Answer

Change the KMS statement Action to "kms:DescribeKey" instead of "kms:Decrypt"

Answer

Add a condition to the KMS statement: "Condition": { "StringEquals": { "kms:ViaService": "s3.us-east-1.amazonaws.com" } }

Answer

Remove the KMS statement entirely, as S3 bucket policies with SSE-KMS do not require KMS permissions
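
The policy change from the accepted answer for Question 13 can be expressed programmatically. The sketch below starts from the policy in the question, scopes the `kms:Decrypt` statement to the key ARN given in the answer, and includes a small helper that flags any remaining wildcard KMS grants. The helper names are hypothetical, and the check is a simple lint, not a full IAM policy analyzer.

```python
import copy

# Key ARN taken from the accepted answer.
KEY_ARN = "arn:aws:kms:us-east-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab"

original_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": "arn:aws:s3:::healthcare-data/*",
            "Condition": {"Bool": {"aws:SecureTransport": "true"}},
        },
        {"Effect": "Allow", "Action": "kms:Decrypt", "Resource": "*"},
    ],
}


def _as_list(value):
    return value if isinstance(value, list) else [value]


def restrict_kms_decrypt(policy: dict, key_arn: str) -> dict:
    """Return a copy where every kms:Decrypt statement targets only key_arn."""
    updated = copy.deepcopy(policy)
    for statement in updated["Statement"]:
        if "kms:Decrypt" in _as_list(statement["Action"]):
            statement["Resource"] = key_arn
    return updated


def wildcard_kms_statements(policy: dict) -> list:
    """List Allow statements that grant any kms: action on Resource '*'."""
    return [
        s for s in policy["Statement"]
        if s.get("Effect") == "Allow"
        and any(a.startswith("kms:") for a in _as_list(s["Action"]))
        and "*" in _as_list(s["Resource"])
    ]


fixed_policy = restrict_kms_decrypt(original_policy, KEY_ARN)
print(len(wildcard_kms_statements(original_policy)), "wildcard KMS grant(s) before")
print(len(wildcard_kms_statements(fixed_policy)), "wildcard KMS grant(s) after")
```

The S3 statement is left untouched here because the audit finding in the question concerns only the KMS permission.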

Question 14

A healthcare company uses Amazon SageMaker to deploy a real-time inference endpoint for a diagnostic model. The endpoint is configured with a single ml.p3.2xlarge instance. The model processes patient data and returns a risk score. Recently, the endpoint has been experiencing intermittent 504 errors along with increased latency. The team uses Amazon CloudWatch to monitor the endpoint's InvocationsPerInstance and ModelLatency metrics. They observe that InvocationsPerInstance is well below the throttling threshold, but ModelLatency shows periodic spikes lasting 5-10 seconds. The endpoint's CPU utilization remains below 60%, but memory utilization occasionally spikes to 90% during those spikes. The team has checked the inference code and found no obvious memory leaks or performance bottlenecks in the custom logic. The model itself is a deep neural network hosted using Apache MXNet. The team suspects that the issue might be related to resource contention or an external dependency. What should the team do FIRST to diagnose and resolve the issue?

Accepted Answer

Enable SageMaker Debugger rules and profiling to monitor memory and CPU utilization at a fine-grained level during inference. This option is correct because the symptoms point to possible memory contention, and detailed memory and CPU profiling can identify the root cause. Moving to a larger instance might mask the problem without identifying it. Request batching can increase memory usage and may worsen the issue. Model Monitor tracks data drift and model quality, not performance diagnostics.

Answer

Implement request batching to increase throughput and reduce the number of inference requests.

Answer

Increase the instance type to a more memory-intensive instance like ml.p3.8xlarge to handle memory spikes.

Answer

Set up SageMaker Model Monitor to track data drift and model quality metrics.

Question 15

An e-commerce company uses a SageMaker endpoint to serve a product recommendation model. The model is retrained every month using batch transforms. The ML team has set up a retraining pipeline using SageMaker Processing jobs and Step Functions. Recently, the Step Functions workflow has been failing at the retraining step with an error: 'AccessDeniedException: User: arn:aws:sts::123456789012:assumed-role/RetrainingRole/abc123 is not authorized to perform: s3:GetObject on resource: arn:aws:s3:::training-data/processed/latest.parquet'. The team confirms that the S3 bucket exists and the object is present. The retraining role has the following policy: { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:GetObject", "s3:PutObject" ], "Resource": "arn:aws:s3:::training-data/*" } ] }. The team also verifies that the bucket policy does not explicitly deny access. What is the MOST likely cause of the AccessDenied error?

Accepted Answer

The training data object uses server-side encryption with AWS KMS (SSE-KMS), and the retraining role lacks kms:Decrypt permission on the KMS key. The error indicates that the retraining role is not authorized to call GetObject on the specific object. Even though the policy allows 'arn:aws:s3:::training-data/*', if the object is encrypted with SSE-KMS, the role also needs kms:Decrypt permission on the KMS key. The bucket policy might also require encryption. This is the most likely cause. A wrong Region would give a different error. The lack of an S3 bucket policy is not the issue if there is no explicit deny. A path typo is ruled out because the team confirmed that the object is present at that key.

Answer

The Step Functions execution role does not have permission to invoke the SageMaker Processing job

Answer

The path in the error message is misspelled; the actual object is at a different key

Answer

The S3 bucket has a bucket policy that denies access to the retraining role based on a condition like aws:SourceIp
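
The gap described in the accepted answer for Question 15 can be illustrated with a toy policy check. The sketch below evaluates the retraining role's policy from the question against two requests and then adds a `kms:Decrypt` statement. The evaluator is heavily simplified (Allow statements only; no Deny, conditions, or resource policies), and the KMS key ARN is a hypothetical placeholder because the question does not name the key.

```python
import copy
from fnmatch import fnmatchcase

OBJECT_ARN = "arn:aws:s3:::training-data/processed/latest.parquet"
# Hypothetical key ARN; the question does not identify the actual key.
KEY_ARN = "arn:aws:kms:us-east-1:123456789012:key/EXAMPLE-KEY-ID"

retraining_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::training-data/*",
        }
    ],
}


def _as_list(value):
    return value if isinstance(value, list) else [value]


def is_allowed(policy: dict, action: str, resource: str) -> bool:
    """Toy check: True if any Allow statement matches action and resource."""
    for statement in policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue
        action_ok = any(fnmatchcase(action, a) for a in _as_list(statement["Action"]))
        resource_ok = any(fnmatchcase(resource, r) for r in _as_list(statement["Resource"]))
        if action_ok and resource_ok:
            return True
    return False


def with_kms_decrypt(policy: dict, key_arn: str) -> dict:
    """Return a copy of the policy with kms:Decrypt allowed on one key."""
    updated = copy.deepcopy(policy)
    updated["Statement"].append(
        {"Effect": "Allow", "Action": "kms:Decrypt", "Resource": key_arn}
    )
    return updated


fixed_policy = with_kms_decrypt(retraining_policy, KEY_ARN)
print("s3:GetObject allowed:", is_allowed(retraining_policy, "s3:GetObject", OBJECT_ARN))
print("kms:Decrypt allowed before fix:", is_allowed(retraining_policy, "kms:Decrypt", KEY_ARN))
print("kms:Decrypt allowed after fix:", is_allowed(fixed_policy, "kms:Decrypt", KEY_ARN))
```

The S3 permission alone is satisfied, yet reading an SSE-KMS object also requires the decrypt permission on the key, which is the missing piece in this scenario.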