MLA-C01 • Practice Test 36
Free MLA-C01 practice test — 15 questions with explanations. Set 36. No signup required.
A financial services company is deploying a real-time fraud detection model using Amazon SageMaker. The model is a gradient boosting model (XGBoost) trained on historical transaction data. The inference endpoint uses an ml.m5.2xlarge instance with a single variant. Recently, the company has experienced a 3x increase in transaction volume during peak hours, causing inference latency to exceed the 200ms SLA. The data science team has already optimized the model by reducing the number of trees and feature set, but the latency remains high during spikes. The team considers using SageMaker's built-in scaling policies. They currently have a single endpoint with one production variant. The team wants to maintain low latency without over-provisioning resources. They have ruled out model changes. Which approach should the team take?