MLA-C01 Deployment and Orchestration of ML Workflows • Set 8
Practice Test 8: 15 questions with explanations.
Your team manages a SageMaker real-time endpoint for a financial services application that requires low-latency fraud detection. The model is a 1 GB XGBoost model. The endpoint is deployed on two ml.m5.xlarge instances with target tracking auto scaling on average CPU utilization, with a target value of 70%. During peak hours, the endpoint receives a sudden burst of traffic that rises from 500 requests per second to 2000 requests per second within 30 seconds. Many requests start failing with 503 errors. CPU utilization on the existing instances reaches 90% before the scaling policy launches new instances, and by the time the new instances are in service (approximately 3 minutes later), the burst has subsided. You need to prevent these failures during future bursts while keeping costs reasonable. Which action would be MOST effective?
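To make the timing problem in the scenario above concrete, the following minimal sketch works through the numbers given in the question. It assumes, purely for illustration, that traffic is spread evenly across the instances; the function and constant names are hypothetical and not part of any AWS API.

```python
# Back-of-the-envelope arithmetic for the scenario, using only the figures
# stated in the question. Assumption: requests are balanced evenly across
# instances (a simplification, not something the question states).

BASELINE_RPS = 500          # requests per second before the burst
PEAK_RPS = 2000             # requests per second at the top of the burst
INSTANCE_COUNT = 2          # ml.m5.xlarge instances behind the endpoint
RAMP_SECONDS = 30           # time for traffic to climb from baseline to peak
SCALE_OUT_SECONDS = 3 * 60  # approximate time until new instances serve traffic


def per_instance_rps(total_rps: float, instance_count: int) -> float:
    """Return the load each instance carries under an even split."""
    return total_rps / instance_count


baseline_load = per_instance_rps(BASELINE_RPS, INSTANCE_COUNT)
peak_load = per_instance_rps(PEAK_RPS, INSTANCE_COUNT)
growth_factor = PEAK_RPS / BASELINE_RPS

# Seconds during which the endpoint sits at peak load with no extra capacity,
# assuming scale-out is triggered at the very start of the ramp (best case).
uncovered_seconds = SCALE_OUT_SECONDS - RAMP_SECONDS

print(f"Per-instance load: {baseline_load:.0f} -> {peak_load:.0f} requests/s")
print(f"Traffic growth factor: {growth_factor:.1f}x in {RAMP_SECONDS} s")
print(f"Time at peak before new capacity arrives: {uncovered_seconds} s")
```

Under these assumptions, each instance goes from 250 to 1000 requests per second, a fourfold increase that completes in 30 seconds, while added capacity arrives only after roughly 180 seconds. Any reactive policy with that provisioning delay leaves the original two instances to absorb the full burst on their own.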