MLS-C01 • Practice Test 21
Free MLS-C01 practice test — 15 questions with explanations. Set 21. No signup required.
A company is building a real-time fraud detection system using Amazon SageMaker. The model is a gradient boosting classifier (XGBoost) trained on 500 GB of transactional data and serialized in the framework's native format. It is deployed as a SageMaker real-time endpoint on an ml.c5.9xlarge instance. The endpoint receives about 100 requests per second with an average payload size of 10 KB. The company observes an endpoint latency of around 200 ms, but the requirement is under 100 ms. The data scientist profiles the endpoint and finds that model inference takes 50 ms; the remaining time is spent on data preprocessing and serialization/deserialization. The preprocessing converts the JSON input to a NumPy array and then to a DMatrix. Which action is most likely to reduce latency enough to meet the requirement?
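Before weighing the options, it can help to turn the numbers in the scenario into a latency budget. The sketch below only restates figures from the question; the variable names are illustrative, and it assumes model inference time stays at 50 ms.

```python
# Figures taken from the question stem; variable names are illustrative.
observed_latency_ms = 200.0   # current end-to-end endpoint latency
target_latency_ms = 100.0     # requirement: stay under this value
model_inference_ms = 50.0     # profiled model inference time

requests_per_second = 100
payload_kb = 10

# Time currently spent outside the model (preprocessing, (de)serialization).
overhead_ms = observed_latency_ms - model_inference_ms

# Overhead allowed if inference time is unchanged (assumption).
overhead_budget_ms = target_latency_ms - model_inference_ms

# Fraction of the current overhead that has to be removed.
required_overhead_reduction = 1 - overhead_budget_ms / overhead_ms

# Aggregate request volume arriving at the endpoint.
ingress_kb_per_s = requests_per_second * payload_kb

print(f"overhead now: {overhead_ms:.0f} ms")
print(f"overhead budget: {overhead_budget_ms:.0f} ms")
print(f"required overhead reduction: {required_overhead_reduction:.0%}")
print(f"ingress: {ingress_kb_per_s} KB/s")
```

Under these assumptions, about 150 ms of the 200 ms is non-model overhead, and it has to fall below 50 ms, a reduction of more than two thirds. The incoming data volume is only about 1,000 KB/s, so the budget is dominated by per-request processing rather than by raw data volume.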