MLA-C01 Deployment and Orchestration of ML Workflows • Set 2
MLA-C01 Deployment and Orchestration of ML Workflows Practice Test 2: 15 questions with explanations.
A machine learning engineer needs to deploy a model that must serve real-time predictions with an inference latency under 100 ms. The model is a small PyTorch model that fits on a single GPU. Which SageMaker inference option is MOST cost-effective for this scenario?