MLA-C01 Deployment and Orchestration of ML Workflows • Set 7
MLA-C01 Deployment and Orchestration of ML Workflows Practice Test 7: 15 questions with explanations.
A data science team needs to deploy a trained PyTorch model for real-time inference with latency under 100 ms. The model fits on a single GPU. Which SageMaker inference option is MOST cost-effective while meeting the latency requirement?