MLA-C01 Deployment and Orchestration of ML Workflows • Set 2
Practice Test 2: 15 questions with explanations.
A team is deploying a machine learning model using Amazon SageMaker. They need to serve predictions with sub-100 ms latency for a real-time application. The model is a large ensemble that requires 4 GB of memory. The team expects an initial load of 100 requests per second, which may double to 200 requests per second during peak hours. Which instance type and deployment configuration should the team choose to minimize cost while meeting the latency requirement?
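One way to reason about the sizing part of this question is a quick capacity calculation. The following is a minimal Python sketch, not the official answer. It assumes you have load-tested a single instance to find the highest request rate it sustains while staying within the 100 ms latency target. The 50 requests/second used here is a hypothetical figure. The sketch turns that number into minimum and maximum instance counts for an auto-scaling range, plus a target value for the `SageMakerVariantInvocationsPerInstance` target-tracking metric, which SageMaker reports per instance per minute. The names `plan_capacity` and `CapacityPlan` and the 70% utilization headroom are illustrative choices. The 4 GB memory requirement is a separate filter: only instance types whose memory comfortably exceeds the model's footprint belong in the candidate set.

```python
import math
from dataclasses import dataclass


@dataclass
class CapacityPlan:
    min_instances: int                    # enough capacity for baseline traffic
    max_instances: int                    # enough capacity for peak traffic
    target_invocations_per_minute: float  # value for a target-tracking policy


def plan_capacity(baseline_rps: float, peak_rps: float,
                  per_instance_rps: float, utilization: float = 0.7) -> CapacityPlan:
    """Size a real-time endpoint from a load-tested per-instance throughput.

    per_instance_rps: highest request rate ONE instance sustained while still
        meeting the latency SLO in your own load test (hypothetical input).
    utilization: fraction of that ceiling to run at, leaving headroom for
        latency spikes while scaling catches up (illustrative default).
    """
    if per_instance_rps <= 0 or not 0 < utilization <= 1:
        raise ValueError("per_instance_rps must be > 0 and 0 < utilization <= 1")
    usable_rps = per_instance_rps * utilization
    return CapacityPlan(
        min_instances=max(1, math.ceil(baseline_rps / usable_rps)),
        max_instances=max(1, math.ceil(peak_rps / usable_rps)),
        # SageMakerVariantInvocationsPerInstance counts invocations per minute,
        # so convert the per-second budget to a per-minute target.
        target_invocations_per_minute=usable_rps * 60,
    )


# Hypothetical load-test result: 50 requests/second per instance within the SLO.
plan = plan_capacity(baseline_rps=100, peak_rps=200, per_instance_rps=50)
print(plan)
```

With these assumed inputs, each instance is budgeted at 35 requests/second, so the range works out to 3 instances at baseline and 6 at peak, with a target of 2,100 invocations per instance per minute. Scaling between those bounds, rather than provisioning for peak around the clock, is what keeps cost down; the real numbers depend entirely on your own load test.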