A company uses Amazon SageMaker to deploy a real-time inference endpoint. They notice increased latency in predictions during peak hours. Which should they investigate first to address the issue?
Auto-scaling policy determines how instances are added/removed; insufficient capacity causes high latency.
Why this answer
Option B is correct because latency typically increases when the endpoint is under-provisioned; auto-scaling policies control scaling behavior. Option A is about training, not inference. Option C is unrelated to inference latency.
Option D may affect latency but is not the first thing to investigate.