A company runs a critical web app on Azure App Service that must handle traffic spikes without downtime. They set up autoscaling rules based on CPU percentage. However, during a spike, the app becomes unresponsive before new instances are added. What should they do?
Pre-warming ensures instances are ready before the spike.
Why this answer
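Option C is correct because a scheduled scaling rule pre-warms the app. It raises the instance count ahead of the expected spike, so the extra instances are already running and initialized when traffic arrives. This avoids the cold-start delay of reactive autoscaling: a CPU rule fires only after CPU has stayed above its threshold for the rule's evaluation window, and each new instance then needs time to provision and start the app. Until those instances are ready, the existing instances absorb the spike alone and become unresponsive. Scheduled pre-warming assumes the spike timing is predictable, such as a known daily or weekly peak.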
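The timing difference can be illustrated with a minute-by-minute toy simulation. It is a sketch under hypothetical assumptions: fixed per-instance capacity, a fixed provisioning delay, CPU approximated as load divided by capacity, and a reactive rule that adds one instance at a time. The names `simulate`, `DEMAND`, and all constants are illustrative, not Azure autoscale settings or APIs. The sketch counts the minutes in which demand exceeds ready capacity, with and without a scheduled profile that requests instances ahead of the spike.

```python
from collections import deque

# Toy model; every number here is a hypothetical assumption, not an Azure default.
CAPACITY_PER_INSTANCE = 100  # requests/minute one instance can serve
PROVISION_DELAY = 5          # minutes before a requested instance serves traffic
CPU_THRESHOLD = 70           # reactive rule: scale out when windowed average CPU exceeds this
WINDOW = 3                   # minutes of CPU samples the reactive rule averages


def simulate(demand, start_instances, schedule=None):
    """Count the minutes in which demand exceeds the capacity of ready instances.

    demand: requests per minute, one entry per simulated minute.
    schedule: optional {minute: instance_count}, mimicking a scheduled profile
    that requests instances ahead of a known spike (pre-warming).
    """
    ready = start_instances
    pending = []  # minute at which each requested instance becomes ready
    cpu_history = deque(maxlen=WINDOW)
    overloaded = 0
    for minute, load in enumerate(demand):
        # Scheduled profile: request enough instances to reach the target count.
        if schedule and minute in schedule:
            missing = schedule[minute] - ready - len(pending)
            pending.extend([minute + PROVISION_DELAY] * max(missing, 0))
        # Instances whose provisioning delay has elapsed start serving traffic.
        ready += sum(1 for t in pending if t == minute)
        pending = [t for t in pending if t > minute]

        capacity = ready * CAPACITY_PER_INSTANCE
        if load > capacity:
            overloaded += 1
        # CPU approximated as load / capacity, capped at 100%.
        cpu_history.append(min(100.0, 100.0 * load / capacity))

        # Reactive rule: fires only once a full window averages above the threshold,
        # adding one instance at a time while no other instance is provisioning.
        window_full = len(cpu_history) == WINDOW
        if window_full and sum(cpu_history) / WINDOW > CPU_THRESHOLD and not pending:
            pending.append(minute + PROVISION_DELAY)
    return overloaded


# Quiet baseline, a 20-minute spike, then quiet again.
DEMAND = [100] * 10 + [400] * 20 + [100] * 10

print(simulate(DEMAND, start_instances=2))                   # reactive only: 11
print(simulate(DEMAND, start_instances=2, schedule={3: 4}))  # pre-warmed: 0
```

In this toy model the reactive rule leaves the app overloaded for 11 minutes, while scheduling the extra instances at minute 3 (more than `PROVISION_DELAY` minutes before the spike) leaves none. Scheduling them at minute 10, the spike's first minute, still leaves 5 overloaded minutes, which is why the schedule has to lead the spike by at least the warm-up time.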
Exam trap
The trap is assuming that tuning reactive autoscaling (adjusting thresholds, cooldowns, or the trigger metric) can remove the delay. Every reactive rule still waits for its metric to breach the threshold and then for new instances to provision and warm up; only proactive pre-warming puts capacity in place before the spike arrives.
How to eliminate wrong answers
Option A is wrong because switching to memory-based autoscaling only changes which metric triggers the rule. Scaling is still reactive, so the app remains unresponsive while new instances start.
Option B is wrong because the scale-in cooldown only paces how quickly instances are removed. It does not make new instances arrive any faster during a spike, so it does not prevent the initial unresponsiveness.
Option D is wrong because raising the CPU threshold for scale-out makes the rule fire later, so new instances arrive even further into the spike and the app is more likely to become unresponsive.
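Why Option D backfires can be shown with a small calculation. This sketch assumes a rule that compares the average CPU over a trailing 3-minute window with its threshold, plus a 5-minute provisioning delay; `CPU_RAMP`, `WINDOW`, `PROVISION_DELAY`, and `first_trigger_minute` are hypothetical illustrations, not Azure settings or APIs.

```python
# Hypothetical per-minute CPU percentages while a spike ramps up.
CPU_RAMP = [40, 50, 60, 70, 80, 90, 95, 95, 95, 95]
WINDOW = 3           # minutes averaged by the rule (assumed)
PROVISION_DELAY = 5  # minutes before a new instance serves traffic (assumed)


def first_trigger_minute(cpu, threshold, window=WINDOW):
    """Return the first minute whose trailing-window average CPU exceeds threshold, or None."""
    for end in range(window - 1, len(cpu)):
        # Compare sums instead of averages to avoid floating-point edge cases.
        if sum(cpu[end - window + 1 : end + 1]) > threshold * window:
            return end
    return None


for threshold in (70, 90):
    trigger = first_trigger_minute(CPU_RAMP, threshold)
    print(f"threshold {threshold}%: rule fires at minute {trigger}, "
          f"new instance ready at minute {trigger + PROVISION_DELAY}")
```

With this ramp, the 70% rule fires at minute 5 and the 90% rule at minute 7, so the higher threshold pushes the new instance's arrival two minutes further into the spike.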