A startup has deployed a Node.js application on Cloud Run. They are seeing a higher-than-expected bill for Cloud Run usage. The application is accessed by users worldwide, and traffic patterns show occasional spikes. They want to reduce costs while maintaining performance. They currently have no concurrency management and use the default Cloud Run settings. What should they do first?
Increasing concurrency lets a single instance handle more requests at the same time, so fewer instances are needed and costs drop.
Why this answer
Option C is correct because, with request-based billing, Cloud Run charges for the CPU and memory allocated to an instance while it is handling at least one request, so the bill tracks the number of active instances rather than the number of requests. Concurrency is the maximum number of requests one instance serves at the same time; the default is 80, not unlimited. A Node.js server handles I/O asynchronously, so one instance can usually serve many requests at once. Tuning concurrency to the highest value each instance sustains without added latency means traffic spikes are absorbed by fewer instances, which lowers cost while keeping performance steady.
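The effect of concurrency on instance count can be approximated with Little's law (in-flight requests = arrival rate x average latency). The sketch below is a rough illustration only: the function name `estimateInstances` and the traffic figures are hypothetical, and the estimate ignores CPU utilization, cold starts, and other factors that also influence Cloud Run autoscaling.

```typescript
// Rough estimate of how many Cloud Run instances are busy at once,
// using Little's law: in-flight requests = arrival rate x average latency.
// All numbers below are hypothetical, not measurements from the scenario.

function estimateInstances(
  requestsPerSecond: number,
  avgLatencySeconds: number,
  concurrencyPerInstance: number,
): number {
  if (concurrencyPerInstance < 1) {
    throw new RangeError("concurrencyPerInstance must be at least 1");
  }
  const inFlightRequests = requestsPerSecond * avgLatencySeconds;
  // Each instance absorbs up to `concurrencyPerInstance` in-flight requests.
  return Math.ceil(inFlightRequests / concurrencyPerInstance);
}

// Hypothetical spike: 400 requests/s at 0.5 s average latency = 200 in flight.
const spike = { rps: 400, latency: 0.5 };

for (const concurrency of [1, 10, 80]) {
  const instances = estimateInstances(spike.rps, spike.latency, concurrency);
  console.log(`concurrency=${concurrency} -> ~${instances} instances`);
}
```

Under these assumed figures, the same spike needs roughly 200 instances at a concurrency of 1, 20 at a concurrency of 10, and 3 at a concurrency of 80. Because billing follows active instance time, the concurrency setting is the first lever to examine.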
Exam trap
The trap here is that candidates confuse reducing memory limits (Option D) with concurrency management. Lowering memory trims the cost of each instance, but it does not change how many requests each instance serves at once, and that is what determines how many instances Cloud Run starts.
How to eliminate wrong answers
Option A is wrong because Cloud CDN caches static content but does not change how many instances Cloud Run starts to serve dynamic requests, which is the core cost driver here; it also incurs additional CDN costs.
Option B is wrong because moving to Compute Engine with a smaller machine type abandons Cloud Run's serverless scaling and introduces fixed costs, manual scaling, and potential performance degradation during spikes.
Option D is wrong because reducing the container memory limit to the minimum required risks out-of-memory errors and instance restarts, and it does not control the number of concurrent requests per instance, which is the primary driver of over-provisioning.