ACE · Chapter 43 of 101 · Objective 4.2

GCP Quotas and Resource Limits

This chapter covers Google Cloud Platform (GCP) quotas and resource limits, a critical topic for the Associate Cloud Engineer (ACE) exam. Understanding how quotas work, how to monitor them, and how to request increases is essential for designing resilient, scalable applications. By this guide's estimate, quotas appear in roughly 5-10% of exam questions, often as scenario-based items where you must choose the correct action when resource creation fails because of a quota limit. This chapter explains the different types of quotas, their typical default values, how they are enforced, and best practices for managing them.

25 min read

Intermediate

Updated May 31, 2026

Reviewed by Johnson Ajibi· Senior Network & Security Engineer · MSc IT Security


Quotas as Utility Metered Connections

Imagine you live in an apartment building with a single water main coming from the city. The building's contract with the city caps how much water the whole building can draw, and each apartment also has its own meter with its own cap. In GCP, each project is like the apartment building, the apartments are like regions, and the city water supply is Google's infrastructure. The caps are quotas: hard limits on how many resources (for example, VM instances or API calls) can be consumed, some applying to the whole project and some to each region. If you try to open too many taps (create too many resources), the water stops (a quota exceeded error). A project administrator can choose to set tighter caps, but cannot raise any cap above what the contract allows; for that you request a quota increase from the city (Google) by submitting a form, and approval can take time. Just as the city rations supply so one building cannot starve the rest of the neighborhood, Google enforces quotas to distribute resources fairly and keep one project from overwhelming the system. Unlike a budget alert, which only warns you, a quota is a hard stop, like a locked valve.

How It Actually Works

What Are GCP Quotas and Why Do They Exist?

Quotas in Google Cloud are hard limits on the amount of a particular resource that can be consumed within a project. They exist to prevent runaway consumption, ensure fair resource distribution among tenants, and protect Google's infrastructure from abuse. Quotas are enforced at the project level and are not tied to billing accounts—even if you have unlimited budget, you cannot exceed your quota without an increase request.

There are two main types of quotas: rate quotas and allocation quotas. Rate quotas limit how fast API requests can be made (for example, a fixed number of requests per minute per project). Allocation quotas limit the total amount of a resource that can exist at any time (for example, the number of vCPUs a project can have in a region). The ACE exam focuses heavily on allocation quotas because they directly affect resource creation.
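The difference between the two quota types is easiest to see in a toy model. The Python sketch below is an illustration under simplifying assumptions, not how Google implements quotas: the class names are hypothetical, and the rate quota uses a simple fixed window, whereas real APIs may measure rates differently.

```python
from dataclasses import dataclass


@dataclass
class AllocationQuota:
    """Toy allocation quota: caps how much of a resource can exist at once."""
    limit: int
    usage: int = 0

    def try_allocate(self, amount: int) -> bool:
        # Reject the whole request if it would push usage past the limit.
        if self.usage + amount > self.limit:
            return False
        self.usage += amount
        return True

    def release(self, amount: int) -> None:
        # Deleting a resource hands its share of the quota back.
        self.usage = max(0, self.usage - amount)


@dataclass
class RateQuota:
    """Toy rate quota: caps requests accepted per fixed time window."""
    limit: int
    window_seconds: float
    window_start: float = 0.0
    count: int = 0

    def try_request(self, now: float) -> bool:
        # Open a fresh window once the current one has elapsed.
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.count = 0
        if self.count >= self.limit:
            return False  # a real API would answer HTTP 429 here
        self.count += 1
        return True


cpus = AllocationQuota(limit=24, usage=20)
print(cpus.try_allocate(8))  # False: 20 + 8 would exceed 24
print(cpus.try_allocate(4))  # True: usage is now exactly 24

api = RateQuota(limit=2, window_seconds=60)
print([api.try_request(t) for t in (0, 1, 2, 61)])  # [True, True, False, True]
```

The allocation quota only recovers when something is released, while the rate quota recovers by itself once the window rolls over.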

How Quotas Work Internally

When you issue a command to create a resource (e.g., a Compute Engine VM), the request goes to the relevant API. Before the resource is provisioned, the API checks current usage against the quota. If the new resource would push usage past the quota, the API rejects the request with an error (HTTP 429 Too Many Requests for rate quotas, or an error such as QUOTA_EXCEEDED for allocation quotas), and the resource is not created. Because the check happens before provisioning, a rejected single-resource request does not leave a partially built resource behind.

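Many quotas are enforced per project, per region, and per resource type. For example, Compute Engine has separate quotas for CPUs, disks, and static IPs in each region. Other quotas are global to the project, such as VPC networks, firewall rules, and the project-wide CPUS_ALL_REGIONS cap. You can view quotas in the Cloud Console under IAM & Admin > Quotas (shown as Quotas & System Limits in newer versions of the console), or, for Compute Engine, with the gcloud compute regions describe (regional) and gcloud compute project-info describe (global) commands.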
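Because each region and resource type has its own quota, a single VM request can fail on any one of several metrics. The sketch below checks a request against a hypothetical snapshot of one region's quotas; the metric names mirror Compute Engine's (CPUS, DISKS_TOTAL_GB, STATIC_ADDRESSES), but the numbers and the helper function are made up for illustration.

```python
from typing import Dict, List, Tuple

# metric name -> (limit, current usage) for ONE project in ONE region.
Quotas = Dict[str, Tuple[float, float]]


def exceeded_metrics(quotas: Quotas, request: Dict[str, float]) -> List[str]:
    """Return every metric the request would push past its limit.

    A metric missing from the snapshot is treated as having a limit of 0,
    which mirrors quotas such as GPUs that often start at zero.
    """
    over = []
    for metric, amount in request.items():
        limit, usage = quotas.get(metric, (0.0, 0.0))
        if usage + amount > limit:
            over.append(metric)
    return over


us_central1 = {
    "CPUS": (24, 20),
    "DISKS_TOTAL_GB": (10240, 2048),
    "STATIC_ADDRESSES": (20, 3),
}
# A VM with 8 vCPUs, a 500 GB disk, and one static IP:
print(exceeded_metrics(us_central1, {"CPUS": 8, "DISKS_TOTAL_GB": 500, "STATIC_ADDRESSES": 1}))
# ['CPUS']
```

Only the CPU quota blocks this request, which is exactly the kind of diagnosis exam scenarios ask for.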

Default Quota Values (Key for the Exam)

The ACE exam expects you to know approximate default quotas for common resources. Treat the figures below as commonly cited starting points, not guarantees: Google changes defaults over time, and the actual default for a given project can differ based on its history and region. Always confirm the live values on the Quotas page.

Compute Engine CPUs: 24 CPUs per region (can be increased up to hundreds).

Compute Engine Persistent Disk: 10,240 GB (about 10 TB) of standard persistent disk per region (DISKS_TOTAL_GB); SSD persistent disk has its own regional quota (SSD_TOTAL_GB).

Static IP addresses: 20 per region (reserved addresses count whether or not they are attached).

VPC networks: 5 per project.

Firewall rules: 100 per project.

Subnets: 100 per project.

Routes: 100 per project.

Load balancer forwarding rules: 20 per project.

Cloud Storage buckets: no fixed limit on the number of buckets, but bucket creation and deletion are rate-limited per project.

Cloud Storage objects per bucket: no limit.

BigQuery datasets: no fixed limit per project.

BigQuery tables per dataset: no fixed limit.

Cloud Functions: 1,000 functions per region.

App Engine services: 5 per project.

Custom IAM roles per project: 300.

Service accounts per project: 100.

API request rate: varies by API and by method; check the specific API's quota page rather than memorizing a single number.

These defaults are important because exam questions often present scenarios where a user hits a quota limit. You must identify which quota is exceeded and how to resolve it.

How to View and Monitor Quotas

You can view quotas using:

Cloud Console: IAM & Admin > Quotas. Shows current usage and limits for all services.

gcloud CLI: gcloud compute regions describe [REGION] shows regional quotas for Compute Engine. For API-specific quotas, use gcloud alpha services quota list --service=[SERVICE] --consumer=projects/[PROJECT_ID].

Monitoring: You can set up alerts for quota usage using Cloud Monitoring. For example, create an alert policy when usage exceeds 80% of the quota.
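Cloud Monitoring evaluates alert conditions for you; this sketch only spells out the threshold logic behind a pair of alert policies at 80% and 95%. The function name and return labels are hypothetical.

```python
def alert_level(usage: float, limit: float,
                warn: float = 0.80, critical: float = 0.95) -> str:
    """Map quota utilization to 'ok', 'warning', or 'critical'."""
    if limit <= 0:
        # A zero limit (common for GPUs) means any usage at all is a problem.
        return "critical" if usage > 0 else "ok"
    ratio = usage / limit
    if ratio >= critical:
        return "critical"
    if ratio >= warn:
        return "warning"
    return "ok"


for usage in (12, 20, 23):
    print(usage, alert_level(usage, 24))
# 12 ok
# 20 warning
# 23 critical
```

Two thresholds give you a heads-up (warning) early enough to file an increase request before the hard stop (critical).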

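Example command to list Compute Engine quotas for us-central1, one row per quota (the --flatten flag splits the quotas list into separate rows):

gcloud compute regions describe us-central1 --flatten="quotas[]" --format="table(quotas.metric, quotas.limit, quotas.usage)"

Sample output (values are illustrative):

METRIC            LIMIT    USAGE
CPUS              24.0     12.0
DISKS_TOTAL_GB    10240.0  2048.0
STATIC_ADDRESSES  20.0     3.0
...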
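To rank quotas by how close they are to their limits, you can request JSON instead of a table (gcloud compute regions describe us-central1 --format=json) and post-process it. The sketch below assumes each entry in the quotas list carries metric, limit, and usage fields, as in the example output above; the sample string is a trimmed, made-up excerpt and the function name is hypothetical.

```python
import json
from typing import List, Tuple

# Made-up excerpt shaped like `gcloud compute regions describe us-central1 --format=json`;
# real output contains many more quotas and fields.
SAMPLE = """{
  "name": "us-central1",
  "quotas": [
    {"metric": "STATIC_ADDRESSES", "limit": 20.0, "usage": 3.0},
    {"metric": "CPUS", "limit": 24.0, "usage": 12.0},
    {"metric": "DISKS_TOTAL_GB", "limit": 10240.0, "usage": 2048.0}
  ]
}"""


def quotas_by_utilization(region_json: str) -> List[Tuple[str, float, float, float]]:
    """Return (metric, limit, usage, percent_used) rows, fullest quota first."""
    region = json.loads(region_json)
    rows = []
    for quota in region.get("quotas", []):
        limit, usage = float(quota["limit"]), float(quota["usage"])
        # Guard against zero limits (e.g., GPU quotas that start at 0).
        pct = 100.0 * usage / limit if limit > 0 else 0.0
        rows.append((quota["metric"], limit, usage, pct))
    return sorted(rows, key=lambda row: row[3], reverse=True)


for metric, limit, usage, pct in quotas_by_utilization(SAMPLE):
    print(f"{metric:<18}{usage:>8.0f} / {limit:<8.0f}{pct:5.1f}%")
```

Sorting by percentage rather than by absolute usage surfaces the quota most likely to block the next deployment.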

Requesting a Quota Increase

When you need more resources than the default quota, you can request an increase via the Cloud Console (IAM & Admin > Quotas > select quota > Edit Quota). You must provide a justification and expected usage, and Google reviews the request. Modest increases (for example, doubling a CPU quota) are often approved automatically; larger or unusual requests, and quotas such as GPUs, usually need manual review. The process can take from minutes to days.

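Important: A regional quota increase applies only to the region you requested. Raising CPUs in us-central1 does nothing for us-east1, so each region must be requested separately. Some quotas are global to the project (for example, CPUS_ALL_REGIONS or VPC networks), and those are requested on their own. A regional increase can still be capped by a global quota such as CPUS_ALL_REGIONS.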
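A regional increase does not help if a project-wide cap is already nearly used. The sketch below assumes a simplified model in which only the regional CPUS quota and the project-wide CPUS_ALL_REGIONS quota apply; in practice, machine-family quotas such as N2_CPUS may also constrain a request. The function name is hypothetical.

```python
def launchable_vcpus(regional_limit: int, regional_usage: int,
                     global_limit: int, global_usage: int) -> int:
    """vCPUs that can still be launched in one region when both a regional
    CPUS quota and a project-wide CPUS_ALL_REGIONS quota apply."""
    regional_room = max(0, regional_limit - regional_usage)
    global_room = max(0, global_limit - global_usage)
    # The tighter of the two layers wins.
    return min(regional_room, global_room)


# Regional CPUS raised to 200 (20 in use), but the project-wide cap is 32
# and 28 vCPUs are already running across all regions:
print(launchable_vcpus(200, 20, 32, 28))  # 4
```

When the result is smaller than expected after an approved regional increase, the global quota is the next place to look.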

Interplay with Other Technologies

Quotas interact with:

IAM: Permissions don't affect quota limits. Even if you have owner permissions, you cannot exceed quota.

Resource hierarchy: Quotas are project-scoped. Folders and organizations have no direct quotas, but you can set organization policies that restrict resource creation (for example, the constraints/gcp.resourceLocations constraint limits which regions resources can be created in).

Billing: Quotas are independent of billing. A project with no billing enabled still has quotas (though some services may not work).

Preemptible VMs: Preemptible VMs consume the standard CPU quota unless the project has been granted a separate preemptible CPU quota (PREEMPTIBLE_CPUS), which defaults to 0.

Reservations: Reservations consume quota for the reserved resources, even if not in use.

Best Practices

Monitor quota usage proactively. Set up alerts for thresholds like 80% and 95%.

Request quota increases early, before you need them.

Use multiple projects to distribute resource usage if you hit per-project limits.

If your workload can run in more than one region, check whether another region has unused quota before waiting on an increase.

For API rate quotas, implement exponential backoff and retry logic in your applications.

Use gcloud alpha services quota list (with --service and --consumer) to see an API's rate quotas.
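The backoff advice above can be sketched as a small retry wrapper using exponential backoff with full jitter. RateLimitedError is a hypothetical placeholder for whatever exception your client library raises on HTTP 429, and the delay values are illustrative defaults, not Google-recommended numbers.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical stand-in for whatever your client raises on HTTP 429."""


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 5,
                      base_delay: float = 1.0, max_delay: float = 32.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on RateLimitedError with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise  # out of retries: let the caller see the 429
            # The ceiling doubles each attempt (1s, 2s, 4s, ...) up to max_delay;
            # sleeping a random time below it spreads retries from many clients.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Pass the real API call as fn and map your library's 429 exception to RateLimitedError. The sleep parameter exists so tests can skip real waiting.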

Common Exam Scenarios

A user tries to create a VM in us-west1 but gets an error about CPU quota. The solution is to request a quota increase for CPUs in that region.

A user tries to create a VM but fails because they already have 24 CPUs in the region. They must either delete unused VMs or request an increase.

A user tries to create a static IP but fails because they already have 20. They must release unused IPs or request an increase.

A user accidentally hits the API rate quota for Compute Engine. They should implement retry logic with backoff.

Quota vs. Limits

Note that some services also have system limits that cannot be adjusted (e.g., the maximum size of a single Cloud Storage object is 5 TiB). Quotas are adjustable; limits are not. The exam may ask about this distinction.

Summary of Key Commands

# View project-wide (global) Compute Engine quotas, such as CPUS_ALL_REGIONS and NETWORKS
gcloud compute project-info describe --project [PROJECT_ID]

# View regional Compute Engine quotas
gcloud compute regions describe [REGION]

# List regional quota usage as a table, one row per quota
gcloud compute regions describe us-central1 --flatten="quotas[]" --format="table(quotas.metric, quotas.limit, quotas.usage)"

# Request a quota increase: use the console (IAM & Admin > Quotas > Edit Quota).
# Programmatic requests go through the Cloud Quotas API (the gcloud beta quotas preferences
# command group); check its --help output for the flags supported by your gcloud version.

Note: The console is the primary method for requesting increases and the one the exam expects you to know.

Walk-Through

Identify Resource Creation Failure

When a user attempts to create a resource (e.g., a VM instance) via Cloud Console, gcloud, or API, the request is sent to the relevant service API. If creation fails because of a quota, the error typically includes 'QUOTA_EXCEEDED' or names the exhausted quota. The user must first recognize that the failure is due to a quota limit, not a permissions or configuration issue. Common indicators: HTTP 429 (rate quota) or an allocation-quota message that names the metric, limit, and region, such as "Quota 'CPUS' exceeded. Limit: 24.0 in region us-central1." The user should note the region, resource type, and current usage.
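When triaging a failure, it helps to pull the metric, limit, and region out of the error text automatically. The regex below assumes a message shape modeled on typical Compute Engine quota errors ("Quota 'CPUS' exceeded. Limit: 24.0 in region us-central1."); real wording varies by service and over time, so treat the pattern as a starting point. QuotaError and parse_quota_error are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class QuotaError(NamedTuple):
    metric: str
    limit: float
    region: Optional[str]  # None when the message names no region (e.g., a global quota)


# Assumed message shape, e.g. "Quota 'CPUS' exceeded.  Limit: 24.0 in region us-central1."
QUOTA_RE = re.compile(
    r"Quota '(?P<metric>[A-Z0-9_]+)' exceeded\.\s+Limit:\s*(?P<limit>\d+(?:\.\d+)?)"
    r"(?:\s+in region (?P<region>[a-z0-9-]+))?"
)


def parse_quota_error(message: str) -> Optional[QuotaError]:
    """Extract metric, limit, and region from a quota error, or None if it is not one."""
    match = QUOTA_RE.search(message)
    if match is None:
        return None  # probably a permissions or configuration problem instead
    return QuotaError(match["metric"], float(match["limit"]), match["region"])


print(parse_quota_error("Quota 'CPUS' exceeded.  Limit: 24.0 in region us-central1."))
# QuotaError(metric='CPUS', limit=24.0, region='us-central1')
```

A None result is itself useful: it suggests looking at IAM or configuration rather than quotas.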

Check Current Quota Usage

The user navigates to IAM & Admin > Quotas in the Cloud Console, or uses `gcloud compute regions describe [REGION]` to view current usage and limits. They filter by the resource type (e.g., CPUs) and region. The output shows the limit (e.g., 24) and current usage (e.g., 24), which confirms the quota is exhausted. For rate quotas, `gcloud alpha services quota list` shows the API's limits; recent request usage is visible on the console's Quotas page and in Cloud Monitoring.

Determine if Quota Increase is Needed

If the user needs more resources, they must decide whether to request a quota increase or free up existing resources. For allocation quotas, deleting unused resources (e.g., stopping and deleting VMs, releasing static IPs) can free quota immediately. If that's not feasible, a quota increase request is necessary. For rate quotas, implementing retry logic with exponential backoff may suffice if the spike is temporary.
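The decision between freeing resources and requesting an increase comes down to simple arithmetic. This sketch assumes you already know the quota's limit and usage and how much quota your unused resources would return; the function and parameter names are hypothetical.

```python
def next_step(needed: int, limit: int, usage: int, reclaimable: int) -> str:
    """Suggest how to make room for `needed` more units of an allocation quota.

    `reclaimable` is how much quota unused resources (idle VMs, unattached
    static IPs) would hand back if deleted or released.
    """
    free_now = max(0, limit - usage)
    if needed <= free_now:
        return "create the resource"
    if needed <= free_now + reclaimable:
        return "delete or release unused resources"
    # Even after cleanup it will not fit; ask for a limit that covers the request.
    return f"request a new limit of at least {usage + needed}"


print(next_step(8, 24, 20, 6))   # delete or release unused resources
print(next_step(16, 24, 20, 6))  # request a new limit of at least 36
```

The suggested new limit assumes nothing is deleted; if you clean up first, a smaller increase may be enough.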

Request Quota Increase via Console

In the Cloud Console, the user goes to IAM & Admin > Quotas, selects the quota metric (e.g., 'CPUS' in a specific region), clicks 'Edit Quota', and enters the new desired limit. They must provide a justification (e.g., 'Need more VMs for production workload'). The request is submitted to Google Cloud support. For many quotas, the increase is approved automatically within minutes. For GPUs or sensitive resources, manual review may take days.

Monitor Approval and Verify

After submitting, the user monitors the request status in the Quotas page. Once approved, the new limit is applied. The user can then retry the resource creation. They should verify the new quota by re-running the `gcloud compute regions describe` command. For large increases, it's advisable to set up Cloud Monitoring alerts to track usage and request further increases early.

What This Looks Like on the Job

Scenario 1: E-commerce Platform Scaling for Holiday Traffic. A retail company runs its e-commerce site on 1-vCPU Compute Engine VMs in us-central1. For Black Friday, they need to scale from 20 VMs to 100 VMs. Their default CPU quota is 24 CPUs in us-central1, so creating the 25th VM fails with a quota error. The solution: two weeks before Black Friday, they request a quota increase to 200 CPUs in us-central1 via the console, explaining the expected traffic spike and leaving headroom above the 100 vCPUs they plan to use. They also set up Cloud Monitoring alerts at 80% usage. During the event, they scale up without issues. A common mistake is waiting until the last minute; large or unusual increases can take time to pass manual review.
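The quota arithmetic in this scenario can be checked with a few lines of Python. The sketch assumes each VM has 1 vCPU (which the scenario's numbers imply: the 25th VM fails against a 24-CPU quota) and treats headroom as a percentage added on top of the planned fleet; both helpers are hypothetical.

```python
def vms_that_fit(cpu_limit: int, cpu_usage: int, vcpus_per_vm: int) -> int:
    """How many more VMs of this size the regional CPU quota allows."""
    return max(0, cpu_limit - cpu_usage) // vcpus_per_vm


def cpus_to_request(target_vms: int, vcpus_per_vm: int, headroom_pct: int = 100) -> int:
    """vCPU quota to ask for: the planned fleet plus a percentage of headroom."""
    needed = target_vms * vcpus_per_vm
    # Integer ceiling division avoids floating-point rounding surprises.
    return (needed * (100 + headroom_pct) + 99) // 100


print(vms_that_fit(24, 20, 1))      # 4 -> the 25th one-vCPU VM is rejected
print(cpus_to_request(100, 1))      # 200, matching the request in the scenario
print(cpus_to_request(100, 2, 25))  # 250 if the VMs had 2 vCPUs each
```

Changing machine size changes the answer a lot, so rerun the numbers whenever the VM shape changes.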

Scenario 2: Startup Running Out of Static IPs. A startup uses multiple load balancers and VPN gateways, each requiring static IPs. They hit the default quota of 20 static IPs per region: when they attempt to create a new VPN gateway, they get an error. The engineer checks the quota and finds all 20 IPs allocated. They release unused IPs (for example, addresses left over from decommissioned load balancers) to free up 3 IPs, which makes room for the new gateway. They also request a permanent increase to 50 IPs. Common pitfall: forgetting that reserved but unattached IPs count against quota.
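Finding reserved-but-unattached addresses is a filtering job. gcloud can do it directly with gcloud compute addresses list --filter="status=RESERVED"; the Python sketch below does the same on JSON output (gcloud compute addresses list --format=json) and groups the results by region, since the static IP quota is regional. It assumes each address has name and status fields and, for regional addresses, a region URL; the sample data and function name are made up.

```python
import json
from collections import defaultdict
from typing import Dict, List


def reserved_unattached(addresses_json: str) -> Dict[str, List[str]]:
    """Group RESERVED (not IN_USE) static addresses by region.

    Global addresses have no "region" field, so they are grouped under "global".
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for addr in json.loads(addresses_json):
        if addr.get("status") != "RESERVED":
            continue  # IN_USE addresses are attached to a VM, load balancer, or gateway
        region_url = addr.get("region", "global")
        # Regional addresses carry a full URL; keep only the trailing region name.
        grouped[region_url.rsplit("/", 1)[-1]].append(addr["name"])
    return dict(grouped)


# Made-up sample shaped like `gcloud compute addresses list --format=json` output.
SAMPLE = json.dumps([
    {"name": "old-lb-ip", "status": "RESERVED",
     "region": "https://www.googleapis.com/compute/v1/projects/demo/regions/us-central1"},
    {"name": "vpn-gw-ip", "status": "IN_USE",
     "region": "https://www.googleapis.com/compute/v1/projects/demo/regions/us-central1"},
    {"name": "legacy-ip", "status": "RESERVED"},
])

print(reserved_unattached(SAMPLE))
# {'us-central1': ['old-lb-ip'], 'global': ['legacy-ip']}
```

Review the list before releasing anything: a released static IP cannot be assumed to come back with the same address.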

Scenario 3: API Rate Limiting on a High-Traffic App. A data analytics application makes thousands of BigQuery API calls per second. BigQuery's default API rate quota is 100 requests per second per user, per method, so the application starts receiving HTTP 429 errors. The engineer implements exponential backoff with jitter in the client code, which smooths out the peaks. They also request a rate quota increase via the console, but rate quota increases are often approved only after the need is demonstrated. The engineer also considers distributing load across multiple projects, each with its own quota.
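Backoff reacts to 429s after they happen; a client-side token bucket can keep request bursts under the quota in the first place. This is a general-purpose technique, not a Google client-library API. The sketch assumes the 100-requests-per-second figure from the scenario; the class is hypothetical and not thread-safe.

```python
class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = now

    def allow(self, now: float) -> bool:
        """Spend one token if available; `now` is a monotonic time in seconds."""
        # Refill in proportion to elapsed time, never beyond capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should wait instead of sending and getting a 429


bucket = TokenBucket(rate=100, capacity=100)
sent = sum(bucket.allow(0.0) for _ in range(150))
print(sent)  # 100: the rest of the burst must wait
```

In a real client, pass time.monotonic() as now and sleep briefly whenever allow returns False; keep backoff in place as well, since other callers in the same project share the quota.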

How ACE Actually Tests This

ACE Objective 4.2: Manage resource quotas and limits. This objective falls under the exam guide's 'Ensuring successful operation of a cloud solution' section. The exam expects you to:

Identify when a quota limit is the cause of a failure.

Know how to view quotas (Console, gcloud).

Understand the difference between rate quotas and allocation quotas.

Know default values for common resources (CPUs, IPs, disks).

Understand that quotas are per-project and per-region.

Know how to request a quota increase (primarily through the console).

Understand that quotas are hard limits; budgets are soft limits.

Common Wrong Answers:
1. 'Increase the billing account limit': Quotas are independent of billing. Increasing the billing limit does not increase quotas.
2. 'Change the IAM permissions': Permissions do not affect quotas. Even an owner cannot exceed quota.
3. 'Use a different region': While this may help if the quota is exhausted in one region, the exam may present a scenario where the user needs resources in a specific region, so the correct answer is to request an increase in that region.
4. 'Delete the project and create a new one': This is inefficient and loses all resources. The correct answer is to request a quota increase.

Numbers to Memorize:
- Default CPU quota: 24 per region.
- Default static IPs: 20 per region.
- Default standard persistent disk: 10 TB per region.
- Default VPC networks: 5 per project.
- Default firewall rules: 100 per project.

Edge Cases:
- Preemptible VMs consume the standard CPU quota unless the project has a separate preemptible CPU quota (PREEMPTIBLE_CPUS), which defaults to 0.
- GPU quotas are separate and often default to 0; you must request an increase before creating GPU VMs.
- Some quotas are global to the project rather than regional, for example VPC networks and firewall rules.
- Deleting a resource typically releases its quota as soon as the deletion completes.

How to Eliminate Wrong Answers: If a question says 'resource creation fails', look for the word 'quota' in the error. If the answer choices include 'increase billing' or 'change IAM', eliminate them. The correct answer will involve checking quota usage and requesting an increase or freeing resources.

Key Takeaways

Quotas are hard limits scoped to a project (and, for many resources, to a region); they cannot be exceeded without an increase request.

Default CPU quota is 24 per region; default static IP quota is 20 per region.

Use Cloud Console (IAM & Admin > Quotas) to view and request quota increases.

Quotas are independent of billing; increasing billing does not increase quotas.

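Stopping a VM does not free everything it holds: its persistent disks and any static IP keep counting against quota. Deleting unused resources is the reliable way to free quota.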

Rate quotas throttle API requests; allocation quotas prevent resource creation.

Set up Cloud Monitoring alerts to track quota usage and request increases early.

Quota increases may require manual approval; plan ahead for large increases.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Allocation Quotas

Limit the total number of resources that can exist at any time (e.g., VMs, IPs).

Checked synchronously at resource creation time.

Exceeding results in a creation error (e.g., QUOTA_EXCEEDED).

Can be increased via quota request.

Example: Max 24 CPUs per region.

Rate Quotas

Limit the rate of API requests (e.g., requests per second).

Checked per request; can cause throttling (HTTP 429).

Exceeding rejects the individual request; the same call can succeed after the rate window resets.

Can be increased, but often requires justification.

Example: 100 API requests per second per user, per method, for BigQuery.

Watch Out for These

Mistake

Quotas are the same as budget alerts.

Correct

Budgets are soft limits that send alerts when spending exceeds a threshold; they do not stop resource creation. Quotas are hard limits that prevent resource creation when exceeded.

Mistake

Quotas are per-user.

Correct

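Allocation quotas are per-project, not per-user: all users in a project share the same quota. Some rate quotas are additionally measured per user (for example, requests per minute per user), but those limits are still configured on the project, not granted to individuals.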

Mistake

You can increase quotas via API without manual approval.

Correct

Most quota increases are requested through the Cloud Console, and large or sensitive requests need manual approval. Increases can also be requested programmatically through the Cloud Quotas API, but those requests go through the same approval process.

Mistake

Default quotas are the same for all regions.

Correct

Default quotas can vary by region. For example, some regions may have higher CPU quotas than others.

Mistake

Stopping a VM frees all the quota it uses.

Correct

A stopped VM keeps its persistent disks and any static IP it holds, and those continue to count against quota. Deleting resources you no longer need is the reliable way to free quota.


Frequently Asked Questions

How do I check my current quota usage in GCP?

You can check quota usage via the Cloud Console under IAM & Admin > Quotas. Use the filter to find the specific metric (e.g., CPUs) and region. Alternatively, use the gcloud command: `gcloud compute regions describe [REGION]`. For API rate quotas, use `gcloud alpha services quota list`. The console also shows usage as a percentage of the limit.

What happens when I hit a quota limit?

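For allocation quotas, the resource creation fails immediately with an error message like 'QUOTA_EXCEEDED'. For rate quotas, the API returns HTTP 429 Too Many Requests, and your client should retry with exponential backoff. A request rejected for quota does not create the resource. Operations that create many resources at once, such as resizing a managed instance group, can still succeed partially, creating only as many instances as the quota allows.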

Can I increase quotas automatically?

Some quotas can be increased automatically in the Cloud Console if you request a reasonable increase (e.g., doubling CPUs). Others, especially for GPUs or large increases, require manual review by Google Cloud support. The process can take from minutes to days.

Do quotas apply to preemptible VMs?

Yes. Preemptible VMs consume the standard CPU quota unless the project has been granted a separate preemptible CPU quota (PREEMPTIBLE_CPUS). That quota defaults to 0, so by default preemptible and regular VMs draw from the same CPU quota.

How do I free up quota without deleting resources?

In general, you cannot free allocation quota without deleting or releasing resources. For static IPs, release addresses that are not in use. For VMs, delete instances you no longer need; a stopped VM still keeps its persistent disks and any static IP, and those continue to count against quota.

What is the difference between a quota and a limit?

A quota is an adjustable limit that you can request to increase (e.g., CPU count). A system limit is a hard, non-adjustable cap (e.g., the maximum Cloud Storage object size of 5 TiB). Quotas are meant to be managed; limits are architectural constraints.

Can I have different quotas for different users in the same project?

No, quotas are project-wide. All users and service accounts share the same quota. To give different users more resources, you must use separate projects.

Terms Worth Knowing

Cloud computing Cloud IAM Region
