This chapter covers two critical API management features in Amazon API Gateway: caching and usage plans. Both fall under Domain 1 (Development with AWS Services) of the DVA-C02 exam. Expect 3-5 questions covering caching behavior, TTL, cache invalidation, usage plan components (throttle, quota, API keys), and how they interact. Mastering these concepts is essential for building scalable, cost-effective APIs and passing the exam.
Imagine a public library with a popular reference section. The library has a limited number of photocopiers (the backend) and a librarian at the desk (API Gateway). To ensure fair access, the library implements a membership system (usage plans) and a caching system. Each member receives a card (API key) with a daily copy limit (quota), a steady copying rate (throttle rate), and a burst limit (how many copies they can make in a row before a cooldown).
The library also keeps a cache of commonly requested pages: if a patron asks for the same page within 5 minutes, the librarian hands them a pre-printed copy from a filing cabinet (cache), avoiding the photocopier entirely. The cache reduces latency and load on the photocopier, but stale copies can be a problem if the reference book is updated frequently. The library can manually purge the filing cabinet (flush) or set a time-to-live (TTL) for each cached page. Without the cache, the photocopiers would be overwhelmed by repeated requests. Without usage plans, a few heavy users could monopolize the machines.
The librarian checks the membership card against the usage plan before allowing access to the photocopier. If the member exceeds their daily limit, the librarian returns a "quota exceeded" error (429 Too Many Requests). The library can also bill members based on their copy count. This analogy mirrors how API Gateway caching reduces load on backend services and how usage plans enforce rate limits and quotas for different client tiers.
What is API Gateway Caching?
Amazon API Gateway caching stores responses from your backend (e.g., Lambda, HTTP endpoint) so that subsequent identical requests can be served without invoking the backend again. This reduces latency, offloads backend processing, and lowers costs. The cache is an in-memory store located within the API Gateway service, not on your backend.
How Caching Works Internally
When a request arrives at API Gateway, the service computes a cache key and looks it up. By default the key is based on the resource path; query string parameters, headers, and path parameters become part of the key only when you mark them as cache key parameters on the method. If a cached response matches the key and has not expired, API Gateway returns it immediately, bypassing the integration request/response pipeline. If no cache hit occurs, the request proceeds to the backend, and the response is stored in the cache for subsequent requests.
The cache is per stage (e.g., dev, prod). You enable caching on a per-stage basis, not per API. You can also enable caching for specific methods (e.g., GET) or resources. Cache size ranges from 0.5 GB to 237 GB, with pricing based on size. The default TTL is 300 seconds (5 minutes), but you can set it from 0 (no caching) to 3600 seconds (1 hour). Setting TTL to 0 effectively disables caching for that method.
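The TTL behavior described above can be sketched with a small in-memory cache. This is an illustrative model only, not API Gateway's implementation: the `TtlCache` class, the `fetch` helper, and the injectable clock are names invented for this sketch.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Toy response cache: an entry expires ttl_seconds after it is stored."""

    def __init__(self, ttl_seconds: int = 300,
                 clock: Callable[[], float] = time.monotonic):
        if not 0 <= ttl_seconds <= 3600:
            raise ValueError("TTL must be between 0 and 3600 seconds")
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, body = entry
        if self._clock() - stored_at >= self.ttl_seconds:
            del self._entries[key]  # expired entries count as a miss
            return None
        return body

    def put(self, key: str, body: str) -> None:
        if self.ttl_seconds == 0:
            return  # a TTL of 0 means nothing is cached
        self._entries[key] = (self._clock(), body)


def fetch(cache: TtlCache, key: str,
          backend: Callable[[str], str]) -> Tuple[str, bool]:
    """Return (body, was_cache_hit), calling the backend only on a miss."""
    cached = cache.get(key)
    if cached is not None:
        return cached, True
    body = backend(key)
    cache.put(key, body)
    return body, False
```

Passing a fake clock makes expiry easy to observe: a second request inside the TTL window is a hit, and a request at or after the TTL goes back to the backend.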
Cache Invalidation
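To force a cache refresh, you can:
Flush the entire cache for a stage via the AWS Console, CLI (aws apigateway flush-stage-cache), or SDK.
Set a shorter TTL so that entries expire sooner.
Send a request with the Cache-Control: max-age=0 header, which invalidates that one cache entry and reloads it from the backend.
Restrict per-entry invalidation to authorized callers by requiring authorization on the stage and granting the execute-api:InvalidateCache IAM permission.
Important: Unless you require authorization, any client can invalidate a cache entry with Cache-Control: max-age=0. When authorization is required, you choose how unauthorized invalidation requests are handled: fail with 403, or ignore the header (with or without a warning in the response). When you flush the cache, all entries are removed, causing a temporary performance hit until the cache warms up again.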
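The decision a gateway makes when it sees an invalidation header can be sketched as a small function. This is a simplified model under stated assumptions: the enum, the function names, and the returned action strings are hypothetical and only mirror the three handling options for unauthorized requests (fail with 403, ignore with a warning, ignore silently).

```python
from enum import Enum


class UnauthorizedHandling(Enum):
    FAIL_WITH_403 = "fail"
    IGNORE_WITH_WARNING = "ignore_warn"
    IGNORE = "ignore"


def wants_invalidation(cache_control: str) -> bool:
    """True if the Cache-Control request header contains max-age=0."""
    directives = [d.strip().lower().replace(" ", "")
                  for d in cache_control.split(",")]
    return "max-age=0" in directives


def decide(cache_control: str,
           require_authorization: bool,
           caller_authorized: bool,
           handling: UnauthorizedHandling = UnauthorizedHandling.FAIL_WITH_403) -> str:
    """Return the action taken for one request (hypothetical action names)."""
    if not wants_invalidation(cache_control):
        return "normal_cache_lookup"
    if not require_authorization or caller_authorized:
        return "refresh_from_backend"
    if handling is UnauthorizedHandling.FAIL_WITH_403:
        return "reject_403"
    # Both "ignore" options fall back to the ordinary cache lookup.
    return "normal_cache_lookup"
```

The point of the sketch is the second branch: with authorization not required, any caller that sends `max-age=0` reaches the backend.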
What is a Usage Plan?
A usage plan defines who can access your API and under what conditions. It consists of:
- Throttle: Rate limits (requests per second) and burst limits (maximum requests in a short burst), set per plan. These apply within the account-level default of 10,000 rps per region, which is a soft limit that can be increased.
- Quota: The maximum number of requests a user can make in a day, week, or month. Quota is reset at the start of each period.
- API keys: Usage plans are associated with API keys. Each key is a string (e.g., abc123) that clients pass in the x-api-key header. API Gateway validates the key and applies the plan's limits.
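A quota that resets at the start of each period can be modelled with a counter keyed on the period's start date. This is a minimal sketch, not API Gateway's accounting: `QuotaCounter` and `period_start` are invented names, and treating Monday as the start of a week is an assumption made for the example.

```python
from datetime import date, timedelta
from typing import Optional


def period_start(day: date, period: str) -> date:
    """First day of the DAY, WEEK, or MONTH period containing `day`."""
    if period == "DAY":
        return day
    if period == "WEEK":
        # Assumption for this sketch: weeks start on Monday.
        return day - timedelta(days=day.weekday())
    if period == "MONTH":
        return day.replace(day=1)
    raise ValueError(f"unknown period: {period}")


class QuotaCounter:
    """Counts requests for one API key and resets when a new period begins."""

    def __init__(self, limit: int, period: str):
        self.limit = limit
        self.period = period
        self._window_start: Optional[date] = None
        self._used = 0

    def allow(self, today: date) -> bool:
        start = period_start(today, self.period)
        if start != self._window_start:
            self._window_start = start  # new period: reset the count
            self._used = 0
        if self._used >= self.limit:
            return False  # the gateway would answer 429 here
        self._used += 1
        return True
```

Unlike a throttle, nothing refills during the period: once the limit is reached, every request is refused until the next period starts.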
How Usage Plans Work
When a request includes an API key, API Gateway:
1. Validates the key exists and is active.
2. Checks the associated usage plan.
3. Applies throttle and quota limits.
4. If limits are exceeded, returns 429 Too Many Requests.
Throttling is implemented using a token bucket algorithm. Each API key has a token bucket with a maximum capacity (burst) and refill rate (rate). For example, a rate of 1000 rps and burst of 2000 means the bucket can hold up to 2000 tokens, refilling at 1000 tokens per second. Each request consumes one token; if a request arrives when the bucket is empty, it is throttled.
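The token bucket described above can be sketched in a few lines. This is a generic textbook token bucket, assumed here to behave like the one in the text; the `TokenBucket` class and its injectable clock are names chosen for the example.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket: `burst` is the capacity, `rate` is tokens added per second."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = float(rate)
        self.capacity = float(burst)
        self._tokens = float(burst)  # the bucket starts full
        self._clock = clock
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when throttled."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the elapsed time, never beyond the burst capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

With a rate of 1000 and a burst of 2000, a client can send 2000 requests at once, after which it is limited to whatever has refilled: about 500 more requests after half a second.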
API Keys and Usage Plans Relationship
API keys are not tied to a specific usage plan until you associate them. One key can be associated with multiple usage plans, but only one plan applies per API stage. When you create a usage plan, you specify which API stages it applies to. Then you add API keys to the plan. Clients must send the key in the x-api-key header. If the key is missing or invalid, API Gateway returns 403 Forbidden (not 429).
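The key-to-plan relationship can be pictured as a lookup from (API key, stage) to a plan. The data and the `resolve_plan` function below are hypothetical and exist only to illustrate the rule that a missing or unknown key yields 403 rather than 429.

```python
from typing import Dict, Optional, Tuple

# Hypothetical associations: one key appears in two plans, one per stage.
ASSOCIATIONS: Dict[Tuple[str, str], str] = {
    ("key-alice", "prod"): "PremiumPlan",
    ("key-alice", "dev"): "BasicPlan",
    ("key-bob", "prod"): "BasicPlan",
}


def resolve_plan(api_key: Optional[str], stage: str,
                 associations: Dict[Tuple[str, str], str] = ASSOCIATIONS
                 ) -> Tuple[int, Optional[str]]:
    """Return (status, plan). 403 means the key is missing or not recognized."""
    if api_key is None:
        return 403, None  # no x-api-key header was sent
    plan = associations.get((api_key, stage))
    if plan is None:
        return 403, None  # key is unknown or has no plan for this stage
    return 200, plan
```

Only after this lookup succeeds do the plan's throttle and quota come into play, which is why 429 can never appear for a request without a valid key.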
Configuration Example (CLI)
# Create a usage plan
aws apigateway create-usage-plan \
  --name "BasicPlan" \
  --description "Free tier" \
  --api-stages "[{\"apiId\": \"abc123\", \"stage\": \"prod\"}]" \
  --throttle "{\"rateLimit\": 100, \"burstLimit\": 200}" \
  --quota "{\"limit\": 10000, \"period\": \"DAY\"}"

# Create an API key
aws apigateway create-api-key --name "ClientKey" --enabled

# Associate key with usage plan
aws apigateway create-usage-plan-key \
  --usage-plan-id <plan-id> \
  --key-type API_KEY \
  --key-id <key-id>
Caching and Usage Plans Interaction
Caching and usage plans operate independently: caching reduces backend load, while usage plans control client access. API Gateway validates the API key and counts every request against the plan's throttle and quota, whether the response comes from the cache or from the backend. Caching therefore does not let a client exceed its quota, and it does not stretch the quota either. Throttling is applied before the cache check: if a request is throttled, it never reaches the cache.
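The interaction can be made concrete with a toy simulation in which quota is counted before the cache lookup. `GatewaySim` is an invented class that models only the ordering described here, not the real service.

```python
from typing import Dict, Iterable, Optional


class GatewaySim:
    """Toy gateway: key check, then quota, then cache, then backend."""

    def __init__(self, valid_keys: Iterable[str], quota_limit: int):
        self.valid_keys = set(valid_keys)
        self.quota_limit = quota_limit
        self.used: Dict[str, int] = {}
        self.cache: Dict[str, str] = {}
        self.backend_calls = 0

    def request(self, api_key: Optional[str], path: str) -> int:
        if api_key not in self.valid_keys:
            return 403  # missing or invalid key
        used = self.used.get(api_key, 0)
        if used >= self.quota_limit:
            return 429  # quota exhausted, even if the path is cached
        self.used[api_key] = used + 1  # counted before the cache lookup
        if path not in self.cache:
            self.backend_calls += 1  # only a cache miss reaches the backend
            self.cache[path] = f"response for {path}"
        return 200


sim = GatewaySim(valid_keys={"client-key"}, quota_limit=3)
statuses = [sim.request("client-key", "/items") for _ in range(5)]
```

In this run the backend is invoked once, yet the fourth and fifth requests are still refused: the cache saved backend work but not quota.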
Performance Considerations
Cache hit ratio improves with larger cache size and appropriate TTL.
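Clients can send Cache-Control: max-age=0 to bypass and refresh a cached entry; restrict this to authorized callers so that arbitrary clients cannot defeat the cache.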
Avoid caching responses that are user-specific or contain sensitive data unless you use a cache key that includes the user identifier.
For usage plans, monitor traffic via CloudWatch metrics (Count, 4XXError) and per-key consumption via the usage plan's usage data (aws apigateway get-usage).
API keys should be rotated periodically for security.
Common Pitfalls
Enabling caching without setting a TTL (default 300s) may serve stale data.
Forgetting to require authorization for Cache-Control: max-age=0 invalidation, which lets any client bypass the cache.
Associating an API key with a usage plan but not requiring the key on the method (the x-api-key header is ignored if the method does not require API key).
Confusing throttle (rate limit per second) with quota (total requests per period).
Exam-Relevant Details
Default cache TTL: 300 seconds.
Maximum TTL: 3600 seconds.
Cache sizes: 0.5 GB, 1.6 GB, 6.1 GB, 13.5 GB, 28.4 GB, 58.2 GB, 118 GB, 237 GB.
Throttle default: 10,000 rps per account per region (soft limit).
Burst limit default: 5,000 (but can be set lower).
Quota periods: DAY, WEEK, MONTH.
API key location: x-api-key header.
Error codes: 429 (throttle/quota), 403 (invalid/missing key).
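Cache invalidation methods: flush the entire stage cache, or send Cache-Control: max-age=0 to refresh a single entry (restrict with the execute-api:InvalidateCache permission). Setting TTL=0 disables caching rather than flushing it.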
Usage plans are regional; they apply to a specific API stage.
API keys are not required for public APIs; usage plans are optional.
Step-by-Step: Enabling Caching on a GET Method
In the API Gateway console, select your API.
Go to Resources and select the GET method.
Under Method Request, enable API Key Required if using usage plans.
Deploy the API to a stage.
In Stages, select the stage and edit its settings.
Enable the API cache and select a cache size.
Set the default cache TTL (e.g., 300).
To cache only specific methods, use a method-level override on the stage to turn caching on or off, or to set a different TTL, for that method.
Optionally, require authorization for Cache-Control: max-age=0 invalidation requests.
Monitoring
CloudWatch metrics: CacheHitCount, CacheMissCount, Latency, IntegrationLatency.
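Usage plan monitoring: Count and 4XXError in CloudWatch (throttled and over-quota requests return 429, which is counted as a 4XX error), plus per-key usage from the usage plan's usage data.
Alarms can be set on 4XXError to detect abuse.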
Security
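API keys can be rotated by creating a new key, adding it to the usage plan, and then disabling or deleting the old key; a disabled or deleted key is rejected with 403.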
Use IAM authorizers or Lambda authorizers for fine-grained access control.
Cached responses are not encrypted at rest by default; enable cache encryption in the stage's cache settings if you need it. Data in transit is encrypted via HTTPS.
Avoid caching sensitive data like PII unless using a key that varies per user.
Best Practices
Use caching for read-heavy, stable data.
Set TTL based on data freshness requirements.
Use usage plans to protect backend from abusive clients.
Combine caching with CloudFront for edge caching.
Test throttle and quota limits with load testing tools.
Summary of Key Numbers
| Feature | Default | Maximum |
|---------|---------|---------|
| Cache TTL | 300 s | 3600 s |
| Cache size | 0.5 GB | 237 GB |
| Throttle rate (account) | 10,000 rps | Can increase |
| Burst (account) | 5,000 | Can increase |
| Quota | None | Unlimited |
Conclusion
API Gateway caching and usage plans are essential for building production-ready APIs. Caching reduces latency and backend load, while usage plans enforce access policies and prevent abuse. On the DVA-C02 exam, you must know the default values, how caching interacts with throttling, and how to configure usage plans with API keys. Practice with the AWS CLI and console to solidify these concepts.
Create a REST API
Start by creating a REST API in API Gateway. Define resources (e.g., /items) and methods (e.g., GET). Configure the integration with your backend (Lambda, HTTP, etc.). Ensure the method is set to require an API key if you plan to use usage plans. Deploy the API to a stage (e.g., 'prod'). Note the invoke URL and API ID.
Create a Usage Plan
In the API Gateway console, navigate to Usage Plans and click Create. Specify a name and description. Set throttle limits: rate (requests per second) and burst (maximum requests in a short burst). Set quota limits: number of requests and period (DAY, WEEK, MONTH). Associate the usage plan with your API stage (e.g., prod). Save the plan.
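If you script this step instead of using the console, the JSON arguments for `aws apigateway create-usage-plan` are easier to get right when generated than when hand-escaped. The sketch below only builds the argument list and does not call AWS; the function name `usage_plan_command` is invented, and the values passed to it are placeholders.

```python
import json
import shlex
from typing import List

VALID_PERIODS = {"DAY", "WEEK", "MONTH"}


def usage_plan_command(name: str, api_id: str, stage: str,
                       rate: int, burst: int,
                       quota_limit: int, period: str) -> List[str]:
    """Build the argv for `aws apigateway create-usage-plan` (not executed)."""
    if period not in VALID_PERIODS:
        raise ValueError(f"period must be one of {sorted(VALID_PERIODS)}")
    return [
        "aws", "apigateway", "create-usage-plan",
        "--name", name,
        "--api-stages", json.dumps([{"apiId": api_id, "stage": stage}]),
        "--throttle", json.dumps({"rateLimit": rate, "burstLimit": burst}),
        "--quota", json.dumps({"limit": quota_limit, "period": period}),
    ]


# Placeholder values; abc123 is not a real API ID.
args = usage_plan_command("BasicPlan", "abc123", "prod", 100, 200, 10000, "DAY")
command_line = shlex.join(args)  # shell-quoted string, safe to paste
```

Passing `args` to `subprocess.run` would avoid shell quoting entirely; `shlex.join` is used here only to produce a copyable command line.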
Create and Associate API Keys
Create an API key (e.g., 'MyKey') and enable it. Then associate the key with the usage plan by adding it to the plan's list of keys. The key can be auto-generated or imported. Note the key value; clients must include it in the x-api-key header. You can also associate the key with the plan via CLI: `aws apigateway create-usage-plan-key --usage-plan-id <plan-id> --key-type API_KEY --key-id <key-id>`.
Enable Caching on the Stage
Go to the stage (e.g., 'prod') and enable caching. Choose a cache size (e.g., 0.5 GB). You can enable caching for all methods or override it for specific ones. Set the default TTL (e.g., 300 seconds). Optionally, require authorization for Cache-Control: max-age=0 requests so that only permitted callers can force a cache refresh. Stage settings apply without a redeployment, but the cache takes a few minutes to provision.
Test with and without API Key
Send a request without the x-api-key header. Expect 403 Forbidden. Send a request with a valid API key. Expect a 200 response if within limits. Send requests exceeding the throttle rate; expect 429 Too Many Requests. Send requests exceeding the quota; expect 429 after quota is exhausted. Observe cache behavior: repeated requests with same parameters should return cached responses (low latency).
Monitor and Troubleshoot
Use CloudWatch metrics to monitor cache hits and misses (CacheHitCount, CacheMissCount) and client errors (4XXError, which includes 429 and 403 responses). Set up alarms on 4XXError to detect potential abuse. For per-key quota consumption, query the usage plan's usage data (aws apigateway get-usage). If cached data is stale, flush the cache or reduce TTL. If clients report 429, check usage plan limits and consider increasing them. Use API Gateway logs for detailed request/response data.
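A cache hit ratio is the usual way to turn the two cache metrics into a single number. The helper below is a plain calculation over counts you have already retrieved; the function name and the sample figures are illustrative, not real measurements.

```python
from typing import Optional


def cache_hit_ratio(hit_count: int, miss_count: int) -> Optional[float]:
    """Fraction of requests served from cache, or None if there was no traffic."""
    if hit_count < 0 or miss_count < 0:
        raise ValueError("counts must be non-negative")
    total = hit_count + miss_count
    if total == 0:
        return None
    return hit_count / total


# Made-up sample: 900 CacheHitCount and 100 CacheMissCount in one period.
ratio = cache_hit_ratio(900, 100)
```

A ratio that stays low despite repetitive traffic usually points at a TTL that is too short or a cache key that varies more than the responses do.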
Enterprise Scenario 1: E-commerce Product Catalog
A large e-commerce company exposes a REST API for product details (GET /products/{id}). The product data changes infrequently (once per day). To reduce load on the backend DynamoDB table and improve response times, they enable API Gateway caching with a TTL of 3600 seconds (1 hour). They also create usage plans for different client tiers: a free tier with 1000 requests per day and 10 rps throttle, and a premium tier with 100,000 requests per day and 100 rps throttle. Each client gets an API key. The caching reduces backend calls by 90%, and usage plans prevent abuse. A common issue is that when product prices are updated, the cache serves stale data for up to an hour. To mitigate, they implement a cache invalidation endpoint (POST /products/{id}/invalidate) whose handler re-requests GET /products/{id} with the Cache-Control: max-age=0 header from a caller that holds the execute-api:InvalidateCache permission, which refreshes the entry for that specific product. They also set up CloudWatch alarms on 4XXError to catch spikes in throttled requests; if it spikes, they investigate potential DDoS attacks.
Enterprise Scenario 2: SaaS API with Multi-Tenancy
A SaaS provider offers an analytics API where each customer has a unique API key. They use usage plans to enforce per-customer quotas (e.g., 10,000 requests per month). They also enable caching for expensive aggregation queries that return the same results for all customers (e.g., total sales). However, they must ensure that cached responses are not served to the wrong customer. They solve this by using a cache key that includes the API key (via a custom header or the x-api-key header). This way, each customer's cached data is isolated. They set a TTL of 300 seconds. Misconfiguration: initially they did not include the API key in the cache key, causing cross-tenant data leaks. They fixed it by enabling cache key parameters (query strings, headers) in the method request. They also set up usage plan alerts to notify when a customer approaches their quota.
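The misconfiguration in this scenario is easy to see with a toy cache key builder. The format of the key below is invented for illustration and is not how API Gateway encodes keys; what matters is which request parts are included.

```python
from typing import Iterable, Mapping


def cache_key(path: str,
              query: Mapping[str, str],
              headers: Mapping[str, str],
              key_query_params: Iterable[str] = (),
              key_headers: Iterable[str] = ()) -> str:
    """Build a cache key from the path plus the explicitly selected parameters."""
    parts = [path]
    for name in sorted(key_query_params):
        parts.append(f"q:{name}={query.get(name, '')}")
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    for name in sorted(h.lower() for h in key_headers):
        parts.append(f"h:{name}={lowered.get(name, '')}")
    return "|".join(parts)


tenant_a = {"x-api-key": "key-tenant-a"}
tenant_b = {"x-api-key": "key-tenant-b"}

# Misconfigured: the API key is not part of the key, so tenants share entries.
shared = cache_key("/reports", {}, tenant_a) == cache_key("/reports", {}, tenant_b)

# Fixed: the x-api-key header is a cache key parameter, so entries are isolated.
isolated = (cache_key("/reports", {}, tenant_a, key_headers=["x-api-key"])
            != cache_key("/reports", {}, tenant_b, key_headers=["x-api-key"]))
```

The trade-off is a lower hit ratio: every tenant now warms its own entries, so per-tenant keys are worth using only for responses that actually differ per tenant.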
Enterprise Scenario 3: Mobile App Backend
A mobile app uses API Gateway to serve user-specific data (e.g., profile, notifications). Caching is dangerous here because data is user-specific. Instead, they disable caching for these endpoints. For public content (e.g., news feed), they enable caching with a TTL of 60 seconds. They use usage plans to limit free users to 100 requests per day, while premium users have unlimited access. They also implement API key rotation every 90 days to prevent key leakage. A common problem is that when a user upgrades from free to premium, their old API key is still associated with the free plan. They must create a new key or update the usage plan association. They automate this via AWS SDK in their user management system.
What the DVA-C02 Tests
This topic falls under Domain 1: Development with AWS Services. The exam tests your understanding of caching behavior, default values, and how usage plans enforce limits. Expect scenario-based questions where you must choose the correct configuration to solve a problem.
Common Wrong Answers
Confusing throttle and quota: Many candidates think throttle is the total requests per day, but it's per second. Quota is the total per period. The exam often asks: "A client is getting 429 errors after 1000 requests in a day. What is the issue?" Answer: Quota exceeded, not throttle.
Thinking caching bypasses usage plans: Some believe that cached responses don't count against quota. Actually, every request counts against quota, even cache hits. The exam may present a scenario where a client exceeds quota despite caching; the correct answer is that caching does not reduce quota consumption.
Misunderstanding cache invalidation: Candidates think setting TTL to 0 immediately clears the cache. It only disables caching for new requests; existing cached entries remain until they expire. To clear the cache, you must flush it.
API key location: Some think API keys are passed in a query parameter. The exam specifies the x-api-key header. Questions may ask where to send the API key.
Specific Numbers and Values
Default cache TTL: 300 seconds.
Maximum TTL: 3600 seconds.
Cache sizes: 0.5 GB, 1.6 GB, 6.1 GB, 13.5 GB, 28.4 GB, 58.2 GB, 118 GB, 237 GB.
Default throttle: 10,000 rps per account (soft limit).
Burst default: 5,000.
Quota periods: DAY, WEEK, MONTH.
Error codes: 429 (throttle/quota), 403 (missing/invalid key).
Edge Cases and Exceptions
If a method does not require an API key, the key is ignored even if present. The exam may ask: "A client sends a valid API key but gets 200. Why?" Answer: The method does not require API key.
Caching can be enabled per method, but the cache size is per stage. If you enable caching on a stage but disable it on a method, that method is not cached.
Usage plans are regional. If you have multiple regions, you need separate plans.
Usage plans and API keys are a REST API feature; HTTP APIs do not support them.
How to Eliminate Wrong Answers
If the question mentions "rate limit" or "requests per second", it's throttle. If it mentions "total requests per day", it's quota.
If the question involves stale data, the solution is to flush the cache or reduce TTL, not to disable caching entirely.
If the question involves a client getting 403, check if the API key is missing or invalid. If 429, check throttle or quota.
Remember that caching does not affect usage plan limits; it only reduces backend load.
Default cache TTL is 300 seconds; maximum is 3600 seconds.
Cache sizes range from 0.5 GB to 237 GB.
Usage plans consist of throttle (rate + burst) and quota (requests per day/week/month).
API keys must be passed in the x-api-key header.
Throttle returns 429 Too Many Requests; missing/invalid key returns 403 Forbidden.
Caching does not reduce quota consumption; every request counts against quota.
Per-entry invalidation via Cache-Control: max-age=0 should be restricted with the execute-api:InvalidateCache permission; otherwise any client can use it.
Usage plans are associated with API stages, not individual methods.
Throttle default is 10,000 rps per account per region (soft limit).
Quota periods are DAY, WEEK, MONTH.
These come up on the exam all the time. Here's how to tell them apart.
API Gateway Caching
Operates at the API Gateway level, after authentication/authorization.
Cache key includes request path, query strings, and headers (configurable).
Cache is stored within API Gateway, not at edge locations.
TTL max is 3600 seconds (1 hour).
Invalidation requires IAM permissions and flushes entire stage cache or specific entries.
CloudFront Caching
Operates at the CDN edge, before reaching API Gateway.
Cache key includes URL, query strings, and headers (configurable).
Cache is stored at edge locations worldwide.
TTL can be set to 0 (no caching) or up to 365 days.
Invalidation can target specific paths and is charged per path.
Mistake
Caching bypasses usage plan throttling and quota limits.
Correct
Every request counts against throttle and quota, regardless of cache hit. Caching only reduces backend invocation; API Gateway still processes the request and applies limits.
Mistake
Setting TTL to 0 immediately clears the cache.
Correct
TTL=0 disables caching for new requests, but existing cached entries remain until they expire naturally. To clear the cache, you must flush it via the console, CLI, or API.
Mistake
API keys are passed in the Authorization header.
Correct
API keys are passed in the x-api-key header, not the Authorization header. The Authorization header is used for IAM or Cognito authorization.
Mistake
Usage plans can be applied without API keys.
Correct
Usage plans require API keys. The API key is used to identify the client and apply the plan's limits. Without a key, the usage plan is not enforced.
Mistake
Caching is enabled by default on all APIs.
Correct
Caching is disabled by default. You must explicitly enable it on a stage and configure cache size and TTL.
Throttle controls the rate of requests per second (rate limit) and the maximum burst of requests (burst limit). Quota controls the total number of requests allowed over a day, week, or month. Throttle prevents short-term spikes, while quota limits long-term usage. Both return 429 when exceeded.
Send a request for that entry with the Cache-Control: max-age=0 header; if the stage requires authorization, the caller needs the execute-api:InvalidateCache IAM permission. Alternatively, you can flush the entire stage cache, which removes all entries, or set a shorter TTL so that entries expire sooner.
No. API Gateway caching is supported only for REST APIs; HTTP APIs and WebSocket APIs do not support it. For WebSocket, you must implement caching at the backend.
API Gateway returns a 429 Too Many Requests error. The response may include a Retry-After header indicating how long to wait before retrying. The throttle is implemented using a token bucket algorithm; once tokens are exhausted, requests are rejected until tokens refill.
Yes, usage plans are associated with API keys. The client must include the key in the x-api-key header. Without a valid key, the request is rejected with 403 Forbidden. The key identifies the client and determines which usage plan applies.
Yes, one API key can be associated with multiple usage plans, but only one plan applies per API stage. When you associate a key with a plan, you specify the API stage. If the same key is added to multiple plans for different stages, each stage uses its own plan.
Use CloudWatch metrics: Count (total requests) and 4XXError (client errors, including 429 responses from throttling and quota). For per-key consumption, query the usage plan's usage data (aws apigateway get-usage). You can also enable detailed CloudWatch logs for API Gateway to see per-request details. Set alarms on 4XXError to detect abuse.