This chapter covers two critical features of Amazon Redshift that directly impact cost and performance: Reserved Nodes and Concurrency Scaling. Both are frequently tested on the SAA-C03 exam under the Cost Optimized domain (Objective 4.4). Understanding when to purchase reserved capacity versus using on-demand nodes, and how Concurrency Scaling handles sudden spikes in query load, can significantly reduce your AWS bill while maintaining predictable performance. Expect 2-3 exam questions that test your ability to choose the right pricing model or scale strategy based on workload patterns.
Imagine a high-end restaurant with one main kitchen that can prepare 10 meals at a time. During peak hours, 30 orders come in simultaneously. Without extra capacity, orders queue up and wait for the kitchen to become free — this is like Redshift without Concurrency Scaling. With Concurrency Scaling, the restaurant automatically activates a secondary, identical kitchen (a set of temporary compute nodes) that shares the same pantry (the main cluster's storage, which is built on Amazon S3). The new kitchen can pull ingredients from the same pantry and prepare meals independently. When a chef in the main kitchen is busy, the maître d' (the Concurrency Scaling scheduler) routes new orders to the secondary kitchen. Once the rush ends and the main kitchen can handle the load, the secondary kitchen is closed and its staff dismissed — you only pay for the time the extra kitchen was active. Importantly, the secondary kitchen has its own copy of the recipes and prep tables (cached data), but any changes made to the pantry (writes) are immediately visible to both kitchens because they access the same underlying storage. This ensures consistency even when multiple kitchens are cooking simultaneously.
What Are Redshift Reserved Nodes?
Amazon Redshift Reserved Nodes (the Redshift counterpart of EC2 Reserved Instances) are a pricing model that offers a significant discount (up to 75%) compared to on-demand pricing in exchange for a one- or three-year commitment. You reserve a specific node type (e.g., dc2.large, ra3.xlplus) in a specific region. The reservation is tied to that node type, not to an individual cluster: it applies to any cluster in the same AWS account and region that runs nodes of the reserved type.
Reserved Nodes come in three payment options:
- No Upfront: No upfront payment; you pay a discounted hourly rate for the duration of the term.
- Partial Upfront: A portion of the cost is paid upfront, with a lower hourly rate.
- All Upfront: The entire cost is paid upfront, providing the maximum discount.
Unlike EC2 Reserved Instances, Redshift Reserved Nodes do not come in Standard and Convertible classes. Each reservation is purchased for one node type in one region, so choose the node type carefully before committing to a term.
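To compare payment options on equal terms, it can help to spread any upfront payment over the term and add the hourly rate. The sketch below does that arithmetic. All prices in it are hypothetical round numbers chosen for illustration, not AWS list prices, and the function names are our own.

```python
HOURS_PER_YEAR = 8760


def effective_hourly_rate(upfront: float, hourly: float, term_years: int) -> float:
    """Spread the upfront payment over the term and add the hourly rate."""
    if term_years not in (1, 3):
        raise ValueError("Reserved Node terms are 1 or 3 years")
    return upfront / (term_years * HOURS_PER_YEAR) + hourly


def discount_vs_on_demand(effective: float, on_demand_hourly: float) -> float:
    """Fraction saved relative to the on-demand hourly rate."""
    return 1.0 - effective / on_demand_hourly


# Hypothetical per-node prices, for illustration only.
ON_DEMAND = 1.00
options = {
    "No Upfront (1 yr)": effective_hourly_rate(0.0, 0.70, 1),
    "Partial Upfront (3 yr)": effective_hourly_rate(3942.0, 0.15, 3),
    "All Upfront (3 yr)": effective_hourly_rate(6570.0, 0.0, 3),
}

for name, rate in options.items():
    saved = discount_vs_on_demand(rate, ON_DEMAND)
    print(f"{name}: ${rate:.2f}/h effective, {saved:.0%} below on-demand")
```

With these made-up inputs the three-year All Upfront option lands at the 75% end of the range, which mirrors the ordering described above: the more you pay upfront and the longer the term, the lower the effective hourly rate.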
How Redshift Reserved Nodes Work Internally
When you purchase a Reserved Node, AWS creates a reservation record in your account. When you launch or modify a Redshift cluster, AWS automatically applies any eligible reservation to the running nodes. The discount is applied at the billing level — the hourly cost of the node is reduced to the reserved rate. If you have more nodes running than reserved, the extra nodes are billed at on-demand rates.
Reservations are scoped to a specific node type and size. For example, if you reserve a dc2.large node, it only applies to dc2.large nodes. You cannot apply it to dc2.8xlarge or ra3.xlplus. However, you can reserve a dc2.8xlarge and use it for any dc2.8xlarge node in your account.
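The billing rule above (reserved rate for nodes covered by a reservation, on-demand for the rest) can be sketched as a small function. The rates are hypothetical, and the model assumes that reservations are charged for the whole term whether or not a matching node is running.

```python
def split_node_billing(running_nodes: int, reserved_nodes: int,
                       reserved_hourly: float, on_demand_hourly: float) -> dict:
    """Split one hour of usage for a single node type between reserved and on-demand."""
    if running_nodes < 0 or reserved_nodes < 0:
        raise ValueError("node counts must be non-negative")
    covered = min(running_nodes, reserved_nodes)   # nodes billed at the reserved rate
    overflow = running_nodes - covered             # nodes billed at on-demand
    return {
        "reserved_billed": covered,
        "on_demand_billed": overflow,
        "unused_reservations": reserved_nodes - covered,
        # Assumption: every reservation is paid for, used or not.
        "hourly_cost": reserved_nodes * reserved_hourly + overflow * on_demand_hourly,
    }


# 7 running dc2.large nodes against 5 reservations, with hypothetical rates.
print(split_node_billing(7, 5, reserved_hourly=0.10, on_demand_hourly=0.25))
```

Running fewer nodes than you have reserved shows up as `unused_reservations`, which is the case to watch for when a cluster is downsized mid-term.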
Concurrency Scaling: What It Is and Why It Exists
Amazon Redshift Concurrency Scaling is a feature that automatically adds temporary compute capacity to handle sudden increases in concurrent queries. Without it, when the number of concurrent queries exceeds the slots available in a workload management (WLM) queue, further queries queue up and wait. (With manual WLM, the slots across all user-defined queues can total at most 50, and AWS recommends 15 or fewer.) Concurrency Scaling shortens this queue by spinning up additional clusters (called Concurrency Scaling clusters) that share the same underlying data stored in Amazon S3 (for RA3 node types) or on the local SSD (for DC2 node types, but with limitations).
How Concurrency Scaling Works Internally
When a query arrives at the Redshift cluster, the leader node evaluates whether there is available capacity. If the number of concurrent queries exceeds the cluster's concurrency limit, the leader node routes the query to a Concurrency Scaling cluster. This cluster is a temporary, separate Redshift cluster that has access to the same data (via the shared storage layer for RA3 nodes) and the same metadata (table definitions, permissions).
For RA3 nodes, data is stored in Amazon S3 and cached locally on managed storage. The Concurrency Scaling cluster can read from the same S3 data, so no data copying is required. For DC2 nodes, data is stored locally on the cluster's SSDs. Concurrency Scaling for DC2 nodes requires that the data be copied to the Concurrency Scaling cluster, which introduces latency and is only supported for read queries. Write queries are not supported for DC2 Concurrency Scaling.
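Concurrency Scaling clusters are provisioned automatically by AWS and are terminated after they are idle for 5 minutes. Usage is billed per second at the on-demand rate of the main cluster's nodes, and only for the time a Concurrency Scaling cluster is actually running queries. Each main cluster also earns up to one hour of free Concurrency Scaling credits for every 24 hours it runs, which is enough to cover most occasional spikes.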
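Because billing is per second, the charge for a burst is simple arithmetic once you know the rate. The following is a minimal sketch: the hourly rate and the free-credit allowance are inputs you would take from the current Redshift pricing page, and the numbers in the example are hypothetical.

```python
def concurrency_scaling_charge(busy_seconds: float, node_count: int,
                               hourly_rate_per_node: float,
                               free_credit_seconds: float = 0.0) -> float:
    """Estimate the charge for one Concurrency Scaling cluster.

    busy_seconds        -- seconds the cluster was billed for
    node_count          -- nodes in the Concurrency Scaling cluster
    hourly_rate_per_node -- assumed per-node hourly rate (hypothetical input)
    free_credit_seconds -- assumed free usage to subtract before billing
    """
    if busy_seconds < 0 or free_credit_seconds < 0:
        raise ValueError("durations must be non-negative")
    billable = max(0.0, busy_seconds - free_credit_seconds)
    per_second_per_node = hourly_rate_per_node / 3600
    return billable * node_count * per_second_per_node


# Two hours of usage on a 4-node cluster, one hour covered by credits,
# at a hypothetical $3.60 per node-hour.
print(round(concurrency_scaling_charge(7200, 4, 3.60, free_credit_seconds=3600), 2))
```

Keeping the rate and the credit as parameters makes it easy to rerun the estimate if pricing changes or if you want to model a worst case with no credits left.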
Key Components, Values, Defaults, and Timers
Concurrency Scaling cluster limit: The number of Concurrency Scaling clusters that can be active simultaneously is set by the max_concurrency_scaling_clusters parameter. The default is 1, and the value can be raised to 10.
Concurrency Scaling idle timeout: 5 minutes. If no queries are routed to the Concurrency Scaling cluster for 5 minutes, it is terminated.
Maximum Concurrency Scaling nodes: Each Concurrency Scaling cluster can have up to the same number of nodes as your base cluster. For example, if your base cluster has 4 nodes, a Concurrency Scaling cluster can have up to 4 nodes.
Supported node types: RA3 (ra3.xlplus, ra3.4xlarge, ra3.16xlarge) and DC2 (dc2.large, dc2.8xlarge). For DC2, only read queries are supported.
Reserved Nodes term lengths: 1 year or 3 years.
Reserved Nodes payment options: No Upfront, Partial Upfront, All Upfront.
Reserved Nodes types: a single reservation class per node type (there is no Standard/Convertible split as with EC2 Reserved Instances).
Configuration and Verification
To purchase Reserved Nodes, use the AWS Management Console, CLI, or API. For example, using the AWS CLI:
aws redshift purchase-reserved-node-offering \
  --reserved-node-offering-id <offering-id> \
  --node-count 3
To view your reservations:
aws redshift describe-reserved-nodes
Concurrency Scaling is not switched on through modify-cluster. It is configured in the cluster's parameter group: set the Concurrency Scaling mode of a WLM queue to auto (in the console, or through the wlm_json_configuration parameter), and cap the number of transient clusters with max_concurrency_scaling_clusters. Using the AWS CLI, with a hypothetical parameter group name:
aws redshift modify-cluster-parameter-group \
  --parameter-group-name my-parameter-group \
  --parameters ParameterName=max_concurrency_scaling_clusters,ParameterValue=3
To turn the feature off for a queue, set that queue's Concurrency Scaling mode back to off.
To monitor Concurrency Scaling usage, use Amazon CloudWatch metrics:
- ConcurrencyScalingSeconds: Number of seconds that Concurrency Scaling clusters spent actively processing queries.
- ConcurrencyScalingActiveClusters: Number of active Concurrency Scaling clusters.
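As a rough illustration of pulling one of these metrics programmatically, the sketch below builds the arguments for CloudWatch's `get_metric_statistics` call and, separately, shows how they might be passed to boto3. The cluster identifier `mycluster` and the helper names are hypothetical; the boto3 call needs the `boto3` package and AWS credentials, so it is wrapped in a function that is not invoked here.

```python
from datetime import datetime, timedelta, timezone


def scaling_metric_request(cluster_id: str, hours: int = 24,
                           period_seconds: int = 300, now=None) -> dict:
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Redshift",
        "MetricName": "ConcurrencyScalingActiveClusters",
        "Dimensions": [{"Name": "ClusterIdentifier", "Value": cluster_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": ["Maximum"],
    }


def print_active_clusters(cluster_id: str) -> None:
    """Fetch and print the metric; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the request builder works without it

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(**scaling_metric_request(cluster_id))
    for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], point["Maximum"])


request = scaling_metric_request("mycluster")  # hypothetical cluster identifier
print(request["MetricName"], request["Period"])
```

A 24-hour window at a 300-second period yields 288 data points, which stays within the per-request limit of `get_metric_statistics`.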
Interaction with Related Technologies
Reserved Nodes and Concurrency Scaling are independent but can be used together. Reserved Nodes reduce the base cost of your cluster, while Concurrency Scaling handles spikes. If you have a steady base workload with occasional spikes, you might reserve nodes for the base capacity and let Concurrency Scaling handle the spikes. However, once the free daily credits are used up, Concurrency Scaling clusters are billed at the per-second on-demand rate; Reserved Node discounts do not apply to them.
Concurrency Scaling works with Redshift Spectrum (querying data directly in S3) and Redshift ML (machine learning inference). Queries that use Spectrum or ML can also be routed to Concurrency Scaling clusters.
Cost Optimization Strategies
Reserve for steady-state workloads: If your cluster runs 24/7, purchasing Reserved Nodes (All Upfront, 3-year) can save up to 75%.
Use Concurrency Scaling for variable workloads: Instead of over-provisioning your base cluster to handle peak load, use Concurrency Scaling to absorb spikes. This reduces the number of reserved nodes needed.
Monitor Concurrency Scaling usage: If you notice frequent use of Concurrency Scaling, consider increasing your base cluster size or purchasing more reserved nodes to reduce cost (sustained Concurrency Scaling usage beyond the free credits is billed at on-demand rates, well above the effective reserved rate).
Combine with resizing: Redshift also supports elastic resize (a fast change of node count and, in many cases, node type) and classic resize (for changes that elastic resize cannot make). Use these for predictable growth, and Concurrency Scaling for unpredictable spikes.
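The trade-off between over-provisioning the base cluster and absorbing spikes with Concurrency Scaling can be estimated with a break-even calculation. This is a simplified sketch: it assumes a single Concurrency Scaling cluster the same size as the base cluster, a 730-hour month, and hypothetical hourly rates supplied by the caller.

```python
HOURS_PER_MONTH = 730  # common approximation for monthly cost estimates


def monthly_cost_with_scaling(base_nodes: int, base_hourly: float,
                              scaling_hours: float, scaling_hourly: float) -> float:
    """Base cluster running all month plus billable Concurrency Scaling hours."""
    base = base_nodes * base_hourly * HOURS_PER_MONTH
    spikes = base_nodes * scaling_hourly * scaling_hours
    return base + spikes


def monthly_cost_overprovisioned(peak_nodes: int, base_hourly: float) -> float:
    """A larger cluster sized for peak load, running all month."""
    return peak_nodes * base_hourly * HOURS_PER_MONTH


def break_even_scaling_hours(base_nodes: int, peak_nodes: int,
                             base_hourly: float, scaling_hourly: float) -> float:
    """Billable scaling hours per month at which both approaches cost the same."""
    extra_nodes_cost = (peak_nodes - base_nodes) * base_hourly * HOURS_PER_MONTH
    return extra_nodes_cost / (base_nodes * scaling_hourly)


# Hypothetical rates: $1.00/h effective reserved rate, $3.00/h for scaling usage.
hours = break_even_scaling_hours(base_nodes=10, peak_nodes=20,
                                 base_hourly=1.00, scaling_hourly=3.00)
print(f"Break-even at about {hours:.0f} billable scaling hours per month")
```

If measured Concurrency Scaling usage stays well below the break-even figure, the smaller base cluster is the cheaper choice; if it regularly exceeds it, resizing or reserving more nodes deserves a look.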
Purchase Reserved Nodes
First, determine the node type and quantity you need based on your steady-state workload. Use the AWS Management Console, CLI, or API to purchase Reserved Nodes. You must specify the offering ID (obtained from `describe-reserved-node-offerings`) and the node count. AWS creates a reservation that applies to any eligible cluster in your account. The discount is applied at billing time. You cannot cancel a reservation, and Redshift Reserved Nodes cannot be resold on the EC2 Reserved Instance Marketplace, so size the purchase to your steady-state needs.
Enable Concurrency Scaling
Concurrency Scaling is configured through workload management (WLM) in the cluster's parameter group, for RA3 and DC2 clusters alike. Set the Concurrency Scaling mode of each WLM queue that should use it to `auto`; queues left at `off` never route queries to Concurrency Scaling clusters. You can also set the maximum number of Concurrency Scaling clusters via the `max_concurrency_scaling_clusters` parameter (default 1, maximum 10). Some parameter changes take effect only after the cluster is rebooted.
Query Arrival and Capacity Check
When a query arrives at the leader node, it checks the current number of concurrent queries. If the count is below the concurrency limit of the WLM queue the query belongs to, the query is executed on the base cluster. If the limit is reached, the leader node evaluates whether to route the query to a Concurrency Scaling cluster. The decision is based on the query type (read vs write) and the node type (RA3 supports both; DC2 supports only reads).
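The capacity check described above can be modeled as a small decision function. This is a deliberately simplified sketch of the routing logic as this chapter describes it, not Redshift's actual scheduler; the parameter names and return values are our own.

```python
def route_query(running_queries: int, queue_slots: int, is_write: bool,
                node_family: str, scaling_enabled: bool = True) -> str:
    """Return where a newly arrived query would run in this simplified model.

    "main"                -- a slot is free on the base cluster
    "concurrency_scaling" -- eligible to run on a Concurrency Scaling cluster
    "queue"               -- waits on the base cluster
    """
    if running_queries < queue_slots:
        return "main"
    if not scaling_enabled:
        return "queue"
    # Per the text: write queries are offloaded only on RA3 node types.
    if is_write and node_family.lower() != "ra3":
        return "queue"
    return "concurrency_scaling"


print(route_query(3, 5, is_write=False, node_family="ra3"))   # slot free
print(route_query(5, 5, is_write=True, node_family="dc2"))    # write on DC2
print(route_query(5, 5, is_write=True, node_family="ra3"))    # write on RA3
```

The three calls cover the cases the exam tends to probe: spare capacity, a DC2 write that must wait, and an RA3 write that can be offloaded.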
Provision Concurrency Scaling Cluster
If a Concurrency Scaling cluster is needed, AWS automatically provisions a new cluster with the same node type and number of nodes as the base cluster. The cluster is launched in the same VPC and subnet, and it attaches to the same shared storage (for RA3). The provisioning takes approximately 1-2 minutes. During this time, the query waits in a queue. Once the cluster is ready, the query is routed to it.
Execute Query on Concurrency Scaling Cluster
The query executes on the Concurrency Scaling cluster. The cluster has its own cache, but it reads from the same underlying data (S3 for RA3). Write queries on RA3 are written to the shared storage and are immediately visible to the base cluster. For DC2, only read queries are supported; write queries are always queued on the base cluster.
Terminate Idle Concurrency Scaling Cluster
After the query completes, the Concurrency Scaling cluster remains available for 5 minutes (the idle timeout). If no new queries are routed to it within that window, the cluster is automatically terminated. Charges accrue per second only while the cluster is running queries, with a one-minute minimum each time a cluster is activated, and billing stops at termination.
Enterprise Scenario 1: E-commerce Analytics Platform
A large e-commerce company runs a Redshift cluster with 10 ra3.4xlarge nodes to support its BI dashboards. The workload is fairly steady during business hours, but during flash sales, the number of concurrent queries triples. Previously, they over-provisioned to 20 nodes to handle the spikes, but this was costly. They implemented Concurrency Scaling: they kept 10 reserved nodes (3-year All Upfront) for the base load, and enabled Concurrency Scaling for the spikes. During a flash sale, Concurrency Scaling automatically spun up additional clusters (up to 10 nodes each) to handle the extra queries. Usage beyond the daily free credits was billed at the on-demand rate, but since Concurrency Scaling was only active for a few hours per month, the overall savings were significant. They also set up CloudWatch alarms to notify them if Concurrency Scaling was active for more than 2 hours, indicating a need to resize the base cluster.
Enterprise Scenario 2: Financial Services with Strict SLAs
A financial services firm uses Redshift for real-time fraud detection. Their workload has predictable peaks at the end of each trading day. They need to ensure that no query waits longer than 1 second. They reserved 8 dc2.8xlarge nodes for the base capacity (3-year Partial Upfront) and enabled Concurrency Scaling. However, they discovered that Concurrency Scaling for DC2 nodes only supports read queries. Their fraud detection pipeline includes write queries (inserting new transactions). They had to redesign their pipeline to separate writes and reads. Writes were queued on the base cluster, and reads were routed to Concurrency Scaling. This added complexity but met the SLA. They also learned that Concurrency Scaling clusters for DC2 require data to be copied from the base cluster's local SSDs, which added latency. They eventually migrated to RA3 nodes to avoid this issue.
Common Pitfalls
Misunderstanding Concurrency Scaling for DC2: Many candidates assume Concurrency Scaling works the same for all node types. In reality, DC2 only supports read queries, and the performance benefit is limited due to data copying.
Over-relying on Concurrency Scaling: If Concurrency Scaling is active for more than 10% of the time, it may be cheaper to resize the base cluster or purchase more reserved nodes.
Ignoring the per-activation minimum: Each time a Concurrency Scaling cluster is activated, it incurs at least one minute of usage even if it processes only one short query. Spiky workloads with many short bursts can use up the free credits and accumulate significant cost.
What SAA-C03 Tests on This Topic
The SAA-C03 exam tests your ability to:
Choose between Reserved Nodes and On-Demand pricing based on workload predictability (Objective 4.4).
Identify scenarios where Concurrency Scaling is appropriate versus other scaling methods like elastic resize.
Understand the cost implications of Concurrency Scaling (free daily credits, then per-second on-demand billing).
Recognize that Reserved Nodes apply to a node type in a region, not to specific clusters.
Know that Concurrency Scaling is turned on per WLM queue and that RA3 clusters support it for both reads and writes.
Common Wrong Answers and Why Candidates Choose Them
"Concurrency Scaling is free": Candidates think it's a built-in feature with no extra cost. Reality: Each cluster earns up to one hour of free credits per day, and usage beyond that is billed per second at the on-demand rate.
"Reserved Nodes apply to any node type": Candidates assume a reservation for dc2.large can be used for dc2.8xlarge. Reality: Reservations are node-type specific (size matters).
"Concurrency Scaling works for DC2 write queries": Candidates assume it works like RA3. Reality: DC2 supports only read queries.
"Concurrency Scaling clusters persist indefinitely": Candidates think they stay until manually deleted. Reality: They auto-terminate after 5 minutes of idle.
Specific Numbers and Terms on the Exam
"75% discount" for 3-year All Upfront Reserved Nodes.
"Concurrency Scaling credits": one free hour per 24 hours of main-cluster runtime, then per-second on-demand billing.
"5-minute idle timeout" for Concurrency Scaling clusters.
"50 slots" as the maximum total concurrency across manual WLM queues, with 15 or fewer recommended.
"RA3" and "DC2" node families — know the difference.
Edge Cases and Exceptions
Reserved Nodes in multiple accounts: If you have multiple AWS accounts under an organization, reservations are per-account unless you use consolidated billing. With consolidated billing, all accounts in the organization can share reservations.
Concurrency Scaling and maintenance: Concurrency Scaling clusters do not participate in maintenance windows (e.g., version upgrades). They use the same software version as the base cluster at the time of provisioning.
Concurrency Scaling and Redshift Spectrum: Queries that use Spectrum can be routed to Concurrency Scaling clusters, but the Spectrum portion is billed separately.
How to Eliminate Wrong Answers
If the question mentions "cost savings for a steady workload," look for Reserved Nodes.
If the question mentions "unpredictable spikes" and "automatic scaling," look for Concurrency Scaling.
If the question mentions "write queries" and "DC2," eliminate Concurrency Scaling as an option.
If the question mentions "maximum discount," look for "All Upfront" and "3-year."
Reserved Nodes offer up to 75% discount for 3-year All Upfront commitment.
Reserved Nodes are scoped to a specific node type and size (e.g., dc2.large, not dc2.8xlarge).
Concurrency Scaling automatically adds temporary clusters to handle query spikes.
Concurrency Scaling is enabled per WLM queue; on RA3 nodes it supports both reads and writes.
Concurrency Scaling on DC2 nodes only supports read queries; writes are not offloaded.
Concurrency Scaling clusters auto-terminate after 5 minutes of idle time.
Concurrency Scaling billing is per-second at the on-demand rate, after one free hour of credits per day.
Reserved Nodes do not cover Concurrency Scaling usage.
These come up on the exam all the time. Here's how to tell them apart.
Reserved Nodes
Requires a 1- or 3-year commitment.
Provides up to 75% discount vs on-demand.
Applies to the base cluster nodes only.
Best for steady-state, predictable workloads.
Cannot be used for short-term spikes.
Concurrency Scaling
No upfront commitment; pay per second of usage.
Billed at the on-demand rate per node once the free daily credits are used.
Applies to temporary clusters spun up automatically.
Best for unpredictable, short-lived spikes.
Automatically scales out and in based on demand.
Mistake
Reserved Nodes lock you into a specific cluster ID.
Correct
Reserved Nodes are applied at the account/region level to any cluster matching the node type and size. You can change clusters freely; the discount follows the node type.
Mistake
Concurrency Scaling is enabled by default on every cluster.
Correct
Concurrency Scaling takes effect only for WLM queues whose Concurrency Scaling mode is set to auto. It is available on RA3 and DC2 node types, but on DC2 it only supports read queries.
Mistake
Concurrency Scaling is billed for every second it is used.
Correct
Each main cluster earns up to one hour of free Concurrency Scaling credits per day. Only usage beyond those credits is billed, per second, at the on-demand rate.
Mistake
You can apply Reserved Nodes to Concurrency Scaling clusters.
Correct
Reserved Nodes only apply to the base cluster nodes. Concurrency Scaling clusters are billed separately at the Concurrency Scaling rate.
Mistake
Concurrency Scaling clusters persist until you manually delete them.
Correct
Concurrency Scaling clusters are automatically terminated after 5 minutes of idle time. There is no manual deletion required.
No. Reserved Nodes only apply to the base cluster. Concurrency Scaling clusters are billed separately: usage beyond the free daily credits is charged per second at the on-demand rate. If you want to reduce Concurrency Scaling costs, consider increasing your base cluster size or purchasing more reserved nodes to reduce the need for Concurrency Scaling.
5 minutes. After a Concurrency Scaling cluster has no queries routed to it for 5 minutes, it is automatically terminated. You are billed per second only while the cluster is running queries, with a one-minute minimum each time a cluster is activated. This means even a single query that takes 1 second counts as one minute of usage.
Yes. Queries that use Redshift Spectrum (querying data directly in S3) can be routed to Concurrency Scaling clusters. However, the Spectrum portion (scanning data in S3) is billed separately based on the amount of data scanned. Concurrency Scaling only covers the compute cost.
Yes. You can disable Concurrency Scaling by modifying the WLM configuration in the cluster's parameter group and setting the Concurrency Scaling mode of each queue to `off`. This is useful if you want to avoid any additional costs or if your workload is predictable and you prefer to manage scaling manually via elastic resize.
The reserved discount is applied to the first N nodes that match the reservation, where N is the number of reserved nodes. Any additional nodes are billed at the on-demand rate. For example, if you have 5 reserved dc2.large nodes but run 7 dc2.large nodes, 5 are billed at the reserved rate and 2 at the on-demand rate.
Yes. Reserved Nodes are purchased for a specific AWS region. They apply to any eligible cluster in that region. If you have clusters in multiple regions, you need separate reservations for each region.
No. The Reserved Instance Marketplace is an EC2 feature; Redshift Reserved Nodes cannot be sold there, and a reservation cannot be cancelled once purchased. If you no longer need the capacity, you keep paying for the remainder of the term, so commit only to capacity you expect to run throughout.