
CloudFront Origin Groups and Failover

This chapter covers CloudFront Origin Groups and Failover, a critical feature for building resilient content delivery architectures. For the SAA-C03 exam, understanding origin failover is essential for designing highly available and fault-tolerant applications that meet AWS Well-Architected Framework reliability principles. Approximately 5-10% of exam questions touch on CloudFront features, and origin groups are a common topic in scenario-based questions involving disaster recovery and multi-region failover. By the end of this chapter, you will be able to configure origin groups, understand failover triggers, and recognize exam traps.

Updated May 31, 2026

Aircraft Backup Engines for CloudFront Failover

Imagine a CloudFront origin group as a twin-engine aircraft. The primary engine (the primary origin) is the main power source; if it fails mid-flight, the backup engine (the secondary origin) takes over to keep the plane airborne. The pilot (CloudFront) reads the instruments (HTTP status codes and connection results from the origin) on every request rather than running separate health checks. When the primary engine sputters (returns a status code configured for failover, or cannot be reached), the pilot engages the backup engine, and passengers (users) notice little more than a slight vibration (added latency). The backup engine is always ready but is used only when needed. The switch is not instantaneous: an error status code triggers it immediately, but an unreachable origin is detected only after the connection attempts time out. The backup engine must also be equivalent in capability (same content); otherwise the plane flies lopsided (inconsistent responses). In practice that means keeping the secondary origin in sync with the primary, typically by replicating an S3 bucket to a second Region or deploying the same application behind a second load balancer. If the backup engine also fails, the plane glides (users see errors), so both origins need to be healthy and hold the same data.

How It Actually Works

What Are CloudFront Origin Groups and Why Do They Exist?

CloudFront Origin Groups let you pair a primary origin with a secondary origin. When the primary fails (it returns a status code you configured for failover, the connection fails, or the request times out), CloudFront automatically retries the request against the secondary origin. This provides high availability for your content without requiring DNS-level failover or custom health checks. The exam tests your ability to identify when to use origin groups versus other failover mechanisms such as Route 53 health checks.

How It Works Internally

When a viewer requests content, CloudFront first checks its edge cache. On a cache miss, CloudFront forwards the request to the primary origin, which is the first origin listed in the origin group. If the primary returns a status code listed in the group's failover criteria (commonly 500, 502, 503, or 504), or if the connection fails or times out, CloudFront treats the attempt as a failure and immediately retries the request against the secondary origin. This failover happens at the edge location level: each edge decides independently, based on its own interactions with the origins.

Key Components, Values, Defaults, and Timers

Origin Group: A logical pairing of two origins that are already defined in the distribution: one primary and one secondary. A distribution can contain several origin groups, and each cache behavior targets either a single origin or a single origin group, so different path patterns can use different groups.

Primary Origin: The first origin listed in the group. CloudFront sends all requests to this origin unless it fails.

Secondary Origin: The second origin listed in the group. CloudFront sends a request to it only when the attempt against the primary fails.

Failover Criteria: CloudFront considers a failure when:

The origin returns a status code listed in the group's failover criteria (commonly 500, 502, 503, 504).

The origin is unreachable (network connectivity issues).

The origin returns an invalid response (e.g., malformed headers).

Failover Behavior: CloudFront does NOT remember that the primary failed. Each request is evaluated independently: every cache miss is sent to the primary first and goes to the secondary only if that attempt fails. Failover applies only to GET, HEAD, and OPTIONS requests; other methods such as POST are never retried against the secondary. There is no sticky state and no health check, so CloudFront does not maintain a global health status for origins.

Timeouts: The default origin response timeout is 30 seconds. If the origin does not respond within that time, CloudFront treats the attempt as failed and may fail over. You can configure the origin response timeout (1-60 seconds) and the keep-alive timeout (1-60 seconds). For an unreachable origin, failover speed is governed by two further origin settings: connection attempts (1-3, default 3) and connection timeout (1-10 seconds, default 10).

Retry Attempts: CloudFront will retry the request to the secondary origin only once per request. If the secondary origin also fails, CloudFront returns an error to the viewer.
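The effect of the connection settings on failover speed is simple arithmetic, sketched below. The function name is hypothetical, and the defaults (3 attempts, 10 seconds each) are assumed to be CloudFront's standard origin connection settings rather than values taken from this chapter. The result is an upper bound for an unreachable primary; an origin that answers quickly with an error status code fails over without this delay.

```python
def worst_case_connect_delay(attempts: int = 3, connect_timeout_s: int = 10) -> int:
    """Upper bound, in seconds, before an unreachable primary is abandoned.

    Assumes CloudFront makes `attempts` connection attempts and waits up to
    `connect_timeout_s` seconds for each one before trying the secondary.
    """
    if not 1 <= attempts <= 3:
        raise ValueError("attempts must be between 1 and 3")
    if not 1 <= connect_timeout_s <= 10:
        raise ValueError("connect_timeout_s must be between 1 and 10")
    return attempts * connect_timeout_s


print(worst_case_connect_delay())      # 30 (defaults)
print(worst_case_connect_delay(2, 3))  # 6  (tuned for faster failover)
```

Lowering both values shortens the wait before failover, at the cost of giving a briefly unresponsive primary less time to answer.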

Configuration and Verification

You configure origin groups in the CloudFront console under the Origins tab. When creating or editing a distribution, you first define the two origins and then create an origin group that references them and lists the failover status codes. The two origins do not have to be of the same type; for example, a custom origin can be paired with an S3 bucket.

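Example of adding an origin group with the AWS CLI. The update requires the distribution's current ETag, which aws cloudfront get-distribution-config returns; E2EXAMPLEETAG below is a placeholder:

aws cloudfront update-distribution --id EDFDVBD6EXAMPLE --if-match E2EXAMPLEETAG --distribution-config file://config.json

Where config.json holds the complete distribution configuration; the parts relevant to the origin group are: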

{
  "Origins": {
    "Quantity": 2,
    "Items": [
      {
        "Id": "primary-origin",
        "DomainName": "primary.example.com",
        "CustomOriginConfig": {
          "HTTPPort": 80,
          "HTTPSPort": 443,
          "OriginProtocolPolicy": "https-only"
        }
      },
      {
        "Id": "secondary-origin",
        "DomainName": "secondary.example.com",
        "CustomOriginConfig": {
          "HTTPPort": 80,
          "HTTPSPort": 443,
          "OriginProtocolPolicy": "https-only"
        }
      }
    ]
  },
  "OriginGroups": {
    "Quantity": 1,
    "Items": [
      {
        "Id": "my-origin-group",
        "FailoverCriteria": {
          "StatusCodes": {
            "Quantity": 4,
            "Items": [500, 502, 503, 504]
          }
        },
        "Members": {
          "Quantity": 2,
          "Items": [
            {
              "OriginId": "primary-origin"
            },
            {
              "OriginId": "secondary-origin"
            }
          ]
        }
      }
    ]
  },
  "DefaultCacheBehavior": {
    "TargetOriginId": "my-origin-group",
    ...
  }
}
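Before submitting a configuration like this, it can help to check the origin group section locally. The following is a minimal sketch, not an official validator: validate_origin_groups is a hypothetical helper, and it checks only three things (two members per group, member IDs that exist under Origins, and a StatusCodes quantity that matches its item list). The sample dictionary is a trimmed version of the excerpt above.

```python
def validate_origin_groups(config: dict) -> list:
    """Return a list of problems found in the OriginGroups section."""
    problems = []
    origin_ids = {o["Id"] for o in config.get("Origins", {}).get("Items", [])}
    for group in config.get("OriginGroups", {}).get("Items", []):
        gid = group.get("Id", "<missing Id>")
        members = group.get("Members", {})
        ids = [m["OriginId"] for m in members.get("Items", [])]
        # An origin group pairs exactly one primary with one secondary.
        if members.get("Quantity") != 2 or len(ids) != 2:
            problems.append(f"{gid}: an origin group needs exactly 2 members")
        # Every member must reference an origin defined in the distribution.
        problems += [f"{gid}: unknown origin '{oid}'" for oid in ids if oid not in origin_ids]
        codes = group.get("FailoverCriteria", {}).get("StatusCodes", {})
        if codes.get("Quantity") != len(codes.get("Items", [])):
            problems.append(f"{gid}: StatusCodes Quantity does not match Items")
    return problems


config = {
    "Origins": {"Quantity": 2, "Items": [{"Id": "primary-origin"}, {"Id": "secondary-origin"}]},
    "OriginGroups": {"Quantity": 1, "Items": [{
        "Id": "my-origin-group",
        "FailoverCriteria": {"StatusCodes": {"Quantity": 4, "Items": [500, 502, 503, 504]}},
        "Members": {"Quantity": 2, "Items": [
            {"OriginId": "primary-origin"}, {"OriginId": "secondary-origin"}]},
    }]},
}
print(validate_origin_groups(config))  # []
```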

To verify failover, trigger a failure by making the primary origin return a 5xx error (e.g., by configuring a test endpoint). Then monitor CloudFront logs, or request the object with curl and confirm that the response came from the secondary origin, for example by having each origin return a distinguishing response header.

How It Interacts with Related Technologies

Route 53: Origin groups provide failover at the CDN level, while Route 53 provides DNS-level failover. They can be used together: an origin's domain name can be a Route 53 record that uses failover routing, but that adds complexity. Typically, you would use origin groups for simple failover between two origins (e.g., two S3 buckets in different regions).

S3 Cross-Region Replication (CRR): Often used with origin groups. You replicate an S3 bucket to another region and set up an origin group with the primary bucket in one region and the secondary in another. If the primary bucket fails, CloudFront fails over to the replica.

Lambda@Edge: Can be used to customize failover behavior, such as rewriting requests or checking origin health before forwarding.

WAF: Web ACLs are applied at the distribution level, not per origin; they affect all origins in the group equally.

Important Exam Considerations

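Origin groups fail over only on the status codes you select in the failover criteria and on connection failures or timeouts. The usual selection is 500, 502, 503, and 504, but 4xx codes such as 403 and 404 can be added. A 4xx code that is not selected (e.g., 404 Not Found with the usual 5xx-only selection) is returned to the viewer without failover. This is a common exam trap.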

The secondary origin must serve the same content as the primary. If they are different, users may see inconsistent data after failover.

Failover is per-request, not per-session. Each request independently may go to primary or secondary based on the current state. This means that if the primary is intermittently failing, requests may bounce between origins.

A distribution can contain multiple origin groups (10 by default), and each cache behavior targets a single origin or a single origin group, so different path patterns can fail over between different origin pairs. However, the exam often simplifies this.

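Origin groups can be combined with Origin Shield (a feature that routes requests from edge locations through a regional caching layer before they reach the origin). Origin Shield is enabled per origin, so requests to either member of a group pass through that origin's Origin Shield Region if it is enabled.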

Step-by-Step Internal Mechanism

1. A viewer requests content from a CloudFront edge location.

2. The edge checks its cache. On a miss, it selects the origin or origin group targeted by the matching cache behavior.

3. The edge sends the request to the primary origin (first in the group).

4. The primary origin responds. If the status code is not in the group's failover criteria (for example, 2xx or 3xx), CloudFront serves the response and caches it according to the cache settings. If the status code is in the failover criteria, or the connection fails or times out, CloudFront treats the attempt as a failure.

5. For GET, HEAD, and OPTIONS requests, CloudFront immediately sends the same request to the secondary origin (next in group). It does not wait for the primary to recover.

6. The secondary origin responds. If successful, CloudFront caches and serves that response. If the secondary also fails, CloudFront returns an error to the viewer.

7. When the secondary succeeds, the primary's failed response is discarded rather than cached. Error responses that do not trigger failover (for example, a 404 that is not in the failover criteria) are returned to the viewer as-is.
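The decision made on a cache miss can be sketched as a small function. This is an illustrative model only, not CloudFront's implementation: handle_cache_miss and the toy origin callables are hypothetical, an origin is modeled as a function returning a status code (or None when unreachable), and the restriction to GET, HEAD, and OPTIONS reflects documented CloudFront behavior.

```python
from typing import Callable, Optional, Tuple

FAILOVER_CODES = frozenset({500, 502, 503, 504})   # a common selection
FAILOVER_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})

# A toy origin: takes a path, returns a status code or None if unreachable.
Origin = Callable[[str], Optional[int]]


def handle_cache_miss(method: str, path: str, primary: Origin, secondary: Origin,
                      failover_codes=FAILOVER_CODES) -> Tuple[str, Optional[int]]:
    """Return (origin that produced the final answer, its status code)."""
    status = primary(path)
    failed = status is None or status in failover_codes
    if not failed or method not in FAILOVER_METHODS:
        return ("primary", status)          # no failover for this request
    return ("secondary", secondary(path))   # single retry, no further attempts


def primary_down(path): return 503
def primary_404(path): return 404
def secondary_ok(path): return 200

print(handle_cache_miss("GET", "/a", primary_down, secondary_ok))   # ('secondary', 200)
print(handle_cache_miss("POST", "/a", primary_down, secondary_ok))  # ('primary', 503)
print(handle_cache_miss("GET", "/a", primary_404, secondary_ok))    # ('primary', 404)
print(handle_cache_miss("GET", "/a", primary_404, secondary_ok,
                        failover_codes=FAILOVER_CODES | {404}))     # ('secondary', 200)
```

The last two calls show that a 404 is passed through unless 404 has been added to the failover criteria. The function keeps no state between calls, which mirrors the per-request nature of failover.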

Default Values and Limits

Maximum origins per distribution: 25 (soft limit, can be increased).

Maximum origin groups per distribution: 10 (soft limit).

Origins per origin group: exactly 2 (one primary, one secondary).

Failover status codes: commonly 500, 502, 503, 504. You choose the codes from a fixed list that also includes some 4xx codes such as 403 and 404.

Origin response timeout: default 30 seconds, range 1-60 seconds.

Origin keep-alive timeout: default 5 seconds, range 1-60 seconds.

CloudFront does not support health checks for origins; it relies on actual request responses.

Common Exam Scenarios

Multi-region S3 failover: You have an S3 bucket in us-east-1 as primary and one in eu-west-2 as secondary. You enable CRR to keep them in sync. If us-east-1 becomes unavailable, CloudFront fails over to eu-west-2.

Application failover: You have an ALB in us-east-1 and another in us-west-2. You create an origin group with both ALBs. If the primary ALB returns 5xx, CloudFront sends requests to the secondary.

Static website hosting: You host a static site on S3 with two buckets in different regions. Origin groups handle regional outages.
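For scenarios like these, the origin group entry has the same shape regardless of what the two origins are. The sketch below builds that entry as a Python dictionary matching the OriginGroups structure used by the CloudFront API. build_origin_group and the origin IDs are hypothetical names; the origins themselves would still have to be defined under Origins in the same distribution configuration.

```python
import json


def build_origin_group(group_id, primary_id, secondary_id,
                       status_codes=(500, 502, 503, 504)):
    """Return one entry for OriginGroups.Items in a distribution config."""
    codes = sorted(set(status_codes))
    return {
        "Id": group_id,
        "FailoverCriteria": {"StatusCodes": {"Quantity": len(codes), "Items": codes}},
        "Members": {
            "Quantity": 2,  # always one primary and one secondary
            "Items": [{"OriginId": primary_id}, {"OriginId": secondary_id}],
        },
    }


# Multi-region S3 failover that also retries missing objects (403/404).
group = build_origin_group("s3-failover", "bucket-us-east-1", "bucket-eu-west-2",
                           status_codes=(500, 502, 503, 504, 403, 404))
origin_groups = {"Quantity": 1, "Items": [group]}
print(json.dumps(origin_groups, indent=2))
```

A cache behavior then refers to the group by setting its TargetOriginId to the group's Id ("s3-failover" in this example).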

Exam Traps

Trap: Origin groups fail over on any error. Reality: Only on the status codes in the failover criteria and on connection errors. A 4xx such as 404 triggers failover only if you add it to the criteria.

Trap: You must configure health checks for origin groups. Reality: CloudFront does not support health checks; it uses actual request responses.

Trap: Failover is global and consistent. Reality: Each edge makes independent failover decisions. A request to a different edge may hit the primary even if another edge failed over.

Trap: Both origins in a group must be the same type. Reality: A group can pair different origin types, for example a custom origin (ALB) with an S3 bucket.

Trap: Origin groups and Origin Shield cannot be used together. Reality: They are compatible; Origin Shield is enabled per origin.

Configuration Best Practices

Ensure secondary origin has identical content (use CRR for S3, or deploy same application in secondary region).

Set appropriate timeouts: if your application is slow, increase the origin response timeout to avoid unnecessary failover.

Monitor CloudFront logs to detect failover events. Use CloudWatch metrics like OriginLatency and 5xxErrorRate.

Test failover by intentionally making the primary origin fail, e.g., by pointing it at a test endpoint that returns 5xx, or by tightening security groups to block CloudFront so that connections fail.

Summary of Key Exam Points

Origin groups provide automatic failover on configured status codes (commonly 5xx) and network errors.

Secondary origin must have same content.

No health checks; failover is reactive.

Customizable status codes.

Exactly two origins per group; origin types can be mixed (for example, an ALB primary with an S3 secondary).

Compatible with Origin Shield.

Each edge makes independent failover decisions.

Walk-Through

1

Viewer Requests Content

A viewer sends an HTTP request to a CloudFront distribution URL. The request is routed to the nearest edge location based on DNS resolution. At the edge, CloudFront checks its cache for the requested object. If the object is cached and not expired, CloudFront returns it directly without contacting the origin. If the object is not cached, CloudFront must fetch it from the origin. The edge determines which origin group to use based on the cache behavior's target origin ID (which points to an origin group). The edge then selects the primary origin from the group.

2

Edge Sends Request to Primary Origin

CloudFront forwards the HTTP request to the primary origin's domain name. The edge uses the origin's protocol policy (HTTP or HTTPS) and port, and adds any custom headers configured for that origin. The edge then waits for a response from the primary origin, subject to the origin response timeout (default 30 seconds). If the origin does not respond within this timeout, CloudFront treats the attempt as a failure (response timeout). During this wait, CloudFront does not contact the secondary origin for this request.

3

Primary Origin Responds (Success or Failure)

The primary origin sends an HTTP response. CloudFront examines the status code. If the status code is not one configured for failover (for example, 2xx or 3xx), CloudFront considers the attempt a success. It caches the response according to the cache policy and returns it to the viewer. If the status code is in the failover criteria (commonly 500, 502, 503, 504) or if there is a network error (connection refused, TLS handshake failure, timeout), CloudFront marks this as a failure. Note: 4xx errors (e.g., 404) trigger failover only if you add them to the origin group's failover status codes; otherwise they are returned to the viewer without failover.

4

Failover to Secondary Origin

Upon detecting a failure (a configured status code or a network error), CloudFront immediately sends the same request to the secondary origin. It does not wait for the primary to recover. The secondary origin is the second origin in the group's member list, and CloudFront contacts it using that origin's own protocol policy and custom headers. The secondary origin processes the request and returns a response. If the secondary origin returns a successful response (2xx/3xx), CloudFront caches that response and serves it to the viewer. If the secondary origin also fails, CloudFront returns an error to the viewer: the secondary's error response, or a 502 or 504 generated by CloudFront if the secondary cannot be reached or times out.

5

Edge Caches and Serves Response

Once CloudFront receives a successful response from either the primary or secondary origin, it caches the object at the edge location according to the cache behavior settings (TTL, cache policy). The response is then sent to the viewer. If the failover occurred, the cached object is associated with the origin group, not a specific origin. Subsequent requests for the same object from this edge will be served from cache, regardless of which origin originally provided it. However, if the object is not cached (e.g., dynamic content), each request may again go through the failover process.
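The interaction between caching and failover can be illustrated with a toy edge. This is a simplified sketch under stated assumptions: the Edge class is hypothetical, origins are plain functions returning (status, body), TTL expiry is ignored, and only 200 responses are cached.

```python
class Edge:
    """Toy edge location: cache first, then primary, then secondary."""

    def __init__(self, primary, secondary,
                 failover_codes=frozenset({500, 502, 503, 504})):
        self.primary, self.secondary = primary, secondary
        self.failover_codes = failover_codes
        self.cache = {}

    def get(self, path):
        if path in self.cache:
            return "cache", self.cache[path]
        source, (status, body) = "primary", self.primary(path)
        if status is None or status in self.failover_codes:
            source, (status, body) = "secondary", self.secondary(path)
        if status == 200:
            self.cache[path] = body  # cached per object, not per origin
        return source, body


calls = []

def primary(path):
    calls.append("primary")
    return 503, None

def secondary(path):
    calls.append("secondary")
    return 200, "video-bytes"

edge = Edge(primary, secondary)
first = edge.get("/video.mp4")    # primary fails, secondary answers
second = edge.get("/video.mp4")   # served from cache, no origin contacted
print(first, second, calls)
# ('secondary', 'video-bytes') ('cache', 'video-bytes') ['primary', 'secondary']
```

The second request never reaches either origin, which is why cacheable content is largely shielded from a failing primary while uncacheable content pays the failover cost on every request.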

What This Looks Like on the Job

In production, CloudFront Origin Groups are commonly used for disaster recovery across AWS regions. For example, a media streaming company hosts its video assets in an S3 bucket in us-east-1 (primary) and replicates them to a bucket in eu-west-1 (secondary) using S3 Cross-Region Replication. They configure a CloudFront distribution with an origin group containing both buckets. During normal operation, all requests go to us-east-1. If us-east-1 experiences a regional outage, CloudFront automatically fails over to eu-west-1. Users experience minimal interruption, only a slight latency increase due to the secondary region being farther from some viewers. The company must ensure that replication lag is acceptable (S3 Replication Time Control is designed to replicate most objects within 15 minutes) and that the secondary bucket has the same permissions and configuration (e.g., a bucket policy that allows CloudFront access, CORS). A common misconfiguration is assuming that enabling CRR copies existing objects: replication applies only to objects written after it is enabled, so older objects must be copied separately (for example with S3 Batch Replication) or they will be missing on failover.

Another scenario is an e-commerce platform with an application running on EC2 behind an Application Load Balancer (ALB) in two regions. They create an origin group with both ALBs. The primary ALB is in us-west-2, the secondary in ap-southeast-1. During a deployment that causes the primary ALB to return 5xx errors, CloudFront fails over to the secondary. However, because the failover is per-request, if the primary recovers intermittently, requests may bounce between regions, causing inconsistent user experience (e.g., cart data stored in one region may not be available in the other). ALB sticky sessions do not solve this, because they only pin a client to a target behind one load balancer, so the team keeps the application stateless and stores session data in a globally replicated database such as DynamoDB global tables.

A third scenario is a global software download site using CloudFront to distribute binary files. They use an origin group with two S3 buckets in different regions, and they also want to fail over when a file is missing from the primary (e.g., accidentally deleted). Because S3 answers such requests with 404 (or 403 when the caller lacks s3:ListBucket permission), they add 403 and 404 to the origin group's failover status codes alongside the 5xx codes, so no custom Lambda@Edge logic is needed. Performance-wise, failover adds latency because the failed attempt against the primary must complete first. To minimize the impact, they lower the connection attempts and connection timeout on the primary origin so that an unreachable bucket is abandoned quickly.

How SAA-C03 Actually Tests This

The SAA-C03 exam tests CloudFront Origin Groups under Objective 2.5: 'Design resilient architectures.' Specifically, you need to know when to use origin groups versus other HA mechanisms like Route 53 failover, Multi-AZ, or AWS Global Accelerator. The exam presents scenario-based questions where you must choose the most cost-effective and efficient solution.

Common wrong answers:

1. 'Use Route 53 failover routing with health checks.' While this works, it is slower (it waits on health-check detection and DNS TTL expiry) and less seamless than CloudFront origin groups. The exam favors origin groups for CDN-based failover.

2. 'Configure CloudFront with multiple origins but no origin group.' Without an origin group, each cache behavior routes to exactly one origin and CloudFront never fails over automatically. This is a trap: candidates think multiple origins automatically provide HA.

3. 'Set up a custom error page to redirect to another origin.' A custom error page only replaces the error response shown to the viewer; it does not retry the request against a second origin. Origin groups handle that automatically.

4. 'Use Lambda@Edge to implement failover.' While possible, it is more complex than origin groups. The exam tests when to use built-in features.

Key numbers and terms:

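Failover triggers: the status codes you configure (commonly 500, 502, 503, 504) plus connection failures and timeouts.

Not triggered: any status code you did not select; 4xx codes such as 403 and 404 trigger failover only if you add them.

Customizable status codes: Yes, you choose which codes trigger failover from a fixed list.

Origins per group: exactly 2 (one primary, one secondary).

Maximum origin groups per distribution: 10.

S3 and custom origins can be combined in the same group.

Origin Shield can be enabled on origins that belong to an origin group.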

No health checks; reactive failover.

Edge cases:

If the secondary origin also fails, CloudFront returns the secondary's error response, or a 502 or 504 if the secondary cannot be reached.

If the primary origin returns a 503 (Service Unavailable) but the secondary returns 200, the viewer gets the content from secondary.

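If the primary origin is slow and would return a 200 only after 35 seconds, which is beyond the default 30-second response timeout, CloudFront gives up on that attempt and, once its attempts are exhausted, sends the request to the secondary. The primary may still finish processing the abandoned request, so the work can be duplicated.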

Failover is per-edge: a viewer in Tokyo might get content from secondary while a viewer in New York still gets from primary if the primary is only failing for Tokyo.
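The Tokyo and New York case can be modeled in a few lines. This is a deliberately small sketch: serve is a hypothetical function, and reachability is reduced to a set of (edge, origin) pairs that stands in for real network conditions.

```python
def serve(edge: str, reachable: set) -> str:
    """Return which origin answers a cache miss at `edge`."""
    for origin in ("primary", "secondary"):  # primary is always tried first
        if (edge, origin) in reachable:
            return origin
    return "error"


# The primary is unreachable from Tokyo only; there is no shared health state.
reachable = {
    ("new-york", "primary"), ("new-york", "secondary"),
    ("tokyo", "secondary"),
}
results = {edge: serve(edge, reachable) for edge in ("tokyo", "new-york")}
print(results)  # {'tokyo': 'secondary', 'new-york': 'primary'}
```

Because each edge evaluates its own requests, the two viewers can be served by different origins at the same moment, which is another reason the origins need to hold the same content.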

How to eliminate wrong answers:

If the question mentions 'automatic failover for CDN' and '5xx errors', origin group is the answer.

If the question mentions 'health checks' or 'DNS', think Route53.

If the question mentions 'multi-region active-active', consider Global Accelerator.

If the question mentions 'cost-effective' and 'simple', origin groups are preferred over Lambda@Edge or custom solutions.

Key Takeaways

CloudFront Origin Groups provide automatic failover on the configured status codes (commonly HTTP 5xx) and on network errors; 4xx codes such as 403 and 404 trigger failover only if you add them.

An origin group has exactly two origins, which can be S3 origins, custom origins, or one of each.

Failover is per-request and per-edge location; there is no global health state.

Origin groups are compatible with Origin Shield, which is enabled per origin.

The secondary origin must serve identical content to the primary to avoid inconsistencies.

Failover status codes are customizable (commonly 500, 502, 503, 504; 403 and 404 can be added).

Each origin group has exactly two origins; a distribution can have up to 10 origin groups (soft limit).

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

CloudFront Origin Groups

Works at the CDN edge; failover happens within the same request (immediately on an error status code, or after the connection attempts time out).

Reactive: fails over only on 5xx/network errors from actual requests.

No health checks required; uses real traffic.

Exactly two origins per group; they can be S3, custom, or one of each.

Best for CloudFront distributions with multiple origins.

Route53 Failover Routing

Works at DNS level, failover depends on TTL (usually 60 seconds or more).

Proactive: uses health checks to monitor origin health.

Requires health check configuration and associated costs.

Can point to any endpoint (S3, ELB, EC2, etc.) regardless of type.

Best for routing traffic to different regions or services not behind CloudFront.
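The difference in failover speed can be made concrete with a rough estimate for the Route 53 side. The sketch below is a back-of-the-envelope model, not a guarantee: route53_failover_estimate is a hypothetical helper, the defaults assume a standard health check (30-second request interval, failure threshold of 3) and a 60-second record TTL, and real behavior also depends on resolver and client caching.

```python
def route53_failover_estimate(interval_s: int = 30, failure_threshold: int = 3,
                              record_ttl_s: int = 60) -> int:
    """Rough upper estimate, in seconds, until clients resolve the standby."""
    detection = interval_s * failure_threshold  # consecutive failed checks
    return detection + record_ttl_s             # plus cached DNS answers expiring


print(route53_failover_estimate())                # 150 (standard checks)
print(route53_failover_estimate(interval_s=10))   # 90  (fast checks)
```

An origin group, by contrast, retries against the secondary within the failing request itself, so there is no detection window during which viewers keep receiving errors.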

Watch Out for These

Mistake

CloudFront Origin Groups fail over on any HTTP error, including 4xx.

Correct

Origin groups fail over only on the status codes selected in the failover criteria (commonly 500, 502, 503, 504) and on network/connection errors. Other errors are returned to the viewer as-is. 4xx codes such as 404 or 403 trigger failover only if you add them to the criteria.

Mistake

CloudFront performs health checks on origins to decide failover.

Correct

CloudFront does NOT have health checks. Failover is reactive: it only happens when an actual request to the primary origin returns a 5xx or fails. There is no proactive health monitoring.

Mistake

Once CloudFront fails over to a secondary origin, all subsequent requests go to the secondary until the primary recovers.

Correct

Failover is per-request. Each request independently goes through the failover logic. If the primary origin recovers and returns a successful response for the next request, CloudFront will use it again. There is no persistent state or sticky failover.

Mistake

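Both origins in an origin group must be of the same type, so you cannot pair an S3 bucket with an ALB.

Correct

An origin group can combine origin types. A common pattern is a custom origin such as an ALB as the primary with an S3 bucket serving a static fallback as the secondary. What a group does require is exactly two origins that are already defined in the distribution.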

Mistake

Origin groups cannot be used together with Origin Shield.

Correct

Origin Shield is compatible with origin groups. It is enabled per origin, so requests to an origin with Origin Shield enabled pass through its Origin Shield Region whether or not the origin belongs to a group.

Frequently Asked Questions

Can CloudFront origin groups fail over on 404 errors?

Yes, but only if you configure it. Origin groups fail over on the status codes listed in the failover criteria plus network/connection errors. 404 and 403 are among the codes you can select, so adding them makes CloudFront retry a missing object against the secondary origin. If they are not selected, a 404 is returned to the viewer without failover.

Do I need to enable S3 Cross-Region Replication for origin groups with S3 buckets?

Yes, if you want the secondary bucket to have the same content as the primary. Origin groups do not replicate data; they only route requests. You must use S3 CRR or another mechanism to keep buckets in sync. Without replication, users may see missing or outdated content after failover.

Can I use CloudFront origin groups with an Application Load Balancer as origin?

Yes, you can use custom origins like ALBs. You need to create an origin group with two custom origins, each pointing to an ALB in different regions or availability zones. Ensure both ALBs serve the same application and are configured similarly.

How does CloudFront decide which origin to use after a failover?

Each request is evaluated independently. CloudFront always tries the primary origin first. If the primary returns a 5xx or fails, it tries the secondary. If the primary later returns a successful response for a different request, CloudFront will use it again. There is no memory of previous failures.

Can I have multiple origin groups in one CloudFront distribution?

Yes, you can have up to 10 origin groups per distribution. Each cache behavior can point to a different origin group using path patterns. For example, '/images/*' could use one origin group and '/api/*' another.

What happens if both origins in a group fail?

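CloudFront returns an error to the viewer: the secondary origin's error response, or a 502 or 504 generated by CloudFront if the secondary cannot be reached. There is no further retry, and a group cannot hold a third origin. To soften this case, you can configure a custom error response or add DNS-level failover behind one of the origins.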

Does CloudFront origin group failover work with Lambda@Edge?

Yes, Lambda@Edge can be used to customize failover behavior. For example, you can write a Lambda function that checks the origin response and triggers a redirect to a secondary origin if certain conditions are met. However, origin groups themselves are simpler and preferred when the default failover criteria suffice.
