AWS Resilience Hub is a managed service that helps you define, validate, and track the resilience of your applications. For the SAA-C03 exam, this topic appears in Domain 2 (Design Resilient Architectures), most closely under Task Statement 2.2: 'Design highly available and/or fault-tolerant architectures.' It is not a major topic (roughly 2–4% of questions), but you should understand Resilience Hub's role in assessing and improving disaster recovery posture. This chapter covers its core functionality, integration with other AWS services, and how to interpret its outputs for exam scenarios.
Imagine a large corporation with multiple office buildings, each with its own emergency response plan. The company hires a Resilience Coordinator whose job is to test these plans, identify weaknesses, and ensure the company can survive a disaster. The Coordinator doesn't run the buildings—the building managers do—but they set the standards, run drills (Resilience Tests), and produce a report card (Resilience Score) showing how each building would fare. If a building's plan fails, the Coordinator suggests improvements (recommendations). The Coordinator also compares the plans against industry best practices (AWS Well-Architected Framework) and regulatory requirements. The building managers can then apply the fixes and ask the Coordinator to re-evaluate. The Coordinator doesn't actually fix anything—they only assess and advise. In AWS, Resilience Hub plays the same role: it assesses your application's resilience posture by running controlled failure experiments (via AWS Fault Injection Service integration) and evaluating your architecture against best practices, producing a score and actionable recommendations. It does not deploy resources or change configurations—it only observes, tests, and reports.
What is AWS Resilience Hub?
AWS Resilience Hub is a centralized service that enables you to define, validate, and track the resilience of your AWS applications. It was launched in 2021 and is part of the AWS Well-Architected Framework tooling. Its primary purpose is to help you assess whether your application can withstand disruptions, such as infrastructure failures, network outages, or data center failures, and to provide recommendations for improving resilience.
Resilience Hub is not an automated remediation service—it does not make changes to your infrastructure. Instead, it acts as an assessment and guidance tool. You define an application by grouping its AWS resources (e.g., EC2 instances, RDS databases, Load Balancers) and then run resilience assessments. The service evaluates your application against best practices from the AWS Well-Architected Framework and against your own defined Recovery Point Objective (RPO) and Recovery Time Objective (RTO) targets.
How Resilience Hub Works Internally
Application Definition: You create an application in Resilience Hub by selecting a region and then adding resources. You can do this manually by selecting individual resources (e.g., EC2, RDS, ELB) or by importing a CloudFormation stack or a Terraform state file. The service maps dependencies between resources using tags and resource types.
Policy Definition: You define a resilience policy that specifies your RPO and RTO targets. For example, you might set RPO = 1 hour and RTO = 4 hours. Targets are set per disruption type (application, infrastructure, Availability Zone, and optionally Region), so the same application can have a longer RTO for a Region-wide outage than for a single failed instance.
Resilience Assessment: Resilience Hub runs an assessment that includes:
- Static analysis: It evaluates the application's architecture against AWS best practices and your policy. For example, it checks whether EC2 instances are in an Auto Scaling group across multiple Availability Zones (AZs), whether RDS has Multi-AZ enabled, and whether there are single points of failure.
- Dynamic testing (optional): If you have enabled AWS Fault Injection Service (FIS) experiments, Resilience Hub can use them to perform controlled failure injections (e.g., terminating EC2 instances, rebooting or failing over an RDS database). The results show how the application behaves under failure and record anything that breaks.
Resilience Score: After the assessment, Resilience Hub produces a resilience score from 0 to 100. This score indicates how well your application meets your policy targets. A score of 100 means your application is fully compliant with your RPO/RTO goals. The score is calculated based on the number of recommendations that are not implemented, weighted by their impact.
Recommendations: The service generates a list of actionable recommendations to improve resilience. Each recommendation includes a description, the resources affected, the estimated impact on the score, and links to documentation. Recommendations might include enabling Multi-AZ for RDS, adding an Auto Scaling group, or setting up cross-region replication.
Tracking Over Time: Resilience Hub retains the history of assessments, allowing you to track improvements over time. You can also set up notifications via Amazon EventBridge to alert you when new assessments complete or when the resilience score drops below a threshold.
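To make the static-analysis step concrete, here is a minimal Python sketch of the kind of rule checks described above. It is a toy model, not how Resilience Hub is implemented: the `Resource` class, its fields, and the three rules are hypothetical and exist only to illustrate "single EC2 instance in one AZ" and "RDS without Multi-AZ" style findings.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical, simplified view of one resource in an application."""
    name: str
    kind: str                                  # e.g. "ec2" or "rds"
    azs: set = field(default_factory=set)      # Availability Zones in use
    in_auto_scaling_group: bool = False
    multi_az: bool = False


def static_findings(resources):
    """Return readable findings for a few illustrative resilience rules."""
    findings = []
    for r in resources:
        if r.kind == "ec2":
            if not r.in_auto_scaling_group:
                findings.append(f"{r.name}: not in an Auto Scaling group")
            if len(r.azs) < 2:
                findings.append(f"{r.name}: runs in fewer than two AZs")
        elif r.kind == "rds" and not r.multi_az:
            findings.append(f"{r.name}: Multi-AZ is not enabled")
    return findings


# Example application with two classic single points of failure.
app = [
    Resource("web-1", "ec2", azs={"us-east-1a"}),
    Resource("orders-db", "rds", azs={"us-east-1a"}, multi_az=False),
]
for finding in static_findings(app):
    print(finding)
```

Each finding corresponds to the kind of recommendation the service would surface; fixing the architecture, not the check, is what changes the result.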
Key Components and Defaults
Application: A logical grouping of resources. You can have up to 100 applications per account per region (soft limit).
Resilience Policy: Defines RPO and RTO. Default values: RPO = 1 hour, RTO = 4 hours. You can customize per application.
Resilience Score: 0–100. Calculated based on the recommendations not implemented. The exact formula is not publicly documented, but it's weighted by severity.
Recommendations: Categorized as high, medium, or low impact. High-impact recommendations can increase the score by >20 points if resolved.
Assessment Duration: Typically 1–5 minutes for static analysis. If FIS experiments are used, it can take longer (up to 30 minutes).
Pricing: Resilience Hub is not free. After a free trial, AWS charges a monthly fee for each application you assess, and you pay separately for the underlying resources it uses (e.g., FIS experiments, CloudWatch metrics).
Configuration and Verification
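To create an application via AWS CLI:
aws resiliencehub create-app --name MyApp --region us-east-1
To add resources to the draft version and publish it:
aws resiliencehub add-draft-app-version-resource-mappings --app-arn arn:aws:resiliencehub:us-east-1:123456789012:app/MyApp/abc123 --resource-mappings file://mappings.json
aws resiliencehub publish-app-version --app-arn arn:aws:resiliencehub:us-east-1:123456789012:app/MyApp/abc123
Where mappings.json contains the resource mappings (resource identifiers such as ARNs).
To run an assessment against the published version:
aws resiliencehub start-app-assessment --app-arn arn:aws:resiliencehub:us-east-1:123456789012:app/MyApp/abc123 --app-version release --assessment-name MyAssessment
To view the resilience score:
aws resiliencehub describe-app --app-arn arn:aws:resiliencehub:us-east-1:123456789012:app/MyApp/abc123
In the console, you can see the score on the application dashboard alongside a list of recommendations.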
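The same start-and-wait flow can be scripted. The sketch below assumes a client object that behaves like `boto3.client("resiliencehub")`, with `start_app_assessment` and `describe_app_assessment` methods and the response fields shown; treat those names as my reading of the SDK rather than a guarantee, and check them against the current API reference. The client is passed in as a parameter so the function can be exercised with a stub and never touches AWS on its own.

```python
import time


def run_assessment(client, app_arn, assessment_name,
                   poll_seconds=10, max_polls=60):
    """Start an assessment on the published app version and wait for it.

    `client` is assumed to behave like boto3.client("resiliencehub").
    Returns the final assessment dict, or raises TimeoutError.
    """
    started = client.start_app_assessment(
        appArn=app_arn,
        appVersion="release",            # the published (non-draft) version
        assessmentName=assessment_name,
    )
    assessment_arn = started["assessment"]["assessmentArn"]

    for _ in range(max_polls):
        assessment = client.describe_app_assessment(
            assessmentArn=assessment_arn
        )["assessment"]
        # Stop polling once the assessment reaches a terminal state.
        if assessment["assessmentStatus"] in ("Success", "Failed"):
            return assessment
        time.sleep(poll_seconds)

    raise TimeoutError(f"assessment {assessment_arn} did not finish in time")
```

In real use you would pass a boto3 client created in the application's region and then read the score and recommendations from the completed assessment.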
Integration with Related Technologies
AWS Fault Injection Service (FIS): Resilience Hub can trigger FIS experiments to inject failures (e.g., stop EC2 instances, reboot or fail over an RDS database) and observe how the application responds. This is optional but provides a more accurate assessment.
AWS CloudFormation: You can import a CloudFormation stack to automatically discover resources.
AWS Config: Resilience Hub uses AWS Config to track resource configurations and compliance.
Amazon EventBridge: You can send assessment completion events to EventBridge for automation (e.g., send a notification via SNS).
AWS Well-Architected Tool: Resilience Hub aligns with the Well-Architected Framework's reliability pillar. Recommendations often overlap.
Important Exam Notes
Resilience Hub is regional—you must create applications in each region where you operate.
It does not automatically remediate issues. It only provides recommendations.
The resilience score is based on your policy (RPO/RTO), not on an absolute standard.
The service supports both single-region and multi-region applications, but it only assesses resources within the same region as the application definition.
For the exam, remember that Resilience Hub is used for post-deployment resilience validation, not for designing architectures (though it can guide changes).
It integrates with FIS for dynamic testing, but FIS is not required for basic assessments.
Common Exam Scenarios
A company wants to validate that its application meets a 1-hour RTO and 15-minute RPO. They should use Resilience Hub to define a policy and run an assessment.
An architect is asked to improve the resilience score from 50 to 80. They should follow the recommendations provided by Resilience Hub.
A question might describe a scenario where an application fails a resilience assessment because it has a single EC2 instance in one AZ. The recommendation would be to use an Auto Scaling group across multiple AZs.
Step-by-Step: Running a Resilience Assessment
Define the application – Group resources manually or via CloudFormation.
Set the resilience policy – Specify RPO and RTO.
Run the assessment – Static analysis starts immediately.
Review the score and recommendations – Identify critical issues.
Implement changes – Manually apply recommendations (e.g., enable Multi-AZ).
Re-run assessment – Confirm score improvement.
Set up notifications – Use EventBridge to alert on score changes.
Limits and Quotas
Maximum number of applications per region: 100 (can be increased via service quota request).
Maximum number of resources per application: 1,000.
Assessment concurrency: 5 assessments per account per region.
Summary of Resilience Hub vs. Other Services
AWS Resilience Hub vs. AWS Well-Architected Tool: Both assess architectures. Well-Architected Tool is broader (all six pillars), while Resilience Hub focuses specifically on resilience and includes dynamic testing via FIS.
AWS Resilience Hub vs. AWS Fault Injection Service: FIS is the mechanism to inject failures; Resilience Hub orchestrates the tests and interprets results.
AWS Resilience Hub vs. AWS Backup: Backup is about data protection; Resilience Hub assesses the overall application resilience, including compute, networking, and data.
Exam Tips
Know that Resilience Hub is billed per application per month after a free trial, and that related services such as FIS are billed separately.
Understand that the resilience score is relative to your policy, not absolute.
Remember that Resilience Hub cannot automatically fix issues; it only recommends.
Be aware that it integrates with FIS but does not require it.
For multi-region applications, you must create separate applications in each region.
The service is regional; it does not span regions.
Conclusion
AWS Resilience Hub is a valuable tool for validating and improving the resilience of your applications. On the SAA-C03 exam, expect questions that test your understanding of its purpose, its integration with FIS, and its role in meeting RPO/RTO targets. Remember that it is an assessment and recommendation service, not an automated remediation tool.
Define the Application
In the AWS Management Console, navigate to Resilience Hub and choose 'Create application'. Provide a name and select a region. You can add resources manually by ARN or import from a CloudFormation stack or Terraform configuration. The service will discover resource dependencies based on tags and relationships (e.g., an EC2 instance behind an ALB). This step is critical because the assessment scope is limited to the resources you include. If you omit a critical component (e.g., an RDS instance), the assessment will not cover it. For exam purposes, remember that you can also use AWS Config to track resource changes, but Resilience Hub does not auto-discover resources outside the defined application.
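If you script the import step instead of using the console, the request might be assembled as below. This is a sketch under assumptions: the parameter names `appArn`, `sourceArns`, and `terraformSources` (with `s3StateFileUrl`) reflect my understanding of the `import_resources_to_draft_app_version` call in boto3 and should be verified against the API reference. The ARN and S3 URL in the example are made-up placeholders.

```python
def build_import_request(app_arn, stack_arns=(), terraform_state_urls=()):
    """Build keyword arguments for importing resources into a draft app version.

    Nothing is sent to AWS here; the function only assembles the request.
    """
    if not stack_arns and not terraform_state_urls:
        raise ValueError(
            "provide at least one CloudFormation stack or Terraform state file"
        )
    request = {"appArn": app_arn}
    if stack_arns:
        request["sourceArns"] = list(stack_arns)
    if terraform_state_urls:
        request["terraformSources"] = [
            {"s3StateFileUrl": url} for url in terraform_state_urls
        ]
    return request


# Placeholder identifiers, for illustration only.
example = build_import_request(
    "arn:aws:resiliencehub:us-east-1:123456789012:app/example",
    stack_arns=["arn:aws:cloudformation:us-east-1:123456789012:stack/web/1"],
    terraform_state_urls=["s3://example-bucket/terraform.tfstate"],
)
print(example)
```

Anything not listed in the request stays outside the assessment scope, which is why an omitted database stack leads to an incomplete result.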
Set the Resilience Policy
After defining the application, you set a resilience policy that specifies your Recovery Point Objective (RPO) and Recovery Time Objective (RTO). For example, RPO = 1 hour (meaning you can tolerate up to 1 hour of data loss) and RTO = 4 hours (meaning the application must be fully operational within 4 hours of a failure). Targets are set per disruption type (application, infrastructure, Availability Zone, and optionally Region). The policy is used to evaluate whether your current architecture can meet these targets. If your policy is too aggressive (e.g., RPO of 5 minutes), the score will likely be lower because few architectures can achieve that without advanced replication. The exam may test that the policy is user-defined and not preset.
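As a rough illustration of what a policy looks like when created programmatically, the sketch below builds a request body for the 1-hour RPO / 4-hour RTO example. It assumes the API expresses targets in seconds under the keys `Software`, `Hardware`, `AZ`, and `Region`, with a `tier` such as `Important`; those names are my understanding of `create_resiliency_policy` and worth confirming in the API reference. The policy name is hypothetical.

```python
def hours(n):
    """Convert hours to whole seconds."""
    return int(n * 3600)


def build_policy(name, tier, rpo_hours, rto_hours, include_region=False):
    """Build a policy request that applies the same targets to each disruption type."""
    target = {"rpoInSecs": hours(rpo_hours), "rtoInSecs": hours(rto_hours)}
    # One entry per disruption type; Region is optional.
    policy = {kind: dict(target) for kind in ("Software", "Hardware", "AZ")}
    if include_region:
        policy["Region"] = dict(target)
    return {"policyName": name, "tier": tier, "policy": policy}


request = build_policy("example-policy", "Important", rpo_hours=1, rto_hours=4)
print(request["policy"]["AZ"])
```

Using one target for every disruption type keeps the example short; a real policy would often allow a longer RTO for a Region-level outage than for an application-level fault.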
Run the Assessment
Once the application and policy are defined, you initiate an assessment. Resilience Hub performs a static analysis of your architecture, checking for single points of failure, lack of redundancy, and insufficient backup configurations. Optionally, if you have enabled AWS Fault Injection Service (FIS) experiments, Resilience Hub can trigger them to inject failures (e.g., terminate EC2 instances, pause EBS volume I/O) and observe the application's behavior. The assessment typically completes within a few minutes for static analysis, but can take up to 30 minutes if dynamic testing is used. During the assessment, Resilience Hub collects data from AWS Config, CloudTrail, and other services to evaluate compliance.
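For a sense of what a failure-injection experiment looks like, here is a sketch of an FIS experiment template that terminates one tagged EC2 instance, written as a Python dict. The structure follows the FIS experiment template format as I understand it (`targets`, `actions`, `stopConditions`, `roleArn`); the role ARN and tag are placeholders, and the target and action names (`oneInstance`, `terminate`) are arbitrary labels chosen for this example.

```python
import json


def terminate_one_instance_template(role_arn, tag_key, tag_value):
    """Return an FIS-style template that terminates one matching EC2 instance."""
    return {
        "description": "Terminate one tagged EC2 instance",
        "roleArn": role_arn,
        "targets": {
            "oneInstance": {
                "resourceType": "aws:ec2:instance",
                "resourceTags": {tag_key: tag_value},
                "selectionMode": "COUNT(1)",    # pick exactly one match
            }
        },
        "actions": {
            "terminate": {
                "actionId": "aws:ec2:terminate-instances",
                "targets": {"Instances": "oneInstance"},
            }
        },
        # "none" means no automatic stop; production templates usually
        # reference a CloudWatch alarm here instead.
        "stopConditions": [{"source": "none"}],
    }


template = terminate_one_instance_template(
    "arn:aws:iam::123456789012:role/example-fis-role", "app", "example"
)
print(json.dumps(template, indent=2))
```

Limiting the selection to a single instance keeps the blast radius small, which matters when the goal is to confirm that an Auto Scaling group replaces the instance within the RTO.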
Review Score and Recommendations
After the assessment completes, Resilience Hub displays a resilience score (0–100) and a list of recommendations. Each recommendation includes a severity (high, medium, low), the affected resources, and the estimated impact on the score if implemented. For example, a high-severity recommendation might be 'Enable Multi-AZ deployment for RDS instance db-xyz' with an estimated +20 point improvement. You can click on recommendations to view detailed guidance and links to relevant documentation. The score is calculated based on how many recommendations are not implemented, weighted by severity. The exam may ask you to interpret a score or identify which recommendation would have the greatest impact.
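Because the exam may ask which recommendation matters most, it can help to see the ranking logic spelled out. The sketch below is purely illustrative: the real scoring formula is not public, so `toy_score` simply subtracts the estimated points of open recommendations from 100, and the field names and point values are invented for the example.

```python
def rank_open_recommendations(recommendations):
    """Return unimplemented recommendations, highest estimated impact first."""
    open_recs = [r for r in recommendations if not r["implemented"]]
    return sorted(open_recs, key=lambda r: r["estimated_points"], reverse=True)


def toy_score(recommendations):
    """Illustrative score: 100 minus points lost to open recommendations."""
    lost = sum(r["estimated_points"]
               for r in recommendations if not r["implemented"])
    return max(0, 100 - lost)


# Invented data for illustration.
recs = [
    {"title": "Add an Auto Scaling group across AZs",
     "estimated_points": 15, "implemented": False},
    {"title": "Enable Multi-AZ deployment for RDS instance db-xyz",
     "estimated_points": 20, "implemented": False},
    {"title": "Enable automated backups",
     "estimated_points": 5, "implemented": True},
]

print(toy_score(recs))
print(rank_open_recommendations(recs)[0]["title"])
```

With these made-up numbers, the Multi-AZ recommendation ranks first, matching the intuition that resolving the highest-severity item moves the score the most.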
Implement Changes and Re-assess
Based on the recommendations, you manually modify your infrastructure (e.g., enable Multi-AZ, add an Auto Scaling group, set up cross-region replication). Resilience Hub does not make changes for you. After implementing changes, you can run another assessment to see the updated score. You can also schedule recurring assessments (e.g., daily) to ensure ongoing compliance. Resilience Hub retains the history of assessments, allowing you to track improvements over time. For the exam, remember that the service is purely advisory; automation of remediation is not part of Resilience Hub.
Enterprise Scenario 1: Financial Services Compliance
A large bank runs a critical payment processing application on AWS. Regulatory requirements mandate that the application must have an RTO of 1 hour and an RPO of 15 minutes. The bank uses Resilience Hub to define the application (including EC2 instances, RDS Multi-AZ, ALB, and Route 53 health checks) and sets the policy accordingly. The initial assessment reveals a resilience score of 45. Recommendations include enabling cross-region replication for the RDS database (currently only Multi-AZ in one region) and adding an Auto Scaling group with a minimum of 2 instances across AZs. The bank implements these changes, re-runs the assessment, and achieves a score of 95. The bank also sets up EventBridge notifications to alert the operations team if the score drops below 80 after any infrastructure changes.
Enterprise Scenario 2: E-commerce Platform Disaster Recovery
An e-commerce company runs a multi-tier application with a web tier, application tier, and database tier. They want to validate their disaster recovery plan without causing actual downtime. They use Resilience Hub with Fault Injection Service to simulate failures. They create an application containing the entire stack and run an assessment with FIS experiments that terminate random EC2 instances and force a database failover. Resilience Hub observes that the application fails to recover within the defined RTO because the Auto Scaling group's launch configuration is outdated. The recommendations include updating the launch configuration and adding an Elastic Load Balancer health check. After fixes, they re-run the assessment and achieve a passing score. This proactive testing prevented a real disaster during a peak shopping season.
Common Misconfigurations
Incomplete resource mapping: Engineers often forget to include all dependent resources, leading to an incomplete assessment. For example, including an EC2 instance but not its attached EBS volume or the security group rules. Resilience Hub relies on the resources you explicitly add.
Overly aggressive policy: Setting RPO to 1 minute and RTO to 5 minutes for a simple web app will result in a low score and many recommendations that may not be cost-effective. The policy should reflect actual business requirements.
Ignoring recommendations: Some teams run an assessment once and never act on the recommendations. This defeats the purpose of the service. Resilience Hub is most valuable when used continuously as part of a CI/CD pipeline.
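The incomplete-mapping problem is easy to check for before you run an assessment. The following sketch assumes you maintain your own dependency map (the `depends_on` dict and resource names are hypothetical) and reports direct dependencies that were left out of the application definition.

```python
def missing_dependencies(included, depends_on):
    """Return direct dependencies of included resources that were not included."""
    included = set(included)
    missing = set()
    for resource in included:
        for dep in depends_on.get(resource, ()):
            if dep not in included:
                missing.add(dep)
    return sorted(missing)


# Hypothetical application: the database was forgotten.
included = ["web-asg", "alb"]
depends_on = {
    "web-asg": ["alb", "orders-db"],
    "alb": [],
}
print(missing_dependencies(included, depends_on))
```

A non-empty result means the assessment would silently skip part of the stack, so add those resources before trusting the score.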
Scale and Performance Considerations
Resilience Hub itself has no performance impact on your applications because it only reads configuration data and optionally runs FIS experiments (which you control). For large applications with hundreds of resources, the assessment may take longer but still completes within minutes. The service can handle up to 1,000 resources per application. For larger environments, you can split the application into multiple applications by component (e.g., frontend, backend, database).
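Splitting by component can be planned ahead of time. This sketch groups resources by a component label and fails fast if any group would still exceed the 1,000-resource limit quoted in this chapter; the function name, the `(arn, component)` input shape, and the sample identifiers are assumptions made for illustration.

```python
from collections import defaultdict

MAX_RESOURCES_PER_APP = 1000  # limit as stated in this chapter


def split_by_component(resources, limit=MAX_RESOURCES_PER_APP):
    """Group (arn, component) pairs into one application per component."""
    groups = defaultdict(list)
    for arn, component in resources:
        groups[component].append(arn)

    too_big = sorted(c for c, arns in groups.items() if len(arns) > limit)
    if too_big:
        raise ValueError(f"components still over the limit: {too_big}")
    return dict(groups)


# Placeholder identifiers for illustration.
resources = [
    ("arn:example:web-1", "frontend"),
    ("arn:example:web-2", "frontend"),
    ("arn:example:api-1", "backend"),
    ("arn:example:db-1", "database"),
]
for component, arns in split_by_component(resources).items():
    print(component, len(arns))
```

Each resulting group would become its own Resilience Hub application with its own policy and score.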
What SAA-C03 Tests on Resilience Hub
Resilience Hub falls under Domain 2 (Design Resilient Architectures), most closely Task Statement 2.2: 'Design highly available and/or fault-tolerant architectures.' The exam tests your understanding of its purpose, its integration with other services, and its role in meeting RPO/RTO targets. Expect 1–2 questions that present a scenario and ask which service to use.
Common Wrong Answers and Why Candidates Choose Them
'Use AWS Fault Injection Service directly instead of Resilience Hub' – Candidates think FIS is the same as Resilience Hub. Reality: FIS is the tool to inject failures; Resilience Hub orchestrates the test and interprets results. The correct answer is Resilience Hub because it provides a score and recommendations.
'Resilience Hub automatically fixes issues' – Candidates assume it's an automated remediation service. Reality: It only recommends; you must implement changes manually or via other automation (e.g., AWS Config auto-remediation).
'Resilience Hub works across regions' – Candidates think it's global. Reality: It is regional; you must create separate applications per region.
'The resilience score is absolute' – Candidates think 100 means perfect. Reality: The score is relative to your policy; a score of 100 means your architecture meets your defined RPO/RTO, not that it's bulletproof.
Specific Numbers and Terms That Appear on the Exam
RPO (Recovery Point Objective): Maximum acceptable data loss (e.g., 1 hour).
RTO (Recovery Time Objective): Maximum acceptable downtime (e.g., 4 hours).
Resilience score: 0–100.
Recommendation severity: High, Medium, Low.
Integration with FIS: Optional, for dynamic testing.
Regional service: Cannot span regions.
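A short worked example helps keep RPO and RTO apart. The sketch below uses a deliberately simple model, which is an assumption: worst-case data loss equals the interval between backups, and downtime equals the time needed to restore. All numbers are in minutes.

```python
def meets_targets(backup_interval_min, restore_time_min, rpo_min, rto_min):
    """Compare a simple backup-and-restore setup against RPO/RTO targets."""
    # Worst case: the failure happens just before the next backup runs.
    worst_case_data_loss = backup_interval_min
    return {
        "rpo_met": worst_case_data_loss <= rpo_min,
        "rto_met": restore_time_min <= rto_min,
    }


# Backups every 6 hours, 90-minute restore, targets RPO = 1 h and RTO = 4 h.
result = meets_targets(backup_interval_min=360, restore_time_min=90,
                       rpo_min=60, rto_min=240)
print(result)
```

Here the RTO is met but the RPO is not, because up to six hours of data could be lost; shortening the backup interval or adding replication addresses the RPO without touching the RTO.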
Edge Cases and Exceptions the Exam Loves
What if you don't define a policy? Resilience Hub uses default values (RPO=1 hour, RTO=4 hours).
Can Resilience Hub assess on-premises resources? No, only AWS resources.
Does Resilience Hub support all AWS services? It supports common services like EC2, RDS, ELB, Auto Scaling, S3, DynamoDB, etc. Check the latest documentation for full list.
What happens if you exceed resource limits? You cannot add more than 1,000 resources per application; you must split.
How to Eliminate Wrong Answers Using the Underlying Mechanism
If the question asks for a service that automates failure testing and provides a resilience score, eliminate FIS (it only injects failures) and choose Resilience Hub.
If the question asks for a service that automatically remediates non-compliant resources, eliminate Resilience Hub (it does not remediate). Choose AWS Config auto-remediation or Systems Manager Automation.
If the question mentions multi-region resilience, remember that Resilience Hub is regional; you would need multiple applications. The correct answer might be a combination of services (e.g., Route 53, Global Accelerator).
Exam Pro Tip
Read the question carefully: if the scenario describes a need to validate an application's ability to meet RPO/RTO and get recommendations for improvement, the answer is Resilience Hub. If it describes a need to inject failures to test behavior, the answer is FIS. If it describes continuous compliance monitoring, the answer might be AWS Config.
Resilience Hub is a regional service that assesses application resilience against user-defined RPO and RTO targets.
It provides a resilience score (0–100) and actionable recommendations, but does not automatically remediate issues.
Integration with AWS Fault Injection Service (FIS) is optional for dynamic failure testing.
Default RPO is 1 hour, default RTO is 4 hours if no policy is defined.
Maximum of 100 applications per region and 1,000 resources per application.
Resilience Hub is billed per application per month after a free trial; you also pay for underlying resources like FIS experiments.
Common exam trap: confusing Resilience Hub with FIS or assuming it's global.
These come up on the exam all the time. Here's how to tell them apart.
AWS Resilience Hub
Provides a resilience score and recommendations
Assesses architecture against RPO/RTO policy
Can orchestrate FIS experiments but not required
Billed per application per month after a free trial
Regional service
AWS Fault Injection Service (FIS)
Injects failures into resources (e.g., terminate EC2 instances, reboot or fail over RDS)
Does not provide a score or recommendations
Used as a tool by Resilience Hub for dynamic testing
Priced per action-minute
Can be used independently for chaos engineering
AWS Resilience Hub
Focuses specifically on resilience (RPO/RTO)
Includes dynamic testing via FIS
Provides a numerical resilience score
Integrates with EventBridge for notifications
Billed per application per month after a free trial
AWS Well-Architected Tool
Covers all six pillars (Security, Reliability, etc.)
No dynamic testing; purely static review
Provides a qualitative assessment (no numerical score)
Can generate a PDF report
Free to use
Mistake
Resilience Hub automatically fixes resilience issues.
Correct
Resilience Hub only provides recommendations. It does not make any changes to your infrastructure. You must implement the recommendations manually or via other automation tools.
Mistake
Resilience Hub is a global service that works across multiple regions.
Correct
Resilience Hub is a regional service. You must create a separate application in each region where your resources reside. It cannot assess resources in another region.
Mistake
A resilience score of 100 means the application is completely fault-tolerant.
Correct
The score is relative to your defined policy (RPO/RTO). A score of 100 means your architecture meets your specific targets, not that it is invulnerable to all failures.
Mistake
Resilience Hub requires AWS Fault Injection Service to run assessments.
Correct
FIS is optional. Resilience Hub can perform static analysis without FIS. Dynamic testing via FIS is an additional feature for more accurate assessments.
Mistake
Resilience Hub can assess on-premises resources.
Correct
Resilience Hub only supports AWS resources. It cannot evaluate on-premises infrastructure or resources in other cloud providers.
What is the difference between AWS Resilience Hub and AWS Fault Injection Service?
AWS Resilience Hub is a service that assesses your application's resilience against your defined RPO/RTO targets and provides a score and recommendations. AWS Fault Injection Service (FIS) is a tool to inject failures (e.g., terminate instances, pause volume I/O) to test how your application behaves. Resilience Hub can use FIS to perform dynamic testing, but FIS alone does not provide a score or recommendations. Think of Resilience Hub as the test coordinator and FIS as the tool that creates failures.
Does Resilience Hub automatically fix the issues it finds?
No, Resilience Hub does not automatically fix issues. It only provides recommendations. You must manually implement the changes or use other AWS services like AWS Config auto-remediation or Systems Manager Automation to automate fixes. Resilience Hub is purely an assessment and advisory service.
Is Resilience Hub a global or a regional service?
Resilience Hub is a regional service. You must create a separate application in each AWS region where your resources are deployed. It cannot assess resources in another region from a single application. For multi-region applications, you need to create an application in each region and run separate assessments.
What is the resilience score and how is it calculated?
The resilience score is a number from 0 to 100 that indicates how well your application meets your defined RPO and RTO targets. A score of 100 means your architecture fully satisfies your policy. The score is calculated based on the recommendations that are not implemented, weighted by severity (high, medium, low). The exact formula is not publicly documented, but resolving high-severity recommendations has the greatest impact on the score.
Is AWS Resilience Hub free to use?
Only during the free trial. After that, AWS charges a monthly fee per application, and you also pay for the underlying AWS resources that you use, such as AWS Fault Injection Service experiments (if you enable dynamic testing), CloudWatch metrics, and EventBridge events.
Which AWS resources does Resilience Hub support?
Resilience Hub supports a wide range of AWS resources, including EC2 instances, Auto Scaling groups, Elastic Load Balancers, RDS databases, DynamoDB tables, S3 buckets, and more. For a complete list, refer to the AWS documentation. You can add resources manually or import from a CloudFormation stack or Terraform state.
How do I get started with Resilience Hub?
Navigate to the AWS Resilience Hub console, choose 'Create application', and provide a name and region. Then add resources manually or import from CloudFormation/Terraform. Define a resilience policy with your RPO and RTO targets. Run an assessment to get your score and recommendations. Optionally, enable FIS experiments for dynamic testing. From there, you can track your resilience over time.