This chapter covers CloudWatch Synthetics Canaries, a fully managed service for running programmable, automated tests on your web applications and endpoints. For the SOA-C02 exam, you need to understand how canaries work, their configuration options, integration with CloudWatch metrics and logs, and common use cases. Questions on this topic appear in approximately 5-8% of the exam, typically focusing on canary creation, scheduling, and interpreting results. Mastering this chapter will help you answer scenario-based questions about monitoring application availability and functionality.
Imagine you run a chain of coffee shops and you want to ensure every location's ordering system works correctly 24/7. You hire a team of robotic testers—each is a small, self-contained robot that you can program to perform a specific routine: walk in, place an order for a latte, wait for a response, and report back whether the order was successful. You can schedule these robots to visit at 5-minute intervals, and you can have dozens of them running simultaneously across different shops. If a robot fails to complete its routine or gets an unexpected response, it immediately sends a detailed failure report with a screenshot and logs. You can also program them to try different order types (e.g., a complicated drink with modifiers) to catch edge cases. These robots are fully managed—you don't have to maintain them, charge them, or replace them. They run inside your shops (within your VPC) or from external locations to mimic real customer behavior. This is exactly how CloudWatch Synthetics Canaries work: they are Node.js or Python scripts that run on a managed Lambda runtime, executing predefined or custom steps against your endpoints, capturing screenshots, and emitting metrics and logs for analysis.
What is CloudWatch Synthetics?
CloudWatch Synthetics is a service that allows you to create canaries—configurable, script-based monitors that run on a schedule to test your endpoints and APIs. A canary is essentially a Node.js or Python script that runs on a managed Lambda runtime, executing steps like HTTP GET/POST requests, verifying responses, and optionally capturing screenshots of web pages. Canaries are used to monitor both public and private endpoints, including those inside a VPC.
Why Canaries Exist
Traditional monitoring checks (like HTTP health checks) only verify that a server responds with a 200 OK. They don't test actual functionality—e.g., does the login page load correctly? Canaries fill this gap by running multi-step workflows that simulate real user interactions. They also provide visual monitoring via screenshots, which can be compared over time to detect UI regressions.
How Canaries Work Internally
Canary Script: You write a script using the Synthetics SDK (available for Node.js and Python). In Node.js, the script uses helper methods like executeHttpStep() for API calls or executeStep() for browser actions. Each step can include assertions on status codes, response times, or content.
Lambda Execution: The script runs on an AWS-managed Lambda function. The Lambda runtime includes a headless Chromium browser (for web steps) or just an HTTP client (for API steps). The Lambda function is created automatically when you create a canary.
Scheduling: You define a schedule using a rate expression (e.g., rate(5 minutes)) or a cron expression (e.g., cron(0/5 * * * ? *)). The minimum interval is 1 minute.
Execution: At each scheduled time, AWS Lambda invokes the canary script. The script runs in a sandboxed environment with a default timeout of 3 minutes per run (configurable up to 14 minutes).
Logging and Metrics: The canary automatically emits metrics to CloudWatch (e.g., SuccessPercent, Duration, Failed). It also writes logs to CloudWatch Logs in a log group named /aws/lambda/cwsyn-<canary-name>-<hash>.
Screenshots: If your script runs browser steps with executeStep(), the canary captures screenshots of the page for each step. Screenshots are stored in an S3 bucket (the console creates one by default, named cw-syn-results-<account-id>-<region>). You can view screenshots in the CloudWatch console.
VPC Support: Canaries can run inside a VPC by specifying subnet IDs and security groups. This allows testing of internal endpoints without exposing them to the internet. The canary Lambda function attaches to the VPC using an Elastic Network Interface (ENI).
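The rate expressions mentioned in the Scheduling item can be assembled programmatically. The following is a minimal sketch, assuming a hypothetical helper named `rateExpression` (it is not part of the Synthetics SDK) and the 1-minute minimum interval quoted in this chapter.

```javascript
// Hypothetical helper, not part of the Synthetics SDK: builds a rate()
// schedule expression from a whole number of minutes.
function rateExpression(minutes) {
  if (!Number.isInteger(minutes) || minutes < 1) {
    throw new RangeError('interval must be a whole number of minutes, at least 1');
  }
  // rate() uses the singular unit for a value of 1 and the plural otherwise.
  const unit = minutes === 1 ? 'minute' : 'minutes';
  return `rate(${minutes} ${unit})`;
}

console.log(rateExpression(5)); // rate(5 minutes)
console.log(rateExpression(1)); // rate(1 minute)
```

The returned string is the kind of value that would go in the canary's schedule expression field.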
Key Components and Defaults
Canary Name: Unique identifier (1-21 characters; lowercase letters, numbers, hyphens, and underscores).
Script Location: You can upload a script (.js or .py) or specify an S3 URI. The script must be a single file (no dependencies beyond what the runtime provides).
Runtime: Node.js 14.x, 16.x, or Python 3.8, 3.9, 3.10. The runtime includes the Synthetics SDK and browser binaries.
Schedule: Default is rate(1 hour). Can be as frequent as rate(1 minute).
Timeout: Default 3 minutes, maximum 14 minutes (840 seconds).
Memory: Default 1 GB, maximum 3 GB (for Lambda).
Environment Variables: You can pass up to 4 KB of environment variables.
Alarms: Canaries can automatically create CloudWatch alarms based on SuccessPercent (default threshold < 90% for 2 consecutive periods).
S3 Bucket: Automatically created with versioning enabled. Screenshots and artifacts are stored with a retention policy (default 30 days, configurable).
IAM Role: The canary uses a service-linked role AWSServiceRoleForCloudWatchSynthetics or a custom role you provide. The role must have permissions to write logs, publish metrics, and put objects in the artifact S3 bucket.
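A quick pre-flight check can catch out-of-range settings before a canary is created. This is only a sketch: `checkCanarySettings` is a hypothetical function, and the limits it enforces are the ones quoted in this chapter rather than values read from the service.

```javascript
// Limits as quoted in this chapter; treat them as illustrative, not authoritative.
const LIMITS = { nameMaxLength: 21, defaultTimeoutSec: 180, maxTimeoutSec: 840 };

// Hypothetical pre-flight check: returns a list of problems (empty if none).
function checkCanarySettings({ name, timeoutSec = LIMITS.defaultTimeoutSec }) {
  const problems = [];
  if (typeof name !== 'string' || name.length < 1 || name.length > LIMITS.nameMaxLength) {
    problems.push(`name must be 1-${LIMITS.nameMaxLength} characters`);
  }
  if (!Number.isInteger(timeoutSec) || timeoutSec < 1 || timeoutSec > LIMITS.maxTimeoutSec) {
    problems.push(`timeout must be 1-${LIMITS.maxTimeoutSec} seconds`);
  }
  return problems;
}

console.log(checkCanarySettings({ name: 'my-canary' }));                 // []
console.log(checkCanarySettings({ name: 'checkout', timeoutSec: 900 })); // one timeout problem
```

Returning a list instead of throwing lets a caller report every problem at once.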
Configuration and Verification
To create a canary via AWS CLI (the script is zipped and uploaded to S3 first; the bucket, key, and role names are placeholders, and MemoryInMB must be a multiple of 64):
aws synthetics create-canary \
    --name my-canary \
    --code '{"Handler": "myScript.handler", "S3Bucket": "my-bucket", "S3Key": "scripts/myScript.zip"}' \
    --artifact-s3-location s3://my-bucket/artifacts/ \
    --execution-role-arn arn:aws:iam::123456789012:role/CloudWatchSyntheticsRole \
    --schedule '{"Expression": "rate(5 minutes)"}' \
    --run-config '{"TimeoutInSeconds": 180, "MemoryInMB": 1024}' \
    --runtime-version syn-nodejs-puppeteer-3.9
To start a canary:
aws synthetics start-canary --name my-canary
To get canary runs:
aws synthetics get-canary-runs --name my-canary
To view metrics: CloudWatch console, navigate to Synthetics, select canary, view the Metrics tab.
Interaction with Related Technologies
CloudWatch Metrics: Canaries automatically emit metrics like SuccessPercent, Duration, Failed, FailedRequests. You can create alarms on these.
CloudWatch Logs: Logs are stored in /aws/lambda/cwsyn-<canary-name>-<hash>. You can set log retention.
CloudWatch Alarms: Canaries can auto-create an alarm on SuccessPercent with a threshold of 90% over 2 consecutive periods.
AWS X-Ray: Canaries can be integrated with X-Ray for tracing (optional).
VPC: Canaries can run inside a VPC to test private endpoints. They use ENIs and require VPC configuration (subnets, security groups).
S3: Artifacts (screenshots, HAR files) are stored in S3. You can configure lifecycle policies.
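The alarm rule described in the CloudWatch Alarms item (SuccessPercent below 90% for 2 consecutive periods) can be modeled in a few lines. This is a simplified sketch with a hypothetical function name; real CloudWatch alarms also account for missing data and other evaluation settings that are not modeled here.

```javascript
// Returns true when the most recent `periods` datapoints are all below
// `threshold`. Simplified model of consecutive-period alarm evaluation.
function breachesThreshold(successPercents, threshold = 90, periods = 2) {
  if (successPercents.length < periods) return false; // not enough data yet
  return successPercents.slice(-periods).every((value) => value < threshold);
}

console.log(breachesThreshold([100, 100, 80, 50])); // true
console.log(breachesThreshold([100, 80, 100]));     // false
console.log(breachesThreshold([80]));               // false
```

A single bad run does not breach the rule, which is why a 2-period setting reduces noise from transient failures.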
Script Structure Example (Node.js)
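const synthetics = require('Synthetics');
const log = require('SyntheticsLogger');

// Fails the step unless the response status is 200.
const validateResponse = (res) =>
    new Promise((resolve, reject) => {
        if (res.statusCode !== 200) {
            reject(new Error(`Expected 200, got ${res.statusCode}`));
            return;
        }
        res.on('data', () => {});  // drain the body so 'end' fires
        res.on('end', resolve);
    });

const handler = async () => {
    const stepName = 'Test Homepage';
    const requestOptions = {
        hostname: 'example.com', path: '/', port: 443, protocol: 'https:',
        method: 'GET',
        headers: { 'User-Agent': 'Synthetics' },
    };
    await synthetics.executeHttpStep(stepName, requestOptions, validateResponse);
    log.info('Homepage test passed');
};
exports.handler = handler;
Pricing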
Canaries are billed per canary run. Each run costs $0.0012 (as of 2025). Additional charges apply for Lambda, CloudWatch Logs, and S3 storage. There is a free tier of 100 canary runs per month.
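The run charge is simple arithmetic, so it is easy to estimate before choosing a schedule. The sketch below uses the per-run price and free tier quoted in this chapter; the function names are hypothetical, a 30-day month is assumed, the free tier is assumed to apply once to the combined total, and Lambda, CloudWatch Logs, and S3 charges are left out.

```javascript
const PRICE_PER_RUN_USD = 0.0012; // per-run price quoted in this chapter
const FREE_RUNS_PER_MONTH = 100;  // free tier quoted in this chapter

// Number of runs a single canary makes per day at a fixed interval.
function runsPerDay(intervalMinutes) {
  return Math.floor((24 * 60) / intervalMinutes);
}

// Estimated monthly run charge in USD, rounded to cents.
function monthlyRunCost(intervalMinutes, canaryCount = 1, days = 30) {
  const totalRuns = runsPerDay(intervalMinutes) * canaryCount * days;
  const billableRuns = Math.max(0, totalRuns - FREE_RUNS_PER_MONTH);
  return Math.round(billableRuns * PRICE_PER_RUN_USD * 100) / 100;
}

console.log(runsPerDay(5));        // 288
console.log(monthlyRunCost(5));    // 10.25
console.log(monthlyRunCost(1, 3)); // 155.4
```

Moving from a 1-minute to a 5-minute interval cuts the run count, and so the run charge, by a factor of five.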
Define Canary Script
Write a Node.js or Python script using the Synthetics SDK. The script must export a `handler` function (Node.js) or define a `handler` (Python). In Node.js, use helper functions like `synthetics.executeHttpStep()` for API calls or `synthetics.executeStep()` for browser actions. Each step should include assertions on status codes, response times, or content. The script can also use `SyntheticsLogger` for custom logging. The script file must be a single file (no external dependencies beyond the runtime). You can upload the script directly or reference an S3 URI.
Configure Canary Settings
In the CloudWatch console or via CLI/API, specify the canary name, script location, runtime version (e.g., syn-nodejs-puppeteer-3.9), schedule (rate or cron), execution role ARN, artifact S3 location, and optional VPC configuration (subnets, security groups). You can also set environment variables, timeout (default 180s, max 840s), memory (default 1000 MB, max 3000 MB), and alarm configuration. The canary will be created with a Lambda function and an S3 bucket for artifacts.
Schedule Execution
The canary runs on the defined schedule. At each scheduled time, AWS Lambda invokes the canary script. The Lambda function runs in a managed environment with the specified runtime. The script executes its steps sequentially. Each step can have its own timeout (default 30s per step, max 180s). The entire canary run must complete within the configured timeout (max 14 minutes). If the script exceeds the timeout, the run is marked as failed.
Capture Artifacts
During execution, the canary captures screenshots (if using web steps) and optionally HAR files. Screenshots are taken after each web step. The canary also logs all steps and results to CloudWatch Logs. After the run completes, artifacts are uploaded to the designated S3 bucket. The bucket name is auto-generated or user-specified. Artifacts are stored with a key prefix that includes the canary name and run ID. You can view screenshots directly in the CloudWatch Synthetics console.
Emit Metrics and Logs
After the run, the canary emits CloudWatch metrics such as `SuccessPercent` (0-100), `Duration` (in milliseconds), `Failed` (1 if any step failed, else 0), and `FailedRequests` (count of failed HTTP requests). These metrics are stored in the `CloudWatchSynthetics` namespace. The canary also writes logs to CloudWatch Logs in the log group `/aws/lambda/cwsyn-<canary-name>-<hash>`. You can create alarms on these metrics to trigger notifications.
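To make the relationship between individual runs and the aggregate metrics concrete, the sketch below computes values analogous to `SuccessPercent`, a failed-run count, and average `Duration` from a list of run records. The record shape and the function name are assumptions made for illustration; CloudWatch performs this aggregation for you.

```javascript
// Summarizes run records into values analogous to the canary metrics.
// Each record has a hypothetical shape: { passed: boolean, durationMs: number }.
function summarizeRuns(runs) {
  if (runs.length === 0) {
    return { successPercent: 0, failedCount: 0, averageDurationMs: 0 };
  }
  const passedCount = runs.filter((run) => run.passed).length;
  const totalDurationMs = runs.reduce((sum, run) => sum + run.durationMs, 0);
  return {
    successPercent: (passedCount / runs.length) * 100,
    failedCount: runs.length - passedCount,
    averageDurationMs: totalDurationMs / runs.length,
  };
}

const summary = summarizeRuns([
  { passed: true, durationMs: 1200 },
  { passed: true, durationMs: 1000 },
  { passed: false, durationMs: 3000 },
  { passed: true, durationMs: 800 },
]);
console.log(summary); // { successPercent: 75, failedCount: 1, averageDurationMs: 1500 }
```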
Analyze Results
In the CloudWatch console, navigate to Synthetics, select the canary, and view the **Runs** tab. You can see a timeline of runs, their status (Passed/Failed), duration, and links to artifacts. For failed runs, you can view logs and screenshots to diagnose issues. The **Metrics** tab shows graphs of `SuccessPercent` and `Duration`. You can also use CloudWatch Logs Insights to query logs across multiple runs.
Enterprise Scenario 1: E-commerce Checkout Monitoring
A large e-commerce company needs to ensure their checkout process works end-to-end 24/7. They create a canary that runs every 5 minutes and performs a multi-step web workflow: add item to cart, go to checkout, fill in shipping details, select payment method, and submit order. The canary wraps each page in a browser step with executeStep() and captures screenshots. The script asserts that each page loads within 2 seconds and that the final order confirmation page contains the text 'Order Complete'. If any step fails, an alarm triggers a PagerDuty notification. The canary runs inside a VPC to test internal staging endpoints before public release. The team also uses the canary's screenshots to visually inspect UI changes after deployments. Misconfiguration: Initially, they set the timeout too low (60 seconds), causing failures during peak traffic. After increasing the timeout to 180 seconds, tests passed reliably.
Enterprise Scenario 2: API Availability and Latency Monitoring
A financial services company exposes critical APIs for trading. They create a canary with a simple HTTP script that sends a POST request to https://api.example.com/trade with a test payload and expects a 200 response with a JSON body containing 'status: success'. The canary runs every 1 minute from multiple regions (by creating canaries in different AWS regions) to monitor global availability. They also measure latency using the Duration metric and set an alarm if latency exceeds 500 ms. The canary logs are used to debug authentication issues when API keys rotate. Performance consideration: Running canaries every minute from multiple regions can incur significant costs (approx $0.0012 per run * 60 runs/hour * 24 hours * 3 regions = $5.18/day). They optimize by reducing frequency for non-critical APIs.
Enterprise Scenario 3: Private Endpoint Testing After VPC Changes
A healthcare company runs a web application inside a VPC with no public access. They create a canary with VPC configuration (subnet IDs and security groups) to test internal endpoints. The canary runs a script that hits http://internal-app.example.com/health and expects a 200. After a network change, the canary started failing because its security group did not allow outbound traffic to the internal app. The team debugged by checking VPC flow logs and canary logs (which showed a connection timeout). They updated the security group to allow traffic on port 80. This scenario highlights the importance of proper VPC configuration for canaries.
What the SOA-C02 Exam Tests
Objective 1.1: Implement monitoring and reporting. Canaries fall under CloudWatch Synthetics. You need to know how to create, configure, and interpret canary results.
Common Questions: Scenario-based: 'A company wants to monitor a multi-step web application workflow. Which service should they use?' Answer: CloudWatch Synthetics Canaries. Also, 'What metrics does a canary emit?' and 'How to configure a canary to run inside a VPC?'
Most Common Wrong Answers
Choosing CloudWatch Logs Insights over Canaries: Candidates see 'monitor web app' and think Logs Insights, but Logs Insights is for analyzing logs, not running synthetic tests.
Selecting Route 53 Health Checks: Route 53 health checks only test basic connectivity (TCP/HTTP/HTTPS) and cannot run multi-step scripts or capture screenshots.
Assuming canaries are only for public endpoints: Many forget that canaries can reach private endpoints inside a VPC via subnet and security group configuration.
Confusing canary schedule with Lambda schedule: Canary schedule is set in the canary configuration, not via CloudWatch Events. The Lambda function is managed internally.
Specific Numbers and Terms
Runtime versions: syn-nodejs-puppeteer-3.9, syn-python-selenium-2.0 (Python with Selenium).
Default timeout: 3 minutes (180 seconds). Max 14 minutes (840 seconds).
Default memory: 1 GB (1000 MB). Max 3 GB.
Minimum schedule interval: 1 minute.
Alarm threshold: Default SuccessPercent < 90% for 2 consecutive periods.
Artifact S3 bucket: The console's default bucket is named cw-syn-results-<account-id>-<region>. Retention default 30 days.
IAM role: AWSServiceRoleForCloudWatchSynthetics is automatically created.
Edge Cases and Exceptions
Canary fails due to script error: The canary run status shows 'Failed' but the canary itself is still active. You need to update the script.
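VPC canary cannot reach internet: A canary's Lambda function does not receive a public IP address, even in a public subnet. If the canary needs to reach public endpoints, place it in a private subnet whose route table sends internet-bound traffic through a NAT gateway.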
Screenshots not captured: Only captured for browser steps run with executeStep(). For HTTP steps, no screenshots.
Canary run stuck: If the script hangs, the timeout will eventually kill it. Check logs for infinite loops.
How to Eliminate Wrong Answers
If the question mentions 'multi-step' or 'user interaction', it's likely a canary, not a simple health check.
If the question mentions 'screenshots', it must be a canary with web steps.
If the question mentions 'private VPC endpoint', look for VPC configuration in the canary.
If the question asks for 'metrics', remember SuccessPercent, Duration, Failed are the key ones.
CloudWatch Synthetics Canaries are fully managed, script-based monitors that run on a schedule to test endpoints with multi-step workflows.
Canary scripts are written in Node.js or Python using the Synthetics SDK and run on a managed Lambda runtime.
Canaries can test both public and private (VPC) endpoints by configuring subnets and security groups.
Each canary run can capture screenshots (if using web steps) and stores artifacts in an S3 bucket.
Key metrics emitted: SuccessPercent, Duration, Failed, FailedRequests in the CloudWatchSynthetics namespace.
Default timeout is 3 minutes (max 14 minutes); default memory is 1 GB (max 3 GB).
Minimum schedule interval is 1 minute; can use rate or cron expressions.
Canaries automatically create an IAM role (AWSServiceRoleForCloudWatchSynthetics) and, when created in the console with default settings, an S3 bucket (cw-syn-results-<account-id>-<region>).
The exam often tests the difference between canaries and Route 53 health checks: canaries for multi-step, health checks for simple endpoint monitoring.
Canary alarms default to SuccessPercent < 90% for 2 consecutive periods.
These come up on the exam all the time. Here's how to tell them apart.
CloudWatch Synthetics Canaries
Runs multi-step scripts (e.g., login, add to cart, checkout).
Captures screenshots and HAR files for visual debugging.
Can test private endpoints inside a VPC.
Emits detailed metrics like SuccessPercent and Duration per step.
Billed per canary run ($0.0012 per run).
Route 53 Health Checks
Only checks basic connectivity (TCP, HTTP, HTTPS) with optional string matching.
No screenshots or multi-step logic.
Only tests publicly reachable endpoints; health checkers run outside your VPC, so private resources need a health check based on a CloudWatch alarm.
Emits metrics like HealthCheckStatus and HealthCheckPercentageHealthy.
Billed per health check ($0.50 per month per check).
Mistake
Canaries can only test public endpoints.
Correct
Canaries can test private endpoints inside a VPC by specifying subnet IDs and security groups. The Lambda function attaches to the VPC via an ENI.
Mistake
Canaries are just CloudWatch Events that run a Lambda function.
Correct
Canaries are a managed service that automatically creates and manages the Lambda function, S3 bucket, and IAM role. You don't manage the underlying infrastructure.
Mistake
Canaries can run any arbitrary code with external dependencies.
Correct
Canary scripts must be a single file using only the built-in runtime and Synthetics SDK. You cannot include npm packages or external libraries beyond what the runtime provides.
Mistake
Screenshots are captured for all canary runs.
Correct
Screenshots are only captured if your script runs browser steps with `executeStep()` (which drives a headless browser). HTTP-only scripts do not capture screenshots.
Mistake
Canaries can run indefinitely without timeout.
Correct
Each canary run has a configurable timeout (default 3 minutes, max 14 minutes). If the script exceeds this, the run is terminated and marked as failed.
A canary is a script that runs on a schedule to test endpoints and emits metrics. A CloudWatch alarm monitors those metrics (or any metric) and triggers actions when a threshold is breached. Canaries can automatically create alarms for you, but they are separate services.
Yes. In your canary script, you can include authentication headers, such as API keys or bearer tokens. You can store sensitive values in environment variables (encrypted with KMS) and reference them in the script.
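As a rough illustration of reading a credential from an environment variable rather than hard-coding it, the sketch below builds request headers from `process.env`. The variable name `API_TOKEN` and the function name are hypothetical choices for this example, and the bearer-token scheme is just one of the options mentioned.

```javascript
// Builds request headers from environment variables. API_TOKEN is a
// hypothetical variable name chosen for this example.
function buildAuthHeaders(env = process.env) {
  const token = env.API_TOKEN;
  if (!token) {
    throw new Error('API_TOKEN environment variable is not set');
  }
  return {
    Authorization: `Bearer ${token}`,
    'User-Agent': 'Synthetics',
  };
}

console.log(buildAuthHeaders({ API_TOKEN: 'example-token' }).Authorization);
// Bearer example-token
```

Failing fast when the variable is missing makes a misconfigured canary show up as a clear error in its logs rather than as an unexplained 401.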
In the CloudWatch console, go to Synthetics, select your canary, click on a specific run, and then click the 'Screenshots' tab. You can also find screenshots in the S3 bucket under the run's prefix.
The canary run is marked as 'Failed'. The exception message and stack trace are logged in CloudWatch Logs. You can view the logs to debug. The canary itself remains active and will run again on its next schedule.
Yes. Canaries are regional resources. To test from multiple regions, you must create a canary in each region. There is no built-in multi-region canary.
When creating the canary, under 'Network settings', specify the VPC ID, subnet IDs, and security groups. The canary Lambda function will attach to the VPC via an ENI. Ensure the security group allows outbound traffic to the endpoint you are testing.
Each canary run costs $0.0012. At 1-minute intervals, that's 1440 runs per day, costing $1.728 per day per canary. Plus costs for Lambda, CloudWatch Logs, and S3 storage.