This chapter covers AWS X-Ray debugging, a critical skill for the DVA-C02 exam. You will learn how X-Ray helps trace requests through distributed applications, identify performance bottlenecks, and debug errors. Approximately 10-15% of exam questions touch on monitoring and troubleshooting, with X-Ray appearing in several scenario-based questions. Mastery of X-Ray concepts, configuration, and integration with other AWS services is essential for passing the exam.
Imagine a fleet of delivery drones flying between warehouses. Each drone has a flight recorder that logs every leg of its journey: takeoff (client request), waypoints (microservices), and landing (response). AWS X-Ray is like a central air traffic control system that collects these flight recorders from all drones. When a package is delayed, you can replay the exact route of a single drone, seeing which warehouse (service) caused the delay and how long it waited at each stop (subsegment duration). X-Ray's sampling rules are like air traffic control deciding to record only 10% of drones on a busy day to avoid data overload, but always recording every drone from critical VIP customers (sampling with reservoir and rate). The trace ID is the unique flight number, and segments are the individual drone's logs. Without X-Ray, you'd have to manually call each warehouse to ask about delays — impossible at scale.
What is AWS X-Ray?
AWS X-Ray is a distributed tracing service that helps developers analyze and debug production applications. It provides an end-to-end view of requests as they travel through your application, showing a map of the underlying components and identifying performance issues and errors. X-Ray is not a logging or monitoring service per se; it is a tracing service that collects data about requests and responses.
Why X-Ray Exists
Modern applications are built using microservices architectures, often spanning multiple AWS services like Lambda, API Gateway, DynamoDB, and EC2. When a user request fails or is slow, it's hard to pinpoint which component caused the issue. Traditional monitoring tools give you metrics per service but not the ability to trace a single request across services. X-Ray solves this by providing a trace ID that follows the request through the entire system.
How X-Ray Works Internally
X-Ray works by collecting data from segments and subsegments. A segment records the work done by a single service (e.g., a Lambda function, an API Gateway request). Subsegments break down the work within a segment into finer granularity (e.g., a database query, an HTTP call to another service). Each segment contains metadata such as start time, end time, error status, and annotations.
The core mechanism:
- Trace ID: A unique identifier for a request, generated by the first service that receives the request (e.g., API Gateway or your application). It is passed downstream via HTTP headers (X-Amzn-Trace-Id).
- Sampling: To reduce cost and data volume, X-Ray uses sampling rules. By default, it records the first request per second and 5% of additional requests (this is called the reservoir and rate). You can define custom sampling rules.
- Annotations and Metadata: Annotations are key-value pairs indexed for search (e.g., for filtering traces). Metadata are key-value pairs not indexed (for extra context).
- Service Map: X-Ray automatically generates a service map showing the connections between services and their average latencies.
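The reservoir-and-rate behavior can be illustrated with a small simulation. This is a toy model under stated assumptions, not the X-Ray SDK's sampler: the class name `ReservoirSampler` and the `simulate` helper are invented for illustration, and real sampling is coordinated across hosts by the X-Ray service.

```python
import random


class ReservoirSampler:
    """Toy model of default sampling: a per-second reservoir plus a fixed rate."""

    def __init__(self, reservoir=1, rate=0.05, rng=None):
        self.reservoir = reservoir      # traces guaranteed per second
        self.rate = rate                # fraction of the remaining requests
        self.rng = rng or random.Random()
        self._second = None             # the second the counter belongs to
        self._used = 0                  # reservoir slots used in that second

    def should_sample(self, now):
        """Decide at request start; `now` is a timestamp in seconds."""
        second = int(now)
        if second != self._second:      # a new second refills the reservoir
            self._second = second
            self._used = 0
        if self._used < self.reservoir:
            self._used += 1
            return True
        return self.rng.random() < self.rate


def simulate(requests_per_second, seconds, reservoir=1, rate=0.05, seed=0):
    """Count how many evenly spaced requests would be traced."""
    sampler = ReservoirSampler(reservoir, rate, random.Random(seed))
    sampled = 0
    for s in range(seconds):
        for i in range(requests_per_second):
            if sampler.should_sample(s + i / requests_per_second):
                sampled += 1
    return sampled


if __name__ == "__main__":
    total = 100 * 60
    print(f"traced {simulate(100, 60)} of {total} requests")
```

At 100 requests per second, the model traces roughly 1 + 0.05 × 99 ≈ 6 requests each second, which is why a rare error can easily fall outside the sampled set.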
Key Components, Values, Defaults, and Timers
Segment: The record of work done by a service. A segment has a name, ID, trace ID, start time (epoch seconds), end time (epoch seconds), and optional subsegments.
Subsegment: A nested section within a segment. Subsegments can have their own subsegments, forming a tree.
Sampling Rules: Each rule has two parameters: a reservoir (fixed number of traces per second to record) and a rate (percentage of additional requests to trace beyond the reservoir). Default: reservoir=1, rate=0.05 (5%).
Trace Header: X-Amzn-Trace-Id with format: Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1
Daemon: The X-Ray daemon (a process) runs on your EC2 instances or containers, collecting segment data and sending it to the X-Ray API. It listens on UDP port 2000 by default.
X-Ray SDK: Available for Node.js, Java, .NET, Python, Ruby, and Go. It automatically captures incoming and outgoing HTTP requests, AWS SDK calls, and SQL queries.
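AWS Services Integration: API Gateway, Lambda, Elastic Beanstalk, SNS, and SQS integrate with X-Ray with minimal configuration. On ECS and EKS you run the daemon (or an ADOT collector) alongside your containers, and DynamoDB appears in traces as a downstream node when calls to it go through an instrumented AWS SDK client.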
Tracing Header Propagation: For a trace to be continuous, the trace ID must be passed from service to service. X-Ray SDKs automatically propagate the header via HTTP. For custom protocols, you must manually pass the trace ID.
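To make the trace ID and header formats listed above concrete, here is a minimal sketch that generates IDs in the documented shape (version `1`, eight hex digits of epoch time, 24 random hex digits) and assembles the header. The helper names are hypothetical; in practice the X-Ray SDK or the entry-point service does this for you.

```python
import os
import re
import time

# Shape of a trace ID: 1-<8 hex digits of epoch seconds>-<24 random hex digits>
TRACE_ID_RE = re.compile(r"^1-[0-9a-f]{8}-[0-9a-f]{24}$")


def new_trace_id(now=None):
    """Build a trace ID from the current (or supplied) epoch time."""
    epoch = int(time.time() if now is None else now)
    return f"1-{epoch:08x}-{os.urandom(12).hex()}"


def new_segment_id():
    """Segment and subsegment IDs are 16 hex digits."""
    return os.urandom(8).hex()


def format_trace_header(trace_id, parent_id=None, sampled=None):
    """Assemble the value of the X-Amzn-Trace-Id header."""
    parts = [f"Root={trace_id}"]
    if parent_id:
        parts.append(f"Parent={parent_id}")
    if sampled is not None:
        parts.append(f"Sampled={1 if sampled else 0}")
    return ";".join(parts)


if __name__ == "__main__":
    trace_id = new_trace_id()
    print("X-Amzn-Trace-Id:", format_trace_header(trace_id, new_segment_id(), True))
```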
Configuration and Verification
To enable X-Ray for a Lambda function:
1. Enable active tracing in the Lambda console or via CLI: aws lambda update-function-configuration --function-name my-function --tracing-config Mode=Active
2. Ensure the Lambda execution role has permissions: xray:PutTraceSegments, xray:PutTelemetryRecords
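The two permissions in step 2 can be expressed as an IAM policy document. The sketch below builds such a document and checks an existing policy for the required actions. It is a simplified illustration: the function names are hypothetical, and the check ignores `Deny` statements, conditions, and resources, which a real IAM evaluation would consider.

```python
import json
from fnmatch import fnmatchcase

REQUIRED_ACTIONS = ("xray:PutTraceSegments", "xray:PutTelemetryRecords")


def xray_write_policy():
    """Minimal policy document granting the two write actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(REQUIRED_ACTIONS),
                "Resource": "*",
            }
        ],
    }


def missing_xray_actions(policy):
    """Return the required actions that no Allow statement covers."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):        # a single statement is also valid
        statements = [statements]
    allowed = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        allowed.extend(actions)
    # IAM action names are case-insensitive and may contain wildcards.
    return [
        action
        for action in REQUIRED_ACTIONS
        if not any(fnmatchcase(action.lower(), pattern.lower()) for pattern in allowed)
    ]


if __name__ == "__main__":
    print(json.dumps(xray_write_policy(), indent=2))
```

A role whose policy returns a non-empty list from `missing_xray_actions` is a likely cause when tracing is enabled but no traces appear.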
For API Gateway:
Enable X-Ray tracing in the stage settings: aws apigateway update-stage --rest-api-id ... --stage-name prod --patch-operations op=replace,path=/tracingEnabled,value=true
For EC2:
Install the X-Ray daemon.
Configure the daemon with an IAM role or AWS credentials.
Instrument your application with the X-Ray SDK.
To verify tracing:
Send requests to your application.
Go to X-Ray console -> Traces -> View traces for the last 5 minutes.
Use CLI: aws xray get-trace-summaries --start-time ... --end-time ...
How X-Ray Interacts with Related Technologies
CloudWatch: X-Ray integrates with CloudWatch Logs to link trace IDs to log entries. You can set up the X-Ray daemon to emit log groups. CloudWatch alarms can be triggered based on X-Ray metrics (e.g., fault rate).
AWS Distro for OpenTelemetry (ADOT): An alternative to the X-Ray SDK that provides a vendor-agnostic tracing solution. ADOT can send traces to X-Ray.
VPC: If your services are in a VPC, the X-Ray daemon needs internet access or a VPC endpoint (com.amazonaws.region.xray) to send data.
Elastic Beanstalk: You can enable X-Ray in the environment configuration; Beanstalk automatically installs the daemon and instruments the platform.
Step-by-Step Request Flow
Client Request: A user sends a request to an API Gateway endpoint.
API Gateway: If tracing is enabled, API Gateway generates a trace ID and adds it to the X-Amzn-Trace-Id header. It creates a segment and sends it to X-Ray.
Lambda Invocation: API Gateway invokes a Lambda function. Lambda reads the trace header and records its own segments in the same trace, with API Gateway's segment as the parent. The execution environment sends these segments to X-Ray through a daemon that Lambda runs for you.
Downstream Calls: The Lambda function makes an SDK call to DynamoDB. The X-Ray SDK captures this as a subsegment, recording the operation name, database, and duration.
Segment Completion: When the Lambda function finishes, it sends the completed segment to the daemon, which forwards it to X-Ray.
Trace Visualization: In the X-Ray console, you see a trace with all segments and subsegments, showing the entire path and timings.
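The flow above can be modeled as segment documents: JSON objects that share a trace ID, link to each other through `parent_id`, and nest subsegments. The sketch below is illustrative only. The names `my-api/prod` and `my-function` and all timings are made up, and real documents carry more fields (HTTP data, errors, AWS metadata).

```python
import os


def new_id():
    """Return a 16-hex-digit ID, the length used for segment IDs."""
    return os.urandom(8).hex()


def subsegment(name, start, end, namespace=None, children=None):
    """A nested unit of work; it lives inside its parent's document."""
    doc = {"name": name, "id": new_id(), "start_time": start, "end_time": end}
    if namespace:
        doc["namespace"] = namespace
    if children:
        doc["subsegments"] = children
    return doc


def segment(name, trace_id, start, end, parent_id=None, subsegments=None):
    """A top-level record of the work done by one service."""
    doc = {
        "name": name,
        "id": new_id(),
        "trace_id": trace_id,
        "start_time": start,   # epoch seconds, fractional
        "end_time": end,
    }
    if parent_id:
        doc["parent_id"] = parent_id
    if subsegments:
        doc["subsegments"] = subsegments
    return doc


def timeline(doc, depth=0):
    """Flatten a segment tree into (depth, name, duration_ms) rows."""
    duration_ms = round((doc["end_time"] - doc["start_time"]) * 1000)
    rows = [(depth, doc["name"], duration_ms)]
    for child in doc.get("subsegments", []):
        rows.extend(timeline(child, depth + 1))
    return rows


trace_id = "1-5759e988-bd862e3fe1be46a994272793"
gateway = segment("my-api/prod", trace_id, 1000.000, 1000.250)
function = segment(
    "my-function", trace_id, 1000.010, 1000.240,
    parent_id=gateway["id"],
    subsegments=[subsegment("DynamoDB", 1000.050, 1000.200, namespace="aws")],
)

if __name__ == "__main__":
    for doc in (gateway, function):
        for depth, name, ms in timeline(doc):
            print("  " * depth + f"{name}: {ms} ms")
```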
Enable Tracing on Entry Point
The first service that receives the request (e.g., API Gateway, ALB, or your application) must have tracing enabled. For API Gateway, set `tracingEnabled=true` on the stage. This causes API Gateway to generate a trace ID and add the `X-Amzn-Trace-Id` header to the request. API Gateway also creates a segment that records the request's start and end times, HTTP method, path, status code, and latency. This segment is sent to X-Ray asynchronously.
Propagate Trace Header Downstream
When the request moves to the next service (e.g., Lambda), the trace ID must be passed via the `X-Amzn-Trace-Id` header. The X-Ray SDK automatically reads this header from incoming requests and adds it to outgoing HTTP requests. Messaging needs more care: SQS carries the header in the `AWSTraceHeader` message system attribute when the sender uses an instrumented AWS SDK client, while for custom protocols or uninstrumented senders you must pass the trace ID yourself, for example as a message attribute. Without propagation, the trace is broken, and each service appears as a separate trace.
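Propagation comes down to keeping `Root` and `Sampled` unchanged while replacing `Parent` with the ID of the segment making the outgoing call. The following is a hand-rolled sketch of that rule with hypothetical helper names; the X-Ray SDK performs the equivalent step for instrumented HTTP clients.

```python
def parse_trace_header(value):
    """Split 'Root=...;Parent=...;Sampled=...' into a dict."""
    fields = {}
    for part in value.split(";"):
        key, sep, val = part.strip().partition("=")
        if sep:
            fields[key] = val
    return fields


def outgoing_header(incoming, current_segment_id):
    """Build the header for a downstream call made by the current segment."""
    fields = parse_trace_header(incoming)
    if "Root" not in fields:
        raise ValueError("incoming header has no Root trace ID")
    parts = [f"Root={fields['Root']}", f"Parent={current_segment_id}"]
    if "Sampled" in fields:
        parts.append(f"Sampled={fields['Sampled']}")  # keep upstream decision
    return ";".join(parts)


if __name__ == "__main__":
    incoming = "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1"
    print(outgoing_header(incoming, "0123456789abcdef"))
```

If a service drops the header instead, the next hop starts a new `Root`, which is what shows up in the console as two unrelated traces.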
Instrument Service with X-Ray SDK
Each downstream service (Lambda, EC2, ECS) must be instrumented with the X-Ray SDK. The SDK automatically captures incoming requests, outgoing HTTP calls, AWS SDK calls, and database queries as subsegments. For Lambda, you can enable tracing in the function configuration without modifying code; the Lambda runtime automatically sends segments. For EC2, you must install the X-Ray daemon and instrument your code.
Send Segments to X-Ray Daemon
The X-Ray SDK sends segment data over UDP to the X-Ray daemon running on the same instance or container; by default, the daemon listens on UDP port 2000. The daemon buffers and batches segments, then uploads them to the X-Ray API over HTTPS. If the daemon is not running, segments are lost. The daemon can be configured to send data to a different region or use a proxy.
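The SDK-to-daemon hop is a single UDP datagram: a one-line JSON header followed by a newline and the segment document. The sketch below shows that wire format using only the standard library. The function names are hypothetical and the segment is a bare-bones example; normally the SDK emits these datagrams for you.

```python
import json
import socket

# The daemon expects this header line before the segment document.
DAEMON_HEADER = b'{"format": "json", "version": 1}\n'


def encode_for_daemon(segment):
    """Serialize a segment document into the daemon's datagram format."""
    return DAEMON_HEADER + json.dumps(segment).encode("utf-8")


def send_to_daemon(segment, address=("127.0.0.1", 2000)):
    """Send one segment over UDP; returns the number of bytes sent.

    UDP is fire-and-forget: if no daemon is listening, the datagram is
    typically lost without an error reaching the caller.
    """
    payload = encode_for_daemon(segment)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, address)
    return len(payload)


if __name__ == "__main__":
    demo = {
        "name": "demo-service",            # made-up service name
        "id": "70de5b6f19ff9a0a",
        "trace_id": "1-5759e988-bd862e3fe1be46a994272793",
        "start_time": 1465510280.0,
        "end_time": 1465510280.25,
    }
    print(encode_for_daemon(demo).decode("utf-8"))
```

Calling `send_to_daemon(demo)` on a host with the daemon running would hand the segment to it for batching and upload.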
View and Analyze Traces
Once segments are sent to X-Ray, you can view traces in the AWS Management Console. The Trace list shows all traces for a given time period. You can filter by annotations, errors, or latency. Clicking a trace shows a timeline view with each segment and subsegment. The service map shows the relationships between services and average latencies. You can also use the X-Ray API to programmatically retrieve traces.
Enterprise Scenario 1: E-commerce Checkout Flow
A large e-commerce site uses a microservices architecture: API Gateway -> Lambda (order processing) -> DynamoDB (inventory) -> SQS (fulfillment) -> Lambda (shipping). During Black Friday, customers report slow checkout. Using X-Ray, the team identifies that the inventory Lambda call to DynamoDB has a high latency (p99 of 5 seconds) due to throttling. They add DynamoDB auto-scaling and implement caching. X-Ray traces show the improvement. Without X-Ray, they would have to add custom logging to every service.
Enterprise Scenario 2: Serverless Video Processing Pipeline
A media company processes videos using a Step Functions workflow: S3 upload triggers Lambda (transcoding), then DynamoDB (metadata), then SNS (notification). Occasionally, videos fail to process. X-Ray traces show that the transcoding Lambda times out after 15 seconds (the function's configured timeout). The team increases the timeout and adds error handling. X-Ray's service map shows the exact step where failures occur.
Common Misconfiguration Issues
Missing IAM permissions: The Lambda execution role must have xray:PutTraceSegments and xray:PutTelemetryRecords. Without these, segments are silently dropped.
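Daemon not running: On EC2, if the X-Ray daemon is not started, the SDK cannot deliver segments. Because segments travel over UDP, the application may see no send error at all and the data is simply lost, so check that the daemon process is running and look in the SDK logs for failures to reach it.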
Sampling too aggressive: Setting a very low sampling rate (e.g., 1%) may miss intermittent errors. For critical services, use custom sampling rules to trace 100% of requests.
Trace header not propagated: If a message is sent to SQS by a client that is not instrumented for X-Ray, no trace header travels with it unless you add the trace ID to the message attributes yourself. This results in broken traces.
What DVA-C02 Tests on X-Ray
The exam tests your ability to configure and interpret X-Ray traces. Key objective codes: Domain 4 (Troubleshooting and Optimization), Task Statement 4.1 (Assist in a root cause analysis). Expect scenario-based questions where you must choose the correct integration or diagnose a problem.
Common Wrong Answers
Using CloudWatch Logs instead of X-Ray: Many candidates think that enabling detailed CloudWatch Logs will solve tracing. However, CloudWatch Logs do not provide end-to-end tracing across services. X-Ray is specifically designed for this.
Enabling X-Ray only on Lambda: A common mistake is enabling X-Ray only on Lambda but not on API Gateway. Traces will be incomplete because API Gateway does not send segments. You must enable tracing on every service in the path.
Assuming X-Ray works without SDK: Some think that simply enabling X-Ray on Lambda automatically traces all downstream calls. In reality, to capture downstream calls as subsegments, whether AWS SDK calls, custom HTTP calls, or calls to non-AWS services, you must instrument the function code with the X-Ray SDK.
Forgetting IAM permissions: A frequent exam trap: the Lambda function has X-Ray enabled but the execution role lacks xray:PutTraceSegments. The result is no traces. The correct answer is to add the required permissions.
Specific Numbers and Terms
Default sampling: reservoir=1, rate=0.05 (5%).
X-Ray daemon UDP port: 2000.
Trace header format: X-Amzn-Trace-Id: Root=1-...;Parent=...;Sampled=...
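Lambda tracing modes: Active (Lambda makes its own sampling decision and traces sampled invocations even when no trace header arrives) or PassThrough (Lambda traces an invocation only when the incoming trace header marks the request as already sampled).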
API Gateway tracing: enabled at stage level via tracingEnabled.
Edge Cases and Exceptions
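Asynchronous invocations: Trace context does not follow a message through custom or uninstrumented transports automatically. When a Lambda function consumes from SQS (through an event source mapping) or from SNS (through a subscription), make sure the trace header travels with the message, either through the service's X-Ray integration or by passing the trace ID in the message attributes.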
VPC endpoints: If your Lambda function is in a VPC without internet access, you need a VPC endpoint for X-Ray (com.amazonaws.region.xray) to send segments.
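Sampling rules: Custom sampling rules can override the default. The sampling decision is made when a request starts, before its outcome is known, so a rule cannot select only failing requests. To make sure errors on a critical path are captured, match that service or URL path with a rule that sets Rate=1 (100% of requests beyond the reservoir); the Reservoir value, for example 5, then only sets the guaranteed minimum per second.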
How to Eliminate Wrong Answers
If a question asks about tracing across services, eliminate any answer that mentions only CloudWatch Logs or CloudWatch Metrics.
If a question involves missing traces, check if the IAM role has X-Ray permissions or if the daemon is running.
For Lambda, remember that PassThrough mode only traces requests whose incoming trace header is already marked as sampled. Active lets Lambda sample and trace requests on its own.
X-Ray provides distributed tracing across microservices using trace IDs propagated via the X-Amzn-Trace-Id header.
Default sampling is 1 request per second (reservoir) and 5% of additional requests (rate).
X-Ray daemon runs on EC2/ECS and listens on UDP port 2000; not needed for Lambda.
To have Lambda start traces itself rather than rely on an upstream sampling decision, set the tracing mode to Active (not PassThrough); sampling still applies.
API Gateway tracing must be enabled at the stage level via tracingEnabled: true.
IAM permissions required: xray:PutTraceSegments and xray:PutTelemetryRecords.
For VPC-based Lambda, you need a VPC endpoint for X-Ray (com.amazonaws.region.xray).
Annotations are indexed for search; metadata are not indexed.
X-Ray integrates with CloudWatch to link trace IDs to log groups.
ADOT (AWS Distro for OpenTelemetry) is an alternative to X-Ray SDK for vendor-agnostic tracing.
These come up on the exam all the time. Here's how to tell them apart.
AWS X-Ray
Provides end-to-end tracing across services.
Uses trace IDs to correlate requests.
Automatically generates service maps.
Captures segment and subsegment timings.
Integrates with SDKs for automatic instrumentation.
CloudWatch Logs
Stores log events from multiple sources.
No built-in correlation across services.
Requires manual correlation via log queries.
Captures log messages, not request timings.
No automatic instrumentation; you write log statements.
Mistake
X-Ray automatically traces all AWS SDK calls without any configuration.
Correct
X-Ray only traces AWS SDK calls if the SDK clients in your application are instrumented with the X-Ray SDK (or ADOT). For EC2, you must install the daemon and the SDK. Lambda with Active tracing records segments for the invocation itself without code changes, but calls the function makes through the AWS SDK appear as subsegments only when the function code instruments those clients.
Mistake
X-Ray replaces CloudWatch Logs for debugging.
Correct
X-Ray provides tracing, not logging. CloudWatch Logs store log events. They complement each other: you can link trace IDs to log entries for deeper analysis. X-Ray does not store log data.
Mistake
Enabling X-Ray on API Gateway automatically traces all downstream services.
Correct
API Gateway only sends its own segment. Downstream services (Lambda, EC2) must also have X-Ray enabled and propagate the trace header. Without propagation, the trace is broken.
Mistake
X-Ray works out-of-the-box with SQS and SNS without any extra setup.
Correct
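SQS and SNS carry a trace header only when the sender participates in tracing: an X-Ray-instrumented AWS SDK client places the header in the SQS `AWSTraceHeader` message system attribute, and SNS needs active tracing enabled on the topic. The consumer must still be instrumented to continue the trace. With an uninstrumented sender or a custom transport, you must pass the trace ID yourself as a message attribute and have the consumer read it; otherwise, the trace will be lost across the queue.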
Mistake
The X-Ray daemon is only needed for EC2, not for Lambda.
Correct
Lambda does not require a separate daemon because the Lambda runtime includes built-in X-Ray integration. However, for Lambda in a VPC, you need a VPC endpoint for X-Ray, not a daemon. The daemon is for EC2, ECS, and on-premises servers.
You can enable X-Ray tracing by setting the tracing mode to Active in the Lambda function configuration. This can be done via the AWS Console, CLI (`aws lambda update-function-configuration --function-name my-function --tracing-config Mode=Active`), or Infrastructure as Code (e.g., CloudFormation). Ensure the Lambda execution role has the required X-Ray permissions: `xray:PutTraceSegments` and `xray:PutTelemetryRecords`. With Active mode, Lambda automatically sends segments for sampled invocations; to capture downstream AWS SDK and HTTP calls as subsegments, instrument the function code with the X-Ray SDK or ADOT.
Annotations are key-value pairs that are indexed for use with filter expressions. You can search for traces based on annotation values, e.g., `annotation.mykey = "myvalue"`. Metadata are key-value pairs that are not indexed; they are used to store additional context that you don't need to search on. Annotations are useful for filtering traces in the console, while metadata is for carrying extra data that you can view when you open a trace.
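The difference shows up in where each kind of data lives in the segment document. The sketch below uses plain dictionaries and hypothetical helper names, not the SDK API: annotations are a flat map of simple values under `annotations`, while metadata can hold arbitrary JSON grouped by namespace under `metadata`. The key-name and value-type checks reflect the restrictions on annotations as commonly documented; treat them as an approximation.

```python
import re

# Annotation keys: letters, digits, and underscores only (assumed rule).
_ANNOTATION_KEY = re.compile(r"^[A-Za-z0-9_]+$")


def add_annotation(segment, key, value):
    """Store an indexed key-value pair; values must be simple scalars."""
    if not _ANNOTATION_KEY.match(key):
        raise ValueError(f"invalid annotation key: {key!r}")
    if not isinstance(value, (str, int, float, bool)):
        raise TypeError("annotation values must be str, number, or bool")
    segment.setdefault("annotations", {})[key] = value


def add_metadata(segment, key, value, namespace="default"):
    """Store unindexed context; any JSON-serializable value is allowed."""
    segment.setdefault("metadata", {}).setdefault(namespace, {})[key] = value


def annotation_filter(key, value):
    """Build a console filter expression for a string-valued annotation."""
    return f'annotation.{key} = "{value}"'


if __name__ == "__main__":
    seg = {"name": "checkout"}                      # made-up segment
    add_annotation(seg, "customer_tier", "vip")
    add_metadata(seg, "cart", {"items": 3, "skus": ["a1", "b2"]})
    print(seg)
    print(annotation_filter("customer_tier", "vip"))
```

Only the `customer_tier` value could be used to find this trace with a filter expression; the cart contents are visible once the trace is opened.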
Missing traces can be due to: sampling (by default only 1 req/sec + 5% are traced), IAM permissions missing, X-Ray daemon not running (on EC2), or the trace header not being propagated. Check sampling rules, IAM role, and ensure all services in the path have X-Ray enabled. Also verify that the time range in the console matches the request time.
SQS propagates the trace header only when the sender participates in tracing: an X-Ray-instrumented AWS SDK client places it in the `AWSTraceHeader` message system attribute. Otherwise, you must manually add the trace ID as a message attribute when sending the message. On the consumer side, you need to read the attribute and use its trace ID and parent ID for the new segment. The X-Ray SDK provides utilities for this (e.g., `AWSXRay.getSegment()` and `AWSXRay.setSegment()`). Alternatively, use Amazon EventBridge or S3 event notifications which support tracing.
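A minimal sketch of the manual approach: the producer copies the trace header into a message attribute, and the consumer reads it back to recover the trace ID and parent ID. The attribute name `XRayTraceHeader` is a hypothetical choice, and the dictionaries only mimic the `MessageAttributes` shape used by the AWS SDK for Python; no AWS calls are made here.

```python
TRACE_ATTR = "XRayTraceHeader"  # hypothetical attribute name, pick your own


def attach_trace_header(message_attributes, trace_header):
    """Producer side: return a copy of the attributes with the header added."""
    attrs = dict(message_attributes or {})
    attrs[TRACE_ATTR] = {"DataType": "String", "StringValue": trace_header}
    return attrs


def extract_trace_context(message):
    """Consumer side: recover trace context, or None if the header is absent."""
    attr = message.get("MessageAttributes", {}).get(TRACE_ATTR)
    if not attr:
        return None
    fields = {}
    for part in attr["StringValue"].split(";"):
        key, sep, val = part.strip().partition("=")
        if sep:
            fields[key] = val
    return {
        "trace_id": fields.get("Root"),
        "parent_id": fields.get("Parent"),
        "sampled": fields.get("Sampled") == "1",
    }


if __name__ == "__main__":
    header = "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1"
    outgoing = attach_trace_header({}, header)
    received = {"Body": "order-42", "MessageAttributes": outgoing}
    print(extract_trace_context(received))
```

The consumer would then start its segment with the recovered `trace_id` and `parent_id` so that both sides of the queue appear in one trace.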
Yes, you can use X-Ray with on-premises servers by installing the X-Ray daemon on those servers and instrumenting your application with the X-Ray SDK. The daemon must be able to reach the X-Ray API endpoint (via internet or AWS Direct Connect). You also need to configure AWS credentials for the daemon to authenticate.
CloudWatch ServiceLens is a feature that combines traces from X-Ray with metrics and logs from CloudWatch into a single view. It provides a unified interface to visualize service maps and drill down into traces and logs. X-Ray is the underlying tracing engine; ServiceLens is the integrated dashboard. The exam may ask about ServiceLens as a way to correlate X-Ray traces with CloudWatch metrics.
Custom sampling rules can be defined using the X-Ray API or Console. A rule includes: rule name, priority, reservoir size (fixed number of traces per second), rate (percentage of additional traces), and a service name or type filter. For example, to trace 100% of requests to a specific service, set reservoir=5 and rate=1. Rules are evaluated in priority order; the first matching rule is applied.
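To illustrate priority-ordered matching, the sketch below defines two rules and picks the first one that matches a request. The field names mirror the X-Ray sampling rule structure, but the rule names, service name, and path are made up, and the matcher is a simplification: it checks only three fields and uses shell-style wildcards as a stand-in for X-Ray's own wildcard matching.

```python
from fnmatch import fnmatchcase

RULES = [
    {
        "RuleName": "checkout-all",          # hypothetical rule
        "Priority": 10,                      # lower number is evaluated first
        "ReservoirSize": 5,
        "FixedRate": 1.0,                    # 100% beyond the reservoir
        "ServiceName": "checkout-service",
        "ServiceType": "*",
        "Host": "*",
        "HTTPMethod": "*",
        "URLPath": "/checkout/*",
        "ResourceARN": "*",
        "Version": 1,
    },
    {
        "RuleName": "catch-all",             # stands in for the default rule
        "Priority": 10000,
        "ReservoirSize": 1,
        "FixedRate": 0.05,
        "ServiceName": "*",
        "ServiceType": "*",
        "Host": "*",
        "HTTPMethod": "*",
        "URLPath": "*",
        "ResourceARN": "*",
        "Version": 1,
    },
]


def match_rule(rules, service_name, url_path, http_method="GET"):
    """Return the first rule, in ascending priority order, that matches."""
    for rule in sorted(rules, key=lambda r: r["Priority"]):
        if (
            fnmatchcase(service_name, rule["ServiceName"])
            and fnmatchcase(url_path, rule["URLPath"])
            and fnmatchcase(http_method, rule["HTTPMethod"])
        ):
            return rule
    return None


if __name__ == "__main__":
    for service, path in [("checkout-service", "/checkout/pay"),
                          ("catalog-service", "/items/7")]:
        rule = match_rule(RULES, service, path)
        print(service, path, "->", rule["RuleName"], rule["FixedRate"])
```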