DVA-C02 | Chapter 26 of 101 | Objective 1.5

SQS Dead-Letter Queues and Visibility Timeout

This chapter covers two critical mechanisms for building resilient message processing on Amazon SQS: dead-letter queues (DLQs) and visibility timeout. These features prevent message loss and handle failures gracefully, which is essential for decoupled microservices architectures. On the DVA-C02 exam, questions about SQS DLQ and visibility timeout appear in roughly 8-12% of questions, often combined with Lambda triggers, SNS, and Step Functions. You must know the exact default values, maximum retention periods, and how these interact with consumer retries.

Updated May 31, 2026

The Hospital ER Triage and Morgue

Imagine a hospital emergency room (ER) with a triage nurse and multiple treatment rooms. Patients (messages) arrive and are assessed by the triage nurse, who assigns each a severity level and places them in a waiting room (SQS queue). The doctor (consumer) calls in patients one at a time from the waiting room. When a patient is called, the triage nurse sets a timer (visibility timeout) — the doctor has that much time to treat the patient. If the doctor finishes successfully, the patient is discharged (message deleted). But if the timer expires before treatment is complete, the patient is sent back to the waiting room for another attempt. However, some patients are critically ill and cannot be treated successfully even after multiple attempts. To avoid clogging the ER, the triage nurse has a standing order: after 3 failed attempts, that patient is wheeled to the morgue (dead-letter queue). The morgue is a separate room that stores patients who cannot be saved. The ER staff can later examine the morgue records to understand what went wrong and improve procedures. The key mechanic: the timer (visibility timeout) controls how long a patient is out of the waiting room; if it expires, the patient reappears. The dead-letter queue is a safety valve that removes poison pills — messages that keep failing — so they don't block healthy patients from being treated.

How It Actually Works

What Are Dead-Letter Queues and Visibility Timeout?

Amazon Simple Queue Service (SQS) is a fully managed message queuing service that enables decoupling of application components. Two essential features for handling failures are the visibility timeout and dead-letter queues (DLQ).

Visibility Timeout: When a consumer receives a message from a queue, the message becomes invisible to other consumers for a configurable period. This prevents multiple consumers from processing the same message simultaneously. If the consumer processes and deletes the message within the timeout, the message is gone. If not, after the timeout expires, the message becomes visible again and can be received by another consumer. The default visibility timeout is 30 seconds, with a minimum of 0 seconds and a maximum of 12 hours.

Dead-Letter Queue: A DLQ is a separate SQS queue that other queues (source queues) can target for messages that have been received repeatedly but never successfully deleted. The source queue carries a redrive policy that specifies the DLQ ARN (deadLetterTargetArn) and a maximum receive threshold (maxReceiveCount, 1 to 1000). Once a message's receive count exceeds that threshold, SQS moves it to the DLQ. This prevents poison-pill messages from being retried on the source queue indefinitely.

How Visibility Timeout Works Internally

When a consumer calls ReceiveMessage, SQS marks the message as "in flight" and starts a timer equal to the visibility timeout. While the message is in flight:

It is not returned by subsequent ReceiveMessage calls.

The consumer can call DeleteMessage to remove it permanently, or ChangeMessageVisibility to extend the timeout.

If the timeout expires without deletion, the message becomes visible again immediately.

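The queue-level visibility timeout is the default applied to every receive, but each in-flight message runs its own timer. A consumer can change the timeout of an individual message using ChangeMessageVisibility (with a new timeout value), which is useful when it knows it needs more time. The new value counts from the moment of the call, and a message cannot stay in flight for more than the 12-hour maximum measured from when it was first received.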

Important: If a consumer receives a message and then crashes without deleting it, the message will reappear after the timeout. This is a built-in retry mechanism.
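The in-flight behaviour described above can be sketched with a small in-memory model. This is a toy for building intuition, not the SQS API: `ToyQueue`, its methods, and the explicit `now` clock argument are all hypothetical names chosen for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Message:
    body: str
    invisible_until: float = 0.0  # clock time at which the message is visible again
    receive_count: int = 0


class ToyQueue:
    """In-memory stand-in for a queue with a visibility timeout (not the SQS API)."""

    def __init__(self, visibility_timeout=30.0):
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # message id -> Message
        self._next_id = 0

    def send(self, body):
        self._next_id += 1
        self._messages[self._next_id] = Message(body)
        return self._next_id

    def receive(self, now):
        """Return (id, body) of the first visible message and hide it, or None."""
        for msg_id, msg in self._messages.items():
            if msg.invisible_until <= now:
                msg.invisible_until = now + self.visibility_timeout
                msg.receive_count += 1
                return msg_id, msg.body
        return None

    def change_visibility(self, msg_id, timeout, now):
        """Restart the message's timer from `now` with a new timeout."""
        self._messages[msg_id].invisible_until = now + timeout

    def delete(self, msg_id):
        self._messages.pop(msg_id, None)


queue = ToyQueue(visibility_timeout=30)
order_id = queue.send("order-1")
first = queue.receive(now=0)     # delivered; hidden until t=30
hidden = queue.receive(now=10)   # None: the message is still in flight
again = queue.receive(now=30)    # timer expired without a delete, so it is redelivered
```

Passing the clock in explicitly keeps the model deterministic; a crashed consumer is simply one that never calls `delete`.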

Dead-Letter Queue Mechanics

A DLQ is just a regular SQS queue designated as a dead-letter target. The redrive policy is a JSON object attached to the source queue:

{
  "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:MyDLQ",
  "maxReceiveCount": 5
}

Each time a message is received from the source queue, SQS increments that message's receive count (exposed as the ApproximateReceiveCount attribute). When the count exceeds maxReceiveCount, SQS moves the message to the DLQ, removing it from the source queue. The original message body, message attributes, and metadata are preserved.

Redrive Policy Limits:

- maxReceiveCount: 1 to 1000. There is no default; you must configure it explicitly.
- You can configure a redrive allow policy to restrict which source queues can use a given DLQ.

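DLQ Retention: The DLQ has its own MessageRetentionPeriod (default 4 days, max 14 days), after which messages are deleted automatically. For standard queues, expiry is measured from the message's original enqueue timestamp, so time already spent in the source queue counts against the DLQ's retention; for FIFO queues the timestamp resets when the message is moved. For that reason, set the DLQ's retention longer than the source queue's. You can process messages from the DLQ manually or via automation to analyze failures.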

Configuration and Verification

AWS Management Console:

1. Create two queues: the source queue and the DLQ.
2. On the source queue, go to the Dead-Letter Queue tab, select the DLQ ARN, and set max receives.
3. Optionally, set the redrive allow policy on the DLQ to restrict which source queues can send to it.

AWS CLI: Create the DLQ first, because the source queue's redrive policy must reference an existing queue:

aws sqs create-queue --queue-name MyDLQ --attributes '{"MessageRetentionPeriod":"1209600"}'

Then create the source queue with a redrive policy that points at the DLQ's ARN:

aws sqs create-queue --queue-name MySourceQueue --attributes '{"RedrivePolicy":"{\"deadLetterTargetArn\":\"arn:aws:sqs:us-east-1:123456789012:MyDLQ\",\"maxReceiveCount\":5}","VisibilityTimeout":"60"}'

To verify current configuration:

aws sqs get-queue-attributes --queue-url <source-queue-url> --attribute-names All

Look for RedrivePolicy and VisibilityTimeout in the output.
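The same configuration can be applied from code. The sketch below assumes `sqs` is a boto3 SQS client (`boto3.client("sqs")`) passed in by the caller; `build_redrive_policy` and `attach_dlq` are hypothetical helper names. Note that RedrivePolicy is itself a JSON document encoded as a string, which is why the CLI example needs escaped quotes.

```python
import json


def build_redrive_policy(dlq_arn, max_receive_count):
    """Return the RedrivePolicy attribute value: a JSON document encoded as a string."""
    if not 1 <= max_receive_count <= 1000:
        raise ValueError("maxReceiveCount must be between 1 and 1000")
    return json.dumps(
        {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count}
    )


def attach_dlq(sqs, source_queue_url, dlq_arn, max_receive_count=5, visibility_timeout=60):
    """Attach a DLQ to an existing source queue.

    `sqs` is assumed to be a boto3 SQS client; attribute values must be strings.
    """
    sqs.set_queue_attributes(
        QueueUrl=source_queue_url,
        Attributes={
            "RedrivePolicy": build_redrive_policy(dlq_arn, max_receive_count),
            "VisibilityTimeout": str(visibility_timeout),
        },
    )
```

Taking the client as a parameter keeps the helper easy to exercise with a stub in unit tests, without touching a real queue.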

Interactions with Related Technologies

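AWS Lambda: When Lambda is configured as a consumer through an event source mapping, the Lambda service polls the queue, invokes the function with a batch of messages, and deletes the messages when the invocation succeeds. The visibility timeout is still the queue's own setting; Lambda does not manage it. If the function fails (throws an error or times out), Lambda does not delete the batch, and the messages become visible again after the visibility timeout. With a redrive policy on the source queue, a message whose receive count exceeds maxReceiveCount goes to the DLQ. An SQS event source mapping has no retry-count setting of its own: MaximumRetryAttempts (0-2) applies to asynchronous invocations, and a function-level dead-letter queue is likewise for asynchronous invocations only, so retries here are governed entirely by the visibility timeout and maxReceiveCount.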

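Amazon SNS: SNS can fan out messages to subscribed SQS queues. The redrive policy and visibility timeout discussed here belong to the SQS queue, not to SNS: once a message lands in the queue, consumers that repeatedly fail to process it trigger the queue's DLQ mechanism as usual. (An SNS subscription can have its own DLQ, but that one captures messages SNS could not deliver to the subscribed endpoint.)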

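AWS Step Functions: Step Functions can send messages to SQS through its service integration, but it does not poll queues itself. When a queue consumer starts an execution and waits for the result before deleting the message, the visibility timeout must be longer than the execution time to avoid the message reappearing mid-processing.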
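For the Lambda integration described above, a handler can limit retries to the messages that actually failed by returning a partial batch response. The sketch below assumes the event source mapping has ReportBatchItemFailures enabled; `process` is a hypothetical stand-in for real business logic, and the `order_id` field is invented for the example.

```python
import json


def process(payload):
    """Hypothetical business logic; raises on input it cannot handle."""
    if "order_id" not in payload:
        raise ValueError("missing order_id")


def handler(event, context):
    """Process an SQS batch and report only the failed messages back to Lambda."""
    failures = []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Only this message returns to the queue after the visibility timeout;
            # the rest of the batch is deleted by Lambda.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Without a partial batch response, one bad record fails the whole invocation and every message in the batch is redelivered, which pushes healthy messages toward maxReceiveCount as well.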

Best Practices and Defaults

Visibility Timeout: Set it to a value that comfortably exceeds the typical processing time. For Lambda, AWS recommends at least 6 times the function timeout, plus the event source mapping's batching window, so that Lambda has time to retry a batch if an invocation is throttled.

Max Receive Count: Typically set to 3-5. Too low and transient failures cause unnecessary DLQ entries; too high and poison pills linger.

DLQ Retention: Set to maximum (14 days) to allow time for analysis.

Monitoring: Use CloudWatch metrics like ApproximateNumberOfMessagesNotVisible (in-flight) and NumberOfMessagesReceived to tune visibility timeout.
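The six-times guideline for Lambda-backed queues is simple arithmetic, sketched below. The function name is hypothetical; the optional `batch_window_s` term reflects the event source mapping's MaximumBatchingWindowInSeconds setting, which AWS guidance adds on top of the multiple.

```python
def recommended_visibility_timeout(function_timeout_s, batch_window_s=0):
    """Return a minimum queue visibility timeout in seconds for a Lambda consumer.

    Follows the guideline of 6 x the function timeout, plus the batching window.
    Treat the result as a floor, not a tuned value.
    """
    return 6 * function_timeout_s + batch_window_s


minimum = recommended_visibility_timeout(30)                           # 6 x 30 s
with_window = recommended_visibility_timeout(30, batch_window_s=20)    # adds the 20 s window
```

For a 30-second function this gives 180 seconds, so a queue left at the 30-second default would be well under the guideline.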

Edge Cases and Exam Traps

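Visibility timeout expiration during batch processing: If a consumer receives a batch of messages (up to 10) and the visibility timeout expires before it has processed them all, the unprocessed messages become visible again. Extend the timeout for each remaining message with ChangeMessageVisibility (or ChangeMessageVisibilityBatch).

DLQ in FIFO queues: DLQs are supported for FIFO queues, and the DLQ must also be FIFO. The receive count is tracked per message, not per message group ID. While a message keeps failing, later messages in the same group wait behind it; once it is moved to the DLQ, those later messages are processed without it, so using a DLQ can break strict ordering for that group.

Redrive allow policy: By default (redrivePermission set to allowAll), any source queue in the same account and Region can name a queue as its DLQ. To restrict this, set a redrive allow policy on the DLQ using byQueue with up to 10 source queue ARNs, or denyAll.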

Message attributes and DLQ: All message attributes (including message deduplication ID for FIFO) are preserved when moved to DLQ.
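The redrive allow policy mentioned in the list above is, like the redrive policy, a JSON document stored as a string attribute (RedriveAllowPolicy) on the DLQ. A minimal sketch of building the byQueue form follows; the helper name and the example ARN are hypothetical.

```python
import json


def build_redrive_allow_policy(source_queue_arns):
    """Return a RedriveAllowPolicy value that admits only the listed source queues."""
    arns = list(source_queue_arns)
    if not 1 <= len(arns) <= 10:
        raise ValueError("byQueue accepts between 1 and 10 source queue ARNs")
    return json.dumps({"redrivePermission": "byQueue", "sourceQueueArns": arns})


policy = build_redrive_allow_policy(
    ["arn:aws:sqs:us-east-1:123456789012:MySourceQueue"]  # example ARN
)
```

The resulting string would be set on the DLQ itself, not on the source queue, for example through `set_queue_attributes` with the attribute name RedriveAllowPolicy.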

Summary of Key Numbers

| Parameter | Default | Min | Max |
|-----------|---------|-----|-----|
| Visibility Timeout | 30 seconds | 0 seconds | 12 hours |
| Max Receive Count | Not set (must configure) | 1 | 1000 |
| DLQ Message Retention | 4 days | 1 minute | 14 days |
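As a memory aid, the ranges in the table can be written as a small pre-flight check. This is an illustrative sketch: `LIMITS` and `out_of_range` are hypothetical names, and all durations are expressed in seconds, as the SQS attributes expect.

```python
# Documented ranges from the table above, in seconds where applicable.
LIMITS = {
    "VisibilityTimeout": (0, 12 * 60 * 60),              # 0 s to 12 hours
    "MessageRetentionPeriod": (60, 14 * 24 * 60 * 60),   # 1 minute to 14 days
    "maxReceiveCount": (1, 1000),
}


def out_of_range(settings):
    """Return the names of settings whose values fall outside the documented ranges."""
    return [
        name
        for name, value in settings.items()
        if name in LIMITS and not LIMITS[name][0] <= value <= LIMITS[name][1]
    ]


ok = out_of_range(
    {"VisibilityTimeout": 43200, "MessageRetentionPeriod": 1209600, "maxReceiveCount": 5}
)
bad = out_of_range({"VisibilityTimeout": 43201, "maxReceiveCount": 0})
```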

Verification Commands

To view redrive policy:

aws sqs get-queue-attributes --queue-url <url> --attribute-names RedrivePolicy

To change visibility timeout:

aws sqs set-queue-attributes --queue-url <url> --attributes VisibilityTimeout=120

Walk-Through

1

Producer sends message to source queue

An application or service calls `SendMessage` on the source SQS queue. The message is stored redundantly across multiple Availability Zones. The queue assigns a unique message ID and MD5 digest of the body. The message remains in the queue until a consumer retrieves it or the retention period expires (default 4 days). At this point, the message's receive count is 0.

2

Consumer retrieves message; visibility timer starts

A consumer (e.g., EC2 instance, Lambda function) calls `ReceiveMessage`. SQS returns the message and starts the visibility timeout timer. The message becomes invisible to other consumers. The consumer must process the message and call `DeleteMessage` before the timer expires. If the consumer fails to do so, the message becomes visible again after the timeout.

3

Consumer processes message and deletes it (success)

The consumer successfully processes the message, perhaps by writing to a database or calling an external API. It then calls `DeleteMessage` with the receipt handle. SQS deletes the message from the queue. The message's lifecycle ends. The receive count is not incremented beyond 1 because deletion occurred before the next receive.

4

Consumer fails or timeout expires (failure)

If the consumer crashes, throws an exception, or otherwise fails to delete the message before the visibility timeout expires, the message becomes visible again. Another consumer can now receive it, and that receive increments the receive count. This retry cycle continues until the message is deleted, its receive count exceeds the max receive count, or the retention period expires.

5

Max receive count exceeded; message redriven to DLQ

When the receive count exceeds the `maxReceiveCount` configured in the redrive policy, SQS automatically moves the message to the dead-letter queue (DLQ), removing it from the source queue. The message retains its original body and attributes. It then stays in the DLQ until the DLQ's retention period runs out; for standard queues that clock is measured from the original enqueue timestamp, not from the move. The source queue is no longer burdened by this poison-pill message.
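The five steps can be traced for a single message with a few lines of code. This is a simplified model, not SQS itself: it assumes the move to the DLQ happens once the receive count would exceed `maxReceiveCount`, which is how the SQS documentation describes the threshold. The function name and the `succeeds_on` parameter are hypothetical.

```python
def trace_message(max_receive_count, succeeds_on=None):
    """Trace receive attempts for one message.

    succeeds_on: 1-based delivery on which the consumer deletes the message,
    or None if processing never succeeds (a poison pill).
    Returns (outcome, number_of_deliveries_to_consumers).
    """
    receive_count = 0
    while True:
        receive_count += 1
        if receive_count > max_receive_count:
            # Threshold exceeded: the message goes to the DLQ instead of a consumer.
            return "moved_to_dlq", receive_count - 1
        if succeeds_on == receive_count:
            # Step 3: the consumer deletes the message within the timeout.
            return "deleted", receive_count
        # Step 4: no delete before the timeout, so the message becomes visible again.


poison = trace_message(max_receive_count=3)
retried = trace_message(max_receive_count=3, succeeds_on=2)
```

With a threshold of 3, a poison pill is delivered to consumers three times before it is moved, while a message that succeeds on its second delivery never reaches the DLQ.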

What This Looks Like on the Job

Enterprise Scenario 1: E-commerce Order Processing

A large e-commerce platform uses SQS to decouple order placement from inventory reservation and payment processing. Orders are placed in a standard queue, and a fleet of EC2 instances consume messages. The visibility timeout is set to 5 minutes because processing typically takes 2-3 minutes. However, occasional network blips cause processing to exceed 5 minutes, leading to duplicate processing. The team tunes the visibility timeout to 10 minutes and implements idempotency keys. They also configure a DLQ with maxReceiveCount of 3 to catch orders that fail repeatedly due to invalid product IDs. The DLQ is monitored via CloudWatch alarms. Misconfiguration: initially they set maxReceiveCount to 1, causing transient failures to flood the DLQ. They corrected it to 3 and added a Lambda function that analyzes DLQ messages and sends alerts.

Enterprise Scenario 2: Serverless Image Processing Pipeline

A media company uses S3 event notifications to send object creation events to an SQS queue, which triggers a Lambda function that resizes images. The Lambda timeout is 30 seconds, and the visibility timeout is set to 6 minutes (AWS recommends at least six times the function timeout, here 180 seconds; 6 minutes adds a safety margin). For images that fail due to corruption, the Lambda returns an error. After 3 failures (maxReceiveCount=3), the message moves to a DLQ. A separate Lambda processes the DLQ daily, logs the S3 key, and sends a notification to the operations team. The team learned the hard way that setting the visibility timeout too low (e.g., 60 seconds) caused messages to reappear before Lambda had finished with them, leading to duplicate processing and wasted compute.

Enterprise Scenario 3: Financial Transaction Reconciliation

A bank uses FIFO SQS to ensure exactly-once processing of transaction records. The visibility timeout is set to 30 seconds, but some transactions require manual review and take longer. The consumer uses ChangeMessageVisibility to extend the timeout dynamically. The DLQ is configured with maxReceiveCount of 5. If a transaction fails 5 times, it goes to the DLQ for manual investigation. The bank also uses the redrive allow policy to restrict which source queues can send to the DLQ, preventing accidental cross-contamination. Performance consideration: FIFO queues have limited throughput by default (300 TPS without batching, 3000 with batching; high-throughput mode raises this). The DLQ does not reduce source queue throughput, but a failing message holds up the rest of its message group until it is deleted or moved to the DLQ.

How DVA-C02 Actually Tests This

DVA-C02 Exam Focus on SQS DLQ and Visibility Timeout

The DVA-C02 exam tests your understanding of how these features work together, especially in serverless architectures with Lambda. The relevant objectives sit in Domain 1 (Development with AWS Services), with emphasis on designing decoupled systems. Expect 2-4 questions that involve SQS configuration.

Common Wrong Answers and Why Candidates Choose Them

1.

Setting visibility timeout to 0 to make messages immediately available: Candidates think this improves throughput. Reality: with a timeout of 0 a received message is never hidden, so other consumers can receive it while it is still being processed, leading to duplicate processing and inflated receive counts. The exam expects you to know that the visibility timeout should exceed the processing time.

2.

Assuming DLQ automatically deletes messages after processing: Candidates confuse DLQ with a backup queue. The DLQ does not process messages; it stores them for manual or automated analysis. Messages in DLQ are not automatically re-driven to the source queue.

3.

Thinking maxReceiveCount applies to the DLQ: The maxReceiveCount is on the source queue, not the DLQ. The DLQ has its own retention period.

4.

Believing visibility timeout can be changed only at queue level: You can change it per message via ChangeMessageVisibility. The exam may ask how to handle a long-running process.

Specific Numbers and Terms That Appear on Exam

Default visibility timeout: 30 seconds

Maximum visibility timeout: 12 hours

Default message retention: 4 days

Max message retention: 14 days

Max receive count range: 1 to 1000

DLQ must be of the same type (standard or FIFO) as the source queue, and in the same AWS account and Region

Redrive policy JSON keys: deadLetterTargetArn, maxReceiveCount

To move messages back from DLQ to source: use the SQS console's DLQ redrive action (available since late 2021) or the StartMessageMoveTask API, exposed in the AWS CLI as start-message-move-task (added in 2023)

Edge Cases and Exceptions

FIFO queues require DLQ to also be FIFO. The message group ID is preserved.

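Lambda event source mapping: An SQS event source mapping has no MaximumRetryAttempts setting (that setting belongs to asynchronous invocation). Every failed invocation leaves the messages in the queue, and each redelivery to Lambda counts as another receive. With maxReceiveCount set to 3, a poison message reaches the function at most 3 times before SQS moves it to the DLQ. Because Lambda's poller can receive a message and then fail to invoke the function (for example, when throttled), AWS recommends a maxReceiveCount of at least 5 for Lambda-backed queues.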

Batch processing: If you receive a batch of 10 messages and delete only 5, the remaining 5 become visible after the timeout. You must manage this carefully.
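One way to manage a partially successful batch is to delete only the messages that were processed and leave the rest to reappear. The sketch below assumes `sqs` is a boto3 SQS client and `messages` is the `Messages` list returned by `receive_message`; `delete_processed` and the `process` callback are hypothetical names.

```python
def delete_processed(sqs, queue_url, messages, process):
    """Process a received batch and delete only the messages that succeeded.

    Returns the batch entry Ids that SQS reported as failed to delete.
    """
    entries = []
    for index, message in enumerate(messages):
        try:
            process(message["Body"])
        except Exception:
            # Leave the message alone: it becomes visible again after the timeout
            # and its receive count moves toward maxReceiveCount.
            continue
        entries.append({"Id": str(index), "ReceiptHandle": message["ReceiptHandle"]})

    if not entries:
        return []
    response = sqs.delete_message_batch(QueueUrl=queue_url, Entries=entries)
    return [item["Id"] for item in response.get("Failed", [])]
```

Checking the `Failed` list matters because a batch delete can partially fail even when the call itself returns successfully.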

How to Eliminate Wrong Answers

If a question asks about preventing duplicate processing, think about visibility timeout and idempotency keys—not DLQ.

If a question asks about handling poison pills, think about DLQ with maxReceiveCount.

If a question mentions messages reappearing, suspect visibility timeout expiration.

Always check whether the scenario uses standard or FIFO queues—FIFO requires DLQ to be FIFO.

Watch for trick: "A message is received but not deleted. What happens?" The message becomes visible after visibility timeout, not immediately.

Key Takeaways

Visibility timeout default is 30 seconds; set it to exceed maximum processing time to avoid premature message reappearance.

Dead-letter queues require explicit configuration: set redrive policy with deadLetterTargetArn and maxReceiveCount (1-1000).

DLQ and source queue must be the same type: both standard or both FIFO.

Messages in DLQ are preserved with original attributes and body, including message group ID for FIFO.

Use ChangeMessageVisibility to extend the timeout for long-running processing, not to reset the receive count.

With a Lambda event source mapping, retries are governed by the queue's visibility timeout and maxReceiveCount; Lambda's MaximumRetryAttempts applies only to asynchronous invocations.

Monitor ApproximateNumberOfMessagesNotVisible CloudWatch metric to tune visibility timeout.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Visibility Timeout

Controls how long a message is invisible after being received

Default 30 seconds, max 12 hours

Can be changed per message via ChangeMessageVisibility

Prevents multiple consumers from processing the same message concurrently

Messages reappear after timeout if not deleted

Dead-Letter Queue

A separate queue that stores messages that failed repeatedly

Configured via redrive policy with maxReceiveCount (1-1000)

Does not process messages; stores them for analysis

Helps isolate poison-pill messages from the source queue

Messages can be manually redriven back to source queue

Watch Out for These

Mistake

The dead-letter queue automatically retries processing failed messages.

Correct

The DLQ is a passive storage queue. It does not automatically reprocess messages. You must manually or programmatically move messages back to the source queue or process them directly from the DLQ.

Mistake

Setting visibility timeout to 0 seconds improves performance by making messages immediately available.

Correct

A visibility timeout of 0 means messages are always visible, which can cause multiple consumers to process the same message simultaneously, leading to duplicate processing and potential data corruption. It should be set to a value greater than the expected processing time.

Mistake

The maxReceiveCount is applied to the DLQ, so the DLQ will move messages to another queue after that many receives.

Correct

The maxReceiveCount is a property of the source queue's redrive policy. It determines how many times a message can be received from the source queue before being moved to the DLQ. The DLQ itself has no redrive policy unless you configure one.

Mistake

You can use the same DLQ for both standard and FIFO source queues.

Correct

The DLQ must be of the same type as the source queue. A standard queue can only use a standard DLQ, and a FIFO queue can only use a FIFO DLQ. This ensures message ordering and deduplication properties are preserved.

Mistake

Messages in a DLQ are automatically deleted after the source queue's retention period.

Correct

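The DLQ has its own message retention period, independent of the source queue. By default, it is 4 days but can be configured up to 14 days. Messages are deleted from the DLQ when that period expires, regardless of the source queue's settings. For standard queues the period is measured from the message's original enqueue timestamp, so give the DLQ a longer retention period than the source queue.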


Frequently Asked Questions

What happens if a message is received but not deleted before the visibility timeout expires?

The message becomes visible again in the queue and can be received by another consumer. Each receive increments the receive count. Once that count exceeds the max receive count, the message is moved to the dead-letter queue instead of being delivered again. This is the built-in retry mechanism of SQS.

Can I change the visibility timeout after a message has been received?

Yes, you can call ChangeMessageVisibility with the receipt handle and a new timeout value. This is useful if a consumer needs more time to process a message. The new timeout starts from the moment the call is made.
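A common way to apply this is a heartbeat that keeps extending the timeout while work is in progress. The sketch below assumes `sqs` is a boto3 SQS client; `process_with_heartbeat`, `work`, and the default intervals are hypothetical choices for illustration.

```python
import threading


def process_with_heartbeat(sqs, queue_url, receipt_handle, work, extend_by=60, interval=30):
    """Run work() while periodically extending the message's visibility timeout.

    The message is deleted only if work() returns without raising.
    """
    done = threading.Event()

    def heartbeat():
        # Event.wait returns False on timeout and True once `done` is set.
        while not done.wait(interval):
            sqs.change_message_visibility(
                QueueUrl=queue_url,
                ReceiptHandle=receipt_handle,
                VisibilityTimeout=extend_by,
            )

    thread = threading.Thread(target=heartbeat, daemon=True)
    thread.start()
    try:
        result = work()
    finally:
        done.set()
        thread.join()

    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=receipt_handle)
    return result
```

Keeping `interval` shorter than `extend_by` leaves headroom for a slow or failed heartbeat call. If `work()` raises, the heartbeat stops and the message reappears once the last extension expires.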

How do I move messages from the dead-letter queue back to the source queue?

In the SQS console, open the DLQ and start a DLQ redrive; messages go back to their source queue by default, or to a custom destination queue that you specify. Alternatively, use the AWS CLI command start-message-move-task. This is available for both standard and FIFO queues.
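The programmatic equivalent is the StartMessageMoveTask API. A minimal sketch, assuming `sqs` is a boto3 SQS client; `redrive_dlq` is a hypothetical wrapper name. When no destination is given, SQS sends the messages back to the queues they originally came from.

```python
def redrive_dlq(sqs, dlq_arn, destination_arn=None, max_per_second=None):
    """Start moving messages out of a DLQ and return the task handle."""
    params = {"SourceArn": dlq_arn}
    if destination_arn is not None:
        # Omit DestinationArn to redrive to the original source queue(s).
        params["DestinationArn"] = destination_arn
    if max_per_second is not None:
        # Optional throttle so the redrive does not overwhelm consumers.
        params["MaxNumberOfMessagesPerSecond"] = max_per_second
    return sqs.start_message_move_task(**params)["TaskHandle"]
```

Fix the underlying defect before redriving; otherwise the same messages will fail again and return to the DLQ.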

Does the dead-letter queue count against the source queue's retention period?

No. The DLQ has its own message retention period (default 4 days, max 14 days). Messages are deleted from the DLQ based on that period, not the source queue's retention.

Can I set a dead-letter queue on an existing SQS queue?

Yes. You can update the queue attributes to add a RedrivePolicy at any time. However, the policy only applies to messages received after the policy is set. Existing in-flight messages are not affected.

What is the difference between maxReceiveCount and Lambda's MaximumRetryAttempts?

maxReceiveCount is part of the SQS source queue's redrive policy and determines how many times a message can be received from the queue before being sent to the DLQ. Lambda's MaximumRetryAttempts (0-2) belongs to asynchronous invocation: it is the number of times Lambda retries a failed asynchronous event before discarding it or sending it to the function's DLQ or on-failure destination (if configured). It does not apply when Lambda polls an SQS queue through an event source mapping. In that case a failed batch simply returns to the queue after the visibility timeout, each redelivery increments the receive count, and once maxReceiveCount is exceeded the message goes to the SQS DLQ.
