An order-processing service consumes messages from an Amazon SQS Standard queue using a custom worker. During traffic spikes, the worker occasionally times out after performing some work but before deleting (acknowledging) the message. When the visibility timeout expires, SQS makes the message visible again and redelivers it, so the same work may be performed twice.
You also observe that a small set of “poison” messages always fail validation.
What change most directly improves resilience by (1) preventing poison messages from retrying indefinitely and (2) avoiding duplicate side effects caused by legitimate retries?
- A
Increase the SQS visibility timeout and, when validation fails, call DeleteMessage in the consumer to remove the message immediately.
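Why wrong: A longer visibility timeout gives a slow worker more time before the message becomes visible again, which can reduce timeout-driven redelivery. However, it does nothing to prevent duplicate side effects once a redelivery does happen, and it is not a poison-message quarantine strategy. Deleting invalid messages immediately discards them, which removes the evidence and prevents systematic handling (for example, inspection or correction) of the poison messages.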
- B
Move to SNS topics with subscriptions and rely on SNS to provide exactly-once delivery to eliminate duplicates automatically.
Why wrong: Standard SNS topics provide at-least-once delivery, not exactly-once delivery. Duplicate deliveries can still occur because of retries and downstream failures, so you still need an idempotency strategy to protect side effects. This option also proposes nothing for quarantining poison messages.
- C
Configure a dead-letter queue (DLQ) with a redrive policy that moves messages after maxReceiveCount, and implement idempotent processing in the consumer using an idempotency key.
Why correct: SQS Standard queues provide at-least-once delivery, so a worker timeout can cause redelivery and duplicate processing. A DLQ with a redrive policy stops poison messages from retrying forever: once a message's receive count exceeds maxReceiveCount, SQS moves it to the DLQ, where it can be inspected and corrected. Idempotent processing (for example, storing a processed marker in a database with a conditional write keyed by an idempotency key) prevents duplicate side effects when valid messages are retried.
- D
Change the queue to FIFO and enable content-based deduplication, leaving the consumer logic unchanged.
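Why wrong: FIFO content-based deduplication only suppresses duplicate sends of the same message body within the 5-minute deduplication interval. It does not stop SQS from redelivering a message that the consumer received but did not delete before the visibility timeout expired, so it cannot prevent duplicate side effects when the consumer times out or fails after partially processing a message. Poison messages still need a DLQ and redrive policy, and idempotency is still required to make processing safe under retries.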
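The combination in option C can be sketched in Python under stated assumptions. The `RedrivePolicy` attribute format (a JSON string with `deadLetterTargetArn` and `maxReceiveCount`) follows the SQS SetQueueAttributes API. However, the ARN, the default `maxReceiveCount` of 5, the `order_id`/`quantity` message fields, the validation rule, and the `IdempotencyStore` class are all hypothetical. The in-memory store only stands in for a shared database with conditional writes (for example, a DynamoDB `PutItem` with an `attribute_not_exists` condition), and it only works within a single process.

```python
import json
import threading


def build_redrive_attributes(dlq_arn, max_receive_count=5):
    """Attributes for SQS SetQueueAttributes that attach a dead-letter queue.

    RedrivePolicy is a JSON string. Once a message's ReceiveCount exceeds
    maxReceiveCount, SQS moves it to the queue named by deadLetterTargetArn.
    The default of 5 is an illustrative choice, not a recommendation.
    """
    if max_receive_count < 1:
        raise ValueError("maxReceiveCount must be at least 1")
    policy = {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": str(max_receive_count)}
    return {"RedrivePolicy": json.dumps(policy)}


class IdempotencyStore:
    """Hypothetical in-memory stand-in for a shared table keyed by idempotency key."""

    def __init__(self):
        self._lock = threading.Lock()
        self._status = {}  # key -> "IN_PROGRESS" or "COMPLETED"

    def try_claim(self, key):
        """Atomically claim key; return None if claimed, else the existing status."""
        with self._lock:
            existing = self._status.get(key)
            if existing is None:
                self._status[key] = "IN_PROGRESS"
            return existing

    def complete(self, key):
        with self._lock:
            self._status[key] = "COMPLETED"

    def release(self, key):
        # Drop a failed claim so a later redelivery can retry the work.
        with self._lock:
            self._status.pop(key, None)


def handle_message(body, store, side_effect):
    """Return True if the caller should DeleteMessage, False to leave the message in the queue."""
    order_id = body.get("order_id")  # hypothetical idempotency key field
    if not order_id or body.get("quantity", 0) <= 0:  # hypothetical validation rule
        # Poison message: do not delete it. After maxReceiveCount receives,
        # the redrive policy moves it to the DLQ for inspection.
        return False
    existing = store.try_claim(order_id)
    if existing == "COMPLETED":
        return True  # duplicate of finished work: delete without repeating the side effect
    if existing == "IN_PROGRESS":
        return False  # another attempt is mid-flight: let this copy become visible again later
    try:
        side_effect(order_id, body["quantity"])  # the side effect that must not repeat
    except Exception:
        store.release(order_id)
        raise
    store.complete(order_id)
    return True


if __name__ == "__main__":
    # Hypothetical ARN; with boto3 this dict would be passed as
    # sqs_client.set_queue_attributes(QueueUrl=queue_url, Attributes=...).
    print(build_redrive_attributes("arn:aws:sqs:us-east-1:123456789012:orders-dlq"))

    charges = []

    def record_charge(order_id, quantity):
        charges.append((order_id, quantity))

    store = IdempotencyStore()
    msg = {"order_id": "o-42", "quantity": 2}
    print(handle_message(msg, store, record_charge))  # True: processed
    print(handle_message(msg, store, record_charge))  # True: redelivered duplicate, side effect skipped
    print(charges)  # [('o-42', 2)]
```

Returning `False` for a poison message leaves it undeleted, so it becomes visible again after the visibility timeout and is moved to the DLQ once its receive count exceeds `maxReceiveCount`. A production store would also expire stale `IN_PROGRESS` claims, so a worker that crashes mid-task does not eventually push a valid message into the DLQ.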