A company is designing a new event-driven architecture on AWS for processing orders. When a new order is placed, the order must be validated, inventory checked, payment processed, and a notification sent. Each step is independent and may take a variable amount of time. The company wants to decouple the steps and ensure that a failure in one step does not block the entire workflow. Which solution should a Solutions Architect recommend?
Step Functions provides orchestration, error handling, and visibility into the workflow.
Why this answer
AWS Step Functions is the correct choice because it provides a fully managed state machine service that can orchestrate multiple Lambda functions, with built-in retries, error handling, and parallel execution. Each step (validation, inventory check, payment, notification) runs as its own task, so the steps stay decoupled. A failure in one step does not stall the whole workflow: each state can declare retries with backoff and can catch errors and route execution to a fallback state.
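As a rough illustration of the retry and fallback behavior described above, the following sketch builds a state machine definition in Amazon States Language as a Python dictionary. The state names, the Lambda ARNs (including the account ID and region), the retry settings, and the HandleFailure state are all hypothetical placeholders, not part of the question. The steps are chained sequentially here for simplicity; a Parallel state would be another reasonable design.

```python
import json

# Hypothetical placeholder ARN prefix (account ID and region are made up).
ARN_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"

# Hypothetical state names, one per step in the order workflow.
STEPS = ["ValidateOrder", "CheckInventory", "ProcessPayment", "SendNotification"]


def task_state(name, next_state):
    """Build one Task state with a retry policy and a catch-all fallback."""
    state = {
        "Type": "Task",
        "Resource": ARN_PREFIX + name,
        # Retry failed task attempts with exponential backoff (assumed values).
        "Retry": [{
            "ErrorEquals": ["States.TaskFailed"],
            "IntervalSeconds": 2,
            "MaxAttempts": 3,
            "BackoffRate": 2.0,
        }],
        # Once retries are exhausted, route any error to the fallback state.
        "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "HandleFailure"}],
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


def build_definition(steps):
    """Chain the steps in order and append a shared failure state."""
    states = {}
    for i, name in enumerate(steps):
        next_state = steps[i + 1] if i + 1 < len(steps) else None
        states[name] = task_state(name, next_state)
    states["HandleFailure"] = {
        "Type": "Fail",
        "Error": "OrderWorkflowFailed",
        "Cause": "A step failed after exhausting its retries",
    }
    return {
        "Comment": "Order processing sketch",
        "StartAt": steps[0],
        "States": states,
    }


definition = build_definition(STEPS)
print(json.dumps(definition, indent=2))
```

The printed JSON is the kind of document that could be supplied as the definition when creating a state machine, for example through the console or the CreateStateMachine API. Because each Task state carries its own Retry and Catch fields, a transient failure in ProcessPayment is retried on its own without re-running ValidateOrder or CheckInventory.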
Exam trap
The trap here is that candidates often equate decoupling with simple fan-out (SNS) or queue-based processing (SQS). Both decouple producers from consumers, but neither coordinates a multi-step workflow. The question also requires orchestration: sequential or parallel coordination of the steps, plus per-step error handling, which Step Functions provides out of the box.
How to eliminate wrong answers
Option A is wrong because using separate SQS queues, with a Lambda function consuming each queue, introduces unnecessary complexity and latency and gives no native orchestration of sequential or parallel steps. SQS redelivers unprocessed messages and supports dead-letter queues, but only per queue; tracking an order across steps, enforcing step order, and handling workflow-level failures all require custom code.
Option B is wrong because Amazon SNS is a pub/sub messaging service that fans out each event to all subscribers at once. It cannot enforce an order between steps, and although it retries message delivery to each subscriber, it has no workflow-level error handling: nothing tracks whether a step's processing succeeded or routes a failed order to a recovery path.
Option D is wrong because a single Lambda function performing all steps sequentially creates a monolithic design that violates the decoupling requirement. Any failure in one step blocks the entire process, with no per-step retry or error isolation.