A company is designing a data lake solution on Amazon S3. Data is ingested from multiple sources and stored in a raw bucket. The data must be processed and transformed before being moved to a curated bucket. The processing logic is complex and includes conditional transformations. Which service should be used to orchestrate the transformation pipeline?
Step Functions can orchestrate complex workflows with conditional branching.
Why this answer
AWS Step Functions is the correct choice because it is designed to orchestrate complex, multi-step workflows. Conditional branching (Choice states), retries, and error handling are declared directly in the workflow definition. It can coordinate AWS Lambda functions, AWS Glue jobs, and other services that process and transform data on its way from the raw S3 bucket to the curated bucket. That makes it a good fit for a transformation pipeline with complex logic.
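The following sketch makes the orchestration idea concrete. It is one possible Amazon States Language (ASL) definition, built as a Python dict and printed as JSON, under stated assumptions:

- A Lambda task classifies the incoming raw object.
- A Choice state branches on the returned format.
- Each branch runs a Glue job through the `glue:startJobRun.sync` integration, which waits for the job to finish.
- Every task carries a Retry policy and a Catch that routes failures to a Fail state.

The job names, the function name, and the `format` field returned by the classifier are all hypothetical placeholders, not part of the question.

```python
import json

# Hypothetical resource names; substitute your own Glue jobs and Lambda function.
CSV_JOB = "raw-to-curated-csv"         # hypothetical Glue job name
JSON_JOB = "raw-to-curated-json"       # hypothetical Glue job name
CLASSIFIER_FN = "classify-raw-object"  # hypothetical Lambda function name

# Retry any task error up to 3 times with exponential backoff (10s, 20s, 40s).
RETRY = [{
    "ErrorEquals": ["States.ALL"],
    "IntervalSeconds": 10,
    "MaxAttempts": 3,
    "BackoffRate": 2.0,
}]

# After retries are exhausted, record the error under $.error and fail the run.
CATCH = [{"ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": "PipelineFailed"}]


def glue_task(job_name: str) -> dict:
    """Task state that starts a Glue job and waits for it to complete (.sync)."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::glue:startJobRun.sync",
        "Parameters": {"JobName": job_name},
        "Retry": RETRY,
        "Catch": CATCH,
        "Next": "PipelineSucceeded",
    }


definition = {
    "Comment": "Raw bucket to curated bucket with conditional transformations",
    "StartAt": "ClassifyObject",
    "States": {
        "ClassifyObject": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {"FunctionName": CLASSIFIER_FN, "Payload.$": "$"},
            # Keep only the classifier's "format" field, stored beside the original input.
            "ResultSelector": {"format.$": "$.Payload.format"},
            "ResultPath": "$.classification",
            "Retry": RETRY,
            "Catch": CATCH,
            "Next": "RouteByFormat",
        },
        "RouteByFormat": {
            "Type": "Choice",  # the conditional branching step
            "Choices": [
                {"Variable": "$.classification.format", "StringEquals": "csv", "Next": "TransformCsv"},
                {"Variable": "$.classification.format", "StringEquals": "json", "Next": "TransformJson"},
            ],
            "Default": "PipelineFailed",  # unsupported formats end the run as failed
        },
        "TransformCsv": glue_task(CSV_JOB),
        "TransformJson": glue_task(JSON_JOB),
        "PipelineSucceeded": {"Type": "Succeed"},
        "PipelineFailed": {
            "Type": "Fail",
            "Error": "TransformFailed",
            "Cause": "Unsupported format or unrecovered task error",
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(definition, indent=2))
```

The printed JSON is the kind of string you would pass as the `definition` argument when creating a state machine, for example with the boto3 Step Functions client's `create_state_machine`, together with an IAM role that may invoke the Lambda function and start the Glue jobs. Keeping the branching in the Choice state, rather than inside the Glue scripts, leaves each Glue job responsible for a single transformation.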
Exam trap
The trap here is that candidates often mistake AWS Glue ETL jobs for an orchestration tool because they can transform data. Glue ETL jobs are a processing engine, not a workflow orchestrator; Step Functions is the correct service for coordinating complex, conditional pipelines.
How to eliminate wrong answers
Option A is wrong because AWS Data Pipeline is a legacy service for moving data between sources and destinations. It lacks native support for complex conditional transformations and is less flexible than Step Functions for orchestrating custom processing logic.
Option C is wrong because AWS Lambda functions are stateless and have a maximum execution timeout of 15 minutes. That makes a single function unsuitable for orchestrating a long-running or multi-step transformation pipeline. Lambda is better suited to individual processing tasks within the workflow.
Option D is wrong because AWS Glue ETL jobs are designed for batch data transformation using Apache Spark, but they are not an orchestration service. A Glue job would be a component orchestrated by Step Functions, not the orchestrator itself.
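To illustrate where Lambda does fit, here is a minimal sketch of a Lambda handler acting as one short task inside a larger workflow. It only classifies a single raw object by its S3 key, and a Step Functions Choice state could then branch on the returned `format` field. The event shape (`bucket` and `key` fields), the extension-to-format mapping, and the function's role in the pipeline are all assumptions for illustration. The handler holds no state between invocations and contains no sequencing, retry, or branching logic; those stay with the orchestrator.

```python
import os

# Hypothetical mapping from file extension to the branch name a Choice state would test.
SUPPORTED_FORMATS = {".csv": "csv", ".json": "json", ".parquet": "parquet"}


def lambda_handler(event: dict, context=None) -> dict:
    """Classify one raw S3 object by its key extension.

    Assumes the workflow passes {"bucket": ..., "key": ...} as input (hypothetical shape).
    The function finishes in milliseconds, well inside Lambda's 15-minute limit,
    because it performs one small step rather than running the whole pipeline.
    """
    key = event["key"]
    _, ext = os.path.splitext(key.lower())  # "orders.CSV" -> ".csv"
    return {
        "bucket": event["bucket"],
        "key": key,
        "format": SUPPORTED_FORMATS.get(ext, "unknown"),  # unknown formats left for the workflow to handle
    }


if __name__ == "__main__":
    print(lambda_handler({"bucket": "raw-bucket", "key": "sales/2024/orders.CSV"}))
```

Because the handler returns the original bucket and key alongside `format`, the next step in the workflow still knows which object to transform.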