A company is building a data lake on Amazon S3. The data sources include relational databases, streaming data, and log files. The data engineer needs to ensure that the data ingestion pipeline can handle schema evolution, support both batch and streaming, and provide a unified metadata catalog. Which THREE services should the engineer use? (Choose three.)
Provides schema discovery, catalog, and batch ETL.
Why this answer
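Options A, C, and D are correct. AWS Glue provides the unified metadata catalog (the Glue Data Catalog) and batch ETL, and its crawlers detect schema changes and update table definitions, which addresses schema evolution. Kinesis Data Firehose (since renamed Amazon Data Firehose) delivers streaming data to S3.
Amazon S3 is the storage layer of the data lake. Option B is wrong because Athena is a query service, not an ingestion service. Option E is wrong because DynamoDB is a NoSQL database, not a service for data lake ingestion or cataloging.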
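To make the schema-evolution point concrete, here is a minimal sketch of the request a Glue crawler could be created with. The helper `build_crawler_request`, the crawler name, the role ARN, the database name, and the S3 path are all hypothetical placeholders, not values from the question. The sketch only builds the parameter dictionary; passing it to boto3's `create_crawler` is assumed and not executed here.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Build keyword arguments for a Glue crawler over an S3 prefix.

    All argument values are illustrative assumptions.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        # Crawl the raw zone of the data lake in S3.
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # When the crawler detects a changed schema, update the table
        # in the Data Catalog; log removed objects instead of deleting.
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }


# Hypothetical example values.
request = build_crawler_request(
    name="example-raw-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    database="example_lake_db",
    s3_path="s3://example-data-lake/raw/",
)

# With AWS credentials configured, the request could be submitted as:
#   import boto3
#   boto3.client("glue").create_crawler(**request)
```

Under these assumptions, data landed in S3 by Firehose or by batch jobs is cataloged by the same crawler, so downstream consumers see one set of table definitions.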