MLA-C01 • Practice Test 37
Free MLA-C01 practice test — 15 questions with explanations. Set 37. No signup required.
A data engineer is building a data pipeline for a machine learning model that requires both structured and unstructured data. The structured data (customer demographics) is stored in Amazon RDS, and the unstructured data (customer support chat logs) is stored in Amazon S3 as JSON files. The engineer needs to combine these datasets into a single training dataset, stored in Amazon S3 in Parquet format, and must also perform feature engineering, such as text vectorization, on the chat logs. The pipeline should be serverless and cost-effective. Which approach should the engineer use?