PDE Designing data processing systems • Set 10
PDE Designing data processing systems Practice Test 10: 15 questions with explanations.
A data engineer is responsible for a batch ETL pipeline that runs daily using Cloud Composer and Dataproc. The pipeline extracts data from Cloud SQL, transforms it with Spark, and loads it into BigQuery. Last night, the pipeline failed because the Spark job ran out of memory. The team needs a solution that prevents future failures without manual intervention.

Options:
A. Use a larger machine type for Dataproc.
B. Enable Dataproc autoscaling and configure memory-based scaling.
C. Split the Spark job into multiple stages.
D. Use Cloud Functions to retry the job.
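
As background for option B, the sketch below shows roughly how memory-based autoscaling on Dataproc reacts to YARN memory pressure. The policy dictionary follows the field names of the Dataproc AutoscalingPolicy resource (workerConfig, basicAlgorithm, yarnConfig), but every numeric value is an assumption chosen for illustration. The function recommend_worker_count is a hypothetical helper, not part of any Google Cloud library, and its rounding is a simplification of the real algorithm. Note that autoscaling changes the number of workers; it does not change how much memory a single Spark executor receives.

```python
import math

# Illustrative policy shaped like the Dataproc AutoscalingPolicy resource.
# All numeric values here are assumptions made up for this example.
AUTOSCALING_POLICY = {
    "workerConfig": {"minInstances": 2, "maxInstances": 10},
    "basicAlgorithm": {
        "cooldownPeriod": "240s",
        "yarnConfig": {
            "scaleUpFactor": 0.5,
            "scaleDownFactor": 0.5,
            "gracefulDecommissionTimeout": "3600s",
        },
    },
}


def recommend_worker_count(current_workers, pending_mb, available_mb,
                           mb_per_worker, policy=AUTOSCALING_POLICY):
    """Hypothetical, simplified model of a memory-based scaling decision.

    Scale up in proportion to pending YARN memory; otherwise scale down
    in proportion to available (idle) YARN memory. The result is clamped
    to the policy's minInstances and maxInstances.
    """
    yarn = policy["basicAlgorithm"]["yarnConfig"]
    limits = policy["workerConfig"]

    if pending_mb > 0:
        # Containers are waiting for memory: add a fraction of the
        # workers needed to satisfy the pending requests.
        delta = math.ceil(yarn["scaleUpFactor"] * pending_mb / mb_per_worker)
    else:
        # Nothing is pending: release a fraction of the idle capacity.
        delta = -math.floor(yarn["scaleDownFactor"] * available_mb / mb_per_worker)

    target = current_workers + delta
    return max(limits["minInstances"], min(limits["maxInstances"], target))


if __name__ == "__main__":
    # 48 GB of pending memory on a 4-worker cluster with 12 GB per worker.
    print(recommend_worker_count(4, pending_mb=48000, available_mb=0,
                                 mb_per_worker=12000))
```

In this model, a scaleUpFactor of 0.5 adds half of the workers that the pending memory would require, and the worker limits cap growth no matter how large the backlog is.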