A company is migrating on-premises Apache Spark jobs to Google Cloud. They want to reduce operational overhead and minimize costs. Which architecture is most appropriate?
Trap 1: Use Cloud Dataproc Serverless for all Spark jobs.
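Dataproc Serverless restricts some Spark properties and does not support cluster-level customization such as initialization actions, so some migrated jobs may not run unchanged.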
Trap 2: Migrate jobs to Cloud Dataflow.
Dataflow runs Apache Beam pipelines, not Spark jobs, so every job would have to be rewritten.
Trap 3: Run Spark on Compute Engine instances with startup scripts.
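Self-managed Spark on Compute Engine requires manual cluster provisioning, scaling, and patching, which increases operational overhead instead of reducing it.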
- A
Use Cloud Dataproc Serverless for all Spark jobs.
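Why wrong: Dataproc Serverless restricts some Spark properties and does not support cluster-level customization such as initialization actions, so some migrated jobs may not run unchanged.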
- B
Migrate jobs to Cloud Dataflow.
Why wrong: Dataflow runs Apache Beam pipelines, not Spark jobs, so every job would have to be rewritten.
- C
Run Spark on Compute Engine instances with startup scripts.
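Why wrong: Self-managed Spark on Compute Engine requires manual cluster provisioning, scaling, and patching, which increases operational overhead instead of reducing it.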
- D
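Use Dataproc clusters with autoscaling and preemptible VMs.
Why correct: Dataproc is a managed service that runs existing Spark jobs with minimal changes. Autoscaling matches cluster size to load, and preemptible VMs used as secondary workers lower compute cost.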
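
To make option D concrete, here is a minimal sketch of a cluster specification that combines an autoscaling policy with preemptible secondary workers. The field names follow the Dataproc v1 cluster resource as exposed by the Python client. The helper `build_cluster_spec`, the project, region, cluster, and policy names, and the machine type are all hypothetical values chosen for illustration, and the autoscaling policy itself is assumed to exist already.

```python
from typing import Any, Dict


def build_cluster_spec(
    project_id: str,
    region: str,
    cluster_name: str,
    policy_id: str,
    primary_workers: int = 2,
    machine_type: str = "n1-standard-4",  # assumed machine type, adjust per workload
) -> Dict[str, Any]:
    """Build a Dataproc cluster spec with autoscaling and preemptible secondary workers."""
    # Autoscaling policies are referenced by their full resource name.
    policy_uri = (
        f"projects/{project_id}/locations/{region}"
        f"/autoscalingPolicies/{policy_id}"
    )
    return {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            "master_config": {
                "num_instances": 1,
                "machine_type_uri": machine_type,
            },
            # Primary workers stay on standard VMs.
            "worker_config": {
                "num_instances": primary_workers,
                "machine_type_uri": machine_type,
            },
            # Secondary workers are the only group that can be preemptible;
            # starting at 0 leaves their count to the autoscaling policy.
            "secondary_worker_config": {
                "num_instances": 0,
                "preemptibility": "PREEMPTIBLE",
            },
            "autoscaling_config": {"policy_uri": policy_uri},
        },
    }


# Hypothetical identifiers, used only to show the resulting shape.
spec = build_cluster_spec(
    "example-project", "us-central1", "spark-migration", "spark-autoscale"
)
print(spec["config"]["autoscaling_config"]["policy_uri"])
# prints: projects/example-project/locations/us-central1/autoscalingPolicies/spark-autoscale
```

A dictionary of this shape could be passed as the `cluster` field of a `create_cluster` request in the `google-cloud-dataproc` client. The sketch stops short of that call so it runs without credentials.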