A team needs to migrate an existing on-premises Hadoop Hive workload to Google Cloud. They want to minimize code changes and use a managed service for transient clusters. Which service should they choose?
Dataproc is fully compatible with Hadoop/Hive and offers ephemeral clusters with minimal code changes.
Why this answer
Cloud Dataproc is the correct choice because it is a managed Spark and Hadoop service that supports Hive workloads natively, allowing you to run existing Hive scripts with minimal changes. It also supports transient clusters, which can be automatically scaled up and down, aligning with the requirement for transient clusters.
Exam trap
The trap here is that candidates often confuse Cloud Dataflow's ability to process batch data with Hadoop compatibility, but Dataflow does not support Hive or transient Hadoop clusters, making Dataproc the only correct option for minimizing code changes.
How to eliminate wrong answers
Option A is wrong because Cloud Dataflow is a unified stream and batch data processing service based on Apache Beam, not designed for Hive workloads or transient Hadoop clusters. Option B is wrong because Cloud Dataprep is a data preparation and cleaning service (based on Trifacta) that does not run Hive or provide transient clusters. Option D is wrong because BigQuery is a serverless data warehouse that does not support Hive execution engines or transient clusters; migrating Hive to BigQuery would require significant code changes.