Google Professional Data Engineer PDE Questions 151–225 | Page 3/7

151

MCQeasy

A data science team needs to ensure that a deployed Vertex AI model can handle varying traffic patterns with minimal latency and cost. What should they do?

A.Use Vertex AI Prediction with autoscaling

B.Use batch prediction instead of online

C.Pre-warm all instances

D.Deploy to a single large machine type

AnswerA

Autoscaling adjusts replicas based on traffic, balancing latency and cost.

Why this answer

Vertex AI Prediction with autoscaling dynamically adjusts the number of serving replicas based on incoming traffic, keeping latency low during spikes and reducing cost during lulls. This is the recommended approach for handling variable traffic patterns in production, because the managed service scales replicas automatically between the minimum and maximum counts configured on the deployment.

Exam trap

Google Cloud often tests the misconception that batch prediction can substitute for online serving in variable traffic scenarios, but the key distinction is that batch prediction lacks real-time latency guarantees and cannot scale dynamically per request.

How to eliminate wrong answers

Option B is wrong because batch prediction is designed for asynchronous, large-scale offline inference on static datasets, not for real-time traffic with varying patterns; it cannot handle low-latency online requests. Option C is wrong because pre-warming all instances defeats the purpose of autoscaling, leading to constant high cost regardless of actual traffic, and is not a dynamic solution. Option D is wrong because deploying to a single large machine type creates a single point of failure and cannot scale horizontally to handle traffic spikes, resulting in either over-provisioning cost or latency under load.
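
The scaling behaviour described above can be illustrated with a minimal sketch. It assumes a simple target-utilization rule with hypothetical parameter names (`observed_pct`, `target_pct`, `min_replicas`, `max_replicas`); it is not the algorithm Vertex AI uses internally, only an illustration of why replica counts follow traffic within configured bounds.

```python
def desired_replicas(current_replicas, observed_pct, target_pct,
                     min_replicas, max_replicas):
    """Replica count requested by a simple target-utilization rule.

    Utilization values are integer percentages to keep the arithmetic exact.
    """
    if target_pct <= 0:
        raise ValueError("target_pct must be positive")
    # Ceiling division: scale the fleet in proportion to the load ratio.
    raw = -(-(current_replicas * observed_pct) // target_pct)
    # Clamp to the configured bounds so cost and capacity stay predictable.
    return max(min_replicas, min(max_replicas, raw))


# Traffic spike: 2 replicas running at 90% against a 60% target.
spike = desired_replicas(2, 90, 60, min_replicas=1, max_replicas=10)
# Quiet period: 4 replicas running at 10% against the same target.
quiet = desired_replicas(4, 10, 60, min_replicas=1, max_replicas=10)
print(spike, quiet)
```

In this sketch the spike case scales out and the quiet case scales in to the configured minimum, which is the latency and cost trade-off the answer relies on.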

152

MCQhard

A financial services company must ensure that predictions from a deployed model do not become biased against protected groups. They have a monitoring system in place. Which metric should they track?

A.Prediction latency

B.Prediction distribution across demographic segments

C.Per-query input feature distribution

D.Model accuracy over time

AnswerB

Comparing prediction distributions across groups reveals potential bias in outcomes.

Why this answer

Tracking prediction distribution across demographic segments (option B) directly monitors for bias by comparing the model's output rates for different protected groups. If the distribution diverges significantly, it indicates potential disparate impact, which is the core concern for fairness in deployed models. This aligns with monitoring for algorithmic fairness, not just operational performance.

Exam trap

The trap here is that candidates confuse operational metrics (latency, accuracy) with fairness metrics and assume that high accuracy guarantees fairness. Google Cloud tests the point that bias can exist even with high accuracy if the model performs differently across demographic segments.

How to eliminate wrong answers

Option A is wrong because prediction latency measures the time taken to serve a prediction, which is a performance metric unrelated to bias or fairness against protected groups. Option C is wrong because per-query input feature distribution tracks individual input values, not the aggregated output predictions across demographic segments needed to detect bias. Option D is wrong because model accuracy over time measures overall predictive performance, which can remain high even when the model is biased against a specific group (e.g., accuracy may be high for a majority class while failing for a minority class).
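
As a rough illustration of tracking prediction distribution across segments, the sketch below computes the positive-prediction rate per group and the ratio between the lowest and highest rates. The segment names, the sample data, and the function names are hypothetical, and the ratio is only one of several possible fairness signals.

```python
from collections import defaultdict


def positive_rate_by_segment(records):
    """records: iterable of (segment, prediction) pairs, prediction in {0, 1}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for segment, prediction in records:
        totals[segment] += 1
        positives[segment] += prediction
    return {seg: positives[seg] / totals[seg] for seg in totals}


def disparate_impact_ratio(rates):
    """Lowest positive-prediction rate divided by the highest one."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no positive predictions in any segment
    return min(rates.values()) / highest


# Hypothetical monitoring sample: (demographic segment, model decision).
sample = [
    ("group_a", 1), ("group_a", 1), ("group_a", 0), ("group_a", 1),
    ("group_b", 1), ("group_b", 0), ("group_b", 0), ("group_b", 0),
]
rates = positive_rate_by_segment(sample)
ratio = disparate_impact_ratio(rates)
print(rates, round(ratio, 3))
```

A ratio well below 1 means one segment receives positive predictions far less often than another, which is the kind of divergence that latency or overall accuracy would not reveal.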

153

MCQmedium

A retail company uses Cloud Dataflow for a streaming pipeline that aggregates sales events from thousands of stores. The pipeline writes aggregated results to BigQuery every 5 minutes. Recently, the Dataflow job has been restarting multiple times a day with the error: 'Worker ran out of memory' in the logs. The streaming engine is enabled. The pipeline uses keyed state (ParDo with stateful processing) to maintain per-store counters. The average event size is 2KB, and the throughput is 2,000 events/sec. You need to resolve the out-of-memory issues without losing data. What should you do?

A.Disable stateful processing and use side inputs from BigQuery to get per-store aggregates.

B.Modify the pipeline to use sliding windows with a shorter duration to reduce the state size.

C.Increase the number of workers in the pipeline configuration and ensure the maximum worker count is set higher to allow better distribution of state.

D.Reduce the number of workers to limit the overhead of data shuffling.

AnswerC

More workers spread the stateful processing and reduce memory per worker.

Why this answer

Option C is correct because increasing the number of workers distributes the keyed state (per-store counters) across more VMs, reducing the memory pressure on each individual worker. With Streaming Engine enabled, state is persisted in the Dataflow service backend, but each worker still holds in memory the working state for the keys it is currently processing, so adding workers is the direct way to shrink the per-worker footprint. This avoids data loss because the pipeline continues processing with exactly-once semantics and state is preserved via checkpointing.

Exam trap

The trap here is that candidates may confuse window-based state (which can be reduced by shortening windows) with keyed state (which is independent of window duration), leading them to incorrectly choose option B.

How to eliminate wrong answers

Option A is wrong because disabling stateful processing and using side inputs from BigQuery would introduce significant latency and inconsistency (BigQuery is not designed for real-time per-record lookups), and it would break the streaming aggregation logic. Option B is wrong because sliding windows do not reduce state size for keyed state (ParDo with stateful processing uses per-key state, not windowed state); changing window duration has no effect on the memory used by the per-store counters. Option D is wrong because reducing the number of workers would concentrate more state on fewer VMs, worsening the out-of-memory issue and increasing the risk of worker crashes.

154

MCQmedium

A retail company uses Vertex AI Pipelines to automate monthly retraining of a recommendation model. The pipeline consists of three steps: (1) extract data from BigQuery, (2) train a TensorFlow model on Vertex AI Training, (3) upload the model to Vertex AI Model Registry and deploy to an endpoint if performance metrics improve. Recently, the pipeline has been failing at step 2 with the error: 'The job was cancelled by the system because it exceeded the maximum training time of 3600 seconds.' You have confirmed that the training code is correct and the data size has not changed significantly. What should you do to fix this pipeline failure?

A.Reduce the training dataset size by sampling fewer rows.

B.Set the training timeout to 7200 seconds in the pipeline configuration.

C.Switch from TensorFlow to a simpler model framework.

D.Reconfigure the pipeline to use a larger machine type for training.

AnswerB

Increasing the timeout accommodates the training duration within the expected limits.

Why this answer

Option B is correct because the error message shows that the job was cancelled after hitting a 3600-second maximum training time; raising the timeout in the pipeline configuration allows the job to complete. Option D (larger machine) may shorten training but does not directly address the timeout. Option A (reducing data) degrades model quality.

Option C (changing framework) is drastic and unnecessary.

155

MCQhard

A company uses Vertex AI Feature Store for serving features to both training and prediction. The team notices that predictions made shortly after training use different feature values, causing a training-serving skew. What is the most effective way to prevent this skew?

A.Configure the Feature Store to use point-in-time lookup using the training timestamp

B.Retrain the model more frequently to adapt to the new feature distributions

C.Use batch prediction instead of online prediction to ensure consistent features

D.Ensure that the training and prediction environments use identical compute resources

AnswerA

Point-in-time lookup ensures that the same feature values used during training are used during serving.

Why this answer

Option A is correct because a point-in-time lookup keyed on the training timestamp serves the same feature values that were used during training, which keeps training and serving consistent. Option B (retraining more often) does not prevent skew. Option C (batch prediction) still uses current features.

Option D (identical compute resources) does not affect feature values.
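
The idea of a point-in-time lookup can be sketched in a few lines. This is a simplified in-memory stand-in, not the Vertex AI Feature Store API; the `history` layout and the function name are assumptions made for illustration.

```python
import bisect


def point_in_time_value(history, as_of):
    """Return the latest feature value recorded at or before `as_of`.

    history: list of (timestamp, value) tuples sorted by timestamp ascending.
    Returns None when no value existed yet at `as_of`.
    """
    timestamps = [ts for ts, _ in history]
    # Index of the first entry strictly after `as_of`.
    idx = bisect.bisect_right(timestamps, as_of)
    if idx == 0:
        return None
    return history[idx - 1][1]


# Hypothetical feature history for one entity: (timestamp, value).
history = [(100, 5.0), (200, 7.5), (300, 9.0)]

# A lookup as of t=250 ignores the later update written at t=300.
value_at_training = point_in_time_value(history, 250)
latest_value = point_in_time_value(history, 300)
print(value_at_training, latest_value)
```

Reading the latest value instead of the value as of the training timestamp is exactly how the two code paths can disagree, which is the skew the question describes.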

156

MCQhard

A manufacturing company wants to detect anomalies in sensor data from thousands of IoT devices in real time. The data is streaming into Pub/Sub. The best solution should use a machine learning model served from AI Platform that scores sensor readings aggregated over 5-minute windows. Which pipeline design meets these requirements?

A.Use Cloud Dataproc with Spark Streaming to aggregate data, and use a Spark ML model embedded in the pipeline

B.Use BigQuery streaming inserts and run scheduled queries that call the ML model

C.Use Cloud Dataflow with sliding windows to aggregate sensor readings every 5 minutes, then call a trained model hosted on AI Platform Prediction for each window

D.Use Cloud Functions triggered by Pub/Sub to process each sensor reading individually

AnswerC

Dataflow handles streaming and windowing natively, and AI Platform Prediction provides low-latency model serving.

Why this answer

Option C is correct because Cloud Dataflow's sliding windows natively handle the 5-minute aggregation requirement for streaming data, and its ability to call external services via a DoFn allows integration with AI Platform Prediction for real-time model scoring. This design aligns with the need for low-latency, scalable processing of Pub/Sub streams without managing infrastructure.

Exam trap

Google Cloud often tests the distinction between stream processing (Dataflow) and batch-oriented services (BigQuery scheduled queries), and the trap here is assuming that BigQuery's streaming inserts combined with scheduled queries can achieve real-time aggregation, when in fact scheduled queries introduce minutes of delay and are not window-aware for sliding time intervals.

How to eliminate wrong answers

Option A is wrong because Cloud Dataproc with Spark Streaming requires managing a cluster and embedding a Spark ML model in the pipeline, which adds operational overhead and does not leverage AI Platform's managed prediction service as specified. Option B is wrong because BigQuery streaming inserts and scheduled queries introduce latency (scheduled queries run at intervals, not in real time) and are not designed for per-window scoring of streaming data. Option D is wrong because Cloud Functions triggered by Pub/Sub process each sensor reading individually, which cannot aggregate data over 5-minute windows as required.
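
To make the windowing step concrete, the sketch below shows how an event timestamp maps to sliding windows of a given size and period. It is plain Python rather than Apache Beam code, and the function name and second-based timestamps are assumptions for illustration.

```python
def sliding_windows(timestamp, size, period):
    """Return the [start, end) windows that contain `timestamp`.

    All values are in seconds. Windows have length `size` and a new
    window starts every `period` seconds.
    """
    # Start of the most recent window that begins at or before the event.
    start = timestamp - (timestamp % period)
    windows = []
    # Walk backwards while the window still covers the event.
    while start > timestamp - size:
        windows.append((start, start + size))
        start -= period
    return windows


# 5-minute windows that advance every 5 minutes: one window per event.
print(sliding_windows(130, size=300, period=300))
# 5-minute windows that advance every minute: the event lands in 5 windows.
print(sliding_windows(130, size=300, period=60))
```

When the period equals the window size, each reading belongs to exactly one window; a shorter period means each reading is scored as part of several overlapping aggregates, which multiplies the number of model calls per reading.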

157

MCQeasy

A data pipeline processes streaming data with Dataflow. The team notices occasional data duplication in BigQuery. What is the best approach to ensure exactly-once processing?

A.Use Pub/Sub with at-least-once delivery and deduplicate in BigQuery using a unique identifier.

B.Configure Dataflow with exactly-once sinks using file staging and deduplication.

C.Use Cloud Functions to deduplicate messages before they enter the pipeline.

D.Enable idempotent writes in BigQuery.

AnswerB

Dataflow's exactly-once sink mechanism ensures each record is written exactly once, preventing duplicates.

Why this answer

Option B is correct because Dataflow's exactly-once sink mechanism, which uses file staging and deduplication, ensures no duplicates. Option A (at-least-once delivery) can cause duplicates unless dedup is applied, but that's not automatic. Option C adds unnecessary complexity.

Option D is incorrect because BigQuery does not natively support idempotent writes.
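
The role a unique identifier plays in deduplication can be shown with a small sketch. It uses an in-memory list and set as stand-ins for the sink and its record of written IDs; it does not reproduce how Dataflow or BigQuery implement this, and the `id` field name is an assumption.

```python
def write_once(rows, sink, seen_ids):
    """Append rows to `sink`, skipping any row whose 'id' was already written.

    Returns the number of rows actually written.
    """
    written = 0
    for row in rows:
        if row["id"] in seen_ids:
            continue  # redelivered record: already in the sink
        seen_ids.add(row["id"])
        sink.append(row)
        written += 1
    return written


sink, seen_ids = [], set()
batch = [{"id": "a", "amount": 1}, {"id": "b", "amount": 2}]

first = write_once(batch, sink, seen_ids)
# The same batch is redelivered, as can happen with at-least-once delivery.
second = write_once(batch, sink, seen_ids)
print(first, second, len(sink))
```

Because the write is keyed on a stable identifier, replaying the same batch leaves the sink unchanged, which is the property an exactly-once sink has to provide.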

158

MCQhard

A Dataflow streaming pipeline that uses global windows and triggers every 5 seconds is experiencing increasing lag and high system latency. The pipeline reads from Pub/Sub, transforms data with a ParDo, and writes to BigQuery. Which action is most likely to reduce lag?

A.Use a session window to group related events.

B.Replace the global window with a sliding window of 1 minute.

C.Change the trigger to processing time instead of event time.

D.Increase the number of workers manually.

AnswerB

A sliding window reduces the number of elements per trigger and improves latency by distributing state across workers.

Why this answer

B is correct because sliding windows of 1 minute allow the pipeline to process data in overlapping fixed-size windows, which can reduce the buildup of data in memory compared to global windows. Global windows with frequent triggers (every 5 seconds) can cause unbounded state growth and high latency as the pipeline must maintain state for all elements until the trigger fires, whereas sliding windows naturally bound the data per window and enable more efficient watermark and trigger management in Dataflow.

Exam trap

Google Cloud often tests the misconception that increasing workers or changing trigger timing alone can fix lag caused by inappropriate windowing strategy, when the real issue is that global windows with frequent triggers create unbounded state that overwhelms the pipeline's memory and shuffle capacity.

How to eliminate wrong answers

Option A is wrong because session windows group events based on inactivity gaps, which does not address the core issue of unbounded state from global windows and can actually increase state size if sessions are long. Option C is wrong because changing the trigger to processing time instead of event time does not reduce lag; it may cause data to be processed based on when it arrives rather than when it occurred, potentially increasing latency due to watermark misalignment and still requiring global window state. Option D is wrong because manually increasing the number of workers can help with throughput but does not fix the fundamental design flaw of using global windows with frequent triggers, which leads to excessive state accumulation and shuffling; autoscaling in Dataflow already handles worker count based on backlog.

159

MCQhard

A company runs large batch prediction jobs on Vertex AI every day. They want to minimize costs while ensuring the jobs complete within a 4-hour window. The model requires significant memory. What is the most cost-effective approach?

A.Use Cloud TPUs to accelerate predictions

B.Use a smaller machine type (e.g., n1-standard-4) to reduce cost

C.Use preemptible VMs with a machine type that meets memory requirements

D.Use standard VMs and reduce parallelization

AnswerC

Preemptible VMs are much cheaper and restartable, suitable for batch jobs.

Why this answer

Preemptible VMs (now called Spot VMs) are significantly cheaper than standard VMs (Google advertises discounts of roughly 60-91% off on-demand pricing) and are ideal for fault-tolerant batch prediction jobs that can handle interruptions. Since the job has a 4-hour window and the model requires significant memory, using preemptible VMs with a machine type that meets the memory requirements minimizes cost while still allowing the job to complete within the time limit if it is restarted.

Exam trap

Google Cloud often tests the misconception that preemptible VMs are unreliable for any production workload, but the trap here is that batch prediction jobs are inherently fault-tolerant and can leverage preemptible VMs to drastically reduce costs without violating the completion window.

How to eliminate wrong answers

Option A is wrong because Cloud TPUs are specialized hardware for training and inference of large models, but they are more expensive and not necessary for batch prediction; they also do not directly address the memory requirement or cost minimization for a 4-hour window. Option B is wrong because using a smaller machine type (e.g., n1-standard-4) would likely cause out-of-memory errors or severe performance degradation since the model requires significant memory, making the job fail or exceed the 4-hour window. Option D is wrong because reducing parallelization would increase job duration, potentially exceeding the 4-hour window, and standard VMs are more expensive than preemptible VMs, so this approach does not minimize costs.

160

MCQmedium

A financial company needs to process batch trades data daily and ensure that if a transformation step fails, the entire daily run is retried from the beginning. Which design pattern is appropriate?

A.Use idempotent writes with checkpointing

B.Use an orchestrator like Cloud Composer with retry logic

C.Retry the failed step only

D.Use a transactional staging area

AnswerB

Cloud Composer (Airflow) allows defining DAGs with retry policies on the entire pipeline, ensuring full restart on failure.

Why this answer

Option B is correct because the requirement states that if any transformation step fails, the entire daily run must be retried from the beginning. An orchestrator like Cloud Composer (Apache Airflow) provides native DAG-level retry logic that can be configured to restart the entire workflow on failure, ensuring atomicity of the batch run. This pattern is essential for maintaining data consistency when partial processing cannot be tolerated.

Exam trap

Google Cloud often tests the misconception that checkpointing or idempotent writes are sufficient for full-run retries, but the trap is that checkpointing enables partial resumption, not the complete restart from scratch that the question explicitly demands.

How to eliminate wrong answers

Option A is wrong because idempotent writes with checkpointing allow resumption from the last successful checkpoint, which contradicts the requirement to retry the entire run from the beginning; checkpointing is designed for partial retries, not full restarts. Option C is wrong because retrying only the failed step would leave the daily run in an inconsistent state, as earlier steps may have already committed partial results that cannot be rolled back without a full restart. Option D is wrong because a transactional staging area ensures atomic writes but does not provide orchestration or retry logic to restart the entire pipeline from the start upon failure.
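
The difference between a full-run retry and a step-level retry can be sketched as follows. This is plain Python, not Airflow or Cloud Composer code; the step functions, the `staging` list, and `max_attempts` are hypothetical names used only to illustrate the pattern.

```python
def run_with_full_retry(steps, max_attempts=3):
    """Run every step in order; on any failure, restart from the first step.

    Returns (attempt_number, staging) for the first fully successful attempt.
    """
    for attempt in range(1, max_attempts + 1):
        staging = []  # fresh per attempt, so partial results never leak
        try:
            for step in steps:
                step(staging)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure
            continue  # discard partial work and start over
        return attempt, staging


calls = {"transform": 0}


def extract(staging):
    staging.append("extracted")


def transform(staging):
    calls["transform"] += 1
    if calls["transform"] == 1:
        raise RuntimeError("transient failure")  # fails on the first run only
    staging.append("transformed")


attempts, result = run_with_full_retry([extract, transform])
print(attempts, result)
```

Note that `extract` runs again on the second attempt even though it succeeded the first time; that is the behaviour the question asks for and what a checkpoint-based resume would skip.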

161

MCQmedium

A company uses Cloud Composer (Airflow) to orchestrate a daily batch job that runs a custom Python script on a Compute Engine instance. The process is slow because the instance takes 2 minutes to boot. How can you reduce the total runtime?

A.Switch to Dataproc Serverless to avoid VM boot time

B.Use a larger machine type for faster provisioning

C.Create a custom image with the script and dependencies pre-installed

D.Use a GPU-accelerated instance to speed up the script

AnswerC

Custom image reduces boot time by avoiding package installs.

Why this answer

Option C is correct because creating a custom image with the script and dependencies pre-installed eliminates the need to install packages or configure the environment at boot time. In Cloud Composer, when a Compute Engine instance is provisioned via a BashOperator or SSHOperator, the boot process includes OS initialization and package installation. A custom image bypasses these steps, reducing boot time from minutes to seconds, directly addressing the 2-minute boot delay.

Exam trap

The trap here is that candidates may assume 'faster provisioning' means a larger machine type (Option B) or a serverless service (Option A), but the question specifically targets the boot time caused by environment setup, which is solved by pre-installing dependencies in a custom image.

How to eliminate wrong answers

Option A is wrong because Dataproc Serverless is designed for Apache Spark and Hadoop workloads, not for running arbitrary Python scripts on a single Compute Engine instance; it introduces overhead for job submission and cluster management that is not suitable for this use case. Option B is wrong because a larger machine type does not reduce boot time; boot time is dominated by OS initialization and package installation, not by CPU or memory size. Option D is wrong because GPU-accelerated instances are intended for compute-intensive tasks like machine learning or rendering, not for reducing boot time; the script's slowness is due to boot delay, not computational performance.

162

MCQhard

A data scientist developed a model using custom training on Vertex AI. They want to automate the entire training-to-deployment process. Which service should they use?

A.Cloud Composer

B.Vertex AI Pipelines

C.Cloud Build

D.Cloud Functions

AnswerB

Vertex AI Pipelines is purpose-built for ML pipeline orchestration.

Why this answer

Vertex AI Pipelines is the correct choice because it provides a fully managed, serverless orchestration service specifically designed to automate ML workflows, including custom training, hyperparameter tuning, evaluation, and deployment. It integrates natively with Vertex AI services and supports Kubeflow Pipelines SDK or TFX for defining reproducible, end-to-end pipelines, making it the ideal solution for automating the entire training-to-deployment process.

Exam trap

The trap here is that candidates often confuse general-purpose orchestration (Cloud Composer) with ML-specific pipeline orchestration (Vertex AI Pipelines), overlooking that Vertex AI Pipelines provides built-in ML artifact tracking and native integration with Vertex AI training and prediction services.

How to eliminate wrong answers

Option A is wrong because Cloud Composer is a workflow orchestration service based on Apache Airflow, which is more general-purpose and requires custom operators or hooks to interact with Vertex AI, adding unnecessary complexity and not providing native ML pipeline capabilities. Option C is wrong because Cloud Build is a CI/CD service focused on building, testing, and deploying software artifacts (e.g., containers), not on orchestrating ML training workflows or managing model deployment steps like evaluation and versioning. Option D is wrong because Cloud Functions is a serverless compute service for event-driven, short-lived functions, which lacks the state management, sequencing, and artifact tracking needed for multi-step ML pipelines.

163

MCQmedium

A Dataflow pipeline is processing a high-volume streaming data stream. The job is lagging behind by 30 minutes, and the Dataflow monitoring UI shows high system latency with low CPU utilization. Which action should be taken to improve throughput?

A.Enable Streaming Engine

B.Increase the number of workers

C.Enable Dataflow Shuffle

D.Disable hot key detection

AnswerC

Dataflow Shuffle offloads shuffle operations to a managed service, reducing worker overhead and improving throughput when shuffle is the bottleneck.

Why this answer

Option C is correct because high system latency with low CPU utilization indicates a bottleneck in data shuffling, not in processing capacity. Enabling Dataflow Shuffle offloads the shuffle operation to Google-managed resources, reducing disk I/O and network overhead, which directly improves throughput in streaming pipelines.

Exam trap

Google Cloud often tests the misconception that low CPU utilization always means more workers are needed, but the trap here is that shuffle bottlenecks cause high latency without saturating CPU, so the correct fix is to offload shuffle operations rather than scale workers.

How to eliminate wrong answers

Option A is wrong because Streaming Engine is designed to reduce streaming latency by moving state management from workers to backend services, but the issue here is low CPU utilization and high latency due to shuffle bottlenecks, not state management. Option B is wrong because increasing workers would add more processing capacity, but with low CPU utilization, the bottleneck is elsewhere (shuffle), so more workers would not resolve the shuffle contention and could increase cost without benefit. Option D is wrong because disabling hot key detection would remove the ability to identify and optimize for skewed keys, which could worsen the shuffle bottleneck; hot key detection helps in redistributing load, not causing the latency issue.

164

MCQmedium

A team uses Vertex AI Pipelines to automate retraining of a model every month. The pipeline includes data preprocessing, training, and deployment steps. After a recent update, the pipeline fails intermittently with a timeout error during the deployment step. What is the most likely cause?

A.The service account used by the pipeline lacks permissions to deploy the model

B.The trained model size has increased due to more data, causing the deployment step to time out

C.The pipeline is configured to run steps in parallel, leading to resource contention

D.BigQuery query quotas are being exceeded during data preprocessing

AnswerB

Larger models take longer to upload and deploy, potentially exceeding timeout limits.

Why this answer

Option B is correct because a model that has grown in size after training on more data takes longer to upload and deploy, which can push the deployment step past its timeout. Option D (BigQuery quotas) would affect preprocessing, not deployment. Option A (insufficient service account permissions) would cause persistent errors, not intermittent ones.

Option C (parallel step execution) is a property of the pipeline definition, so it would be expected to cause consistent failures rather than intermittent ones.

165

MCQhard

A company uses Cloud Composer (Airflow) to orchestrate a data pipeline. One DAG has many tasks that run in parallel and dependencies that span multiple days. Recently, the DAG started failing with 'DagRun already exists' errors. What is the most likely cause?

A.The DAG has a large number of tasks, overwhelming the Airflow scheduler.

B.The DAG has max_active_runs_per_dag set to a low number, causing overlapping runs to be rejected.

C.The DAG's schedule interval is too short, causing task instances to be created with duplicate run IDs.

D.The DAG has a depends_on_past set to True, causing upstream failures to block new runs.

AnswerB

If max_active_runs_per_dag is too low, a new DAG run cannot start while the previous one is active.

Why this answer

The 'DagRun already exists' error occurs when Airflow attempts to create a new DAG run for a logical date that already has an active or completed run, and the DAG's concurrency settings prevent overlapping runs. Setting max_active_runs_per_dag to a low number (e.g., 1) restricts the number of concurrent runs, so if a previous run hasn't finished or been cleared, a new run for the same or overlapping schedule interval is rejected with this error. This is the most likely cause given the DAG has dependencies spanning multiple days, which can cause runs to overlap if not properly configured.

Exam trap

Google Cloud often tests the distinction between DAG-level concurrency settings (max_active_runs_per_dag) and task-level parallelism (e.g., pool, task concurrency), leading candidates to confuse the 'DagRun already exists' error with scheduler overload or task dependency issues.

How to eliminate wrong answers

Option A is wrong because a large number of tasks may cause scheduler performance issues or resource exhaustion, but it does not directly produce a 'DagRun already exists' error; that error is related to DAG run creation, not task-level parallelism. Option C is wrong because a short schedule interval does not create duplicate run IDs; Airflow derives each scheduled run's ID from its logical date (execution_date), and each scheduled interval produces a unique logical date, so duplicate run IDs would only occur if the same logical date is triggered twice (e.g., via manual backfill or API). Option D is wrong because depends_on_past=True causes tasks to wait for previous task instances to succeed, but it does not prevent the creation of new DAG runs; the 'DagRun already exists' error occurs at the DAG run level, not at the task dependency level.

166

MCQeasy

A team has trained a model using AutoML Tables. They want to deploy it for batch predictions on a schedule. What is the simplest approach?

A.Write a Cloud Function triggered by Cloud Scheduler

B.Export model to Cloud Storage and use Dataflow

C.Deploy to App Engine

D.Use Vertex AI Batch Prediction with a scheduled pipeline

AnswerD

Vertex AI Batch Prediction is the native, simplest way to perform batch predictions on a schedule.

Why this answer

Vertex AI Batch Prediction is the simplest approach because it is a managed service that directly supports batch predictions on AutoML Tables models without requiring additional infrastructure. By wrapping it in a scheduled Vertex AI pipeline, you can automate the entire workflow—triggering predictions on a schedule, handling input/output to Cloud Storage, and managing compute resources—all within the Vertex AI ecosystem, minimizing operational overhead.

Exam trap

Google Cloud often tests the misconception that you must export an AutoML model to use it outside Vertex AI, but the simplest path is to use Vertex AI's native batch prediction service, which avoids the overhead of custom infrastructure like Dataflow or Cloud Functions.

How to eliminate wrong answers

Option A is wrong because Cloud Functions are designed for lightweight, event-driven tasks and lack native support for AutoML Tables model serving; you would need to manually load the model and handle scaling, which adds complexity and is not the simplest approach. Option B is wrong because exporting the model to Cloud Storage and using Dataflow introduces unnecessary steps—Dataflow requires writing a custom pipeline to load the exported model and perform predictions, whereas Vertex AI Batch Prediction handles this natively. Option C is wrong because App Engine is a platform for hosting web applications, not designed for batch prediction workloads; it would require building a custom prediction service and managing scaling, which is more complex than using Vertex AI's built-in batch prediction.

167

MCQhard

A financial services company runs a batch Dataflow pipeline daily to process transaction data. The pipeline reads from Cloud Storage, performs complex transformations, and writes to BigQuery. Recently, the pipeline has been failing intermittently with the error: 'Workflow failed. Causes: (9c3f7a2b1d4e): The worker missed 2000 data samples in the last 30 seconds. This can be caused by a variety of factors, including slow work items, network issues, or resource contention.' The team has already increased the number of workers and tried using e2-standard-8 machine types, but the issue persists. The pipeline processes approximately 500 GB of data per run and uses approximately 200 workers. The team suspects that the issue might be related to shuffle operations. What should the team do next to resolve the issue?

A.Enable Streaming Engine for the pipeline.

B.Increase the persistent disk size per worker to 100 GB.

C.Reduce the number of workers to 100 to decrease shuffle overhead.

D.Use Cloud Storage as a shuffle sink.

AnswerB

Provides more space for shuffle data, reducing disk contention.

Why this answer

The error indicates that workers are missing data samples due to slow shuffle operations, often caused by insufficient disk I/O. Increasing the persistent disk size per worker to 100 GB provides more local scratch space for Dataflow's shuffle, reducing disk contention and allowing the shuffle to complete within the 30-second window. This directly addresses the root cause without changing the worker count or machine type.

Exam trap

Google Cloud often tests the misconception that increasing workers or machine type always solves performance issues, when in fact shuffle-bound pipelines require adequate local disk I/O, not just more CPU or memory.

How to eliminate wrong answers

Option A is wrong because Streaming Engine is designed for streaming pipelines, not batch pipelines, and enabling it would not resolve shuffle disk I/O issues in a batch Dataflow job. Option C is wrong because reducing the number of workers from 200 to 100 would increase the data volume each worker must shuffle, worsening the disk contention and likely increasing the number of missed samples. Option D is wrong because Cloud Storage as a shuffle sink is not a supported configuration in Dataflow; Dataflow uses persistent disk for shuffle by default, and switching to an external sink would introduce network latency and not fix the local disk bottleneck.

168

Multi-Selectmedium

Which THREE of the following are best practices when designing a Cloud Dataflow pipeline for batch processing? (Choose three.)

Select 3 answers

A.Use mutable state within ParDo to track running totals.

B.Use side inputs to hold a large lookup table that is read in every element.

C.Always insert a Reshuffle transform after every GroupByKey to redistribute data.

D.Create separate pipelines for independent jobs to allow independent scaling.

E.Tune the batch size in Write transforms to optimize BigQuery streaming inserts.

AnswersB, D, E

Side inputs enable efficient broadcast of static data to all workers.

Why this answer

Option B is correct because side inputs in Cloud Dataflow are designed to efficiently broadcast a read-only dataset (like a lookup table) to all parallel workers. When the side input is a large but static dataset, Dataflow can cache it in memory or on disk across workers, avoiding repeated external lookups and reducing per-element processing overhead. This pattern is especially effective for batch processing where the side input is read once and reused across all elements.

Exam trap

Google Cloud often tests the misconception that mutable state is acceptable in Dataflow's ParDo for batch processing, but the correct understanding is that Dataflow's execution model requires stateless transforms to ensure fault tolerance and exactly-once processing.

169

MCQhard

A company runs a Dataflow streaming pipeline that reads from Pub/Sub and writes to BigQuery. They experience a sudden spike in data volume causing BigQuery write throughput to be exceeded, resulting in errors. Which strategy should they implement to handle this gracefully?

A.Use a BigQuery sink with 'FAIL_FAST' error handling and set a dead-letter queue for failed writes.

B.Use a BigQuery sink with 'WRITE_APPEND' mode and set 'writeDisposition' to 'WRITE_APPEND'.

C.Use a BigQuery sink with 'WRITE_TRUNCATE' mode.

D.Use a BigQuery sink with 'CREATE_NEVER' write method.

AnswerA

Routes failed writes to a dead-letter queue for retry, avoiding pipeline stalls and data loss.

Why this answer

Using a BigQuery sink with 'FAIL_FAST' error handling and a dead-letter queue (A) lets the pipeline route failed writes to a separate destination, such as a Pub/Sub topic, for later retry, preventing data loss and limiting backpressure. Options B and C only change the write disposition and do not handle errors. Option D prevents table creation but does not address throughput.
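The dead-letter idea in this answer can be sketched independently of any Beam or BigQuery API. The example below is a minimal simulation in plain Python: `write_with_dead_letter` and `flaky_sink` are hypothetical names, and the "sink" simply rejects some rows to stand in for a throttled write.

```python
def write_with_dead_letter(rows, write_fn):
    """Try to write each row; collect failures instead of stopping the run."""
    written, dead_letter = [], []
    for row in rows:
        try:
            write_fn(row)
            written.append(row)
        except Exception as exc:  # a real pipeline would catch narrower errors
            dead_letter.append({"row": row, "error": str(exc)})
    return written, dead_letter


def flaky_sink(row):
    # Hypothetical sink: rejects rows with an even id to imitate throttling.
    if row["id"] % 2 == 0:
        raise RuntimeError("quota exceeded")


written, dead = write_with_dead_letter([{"id": i} for i in range(4)], flaky_sink)
```

Rows that fail are kept together with their error message, so they can be replayed later rather than being dropped or stalling the rows behind them.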

170

Multi-Selecthard

Your company is building a data processing system that ingests sensor data from millions of devices, processes it in near real-time to detect anomalies, and stores raw and processed data for long-term analytics. The system must meet a 99.9% uptime SLA and minimize data loss. Which THREE design choices are best? (Choose three.)

Select 3 answers

A.Use Cloud Pub/Sub as the ingestion layer with a dead-letter topic to capture unprocessed messages.

B.Store raw data in Cloud Bigtable and processed data in Cloud Storage.

C.Use Dataflow with at-least-once processing guarantees and perform deduplication downstream.

D.Use Cloud Storage for raw data archival and BigQuery for processed analytics data.

E.Use a global Cloud Load Balancer in front of the Dataflow workers.

AnswersA, C, D

Dead-letter topics prevent data loss by storing messages that cannot be processed after retries.

Why this answer

Cloud Pub/Sub with a dead-letter topic ensures that messages that cannot be processed are captured and not lost, directly supporting the requirement to minimize data loss. The dead-letter topic allows for later reprocessing or analysis of failed messages, which is critical for meeting a 99.9% uptime SLA by preventing message backlogs from blocking the ingestion pipeline.

Exam trap

Google Cloud often tests the misconception that a load balancer is needed to scale Dataflow workers, when in fact Dataflow auto-scales its own workers and uses Pub/Sub's pull subscriptions to distribute messages evenly across workers without a separate load balancer.

171

MCQmedium

A data engineer deploys a TensorFlow model on Vertex AI using a custom container. After deployment, online prediction requests sometimes fail with a 500 error and the message 'Out of memory'. The model requires significant memory during inference. Which action should the engineer take to resolve this issue?

A.Reduce the batch size of prediction requests sent to the endpoint.

B.Increase the memory limit in the Vertex AI endpoint configuration.

C.Optimize the model by quantizing weights to reduce model size.

D.Use a machine type with higher CPU performance.

AnswerB

Configuring a higher memory machine type or increasing the memory limit in the container spec provides the needed resources.

Why this answer

Option B is correct because Vertex AI endpoints allow you to configure a machine type with a specific memory limit. When a custom container runs out of memory during inference, increasing the memory allocation (e.g., by selecting a machine type with more RAM, such as n1-highmem-8) directly addresses the 'Out of memory' error. This ensures the container has sufficient resources to handle the model's inference workload without crashing.

Exam trap

Google Cloud often tests the misconception that reducing batch size or optimizing the model (quantization) is the first step to fix runtime OOM errors, when in fact the immediate operational fix is to allocate more memory to the deployment.

How to eliminate wrong answers

Option A is wrong because reducing the batch size may reduce per-request memory usage, but the error occurs during inference of a single request or a small batch; the root cause is insufficient memory for the model itself, not request batching. Option C is wrong because quantizing weights reduces model size on disk and may lower memory footprint, but it is a model optimization technique that requires retraining or conversion and does not immediately resolve a runtime OOM error in a deployed container. Option D is wrong because higher CPU performance (e.g., more vCPUs) does not increase available memory; the OOM error is a memory issue, not a CPU bottleneck, and Vertex AI machine types with higher CPU often have the same or lower memory ratios.

172

MCQeasy

Your company deploys a classification model on Vertex AI for online predictions. The model is an XGBoost model trained on tabular data with 500 features. The endpoint uses a single n1-standard-4 node. After deployment, users report that predictions take 8-10 seconds on average, while the required SLA is under 2 seconds. You have already verified that the model is not large (under 100 MB) and the input data size is small. The endpoint does not scale automatically. Which action should you take to reduce latency to meet the SLA?

A.Change the machine type to n1-highcpu-4 to prioritize compute over memory.

B.Reduce the number of features by half.

C.Switch to a custom container that preloads the model into memory.

D.Enable autoscaling by setting min replicas to 2 and max replicas to 5.

AnswerD

Adding replicas offloads requests, reducing wait time and average latency.

Why this answer

Option D is correct because the current single node is overloaded; autoscaling distributes traffic across multiple replicas, reducing the time each request spends waiting. Option A (a CPU-optimized machine type) may not help if the bottleneck is not CPU. Option C (preloading the model) adds nothing because the serving container already loads the model at startup by default. Option B (feature reduction) could degrade model accuracy and is not necessary.

173

MCQmedium

A company is migrating its on-premises Apache Spark jobs to Dataproc. The jobs read from and write to Cloud Storage. After migration, the jobs are slower than expected. The Dataproc cluster uses standard worker machines with local SSDs. What is the most likely cause of the performance degradation?

A.The Spark shuffle service is not enabled on the cluster.

B.The local SSDs are not mounted or are misconfigured.

C.The Cloud Storage connector is not using the gRPC protocol.

D.The jobs use the Cloud Storage connector instead of HDFS, causing network latency.

AnswerD

Reading from Cloud Storage over network is slower than local HDFS reads.

Why this answer

D is correct because the performance degradation is most likely due to network latency when using the Cloud Storage connector instead of HDFS. Cloud Storage is an object store accessed over the network, while HDFS leverages local SSDs for data locality and faster I/O. In Dataproc, jobs that read/write to Cloud Storage incur higher latency compared to using HDFS on local SSDs, especially for shuffle-heavy Spark workloads.

Exam trap

Google Cloud often tests the misconception that local SSDs or connector protocols are the bottleneck, when the real issue is the inherent latency of using a remote object store (Cloud Storage) versus a distributed filesystem (HDFS) with data locality.

How to eliminate wrong answers

Option A is wrong because the Spark shuffle service is enabled by default on Dataproc clusters and is not related to Cloud Storage I/O performance. Option B is wrong because local SSDs are automatically mounted and configured by Dataproc; misconfiguration would cause failures, not just slower performance. Option C is wrong because the Cloud Storage connector uses HTTP/HTTPS by default, and while gRPC can improve performance, it is not the primary cause of degradation compared to the fundamental latency difference between object storage and HDFS.

174

MCQmedium

A gaming company uses Cloud Pub/Sub to ingest player activity events. A Dataflow streaming pipeline consumes these events, performs stateful processing to compute session metrics, and writes results to Cloud Bigtable for low-latency queries. Recently, the pipeline's processing latency increased, and the Bigtable write throughput dropped. Monitoring shows that the pipeline is experiencing a high rate of 'out-of-order' messages and 'duplicate' events. The Pub/Sub subscription is configured with exactly-once delivery. The Dataflow job uses a GlobalWindow with a trigger that fires every 10 seconds. What is the most likely cause and solution?

A.The Bigtable instance is under-provisioned; add more nodes to increase write throughput.

B.Change the Pub/Sub subscription from exactly-once to at-least-once delivery to avoid redelivery overhead.

C.The pipeline's trigger is too frequent; increase the trigger interval to 30 seconds and set allowed lateness to 1 minute to handle out-of-order events.

D.The streaming engine is disabled; enable Streaming Engine to reduce worker memory pressure.

AnswerC

A longer trigger interval lets more events arrive before each firing, reducing duplicate output, while allowed lateness handles out-of-order events.

Why this answer

Option C is correct because the high rate of out-of-order and duplicate events indicates that the pipeline's trigger is firing too frequently, causing the stateful processing to attempt to commit partial windows before all events arrive. Increasing the trigger interval to 30 seconds and setting allowed lateness to 1 minute allows the pipeline to buffer more events, reduce the number of speculative triggers, and handle late-arriving data within the lateness bound, which directly reduces processing latency and Bigtable write contention.

Exam trap

Google Cloud often tests the misconception that increasing Bigtable nodes or changing Pub/Sub delivery mode will fix pipeline latency, when the real issue is the trigger configuration causing excessive speculative windowing and state churn.

How to eliminate wrong answers

Option A is wrong because the root cause is not Bigtable provisioning; the symptom of low write throughput is a downstream effect of the pipeline's trigger behavior, not a capacity issue. Option B is wrong because changing from exactly-once to at-least-once delivery would increase duplicates, not reduce them, and the subscription's exactly-once mode is not causing the redelivery overhead; the problem is the trigger frequency. Option D is wrong because the symptoms do not point to worker memory pressure; Streaming Engine moves state and shuffle off the worker VMs, and enabling it would not fix the trigger-induced out-of-order and duplicate events.

175

MCQeasy

A company has deployed a classification model on Vertex AI. They want to detect data drift in real-time for the model's input features. Which service should they use?

A.Cloud Monitoring

B.Cloud Data Loss Prevention

C.Cloud Logging

D.Vertex AI Model Monitoring

AnswerD

Vertex AI Model Monitoring continuously monitors feature distributions and alerts on drift.

Why this answer

Vertex AI Model Monitoring is the correct service because it is specifically designed to detect data drift and feature skew for models deployed on Vertex AI. It continuously monitors input features against a baseline distribution and alerts when drift exceeds a configured threshold, enabling real-time detection without requiring custom code.
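To make "monitors input features against a baseline distribution" concrete, the sketch below compares two feature histograms with Jensen-Shannon divergence and flags drift above a threshold. This is only an illustration of the statistical idea, not the service's implementation; the choice of divergence, the histograms, and the threshold value are all assumptions.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete histograms."""
    def normalize(v):
        total = sum(v)
        return [x / total for x in v]

    def kl(a, b):
        # Terms with a == 0 contribute nothing to the sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    p, q = normalize(p), normalize(q)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


baseline = [50, 30, 20]   # hypothetical training-time histogram for one feature
serving = [20, 30, 50]    # hypothetical histogram from recent requests
DRIFT_THRESHOLD = 0.05    # hypothetical alert threshold
drifted = js_divergence(baseline, serving) > DRIFT_THRESHOLD
```

With base-2 logarithms the divergence is bounded between 0 (identical distributions) and 1, which makes a fixed threshold easier to reason about than an unbounded measure.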

Exam trap

The trap here is that candidates confuse general monitoring (Cloud Monitoring) with ML-specific drift detection, assuming any monitoring tool can detect data drift, when in fact Vertex AI Model Monitoring is the only service that performs statistical distribution comparison for model inputs.

How to eliminate wrong answers

Option A is wrong because Cloud Monitoring is a general-purpose observability service for metrics, uptime checks, and dashboards; it lacks built-in statistical drift detection for ML model features. Option B is wrong because Cloud Data Loss Prevention (DLP) is used for inspecting, classifying, and masking sensitive data, not for monitoring feature distributions or drift. Option C is wrong because Cloud Logging captures and stores log entries from services but does not perform statistical analysis or drift detection on model inputs.

176

Multi-Selecteasy

A data engineering team is operationalizing a machine learning model for real-time inference. They need to monitor the model's performance in production. Which THREE types of monitoring should they implement? (Choose three.)

Select 3 answers

A.Model accuracy decay

B.Model re-training frequency

C.Training pipeline failures

D.Prediction latency

E.Input feature drift

AnswersA, D, E

Measures decline in prediction quality over time.

Why this answer

Model accuracy decay (A) is critical because in production, the model's predictive performance can degrade over time due to changes in the underlying data distribution or business logic. Monitoring accuracy decay allows the team to detect when the model no longer meets its performance baseline, triggering retraining or rollback. This is a standard practice in MLOps for maintaining model reliability.

Exam trap

Google Cloud often tests the distinction between monitoring the model's operational health (latency, drift, accuracy) versus managing the training lifecycle (retraining frequency, pipeline failures), leading candidates to confuse infrastructure monitoring with model performance monitoring.

177

MCQeasy

A company wants to monitor the performance of a deployed model in production. Which metric indicates that the model's predictions are degrading?

A.Increase in prediction error rate

B.Increase in prediction latency

C.Decrease in throughput

D.Increase in number of requests

AnswerA

Error rate reflects model accuracy.

Why this answer

An increase in prediction error rate directly indicates that the model's outputs are deviating from the expected or ground-truth values, signaling degradation in predictive performance. This metric captures the core concept of model drift, where the statistical properties of the input data or the relationship between features and labels change over time, leading to less accurate predictions. In production ML monitoring, tracking error rate (e.g., classification accuracy, RMSE) is the primary method to detect when a model needs retraining or updating.
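One simple way to track this metric, assuming ground-truth labels eventually arrive for served predictions, is a rolling error rate over the most recent outcomes. The class name `RollingErrorRate` and the window size below are hypothetical; this is a sketch of the idea rather than any particular monitoring product.

```python
from collections import deque


class RollingErrorRate:
    """Fraction of wrong predictions among the last `window` labeled outcomes."""

    def __init__(self, window):
        self.outcomes = deque(maxlen=window)  # True means the prediction was wrong

    def record(self, predicted, actual):
        self.outcomes.append(predicted != actual)

    def rate(self):
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0


monitor = RollingErrorRate(window=4)
for predicted, actual in [(1, 1), (0, 0), (1, 0), (0, 1)]:
    monitor.record(predicted, actual)
current_rate = monitor.rate()
```

A sustained rise in this value relative to the rate measured at deployment time is the signal that retraining may be needed; latency and throughput would not move it.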

Exam trap

Google Cloud often tests the distinction between operational metrics (latency, throughput) and model performance metrics (error rate), trapping candidates who confuse system health with prediction quality.

How to eliminate wrong answers

Option B is wrong because prediction latency measures the time taken for the model to return a prediction, which reflects infrastructure or model complexity issues, not the accuracy or degradation of the predictions themselves. Option C is wrong because throughput (requests per second) is a measure of system capacity and scalability, not a direct indicator of prediction quality or model drift. Option D is wrong because an increase in the number of requests indicates higher demand or usage, which does not imply that the model's predictions are becoming less accurate or degrading.

178

MCQmedium

A company runs a real-time anomaly detection system on Google Cloud. Streaming data from IoT devices is ingested via Pub/Sub, processed by Dataflow (Apache Beam), and results are written to Bigtable for low-latency serving. Recently, the system has been experiencing increased latency and occasional data loss. The Dataflow pipeline shows high system lag and backlog in Pub/Sub. The Bigtable cluster has 3 nodes and is reporting high CPU utilization (over 90%). The team suspects the issue is with the pipeline configuration. They have already verified that there are no errors in the pipeline code and no network issues. Which action should they take to resolve the issue?

A.Increase the number of Bigtable nodes to handle the write throughput.

B.Change the Dataflow worker machine type to n2-standard-8.

C.Decrease the batch size in the Dataflow pipeline to reduce latency.

D.Increase the number of Dataflow workers to process messages faster.

AnswerA

High CPU utilization suggests Bigtable is overwhelmed; adding nodes increases capacity.

Why this answer

The high CPU utilization on Bigtable (over 90%) indicates that the cluster is saturated and cannot keep up with the write throughput from Dataflow. This causes backpressure in the pipeline, leading to increased system lag and backlog in Pub/Sub, and eventually data loss when Pub/Sub messages expire. Increasing the number of Bigtable nodes directly addresses the bottleneck by distributing the write load and reducing CPU pressure, which allows the pipeline to drain the backlog and reduce latency.

Exam trap

Google Cloud often tests the misconception that scaling Dataflow workers or changing machine types always resolves pipeline latency, but the trap here is that the bottleneck is at the sink (Bigtable), so you must scale the sink first to relieve backpressure.

How to eliminate wrong answers

Option B is wrong because changing the Dataflow worker machine type to n2-standard-8 would increase compute capacity for processing, but the bottleneck is at the Bigtable sink, not the Dataflow workers; the pipeline is already experiencing backpressure from Bigtable, so more worker CPU would not resolve the write throughput limitation. Option C is wrong because decreasing the batch size in Dataflow would increase the number of smaller writes to Bigtable, which actually increases overhead and CPU usage on Bigtable, worsening the latency and backlog issue. Option D is wrong because increasing the number of Dataflow workers would increase the parallelism of writes to Bigtable, further amplifying the write pressure on the already saturated Bigtable cluster, making the high CPU utilization and backlog worse.

179

MCQeasy

A company deploys a scikit-learn model on Vertex AI for online predictions. The model is packaged in a custom container with all dependencies. Users report high latency (over 5 seconds) for predictions. The model size is 2 GB. What is the most likely cause of the high latency?

A.Using online predictions instead of batch prediction

B.Not enabling GPU acceleration

C.Using a custom container with a large unoptimized model

D.Using a small machine type (e.g., n1-standard-2)

AnswerC

Large models in custom containers cause slow loading and inference; using a prebuilt container or optimizing the model would reduce latency.

Why this answer

Option C is correct because a 2 GB model loaded into a custom container without optimization (e.g., quantization, pruning, or ONNX conversion) will cause significant cold-start latency and per-request loading overhead. Vertex AI online predictions require the model to be loaded into memory for each request or container instance; a large, unoptimized model increases both loading time and inference time, easily exceeding 5 seconds.

Exam trap

Google Cloud often tests the misconception that latency is always due to compute resources (CPU/GPU) or prediction type, when in fact the model's size and lack of optimization are the primary culprits in custom container deployments.

How to eliminate wrong answers

Option A is wrong because online predictions are designed for low-latency, real-time inference, and switching to batch prediction would not reduce latency for individual requests—batch prediction is for high-throughput, asynchronous jobs. Option B is wrong because GPU acceleration primarily speeds up matrix operations during inference, but the main bottleneck here is model size and loading overhead, not compute speed; a 2 GB model on CPU can still be fast if optimized. Option D is wrong because while a small machine type (e.g., n1-standard-2 with 2 vCPUs and 7.5 GB RAM) could contribute to latency, the most likely cause is the unoptimized model size; even a larger machine would still suffer from the same loading and inference delays if the model is not optimized.

180

MCQeasy

A company uses Cloud Dataproc to run Spark ML jobs. The jobs are memory-intensive and often fail with OutOfMemory errors. Which action would most effectively reduce memory pressure without changing the Spark code?

A.Increase the number of worker nodes and reduce the number of cores per worker.

B.Increase the master node's memory.

C.Increase the number of Spark partitions.

D.Use preemptible VMs for workers.

AnswerA

More workers spread memory load.

Why this answer

Increasing the number of worker nodes while reducing the number of cores per worker reduces memory pressure by distributing the workload across more JVMs, each with a smaller heap. This lowers the per-executor memory requirement and reduces the risk of OutOfMemory errors without modifying Spark code. In Cloud Dataproc, this approach directly addresses memory contention by giving each executor fewer tasks to process concurrently.
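The effect of fewer cores per worker can be seen with back-of-the-envelope arithmetic. The sketch below assumes, as a simplification, that a worker's memory is shared evenly among its concurrently running tasks and ignores JVM and OS overhead; the figures are hypothetical.

```python
def memory_per_task_gb(worker_memory_gb, cores_per_worker):
    """Approximate memory available to each concurrently running task."""
    return worker_memory_gb / cores_per_worker


# Same memory per worker, but half as many tasks running at once.
before = memory_per_task_gb(worker_memory_gb=32, cores_per_worker=8)  # 4.0
after = memory_per_task_gb(worker_memory_gb=32, cores_per_worker=4)   # 8.0
```

Adding worker nodes keeps the total number of cores, and therefore overall parallelism, roughly constant while each task gets a larger share of its worker's memory.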

Exam trap

Google Cloud often tests the misconception that adding more partitions or increasing master memory solves executor-level memory issues, when the real solution is to reduce per-executor task concurrency by adjusting the worker-to-core ratio.

How to eliminate wrong answers

Option B is wrong because increasing the master node's memory does not help with executor memory pressure; the master node handles cluster coordination and driver tasks, not the memory-intensive worker processing. Option C is wrong because increasing the number of Spark partitions can reduce the data size per task but does not directly reduce per-executor memory pressure and may increase scheduling overhead without addressing the root cause. Option D is wrong because using preemptible VMs for workers reduces cost but does not change memory allocation per worker; preemptible VMs can be reclaimed at any time, potentially causing job instability and not solving OutOfMemory errors.

181

MCQhard

A company is building a data lake on Cloud Storage with data from multiple sources. They need to apply schema-on-read and support ad-hoc SQL queries. Which architecture is most suitable?

A.Ingest to Cloud Spanner, query directly.

B.Ingest to Cloud SQL, then export to Cloud Storage for queries.

C.Ingest to Cloud Storage, create BigQuery external tables.

D.Ingest to Cloud Storage, load into Dataproc for queries.

AnswerC

BigQuery external tables provide schema-on-read and ad-hoc SQL over data in Cloud Storage.

Why this answer

BigQuery external tables allow schema-on-read by defining the schema at query time over data stored in Cloud Storage, enabling ad-hoc SQL queries without loading data into a separate system. This architecture directly supports the requirement for schema-on-read and SQL-based analysis, as BigQuery provides a serverless, scalable SQL engine.

Exam trap

Google Cloud often tests the distinction between schema-on-read (BigQuery external tables) and schema-on-write (traditional databases like Cloud Spanner or Cloud SQL), where candidates mistakenly choose a transactional database for analytical workloads.

How to eliminate wrong answers

Option A is wrong because Cloud Spanner is a globally distributed, strongly consistent relational database designed for transactional workloads, not for schema-on-read or ad-hoc SQL queries over raw data in a data lake. Option B is wrong because Cloud SQL is a managed relational database for OLTP workloads, and exporting to Cloud Storage for queries adds unnecessary latency and complexity, failing to leverage schema-on-read directly. Option D is wrong because Dataproc is a managed Spark/Hadoop service that requires data loading and cluster management, which is not as efficient or serverless as BigQuery external tables for ad-hoc SQL queries on a data lake.

182

MCQhard

A financial services company uses Cloud Composer to orchestrate a daily workflow that includes a Dataproc job for risk analysis. The workflow sometimes fails because the Dataproc cluster creation times out. The cluster creation typically takes 3 minutes, but occasionally takes over 10 minutes. What is the most effective way to handle this variability?

A.Create a long-running Dataproc cluster that remains idle and reuse it for each workflow.

B.Implement a retry loop with exponential backoff in the DAG.

C.Use preemptible VMs for the cluster to reduce cost and improve creation speed.

D.Increase the cluster creation timeout in the Airflow configuration.

AnswerA

Reusing an existing cluster eliminates the creation step and associated timeout.

Why this answer

Option A is correct because creating a long-running Dataproc cluster and reusing it eliminates the variable cluster creation time that causes timeouts. Cloud Composer (Airflow) can manage cluster lifecycle separately from the workflow, ensuring the cluster is always available when the Dataproc job runs. This approach decouples cluster provisioning from job execution, making the workflow resilient to creation delays.

Exam trap

The trap here is that candidates often assume retries or timeout adjustments are sufficient for infrastructure variability, but the most effective solution is to eliminate the variable step entirely by reusing a persistent cluster.

How to eliminate wrong answers

Option B is wrong because retry loops with exponential backoff only handle transient failures after a timeout occurs, but they do not address the root cause—the variable cluster creation time—and can lead to long delays or eventual failure if creation consistently exceeds the timeout. Option C is wrong because preemptible VMs are designed to reduce cost, not improve creation speed; they are actually more likely to be reclaimed and can cause cluster creation to fail or take longer due to availability constraints. Option D is wrong because increasing the cluster creation timeout in Airflow configuration merely extends the wait time without solving the underlying variability; it can mask the problem and lead to longer workflow execution times without guaranteeing success.

183

MCQhard

Refer to the exhibit. A Dataflow pipeline writes to BigQuery table employee_records. The pipeline was working yesterday but fails today. What is the most likely cause?

A.The pipeline dropped the last_name field entirely.

B.The pipeline code was changed to send an integer for the last_name field.

C.The BigQuery table quota was exceeded.

D.The BigQuery table schema was changed from STRING to INTEGER for last_name.

AnswerB

The error clearly states that an integer was provided for a string field.

Why this answer

Option B is correct because if the pipeline code was changed to send an integer for the last_name field, BigQuery will reject the write due to a schema mismatch. BigQuery enforces strict type checking at ingestion time; an integer value cannot be written into a STRING column unless the schema explicitly allows coercion. Since the pipeline was working yesterday, the most likely change is in the data type being sent, not the schema itself.
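A type mismatch of this kind can be caught before the write is attempted. The sketch below is a minimal pre-write check in plain Python; the schema dictionary, the `first_name` field, and the function `type_errors` are hypothetical and stand in for whatever validation a real pipeline would apply.

```python
# Hypothetical expected schema for the destination table.
SCHEMA = {"first_name": str, "last_name": str}


def type_errors(row, schema):
    """Return a list of human-readable problems found in one row."""
    errors = []
    for field, expected in schema.items():
        if field not in row:
            errors.append(f"{field}: missing")
        elif not isinstance(row[field], expected):
            actual = type(row[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    return errors


ok = type_errors({"first_name": "Ada", "last_name": "Lovelace"}, SCHEMA)
bad = type_errors({"first_name": "Ada", "last_name": 42}, SCHEMA)
```

Here `ok` is empty and `bad` reports the integer supplied for `last_name`, which is the same shape of problem the error message in this question describes.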

Exam trap

Google Cloud often tests the misconception that schema changes in BigQuery are the primary cause of pipeline failures, when in fact the most common cause is a code change that alters the data type of a field being written, especially in streaming or batch pipelines where schema enforcement is strict.

How to eliminate wrong answers

Option A is wrong because dropping the last_name field entirely would produce a missing-field error (if the column is REQUIRED) rather than a type mismatch. Option C is wrong because an exceeded BigQuery quota would produce a 'quota exceeded' error message, not a schema mismatch failure. Option D is wrong because if the table schema had been changed from STRING to INTEGER for last_name, the error would report a string supplied for an integer field, which is the reverse of what the error states; schema changes are also typically controlled and noticed, whereas a code change is a common oversight.

184

Multi-Selectmedium

Your team is running a Dataflow streaming pipeline that reads from Pub/Sub, transforms data, and writes to BigQuery. You notice that the pipeline's backlog is growing and the processing latency has increased from seconds to minutes. You need to diagnose and resolve the issue. Which TWO actions should you take? (Choose two.)

Select 2 answers

A.Stop the pipeline, increase the number of workers in the streaming engine configuration, and restart it.

B.Increase the batch size in the WriteToBigQuery transform to reduce I/O operations.

C.Configure a dead-letter queue in Cloud Storage for failed messages to reduce reprocessing load.

D.Increase the maximum number of workers in the pipeline's autoscaling configuration to allow more compute resources.

E.Examine the Dataflow monitoring dashboard for metrics like system lag, data freshness, and worker throughput.

AnswersD, E

Allowing more workers can reduce backlog if the pipeline is CPU-bound.

Why this answer

Option D is correct because increasing the maximum number of workers in the autoscaling configuration allows Dataflow to scale out horizontally, adding more compute resources to handle the increased backlog and reduce processing latency. Dataflow's autoscaling algorithm uses metrics like backlog bytes and CPU utilization to decide when to add workers, but it is capped by the max workers setting. Raising this cap enables the pipeline to allocate more VMs, thus processing more messages per second and reducing the backlog.

Exam trap

Google Cloud often tests the misconception that you must stop a streaming pipeline to change worker count or that increasing batch size always improves throughput, when in fact Dataflow supports live autoscaling and larger batches can worsen latency.
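
The effect of the max-workers cap can be sketched with a toy model. This is not Dataflow's actual autoscaling algorithm; the function names and the per-worker throughput figure are assumptions made for illustration. In the Beam Python SDK the cap itself is set with the `--max_num_workers` pipeline option.

```python
import math


def workers_allocated(incoming_rate: float, per_worker_rate: float, max_workers: int) -> int:
    """Toy model: workers needed to keep up with input, clipped at the cap."""
    needed = math.ceil(incoming_rate / per_worker_rate)
    return min(needed, max_workers)


def backlog_growth(incoming_rate: float, per_worker_rate: float, max_workers: int) -> float:
    """Messages per second added to the backlog (0 if the pipeline keeps up)."""
    workers = workers_allocated(incoming_rate, per_worker_rate, max_workers)
    capacity = workers * per_worker_rate
    return max(0.0, incoming_rate - capacity)


# Hypothetical numbers: 12,000 msg/s arriving, each worker handles 1,000 msg/s.
low_cap = backlog_growth(12_000, 1_000, max_workers=10)   # cap below what is needed
high_cap = backlog_growth(12_000, 1_000, max_workers=20)  # cap above what is needed
print(low_cap, high_cap)
```

With the lower cap the backlog keeps growing no matter how long the job runs; with the higher cap the modelled pipeline catches up, which is the behaviour option D relies on.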

185

MCQhard

You are a data engineer at a global e-commerce company. Your team manages a real-time recommendation system that ingests user clickstream events from a Pub/Sub topic (topic-clickstream). The pipeline uses Dataflow to read events, join with user profile data from Cloud Bigtable, compute recommendations using a machine learning model hosted on Cloud Run, and write results to a BigQuery table for analytics. The pipeline has been running smoothly for months, but recently the Dataflow job started failing with the error: "Workflow failed. Causes: S01:ReadPubSub/Read+Transform/ParDo(ExtractUserID)+ ... (5a3b2c1d) The job failed because a worker encountered an out-of-memory error." The Dataflow job uses the Streaming Engine feature with a worker type of n2-standard-8 (8 vCPU, 32 GB memory) and autoscaling from 2 to 20 workers. The clickstream event rate has increased from 500 events/second to 5000 events/second over the past week. The user profile data in Bigtable has also grown, with average row size increasing from 1 KB to 10 KB due to additional fields. You need to resolve the out-of-memory errors without completely redesigning the pipeline. What should you do?

A.Increase the maximum number of workers in autoscaling from 20 to 50.

B.Change the worker machine type to n2-highmem-8 (8 vCPU, 64 GB memory) in the Dataflow job configuration.

C.Reduce the batch size in the Dataflow pipeline by setting the `max_batch_size` parameter to a lower value.

D.Increase the number of Bigtable nodes to improve read throughput.

AnswerB

Increasing memory per worker directly addresses OOM without major pipeline changes.

Why this answer

Option B is correct because the out-of-memory error is caused by the increased per-worker memory load from larger Bigtable rows (1 KB to 10 KB) and higher event throughput (500 to 5000 events/sec). Switching to n2-highmem-8 doubles the memory from 32 GB to 64 GB, giving each worker more headroom to cache user profiles and process larger batches without OOM. This directly addresses the root cause without redesigning the pipeline.

Exam trap

Google Cloud often tests the misconception that scaling out (more workers) solves memory issues, when in fact the per-worker memory limit is the bottleneck and must be increased via a higher-memory machine type.

How to eliminate wrong answers

Option A is wrong because increasing the maximum number of workers spreads the load across more machines but does not increase the memory per worker; each worker still has only 32 GB, so the same OOM condition persists on individual workers. Option C is wrong because reducing batch size lowers memory per batch but increases the number of batches and overhead, which can worsen performance and still not prevent OOM if the per-row memory footprint (10 KB) is the dominant factor. Option D is wrong because Bigtable node count affects read throughput and latency, not the memory consumed by the Dataflow worker when caching or processing rows; the OOM is on the Dataflow side, not Bigtable.

186

Multi-Selecthard

A company is migrating ML workflows to Vertex AI Pipelines. They want to ensure best practices for pipeline reproducibility and debugging. Which THREE actions should they take? (Choose three.)

Select 3 answers

A.Set a random seed for all training components

B.Store all artifacts in Cloud Storage with versioned prefixes

C.Pin all dependencies in training images

D.Use dynamic pipeline parameters for each run

E.Use conditional execution based on previous component outputs

AnswersA, B, C

Random seeds ensure deterministic training results.

Why this answer

Setting a random seed for all training components ensures deterministic behavior, meaning that the same inputs will produce the same outputs across multiple runs. This is critical for debugging and reproducibility in Vertex AI Pipelines, as it eliminates stochastic variability that can mask bugs or make results irreproducible. Without a fixed seed, even identical code and data can yield different model weights or metrics, complicating root cause analysis.

Exam trap

Google Cloud often tests the distinction between features that improve workflow flexibility (like dynamic parameters or conditional execution) and those that enforce reproducibility and debuggability, leading candidates to confuse operational convenience with best practices for deterministic pipelines.
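
The role of a fixed seed can be shown with a minimal sketch. The `train` function below is a hypothetical stand-in for a training component, not a Vertex AI API; it only uses Python's standard `random` module.

```python
import random


def train(seed: int, steps: int = 5) -> list[float]:
    """Stand-in for a training component: returns pseudo-random 'weights'."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    return [rng.random() for _ in range(steps)]


run_a = train(seed=42)
run_b = train(seed=42)
run_c = train(seed=7)

print(run_a == run_b)  # same seed, same inputs
print(run_a == run_c)  # different seed
```

Two runs with the same seed produce identical results, so any difference between pipeline runs can be traced to code, data, or dependencies rather than to randomness.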

187

Multi-Selectmedium

A data engineer is designing a batch processing system using Cloud Dataproc. Which TWO practices improve performance and reduce costs? (Choose TWO.)

Select 2 answers

A.Always use persistent disks for all nodes.

B.Set autoscaling policies based on YARN memory.

C.Store intermediate data in HDFS.

D.Use preemptible VMs for worker nodes.

E.Use the largest machine types for master nodes.

AnswersB, D

Optimizes resource utilization.

Why this answer

Option B is correct because autoscaling policies based on YARN memory allow the cluster to dynamically add or remove worker nodes in response to actual resource demand from running jobs. This prevents over-provisioning (reducing costs) and ensures sufficient resources for job completion (improving performance), as Cloud Dataproc directly monitors YARN memory metrics to trigger scaling actions.

Exam trap

The trap here is that candidates often confuse HDFS with Cloud Storage, assuming intermediate data must be stored locally for performance, but Cloud Storage is actually faster and cheaper for transient data in Dataproc due to its native integration and lack of replication overhead.

188

MCQeasy

Refer to the exhibit. A Cloud Build step fails when pushing a Docker image to Artifact Registry. What is the missing IAM role for the Cloud Build service account?

A.roles/artifactregistry.writer

B.roles/containerregistry.admin

C.roles/storage.objectCreator

D.roles/cloudbuild.builds.editor

AnswerA

This role allows pushing images to Artifact Registry.

Why this answer

The Cloud Build service account needs the `roles/artifactregistry.writer` role to push Docker images to Artifact Registry. This role grants the necessary permissions to upload artifacts, including images, to the registry. Without it, the build step fails with an authorization error.

Exam trap

Google Cloud often tests the distinction between Artifact Registry and Container Registry roles, and the trap here is that candidates confuse `roles/containerregistry.admin` (for Container Registry) with the correct Artifact Registry role, or assume that Cloud Build's own editor role includes artifact push permissions.

How to eliminate wrong answers

Option B is wrong because `roles/containerregistry.admin` is for Container Registry (gcr.io), not Artifact Registry, and the question specifies Artifact Registry. Option C is wrong because `roles/storage.objectCreator` applies to Cloud Storage buckets, not Artifact Registry repositories. Option D is wrong because `roles/cloudbuild.builds.editor` allows managing Cloud Build builds but does not grant permissions to push artifacts to Artifact Registry.
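
As an illustration, the grant amounts to adding one binding to an IAM policy. The sketch below works on a plain policy dictionary in the standard `bindings` shape; the helper name and the service account address are placeholders, not values from the exhibit.

```python
def add_binding(policy: dict, role: str, member: str) -> dict:
    """Return a copy of an IAM policy dict with `member` added to `role`."""
    bindings = [dict(b, members=list(b["members"])) for b in policy.get("bindings", [])]
    for binding in bindings:
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            break
    else:
        # No existing binding for this role, so create one.
        bindings.append({"role": role, "members": [member]})
    return {**policy, "bindings": bindings}


# Placeholder member: substitute the real Cloud Build service account.
build_sa = "serviceAccount:PROJECT_NUMBER@cloudbuild.gserviceaccount.com"
policy = add_binding({"bindings": []}, "roles/artifactregistry.writer", build_sa)
print(policy)
```

Applying the same grant twice leaves the policy unchanged, which makes the helper safe to call from a repeatable setup script.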

189

MCQeasy

A data scientist trains a TensorFlow model using Vertex AI Training and wants to deploy it for online prediction. Which Vertex AI resource should the data scientist use to create an endpoint for serving predictions?

A.Vertex AI Batch Prediction Job

B.Vertex AI Endpoint

C.Vertex AI Feature Store

D.Vertex AI Model Registry

AnswerB

An endpoint is required to deploy a model for online predictions.

Why this answer

Option B is correct because a Vertex AI Endpoint is the resource that serves online predictions. Option A (Batch Prediction Job) is for asynchronous batch predictions. Option C (Feature Store) is for managing features. Option D (Model Registry) stores and versions models but does not serve them.

190

MCQmedium

A data pipeline using Cloud Pub/Sub and Cloud Dataflow is experiencing duplicate messages. The source system publishes messages at least once. What Dataflow technique ensures exactly-once processing?

A.Use idempotent sinks

B.Use GlobalWindows

C.Set watermark threshold

D.Enable streaming engine

AnswerA

Idempotent sinks allow safe duplicate writes, achieving exactly-once.

Why this answer

Option A is correct because idempotent sinks ensure that even if Cloud Pub/Sub delivers the same message multiple times (due to its at-least-once delivery semantics), the Dataflow pipeline can deduplicate or safely reapply the same data without causing duplicates in the output. This is achieved by designing the sink (e.g., BigQuery with insertId, Cloud Storage with unique filenames) to recognize and ignore repeated writes, effectively providing exactly-once processing semantics downstream.

Exam trap

The trap here is that candidates confuse 'exactly-once processing' with 'exactly-once delivery' from the source, but Pub/Sub only guarantees at-least-once delivery, so the responsibility for deduplication falls on the Dataflow pipeline and its sink design, not on windowing or engine settings.

How to eliminate wrong answers

Option B is wrong because GlobalWindows groups all elements into a single window for batch-like processing, but it does not address message duplication; it only changes how data is windowed, not how duplicates are handled. Option C is wrong because setting a watermark threshold controls how long the pipeline waits for late data, which affects completeness and latency but does not prevent duplicate messages from being processed. Option D is wrong because enabling Streaming Engine improves scalability and reduces checkpoint latency in Dataflow, but it does not provide deduplication or exactly-once guarantees; duplicates can still occur from Pub/Sub's at-least-once delivery.
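
The idea of an idempotent sink can be sketched in a few lines. The class below is an in-memory stand-in, not a BigQuery or Cloud Storage client; the record IDs and payloads are made up for illustration.

```python
class IdempotentSink:
    """In-memory stand-in for a sink that upserts by a unique record ID."""

    def __init__(self) -> None:
        self.rows: dict[str, dict] = {}

    def write(self, record_id: str, payload: dict) -> None:
        # Writing the same ID again overwrites the row with identical data,
        # so a redelivered message never produces a second row.
        self.rows[record_id] = payload


# Simulated at-least-once delivery: message "m1" arrives twice.
deliveries = [
    ("m1", {"amount": 10}),
    ("m2", {"amount": 5}),
    ("m1", {"amount": 10}),
]

sink = IdempotentSink()
for record_id, payload in deliveries:
    sink.write(record_id, payload)

print(len(sink.rows))
print(sum(row["amount"] for row in sink.rows.values()))
```

Three deliveries result in two stored rows, so downstream totals are the same as if each message had been delivered exactly once.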

191

MCQhard

A company runs a Dataproc cluster for nightly batch jobs. The cluster uses preemptible workers for cost savings. Recently, the jobs have been failing intermittently with 'Disk quota exceeded' errors on the persistent disks attached to the preemptible workers. The cluster is configured with a master node and 10 worker nodes, each with a 100 GB persistent disk. The preemptible workers are dynamically added and removed. What is the most likely cause and the best long-term solution?

A.The persistent disks of the preemptible workers are too small. Resize the persistent disks to 200 GB each.

B.The preemptible workers are using local SSDs that are not recreated on reclaim. Use non-preemptible workers with local SSDs instead.

C.The preemptible workers are exceeding the project's persistent disk quota in the region because every time a preempted worker restarts, it tries to attach a new disk. Increase the disk quota.

D.The preemptible workers do not have enough persistent disk space to store intermediate shuffle data. Switch to standard workers to avoid this issue.

AnswerC

C is correct because preemptible workers can exhaust the regional persistent disk quota through rapid creation and deletion of persistent disks.

Why this answer

Option C is correct because the intermittent 'Disk quota exceeded' errors on preemptible workers are caused by the project's regional persistent disk quota being exhausted. When a preemptible worker is reclaimed, the cluster attempts to attach a new persistent disk to the replacement worker, but the old disk is not immediately deleted, leading to a buildup of unattached disks that consume quota. The best long-term solution is to increase the persistent disk quota in the region to accommodate the temporary disks from preempted workers.

Exam trap

The trap here is that candidates mistakenly attribute the error to insufficient disk size or shuffle data capacity, rather than recognizing it as a regional quota exhaustion issue caused by orphaned disks from preempted workers.

How to eliminate wrong answers

Option A is wrong because resizing disks to 200 GB does not address the quota exhaustion issue; the error is about quota, not disk size, and increasing disk size would actually consume more quota per disk. Option B is wrong because local SSDs are ephemeral and not recommended for preemptible workers, as they are lost on preemption, and the error is about persistent disk quota, not local SSD recreation. Option D is wrong because the error is not about insufficient disk space for shuffle data; it is a quota limit error, and switching to standard workers would increase costs without fixing the underlying quota issue.

192

MCQhard

A large e-commerce company is migrating its on-premise Hadoop cluster to Google Cloud using Dataproc for batch processing. The cluster processes daily sales data from multiple sources, generates aggregated reports, and performs ad-hoc analysis. The migration is complete, but users report that jobs are running 30% slower than on-premise. The data is stored in Cloud Storage as Parquet files partitioned by date. The Dataproc cluster uses preemptible VMs for worker nodes, and the master node uses a standard VM. The jobs heavily rely on shuffling data between stages. The cluster's autoscaling is enabled with a minimum of 10 and a maximum of 50 workers. During job execution, CPU utilization on workers is low, but disk I/O is high, especially on local SSDs. The network utilization is moderate. The team suspects that the shuffle operation is causing the slowdown. Which action should the team take to improve job performance?

A.Attach additional local SSDs to each worker to increase local disk capacity and I/O throughput.

B.Enable Cloud Storage as a shuffle destination by setting the property `dataproc:dataproc.shuffle.direct` to `true` and ensure the cluster has appropriate IAM permissions.

C.Change all worker VMs from preemptible to standard VMs to avoid preemption and improve reliability.

D.Increase the maximum number of preemptible workers to 100 to provide more parallelism.

AnswerB

Cloud Storage shuffle can offload intermediate shuffle data to Cloud Storage, reducing local disk I/O and potentially improving overall shuffle performance, especially when local disks are saturated.

Why this answer

B is correct because the high disk I/O on local SSDs during shuffling indicates that the shuffle data is being written to local disk, which is a bottleneck. By enabling Cloud Storage as a shuffle destination via `dataproc:dataproc.shuffle.direct`, shuffle data is written directly to Cloud Storage, bypassing local disks and leveraging Google Cloud's high-throughput object storage. This reduces disk I/O contention and improves shuffle performance, especially when preemptible VMs are used, as shuffle data is not lost on VM preemption.

Exam trap

The trap here is that candidates often assume adding more local SSDs or increasing worker count will solve shuffle bottlenecks, but the real issue is the I/O bottleneck of local disks, and Cloud Storage shuffle is the specific Dataproc feature designed to offload shuffle data to a scalable, high-throughput object store.

How to eliminate wrong answers

Option A is wrong because attaching additional local SSDs increases capacity but does not address the root cause of high disk I/O during shuffling; the bottleneck is the local disk I/O itself, not capacity, and Cloud Storage shuffle provides better throughput. Option C is wrong because changing preemptible VMs to standard VMs improves reliability but does not directly address the shuffle I/O bottleneck; the performance issue is disk I/O, not VM preemption. Option D is wrong because increasing the maximum number of preemptible workers to 100 increases parallelism but does not reduce the disk I/O bottleneck during shuffling; more workers can actually increase shuffle traffic and exacerbate the problem.

193

MCQmedium

A company needs to grant analysts access to a BigQuery table that contains sensitive PII columns. The analysts should be able to run aggregate queries on the entire dataset but must not see individual PII values. Which approach should the team use?

A.Create a user-defined function (UDF) that aggregates the data and grant analysts permission to call the UDF.

B.Use BigQuery row-level security to restrict access to non-PII rows only.

C.Create an authorized view that does not include the PII columns and grant analysts access to the view.

D.Use BigQuery column-level security with data masking to mask the PII columns for the analysts' role.

AnswerD

D is correct because column-level data masking dynamically masks data based on user permissions without changing the table structure.

Why this answer

Option D is correct because BigQuery column-level security with data masking allows you to define masking policies on specific PII columns (e.g., a predefined rule such as default masking value, nullify, or SHA-256 hash, or a custom masking routine) that automatically transform the data for the analysts' role while still permitting aggregate queries over the entire dataset. This approach ensures analysts never see individual PII values, yet they can still run `COUNT`, `SUM`, `AVG`, etc., across the whole table, meeting both requirements.

Exam trap

Google Cloud often tests the distinction between row-level security (filtering rows) and column-level security (masking or hiding columns), and candidates mistakenly choose row-level security when the requirement is to hide specific column values across all rows.

How to eliminate wrong answers

Option A is wrong because a UDF that aggregates data would still require analysts to have access to the underlying table to call the UDF, and the UDF cannot prevent analysts from querying the raw table directly if they have table-level permissions. Option B is wrong because row-level security filters entire rows based on a condition (e.g., `user_email = SESSION_USER()`), but here the requirement is to hide specific columns (PII) across all rows, not to exclude entire rows. Option C is wrong because an authorized view that omits PII columns would prevent analysts from seeing those columns, but it also prevents them from running aggregate queries that include PII columns (e.g., `AVG(salary)`), which the requirement explicitly allows as long as individual values are hidden.
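
A small sketch shows why hash-style masking still supports aggregates. This imitates the idea in plain Python with `hashlib`; it is not BigQuery's masking implementation, and the rows and column names are invented for illustration.

```python
import hashlib


def mask(value: str) -> str:
    """Replace a PII value with its SHA-256 hex digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


rows = [
    {"email": "ana@example.com", "amount": 10},
    {"email": "bo@example.com", "amount": 5},
    {"email": "ana@example.com", "amount": 20},
]

# What an analyst with the masked-reader role would see in this toy model.
masked_rows = [{**row, "email": mask(row["email"])} for row in rows]

distinct_customers = len({row["email"] for row in masked_rows})
total_amount = sum(row["amount"] for row in masked_rows)
print(distinct_customers, total_amount)
```

Equal inputs hash to equal outputs, so counts of distinct customers and sums over non-PII columns are unchanged, while the original email addresses never appear in the masked rows.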

194

MCQmedium

You have a batch prediction job on Vertex AI that processes millions of records. The job is failing with an out-of-memory error. What is the best way to resolve this?

A.Increase the minNodes and maxNodes for the batch prediction job

B.Split the input data into smaller files and run multiple batch prediction jobs

C.Enable autoscaling on the batch prediction job

D.Use a machine type with more memory for the batch prediction job

AnswerD

Increasing memory directly solves OOM.

Why this answer

Option D is correct because a batch prediction job on Vertex AI runs on a single machine (or a cluster of machines) and an out-of-memory (OOM) error indicates that the model or data processing exceeds the available RAM of the chosen machine type. Increasing the machine's memory directly addresses the root cause by providing more heap space for loading the model and processing large batches of predictions, without altering the job's parallelism or data partitioning.

Exam trap

The trap here is that candidates confuse scaling out (increasing nodes or autoscaling) with scaling up (increasing per-node resources), and assume that more nodes or splitting data will fix a memory exhaustion issue that is actually caused by insufficient RAM on each individual machine.

How to eliminate wrong answers

Option A is wrong because minNodes and maxNodes control the number of replicas for distributed prediction, not the memory per machine; increasing nodes spreads the workload but does not increase per-node memory, so OOM errors can still occur on each node. Option B is wrong because splitting input data into smaller files and running multiple jobs addresses data size but not the per-instance memory limit; if the model itself is large or each prediction requires significant memory, even smaller files can cause OOM on the same machine type. Option C is wrong because autoscaling adjusts the number of nodes based on load, not the memory capacity of each node; it can help with throughput but does not resolve a fundamental memory shortage on individual machines.
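
The scale-up versus scale-out distinction can be sketched with a toy memory model. The function names and sizes below are assumptions for illustration and do not reflect how Vertex AI accounts for memory internally.

```python
def replica_fits(model_gb: float, batch_gb: float, machine_memory_gb: float) -> bool:
    """Each replica must hold the full model plus one in-flight batch."""
    return model_gb + batch_gb <= machine_memory_gb


def job_succeeds(replicas: int, model_gb: float, batch_gb: float, machine_memory_gb: float) -> bool:
    # Every replica has the same footprint, so adding replicas does not
    # change whether any single replica runs out of memory.
    return replicas >= 1 and replica_fits(model_gb, batch_gb, machine_memory_gb)


# Hypothetical sizes: 20 GB model, 4 GB per batch.
print(job_succeeds(replicas=1, model_gb=20, batch_gb=4, machine_memory_gb=16))
print(job_succeeds(replicas=10, model_gb=20, batch_gb=4, machine_memory_gb=16))
print(job_succeeds(replicas=1, model_gb=20, batch_gb=4, machine_memory_gb=32))
```

In this model only the third call succeeds: ten small replicas fail for the same reason one does, while a single larger machine has enough room.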

195

MCQmedium

A BigQuery query fails with the error shown in the exhibit. What is the most likely cause?

A.The query scans too many partitions or data without efficient pruning

B.The table has too many partitions

C.The user does not have permission to query the table

D.Insufficient slot capacity in the project

AnswerA

SELECT * on a large table can exceed resource limits; partition pruning might help.

Why this answer

The error indicates that the query attempted to scan too many partitions or a large amount of data without effective partition pruning. BigQuery charges based on the amount of data processed, and queries that scan all partitions of a large table can hit limits or incur high costs. The most likely cause is that the query's WHERE clause does not filter on the partitioning column, forcing a full table scan across all partitions.

Exam trap

Google Cloud often tests the distinction between 'too many partitions' (a table design issue) and 'scanning too many partitions' (a query design issue), leading candidates to mistakenly choose the option about partition count rather than the lack of pruning.

How to eliminate wrong answers

Option B is wrong because having too many partitions does not directly cause a query failure; BigQuery supports up to 10,000 partitions per table, and the error is about scanning too many partitions, not the count itself. Option C is wrong because permission errors produce a distinct 'Access Denied' or 'Permission denied' message, not a partition scanning error. Option D is wrong because insufficient slot capacity results in 'Resources exceeded' or 'Query execution timed out' errors, not a partition scanning limit error.
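
Partition pruning can be illustrated with a toy table held in a dictionary. The partition sizes and date range are hypothetical, and the function only imitates the effect of a `WHERE` filter on the partitioning column.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical table: 365 daily partitions of 2 GB each.
partitions = {date(2024, 1, 1) + timedelta(days=i): 2 for i in range(365)}


def gb_scanned(start: Optional[date] = None, end: Optional[date] = None) -> int:
    """Sum the sizes of partitions that survive the date filter."""
    return sum(
        size
        for day, size in partitions.items()
        if (start is None or day >= start) and (end is None or day <= end)
    )


full_scan = gb_scanned()                                     # no filter on the partition column
one_week = gb_scanned(date(2024, 1, 1), date(2024, 1, 7))    # filter prunes most partitions
print(full_scan, one_week)
```

Without a filter every partition is read; with a one-week filter only seven partitions contribute, which is the difference between option A's failing query and a pruned one.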

196

Multi-Selectmedium

A data engineering team is building a CI/CD pipeline for machine learning models using Cloud Build and AI Platform. Which TWO practices are essential for ensuring reproducible and safe model deployments?

Select 2 answers

A.Use Cloud Functions to trigger retraining on new data arrival.

B.Tag each model version with the Git commit hash of the training code.

C.Run integration tests against the model on a staging endpoint before promoting to production.

D.Use the same environment for training and serving, possibly via custom containers.

E.Directly deploy from the development environment using gcloud commands.

AnswersB, C

Links model to exact code version for reproducibility.

Why this answer

Options B and C are correct. B links every model version to the exact source code and training process that produced it; C ensures the model is validated before it reaches production. A is about triggering retraining, not reproducibility; D can be useful but is not essential for reproducibility; E is an anti-pattern.

197

MCQeasy

A company stores raw data files in Cloud Storage in a bucket named 'raw-data'. After processing, the files are moved to a 'processed' bucket. To reduce costs, they want to automatically delete raw data older than 30 days. What should they do?

A.Enable object versioning on the 'raw-data' bucket and configure a lifecycle rule to delete noncurrent versions.

B.Configure a lifecycle rule on the 'raw-data' bucket to delete objects older than 30 days.

C.Set a retention policy on the 'raw-data' bucket to expire objects after 30 days.

D.Use a bucket policy that denies read access to objects older than 30 days.

AnswerB

B is correct because lifecycle management can automatically delete objects based on age.

Why this answer

Option B is correct because Cloud Storage lifecycle management allows you to set a rule that automatically deletes objects after a specified number of days from their creation time. By configuring a lifecycle rule on the 'raw-data' bucket to delete objects older than 30 days, the company can achieve cost reduction without manual intervention. This directly addresses the requirement to remove raw data files that have been processed and are no longer needed.

Exam trap

The trap here is confusing lifecycle deletion rules with retention policies or versioning: candidates often think retention policies delete data after a period, but they actually prevent deletion, while versioning with noncurrent deletion only removes old versions, not the current object.

How to eliminate wrong answers

Option A is wrong because enabling object versioning and deleting noncurrent versions does not delete the current (original) objects; it only removes older versions, so raw data files would remain in the bucket indefinitely. Option C is wrong because a retention policy prevents deletion or modification of objects until they reach the specified age, which would keep the data for at least 30 days rather than delete it after 30 days. Option D is wrong because a bucket policy that denies read access does not delete the objects; the files would still exist and incur storage costs, failing to meet the cost-reduction goal.
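
A lifecycle rule of this kind is a small JSON document. The sketch below builds it as a Python dictionary in the shape accepted by `gsutil lifecycle set`; the `matches_delete_rule` helper is a hypothetical illustration of the age condition, not part of any Google Cloud library.

```python
import json

# Delete any object once it is 30 days old.
lifecycle = {
    "rule": [
        {"action": {"type": "Delete"}, "condition": {"age": 30}},
    ]
}


def matches_delete_rule(config: dict, object_age_days: int) -> bool:
    """Toy check: would any Delete rule apply to an object of this age?"""
    return any(
        rule["action"]["type"] == "Delete" and object_age_days >= rule["condition"]["age"]
        for rule in config["rule"]
    )


print(json.dumps(lifecycle, indent=2))
print(matches_delete_rule(lifecycle, 31))
print(matches_delete_rule(lifecycle, 5))
```

Saving the printed JSON to a file and applying it to the 'raw-data' bucket would leave the 'processed' bucket untouched, since lifecycle rules are configured per bucket.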

198

MCQhard

A company uses Vertex AI to serve a model that requires GPU for inference. They want to minimize cost while handling variable traffic. Which strategy should they use?

A.Deploy the model to Cloud Functions with GPU

B.Use a Vertex AI Endpoint with GPU and configure auto-scaling to zero when idle

C.Use Vertex AI Batch Prediction with GPU

D.Use a Vertex AI Endpoint with GPU with a fixed number of replicas

AnswerB

Scaling to zero when idle reduces cost.

Why this answer

Option B is correct because Vertex AI Endpoints support GPU-accelerated inference with autoscaling, including the ability to scale down to zero replicas when there is no traffic. This minimizes cost by only incurring GPU charges during active inference, while still handling variable traffic through dynamic scaling.

Exam trap

Google Cloud often tests the misconception that serverless services like Cloud Functions can support GPU acceleration, when in reality GPU compute requires dedicated infrastructure like Vertex AI Endpoints or GKE.

How to eliminate wrong answers

Option A is wrong because Cloud Functions do not support GPU attachments; they are designed for lightweight, event-driven compute and cannot run GPU-accelerated inference. Option C is wrong because Vertex AI Batch Prediction is intended for offline, asynchronous processing of large datasets, not for serving real-time variable traffic with low latency. Option D is wrong because using a fixed number of replicas with GPU does not minimize cost; it keeps GPU instances running continuously regardless of traffic, leading to higher costs during idle periods.

199

MCQmedium

Your company uses Vertex AI Pipelines to automate model retraining. The pipeline has three steps: data extraction from BigQuery, feature engineering using Dataflow, and model training using a custom container on Vertex AI Training. Recently, the pipeline has been failing intermittently at the Dataflow step with a 'The job encountered a transient error. Please retry.' message. You have enabled pipeline retries with 3 attempts. However, the pipeline still fails after 3 retries. You check the logs and find that the Dataflow job requires more resources than the default worker configuration provides. Which change should you make to reduce the failure rate?

A.Increase the number of Dataflow workers to improve parallelism

B.Increase the number of retries in the pipeline to 5

C.Replace Dataflow with Dataproc to run the feature engineering step

D.Increase the Dataflow worker machine type to have more memory and CPU in the pipeline step configuration

AnswerD

More resources prevent the transient resource exhaustion errors.

Why this answer

Option D is correct because the pipeline fails due to insufficient resources (memory and CPU) in the default Dataflow worker configuration. By increasing the worker machine type (e.g., using a custom machine type with more vCPUs and memory), the Dataflow job can handle the feature engineering workload without hitting resource limits, reducing transient failures. This directly addresses the root cause identified in the logs, unlike retries or parallelism changes.

Exam trap

Google Cloud often tests the misconception that increasing parallelism (more workers) or retries will fix resource exhaustion errors, when the actual fix is to increase per-worker resources by selecting a larger machine type.

How to eliminate wrong answers

Option A is wrong because increasing the number of workers improves parallelism but does not address the root cause of insufficient per-worker resources (memory/CPU); it may even increase resource contention. Option B is wrong because increasing retries from 3 to 5 does not fix the underlying resource constraint; the job will continue to fail on each retry if the worker configuration remains inadequate. Option C is wrong because replacing Dataflow with Dataproc is an unnecessary architectural change that introduces new operational complexity and does not solve the specific resource issue; the problem is with worker sizing, not the service itself.

200

MCQeasy

A data science team has built a model using scikit-learn. They want to operationalize it on Google Cloud without rewriting the code. Which approach should they take?

A.Export the model as a PMML file and use BigQuery ML

B.Use AI Platform Training to host the model directly

C.Package the model in a custom container and deploy to Vertex AI Endpoints

D.Convert the scikit-learn model to TensorFlow SavedModel format

AnswerC

Custom containers allow any framework without code changes.

Why this answer

Option C is correct because Vertex AI Endpoints support custom containers, allowing you to package your scikit-learn model with its dependencies (e.g., a Flask or FastAPI inference server) and deploy it without rewriting any code. This approach directly meets the requirement to operationalize the existing model on Google Cloud without modification.

Exam trap

Google Cloud often tests the misconception that AI Platform Training can host models directly, but it is strictly for training jobs, not serving endpoints; candidates confuse the training service with the prediction service.

How to eliminate wrong answers

Option A is wrong because PMML (Predictive Model Markup Language) is not natively supported by BigQuery ML; BigQuery ML uses SQL-based model creation and does not import PMML files for inference. Option B is wrong because AI Platform Training is designed for training jobs, not for hosting models as endpoints; hosting is done via AI Platform Prediction (now part of Vertex AI), but even then, scikit-learn models require a custom prediction routine or container, not direct hosting. Option D is wrong because converting a scikit-learn model to TensorFlow SavedModel format would require rewriting the model's inference logic and dependencies, contradicting the requirement to avoid code changes.

Full explanation →

201

MCQhard

You have deployed a TensorFlow model on Vertex AI Endpoints with autoscaling. The model receives high traffic during peak hours, but you notice that inference latency increases significantly during cold starts. Which strategy would best minimize cold-start latency without incurring unnecessary cost?

A.Set minNodes to a value that handles baseline traffic, and use traffic splitting to gradually shift traffic to new replicas

B.Set minNodes to 0 and enable node auto-scaling

C.Increase maxNodes to allow more replicas during peak, and rely on Kubernetes Horizontal Pod Autoscaler

D.Use Cloud Functions with Cloud Run for the model inference to leverage serverless cold-start mitigation

AnswerA

Keeps baseline replicas warm; gradual traffic shift avoids sudden load.

Why this answer

Setting minNodes to a value that handles baseline traffic ensures that a minimum number of replicas are always warm, eliminating cold starts for baseline requests. Traffic splitting gradually shifts new traffic to newly created replicas, allowing them to warm up before receiving full load, which minimizes latency spikes without over-provisioning resources.

Exam trap

Google Cloud often tests the misconception that increasing maxNodes or relying on generic autoscaling (like HPA) solves cold starts, but the key is keeping a baseline of warm replicas via minNodes and using traffic splitting to warm new replicas gradually.

How to eliminate wrong answers

Option B is wrong because setting minNodes to 0 means no replicas are kept warm, so every scale-up event will trigger a cold start, increasing latency during traffic spikes. Option C is wrong because increasing maxNodes alone does not prevent cold starts; without a minimum number of warm replicas, new replicas still need to initialize, and relying on Kubernetes Horizontal Pod Autoscaler (which is not used by Vertex AI Endpoints) is irrelevant as Vertex AI uses its own autoscaling mechanism. Option D is wrong because Cloud Functions and Cloud Run are serverless compute services, not designed for hosting TensorFlow models with GPU/TPU support, and they introduce their own cold-start latency without addressing the specific issue of model inference cold starts on Vertex AI.

Full explanation →

202

MCQmedium

A data scientist deploys a new version of a fraud detection model (model2) alongside the existing model (model1) on the same Vertex AI endpoint with a 70/30 traffic split. After 24 hours, the team notices that model2's predictions are significantly different from model1's, and the fraud detection rate has increased. What is the most likely explanation for the change in predictions?

A.Model2 was trained on data that leaked future information, causing unrealistic results.

B.Model2 is receiving corrupted input data due to a bug in the traffic routing.

C.The traffic split is misconfigured and sending all traffic to model2.

D.Model2 uses a different model artifact (fraud_detection_v2) that produces different predictions.

AnswerD

The environment variable MODEL_NAME points to different model versions, causing output differences.

Why this answer

Option D is correct because the most straightforward explanation for a significant change in predictions and an increased fraud detection rate is that model2 uses a different model artifact (fraud_detection_v2) that was designed to produce different outputs. In Vertex AI, deploying a new model version with a traffic split means both models receive the same input data, but each model artifact independently processes it. If model2's predictions differ substantially, it indicates the model artifact itself has been updated or replaced, not that there is a data or routing issue.

Exam trap

Google Cloud often tests the misconception that a traffic split or routing issue can cause prediction differences, when in fact the split only controls which model receives the request, not the content of the request or the model's internal logic.

How to eliminate wrong answers

Option A is wrong because data leakage would cause unrealistically high performance during training, but it does not explain why predictions differ between two models receiving the same live input data; both models would be affected if the input data itself contained leaked future information. Option B is wrong because corrupted input data due to a traffic routing bug would affect both models equally if they share the same endpoint and routing logic; Vertex AI's traffic split routes requests to the correct model based on the configured percentage, not by altering the input data. Option C is wrong because if the traffic split were misconfigured to send all traffic to model2, model1 would receive zero requests, but the question states a 70/30 split is in place and the team notices model2's predictions differ from model1's; a misconfiguration would not cause model2's predictions to change—it would simply change which model serves requests.

Full explanation →

203

MCQhard

You run `gcloud ai models describe` and get the error above. The model was created successfully from a training job that completed without errors. The model ID is correct. What is the most likely cause?

A.The model was deleted or expired due to time-to-live settings.

B.The gcloud command is not authenticated to the correct project.

C.The model was created but not yet trained; training must complete before describe works.

D.The model was created in a different region (e.g., europe-west4) than the one specified in the command.

AnswerD

Model resources are regional; if created in another region, describe with wrong region fails.

Why this answer

Option D is correct because `gcloud ai models describe` defaults to the `us-central1` region unless overridden with the `--region` flag. If the model was created in a different region (e.g., `europe-west4`), the command will fail with a 'Model not found' error even though the model ID is correct. Vertex AI models are regional resources, so the region must match exactly.
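Because the location is part of a Vertex AI model's full resource name, the same model ID resolves to different resources in different regions. The sketch below only builds those names; the project and model IDs are hypothetical.

```python
def model_resource_name(project: str, location: str, model_id: str) -> str:
    """Build the full resource name of a Vertex AI model."""
    return f"projects/{project}/locations/{location}/models/{model_id}"


# Hypothetical identifiers, for illustration only.
where_it_lives = model_resource_name("my-project", "europe-west4", "1234567890")
where_we_looked = model_resource_name("my-project", "us-central1", "1234567890")

# Same model ID, different region: two distinct resources.
same_resource = where_it_lives == where_we_looked
```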

Exam trap

Google Cloud often tests the misconception that Vertex AI models are global resources, but they are actually regional, and candidates forget to specify the `--region` flag or assume the default region matches the model's location.

How to eliminate wrong answers

Option A is wrong because the model was created successfully from a training job that completed without errors, and there is no mention of time-to-live settings being configured; deletion or expiration would typically produce a different error message. Option B is wrong because the error is not about authentication; if the project were wrong, the error would indicate 'Permission denied' or 'Project not found', not 'Model not found'. Option C is wrong because the model was created from a completed training job, meaning training already finished; the `describe` command works on the model resource itself, not on a training state.

Full explanation →

204

Multi-Selecthard

A company uses Cloud Dataproc for large-scale Spark jobs. They notice that some jobs are failing due to insufficient memory on the worker nodes. They want to improve memory management without over-provisioning. Which three configurations should they apply? (Choose 3)

Select 3 answers

A.Set spark.executor.memory to a value that fits within the node memory

B.Enable Spark dynamic allocation

C.Use custom machine types with high memory ratios

D.Use local SSDs for temporary storage

E.Use preemptible worker nodes for volatile tasks

AnswersA, B, C

Prevents out-of-memory errors by ensuring executor memory fits worker capacity.

Why this answer

Option A is correct because setting spark.executor.memory to a value that fits within the node memory ensures that each executor does not exceed the available RAM on a worker node, preventing out-of-memory (OOM) errors. This configuration directly controls the heap size allocated to each executor, and when combined with spark.executor.cores and spark.executor.instances, it allows precise memory budgeting per node. Over-provisioning is avoided by calculating the maximum safe executor memory as (node memory - OS overhead - HDFS cache) / number of executors per node.
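The budgeting formula above can be sketched as a small helper. The node size, overhead, and cache figures in the example are hypothetical, and a real calculation would also leave room for Spark's per-executor memory overhead on top of the heap.

```python
def max_executor_memory_gb(node_memory_gb: float,
                           os_overhead_gb: float,
                           hdfs_cache_gb: float,
                           executors_per_node: int) -> float:
    """(node memory - OS overhead - HDFS cache) / executors per node."""
    if executors_per_node < 1:
        raise ValueError("executors_per_node must be at least 1")
    usable_gb = node_memory_gb - os_overhead_gb - hdfs_cache_gb
    if usable_gb <= 0:
        raise ValueError("no memory left for executors")
    return usable_gb / executors_per_node


# Hypothetical 52 GB worker, 4 GB for the OS, 2 GB for cache, 2 executors per node.
budget_gb = max_executor_memory_gb(52, 4, 2, 2)
```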

Exam trap

Google Cloud often tests the distinction between memory management and storage optimization, so candidates mistakenly choose local SSDs (option D) thinking they help with memory, when in fact they only improve disk I/O for shuffle operations.

Full explanation →

205

MCQmedium

A data science team wants to deploy a model that requires a custom container with specific NVIDIA CUDA version. They build the image and push to Artifact Registry. When deploying to Vertex AI, the model fails to load with an error: 'Failed to start container: invalid ELF header'. What is the most likely cause?

A.The container image was built for a different CPU architecture (e.g., ARM64) than the Vertex AI machine (x86_64)

B.The model file (saved as .pkl) is corrupted

C.The CUDA version in the container is incompatible with the GPU on the machine

D.The container does not have the necessary permissions to access the model file in Cloud Storage

AnswerA

Invalid ELF header indicates the binary is incompatible with the platform architecture.

Why this answer

Option A is correct because the image was built for the wrong CPU architecture (for example, built on an ARM-based Mac for an x86_64 deployment), so the binaries inside it cannot run on the Vertex AI machine. Option B (corrupted model file) would cause a model-loading error after the container starts, not a container startup failure. Option C (CUDA version mismatch) would produce a CUDA or driver error rather than an ELF header error.

Option D (missing Cloud Storage permissions) would cause a permission-denied error.
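One way to confirm an architecture mismatch is to read the target-machine field from a binary's ELF header. The sketch below parses that field using the standard ELF layout (`e_machine` at byte offset 18); `fake_header` is a hypothetical helper that fabricates a header for demonstration instead of reading a real file from the image.

```python
import struct

ELF_MAGIC = b"\x7fELF"
MACHINES = {0x3E: "x86_64", 0xB7: "aarch64"}  # EM_X86_64, EM_AARCH64


def elf_architecture(header: bytes) -> str:
    """Return the target architecture recorded in an ELF header."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF binary")
    byte_order = "<" if header[5] == 1 else ">"  # EI_DATA: 1 means little-endian
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return MACHINES.get(machine, f"unknown (0x{machine:x})")


def fake_header(machine: int) -> bytes:
    """Hypothetical helper: the first 20 bytes of a 64-bit little-endian ELF."""
    ident = ELF_MAGIC + bytes([2, 1, 1]) + bytes(9)  # 16-byte e_ident
    return ident + struct.pack("<HH", 2, machine)    # e_type, e_machine
```

Against a real image, the same function could be applied to the first 20 bytes of the container's entrypoint binary.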

Full explanation →

206

MCQhard

In Cloud Composer, a DAG has two tasks: task_A (runs an Apache Spark job on Dataproc) and task_B (loads data from Cloud Storage to BigQuery). task_B must start after task_A completes. The DAG is scheduled to run hourly. Sometimes task_B starts before task_A finishes because task_A's Dataproc job appears to complete in the Airflow metadata but the data is not yet available. What is the best way to ensure task_B only runs after the data is fully written?

A.Increase the number of retries for task_B

B.Use a sensor after task_A that checks for a specific file in Cloud Storage

C.Use DataprocJobOperator with a job_poll_interval and add a sensor to verify output

D.Change the DAG schedule to run every 30 minutes

AnswerC

DataprocJobOperator can poll the job status, and adding a sensor ensures data is written before proceeding.

Why this answer

Option C is correct because it addresses the root cause: the Dataproc job may report completion in Airflow metadata before the output data is fully written to Cloud Storage. By using DataprocJobOperator with a job_poll_interval, you ensure Airflow waits for the actual job completion on Dataproc, and adding a sensor to verify the output (e.g., checking for a success marker file or expected data in Cloud Storage) guarantees that task_B only starts after the data is fully available. This two-step approach prevents race conditions between job completion and data consistency.
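The output check described above follows a poke-and-wait pattern. The sketch below shows that pattern in plain Python rather than with an Airflow sensor class; `marker_exists` is a hypothetical callable that would wrap a Cloud Storage lookup for a marker such as the `_SUCCESS` file Spark writes after committing output.

```python
import time


def wait_for_marker(marker_exists,
                    poke_interval_s: float = 30.0,
                    timeout_s: float = 3600.0,
                    sleep=time.sleep,
                    clock=time.monotonic) -> bool:
    """Poll until marker_exists() is true; return False if the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        if marker_exists():
            return True   # downstream task may start
        if clock() >= deadline:
            return False  # give up and let the task fail
        sleep(poke_interval_s)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.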

Exam trap

The trap here is that candidates assume a job's completion status in Airflow metadata is sufficient to guarantee data availability, overlooking the fact that reported job completion and finalization of the job's output are not atomic.

How to eliminate wrong answers

Option A is wrong because increasing retries for task_B does not solve the data availability issue; it only retries a task that may fail repeatedly due to missing data, wasting resources and time. Option B is wrong because using a sensor after task_A that checks for a specific file in Cloud Storage is a partial solution—it does not address the fact that the Dataproc job may not have fully completed, and the file check could succeed before all data is written if the file is created early. Option D is wrong because changing the DAG schedule to run every 30 minutes does not fix the dependency timing; it only increases execution frequency, potentially causing more overlaps and still allowing task_B to start before data is ready.

Full explanation →

207

Matchingmedium

Match each Google Cloud service to its data processing capability.

Concepts

Matches

Unified stream and batch processing (Apache Beam)

Managed Spark and Hadoop clusters

Workflow orchestration (Apache Airflow)

Visual data integration and pipeline builder

Why these pairings

Services for data processing and orchestration.

Full explanation →

208

MCQmedium

A data pipeline reading from Cloud Storage and writing to BigQuery using Dataflow is experiencing high cost. The data is CSV and needs schema inference. What change reduces cost?

A.Use Dataproc instead of Dataflow

B.Use Cloud Functions to transform data

C.Use BigQuery load jobs with schema auto-detection

D.Use BigQuery Data Transfer Service

AnswerC

Load jobs are free for data ingestion (only storage cost) and support auto-detection.

Why this answer

Option C is correct because BigQuery load jobs with schema auto-detection can directly ingest CSV files from Cloud Storage without the need for a Dataflow pipeline, eliminating the compute cost associated with Dataflow. Schema auto-detection infers column names and types from the CSV header and data, matching the requirement for schema inference while being a serverless, no-cost-for-compute operation (you only pay for storage and querying). This reduces cost by removing the Dataflow processing step entirely.
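For illustration, a load job with auto-detection needs little more than a source URI, a format, and a destination. The sketch below builds the job body using BigQuery REST API field names; the project, dataset, table, and bucket values are hypothetical, and nothing is submitted.

```python
def csv_load_job_config(project: str, dataset: str, table: str,
                        source_uri: str) -> dict:
    """Build a BigQuery load-job body that infers the schema from the CSV."""
    return {
        "configuration": {
            "load": {
                "sourceUris": [source_uri],
                "sourceFormat": "CSV",
                "autodetect": True,  # infer column names and types
                "destinationTable": {
                    "projectId": project,
                    "datasetId": dataset,
                    "tableId": table,
                },
            }
        }
    }


# Hypothetical names, for illustration only.
job_body = csv_load_job_config("my-project", "raw", "events",
                               "gs://my-bucket/events/*.csv")
```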

Exam trap

Google Cloud often tests the misconception that any data transformation or schema inference requires a processing framework like Dataflow or Dataproc, when in fact BigQuery's native load jobs with auto-detection can handle many CSV ingestion scenarios at zero compute cost.

How to eliminate wrong answers

Option A is wrong because Dataproc is a managed Spark/Hadoop service that incurs compute costs for cluster VMs, and using it instead of Dataflow would not reduce cost—it would likely increase cost due to cluster overhead and the need to manage schema inference manually. Option B is wrong because Cloud Functions are event-driven compute that would still require processing each CSV file, incurring invocation and execution costs, and they lack native schema inference for BigQuery, requiring custom code that adds complexity and potential cost. Option D is wrong because BigQuery Data Transfer Service is designed for scheduled transfers from sources like Google Ads, Amazon S3, or SaaS applications, not for ad-hoc CSV files in Cloud Storage; it does not support schema auto-detection for arbitrary CSV files and would not replace the need for a pipeline.

Full explanation →

209

Multi-Selecteasy

A company is designing a CI/CD pipeline for their ML models using Cloud Build and Vertex AI. Which TWO practices should they adopt to ensure reliable and reproducible deployments?

Select 2 answers

A.Require manual approval for every model change before deployment

B.Store all model artifacts in a single Cloud Storage bucket without versioning

C.Use immutable container images with version tags for each model deployment

D.Include unit tests for data preprocessing and feature engineering code in the pipeline

E.Deploy every model version directly to production for immediate use

AnswersC, D

Immutable images ensure that the exact same environment is used across all deployments.

Why this answer

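Options C and D are correct. C (immutable, version-tagged container images) ensures reproducibility because every deployment runs exactly the same environment. D (unit tests for preprocessing and feature engineering code) catches data-handling regressions before they reach deployment.

Option A (manual approval for every change) slows down CI/CD and does nothing for reproducibility. Option B (a single unversioned bucket) makes it impossible to trace or restore the artifact behind a given deployment. Option E (deploying every version straight to production) skips validation and risks shipping a broken model.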

Full explanation →

210

MCQmedium

A team runs a Dataflow streaming pipeline that reads from Pub/Sub, windows events by processing time, and writes to BigQuery. Some late-arriving events are being dropped. The requirement is to include all events that arrive within 10 minutes of the watermark. Which pipeline configuration should be used?

A.Use sliding windows with no allowed lateness

B.Use fixed windows with .withAllowedLateness(Duration.standardMinutes(10))

C.Use fixed windows with withAllowedLateness(Duration.standardSeconds(10))

D.Switch from processing time to event time and use default triggers

AnswerB

Allows late data up to 10 minutes after watermark.

Why this answer

Option B is correct because `withAllowedLateness(Duration.standardMinutes(10))` on a fixed window allows late-arriving events to be included up to 10 minutes after the watermark passes the window's end. This directly meets the requirement to retain events arriving within 10 minutes of the watermark, while still using processing-time windows as specified.
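The difference between options B and C comes down to one comparison. The sketch below models that rule in plain Python (it is not Beam code): a late element is kept only while the watermark has not moved past the window end plus the allowed lateness. The timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def accepts_late_element(window_end: datetime,
                         watermark: datetime,
                         allowed_lateness: timedelta) -> bool:
    """True while the watermark has not passed window_end + allowed_lateness."""
    return watermark <= window_end + allowed_lateness


WINDOW_END = datetime(2024, 1, 1, 12, 0)       # hypothetical window boundary
WATERMARK = WINDOW_END + timedelta(minutes=7)  # element shows up 7 minutes late

ten_minutes = accepts_late_element(WINDOW_END, WATERMARK, timedelta(minutes=10))
ten_seconds = accepts_late_element(WINDOW_END, WATERMARK, timedelta(seconds=10))
```

With a 10-minute allowance the element is kept; with a 10-second allowance it is dropped.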

Exam trap

Google Cloud often tests the distinction between processing time and event time, and the exact value of allowed lateness, tricking candidates into choosing a shorter duration or the wrong window type.

How to eliminate wrong answers

Option A is wrong because sliding windows with no allowed lateness will drop all late events, failing the requirement to include events within 10 minutes of the watermark. Option C is wrong because `withAllowedLateness(Duration.standardSeconds(10))` only allows 10 seconds of lateness, not the required 10 minutes. Option D is wrong because switching to event time would change the windowing basis from processing time, which is not requested, and default triggers alone do not provide the explicit 10-minute lateness allowance needed.

Full explanation →

211

MCQhard

An e-commerce company uses Vertex AI to serve a real-time personalization model. The model is updated daily via a retraining pipeline that uploads a new version to the same endpoint. Recently, after a model update, the online prediction responses have been returning anomalous results (e.g., recommending irrelevant products). The previous version performed well. The team suspects that the new model is undercooked or has a bug. They have already checked the training code and the pipeline logs, which show no errors. The pipeline deploys the new model version to the endpoint by updating the traffic split to route 100% of traffic to the new version. Which course of action should the team take to quickly mitigate the issue while diagnosing the root cause?

A.Tweak the training hyperparameters and retrain immediately.

B.Increase the number of replicas on the endpoint to handle load.

C.Delete the current endpoint and recreate it with the previous model version.

D.Roll back the endpoint to the previous model version by setting traffic split to 0% for the new version.

AnswerD

Rolling back traffic instantly restores previous behavior while allowing debugging of the new version.

Why this answer

Option D is correct because rolling back traffic to the previous known-good version immediately restores correct predictions while the team investigates the new model. Option C (deleting the endpoint) is excessive and causes downtime. Option A (retraining) takes time and may not fix the bug.

Option B (more replicas) does not address the incorrect model output.
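A rollback of this kind amounts to rewriting the endpoint's traffic split, a map from deployed model ID to a percentage that must total 100. The sketch below computes the new map only; the IDs are hypothetical, and it assumes the previous version is still deployed on the endpoint.

```python
from typing import Dict


def rollback_split(current_split: Dict[str, int],
                   bad_id: str, good_id: str) -> Dict[str, int]:
    """Send 100% of traffic to good_id and 0% to every other deployed model."""
    for model_id in (bad_id, good_id):
        if model_id not in current_split:
            raise KeyError(f"{model_id} is not deployed on this endpoint")
    new_split = {model_id: 0 for model_id in current_split}
    new_split[good_id] = 100
    return new_split


# Hypothetical deployed model IDs.
after_rollback = rollback_split({"old": 0, "new": 100}, bad_id="new", good_id="old")
```

Because the faulty version stays deployed at 0%, it remains available for direct debugging.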

Full explanation →

212

MCQhard

A company runs a batch processing job on Dataproc that uses Apache Spark to process 500 GB of data daily. The job completes successfully but takes 4 hours. The team wants to reduce the runtime to under 2 hours without increasing cost. What should they do?

A.Use preemptible VMs for worker nodes and increase the number of workers.

B.Increase the master node's machine type to n2-standard-8.

C.Increase the machine type of worker nodes to n2-highmem-8.

D.Migrate the job to Dataflow with autoscaling enabled.

AnswerA

Preemptible VMs are cheaper, allowing more workers for the same cost, reducing runtime.

Why this answer

Preemptible VMs cost significantly less than standard VMs (about 60-80% discount). By using preemptible VMs for worker nodes, you can increase the number of workers (and thus parallelism) without increasing cost. This directly reduces runtime by distributing the 500 GB workload across more executors, while the cost savings from preemptible VMs offset the additional nodes.
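The trade-off can be made concrete with some rough arithmetic. The sketch below assumes a hypothetical 10-worker cluster, a 60% discount, perfectly linear scaling, and no preemptions during the run, so it gives a best-case estimate rather than a prediction.

```python
def workers_for_same_budget(current_workers: int, discount_pct: int) -> int:
    """How many discounted workers cost the same as the current standard ones."""
    if not 0 <= discount_pct < 100:
        raise ValueError("discount_pct must be in [0, 100)")
    return current_workers * 100 // (100 - discount_pct)


def ideal_runtime_hours(current_hours: float, current_workers: int,
                        new_workers: int) -> float:
    """Best-case runtime if the job scaled linearly with worker count."""
    return current_hours * current_workers / new_workers


# Hypothetical cluster: 10 standard workers, 60% preemptible discount.
new_workers = workers_for_same_budget(10, 60)
new_runtime = ideal_runtime_hours(4.0, 10, new_workers)
```

Under these assumptions the same spend buys 25 workers and an ideal runtime of about 1.6 hours.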

Exam trap

Google Cloud often tests the trade-off between cost and performance by making candidates think that upgrading machine types (more CPU/memory) is the only way to speed up a job, ignoring that preemptible VMs allow scaling out (more nodes) without increasing cost.

How to eliminate wrong answers

Option B is wrong because increasing the master node's machine type (e.g., to n2-standard-8) improves driver capacity but does not accelerate data processing; Spark's bottleneck is typically worker parallelism and memory, not the driver. Option C is wrong because increasing worker node machine type (e.g., to n2-highmem-8) increases cost per node, and without adding more workers, the parallelism remains the same, so runtime may not drop below 2 hours while cost increases. Option D is wrong because migrating to Dataflow does not inherently reduce cost; Dataflow uses different pricing (per second of vCPU/memory) and autoscaling may increase cost if the job requires more resources to meet the 2-hour target, and the question explicitly requires no cost increase.

Full explanation →

213

MCQhard

A company uses Cloud Storage to store IoT sensor data in JSON format. The data is ingested using a Cloud Function triggered by Cloud Storage events. They notice that when many files are uploaded simultaneously, some files are not processed and the Cloud Function logs show 'function execution timeout'. What is the most likely cause and solution?

A.The Cloud Function is not idempotent; implement idempotency.

B.The Cloud Storage event notification is unreliable; switch to Pub/Sub notifications.

C.The Cloud Function has too few instances; increase max instances.

D.The Cloud Function's timeout is too short; increase timeout beyond 540 seconds.

AnswerD

Increasing the timeout allows the function to complete its processing within the allocated time.

Why this answer

The Cloud Function logs explicitly show 'function execution timeout', which indicates the function is exceeding its configured maximum runtime. The default Cloud Functions timeout is 60 seconds, and the maximum is 540 seconds (9 minutes). When many files are uploaded simultaneously, each function invocation may take longer due to increased processing load, causing timeouts.

Increasing the timeout to the maximum of 540 seconds gives the function more time to complete processing, directly addressing the logged error.

Exam trap

Google Cloud often tests the distinction between scaling issues (max instances) and timeout issues, so the trap here is that candidates see 'many files uploaded simultaneously' and incorrectly assume a concurrency/scaling problem, when the logs explicitly point to a timeout.

How to eliminate wrong answers

Option A is wrong because idempotency ensures duplicate events don't cause duplicate processing, but the logs show timeouts, not duplicate processing errors. Option B is wrong because Cloud Storage event notifications are reliable for triggering Cloud Functions; switching to Pub/Sub adds a buffer but does not solve the timeout issue. Option C is wrong because increasing max instances would allow more concurrent invocations, but the problem is that individual invocations are timing out, not that there are too few instances to handle the load.

Full explanation →

214

Multi-Selectmedium

A healthcare company stores patient records as JSON files in Cloud Storage for analysis. They want to design a data lake that enables querying the data with BigQuery while minimizing storage costs and maintaining data security. Which two actions should they take? (Choose two.)

Select 2 answers

A.Partition the data by date and store in separate directories for each partition.

B.Configure object lifecycle management to transition files older than 90 days to Nearline storage.

C.Convert all JSON files to CSV to reduce storage size.

D.Use BigLake to create external tables with row-level security and access delegation.

E.Enable Cloud KMS to encrypt the data with customer-managed encryption keys.

AnswersB, D

Lifecycle policies automatically move data to cheaper storage classes, reducing cost.

Why this answer

Correct answers are B and D. Option D is correct because BigLake allows BigQuery to query Cloud Storage data with fine-grained access control and supports various formats. Option B is correct because object lifecycle management can move old data to colder storage classes (e.g., Nearline, Coldline) to reduce costs.

Option E is incorrect because data is already encrypted by default; Cloud KMS provides additional key control but is not a cost-saving measure. Option C is incorrect because CSV handles nested data poorly; JSON or Parquet is a better fit. Option A is incorrect because a date-based directory layout is neither the primary cost lever nor a security control here; BigLake external tables can use such a layout, but it has to be set up explicitly.

Full explanation →

215

MCQmedium

Your team runs a weekly batch ETL pipeline using Cloud Dataproc. The pipeline reads raw data from Cloud Storage, transforms it with Apache Spark, and writes results to BigQuery. Recently, the pipeline has been failing with the error 'Out of Memory' during the shuffle phase. The cluster uses standard worker nodes (n1-standard-4). What is the most effective way to resolve this without increasing total cost?

A.Increase the number of Spark partitions by setting spark.sql.shuffle.partitions to a higher value.

B.Increase the number of worker nodes by adding more n1-standard-4 instances.

C.Enable dynamic allocation and use preemptible VMs for some workers.

D.Switch worker nodes to n1-highmem-4 instances to provide more memory.

AnswerA

More partitions mean less data per partition, reducing memory usage per task. This can resolve OOM without added cost.

Why this answer

The 'Out of Memory' error during the shuffle phase indicates that individual executor tasks are processing too much data per partition. Increasing `spark.sql.shuffle.partitions` reduces the amount of data each task handles, lowering memory pressure per executor without adding more nodes or upgrading hardware. This directly addresses the shuffle memory bottleneck while keeping the total cluster cost unchanged.
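As a rough illustration of the effect, the sketch below computes the average shuffle data per partition and the partition count needed to hit a target size. The 100 GB shuffle volume and 128 MB target are hypothetical; 200 is Spark's default for `spark.sql.shuffle.partitions`.

```python
def mb_per_partition(shuffle_mb: int, partitions: int) -> float:
    """Average shuffle data each task has to hold."""
    return shuffle_mb / partitions


def partitions_needed(shuffle_mb: int, target_mb: int) -> int:
    """Smallest partition count that keeps each partition at or below target_mb."""
    return -(-shuffle_mb // target_mb)  # ceiling division


SHUFFLE_MB = 100 * 1024  # hypothetical 100 GB shuffle

with_default = mb_per_partition(SHUFFLE_MB, 200)  # Spark's default of 200
suggested = partitions_needed(SHUFFLE_MB, 128)    # hypothetical 128 MB target
```

Here the default leaves about 512 MB per task, while 800 partitions brings that down to 128 MB on the same hardware.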

Exam trap

The trap here is that candidates often assume memory errors must be solved by adding more memory (Option D) or more nodes (Option B), ignoring the cost constraint and the fact that repartitioning can resolve the issue without additional resources.

How to eliminate wrong answers

Option B is wrong because adding more worker nodes increases total cost, which violates the constraint of not increasing cost. Option C is wrong because enabling dynamic allocation and using preemptible VMs does not resolve the per-executor memory shortage; it only changes cluster scaling and cost structure, but the shuffle memory issue persists on the existing nodes. Option D is wrong because switching to n1-highmem-4 instances increases per-node memory but also increases cost per node, raising total cost unless the number of nodes is reduced, which is not specified and may not be feasible without losing parallelism.

Full explanation →

216

MCQhard

You are implementing a data pipeline that reads from Cloud Storage (parquet files), transforms data with Cloud Dataflow, and writes to BigQuery. The pipeline runs on a batch schedule every hour. You notice that the Dataflow job takes 10 minutes, but the overall pipeline latency is 15 minutes due to file availability and scheduling. The business requires latency under 5 minutes. Which change should you make?

A.Switch to streaming pipeline with .watchForNewFiles() and process files as they arrive

B.Batch the hourly data into a single larger hourly run

C.Use a larger machine type for the Dataflow workers

D.Increase the number of workers and use smaller input files

AnswerA

This reduces latency by triggering processing immediately.

Why this answer

The root cause of the latency is file availability and scheduling delay, not the processing time. Switching to a streaming pipeline with `.watchForNewFiles()` (or the equivalent `FileIO.match().continuously()`) allows Dataflow to process files as soon as they arrive in Cloud Storage, eliminating the batch scheduling wait and reducing overall latency to near the processing time.

Exam trap

Google Cloud often tests the distinction between reducing processing time (compute optimization) and reducing scheduling/availability latency (pipeline architecture change), leading candidates to mistakenly choose worker scaling or batching options.

How to eliminate wrong answers

Option B is wrong because batching the hourly data into a single larger run would increase the processing time and does not address the file availability and scheduling delay that cause the 5-minute overhead. Option C is wrong because using a larger machine type for Dataflow workers would only reduce the 10-minute processing time, not the 5-minute scheduling and file availability delay. Option D is wrong because increasing the number of workers and using smaller input files could reduce processing time but does not eliminate the scheduling wait or the delay waiting for files to become available.

Full explanation →

217

Multi-Selecthard

During a Vertex AI training pipeline, the training job fails with an error: 'Out of memory: Killed process'. The model is a large deep learning model using TensorFlow. Which THREE steps should the team take to resolve this issue?

Select 3 answers

A.Change to a distributed training strategy

B.Enable memory growth configuration in TensorFlow

C.Switch the training from GPU to TPU accelerator

D.Reduce the training batch size

E.Use a custom machine type with more memory

AnswersB, D, E

Memory growth allows TensorFlow to allocate memory on demand, avoiding early OOM.

Why this answer

Option B is correct because TensorFlow by default allocates all available GPU memory, which can cause out-of-memory (OOM) errors when other processes or the system itself need memory. Enabling memory growth with `tf.config.experimental.set_memory_growth` allows TensorFlow to allocate memory incrementally, reducing the risk of OOM kills. This is a direct mitigation for the 'Killed process' error caused by memory exhaustion.

Exam trap

Google Cloud often tests the misconception that distributed training automatically solves memory issues, but in reality, it distributes computation, not memory pressure, and can even increase per-node memory usage due to gradient synchronization buffers.

Full explanation →

218

MCQeasy

A team deploys a new version of a Cloud Function. After deployment, error rates increase significantly. What is the most efficient way to diagnose the cause?

A.Deploy a debug version with additional logging.

B.Check Cloud Logging for error stacks and exceptions.

C.Increase the function timeout and retry settings.

D.Immediately rollback to the previous version.

AnswerB

Logs provide immediate insight into the error, allowing targeted debugging.

Why this answer

Cloud Logging captures function execution logs and error stacks; reviewing them is the fastest way to understand the failure.

Full explanation →

219

MCQmedium

A company processes real-time clickstream data from websites. They need to aggregate user sessions that may span multiple hours and handle events that arrive late due to network delays. The pipeline must avoid discarding late data. Which Dataflow feature should they configure?

A.Use fixed windows with a trigger that fires after every element

B.Use session windows with a gap duration and allow late data with a suitable allowed_lateness

C.Use the GlobalWindow with a watermark

D.Use sliding windows with no allowed lateness

AnswerB

Session windows group events within a gap, and allowed_lateness accommodates late arrivals.

Why this answer

Session windows are ideal for aggregating user sessions that span multiple hours, as they group events based on a gap duration of inactivity. By configuring `allowed_lateness`, the pipeline can handle late-arriving events without discarding them, ensuring completeness. This directly addresses the requirement to avoid discarding late data while aggregating sessions.
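
The grouping rule described above can be sketched in plain Python. This is not the Beam or Dataflow API: `sessionize` and `is_droppable` are hypothetical helpers, the click times are made up, and the lateness check is a simplified assumption (an event is discarded only once the watermark has passed the end of its window plus the allowed lateness).

```python
from typing import List, Tuple


def sessionize(timestamps: List[int], gap: int) -> List[Tuple[int, int]]:
    """Merge event timestamps (seconds) into (start, end) sessions.

    Each event opens a window [t, t + gap); overlapping windows merge,
    so a session closes only after `gap` seconds of inactivity.
    """
    sessions: List[Tuple[int, int]] = []
    for t in sorted(timestamps):
        if sessions and t < sessions[-1][1]:
            # Event falls inside the open session: extend the session end.
            start, _ = sessions[-1]
            sessions[-1] = (start, t + gap)
        else:
            # Inactivity exceeded the gap: start a new session.
            sessions.append((t, t + gap))
    return sessions


def is_droppable(event_ts: int, watermark: int, gap: int, allowed_lateness: int) -> bool:
    """Simplified rule: True if a late event would be discarded."""
    return watermark > event_ts + gap + allowed_lateness


# Hypothetical click times for one user, arriving out of order.
clicks = [0, 1200, 600, 9000]
sessions = sessionize(clicks, gap=1800)
print(sessions)  # [(0, 3000), (9000, 10800)]

# The same late event is dropped with no allowed lateness, kept with one hour.
print(is_droppable(600, watermark=5000, gap=1800, allowed_lateness=0))     # True
print(is_droppable(600, watermark=5000, gap=1800, allowed_lateness=3600))  # False
```

The second half shows why the lateness setting matters: with a lateness of zero, the event that arrives after the watermark is lost, while a one-hour allowance keeps it in its session.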

Exam trap

Google Cloud often tests the distinction between window types and late-data handling; the trap here is that candidates might choose fixed or sliding windows without realizing they lack the session-gap logic needed for variable-length user sessions, or they might overlook the `allowed_lateness` parameter as the key to preserving late data.

How to eliminate wrong answers

Option A is wrong because fixed windows cut the stream at fixed time boundaries regardless of user activity, so a session that spans hours is split across many windows; a trigger that fires after every element only emits more partial results and does nothing to preserve late data. Option C is wrong because GlobalWindow with a watermark is used for global aggregations (e.g., counting all events) but does not naturally group events into sessions based on inactivity gaps; it would require complex triggers and does not inherently support sessionization. Option D is wrong because sliding windows with no allowed lateness would discard any late-arriving events, violating the requirement to avoid discarding late data.

220

Multi-Selecthard

A company uses Cloud Dataproc for ephemeral clusters to run batch jobs. They want to ensure job reliability and data quality. Which two configuration options should they use? (Choose two.)

Select 2 answers

A.Enable preemptible VMs for cost savings.

B.Use initialization actions for cluster setup.

C.Enable idle timeout to automatically delete clusters.

D.Use custom machine types for better performance.

E.Use graceful decommissioning of workers.

AnswersB, E

Initialization actions guarantee required software and configurations are present, improving job consistency.

Why this answer

Initialization actions enable consistent cluster setup (install libraries, configs), and graceful decommissioning ensures in-progress tasks complete before scaling down, preventing data loss.

221

MCQeasy

A logistics company uses Cloud Functions to process incoming tracking events from IoT devices. Events are sent via HTTP triggers. During peak hours, some events fail with 500 errors. What is the best strategy to handle this reliably?

A.Implement client-side retry with exponential backoff.

B.Increase the Cloud Functions timeout to 9 minutes and memory to 2GB.

C.Switch to Cloud Tasks and configure retry parameters.

D.Use Cloud Pub/Sub as an intermediary: send events to Pub/Sub and trigger Cloud Functions via Pub/Sub subscription.

AnswerD

Pub/Sub provides buffering and retries.

Why this answer

Option D is correct because Cloud Pub/Sub decouples event ingestion from processing: it buffers messages durably and delivers them at least once. When a function invocation fails, the message can be redelivered (with retry on failure enabled for the function) instead of being lost, even during peak load. The subscription can also be configured with a retry policy that uses exponential backoff and with a dead-letter topic for messages that keep failing.
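
A minimal in-memory sketch of this buffer, retry, and dead-letter pattern follows. It is not the Pub/Sub client library: `drain`, `backoff_delay`, and the flaky `handler` are hypothetical, the backoff bounds and attempt limit are illustrative assumptions, and the loop does not actually sleep between attempts.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


def backoff_delay(attempt: int, min_backoff: float = 10.0, max_backoff: float = 600.0) -> float:
    """Exponential backoff in seconds: doubles per attempt, capped at max_backoff."""
    return min(max_backoff, min_backoff * (2 ** (attempt - 1)))


def drain(events: List[str], handler: Callable[[str], None],
          max_attempts: int = 5) -> Tuple[List[str], List[str]]:
    """Deliver buffered events; requeue failures, dead-letter after max_attempts."""
    queue: Deque[Tuple[str, int]] = deque((e, 1) for e in events)
    processed: List[str] = []
    dead_letter: List[str] = []
    while queue:
        event, attempt = queue.popleft()
        try:
            handler(event)
            processed.append(event)
        except RuntimeError:
            if attempt >= max_attempts:
                dead_letter.append(event)            # give up, keep for inspection
            else:
                queue.append((event, attempt + 1))   # redeliver later
    return processed, dead_letter


# Hypothetical handler: "b" fails twice (transient), "poison" always fails.
remaining_failures: Dict[str, int] = {"b": 2}


def handler(event: str) -> None:
    if event == "poison":
        raise RuntimeError("permanent failure")
    if remaining_failures.get(event, 0) > 0:
        remaining_failures[event] -= 1
        raise RuntimeError("transient 500")


processed, dead_letter = drain(["a", "b", "poison", "c"], handler)
print(processed)    # ['a', 'c', 'b']
print(dead_letter)  # ['poison']
print([backoff_delay(n) for n in (1, 2, 3, 10)])  # [10.0, 20.0, 40.0, 600.0]
```

The transient failure is eventually processed without the IoT device having to resend anything, and the event that never succeeds ends up in the dead-letter list rather than blocking the queue.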

Exam trap

Google Cloud often tests the misconception that client-side retry (Option A) or increasing resource limits (Option B) is sufficient for reliability, when the core requirement is decoupling ingestion from processing to handle transient failures and scale independently.

How to eliminate wrong answers

Option A is wrong because client-side retry with exponential backoff shifts the burden to IoT devices, which may be resource-constrained or unreliable, and does not guarantee delivery if the client fails or disconnects. Option B is wrong because increasing timeout and memory does not address the root cause of 500 errors (e.g., transient backend failures or throttling) and can increase costs without improving reliability. Option C is wrong because Cloud Tasks is designed for HTTP target tasks with retries, but it still relies on the Cloud Functions HTTP endpoint, which can fail under load; Cloud Tasks does not provide the same buffering and decoupling as Pub/Sub for event-driven ingestion.

222

MCQhard

A Dataflow pipeline as described in the exhibit has increasing lag. Which optimization is most likely to reduce the lag?

A.Use FileLoads instead of StreamingInserts for BigQuery output

B.Increase the number of workers

C.Use global windows instead of fixed windows

D.Add additional ParDo transforms

AnswerA

FileLoads (batch loads) are more efficient for high throughput and reduce lag.

Why this answer

The exhibit shows increasing lag in a Dataflow pipeline writing to BigQuery. StreamingInserts (the default for streaming pipelines) use BigQuery's legacy streaming API (`tabledata.insertAll`), which is subject to streaming quotas and can throttle under high throughput, causing backpressure and lag. Switching to FileLoads writes data to temporary files in Cloud Storage and then loads them into BigQuery via periodic load jobs, which decouples the write path from the streaming insert quota and reduces lag by avoiding per-row insert limits.

Exam trap

Google Cloud often tests the misconception that scaling workers or changing windowing fixes all performance issues, but the trap here is that the lag is specifically caused by the BigQuery sink's streaming insert throttling, which requires a sink-level optimization like FileLoads.

How to eliminate wrong answers

Option B is wrong because increasing the number of workers can help with parallel processing but does not address the root cause of lag from BigQuery streaming insert quota exhaustion or throttling; it may even increase the rate of inserts and worsen the problem. Option C is wrong because using global windows instead of fixed windows does not affect the write path to BigQuery; windowing changes how data is grouped for aggregation but does not reduce lag caused by the sink's throughput limitations. Option D is wrong because adding additional ParDo transforms increases the processing steps and can introduce more latency, making the lag worse rather than reducing it.

223

MCQhard

You have two versions of a classification model (v1 and v2) deployed on a Vertex AI Endpoint. You want to gradually roll out v2 to 10% of traffic, monitor performance, and if metrics are better, increase traffic to 100%. You have set up model monitoring for skew and drift. Which configuration should you use?

A.Use the Vertex AI Endpoint 'traffic_split' parameter to assign 10% of traffic to v2 and 90% to v1.

B.Deploy v2 to a separate endpoint and use a load balancer to route 10% of traffic.

C.Create a new deployment with v2 on the same endpoint and set the 'min_replica_count' to 1 for both versions.

D.Enable Vertex AI Model Monitoring on the endpoint and set up alerting for performance drop.

AnswerA

Traffic splitting is the standard method for canary deployments.

Why this answer

The Vertex AI Endpoint 'traffic_split' parameter allows you to direct a percentage of inference requests to different model versions deployed on the same endpoint. Setting 10% to v2 and 90% to v1 enables a gradual rollout while monitoring skew and drift, and you can adjust the split as needed. This is the native, supported method for canary deployments in Vertex AI, avoiding the complexity and latency of external load balancers.
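
The staged rollout can be sketched as a series of traffic-split mappings. The helper below only builds and validates the mapping (deployed model ID to integer percentage, summing to 100); it does not call Vertex AI. `canary_split` and both model IDs are hypothetical, and the 10/50/100 stages are an assumed schedule.

```python
from typing import Dict


def canary_split(stable_id: str, canary_id: str, canary_percent: int) -> Dict[str, int]:
    """Build a traffic split that sends canary_percent of requests to the canary."""
    if stable_id == canary_id:
        raise ValueError("stable and canary IDs must differ")
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    split = {stable_id: 100 - canary_percent, canary_id: canary_percent}
    # Percentages across all deployed models must add up to 100.
    assert sum(split.values()) == 100
    return split


# Hypothetical deployed-model IDs for v1 and v2 on the same endpoint.
V1_ID = "1111111111"
V2_ID = "2222222222"

# Assumed rollout schedule: start at 10%, widen only if monitoring looks good.
rollout = [canary_split(V1_ID, V2_ID, pct) for pct in (10, 50, 100)]
for stage in rollout:
    print(stage)
```

Each stage would be applied to the endpoint only after the skew and drift metrics for the previous stage look acceptable; rolling back is the same operation with the canary percentage set to 0.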

Exam trap

The trap here is that candidates confuse infrastructure-level load balancing (Option B) with Vertex AI's built-in traffic splitting, or think that replica counts (Option C) control traffic distribution, when in fact traffic_split is the only parameter that directly controls request routing percentages.

How to eliminate wrong answers

Option B is wrong because deploying v2 to a separate endpoint and using an external load balancer adds unnecessary complexity, latency, and cost; Vertex AI Endpoints natively support traffic splitting without additional infrastructure. Option C is wrong because setting 'min_replica_count' to 1 for both versions does not control traffic distribution; it only ensures minimum instance availability, not the percentage of requests routed to each model. Option D is wrong because enabling Model Monitoring and alerting for performance drop is a monitoring step, not a configuration for traffic splitting; it does not direct 10% of traffic to v2.

224

MCQhard

A media company uses Cloud Dataflow to process video metadata from a Pub/Sub stream. The pipeline enriches metadata using a lookup table stored in Cloud Bigtable. Recently, they noticed increased latency and occasional 'Bigtable operation timeout' errors. The Bigtable instance has 3 nodes and the data is highly distributed. The Dataflow pipeline uses default settings. What is the most likely cause of the timeouts?

A.The Bigtable table uses a single column family with over 100 columns, leading to high read overhead

B.The Dataflow pipeline uses a large batch size for Bigtable reads, overwhelming the instance

C.The Bigtable cluster has too few nodes for the read throughput

D.The Dataflow pipeline does not cache Bigtable results, causing repeated lookups

AnswerA

Wide column families cause inefficient reads in Bigtable.

Why this answer

A single column family with over 100 columns in Bigtable means that a read without a column filter returns every column qualifier for each row, even if only a few are needed. This increases read overhead and latency, and can trigger 'operation timeout' errors when the Dataflow pipeline's default settings (which do not limit column qualifiers) request the entire row. The highly distributed data and 3-node cluster exacerbate the issue, but the root cause is the excessive column count within one family.

Exam trap

Google Cloud often tests the misconception that Bigtable timeouts are always due to insufficient nodes or throughput, when in fact the root cause can be inefficient schema design like a single column family with too many columns.

How to eliminate wrong answers

Option B is wrong because the pipeline runs with default settings, so no unusually large read batch has been configured; batching rows per request generally reduces per-request overhead rather than causing timeouts. Option C is wrong because 3 nodes for a highly distributed dataset is generally sufficient for moderate throughput; the timeouts are due to per-row read overhead, not node count. Option D is wrong because caching Bigtable results would not help with timeouts caused by reading too many columns per row; caching reduces repeated lookups but does not address the fundamental read amplification from a wide column family.

225

MCQeasy

A company deploys a machine learning model on Vertex AI for online predictions. The model experiences intermittent spikes in traffic, causing latency increases. Which strategy should the company use to ensure consistent low latency during traffic spikes?

A.Enable autoscaling on the Vertex AI endpoint with appropriate min and max nodes.

B.Manually scale the deployed model to a larger machine type during peak hours.

C.Reduce the number of prediction nodes to minimize overhead.

D.Switch to batch prediction to handle all requests asynchronously.

AnswerA

Autoscaling automatically adjusts the number of nodes based on traffic, ensuring low latency during spikes while controlling cost.

Why this answer

Vertex AI endpoints support autoscaling, which dynamically adjusts the number of prediction nodes based on incoming traffic. By setting appropriate min and max nodes, the endpoint can scale up during traffic spikes to maintain low latency and scale down during low traffic to reduce costs. This ensures consistent performance without manual intervention.
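
How the min and max bounds shape scaling decisions can be illustrated with a simplified target-tracking rule. This is an assumption for illustration, not Vertex AI's documented algorithm: `target_replicas`, the 60% CPU target, and all input numbers are hypothetical.

```python
def target_replicas(current: int, cpu_percent: int, target_percent: int,
                    min_replicas: int, max_replicas: int) -> int:
    """Replicas needed to bring average CPU back to the target, clamped to bounds."""
    if current < 1 or target_percent < 1:
        raise ValueError("current and target_percent must be positive")
    # Integer ceiling of current * cpu_percent / target_percent.
    desired = (current * cpu_percent + target_percent - 1) // target_percent
    return max(min_replicas, min(max_replicas, desired))


# Traffic spike: 2 nodes at 90% CPU against a 60% target -> scale out.
print(target_replicas(2, 90, 60, min_replicas=1, max_replicas=10))  # 3

# Larger spike: demand exceeds the ceiling, so max_replicas caps cost.
print(target_replicas(4, 95, 60, min_replicas=2, max_replicas=5))   # 5

# Quiet period: demand falls, but min_replicas keeps warm capacity.
print(target_replicas(4, 10, 60, min_replicas=2, max_replicas=5))   # 2
```

The minimum protects latency at the start of a spike, because some capacity is always running, while the maximum bounds the cost of an unexpected surge.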

Exam trap

Google Cloud often tests the misconception that manual scaling or switching to batch prediction is a valid solution for real-time latency spikes, when in fact autoscaling is the only automated, cost-effective method for handling intermittent traffic on Vertex AI endpoints.

How to eliminate wrong answers

Option B is wrong because manually scaling to a larger machine type during peak hours is reactive, not proactive, and cannot respond instantly to intermittent spikes; it also incurs higher costs during all peak hours rather than scaling only when needed. Option C is wrong because reducing the number of prediction nodes would decrease capacity, worsening latency during traffic spikes rather than improving it. Option D is wrong because batch prediction is designed for asynchronous, offline processing of large datasets and does not provide real-time, low-latency responses required for online predictions.
