Google Professional Data Engineer PDE Questions 376–450 | Page 6/7

376

MCQhard

A financial services company operates a real-time fraud detection pipeline using Apache Beam running on Google Cloud Dataflow. The pipeline reads transactions from Pub/Sub, enriches them with customer data from Bigtable, runs a machine learning model with side inputs from a Redis cluster, and writes results to BigQuery for downstream reporting. The data must be processed with exactly-once semantics to avoid duplicate fraud alerts or missing transactions. The pipeline currently uses a global window with 5-minute accumulation, but the team is experiencing high latency and occasional duplicates when the model side input is updated (triggered every 15 minutes via a WatchTransform). Additionally, the pipeline has a dead letter queue that outputs failed records to a separate Pub/Sub topic, but these records are never reprocessed. The team needs to ensure high reliability and data quality. Which course of action should the team take to improve solution quality?

A.Use fixed windows with a 10-minute duration and session gap of 2 minutes, disable side input caching, and log all dead letter records to Cloud Storage for manual inspection.

B.Switch to a batch processing approach that runs every minute using Cloud Composer, with data loaded from Pub/Sub into BigQuery and then processed with Dataproc to run the model.

C.Implement sliding windows of 5 minutes with a 2-minute allowed lateness, use side inputs with periodic refreshes using the .withUpdateFrequency transformation, and set up a Cloud Function to automatically replay dead letter records back to the main Pub/Sub topic after fixing the issue.

D.Keep the global window but use a custom trigger with early firings every 30 seconds and a late-firing threshold of 1 minute, and configure the side input to be broadcast every 5 minutes using a Read transform.

AnswerC

Sliding windows with allowed lateness handle late data without blocking, periodic side input refreshes reduce latency, and automatic replay of dead letters ensures data quality.

Why this answer

Option C is correct because switching to sliding windows with allowed lateness ensures that late-arriving transactions are captured without blocking the window, and using side inputs with periodic refreshes (e.g., .withUpdateFrequency) reduces latency from model updates. Adding a system to reprocess dead letter records (e.g., via a Cloud Function that replays to the main topic) ensures data completeness. Option A is incorrect because fixed windows with session gaps do not help with side input latency and may cause data loss.

Option D is incorrect because a global window with custom triggers can cause duplicates if not configured carefully; defaults may not achieve exactly-once. Option B is incorrect because it focuses on batching, which is not suitable for real-time detection and introduces latency.
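To make the windowing part of the answer concrete, here is a minimal sketch in plain Python (not the Beam API) of how one event timestamp maps onto overlapping 5-minute sliding windows, and how allowed lateness decides whether a late record is still accepted. The 60-second slide period and both function names are assumptions for illustration; the question only gives the 5-minute window size and the 2-minute allowed lateness.

```python
WINDOW_SIZE = 300       # 5-minute windows, from the question
PERIOD = 60             # assumed slide interval; not stated in the question
ALLOWED_LATENESS = 120  # 2 minutes, from the question


def sliding_windows(ts, size=WINDOW_SIZE, period=PERIOD):
    """Return every [start, end) window that contains event time ts (seconds)."""
    start = ts - (ts % period)  # latest window start at or before ts
    windows = []
    while start > ts - size:
        windows.append((start, start + size))
        start -= period
    return windows


def is_accepted(window_end, watermark, lateness=ALLOWED_LATENESS):
    """Roughly: a record is dropped once the watermark passes end + lateness."""
    return watermark <= window_end + lateness
```

With these assumed values, a transaction at second 130 belongs to five overlapping windows, so a late arrival updates several results instead of holding up a single large window.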

377

MCQmedium

A retail company needs to generate product recommendations for millions of users every few hours. The model is a small scikit-learn model. Which prediction method should be used to minimize infrastructure cost while meeting the latency requirements?

A.Use Cloud Run to host the model and invoke it for each user request.

B.Export the model as a container and run on Google Kubernetes Engine with cluster autoscaling.

C.Deploy the model to a Vertex AI endpoint with a single replica for online predictions.

D.Use a Vertex AI batch prediction job that reads from BigQuery and writes results back to BigQuery or Cloud Storage.

AnswerD

Batch prediction is designed for such use cases and is cost-efficient for large datasets processed periodically.

Why this answer

Option D is correct because batch prediction is the most cost-effective approach for generating recommendations for millions of users every few hours. Vertex AI batch prediction jobs process large datasets in parallel without maintaining always-on infrastructure, and they can read from BigQuery and write results directly to BigQuery or Cloud Storage, minimizing compute costs while meeting the latency requirement of 'every few hours' (not real-time).

Exam trap

Google Cloud often tests the distinction between online (real-time) and batch (asynchronous) prediction patterns, and the trap here is that candidates assume 'predictions' always require a live endpoint, overlooking that batch jobs are the correct choice when latency requirements are in hours and the workload is massive and periodic.

How to eliminate wrong answers

Option A is wrong because Cloud Run invokes the model per user request, which would require millions of individual invocations every few hours, leading to high request-based costs and potential cold-start latency issues that are unnecessary for a batch workload. Option B is wrong because Google Kubernetes Engine with cluster autoscaling is overkill for a small scikit-learn model and introduces cluster management overhead and always-on node costs, even with autoscaling, making it more expensive than a serverless batch solution. Option C is wrong because a Vertex AI endpoint with a single replica is designed for online (real-time) predictions, which would be idle most of the time between the batch windows, incurring continuous compute costs for a single replica that is not needed for a scheduled batch job.
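The cost argument can be sketched as simple arithmetic. The run frequency and job duration below are hypothetical figures chosen only to illustrate the comparison; real costs depend on machine types and pricing, which the question does not give.

```python
def daily_compute_hours(runs_per_day, hours_per_run):
    """Total hours per day that compute is provisioned."""
    return runs_per_day * hours_per_run


# One online replica kept up all day, even when idle between batches.
always_on = daily_compute_hours(runs_per_day=1, hours_per_run=24)

# Assumed: a batch job every 4 hours that finishes in 30 minutes.
batch = daily_compute_hours(runs_per_day=6, hours_per_run=0.5)

print(always_on, batch)
```

Under these assumptions the batch job holds compute for a fraction of the day, which is the intuition behind choosing option D.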

378

MCQeasy

A company runs batch jobs on Dataproc. They need to ensure that if a job fails, it automatically retries with exponential backoff. What is the recommended approach?

A.Schedule a cron job to check job status and restart manually.

B.Use a Cloud Function triggered by Stackdriver alerts to restart the job.

C.Use Dataproc Workflow Templates with the maxAttempts parameter set to 3.

D.Create a Cloud Composer DAG that monitors job status and retries on failure.

AnswerC

Workflow Templates natively support retries with configurable backoff, making it the simplest and most robust solution.

Why this answer

Option C is correct because Dataproc Workflow Templates support configuring maxAttempts and retry policy in the template, enabling automatic retries with exponential backoff. Option D (Cloud Composer) is overkill for a simple retry. Option A (cron job) would need custom logic.

Option B (Cloud Functions) also requires custom implementation.
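For readers unfamiliar with the retry behaviour being asked about, the following is a generic sketch of retries with exponential backoff in plain Python. It is not the Dataproc API; the function names, the 1-second base delay, and the doubling factor are assumptions, and the limit of 3 attempts simply mirrors option C.

```python
import time


def backoff_delays(max_attempts, base_seconds=1.0, factor=2.0):
    """Delays before each retry: base, base*factor, base*factor**2, ..."""
    return [base_seconds * factor ** i for i in range(max_attempts - 1)]


def run_with_retries(job, max_attempts=3, base_seconds=1.0, sleep=time.sleep):
    """Call job() until it succeeds or max_attempts is reached."""
    delays = backoff_delays(max_attempts, base_seconds)
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(delays[attempt - 1])  # wait longer after each failure
```

The `sleep` parameter is injectable so the schedule can be inspected without actually waiting.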

379

MCQeasy

A team is setting up a Dataflow pipeline for a time-sensitive ETL job that must complete within a specific time window. Which monitoring metric should they use to determine if the pipeline is on track to finish on time?

A.The number of failed elements and retries.

B.The system lag metric, which measures the time between event occurrence and processing.

C.The number of elements processed in the current window.

D.The job's estimated time to completion shown in the Dataflow monitoring interface.

AnswerD

This metric directly estimates remaining time based on throughput.

Why this answer

Option D is correct because the Dataflow monitoring interface provides an estimated time to completion for the pipeline, which is the most direct metric for determining if the job will finish within the required time window. This estimate is calculated based on current throughput, backlog, and resource utilization, making it the appropriate choice for time-sensitive ETL jobs. Other metrics like system lag or element counts do not directly predict job completion time.

Exam trap

Google Cloud often tests the distinction between metrics that measure current performance (like system lag or element count) versus metrics that predict future completion (like estimated time to completion), leading candidates to pick a metric that sounds relevant but does not answer the specific question about finishing on time.

How to eliminate wrong answers

Option A is wrong because the number of failed elements and retries indicates data quality or processing errors, not the pipeline's progress toward completion within a time window. Option B is wrong because system lag measures the delay between event occurrence and processing, which is useful for streaming latency but does not provide an estimated finish time for a batch or bounded pipeline. Option C is wrong because the number of elements processed in the current window shows throughput but not whether the remaining workload can be completed before the deadline, as it ignores the backlog and processing rate.
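The reasoning above can be illustrated with a rough calculation. This is only a sketch of the idea that remaining work divided by throughput predicts the finish time; it is not the formula Dataflow uses, and the function names and figures are hypothetical.

```python
def eta_seconds(remaining_elements, elements_per_second):
    """Naive estimate: remaining work divided by current throughput."""
    if elements_per_second <= 0:
        return float("inf")  # no progress means no finite estimate
    return remaining_elements / elements_per_second


def on_track(remaining_elements, elements_per_second, seconds_until_deadline):
    """True if the naive estimate fits inside the time window."""
    return eta_seconds(remaining_elements, elements_per_second) <= seconds_until_deadline
```

A count of processed elements alone (option C) supplies neither the remaining backlog nor the deadline, which is why it cannot answer the question.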

380

MCQhard

The exhibit shows a Spark job submitted to Dataproc that fails with an out-of-memory error. Which change should be made to the submission command to resolve the issue?

A.Use a different Spark example class.

B.Increase the number of worker nodes in the cluster.

C.Add --properties spark.executor.memory=8g to the command.

D.Add --driver-memory 8g to the command.

AnswerC

Increases executor heap space.

Why this answer

The out-of-memory error indicates that the Spark executors do not have enough memory to process the data. Adding `--properties spark.executor.memory=8g` increases the memory allocated to each executor, directly addressing the root cause. This property overrides the default executor memory (typically 1g or 4g depending on the cluster configuration) and is the standard way to tune executor memory in Spark on Dataproc.

Exam trap

Google Cloud often tests the distinction between driver memory and executor memory, and candidates mistakenly choose `--driver-memory` because they confuse the driver's role with the executors' memory needs, or they assume that increasing cluster size (more nodes) automatically increases per-executor memory.

How to eliminate wrong answers

Option A is wrong because changing the Spark example class does not affect memory allocation; the error is a resource exhaustion issue, not a logic or classpath problem. Option B is wrong because increasing the number of worker nodes distributes the workload across more machines but does not increase the memory per executor; the existing executors would still run out of memory if the data partitions are too large. Option D is wrong because `--driver-memory` controls the memory of the Spark driver process, not the executors; the out-of-memory error occurs in the executors (task execution), not in the driver (which handles scheduling and results collection).
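Since the exhibit is not reproduced here, the following sketch assembles a plausible submission command with the executor memory property added. The cluster name, region, example class, and jar path are assumptions standing in for whatever the exhibit showed; only the `--properties spark.executor.memory=8g` setting comes from the answer.

```python
import shlex


def spark_submit_command(cluster, region, main_class, jar, executor_memory="8g"):
    """Build a `gcloud dataproc jobs submit spark` command as an argument list."""
    return [
        "gcloud", "dataproc", "jobs", "submit", "spark",
        f"--cluster={cluster}",
        f"--region={region}",
        f"--class={main_class}",
        f"--jars={jar}",
        # The fix from option C: raise the heap available to each executor.
        f"--properties=spark.executor.memory={executor_memory}",
    ]


cmd = spark_submit_command(
    cluster="example-cluster",    # hypothetical
    region="us-central1",         # hypothetical
    main_class="org.apache.spark.examples.SparkPi",
    jar="file:///usr/lib/spark/examples/jars/spark-examples.jar",
)
print(shlex.join(cmd))
```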

381

MCQhard

A financial services company uses Vertex AI to serve a fraud detection model. The model was trained on historical data that is updated daily. The team wants to automate retraining when data drift is detected. Which approach best operationalizes this requirement with minimal manual intervention?

A.Use Cloud Monitoring alerts on prediction latency to trigger a retraining pipeline.

B.Manually monitor model performance metrics in Vertex AI Experiments and retrain when accuracy drops.

C.Use scheduled Vertex AI Pipelines to retrain the model every night, then deploy automatically.

D.Enable Vertex AI Model Monitoring for feature drift and skew, then create a Cloud Function that triggers a Vertex AI Pipeline to retrain and deploy the model after validation.

AnswerD

This automates detection of data drift, triggers retraining only when needed, and includes validation before deployment.

Why this answer

Option D is correct because it uses Vertex AI Model Monitoring to automatically detect feature drift or skew, then triggers a Cloud Function that invokes a Vertex AI Pipeline to retrain and redeploy the model after validation. This approach minimizes manual intervention by automating both the detection of data drift and the subsequent retraining and deployment lifecycle.

Exam trap

Google Cloud often tests the distinction between scheduled retraining (Option C) and event-driven retraining triggered by actual drift detection (Option D), where candidates mistakenly choose the simpler scheduled approach without recognizing that it ignores the requirement to retrain only when drift is detected.

How to eliminate wrong answers

Option A is wrong because prediction latency is unrelated to data drift; monitoring latency only detects performance issues, not changes in data distribution. Option B is wrong because manually monitoring metrics in Vertex AI Experiments requires human intervention and does not automate retraining, contradicting the requirement for minimal manual intervention. Option C is wrong because scheduled nightly retraining ignores whether data drift has actually occurred, leading to unnecessary retraining and potential deployment of models that are not improved, and it does not use drift detection as the trigger.
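The event-driven pattern in option D boils down to comparing a drift score against a threshold and retraining only when it is exceeded. The sketch below illustrates that decision for one categorical feature using an L-infinity distance between training and serving distributions. The choice of score, the 0.3 threshold, and the function names are assumptions for illustration, not the monitoring service's actual configuration, and the inputs are assumed to be non-empty.

```python
from collections import Counter


def distribution(values):
    """Relative frequency of each category in a non-empty sample."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def l_infinity_distance(baseline, current):
    """Largest absolute difference in category frequency between two samples."""
    p, q = distribution(baseline), distribution(current)
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def should_retrain(baseline, current, threshold=0.3):
    """Trigger retraining only when drift exceeds the (assumed) threshold."""
    return l_infinity_distance(baseline, current) > threshold
```

A nightly schedule (option C) would retrain regardless of this check, which is the distinction the question is probing.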

382

MCQmedium

A data engineer is responsible for a batch ETL pipeline that runs daily using Cloud Composer and Dataproc. The pipeline extracts data from Cloud SQL, transforms it with Spark, and loads to BigQuery. Last night, the pipeline failed because the Spark job ran out of memory. The team needs a solution that prevents future failures without manual intervention. What should the team do?

A.Enable Dataproc autoscaling and configure memory-based scaling

B.Use Cloud Functions to retry the job

C.Use a larger machine type for Dataproc

D.Split the Spark job into multiple stages

AnswerA

Autoscaling adjusts cluster size based on memory usage, preventing OOM.

Why this answer

Option A is correct because Dataproc autoscaling with memory-based scaling dynamically adjusts the cluster size based on the memory utilization of running jobs. This prevents out-of-memory failures by automatically adding worker nodes when memory pressure increases, without requiring manual intervention or pre-provisioning oversized clusters. It directly addresses the root cause—insufficient memory during peak processing—while maintaining cost efficiency.

Exam trap

Google Cloud often tests the misconception that retrying a failed job or manually resizing resources is a sufficient solution, when in fact dynamic, automated scaling is required to handle variable workloads without manual intervention.

How to eliminate wrong answers

Option B is wrong because retrying the failed job with Cloud Functions does not fix the underlying memory issue; the job will simply fail again on retry if the same memory constraints persist. Option C is wrong because using a larger machine type is a static, manual fix that may waste resources during normal operation and still fail if future data volumes exceed the chosen machine's capacity. Option D is wrong because splitting the Spark job into multiple stages does not inherently reduce memory usage per stage; it only reorganizes execution steps and may even increase overhead without addressing memory pressure.

383

MCQeasy

Your organization has a data lake on Cloud Storage with millions of small files (average 10 KB). You need to build a batch processing pipeline using Cloud Dataproc that runs a Spark job to transform the data and output results to BigQuery. The pipeline currently takes 4 hours to run because Spark spends a large amount of time listing files and managing tasks. You want to reduce the run time without changing the cluster size. Which action should you take?

A.Convert the input files from CSV to Parquet format

B.Use Spark coalesce to reduce the number of output partitions

C.Increase the number of Spark partitions to process more files in parallel

D.Enable the Spark Dynamic Resource Allocation and combine small files using a separate job before the main transformation

AnswerD

Combining files reduces task count and listing overhead.

Why this answer

Option D is correct because the primary bottleneck is the overhead of listing millions of small files and managing many Spark tasks. By combining small files into larger ones using a separate job before the main transformation, you reduce the number of files Spark must list and the number of tasks required, which directly cuts the 4-hour runtime. Enabling Spark Dynamic Resource Allocation ensures resources are used efficiently during this preprocessing step without changing the cluster size.

Exam trap

The trap here is that candidates focus on data format or partitioning tuning (A, B, C) instead of recognizing that the root cause is the sheer number of small files causing excessive file listing and task overhead, which requires a preprocessing step to consolidate files.

How to eliminate wrong answers

Option A is wrong because converting CSV to Parquet improves read performance and compression but does not address the overhead of listing millions of small files or the task management cost; the bottleneck is file count, not format. Option B is wrong because using Spark coalesce reduces the number of output partitions, which only affects the write phase to BigQuery and does nothing to reduce the input file listing or task scheduling overhead. Option C is wrong because increasing the number of Spark partitions would create even more tasks, exacerbating the overhead from managing millions of small files and likely increasing runtime, not reducing it.
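To show why consolidation helps, the sketch below plans a compaction pass by greedily grouping small files into batches near a target size. The function name and the 128 MiB target are assumptions; the 10 KB average file size comes from the question.

```python
def plan_compaction(file_sizes, target_bytes):
    """Greedily group consecutive files into batches of roughly target_bytes.

    Returns a list of batches, each a list of indexes into file_sizes.
    """
    batches, current, current_size = [], [], 0
    for index, size in enumerate(file_sizes):
        # Close the batch if adding this file would exceed the target.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        batches.append(current)
    return batches


# One million 10 KB files compacted toward an assumed 128 MiB target.
plan = plan_compaction([10 * 1024] * 1_000_000, 128 * 1024 * 1024)
print(len(plan))
```

Under these assumptions, the main Spark job would list and schedule work for tens of files instead of a million, which targets the listing and task overhead directly.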

384

MCQhard

A Dataflow streaming pipeline reads from Pub/Sub, applies a ParDo that uses a side input from a BigQuery table (refreshed hourly), and writes to BigQuery. The side input is large and causes increased latency and worker OOM errors. Which design change solves this?

A.Use a stateful ParDo and store the lookup data in an external cache like Cloud Bigtable, performing lookups per element.

B.Increase the side input broadcast frequency to update more often.

C.Split the pipeline into two: one to load the side input, the other to process main input.

D.Use smaller worker machine types to distribute memory across more workers.

AnswerA

External cache reduces per-worker memory footprint and scales well.

Why this answer

Option A is correct because moving the large lookup data to an external cache like Cloud Bigtable offloads memory pressure from workers, eliminating OOM errors. The side input broadcast approach keeps the entire dataset in each worker's memory, which causes OOM when the data is large. Using an external cache allows per-element lookups without storing the entire dataset in memory, reducing latency by avoiding broadcast overhead.

Exam trap

Google Cloud often tests the misconception that increasing resources (like worker size or frequency) solves memory issues, when the real solution is to avoid storing large datasets in memory altogether by using an external lookup service.

How to eliminate wrong answers

Option B is wrong because increasing the broadcast frequency would make the OOM and latency problems worse, as it would reload the large dataset into memory more often without reducing memory footprint. Option C is wrong because splitting the pipeline into two pipelines does not solve the fundamental issue of storing the large side input in memory; the side input would still need to be broadcast or cached, and the two pipelines would require coordination, adding complexity without addressing memory pressure. Option D is wrong because using smaller worker machine types reduces available memory per worker, which would exacerbate OOM errors and increase latency due to more frequent garbage collection and slower processing.
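The contrast with a broadcast side input can be sketched in plain Python: instead of holding the whole table in memory, each element triggers a keyed lookup against an external store, with a small bounded cache for hot keys. The `fetch` callable stands in for a Bigtable read, and the class name, field names, and cache size are hypothetical.

```python
from functools import lru_cache


class CachedLookup:
    """Per-key lookups against an external store, with a bounded LRU cache."""

    def __init__(self, fetch, max_entries=10_000):
        self.backend_calls = 0

        def _fetch(key):
            self.backend_calls += 1  # count real round trips to the store
            return fetch(key)

        # Memory is capped at max_entries, regardless of table size.
        self._cached = lru_cache(maxsize=max_entries)(_fetch)

    def get(self, key):
        return self._cached(key)


def enrich(elements, lookup):
    """Attach the looked-up value to each element, one element at a time."""
    for element in elements:
        yield {**element, "lookup": lookup.get(element["key"])}
```

Worker memory now scales with the cache size rather than with the size of the lookup table, at the price of a network call on cache misses.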

385

MCQmedium

A team is designing an event-driven data pipeline. They need to process messages from Cloud Pub/Sub, transform them, and write to BigQuery. The messages have variable volume and spikes. What is the best serverless compute option for this workload?

A.Cloud Functions triggered by Pub/Sub

B.Compute Engine with a Pub/Sub client library

C.Cloud Run invoked via Eventarc

D.Cloud Dataflow with a streaming pipeline

AnswerD

Dataflow can handle variable volume, autoscale, and directly read from Pub/Sub and write to BigQuery.

Why this answer

Cloud Dataflow with a streaming pipeline is the best serverless compute option because it is purpose-built for unbounded, variable-volume data streams from Pub/Sub and provides exactly-once processing semantics, auto-scaling, and built-in BigQuery sink integration via the Beam SDK. Unlike simpler compute options, Dataflow handles backpressure, windowing, and state management natively, making it ideal for spikes and high-throughput transformations without manual scaling or idempotency concerns.

Exam trap

Google Cloud often tests the misconception that any serverless compute (like Cloud Functions or Cloud Run) can handle streaming data pipelines, but the trap here is that these services lack native support for unbounded data, stateful processing, and automatic scaling under variable volume, which only Dataflow provides as a fully managed stream processor.

How to eliminate wrong answers

Option A is wrong because Cloud Functions triggered by Pub/Sub is designed for lightweight, short-lived event processing (max 9 minutes timeout) and cannot handle sustained high-throughput streaming transformations or complex stateful operations like windowing and joins, leading to data loss or timeouts under spikes. Option B is wrong because Compute Engine with a Pub/Sub client library is not serverless—it requires manual provisioning, scaling, and management of VMs, and it lacks native integration with BigQuery for streaming writes, adding operational overhead. Option C is wrong because Cloud Run invoked via Eventarc is a request-response compute model with a 60-minute timeout and concurrency limits; it does not natively support unbounded streaming, checkpointing, or exactly-once processing for Pub/Sub messages, making it unsuitable for variable-volume data pipelines.

386

MCQhard

Based on the exhibit, what is the most likely cause of duplicate rows despite using the same event_id as insertId?

A.BigQuery's streaming buffer deduplication is best-effort and may not catch duplicates within a short time window.

B.The Dataflow pipeline is retrying inserts due to network errors, and the same event_id is not being used in retries.

C.The pipeline is writing more than 100,000 rows per second, exceeding BigQuery's streaming quota.

D.The table is partitioned by timestamp, so BigQuery cannot deduplicate across partitions.

AnswerA

Duplicate inserts within milliseconds can bypass dedup due to coarseness.

Why this answer

BigQuery's streaming buffer uses best-effort deduplication based on the `insertId` field. When multiple rows are inserted with the same `event_id` mapped to `insertId` within a short time window (typically up to a few minutes), the deduplication mechanism may fail to remove all duplicates, especially under high throughput or network retries. This is a documented limitation of BigQuery streaming, not a guarantee of exactly-once semantics.

Exam trap

Google Cloud often tests the misconception that BigQuery's streaming deduplication is a strong guarantee, when in fact it is best-effort and can fail under concurrent writes or short time windows.

How to eliminate wrong answers

Option B is wrong because if the same `event_id` is not used in retries, BigQuery would treat them as distinct rows and not deduplicate, but the question states the same `event_id` is used as `insertId`; the issue is that deduplication is best-effort, not that the ID is missing. Option C is wrong because exceeding the streaming quota (default 100,000 rows per second per table) would cause ingestion errors or throttling, not duplicate rows; duplicates arise from the buffer's deduplication behavior, not quota limits. Option D is wrong because BigQuery can deduplicate across partitions within the streaming buffer; partitioning does not disable deduplication, and duplicates can occur even in a single partition due to the buffer's best-effort nature.

387

MCQhard

A team deploys a Cloud Run service that processes user-uploaded files. Some requests time out after 60 minutes. They need to handle large files reliably without losing tasks. What is the best solution?

A.Containerize the processing logic and trigger it via Cloud Tasks.

B.Increase the request timeout to 3600 seconds.

C.Use Cloud Functions instead of Cloud Run.

D.Split the file into chunks and process them concurrently.

AnswerA

Cloud Tasks decouples the request, provides retry, and can handle long-running operations without timeout limits.

Why this answer

Cloud Run has a maximum request timeout of 60 minutes. Offloading processing to a Cloud Task decouples the request from the processing, allowing async handling with retries.

388

MCQmedium

Your team is responsible for operationalizing a series of machine learning models that are trained and deployed using Vertex AI Pipelines. The pipeline consists of several steps including data preprocessing, training with hyperparameter tuning, model evaluation, and deployment to an endpoint. Recently, the pipeline has been failing intermittently at the model evaluation step with an error indicating insufficient memory. The evaluation step uses a custom container with a memory limit of 4 GB. The training step uses 8 GB and completes successfully. You need to resolve the failure without drastically increasing costs. What should you do?

A.Increase the memory limit for the evaluation custom container to 8 GB to match the training step.

B.Optimize the evaluation code to use streaming or incremental processing to reduce peak memory usage.

C.Reduce the batch size used in the evaluation step to lower memory consumption.

D.Use a smaller machine type for the evaluation step to force lower memory usage.

AnswerB

Optimizing the code is a cost-effective long-term solution that addresses the root cause.

389

Multi-Selecthard

Which THREE practices are recommended when designing a Cloud Data Fusion pipeline to ensure efficient execution and monitoring? (Choose three.)

Select 3 answers

A.Manually partition input files to control parallelism.

B.Limit the memory and disk usage per stage to avoid Dataproc node resource exhaustion.

C.Use a dedicated Dataproc cluster for each production pipeline to avoid resource contention.

D.Schedule pipeline runs using Cloud Scheduler and Pub/Sub triggers to avoid manual starts.

E.Set up custom metrics and alerts for pipeline backpressure and latency.

AnswersB, C, E

Resource limits prevent OOM errors and improve stability.

Why this answer

Option B is correct because Cloud Data Fusion pipelines run on Dataproc clusters, and limiting memory and disk usage per stage prevents resource exhaustion on worker nodes. This ensures that no single stage consumes all available resources, which could cause the pipeline to fail or degrade performance. Proper resource limits help maintain stable execution and avoid out-of-memory errors.

Exam trap

Google Cloud often tests the misconception that manual partitioning (Option A) gives better control, but Cloud Data Fusion's auto-partitioning is more efficient and recommended; candidates may also overlook that scheduling (Option D) is about automation, not execution efficiency or monitoring.

390

MCQmedium

A data pipeline uses Cloud Composer to orchestrate Dataflow and BigQuery jobs. The pipeline fails intermittently with dependency errors. Which design change can improve reliability?

A.Use retries with exponential backoff

B.Switch to Cloud Functions for orchestration

C.Increase worker count in Dataflow

D.Use a simpler DAG with fewer dependencies

AnswerA

Retries with backoff handle transient failures, improving reliability.

Why this answer

Cloud Composer (Apache Airflow) tasks can fail due to transient issues like API rate limits or resource contention. Implementing retries with exponential backoff allows the DAG to automatically re-attempt failed tasks with increasing delays, reducing the impact of intermittent failures without manual intervention. This is a standard Airflow pattern for improving reliability in orchestrated pipelines.

Exam trap

Google Cloud often tests the distinction between scaling compute resources (Dataflow workers) and improving orchestration reliability (retries), leading candidates to mistakenly choose option C when the problem is transient task failures, not resource bottlenecks.

How to eliminate wrong answers

Option B is wrong because Cloud Functions is a serverless compute service, not a workflow orchestrator; it lacks built-in support for managing task dependencies, retries, and scheduling across multiple services like Dataflow and BigQuery. Option C is wrong because increasing the Dataflow worker count addresses throughput and latency, not dependency errors in the orchestration layer; dependency errors stem from task sequencing or transient failures in Airflow, not from Dataflow parallelism. Option D is wrong because simplifying the DAG reduces complexity but does not handle intermittent failures; the core issue is transient errors, not the number of dependencies, and removing dependencies may break business logic.
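In Airflow, this pattern is usually expressed through task-level retry settings. The sketch below shows a `default_args` dictionary of the kind passed to a DAG; the parameter names are standard Airflow operator arguments, while the specific values are assumptions chosen for illustration.

```python
from datetime import timedelta

# Applied to every task in the DAG unless a task overrides them.
default_args = {
    "retries": 3,                              # re-attempt a failed task up to 3 times
    "retry_delay": timedelta(minutes=1),       # initial wait before the first retry
    "retry_exponential_backoff": True,         # lengthen the wait on each retry
    "max_retry_delay": timedelta(minutes=15),  # cap on the backoff interval
}
```

Setting these once at the DAG level keeps the Dataflow and BigQuery tasks consistent without changing the dependency graph.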

391

MCQhard

A financial institution needs to deploy a TensorFlow model for fraud detection with strict latency requirements (<100ms). The model uses custom ops that are not available in standard TF Serving. What is the most appropriate serving solution?

A.Export the model as a SavedModel and serve on Vertex AI Prediction

B.Use Cloud Run with a custom container that includes the model and pre-loads the library

C.Use NVIDIA Triton Inference Server with a custom backend

D.Package the model with Docker using TF Serving and add custom ops via TensorFlow's custom op registration

AnswerC

NVIDIA Triton supports custom backends and is designed for high-performance inference with low latency.

Why this answer

Option C is correct because NVIDIA Triton Inference Server supports custom backends written in C++ or Python, allowing the integration of custom ops that are not available in standard TensorFlow Serving. This enables the model to meet strict latency requirements (<100ms) by leveraging GPU acceleration and optimized inference pipelines, while avoiding the limitations of TF Serving's fixed op registry.

Exam trap

The trap here is that candidates assume TF Serving's custom op registration (Option D) is straightforward, but Google Cloud tests the understanding that TF Serving does not support dynamic loading of custom ops without a custom build, making Triton's backend architecture the correct choice for production-grade latency requirements.

How to eliminate wrong answers

Option A is wrong because Vertex AI Prediction relies on standard TF Serving or custom containers, but exporting as a SavedModel does not automatically include custom ops; Vertex AI would fail to load the model if the custom ops are not registered in its runtime. Option B is wrong because Cloud Run with a custom container can serve the model, but it lacks the specialized inference optimization features (e.g., dynamic batching, model concurrency) needed to guarantee <100ms latency under load, and it does not natively support custom backends for ops. Option D is wrong because TF Serving's custom op registration requires recompiling TF Serving from source with the custom ops linked, which is complex and not supported via standard Docker images; even if done, TF Serving's architecture is less flexible than Triton's custom backend for handling non-standard ops efficiently.

392

MCQhard

A Dataflow pipeline reads from Cloud Pub/Sub and writes to Cloud Storage. The pipeline needs to guarantee exactly-once processing despite worker failures. Which configuration ensures exactly-once semantics?

A.Use a side input from a deduplication dataset

B.Set the pipeline to use a global window with no early triggers

C.Insert a Reshuffle transform after reading

D.Enable exactly-once delivery on the Pub/Sub subscription and use an idempotent sink

AnswerD

Pub/Sub exactly-once delivery and an idempotent Storage write (e.g., using file naming) ensure no duplicates.

Why this answer

Option D is correct because Pub/Sub subscriptions can be configured with exactly-once delivery (using the `enableExactlyOnceDelivery` flag), which ensures that each message is delivered to the subscriber exactly once. Combining this with an idempotent sink (e.g., Cloud Storage with unique filenames or deduplication logic) guarantees that even if a worker fails and the pipeline retries, the output will not contain duplicates. This is the only option that directly addresses both the source and sink to achieve end-to-end exactly-once semantics.
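
The idempotent-sink half of this answer can be illustrated with a minimal sketch. It assumes the sink derives each object name only from the message ID, so a retried write lands on the same object. The `bucket` dictionary is a hypothetical in-memory stand-in for a Cloud Storage bucket, and `object_name` and `write_idempotent` are illustrative helpers, not part of any Google Cloud library.

```python
import hashlib


def object_name(message_id: str, prefix: str = "output/") -> str:
    """Derive a deterministic object name from the message ID (illustrative helper)."""
    digest = hashlib.sha256(message_id.encode("utf-8")).hexdigest()[:16]
    return f"{prefix}{digest}.json"


def write_idempotent(bucket: dict, message_id: str, payload: str) -> None:
    """Write the payload under a name that depends only on the message ID.

    A retried write for the same message overwrites the same object
    instead of creating a second copy.
    """
    bucket[object_name(message_id)] = payload


# Hypothetical in-memory stand-in for a Cloud Storage bucket.
bucket = {}
write_idempotent(bucket, "msg-1", '{"amount": 10}')
write_idempotent(bucket, "msg-1", '{"amount": 10}')  # retry after a worker failure
write_idempotent(bucket, "msg-2", '{"amount": 25}')
```

After the retry, the stand-in bucket still holds one object per distinct message, which is the property an idempotent sink relies on.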

Exam trap

Google Cloud often tests the misconception that a single transform (like Reshuffle) or windowing strategy can guarantee exactly-once processing, when in reality it requires both source-level exactly-once delivery and an idempotent sink to handle retries from worker failures.

How to eliminate wrong answers

Option A is wrong because using a side input from a deduplication dataset does not prevent duplicate processing at the source; it only attempts to deduplicate after the fact, which is not a guarantee of exactly-once processing and adds complexity and latency. Option B is wrong because a global window with no early triggers controls when results are emitted, but it does not prevent duplicate messages from being processed due to worker failures or retries. Option C is wrong because a Reshuffle transform (implemented as a GroupByKey followed by ungrouping the values) can help with fault tolerance by breaking fusion, but it does not provide exactly-once semantics; it only ensures that elements are redistributed, not that duplicates are eliminated.

393

MCQeasy

A data engineer needs to design a data processing system that ingests large volumes of sensor data from IoT devices. The data should be stored in a schema-less format and allow for real-time analytics. Which Google Cloud service is most appropriate?

A.Cloud Spanner

B.Firestore

C.Cloud Bigtable

D.Cloud SQL

AnswerC

Bigtable is schema-less, highly scalable, and ideal for time-series sensor data.

Why this answer

Cloud Bigtable is the most appropriate choice because it is a fully managed, scalable NoSQL database designed for large-scale analytical and operational workloads. It supports schema-less storage of time-series sensor data, exposes an HBase-compatible API, and integrates with analytics tools such as Dataflow and BigQuery, meeting the requirements for high-throughput ingestion and low-latency queries.

Exam trap

The trap here is that candidates often confuse Cloud Bigtable with Firestore or Cloud SQL because they all offer NoSQL or relational storage, but fail to recognize that Bigtable is purpose-built for high-throughput, schema-less time-series data and real-time analytics, while the others are optimized for transactional or mobile workloads.

How to eliminate wrong answers

Option A is wrong because Cloud Spanner is a globally distributed, strongly consistent relational database that enforces a fixed schema, making it unsuitable for schema-less IoT data and overkill for real-time analytics at scale. Option B is wrong because Firestore is a document-oriented NoSQL database optimized for mobile and web app real-time synchronization, not for high-throughput ingestion of large volumes of sensor data or analytical workloads. Option D is wrong because Cloud SQL is a managed relational database service (MySQL, PostgreSQL, SQL Server) that requires a predefined schema and cannot handle the petabyte-scale, high-write throughput demands of IoT sensor data without significant performance degradation.

394

MCQeasy

You are designing a streaming Dataflow pipeline that reads from Cloud Pub/Sub. Some data may arrive late due to network delays. You need to ensure that late-arriving data is still processed, but after a certain point, it should be discarded to avoid unbounded state. What is the best practice?

A.Switch to a batch pipeline

B.Use fixed windows without allowed lateness

C.Discard all late-arriving data

D.Set a watermark and allowed lateness

AnswerD

Allowed lateness enables processing of late data within a configurable period, balancing completeness and latency.

Why this answer

Option D is correct because in streaming Dataflow pipelines, setting a watermark and allowed lateness provides a mechanism to handle late-arriving data from Pub/Sub without unbounded state growth. The watermark defines the point after which data is considered late, and allowed lateness specifies how long to wait for late data before discarding it, balancing completeness and state management.
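
The interaction between the watermark and allowed lateness can be sketched as a simple decision rule. This is a simplified model for fixed windows, not the Apache Beam API: `classify` is a hypothetical helper, times are plain integer seconds, and the exact boundary handling in a real runner may differ.

```python
def classify(event_time: int, watermark: int, window_size: int, allowed_lateness: int) -> str:
    """Classify an arriving element under a simplified fixed-window model.

    All arguments are in seconds. The window containing the element ends at
    the next multiple of window_size after its event time.
    """
    window_end = (event_time // window_size + 1) * window_size
    if watermark < window_end:
        return "on_time"        # the window has not closed yet
    if watermark < window_end + allowed_lateness:
        return "late_accepted"  # late, but still inside the allowed lateness
    return "dropped"            # window state has been released; element is discarded


# Example: 60-second windows with 120 seconds of allowed lateness.
on_time = classify(event_time=30, watermark=50, window_size=60, allowed_lateness=120)
late = classify(event_time=30, watermark=100, window_size=60, allowed_lateness=120)
dropped = classify(event_time=30, watermark=200, window_size=60, allowed_lateness=120)
```

Because elements are discarded once the watermark passes the window end plus the allowed lateness, window state can be released at that point, which is what keeps state bounded.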

Exam trap

The trap here is that candidates often confuse 'allowed lateness' with simply discarding late data, failing to recognize that it provides a controlled buffer for late arrivals while still bounding state growth.

How to eliminate wrong answers

Option A is wrong because switching to a batch pipeline would lose the streaming, low-latency processing requirement and cannot handle late-arriving data in real time. Option B is wrong because fixed windows without allowed lateness would immediately discard any data arriving after the window end, even if it is only slightly delayed, leading to data loss. Option C is wrong because discarding all late-arriving data is too aggressive and ignores the need to process data that arrives within a reasonable delay, which is common in distributed systems like Pub/Sub.

395

MCQeasy

A data engineer is running a Dataproc cluster for a batch ETL job that needs to process 10 TB of data. The job is memory-intensive. The cluster currently uses n1-standard-4 workers. Performance is poor. What is the most cost-effective change to improve performance?

A.Use high-memory machine types (n1-highmem-4)

B.Use preemptible workers to reduce cost

C.Switch to n2-standard-4 machine types

D.Add more n1-standard-4 workers

AnswerA

High-memory machines provide more memory per core, better for memory-bound jobs.

Why this answer

The job is memory-intensive, and n1-standard-4 workers have 15 GB of RAM, which may be insufficient for the workload, causing excessive disk spill or OOM errors. Switching to n1-highmem-4 provides 26 GB of RAM per worker (a 73% increase) without increasing the vCPU count, directly addressing the memory bottleneck at a lower cost than adding more workers. This is the most cost-effective change because it improves performance without paying for additional vCPUs that the job does not need.
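
The memory figures quoted above can be checked with a short calculation. The machine specifications are the ones stated in the explanation; the dictionary layout and the `memory_per_vcpu` helper are only illustrative.

```python
# Machine specifications as quoted in the explanation above.
N1_STANDARD_4 = {"vcpus": 4, "memory_gb": 15}
N1_HIGHMEM_4 = {"vcpus": 4, "memory_gb": 26}


def memory_per_vcpu(machine: dict) -> float:
    """Return GB of RAM available per vCPU."""
    return machine["memory_gb"] / machine["vcpus"]


standard_ratio = memory_per_vcpu(N1_STANDARD_4)  # 3.75 GB per vCPU
highmem_ratio = memory_per_vcpu(N1_HIGHMEM_4)    # 6.5 GB per vCPU

# Relative increase in RAM per worker when moving to the high-memory type.
increase_pct = (N1_HIGHMEM_4["memory_gb"] / N1_STANDARD_4["memory_gb"] - 1) * 100
print(f"{standard_ratio} -> {highmem_ratio} GB per vCPU, +{increase_pct:.0f}% RAM per worker")
```

The vCPU count stays at 4, so the extra 11 GB per worker goes entirely toward relieving memory pressure.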

Exam trap

The trap here is that candidates often assume adding more workers (scaling out) is always the best way to improve performance, but for memory-intensive jobs, scaling up (using high-memory instances) is more cost-effective because it addresses the root cause—per-worker memory pressure—without wasting resources on additional vCPUs.

How to eliminate wrong answers

Option B is wrong because preemptible workers reduce cost but do not improve performance for a memory-intensive job; they are suitable for fault-tolerant, stateless workloads, not for memory-bound ETL tasks that may fail if preempted. Option C is wrong because n2-standard-4 machine types offer similar memory (16 GB) to n1-standard-4 (15 GB) and only provide a modest CPU performance improvement via newer architecture, which does not address the memory bottleneck. Option D is wrong because adding more n1-standard-4 workers increases total vCPUs and cost but does not increase per-worker memory, so the memory-intensive job will still suffer from the same per-worker memory constraints, leading to inefficient resource utilization.

396

MCQhard

A company uses Cloud Dataflow to process financial transactions from Pub/Sub to BigQuery. The pipeline must ensure exactly-once semantics. Recently, they noticed duplicate rows in BigQuery. The source publishes with at-least-once delivery. The Dataflow pipeline uses idempotent writes. What is the most likely cause?

A.The pipeline uses file loads as a sink

B.The pipeline's watermark is misconfigured

C.The pipeline uses GlobalWindows

D.The pipeline has autoscaling enabled

AnswerB

A misconfigured watermark can cause late data to be processed again, producing duplicates.

Why this answer

The most likely cause is a misconfigured watermark. In Dataflow, the watermark tracks event time progress and determines when to trigger window results. If the watermark is misconfigured (e.g., too aggressive or based on incorrect timestamps), late-arriving data may be processed in multiple windows, leading to duplicate rows even with idempotent writes.

Since the source uses at-least-once delivery, late data can be re-published, and a faulty watermark can cause it to be written again.

Exam trap

The trap here is that candidates assume idempotent writes alone guarantee exactly-once, but they overlook that watermark misconfiguration can cause the same event to be processed in multiple windows, leading to duplicates despite idempotent sinks.

How to eliminate wrong answers

Option A is wrong because file loads as a sink can cause duplicates if a load job is retried, but the question states that the pipeline uses idempotent writes and does not say that file loads are the write method; with streaming inserts, Dataflow attaches insert IDs so that BigQuery can deduplicate retried rows on a best-effort basis. Option C is wrong because GlobalWindows do not cause duplicates; they aggregate all data into a single window, and duplicates would still be prevented by idempotent writes. Option D is wrong because autoscaling adjusts worker count but does not inherently cause duplicate writes; Dataflow handles state and checkpointing correctly during scaling.

397

MCQmedium

A data engineering team uses Cloud Data Fusion to build ETL pipelines. They have a pipeline that reads from Cloud SQL, transforms data using Wrangler, and writes to BigQuery. The pipeline fails intermittently with a 'connection timeout' error from Cloud SQL. What is the best way to handle this?

A.Use Cloud NAT to provide a static IP for Data Fusion to whitelist.

B.Configure the Cloud SQL connector in Data Fusion to use retry logic and increase the connection timeout.

C.Increase the number of Data Fusion nodes to distribute the load.

D.Migrate Cloud SQL to Cloud Spanner to handle higher concurrency.

AnswerB

Retries and longer timeouts handle transient failures.

Why this answer

Option B is correct because Cloud Data Fusion's Cloud SQL connector can be configured with retry logic and an increased connection timeout to handle transient network issues. This directly addresses the intermittent 'connection timeout' error without requiring architectural changes, as the error is likely due to brief network latency or resource contention, not a persistent connectivity problem.
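
The retry behaviour described here can be sketched generically as retries with exponential backoff. This is not the Data Fusion connector's configuration: `call_with_retries`, its parameters, and the `flaky_connect` stand-in for a Cloud SQL connection attempt are all hypothetical names used for illustration.

```python
import time


def call_with_retries(operation, max_attempts=4, base_delay_s=0.5, sleep=time.sleep):
    """Run operation, retrying on TimeoutError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            sleep(base_delay_s * 2 ** (attempt - 1))  # 0.5 s, 1 s, 2 s, ...


# Hypothetical connection attempt that times out twice, then succeeds.
attempts = {"count": 0}


def flaky_connect():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("connection timeout")
    return "connected"


delays = []  # record the backoff delays instead of actually sleeping
result = call_with_retries(flaky_connect, sleep=delays.append)
```

A transient timeout is absorbed by the retries, while a persistent failure still surfaces once the attempts are exhausted.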

Exam trap

The trap here is that candidates often assume connectivity issues require network-level fixes (like static IPs or NAT) or scaling, rather than recognizing that transient timeouts are best handled by application-level retry and timeout configuration.

How to eliminate wrong answers

Option A is wrong because using Cloud NAT to provide a static IP for whitelisting addresses IP-based access control, but the error is a connection timeout, not an authorization failure; whitelisting does not resolve transient network delays. Option C is wrong because increasing the number of Data Fusion nodes distributes compute load but does not fix connection timeouts to Cloud SQL, which are caused by network or database-side issues, not pipeline parallelism. Option D is wrong because migrating to Cloud Spanner is an overengineered solution for a transient timeout; it introduces unnecessary complexity and cost, and does not address the root cause of intermittent connectivity.

398

Multi-Selectmedium

A company wants to ensure high availability for their Cloud SQL instance. Which TWO actions are most appropriate? (Choose two.)

Select 2 answers

A.Create a read replica in a different region.

B.Configure a failover replica in the same region.

C.Enable automatic backups with a retention period of 7 days.

D.Increase the instance's memory and storage size.

E.Set up horizontal scaling with multiple read replicas.

AnswersA, C

A cross-region read replica can be promoted to a standalone instance in a disaster, providing DR.

Why this answer

Options A and C are correct: a read replica in a different region provides disaster recovery, and automatic backups allow point-in-time recovery. Option B (failover replica in the same region) provides HA but not DR. Option D (increased memory and storage) does not improve availability.

Option E (horizontal scaling with read replicas) does not provide failover for writes.

399

Multi-Selectmedium

Which TWO actions can help reduce the latency of a Vertex AI endpoint serving a large neural network model?

Select 2 answers

A.Use a larger machine type with more CPU cores

B.Enable model compression with quantization

C.Increase the number of model versions deployed on the same endpoint

D.Deploy the model on a machine type with GPU accelerators

E.Use a smaller batch size for prediction requests

AnswersD, E

GPUs speed up neural network inference.

Why this answer

Option D is correct because GPU accelerators are specifically designed to handle the parallel computations required by large neural networks, significantly reducing inference latency compared to CPU-only machines. Vertex AI endpoints with GPUs can process multiple predictions concurrently, which is critical for deep learning models where matrix operations dominate the workload.

Exam trap

Google Cloud often tests the misconception that more CPU cores or model compression always reduce latency, but the trap here is that for large neural networks, the primary bottleneck is parallel compute capability, which only GPUs or TPUs can address effectively.

400

MCQmedium

A retail company is building a recommendation engine that requires processing customer clickstream data in near real-time. The data is ingested via Pub/Sub, and must be joined with a lookup table of product details (updated daily) before being used for model inference. Which design pattern should they use?

A.Enrich the stream by querying BigQuery for each event using a Cloud Function.

B.Use a Dataflow pipeline that reads from Pub/Sub and uses a side input from a regularly refreshed PCollection loaded from Cloud Storage.

C.Store product details in Cloud Memorystore (Redis) and have the streaming application look up each event.

D.Write events to BigQuery and use scheduled queries to join with the product table in batch.

AnswerB

Side inputs enable efficient streaming-batch joins within Dataflow.

Why this answer

Option B is correct because Dataflow can read streaming data from Pub/Sub and use a side input from a regularly refreshed PCollection loaded from Cloud Storage. This pattern allows the product lookup table (updated daily) to be periodically reloaded into the pipeline as a side input, enabling efficient, low-latency enrichment of each event without per-event external calls or batch delays.
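
The idea of a periodically refreshed lookup table can be sketched in plain Python. This is not the Apache Beam side-input API; `RefreshingLookup`, its `loader` callback, and the injected `clock` are hypothetical names that only model the behaviour of reloading slowly changing reference data on an interval while serving each event from memory.

```python
import time
from typing import Callable, Dict, Optional


class RefreshingLookup:
    """In-memory lookup table that reloads itself after a fixed interval."""

    def __init__(self, loader: Callable[[], Dict[str, str]], refresh_interval_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader
        self._interval = refresh_interval_s
        self._clock = clock
        self._table = loader()
        self._loaded_at = clock()

    def get(self, key: str) -> Optional[str]:
        # Reload the whole table once the refresh interval has elapsed.
        if self._clock() - self._loaded_at >= self._interval:
            self._table = self._loader()
            self._loaded_at = self._clock()
        return self._table.get(key)


# Simulated clock and product table (stand-ins for real time and a Cloud Storage file).
now = [0.0]
source = {"sku-1": "Blue mug"}
lookup = RefreshingLookup(lambda: dict(source), refresh_interval_s=86400, clock=lambda: now[0])

first = lookup.get("sku-1")   # served from the initial load
source["sku-1"] = "Red mug"   # the daily update lands in the source
stale = lookup.get("sku-1")   # still the old value until the interval elapses
now[0] = 86400.0
fresh = lookup.get("sku-1")   # reloaded after one day
```

Each event is enriched from memory, and the only external read is the periodic reload, which is why this pattern avoids per-event lookups.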

Exam trap

Google Cloud often tests the distinction between streaming enrichment patterns that require external lookups (which add latency and cost) versus using side inputs for static or slowly-changing reference data, leading candidates to mistakenly choose a cache-based solution like Redis when the data is already available in Cloud Storage.

How to eliminate wrong answers

Option A is wrong because querying BigQuery for each event via a Cloud Function would introduce high latency and cost due to per-event query overhead, and BigQuery is not designed for real-time point lookups. Option C is wrong because while Cloud Memorystore (Redis) provides low-latency lookups, it requires managing a separate cache and does not natively integrate with the daily-updated Cloud Storage file; the pattern also lacks the automatic refresh mechanism that side inputs provide. Option D is wrong because writing events to BigQuery and using scheduled queries for batch joins introduces significant latency (minutes to hours), which violates the near real-time requirement for the recommendation engine.

401

MCQmedium

A media company ingests video files from partners via a REST API. Files are stored in Cloud Storage, and metadata is written to Firestore. A Cloud Function is triggered on object finalize to transcode video using Transcoder API. Sometimes, the function fails because the file is still being uploaded when triggered. How should this be fixed?

A.Implement a Cloud Composer workflow to poll for file existence.

B.Require partners to use resumable uploads.

C.Increase the Cloud Functions timeout to allow time for the upload to finish.

D.Use Cloud Pub/Sub notifications for Cloud Storage and trigger the function from the subscription.

AnswerD

Pub/Sub notifications are sent after object finalization.

Why this answer

Option D is correct because Cloud Storage object finalize notifications are sent only after the entire file has been written and committed. By using Pub/Sub notifications for Cloud Storage and triggering the Cloud Function from the subscription, you decouple the trigger from the upload process, ensuring the function only runs when the file is fully available. This eliminates the race condition where the function is triggered before the upload completes.

Exam trap

The trap here is that candidates assume 'object finalize' means the upload is complete, but in practice, the event can fire before the upload is fully committed, leading to the misconception that increasing timeouts or changing upload methods will fix the issue.

How to eliminate wrong answers

Option A is wrong because implementing a Cloud Composer workflow to poll for file existence adds unnecessary complexity, latency, and cost; polling is an inefficient solution compared to event-driven triggers. Option B is wrong because requiring partners to use resumable uploads does not change the fact that the Cloud Function is triggered on object finalize before the upload is fully committed; resumable uploads affect the upload mechanism, not the timing of the finalize event. Option C is wrong because increasing the Cloud Functions timeout does not address the root cause—the function is triggered prematurely; the function will still fail if the file is incomplete, regardless of how long it runs.

402

MCQhard

A multinational e-commerce company runs a real-time recommendation system. The architecture: user click events are sent via HTTP to a Cloud Run service, which publishes them to a Cloud Pub/Sub topic. A Dataflow streaming pipeline reads from the subscription, joins with user profile data from Firestore, computes recommendations using a TensorFlow model (loaded as a side input), and writes results to a Redis cache (Memorystore) for low-latency serving. The pipeline is deployed in us-central1. Recently, the team noticed that recommendation latency has increased from 50ms to 500ms, and the pipeline's backlog is growing. The Dataflow monitoring shows high CPU utilization on workers, and the SystemLag metric is 2 minutes and increasing. The Redis cluster shows no performance issues. The Firestore queries are within normal latency. The team suspects the TensorFlow model inference is the bottleneck. The model is a large neural network (500MB) loaded in each worker's memory. The pipeline uses 10 n1-standard-4 workers. The pipeline is using Dataflow's streaming engine. The team wants to reduce latency without increasing cost significantly. What should they do?

A.Increase the number of workers by adding a secondary worker group with preemptible VMs.

B.Switch to a batch pipeline that runs every minute to reduce frequency of inference.

C.Increase the machine type of workers to n1-highmem-8 to provide more memory for the model.

D.Remove the model side input and call Cloud Run for inference using a separate service.

AnswerA

More workers parallelize inference, preemptible VMs keep cost low.

Why this answer

Option A is correct because adding preemptible VMs as a secondary worker group allows horizontal scaling at lower cost, distributing the TensorFlow model inference load across more workers. This reduces CPU utilization per worker and decreases the SystemLag without significantly increasing cost, as preemptible VMs are much cheaper than regular instances. The bottleneck is CPU-bound model inference, not memory, so more workers directly address the high CPU utilization and growing backlog.

Exam trap

The trap here is that candidates assume a memory issue (option C) because the model is large, but the real bottleneck is CPU utilization from repeated inference, not memory exhaustion.

How to eliminate wrong answers

Option B is wrong because switching to a batch pipeline would increase latency (from seconds to minutes) and is unsuitable for real-time recommendations; the team needs low-latency streaming, not batch. Option C is wrong because the issue is high CPU utilization, not memory pressure; the model is 500MB and n1-standard-4 has 15GB RAM, which is sufficient, so increasing memory does not address the CPU bottleneck and increases cost unnecessarily. Option D is wrong because removing the model side input and calling Cloud Run for inference adds network latency and cost per request, likely worsening latency and increasing cost, and does not leverage Dataflow's in-memory model loading for efficiency.

403

Matchingmedium

Match each data storage term to its characteristic.

Concepts

Matches

Atomicity, Consistency, Isolation, Durability

Basically Available, Soft state, Eventual consistency

Consistency, Availability, Partition tolerance trade-off

Horizontal partitioning of data across databases

Why these pairings

Fundamental concepts in data storage systems.

404

MCQhard

Your company runs a batch data processing pipeline using Cloud Dataproc and Cloud Composer. The pipeline processes hundreds of terabytes of data daily. Recently, the pipeline has been failing intermittently due to Dataproc cluster creation errors: 'Insufficient resources to create cluster in zone us-central1-f.' The project has a global quota of 1000 vCPUs for Compute Engine. The team usually uses n2-standard-8 (8 vCPU) worker nodes. You notice that the error occurs during peak usage times. You need to ensure the pipeline runs reliably without increasing the global quota. Which action should you take?

A.Increase the global Compute Engine quota to 2000 vCPUs

B.Switch to using preemptible VMs only, which have higher availability

C.Use fewer workers with larger machine types, such as n2-standard-64

D.Configure the Dataproc cluster to use multiple zones via the --zone argument with a zonal list

AnswerD

Spreading across zones avoids zonal capacity issues.

Why this answer

Option D is correct because configuring the Dataproc cluster to use multiple zones via the `--zone` argument with a zonal list distributes worker node creation across several zones in the same region. This avoids the 'Insufficient resources' error by not exhausting capacity in a single zone, without requiring a global quota increase. Cloud Dataproc supports specifying a comma-separated list of zones, and the service will attempt to create the cluster in the first available zone.

Exam trap

The trap here is that candidates often assume the only solution to resource exhaustion is to increase quotas or switch to preemptible VMs, overlooking the zonal distribution feature that directly addresses the 'Insufficient resources' error without changing the global quota.

How to eliminate wrong answers

Option A is wrong because the question explicitly states you must not increase the global quota; raising it to 2000 vCPUs would violate that constraint and does not address the zonal resource exhaustion issue. Option B is wrong because preemptible VMs have lower availability (they can be reclaimed at any time) and are not suitable as the only worker type for a reliable production pipeline processing hundreds of terabytes daily; they also do not solve the zone-specific capacity shortage. Option C is wrong because using fewer workers with larger machine types (e.g., n2-standard-64) does not reduce the total vCPU count required for the workload; it may even increase the risk of hitting the global quota per cluster creation request and does not mitigate the zonal resource exhaustion.

405

MCQhard

You are designing a system to serve predictions from a large language model (LLM) with a latency SLO of 500ms. The model does not fit on a single GPU and requires model parallelism. You are considering using Vertex AI Endpoints with a custom container. What additional setup is required to achieve the latency target?

A.Compile the model using TensorFlow XLA to optimize for single GPU execution.

B.Deploy the model across multiple endpoints and use a load balancer to send requests to different parts of the model.

C.Use Vertex AI Prediction as a service for LLMs, which automatically handles hardware selection.

D.Use a machine type with multiple GPUs and configure the container to use tensor parallelism.

AnswerD

Leveraging multiple GPUs on one node via model parallelism (e.g., tensor parallelism) is the standard approach to fit large models and meet latency.

Why this answer

Model parallelism across multiple GPUs on a single machine can be handled by the container using libraries like TensorFlow Distribution Strategies. Sharding across endpoints would incur network latency. Using TPUs is an alternative but not necessarily required.

The key is to configure multi-GPU in the machine type.

406

Matchingmedium

Match each Google Cloud monitoring/logging service to its function.

Concepts

Matches

Metrics and alerting for cloud resources

Centralized log storage and analysis

Aggregates and analyzes application errors

Records administrative and data access activities

Why these pairings

Services for observability and compliance.

407

MCQmedium

A data pipeline uses Cloud Pub/Sub to ingest events and Cloud Functions to transform and write to BigQuery. The system is experiencing data loss during Pub/Sub subscription outages. Which design change improves reliability?

A.Use Dataflow with at-least-once delivery and checkpointing

B.Use a pull subscription with a custom app that polls frequently

C.Use long ack deadlines to keep messages in the subscription

D.Increase the timeout in Cloud Functions

AnswerA

Dataflow provides exactly-once semantics with checkpointing to prevent data loss.

Why this answer

Dataflow with at-least-once delivery and checkpointing ensures that messages are not lost during Pub/Sub subscription outages because Dataflow tracks processing progress via checkpoints and can replay unacknowledged messages from the last checkpoint. This decouples the processing from the subscription's transient failures, providing fault-tolerant, exactly-once or at-least-once semantics depending on the sink.

Exam trap

Google Cloud often tests the misconception that increasing timeouts or ack deadlines alone can prevent data loss, when in reality they only delay the inevitable loss without a replay mechanism like checkpointing or a persistent buffer.

How to eliminate wrong answers

Option B is wrong because a pull subscription with a custom app that polls frequently does not inherently provide reliability during subscription outages; the app would still lose messages if the subscription itself is unavailable or if the app fails to acknowledge before the ack deadline. Option C is wrong because long ack deadlines only keep messages in the subscription for a longer time, but they do not prevent data loss if the subscriber crashes or the subscription becomes unavailable; messages can still be dropped if the deadline expires without ack. Option D is wrong because increasing the timeout in Cloud Functions does not address data loss from subscription outages; it only allows the function to run longer before timing out, but does not provide replay or checkpointing mechanisms.

408

MCQeasy

A company needs to deploy a trained model for real-time predictions with low latency. Which Vertex AI resource should they use?

A.Cloud TPU

B.Vertex AI Batch Prediction

C.Vertex AI Endpoints

D.Cloud Run

AnswerC

Endpoints provide real-time model serving with low latency.

Why this answer

Vertex AI Endpoints are designed for online prediction, providing a managed service that hosts models for real-time inference with low latency. They automatically scale resources and handle traffic routing, making them the correct choice for deploying a trained model that needs to respond to individual prediction requests quickly.

Exam trap

Google Cloud often tests the distinction between batch and online prediction, and the trap here is that candidates confuse Vertex AI Batch Prediction (which is for offline, large-scale inference) with the real-time serving capability of Vertex AI Endpoints, leading them to select option B.

How to eliminate wrong answers

Option A is wrong because Cloud TPUs are specialized hardware accelerators for training and batch inference, not a deployment service for real-time predictions; they require manual management and are not designed for low-latency serving of individual requests. Option B is wrong because Vertex AI Batch Prediction is intended for asynchronous, high-throughput predictions on large datasets, not for real-time, low-latency responses; it processes jobs in batches and returns results to a storage location. Option D is wrong because Cloud Run is a serverless compute platform for containerized applications, but it lacks the native model hosting, versioning, and traffic splitting capabilities that Vertex AI Endpoints provide for machine learning models.

409

Matchingmedium

Match each data encryption concept to its description.

Concepts

Matches

Customer-supplied encryption key

Customer-managed encryption key via Cloud KMS

CSEK: keys provided by customer; CMEK: keys managed in Cloud KMS

Data encrypted while moving across networks

Why these pairings

Encryption options in Google Cloud.

410

Multi-Selecteasy

Your company is evaluating managed messaging services for a new event-driven application. The application requires pub/sub semantics, high throughput (millions of messages per second), and integration with Google Cloud services like Cloud Functions and Dataflow. Which TWO services should you consider? (Choose two.)

Select 2 answers

A.Cloud Pub/Sub

B.Cloud Scheduler

C.Cloud Pub/Sub Lite

D.Cloud Functions

E.Cloud Tasks

AnswersA, C

Cloud Pub/Sub is a fully managed, scalable pub/sub messaging service with native Google Cloud integration.

Why this answer

Cloud Pub/Sub (A) is the correct choice because it provides fully managed, highly scalable pub/sub messaging with at-least-once delivery by default (exactly-once delivery can be enabled on pull subscriptions) and support for millions of messages per second. It integrates natively with Cloud Functions and Dataflow, making it ideal for event-driven architectures requiring high throughput and decoupled communication.

Exam trap

Google Cloud often tests the distinction between managed messaging services (Pub/Sub vs. Pub/Sub Lite) and other Google Cloud services like Cloud Tasks or Cloud Scheduler, where candidates mistakenly select compute or scheduling services for messaging needs.

411

MCQhard

You are a data engineer at a financial services company. You manage a batch pipeline that processes daily trade settlement reports. The pipeline runs on Cloud Dataproc using PySpark jobs triggered by Cloud Composer (Airflow). Recent trades have increased by 3x, and the pipeline now frequently fails with 'OutOfMemoryError' in the executor logs. You have already increased the executor memory from 4g to 8g, but the problem persists. The cluster uses standard worker nodes (n1-standard-4) with 15 GB RAM per node. You need to make the pipeline stable and cost-efficient. What should you do?

A.Use n1-highmem-4 instances for the cluster to get 26 GB RAM per node and increase executor memory to 12g.

B.Migrate the PySpark jobs to Cloud Dataflow with the Apache Beam SDK to benefit from auto-scaling.

C.Increase the number of executors and reduce the executor memory to 4g, then add preemptible secondary workers to lower cost.

D.Enable cluster autoscaling and set minimum to 5 workers, maximum to 20 workers.

AnswerC

Adding more executor instances distributes memory and reduces per-executor load; preemptible workers lower costs.

Why this answer

Option C is correct because the OutOfMemoryError persists even after increasing executor memory to 8g, indicating that the issue is not simply insufficient memory per executor but rather that the total memory across all executors is insufficient for the 3x data volume. By increasing the number of executors (parallelism) and reducing executor memory back to 4g, you distribute the data processing load across more JVMs, reducing the memory pressure per executor. Adding preemptible secondary workers lowers cost while providing the additional compute capacity needed to handle the increased data volume efficiently.
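
The trade-off between executor size and executor count can be checked with some rough arithmetic. The sketch below assumes Spark's default memory overhead rule (10% of executor memory, with a 384 MiB floor) and a hypothetical 12 GiB of YARN-allocatable memory on a 15 GB node; the real figure depends on the cluster configuration.

```python
def container_mib(executor_memory_mib: int,
                  overhead_factor: float = 0.10,
                  min_overhead_mib: int = 384) -> int:
    """YARN container size for one executor: heap plus memory overhead."""
    overhead = max(min_overhead_mib, int(executor_memory_mib * overhead_factor))
    return executor_memory_mib + overhead


def executors_per_node(yarn_memory_mib: int, executor_memory_mib: int) -> int:
    """How many executor containers fit into one node's YARN memory."""
    return yarn_memory_mib // container_mib(executor_memory_mib)


# Assumption: 12 GiB of a 15 GB n1-standard-4 node is available to YARN.
YARN_MEMORY_MIB = 12 * 1024

for heap_gib in (4, 8):
    heap_mib = heap_gib * 1024
    print(f"{heap_gib}g executors: container {container_mib(heap_mib)} MiB, "
          f"{executors_per_node(YARN_MEMORY_MIB, heap_mib)} per node")
```

Under these assumptions an 8g executor needs a 9011 MiB container, so only one fits per node, while two 4g executors (4505 MiB each) fit in the same space.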

Exam trap

Google Cloud often tests the misconception that increasing executor memory alone solves OutOfMemoryErrors, when in reality the issue is often insufficient parallelism or misconfigured memory overhead, and the correct solution involves balancing executor count, memory, and cost-efficient instance types like preemptible VMs.

How to eliminate wrong answers

Option A is wrong because simply using n1-highmem-4 instances with 26 GB RAM and increasing executor memory to 12g does not address the root cause—the pipeline needs more parallelism, not just more memory per executor; the OutOfMemoryError can still occur if the data skew or shuffle operations overwhelm a single executor. Option B is wrong because migrating to Cloud Dataflow with Apache Beam SDK is a significant architectural change that does not directly solve the memory issue; Dataflow auto-scaling can help with throughput but does not guarantee stability if the pipeline's memory configuration is fundamentally misaligned with the data volume. Option D is wrong because enabling cluster autoscaling with a minimum of 5 workers and maximum of 20 workers does not address the executor memory configuration; autoscaling adds nodes but if the executor memory is still too high per node (e.g., 8g on a 15 GB node), the system may still run out of memory due to overhead from the OS, YARN, and other daemons, and it does not optimize cost by using preemptible instances.

412

MCQmedium

A Dataflow pipeline reads events from Pub/Sub and transforms them. Some events contain invalid product IDs that should be filtered out. The list of valid product IDs is stored in a frequently updated BigQuery table. What is the best approach to filter out invalid events?

A.Read the BigQuery table as a side input and refresh it periodically using a global window with a periodic trigger

B.Use a Combine.PerKey to group by product ID and then filter

C.Use a custom pipeline option to read the valid IDs at startup and cache them

D.Use a ParDo with a side input that is a MapSideInput of valid IDs, and refresh it on each element

AnswerA

This approach allows the side input to be updated without restarting the pipeline, and the trigger ensures periodic refresh.

Why this answer

Option A is correct because reading the BigQuery table as a side input with a global window and periodic trigger allows the pipeline to refresh the list of valid product IDs at a configurable interval without reprocessing the entire stream. This pattern is idiomatic for Beam/Dataflow when the reference data changes frequently and must be kept reasonably current while maintaining low latency for streaming events.
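
The refresh-at-an-interval idea can be illustrated outside Beam. The following is a plain-Python stand-in, not Beam code: `RefreshingLookup`, its `loader` callback and the 60-second interval are hypothetical names chosen for this sketch, and in a real pipeline the refresh would be driven by a trigger on the side-input PCollection rather than by a clock check.

```python
import time
from typing import Callable, Iterable, Optional, Set


class RefreshingLookup:
    """Keeps a set of valid IDs and reloads it at most once per interval."""

    def __init__(self, loader: Callable[[], Iterable[str]],
                 refresh_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader              # stands in for the BigQuery read
        self._refresh_seconds = refresh_seconds
        self._clock = clock                # injectable so it can be tested
        self._loaded_at: Optional[float] = None
        self._values: Set[str] = set()

    def _maybe_refresh(self) -> None:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._refresh_seconds:
            self._values = set(self._loader())
            self._loaded_at = now

    def is_valid(self, product_id: str) -> bool:
        self._maybe_refresh()
        return product_id in self._values


lookup = RefreshingLookup(lambda: {"p1", "p2"}, refresh_seconds=60)
events = [{"product_id": "p1"}, {"product_id": "bad"}]
kept = [e for e in events if lookup.is_valid(e["product_id"])]
```

The point of the design is that the lookup cost is paid once per interval rather than once per element, which is what makes Option D's per-element refresh unattractive.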

Exam trap

Google Cloud often tests the misconception that side inputs are static or that per-element refresh is feasible, leading candidates to choose Option D, but in reality side inputs are materialized once per window/trigger and cannot be efficiently updated per element.

How to eliminate wrong answers

Option B is wrong because Combine.PerKey is designed for aggregating values per key (e.g., summing counts), not for filtering based on an external lookup; it would not incorporate the BigQuery table at all. Option C is wrong because custom pipeline options are evaluated at pipeline construction time and cannot be updated during pipeline execution, so the cached list would become stale as soon as the BigQuery table is updated. Option D is wrong because refreshing the side input on each element would cause excessive BigQuery read operations, leading to high latency and cost; MapSideInput is read-only once materialized and does not support per-element refresh.

413

MCQhard

A company needs to process sensitive data in BigQuery with column-level security. They want to allow analysts to see aggregated data but not individual records. What approach?

A.Use table-level access controls

B.Use column-level access controls with masking

C.Use authorized views with aggregation functions

D.Use Cloud Data Loss Prevention to de-identify data

AnswerC

Authorized views can present aggregated data while hiding raw details.

Why this answer

Option C is correct because authorized views in BigQuery allow you to define SQL queries that aggregate data (e.g., using SUM, COUNT, AVG) and expose only the aggregated results to analysts, while hiding individual records. This approach enforces column-level security by granting access to the view rather than the underlying table, ensuring analysts cannot query the raw data directly. It meets the requirement of seeing aggregated data without seeing individual records, leveraging BigQuery's native authorization and SQL capabilities.
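
The idea of exposing only aggregates through a view can be sketched with Python's built-in `sqlite3` module. This is an analogy under stated assumptions: SQLite has no access control, so the sketch shows only the shape of the view. The table and column names (`transactions`, `regional_totals`, and so on) are hypothetical, and in BigQuery the view would additionally be authorized on the source dataset while analysts receive access to the view only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (customer_id TEXT, region TEXT, amount REAL);
INSERT INTO transactions VALUES
  ('c1', 'EMEA', 100.0),
  ('c2', 'EMEA', 50.0),
  ('c3', 'APAC', 75.0);

-- The view exposes per-region aggregates and no customer-level columns.
CREATE VIEW regional_totals AS
  SELECT region, COUNT(*) AS txn_count, SUM(amount) AS total_amount
  FROM transactions
  GROUP BY region;
""")

rows = conn.execute(
    "SELECT region, txn_count, total_amount FROM regional_totals ORDER BY region"
).fetchall()
print(rows)
```

An analyst who can query only `regional_totals` sees one row per region and never a `customer_id`.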

Exam trap

Google Cloud often tests the distinction between column-level masking (which still allows row-level access) and authorized views (which enforce aggregation at the query level), leading candidates to pick B because they confuse masking with aggregation-based security.

How to eliminate wrong answers

Option A is wrong because table-level access controls grant access to entire tables, which would allow analysts to see individual records, not just aggregated data, violating the requirement. Option B is wrong because column-level access controls with masking can hide specific column values (e.g., by replacing them with NULL or a mask), but they still allow analysts to query individual rows and see non-masked columns, potentially exposing record-level details; they do not inherently restrict access to only aggregated results. Option D is wrong because Cloud Data Loss Prevention (DLP) is used for de-identifying data at rest or in transit (e.g., via inspection and transformation jobs), but it does not provide real-time, query-level aggregation controls within BigQuery; analysts would still have access to the underlying de-identified table, which could contain individual records.

414

MCQeasy

A team is deploying a model on AI Platform Prediction. They want to monitor for data drift to maintain model quality. Which service should they use?

A.Cloud DLP

B.AI Platform Continuous Evaluation

C.Cloud Monitoring

D.Cloud Audit Logs

AnswerB

This service provides monitoring for model predictions and drift analysis.

Why this answer

Option B is correct because AI Platform Continuous Evaluation is designed to monitor model performance and detect data drift. Option A (Cloud DLP) is for discovering and de-identifying sensitive data. Option C (Cloud Monitoring) is for infrastructure metrics. Option D (Cloud Audit Logs) is for API activity.

415

MCQeasy

A company needs to stream data from a fleet of IoT devices to BigQuery for near-real-time analytics. The data volume is unpredictable and can spike during certain events. Which Google Cloud service should be used as the ingestion point to handle variable throughput with minimal operational overhead?

A.Cloud Datastore

B.Cloud Functions

C.Cloud Storage

D.Cloud Pub/Sub

AnswerD

Cloud Pub/Sub ingests variable-volume data and decouples producers from consumers.

Why this answer

Cloud Pub/Sub is the correct choice because it is a fully managed, scalable messaging service designed to decouple data producers from consumers, handling unpredictable and spiky throughput without requiring manual scaling. It can ingest millions of messages per second and buffer them until BigQuery is ready to consume, ensuring near-real-time analytics with minimal operational overhead.

Exam trap

Google Cloud often tests the misconception that Cloud Functions can serve as a direct ingestion point for streaming data, but candidates overlook that Cloud Functions lacks durable buffering and automatic scaling for high-throughput spikes, making Pub/Sub the correct decoupling layer.

How to eliminate wrong answers

Option A is wrong because Cloud Datastore is a NoSQL document database for storing structured data, not a streaming ingestion service; it cannot handle variable-throughput message ingestion or buffer spikes. Option B is wrong because Cloud Functions is a serverless compute platform for event-driven code execution, not a durable ingestion buffer; it lacks built-in buffering and would require custom scaling logic to handle throughput spikes. Option C is wrong because Cloud Storage is an object storage service for batch data, not designed for near-real-time streaming ingestion; it introduces latency and requires additional components (e.g., Cloud Functions or Pub/Sub notifications) to trigger downstream processing.

416

MCQeasy

Based on the exhibit, what is the most likely cause of the out-of-memory error?

A.The BigQuery output table schema does not match the transformed data, causing write failures.

B.The Pub/Sub subscription is not acknowledging messages quickly enough, causing a backlog.

C.The worker machine type has insufficient memory for the message size and throughput.

D.The fixed window duration of 1 minute is too short, causing excessive state overhead.

AnswerC

Large messages (50 KB) and high throughput (1000/sec) require more memory; n1-standard-4 may be undersized.

Why this answer

The out-of-memory error in a Dataflow pipeline is most likely caused by the worker machine type having insufficient memory for the message size and throughput. When messages are large or the throughput is high, each worker must hold data in memory for processing, windowing, and shuffling. If the worker's memory is too small, the JVM heap runs out of memory, leading to an OOM error.
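
A back-of-the-envelope estimate shows why the figures quoted in the answer matter. The sketch below takes the 50 KB message size, the 1000 messages per second and the 1-minute window from this question, and assumes as a worst case that every raw message in a window is buffered; the deserialized in-memory size can be larger than the wire size.

```python
MESSAGE_KB = 50
MESSAGES_PER_SECOND = 1000
WINDOW_SECONDS = 60


def window_bytes(message_kb: int, messages_per_second: int,
                 window_seconds: int) -> int:
    """Raw payload bytes arriving during one window (1 KB taken as 1024 bytes)."""
    return message_kb * 1024 * messages_per_second * window_seconds


total = window_bytes(MESSAGE_KB, MESSAGES_PER_SECOND, WINDOW_SECONDS)
gib = total / 1024 ** 3
print(f"{total} bytes per window, about {gib:.2f} GiB")
```

That is roughly 2.86 GiB of raw payload per window across the whole job, before any object overhead, shuffling or retries, which is a meaningful fraction of the memory on a small worker pool.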

Exam trap

Google Cloud often tests the misconception that OOM errors are caused by schema mismatches or Pub/Sub backlogs, but the real cause is almost always insufficient worker memory for the data volume.

How to eliminate wrong answers

Option A is wrong because a schema mismatch between the BigQuery output table and the transformed data would cause write failures or errors in the BigQuery IO connector, not an out-of-memory error on the worker. Option B is wrong because a Pub/Sub subscription not acknowledging messages quickly enough would cause a backlog and increase unacknowledged message count, but it would not directly cause an out-of-memory error on the Dataflow worker; the pipeline would still process messages at its own pace, and the backlog would be in Pub/Sub, not in worker memory. Option D is wrong because a fixed window duration of 1 minute being too short would increase state overhead only if the pipeline uses stateful processing or triggers that accumulate state across windows; for a simple streaming pipeline, shorter windows actually reduce the amount of data held in memory per window, not cause OOM.

417

MCQmedium

A company stores IoT sensor data in BigQuery. Queries that filter on a timestamp column and a device_id column are slow even though the table is partitioned by day. What should the data engineer do to improve query performance?

A.Increase the partition size to monthly

B.Switch to ingestion-time partitioning instead of column-based

C.Enable automatic query rewriting with BI Engine

D.Cluster the table on device_id

AnswerD

Clustering organizes data within partitions, improving filter performance.

Why this answer

Clustering on device_id organizes the data within each day partition by device_id, allowing BigQuery to prune blocks during queries that filter on that column. This reduces the amount of data scanned and improves query performance without changing the partitioning scheme. Partitioning alone only limits scans by time range; clustering adds intra-partition sorting for non-time-based filters.
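
Block pruning can be pictured with a toy model. The sketch below is not how BigQuery stores data internally; it only assumes that each storage block records the minimum and maximum `device_id` it contains, so a block can be skipped when the filter value falls outside that range. The device names and block size are made up.

```python
from typing import List


def make_blocks(device_ids: List[str], block_size: int) -> List[List[str]]:
    """Split rows of one day partition into fixed-size storage blocks."""
    return [device_ids[i:i + block_size]
            for i in range(0, len(device_ids), block_size)]


def blocks_scanned(blocks: List[List[str]], wanted: str) -> int:
    """Count blocks whose [min, max] range could contain the wanted ID."""
    return sum(1 for block in blocks if min(block) <= wanted <= max(block))


# One day partition: 8 devices, 4 rows each, arriving interleaved.
unclustered = [f"dev-{i % 8}" for i in range(32)]
clustered = sorted(unclustered)  # same rows, ordered by device_id

print(blocks_scanned(make_blocks(unclustered, 8), "dev-5"))
print(blocks_scanned(make_blocks(clustered, 8), "dev-5"))
```

In this toy partition the interleaved layout forces all 4 blocks to be read for `dev-5`, while the sorted layout needs only 1.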

Exam trap

Google Cloud often tests the distinction between partitioning (which prunes by time) and clustering (which prunes by non-time columns), and the trap here is assuming that partitioning alone is sufficient for all filter columns, leading candidates to choose an option that changes the partition strategy rather than adding clustering.

How to eliminate wrong answers

Option A is wrong because increasing partition size to monthly would reduce the number of partitions, making each partition larger and actually increasing the data scanned for queries that filter on a specific day, worsening performance. Option B is wrong because ingestion-time partitioning is equivalent to partitioning on a pseudo-column (_PARTITIONTIME) and does not address the need to optimize filtering on device_id; it would not improve performance for queries filtering on device_id. Option C is wrong because BI Engine accelerates sub-second queries on small to medium datasets by caching data in memory, but it does not reduce the amount of data scanned for large tables or optimize filtering on device_id; it is designed for interactive analytics, not for improving slow queries due to full table scans.

418

MCQmedium

A data scientist needs to provide explanations for each prediction made by a deployed AutoML model to comply with regulatory requirements. Which Vertex AI feature should they use?

A.Vertex AI Model Monitoring

B.Vertex AI Vizier

C.Vertex AI Explainable AI

D.Vertex AI Feature Store

AnswerC

Provides per-prediction explanations.

Why this answer

Vertex AI Explainable AI is the correct feature because it provides feature attributions and explanations for each prediction, enabling compliance with regulatory requirements that demand interpretability. It uses techniques like Shapley value approximations or integrated gradients to quantify the contribution of each input feature to the model's output, which is essential for auditing and transparency in deployed AutoML models.

Exam trap

Google Cloud often tests the distinction between monitoring (detecting drift) and explaining (interpreting predictions), so candidates mistakenly choose Model Monitoring when the question explicitly asks for per-prediction explanations for regulatory compliance.

How to eliminate wrong answers

Option A is wrong because Vertex AI Model Monitoring is designed to detect prediction drift, data drift, and feature skew over time, not to provide per-prediction explanations. Option B is wrong because Vertex AI Vizier is a hyperparameter tuning and optimization service that helps find the best model architecture or parameters, not a tool for explaining individual predictions. Option D is wrong because Vertex AI Feature Store is a centralized repository for storing, serving, and sharing feature data, but it does not generate explanations for model predictions.

419

MCQmedium

You need to automate retraining of a model when new training data becomes available every week. The training pipeline runs on Vertex AI Pipelines and is triggered by Cloud Composer. After retraining, you want to evaluate the new model against a golden dataset. If the model's accuracy improves by at least 1%, it should be automatically deployed to the staging endpoint. What is the best way to implement the decision logic?

A.Use Cloud Functions to compare metrics and call the endpoint if conditions are met.

B.Add a conditional step in the Vertex AI Pipeline to evaluate the model and deploy if the accuracy improvement threshold is met.

C.After training, run a batch prediction job on the golden dataset and compare metrics manually.

D.Use Vertex AI Experiments to log metrics and set up an alert to manually deploy.

AnswerB

Pipelines can include a condition step to check metrics and decide deployment.

Why this answer

Option B is correct because Vertex AI Pipelines supports conditional execution natively via the KFP SDK's `dsl.Condition` construct, allowing you to evaluate the new model's accuracy against the golden dataset within the same pipeline and deploy only if the improvement threshold (≥1%) is met. This approach keeps the entire retraining, evaluation, and deployment workflow automated, auditable, and tightly coupled within a single orchestrated pipeline, avoiding external triggers or manual steps.
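
The decision itself is a one-line comparison, whichever pipeline construct wraps it. The sketch below is plain Python rather than pipeline code; `should_deploy` is a hypothetical helper, and it assumes "improves by at least 1%" means one absolute percentage point of accuracy, which the question does not spell out.

```python
def should_deploy(new_accuracy: float, baseline_accuracy: float,
                  min_improvement: float = 0.01) -> bool:
    """Return True when the new model beats the baseline by the threshold.

    Accuracies are fractions in [0, 1]. The difference is rounded to six
    decimals so float noise does not flip results near the threshold.
    """
    return round(new_accuracy - baseline_accuracy, 6) >= min_improvement


print(should_deploy(0.92, 0.90))   # improvement of 2 points
print(should_deploy(0.905, 0.90))  # improvement of 0.5 points
```

In a pipeline, the boolean (or the two metrics) would be the output of the evaluation step that the conditional deployment step inspects.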

Exam trap

Google Cloud often tests the misconception that external services like Cloud Functions are needed for decision logic, when in fact Vertex AI Pipelines' native conditional steps are the simpler, more integrated, and recommended approach for automated model evaluation and deployment within a pipeline.

How to eliminate wrong answers

Option A is wrong because Cloud Functions would introduce an external, event-driven component that adds latency, complexity, and potential failure points; Vertex AI Pipelines already provides built-in conditional logic for this exact use case, making an extra function unnecessary. Option C is wrong because running a batch prediction job and manually comparing metrics defeats the automation goal and introduces human error and delay, which is not suitable for a weekly retraining cadence. Option D is wrong because Vertex AI Experiments is designed for tracking and comparing experiments, not for automated decision-making or deployment; relying on alerts for manual deployment contradicts the requirement for automatic retraining and deployment.

420

Multi-Selectmedium

A company wants to implement model monitoring for a deployed classification model. Which three types of monitoring should they set up? (Select 3)

Select 3 answers

A.Infrastructure cost monitoring

B.Training-serving skew

C.Prediction drift

D.Input feature drift

E.Model version comparison

AnswersB, C, D

Skew detection identifies differences between training and serving data.

Why this answer

Vertex AI Model Monitoring covers input feature drift, prediction drift, and training-serving skew. Cost monitoring and version comparison are not part of model monitoring.
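
Drift and skew checks both come down to comparing two distributions of the same feature. As a minimal sketch, the code below computes the L-infinity distance (the largest per-category difference in frequency) between a baseline sample and a serving sample of a categorical feature. The function names and sample data are hypothetical, and any alerting threshold would have to be chosen per feature.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def distribution(values: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Relative frequency of each category; expects a non-empty sample."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def l_infinity_distance(baseline: Sequence[Hashable],
                        current: Sequence[Hashable]) -> float:
    """Largest absolute difference in frequency across all categories."""
    p, q = distribution(baseline), distribution(current)
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


training = ["card"] * 50 + ["wire"] * 50
serving = ["card"] * 80 + ["wire"] * 20
score = l_infinity_distance(training, serving)
print(round(score, 3))
```

Comparing training data against serving data corresponds to a skew check, while comparing serving data from two time periods corresponds to a drift check.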

421

Multi-Selecteasy

A data team uses Cloud Composer to orchestrate Airflow DAGs. They need to ensure that a downstream task runs only if at least two out of three upstream sensor tasks succeed. Which TWO configurations should they combine?

Select 2 answers

A.Set trigger_rule to 'none_failed_or_skipped' and use a condition.

B.Set trigger_rule to 'one_success'.

C.Set trigger_rule to 'all_done'.

D.Set trigger_rule to 'none_failed'.

E.Use a PythonOperator to check the number of successes.

AnswersA, E

Combined with a condition, this ensures at least two succeeded.

Why this answer

Option A is correct because the 'none_failed_or_skipped' trigger rule (renamed 'none_failed_min_one_success' in later Airflow 2 releases) runs the downstream task once no upstream task has failed and at least one has succeeded. On its own that guarantees only a single success, so it is combined with a condition, for example a PythonOperator (Option E) or BranchPythonOperator that counts how many of the three sensor tasks succeeded and proceeds only when at least two did. This approach uses Airflow's built-in trigger rules together with conditional logic to implement a quorum-based dependency.
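
The counting logic that the Python task would run is small. The sketch below is plain Python, not an Airflow DAG: `quorum_met` is a hypothetical helper, and it assumes the three sensor states have already been collected as Airflow's state strings such as "success", "failed" and "skipped".

```python
from typing import Iterable


def quorum_met(upstream_states: Iterable[str], required: int = 2) -> bool:
    """True when at least `required` upstream tasks finished in 'success'."""
    successes = sum(1 for state in upstream_states if state == "success")
    return successes >= required


print(quorum_met(["success", "success", "skipped"]))
print(quorum_met(["success", "skipped", "skipped"]))
```

Inside a DAG, the callable would look up the task instance states from the run context and either raise or branch when the quorum is not met.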

Exam trap

Google Cloud often tests the misconception that a single trigger rule like 'one_success' or 'none_failed' can directly enforce a quorum condition, when in fact you must combine a trigger rule with explicit conditional logic to count successes.

422

MCQhard

A data science team uses Vertex AI Pipelines to automate retraining. They want to ensure that only models with performance above a threshold are deployed. Which component should they add to the pipeline?

A.Vertex AI Feature Store

B.Vertex AI Model Evaluation

C.Cloud Build trigger

D.Cloud Monitoring alert

AnswerB

Evaluates model and can block deployment if threshold not met.

Why this answer

Vertex AI Model Evaluation provides built-in evaluation metrics and threshold-based validation that can be used as a pipeline condition to gate model deployment. By adding a Model Evaluation component, the pipeline can compare model performance against a predefined threshold and only proceed to deploy if the metrics (e.g., AUC, precision, recall) meet or exceed the required value.

Exam trap

The trap here is that candidates may confuse monitoring (Cloud Monitoring) or feature management (Feature Store) with the evaluation step needed to gate deployment, but only Model Evaluation provides the threshold-based conditional logic within the pipeline itself.

How to eliminate wrong answers

Option A is wrong because Vertex AI Feature Store is a centralized repository for storing, serving, and sharing feature data, not for evaluating model performance or enforcing deployment thresholds. Option C is wrong because Cloud Build trigger is used to automate builds and tests of source code, not to evaluate trained model metrics within a Vertex AI Pipeline. Option D is wrong because Cloud Monitoring alert is designed to notify operators about system or application anomalies, not to serve as a pipeline gate that conditionally deploys models based on evaluation results.

423

MCQeasy

The push endpoint is returning 500 errors. What is the most likely cause?

A.The push endpoint requires authentication but none is set

B.The topic has no messages

C.The push endpoint is not a valid HTTPS URL

D.The ack deadline is too short

AnswerA

If the endpoint expects an Authorization header, requests without it will fail with 500 or 401.

Why this answer

The push endpoint likely requires authentication, but none is configured, causing the 500 errors.

424

MCQmedium

A financial services company deploys a regression model to predict loan default risk. The model is served using Vertex AI Endpoints with autoscaling. After deployment, latency increases significantly during peak hours, causing timeouts. The model uses scikit-learn and has a large feature set. Which action should the team take to reduce latency while maintaining prediction accuracy?

A.Switch to batch prediction for all requests.

B.Increase the minimum number of replicas in the endpoint to handle peak load.

C.Increase the memory allocation for the serving container.

D.Apply feature selection to reduce the number of input features.

AnswerD

Reducing features decreases model size and inference time.

Why this answer

Option D is correct because the latency spike is caused by the large feature set, which increases the time for preprocessing and inference in the scikit-learn model. Reducing the number of input features via feature selection directly decreases the computational load per request, lowering latency without sacrificing accuracy if the selected features retain predictive power. This addresses the root cause, unlike scaling or resource changes that only mask the symptom.

Exam trap

The trap here is that candidates often confuse scaling solutions (increasing replicas or memory) with performance optimization, but the question specifically asks for reducing latency per request, which requires addressing the computational bottleneck—feature reduction—rather than adding more resources.

How to eliminate wrong answers

Option A is wrong because switching to batch prediction does not reduce per-request latency; it processes requests asynchronously in bulk, which is unsuitable for real-time serving and would still cause timeouts during peak hours. Option B is wrong because increasing the minimum number of replicas only adds more instances to handle concurrent requests, but each individual request still suffers from the same high latency due to the large feature set—autoscaling already adds replicas under load, so this does not fix the per-request processing time. Option C is wrong because increasing memory allocation for the serving container helps with out-of-memory errors but does not reduce the CPU-bound computation time required to process a large feature set; the bottleneck is compute, not memory.

425

MCQeasy

Your company uses Cloud Dataflow to process streaming data from Pub/Sub. The pipeline occasionally fails with a 'worker terminated unexpectedly' error. What is the most likely cause of this error?

A.Insufficient memory per worker causing OOM errors

B.Incorrect VPC firewall rules blocking internal communication

C.Staging location bucket lacks write permissions

D.Pub/Sub subscription throughput quota exceeded

AnswerA

OOM errors cause workers to terminate unexpectedly.

Why this answer

The 'worker terminated unexpectedly' error in Cloud Dataflow typically indicates that a worker process ran out of memory (OOM) and was killed by the operating system. This occurs when the pipeline's memory requirements exceed the configured worker machine type's memory capacity, often due to large windowing accumulations, skewed data, or inefficient state handling.

Exam trap

Google Cloud often tests the distinction between infrastructure-level errors (like OOM) and configuration or permission errors, so candidates may incorrectly attribute the generic 'worker terminated' message to network or IAM issues rather than resource exhaustion.

How to eliminate wrong answers

Option B is wrong because VPC firewall rules blocking internal communication would cause connectivity errors like 'unable to connect to shuffle service' or 'worker cannot reach Dataflow service', not a generic termination error. Option C is wrong because staging location bucket lacking write permissions would cause a pipeline submission failure with a permission denied error, not a runtime worker termination. Option D is wrong because Pub/Sub subscription throughput quota exceeded would result in Pub/Sub-specific errors such as 'RESOURCE_EXHAUSTED' or backlog buildup, not a worker termination.

426

MCQeasy

A company trains a custom model using TensorFlow and wants to deploy it to Vertex AI for low-latency predictions. The model is large (2 GB). Which deployment option should they choose?

A.Use Vertex AI Batch Prediction job

B.Deploy as a Cloud Function

C.Deploy to Vertex AI Endpoint with a custom container

D.Deploy to Cloud Run with minimum instances

AnswerC

Custom containers allow large models.

Why this answer

Option C is correct because deploying a large (2 GB) model to Vertex AI Endpoint with a custom container allows you to package the model, its dependencies, and a serving framework (e.g., TensorFlow Serving) into a Docker image. This approach supports low-latency predictions by keeping the model loaded in memory across requests, and it can scale to handle real-time inference traffic, unlike batch or serverless options that have cold-start or size limitations.

Exam trap

Google Cloud often tests the misconception that Cloud Run or Cloud Functions can handle large models for real-time inference, ignoring their size limits, cold-start latency, and lack of native Vertex AI integration for model management and scaling.

How to eliminate wrong answers

Option A is wrong because Vertex AI Batch Prediction is designed for asynchronous, high-throughput processing of large datasets, not for low-latency real-time predictions; it processes jobs in batches and does not maintain a persistent endpoint. Option B is wrong because Cloud Functions is built for small, short-lived event handlers: 1st gen functions cap the uncompressed deployment at 500 MB and default to a 60-second timeout (9 minutes maximum), so a 2 GB model does not fit and would also suffer cold starts. Option D is wrong because Cloud Run, although it can serve containers with request timeouts of up to 60 minutes, lacks native integration with Vertex AI's model registry and optimized serving infrastructure, and it may incur cold-start latency even with minimum instances.

427

MCQhard

A company runs a real-time fraud detection model using Cloud Dataflow for streaming inference. The model is updated every hour with new training data. The team wants to minimize downtime and ensure that both old and new model versions are available during the update. Which deployment strategy should they use?

A.A/B testing: route a small percentage of traffic to the new model and compare performance.

B.Rolling deployment: gradually replace instances of the old model with the new model.

C.Blue/green deployment: deploy the new model to a separate endpoint, then switch all traffic at once.

D.Canary deployment: deploy the new model alongside the old one, gradually increase traffic to the new model while monitoring.

AnswerD

Canary deployment ensures both versions are available and traffic is shifted gradually, minimizing downtime and risk.

Why this answer

Canary deployment is the correct strategy because it allows the new model to be deployed alongside the old one, with traffic gradually shifted to the new version while monitoring for errors or performance degradation. This minimizes downtime and ensures both versions are available during the update, which is critical for a real-time fraud detection system where continuous availability and risk mitigation are paramount.
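
Gradual traffic shifting can be sketched as a deterministic split on a request key. This is an illustration only: `route` and the percentages are hypothetical, and hashing the request ID is just one way to make the split stable, so that a request sent to the new model at 10% is still sent there at 50%.

```python
import hashlib


def route(request_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent of request IDs to the new model."""
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 99]
    return "new" if bucket < canary_percent else "old"


# Raise the share step by step while monitoring error rates between steps.
for percent in (5, 25, 50, 100):
    sent_to_new = sum(route(f"txn-{i}", percent) == "new" for i in range(1000))
    print(percent, sent_to_new)
```

Both versions keep serving throughout the rollout, and rolling back is a matter of setting the percentage back to 0.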

Exam trap

The trap here is that candidates confuse A/B testing (a statistical evaluation method) with canary deployment (a release strategy), or assume blue/green deployment is always best for zero-downtime updates without considering the requirement for gradual traffic shifting and availability of both versions during the update.

How to eliminate wrong answers

Option A is wrong because A/B testing is a statistical method for comparing model performance, not a deployment strategy for minimizing downtime or ensuring availability during updates. Option B is wrong because rolling deployment gradually replaces instances, which can cause a brief period where only the new model is available, violating the requirement that both old and new versions be available during the update. Option C is wrong because blue/green deployment switches all traffic at once after the new model is deployed, which introduces a cutover risk and does not allow gradual traffic shifting or monitoring during the transition.

428

Multi-Selecteasy

A company is developing a streaming Dataflow pipeline to process real-time sensor data. To ensure data quality, the team wants to detect malformed records and late data. Which two practices should they implement? (Choose two.)

Select 2 answers

A.Use Beam’s PAssert to validate each element in the pipeline.

B.Enable Dataflow’s built-in schema validation on the PCollection.

C.Configure a dead letter queue for unprocessable records.

D.Use Cloud Monitoring alerting on Dataflow system lag metric.

E.Run a separate batch pipeline to re-process data for validation.

AnswersC, D

A dead letter queue stores malformed records for later analysis, ensuring no data is silently lost.

Why this answer

Option C is correct because a dead letter queue (DLQ) is a standard pattern in streaming pipelines for isolating malformed or unprocessable records without blocking the main data flow. In Dataflow, this is typically implemented by writing bad records to a separate output (e.g., a Pub/Sub topic or Cloud Storage bucket) for later analysis or reprocessing. Option D is correct because the Dataflow system lag metric in Cloud Monitoring measures the time between when data enters the pipeline and when it is processed, making it an effective way to detect late data and trigger alerts for SLA violations.

Exam trap

Google Cloud often tests the misconception that PAssert can be used in production pipelines, when it is strictly a testing utility. Candidates may also assume that Dataflow offers built-in schema validation for arbitrary records; it does not.
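
The dead letter pattern amounts to routing each record to one of two outputs. The sketch below shows that routing logic in plain Python rather than with Beam's API; the `REQUIRED_FIELDS` schema and the `route_record` name are hypothetical, and `raw` is assumed to be a string.

```python
import json

REQUIRED_FIELDS = ("sensor_id", "timestamp", "value")  # hypothetical schema


def route_record(raw):
    """Return ("main", record) for valid input, ("dead_letter", info) otherwise."""
    try:
        record = json.loads(raw)
        if not isinstance(record, dict):
            raise ValueError("record is not a JSON object")
        missing = [f for f in REQUIRED_FIELDS if f not in record]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        return "main", record
    except ValueError as err:  # json.JSONDecodeError subclasses ValueError
        # Keep the raw payload and the reason so the record can be inspected later.
        return "dead_letter", {"raw": raw, "error": str(err)}
```

In a real pipeline the two labels would typically map to a main output and an additional tagged output written to a separate sink.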

429

MCQhard

A company processes financial transactions using Cloud Dataflow. They need to ensure that late-arriving data is handled correctly for fraud detection. The pipeline uses event time processing. Which approach should they use to handle late data?

A.Sliding windows with early firing

B.Session windows with gap duration

C.Fixed windows with allowed lateness

D.Global windows with triggers

AnswerC

Allowed lateness includes late events in the correct window.

Why this answer

Option C is correct because fixed windows with allowed lateness are the standard approach in Cloud Dataflow (Apache Beam) for handling late-arriving data in event-time processing. By specifying an allowed lateness duration, the pipeline retains the window state for that period, allowing late events to be correctly assigned to their original window and triggering recomputation of results. This ensures fraud detection pipelines can account for delayed transactions without missing or misordering data.

Exam trap

Google Cloud often tests the misconception that sliding or session windows inherently handle late data, when in fact only explicit allowed lateness (or a similar mechanism) retains window state long enough to accept late-arriving events.

How to eliminate wrong answers

Option A is wrong because sliding windows with early firing are designed to produce speculative results before the window closes, not to handle late-arriving data; early firing does not extend the window to accept late events. Option B is wrong because session windows with gap duration are used to group events into sessions based on inactivity gaps, not to manage late data; they do not provide a mechanism to accept events that arrive after the session has closed. Option D is wrong because global windows with triggers are typically used for unbounded aggregations where all data belongs to a single window, but they do not naturally handle late-arriving data within specific time boundaries required for fraud detection; they lack the per-window lateness cutoff that fixed windows offer.
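
To make the role of allowed lateness concrete, here is a simplified model of how an element is treated relative to the watermark. It is a sketch under stated assumptions (times are integer seconds, windows are aligned to zero, and the function names are illustrative), not Beam's actual implementation.

```python
def window_start(event_time, size):
    """Start of the fixed window (in seconds) that contains event_time."""
    return event_time - (event_time % size)


def classify(event_time, watermark, size=60, allowed_lateness=120):
    """Classify an element as on-time, late-but-accepted, or dropped."""
    end = window_start(event_time, size) + size
    if watermark < end:
        return "on_time"
    if watermark < end + allowed_lateness:
        return "late_accepted"  # window state is still retained; result is refined
    return "dropped"  # window has expired and its state was discarded
```

For example, with 60-second windows an event at t=125 belongs to the window [120, 180); it is still accepted while the watermark is below 300.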

430

MCQhard

An e-commerce company deploys a recommendation model on Vertex AI Endpoints. The endpoint receives a high volume of requests with a large payload. They notice high latency and occasional timeouts. Which action should they take to improve performance without sacrificing accuracy?

A.Enable request batching on the endpoint

B.Switch to a smaller machine type

C.Reduce the model size by pruning

D.Increase the number of replicas

AnswerA

Batching improves throughput by combining requests, reducing overhead and latency without affecting model accuracy.

Why this answer

Enabling request batching on the Vertex AI endpoint allows multiple inference requests to be grouped into a single prediction call, reducing per-request overhead and improving throughput. This directly addresses high latency and timeouts caused by a high volume of large payloads without altering the model or its accuracy.

Exam trap

Google Cloud often tests the misconception that scaling replicas or reducing model size is the default fix for latency, but the trap here is that batching addresses throughput without sacrificing accuracy, whereas pruning or smaller machines would degrade performance or accuracy.

How to eliminate wrong answers

Option B is wrong because switching to a smaller machine type reduces compute resources, which would increase latency and worsen timeouts under high request volume. Option C is wrong because reducing model size by pruning can degrade prediction accuracy, which the question explicitly states must not be sacrificed. Option D is wrong because increasing the number of replicas adds cost and may not resolve timeouts if the bottleneck is per-request processing overhead rather than concurrency limits.

431

MCQeasy

A data engineer needs to automatically delete objects from a Cloud Storage bucket after 30 days and archive them to nearline storage after 7 days. Which configuration should they use?

A.Set a lifecycle rule to SetStorageClass to nearline after 30 days only

B.Set a lifecycle rule to delete objects after 7 days only

C.Set a lifecycle rule to SetStorageClass to nearline after 7 days and delete after 30 days

D.Set a lifecycle rule to delete objects after 7 days and SetStorageClass to nearline after 30 days

AnswerC

Correct: archive after 7 days, delete after 30.

Why this answer

Option C is correct because it implements a lifecycle rule that first transitions objects to Nearline storage after 7 days (reducing costs for infrequently accessed data) and then deletes them after 30 days. This matches the requirement to archive after 7 days and delete after 30 days, using the `SetStorageClass` and `Delete` actions in the correct chronological order.

Exam trap

Google Cloud often tests the order of lifecycle actions: candidates mistakenly think deletion should come before archiving, but the correct sequence is to archive first (to reduce cost) and delete later, as objects cannot be archived after deletion.

How to eliminate wrong answers

Option A is wrong because it only sets the storage class to Nearline after 30 days, missing the deletion requirement entirely and incorrectly archiving after 30 days instead of 7. Option B is wrong because it only deletes objects after 7 days, ignoring the archive-to-Nearline step and deleting data too early. Option D is wrong because it reverses the order: it deletes objects after 7 days (before they can be archived) and then attempts to set storage class to Nearline after 30 days, which is impossible since the objects are already deleted.
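
A lifecycle configuration with both rules might look like the following. The snippet builds the JSON document in Python so it can be inspected or written to a file; the variable name is arbitrary.

```python
import json

# Two rules: move to Nearline at 7 days of age, delete at 30 days of age.
lifecycle_config = {
    "rule": [
        {
            "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
            "condition": {"age": 7},
        },
        {
            "action": {"type": "Delete"},
            "condition": {"age": 30},
        },
    ]
}

print(json.dumps(lifecycle_config, indent=2))
```

The printed JSON could then be saved to a file and applied with a command such as `gsutil lifecycle set FILE gs://BUCKET`, where FILE and BUCKET are placeholders.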

432

MCQmedium

A company wants to automate model retraining and deployment whenever new training data becomes available. Which service should be used to orchestrate the end-to-end workflow?

A.Cloud Build

B.Vertex AI Pipelines

C.Cloud Scheduler

D.Cloud Composer

AnswerB

Designed for ML pipeline orchestration with prebuilt components.

Why this answer

Vertex AI Pipelines is the correct choice because it is a managed service specifically designed to orchestrate and automate end-to-end ML workflows, including model retraining and deployment triggered by new data. It allows you to define pipelines as a directed acyclic graph (DAG) of steps using the Kubeflow Pipelines SDK or pre-built components, and it integrates natively with other Vertex AI services for training, evaluation, and deployment.

Exam trap

The trap here is that candidates often confuse Cloud Composer (a general-purpose Airflow service) with Vertex AI Pipelines, but the exam expects you to recognize that Vertex AI Pipelines is the ML-specific, fully managed solution for end-to-end ML workflow orchestration, while Cloud Composer requires more manual setup and lacks native Vertex AI integration.

How to eliminate wrong answers

Option A is wrong because Cloud Build is a CI/CD service focused on building, testing, and deploying software artifacts (e.g., container images), not on orchestrating ML workflows with steps like data validation, model training, and deployment. Option C is wrong because Cloud Scheduler is a cron job service that triggers actions on a time-based schedule, not on the event of new training data becoming available, and it lacks the workflow orchestration capabilities needed for complex ML pipelines. Option D is wrong because Cloud Composer is a managed Apache Airflow service that can orchestrate workflows, but it is a general-purpose workflow orchestrator, not purpose-built for ML pipelines; Vertex AI Pipelines provides tighter integration with Vertex AI components, managed execution, and artifact tracking, making it the more appropriate choice for this specific ML automation scenario.

433

MCQhard

You are optimizing a BigQuery query that runs on a large table (hundreds of TB). The table is partitioned by date and frequently queried with filters on a specific customer_id column and date range. Queries are slow even after partitioning. Which optimization should you apply?

A.Increase the number of BigQuery slots

B.Columnar clustering on customer_id

C.Create materialized views for each customer

D.Denormalize the table to reduce joins

AnswerB

Clustering sorts data within each partition by customer_id, enabling block pruning for queries filtering on that column.

Why this answer

Clustering on customer_id sorts the data within each date partition, so BigQuery can prune blocks that do not contain the requested customer. Partitioning alone only prunes on the date filter. Materialized views can help pre-aggregated queries but not ad hoc customer_id filters.

Denormalization reduces joins but does not reduce the data scanned for these filters. Increasing slots adds cost and does not address how the data is laid out.
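
As an illustration, a partitioned and clustered copy of the table could be created with DDL along these lines. The helper below only builds the statement text; the table names and the `event_ts` column are hypothetical, and `event_ts` is assumed to be a TIMESTAMP.

```python
def clustered_table_ddl(source, target, partition_col, cluster_col):
    """Build a CREATE TABLE ... PARTITION BY ... CLUSTER BY statement."""
    return (
        f"CREATE TABLE `{target}`\n"
        f"PARTITION BY DATE({partition_col})\n"
        f"CLUSTER BY {cluster_col}\n"
        f"AS SELECT * FROM `{source}`"
    )


ddl = clustered_table_ddl(
    source="mydataset.transactions",             # hypothetical existing table
    target="mydataset.transactions_clustered",   # hypothetical new table
    partition_col="event_ts",
    cluster_col="customer_id",
)
print(ddl)
```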

434

MCQeasy

A company uses Dataflow to process streaming data from Pub/Sub. They notice increased processing latency. What is the most likely cause?

A.Insufficient workers

B.Pub/Sub subscription issue

C.Too many shards

D.Wrong machine type

AnswerA

Insufficient workers create backpressure and increased latency as the pipeline cannot keep up with throughput.

Why this answer

In Dataflow, processing latency increases most commonly due to insufficient workers, as the streaming pipeline cannot keep up with the incoming data rate when the number of Compute Engine instances is too low. This causes backpressure from Pub/Sub, leading to growing unacknowledged messages and higher end-to-end latency. Autoscaling may be delayed or limited by max worker count settings, making manual or configuration-based worker scaling the primary corrective action.

Exam trap

Google Cloud often tests the misconception that Pub/Sub subscription issues (like ack deadline) are the primary cause of latency, but the trap here is that latency in Dataflow is almost always a worker scaling problem, not a Pub/Sub configuration issue.

How to eliminate wrong answers

Option B is wrong because a Pub/Sub subscription issue (e.g., expired pull request or misconfigured ack deadline) would cause message delivery failures or duplicates, not a gradual increase in processing latency across the pipeline. Option C is wrong because too many shards (i.e., excessive parallelism) can cause overhead but typically leads to underutilization or increased cost, not increased latency; latency from too many shards is rare and usually secondary to worker count. Option D is wrong because the wrong machine type (e.g., low CPU or memory) could degrade per-worker performance, but the most likely and direct cause of increased latency in a streaming Dataflow job is insufficient worker count, not machine type, as Dataflow’s autoscaling primarily adjusts worker count rather than machine type.

435

MCQhard

A company runs a critical real-time data pipeline using Dataflow that ingests events from Cloud Pub/Sub, performs aggregations using sliding windows, and writes results to BigQuery. The pipeline is deployed in us-central1. The pipeline's latency has increased recently, and the Dataflow monitoring shows that the 'system lag' metric is consistently above 5 minutes. The pipeline is using Streaming Engine and has 10 workers with 4 vCPUs each. The pipeline processes approximately 100,000 events per second. The team has verified that the source Pub/Sub topic has sufficient publish throughput and the BigQuery table has no quota issues. The pipeline logs show that some workers are experiencing GC overhead limit exceeded errors. The pipeline code uses stateful processing with a custom keyed state for deduplication. What is the most likely cause of the increased latency?

A.The number of workers is insufficient; increasing to 20 workers will reduce latency.

B.The stateful processing is causing large state sizes that lead to GC overhead; use a more efficient state backend or increase worker memory.

C.The sliding window duration is too long; reducing it to 1 minute will improve performance.

D.The deduplication logic is causing a bottleneck; removing it will reduce latency.

AnswerB

GC overhead indicates memory pressure from large state; handling state more efficiently or increasing worker memory can help.

Why this answer

The GC overhead limit exceeded errors indicate that workers are spending too much time garbage collecting, which is a classic symptom of excessive heap memory usage. Stateful processing with custom keyed state for deduplication can cause large per-key state sizes, especially with sliding windows that maintain overlapping state for each key. This forces the JVM to constantly garbage collect, increasing system lag beyond 5 minutes.

Handling state more efficiently (for example, reducing per-key state size, expiring old entries, or relying on Dataflow's built-in deduplication) or increasing worker memory directly addresses the root cause.

Exam trap

Google Cloud often tests the misconception that scaling workers (Option A) is the universal fix for latency, when in reality memory-related issues like GC overhead require tuning state management or worker resources, not just parallelism.

How to eliminate wrong answers

Option A is wrong because increasing the number of workers does not fix the GC overhead issue; it may even worsen it by distributing state across more workers without reducing per-worker memory pressure. Option C is wrong because reducing the sliding window duration does not address the state size or GC problem; it could actually increase the number of overlapping windows and state churn. Option D is wrong because removing deduplication would compromise data correctness; the bottleneck is not the logic itself but the memory footprint of the state, which can be mitigated without removing the feature.
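
One way to keep deduplication state from growing without bound is to expire entries after a fixed time. The class below is a plain-Python sketch of that idea, not Beam's state API; the class name and the `ttl` parameter are assumptions, and `now` is assumed to be non-decreasing across calls.

```python
from collections import OrderedDict


class TtlDeduper:
    """Remember event IDs for ttl seconds, evicting older entries."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._seen = OrderedDict()  # event_id -> first-seen time, oldest first

    def is_duplicate(self, event_id, now):
        # Evict expired entries so state size is bounded by the TTL window.
        while self._seen:
            _, seen_at = next(iter(self._seen.items()))
            if now - seen_at < self.ttl:
                break
            self._seen.popitem(last=False)
        if event_id in self._seen:
            return True
        self._seen[event_id] = now
        return False

    def __len__(self):
        return len(self._seen)
```

In a streaming pipeline the same effect is usually achieved by pairing per-key state with a timer that clears it.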

436

MCQmedium

A company is deploying a large-scale streaming application on Google Kubernetes Engine. They need to ensure the application can handle sudden traffic spikes without dropping data. Which architectural pattern is most appropriate?

A.Implement custom retry logic with exponential backoff in the application.

B.Use Cloud SQL as a temporary buffer and process from there.

C.Pre-provision 3x the expected peak capacity to handle spikes.

D.Use a Pub/Sub topic as a buffer and autoscale consumer pods based on Pub/Sub subscription backlog.

AnswerD

Pub/Sub provides a highly scalable buffer; autoscaling consumers based on backlog ensures capacity matches demand.

Why this answer

Option D is correct because using a Pub/Sub buffer decouples producers from consumers, allowing consumers to autoscale to handle spikes. Option C is wasteful and not dynamically scalable. Option B uses Cloud SQL, which is not designed for high-throughput buffering.

Option A only addresses retries, not overall throughput capacity.
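
Scaling consumers on backlog reduces to a simple calculation: divide the number of undelivered messages by the backlog each pod should handle, then clamp to the allowed range. The function below is an illustrative sketch of that arithmetic; the parameter names and default limits are assumptions, and in practice an autoscaler fed by the subscription's undelivered-message metric would perform this step.

```python
import math


def desired_replicas(backlog, target_per_pod, min_replicas=1, max_replicas=50):
    """Replica count so each pod handles about target_per_pod backlog messages."""
    wanted = math.ceil(backlog / target_per_pod) if backlog > 0 else min_replicas
    # Clamp to the configured bounds so spikes cannot scale without limit.
    return max(min_replicas, min(max_replicas, wanted))
```

With a target of 100 messages per pod, a backlog of 950 messages yields 10 replicas, while an empty backlog falls back to the minimum.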

437

MCQhard

A company's Dataflow pipeline uses the PubSubIO source to read messages and writes to BigQuery via the BigQueryIO sink. The pipeline is running in Streaming mode with exactly-once semantics enabled. Occasionally, duplicate rows appear in BigQuery. What is the most likely reason?

A.The user-provided record ID for deduplication in BigQuery's streaming inserts is not being set for all messages, leading to duplicate rows.

B.The pipeline is using the WriteResult method with WRITE_APPEND in batch mode, which can cause duplicates if retries happen.

C.The pipeline is experiencing the 'dataflow streaming log processing' bug, causing duplicate logs to be written.

D.The PubSubIO source is configured with a dead-letter queue and messages are being redelivered without proper deduplication.

AnswerA

BigQueryIO uses insertId for deduplication; if it's missing or inconsistent, duplicates can occur.

Why this answer

In Dataflow streaming pipelines with exactly-once semantics, BigQuery's streaming inserts use user-provided record IDs for deduplication. If the record ID is not set for all messages, BigQuery cannot identify duplicates, and retries or redeliveries from Pub/Sub can result in duplicate rows. This is the most common cause of duplicates in this scenario.

Exam trap

Google Cloud often tests the misconception that exactly-once semantics in Dataflow automatically deduplicates at the sink, but in reality, BigQuery requires explicit user-provided record IDs for deduplication during streaming inserts.

How to eliminate wrong answers

Option B is wrong because WRITE_APPEND in batch mode is not relevant to a streaming pipeline with exactly-once semantics; the question specifies streaming mode, and batch mode duplicates would not explain streaming-specific behavior. Option C is wrong because there is no known 'dataflow streaming log processing' bug that causes duplicate logs; this is a fabricated term. Option D is wrong because a dead-letter queue handles failed messages after retries are exhausted, not redelivery; Pub/Sub redelivery without deduplication is already addressed by the user-provided record ID mechanism, and the dead-letter queue does not cause duplicates.
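
For deduplication to work, a retried record must carry the same ID as the original attempt. One way to get that is to derive the ID from the record's business key, as in this sketch; the `key_fields` names are hypothetical and the function is not part of any BigQuery client library.

```python
import hashlib
import json


def insert_id(record, key_fields=("transaction_id", "event_time")):
    """Derive a stable ID so a retried record maps to the same value."""
    # Only the key fields are hashed, so unrelated fields do not change the ID.
    key = json.dumps([record[f] for f in key_fields], default=str)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:32]
```

A randomly generated ID per attempt would defeat the purpose, because each retry would look like a new row.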

438

MCQmedium

A company has a trained model stored in Vertex AI Model Registry. They want to automate retraining when new training data arrives in Cloud Storage. Which approach is most efficient?

A.Use Cloud Functions triggered by Cloud Storage events to start a Vertex AI Training job

B.Use Dataflow to continuously update the model

C.Use Cloud Scheduler to trigger a Cloud Build retraining step

D.Schedule a weekly Cloud Composer DAG to check for new data and retrain

AnswerA

Cloud Functions provide real-time event-driven triggers to initiate retraining immediately when new data appears.

Why this answer

Cloud Functions can be directly triggered by Cloud Storage events (e.g., object finalize) to start a Vertex AI training job via the Vertex AI API. This creates an event-driven, serverless pipeline that retrains the model immediately when new data arrives, without polling or manual intervention, making it the most efficient and cost-effective approach.

Exam trap

Google Cloud often tests the distinction between event-driven (Cloud Functions) and scheduled (Cloud Scheduler, Cloud Composer) approaches, and candidates mistakenly choose a scheduled option thinking it is simpler, missing the requirement for immediate reaction to new data.

How to eliminate wrong answers

Option B is wrong because Dataflow is a stream/batch data processing service for transforming data, not for orchestrating model retraining; it would require custom code to trigger training and lacks native integration with Vertex AI Model Registry. Option C is wrong because Cloud Scheduler triggers jobs on a fixed schedule, not on data arrival events, so it cannot react to new data in real time and may waste resources on unnecessary retraining. Option D is wrong because a weekly Cloud Composer DAG introduces latency (up to a week) and operational overhead for a simple event-driven task, and it is less efficient than a serverless function that fires instantly on data arrival.
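
The event-driven trigger can be sketched as a small handler that reads the bucket and object name from the storage event and hands the resulting URI to a job-submission function. This is a hedged outline only: `submit_training_job` is a hypothetical callable supplied by the caller (the actual Vertex AI client call is deliberately left out), and the `training-data/` prefix is an assumption.

```python
def on_new_training_data(event, submit_training_job, prefix="training-data/"):
    """Start retraining when a new object lands under the training prefix."""
    bucket = event["bucket"]
    name = event["name"]
    if not name.startswith(prefix):
        return None  # ignore objects that are not training data
    uri = f"gs://{bucket}/{name}"
    # Delegate to the injected function, which would call the training service.
    return submit_training_job(uri)
```

Filtering on a prefix avoids launching a job for every unrelated file written to the same bucket.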

439

Multi-Selecthard

Which THREE actions reduce the cost of a Cloud Composer environment?

Select 3 answers

A.Delete old and unused DAG files to reduce scheduler load

B.Use standard network tier instead of premium

C.Set up a maintenance window to shut down the environment during idle hours

D.Use a smaller environment size (e.g., small instead of medium)

E.Increase the number of schedulers for higher throughput

AnswersA, C, D

Less load means fewer resources needed.

Why this answer

Option A is correct because deleting old and unused DAG files reduces the number of DAGs the scheduler must parse and evaluate. The Cloud Composer scheduler re-parses the files in the DAG folder periodically; fewer DAG files mean lower CPU and memory consumption, reducing the compute resources the environment needs.

Exam trap

The trap here is that candidates confuse scaling up (Option E) with cost optimization, not realizing that adding schedulers increases resource consumption and cost, while the correct cost-saving actions involve reducing resource usage or shutting down idle capacity.

440

MCQhard

A company uses Vertex AI Pipelines to orchestrate ML workflows. They want to automatically retrain the model when new data arrives, but only if the model's performance drops below a threshold. Which approach is best?

A.Use BigQuery scheduled queries to trigger pipeline

B.Trigger a pipeline on a schedule

C.Use Vertex AI Model Monitor to detect skew and trigger retraining

D.Use Cloud Functions to evaluate performance and trigger pipeline

AnswerC

Model Monitor can detect performance degradation and automatically trigger retraining pipelines.

Why this answer

Option C is correct because Vertex AI Model Monitoring can detect skew and drift, and can trigger retraining workflows automatically. Option B is wrong because scheduled triggers do not consider performance metrics. Option D is wrong because Cloud Functions would require custom logic to evaluate performance, which is more complex than using built-in monitoring.

Option A is wrong because BigQuery scheduled queries are not integrated with model performance monitoring.

441

MCQhard

A healthcare company processes patient data using a Dataflow pipeline that reads from Cloud Storage, transforms data, and writes to BigQuery. They need to ensure that the processing is idempotent to handle failures and retries without duplicating records. The data arrives in daily batches and may be re-delivered if earlier processing failed. What approach should they take to guarantee exactly-once processing in BigQuery?

A.Use BigQuery's streaming inserts with InsertId to deduplicate

B.Ingest data via Pub/Sub and use a Dataflow pipeline with exactly-once processing

C.Use Dataflow's built-in exactly-once semantics and write to BigQuery via load jobs

D.Write data to a staging BigQuery table, then use a MERGE statement to upsert into the final table

AnswerD

MERGE ensures idempotency by matching on unique keys.

Why this answer

Option D is correct because BigQuery load jobs are not idempotent by default; if a load job is retried, it can create duplicate rows. By writing to a staging table first and then using a MERGE statement (or an INSERT that filters with WHERE NOT EXISTS) to upsert into the final table, you can deduplicate based on a unique key. This approach guarantees exactly-once semantics even when the same batch is re-delivered, as the MERGE operation will only insert rows that do not already exist in the target table.

Exam trap

The trap here is that candidates often assume Dataflow's exactly-once semantics automatically extend to the sink (BigQuery), but in reality, BigQuery load jobs are not idempotent, so you must implement a deduplication strategy like staging + MERGE to guarantee exactly-once processing.

How to eliminate wrong answers

Option A is wrong because BigQuery streaming inserts with InsertId provide best-effort deduplication within the streaming buffer, but duplicates can still occur if the InsertId is reused after the deduplication window (typically a few minutes) or if the insert fails and is retried with a different InsertId. Option B is wrong because Pub/Sub with Dataflow's exactly-once processing ensures that each message is processed exactly once within the pipeline, but it does not guarantee idempotent writes to BigQuery; if the pipeline fails after writing to BigQuery but before acknowledging the message, a retry could cause duplicate rows. Option C is wrong because Dataflow's built-in exactly-once semantics apply to the pipeline's internal state and shuffle operations, but BigQuery load jobs are not idempotent; if a load job is retried (e.g., due to a worker failure), the same data can be loaded multiple times, resulting in duplicates.
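
The staging-plus-MERGE approach can be illustrated with a statement of the following shape, alongside a small in-memory emulation that shows why replaying a batch is harmless. The table names and the `record_id` key are hypothetical. Note one difference: the emulation also skips duplicate keys within a single staging batch, which the SQL shown would not do on its own.

```python
MERGE_SQL = """
MERGE `project.dataset.patients` AS target
USING `project.dataset.patients_staging` AS source
ON target.record_id = source.record_id
WHEN NOT MATCHED THEN
  INSERT ROW
"""


def merge_into(final_rows, staging_rows, key="record_id"):
    """Emulate WHEN NOT MATCHED THEN INSERT: add only rows whose key is new."""
    existing = {row[key] for row in final_rows}
    for row in staging_rows:
        if row[key] not in existing:
            final_rows.append(row)
            existing.add(row[key])
    return final_rows
```

Running `merge_into` twice with the same staging batch leaves the final table unchanged after the first run, which is the idempotency the question asks for.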

442

MCQmedium

What is the most likely cause of data duplication after this command?

A.The Pub/Sub source is not exactly-once.

B.The pipeline uses at-least-once semantics.

C.The snapshot was taken before scaling.

D.The BigQuery sink is not idempotent.

AnswerD

If the sink is not idempotent, duplicate data can be written when workers are re-added or when job state is replayed.

Why this answer

Option D is correct because BigQuery sinks in Dataflow are not idempotent by default; if the pipeline retries writes (e.g., due to worker failures or checkpoint issues), duplicate rows can be inserted into the BigQuery table. This is a known limitation: BigQuery does not support deduplication at the sink level unless you implement custom deduplication logic or use a staging table with merge operations. The command likely triggered a retry scenario, and the non-idempotent sink caused the duplication.

Exam trap

Google Cloud often tests the misconception that at-least-once semantics alone cause duplication, but the real trap is that the sink's idempotency (or lack thereof) is the decisive factor when retries occur.

How to eliminate wrong answers

Option A is wrong because Dataflow's Pub/Sub source deduplicates redelivered messages within the pipeline, and the question does not indicate that the source is the cause. Option B is wrong because at-least-once semantics are a pipeline processing mode, not a direct cause of data duplication; they can lead to duplicates if the sink is not idempotent, but the question asks for the 'most likely cause' and the sink's idempotency is the immediate factor. Option C is wrong because taking a snapshot before scaling does not inherently cause data duplication; snapshots preserve pipeline state for resumption, and scaling only affects parallelism, not data integrity.

443

MCQhard

A data engineer is designing a batch ETL pipeline that reads CSV files from Cloud Storage, transforms them using Dataproc, and writes the results to BigQuery. The data volume is expected to grow 10x in the next year. Which design approach best balances cost and performance?

A.Create a single large persistent Dataproc cluster to handle the peak load.

B.Use Cloud Data Fusion to visually design the pipeline and run it on Dataproc.

C.Use a Dataproc cluster with preemptible worker nodes and autoscaling enabled.

D.Migrate the pipeline to Dataflow with Apache Beam and use flexRS for cost savings.

AnswerC

Preemptible VMs are cost-effective, and autoscaling handles growth.

Why this answer

Option C is correct because preemptible worker nodes significantly reduce cost (up to 80% discount) while autoscaling dynamically adjusts cluster size to match the growing workload, ensuring performance without over-provisioning. This combination handles the 10x data growth efficiently by scaling out during peak loads and scaling in during lulls, using preemptible instances for fault-tolerant tasks like transformation.

Exam trap

The trap here is that candidates often choose Dataflow (Option D) assuming it is always the best for cost and performance, but the question specifically involves Dataproc and batch ETL from Cloud Storage to BigQuery, where preemptible nodes with autoscaling provide a more direct and cost-effective solution without requiring a pipeline rewrite.

How to eliminate wrong answers

Option A is wrong because a single large persistent cluster incurs high costs even when idle, and cannot efficiently handle a 10x growth without manual resizing, leading to either underutilization or performance bottlenecks. Option B is wrong because Cloud Data Fusion is a visual design tool that adds complexity and cost (via Dataproc provisioning) without inherent autoscaling or preemptible node benefits, and is not optimized for batch ETL cost control. Option D is wrong because Dataflow with flexRS is designed for batch workloads with flexible scheduling, but it requires rewriting the pipeline in Apache Beam, which adds migration overhead and may not leverage existing Dataproc investments; flexRS offers cost savings but with potential execution delays, making it less balanced for immediate performance needs.

444

Multi-Selecthard

A company wants to implement a robust MLOps lifecycle on Google Cloud. Which THREE components are essential?

Select 3 answers

A.Vertex AI Model Registry for versioning

B.Vertex AI Pipelines for orchestration

C.Pub/Sub for event-driven retraining

D.Cloud Build for CI/CD

E.Cloud SQL for model metadata

AnswersA, B, D

Model Registry centralizes model version management and deployment.

Why this answer

Vertex AI Model Registry is essential for versioning because it provides a centralized repository to track, manage, and deploy different versions of trained ML models. This ensures reproducibility, auditability, and the ability to roll back to previous versions, which is critical for a robust MLOps lifecycle.

Exam trap

The trap here is that candidates may confuse optional supporting services (like Pub/Sub for event triggers or Cloud SQL for metadata) with the essential components required for a robust MLOps lifecycle, which are versioning, orchestration, and CI/CD.

445

MCQhard

A healthcare company deploys a model for diagnosing medical images on Vertex AI using a custom container with a TensorFlow model. The model uses a mixture of GPUs (NVIDIA T4) and CPUs. After deployment, you notice that prediction latency is highly variable: sometimes under 100ms, sometimes over 10 seconds. Investigation shows that the variability correlates with the number of concurrent requests. The endpoint has a min replicas of 1 and max replicas of 3, with target CPU utilization set to 80%. You also observe that GPU utilization remains low (<20%) even during high load. What is the most likely cause of the latency variability?

A.The autoscaling metric (CPU utilization) is not appropriate for a GPU-bound workload; the endpoint does not scale based on GPU utilization.

B.The model is not fully utilizing GPUs due to inefficient data loading from CPU.

C.The container is not configured to use the GPU correctly.

D.The GPU machine type is too small for the model.

AnswerA

Standard autoscaling uses CPU; for GPU workloads, you should use custom metrics like GPU utilization or request count.

Why this answer

Option A is correct because the endpoint scales on CPU utilization, but GPU-bound workloads may have low CPU utilization, so the 80% target is never reached and autoscaling does not trigger. During high load, the single replica is therefore overwhelmed, causing high latency. Option B (inefficient data loading) could contribute but is not the primary cause.

Option C (GPU not configured correctly) would cause continuous errors, not variable latency. Option D (GPU too small) would cause consistently high latency.
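
The scaling argument above can be illustrated with a minimal sketch. It assumes a simple proportional target-tracking rule, which is a simplification and not Vertex AI's actual autoscaling algorithm; the function name `desired_replicas` and the utilization readings are hypothetical values chosen for illustration.

```python
import math


def desired_replicas(current, utilization, target, min_replicas, max_replicas):
    """Proportional target-tracking rule (a simplification, not Vertex AI's algorithm)."""
    wanted = math.ceil(current * utilization / target)
    # Clamp the result to the configured replica bounds.
    return max(min_replicas, min(max_replicas, wanted))


# Scenario from the question: 1 replica, 80% CPU target, at most 3 replicas.
# Assumed reading under heavy load: CPU sits at 35% while requests queue up.
cpu_decision = desired_replicas(1, utilization=0.35, target=0.80,
                                min_replicas=1, max_replicas=3)

# Hypothetical load-based signal: 12 in-flight requests, target of 4 per replica.
load_decision = desired_replicas(1, utilization=12, target=4,
                                 min_replicas=1, max_replicas=3)

print(cpu_decision, load_decision)
```

Under these assumed readings the CPU-based rule stays at one replica while the load-based rule asks for the maximum, which matches the queueing behavior described in the question.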

446

MCQhard

A financial services company needs to process high-frequency trading data with strict ordering guarantees. They use Pub/Sub with ordering keys and Dataflow. The pipeline occasionally produces out-of-order results. What is the most likely cause?

A.Dataflow does not preserve order when using multiple workers

B.Dataflow uses at-least-once processing, which can reorder events

C.Pub/Sub does not guarantee message ordering

D.The window trigger allows late data to be included after the main output

AnswerD

Late data can be emitted in a different pane, causing apparent out-of-order results.

Why this answer

Option D is correct because Dataflow's default window trigger behavior allows late data to arrive after the main pane is emitted. When using Pub/Sub with ordering keys, late-arriving events (e.g., due to network delays or retries) can be assigned to the correct window but emitted in a separate pane, causing the final output to appear out-of-order relative to the event time. This is a known behavior when combining event-time windows with late data handling.
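
The pane behavior described above can be simulated in plain Python. This is a toy model, not Apache Beam code: it assumes 60-second fixed windows, fires an on-time pane once the watermark passes the window end, and emits every later element for that window in its own late pane. The function `run` and the sample arrivals are hypothetical.

```python
from collections import defaultdict

WINDOW = 60  # fixed window size in seconds (assumed for illustration)


def run(arrivals):
    """Simulate pane emission.

    arrivals: list of (event_time, watermark_at_arrival, value) in arrival order.
    Returns emitted panes as (window_start, pane_kind, values).
    """
    buffers = defaultdict(list)
    closed = set()
    panes = []
    for event_time, watermark, value in arrivals:
        start = event_time - event_time % WINDOW
        if start in closed:
            # The on-time pane already fired, so this element forms a late pane.
            panes.append((start, "late", [value]))
        else:
            buffers[start].append(value)
        # Fire the on-time pane for every open window the watermark has passed.
        for w in sorted(buffers):
            if w not in closed and w + WINDOW <= watermark:
                panes.append((w, "on_time", list(buffers[w])))
                closed.add(w)
    return panes


# Element "d" has event time 40 but arrives after window [60, 120) has fired.
panes = run([(5, 10, "a"), (70, 75, "c"), (130, 135, "e"), (40, 140, "d")])
for pane in panes:
    print(pane)
```

In this run the late pane for the window starting at 0 is emitted after the on-time pane for the window starting at 60, so a consumer reading panes in emission order sees event-time results out of order.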

Exam trap

Google Cloud often tests the misconception that Pub/Sub's lack of ordering guarantees is the primary cause of out-of-order results in Dataflow, when in fact the issue is typically the window trigger and late data handling within Dataflow itself.

How to eliminate wrong answers

Option A is wrong because the scenario uses ordering keys and the issue is not multiple workers reordering events: all events for a key are grouped and processed together, so adding workers does not by itself shuffle a key's events. Option B is wrong because at-least-once processing guarantees delivery but does not inherently reorder events; reordering is caused by late data or window triggers, not by the processing semantics alone. Option C is wrong because Pub/Sub does guarantee message ordering when message ordering is enabled on the subscription and messages are published with the same ordering key in the same region; the question states they use ordering keys, so Pub/Sub ordering is not the root cause.

447

MCQeasy

Your company has deployed a machine learning model on Vertex AI Endpoint to serve real-time predictions for a mobile application. The model was trained using TensorFlow and the prediction requests include raw images that are preprocessed by the client before sending. Recently, the application developers reported that the predictions are becoming less accurate over time. They suspect the issue is related to changes in the client-side preprocessing code. You need to verify this hypothesis and monitor for future regressions. What should you do?

A.Retrain the model using the latest client data to adapt to any changes in preprocessing.

B.Roll back to a previous model version that was known to work well and disable automatic retraining.

C.Ask the developers to provide the exact preprocessing code and manually compare it with the training pipeline's preprocessing.

D.Enable Vertex AI Model Monitoring for feature attribution and set up alerting on skew detection.

AnswerD

Model Monitoring can detect training-serving skew by comparing feature distributions; this would catch preprocessing changes effectively.
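
Comparing feature distributions can be sketched as follows. The example uses Jensen-Shannon divergence as the distance, which is one common choice for this kind of check; treating it as what the service computes internally is an assumption here. The histograms and the 0.1 alert threshold are made-up values.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical normalized histograms of one input feature (e.g. pixel intensity bins).
training = [0.1, 0.4, 0.4, 0.1]
serving = [0.4, 0.4, 0.1, 0.1]  # shifted, as a preprocessing change might cause

THRESHOLD = 0.1  # assumed alerting threshold
skew = js_divergence(training, serving)
skew_detected = skew > THRESHOLD
print(f"divergence={skew:.3f} alert={skew_detected}")
```

Identical distributions give a divergence of 0 and fully disjoint ones give 1, so a threshold between those bounds separates normal variation from a shift worth investigating.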

448

MCQeasy

A company is ingesting real-time sensor data from thousands of devices into Cloud Pub/Sub. They need to process this data with low latency (seconds) and exactly-once semantics. Which data processing service should they use?

A.Cloud Run with Pub/Sub push

B.Cloud Functions triggered by Pub/Sub

C.Dataflow streaming with exactly-once processing

D.Dataproc with Spark Streaming

AnswerC

Dataflow provides exactly-once processing for streaming data with low latency, ideal for real-time sensor data.

Why this answer

Dataflow streaming with exactly-once processing is the correct choice because it provides exactly-once semantics for Pub/Sub sources via checkpointing and idempotent sinks, and it meets the low-latency (seconds) requirement through its streaming engine that minimizes per-element overhead. Cloud Dataflow's integration with Pub/Sub ensures that each message is processed exactly once, even in the presence of failures, by using snapshots and consistent state management.

Exam trap

Google Cloud often tests the misconception that serverless services like Cloud Functions or Cloud Run inherently provide exactly-once processing, when in fact they rely on Pub/Sub's at-least-once delivery and require additional logic to achieve exactly-once semantics.

How to eliminate wrong answers

Option A is wrong because Cloud Run with Pub/Sub push does not guarantee exactly-once processing; Pub/Sub push delivery is at-least-once, and Cloud Run's stateless containers cannot enforce exactly-once semantics without external coordination. Option B is wrong because Cloud Functions triggered by Pub/Sub also uses at-least-once delivery from Pub/Sub and lacks built-in mechanisms for exactly-once processing; it is designed for lightweight, event-driven tasks, not for stateful streaming with exactly-once guarantees. Option D is wrong because Dataproc with Spark Streaming provides at-least-once or exactly-once semantics only with additional configuration (e.g., checkpointing and idempotent sinks), but it introduces higher latency (typically seconds to minutes) due to micro-batching and is not optimized for sub-second or low-latency streaming compared to Dataflow's streaming engine.
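
The "additional logic" that at-least-once consumers need can be sketched as an idempotent handler that deduplicates by message ID. This is a minimal in-memory illustration, not a Pub/Sub client: the class `IdempotentHandler` is hypothetical, and a real Cloud Run or Cloud Functions service would need a durable, shared store for the seen IDs.

```python
class IdempotentHandler:
    """Applies each message at most once, keyed by message ID."""

    def __init__(self):
        self.seen_ids = set()  # assumption: stands in for a durable external store
        self.total = 0

    def handle(self, message_id, reading):
        """Return True if the reading was applied, False if it was a redelivery."""
        if message_id in self.seen_ids:
            return False
        self.seen_ids.add(message_id)
        self.total += reading
        return True


handler = IdempotentHandler()
# At-least-once delivery: messages m1 and m2 are each delivered twice.
deliveries = [("m1", 10), ("m2", 5), ("m1", 10), ("m3", 7), ("m2", 5)]
applied = [handler.handle(mid, value) for mid, value in deliveries]
print(handler.total, applied)
```

Without the ID check the redelivered readings would be counted twice, which is the duplicate-processing risk the eliminated options carry.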

449

MCQmedium

A data engineering team uses Cloud Pub/Sub to ingest clickstream events and Cloud Dataflow to process them. They need to maintain strict event ordering per user session, and the processing output must be written to a BigQuery table with exactly-once semantics. Which configuration should the team implement?

A.Enable message ordering in Pub/Sub with a session ID as the ordering key, and in Dataflow use a global window with a custom trigger that fires on watermark and uses a BigQuery sink with 'exactly-once' mode enabled.

B.Use a Pub/Sub pull subscription with a subscriber that acknowledges messages immediately after processing, and a Dataflow pipeline with a sliding window.

C.Assign a unique session ID as the message ordering key in Pub/Sub, use a Dataflow pipeline with session windows and .withAllowedLateness(0), and write to BigQuery using a batch load.

D.Use a Pub/Sub push subscription with an acknowledgment deadline of 600 seconds and enable exactly-once delivery on the subscription.

AnswerA

A is correct because Pub/Sub ordering keys maintain order per session, and Dataflow's exactly-once sink to BigQuery prevents duplicates when combined with deterministic triggers.

Why this answer

Option A is correct because it combines Pub/Sub message ordering (using a session ID as the ordering key) with Dataflow's exactly-once sink to BigQuery. The global window with a watermark-based trigger ensures all events for a session are processed in order before writing, while the BigQuery 'exactly-once' mode prevents duplicate rows even if the pipeline retries. This satisfies both strict per-session ordering and exactly-once semantics.

Exam trap

Google Cloud often tests the misconception that Pub/Sub's exactly-once delivery subscription alone guarantees end-to-end exactly-once processing, ignoring that Dataflow's sink configuration and windowing strategy are required for ordering and deduplication in the output.

How to eliminate wrong answers

Option B is wrong because a subscriber that manages acknowledgments by hand provides no exactly-once guarantee (a message acknowledged before the BigQuery write completes is lost if the pipeline then fails), and sliding windows do not maintain per-session ordering. Option C is wrong because session windows in Dataflow group events by session gaps, not by a fixed ordering key, and .withAllowedLateness(0) drops late events, risking incomplete sessions; batch loads to BigQuery do not provide exactly-once write semantics (they can produce duplicates on retry). Option D is wrong because Pub/Sub exactly-once delivery is supported only on pull subscriptions, not push subscriptions, and even where it applies it covers delivery from Pub/Sub, not exactly-once processing downstream; a 600-second acknowledgment deadline does not guarantee ordering or exactly-once writes to BigQuery.
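
The point that session windows are defined by gaps rather than by a key can be sketched in plain Python. This is not Beam code: `sessionize` is a hypothetical helper that merges one key's event timestamps whenever consecutive events are less than `gap` seconds apart, which mirrors how gap-based session windows merge.

```python
def sessionize(timestamps, gap):
    """Group event timestamps into sessions separated by at least `gap` seconds."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] < gap:
            # Close enough to the previous event: same session window.
            sessions[-1].append(ts)
        else:
            # Gap reached or first event: start a new session window.
            sessions.append([ts])
    return sessions


# Events for a single session ID, with two pauses longer than the 60-second gap.
events = [0, 30, 50, 200, 210, 500]
windows = sessionize(events, gap=60)
print(windows)
```

Events that share one session ID still end up in three separate windows here, so the window boundaries depend on timing rather than on the ordering key.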

450

Multi-Selecteasy

Which TWO roles are required to allow a service account to run a Dataflow job and write results to BigQuery? (Choose two.)

Select 2 answers

A.roles/pubsub.subscriber

B.roles/dataflow.worker

C.roles/bigquery.dataEditor

D.roles/storage.objectAdmin

E.roles/dataflow.admin

AnswersB, C

Required for the worker service account to run the job.

Why this answer

Option B is correct because the roles/dataflow.worker role grants the service account the necessary permissions to execute Dataflow worker tasks, such as reading from sources and writing to sinks. Option C is correct because roles/bigquery.dataEditor allows the service account to insert rows into BigQuery tables, which is required for the Dataflow job to write results.
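
Granting the two roles can be sketched by generating the corresponding `gcloud projects add-iam-policy-binding` commands. The project ID and service account email below are placeholders, and the helper `binding_commands` is hypothetical; the sketch only builds command strings and does not call any Google Cloud API.

```python
def binding_commands(project_id, service_account, roles):
    """Build one gcloud IAM binding command per role for a service account."""
    member = f"serviceAccount:{service_account}"
    return [
        f"gcloud projects add-iam-policy-binding {project_id} "
        f"--member={member} --role={role}"
        for role in roles
    ]


# Placeholder identifiers for illustration only.
commands = binding_commands(
    "example-project",
    "dataflow-runner@example-project.iam.gserviceaccount.com",
    ["roles/dataflow.worker", "roles/bigquery.dataEditor"],
)
for command in commands:
    print(command)
```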

Exam trap

The trap here is that candidates often select roles/dataflow.admin thinking it is needed to run a job, but the exam tests that the worker role is sufficient for execution, while admin is for management tasks like creating or updating jobs.
