PDE Practice Test 35 — 15 Questions

Question 1

Your Vertex AI model deployed on an endpoint is experiencing high tail latency during online predictions. The model uses a large embedding layer, and the input size varies. You have enabled automatic scaling with a minimum of 2 replicas and maximum of 10. What is the most likely cause of the latency spikes and the best first step to diagnose?

Accepted Answer

The endpoint's target CPU utilization might be set too low, causing frequent scaling churn and cold starts. Check Cloud Logging for scaling events. High tail latency with variable input sizes and a large embedding layer often points to cold starts from aggressive autoscaling. A low utilization target makes the autoscaler react to small load changes, so replicas are added and removed often between the minimum of 2 and the maximum of 10. Each newly added replica must load the large embedding layer before it can serve, so requests that arrive during a burst queue behind the warm replicas and show up as latency spikes. Checking Cloud Logging for scaling events is the best first step because it shows directly whether replica changes line up with the spikes.

Answer

The model's SavedModel is too large due to the embedding layer; reduce embedding dimensions to lower latency.

Answer

The model uses a custom prediction routine that is not optimized; use tf.function to improve performance.

Answer

Enable model monitoring for online prediction and add a buffer to the endpoint's machine type.
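The diagnostic step amounts to checking whether latency spikes follow scaling events. A minimal sketch, assuming you have already exported scaling-event timestamps and spike timestamps; the record layout, the sample times, and the 3-minute window are all hypothetical:

```python
from datetime import datetime, timedelta

# Hypothetical data: timestamps of replica scaling events and of observed
# tail-latency spikes. Real values would come from the endpoint's logs/metrics.
scale_events = [
    datetime(2024, 1, 1, 10, 0, 0),
    datetime(2024, 1, 1, 10, 30, 0),
]
latency_spikes = [
    datetime(2024, 1, 1, 10, 0, 45),   # 45 s after the first scaling event
    datetime(2024, 1, 1, 10, 15, 0),   # no scaling event nearby
    datetime(2024, 1, 1, 10, 31, 30),  # 90 s after the second scaling event
]

def spikes_near_scaling(spikes, events, window=timedelta(minutes=3)):
    """Return the spikes that start within `window` after any scaling event."""
    return [s for s in spikes if any(timedelta(0) <= s - e <= window for e in events)]

matched = spikes_near_scaling(latency_spikes, scale_events)
share = len(matched) / len(latency_spikes)
print(f"{len(matched)} of {len(latency_spikes)} spikes follow a scaling event ({share:.0%})")
# 2 of 3 spikes follow a scaling event (67%)
```

A high share supports the cold-start explanation; a low share points the investigation elsewhere, such as at individual large inputs.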

Question 2

You run `gcloud ai models describe` and get a 'Model not found' error. The model was created successfully from a training job that completed without errors. The model ID is correct. What is the most likely cause?

Accepted Answer

The model was created in a different region (e.g., europe-west4) than the one specified in the command. This is correct because Vertex AI models are regional resources, and `gcloud ai models describe` looks the model up in only one region: the one given by the `--region` flag or, if the flag is omitted, by the `ai/region` configuration property. If the model was created in `europe-west4` but the command targets another region such as `us-central1`, it fails with a 'Model not found' error even though the model ID is correct.

Answer

The model was deleted or expired due to time-to-live settings.

Answer

The gcloud command is not authenticated to the correct project.

Answer

The model was created but not yet trained; training must complete before describe works.
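Why the region matters can be seen from the shape of a Vertex AI resource name, which embeds the location. A small sketch; the project and model ID values are hypothetical:

```python
def model_resource_name(project: str, region: str, model_id: str) -> str:
    """Build a Vertex AI model resource name; the region is part of its identity."""
    return f"projects/{project}/locations/{region}/models/{model_id}"

# Same (hypothetical) project and model ID, different regions.
created = model_resource_name("my-project", "europe-west4", "1234567890")
looked_up = model_resource_name("my-project", "us-central1", "1234567890")
print(created == looked_up)  # False: these name two different resources
```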

Question 3

A company has deployed a classification model on Vertex AI. They want to detect data drift in real-time for the model's input features. Which service should they use?

Accepted Answer

Vertex AI Model Monitoring. Vertex AI Model Monitoring is the correct service because it is built to detect feature drift and training-serving skew for models deployed on Vertex AI endpoints. It samples incoming prediction requests, compares the distribution of each input feature against a baseline on a configurable schedule, and sends an alert when the distance exceeds a configured threshold, giving near-real-time detection without custom code.

Answer

Cloud Monitoring

Answer

Cloud Data Loss Prevention

Answer

Cloud Logging
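To make "distance exceeds a threshold" concrete: for categorical features, Vertex AI Model Monitoring documents the L-infinity distance between the baseline and serving distributions. The sketch below computes that score by hand for one hypothetical feature; the category names, shares, and the 0.3 threshold are assumptions for illustration, not values taken from the service:

```python
def linf_distance(baseline: dict[str, float], serving: dict[str, float]) -> float:
    """L-infinity distance: the largest absolute gap in any category's share."""
    values = set(baseline) | set(serving)
    return max(abs(baseline.get(v, 0.0) - serving.get(v, 0.0)) for v in values)

# Hypothetical share of each payment_method value in the baseline vs. recent traffic.
baseline = {"card": 0.70, "wallet": 0.25, "bank": 0.05}
serving = {"card": 0.35, "wallet": 0.30, "bank": 0.35}
threshold = 0.3  # hypothetical alert threshold

d = linf_distance(baseline, serving)
print(f"distance={d:.2f}, alert={d > threshold}")  # distance=0.35, alert=True
```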

Question 4

Which THREE Google Cloud services are typically used together in a production ML pipeline?

Accepted Answer

Cloud Storage. Cloud Storage is one of the correct choices because it acts as the central artifact store in a production ML pipeline on Google Cloud. It holds training data, model artifacts, and prediction inputs and outputs, and it integrates directly with Vertex AI Training for model training and Vertex AI Prediction for serving. Its durability, scalability, and low cost make it the usual home for the large datasets and model binaries that production ML workflows depend on.

Answer

Cloud Functions

Answer

BigQuery

Question 5

Your organization uses Vertex AI Feature Store to serve features for a real-time fraud detection model. The model is deployed on a Vertex AI endpoint. After a data pipeline update, the model's online predictions became inconsistent. What is the most likely cause?

Accepted Answer

The feature store's online serving values are not synchronized with the batch feature values used during training. In Vertex AI Feature Store, the offline (batch) values used for training and the online serving values are stored separately. If a data pipeline update changes how batch feature values are computed but the online store is not updated to match, the model receives different feature values at inference time than it saw during training, and its predictions become inconsistent. This training-serving skew is the most likely cause when predictions degrade right after a pipeline change.

Answer

The model's prediction server is running out of memory.

Answer

The model was retrained with a different training dataset.

Answer

The online serving endpoint's model version was accidentally rolled back.
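One way to confirm this kind of skew is to read the same features for a sample of entities from both the offline and the online store and compare them. A minimal sketch, assuming both reads have already been done; the entity IDs, feature names, and 1% tolerance are hypothetical:

```python
# Hypothetical snapshots of the same features for the same entities:
# one read from the offline (training) store, one from the online store.
offline = {"user_1": {"txn_count_7d": 12, "avg_amount": 40.0},
           "user_2": {"txn_count_7d": 3, "avg_amount": 15.5}}
online = {"user_1": {"txn_count_7d": 12, "avg_amount": 40.0},
          "user_2": {"txn_count_7d": 30, "avg_amount": 15.5}}

def find_mismatches(offline, online, rel_tol=0.01):
    """Return (entity, feature, offline_value, online_value) where the stores disagree."""
    mismatches = []
    for entity, feats in offline.items():
        for name, off_val in feats.items():
            on_val = online.get(entity, {}).get(name)
            # Missing online values count as mismatches; otherwise use a relative tolerance.
            if on_val is None or abs(on_val - off_val) > rel_tol * max(abs(off_val), 1e-9):
                mismatches.append((entity, name, off_val, on_val))
    return mismatches

print(find_mismatches(offline, online))
# [('user_2', 'txn_count_7d', 3, 30)]
```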

Question 6

A startup is using Cloud Build to automate the training and deployment of their machine learning models. The workflow is defined in cloudbuild.yaml and includes steps to: 1) Run a training job on AI Platform Training, 2) Build a custom prediction container, 3) Deploy the container to Cloud Run for serving. The deployment step fails intermittently with the error: 'Cloud Run service already exists and is not owned by the calling user.' You need to fix this so that deployments are reliable. What should you do?

Accepted Answer

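Ensure the Cloud Build service account has the 'run.services.update' permission on the Cloud Run service. The error means the identity running the deploy step, the Cloud Build service account, is not allowed to modify the service that already exists. Granting it `run.services.update` (included in roles such as Cloud Run Developer or Cloud Run Admin), along with permission to act as the service's runtime service account, lets every build update the existing service in place. Deleting the service before each build is manual and causes downtime, and moving to Cloud Run for Anthos does not change who is authorized to update the service.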

Answer

Delete the existing Cloud Run service manually before each build.

Answer

Use 'gcloud run deploy --replace' in the build step to force replace the existing service.

Answer

Use Cloud Run for Anthos instead of fully managed Cloud Run to avoid ownership issues.

Question 7

A retail company uses Cloud Dataflow for a streaming pipeline that aggregates sales events from thousands of stores. The pipeline writes aggregated results to BigQuery every 5 minutes. Recently, the Dataflow job has been restarting multiple times a day with the error: 'Worker ran out of memory' in the logs. The streaming engine is enabled. The pipeline uses keyed state (ParDo with stateful processing) to maintain per-store counters. The average event size is 2KB, and the throughput is 2,000 events/sec. You need to resolve the out-of-memory issues without losing data. What should you do?

Accepted Answer

Increase the number of workers in the pipeline configuration and ensure the maximum worker count is set higher to allow better distribution of state. This is correct because adding workers spreads the keyed state (per-store counters) and the in-flight elements across more VMs, reducing memory pressure on each worker. Streaming Engine already moves durable state storage and shuffle from the worker VMs into the Dataflow service backend, but workers still cache state and buffer the elements they are actively processing in memory, so spreading keys over more workers lowers each worker's footprint. No data is lost because Dataflow streaming processes each element exactly once and keeps state durably in the service while workers are added.

Answer

Disable stateful processing and use side inputs from BigQuery to get per-store aggregates.

Answer

Modify the pipeline to use sliding windows with a shorter duration to reduce the state size.

Answer

Reduce the number of workers to limit the overhead of data shuffling.
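The numbers in the question allow a back-of-the-envelope estimate of how worker count affects memory. This sketch assumes, beyond what the question states, that the stateful ParDo buffers raw events for a full 5-minute interval and that keys are spread evenly across workers. Plain per-store counters would be only a few bytes each, so an out-of-memory error suggests that something larger is being held:

```python
EVENT_BYTES = 2_000        # 2 KB per event (decimal units)
EVENTS_PER_SEC = 2_000     # throughput from the question
INTERVAL_SEC = 5 * 60      # results are written every 5 minutes

ingress_bytes_per_sec = EVENT_BYTES * EVENTS_PER_SEC   # 4,000,000 B/s
buffered_bytes = ingress_bytes_per_sec * INTERVAL_SEC  # raw events held for one interval

def per_worker_gb(workers: int) -> float:
    """Buffered raw-event bytes per worker, assuming keys are spread evenly."""
    return buffered_bytes / workers / 1e9

for n in (2, 5, 10):
    print(f"{n:>2} workers: {per_worker_gb(n):.2f} GB per worker")
#  2 workers: 0.60 GB per worker
#  5 workers: 0.24 GB per worker
# 10 workers: 0.12 GB per worker
```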

Question 8

A healthcare analytics company runs a nightly Dataproc workflow that reads radiology reports from Cloud Storage (CSV files), transforms them using PySpark, and writes results to BigQuery. The workflow is orchestrated by Cloud Composer. Recently, the job has started failing with 'Disk quota exceeded' errors on the worker nodes. The data volume has grown 5x over the past month. Currently, the cluster uses 5 n1-standard-4 workers (each 10GB persistent disk). The PySpark jobs heavily use intermediate shuffles. You need a cost-effective solution that avoids future failures as data grows. What should you do?

Accepted Answer

Increase the persistent disk size on each worker node to 100 GB. The 'Disk quota exceeded' error occurs because the 10 GB persistent disks on the n1-standard-4 workers are too small to hold the intermediate shuffle data, which has grown along with the 5x increase in input volume. Increasing each disk to 100 GB removes the storage bottleneck without changing the machine type or paying for local SSDs. It also gives the cluster 10x its current shuffle space (500 GB across 5 workers instead of 50 GB) as data keeps growing, and because persistent disk throughput scales with disk size, the larger disks also speed up shuffle I/O.

Answer

Upgrade the worker machine type to n1-standard-8 with local SSDs for shuffle storage.

Answer

Add more preemptible workers to the cluster and keep boot disk size at 10GB.

Answer

Use Cloud Dataflow instead of Dataproc, as it handles disk management transparently.

Question 9

Your company runs a batch data processing pipeline using Cloud Dataproc and Cloud Composer. The pipeline processes hundreds of terabytes of data daily. Recently, the pipeline has been failing intermittently due to Dataproc cluster creation errors: 'Insufficient resources to create cluster in zone us-central1-f.' The project has a global quota of 1000 vCPUs for Compute Engine. The team usually uses n2-standard-8 (8 vCPU) worker nodes. You notice that the error occurs during peak usage times. You need to ensure the pipeline runs reliably without increasing the global quota. Which action should you take?

Accepted Answer

Configure the Dataproc cluster so it is not pinned to a single zone, letting cluster creation use any zone in the region that has capacity. This is correct because 'Insufficient resources' is a zonal capacity error, not a quota error, so the 1000-vCPU global quota is not the constraint. With Dataproc Auto Zone placement, you specify only `--region` (for example `us-central1`) and omit `--zone`, and Dataproc chooses a zone in that region that can host the cluster. A shortage in `us-central1-f` then no longer fails the pipeline. Each cluster still runs in a single zone; the gain is that the zone is chosen at creation time instead of being hard-coded.

Answer

Increase the global Compute Engine quota to 2000 vCPUs

Answer

Switch to using preemptible VMs only, which have higher availability

Answer

Use fewer workers with larger machine types, such as n2-standard-64

Question 10

A startup is building a real-time dashboard that shows aggregated metrics from social media feeds. They expect up to 10,000 events per second. The data must be near-real-time (< 30 seconds latency) and stored in BigQuery for historical analysis. They have limited experience managing infrastructure. The CTO suggests using Apache Kafka on Compute Engine for ingestion. However, the data engineer recommends a fully managed solution. Which approach should the team adopt?

Accepted Answer

Use Cloud Pub/Sub for ingestion and Cloud Dataflow for streaming into BigQuery. This is correct because Cloud Pub/Sub is a fully managed, automatically scaling ingestion service that handles 10,000+ events per second with no brokers to operate, and Cloud Dataflow provides autoscaling stream processing with exactly-once semantics that can write to BigQuery within the 30-second latency target. The combination meets the near-real-time requirement without operational overhead, in line with the data engineer's recommendation for a fully managed solution.

Answer

Use Cloud Functions to ingest events directly into BigQuery

Answer

Use Apache Kafka on Compute Engine for ingestion, then use Dataflow to write to BigQuery

Answer

Use App Engine to receive events and write to BigQuery
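The dashboard's aggregated metrics come from grouping events into fixed time windows, which is the core job the Dataflow stage performs before writing to BigQuery. The following is a pure-Python illustration of fixed-window counting, not Apache Beam code; the 10-second window and the sample events are hypothetical:

```python
from collections import Counter

WINDOW_SEC = 10  # hypothetical dashboard window, well under the 30 s latency target

def window_start(ts: float, size: int = WINDOW_SEC) -> int:
    """Map an event timestamp (seconds) to the start of its fixed window."""
    return int(ts // size) * size

# Hypothetical (timestamp_seconds, event_type) pairs from the social feed.
events = [(1.2, "like"), (4.8, "share"), (9.9, "like"), (10.0, "like"), (17.5, "comment")]
counts = Counter((window_start(ts), kind) for ts, kind in events)

for (start, kind), n in sorted(counts.items()):
    print(start, kind, n)
```

In a real pipeline, the streaming engine also has to decide when a window is complete and how to handle late events; this sketch ignores both.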

Question 11

A company runs a streaming data pipeline on Google Cloud using Cloud Pub/Sub, Cloud Dataflow, and BigQuery. The pipeline processes real-time sensor data for predictive maintenance. Recently, the Dataflow job's lag has increased from seconds to minutes, and the system shows backpressure. The pipeline uses fixed windows of 1 minute and writes results to BigQuery. The data volume has doubled. The team has already increased the number of workers. What should they do next? Options: A. Use session windows instead of fixed windows. B. Enable Streaming Engine and use Upsert to BigQuery. C. Decrease the window duration. D. Use Cloud Storage as temporary sink.

Accepted Answer

Enable Streaming Engine and use Upsert to BigQuery. The correct answer is B because enabling Streaming Engine moves shuffle and state management off the worker VMs into the Dataflow service backend, freeing worker resources and reducing backpressure. Using upserts to BigQuery lets the pipeline update a window's row when late-arriving data changes its result, instead of appending duplicates or rewriting the table. This matters more now that data volume has doubled and lag has increased.

Answer

Decrease the window duration

Answer

Use session windows instead of fixed windows

Answer

Use Cloud Storage as temporary sink
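The difference between appending and upserting a re-emitted window result can be shown with plain data structures. This is an illustration of the semantics only, not the BigQuery or Beam API; the sensor ID, window key, and values are hypothetical:

```python
# Rows keyed by (sensor_id, window_start). An append-only sink keeps every
# emitted result; an upsert sink replaces the row that has the same key.
rows_append: list[tuple[str, int, float]] = []
rows_upsert: dict[tuple[str, int], float] = {}

def emit(sensor_id: str, window_start: int, avg_temp: float) -> None:
    rows_append.append((sensor_id, window_start, avg_temp))
    rows_upsert[(sensor_id, window_start)] = avg_temp

emit("pump-7", 600, 71.2)  # on-time result for the window starting at t=600 s
emit("pump-7", 600, 73.9)  # late data arrives and the window result is re-emitted

print(len(rows_append), len(rows_upsert))  # 2 1
print(rows_upsert[("pump-7", 600)])        # 73.9
```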

Question 12

You are a data engineer at a financial services company that uses Vertex AI to train and deploy models for credit risk assessment. The company has strict governance requirements: every model version must be approved by the risk committee before going to production. The approval process can take several days. Currently, the team trains a new model weekly and manually deploys it to a staging endpoint for review, then manually promotes to production after approval. This process is error-prone and slow. You want to automate the pipeline: training should trigger automatically when new data arrives, the model should be automatically deployed to a staging endpoint for review, and after manual approval, it should be promoted to production. Additionally, you need to ensure that if a model in staging performs poorly (e.g., low accuracy), it should not be promoted even if approved. What should you do?

Accepted Answer

Create a Vertex AI Pipeline that: (1) triggers on new data, (2) trains the model, (3) evaluates it and stores the metrics in the model registry, and (4) deploys it to the staging endpoint as a new model version. Then use a manual approval step (for example, a Cloud Build approval or an external system) to trigger a second pipeline that checks the stored metrics and, if they are acceptable, deploys to the production endpoint. This approach uses Vertex AI Pipelines to train and deploy to staging automatically. After manual approval, the second pipeline checks the metrics recorded during evaluation and promotes the model only if they meet the threshold, so a model that performs poorly in staging is blocked even when it has been approved. Governance is enforced while everything except the approval itself is automated.

Answer

Use Vertex AI Experiments to track model versions, then manually deploy from the Experiments UI.

Answer

Use Cloud Scheduler to run training weekly, then use Cloud Functions to deploy to staging, and after manual approval, use another Cloud Function to check performance and deploy to production.

Answer

Train models on Vertex AI Workbench and use a CI/CD tool like Cloud Build to deploy to staging. Use a Cloud Build approval step to promote to production after manual check.
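The key requirement, that approval alone is not enough, reduces to a two-condition gate in the promotion pipeline. A minimal sketch; the `Candidate` fields, the version labels, and the 0.85 accuracy threshold are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    version: str
    approved: bool   # set by the risk committee's manual approval
    accuracy: float  # evaluation metric stored when the model was evaluated

MIN_ACCURACY = 0.85  # hypothetical promotion threshold

def can_promote(c: Candidate, min_accuracy: float = MIN_ACCURACY) -> bool:
    """Promote only when the model is approved AND its stored metric passes."""
    return c.approved and c.accuracy >= min_accuracy

print(can_promote(Candidate("v12", approved=True, accuracy=0.91)))   # True
print(can_promote(Candidate("v13", approved=True, accuracy=0.78)))   # False: approved but weak
print(can_promote(Candidate("v14", approved=False, accuracy=0.93)))  # False: not yet approved
```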

Question 13

A healthcare company deploys a model for diagnosing medical images on Vertex AI using a custom container with a TensorFlow model. The model uses a mixture of GPUs (NVIDIA T4) and CPUs. After deployment, you notice that prediction latency is highly variable: sometimes under 100ms, sometimes over 10 seconds. Investigation shows that the variability correlates with the number of concurrent requests. The endpoint has a min replicas of 1 and max replicas of 3, with target CPU utilization set to 80%. You also observe that GPU utilization remains low (<20%) even during high load. What is the most likely cause of the latency variability? A) The model is not fully utilizing GPUs due to inefficient data loading from CPU. B) The autoscaling metric (CPU utilization) is not appropriate for a GPU-bound workload; the endpoint does not scale based on GPU utilization. C) The GPU machine type is too small for the model. D) The container is not configured to use the GPU correctly.

Accepted Answer

The autoscaling metric (CPU utilization) is not appropriate for a GPU-bound workload; the endpoint does not scale based on GPU utilization. Option B is correct because this endpoint scales on CPU utilization, and a GPU-backed workload can keep CPU utilization below the 80% target even under heavy load, so autoscaling never triggers. During high load, the single replica receives every request and they queue, which is why latency tracks the number of concurrent requests. Option A (inefficient data loading) could contribute but does not explain the correlation with concurrency. Option C (GPU too small) would cause consistently high latency. Option D (GPU not configured) would typically cause consistently slow CPU-only inference or errors, not latency that varies with load.

Answer

The model is not fully utilizing GPUs due to inefficient data loading from CPU.

Answer

The container is not configured to use the GPU correctly.

Answer

The GPU machine type is too small for the model.
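Why the replica count stays at 1 can be sketched with a utilization-ratio scaling rule. The formula below is the Kubernetes Horizontal Pod Autoscaler rule, used here as an assumption for illustration and not as Vertex AI's documented algorithm; the CPU readings are hypothetical:

```python
import math

def desired_replicas(current: int, utilization: float, target: float,
                     min_r: int = 1, max_r: int = 3) -> int:
    """HPA-style estimate: scale the replica count by utilization/target, then clamp."""
    want = math.ceil(current * utilization / target)
    return max(min_r, min(max_r, want))

# Hypothetical CPU readings while concurrency climbs: CPU never reaches the 80% target,
# so the estimate never rises above the minimum of 1 replica.
for cpu in (25.0, 40.0, 55.0):
    print(cpu, desired_replicas(current=1, utilization=cpu, target=80.0))
```

Under this rule, only a metric that actually rises with load can push the count toward the maximum of 3.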

Question 14

Your team has implemented a CI/CD pipeline using Cloud Composer (Apache Airflow) to retrain a model every day. The pipeline reads new data from BigQuery, trains a model using Vertex AI Training, evaluates it, and if the accuracy improves, deploys it to a Vertex AI Endpoint. For the past week, the pipeline has been running successfully but no new model has been deployed because the evaluation accuracy never exceeds the previous model's accuracy. The training data volume has been consistent. You suspect that the model is not learning from the new data. What should you do?

Accepted Answer

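Examine the training data for any data quality issues such as missing values or label leakage. When accuracy never improves even though data volume is steady, the first thing to rule out is that the new data carries no usable signal. Examples include features filled with nulls or defaults after an upstream change, stale or duplicated rows, and labels that are broken or leaked. Deploying anyway or switching metrics does not diagnose the problem, and more training steps help only if the data itself is sound.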

Answer

Deploy the new model anyway and run an A/B test in production to see if it performs better online.

Answer

Increase the training budget or number of training steps to allow the model to converge better.

Answer

Change the evaluation metric to a different one that may show improvement, such as F1 score instead of accuracy.
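A quick first pass at the data check is to compute per-column null rates and the label distribution for the newest rows. A minimal sketch over a hypothetical in-memory sample; in practice the rows would come from the BigQuery table the pipeline reads:

```python
from collections import Counter

# Hypothetical sample of the newest training rows.
rows = [
    {"age": 34, "income": None, "label": 1},
    {"age": None, "income": None, "label": 1},
    {"age": 51, "income": 72000, "label": 1},
    {"age": 29, "income": None, "label": 1},
]

def null_rates(rows: list[dict]) -> dict[str, float]:
    """Fraction of rows in which each column is None."""
    cols = rows[0].keys()
    return {c: sum(r[c] is None for r in rows) / len(rows) for c in cols}

print(null_rates(rows))                   # {'age': 0.25, 'income': 0.75, 'label': 0.0}
print(Counter(r["label"] for r in rows))  # Counter({1: 4})
```

In this sample, a mostly-null feature and a single-class label would each explain a model that cannot improve on the previous one.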

Question 15

An e-commerce company uses Vertex AI to serve a real-time personalization model. The model is updated daily via a retraining pipeline that uploads a new version to the same endpoint. Recently, after a model update, the online prediction responses have been returning anomalous results (e.g., recommending irrelevant products). The previous version performed well. The team suspects that the new model is undertrained or has a bug. They have already checked the training code and the pipeline logs, which show no errors. The pipeline deploys the new model version to the endpoint by updating the traffic split to route 100% of traffic to the new version. Which course of action should the team take to quickly mitigate the issue while diagnosing the root cause? A) Roll back the endpoint to the previous model version by setting traffic split to 0% for the new version. B) Delete the current endpoint and recreate it with the previous model version. C) Tweak the training hyperparameters and retrain immediately. D) Increase the number of replicas on the endpoint to handle load.

Accepted Answer

Roll back the endpoint to the previous model version by setting traffic split to 0% for the new version. Option A is correct because shifting all traffic back to the previous known-good version immediately restores correct predictions while the team investigates the new model. This works as long as the previous version is still deployed on the endpoint to receive that traffic. Option B (deleting the endpoint) is excessive and causes downtime. Option C (retraining) takes time and may not fix the bug. Option D (more replicas) does not address the incorrect model output.

Answer

Tweak the training hyperparameters and retrain immediately.

Answer

Increase the number of replicas on the endpoint to handle load.

Answer

Delete the current endpoint and recreate it with the previous model version.
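A Vertex AI endpoint's traffic split maps each deployed model to a percentage, and the percentages total 100. The rollback is therefore a reassignment within that mapping. A pure-Python sketch of the bookkeeping, not an SDK call; the deployed model IDs are hypothetical:

```python
def rollback(split: dict[str, int], bad_id: str, good_id: str) -> dict[str, int]:
    """Move all of bad_id's traffic to good_id; percentages must still total 100."""
    if good_id not in split:
        # The previous version must still be deployed on the endpoint to take traffic.
        raise ValueError(f"{good_id} must still be deployed on the endpoint")
    new_split = dict(split)
    new_split[good_id] += new_split.get(bad_id, 0)
    new_split[bad_id] = 0
    assert sum(new_split.values()) == 100  # invariant of a valid traffic split
    return new_split

current = {"model-v41": 0, "model-v42": 100}  # hypothetical deployed model IDs
print(rollback(current, bad_id="model-v42", good_id="model-v41"))
# {'model-v41': 100, 'model-v42': 0}
```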