Knowledge + Practice

Operationalizing machine learning models Questions


151

MCQ · hard

A financial institution needs to deploy a TensorFlow model for fraud detection with strict latency requirements (<100ms). The model uses custom ops that are not available in standard TF Serving. What is the most appropriate serving solution?

A.Export the model as a SavedModel and serve on Vertex AI Prediction

B.Use Cloud Run with a custom container that includes the model and pre-loads the library

C.Use NVIDIA Triton Inference Server with a custom backend

D.Package the model with Docker using TF Serving and add custom ops via TensorFlow's custom op registration

Answer: C

NVIDIA Triton supports custom backends and is designed for high-performance inference with low latency.

Why this answer

Option C is correct because NVIDIA Triton Inference Server supports custom backends written in C++ or Python, allowing the integration of custom ops that are not available in standard TensorFlow Serving. This enables the model to meet strict latency requirements (<100ms) by leveraging GPU acceleration and optimized inference pipelines, while avoiding the limitations of TF Serving's fixed op registry.

Exam trap

The trap here is that candidates assume TF Serving's custom op registration (Option D) is straightforward, but the exam tests the understanding that TF Serving does not support dynamic loading of custom ops without a custom build, making Triton's backend architecture the correct choice for production-grade latency requirements.

How to eliminate wrong answers

Option A is wrong because Vertex AI Prediction relies on standard TF Serving or custom containers, but exporting as a SavedModel does not automatically include custom ops; Vertex AI would fail to load the model if the custom ops are not registered in its runtime. Option B is wrong because Cloud Run with a custom container can serve the model, but it lacks the specialized inference optimization features (e.g., dynamic batching, model concurrency) needed to guarantee <100ms latency under load, and it does not natively support custom backends for ops. Option D is wrong because TF Serving's custom op registration requires recompiling TF Serving from source with the custom ops linked, which is complex and not supported via standard Docker images; even if done, TF Serving's architecture is less flexible than Triton's custom backend for handling non-standard ops efficiently.


152

Multi-Select · medium

Which TWO actions can help reduce the latency of a Vertex AI endpoint serving a large neural network model?

Select 2 answers

A.Use a larger machine type with more CPU cores

B.Enable model compression with quantization

C.Increase the number of model versions deployed on the same endpoint

D.Deploy the model on a machine type with GPU accelerators

E.Use a smaller batch size for prediction requests

Answers: D, E

GPUs speed up neural network inference.

Why this answer

Option D is correct because GPU accelerators are specifically designed to handle the parallel computations required by large neural networks, significantly reducing inference latency compared to CPU-only machines. Vertex AI endpoints with GPUs can process multiple predictions concurrently, which is critical for deep learning models where matrix operations dominate the workload. Option E is correct because a smaller batch carries fewer instances per request, so each request finishes sooner, at some cost in overall throughput.

Exam trap

Google Cloud often tests the misconception that more CPU cores or model compression always reduce latency, but the trap here is that for large neural networks, the primary bottleneck is parallel compute capability, which only GPUs or TPUs can address effectively.


153

MCQ · hard

You are designing a system to serve predictions from a large language model (LLM) with a latency SLO of 500ms. The model does not fit on a single GPU and requires model parallelism. You are considering using Vertex AI Endpoints with a custom container. What additional setup is required to achieve the latency target?

A.Compile the model using TensorFlow XLA to optimize for single GPU execution.

B.Deploy the model across multiple endpoints and use a load balancer to send requests to different parts of the model.

C.Use Vertex AI Prediction as a service for LLMs, which automatically handles hardware selection.

D.Use a machine type with multiple GPUs and configure the container to use tensor parallelism.

Answer: D

Leveraging multiple GPUs on one node via model parallelism (e.g., tensor parallelism) is the standard approach to fit large models and meet latency.

Why this answer

Model parallelism across multiple GPUs on a single machine is handled inside the container by a serving framework or library that supports tensor parallelism. Sharding the model across separate endpoints would put network hops between parts of the model and add latency. TPUs are an alternative but are not required.

The key is to choose a machine type with multiple GPUs and configure the container to shard the model across them.
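
To make the idea of tensor parallelism concrete, the sketch below splits the weight matrix of a single linear layer column-wise into shards, computes each shard's partial output separately, and concatenates the results. It is a toy illustration only: the function names are hypothetical, the loop over shards stands in for GPUs that would run concurrently, and the column count is assumed to divide evenly by the number of shards.

```python
def matmul(x, w):
    """Multiply a vector x (length d_in) by a matrix w (d_in rows, d_out columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def split_columns(w, parts):
    """Split w column-wise into `parts` equal shards (assumes an even split)."""
    size = len(w[0]) // parts
    return [[row[p * size:(p + 1) * size] for row in w] for p in range(parts)]


def tensor_parallel_matmul(x, w, parts):
    """Compute x @ w shard by shard; each shard stands in for one GPU."""
    output = []
    for shard in split_columns(w, parts):
        # On real hardware these partial products would run in parallel.
        output.extend(matmul(x, shard))
    return output
```

The sharded result matches the unsharded product, which is the property a serving framework relies on when it spreads one layer across several GPUs on the same machine.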


154

MCQ · easy

A company needs to deploy a trained model for real-time predictions with low latency. Which Vertex AI resource should they use?

A.Cloud TPU

B.Vertex AI Batch Prediction

C.Vertex AI Endpoints

D.Cloud Run

Answer: C

Endpoints provide real-time model serving with low latency.

Why this answer

Vertex AI Endpoints are designed for online prediction, providing a managed service that hosts models for real-time inference with low latency. They automatically scale resources and handle traffic routing, making them the correct choice for deploying a trained model that needs to respond to individual prediction requests quickly.

Exam trap

Google Cloud often tests the distinction between batch and online prediction, and the trap here is that candidates confuse Vertex AI Batch Prediction (which is for offline, large-scale inference) with the real-time serving capability of Vertex AI Endpoints, leading them to select option B.

How to eliminate wrong answers

Option A is wrong because Cloud TPUs are specialized hardware accelerators for training and batch inference, not a deployment service for real-time predictions; they require manual management and are not designed for low-latency serving of individual requests. Option B is wrong because Vertex AI Batch Prediction is intended for asynchronous, high-throughput predictions on large datasets, not for real-time, low-latency responses; it processes jobs in batches and returns results to a storage location. Option D is wrong because Cloud Run is a serverless compute platform for containerized applications, but it lacks the native model hosting, versioning, and traffic splitting capabilities that Vertex AI Endpoints provide for machine learning models.


155

MCQ · medium

A data scientist needs to provide explanations for each prediction made by a deployed AutoML model to comply with regulatory requirements. Which Vertex AI feature should they use?

A.Vertex AI Model Monitoring

B.Vertex AI Vizier

C.Vertex AI Explainable AI

D.Vertex AI Feature Store

Answer: C

Provides per-prediction explanations.

Why this answer

Vertex AI Explainable AI is the correct feature because it provides feature attributions and explanations for each prediction, enabling compliance with regulatory requirements that demand interpretability. It uses techniques like Shapley value approximations or integrated gradients to quantify the contribution of each input feature to the model's output, which is essential for auditing and transparency in deployed AutoML models.
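
To show what a feature attribution means in practice, here is a minimal sketch that computes exact Shapley values for a toy three-feature model by averaging each feature's marginal contribution over every feature ordering. Everything in it (`toy_model`, the example instance, the all-zeros baseline) is made up for illustration. It is not how the managed service computes attributions; enumerating all orderings is only feasible for a handful of features, which is why approximations are used in practice.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by enumerating all feature orderings."""
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            # Switch feature i from its baseline value to the instance value
            # and credit the change in the prediction to that feature.
            current[i] = instance[i]
            value = predict(current)
            totals[i] += value - previous
            previous = value
    return [t / len(orderings) for t in totals]


def toy_model(x):
    """Hypothetical model: two linear terms plus one interaction term."""
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]
```

For the instance `[1.0, 3.0, 4.0]` against a baseline of `[0.0, 0.0, 0.0]`, the attributions sum to the difference between the two predictions, and the interaction term is shared equally between the two features that take part in it.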

Exam trap

Google Cloud often tests the distinction between monitoring (detecting drift) and explaining (interpreting predictions), so candidates mistakenly choose Model Monitoring when the question explicitly asks for per-prediction explanations for regulatory compliance.

How to eliminate wrong answers

Option A is wrong because Vertex AI Model Monitoring is designed to detect prediction drift, data drift, and feature skew over time, not to provide per-prediction explanations. Option B is wrong because Vertex AI Vizier is a hyperparameter tuning and optimization service that helps find the best model architecture or parameters, not a tool for explaining individual predictions. Option D is wrong because Vertex AI Feature Store is a centralized repository for storing, serving, and sharing feature data, but it does not generate explanations for model predictions.


156

MCQ · medium

You need to automate retraining of a model when new training data becomes available every week. The training pipeline runs on Vertex AI Pipelines and is triggered by Cloud Composer. After retraining, you want to evaluate the new model against a golden dataset. If the model's accuracy improves by at least 1%, it should be automatically deployed to the staging endpoint. What is the best way to implement the decision logic?

A.Use Cloud Functions to compare metrics and call the endpoint if conditions are met.

B.Add a conditional step in the Vertex AI Pipeline to evaluate the model and deploy if the accuracy improvement threshold is met.

C.After training, run a batch prediction job on the golden dataset and compare metrics manually.

D.Use Vertex AI Experiments to log metrics and set up an alert to manually deploy.

Answer: B

Pipelines can include a condition step to check metrics and decide deployment.

Why this answer

Option B is correct because Vertex AI Pipelines supports conditional execution natively via the Kubeflow Pipelines SDK's `dsl.Condition` construct (`dsl.If` in newer SDK releases), allowing you to evaluate the new model's accuracy against the golden dataset within the same pipeline and deploy only if the improvement threshold (≥1%) is met. This approach keeps the entire retraining, evaluation, and deployment workflow automated, auditable, and tightly coupled within a single orchestrated pipeline, avoiding external triggers or manual steps.
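
The comparison that such a conditional step evaluates is small. A minimal sketch, assuming accuracies are fractions between 0 and 1 and that "improves by at least 1%" means one absolute percentage point (the question does not say whether the gain is absolute or relative). `should_deploy` and its parameters are hypothetical names, not part of any Vertex AI or Kubeflow Pipelines SDK.

```python
def should_deploy(new_accuracy: float, baseline_accuracy: float,
                  min_gain: float = 0.01) -> bool:
    """Return True when the new model beats the baseline by at least min_gain."""
    # Assumption: the gain is an absolute difference between two fractions.
    # Rounding removes float noise when the gain sits right at the threshold.
    gain = round(new_accuracy - baseline_accuracy, 6)
    return gain >= min_gain
```

In a pipeline, the boolean (or the two metrics) would be the output of the evaluation step, and the deploy step would sit inside the conditional branch so it runs only when the check passes.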

Exam trap

Google Cloud often tests the misconception that external services like Cloud Functions are needed for decision logic, when in fact Vertex AI Pipelines' native conditional steps are the simpler, more integrated, and recommended approach for automated model evaluation and deployment within a pipeline.

How to eliminate wrong answers

Option A is wrong because Cloud Functions would introduce an external, event-driven component that adds latency, complexity, and potential failure points; Vertex AI Pipelines already provides built-in conditional logic for this exact use case, making an extra function unnecessary. Option C is wrong because running a batch prediction job and manually comparing metrics defeats the automation goal and introduces human error and delay, which is not suitable for a weekly retraining cadence. Option D is wrong because Vertex AI Experiments is designed for tracking and comparing experiments, not for automated decision-making or deployment; relying on alerts for manual deployment contradicts the requirement for automatic retraining and deployment.


157

Multi-Select · medium

A company wants to implement model monitoring for a deployed classification model. Which three types of monitoring should they set up? (Select 3)

Select 3 answers

A.Infrastructure cost monitoring

B.Training-serving skew

C.Prediction drift

D.Input feature drift

E.Model version comparison

Answers: B, C, D

Skew detection identifies differences between training and serving data.

Why this answer

Vertex AI Model Monitoring covers input feature drift, prediction drift, and training-serving skew. Cost monitoring and version comparison are not part of model monitoring.
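
Drift and skew checks both come down to comparing two distributions of the same feature: training versus serving for skew, or an earlier serving window versus a later one for drift. The sketch below uses Jensen-Shannon divergence, a distance measure commonly used for this kind of score. It is an illustration, not the service's implementation; the function names, the histogram inputs, and the 0.1 threshold are all assumptions.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def drifted(baseline_counts, serving_counts, threshold=0.1):
    """Compare two histograms over the same bins and flag large divergence."""
    p = [c / sum(baseline_counts) for c in baseline_counts]
    q = [c / sum(serving_counts) for c in serving_counts]
    return js_divergence(p, q) > threshold
```

With base-2 logarithms the score ranges from 0 (identical distributions) to 1 (no overlap), which makes a fixed alerting threshold easy to reason about.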


158

MCQ · hard

A data science team uses Vertex AI Pipelines to automate retraining. They want to ensure that only models with performance above a threshold are deployed. Which component should they add to the pipeline?

A.Vertex AI Feature Store

B.Vertex AI Model Evaluation

C.Cloud Build trigger

D.Cloud Monitoring alert

Answer: B

Evaluates model and can block deployment if threshold not met.

Why this answer

Vertex AI Model Evaluation provides built-in evaluation metrics and threshold-based validation that can be used as a pipeline condition to gate model deployment. By adding a Model Evaluation component, the pipeline can compare model performance against a predefined threshold and only proceed to deploy if the metrics (e.g., AUC, precision, recall) meet or exceed the required value.

Exam trap

The trap here is that candidates may confuse monitoring (Cloud Monitoring) or feature management (Feature Store) with the evaluation step needed to gate deployment, but only Model Evaluation provides the threshold-based conditional logic within the pipeline itself.

How to eliminate wrong answers

Option A is wrong because Vertex AI Feature Store is a centralized repository for storing, serving, and sharing feature data, not for evaluating model performance or enforcing deployment thresholds. Option C is wrong because Cloud Build trigger is used to automate builds and tests of source code, not to evaluate trained model metrics within a Vertex AI Pipeline. Option D is wrong because Cloud Monitoring alert is designed to notify operators about system or application anomalies, not to serve as a pipeline gate that conditionally deploys models based on evaluation results.


159

MCQ · medium

A financial services company deploys a regression model to predict loan default risk. The model is served using Vertex AI Endpoints with autoscaling. After deployment, latency increases significantly during peak hours, causing timeouts. The model uses scikit-learn and has a large feature set. Which action should the team take to reduce latency while maintaining prediction accuracy?

A.Switch to batch prediction for all requests.

B.Increase the minimum number of replicas in the endpoint to handle peak load.

C.Increase the memory allocation for the serving container.

D.Apply feature selection to reduce the number of input features.

Answer: D

Reducing features decreases model size and inference time.

Why this answer

Option D is correct because the latency spike is caused by the large feature set, which increases the time for preprocessing and inference in the scikit-learn model. Reducing the number of input features via feature selection directly decreases the computational load per request, lowering latency without sacrificing accuracy if the selected features retain predictive power. This addresses the root cause, unlike scaling or resource changes that only mask the symptom.

Exam trap

The trap here is that candidates often confuse scaling solutions (increasing replicas or memory) with performance optimization, but the question specifically asks for reducing latency per request, which requires addressing the computational bottleneck—feature reduction—rather than adding more resources.

How to eliminate wrong answers

Option A is wrong because switching to batch prediction does not reduce per-request latency; it processes requests asynchronously in bulk, which is unsuitable for real-time serving and would still cause timeouts during peak hours. Option B is wrong because increasing the minimum number of replicas only adds more instances to handle concurrent requests, but each individual request still suffers from the same high latency due to the large feature set—autoscaling already adds replicas under load, so this does not fix the per-request processing time. Option C is wrong because increasing memory allocation for the serving container helps with out-of-memory errors but does not reduce the CPU-bound computation time required to process a large feature set; the bottleneck is compute, not memory.


160

MCQ · easy

A company trains a custom model using TensorFlow and wants to deploy it to Vertex AI for low-latency predictions. The model is large (2 GB). Which deployment option should they choose?

A.Use Vertex AI Batch Prediction job

B.Deploy as a Cloud Function

C.Deploy to Vertex AI Endpoint with a custom container

D.Deploy to Cloud Run with minimum instances

Answer: C

Custom containers allow large models.

Why this answer

Option C is correct because deploying a large (2 GB) model to Vertex AI Endpoint with a custom container allows you to package the model, its dependencies, and a serving framework (e.g., TensorFlow Serving) into a Docker image. This approach supports low-latency predictions by keeping the model loaded in memory across requests, and it can scale to handle real-time inference traffic, unlike batch or serverless options that have cold-start or size limitations.

Exam trap

Google Cloud often tests the misconception that Cloud Run or Cloud Functions can handle large models for real-time inference, ignoring their size limits, cold-start latency, and lack of native Vertex AI integration for model management and scaling.

How to eliminate wrong answers

Option A is wrong because Vertex AI Batch Prediction is designed for asynchronous, high-throughput processing of large datasets, not for low-latency real-time predictions; it processes jobs in batches and does not maintain a persistent endpoint. Option B is wrong because Cloud Functions are built for short-lived, event-driven code and impose deployment-size, memory, and timeout limits that make them a poor fit for keeping a 2 GB model loaded and serving low-latency inference; the exact limits depend on the product generation, so check the current quotas. Option D is wrong because Cloud Run, although it can serve containers with a request timeout of up to 60 minutes, lacks native integration with Vertex AI's model registry and optimized serving infrastructure, and it may incur cold-start latency even with minimum instances.


161

MCQ · hard

A company runs a real-time fraud detection model using Cloud Dataflow for streaming inference. The model is updated every hour with new training data. The team wants to minimize downtime and ensure that both old and new model versions are available during the update. Which deployment strategy should they use?

A.A/B testing: route a small percentage of traffic to the new model and compare performance.

B.Rolling deployment: gradually replace instances of the old model with the new model.

C.Blue/green deployment: deploy the new model to a separate endpoint, then switch all traffic at once.

D.Canary deployment: deploy the new model alongside the old one, gradually increase traffic to the new model while monitoring.

Answer: D

Canary deployment ensures both versions are available and traffic is shifted gradually, minimizing downtime and risk.

Why this answer

Canary deployment is the correct strategy because it allows the new model to be deployed alongside the old one, with traffic gradually shifted to the new version while monitoring for errors or performance degradation. This minimizes downtime and ensures both versions are available during the update, which is critical for a real-time fraud detection system where continuous availability and risk mitigation are paramount.
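
The gradual shift can be sketched as a loop over increasing traffic shares with a health check at each step and a full rollback on failure. This is a minimal illustration under stated assumptions: `run_canary`, the step percentages, and the `is_healthy` callback are hypothetical, and the dictionary only mirrors the general shape of a percentage-based traffic split.

```python
def run_canary(steps, is_healthy):
    """Walk through canary steps; roll back to the old model if a check fails."""
    split = {"old": 100, "new": 0}
    for new_share in steps:
        if not 0 <= new_share <= 100:
            raise ValueError("traffic share must be between 0 and 100")
        # Both versions stay deployed; only the traffic percentages change.
        split = {"old": 100 - new_share, "new": new_share}
        if not is_healthy(split):
            return {"old": 100, "new": 0}
    return split
```

Calling `run_canary([10, 50, 100], check)` ends with all traffic on the new model only if `check` passes at every step; otherwise the old model keeps serving everything.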

Exam trap

The trap here is that candidates confuse A/B testing (a statistical evaluation method) with canary deployment (a release strategy), or assume blue/green deployment is always best for zero-downtime updates without considering the requirement for gradual traffic shifting and availability of both versions during the update.

How to eliminate wrong answers

Option A is wrong because A/B testing is a statistical method for comparing model performance, not a deployment strategy for minimizing downtime or ensuring availability during updates. Option B is wrong because rolling deployment gradually replaces instances, which can cause a brief period where only the new model is available, violating the requirement that both old and new versions be available during the update. Option C is wrong because blue/green deployment switches all traffic at once after the new model is deployed, which introduces a cutover risk and does not allow gradual traffic shifting or monitoring during the transition.


162

MCQ · hard

An e-commerce company deploys a recommendation model on Vertex AI Endpoints. The endpoint receives a high volume of requests with a large payload. They notice high latency and occasional timeouts. Which action should they take to improve performance without sacrificing accuracy?

A.Enable request batching on the endpoint

B.Switch to a smaller machine type

C.Reduce the model size by pruning

D.Increase the number of replicas

Answer: A

Batching improves throughput by combining requests, reducing overhead and latency without affecting model accuracy.

Why this answer

Enabling request batching on the Vertex AI endpoint allows multiple inference requests to be grouped into a single prediction call, reducing per-request overhead and improving throughput. This directly addresses high latency and timeouts caused by a high volume of large payloads without altering the model or its accuracy.
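
The grouping step can be sketched in a few lines. This is an illustration of the idea only, with hypothetical names (`make_batches`, `predict_batched`, `model_fn`); a real serving stack would also cap how long a request may wait for a batch to fill, since waiting adds latency to individual requests.

```python
def make_batches(requests, max_batch_size):
    """Group incoming requests into lists of at most max_batch_size items."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    batch = []
    for request in requests:
        batch.append(request)
        if len(batch) == max_batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, partially filled batch


def predict_batched(requests, model_fn, max_batch_size=4):
    """Call model_fn once per batch instead of once per request."""
    results = []
    for batch in make_batches(requests, max_batch_size):
        results.extend(model_fn(batch))
    return results
```

With five requests and a batch size of two, the model is invoked three times instead of five, and the predictions are the same as they would be one request at a time.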

Exam trap

Google Cloud often tests the misconception that scaling replicas or reducing model size is the default fix for latency, but the trap here is that batching addresses throughput without sacrificing accuracy, whereas pruning or smaller machines would degrade performance or accuracy.

How to eliminate wrong answers

Option B is wrong because switching to a smaller machine type reduces compute resources, which would increase latency and worsen timeouts under high request volume. Option C is wrong because reducing model size by pruning can degrade prediction accuracy, which the question explicitly states must not be sacrificed. Option D is wrong because increasing the number of replicas adds cost and may not resolve timeouts if the bottleneck is per-request processing overhead rather than concurrency limits.


163

MCQ · medium

A company wants to automate model retraining and deployment whenever new training data becomes available. Which service should be used to orchestrate the end-to-end workflow?

A.Cloud Build

B.Vertex AI Pipelines

C.Cloud Scheduler

D.Cloud Composer

Answer: B

Designed for ML pipeline orchestration with prebuilt components.

Why this answer

Vertex AI Pipelines is the correct choice because it is a managed service specifically designed to orchestrate and automate end-to-end ML workflows, including model retraining and deployment triggered by new data. It allows you to define pipelines as a directed acyclic graph (DAG) of steps using the Kubeflow Pipelines SDK or pre-built components, and it integrates natively with other Vertex AI services for training, evaluation, and deployment.
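
To illustrate what "a DAG of steps" means, the sketch below declares each step's dependencies and derives a valid execution order using Python's standard-library `graphlib` (Python 3.9+). The step names are hypothetical and this is not Kubeflow Pipelines code; it only shows the ordering problem an orchestrator solves before running anything.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
PIPELINE = {
    "validate_data": set(),
    "train": {"validate_data"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}


def execution_order(dag):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())
```

If a dependency cycle is introduced, for example by making `validate_data` depend on `deploy`, `static_order()` raises `graphlib.CycleError`, which is why pipeline definitions must be acyclic.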

Exam trap

The trap here is that candidates often confuse Cloud Composer (a general-purpose Airflow service) with Vertex AI Pipelines, but the exam expects you to recognize that Vertex AI Pipelines is the ML-specific, fully managed solution for end-to-end ML workflow orchestration, while Cloud Composer requires more manual setup and lacks native Vertex AI integration.

How to eliminate wrong answers

Option A is wrong because Cloud Build is a CI/CD service focused on building, testing, and deploying software artifacts (e.g., container images), not on orchestrating ML workflows with steps like data validation, model training, and deployment. Option C is wrong because Cloud Scheduler is a cron job service that triggers actions on a time-based schedule, not on the event of new training data becoming available, and it lacks the workflow orchestration capabilities needed for complex ML pipelines. Option D is wrong because Cloud Composer is a managed Apache Airflow service that can orchestrate workflows, but it is a general-purpose workflow orchestrator, not purpose-built for ML pipelines; Vertex AI Pipelines provides tighter integration with Vertex AI components, managed execution, and artifact tracking, making it the more appropriate choice for this specific ML automation scenario.


164

MCQ · medium

A company has a trained model stored in Vertex AI Model Registry. They want to automate retraining when new training data arrives in Cloud Storage. Which approach is most efficient?

A.Use Cloud Functions triggered by Cloud Storage events to start a Vertex AI Training job

B.Use Dataflow to continuously update the model

C.Use Cloud Scheduler to trigger a Cloud Build retraining step

D.Schedule a weekly Cloud Composer DAG to check for new data and retrain

Answer: A

Cloud Functions provide real-time event-driven triggers to initiate retraining immediately when new data appears.

Why this answer

Cloud Functions can be directly triggered by Cloud Storage events (e.g., object finalize) to start a Vertex AI training job through the Vertex AI API. This creates an event-driven, serverless pipeline that retrains the model immediately when new data arrives, without polling or manual intervention, making it the most efficient and cost-effective approach.
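
The handler logic can be sketched without any cloud SDK. The example below reads the `bucket` and `name` fields that a Cloud Storage object event carries, ignores uploads outside an assumed data prefix, and hands a request to an injected `submit_job` callable. That callable, the `training-data/` prefix, and the request dictionary keys are all hypothetical stand-ins; in a real function `submit_job` would wrap whichever Vertex AI client call starts the training job.

```python
def build_training_request(event, data_prefix="training-data/"):
    """Turn a Cloud Storage object event into a training request, or None."""
    bucket = event["bucket"]
    name = event["name"]
    if not name.startswith(data_prefix):
        return None  # ignore uploads that are not new training data
    return {
        "display_name": "retrain-" + name.rsplit("/", 1)[-1],
        "training_data_uri": f"gs://{bucket}/{name}",
    }


def on_new_object(event, submit_job):
    """Event handler: submit a training job only for matching objects."""
    request = build_training_request(event)
    if request is None:
        return False
    submit_job(request)
    return True
```

Injecting `submit_job` keeps the filtering logic testable locally, and the prefix check avoids retraining on unrelated files written to the same bucket.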

Exam trap

Google Cloud often tests the distinction between event-driven (Cloud Functions) and scheduled (Cloud Scheduler, Cloud Composer) approaches, and candidates mistakenly choose a scheduled option thinking it is simpler, missing the requirement for immediate reaction to new data.

How to eliminate wrong answers

Option B is wrong because Dataflow is a stream/batch data processing service for transforming data, not for orchestrating model retraining; it would require custom code to trigger training and lacks native integration with Vertex AI Model Registry. Option C is wrong because Cloud Scheduler triggers jobs on a fixed schedule, not on data arrival events, so it cannot react to new data in real time and may waste resources on unnecessary retraining. Option D is wrong because a weekly Cloud Composer DAG introduces latency (up to a week) and operational overhead for a simple event-driven task, and it is less efficient than a serverless function that fires instantly on data arrival.


165

MCQ · hard

A company uses Vertex AI Pipelines to orchestrate ML workflows. They want to automatically retrain the model when new data arrives, but only if the model's performance drops below a threshold. Which approach is best?

A.Use BigQuery scheduled queries to trigger pipeline

B.Trigger a pipeline on a schedule

C.Use Vertex AI Model Monitor to detect skew and trigger retraining

D.Use Cloud Functions to evaluate performance and trigger pipeline

Answer: C

Model Monitor can detect performance degradation and automatically trigger retraining pipelines.

Why this answer

Option C is correct because Vertex AI Model Monitoring can detect skew and drift, and can trigger retraining workflows automatically. Option A is wrong because BigQuery scheduled queries are not integrated with model performance monitoring. Option B is wrong because scheduled triggers do not consider performance metrics.

Option D is wrong because Cloud Functions would require custom logic to evaluate performance, which is more complex than using built-in monitoring.


166

Multi-Select · hard

A company wants to implement a robust MLOps lifecycle on Google Cloud. Which THREE components are essential?

Select 3 answers

A.Vertex AI Model Registry for versioning

B.Vertex AI Pipelines for orchestration

C.Pub/Sub for event-driven retraining

D.Cloud Build for CI/CD

E.Cloud SQL for model metadata

Answers: A, B, D

Model Registry centralizes model version management and deployment.

Why this answer

Vertex AI Model Registry is essential for versioning because it provides a centralized repository to track, manage, and deploy different versions of trained ML models. This ensures reproducibility, auditability, and the ability to roll back to previous versions, which is critical for a robust MLOps lifecycle.

Exam trap

The trap here is that candidates may confuse optional supporting services (like Pub/Sub for event triggers or Cloud SQL for metadata) with the essential components required for a robust MLOps lifecycle, which are versioning, orchestration, and CI/CD.


167

MCQ · hard

A healthcare company deploys a model for diagnosing medical images on Vertex AI using a custom container with a TensorFlow model. The model uses a mixture of GPUs (NVIDIA T4) and CPUs. After deployment, you notice that prediction latency is highly variable: sometimes under 100ms, sometimes over 10 seconds. Investigation shows that the variability correlates with the number of concurrent requests. The endpoint has a min replicas of 1 and max replicas of 3, with target CPU utilization set to 80%. You also observe that GPU utilization remains low (<20%) even during high load. What is the most likely cause of the latency variability?

A.The autoscaling metric (CPU utilization) is not appropriate for a GPU-bound workload; the endpoint does not scale based on GPU utilization.

B.The model is not fully utilizing GPUs due to inefficient data loading from CPU.

C.The container is not configured to use the GPU correctly.

D.The GPU machine type is too small for the model.

Answer: A

Standard autoscaling uses CPU; for GPU workloads, you should use custom metrics like GPU utilization or request count.

Why this answer

Option A is correct because Vertex AI scales on CPU utilization by default, but a GPU-bound workload can keep CPU utilization low, so autoscaling never triggers. During high load the single replica is then overwhelmed, which causes the latency spikes. Option B (inefficient data loading) could contribute but is not the primary cause.

Option D (GPU too small) would cause consistently high latency. Option C (GPU not configured) would cause continuous errors, not variable latency.
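
A short calculation shows why a CPU-based target can leave an overloaded replica alone. The sketch uses the proportional target-tracking rule documented for the Kubernetes Horizontal Pod Autoscaler; whether Vertex AI applies exactly this formula is an assumption, and the 35% CPU reading is an invented figure for illustration. Only the 80% target and the 1 to 3 replica range come from the question.

```python
import math


def desired_replicas(current, observed_util, target_util, min_replicas, max_replicas):
    """Proportional rule: scale so observed utilization moves toward the target."""
    wanted = math.ceil(current * observed_util / target_util)
    # Clamp to the configured replica range.
    return max(min_replicas, min(max_replicas, wanted))
```

With one replica at 35% CPU against an 80% target, `desired_replicas(1, 35, 80, 1, 3)` stays at 1 no matter how long requests are queuing, whereas a reading of 160% would ask for 2. A metric that does not rise with load cannot drive scaling.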


168

MCQ · easy

Your company has deployed a machine learning model on Vertex AI Endpoint to serve real-time predictions for a mobile application. The model was trained using TensorFlow and the prediction requests include raw images that are preprocessed by the client before sending. Recently, the application developers reported that the predictions are becoming less accurate over time. They suspect the issue is related to changes in the client-side preprocessing code. You need to verify this hypothesis and monitor for future regressions. What should you do?

A.Retrain the model using the latest client data to adapt to any changes in preprocessing.

B.Roll back to a previous model version that was known to work well and disable automatic retraining.

C.Ask the developers to provide the exact preprocessing code and manually compare it with the training pipeline's preprocessing.

D.Enable Vertex AI Model Monitoring for feature attribution and set up alerting on skew detection.

Answer: D

Model Monitoring can detect training-serving skew by comparing feature distributions; this would catch preprocessing changes effectively.


169

MCQ · easy

A company deploys a new machine learning model for real-time predictions using Vertex AI. The model is stored in a Cloud Storage bucket and deployed to an endpoint. To ensure traceability and rollback capability, which practice should be followed?

A.Deploy multiple versions of the model to the same endpoint using traffic splitting and set the primary version to 100% traffic.

B.Use the same model name for all deployments and overwrite the existing model.

C.Store the model in a Cloud Storage bucket with a fixed name and rely on Cloud Build for rollback.

D.Create a new model resource in Vertex AI for each version and deploy the specific version to an endpoint.

Answer: D

This allows version tracking, easy rollback by redeploying a previous version, and maintains a clean deployment history.

Why this answer

Option D is correct because creating a new model resource in Vertex AI for each version ensures that each model iteration is independently tracked, versioned, and can be deployed to an endpoint with full rollback capability. This practice aligns with Vertex AI's model versioning and endpoint deployment model, where each model resource has a unique ID and can be deployed or undeployed without affecting other versions, enabling precise traceability and rollback.

Exam trap

Google Cloud often tests the misconception that traffic splitting alone (Option A) provides sufficient versioning and rollback, but the trap is that traffic splitting still operates within a single model resource, which does not preserve independent version history or allow clean rollback to a prior model resource without manual intervention.

How to eliminate wrong answers

Option A is wrong because deploying multiple versions to the same endpoint with traffic splitting and setting the primary version to 100% traffic does not inherently create separate model resources for each version; it still relies on a single model resource with aliases, which can complicate rollback if the model resource itself is overwritten or corrupted. Option B is wrong because using the same model name for all deployments and overwriting the existing model destroys the previous version's metadata and artifacts, making rollback impossible without manual restoration from backups. Option C is wrong because storing the model in a Cloud Storage bucket with a fixed name and relying on Cloud Build for rollback does not provide native Vertex AI model versioning or endpoint deployment tracking; Cloud Build is a CI/CD tool, not a model registry, and overwriting the bucket contents loses previous versions.

170

MCQhard

A data scientist uses Vertex AI Workbench notebooks for model development. They want to share the environment with team members while maintaining version control. Which approach should they use?

A.Use Cloud Shell and clone the repo

B.Use a user-managed notebook instance with multiple users

C.Share the notebook via Cloud Storage

D.Store notebooks in Cloud Source Repositories

AnswerB

Allows collaboration with version control.

Why this answer

A user-managed notebook instance with multiple users is the correct approach because Vertex AI Workbench supports collaboration by allowing multiple users to access the same instance via IAM permissions, while the underlying Git integration enables version control. This setup provides a shared, persistent environment where team members can work on the same codebase without duplicating work, and changes can be tracked through Git repositories.

Exam trap

The trap here is that candidates confuse storing notebooks in a version control system (like Cloud Source Repositories) with having a shared, interactive development environment, overlooking that version control alone does not provide the compute and collaboration features of a user-managed notebook instance.

How to eliminate wrong answers

Option A is wrong because Cloud Shell is a temporary, per-user environment with limited resources and no persistent storage, making it unsuitable for sharing a development environment with version control across a team. Option C is wrong because sharing notebooks via Cloud Storage is a static file-sharing method that does not provide version control, collaborative editing, or a live execution environment. Option D is wrong because Cloud Source Repositories is a Git repository hosting service for storing code, not a shared interactive development environment; it lacks the compute and runtime capabilities needed for model development.

171

Multi-Selectmedium

A team monitors a deployed Vertex AI model and notices an increasing number of prediction errors with status code 413 (Request Entity Too Large). Which TWO actions should they consider to resolve this issue?

Select 2 answers

A.Implement client-side pre-processing to compress or downsample input data

B.Switch the model to batch prediction to handle large payloads offline

C.Increase the number of replicas to handle load

D.Decrease the machine type to reduce resource consumption

E.Increase the maximum request size limit in the endpoint configuration

AnswersA, E

Reducing input size prevents exceeding the limit.

Why this answer

Option A is correct because status code 413 indicates the HTTP request payload exceeds the server's size limit. Implementing client-side pre-processing to compress or downsample input data reduces the payload size before it reaches the Vertex AI endpoint, directly addressing the root cause. This approach is efficient because it shifts the computational burden to the client and avoids hitting the server-imposed request size cap, which is typically 1.5 MB for online predictions in Vertex AI.

Exam trap

Google Cloud often tests the misconception that scaling resources (replicas or machine type) can fix request size errors, but 413 is a protocol-level limit that must be addressed by reducing payload size, not by increasing infrastructure capacity.
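
A client can check the request size before sending it. The sketch below assumes the 1.5 MB cap mentioned above (interpreted here as 1.5 × 1024 × 1024 bytes, which is an assumption) and a JSON body with an "instances" list; the zero-filled byte strings are stand-ins for real image data.

```python
import base64
import json

# Cap cited above for online predictions; the exact byte interpretation is assumed.
MAX_REQUEST_BYTES = int(1.5 * 1024 * 1024)


def request_size_bytes(instances):
    """Size in bytes of the JSON body for an online prediction request."""
    body = json.dumps({"instances": instances})
    return len(body.encode("utf-8"))


def fits_online_limit(instances, limit=MAX_REQUEST_BYTES):
    """True if the serialized request stays within the size limit."""
    return request_size_bytes(instances) <= limit


def encode_image(raw_bytes):
    """Base64-encode binary data so it can travel inside a JSON request."""
    return {"b64": base64.b64encode(raw_bytes).decode("ascii")}


full_res = encode_image(bytes(2_000_000))    # stand-in for a 2 MB image
downsampled = encode_image(bytes(200_000))   # stand-in for the resized image

full_res_ok = fits_online_limit([full_res])
downsampled_ok = fits_online_limit([downsampled])
```

Base64 encoding inflates binary data by roughly a third, so an image can exceed the limit even when the raw file looks small enough.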

172

MCQhard

A company needs to serve predictions for a model that runs an expensive computation on each request. The model is used by a batch job that processes millions of records each night, and also by a real-time API for a few thousand queries per hour. Which prediction strategy minimizes cost and latency for both use cases?

A.Deploy two identical models, one on a Compute Engine VM for batch, one on Vertex AI for online, and synchronize updates.

B.Use Vertex AI batch prediction for the nightly job and a separate online endpoint with auto-scaling for the real-time API.

C.Use Vertex AI batch prediction for both workloads.

D.Use a single online Vertex AI endpoint with auto-scaling to handle both workloads.

AnswerB

This separates concerns: batch prediction is optimized for throughput, online endpoint for low-latency, and auto-scaling handles varying traffic.

Why this answer

Using batch prediction for the nightly job and a separate online endpoint with a smaller machine or auto-scaling for real-time queries optimizes cost and latency, so Option B is correct. Option C is wrong because batch prediction alone cannot serve real-time requests.

Option D is wrong because pushing millions of records through an online endpoint is expensive, and sharing one endpoint between both workloads may let the batch load interfere with real-time latency. Option A is wrong because keeping two separately hosted copies of the model synchronized adds operational overhead without improving cost or latency.

173

MCQmedium

Refer to the exhibit. What is the cause of this error?

A.The machine type flag is only used during model deployment, not endpoint creation

B.The endpoint name already exists

C.The user must specify a model name

D.The region is missing

AnswerA

Correct: machine type is a property of the deployed model, not the endpoint.

Why this answer

The --machine-type flag is not valid for the endpoints create command; it should be specified when deploying a model to the endpoint using 'gcloud ai endpoints deploy-model'. The user must first create an endpoint without machine type, then deploy a model.

174

MCQmedium

A company deploys a model to Vertex AI Endpoint. They want to run a canary deployment to test a new model version with 10% of traffic. How should they configure this?

A.Deploy to a new endpoint and update the application to call both

B.Use Cloud Load Balancing to route traffic

C.Deploy the new model to the same endpoint and set traffic split

D.Deploy to Cloud Run and use gradual rollout

AnswerC

Traffic splitting allows canary.

Why this answer

Option C is correct because Vertex AI Endpoints natively support traffic splitting between model versions deployed to the same endpoint. By deploying the new model version to the same endpoint and setting a traffic split of 10% to the new version and 90% to the current version, the company can perform a canary deployment without changing the application code or infrastructure.

Exam trap

Google Cloud often tests the misconception that canary deployments require separate endpoints or external load balancers, when in fact Vertex AI Endpoints provide a built-in traffic splitting feature that handles this at the model version level.

How to eliminate wrong answers

Option A is wrong because deploying to a new endpoint and updating the application to call both endpoints adds unnecessary complexity and defeats the purpose of a canary deployment, which should be transparent to the application. Option B is wrong because Cloud Load Balancing operates at the network layer and cannot route traffic based on model version within a single Vertex AI Endpoint; it is designed for distributing traffic across regional endpoints or backends, not for model version canary testing. Option D is wrong because deploying to Cloud Run and using gradual rollout is not the native way to manage model versions in Vertex AI; Vertex AI Endpoints provide built-in traffic splitting for model versions, which is the recommended approach for canary deployments in this context.
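
The effect of a 90/10 traffic split can be simulated locally. This is a plain-Python sketch of weighted routing, not the Vertex AI API; the deployed-model IDs are hypothetical.

```python
import random


def validate_traffic_split(split):
    """A traffic split maps deployed-model IDs to percentages that sum to 100."""
    if sum(split.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")
    return split


def route(split, rng):
    """Pick a deployed model for one request, weighted by its traffic share."""
    ids = list(split)
    return rng.choices(ids, weights=[split[i] for i in ids], k=1)[0]


# Hypothetical IDs for the current and the new model version.
split = validate_traffic_split({"stable-model": 90, "canary-model": 10})

rng = random.Random(0)  # seeded so the simulation is repeatable
sample = [route(split, rng) for _ in range(10_000)]
canary_share = sample.count("canary-model") / len(sample)
```

The application keeps calling one endpoint; only the split changes, which is why the canary is transparent to callers.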

175

MCQeasy

A data engineer wants to automatically detect when the distribution of input features to a production model has shifted significantly. Which Vertex AI feature should they enable?

A.Vertex AI Vizier

B.Vertex AI Model Monitoring

C.Vertex AI Explainable AI

D.Vertex AI Feature Store

AnswerB

Monitors prediction and feature drift/skew.

Why this answer

Vertex AI Model Monitoring is the correct service because it is specifically designed to continuously detect feature distribution drift and prediction skew in production models. It automatically compares the current input feature distribution against a baseline (e.g., training data) and triggers alerts when significant statistical shifts occur, enabling proactive retraining or investigation.
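
For a numerical feature, the baseline-versus-current comparison can be illustrated with a histogram and the Jensen-Shannon divergence. This is a sketch of the general technique under assumed bin edges and made-up sample values; it does not reproduce how the managed service bins or thresholds data.

```python
import math


def histogram(values, edges):
    """Normalized histogram over fixed bin edges (the last bin is right-inclusive).

    Assumes at least one value falls inside the edges.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]


def kl(p, q):
    """Kullback-Leibler divergence in bits, skipping empty bins of p."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical distributions, at most 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


edges = [0, 25, 50, 75, 100]                              # assumed bin edges
baseline = histogram([10, 20, 30, 40, 60, 70], edges)     # training-time values
current = histogram([60, 70, 80, 90, 95, 100], edges)     # recent serving values
drift_score = js_divergence(baseline, current)
```

An alert would fire when `drift_score` exceeds a threshold chosen per feature.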

Exam trap

The trap here is that candidates confuse 'monitoring model performance' (e.g., accuracy, latency) with 'monitoring input feature distribution drift', leading them to incorrectly choose Vertex AI Vizier or Explainable AI, which address different aspects of model lifecycle management.

How to eliminate wrong answers

Option A is wrong because Vertex AI Vizier is a hyperparameter tuning service that optimizes model performance through black-box optimization, not for monitoring distribution shifts in production. Option C is wrong because Vertex AI Explainable AI provides feature attributions and explanations for individual predictions, but it does not monitor aggregate distribution changes over time. Option D is wrong because Vertex AI Feature Store is a centralized repository for storing, serving, and sharing feature data, but it lacks built-in drift detection or alerting capabilities.

176

MCQhard

You manage a team that deploys multiple versions of a computer vision model for A/B testing on Vertex AI Endpoints. You need to route a small percentage of traffic to a canary version while the rest goes to the stable version. You also need to gradually increase the canary traffic over time based on performance metrics. Which approach should you take?

A.Create two separate endpoints, one for each version, and use a separate load balancer to route a percentage of requests to the canary endpoint.

B.Deploy both models to the same endpoint and configure traffic splitting percentages using the Vertex AI console or API.

C.Use Cloud Armor with weighted backend services to route a portion of requests to the canary version.

D.Implement feature flags in the application code to randomly select the model version for each prediction request.

AnswerB

Vertex AI endpoints natively support traffic splitting between deployed models, allowing gradual rollout and canary testing.

Why this answer

Vertex AI Endpoints support traffic splitting between model versions. You can assign percentage splits and adjust them programmatically. Weighted routing in Cloud Load Balancing is lower-level.

Using two separate endpoints would not allow splitting within a single endpoint. Feature flags are for application logic, not model serving.
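
Adjusting the split programmatically usually means a small control loop around the metrics. The sketch below shows one possible ramp policy in plain Python; the step size, the error-rate threshold, and the observed error rates are hypothetical, and applying each split to the endpoint is left out.

```python
def next_canary_percentage(current, error_rate, max_error_rate=0.02, step=10):
    """Raise the canary share by `step` while it is healthy; otherwise drop it to 0."""
    if error_rate > max_error_rate:
        return 0
    return min(100, current + step)


def as_traffic_split(canary_pct):
    """Express the canary percentage as a two-way traffic split."""
    return {"stable": 100 - canary_pct, "canary": canary_pct}


# Hypothetical error rates observed for the canary after each evaluation window.
observed_error_rates = [0.010, 0.012, 0.009, 0.050]

pct = 10
history = [as_traffic_split(pct)]
for err in observed_error_rates:
    pct = next_canary_percentage(pct, err)
    history.append(as_traffic_split(pct))
```

In this run the canary grows from 10% to 40% and is then pulled back to 0% when the last window breaches the threshold.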

177

MCQeasy

A team has trained a scikit-learn model and wants to deploy it to AI Platform Prediction for online predictions. What is the required format for the model artifact?

A.A model.joblib file (or model.pkl) along with any custom code.

B.A single .h5 file containing the model weights.

C.A SavedModel directory containing the model for TensorFlow.

D.A model.pt file for PyTorch models.

AnswerA

AI Platform supports joblib/pickle for scikit-learn.

Why this answer

Option A is correct because AI Platform Prediction expects scikit-learn models to be saved as joblib or pickle files. Option B is incorrect because .h5 is a Keras format; C is for TensorFlow; D is for PyTorch.

178

Matchingmedium

Match each BigQuery feature to its description.

Concepts and matches

Clustering: Sorting data within partitions to improve query performance

Partitioning: Dividing tables into segments based on a date/timestamp column

Slot: Unit of computational capacity in BigQuery

Materialized view: Pre-computed query results for faster access

Why these pairings

BigQuery features that optimize performance and cost.

179

MCQeasy

A team deployed a model to Vertex AI Endpoint and notices latency spikes during peak hours. What should they first investigate?

A.Switch to batch prediction

B.Reduce number of features

C.Increase machine type

D.Check if autoscaling is enabled and configured correctly

AnswerD

Autoscaling misconfiguration is a common cause of latency spikes during traffic surges.

Why this answer

Latency spikes during peak hours typically indicate that the serving infrastructure is unable to handle the increased request volume. The first step is to check if autoscaling is enabled and configured correctly on the Vertex AI Endpoint, as this determines whether additional compute nodes are automatically provisioned to match demand. Without proper autoscaling, the endpoint will be overwhelmed, leading to queuing delays and latency spikes.

Exam trap

Google Cloud often tests the misconception that latency spikes are always due to model complexity or feature engineering, when in fact the first diagnostic step should always be to verify the serving infrastructure's scaling configuration.

How to eliminate wrong answers

Option A is wrong because switching to batch prediction is for asynchronous, non-real-time inference and does not address the root cause of latency spikes during online serving. Option B is wrong because reducing the number of features may lower model complexity but does not directly resolve infrastructure scaling issues; latency spikes are typically due to insufficient compute resources, not feature count. Option C is wrong because increasing the machine type (e.g., using a larger VM) may improve per-request performance but does not solve the problem of handling concurrent peak traffic; without autoscaling, a single larger machine can still be overwhelmed.

180

Multi-Selectmedium

Which TWO are best practices for monitoring a deployed machine learning model in production on Vertex AI?

Select 2 answers

A.Set up a weekly retraining pipeline triggered by calendar schedule

B.Enable Vertex AI Model Monitoring to track feature drift and skew

C.Monitor the training job duration to detect anomalies

D.Monitor the distribution of predictions over time to detect concept drift

E.Monitor the model's file size to ensure it hasn't changed

AnswersB, D

Model Monitoring automatically detects drift.

Why this answer

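Option B is correct because Vertex AI Model Monitoring automatically tracks feature drift and skew by comparing the serving data distribution against the training data distribution using statistical distance measures (Jensen-Shannon divergence for numerical features, L-infinity distance for categorical features). This is a best practice for detecting data quality issues that can degrade model performance in production. Option D is correct because a shift in the distribution of predictions over time is a signal of concept drift that feature-level checks alone can miss.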

Exam trap

The trap here is that candidates confuse operational maintenance tasks (like scheduled retraining) with monitoring tasks, or they focus on infrastructure metrics (like job duration or file size) instead of data and prediction distribution monitoring, which directly impact model accuracy in production.

181

MCQmedium

Your team is using Vertex AI Pipelines to orchestrate a model retraining workflow. The pipeline includes a data validation step, a training step, and a model evaluation step. You want to ensure that if the evaluation step fails due to low model performance, the pipeline stops and does not deploy the model. Which approach should you use?

A.Run the evaluation step after deployment and roll back if performance is low

B.Configure the evaluation step to retry up to 3 times on failure

C.Use a Conditional in the pipeline to check evaluation metrics and only run the deployment step if metrics pass thresholds

D.Create a separate pipeline for deployment and trigger it manually after review

AnswerC

Conditionals allow pipeline to branch based on results.

Why this answer

Option C is correct because Vertex AI Pipelines supports conditional execution via the Kubeflow Pipelines `dsl.Condition` construct (`dsl.If` in recent KFP SDK releases), which lets you evaluate model performance metrics (e.g., accuracy, RMSE) and gate subsequent steps. By placing the deployment step inside a conditional branch that only executes when evaluation metrics meet predefined thresholds, the pipeline automatically skips deployment of a poor-performing model. This approach aligns with MLOps best practices for automated gating in production pipelines.

Exam trap

The trap here is that candidates confuse retry logic (Option B) with conditional gating, mistakenly thinking that retrying a failed evaluation step will somehow improve model performance, when in fact retries only handle transient errors, not metric-based failures.

How to eliminate wrong answers

Option A is wrong because running the evaluation step after deployment and then rolling back violates the principle of failing fast; it wastes compute resources and risks serving a bad model to users before rollback. Option B is wrong because retrying the evaluation step on failure does not address the root cause — low model performance — and would simply re-run the same evaluation, potentially masking the failure or delaying the pipeline. Option D is wrong because creating a separate pipeline for manual deployment defeats the purpose of automation and introduces human latency and error, contradicting the goal of an automated orchestrated workflow.
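
The gating logic itself is simple. The sketch below is a plain-Python analogue of a conditional pipeline branch, not Kubeflow Pipelines code; the step functions, the accuracy metric, and the 0.90 threshold are hypothetical stand-ins.

```python
def run_retraining(validate, train, evaluate, deploy, min_accuracy=0.90):
    """Run the steps in order; deployment is gated on the evaluation metric."""
    validate()
    model = train()
    metrics = evaluate(model)
    if metrics["accuracy"] >= min_accuracy:  # the conditional branch
        deploy(model)
        return "deployed"
    return "skipped_deployment"


# Hypothetical stand-ins for pipeline components.
deployed = []

result_low = run_retraining(
    validate=lambda: None,
    train=lambda: "model-v2",
    evaluate=lambda model: {"accuracy": 0.81},
    deploy=deployed.append,
)
result_ok = run_retraining(
    validate=lambda: None,
    train=lambda: "model-v3",
    evaluate=lambda model: {"accuracy": 0.93},
    deploy=deployed.append,
)
```

Only the second run reaches the deployment step, which is the behavior the conditional is meant to guarantee.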

182

MCQhard

A team is training a large model using a custom container with TensorFlow on Vertex AI Training. They need to use multiple GPUs across several machines. Which strategy should they implement to maximize training throughput?

A.Use Cloud TPU Pods for distributed training

B.Use Dataflow for distributed training

C.Use Vertex AI Training with a custom job specifying workerPoolSpecs and MultiWorkerMirroredStrategy

D.Use a single worker with multiple GPUs and TensorFlow MirroredStrategy

AnswerC

MultiWorkerMirroredStrategy distributes across multiple machines.

Why this answer

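Vertex AI supports multi-worker distributed training when the custom job defines several worker pools in workerPoolSpecs; the service sets the TF_CONFIG environment variable on each replica, which MultiWorkerMirroredStrategy reads to find its peers. Using a single VM with multiple GPUs is limited by that machine's capabilities. 'MirroredStrategy' addresses single-machine multi-GPU training, not multi-machine training.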
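
MultiWorkerMirroredStrategy learns the cluster layout from the TF_CONFIG environment variable, a JSON document with a "cluster" and a "task" section. The sketch below parses such a value to show what each replica can infer about its role; the host names and the cluster shape are made up, and real training code would read `os.environ["TF_CONFIG"]` instead of a literal.

```python
import json


def describe_cluster(tf_config_json):
    """Summarize the cluster layout encoded in a TF_CONFIG value."""
    cfg = json.loads(tf_config_json)
    cluster = cfg["cluster"]
    task = cfg["task"]
    num_nodes = sum(len(hosts) for hosts in cluster.values())
    # With no explicit chief, worker 0 takes on the chief's duties.
    is_chief = task["type"] == "chief" or (
        "chief" not in cluster and task["type"] == "worker" and task["index"] == 0
    )
    return {
        "num_nodes": num_nodes,
        "task_type": task["type"],
        "index": task["index"],
        "is_chief": is_chief,
    }


# Hypothetical TF_CONFIG for one chief plus two workers.
example = json.dumps({
    "cluster": {"chief": ["host-0:2222"], "worker": ["host-1:2222", "host-2:2222"]},
    "task": {"type": "worker", "index": 1},
})
info = describe_cluster(example)
```

A common use of the chief flag is to let only one replica write checkpoints and the exported model.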

183

Multi-Selectmedium

Which TWO steps are required to deploy a custom scikit-learn model to Vertex AI for online predictions?

Select 2 answers

A.Write a custom prediction routine

B.Containerize the model using Docker

C.Save the model using joblib or pickle

D.Create a Vertex AI Endpoint manually

E.Upload the model to Vertex AI Model Registry

AnswersC, E

Vertex AI expects a saved model artifact.

Why this answer

Option C is correct because scikit-learn models must be serialized using joblib or pickle to be saved as a model artifact that can be uploaded to Vertex AI. Vertex AI's pre-built prediction containers for scikit-learn expect the model file to be in this format (typically model.joblib or model.pkl) to serve online predictions.

Exam trap

Google Cloud often tests the misconception that you must always write a custom prediction routine or containerize your model, when in fact Vertex AI provides pre-built containers for popular frameworks like scikit-learn, making steps A and B unnecessary for standard deployments.
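
Because the pre-built container looks for a specific file name, a small pre-upload check can catch a misnamed artifact early. This is a hypothetical helper, not part of any SDK, and the object paths are invented for the example.

```python
from pathlib import PurePosixPath

# File names mentioned above for scikit-learn artifacts.
ACCEPTED_SKLEARN_ARTIFACTS = ("model.joblib", "model.pkl")


def find_sklearn_artifact(object_names):
    """Return the first object whose file name is an accepted artifact name."""
    for name in object_names:
        if PurePosixPath(name).name in ACCEPTED_SKLEARN_ARTIFACTS:
            return name
    return None


# Hypothetical listings of an artifact directory.
good_listing = ["churn/v3/metadata.json", "churn/v3/model.joblib"]
bad_listing = ["churn/v3/metadata.json", "churn/v3/final_model.joblib"]

found = find_sklearn_artifact(good_listing)
missing = find_sklearn_artifact(bad_listing)
```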

184

MCQeasy

A data scientist wants to automate retraining of a classification model when new labeled data arrives. The model is deployed on AI Platform Prediction. Which Google Cloud service should be used to orchestrate the retraining pipeline?

A.AI Platform Prediction

B.AI Platform Pipelines

C.AI Platform Continuous Evaluation

D.Cloud Dataflow

AnswerB

AI Platform Pipelines provides a way to build and orchestrate ML pipelines.

Why this answer

AI Platform Pipelines (now Vertex AI Pipelines) is the correct service because it provides a fully managed, serverless orchestration engine for building, deploying, and running machine learning pipelines. It integrates with Kubeflow Pipelines and TensorFlow Extended (TFX) to automate the retraining workflow when new labeled data arrives, enabling continuous training and model versioning without manual intervention.

Exam trap

Google Cloud often tests the distinction between services that execute ML tasks (like prediction or evaluation) versus services that orchestrate the workflow; the trap here is that candidates confuse AI Platform Prediction (serving) or Cloud Dataflow (data processing) with pipeline orchestration, missing that AI Platform Pipelines is purpose-built for automating multi-step ML workflows.

How to eliminate wrong answers

Option A is wrong because AI Platform Prediction is a serving endpoint for deploying trained models to make predictions; it does not orchestrate retraining pipelines. Option C is wrong because AI Platform Continuous Evaluation is a service for monitoring model performance and detecting drift, not for orchestrating retraining workflows. Option D is wrong because Cloud Dataflow is a stream and batch data processing service (based on Apache Beam) used for data transformation and ETL, not for orchestrating end-to-end ML pipelines with conditional retraining logic.

185

MCQhard

A company has a production machine learning model deployed on Vertex AI Endpoint that predicts customer churn. The model is retrained weekly using a Vertex AI Pipeline that pulls new data from BigQuery. Recently, the model's accuracy has been declining. The data science team suspects data drift but is unsure. They have enabled Vertex AI Model Monitoring but have not set up any alerts. The team wants to diagnose and address the issue quickly. The pipeline runs successfully, and no errors are reported. The model endpoint is serving predictions with average latency of 200ms. What should the team do first?

A.Immediately trigger a retraining pipeline with more recent data

B.Increase the number of replicas to reduce latency

C.Examine Cloud Logging for prediction errors

D.Review Vertex AI Model Monitoring drift reports and set up alerts for significant drift

AnswerD

Directly addresses drift detection.

Why this answer

Option D is correct because the team has already enabled Vertex AI Model Monitoring, which automatically tracks feature distributions and prediction statistics over time. The first diagnostic step should be to review the drift reports generated by Model Monitoring to confirm whether data drift is occurring, and then set up alerts so the team is proactively notified of significant drift in the future. This directly addresses the suspected root cause without unnecessary operational changes.

Exam trap

Google Cloud often tests the misconception that any model performance decline must be fixed by immediate retraining or infrastructure scaling, when the correct first step is always to diagnose the root cause using the monitoring tools already in place.

How to eliminate wrong answers

Option A is wrong because blindly retraining with more recent data without first confirming data drift may waste resources and could even degrade model performance if the new data is not representative or contains label errors. Option B is wrong because increasing replicas addresses latency, not accuracy decline; the current 200ms latency is well within acceptable bounds and is unrelated to the accuracy problem. Option C is wrong because Cloud Logging captures prediction errors (e.g., runtime exceptions, invalid inputs), but the pipeline runs successfully with no errors, so examining logs for errors will not reveal gradual accuracy degradation caused by data drift.

186

MCQhard

A financial services company deploys a fraud detection model on Vertex AI using a custom prediction container that runs a PyTorch model. The model requires GPU acceleration. The deployment succeeds but predictions return an error: 'CUDA error: out of memory'. What should the team do to resolve this issue?

A.Change the container to use a CPU-only image to avoid CUDA errors

B.Increase the GPU machine type to one with more memory (e.g., from NVIDIA T4 to A100)

C.Enable Vertex AI Model Monitoring to automatically scale the endpoint

D.Add CPU replicas to distribute the inferencing load

AnswerB

The CUDA out of memory error indicates the current GPU cannot hold the model; a larger GPU or model optimization is needed.

Why this answer

Option B is correct because the GPU memory is insufficient; using a machine with more GPU memory or optimizing the model is the solution. Option C (enabling Model Monitoring) does not fix memory. Option D (adding CPU replicas) does not address GPU memory.

Option A (using a CPU-only image) would defeat the purpose of GPU acceleration.
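
A rough memory estimate helps decide between a larger GPU and model optimization. The sketch below is a back-of-the-envelope calculation only: the 20% overhead factor and the 6-billion-parameter model are assumptions, and real usage also depends on batch size and activations.

```python
# Memory sizes of the two GPUs, in GiB (T4: 16 GB, A100: 40 GB variant).
GPU_MEMORY_GIB = {"T4": 16, "A100-40GB": 40}


def estimated_inference_gib(num_parameters, bytes_per_parameter=4, overhead_factor=1.2):
    """Weights plus an assumed 20% overhead for activations and CUDA context."""
    return num_parameters * bytes_per_parameter * overhead_factor / 2**30


def fits(gpu, num_parameters, **kwargs):
    """True if the estimated footprint fits in the GPU's memory."""
    return estimated_inference_gib(num_parameters, **kwargs) <= GPU_MEMORY_GIB[gpu]


params = 6_000_000_000  # hypothetical model size

fits_t4_fp32 = fits("T4", params)                          # 32-bit weights
fits_a100_fp32 = fits("A100-40GB", params)                 # larger GPU
fits_t4_fp16 = fits("T4", params, bytes_per_parameter=2)   # half-precision weights
```

Under these assumptions the model does not fit on a T4 in 32-bit precision, fits on a 40 GB A100, and also fits on a T4 once the weights are stored in half precision.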

187

MCQeasy

Your company has a machine learning model that predicts customer churn. The model is deployed on Vertex AI Endpoints with autoscaling. After a marketing campaign, traffic to the endpoint increases by 10x. Some predictions start failing with 'HTTP 503 Service Unavailable' errors. What is the most likely cause?

A.The model container has a memory leak.

B.The model's accuracy has degraded due to data drift.

C.The autoscaling configuration has insufficient maximum nodes to handle the traffic.

D.The model is using an older version that is not supported.

AnswerC

Autoscaling with too few max nodes cannot scale up to meet demand, causing overload and 503 errors.

Why this answer

A 503 Service Unavailable error from Vertex AI Endpoints indicates that the endpoint is overwhelmed and cannot handle the incoming request volume. With a 10x traffic spike and autoscaling configured, the most likely cause is that the autoscaling configuration has insufficient maximum nodes, so the endpoint cannot scale out enough to handle the load, causing requests to be rejected.
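
The arithmetic behind the shortfall is easy to check. This sketch uses a simplified capacity model with hypothetical numbers (50 requests per second per replica, bounds of 2 to 5 replicas); a real autoscaler reacts to utilization targets rather than a fixed per-replica rate.

```python
import math


def replicas_needed(requests_per_second, capacity_per_replica):
    """Replicas required to absorb the load under the simplified capacity model."""
    return math.ceil(requests_per_second / capacity_per_replica)


def provisioned_replicas(requests_per_second, capacity_per_replica,
                         min_replicas, max_replicas):
    """Replica count after clamping demand to the configured autoscaling bounds."""
    needed = replicas_needed(requests_per_second, capacity_per_replica)
    return max(min_replicas, min(max_replicas, needed))


# Hypothetical traffic before and during the campaign (a 10x increase).
baseline_rps, spike_rps = 100, 1000

normal = provisioned_replicas(baseline_rps, 50, min_replicas=2, max_replicas=5)
during_spike = provisioned_replicas(spike_rps, 50, min_replicas=2, max_replicas=5)
shortfall = replicas_needed(spike_rps, 50) - during_spike
```

With these numbers the spike calls for 20 replicas while the configuration caps the endpoint at 5, so the excess requests have nowhere to go.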

Exam trap

Google Cloud often tests the distinction between model-level errors (e.g., data drift, accuracy degradation) and infrastructure-level errors (e.g., 503, 429, timeout), so the trap here is that candidates confuse a model performance issue with a scaling/availability issue.

How to eliminate wrong answers

Option A is wrong because a memory leak in the model container would cause gradual performance degradation or OOM kills, not a sudden 503 error under high traffic; Vertex AI would still attempt to serve requests until the container crashes. Option B is wrong because data drift affects prediction accuracy (e.g., wrong predictions), not the availability or HTTP status of the endpoint; 503 errors are infrastructure-level, not model-level. Option D is wrong because using an unsupported older version would cause deployment or startup failures, not transient 503 errors under load; Vertex AI would reject the deployment or return a different error (e.g., 400 or 404) if the version is incompatible.

188

Multi-Selectmedium

Which THREE metrics should be monitored to detect model drift in a production ML system?

Select 3 answers

A.Training loss convergence.

B.Prediction distribution (prediction drift).

C.Feature distribution (data drift).

D.CPU utilization of the serving nodes.

E.Model performance metrics (e.g., accuracy, precision, recall) on a ground truth dataset.

AnswersB, C, E

Changes in prediction distribution can indicate concept drift.

Why this answer

Prediction drift (distribution of predictions), feature drift (distribution of input features), and model performance metrics (e.g., accuracy) are key indicators. Infrastructure metrics (CPU usage) and training loss are not directly drift indicators.

189

MCQeasy

A user gets the above error when trying to get online predictions. The model was created and the endpoint exists. What is the most likely reason?

A.The endpoint does not exist.

B.The endpoint is in a different region than the model.

C.No version of the model is deployed to the endpoint.

D.The model does not exist.

AnswerC

A model must be deployed (a model version) to the endpoint to serve predictions.

Why this answer

Correct: C. A model must be deployed to the endpoint before it can serve predictions. Option A is wrong because the endpoint exists.

Option D is wrong because the model exists. Option B is wrong because a region mismatch would produce a different error.

190

MCQhard

Refer to the exhibit. A team is trying to run a custom prediction container on Vertex AI Endpoint. They get this error when the container starts. What is the most likely cause?

A.The container image is too large

B.The entry point is missing or incorrect

C.The container is built for a different CPU architecture

D.The model file is missing from the container

AnswerB

The error message directly states to ensure the container has an entry point.

Why this answer

The error occurs when the container starts, which typically happens during the initial health check or readiness probe. Vertex AI Endpoints require a valid entry point (e.g., CMD or ENTRYPOINT in the Dockerfile) to start the prediction server. If the entry point is missing or incorrect, the container fails to launch, resulting in the observed error.
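
For context, this is roughly what an entry point has to launch: an HTTP server that answers a health route and a predict route. The sketch uses only the standard library and reads the AIP_HTTP_PORT, AIP_HEALTH_ROUTE, and AIP_PREDICT_ROUTE environment variables that Vertex AI provides to custom containers; the fallback values and the placeholder `predict` function are assumptions for local use.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Fallback values are assumptions for running outside Vertex AI.
HEALTH_ROUTE = os.environ.get("AIP_HEALTH_ROUTE", "/health")
PREDICT_ROUTE = os.environ.get("AIP_PREDICT_ROUTE", "/predict")
PORT = int(os.environ.get("AIP_HTTP_PORT", "8080"))


def predict(instances):
    """Placeholder model: returns the length of each instance."""
    return [len(x) for x in instances]


class Handler(BaseHTTPRequestHandler):
    def _send(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Health checks succeed only on the configured health route.
        self._send(200 if self.path == HEALTH_ROUTE else 404, {})

    def do_POST(self):
        if self.path != PREDICT_ROUTE:
            return self._send(404, {})
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        self._send(200, {"predictions": predict(request["instances"])})


def main():
    """Start serving; the container's entry point must end up calling this."""
    HTTPServer(("0.0.0.0", PORT), Handler).serve_forever()
```

If the image defines no ENTRYPOINT or CMD that runs a script calling `main()`, nothing listens on the port and the health check never passes.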

Exam trap

Google Cloud often tests the distinction between container startup failures (entry point issues) and runtime failures (missing model files or architecture mismatches), leading candidates to confuse a missing model file with a startup error.

How to eliminate wrong answers

Option A is wrong because container image size does not prevent startup; Vertex AI supports images up to 10 GB, and a large image would only affect pull time, not the container's ability to start. Option C is wrong because CPU architecture mismatch would cause a runtime crash or 'exec format error' during execution, not a startup failure, and Vertex AI uses x86_64 architecture by default. Option D is wrong because a missing model file would cause a runtime error during prediction (e.g., 404 or model load failure), not a container startup failure, as the container can still start and listen for requests.

191

MCQmedium

A company uses a custom container image for model serving. The image is large (10 GB). During deployment, they get timeouts. What should they do?

A.Pre-pull the image on all nodes

B.Increase the timeout in the deployment config

C.Switch to a larger machine type

D.Use a smaller base image

AnswerD

Smaller image reduces pull time and deployment time.

Why this answer

Option D is correct because using a smaller base image directly addresses the root cause of the timeout: the 10 GB image takes too long to download from the container registry during pod startup. By reducing the image size (e.g., using a slim or distroless base image), the pull time decreases, avoiding the default kubelet image pull timeout (typically 5 minutes) without requiring infrastructure changes.

Exam trap

Google Cloud often tests the misconception that increasing timeouts or scaling up hardware solves performance bottlenecks, when the correct answer is to optimize the artifact itself (image size) to meet the system's implicit constraints.

How to eliminate wrong answers

Option A is wrong because pre-pulling the image on all nodes is a manual workaround that does not solve the underlying issue of a bloated image; it also adds operational overhead and fails in dynamic clusters where new nodes are added. Option B is wrong because increasing the timeout in the deployment config (e.g., the `imagePullPolicy` or pod-level timeout) only masks the symptom and does not reduce the pull time, potentially leading to other timeouts in the cluster. Option C is wrong because switching to a larger machine type does not affect the network transfer time for pulling the image; it only provides more local resources, which does not address the slow image download.
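
A quick estimate shows why image size dominates here. The sketch assumes a hypothetical effective registry throughput of 25 MB/s and compares the result with the 5-minute figure mentioned above; it ignores layer caching and decompression time.

```python
def pull_seconds(image_gb, throughput_mb_per_s):
    """Estimated download time for an image of the given size (1 GB = 1000 MB)."""
    return image_gb * 1000 / throughput_mb_per_s


TIMEOUT_S = 300            # the 5-minute figure mentioned above
THROUGHPUT_MB_PER_S = 25   # hypothetical effective pull throughput

large_image_s = pull_seconds(10, THROUGHPUT_MB_PER_S)   # the 10 GB image
slim_image_s = pull_seconds(1.5, THROUGHPUT_MB_PER_S)   # assumed size after slimming

large_times_out = large_image_s > TIMEOUT_S
slim_times_out = slim_image_s > TIMEOUT_S
```

Under these assumptions the 10 GB image needs about 400 seconds and misses the window, while a 1.5 GB image pulls in about a minute.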
