
Google PMLE Scaling Prototypes into ML Models Questions

75 of 99 questions · Page 1/2 · Scaling Prototypes into ML Models · Answers revealed


1

MCQ · medium

Your Vertex AI custom training job is failing with an out-of-memory error on a single GPU. You need to reduce memory usage without changing the model architecture. Which approach should you try first?

A.Decrease the batch size

B.Implement model parallelism across GPUs

C.Use gradient accumulation

D.Enable mixed precision training (FP16)

Answer: A, D

Decreasing batch size reduces memory but may affect convergence; it is a valid approach but mixed precision is often tried first.

Why this answer

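Mixed precision (FP16) training stores most activations and gradients in 16 bits instead of 32, roughly halving their memory footprint. Reducing the batch size also lowers memory directly, but it changes the optimization dynamics. Gradient accumulation offsets that by summing gradients over several smaller micro-batches, which preserves the effective batch size; on its own it does not lower per-step memory unless the micro-batch is made smaller.

Model parallelism is more complex to set up. The simplest first step is to use mixed precision.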
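As a rough illustration of these trade-offs, the sketch below estimates activation memory for a single training step. The per-sample activation count and the batch sizes are illustrative assumptions, not measurements from any real model, and real mixed precision setups usually keep some tensors (such as master weights) in FP32, so the saving is typically somewhat less than half.

```python
def activation_bytes(batch_size: int, values_per_sample: int, bytes_per_value: int) -> int:
    """Back-of-envelope activation memory for one forward/backward step."""
    return batch_size * values_per_sample * bytes_per_value


VALUES_PER_SAMPLE = 50_000_000  # hypothetical number of activation values per example
FP32_BYTES, FP16_BYTES = 4, 2

baseline = activation_bytes(64, VALUES_PER_SAMPLE, FP32_BYTES)
mixed_precision = activation_bytes(64, VALUES_PER_SAMPLE, FP16_BYTES)
smaller_batch = activation_bytes(32, VALUES_PER_SAMPLE, FP32_BYTES)

# Gradient accumulation: two micro-batches of 32 give the same effective
# batch size as one step of 64, at the per-step memory of a batch of 32.
accumulation_steps = 2
effective_batch = 32 * accumulation_steps

for name, size in [("baseline", baseline),
                   ("mixed precision", mixed_precision),
                   ("smaller batch", smaller_batch)]:
    print(f"{name}: {size / 2**30:.1f} GiB")
```

Under these assumptions, mixed precision and a halved batch size give the same per-step saving, but only mixed precision leaves the batch size untouched.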


2

MCQ · medium

An ML engineer is using Vertex AI Vizier to tune hyperparameters for a custom training job. The training job takes 2 hours per trial. To speed up the process, the engineer wants to run 10 trials in parallel. What is the correct way to configure parallel trial execution?

A.Use the '--parallel-trials' flag in the gcloud ai hp-tuning-jobs create command

B.Set the 'parallelTrialCount' parameter in the study configuration to 10

C.Set the 'maxParallelTrials' attribute in the HyperparameterSpec

D.Create a CustomJob with 'numTrials' set to 10 and 'parallel' flag

Answer: B

This is the correct parameter to specify the number of parallel trials.

Why this answer

In Vertex AI, parallel trial execution is configured with the 'parallelTrialCount' field, which sits next to 'maxTrialCount' and 'studySpec' on the HyperparameterTuningJob resource. 'maxParallelTrials' is not a Vertex AI field; it belongs to the HyperparameterSpec of the legacy AI Platform Training API. Setting parallelTrialCount to 10 allows up to 10 trials to run concurrently.
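A minimal sketch of where the setting lives, written as a plain Python dictionary shaped like a HyperparameterTuningJob request body. The field names follow the REST resource as best I understand it and should be checked against the current API reference; the display name, metric ID, parameter, and image URI are hypothetical placeholders.

```python
import json


def build_tuning_job(max_trials: int, parallel_trials: int) -> dict:
    """Assemble a request body; the parallel count sits beside studySpec, not inside it."""
    if parallel_trials > max_trials:
        raise ValueError("parallel trials cannot exceed the total trial budget")
    return {
        "displayName": "example-hp-tuning",  # hypothetical name
        "maxTrialCount": max_trials,
        "parallelTrialCount": parallel_trials,
        "studySpec": {
            "metrics": [{"metricId": "val_accuracy", "goal": "MAXIMIZE"}],
            "parameters": [{
                "parameterId": "learning_rate",  # hypothetical parameter
                "doubleValueSpec": {"minValue": 1e-5, "maxValue": 1e-1},
                "scaleType": "UNIT_LOG_SCALE",
            }],
        },
        "trialJobSpec": {
            "workerPoolSpecs": [{
                "machineSpec": {"machineType": "n1-standard-4"},
                "replicaCount": 1,
                "containerSpec": {"imageUri": "IMAGE_URI_PLACEHOLDER"},
            }],
        },
    }


job = build_tuning_job(max_trials=50, parallel_trials=10)
print(json.dumps(job, indent=2))
```

More parallel trials shorten wall-clock time, but each new trial is then suggested with fewer completed results to learn from, so a very high parallel count can make the search less efficient.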


3

Multi-Select · medium

A data science team is building a real-time feature engineering pipeline for ML model training and serving. They need to compute features from streaming data, store them for low-latency serving, and ensure consistency between training and serving. Which TWO Google Cloud services should they use?

Select 2 answers

A.Vertex AI Feature Store

B.BigQuery

C.Cloud Functions

D.Cloud Dataflow

E.Cloud SQL

Answers: A, D

Feature Store provides low-latency serving and ensures consistent feature definitions for training and serving.

Why this answer

Vertex AI Feature Store (A) is correct because it provides a centralized repository for storing, serving, and sharing feature data with low-latency online serving and batch serving for training, ensuring consistency between training and serving through point-in-time lookups and feature value time-stamping. Cloud Dataflow (D) is correct because it is a fully managed stream and batch processing service based on Apache Beam, enabling real-time feature engineering from streaming data with exactly-once processing semantics and automatic scaling.

Exam trap

A common trap in Google PMLE exams is assuming BigQuery can serve as a low-latency online feature store for real-time inference, but it is designed for analytical queries with seconds-to-minutes latency, not sub-millisecond serving required for real-time ML inference.


4

MCQ · medium

A team is using TensorFlow Transform (tf.Transform) to create preprocessing functions that will be used both in training and serving. They want to ensure consistency. Which artifact should they save after analyzing the training data?

A.A trained model checkpoint.

B.An analyzed_dataset directory with statistics.

C.A flattened schema file (schema.pbtxt).

D.A transform_fn SavedModel.

Answer: D

The transform_fn is the output of tf.Transform that applies the same transformation to new data.

Why this answer

The correct artifact to save after analyzing training data with tf.Transform is the `transform_fn` SavedModel. This SavedModel encapsulates the exact preprocessing logic (e.g., scaling, normalization, vocabulary mapping) computed from the training dataset, ensuring that the same transformations are applied consistently during both training and serving. Without this artifact, the serving pipeline would need to recompute or duplicate the transformation logic, risking skew between training and inference.
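The analyze-once, apply-everywhere idea can be sketched in plain Python. This is not the tf.Transform API: the functions `analyze` and `apply_transform` and the JSON file are hypothetical stand-ins for the analysis pass and the saved `transform_fn`, used only to show why a persisted artifact prevents training/serving skew.

```python
import json
import os
import statistics
import tempfile


def analyze(values):
    """Full pass over the training data: compute the statistics once."""
    return {"mean": statistics.fmean(values), "stdev": statistics.pstdev(values)}


def apply_transform(value, artifact):
    """Scale a value with the saved statistics; identical in training and serving."""
    return (value - artifact["mean"]) / artifact["stdev"]


train_values = [2.0, 4.0, 6.0, 8.0]
artifact = analyze(train_values)

# Persist the artifact next to the model (a temp directory stands in for real storage).
path = os.path.join(tempfile.mkdtemp(), "transform_artifact.json")
with open(path, "w") as f:
    json.dump(artifact, f)

# The serving side loads the artifact instead of recomputing statistics.
with open(path) as f:
    serving_artifact = json.load(f)

train_scaled = [apply_transform(v, artifact) for v in train_values]
serve_scaled = apply_transform(6.0, serving_artifact)
```

Because serving reads the same saved statistics, a value seen at inference time is scaled exactly as it was during training.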

Exam trap

A common pitfall is to confuse intermediate analysis outputs (like statistics or schema) with the executable artifact (the SavedModel) that actually applies the transformation, leading candidates to mistakenly select the schema or statistics directory as the key artifact.

How to eliminate wrong answers

Option A is wrong because a trained model checkpoint stores the model weights and optimizer state after training, not the preprocessing function; it is used for resuming training or inference, not for ensuring consistent data transformations. Option B is wrong because an analyzed_dataset directory with statistics (e.g., mean, variance) is an intermediate output used to compute the transformation, but it is not the executable artifact that applies the transform; the actual transformation logic must be saved as a SavedModel. Option C is wrong because a flattened schema file (schema.pbtxt) describes the data schema (e.g., feature names, types, shapes) but does not contain the transformation operations; it is used for validation and metadata, not for executing the preprocessing pipeline.


5

MCQ · medium

A team is scaling a prototype ML model to production on Vertex AI. The model was developed using scikit-learn and requires custom preprocessing. They want to minimize operational overhead and ensure consistency between training and serving. Which approach should they use?

A.Train on a local machine and upload the model artifacts to Cloud Storage, then create an endpoint with a pre-built container.

B.Use a pre-built Vertex AI container for scikit-learn and provide a custom training Python package with preprocessing code included.

C.Deploy the model as a custom prediction routine on Vertex AI Endpoints with a custom container.

D.Export the model as a .pkl file and use Vertex AI's 'Import Model' with a default container for inference.

Answer: B

Pre-built containers reduce overhead; custom package handles preprocessing, ensuring consistency.

Why this answer

Option B is correct because using a pre-built Vertex AI container for scikit-learn with a custom training Python package ensures that the same preprocessing code runs during both training and serving, minimizing operational overhead. This approach leverages Vertex AI's managed infrastructure to handle scaling, monitoring, and consistency without requiring custom container maintenance.

Exam trap

The trap here is that candidates often assume a pre-built container cannot handle custom preprocessing, leading them to choose a custom container (Option C) or a simpler import (Option D), but Vertex AI allows embedding preprocessing in the training package or model artifact to maintain consistency with minimal overhead.

How to eliminate wrong answers

Option A is wrong because training on a local machine and uploading model artifacts to Cloud Storage, then creating an endpoint with a pre-built container, does not guarantee consistency between training and serving preprocessing logic, as the preprocessing code is not bundled with the model. Option C is wrong because deploying the model as a custom prediction routine with a custom container introduces unnecessary operational overhead for a scikit-learn model that can be served with a pre-built container, and it requires building and maintaining a custom Docker image. Option D is wrong because exporting the model as a .pkl file and using Vertex AI's 'Import Model' with a default container for inference does not include custom preprocessing code, leading to potential inconsistencies between training and serving.


6

Multi-Select · medium

You need to reduce the cost of training a large model on Vertex AI while maintaining fault tolerance. Which THREE actions should you take? (Choose 3)

Select 3 answers

A.Use spot VMs

B.Use MultiWorkerMirroredStrategy

C.Use TPUs instead of GPUs

D.Enable checkpointing and save to Cloud Storage

E.Use a single worker with multiple GPUs instead of multiple workers

Answers: A, D, E

Spot VMs are discounted instances that can be preempted.

Why this answer

Spot VMs are cheaper but can be preempted; using checkpointing and saving to a durable storage (like GCS) allows recovery. Also, using a single node with multiple GPUs may be cheaper than multiple nodes.


7

MCQ · medium

You are performing post-training quantisation of a trained TensorFlow model to INT8 for deployment on edge devices. Which technique should you use to minimise accuracy loss?

A.Float16 quantisation

B.Quantisation-aware training

C.Post-training integer quantisation with calibration

D.Post-training dynamic range quantisation

Answer: B

Trains the model to be robust to quantisation, minimising accuracy loss.

Why this answer

Quantisation-aware training (QAT) simulates quantisation during training, allowing the model to adapt, resulting in higher accuracy than post-training quantisation. Post-training quantisation is simpler but may lose accuracy.


8

Multi-Select · hard

A data scientist is training a very large neural network using Vertex AI with multiple GPUs across multiple nodes. The model does not fit on a single GPU, so they need to use both data parallelism and model parallelism (pipeline parallelism). Which THREE components or configurations are required to set up distributed training with Vertex AI?

Select 3 answers

A.Using Vertex AI Vizier to optimize the model parallelism strategy

B.Enabling Vertex AI AutoML to automatically distribute the model

C.Implementing pipeline parallelism manually in the training script using torch.distributed.pipeline.sync.Pipe

D.A custom container with the distributed framework (e.g., PyTorch DDP) installed

E.Setting the --worker-machine-count flag when submitting the job

Answers: C, D, E

Manual implementation of pipeline parallelism is required as Vertex AI does not provide built-in model parallelism.

Why this answer

Option C is correct because pipeline parallelism requires explicit implementation in the training script, such as using `torch.distributed.pipeline.sync.Pipe` in PyTorch, to split the model layers across multiple GPUs. This is necessary when the model does not fit on a single GPU, and Vertex AI does not automatically handle model parallelism—it must be coded by the user.

Exam trap

This question tests the misconception that Vertex AI automatically handles model parallelism (e.g., via AutoML or Vizier), when in reality the user must manually implement it in the training script using frameworks like PyTorch or TensorFlow.


9

MCQ · hard

An ML engineer is training a very large PyTorch model on Vertex AI using a TPU v3 pod. The training is slower than expected, and the TPU utilization is low. What is the most likely cause?

A.The data pipeline is a bottleneck; the TPU is waiting for data.

B.The learning rate schedule is too aggressive.

C.The model is using a single TensorFlow operation not supported by TPU.

D.The batch size is too large for the TPU memory.

Answer: A

TPUs are fast; insufficient data throughput leads to idle time.

Why this answer

The most likely cause of low TPU utilization is a data pipeline bottleneck, where the TPU spends a significant amount of time idle waiting for the next batch of data to be loaded and preprocessed. TPU v3 pods are designed for high-throughput matrix operations and can process data far faster than a typical CPU-based data loader can supply it, especially if the data pipeline uses inefficient I/O, lacks prefetching, or has insufficient workers. This mismatch starves the TPU, leading to low utilization and slower training.
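Prefetching is the usual remedy: load the next batches in the background while the accelerator works on the current one. The sketch below shows the idea with the standard library only; `prefetch` and `load_batches` are hypothetical helpers, and in practice one would reach for the framework's own facilities (for example `tf.data.Dataset.prefetch` or a PyTorch `DataLoader` with `num_workers` set).

```python
import queue
import threading


def prefetch(iterable, buffer_size=4):
    """Yield items from `iterable`, loading up to `buffer_size` ahead in a background thread."""
    buffer = queue.Queue(maxsize=buffer_size)
    sentinel = object()  # marks the end of the stream

    def producer():
        for item in iterable:
            buffer.put(item)  # blocks when the buffer is full
        buffer.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()
        if item is sentinel:
            return
        yield item


def load_batches(n):
    """Stand-in for reading and preprocessing batches from storage."""
    for i in range(n):
        yield [i, i]


batches = list(prefetch(load_batches(5)))
```

The consumer only waits when the buffer is empty, so slow I/O is hidden as long as the producer keeps up on average.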

Exam trap

Google often tests the misconception that low utilization is caused by model architecture or hyperparameter issues, when in reality the most common bottleneck in distributed TPU training is the data pipeline, not the compute or memory limits.

How to eliminate wrong answers

Option B is wrong because an aggressive learning rate schedule may cause training instability or divergence, but it does not directly cause low TPU utilization; utilization is a measure of hardware activity, not training convergence. Option C is wrong because the question explicitly states the model is a PyTorch model, and while PyTorch has limited TPU support compared to TensorFlow, the issue is not a single unsupported operation—such an operation would typically raise an error or fall back to CPU, not cause low utilization across the entire pod. Option D is wrong because a batch size that is too large for TPU memory would cause an out-of-memory (OOM) error, not low utilization; the TPU would fail to allocate the batch, not run slowly.


10

Multi-Select · medium

A company wants to train a custom machine learning model on Vertex AI using a pre-built container for scikit-learn. They want to use spot VMs to reduce costs. However, the training job fails intermittently due to preemption. Which TWO actions should they take to ensure the training job completes successfully?

Select 2 answers

A.Use a larger machine type to reduce training time

B.Increase the number of parallel trials in hyperparameter tuning

C.Set the worker_pool_specs to use spot VMs by setting spot=True

D.Set the max_retry_count in the worker pool spec to a value greater than 0

E.Implement checkpointing in the training code to save model state periodically to Cloud Storage

Answers: D, E

Vertex AI will retry the job if preempted up to max_retry_count times.

Why this answer

To handle spot VM preemptions, the training job must be restartable. Using checkpoints allows the job to resume from the last saved state. Vertex AI automatically retries on preemption if the job is restartable (managed by the service).

Setting max_retry_count in the worker pool spec allows Vertex AI to automatically restart the job after preemption. Using a larger machine type or increasing the number of parallel trials does not make the job resilient to preemption.
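What "restartable" means for the training code can be sketched as follows. This is a toy loop under stated assumptions: a local JSON file stands in for a checkpoint in Cloud Storage, the `train` function and its `preempt_at` argument are hypothetical, and a real job would checkpoint every N steps rather than every step.

```python
import json
import os
import tempfile


def train(total_steps, ckpt_path, preempt_at=None):
    """Run a toy training loop, resuming from `ckpt_path` if a checkpoint exists."""
    state = {"step": 0, "loss_sum": 0.0}
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)  # resume instead of starting over

    while state["step"] < total_steps:
        if preempt_at is not None and state["step"] == preempt_at:
            raise RuntimeError("simulated preemption")
        state["step"] += 1
        state["loss_sum"] += 1.0 / state["step"]  # stand-in for a training step

        # Write to a temporary file and rename, so a preemption
        # mid-write never leaves a corrupt checkpoint behind.
        tmp_path = ckpt_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, ckpt_path)
    return state


ckpt = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
try:
    train(10, ckpt, preempt_at=6)  # first attempt is interrupted
except RuntimeError:
    pass
final = train(10, ckpt)  # the retry picks up at step 6
```

With this structure an automatic retry costs only the work done since the last checkpoint, which is what makes spot VMs practical for long jobs.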


11

Multi-Select · medium

A company is deploying a TensorFlow model on Vertex AI Prediction. The model is memory-intensive and requires GPU acceleration. The team wants to minimize latency and cost. Which TWO configurations should they select? (Select 2)

Select 2 answers

A.Use NVIDIA T4 GPUs

B.Enable autoscaling with a minimum of 1 instance

C.Use NVIDIA A100 GPUs for faster inference

D.Set manual machine count to 10 for consistent performance

E.Use batch prediction to reduce cost

Answers: A, B

T4 is optimized for inference and cost-effective.

Why this answer

For GPU-accelerated prediction, the T4 GPU is cost-effective and provides good performance for inference. Autoscaling with a minimum of 1 instance ensures availability while allowing the service to scale down when not in use. A100 is more expensive; batch prediction is for asynchronous large-scale jobs; manual scaling may lead to over-provisioning.


12

MCQ · easy

A company wants to bring their own Docker container to Vertex AI for training a model with a custom framework. They need to ensure the container is compatible with the Vertex AI training service. What is the minimum requirement for the container?

A.The container must be based on a Google-provided base image

B.The container must expose an HTTP server on port 8080 that responds to health checks and starts the training job when called

C.The container must expose an HTTP server on port 8080 and handle /health and /predict endpoints

D.The container must include a Python script that reads command-line arguments and outputs a trained model to /tmp/model

Answer: B

Vertex AI sends a health check to the container's port 8080; the container must respond appropriately and then execute training.

Why this answer

Vertex AI requires the container to run a simple HTTP server on port 8080 (default) to respond to health checks and accept training requests. The container must be stored in Artifact Registry or Container Registry.


13

MCQ · medium

A company wants to use Vertex AI Vizier to tune hyperparameters for a PyTorch model. They have a limited budget of 50 training jobs. The objective metric is validation accuracy, and they want to find the best configuration efficiently. Which algorithm should they choose?

A.Bayesian optimization using Vertex AI Vizier.

B.Random search with 50 random configurations.

C.Use a custom algorithm implemented in the training code.

D.Grid search with 50 evenly spaced points.

Answer: A

Bayesian optimization is designed for efficient search with limited trials.

Why this answer

Bayesian optimization is the most efficient algorithm for hyperparameter tuning when the number of trials is limited. It builds a probabilistic model of the objective function and selects promising configurations.


14

Multi-Select · medium

An ML engineer is using Vertex AI for distributed training of a PyTorch model across multiple nodes. The training job must use TPUs for high throughput. The engineer sets up the job configuration. Which THREE components are required for the training to work correctly? (Select 3)

Select 3 answers

A.A startup script to configure the TPU pod (e.g., `xla_lib.sh`)

B.A MultiWorkerMirroredStrategy configuration

C.A Docker image that includes PyTorch and the TPU library (torch-xla)

D.A TF_CONFIG environment variable set for each worker

E.A CustomJob with a TPU accelerator type (e.g., v3-32)

Answers: A, C, E

Startup scripts are often needed to initialize TPU devices.

Why this answer

A is correct because TPU pods require a startup script (e.g., `xla_lib.sh`) to initialize the XLA runtime, configure the TPU mesh, and set environment variables like `XRT_TPU_CONFIG`. Without this script, the TPU devices will not be discoverable by the PyTorch/XLA process, causing the training to fail with device-not-found errors.

Exam trap

Google Cloud often tests the distinction between TensorFlow and PyTorch distributed training configurations, and the trap here is assuming that `TF_CONFIG` or `MultiWorkerMirroredStrategy` are universal for all frameworks, when in fact PyTorch uses its own environment variables and the `torch-xla` library for TPU training.


15

MCQ · medium

A machine learning team is training a large transformer model on Vertex AI. They need to reduce training time by utilizing multiple GPUs across nodes, but the model is too large to fit into a single GPU memory. Which distributed training strategy should they use?

A.Model parallelism using tf.distribute.experimental.PipelineMirroredStrategy

B.Data parallelism using tf.distribute.MirroredStrategy

C.Multi-worker mirrored strategy with a single worker per node

D.Hyperparameter tuning with Vertex AI Vizier

Answer: A

PipelineMirroredStrategy implements pipeline parallelism, which splits the model across GPUs, reducing per-device memory footprint. This is appropriate for models too large for a single GPU.

Why this answer

Option A is correct because PipelineMirroredStrategy combines model parallelism (splitting the transformer layers across multiple GPUs) with pipeline parallelism to handle models that exceed single GPU memory. This strategy partitions the model into stages, each placed on a different GPU, and uses micro-batching to keep all GPUs busy, which is essential for large transformer models that cannot fit into a single GPU's memory.

Exam trap

The trap here is that candidates often confuse data parallelism (which requires the model to fit in a single GPU) with model parallelism, and assume that multi-worker strategies inherently solve memory constraints, but they only distribute data, not the model itself.

How to eliminate wrong answers

Option B is wrong because data parallelism (MirroredStrategy) replicates the entire model on each GPU, which fails if the model is too large to fit into a single GPU memory. Option C is wrong because multi-worker mirrored strategy with a single worker per node still relies on data parallelism and does not address the model size constraint; it only scales across nodes for data parallelism. Option D is wrong because hyperparameter tuning with Vertex AI Vizier optimizes hyperparameters, not the distributed training strategy for fitting a large model across GPUs.


16

MCQ · easy

An ML engineer wants to use Vertex AI Model Garden to deploy a pre-trained foundation model for text summarisation. What is the quickest way to achieve this?

A.Use Vertex AI AutoML for text summarisation

B.Use Vertex AI Pipelines to build a custom training pipeline

C.Export the model from Model Garden and deploy using a custom container

D.Use Vertex AI JumpStart to deploy the model with one click

Answer: D

JumpStart offers one-click deployment of foundation models.

Why this answer

Vertex AI JumpStart provides one-click deployment of foundation models like Llama, Gemini, etc. Model Garden is for discovery, but JumpStart directly deploys.


17

MCQ · medium

You are fine-tuning a pre-trained BERT model from Hugging Face for a sentiment analysis task using Vertex AI training. The dataset has 100k examples. To avoid catastrophic forgetting, which layer freezing strategy should you apply?

A.Unfreeze all layers and fine-tune the entire model

B.Freeze the first 6 layers, fine-tune the last 6 layers

C.Freeze all layers except the classification head

D.Use transfer learning only on the embeddings layer

Answer: A

With 100k examples, full fine-tuning is feasible and yields better performance.

Why this answer

For fine-tuning with a sufficiently large dataset, it is common to unfreeze all layers to adapt the model to the new task. Freezing many layers is typical for very small datasets. With 100k examples, full fine-tuning is appropriate.


18

Multi-Select · hard

Your team is deploying a large model on edge devices and needs to reduce its size by 80% while maintaining reasonable accuracy. Which THREE techniques should they consider? (Choose 3.)

Select 3 answers

A.Quantisation to INT8

B.Transfer learning from a larger model

C.Knowledge distillation

D.Increasing model capacity with more layers

E.Pruning of redundant connections

Answers: A, C, E

Reduces model size by reducing precision of weights.

Why this answer

Quantisation to INT8 reduces the precision of model weights and activations from 32-bit floating point to 8-bit integers, cutting memory usage by approximately 75% (4x compression). This directly addresses the 80% size reduction target while often preserving accuracy within 1-2% through careful calibration and scaling, making it a primary technique for edge deployment.
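The 4x figure and the role of the scale factor can be seen in a small sketch. It uses symmetric per-tensor quantisation, which is one common scheme and an assumption here; the weights are made-up values, and production toolchains also calibrate activations, which this sketch omits.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantisation: map floats to integers in [-127, 127].

    Assumes at least one weight is non-zero, otherwise the scale would be 0.
    """
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.02, 1.0]  # hypothetical FP32 weights
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

fp32_bytes = len(weights) * 4  # 32 bits per weight
int8_bytes = len(weights) * 1  # 8 bits per weight
reduction = 1 - int8_bytes / fp32_bytes
max_error = max(abs(w, ) if False else abs(w - r) for w, r in zip(weights, restored))

print(f"size reduction: {reduction:.0%}, max rounding error: {max_error:.4f}")
```

Since INT8 alone yields a 75% reduction, reaching an 80% target generally means combining it with pruning or distillation, which is why the question asks for three techniques.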

Exam trap

Google Cloud often tests the misconception that transfer learning reduces model size, when in fact it only transfers learned features and does not compress the model; candidates may confuse it with knowledge distillation.


19

MCQ · medium

A data scientist wants to train a PyTorch model on Vertex AI using a pre-built container for GPU training. She needs to use 4 NVIDIA A100 GPUs on a single machine. Which machine configuration should she select?

A.n1-highmem-16 with 4 NVIDIA V100 GPUs

B.n1-standard-16 with 4 NVIDIA T4 GPUs

C.a2-highgpu-4g (4 A100 GPUs)

D.a2-megagpu-16g (16 A100 GPUs)

Answer: C

This machine type is specifically for A100 GPUs, providing 4 GPUs as required.

Why this answer

Vertex AI offers pre-built containers for PyTorch that support GPU training. A100 GPUs are available only on A2 machine types, and 'a2-highgpu-4g' provides exactly 4 of them. The N1 machine types in options A and B can attach up to 4 GPUs, but those are V100 and T4 GPUs respectively, not A100s.

The 'a2-megagpu-16g' provides 16 A100 GPUs, four times more than required.


20

Multi-Select · hard

An engineer is designing a distributed training job on Vertex AI for a TensorFlow model that uses the MultiWorkerMirroredStrategy. They need to ensure proper communication between workers. Which environment variable must be set correctly for each worker?

Select 1 answer

A.CLUSTER_SPEC

B.TF_CPP_MIN_LOG_LEVEL

C.TF_CONFIG_JSON

D.TF_DISTRIBUTED_STRATEGY

E.TF_CONFIG

Answer: E

Required to specify cluster spec and task identity.

Why this answer

In TensorFlow distributed training with MultiWorkerMirroredStrategy, `TF_CONFIG` (option E) is the environment variable that matters. It is a JSON string that provides the cluster topology and the identity of the current task, enabling gRPC communication between workers. `TF_DISTRIBUTED_STRATEGY` (option D) is not a variable that TensorFlow reads; the strategy is created in code with `tf.distribute.MultiWorkerMirroredStrategy()`.

`TF_CONFIG` must be set correctly on every worker, with a different task index on each, for communication and strategy initialization to succeed.
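For reference, `TF_CONFIG` is a JSON string with a `cluster` section and a `task` section. The sketch below only shows its shape: the host names and the helper `make_tf_config` are hypothetical, and on Vertex AI the service normally populates this variable on each replica, so setting it by hand is mainly useful for local testing.

```python
import json
import os


def make_tf_config(worker_hosts, task_index):
    """Build the TF_CONFIG JSON string for one worker in a multi-worker cluster."""
    return json.dumps({
        "cluster": {"worker": worker_hosts},               # same on every worker
        "task": {"type": "worker", "index": task_index},   # unique per worker
    })


hosts = ["worker-0.example:2222", "worker-1.example:2222"]  # hypothetical addresses
os.environ["TF_CONFIG"] = make_tf_config(hosts, task_index=0)

parsed = json.loads(os.environ["TF_CONFIG"])
print(parsed["task"])
```

The cluster section is identical across workers; only the task index differs, which is how each process learns its own identity.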

Exam trap

The exam often tests the distinction between the actual environment variable name (`TF_CONFIG`) and plausible-sounding alternatives like `TF_CONFIG_JSON` or `CLUSTER_SPEC`, leading candidates to pick a non-existent variable.


21

MCQ · medium

An ML team is using Vertex AI to train a deep learning model on a large dataset. To reduce costs, they want to use preemptible VMs for training jobs. However, training must complete within a bounded time. Which strategy should they use?

A.Use Cloud TPU instead of GPU; TPUs are not preemptible.

B.Use Vertex AI Training without spot VMs, because preemptible VMs are not supported for training.

C.Use Vertex AI Training with spot VMs and ensure the training code saves checkpoints periodically to Cloud Storage.

D.Use a single powerful non-preemptible VM to avoid interruptions.

Answer: C

Checkpointing allows resuming from the last checkpoint after a preemption, enabling completion.

Why this answer

Option C is correct because Vertex AI Training supports spot VMs (preemptible instances) for cost savings, and periodic checkpointing to Cloud Storage ensures that training can resume from the last saved state if a VM is preempted, allowing the job to complete within a bounded time despite interruptions.

Exam trap

A common misconception is that preemptible VMs are not supported in Vertex AI Training, but they are fully supported as spot VMs. The key to bounded-time completion is checkpointing to Cloud Storage for resumability.

How to eliminate wrong answers

Option A is wrong because Cloud TPUs are not inherently non-preemptible; they can also be preempted, and using TPUs does not address the cost-reduction goal with preemptible VMs. Option B is wrong because Vertex AI Training does support spot VMs (preemptible VMs) for training jobs, so the claim that they are not supported is incorrect. Option D is wrong because using a single powerful non-preemptible VM increases costs significantly and does not leverage the cost savings of preemptible instances, while still being susceptible to other failures without checkpointing.


22

Multi-Select · easy

A company wants to use Vertex AI JumpStart to deploy a pre-trained image classification model and later fine-tune it on their own data. Which TWO statements are true about Vertex AI JumpStart?

Select 2 answers

A.JumpStart requires users to build custom Docker containers for all models

B.JumpStart only supports text-based models

C.JumpStart allows you to fine-tune foundation models like Gemma

D.JumpStart only supports tabular data models

E.JumpStart provides one-click deployment of pre-trained models and ML solutions

Answers: C, E

JumpStart supports fine-tuning of foundation models such as Gemma.

Why this answer

Option C is correct because Vertex AI JumpStart supports fine-tuning of foundation models like Gemma, allowing users to adapt pre-trained models to their specific datasets. This capability is built into JumpStart's managed environment, which handles the underlying infrastructure for training and deployment.

Exam trap

In the Google PMLE exam, candidates often mistakenly think that JumpStart only supports a narrow set of model types (e.g., text-only or tabular-only), when in fact it supports a broad range including image, text, and tabular models, and provides one-click deployment and fine-tuning capabilities.


23

MCQ · medium

A team is training a large image classification model using transfer learning from a pre-trained ResNet50. The model will be deployed on mobile devices. They want to fine-tune only the last few layers while keeping the earlier layers frozen. Which approach should they use?

A.Load ResNet50, set trainable=False for all layers, and replace the final dense layer only

B.Load ResNet50, freeze all layers, add new classification layers, and train only the new layers

C.Load ResNet50 from Keras Applications, set trainable=True for all layers, add new layers, and train the entire model

D.Use AutoML Vision to transfer learn without coding

Answer: B

This is the standard fine-tuning approach for resource-constrained deployment.

Why this answer

Transfer learning typically involves loading a pre-trained model (e.g., ResNet50 from Keras Applications) without the top classification layer, freezing all layers, adding new trainable layers on top, and then training. The base model's layers are frozen (trainable=False). After initial training, one can optionally unfreeze some top layers.


24

Multi-Select · hard

A machine learning team is building a feature engineering pipeline using Dataflow. They need to compute features from streaming data and store them in Vertex AI Feature Store for online serving. The features must be updated within 5 seconds of the event. Which TWO services should they combine? (Select 2)

Select 2 answers

A.Cloud Dataflow for stream processing and feature computation

B.Cloud Pub/Sub for event ingestion

C.Cloud Storage for feature store

D.Cloud Functions for feature transformation

E.BigQuery for feature storage

Answers: A, B

Dataflow can compute features in near real-time and write to Feature Store.

Why this answer

Cloud Dataflow is correct because it provides unified stream and batch processing with exactly-once semantics, enabling low-latency feature computation from streaming data. It integrates natively with Vertex AI Feature Store for online serving, ensuring features are updated within the required 5-second SLA.

Exam trap

The exam often tests the distinction between general-purpose storage services (Cloud Storage, BigQuery) and the dedicated online feature store (Vertex AI Feature Store) required for real-time ML serving, leading candidates to pick a storage option instead of the correct streaming ingestion (Pub/Sub) and processing (Dataflow) pair.


25

MCQ · hard

You have an edge device with limited compute resources. You need to deploy a deep learning model for real-time inference. Which model compression technique should you apply to reduce the model size and latency with minimal accuracy loss?

A.Pruning only

B.Post-training quantization to INT8

C.Knowledge distillation only

D.Use full precision FP32 to maintain accuracy

Answer: B

Reduces model size by 4x and speeds up inference on edge hardware.

Why this answer

Post-training quantization (e.g., INT8) is the easiest and most effective method for reducing model size and latency on edge devices. Quantization-aware training can yield better accuracy but is more complex. Pruning and distillation also help, but quantization often gives the best trade-off.

26

MCQmedium

A data scientist needs to train a large PyTorch model on a custom dataset using Vertex AI. The training script expects data from Cloud Storage and uses GPU acceleration. Which option correctly configures a custom training job with a pre-built container for PyTorch and attaches a single NVIDIA V100 GPU?

A.Use a custom container built from PyTorch base image and specify accelerator_count=1 in the machine spec

B.Use the pre-built container 'us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.1-12:latest' and in worker_pool_specs set machine_type='n1-standard-4', accelerator_type='NVIDIA_TESLA_V100', accelerator_count=1

C.Use the AI Platform Training service with gcloud ai-platform jobs submit training and --scale-tier BASIC_GPU

D.Create a training pipeline with AutoML and select GPU runtime

AnswerB

This correctly uses a pre-built container, sets the proper machine type and GPU accelerator.

Why this answer

In Vertex AI, worker_pool_specs define the machine type and accelerators for each worker pool. Using the pre-built PyTorch 1.12 container with image URI 'us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.1-12:latest' and specifying replica_count=1, machine_type='n1-standard-4' (or similar), and accelerator_type='NVIDIA_TESLA_V100' with accelerator_count=1 sets up the job correctly.
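As a rough illustration, the configuration in option B can be written as the list of dictionaries that Vertex AI custom jobs take for worker_pool_specs. The image URI, machine type, and accelerator settings are copied from the question. The package URI, module name, and argument are hypothetical placeholders, and placing the pre-built image under python_package_spec is an assumption about how the job would be packaged. The snippet only builds and inspects the structure; it does not submit a job.

```python
# Sketch of a single-replica worker pool with one V100, mirroring option B.
# package_uris, python_module and args are hypothetical placeholders.
worker_pool_specs = [
    {
        "machine_spec": {
            "machine_type": "n1-standard-4",
            "accelerator_type": "NVIDIA_TESLA_V100",
            "accelerator_count": 1,
        },
        "replica_count": 1,
        "python_package_spec": {
            "executor_image_uri": "us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.1-12:latest",
            "package_uris": ["gs://your-bucket/trainer-0.1.tar.gz"],  # hypothetical
            "python_module": "trainer.task",                          # hypothetical
            "args": ["--data-dir=gs://your-bucket/data"],             # hypothetical
        },
    }
]

# Total GPUs requested = replicas x accelerators per replica, summed over pools.
total_gpus = sum(
    pool["replica_count"] * pool["machine_spec"]["accelerator_count"]
    for pool in worker_pool_specs
)
print(total_gpus)  # 1
```

Keeping the spec as plain data makes it easy to check the replica and accelerator counts before the job is created.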

27

MCQmedium

You are designing a distributed training job on Vertex AI for a PyTorch model using DistributedDataParallel (DDP). You have 4 nodes, each with 4 GPUs. What is the total number of workers that should be configured in the TF_CONFIG equivalent for PyTorch?

A.4

B.8

C.1

D.16

AnswerD

Total GPUs across all nodes: 4 nodes * 4 GPUs = 16 workers.

Why this answer

In PyTorch DDP on Vertex AI, each GPU typically runs one process. With 4 nodes each with 4 GPUs, the world size is 16 (total GPUs). Vertex AI distributed training uses TF_CONFIG to set up the cluster; for PyTorch, you need to set the number of workers to the total number of processes, which is 16.
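The arithmetic behind the world size can be sketched in a few lines, assuming one process per GPU as described above. The function name ddp_layout is hypothetical; it only enumerates ranks and does not start any processes.

```python
def ddp_layout(num_nodes: int, gpus_per_node: int):
    """Return the world size and (node, local_rank, global_rank) for every process."""
    world_size = num_nodes * gpus_per_node
    ranks = [
        (node, local_rank, node * gpus_per_node + local_rank)
        for node in range(num_nodes)
        for local_rank in range(gpus_per_node)
    ]
    return world_size, ranks


world_size, ranks = ddp_layout(num_nodes=4, gpus_per_node=4)
print(world_size)  # 16
print(ranks[-1])   # (3, 3, 15)
```

The last process sits on node 3 with local rank 3, so its global rank is 15, one less than the world size.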

28

MCQhard

You need to preprocess a large dataset (terabytes) for training a TensorFlow model. The preprocessing includes scaling and bucketizing features, and the same transformations must be applied during serving. Which tool should you use?

A.Dataflow with Apache Beam and tf.Transform

B.Dataproc with Spark ML

C.Vertex AI Feature Store

D.BigQuery ML

AnswerA

tf.Transform runs on Dataflow to scale, and produces a saved model for consistent serving.

Why this answer

TensorFlow Transform (tf.Transform) allows you to define preprocessing pipelines that are applied consistently during training and serving. It computes statistics over the full dataset and creates a TensorFlow graph for serving.
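The analyze-then-transform idea can be illustrated without TensorFlow. The sketch below is plain Python, not the tf.Transform API: the function names analyze and transform are hypothetical, and equal-width buckets are a simplification chosen for the example. It shows statistics being computed once over the full training data and then reused unchanged at serving time.

```python
from bisect import bisect_right


def analyze(values, num_buckets):
    """Full-pass phase: compute the statistics that the transform needs."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / num_buckets
    boundaries = [lo + step * i for i in range(1, num_buckets)]
    return {"min": lo, "max": hi, "boundaries": boundaries}


def transform(value, stats):
    """Per-example phase: apply scaling and bucketizing with frozen statistics."""
    scaled = (value - stats["min"]) / (stats["max"] - stats["min"])
    bucket = bisect_right(stats["boundaries"], value)
    return scaled, bucket


training_values = [0.0, 10.0, 20.0, 30.0, 40.0]
stats = analyze(training_values, num_buckets=4)

# The same function and the same statistics are used for training and serving.
train_features = [transform(v, stats) for v in training_values]
serving_feature = transform(25.0, stats)
print(stats["boundaries"])  # [10.0, 20.0, 30.0]
print(serving_feature)      # (0.625, 2)
```

Because the serving path calls the same transform with the same stored statistics, training and serving cannot drift apart, which is the property tf.Transform provides by exporting the transformation as a graph.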

29

MCQhard

You have a very large language model that does not fit on a single GPU. You need to train it efficiently across multiple GPUs on a single machine. Which approach should you use?

A.Data parallelism with MirroredStrategy

B.Data parallelism with MultiWorkerMirroredStrategy

C.Use TPU training as TPUs have more memory

D.Model parallelism using pipeline parallelism

AnswerD

Splits model layers across GPUs, enabling training of models larger than memory.

Why this answer

When a model is too large for a single GPU, model parallelism (specifically pipeline parallelism) splits the model across GPUs. Data parallelism would require each GPU to hold the full model, which is not possible.
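A minimal sketch of the layer-splitting step, assuming the model is a simple sequence of equally sized blocks. The function partition_layers and the block names are hypothetical, and real pipeline parallelism also schedules micro-batches between stages, which is not shown here.

```python
def partition_layers(layer_names, num_devices):
    """Assign contiguous blocks of layers to devices, as evenly as possible."""
    base, extra = divmod(len(layer_names), num_devices)
    stages, start = [], 0
    for device in range(num_devices):
        # The first `extra` devices take one additional layer.
        size = base + (1 if device < extra else 0)
        stages.append((f"gpu:{device}", layer_names[start:start + size]))
        start += size
    return stages


layers = [f"block_{i}" for i in range(10)]
stages = partition_layers(layers, num_devices=4)
for device, assigned in stages:
    print(device, assigned)
```

Each GPU holds only its own stage, so the memory needed per device is roughly the size of that stage rather than the size of the whole model.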

30

MCQeasy

You have a TensorFlow training script that runs on a single machine. To speed up training on Vertex AI with 8 GPUs on a single machine, which strategy should you use?

A.tf.distribute.ParameterServerStrategy

B.tf.distribute.MirroredStrategy

C.tf.distribute.TPUStrategy

D.tf.distribute.MultiWorkerMirroredStrategy

AnswerB

MirroredStrategy is designed for single-machine multi-GPU synchronous training.

Why this answer

MirroredStrategy performs synchronous data parallelism across multiple GPUs on a single machine. MultiWorkerMirroredStrategy is for multiple machines, not needed here. ParameterServerStrategy is for distributed asynchronous training.

TPUStrategy is for TPUs.

31

MCQmedium

You are performing hyperparameter tuning on Vertex AI with Vizier. You want to maximize the accuracy of your model, and you have a budget of 50 trials. Which algorithm should you choose to best explore the search space?

A.No algorithm; use default Vertex AI tuning

B.Bayesian optimization

C.Grid search

D.Random search

AnswerB

Bayesian optimization builds a probabilistic model and selects promising hyperparameters, efficient for 50 trials.

Why this answer

Bayesian optimization is the default and recommended algorithm for most use cases as it balances exploration and exploitation efficiently. For a budget of 50 trials, Bayesian optimization is suitable.

32

Multi-Selecthard

A team is training a custom TensorFlow model on Vertex AI using a pre-built container. They need to use a TPU pod slice (v3-32). What THREE actions are required to set up the training job correctly?

Select 3 answers

A.Configure TF_CONFIG for distributed training

B.Set the training worker pool to use only one worker

C.Specify the accelerator type as TPU_V3 and topology as '2x2x4'

D.Use a custom container with TensorFlow 2.12 and TPU support

E.Set the machine type to a high-memory VM with NVIDIA A100 GPUs

AnswersA, C, D

When using TPU pods with multi-worker, TF_CONFIG must be set to coordinate workers.

Why this answer

Option A is correct because Vertex AI training with TPU pod slices requires explicit `TF_CONFIG` environment variable configuration to enable distributed training across the TPU hosts. The `TF_CONFIG` must define the cluster (e.g., `worker` and `chief` jobs) and task index so that TensorFlow can coordinate the all-reduce communication via gRPC or the TPU's internal interconnect. Without this, each worker would operate independently, failing to leverage the pod slice's collective computation.

Exam trap

A common misconception is that TPU pods can be treated as a single accelerator like a GPU, leading candidates to select a single-worker pool (Option B) or a GPU machine type (Option E), when in fact TPU pod slices require explicit multi-worker topology and TF_CONFIG setup.

33

MCQmedium

You want to deploy a trained scikit-learn model to Vertex AI for online predictions. The model file is 2 GB. Which option should you use?

A.Use AI Platform (now Vertex AI) with a large model VM

B.Use a custom container with the scikit-learn model

C.Use Cloud Functions to host the model

D.Upload the model to Vertex AI Model Registry and deploy using pre-built scikit-learn container

AnswerB

Custom container allows you to handle large models and dependencies.

Why this answer

Vertex AI provides a pre-built scikit-learn container, but a 2 GB model artifact is large enough to run into its size limits. A custom container gives you control over how the model is loaded, which dependencies are installed, and how much memory the server uses.

For large scikit-learn models, a custom container is therefore the safer choice.

34

MCQhard

You are fine-tuning a large language model (LLM) from Hugging Face Transformers using Vertex AI Training. The model has 7 billion parameters and does not fit into the memory of a single GPU. You need to train across multiple GPUs, splitting the model layers across devices. Which distributed training approach should you use?

A.Model parallelism using pipeline parallelism

B.Data parallelism with MultiWorkerMirroredStrategy

C.Mixed precision training (FP16)

D.Data parallelism with tf.distribute.MirroredStrategy

AnswerA

Pipeline parallelism splits layers across devices, allowing large models to fit by distributing the model parameters.

Why this answer

Model parallelism (pipeline parallelism) splits model layers across devices, which is necessary for large models that don't fit on one GPU. Data parallelism replicates the full model on every device and splits the data, so it is unsuitable when the model does not fit on a single GPU. Mixed precision reduces memory but still requires model parallelism for 7B.

Fully sharded data parallelism (FSDP) is a form of data parallelism with sharding, but pipeline parallelism is more common for layer-wise splitting.

35

MCQhard

A machine learning team is deploying a PyTorch model on Vertex AI Prediction for real-time inference. The model was trained with preprocessing that includes tokenization and normalization. They want to embed the preprocessing logic in the model to reduce prediction latency and avoid additional service calls. Which approach should they take?

A.Deploy the preprocessing logic as a Cloud Function and invoke it before calling the prediction endpoint

B.Wrap the preprocessing logic in a Flask application and deploy it as a separate microservice in front of the prediction endpoint

C.Use TorchScript to trace the preprocessing steps and export the entire pipeline as a single scripted model

D.Use TensorFlow Transform to convert preprocessing into a SavedModel and call it from the PyTorch model

AnswerC

TorchScript compiles PyTorch code into a graph that can be run efficiently in C++ runtime, ideal for production serving.

Why this answer

TorchScript allows exporting the entire model (including preprocessing) into a serialized format that can be run without Python dependencies. This reduces latency as all operations are within the exported graph. Wrapping in a Flask app or using Cloud Functions would introduce overhead.

Training with tf.Transform is not applicable for PyTorch.

36

MCQhard

You are fine-tuning a pre-trained model using transfer learning. The new dataset is small and very similar to the original training data. To avoid overfitting, which layer freezing strategy should you adopt?

A.Unfreeze the last few layers and freeze the rest

B.Randomly reinitialise all layers and train from scratch

C.Freeze all layers and train only the classifier head

D.Unfreeze all layers and train the entire model

AnswerC

Minimises overfitting by limiting trainable parameters; features are already good.

Why this answer

When fine-tuning a pre-trained model on a small dataset that is very similar to the original training data, the safest strategy to avoid overfitting is to freeze all layers and train only the classifier head. This preserves the rich, general-purpose feature representations learned from the original large dataset, while allowing the final classification layer to adapt to the new task. Training the entire model or unfreezing many layers on a small dataset would risk overfitting because the model would have too many parameters to update relative to the limited new samples.

Exam trap

A common pitfall in this exam is the misconception that unfreezing more layers always yields better fine-tuning performance. However, with a small, similar dataset, freezing all layers except the classifier head is the correct regularization strategy to prevent overfitting, not a sign of underfitting.

How to eliminate wrong answers

Option A is wrong because unfreezing the last few layers still updates a significant number of parameters, which can lead to overfitting when the new dataset is very small and similar to the original data; the risk is that the model will memorize the small dataset rather than generalize. Option B is wrong because randomly reinitializing all layers and training from scratch discards all the pre-trained knowledge, which defeats the purpose of transfer learning and, with a small dataset, will almost certainly result in severe overfitting or failure to converge. Option D is wrong because unfreezing all layers and training the entire model on a small dataset is the most aggressive overfitting scenario, as it allows every parameter to be updated, making the model highly likely to memorize the training examples instead of learning generalizable features.

37

MCQhard

An ML engineer is using Vertex AI distributed training for a TensorFlow model that uses the MirroredStrategy. They notice that the training throughput drops significantly when moving from a single GPU to multiple GPUs on the same machine. What is the most likely cause?

A.The GPUs are not properly configured in TF_CONFIG.

B.The batch size is too small, causing each GPU to complete its forward pass quickly, but the sync wait dominates.

C.The learning rate is too high, causing instability.

D.The model uses TensorFlow 1.x instead of 2.x.

AnswerB

With small batch sizes, GPUs are underutilized, and sync overhead becomes significant.

Why this answer

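MirroredStrategy synchronizes gradients across GPUs with an all-reduce on every step. When the batch size is small, each GPU finishes its forward and backward pass quickly, so the fixed cost of the all-reduce dominates the step time and the GPUs spend most of it waiting. The effect is worse when the model is small or the interconnect between GPUs is slow. Increasing the per-GPU batch size amortizes the synchronization cost.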
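A toy cost model makes the effect concrete. It assumes a fixed per-example compute time and a fixed synchronization cost per step whenever more than one GPU is used. All numbers are hypothetical and chosen only to show the trend; they are not measurements.

```python
def step_throughput(per_gpu_batch, num_gpus, seconds_per_example, sync_seconds):
    """Examples per second for one synchronous step under a simple cost model."""
    compute = per_gpu_batch * seconds_per_example
    sync = sync_seconds if num_gpus > 1 else 0.0
    return (per_gpu_batch * num_gpus) / (compute + sync)


COST = dict(seconds_per_example=0.001, sync_seconds=0.02)  # hypothetical costs

# Small global batch of 8: one GPU takes all 8, four GPUs take 2 each.
small_single = step_throughput(8, 1, **COST)
small_multi = step_throughput(2, 4, **COST)

# Large global batch of 512: one GPU takes all 512, four GPUs take 128 each.
large_single = step_throughput(512, 1, **COST)
large_multi = step_throughput(128, 4, **COST)

print(round(small_single), round(small_multi))
print(round(large_single), round(large_multi))
```

Under these assumptions the small batch runs slower on four GPUs than on one, while the large batch runs faster, which matches the behaviour described in the question.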

38

Multi-Selectmedium

An ML team is optimizing an inference model for deployment on edge devices. They need to reduce the model size and improve latency while maintaining accuracy as much as possible. Which two techniques should they use? (Choose TWO.)

Select 2 answers

A.Use a larger pre-trained model as a starting point.

B.Post-training quantization to INT8.

C.Use half-precision (FP16) instead of INT8.

D.Apply weight pruning to remove small weights.

E.Increase the number of layers in the model.

AnswersB, D

Reduces size and latency with minimal accuracy loss.

Why this answer

Post-training quantization to INT8 reduces model size by converting 32-bit floating-point weights and activations to 8-bit integers, which also speeds up inference on edge devices with integer-optimized hardware. This technique typically maintains accuracy within 1-2% of the original model while significantly lowering memory footprint and latency. Weight pruning complements it by setting small-magnitude weights to zero, which makes the model more compressible with little effect on accuracy.

Exam trap

Candidates often think that FP16 is always better than INT8 for edge devices, but INT8 offers greater size reduction and is more widely supported on edge hardware, including Google's Edge TPU.

39

MCQeasy

A developer wants to quickly deploy a pre-trained foundation model for text generation without writing any code. Which Vertex AI feature should they use?

A.Vertex AI Model Garden

B.Vertex AI Endpoints

C.Vertex AI AutoML

D.Vertex AI JumpStart

AnswerD

JumpStart offers one-click deployment of foundation models.

Why this answer

Vertex AI JumpStart provides one-click deployment of foundation models and ML solutions. It allows deploying pre-trained models without coding.

40

MCQhard

Your team is deploying a large language model (LLM) on Vertex AI for online prediction. The model exceeds the maximum request size for Vertex AI Prediction. Which approach should you take to serve this model?

A.Use Vertex AI Endpoint with a larger machine type and gRPC

B.Use Vertex AI Batch Prediction

C.Split the model into smaller parts and deploy multiple endpoints

D.Deploy the model on a Compute Engine VM with a custom container and a load balancer

AnswerD

Bypasses Vertex AI Prediction limits; you can handle large payloads.

Why this answer

Vertex AI Prediction has a request size limit (1.5 MB). Using a custom container with a ModelServer (e.g., TensorFlow Serving) behind an HTTP load balancer bypasses this limit and allows large payloads.

41

MCQeasy

A machine learning engineer wants to use Vertex AI Vizier to tune three hyperparameters: learning rate (log scale), number of layers (integer), and optimizer (categorical). They have 50 parallel trials available. Which parameter specification types should they define?

A.learning_rate: CATEGORICAL, layers: INTEGER, optimizer: CATEGORICAL

B.learning_rate: DOUBLE (unit_log_scale), layers: INTEGER (unit_linear_scale), optimizer: CATEGORICAL

C.learning_rate: DOUBLE (unit_log_scale), layers: DOUBLE (unit_linear_scale), optimizer: DISCRETE

D.learning_rate: DOUBLE (unit_linear_scale), layers: INTEGER (unit_linear_scale), optimizer: CATEGORICAL

AnswerB

Correct types and scales for the parameters.

Why this answer

In Vertex AI Vizier, continuous parameters use the DOUBLE type (optionally on a log scale), integer parameters use INTEGER, and categorical parameters use CATEGORICAL. Because the learning rate is searched on a log scale, its scale type should be UNIT_LOG_SCALE.
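As an illustration, the three parameters in option B could be described with a structure like the one below. The field names follow the study parameter specification as the author of this sketch understands it, and should be checked against the current API reference. The value ranges and optimizer names are hypothetical, and the helper spec_kind exists only for this example.

```python
# Hypothetical ranges and values; only the types and scales come from option B.
parameters = [
    {
        "parameter_id": "learning_rate",
        "double_value_spec": {"min_value": 1e-5, "max_value": 1e-1},
        "scale_type": "UNIT_LOG_SCALE",
    },
    {
        "parameter_id": "num_layers",
        "integer_value_spec": {"min_value": 1, "max_value": 8},
        "scale_type": "UNIT_LINEAR_SCALE",
    },
    {
        "parameter_id": "optimizer",
        "categorical_value_spec": {"values": ["adam", "sgd", "rmsprop"]},
    },
]


def spec_kind(param):
    """Return DOUBLE, INTEGER or CATEGORICAL based on which value spec is set."""
    suffix = "_value_spec"
    for key in param:
        if key.endswith(suffix):
            return key[: -len(suffix)].upper()
    raise ValueError("parameter has no value spec")


for p in parameters:
    print(p["parameter_id"], spec_kind(p), p.get("scale_type", "(no scale)"))
```

The categorical parameter carries no scale type because its values have no numeric ordering.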

42

Multi-Selecthard

You are fine-tuning a Gemma model using Vertex AI JumpStart. You want to combine the fine-tuned model with a custom output layer for a unique task. Which TWO components are required to deploy the combined model? (Choose 2)

Select 2 answers

A.Vertex AI Feature Store

B.Cloud Run

C.Vertex AI Model Registry

D.Custom container with the model and custom head

E.Pre-built container for Gemma

AnswersC, D

Required to store and deploy the model.

Why this answer

To customize the output layer, you need a custom container that loads both the fine-tuned base model and your custom head. The model must be registered in Vertex AI Model Registry for deployment.

43

MCQeasy

A data scientist wants to use a pre-trained ResNet model from Keras Applications and fine-tune it on a small custom dataset. Which approach should they take to avoid overfitting?

A.Freeze the first few layers and train the rest.

B.Add more convolutional layers to the model.

C.Use a larger learning rate to speed up training.

D.Train the entire model from scratch on the custom dataset.

AnswerA

Freezing earlier layers preserves general features; training only later layers adapts to the new task.

Why this answer

Freezing the earlier layers (which capture general features) and only training the later layers is a common transfer learning approach for small datasets, reducing overfitting.

44

MCQmedium

An engineer is using TensorFlow Transform (tf.Transform) to preprocess training data. They want to ensure that the same preprocessing logic is applied during inference without code duplication. Which approach should they take?

A.Use tf.Transform at prediction time by running a separate Beam pipeline

B.Use Dataflow to preprocess data for both training and serving

C.Use tf.Transform to generate a transform_fn and save it as a SavedModel; then use tf.saved_model.load to apply it in the serving pipeline

D.Write separate preprocessing code for training and serving in Python

AnswerC

The transform_fn SavedModel ensures consistency.

Why this answer

TensorFlow Transform outputs a SavedModel that contains the preprocessing graph. This can be exported as a transform_fn and embedded in the serving model, ensuring consistency between training and serving.

45

MCQhard

A machine learning engineer is deploying a TensorFlow model on an edge device with limited memory and compute. The model needs to perform inference with low latency. The engineer has a trained float32 model. Which model compression technique should be applied first to reduce the model size and improve inference speed without significant accuracy loss?

A.Post-training quantization to INT8

B.Knowledge distillation

C.Quantization-aware training

D.Weight pruning

AnswerA

This is the recommended first step for edge deployment.

Why this answer

Post-training quantization to INT8 is the correct first step because it directly reduces the model size by approximately 4x (from 32-bit floats to 8-bit integers) and speeds up inference on edge devices by leveraging integer-optimized hardware (e.g., ARM NEON or Qualcomm Hexagon). This technique requires no retraining and typically yields minimal accuracy loss for most TensorFlow models, making it the fastest path to deploy on resource-constrained devices.

Exam trap

Google Cloud often tests the misconception that quantization-aware training is always required for INT8 deployment, but the trap here is that post-training quantization is the simplest and most effective first step for reducing model size and latency on edge devices, with quantization-aware training reserved only for cases where accuracy drops below acceptable thresholds.

How to eliminate wrong answers

Option B (Knowledge distillation) is wrong because it requires training a smaller student model from scratch using the teacher model's outputs, which is computationally expensive and not a quick compression technique for an already trained model. Option C (Quantization-aware training) is wrong because it simulates quantization effects during training to preserve accuracy, but it requires retraining the model, making it a second step after post-training quantization if accuracy loss is unacceptable. Option D (Weight pruning) is wrong because it removes individual weights (often via magnitude-based pruning), which can reduce model size but typically requires retraining to recover accuracy and does not directly improve inference speed on standard edge hardware without sparse matrix support.
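The 4x size reduction and the small rounding error can be seen in a minimal sketch of symmetric INT8 weight quantization. This is plain Python for illustration, not the TensorFlow Lite converter, and it covers weights only. The function names and the sample weights are hypothetical, and the sketch assumes at least one non-zero weight.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0  # assumes a non-zero maximum
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate float values."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.03, 1.0, -0.64]  # hypothetical float32 weights
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

float32_bytes = 4 * len(weights)  # 4 bytes per float32 weight
int8_bytes = 1 * len(weights)     # 1 byte per int8 weight
print(quantized)
print(float32_bytes / int8_bytes)  # 4.0
print(max_error <= scale / 2 + 1e-12)
```

Each weight moves by at most half a quantization step, which is why accuracy usually changes little while storage drops from four bytes per weight to one.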

46

MCQmedium

A company has a TensorFlow model for image classification that must run on edge devices with limited memory. They need to reduce the model size without significant accuracy loss. Which technique should they use?

A.Post-training quantization using TensorFlow Lite.

B.Knowledge distillation to train a smaller student model.

C.Pruning the model weights to zero out unimportant connections.

D.Use a larger VM for training.

AnswerA

TFLite quantization reduces size and latency, suitable for edge devices.

Why this answer

Post-training quantization (e.g., INT8) reduces model size and speeds up inference with minimal accuracy loss. It is the simplest method for deployment on edge devices.

47

MCQeasy

You want to use Vertex AI JumpStart to quickly deploy a pre-built foundation model for text summarization. Which action is required?

A.Select the model from Model Garden and deploy it to a Vertex AI endpoint

B.Train the model from scratch using Vertex AI Training

C.Export the model to a Cloud Storage bucket and use batch prediction

D.Build a custom Docker container with the model and deploy to Vertex AI

AnswerA

JumpStart allows selecting and deploying foundation models directly.

Why this answer

JumpStart provides one-click deployment of foundation models from Model Garden. You select a model and deploy it to an endpoint. No custom training or container building is needed.

48

MCQhard

You need to perform a large-scale feature computation on streaming data from Pub/Sub, transforming raw events into features, and writing results to Vertex AI Feature Store for online serving. Which Google Cloud architecture is most appropriate?

A.Use Dataproc with Spark Streaming to read from Pub/Sub and write to Feature Store

B.Use Cloud Functions triggered by Pub/Sub to compute features and update Feature Store

C.Use Dataflow streaming pipeline with Apache Beam to read from Pub/Sub, compute features, and write to Feature Store

D.Use Cloud Run to consume Pub/Sub messages and update Feature Store via a service

AnswerC

Dataflow streaming is ideal for scalable, low-latency stream processing with exactly-once semantics.

Why this answer

Dataflow with streaming (Apache Beam) can read from Pub/Sub, transform data, and write to Feature Store via the online serving API. Cloud Functions is not suitable for complex transforms. Dataproc with Spark Streaming is possible but requires you to manage a cluster, whereas Dataflow is serverless and autoscales.

Cloud Run is for request-response.

49

MCQhard

You are deploying a deep learning model on edge devices with limited computational resources. The model must run inference in <10 ms and the model size must be under 50 MB. Currently, your trained model is 200 MB and runs in 50 ms. Which combination of model compression techniques should you apply?

A.Only apply weight pruning

B.Apply quantization-aware training and knowledge distillation

C.Use knowledge distillation to train a smaller student model

D.Apply post-training quantization (INT8) and pruning

AnswerD

Quantization reduces size and latency; pruning reduces complexity; both can be applied post-training.

Why this answer

Post-training quantization to INT8 reduces model size by about 4x, taking the 200 MB model to roughly 50 MB, and often speeds up inference. Pruning removes redundant weights, which is what brings the size below the 50 MB limit. Knowledge distillation would require retraining a smaller student model.

Quantization-aware training is more accurate but needs retraining. For a simple fix, quantization and pruning are effective.

50

MCQmedium

A team is training a large TensorFlow model that requires more memory than a single GPU provides. They have access to multiple GPUs on a single machine. Which distributed training strategy should they use to split the model layers across GPUs?

A.tf.distribute.experimental.MultiWorkerMirroredStrategy

B.tf.distribute.experimental.ParameterServerStrategy

C.Manual device placement using tf.device to assign layers to specific GPUs

D.tf.distribute.MirroredStrategy

AnswerC

Model parallelism in TensorFlow is typically done by manually placing operations on different GPUs using tf.device.

Why this answer

Model parallelism splits the model itself across devices, as opposed to data parallelism, which replicates the model. The tf.distribute strategies listed here replicate the model or distribute its variables for data-parallel training, so none of them splits layers across GPUs. To place different layers on different GPUs in TensorFlow, you assign them explicitly with tf.device.

51

MCQmedium

You are using Vertex AI hyperparameter tuning with a custom container. The training job reports the objective metric but Vizier is not converging. Which configuration change could improve convergence?

A.Use a larger machine type for each trial

B.Increase the number of parallel trials

C.Reduce the number of parallel trials

D.Switch from Bayesian to grid search

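AnswerC

Fewer parallel trials let each new trial use the results of more completed trials.

Why this answer

Vizier's default Bayesian optimisation uses the results of completed trials to choose the next hyperparameter values. Trials that run in parallel start without seeing each other's results, so a high parallel trial count makes the search behave more like random sampling. Reducing the number of parallel trials lets each new trial benefit from more completed results, at the cost of longer wall-clock time. A larger machine type changes trial speed, not search quality, and grid search ignores earlier results entirely.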

52

MCQmedium

A company is using Vertex AI Vizier for hyperparameter tuning of a model with 5 integer hyperparameters, each with a range of 10-100. They have a budget of 50 trials and want to maximize the chance of finding the best configuration. Which Vizier algorithm should they use?

A.Grid search

B.Simulated annealing

C.Bayesian optimization (GP bandit)

D.Random search

AnswerC

Bayesian optimization uses a probabilistic model to select promising configurations, ideal for small budgets.

Why this answer

Bayesian optimization (GP bandit) is best for small trial budgets as it uses past results to guide search. Grid search would be too many combinations. Random search is better than grid but still less efficient than Bayesian.

Vizier does not support simulated annealing.
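The claim that grid search needs too many combinations can be checked with a short calculation, assuming each range of 10-100 is inclusive and every integer value is a grid point.

```python
values_per_parameter = 100 - 10 + 1  # integers 10..100 inclusive
num_parameters = 5
grid_points = values_per_parameter ** num_parameters
trial_budget = 50

print(values_per_parameter)        # 91
print(grid_points)                 # 6240321451
print(trial_budget / grid_points)  # fraction of the grid that 50 trials cover
```

With roughly 6.2 billion grid points, 50 trials cover a negligible fraction of the grid, which is why a method that learns from earlier trials is preferred.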

53

MCQmedium

A data scientist has a TensorFlow 2.x model trained on a single GPU. They want to scale training to multiple GPUs on a single Vertex AI machine without code changes. Which strategy should they use?

A.MultiWorkerMirroredStrategy

B.TPUStrategy

C.CentralStorageStrategy

D.MirroredStrategy

AnswerD

Designed for single-machine multi-GPU training; no code changes needed beyond wrapping in strategy scope.

Why this answer

MirroredStrategy distributes training across GPUs on a single machine with minimal code changes — just wrap model creation and compile inside the strategy scope. MultiWorkerMirroredStrategy is for multi-machine, not single-machine multi-GPU.

54

MCQeasy

Which Vertex AI service allows you to discover, fine-tune, and deploy foundation models with a few clicks, including models like Llama and Gemma?

A.Vertex AI Prediction

B.Vertex AI Vizier

C.Vertex AI Model Garden

D.Vertex AI JumpStart

AnswerD

JumpStart offers one-click deployment and fine-tuning of foundation models.

Why this answer

Vertex AI JumpStart (now part of Model Garden) provides one-click deployment and fine-tuning of foundation models. Model Garden is the broader model discovery hub, but JumpStart is the feature for quick start.

55

MCQeasy

A data scientist wants to quickly experiment with a pre-trained Vision Transformer model from Hugging Face and fine-tune it on a custom dataset using Vertex AI. They want to use a managed environment with minimal setup. Which Vertex AI service should they use?

A.Vertex AI Prediction

B.Vertex AI Workbench

C.Vertex AI JumpStart

D.AI Platform Training

AnswerC

JumpStart offers pre-built models and ML solutions that can be deployed and fine-tuned with minimal effort.

Why this answer

Vertex AI JumpStart is the correct choice because it provides a managed environment with pre-built, optimized containers for popular models like Vision Transformers, enabling one-click deployment and fine-tuning with minimal setup. It abstracts away infrastructure management, allowing the data scientist to quickly experiment without configuring custom training scripts or environments.

Exam trap

Google often tests the distinction between managed services (JumpStart) and semi-managed environments (Workbench), where candidates mistakenly choose Workbench for its notebook interface, overlooking that JumpStart offers a more streamlined, pre-configured path for quick experimentation.

How to eliminate wrong answers

Option A is wrong because Vertex AI Prediction is designed for serving trained models for inference, not for training or fine-tuning; it lacks the capability to run training jobs. Option B is wrong because Vertex AI Workbench is a Jupyter-based notebook environment that requires manual setup of dependencies and infrastructure, which contradicts the 'minimal setup' requirement. Option D is wrong because AI Platform Training is a legacy service that has been superseded by Vertex AI; it requires more manual configuration and does not offer the same level of pre-built, managed integration for Hugging Face models as JumpStart.

56

MCQmedium

You are fine-tuning a pre-trained BERT model from Hugging Face on a custom text classification dataset using Vertex AI Training. You want to speed up training by using mixed precision. What should you do?

A.Modify the model to use half-precision layers

B.Use a custom container with TensorFlow instead of PyTorch

C.Enable mixed precision via Vertex AI hyperparameter tuning

D.Set fp16=True in the TrainingArguments

AnswerD

This enables mixed precision in Hugging Face Trainer.

Why this answer

Hugging Face Trainer supports mixed precision via the fp16 argument. Set it to True in the TrainingArguments. No need to modify the model architecture or use a custom container.

57

MCQmedium

An organization wants to use Vertex AI JumpStart to fine-tune a foundation model for a custom classification task. They have a labeled dataset stored in BigQuery. Which steps should they take?

A.Export the data from BigQuery to CSV files in Cloud Storage, then upload to JumpStart.

B.Write a custom training script using PyTorch and submit to Vertex AI Training.

C.Use the Vertex AI console to select a model from JumpStart, specify the BigQuery table as the source, and launch fine-tuning.

D.Use Vertex AI Model Garden to deploy the model directly without fine-tuning.

AnswerC

JumpStart integrates with BigQuery for data input.

Why this answer

JumpStart fine-tuning typically involves selecting a model, connecting to your dataset (often via BigQuery), configuring training, and launching. The process is largely GUI-based.

58

MCQmedium

An ML engineer is using Vertex AI Vizier to tune hyperparameters for a PyTorch model. They want to maximise the chance of finding the global optimum within a fixed trial budget of 50 trials. Which algorithm should they select?

A.Random search

B.Bayesian optimisation

C.Grid search

D.Evolutionary algorithm

AnswerB

Uses a probabilistic model to focus trials on promising regions, which makes it the best fit for a limited budget.

Why this answer

Bayesian optimisation (option B) is the correct choice because it builds a probabilistic surrogate model of the objective function and uses an acquisition function to balance exploration and exploitation, making it highly sample-efficient. With only 50 trials, Bayesian optimisation maximises the probability of finding the global optimum by focusing trials on the most promising hyperparameter regions, unlike random or grid search which waste trials on unpromising areas.

Exam trap

The trap here is that candidates often choose random search (option A) because they recall it is better than grid search for high-dimensional spaces, but they overlook that Bayesian optimisation is generally more sample-efficient and is the default recommendation in Vertex AI Vizier for maximising global optimum discovery under a fixed trial budget.

How to eliminate wrong answers

Option A is wrong because random search, while better than grid search in high-dimensional spaces, does not use past trial results to guide future trials, so it wastes trials on suboptimal regions and has a lower probability of finding the global optimum within a fixed budget of 50 trials. Option C is wrong because grid search exhaustively evaluates a fixed set of points, which scales exponentially with the number of hyperparameters and is extremely inefficient for more than a few parameters, often missing the global optimum entirely within a limited budget. Option D is wrong because evolutionary algorithms (e.g., genetic algorithms) are population-based and require many generations to converge, typically needing hundreds or thousands of trials to be effective, making them impractical for a tight budget of 50 trials.
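The exponential scaling of grid search can be made concrete with a little arithmetic. The sketch below, which is only an illustration and not part of any Vertex AI API, computes how many values per hyperparameter a full grid can afford under a 50-trial budget.

```python
def grid_points_per_axis(trial_budget, num_hyperparameters):
    """Largest k such that a full k**d grid fits within the trial budget."""
    k = 1
    while (k + 1) ** num_hyperparameters <= trial_budget:
        k += 1
    return k


for d in (1, 2, 3, 5):
    k = grid_points_per_axis(50, d)
    print(f"{d} hyperparameters: {k} values each, {k ** d} trials used")
```

With five hyperparameters, the grid can only try two values of each (32 trials), which is why a fixed grid tends to miss good regions once the search space has more than a few dimensions.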

59

Multi-Selecteasy

A company is deploying a computer vision model on edge devices using TensorFlow Lite. They want to reduce model size without significant accuracy loss. Which TWO model compression techniques are most suitable?

Select 2 answers

A.Quantization-aware training (QAT)

B.Weight pruning

C.Knowledge distillation

D.Post-training float16 quantization

E.Increasing model depth

AnswersB, D

Pruning removes near-zero weights, reducing model size. It can be done post-training with minimal accuracy loss.

Why this answer

Weight pruning (B) is suitable because it removes redundant connections (weights) from the neural network, reducing the model size and computational cost while often preserving accuracy if done gradually. Post-training float16 quantization (D) converts model weights from float32 to float16, halving the storage size with minimal accuracy loss, and is directly supported by TensorFlow Lite for edge deployment.
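The halving of storage from float32 to float16 can be checked with the standard library alone. This is a sketch of the number format only, using made-up weight values; it is not the TensorFlow Lite converter and says nothing about how a real model's accuracy would change.

```python
import struct

# Made-up weights standing in for a layer's parameters.
weights = [0.1234567, -1.5, 3.14159265, 0.00042, 250.75]

# "f" is IEEE 754 single precision (4 bytes), "e" is half precision (2 bytes).
fp32_bytes = struct.pack(f"<{len(weights)}f", *weights)
fp16_bytes = struct.pack(f"<{len(weights)}e", *weights)

# Round-trip through float16 to see how much precision is lost.
restored = struct.unpack(f"<{len(weights)}e", fp16_bytes)
max_abs_error = max(abs(w - r) for w, r in zip(weights, restored))

print(len(fp32_bytes), len(fp16_bytes))
print(max_abs_error)
```

The float16 buffer is exactly half the size of the float32 one, and the round-trip error stays small for weights of ordinary magnitude.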

Exam trap

In Google's PMLE exam, they often test the distinction between techniques that directly reduce model size (pruning, quantization) versus those that improve accuracy or create new models (QAT, knowledge distillation), leading candidates to select QAT as a compression method when it is actually a training-time optimization.

60

MCQhard

Your team is training a very large transformer model that does not fit on a single GPU. They are using Vertex AI custom training with PyTorch. Which distributed training approach should they use?

A.Data parallelism using PyTorch DistributedDataParallel (DDP)

B.Horovod with allreduce

C.Model parallelism using pipeline parallelism

D.Vertex AI distributed training with TF_CONFIG

AnswerC

Splits layers across GPUs, allowing large models.

Why this answer

When a transformer model is too large to fit on a single GPU, model parallelism (specifically pipeline parallelism) is required because it splits the model's layers across multiple devices, with each device holding a subset of the model's parameters. Data parallelism (DDP) replicates the entire model on each GPU, which fails if the model exceeds a single GPU's memory. Pipeline parallelism allows training very large models by partitioning the model into stages and passing activations and gradients sequentially between devices.
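The partitioning idea can be sketched in plain Python. The function below greedily groups consecutive layers into stages that each fit on one device. The layer sizes and the 16 GB capacity are hypothetical, and the sketch counts only parameter memory, ignoring activations and optimizer state; a real PyTorch setup would rely on a pipeline-parallel library rather than this helper.

```python
def partition_into_stages(layer_sizes_gb, device_capacity_gb):
    """Greedily group consecutive layers into stages that each fit on one device."""
    stages, current, used = [], [], 0.0
    for index, size in enumerate(layer_sizes_gb):
        if size > device_capacity_gb:
            raise ValueError(f"layer {index} alone exceeds device memory")
        if used + size > device_capacity_gb:
            # Current stage is full: close it and start the next device.
            stages.append(current)
            current, used = [], 0.0
        current.append(index)
        used += size
    if current:
        stages.append(current)
    return stages


layer_sizes_gb = [6.0] * 8  # hypothetical: 8 blocks, 48 GB in total
stages = partition_into_stages(layer_sizes_gb, device_capacity_gb=16.0)
print(stages)
```

The 48 GB model cannot be replicated on any single 16 GB device, which is why data parallelism fails here, but it splits into four stages of two layers each.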

Exam trap

Google Cloud often tests the distinction between data parallelism (which replicates the model) and model parallelism (which splits the model), and the trap here is that candidates assume any distributed training framework (like DDP or Horovod) can handle oversized models, ignoring the fundamental memory constraint that data parallelism cannot overcome.

How to eliminate wrong answers

Option A is wrong because PyTorch DistributedDataParallel (DDP) implements data parallelism, which requires the entire model to fit on each GPU; if the model is too large for one GPU, DDP cannot be used. Option B is wrong because Horovod with allreduce is also a data-parallel approach that replicates the model on every worker, suffering the same memory limitation as DDP. Option D is wrong because Vertex AI distributed training with TF_CONFIG is a configuration mechanism for TensorFlow-based distributed training (using MirroredStrategy or MultiWorkerMirroredStrategy), not a PyTorch-native approach, and it still relies on data parallelism unless combined with model parallelism; the question specifies PyTorch, making this option technically incompatible.

61

MCQhard

A data engineering team needs to compute rolling window features (7-day average, 30-day sum) from a high-volume stream of e-commerce events stored in BigQuery. They must output the features to Vertex AI Feature Store for online serving. Which approach is MOST cost-effective and scalable?

A.Use Cloud Composer (Airflow) with a daily DAG to run SQL queries on BigQuery and export results

B.Schedule a query in BigQuery using scheduled queries and export results to Feature Store

C.Use Cloud Functions triggered by Pub/Sub to compute features on the fly

D.Use Dataflow with Apache Beam, reading from BigQuery, computing windowed aggregations, and writing to Vertex AI Feature Store

AnswerD

Dataflow handles large-scale, windowed feature computation efficiently and can write to Feature Store.

Why this answer

Dataflow (Apache Beam) is ideal for processing large-scale batch and streaming data. It can read from BigQuery, perform windowing computations, and write to Feature Store's online store. Cloud Functions have timeouts, Cloud Composer is not optimal for streaming, and BigQuery scheduled queries are not designed for streaming-feature computation.
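To make the two requested features concrete, here is a plain-Python sketch of the aggregation itself, assuming the events have already been reduced to one total per day. It is not a Beam pipeline: in Dataflow, windowing transforms would play the role of the two buffers, and the daily totals below are made-up data.

```python
from collections import deque


def rolling_features(daily_totals, avg_window=7, sum_window=30):
    """Yield (day_index, trailing average, trailing sum) for each day."""
    avg_buf = deque(maxlen=avg_window)  # keeps only the last 7 days
    sum_buf = deque(maxlen=sum_window)  # keeps only the last 30 days
    for day, total in enumerate(daily_totals):
        avg_buf.append(total)
        sum_buf.append(total)
        yield day, sum(avg_buf) / len(avg_buf), sum(sum_buf)


daily_totals = [float(v) for v in range(1, 41)]  # 40 days of made-up totals
features = list(rolling_features(daily_totals))
print(features[-1])
```

Each emitted tuple corresponds to one feature row that a pipeline would write to the feature store for that day.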

62

MCQeasy

An ML team wants to use Vertex AI Hyperparameter Tuning to tune a custom training job. They have a budget of 50 trials and want to use an algorithm that balances exploration and exploitation. Which algorithm should they choose?

A.Random search

B.Grid search

C.Bayesian optimization (Vizier default)

D.Manual search

AnswerC

Bayesian optimization is designed to balance exploration and exploitation.

Why this answer

Bayesian optimization (the default algorithm in Vertex AI Vizier) is the correct choice because it explicitly balances exploration and exploitation by building a probabilistic model of the objective function and using an acquisition function to select the next hyperparameter configuration. With a budget of 50 trials, this algorithm efficiently converges to optimal regions while still exploring uncertain areas, making it ideal for tuning custom training jobs where each trial is computationally expensive.
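One common acquisition function that makes this balance explicit is the upper confidence bound, shown here for an objective being maximised. It is given only as an illustration; the text above does not state which acquisition function Vizier uses, and the symbols below are assumptions introduced for this example.

```latex
x_{t+1} = \arg\max_{x} \; \underbrace{\mu_t(x)}_{\text{exploitation}} + \kappa \, \underbrace{\sigma_t(x)}_{\text{exploration}}
```

Here \mu_t(x) and \sigma_t(x) are the surrogate model's predicted mean and uncertainty after t trials, and \kappa \ge 0 controls how strongly uncertain regions are favoured. Random search has no equivalent of the \mu_t term, so it never exploits earlier results.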

Exam trap

A common pitfall is assuming that random search is the best default for balancing exploration and exploitation. However, random search lacks any exploitation mechanism, making Bayesian optimization the correct choice for efficient tuning within a constrained budget, as emphasized in Google PMLE.

How to eliminate wrong answers

Option A is wrong because random search does not balance exploration and exploitation; it samples hyperparameters uniformly at random without using past trial results to guide future selections, which wastes budget on suboptimal regions. Option B is wrong because grid search exhaustively evaluates a fixed set of hyperparameter combinations, which is computationally inefficient for a budget of 50 trials and does not incorporate any exploitation mechanism. Option D is wrong because manual search relies on human intuition and ad-hoc adjustments, which is not an automated algorithm and cannot systematically balance exploration and exploitation within a defined trial budget.

63

MCQeasy

You want to use a pre-trained model from TensorFlow Hub for image classification, but you need to adapt it to classify your own custom categories with a small dataset. Which Vertex AI approach is most appropriate?

A.Write a custom training script that loads the pre-trained model and fine-tunes it on your dataset

B.Deploy the pre-trained model as-is via Vertex AI JumpStart

C.Build a custom container with the pre-trained model and deploy to Vertex AI Endpoints

D.Use Vertex AI AutoML for image classification

AnswerA

Fine-tuning a pre-trained model is the standard transfer learning approach, efficient with small data.

Why this answer

Transfer learning takes a pre-trained model and fine-tunes it on a new dataset, which works well even when that dataset is small. JumpStart deploys foundation models but does not make it easy to fine-tune them for custom categories. A custom container is overkill.

AutoML requires no code but may not be suitable if you want to control the pre-trained model.

64

MCQmedium

You want to reduce training costs by using preemptible VMs on Vertex AI for a fault-tolerant distributed training job that uses checkpointing. Which machine type should you choose in the worker pool configuration?

A.Use spot VMs by setting 'spot' to true in the machine spec

B.Use custom machine types with preemptible flag

C.Use standard VMs and rely on Vertex AI auto-restart

D.Use TPU VMs because they are cheaper

AnswerA

Spot VMs are discounted preemptible instances, suitable for fault-tolerant jobs.

Why this answer

Vertex AI supports spot VMs (formerly preemptible) for training. By setting 'spot' to true, you use preemptible instances at a discount. The training must handle preemption via checkpointing.

65

Multi-Selectmedium

You are setting up a hyperparameter tuning job on Vertex AI for a large neural network. The objective is to minimize validation loss. You want to explore the hyperparameter space efficiently with a limited budget of 100 trials. Which THREE settings should you configure in the study?

Select 3 answers

A.Enable early stopping

B.Algorithm: BAYESIAN_OPTIMIZATION

C.Parallel trial execution count: 10

D.Algorithm: GRID_SEARCH

E.Disable early stopping

AnswersA, B, C

Early stopping (e.g., via median stopping) terminates underperforming trials early.

Why this answer

Option A is correct because enabling early stopping in Vertex AI hyperparameter tuning terminates poorly performing trials early, saving the trial budget for more promising hyperparameter configurations. This is critical when the objective is to minimize validation loss with a limited budget of 100 trials, as it prevents wasting resources on suboptimal runs and allows the search to focus on the most promising regions of the hyperparameter space.
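The median stopping rule mentioned above can be sketched in a few lines. This is a generic version for a minimised metric, written for illustration; it is not Vertex AI's internal implementation, and the loss curves are made up.

```python
from statistics import median


def should_stop(trial_losses, completed_curves):
    """Median stopping rule for a metric that is being minimised.

    trial_losses: losses reported so far by the running trial, one per step.
    completed_curves: full loss curves from trials that already finished.
    """
    step = len(trial_losses)
    if step == 0:
        return False  # nothing reported yet
    # Running average of each completed trial over the same number of steps.
    running_averages = [
        sum(curve[:step]) / step for curve in completed_curves if len(curve) >= step
    ]
    if not running_averages:
        return False  # nothing to compare against yet
    # Stop when even the trial's best value is worse than the median average.
    return min(trial_losses) > median(running_averages)


completed = [
    [1.0, 0.8, 0.6, 0.5],
    [1.2, 0.9, 0.7, 0.6],
    [0.9, 0.7, 0.5, 0.4],
]
print(should_stop([1.5, 1.4], completed))  # clearly lagging trial
print(should_stop([1.0, 0.7], completed))  # competitive trial
```

A trial stopped after two steps frees its remaining compute for new configurations, which is how early stopping stretches a fixed budget.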

Exam trap

A common misconception is that grid search is suitable for large hyperparameter spaces with limited budgets, when in fact it is computationally prohibitive and should be replaced by Bayesian optimization or random search for efficiency.

66

MCQmedium

You are deploying a pre-trained BERT model for inference on edge devices. The model must be under 500 MB and inference latency under 50 ms. Which approach should you take?

A.Use a larger model like BERT-Large and deploy on GPU

B.Apply post-training INT8 quantization using TensorFlow Lite

C.Prune 50% of the model weights and fine-tune

D.Use knowledge distillation to train a smaller student model from scratch

AnswerB

Post-training INT8 quantization reduces model size by ~4x and speeds up inference, often within the target latency. It is the simplest and most effective first step.

Why this answer

Option B is correct because post-training INT8 quantization reduces model size by approximately 75% (from ~440 MB to ~110 MB for BERT-Base) and accelerates inference on edge devices via integer arithmetic, easily meeting the 500 MB and 50 ms constraints. TensorFlow Lite provides hardware-optimized kernels for ARM CPUs and NPUs, making it ideal for edge deployment without requiring retraining.
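The size reduction comes from storing each weight in 1 byte instead of 4. The sketch below shows one simple scheme, symmetric per-tensor quantization, on made-up weights. It is an illustration of the idea only; TensorFlow Lite's actual converter has more machinery (calibration data, per-channel scales) that is not modelled here.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.82, -1.27, 0.003, 0.5, -0.64]  # made-up float32 weights
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_abs_error = max(abs(w - r) for w, r in zip(weights, restored))

print(quantized)
print(max_abs_error <= scale / 2 + 1e-12)
```

Each weight is off by at most half a quantization step, which is why accuracy often survives, while storage per weight drops from 4 bytes to 1, matching the roughly 440 MB to 110 MB figure quoted above.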

Exam trap

A common pitfall is assuming that only pruning or distillation can reduce model size, but post-training INT8 quantization directly shrinks the model and speeds inference without retraining, which is ideal for deploying a pre-trained model on edge devices.

How to eliminate wrong answers

Option A is wrong because BERT-Large is ~1.3 GB, far exceeding the 500 MB limit, and GPU deployment is not feasible on most edge devices due to power and thermal constraints. Option C is wrong because pruning 50% of weights without fine-tuning would cause catastrophic accuracy loss, and fine-tuning requires the original training data and compute, which may not be available; even with fine-tuning, pruned models often need specialized hardware for speedup. Option D is wrong because knowledge distillation requires training a smaller student model from scratch, which demands significant compute, time, and access to the teacher model's logits, making it impractical for a quick deployment scenario where a pre-trained BERT model is already available.

67

Multi-Selectmedium

You are using tf.Transform to preprocess data at scale. Which TWO services are required to run tf.Transform on Google Cloud? (Choose 2)

Select 2 answers

A.Cloud Functions

B.Dataflow

C.Cloud Storage

D.Vertex AI Training

E.BigQuery

AnswersB, C

Runs the Beam pipeline.

Why this answer

tf.Transform requires Apache Beam for execution, which on GCP is typically run on Dataflow. The processed data and transform artifacts are stored in Cloud Storage.

68

MCQeasy

A data scientist wants to use tf.Transform for preprocessing a large dataset stored in BigQuery before training a TensorFlow model. The preprocessing should be consistent during training and serving. What is the correct way to use tf.Transform in this scenario?

A.Write preprocessing logic in Python and reuse the same code in training and serving

B.Use BigQuery's built-in ML.TRANSFORM function for consistency

C.Use tf.Transform to define a preprocessing_fn and apply it to the dataset, then export the transform graph for serving

D.Use a Lambda layer in Keras for preprocessing

AnswerC

This is the standard workflow: define function, compute on full data, export graph.

Why this answer

Option C is correct because tf.Transform is specifically designed to handle full-pass preprocessing (e.g., computing min/max, vocabularies) that requires seeing the entire dataset. You define a `preprocessing_fn` and run it through the Beam transform `tft_beam.AnalyzeAndTransformDataset`; the resulting transform graph is then exported as a SavedModel, which can be loaded during serving to ensure identical preprocessing logic. This guarantees consistency between training and inference, which is critical for production ML pipelines.
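The analyze-then-transform pattern can be illustrated without TensorFlow. The sketch below is plain Python, not the tf.Transform API: `analyze`, `transform`, and the JSON artifact are hypothetical stand-ins for the full-pass analysis, the `preprocessing_fn` logic, and the exported transform graph.

```python
import json


def analyze(rows):
    """Full pass over the training data: compute the statistics once."""
    prices = [r["price"] for r in rows]
    vocab = sorted({r["category"] for r in rows})
    return {"price_min": min(prices), "price_max": max(prices), "vocab": vocab}


def transform(row, stats):
    """Per-row transform that only reads the frozen statistics."""
    span = stats["price_max"] - stats["price_min"]
    scaled = (row["price"] - stats["price_min"]) / span if span else 0.0
    vocab = stats["vocab"]
    category_id = vocab.index(row["category"]) if row["category"] in vocab else -1
    return {"price_scaled": scaled, "category_id": category_id}


training_rows = [
    {"price": 10.0, "category": "shoes"},
    {"price": 30.0, "category": "hats"},
    {"price": 20.0, "category": "shoes"},
]
stats = analyze(training_rows)
artifact = json.dumps(stats)  # stands in for the exported transform graph

# At serving time only the artifact is loaded; the training data is not needed.
serving_stats = json.loads(artifact)
serving_output = transform({"price": 25.0, "category": "hats"}, serving_stats)
print(serving_output)
```

Because the serving path reads the same frozen statistics that training used, a given input is always scaled and indexed identically in both places.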

Exam trap

A common mistake is to assume that any preprocessing code can be reused as-is between training and serving, or that BigQuery's ML.TRANSFORM provides the same level of consistency as tf.Transform. However, tf.Transform exports a portable, consistent transform graph specifically for TensorFlow serving, ensuring identical preprocessing logic.

How to eliminate wrong answers

Option A is wrong because writing raw Python preprocessing code cannot guarantee consistency across training and serving environments, especially when the code relies on global statistics (e.g., mean, standard deviation) that must be computed from the full dataset and reused identically at serving time. Option B is wrong because BigQuery's ML.TRANSFORM is a BigQuery ML feature that only works within BigQuery's ML pipeline and cannot export a portable transform graph for use with TensorFlow serving outside BigQuery. Option D is wrong because a Lambda layer in Keras performs per-batch or per-sample operations and cannot compute dataset-level statistics (e.g., min, max, vocabulary) that require a full pass over the data, leading to inconsistent preprocessing between training and serving.

69

MCQeasy

You want to use Vertex AI Vizier for hyperparameter tuning. You have 2 categorical parameters and 3 continuous parameters. Which algorithm is best suited for this mixed parameter space?

A.Evolutionary algorithm

B.Random search

C.Bayesian optimization

D.Grid search

AnswerC

Handles mixed types and is sample-efficient.

Why this answer

Bayesian optimization can handle mixed parameter types. Grid search cannot handle continuous parameters efficiently; random search can, but Bayesian is more sample-efficient.
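A mixed search space like the one in the question might be described as follows. This is a sketch that only builds and prints a dictionary: the field names follow my understanding of the Vertex AI REST `StudySpec` and should be checked against the current reference, and the metric name, parameter names, and ranges are all hypothetical.

```python
import json

# Two categorical and three continuous parameters, as in the question.
# No "algorithm" field is set, on the assumption that leaving it
# unspecified selects the service's default (Bayesian) search.
study_spec = {
    "metrics": [{"metricId": "val_loss", "goal": "MINIMIZE"}],
    "parameters": [
        {"parameterId": "optimizer",
         "categoricalValueSpec": {"values": ["adam", "sgd"]}},
        {"parameterId": "activation",
         "categoricalValueSpec": {"values": ["relu", "gelu"]}},
        {"parameterId": "learning_rate",
         "doubleValueSpec": {"minValue": 1e-5, "maxValue": 1e-1},
         "scaleType": "UNIT_LOG_SCALE"},
        {"parameterId": "dropout",
         "doubleValueSpec": {"minValue": 0.0, "maxValue": 0.5},
         "scaleType": "UNIT_LINEAR_SCALE"},
        {"parameterId": "weight_decay",
         "doubleValueSpec": {"minValue": 1e-6, "maxValue": 1e-2},
         "scaleType": "UNIT_LOG_SCALE"},
    ],
}

categorical = [p for p in study_spec["parameters"] if "categoricalValueSpec" in p]
continuous = [p for p in study_spec["parameters"] if "doubleValueSpec" in p]
print(json.dumps(study_spec, indent=2))
```

A grid over this space would need the three continuous ranges discretised by hand, which is the inefficiency the explanation refers to.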

70

MCQmedium

An ML team is building a feature pipeline with Dataflow that reads from BigQuery, computes features, and writes to Vertex AI Feature Store. They need to ensure that features are available for both training and serving with low latency. Which Feature Store option should they use?

A.Create a featurestore with only offline serving

B.Store features directly in BigQuery

C.Use Cloud SQL as a feature store

D.Create a featurestore with online serving enabled

AnswerD

Online serving provides low-latency access via Bigtable.

Why this answer

Vertex AI Feature Store supports online serving (low latency) and offline serving (batch). For low latency, they must enable online serving. The online store uses Bigtable for low-latency lookups.

71

Multi-Selectmedium

You are designing a distributed training job for a very large neural network that does not fit on a single machine. You need to split the model across multiple devices. Which TWO techniques can you use?

Select 2 answers

A.ParameterServerStrategy

B.Pipeline parallelism

C.Operator-level model parallelism

D.Data parallelism with MirroredStrategy

E.MultiWorkerMirroredStrategy

AnswersB, C

Splits model into stages across devices.

Why this answer

Pipeline parallelism splits layers across devices; model parallelism (operator-level) splits individual operations. Data parallelism replicates the model. ParameterServer is a form of distributed training, not model splitting.
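Operator-level splitting can be shown on a single matrix multiplication. In this plain-Python sketch, the weight matrix is split by columns into two shards, each "device" multiplies the same input by its shard, and the partial outputs are concatenated. The matrices are made up, and real frameworks add communication steps that are not modelled here.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(matrix, parts):
    """Split a matrix into equal column shards (width must divide evenly)."""
    width = len(matrix[0]) // parts
    return [[row[i * width:(i + 1) * width] for row in matrix] for i in range(parts)]


x = [[1, 2], [3, 4]]              # input activations, shape 2x2
w = [[1, 0, 2, 1], [0, 1, 1, 3]]  # one layer's weights, shape 2x4

shards = split_columns(w, 2)                 # each device holds a 2x2 slice
partials = [matmul(x, shard) for shard in shards]
combined = [left + right for left, right in zip(*partials)]

print(combined)
print(combined == matmul(x, w))
```

No device ever holds the full weight matrix, yet the concatenated result equals the unsplit product. Pipeline parallelism, by contrast, would keep each layer whole and place different layers on different devices.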

72

MCQmedium

An engineer is training a model on Vertex AI using a custom container. The training job fails with an error indicating that the container exited with a non-zero status. The engineer wants to debug the issue. What is the best way to access the logs?

A.SSH into the training container using Vertex AI's SSH feature

B.View logs in Cloud Storage under the job's output directory

C.Use Cloud Debugger to inspect the container

D.Check the logs in Cloud Logging (Logs Explorer)

AnswerD

Logs are automatically streamed to Cloud Logging; this is the standard debugging approach.

Why this answer

Option D is correct because Vertex AI automatically streams all container stdout and stderr to Cloud Logging (Logs Explorer). When a custom container exits with a non-zero status, the detailed error messages, stack traces, and application logs are captured there, making it the primary and most comprehensive debugging tool. Cloud Logging provides structured, searchable logs without requiring direct access to the container.

Exam trap

A common trap is the misconception that you can SSH into a training container or that logs are stored in Cloud Storage, when in fact Cloud Logging is the centralized, default logging solution for all Vertex AI training jobs.

How to eliminate wrong answers

Option A is wrong because Vertex AI does not provide an SSH feature for training containers; training jobs run in ephemeral, isolated environments with no interactive shell access. Option B is wrong because Cloud Storage under the job's output directory stores artifacts like model checkpoints and metrics, not real-time container logs or error messages. Option C is wrong because Cloud Debugger is designed for debugging running applications in production by capturing snapshots and variable states, not for inspecting container exit errors or retrieving logs from a failed training job.

73

MCQmedium

A company is deploying a deep learning model on edge devices with limited storage and computational resources. They need to reduce the model size by 80% while maintaining acceptable accuracy. Which two techniques should they combine?

A.Pruning and distillation

B.Quantization and distillation

C.Quantization and pruning

AnswerC

Why this answer

Quantization reduces model size by converting weights from FP32 to INT8, achieving up to 75% size reduction. Pruning removes redundant weights (e.g., 50% pruning) to further shrink the model. Together, these two techniques can reduce model size by over 80% (e.g., 50% pruning + INT8 quantization yields ~87.5% reduction) while preserving acceptable accuracy on edge devices.
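The ~87.5% figure follows from multiplying the two reductions, under the simplifying assumption that pruned weights take no storage at all (real sparse formats add some index overhead):

```latex
\text{remaining size}
  = \underbrace{(1 - 0.5)}_{\text{50\% pruning}}
    \times \underbrace{\frac{8\ \text{bits}}{32\ \text{bits}}}_{\text{FP32} \to \text{INT8}}
  = 0.125
\quad\Rightarrow\quad
\text{reduction} = 1 - 0.125 = 87.5\%
```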

Knowledge distillation also reduces size, but it typically requires a pre-trained teacher and a separate training run for the student, and combining it with only one of the other techniques may not reach the extreme compression needed.

74

Multi-Selecteasy

A company wants to use Vertex AI for hyperparameter tuning. Which three components are required to configure a hyperparameter tuning job? (Choose THREE.)

Select 3 answers

A.Algorithm (e.g., Bayesian, grid, random)

B.Machine type for each trial

C.List of hyperparameters with types and ranges

D.Objective metric name and goal (minimize or maximize)

E.Training container image

AnswersA, C, D

Required to specify how to search.

Why this answer

Option A is correct because Vertex AI hyperparameter tuning requires specifying the search algorithm (Bayesian, grid, or random) to determine how the hyperparameter space is explored. Bayesian optimization is the default and most efficient for continuous spaces, while grid search is exhaustive and random search is simple. Without an algorithm, Vertex AI cannot decide how to sample trials.

Exam trap

Candidates often mistakenly include machine type or container image as mandatory tuning parameters, but these are optional training job settings.

75

MCQeasy

You need to run a custom training job on Vertex AI using a pre-built container for scikit-learn. Which container image should you specify?

A.us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.1-9

B.us-docker.pkg.dev/vertex-ai/training/scikit-learn-cpu.0.23-0

C.us-docker.pkg.dev/vertex-ai/training/xgboost-cpu.1-3

D.us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-6

AnswerB

Correct pre-built container for scikit-learn.

Why this answer

Vertex AI provides pre-built training containers for scikit-learn, and 'us-docker.pkg.dev/vertex-ai/training/scikit-learn-cpu.0.23-0' is the only scikit-learn image among the options. The others are for different frameworks (PyTorch, XGBoost, and TensorFlow).
