CCNA Pmle Scaling Models Questions

24 of 99 questions · Page 2/2 · Pmle Scaling Models topic · Answers revealed

76
MCQeasy

An organization wants to deploy a pre-trained BERT model for sentiment analysis on Vertex AI. They want to fine-tune it on their domain-specific data. Which feature in Vertex AI allows them to find and fine-tune a suitable foundation model with minimal effort?

A.Vertex AI Model Garden
B.Vertex AI AutoML
C.Vertex AI Custom Training
D.Vertex AI JumpStart
AnswerA

Model Garden lets you discover foundation models and fine-tune or deploy them with minimal code.

Why this answer

Vertex AI Model Garden is the catalog for discovering, fine-tuning, and deploying foundation and open models, including BERT-style NLP models. "Vertex AI JumpStart" is not a Vertex AI feature; JumpStart belongs to Amazon SageMaker. AutoML trains new models from your labeled data rather than fine-tuning an existing foundation model.

Custom training can also fine-tune BERT, but it requires more manual effort.

77
MCQmedium

You are running a Vertex AI custom training job with a pre-built TensorFlow container. You want to use TPU v3 pods for faster training. Which configuration is required?

A.Specify machine type as tpu-v3-8 in the worker pool spec and use tf.distribute.TPUStrategy
B.Set the worker pool machine type to n1-standard-8 and add --accelerator type=tpu
C.Use a custom container with TPU libraries and set --tpu-topology=v3-8
D.Enable TPU by setting the environment variable TPU_NAME
AnswerA

TPU machine type must be specified, and code must use TPUStrategy.

Why this answer

TPU training requires specifying the TPU type in the worker pool spec and using tf.distribute.TPUStrategy in code. The pre-built container supports TPUs. Runtime version must include TPU support.

78
MCQmedium

A team wants to perform hyperparameter tuning on a Vertex AI custom training job with 100 trials. They require an algorithm that efficiently explores the search space by learning from previous trials. Which algorithm should they select in the study configuration?

A.RANDOM_SEARCH
B.HYPERBAND
C.ALGORITHM_UNSPECIFIED (defaults to Bayesian optimization)
D.GRID_SEARCH
AnswerC

Bayesian optimization is the default and most efficient algorithm for hyperparameter tuning.

Why this answer

Vertex AI Vizier provides Bayesian optimization as a default algorithm that builds a probabilistic model of the objective function and selects hyperparameters based on expected improvement. This is more efficient than grid or random search for most scenarios. The ALGORITHM_UNSPECIFIED defaults to Bayesian optimization.
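The "expected improvement" idea mentioned above can be illustrated with the textbook formula. This is a minimal sketch, not Vizier's actual implementation: the surrogate predictions (mean and standard deviation per candidate) and the candidate names are hypothetical values chosen for illustration.

```python
import math

def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Textbook expected improvement for a maximization objective."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (mu - best) * cdf + sigma * pdf

# Hypothetical surrogate predictions (mean, std) for three candidate trials.
candidates = {
    "lr=0.1": (0.80, 0.01),
    "lr=0.01": (0.82, 0.05),
    "lr=0.001": (0.78, 0.15),
}
best_so_far = 0.81  # best objective value observed in earlier trials

scores = {name: expected_improvement(mu, sd, best_so_far)
          for name, (mu, sd) in candidates.items()}
next_trial = max(scores, key=scores.get)
print(next_trial, scores)
```

A candidate with a lower predicted mean can still score highest when its uncertainty is large, which is how this kind of search balances exploration against exploitation.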

79
Multi-Selecthard

You are fine-tuning a large language model using Vertex AI Training with spot VMs to reduce cost. Your training job keeps getting preempted, causing delays. Which THREE strategies can help mitigate the impact of preemption?

Select 3 answers
A.Enable checkpointing and restore from the latest checkpoint after preemption
B.Increase the number of worker replicas to train faster
C.Reduce batch size to lower memory usage
D.Switch to dedicated (non-preemptible) VMs
E.Use a smaller, more efficient model architecture
AnswersA, B, E

Allows resuming from last saved state.

Why this answer

Checkpointing saves progress; using a smaller model reduces training time; increasing number of workers allows faster training and recovery. Using dedicated VMs increases cost. Reducing batch size doesn't help with preemption frequency.
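The checkpoint-and-resume pattern can be sketched with a toy training loop. This is an illustration only: the JSON file stands in for a real framework checkpoint, the local temp path stands in for durable storage such as a Cloud Storage bucket, and `preempt_at` is a hypothetical parameter that simulates a preemption.

```python
import json
import os
import tempfile

def train(total_steps: int, ckpt_path: str, preempt_at=None) -> int:
    """Run a toy loop that checkpoints every step; return the last completed step."""
    step = 0
    if os.path.exists(ckpt_path):
        # Resume from the latest checkpoint instead of starting over.
        with open(ckpt_path) as f:
            step = json.load(f)["step"]
    while step < total_steps:
        if preempt_at is not None and step == preempt_at:
            return step  # simulated preemption: the process stops here
        step += 1  # one "training step"
        tmp = ckpt_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"step": step}, f)
        os.replace(tmp, ckpt_path)  # atomic rename avoids half-written checkpoints
    return step

ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
first = train(10, ckpt, preempt_at=6)   # interrupted after 6 steps
second = train(10, ckpt)                # restarts at step 6, finishes at 10
print(first, second)
```

Only the work done since the last checkpoint is lost on preemption, so a shorter checkpoint interval trades extra I/O for less repeated work.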

80
MCQhard

A team is fine-tuning a large language model (LLaMA 2) using Vertex AI with a custom container on a multi-node GPU cluster. They need to implement model parallelism to fit the model across multiple GPUs because it does not fit into a single GPU memory. Which distributed training strategy should they use?

A.Use Vertex AI Hyperparameter Tuning to find optimal model partitioning
B.Use tf.distribute.MirroredStrategy across all GPUs
C.Implement pipeline parallelism by manually splitting the model layers across GPUs and using a framework like PyTorch's RPC or Megatron-LM
D.Use Vertex AI distributed training with TF_CONFIG to set up multi-worker mirrored strategy and rely on XLA to partition the model
AnswerC

Pipeline parallelism is the appropriate model parallelism technique for large models; it must be manually configured.

Why this answer

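Model parallelism, specifically pipeline parallelism, splits the model layers across devices. For large models that don't fit on one GPU, this is necessary. Plain data parallelism (for example, DDP or MirroredStrategy) still requires the full model on each device; sharded variants such as ZeRO stage 3 relax this by partitioning parameters, but none of the other options offers that.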

Vertex AI does not natively support model parallelism; users must configure it manually using frameworks like Megatron-LM or DeepSpeed.

81
MCQeasy

You are fine-tuning a BERT model from Hugging Face Transformers on Vertex AI. You want to minimise cost for a short experiment. Which compute configuration should you use?

A.A custom training job with a single NVIDIA T4 GPU using spot VMs
B.A custom training job with a TPU v3-8 pod
C.A custom training job with 8 NVIDIA V100 GPUs using regular VMs
D.A standard n1-highmem-8 machine with no accelerator
AnswerA

Spot VMs lower cost; T4 is sufficient for fine-tuning BERT.

Why this answer

Spot VMs provide a discount of up to 60-90% compared to regular VMs. Since the experiment is short, preemption risk is low. A TPU pod or 8 V100 GPUs is expensive and overkill for BERT fine-tuning, while a CPU-only machine would train far more slowly; a single T4 is sufficient, and running it on spot VMs lowers the cost further.

82
MCQeasy

A data engineer wants to compute feature aggregates over a large dataset stored in BigQuery and write the results to Vertex AI Feature Store. The pipeline must handle both batch and streaming data. Which Google Cloud service should they use?

A.BigQuery scheduled queries
B.Cloud Functions triggered by Pub/Sub
C.Cloud Dataproc with Spark
D.Cloud Dataflow with Apache Beam
AnswerD

Dataflow handles both batch and streaming, integrates with BigQuery and Feature Store.

Why this answer

Dataflow (Apache Beam) is the recommended service for both batch and streaming data processing at scale. It can read from BigQuery and write to Feature Store.

83
Multi-Selectmedium

You are using Vertex AI to train a model with a custom container. You need to pass command-line arguments for hyperparameters. Which THREE methods can you use? (Choose 3.)

Select 3 answers
A.Store arguments in a Cloud Storage file and download at runtime.
B.Set environment variables using the 'env' field and reference them in the container.
C.Set hyperparameter values in the 'hyperparameters' field of the worker pool spec.
D.Use the 'command' field to override the entrypoint and include arguments.
E.Specify args in the 'args' field of the container spec.
AnswersB, D, E

Environment variables can also be used to pass arguments.

Why this answer

Option B is correct because Vertex AI custom container training allows you to pass environment variables via the 'env' field in the worker pool spec, which the container can then reference at runtime. Option D is correct because you can override the container's default entrypoint using the 'command' field and include command-line arguments directly in that field. Option E is also correct as the 'args' field in the container spec can be used to pass arguments that are appended to the entrypoint or command.

Exam trap

Google often tests the distinction between the 'hyperparameters' field (used for tuning jobs) and the 'args'/'command' fields (used for passing static arguments), causing candidates to mistakenly select option C as a valid method for passing command-line arguments.
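On the container side, the training script has to read whatever was passed through 'args', 'command', or 'env'. The following is a minimal sketch of one way to do that; the flag names and the environment variable names (`LEARNING_RATE`, `EPOCHS`) are hypothetical, not names that Vertex AI defines.

```python
import argparse
import os

def parse_hparams(argv, env=None):
    """Read hyperparameters from CLI args, falling back to environment variables."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    # Command-line flags take precedence; env vars only supply the defaults.
    parser.add_argument("--learning-rate", type=float,
                        default=float(env.get("LEARNING_RATE", "0.001")))
    parser.add_argument("--epochs", type=int,
                        default=int(env.get("EPOCHS", "1")))
    return parser.parse_args(argv)

# Simulates args=["--epochs", "5"] plus env LEARNING_RATE=0.01 on the container.
hp = parse_hparams(["--epochs", "5"], env={"LEARNING_RATE": "0.01"})
print(hp.learning_rate, hp.epochs)
```

Passing `argv` and `env` explicitly keeps the function testable; a real entrypoint would call `parse_hparams(sys.argv[1:])`.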

84
Multi-Selectmedium

A company is fine-tuning a large language model (Gemma 7B) using Vertex AI Model Garden. They want to reduce the model's memory footprint for deployment on edge devices. Which THREE model compression techniques should they consider?

Select 3 answers
A.Post-training quantization to INT8
B.Using a larger batch size during inference
C.Quantization-aware training
D.Increasing the number of layers
E.Pruning of weights with small magnitude
AnswersA, C, E

Reduces model size and speeds up inference.

Why this answer

Post-training quantization (e.g., to INT8) reduces precision and size. Quantization-aware training simulates quantization during training for better accuracy. Pruning removes redundant weights.

Knowledge distillation trains a smaller model. For edge deployment, quantization (both post-training and quantization-aware) and pruning are common. Distillation is also valid but often considered separate.
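Post-training quantization and magnitude pruning can both be shown on a tiny weight vector. This is a simplified sketch assuming symmetric per-tensor INT8 quantization and a fixed pruning threshold; production toolchains use more elaborate schemes (per-channel scales, calibration data, structured sparsity).

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the INT8 values."""
    return [v * scale for v in q]

def prune_by_magnitude(weights, threshold):
    """Zero out weights whose absolute value is below the threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]

weights = [0.50, -1.27, 0.02, 0.90, -0.01]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
pruned = prune_by_magnitude(weights, threshold=0.05)
print(q, scale, restored, pruned)
```

Each weight now needs one byte instead of four (FP32), at the cost of a rounding error of at most half a quantization step.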

85
MCQeasy

A team is building a feature pipeline for an ML model. They need to compute aggregate features over a sliding time window from streaming data. Which Google Cloud service is most appropriate for this task?

A.Cloud Dataflow with fixed windows
B.Cloud Pub/Sub for windowing logic
C.BigQuery scheduled queries
D.Cloud Functions with Pub/Sub triggers
AnswerA

Dataflow allows windowed aggregations (sliding, fixed, session) on streaming data.

Why this answer

Cloud Dataflow is the most appropriate choice because it natively supports windowing and stateful aggregation over streaming data through the Apache Beam programming model. Note that fixed windows are non-overlapping; an overlapping sliding window (e.g., a 10-minute window recomputed every 5 minutes) uses Beam's sliding windows instead. Dataflow supports both, and it is the only listed option that can perform windowed aggregation at all.

Exam trap

The trap here is that candidates confuse Cloud Pub/Sub's ability to handle streaming data with the ability to perform windowed aggregations, but Pub/Sub is only a transport layer and cannot compute features itself.

How to eliminate wrong answers

Option B is wrong because Cloud Pub/Sub is a messaging service that handles event ingestion and delivery, not windowing logic or stateful aggregation; it has no built-in capability to compute sliding window aggregates. Option C is wrong because BigQuery scheduled queries operate on batch data in tables, not on streaming data in real time, and they lack the low-latency, per-event windowing needed for a streaming feature pipeline. Option D is wrong because Cloud Functions with Pub/Sub triggers are stateless and ephemeral, with a maximum timeout of 9 minutes (or 60 minutes with 2nd gen), making them unsuitable for maintaining sliding window state or performing continuous aggregation over streaming data.
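The difference between fixed and sliding windows comes down to how many windows an event belongs to. The sketch below is plain Python rather than Apache Beam code, and it assumes integer timestamps in seconds with windows aligned to zero.

```python
def fixed_window(ts: int, size: int):
    """Return the single non-overlapping [start, end) window containing ts."""
    start = (ts // size) * size
    return (start, start + size)

def sliding_windows(ts: int, size: int, period: int):
    """Return every [start, start + size) window, started each `period` seconds, containing ts."""
    last_start = (ts // period) * period
    # A window contains ts only if its start is later than ts - size.
    starts = range(last_start, ts - size, -period)
    return sorted((s, s + size) for s in starts if s >= 0)

fixed = fixed_window(130, size=60)
sliding = sliding_windows(130, size=60, period=30)
print(fixed)    # one window
print(sliding)  # size / period overlapping windows
```

With a 60-second window sliding every 30 seconds, each event contributes to two aggregates, which is why sliding windows cost more state than fixed ones.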

86
MCQhard

A data scientist is fine-tuning a large language model from Hugging Face using Vertex AI Training with a GPU. The model has 7 billion parameters and does not fit on a single GPU. They need to split the model across multiple GPUs and train with data parallelism. Which strategy should they use?

A.Use Vertex AI's AutoML to automatically distribute the model.
B.Use pipeline parallelism via a custom container with DeepSpeed and data parallelism across workers using PyTorch DDP, configured with Vertex AI distributed training.
C.Use Vertex AI's hyperparameter tuning with multiple trials.
D.Configure a multi-worker mirrored strategy with TensorFlow, setting TF_CONFIG to use all GPUs on each worker.
AnswerB

This combines model and data parallelism, suitable for large models.

Why this answer

Option B is correct because it combines pipeline parallelism (via DeepSpeed) to split the 7B-parameter model across multiple GPUs, with data parallelism (via PyTorch DDP) to replicate the model across workers for training on larger batches. Vertex AI distributed training coordinates the multi-worker setup, making this the only viable strategy for a model that exceeds single-GPU memory while requiring data parallelism.

Exam trap

The trap here is that candidates confuse 'data parallelism' (which replicates the model) with 'model parallelism' (which splits the model), and assume a single strategy like DDP or mirrored strategy suffices, ignoring that the model must first be partitioned across GPUs using pipeline or tensor parallelism before data parallelism can be applied.

How to eliminate wrong answers

Option A is wrong because Vertex AI AutoML is a no-code automated ML service that does not support custom model architectures or manual distribution of large language models; it cannot handle a 7B-parameter model that requires custom parallelism strategies. Option C is wrong because hyperparameter tuning optimizes training hyperparameters (e.g., learning rate) across multiple trials, but does not address the fundamental need to split a model across GPUs or enable data parallelism. Option D is wrong because a multi-worker mirrored strategy with TensorFlow requires the model to fit on a single GPU per worker (it mirrors the entire model), and the 7B-parameter model exceeds that limit; additionally, TF_CONFIG setup does not provide pipeline parallelism to split the model across devices.
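A rough memory estimate shows why a 7B-parameter model does not fit on one GPU during training. The byte counts below assume mixed-precision training with Adam (FP16 weights and gradients, FP32 master weights and two optimizer moments); activations and framework overhead are excluded, so real usage is higher.

```python
PARAMS = 7_000_000_000  # 7B parameters, as in the question

# Assumed bytes per parameter for mixed-precision Adam training.
BYTES_PER_PARAM = {
    "fp16_weights": 2,
    "fp16_gradients": 2,
    "fp32_master_weights": 4,
    "adam_momentum": 4,
    "adam_variance": 4,
}

weights_gb = PARAMS * BYTES_PER_PARAM["fp16_weights"] / 1e9
total_gb = PARAMS * sum(BYTES_PER_PARAM.values()) / 1e9
print(f"weights only: {weights_gb} GB, training state: {total_gb} GB")
```

Under these assumptions the training state alone exceeds the memory of a single accelerator, so the model or its state has to be partitioned across devices.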

87
Multi-Selecthard

You are fine-tuning a large language model (LLM) from Vertex AI Model Garden using a custom dataset. You need to minimize training cost while maintaining reasonable throughput. Which THREE strategies should you combine?

Select 3 answers
A.Use spot VM instances for training
B.Use parameter-efficient fine-tuning (PEFT) such as LoRA
C.Use full fine-tuning of all model parameters
D.Use TPU v4 pods for training
E.Use mixed precision training (FP16)
AnswersA, B, E

Spot VMs are significantly cheaper than regular VMs and are suitable for fault-tolerant fine-tuning jobs.

Why this answer

Option A is correct because spot VM instances are significantly cheaper than on-demand instances, reducing training cost. They can be preempted, but for fine-tuning tasks that can checkpoint and resume, this trade-off is acceptable for cost savings.

Exam trap

The Google PMLE exam often tests the misconception that higher-performance hardware (like TPU pods) is always the best choice for cost optimization, when in reality, cost-minimization strategies prioritize cheaper compute and efficient training methods over raw throughput.
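The saving from parameter-efficient fine-tuning can be made concrete with a parameter count. The sketch assumes a single 4096 x 4096 weight matrix and LoRA rank 8; both numbers are hypothetical and chosen only for illustration.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix is fine-tuned."""
    return d_out * d_in

def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for low-rank factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)

full = full_params(4096, 4096)
lora = lora_params(4096, 4096, rank=8)
ratio = lora / full
print(full, lora, f"{ratio:.4%}")
```

Fewer trainable parameters means smaller gradients and optimizer state, which is where most of the cost reduction comes from.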

88
Multi-Selecteasy

You are building a machine learning pipeline on Google Cloud. You need to perform feature engineering on large datasets stored in BigQuery and store the resulting features in Vertex AI Feature Store for both online and offline use. Which TWO Google Cloud services should you use?

Select 2 answers
A.Cloud Functions
B.Dataflow
C.BigQuery ML
D.Dataproc
E.Cloud Build
AnswersB, D

Dataflow can process large-scale data and integrate with Feature Store.

Why this answer

Dataflow can read from BigQuery, compute features via Apache Beam, and write to Feature Store. Dataproc can do the same with Spark, although Dataflow is serverless. Cloud Functions is not suitable for large-scale data processing.

Cloud Build is for CI/CD. BigQuery ML is for in-database ML.

89
MCQhard

An ML team is fine-tuning a large language model using a custom container on Vertex AI. They want to reduce costs by using preemptible (spot) VMs for training. The training job is long-running and uses checkpointing. Which statement is correct regarding spot VM usage?

A.Spot VMs are not available for custom training jobs on Vertex AI
B.Training will automatically resume from the latest checkpoint without any configuration
C.You must enable checkpointing in the training code and use spot VMs by setting the 'spot' field in the machine spec
D.Spot VMs cannot be used with GPU accelerators
AnswerC

This is correct: the code must checkpoint, and the machine spec must indicate spot=true.

Why this answer

Option C is correct because Vertex AI custom training jobs support spot VMs, but you must explicitly enable checkpointing in your training code and set the 'spot' field in the machine spec to true. This ensures that when a preemptible VM is terminated, the training can resume from the latest checkpoint, preventing loss of progress and reducing costs.

Exam trap

The trap here is that candidates assume Vertex AI automatically handles checkpointing and resumption for spot VMs, but in reality, you must explicitly implement both the checkpointing logic and the spot VM configuration.

How to eliminate wrong answers

Option A is wrong because Vertex AI does support spot VMs for custom training jobs, as long as you configure them correctly. Option B is wrong because training does not automatically resume from the latest checkpoint; you must implement checkpointing logic in your training code and configure the job to use spot VMs. Option D is wrong because spot VMs can be used with GPU accelerators on Vertex AI, though you must be aware that preemption may occur more frequently with GPUs.

90
MCQmedium

A machine learning engineer is training a TensorFlow model on Vertex AI using distributed training with the MultiWorkerMirroredStrategy. The training job uses 4 workers with 4 GPUs each. The engineer notices that the training is not scaling linearly. What is the most likely cause?

A.The model architecture is too simple to benefit from distribution
B.The workers are not using the same version of TensorFlow
C.Communication overhead due to gradient synchronization
D.The GPUs are not configured correctly
AnswerC

MultiWorkerMirroredStrategy synchronizes gradients across workers; network latency can limit scaling.

Why this answer

With MultiWorkerMirroredStrategy, each worker computes gradients independently on its local batch, then all-reduces gradients across workers via collective communication (e.g., NCCL or gRPC). As the number of workers increases, the communication overhead for gradient synchronization grows, often dominating the per-step time and preventing linear scaling. This is the most common bottleneck in distributed TensorFlow training, especially with many workers or small batch sizes per worker.

Exam trap

The trap here is that candidates often assume more workers always means linear speedup, ignoring the fixed overhead of gradient synchronization that becomes the dominant factor in distributed training.

How to eliminate wrong answers

Option A is wrong because even a simple model can suffer from communication overhead if the compute-to-communication ratio is low; the issue is not model simplicity but the cost of synchronizing gradients across workers. Option B is wrong because TensorFlow enforces version consistency across workers in a distributed job; mismatched versions would cause a job failure, not sublinear scaling. Option D is wrong because GPU misconfiguration (e.g., incorrect driver or CUDA version) would typically cause errors or zero utilization, not gradual scaling degradation; the observed symptom of sublinear scaling points to communication, not hardware misconfiguration.
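A simple model shows how synchronization overhead caps the speedup. It assumes a fixed per-worker batch, a constant all-reduce time per step, and no overlap between compute and communication; the timings are hypothetical.

```python
def scaling_efficiency(compute_s: float, comm_s: float) -> float:
    """Fraction of each step spent on useful compute."""
    return compute_s / (compute_s + comm_s)

def speedup(num_workers: int, compute_s: float, comm_s: float) -> float:
    """Throughput relative to one worker with no synchronization cost."""
    return num_workers * scaling_efficiency(compute_s, comm_s)

# Hypothetical step: 100 ms of compute, 25 ms of gradient all-reduce.
eff = scaling_efficiency(0.100, 0.025)
four_workers = speedup(4, 0.100, 0.025)
print(f"efficiency={eff:.2f}, speedup with 4 workers={four_workers:.2f}x")
```

Raising the per-worker batch size increases compute time per step without increasing gradient size, which improves the ratio.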

91
MCQhard

A research team is training a very large Transformer model that does not fit into the memory of a single GPU. They have access to multiple GPUs on a single machine and want to split the model layers across GPUs. Which distributed training strategy should they use?

A.MultiWorkerMirroredStrategy
B.Parameter server strategy
C.MirroredStrategy (data parallelism)
D.Pipeline parallelism (model parallelism)
AnswerD

Pipeline parallelism splits the model across devices, allowing large models to be trained.

Why this answer

When a model is too large for one GPU, model parallelism (pipeline parallelism) is required. This splits different layers (or layer groups) across devices. Data parallelism (mirrored strategy) replicates the model, which would still require the full model on each GPU.

Pipeline parallelism is a form of model parallelism where layers are distributed across devices and micro-batches flow through the pipeline.
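The two ingredients described here, assigning layers to stages and feeding micro-batches through them, can be sketched as follows. The idle-time formula assumes a simple GPipe-style schedule with equal-cost stages; the layer and stage counts are hypothetical.

```python
def partition_layers(num_layers: int, num_stages: int):
    """Split layer indices into contiguous, near-equal groups, one per device."""
    base, extra = divmod(num_layers, num_stages)
    stages, start = [], 0
    for s in range(num_stages):
        size = base + (1 if s < extra else 0)
        stages.append(list(range(start, start + size)))
        start += size
    return stages

def bubble_fraction(num_stages: int, num_microbatches: int) -> float:
    """Share of time devices sit idle while the pipeline fills and drains."""
    return (num_stages - 1) / (num_microbatches + num_stages - 1)

stages = partition_layers(10, 4)
idle = bubble_fraction(num_stages=4, num_microbatches=12)
print(stages, idle)
```

More micro-batches per step shrink the idle fraction, which is why pipeline-parallel training splits each batch into many small pieces.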

92
Multi-Selectmedium

You are designing a distributed training job for a PyTorch model on Vertex AI using multiple machines with GPUs. Which TWO configurations are required to enable data parallelism with PyTorch DDP? (Choose 2.)

Select 2 answers
A.Use the command 'torch.distributed.launch' to start each worker.
B.Set environment variables MASTER_ADDR, MASTER_PORT, WORLD_SIZE, and RANK in each container.
C.Enable Vertex Explainable AI during training.
D.Set environment variable TF_CONFIG for each replica.
E.Specify a custom service account with access to Cloud TPU.
AnswersA, B

torch.distributed.launch (or torchrun) handles spawning processes with correct environment variables.

Why this answer

PyTorch DDP requires the master address and port (MASTER_ADDR, MASTER_PORT) for the communication group, and WORLD_SIZE and RANK. Vertex AI sets TF_CONFIG for TensorFlow, not PyTorch. NCCL is the backend.
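The relationship between these variables can be sketched in plain Python. The helper below is hypothetical (launchers such as torchrun normally compute these values), and the address and port are placeholder values.

```python
def ddp_env(master_addr: str, master_port: int, num_nodes: int,
            gpus_per_node: int, node_rank: int, local_rank: int) -> dict:
    """Build the environment one DDP process needs to join the process group."""
    return {
        "MASTER_ADDR": master_addr,              # host of the rank-0 process
        "MASTER_PORT": str(master_port),         # rendezvous port on that host
        "WORLD_SIZE": str(num_nodes * gpus_per_node),  # total processes (one per GPU)
        "RANK": str(node_rank * gpus_per_node + local_rank),  # global process index
        "LOCAL_RANK": str(local_rank),           # GPU index within the machine
    }

# Third machine (node_rank=2), second GPU (local_rank=1), in a 4 x 4 GPU job.
env = ddp_env("10.0.0.2", 29500, num_nodes=4, gpus_per_node=4,
              node_rank=2, local_rank=1)
print(env)
```

Every process must agree on MASTER_ADDR, MASTER_PORT, and WORLD_SIZE, while RANK must be unique per process.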

93
MCQmedium

You need to run a distributed training job on Vertex AI using TensorFlow with MirroredStrategy on a single machine with 4 GPUs. Which training configuration should you use?

A.Use MirroredStrategy with a single workerPoolSpec containing a machine_type with 4 GPUs
B.Use MultiWorkerMirroredStrategy with multiple workerPools
C.Use MirroredStrategy with two workerPoolSpecs, each with 2 GPUs
D.Use ParameterServerStrategy with a chief and a parameter server
AnswerA

MirroredStrategy handles intra-machine GPU parallelism. Single worker pool with multiple GPUs is correct.

Why this answer

For single-machine training with multiple GPUs, TensorFlow's MirroredStrategy is appropriate. The job needs a single workerPoolSpec with one replica whose machine spec attaches all 4 GPUs.

94
MCQmedium

A data science team is building a feature engineering pipeline that processes large-scale data from BigQuery daily. They need to compute aggregate features and store the results in Vertex AI Feature Store for both online serving and offline training. Which Google Cloud service is best suited for this batch computation?

A.Cloud Composer
B.Dataproc
C.Cloud Functions
D.Dataflow
AnswerD

Dataflow (Apache Beam) is the correct choice for scalable batch processing and integrates with Feature Store.

Why this answer

Dataflow is ideal for batch processing large datasets from BigQuery with Apache Beam. It can write directly to Feature Store's API. Cloud Functions is event-driven and not for heavy batch.

Dataproc targets Spark and Hadoop workloads and requires cluster management, so it is a weaker fit than serverless Dataflow here. Cloud Composer is an orchestrator, not an execution engine.

95
Multi-Selectmedium

You are using tf.Transform to preprocess data for a TensorFlow model. You want to ensure that the same transformations applied during training are also applied during serving. Which THREE components are necessary to achieve this?

Select 3 answers
A.Use the tf.Transform analyze_and_transform function on the training data
B.Use TensorFlow Serving with the exported SavedModel
C.Store raw data in BigQuery for serving
D.Save the transform function and load it in the serving input function
E.Duplicate the preprocessing code in the serving application
AnswersA, B, D

This function computes statistics and applies transformations, producing a transform graph.

Why this answer

Option A is correct because tf.Transform's analyze-and-transform step (`tft_beam.AnalyzeAndTransformDataset` in the Apache Beam implementation) computes the full-pass statistics (e.g., mean, variance, vocabulary) needed for consistent preprocessing and applies the transformation to the training data. It produces a transform graph that captures the exact operations, ensuring the same transformation logic is available for both training and serving.

Exam trap

Google often tests the misconception that duplicating preprocessing code (Option E) is acceptable, but the trap is that this leads to silent training/serving skew and inconsistent model behavior between training and serving.

96
MCQmedium

Your PyTorch training script uses DistributedDataParallel (DDP) across 4 machines, each with 4 GPUs (16 GPUs total). You submit a Vertex AI custom training job. How should you configure the worker pool spec?

A.Create one worker pool with 4 replicas, each with machine type having 4 GPUs
B.Create a chief worker pool with 1 replica (4 GPUs) and a parameter server pool with 4 replicas (no GPUs)
C.Create 4 separate jobs, each with 1 replica and 4 GPUs
D.Create one worker pool with 16 replicas, each with 1 GPU
AnswerA

This matches the requirement: 4 workers, each with 4 GPUs.

Why this answer

For DDP across multiple machines, request 4 replicas, each on a machine type with 4 GPUs; this is the PyTorch counterpart of a MultiWorkerMirroredStrategy layout in TensorFlow. TF_CONFIG is not needed for PyTorch; Vertex AI sets the environment variables needed for distributed training.

97
MCQmedium

A machine learning engineer is preparing to train a Transformer-based model using TensorFlow on a single TPU v3-8 pod slice. The training script uses tf.distribute.TPUStrategy. Which environment variable must be set in Vertex AI to enable TPU training with the appropriate topology?

A.TPU_NAME
B.XRT_TPU_CONFIG
C.TPU_CONFIG
D.TF_CONFIG
AnswerC

Vertex AI automatically populates TPU_CONFIG with the TPU worker endpoint; the training script can parse it.

Why this answer

Vertex AI automatically sets the TPU_CONFIG environment variable to communicate the TPU worker IP address and port to the training container. TF_CONFIG is used for distributed training with CPUs/GPUs, but TPU_CONFIG is the correct one for TPU training.

98
MCQeasy

A data scientist wants to train a TensorFlow model on Vertex AI using a pre-built container. Which of the following pre-built containers is NOT available for custom training in Vertex AI?

A.TensorFlow
B.PyTorch
C.Apache Spark
D.scikit-learn
AnswerC

Spark is not a pre-built container; use custom container or Dataproc.

Why this answer

Vertex AI provides pre-built containers for TensorFlow, PyTorch, scikit-learn, and XGBoost. There is no pre-built container for Apache Spark in the Vertex AI training service; Spark jobs are typically run on Dataproc.

99
MCQmedium

You are using TensorFlow Transform (tf.Transform) to preprocess data for a model that will be deployed on Vertex AI. What is the primary benefit of using tf.Transform over Dataflow alone?

A.Support for GPUs during preprocessing
B.Built-in feature store integration
C.Faster data processing
D.Training/serving skew prevention through a consistent transformation graph
AnswerD

tf.Transform saves a TensorFlow graph that can be used both in training and serving, avoiding skew.

Why this answer

tf.Transform computes statistics (e.g., min, max) on the full dataset, then generates a TensorFlow graph that applies the same transformation consistently at training and serving time. Dataflow alone does not ensure this consistency.
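The analyze-once, apply-everywhere idea can be illustrated without TensorFlow. This is a plain-Python analogy, not the tf.Transform API: the statistics are computed in a full pass over hypothetical training values, serialized, and then reused unchanged at serving time.

```python
import json
import math

def analyze(values):
    """Full pass over the training data to compute normalization statistics."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return {"mean": mean, "std": math.sqrt(var)}

def transform(x, stats):
    """Apply z-score scaling using previously computed statistics."""
    return (x - stats["mean"]) / stats["std"]

train_values = [2.0, 4.0, 6.0, 8.0]
stats = analyze(train_values)
train_features = [transform(v, stats) for v in train_values]

# "Export" the statistics with the model, then reload them in the serving path.
saved = json.dumps(stats)
serving_stats = json.loads(saved)
serving_feature = transform(8.0, serving_stats)
print(train_features[-1], serving_feature)
```

Because serving never recomputes the statistics, the same raw input always maps to the same feature value the model saw in training.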
