CCNA Fundamentals of Generative AI Questions

75 of 124 questions · Page 1/2 · Fundamentals of Generative AI · Answers revealed

1
MCQhard

A financial services company wants to use generative AI to generate personalized investment advice. They must ensure responses comply with regulatory requirements (e.g., no guarantees of returns). Which Vertex AI safety feature should they primarily use?

A.Vertex AI Grounding with their compliance database.
B.Prompt engineering with instructions to avoid guarantees.
C.Safety filters with a custom blocklist that includes phrases like 'guaranteed return'.
D.Reinforcement learning from human feedback (RLHF) on the model.
AnswerC

Safety filters can block defined categories or custom phrases.

Why this answer

Option C is correct because safety filters with a custom blocklist allow the company to define specific prohibited phrases (e.g., 'guaranteed return') that the model must avoid generating. This provides a deterministic, rule-based enforcement layer that directly addresses regulatory compliance by blocking disallowed content at inference time, without relying on the model's probabilistic behavior.

Exam trap

The trap here is that candidates often confuse grounding (factual retrieval) with compliance enforcement, or assume prompt engineering is sufficient for regulatory guardrails, when in fact only a deterministic blocklist can reliably prevent specific prohibited phrases from appearing in generated outputs.

How to eliminate wrong answers

Option A is wrong because Vertex AI Grounding connects the model to external data sources for factuality, but it does not enforce compliance rules—it retrieves information but does not block specific prohibited phrases. Option B is wrong because prompt engineering is a soft, non-deterministic approach; the model may still generate guarantees despite instructions, especially with adversarial or edge-case inputs. Option D is wrong because RLHF aligns the model based on human preferences over time, but it is not a real-time safety filter and cannot guarantee that specific regulatory phrases are never generated in production.

2
MCQhard

A team is fine-tuning a large language model on custom data using Vertex AI. They find that the training loss decreases but validation loss increases. What is the best course of action?

A.Increase the number of training epochs.
B.Reduce the model size or add dropout regularization.
C.Increase the learning rate.
D.Switch to a smaller batch size.
AnswerB

Regularization techniques combat overfitting.

Why this answer

The increasing validation loss while training loss decreases is a classic sign of overfitting, where the model memorizes the training data but fails to generalize. Reducing model size or adding dropout regularization directly combats overfitting by limiting the model's capacity or introducing noise during training, which forces the model to learn more robust features. This is the best course of action because it addresses the root cause without further exacerbating the problem.

Exam trap

Google Cloud often tests the distinction between underfitting and overfitting, and the trap here is that candidates may confuse increasing validation loss with underfitting and incorrectly choose to increase epochs or learning rate, rather than recognizing the hallmark divergence of overfitting.

How to eliminate wrong answers

Option A is wrong because increasing the number of training epochs would further overfit the model to the training data, worsening the validation loss. Option C is wrong because increasing the learning rate can cause the model to overshoot minima and destabilize training, potentially increasing both training and validation loss, and does not address overfitting. Option D is wrong because switching to a smaller batch size introduces more noise in gradient estimates, which can sometimes help generalization but is not a direct or reliable remedy for overfitting; it may also slow convergence and is not the primary solution for the described loss divergence.

3
MCQhard

A generative AI model is trained on a dataset containing biased text. The team wants to debias the model without significantly sacrificing performance on the original task. Which approach is most appropriate?

A.Curate a smaller, balanced dataset that is representative of fair outcomes and fine-tune the model using a combination of the original data and this dataset with a regularization penalty on bias metrics.
B.Train an adversarial classifier to predict protected attributes from the model's hidden representations and minimize that prediction accuracy.
C.Filter the original training dataset to remove all sentences containing biased terms or stereotypes.
D.After training, apply a separate classifier on the model's output logits to adjust the final predictions for fairness.
AnswerA

This approach directly reduces bias while retaining task performance through regularization.

Why this answer

Option A is correct because it directly addresses bias in the training data by combining the original dataset with a curated, balanced dataset and applying a regularization penalty on bias metrics. This approach allows the model to retain performance on the original task while explicitly penalizing biased representations during fine-tuning, which is a standard technique in fairness-aware machine learning. The regularization term acts as a constraint that guides the optimization away from biased decision boundaries without requiring full retraining or architectural changes.

Exam trap

Google Cloud often tests the misconception that simply removing biased data or applying post-hoc adjustments is sufficient for debiasing, when in fact these methods fail to address latent biases in model representations and can degrade performance or introduce new biases.

How to eliminate wrong answers

Option B is wrong because training an adversarial classifier to minimize prediction accuracy of protected attributes is a debiasing technique, but it operates on hidden representations and can significantly degrade model performance by removing useful information correlated with protected attributes, often leading to a trade-off that sacrifices task accuracy. Option C is wrong because simply filtering out sentences with biased terms or stereotypes is ineffective; bias can be implicit in non-obvious patterns, and removing data can introduce distribution shift and reduce model robustness without guaranteeing fairness. Option D is wrong because applying a separate classifier on output logits to adjust predictions is a post-processing method that does not address bias in the model's internal representations; it can improve fairness metrics but often at the cost of calibration and may not generalize well across different subgroups.

4
Multi-Selecteasy

Which TWO of the following are key differences between generative AI and discriminative AI? (Choose two.)

Select 2 answers
A.Generative models can create new data samples, while discriminative models only assign labels to existing data.
B.Generative models require less training data than discriminative models.
C.Generative models cannot be used for supervised learning tasks like classification.
D.Generative models model the joint probability distribution of inputs and labels, whereas discriminative models model the conditional probability of labels given inputs.
E.Discriminative models always outperform generative models on tasks like image classification.
AnswersA, D

Generation is a hallmark of generative AI.

Why this answer

Option A is correct because generative AI models learn the underlying distribution of the data, enabling them to generate new, realistic samples (e.g., images, text) from the learned distribution. In contrast, discriminative models learn decision boundaries to classify or label existing data without the ability to create new data instances. This fundamental difference in capability—creation versus discrimination—is a core distinction between the two paradigms.

Exam trap

Google Cloud often tests the misconception that generative models are only for unsupervised tasks and cannot perform classification, leading candidates to incorrectly select Option C, while also testing the false assumption that discriminative models are universally superior, as in Option E.

5
MCQeasy

Which Google Cloud product provides access to pre-trained foundation models like Gemini?

A.Dataflow
B.Vertex AI Generative AI Studio
C.Cloud Translation
D.Vertex AI Model Registry
AnswerB

Generative AI Studio (Model Garden) provides access to a variety of foundation models including Gemini.

Why this answer

Vertex AI Generative AI Studio is the correct answer because it is the Google Cloud service specifically designed to provide access to pre-trained foundation models like Gemini, allowing users to test, customize, and deploy them via a managed interface. Unlike other services, Generative AI Studio directly integrates with Gemini's API and offers prompt engineering, tuning, and model evaluation capabilities.

Exam trap

The trap here is that candidates confuse Vertex AI Model Registry (a model management tool) with Generative AI Studio (the actual interface for accessing and experimenting with foundation models), leading them to pick D instead of B.

How to eliminate wrong answers

Option A is wrong because Dataflow is a fully managed stream and batch data processing service based on Apache Beam, not a platform for accessing or interacting with pre-trained foundation models. Option C is wrong because Cloud Translation is a specialized service for language translation using pre-trained models, but it does not provide access to general-purpose foundation models like Gemini or support for multimodal tasks. Option D is wrong because Vertex AI Model Registry is a metadata management service for storing and versioning models, not a tool for directly accessing or experimenting with pre-trained foundation models like Gemini.

6
MCQmedium

You are the lead AI engineer at a financial services firm. You have fine-tuned a large language model on historical trade reports to generate daily market summaries. The model is deployed on Google Cloud's Vertex AI using a custom container. A few weeks after deployment, the operations team notices that inference latency has increased by 300%, causing timeouts. You investigate and find that the model's memory consumption has grown unexpectedly, and the GPUs are idling due to high data transfer wait times. The model architecture and code have not changed. Which action is most likely to resolve the latency issue?

A.Upgrade to a more powerful GPU instance (e.g., A100 to H100) to handle the increased memory footprint.
B.Enable preemptible VM instances to reduce cost and redeploy the model on a faster network.
C.Periodically clear the key-value cache between inference requests and implement cache truncation for long sequences.
D.Recompile the model using XLA with optimizations for dynamic shapes.
AnswerC

Clearing and managing the KV cache reduces memory bloat and speeds up inference.

Why this answer

The latency spike is caused by the key-value (KV) cache growing unboundedly across inference requests, leading to excessive memory consumption and data transfer wait times. Periodically clearing the KV cache between requests and truncating it for long sequences directly addresses the root cause by freeing GPU memory and reducing I/O bottlenecks, without requiring hardware upgrades or recompilation.

Exam trap

Google Cloud often tests the misconception that hardware upgrades or compilation optimizations can fix memory management issues, when the real problem is a software-level cache leak that must be handled explicitly in the serving infrastructure.

How to eliminate wrong answers

Option A is wrong because upgrading to a more powerful GPU (e.g., A100 to H100) does not fix the underlying issue of an ever-growing KV cache; it merely masks the symptom with more memory, and the high data transfer wait times would persist due to cache bloat. Option B is wrong because enabling preemptible VMs reduces cost but does not resolve memory growth or data transfer latency; preemptible instances can be terminated at any time, worsening reliability, and a faster network does not address the cache-induced memory pressure. Option D is wrong because recompiling with XLA for dynamic shapes optimizes computation graphs but does not prevent the KV cache from accumulating across requests; the latency issue stems from memory management, not from suboptimal compilation.

7
MCQmedium

A team is tuning a large language model for a question-answering task. They notice the model gives high confidence scores to answers that are factually incorrect. Which evaluation metric should they primarily use to detect this overconfidence problem?

A.Perplexity
B.Expected Calibration Error (ECE)
C.BLEU score
D.ROUGE-L
AnswerB

ECE directly quantifies how well confidence scores reflect actual correctness.

Why this answer

Expected Calibration Error (ECE) directly measures the alignment between a model's predicted confidence and its actual accuracy. In this scenario, high confidence on incorrect answers indicates miscalibration, and ECE quantifies this mismatch by binning predictions by confidence and computing the average absolute difference between accuracy and confidence per bin.

Exam trap

Google Cloud often tests the distinction between intrinsic evaluation metrics (like perplexity) and calibration metrics, leading candidates to mistakenly choose perplexity when the core issue is confidence miscalibration rather than general model uncertainty.

How to eliminate wrong answers

Option A is wrong because Perplexity measures how well a probability distribution predicts a sample, reflecting model uncertainty over token sequences, but it does not assess calibration of confidence scores against factual correctness. Option C is wrong because BLEU score evaluates n-gram overlap between generated and reference texts for translation quality, not confidence calibration or factual accuracy. Option D is wrong because ROUGE-L measures longest common subsequence recall for summarization tasks, and is unrelated to detecting overconfidence in model predictions.

8
MCQhard

Which of the following is a best practice when using Vertex AI for prompt engineering?

A.Always set temperature to 0
B.Use consistent formatting and delimiters
C.Avoid using examples in the prompt
D.Use very long prompts to include all possible instructions
AnswerB

Consistent structure helps the model parse instructions and reduces errors.

Why this answer

Consistent formatting and delimiters (e.g., using triple backticks, XML tags, or clear section headers) help the model parse instructions and context reliably, reducing ambiguity and improving output quality. This is a core best practice in prompt engineering on Vertex AI because it leverages the model's attention mechanisms to focus on distinct prompt segments, leading to more predictable and accurate responses.

Exam trap

Google Cloud often tests the misconception that 'more is better' in prompts or that deterministic settings like temperature=0 are universally optimal, leading candidates to overlook the importance of structured, concise formatting.

How to eliminate wrong answers

Option A is wrong because setting temperature to 0 always is not a best practice; temperature controls randomness, and while 0 yields deterministic outputs, many tasks benefit from slight variability (e.g., creative generation or diverse suggestions), and Vertex AI supports a range of 0.0 to 1.0. Option C is wrong because including examples (few-shot prompting) is a powerful technique to guide the model's behavior and improve performance, especially for complex or nuanced tasks; avoiding them would reduce effectiveness. Option D is wrong because very long prompts can exceed context windows, dilute key instructions, and increase latency or cost; Vertex AI models have token limits (e.g., 8,192 tokens for Gemini), and concise, well-structured prompts are more efficient.

9
MCQmedium

Refer to the exhibit. A developer creates a model resource with this YAML config but gets an error that the model is not deployable. What is missing?

A.model_type
B.artifact_uri
C.container_spec
D.description
AnswerC

container_spec is required to tell Vertex AI which container to use.

Why this answer

The error 'model is not deployable' occurs because the YAML config lacks a `container_spec` field. In Vertex AI, a model must specify how to serve predictions—either via a pre-built container (using `container_spec`) or a custom container. Without this, the model has no runtime environment and cannot be deployed to an endpoint.

Exam trap

Google Cloud often tests the misconception that `artifact_uri` is the key requirement for deployment, but the real mandatory field is the container specification that defines the runtime environment.

How to eliminate wrong answers

Option A is wrong because `model_type` is not a required field for deployment; Vertex AI infers the model type from the artifact or container. Option B is wrong because `artifact_uri` is optional—it points to the model artifacts but is not mandatory if the container already includes them. Option D is wrong because `description` is purely metadata and has no impact on deployability.

10
MCQeasy

A developer is using Vertex AI PaLM API to generate code snippets. The responses sometimes contain security vulnerabilities. What is the best practice to mitigate this?

A.Implement input validation and output filtering with safety attributes
B.Disable safety filters to allow more output
C.Increase the max output tokens
D.Set safety settings to block all categories
AnswerA

Validating inputs and filtering outputs reduces security risks.

Why this answer

Option A is correct because input validation and output filtering with safety attributes directly address security vulnerabilities by sanitizing user inputs and filtering model outputs for harmful content. The Vertex AI PaLM API provides safety attribute scores (e.g., toxicity, harassment) that allow developers to programmatically block or flag responses that exceed defined thresholds, reducing the risk of generating insecure code snippets.

Exam trap

The trap here is that candidates may think increasing token limits or disabling filters improves output quality, when in fact the core issue is controlling content safety through validation and filtering, not adjusting generation parameters.

How to eliminate wrong answers

Option B is wrong because disabling safety filters removes all guardrails, allowing the model to generate potentially harmful or insecure code without any mitigation, which increases security risks. Option C is wrong because increasing max output tokens does not affect the content's security; it only allows longer responses, which could include more vulnerabilities. Option D is wrong because setting safety settings to block all categories is overly restrictive and may prevent legitimate code generation, but more importantly, it does not address the root cause of vulnerabilities—input validation and output filtering are needed to catch context-specific issues like insecure code patterns.

11
MCQhard

A gen AI application produces hallucinations (factually incorrect outputs). Which mitigation strategy is LEAST effective?

A.Using prompt templates with constraints
B.Using grounding with a knowledge base
C.Implementing retrieval-augmented generation
D.Increasing model temperature
AnswerD

Higher temperature leads to more diverse but less predictable outputs, exacerbating hallucinations.

Why this answer

Increasing model temperature makes the model more random and creative, which directly increases the likelihood of hallucinations. It does not constrain or ground the output in factual data, making it the least effective mitigation strategy among the options.

Exam trap

Google Cloud often tests the misconception that increasing model temperature improves accuracy by making the model 'more confident,' when in reality it increases randomness and hallucination risk.

How to eliminate wrong answers

Option A is wrong because prompt templates with constraints (e.g., specifying 'only answer from the provided context') reduce the model's freedom to generate unverified content, thereby lowering hallucination risk. Option B is wrong because grounding with a knowledge base ties the model's outputs to verified facts, preventing fabrication by restricting the response to a trusted data source. Option C is wrong because retrieval-augmented generation (RAG) explicitly fetches relevant documents from a knowledge base before generation, ensuring the output is based on retrieved evidence rather than parametric memory alone.

12
Multi-Selectmedium

A team is evaluating generative AI models on Vertex AI. They need to compare models based on specific criteria. Which TWO criteria are most important for selecting a model for a text summarization task?

Select 2 answers
A.ROUGE scores
B.Training dataset size
C.Cost per token
D.Model size in parameters
E.Latency
AnswersA, E

ROUGE evaluates summary quality against references.

Why this answer

ROUGE scores are the standard evaluation metric for text summarization tasks, measuring the overlap of n-grams, word sequences, and word pairs between generated summaries and reference summaries. This directly quantifies summary quality, making it the most important criterion for model selection.

Exam trap

Google Cloud often tests the misconception that model size or cost are primary selection criteria, when in fact task-specific metrics like ROUGE are the correct focus for evaluating generative model output quality.

13
MCQeasy

Refer to the exhibit. A machine learning engineer is configuring a model using this YAML. What is the purpose of the 'tuningPipeline' field?

A.It specifies a pipeline to fine-tune the base model
B.It configures the model for online prediction
C.It defines the hyperparameters for training from scratch
D.It sets the model for batch prediction
AnswerA

The tuningPipeline references a pipeline that performs supervised fine-tuning of the base model.

Why this answer

The 'tuningPipeline' field in this YAML configuration specifies a dedicated pipeline for fine-tuning the base model, which is a common practice in MLOps frameworks like Vertex AI Pipelines or Kubeflow. It allows the engineer to define a separate workflow for parameter-efficient fine-tuning (e.g., LoRA) or full fine-tuning, distinct from training from scratch or serving. This field is essential for orchestrating the fine-tuning process, including data preprocessing, training, and evaluation steps, without affecting the base model's original weights.

Exam trap

Google Cloud often tests the distinction between 'tuningPipeline' (fine-tuning an existing model) and 'trainingPipeline' (training from scratch), and the trap here is that candidates confuse fine-tuning with full training or assume the field is for inference tasks like prediction.

How to eliminate wrong answers

Option B is wrong because 'tuningPipeline' is not used for online prediction; online prediction is typically configured via a separate 'predict' or 'serving' pipeline or endpoint specification. Option C is wrong because 'tuningPipeline' is specifically for fine-tuning an existing base model, not for training from scratch, which would require a different pipeline definition (e.g., 'trainingPipeline') with full hyperparameter search. Option D is wrong because batch prediction is handled by a distinct 'batchPrediction' or 'batch' pipeline configuration, not by the tuning pipeline, which focuses on model adaptation rather than inference.

14
MCQhard

During fine-tuning a model on Vertex AI, the job fails with error 'ResourceExhausted: Out of memory'. What is the most likely cause?

A.Too few training steps
B.Batch size too large
C.Dataset too small
D.Wrong machine type with insufficient memory
AnswerD

Insufficient GPU or TPU memory leads to OOM; selecting a larger machine type often resolves it.

Why this answer

Using a machine type with insufficient memory for the model size and batch size is the most common cause of OOM errors during training.

15
MCQhard

Refer to the exhibit. A user with this IAM role tries to deploy a model to a Vertex AI Endpoint but fails. What is the most likely reason?

A.The user is not authorized to use Vertex AI at all
B.The model artifact is not in the same region as the endpoint
C.The user needs the roles/aiplatform.deployer role
D.The user needs the roles/aiplatform.admin role
AnswerC

Deploying a model requires the aiplatform.deployer role or equivalent permissions.

Why this answer

The user has an IAM role but lacks the specific permission `aiplatform.deployments.create` required to deploy a model to a Vertex AI Endpoint. The `roles/aiplatform.deployer` role includes this permission, while the user's existing role does not, causing the deployment to fail. Even if the user can use other Vertex AI services, deploying a model to an endpoint is a distinct action that requires this specific role.

Exam trap

Google Cloud often tests the distinction between broad roles like `roles/aiplatform.admin` and specific roles like `roles/aiplatform.deployer`, trapping candidates who assume that any Vertex AI role can perform all actions, when in fact deployment requires a dedicated permission set.

How to eliminate wrong answers

Option A is wrong because the user is able to interact with Vertex AI (they have an IAM role), but the failure is specific to the deploy action, not a blanket denial of all Vertex AI access. Option B is wrong because model artifacts can be deployed to endpoints in any region as long as the endpoint exists; Vertex AI supports cross-region deployment by copying the model artifact to the endpoint's region automatically. Option D is wrong because the `roles/aiplatform.admin` role is overly permissive and includes full administrative access, which is not required for deploying a model; the principle of least privilege dictates that the `roles/aiplatform.deployer` role is sufficient and more appropriate.

16
MCQeasy

A graphic design company wants to generate high-quality synthetic images for product mockups. Which Google Cloud generative AI service is most suitable?

A.AutoML Vision
B.Imagen on Vertex AI
C.Codey APIs for code generation
D.Natural Language API
AnswerB

Imagen is specifically built for image generation and is accessible via Vertex AI.

Why this answer

Imagen on Vertex AI is the correct choice because it is Google Cloud's state-of-the-art text-to-image diffusion model specifically designed to generate high-quality, photorealistic synthetic images from natural language prompts. This directly meets the requirement for creating product mockups, as Imagen can produce custom visuals with fine-grained control over style and composition, and it integrates seamlessly with Vertex AI for deployment and management.

Exam trap

The trap here is that candidates may confuse AutoML Vision's ability to classify or detect objects in images with generative image creation, leading them to select Option A despite it lacking any generative capability.

How to eliminate wrong answers

Option A is wrong because AutoML Vision is a traditional machine learning service for training custom image classification, object detection, or segmentation models on labeled datasets; it does not generate synthetic images from text prompts. Option C is wrong because Codey APIs are specialized for generating code snippets, documentation, and code completions, not for creating visual content like images. Option D is wrong because Natural Language API is designed for analyzing and extracting insights from text (e.g., sentiment, entity recognition), not for generating synthetic images.

17
MCQmedium

A team uses Vertex AI to host a large language model. They want to reduce latency for real-time applications. What is the best strategy?

A.Increase number of replicas
B.Switch to a smaller model
C.Use model quantization
D.Use batch prediction instead of online
AnswerC

Quantization reduces model size and speeds up inference.

Why this answer

Option C is correct because model quantization reduces the precision of the model's weights (e.g., from FP32 to INT8), which decreases memory footprint and computational requirements, directly lowering inference latency for real-time applications on Vertex AI. This is a standard optimization technique for deploying large language models with minimal accuracy loss while meeting latency SLAs.

Exam trap

Google Cloud often tests the misconception that scaling resources (replicas) directly reduces latency, when in fact latency optimization requires algorithmic changes like quantization or pruning, not just horizontal scaling.

How to eliminate wrong answers

Option A is wrong because increasing the number of replicas improves throughput and availability but does not reduce per-request latency; it may even add overhead from load balancing. Option B is wrong because switching to a smaller model reduces latency but sacrifices model capability and output quality, which is not a 'best' strategy when the team specifically needs a large language model. Option D is wrong because batch prediction is designed for asynchronous, high-throughput scenarios and introduces higher latency per request, making it unsuitable for real-time applications.

18
MCQmedium

Refer to the exhibit. A developer executed the command to list endpoints. They notice that two models are deployed to the same endpoint. What is the most likely reason for this configuration?

A.It is a canary deployment with traffic splitting
B.The endpoint is misconfigured and will cause conflicts
C.The models are from different frameworks
D.It is a batch prediction endpoint
AnswerA

Multiple models on the same endpoint support gradual rollout by splitting traffic.

Why this answer

A is correct because deploying two models to the same endpoint with traffic splitting is a standard canary deployment strategy. In this configuration, a small percentage of inference requests are routed to the new model while the majority go to the stable model, allowing validation of the new model's performance before full rollout. This is commonly supported by model serving platforms like Amazon SageMaker, where you can specify a production variant with a traffic weight (e.g., 90%) and a canary variant with a lower weight (e.g., 10%).

Exam trap

Google Cloud often tests the misconception that deploying two models to the same endpoint is always an error, when in fact it is a deliberate pattern for canary testing or A/B testing with traffic splitting.

How to eliminate wrong answers

Option B is wrong because deploying two models to the same endpoint with traffic splitting is a deliberate, supported configuration, not a misconfiguration; conflicts are avoided by routing traffic based on defined weights. Option C is wrong because models from different frameworks can be deployed to the same endpoint without issue, as the serving layer handles framework-specific inference containers independently. Option D is wrong because batch prediction endpoints typically use a single model or a single job configuration, not multiple models deployed simultaneously with traffic splitting.

19
MCQeasy

A company wants to build a chatbot using Vertex AI that can answer customer questions based on their internal knowledge base. Which Google Cloud service should they use to store and retrieve the knowledge base efficiently?

A.Cloud Storage
B.Vertex AI Vector Search
C.BigQuery
D.Vertex AI Matching Engine
AnswerB

Vertex AI Vector Search provides scalable vector similarity search for knowledge retrieval.

Why this answer

Vertex AI Vector Search is the correct choice because it is purpose-built for semantic similarity search over embeddings, enabling the chatbot to retrieve relevant chunks from the knowledge base based on meaning rather than exact keyword matches. It integrates natively with Vertex AI and supports high-dimensional vector indexing, making it efficient for large-scale retrieval-augmented generation (RAG) workflows.

Exam trap

The trap here is that Google Cloud often tests the rebranding of Vertex AI Matching Engine to Vertex AI Vector Search, leading candidates to select the outdated service name (Matching Engine) instead of the current correct name (Vector Search).

How to eliminate wrong answers

Option A is wrong because Cloud Storage is an object storage service for unstructured data, not a vector database; it lacks built-in similarity search capabilities and would require additional services to perform semantic retrieval. Option C is wrong because BigQuery is a serverless data warehouse designed for SQL-based analytics on structured data, not for storing and querying dense vector embeddings with approximate nearest neighbor (ANN) search. Option D is wrong because Vertex AI Matching Engine is the previous name for what is now Vertex AI Vector Search; the service was rebranded, so the current correct name is Vector Search, making Matching Engine a deprecated or legacy term in this context.

20
MCQhard

An organization uses a fine-tuned model for medical diagnosis and must comply with HIPAA. Which measure is essential when deploying the model on Vertex AI?

A.Store all patient data in Cloud Storage with object versioning.
B.Enable encryption at rest for all resources.
C.Use a publicly accessible endpoint for faster response times.
D.Use a private Google Cloud Access and disable external internet access for the endpoint.
AnswerD

This ensures the endpoint is not publicly accessible, a key requirement for HIPAA.

Why this answer

Option D is correct because HIPAA requires that patient data be protected from unauthorized access during transmission and deployment. Using a private Google Cloud Access endpoint with external internet access disabled ensures that the model endpoint is only reachable within the organization's VPC network, preventing data exposure over the public internet and meeting HIPAA's security rule for safeguarding electronic protected health information (ePHI).

Exam trap

Google Cloud often tests the misconception that encryption at rest or basic data storage features are sufficient for HIPAA compliance, when in fact network-level access controls (like private endpoints) are the critical measure for protecting ePHI during model inference.

How to eliminate wrong answers

Option A is wrong because storing patient data in Cloud Storage with object versioning provides data retention and recovery capabilities but does not address the core requirement of securing the model endpoint or controlling network access, which is essential for HIPAA compliance. Option B is wrong because enabling encryption at rest for all resources is a baseline security practice and is already enabled by default in Google Cloud; it does not specifically address the need to restrict network access to the deployed model endpoint, which is a key HIPAA requirement. Option C is wrong because using a publicly accessible endpoint for faster response times directly violates HIPAA's requirement to protect ePHI from unauthorized access, as a public endpoint exposes the model to the internet and increases the risk of data breaches.

21
MCQmedium

Refer to the exhibit. A team has deployed a model to an endpoint with the configuration shown. They notice that during peak traffic, the endpoint frequently returns 429 (Too Many Requests) errors. Which action should they take to resolve this issue?

A.Change MACHINE_TYPE to n1-highmem-4
B.Increase MIN_REPLICA_COUNT to 5
C.Decrease MAX_REPLICA_COUNT to 1
D.Disable autoscaling by setting MIN_REPLICA_COUNT equals MAX_REPLICA_COUNT
AnswerB

More minimum replicas provide capacity for sudden traffic spikes.

Why this answer

Increasing MIN_REPLICA_COUNT ensures a minimum number of replicas are always available to handle traffic bursts, reducing 429 errors. Other options would not help or would worsen the problem.

22
MCQmedium

A data scientist notices that a text generation model deployed on Vertex AI returns repetitive outputs after a few turns in a chat application. What is the most likely cause and the best parameter adjustment?

A.The max_output_tokens is too low; increase it to allow more diverse output.
B.The top_p value is too high; reduce top_p to limit token sampling.
C.The model is overfitted; switch to a smaller model.
D.The temperature is too low; increase temperature to add randomness.
AnswerB

Reducing top_p narrows the token pool, reducing repetition.

Why this answer

Repetitive outputs in a chat application after a few turns are typically caused by the model getting stuck in a loop due to high cumulative probability from top-p sampling. Reducing top_p limits the set of tokens considered at each step, forcing the model to explore less likely tokens and breaking the repetition cycle. This directly addresses the issue without sacrificing coherence, unlike temperature adjustments which affect randomness globally.

Exam trap

Google Cloud often tests the misconception that temperature and top-p both control randomness in the same way, but the trap here is that candidates confuse 'increasing randomness' (temperature) with 'limiting the sampling pool' (top-p), leading them to choose D instead of B.

How to eliminate wrong answers

Option A is wrong because max_output_tokens controls the length of the output, not the diversity of token choices; increasing it would allow longer repetitive sequences, not fix the repetition. Option C is wrong because overfitting is a training-phase issue unrelated to inference-time repetition; switching to a smaller model would reduce capacity but not specifically address the sampling behavior causing loops. Option D is wrong because increasing temperature adds randomness to all token probabilities, which can actually worsen repetition by making the model more likely to pick high-probability tokens repeatedly; the problem is too much diversity in the sampling set, not too little.

23
MCQhard

Refer to the exhibit. A data scientist is fine-tuning a model. The training loss and accuracy are improving each epoch. However, after training, the model performs poorly on a held-out validation set. What is the most likely issue?

A.Underfitting
B.Inappropriate learning rate
C.Data leakage
D.Overfitting
AnswerD

Overfitting leads to good training performance but poor validation.

Why this answer

The model's training loss and accuracy improve each epoch, but performance on the validation set is poor. This classic symptom indicates overfitting, where the model memorizes the training data (including noise) rather than learning generalizable patterns. In fine-tuning, this often occurs when the model is trained for too many epochs or the dataset is too small relative to model capacity.

Exam trap

Google Cloud often tests the distinction between overfitting and underfitting by presenting improving training metrics alongside poor validation performance, which candidates may misinterpret as a learning rate issue or data leakage if they do not recognize the hallmark divergence pattern.

How to eliminate wrong answers

Option A is wrong because underfitting would show poor performance on both training and validation sets, not improving training metrics. Option B is wrong because an inappropriate learning rate typically causes training instability (e.g., loss divergence or stagnation), not a clear divergence between training and validation performance. Option C is wrong because data leakage would cause both training and validation metrics to be artificially high (since validation data leaks into training), not a gap where training is good and validation is poor.

24
Multi-Selecthard

A company is migrating an on-premises NLP pipeline to Vertex AI. Which three capabilities of Vertex AI align with common MLOps best practices for generative AI? (Choose THREE)

Select 3 answers
A.Automatic model retraining based on performance degradation
B.Local on-premises execution
C.Continuous training with Vertex AI Pipelines
D.Manual data labeling only
E.Model registry for versioning
AnswersA, C, E

Triggering retraining when performance drops is a key MLOps practice.

Why this answer

Option A is correct because Vertex AI's model monitoring can automatically trigger retraining when performance metrics (e.g., prediction drift or data drift) degrade below a threshold. This aligns with MLOps best practices for maintaining generative AI model quality over time without manual intervention.

Exam trap

Google Cloud often tests the misconception that MLOps for generative AI requires on-premises execution or manual-only labeling, but the correct answer emphasizes cloud-native automation and versioning as core best practices.

25
MCQmedium

A team uses PaLM 2 API to generate product descriptions, but the output sometimes contains factual inaccuracies. What is the best approach to improve accuracy?

A.Increase the temperature parameter
B.Reduce the top_k value
C.Use grounding with Google Search
D.Set the max_output_tokens higher
AnswerC

Grounding supplies factual references, helping the model generate accurate information.

Why this answer

Grounding with Google Search is the correct approach because it allows the PaLM 2 API to retrieve real-time, verifiable information from the web, directly reducing factual inaccuracies in generated product descriptions. Unlike parameter adjustments, grounding provides an external knowledge source that the model can cite, ensuring outputs are based on current and accurate data rather than relying solely on its training data.

Exam trap

Google Cloud often tests the misconception that tuning generation parameters (temperature, top_k, max tokens) can fix factual accuracy issues, when in reality those parameters control randomness and length, not the model's reliance on its training data versus external sources.

How to eliminate wrong answers

Option A is wrong because increasing the temperature parameter makes the model more random and creative, which would likely increase, not decrease, factual inaccuracies. Option B is wrong because reducing the top_k value limits the pool of tokens the model can sample from, which may reduce diversity but does not address the root cause of hallucination or factual errors. Option D is wrong because setting max_output_tokens higher only allows longer responses, which can actually increase the chance of generating more inaccuracies without improving factual correctness.

26
MCQeasy

A developer wants to quickly experiment with different foundation models available in Google Cloud. Which tool should they use?

A.BigQuery ML
B.Cloud Console Compute Engine
C.Gen AI Studio in Vertex AI
D.Vertex AI Model Registry
AnswerC

Gen AI Studio allows prompt testing and model comparison.

Why this answer

Gen AI Studio provides a user interface to test prompts and compare models. Model Registry is for managing models, Compute Engine is for VMs, BigQuery ML is for SQL ML.

27
MCQeasy

A company wants to use generative AI to summarize customer support tickets. Which Google Cloud tool is best suited for this task?

A.Vertex AI Text Generation (Gemini)
B.Dialogflow CX
C.Document AI
D.AutoML Tables
AnswerA

Vertex AI text generation supports summarization tasks via Gemini models.

Why this answer

Vertex AI Text Generation (Gemini) is the correct choice because it is a generative AI service specifically designed for natural language understanding and generation tasks, such as summarizing customer support tickets. Gemini models can process long-form text and produce concise, coherent summaries by leveraging transformer-based architectures fine-tuned for instruction-following and text completion. This makes it ideal for extracting key information from support conversations and generating actionable summaries.

Exam trap

The trap here is that candidates may confuse Dialogflow CX (a conversational AI builder) with a generative AI tool, overlooking that Dialogflow is for structured dialogue flows rather than open-ended text generation.

How to eliminate wrong answers

Option B (Dialogflow CX) is wrong because it is a conversational AI platform for building chatbots and virtual agents, not a generative text summarization tool; it focuses on intent recognition and dialogue management rather than free-form text generation. Option C (Document AI) is wrong because it is designed for document processing and extraction of structured data (e.g., OCR, form parsing) from scanned documents, not for generative summarization of unstructured text. Option D (AutoML Tables) is wrong because it is a tabular data modeling service for regression and classification tasks on structured datasets, not a natural language generation tool.

28
MCQmedium

A data scientist is fine-tuning a large language model using Vertex AI. The training job fails with an out-of-memory error. Which action should they take to resolve this issue?

A.Change the accelerator to TPU
B.Use a larger model
C.Increase the batch size
D.Reduce the batch size
AnswerD

Smaller batch size reduces memory footprint.

Why this answer

Reducing batch size lowers memory consumption per step, directly addressing OOM. Option A is wrong because increasing batch size worsens memory usage. Option B is wrong because switching to a larger model increases memory.

Option D is wrong because TPUs also have memory limits; reducing batch size works on TPU as well.

29
Multi-Selectmedium

Which TWO options are best practices for deploying generative AI models on Vertex AI? (Choose two.)

Select 2 answers
A.Disable logging to reduce cost
B.Enable automatic scaling to handle variable traffic
C.Use Vertex AI Model Monitoring to detect drift
D.Manually scale instances based on expected load
E.Serve the model directly without optimization
AnswersB, C

Automatic scaling adjusts resources based on demand.

Why this answer

Option B is correct because Vertex AI's automatic scaling dynamically adjusts the number of serving instances based on incoming traffic, ensuring low latency during spikes and cost efficiency during lulls. This is a best practice for production workloads where traffic patterns are unpredictable, as it eliminates the need for manual capacity planning.

Exam trap

Google Cloud often tests the misconception that manual scaling is more reliable or cost-effective than automatic scaling, but in cloud-native environments, automatic scaling is the standard best practice for variable workloads.

30
MCQmedium

A retail company uses the Vertex AI Gemini API to generate product descriptions. Recently, the model started producing factually incorrect statements about product specifications, such as wrong dimensions and materials. Which strategy should be implemented to improve factual accuracy?

A.Enable model versioning to automatically roll back to a previous version
B.Fine-tune the model on a dataset of product images and descriptions
C.Increase the temperature parameter to 0.9
D.Use grounding with Vertex AI Search to retrieve verified product data
AnswerD

Grounding the model on authoritative sources improves factual accuracy by providing context from the company's knowledge base.

Why this answer

Option B is correct because grounding with Vertex AI Search retrieves authoritative information from the company's knowledge base, reducing hallucinations. Option A (increasing temperature) would increase randomness, worsening accuracy. Option C (fine-tuning on product images) does not address factual text inaccuracies.

Option D (enabling model versioning) helps with version control but not with correctness of responses.

31
MCQmedium

A company wants to generate images from text descriptions. Which model in Vertex AI Model Garden should they use?

A.Chirp
B.Codey
C.PaLM 2
D.Imagen
AnswerD

Imagen generates images from text.

Why this answer

Imagen is the correct choice because it is a text-to-image model in Vertex AI Model Garden specifically designed to generate high-fidelity images from natural language descriptions. Unlike the other options, Imagen uses a diffusion-based architecture to create photorealistic visuals, making it the only option that directly addresses the requirement of generating images from text.

Exam trap

The trap here is that candidates may confuse PaLM 2's multimodal capabilities (which include image understanding but not generation) with Imagen's generative ability, leading them to incorrectly select PaLM 2 for text-to-image tasks.

How to eliminate wrong answers

Option A is wrong because Chirp is a speech-to-text and text-to-speech model in Vertex AI, focused on audio processing, not image generation. Option B is wrong because Codey is a code generation model specialized in writing and completing code, not generating images from text. Option C is wrong because PaLM 2 is a large language model for text generation, reasoning, and chat, but it does not have the capability to produce visual outputs like images.

32
MCQmedium

A company is using Vertex AI to generate marketing copy. They notice that the output sometimes contains factual inaccuracies. Which parameter adjustment is most likely to improve factual accuracy?

A.Decrease the temperature parameter.
B.Increase the max_output_tokens parameter.
C.Increase the top_p parameter.
D.Add a post-processing step to verify facts using a database.
AnswerA

Lower temperature reduces randomness, making output more factual.

Why this answer

Decreasing the temperature parameter reduces the randomness of the model's output, making it more deterministic and less likely to generate creative but factually incorrect content. Lower temperature (e.g., 0.1) forces the model to choose higher-probability tokens, which aligns with more factual and consistent responses, especially in tasks like marketing copy where accuracy is critical.

Exam trap

Google Cloud often tests the misconception that increasing output length or diversity (via top_p or max_tokens) improves quality, when in fact these parameters increase randomness and the likelihood of hallucination, whereas lowering temperature is the direct lever for factual accuracy.

How to eliminate wrong answers

Option B is wrong because increasing max_output_tokens only extends the length of the generated text, which can increase the chance of hallucinations as the model continues generating beyond its reliable context window; it does not improve factual accuracy. Option C is wrong because increasing top_p (nucleus sampling) allows the model to consider a larger set of probable tokens, increasing diversity and randomness, which can worsen factual inaccuracies rather than reduce them. Option D is wrong because adding a post-processing step to verify facts using a database is a valid engineering solution but is not a parameter adjustment of the generative model itself; the question specifically asks for a parameter adjustment, and this option represents a workflow change, not a model parameter.

33
MCQeasy

A marketing team wants to generate product descriptions using generative AI. They need to ensure factual accuracy and avoid hallucinations. Which approach should they use?

A.Use a code generation model to generate structured descriptions.
B.Fine-tune the model on all product descriptions using supervised learning.
C.Implement a retrieval augmented generation (RAG) system that retrieves product facts from a database.
D.Use a large language model with detailed prompt instructions to be accurate.
AnswerC

RAG grounds the model's output in retrieved facts, improving accuracy.

Why this answer

Retrieval Augmented Generation (RAG) is the correct approach because it grounds the model's output in verifiable, external data sources. By retrieving product facts from a database in real-time, the system ensures that the generated descriptions are based on accurate information, directly mitigating the risk of hallucination. This method combines the generative power of an LLM with a retrieval step that provides factual context, making it ideal for applications where precision is critical.

Exam trap

Google Cloud often tests the misconception that detailed prompting alone (Option D) is sufficient to guarantee factual accuracy, when in reality, without external knowledge retrieval, the model can still generate plausible but incorrect information.

How to eliminate wrong answers

Option A is wrong because code generation models are designed to produce structured code or data formats, not to ensure factual accuracy in natural language product descriptions; they lack a retrieval mechanism to verify facts. Option B is wrong because fine-tuning on all product descriptions using supervised learning can embed training data biases and does not prevent hallucination on unseen or updated product facts; it also requires extensive labeled data and retraining for each change. Option D is wrong because while detailed prompt instructions can guide the model, they do not provide a mechanism to access or verify external facts; the model may still hallucinate based on its parametric knowledge, which can be outdated or incorrect.

34
MCQhard

A developer is using Vertex AI Generative AI Studio to fine-tune a PaLM 2 model for code generation. After training, they notice the model generates plausible but incorrect code. What is the most likely cause?

A.Overfitting to training data
B.Insufficient training steps
C.Hallucination due to lack of grounding
D.Prompt format mismatch
AnswerA

Overfitting leads to memorization of training data, including mistakes, reducing generalization.

Why this answer

When a PaLM 2 model generates plausible but incorrect code after fine-tuning, the most likely cause is overfitting to the training data. Overfitting occurs when the model memorizes specific code patterns, syntax, or even bugs from the fine-tuning dataset rather than learning generalizable programming logic. This results in outputs that look syntactically correct and contextually relevant but fail to execute properly or solve the intended problem, because the model has not learned the underlying algorithmic principles.

Exam trap

The trap here is that candidates confuse 'plausible but incorrect code' with hallucination (Option C), but hallucination in code generation typically produces non-existent functions or libraries, whereas overfitting produces code that is syntactically valid and uses real functions but contains logical errors learned from the training data.

How to eliminate wrong answers

Option B is wrong because insufficient training steps typically lead to underfitting, where the model fails to capture even basic patterns from the training data, resulting in incoherent or irrelevant code—not plausible but incorrect code. Option C is wrong because hallucination due to lack of grounding is a phenomenon more associated with factual inaccuracies in text generation (e.g., inventing API names or libraries), not with generating syntactically valid but logically flawed code; fine-tuning on code data directly addresses grounding. Option D is wrong because prompt format mismatch would cause the model to misinterpret the input structure or produce outputs in the wrong format, not generate code that appears correct but is functionally wrong.

35
MCQhard

You are a data scientist at a financial institution. You are using Vertex AI to fine-tune a large language model (LLM) for generating financial reports. You have prepared a dataset of 10,000 examples. During fine-tuning, you notice that the training loss is decreasing steadily, but the validation loss is increasing after 5 epochs. The model's generated reports on the validation set contain many factual errors and nonsensical statements. You suspect overfitting. You have limited compute budget and need to improve generalization. What should you do?

A.Increase the learning rate
B.Increase the number of training epochs to 20
C.Add more training examples from a public dataset
D.Implement early stopping with a patience of 2 epochs
AnswerD

Early stopping prevents overfitting.

Why this answer

Early stopping with a patience of 2 epochs is the correct approach because it directly addresses overfitting by halting training when the validation loss fails to improve for a specified number of epochs. This preserves the model's generalization ability without requiring additional compute or data, which aligns with the limited budget constraint. In Vertex AI, early stopping is a built-in hyperparameter tuning strategy that monitors validation metrics and stops the job to prevent further degradation.

Exam trap

The trap here is that candidates often confuse overfitting with underfitting and choose to add more data or increase epochs, failing to recognize that the validation loss increasing while training loss decreases is the classic sign of overfitting, which requires a regularization technique like early stopping.

How to eliminate wrong answers

Option A is wrong because increasing the learning rate would make the optimizer take larger steps, which can cause the loss to diverge or oscillate, worsening the overfitting and factual errors. Option B is wrong because increasing the number of training epochs to 20 would continue training on the same data, likely exacerbating overfitting as the validation loss is already increasing after 5 epochs. Option C is wrong because adding more training examples from a public dataset may introduce domain mismatch or noise, and it does not address the immediate overfitting issue; it also requires additional compute and data curation resources, contradicting the limited budget constraint.

36
Multi-Selectmedium

What are THREE benefits of using embedding models in a Retrieval Augmented Generation (RAG) system?

Select 3 answers
A.They compress text into dense vectors for efficient retrieval.
B.They allow the model to generate new training data automatically.
C.They enable semantic similarity search beyond keyword matching.
D.They reduce the need for fine-tuning the generator model.
E.They provide deterministic outputs for the same query.
AnswersA, C, D

Vectors allow fast similarity search in vector databases.

Why this answer

Option A is correct because embedding models convert text into dense vector representations that capture semantic meaning, enabling efficient similarity search in vector databases. This compression reduces the dimensionality of the data, allowing the RAG system to quickly retrieve the most relevant documents from a large corpus based on vector distance metrics like cosine similarity.

Exam trap

Google Cloud often tests the misconception that embedding models are used for generating training data or ensuring deterministic outputs, when in fact their primary role is semantic compression and similarity-based retrieval, while output determinism is controlled by the generator model's parameters, not the embedding model.

37
Multi-Selecteasy

Which TWO statements are true about generative AI models?

Select 2 answers
A.They are typically pre-trained on large datasets.
B.They are deterministic by design.
C.They always produce the same output for the same input.
D.They can generate new content not seen in training.
E.They require no data for training.
AnswersA, D

Pre-training on large corpora is standard.

Why this answer

Option A is correct because generative AI models, such as GPT-4 or DALL-E, are typically pre-trained on vast, diverse datasets (e.g., terabytes of text or images) using unsupervised or self-supervised learning. This pre-training phase allows the model to learn statistical patterns, grammar, and world knowledge, which is then fine-tuned for specific tasks. Without this large-scale pre-training, the model would lack the foundational understanding needed to generate coherent and contextually relevant outputs.

Exam trap

Google Cloud often tests the misconception that generative AI models are deterministic and always produce the same output for the same input, when in fact they are probabilistic by design, especially at non-zero temperature settings.

38
MCQhard

A company is deploying a Gemini 1.0 Ultra model for a code generation assistant. They have set up Vertex AI Model Evaluation with a custom evaluation dataset to measure pass@1 accuracy. The initial evaluation shows 65% pass@1. They want to improve to 80% without collecting more training data. They have already attempted basic prompt engineering (e.g., 'write correct code') with limited improvement. Which approach is most likely to achieve the desired improvement?

A.Reduce the temperature to 0 and set top_p to 1.
B.Increase the number of output tokens and enable beam search with width 4.
C.Use chain-of-thought prompting with few-shot examples of correct code generation.
D.Apply reinforcement learning from human feedback (RLHF) using a reward model trained on the existing evaluation dataset.
AnswerC

Chain-of-thought elicits reasoning steps, improving accuracy beyond basic prompting.

Why this answer

Chain-of-thought prompting with few-shot examples is the most effective approach because it guides the model through step-by-step reasoning, which is critical for complex code generation tasks. This technique leverages the model's in-context learning ability to improve accuracy without additional training data, directly addressing the need to boost pass@1 from 65% to 80%.

Exam trap

Google Cloud often tests the misconception that hyperparameter tuning (like temperature or beam search) can substitute for structured prompting techniques, when in reality, chain-of-thought prompting directly addresses the reasoning gap that limits pass@1 accuracy in code generation.

How to eliminate wrong answers

Option A is wrong because reducing temperature to 0 and setting top_p to 1 makes the model deterministic, which may reduce diversity but does not inherently improve correctness for complex code generation; it can even cause repetitive or suboptimal outputs. Option B is wrong because increasing output tokens and enabling beam search with width 4 can improve exploration but does not guarantee higher pass@1 accuracy; beam search is more suited for tasks like translation and may not align with the goal of generating a single correct code snippet. Option D is wrong because applying RLHF requires a reward model trained on human preferences, not just the existing evaluation dataset, and this approach demands significant additional data and computational resources, contradicting the constraint of not collecting more training data.

39
MCQeasy

A startup is developing a customer support chatbot using Vertex AI PaLM 2 API. They notice that the model sometimes generates plausible-sounding but factually incorrect information about company policies. The chatbot currently uses no external data. To reduce these hallucinations without retraining the model, the team needs a solution that can be implemented quickly and maintains low latency. They have access to the company's internal policy database stored in Cloud SQL. Which approach should they take?

A.Fine-tune the PaLM 2 model on a dataset of company policy documents.
B.Implement grounding by connecting the model to the company's policy database using Vertex AI Grounding.
C.Reduce the temperature parameter to 0 and increase top_k to 50.
D.Use prompt engineering to instruct the model to only answer from its internal knowledge.
AnswerB

Grounding directly ties responses to verified data, reducing hallucinations effectively.

Why this answer

Option B is correct because Vertex AI Grounding connects the PaLM 2 model to the company's policy database in Cloud SQL, allowing the model to retrieve and cite factual information in real time. This approach reduces hallucinations without retraining, meets the low-latency requirement, and leverages existing internal data. Grounding works by augmenting the prompt with retrieved context from the grounding source, ensuring responses are factually grounded.

Exam trap

Google Cloud often tests the misconception that adjusting sampling parameters (temperature, top_k) can fix factual inaccuracies, when in reality those parameters only control creativity and randomness, not knowledge grounding.

How to eliminate wrong answers

Option A is wrong because fine-tuning requires a labeled dataset and retraining, which is time-consuming and does not meet the 'quickly implemented' requirement; it also does not guarantee real-time factual accuracy for dynamic policies. Option C is wrong because reducing temperature to 0 and increasing top_k to 50 only affects output randomness and diversity, not factual grounding—hallucinations stem from lack of external knowledge, not sampling parameters. Option D is wrong because prompt engineering alone cannot prevent the model from generating plausible-sounding falsehoods; the model's internal knowledge is static and may be outdated or incorrect, and it cannot access the Cloud SQL database without a retrieval mechanism.

40
Multi-Selecteasy

Which THREE components are core to a typical Retrieval Augmented Generation (RAG) system?

Select 3 answers
A.Classifier
B.Vector database
C.Embedding model
D.Rewriter
E.Large language model
AnswersB, C, E

Stores embeddings and enables similarity search.

Why this answer

A vector database (B) is core to a RAG system because it stores and indexes embeddings of external knowledge chunks, enabling efficient similarity search to retrieve the most relevant context for a user query. This retrieved context is then provided to the LLM to ground its response in factual data, reducing hallucinations and improving accuracy.

Exam trap

Google Cloud often tests the distinction between core mandatory components (embedding model, vector DB, LLM) and optional auxiliary components (classifier, rewriter, reranker) to see if candidates understand the minimal viable RAG architecture versus extended pipelines.

41
Multi-Selecthard

A company deploys a Gemini model on Vertex AI for a customer-facing chatbot. They observe the chatbot occasionally produces toxic language. Which TWO measures should they implement immediately to reduce toxic outputs?

Select 2 answers
A.Increase the model's temperature to make outputs more conservative.
B.Use a separate language model to rephrase the outputs before sending to users.
C.Fine-tune the model on a curated dataset of polite conversations.
D.Enable the 'block offensive content' flag in the model's safety configuration.
E.Configure the safety thresholds in the Vertex AI endpoint deployment to block hate speech and toxic content.
AnswersD, E

This flag directly enables content filtering.

Why this answer

Option D is correct because enabling the 'block offensive content' flag directly activates Gemini's built-in safety filters, which are designed to detect and suppress toxic language at the model's output layer. This is an immediate, configuration-level measure that requires no additional training or external services, making it the fastest way to reduce harmful responses in a production chatbot.

Exam trap

Google Cloud often tests the misconception that increasing temperature or fine-tuning are quick fixes for safety issues, when in fact they are either counterproductive or require significant time and resources, whereas safety configuration flags are the immediate, recommended first step.

42
MCQeasy

An organization wants to ensure their generative AI application does not produce toxic or harmful content. Which Vertex AI feature should they implement?

A.Safety filters and content moderation
B.Explainable AI
C.AutoML
D.Model Monitoring
AnswerA

These features are designed to detect and mitigate harmful content in model outputs.

Why this answer

Safety filters and content moderation in Vertex AI allow organizations to define and enforce policies that block or flag toxic, harmful, or inappropriate content generated by the model. This feature uses pre-built and customizable classifiers to evaluate prompts and responses against safety attributes (e.g., hate speech, harassment, sexually explicit content) before returning them to the user, directly addressing the requirement to prevent harmful outputs.

Exam trap

Google Cloud often tests the distinction between features that *analyze* model behavior (like Explainable AI or Model Monitoring) versus features that *actively enforce* safety policies (like Safety Filters), leading candidates to confuse monitoring or interpretability tools with content moderation controls.

How to eliminate wrong answers

Option B (Explainable AI) is wrong because it focuses on interpreting model predictions (e.g., feature attributions) rather than blocking toxic content; it provides transparency but no active content filtering. Option C (AutoML) is wrong because it automates model training and deployment for custom ML tasks, not content moderation or safety enforcement. Option D (Model Monitoring) is wrong because it tracks model performance and drift over time (e.g., prediction skew, data drift), not real-time content safety checks on individual outputs.

43
Multi-Selecteasy

Which TWO are benefits of using pre-trained foundation models instead of training from scratch?

Select 2 answers
A.Complete control over model architecture
B.Lower training cost
C.Eliminates the need for prompt engineering
D.Guaranteed absence of bias
E.Faster time to deployment
AnswersB, E

Pre-trained models require less compute and data, reducing cost.

Why this answer

Pre-trained foundation models have already been trained on vast datasets, so you only need to fine-tune them for your specific task. This dramatically reduces the compute resources, time, and data required compared to training from scratch, directly lowering training cost.

Exam trap

Google Cloud often tests the misconception that pre-trained models eliminate the need for any further engineering (like prompt engineering) or that they are completely bias-free, when in fact they still require careful tuning and can perpetuate biases from their training data.

44
MCQmedium

During a RAG pipeline implementation, the retrieval system frequently returns irrelevant documents, causing the generator to produce incorrect answers. Which change is most likely to improve the relevance of retrieved documents?

A.Add a re-ranking step using a cross-encoder model to refine the top results.
B.Increase the number of documents retrieved from the vector store.
C.Use a different embedding model with higher vector dimension.
D.Decrease the chunk size of documents to reduce noise.
AnswerA

Re-ranking directly improves document relevance by deep semantic matching.

Why this answer

Re-ranking with a cross-encoder model evaluates query-document pairs more deeply, improving precision at the cost of latency. Increasing the number of documents may introduce more noise. Changing embedding dimension or chunk size may help but are less targeted.

45
MCQmedium

A machine learning engineer is building a text-to-image model using Vertex AI. They want to reduce inference latency. Which strategy is most effective?

A.Use a larger image resolution
B.Use a smaller model variant
C.Enable batch processing
D.Increase the number of inference steps
AnswerB

Smaller models are faster.

Why this answer

Option B is correct because using a smaller model variant directly reduces the number of parameters and computational operations required per inference pass, which lowers latency. In text-to-image models like Imagen or Stable Diffusion, the model size is the primary driver of forward-pass time, so a smaller variant (e.g., fewer layers or reduced latent dimensions) yields faster generation.

Exam trap

The trap here is that candidates confuse throughput optimization (batch processing) with latency reduction, or assume that more steps or higher resolution improve quality without considering the latency trade-off.

How to eliminate wrong answers

Option A is wrong because larger image resolution increases the pixel space the model must process, which increases computational load and latency, not reduces it. Option C is wrong because batch processing improves throughput (images per second) but does not reduce per-request latency; it may even increase the time to first token for an individual request. Option D is wrong because increasing the number of inference steps (e.g., diffusion denoising steps) directly increases the sequential computation time, making latency worse.

46
MCQhard

A company is deploying a chatbot that uses a foundation model. They want to minimize latency for user queries. Which action is most effective?

A.Use a larger model with more parameters
B.Disable safety filters
C.Increase the number of tokens
D.Use a smaller distilled model
AnswerD

Distilled models are optimized for speed.

Why this answer

Distilled models are smaller, faster versions of larger foundation models, trained to mimic their behavior while requiring fewer computational resources. This directly reduces inference latency because fewer parameters mean faster forward passes through the network, which is critical for real-time chatbot responses.

Exam trap

Google Cloud often tests the misconception that 'bigger is better' for performance, but in latency-constrained scenarios, model size is inversely related to speed, and candidates may overlook distillation as a standard optimization technique.

How to eliminate wrong answers

Option A is wrong because larger models with more parameters increase computational complexity and memory bandwidth requirements, which actually increases latency rather than reducing it. Option B is wrong because disabling safety filters does not affect model inference speed; safety filters are post-processing steps that add negligible latency compared to the model itself. Option C is wrong because increasing the number of tokens (the output length) forces the model to perform more autoregressive generation steps, which linearly increases latency per additional token.

47
MCQhard

A large enterprise runs a production application that uses the Gemini API on Vertex AI for real-time content moderation. They are experiencing occasional 429 (Too Many Requests) errors during peak hours. Their current quota is 1000 requests per minute (RPM) and they are hitting around 950 RPM on average, with spikes up to 1050. They have already implemented exponential backoff and retry logic. They need to reduce the error rate without reducing the quality of moderation. Which additional measure should they take?

A.Deploy the model on a dedicated Vertex AI endpoint with autoscaling.
B.Switch to a lower-tier model like Gemini 1.0 Pro to reduce quota consumption.
C.Implement a local caching layer for common moderation queries.
D.Request a quota increase from Google Cloud support.
AnswerC

Caching eliminates duplicate requests, reducing the request rate and errors.

Why this answer

Option C is correct because implementing a local caching layer for common moderation queries reduces the number of identical requests sent to the Gemini API, directly lowering the effective RPM without compromising moderation quality. Since the enterprise is already using exponential backoff and retry logic, caching addresses the root cause of hitting quota limits by eliminating redundant API calls, which is a standard pattern for rate-limit mitigation in production AI workloads.

Exam trap

Google Cloud often tests the misconception that scaling infrastructure (Option A) or switching models (Option B) solves API quota issues, when the real constraint is the API rate limit itself, which requires reducing the number of calls through caching or other client-side optimizations.

How to eliminate wrong answers

Option A is wrong because deploying a dedicated endpoint with autoscaling does not increase the quota limit; it only scales compute resources, and the 429 errors are due to API quota exhaustion, not endpoint capacity. Option B is wrong because switching to a lower-tier model like Gemini 1.0 Pro would reduce quality of moderation, which violates the requirement to not reduce quality, and it does not address the fundamental issue of hitting the RPM quota. Option D is wrong because requesting a quota increase is a valid long-term solution but does not address the immediate need to reduce error rate without reducing quality; it also assumes Google Cloud will approve the increase, which is not guaranteed, and it does not optimize existing usage.

48
Multi-Selecteasy

A company is deploying a large language model (LLM) for customer support using Vertex AI. Which TWO best practices should they follow to ensure high-quality and cost-effective responses?

Select 2 answers
A.Deploy the model on Spot VMs to reduce infrastructure costs
B.Store prompts in plain text files for easy version control
C.Implement prompt optimization techniques to tailor responses
D.Use Vertex AI Model Monitoring to track input drift and response quality
E.Use a single large model for all query types to maintain consistency
AnswersC, D

Prompt optimization helps generate more accurate and relevant responses for specific use cases.

Why this answer

The correct answers are B (implement prompt optimization) and D (use Vertex AI Model Monitoring for drift). Prompt optimization improves response quality, and model monitoring detects performance degradation. Option A is wrong because using a single large model for all queries may be inefficient; smaller specialized models or routing can be better.

Option C is risky for production workloads. Option E exposes sensitive prompts.

49
MCQmedium

An enterprise deploys a large language model (LLM) for internal document summarization. Users complain that summaries sometimes include statements not present in the original document. Which mitigation strategy should the team prioritize to address this hallucination issue?

A.Train a discriminator model to detect hallucinations and perform adversarial training.
B.Implement retrieval-augmented generation (RAG) to ground the model in the original documents and require citations.
C.Apply reinforcement learning from human feedback (RLHF) using a reward model that penalizes hallucinations.
D.Reduce the model's temperature parameter to 0 to make outputs deterministic.
AnswerB

RAG ties outputs to source documents, reducing hallucinations.

Why this answer

Retrieval-Augmented Generation (RAG) is the most direct and effective mitigation for hallucination in document summarization because it forces the LLM to base its output on retrieved chunks of the original document. By requiring citations, the model must reference specific passages, making it verifiable and reducing the likelihood of fabricating content. This grounds the generation in the source material, addressing the root cause of hallucination—lack of factual grounding—rather than relying on post-hoc correction or output tuning.

Exam trap

Google Cloud often tests the misconception that reducing temperature or applying RLHF alone can solve hallucination, when in fact these methods do not provide the explicit grounding that RAG offers for document-specific tasks.

How to eliminate wrong answers

Option A is wrong because training a discriminator model and performing adversarial training is a complex, resource-intensive approach that does not directly prevent hallucinations at inference time; it only improves robustness against adversarial inputs, not factual grounding. Option C is wrong because RLHF with a reward model that penalizes hallucinations can reduce their frequency over time, but it requires extensive human feedback and fine-tuning, and does not guarantee grounding in specific source documents for each summary. Option D is wrong because reducing the temperature parameter to 0 makes outputs deterministic but does not eliminate hallucinations—it only reduces randomness; the model can still confidently generate false statements that were never in the source document.

50
MCQhard

A data scientist sees the above error when trying to deploy a model to an endpoint. What is the most likely cause?

A.The IAM permissions are insufficient
B.The model import into Vertex AI Model Registry is still in progress
C.The endpoint does not exist
D.The model is already deployed to another endpoint
AnswerB

Model is still DEPLOYING, not ready.

Why this answer

The error indicates that the model is not yet fully imported into the Vertex AI Model Registry. Deploying a model to an endpoint requires the model resource to be in an 'ACTIVE' state; if the import is still in progress, the deployment request will fail. This is a common timing issue when a model is uploaded but not yet registered.

Exam trap

Google Cloud often tests the misconception that any deployment failure is due to permissions or missing resources, when in fact the model's lifecycle state (e.g., still importing) is the root cause.

How to eliminate wrong answers

Option A is wrong because insufficient IAM permissions would typically result in a 403 Forbidden error, not a model-not-found or import-in-progress error. Option C is wrong because if the endpoint did not exist, the error would be a 404 Not Found for the endpoint resource, not a model import issue. Option D is wrong because a model can be deployed to multiple endpoints simultaneously; the error would instead mention a conflict or quota limit, not an import status.

51
MCQmedium

You are the AI lead at an e-commerce company that uses a generative model to write product descriptions from images and key attributes. The model is a multimodal transformer that encodes both image and text (attributes) and decodes a description. Recently, your team deployed a new version of the image encoder that uses a more powerful backbone (ViT-L instead of ViT-B). After deployment, the generated descriptions became longer but often include irrelevant visual details (e.g., background objects) and occasionally misrepresent the product's main features. The model was fine-tuned on the same dataset as before. The descriptions from the old model were concise and focused. What is the most likely cause of the degradation and the best fix?

A.The decoder is now too small relative to the encoder; reduce the encoder's hidden size or increase the decoder's capacity.
B.The new encoder produces less discriminative features; replace it with an older version.
C.Lower the decoder's temperature to reduce diversity and hallucination.
D.The powerful encoder introduces overfitting to the training images; continue fine-tuning with additional loss terms that penalize description of irrelevant details (e.g., using attention regularization).
AnswerD

Attention regularization forces the model to focus on product-relevant regions.

Why this answer

Option D is correct because the more powerful ViT-L encoder captures richer, high-resolution features, including background details, which the decoder then amplifies into longer, less focused descriptions. This is a form of overfitting to irrelevant visual patterns in the training images, not to the labels. Adding attention regularization (e.g., penalizing attention weights on non-salient regions) forces the model to focus on product-relevant features, restoring conciseness and accuracy without reverting to the weaker encoder.

Exam trap

Google Cloud often tests the misconception that a more powerful encoder always improves performance, when in fact it can introduce overfitting to irrelevant features, and the fix is not to downgrade the encoder but to add regularization that guides attention to salient regions.

How to eliminate wrong answers

Option A is wrong because the decoder's capacity is not inherently too small; the issue is that the encoder now provides a richer but noisier representation, and simply resizing encoder/decoder dimensions does not address the root cause of attending to irrelevant details. Option B is wrong because replacing the encoder with an older version would discard the potential benefits of ViT-L (e.g., better attribute recognition) and is a regression, not a fix; the problem is not that features are less discriminative but that they are too detailed and unfocused. Option C is wrong because lowering temperature reduces randomness in token sampling but does not prevent the decoder from faithfully reproducing irrelevant visual details that the encoder already extracted; it would only make the output more deterministic, not more concise or accurate.

52
MCQhard

A company has a large dataset of proprietary documents and wants to build a Q&A system using a foundation model without exposing the documents to the model. Which approach is most appropriate?

A.Use RAG with Vertex AI Vector Search and embeddings
B.Use a zero-shot model with context in prompt
C.Fine-tune the model on the documents
D.Use prompt engineering to instruct the model
AnswerA

RAG retrieves documents at query time without training on them.

Why this answer

Option A is correct because Retrieval-Augmented Generation (RAG) with Vertex AI Vector Search allows the system to retrieve relevant document chunks via embeddings without exposing the full documents to the foundation model. The model only sees the retrieved context in the prompt, ensuring proprietary data remains isolated and not used for training or memorization.

Exam trap

Google Cloud often tests the misconception that fine-tuning or prompt engineering can solve data privacy concerns, when in reality RAG is the only approach that keeps proprietary documents isolated from the model's training and inference pipeline.

How to eliminate wrong answers

Option B is wrong because a zero-shot model with context in the prompt still requires the entire document content to be included in the prompt, which exposes the proprietary data to the model and may exceed token limits. Option C is wrong because fine-tuning the model on the documents would directly expose the proprietary data to the model during training, risking memorization and data leakage. Option D is wrong because prompt engineering alone cannot retrieve specific information from a large dataset; it only instructs the model on how to respond, not where to find the data.

53
MCQhard

A retail company is building a generative AI chatbot to assist customers with product recommendations and order tracking. The chatbot uses Vertex AI with Gemini 1.5 Pro, and the development team has implemented a Retrieval-Augmented Generation (RAG) pipeline using Vertex AI Search for grounding. The pipeline uses a vector store containing product descriptions and order history. During testing, the team observes that the chatbot sometimes provides incorrect order statuses—for example, claiming an order is 'shipped' when it is actually 'pending'. The team suspects the issue is related to how context is retrieved and used. The RAG pipeline currently retrieves the top 5 chunks based on cosine similarity from the vector store, and passes them as context to the model. The team is considering several changes to improve factual accuracy. Which single action would most effectively reduce hallucinations in this scenario?

A.Switch from Vertex AI Search to a different vector database like Pinecone.
B.Reduce the model temperature to 0.0 to make outputs more deterministic.
C.Increase the similarity score threshold for retrieval to 0.85 to filter out less relevant chunks.
D.Increase the top-K retrieval value to 10 to provide more context to the model.
AnswerC

Option A is correct because a higher threshold reduces irrelevant context, directly improving factual grounding.

Why this answer

Option C is correct because increasing the similarity score threshold to 0.85 ensures that only highly relevant chunks are passed to the Gemini 1.5 Pro model, directly reducing the risk of the model generating responses based on irrelevant or low-confidence context. In a RAG pipeline using Vertex AI Search, low-similarity chunks can contain order statuses from different customers or products, leading to hallucinations like incorrect order statuses. Filtering out these less relevant chunks improves the factual grounding of the model's output.

Exam trap

Google Cloud often tests the misconception that simply adding more context (higher top-K) or making the model more deterministic (lower temperature) will fix hallucinations, when the real issue is the relevance and quality of the retrieved context in a RAG pipeline.

How to eliminate wrong answers

Option A is wrong because switching to a different vector database like Pinecone does not address the core issue of retrieval relevance; the problem lies in the similarity threshold and chunk selection, not the database technology. Option B is wrong because reducing temperature to 0.0 makes the model more deterministic but does not fix the underlying issue of irrelevant or incorrect context being retrieved; the model will still confidently generate incorrect answers based on poor context. Option D is wrong because increasing top-K to 10 would retrieve more chunks, potentially including even more low-relevance or noisy context, which could worsen hallucinations rather than improve factual accuracy.

54
MCQeasy

Refer to the exhibit. What is the most likely cause of this error?

A.The user does not have the required IAM role
B.The model is too large
C.The network is down
D.The project ID is incorrect
AnswerA

The error is a permission denial, meaning the user's IAM roles do not include 'aiplatform.models.upload'.

Why this answer

The error shown in the exhibit is an HTTP 403 Forbidden response, which indicates that the server understood the request but refuses to authorize it. In Google Cloud, this is most commonly caused by the user's identity lacking the necessary IAM role or permission to call the specific API or access the resource. Even if the project ID is correct and the network is functional, a missing IAM role (e.g., `aiplatform.user` or `roles/aiplatform.user`) will result in this exact error.

Exam trap

Google Cloud often tests the distinction between authentication (who you are) and authorization (what you can do), and the trap here is that candidates confuse a 403 Forbidden with a 404 Not Found or a network error, leading them to pick 'The project ID is incorrect' or 'The network is down' instead of recognizing the IAM permission failure.

How to eliminate wrong answers

Option B is wrong because model size does not cause an HTTP 403 error; a model that is too large would typically result in a 413 Payload Too Large or a resource-exhausted error, not an authorization failure. Option C is wrong because a network outage would produce a connectivity error (e.g., timeout, DNS resolution failure, or HTTP 502/503), not a 403 Forbidden response which requires a successful TCP connection and HTTP request to reach the server. Option D is wrong because an incorrect project ID would cause a 404 Not Found or a 400 Bad Request (e.g., 'Project not found'), not a 403 Forbidden; the 403 specifically indicates the request was received and the project exists, but the caller lacks authorization.

55
MCQmedium

A developer is using Vertex AI Gemini API for a chatbot. The chatbot sometimes outputs harmful content. What is the best first step to mitigate this?

A.Fine-tune the model on curated safe data
B.Add a human-in-the-loop review
C.Use safety filters and safety settings in the API request
D.Switch to a smaller model
AnswerC

Safety settings directly filter harmful content at inference time.

Why this answer

Option C is correct because the Vertex AI Gemini API provides built-in safety filters and configurable safety settings (e.g., `safety_settings` parameter with categories like `HARM_CATEGORY_HARASSMENT` and thresholds like `BLOCK_ONLY_HIGH`) that allow developers to block harmful outputs at inference time without retraining. This is the fastest and most direct first step to mitigate harmful content, as it requires no additional infrastructure or model modification.

Exam trap

Google Cloud often tests the misconception that the first step to mitigate harmful content is to fine-tune the model, when in reality the immediate, low-cost, and recommended first step is to leverage the API's built-in safety filters and settings.

How to eliminate wrong answers

Option A is wrong because fine-tuning on curated safe data is a resource-intensive, secondary step that does not address immediate harmful outputs during inference and may not cover all edge cases of harmful content. Option B is wrong because adding a human-in-the-loop review introduces latency and cost, and is a reactive measure rather than a proactive first step to block harmful content at the API level. Option D is wrong because switching to a smaller model does not inherently reduce harmful outputs; smaller models can still generate harmful content and may have reduced capabilities for safe response generation.

56
MCQhard

An organization wants to use a generative model to automatically generate legal contracts. The model must produce clauses that are not only grammatically correct but also legally enforceable and consistent with current jurisdiction laws. Which combination of techniques best ensures legal compliance?

A.Fine-tune a small model exclusively on legal contracts from a single jurisdiction and use it for generation.
B.Implement retrieval-augmented generation (RAG) with a vector database of all relevant laws.
C.Fine-tune a model on a diverse set of enforceable contracts and incorporate an external compliance verifier that uses rule-based checks.
D.Use a large instruction-tuned model with carefully engineered prompts describing jurisdiction details.
AnswerC

Fine-tuning imparts domain knowledge, and the verifier ensures legal correctness.

Why this answer

Option D is correct because fine-tuning on curated legal documents teaches domain-specific language and enforceability, while a verifier (an external logic/rule system) checks compliance with laws. Option A is incorrect because prompt engineering is unreliable for precise legal reasoning. Option B is incorrect because small model likely lacks legal knowledge.

Option C is incorrect because RAG retrieves but does not verify enforceability.

57
MCQmedium

A company fine-tunes a text model on internal HR policies. After deployment, the model sometimes outputs sensitive employee information. What is the most likely cause?

A.The fine-tuning dataset contained personally identifiable information that was not removed.
B.The model was not trained with reinforcement learning from human feedback (RLHF).
C.The model has insufficient parameters to generalize properly.
D.The prompt engineering was too verbose and included misleading instructions.
AnswerA

Models can memorize training data; including sensitive information leads to leakage.

Why this answer

The most likely cause is that the fine-tuning dataset contained personally identifiable information (PII) that was not properly scrubbed. During fine-tuning, the model learns patterns and memorizes specific sequences from the training data. If the dataset includes sensitive employee records, the model can reproduce that information verbatim when prompted, leading to data leakage.

This is a well-known risk in fine-tuning, as models can overfit to rare or unique examples in the training set.

Exam trap

Google Cloud often tests the misconception that RLHF or prompt engineering can fix data leakage issues, but the trap here is that the root cause is always the training data itself—no amount of post-hoc alignment or prompt tweaking can prevent the model from reproducing memorized sensitive content.

How to eliminate wrong answers

Option B is wrong because RLHF is a technique used to align model outputs with human preferences, not to prevent memorization of training data; it does not address the root cause of data leakage from the fine-tuning dataset. Option C is wrong because insufficient parameters would typically cause underfitting or poor generalization, not the exact reproduction of sensitive information; memorization is more likely with larger models that have higher capacity to store training examples. Option D is wrong because verbose or misleading prompt engineering might degrade output quality but cannot cause the model to output specific employee data that was not present in its training or fine-tuning data; the model can only generate information it has learned.

58
MCQeasy

A developer wants to generate product descriptions from a list of features using Vertex AI. Which model type is best suited for this task?

A.An embedding model (e.g., textembedding-gecko@001).
B.A chat model (e.g., chat-bison@001).
C.A text generation model (e.g., text-bison@001).
D.A code generation model (e.g., code-bison@001).
AnswerC

Text generation models are ideal for generative tasks from prompts.

Why this answer

Option C is correct because text-bison@001 is a dedicated text generation model optimized for tasks like summarization, translation, and content creation from structured inputs. It can take a list of features as a prompt and generate coherent, descriptive product descriptions without needing conversational context or code-specific outputs.

Exam trap

The trap here is that candidates may confuse 'text generation' with 'chat' or 'embedding' models, assuming any generative model can handle the task, but Vertex AI separates these by specialization, and the exam tests awareness of which model class is purpose-built for non-conversational, non-code text creation.

How to eliminate wrong answers

Option A is wrong because embedding models like textembedding-gecko@001 are designed to convert text into numerical vectors for similarity search or clustering, not for generating new text. Option B is wrong because chat models like chat-bison@001 are optimized for multi-turn conversational interactions, not for single-turn structured generation tasks like producing descriptions from a feature list. Option D is wrong because code generation models like code-bison@001 are specialized for generating programming code, not natural language product descriptions.

59
MCQmedium

A developer receives the above JSON response from a Vertex AI PaLM API call for a medical advice application. What should the developer be most concerned about?

A.The safety score is very low (0.01)
B.The deployed model ID is not recognized
C.The output falls under the 'health' category, which may require compliance with regulations
D.The prediction content is incorrect
AnswerC

Health-related outputs need careful review.

Why this answer

Option C is correct because the JSON response includes a 'category' field with the value 'health', which triggers stringent regulatory compliance requirements such as HIPAA in the US or GDPR in Europe. For a medical advice application, the developer must ensure data handling, model transparency, and output validation meet these legal standards, as failure could result in severe penalties. The PaLM API's safety attributes and category labels are designed to flag such sensitive domains, making compliance the primary concern over other technical issues.

Exam trap

Google Cloud often tests the misconception that low safety scores or incorrect content are the primary risks, when in fact regulatory compliance for sensitive categories like 'health' is the most critical and non-obvious concern that developers must address first.

How to eliminate wrong answers

Option A is wrong because a safety score of 0.01 is not inherently concerning; it may indicate low confidence in the safety assessment rather than actual unsafe content, and the PaLM API uses separate blocking thresholds (e.g., safety_settings) to filter harmful outputs. Option B is wrong because the deployed model ID (e.g., 'text-bison@001') is a standard identifier for the PaLM model version, and unrecognized IDs typically cause API errors or fallback behavior, not a primary concern for a valid response. Option D is wrong because the prediction content's correctness is a secondary validation issue that can be addressed through prompt engineering or post-processing, whereas regulatory compliance is a non-negotiable legal requirement that must be handled before deployment.

60
Multi-Selectmedium

A data scientist is selecting a base model for generating Python code. Which TWO factors are most important to consider?

Select 2 answers
A.Model's license (proprietary vs open-source).
B.Model's performance on coding benchmarks like HumanEval.
C.Model's support for multiple programming languages.
D.Model's training data recency.
E.Model's parameter count (size).
AnswersA, B

License determines usage rights and compliance.

Why this answer

Option A is correct because the model's license determines whether the generated code can be used in commercial products without violating copyright or requiring attribution. Proprietary models may impose restrictions on output usage, while open-source models (e.g., CodeLlama, StarCoder) offer more flexibility for enterprise deployment. This is critical for compliance and intellectual property management in production environments.

Exam trap

Google Cloud often tests the misconception that larger parameter counts or broader language support are more important than licensing and benchmark performance, leading candidates to overlook the legal and functional constraints of deploying a code generation model in a business context.

61
MCQhard

A company has fine-tuned a foundation model on proprietary data. During evaluation, they find the model performs well on seen examples but poorly on unseen but similar tasks. What is the problem?

A.Underfitting
B.Catastrophic forgetting
C.Distribution shift
D.Domain shift between fine-tuning and deployment
AnswerD

Domain shift causes poor generalization to similar but different tasks.

Why this answer

Option D is correct because the model performs well on seen examples (fine-tuning distribution) but poorly on unseen but similar tasks (deployment distribution), which is a classic symptom of domain shift. This occurs when the fine-tuning data does not fully represent the deployment environment, causing the model to fail on inputs that differ in subtle but systematic ways from the training distribution. The model has not generalized to the target domain despite being well-fitted to the source domain.

Exam trap

Google Cloud often tests the distinction between distribution shift (a broad category) and domain shift (a specific type), so candidates mistakenly pick 'distribution shift' without recognizing that the question explicitly describes a domain mismatch between fine-tuning and deployment.

How to eliminate wrong answers

Option A is wrong because underfitting would cause poor performance on both seen and unseen examples, not good performance on seen examples alone. Option B is wrong because catastrophic forgetting refers to a model losing previously learned knowledge when fine-tuned on new data, but here the model retains performance on seen examples, indicating no forgetting occurred. Option C is wrong because distribution shift is a broad term that includes domain shift, but the specific scenario described—good performance on seen tasks but poor on similar unseen tasks—is precisely domain shift between fine-tuning and deployment, not a general covariate or label shift.

62
Multi-Selecthard

A team is evaluating generative AI models for a content moderation system. Which THREE metrics are most important to assess?

Select 3 answers
A.Percentage of outputs flagged by safety filters.
B.Cost per million tokens.
C.Inference latency under expected load.
D.BLEU score against human-written moderation guidelines.
E.Precision and recall on a test set of moderated content.
AnswersA, C, E

Indicates how often the model generates unsafe content.

Why this answer

Option A is correct because safety filters are a primary mechanism for detecting and blocking harmful or policy-violating content in generative AI outputs. In a content moderation system, the percentage of outputs flagged by these filters directly measures the model's tendency to produce unsafe content, which is critical for maintaining platform safety and compliance.

Exam trap

Google Cloud often tests the distinction between metrics that measure model performance on the task (safety, accuracy) versus metrics that measure operational or linguistic qualities (cost, BLEU), leading candidates to mistakenly include cost or BLEU as primary assessment criteria for content moderation.

63
Multi-Selecthard

Which THREE of the following are key considerations when deploying a generative AI model in a production environment with strict latency requirements? (Choose three.)

Select 3 answers
A.Deploy the largest model variant available to ensure highest quality.
B.Implement speculative decoding to generate candidate tokens with a smaller draft model and verify with the large model.
C.Use model quantization (e.g., int8) to reduce precision and speed up matrix multiplications.
D.Cache the key-value caches from previous decoding steps to avoid redundant computation.
E.Increase the inference batch size to maximize GPU utilization.
AnswersB, C, D

Speculative decoding significantly reduces time per token.

Why this answer

Option B is correct because speculative decoding uses a smaller, faster draft model to generate candidate tokens, which are then verified by the large model in parallel. This reduces the number of sequential autoregressive steps, significantly lowering latency while maintaining output quality.

Exam trap

Google Cloud often tests the distinction between latency and throughput, so the trap here is that candidates confuse batch size (which improves throughput) with latency reduction, or assume larger models always yield better performance without considering inference speed.

64
MCQmedium

You are an ML engineer at a retail company. You have deployed a generative AI model on Vertex AI to generate product descriptions. The model uses a custom container and is deployed to a single endpoint. Recently, you noticed that inference latency has increased significantly during peak hours, causing timeouts. You have checked the logs and found that the CPU utilization on the deployed instances is consistently above 90% during peak hours. The model is currently deployed with a single machine type (n1-standard-4) and no scaling. You need to reduce latency without incurring excessive cost. What should you do?

A.Optimize the model using quantization and reduce the number of replicas
B.Switch to batch prediction instead of online prediction
C.Change the machine type to n1-standard-8 and enable autoscaling with min replicas=1, max replicas=5
D.Add a GPU accelerator to the existing machine
AnswerC

More CPU and autoscaling handle peak load efficiently.

Why this answer

Option C is correct because upgrading to a larger machine (n1-standard-8) provides more CPU cores to handle the increased inference workload, while enabling autoscaling (min=1, max=5) allows the deployment to dynamically add replicas during peak hours to distribute the load and reduce latency. This combination addresses the high CPU utilization without over-provisioning during off-peak times, thus controlling cost.

Exam trap

The trap here is that candidates often assume adding a GPU (Option D) is always the best way to reduce inference latency, but for CPU-bound models with high utilization, scaling out with more replicas and a larger CPU machine is more cost-effective and directly addresses the bottleneck.

How to eliminate wrong answers

Option A is wrong because quantization reduces model size and can improve latency, but reducing the number of replicas would worsen the bottleneck by decreasing capacity, not help with high CPU utilization. Option B is wrong because batch prediction is designed for asynchronous, non-real-time processing and does not solve online inference latency during peak hours; it would change the use case entirely. Option D is wrong because adding a GPU accelerator to the existing n1-standard-4 machine would not address the CPU bottleneck (inference is CPU-bound in this scenario) and would increase cost unnecessarily without guaranteed latency improvement for a CPU-bound model.

65
MCQeasy

Refer to the exhibit. A company has this IAM policy on a Vertex AI project. Alice complains she cannot create a new model. What is the most likely reason?

A.She needs the roles/aiplatform.modelUser role
B.She needs the roles/aiplatform.admin role
C.She needs the roles/aiplatform.modelAdmin role
D.She needs the roles/aiplatform.modelCreator role
AnswerB

Admin role has full access, including model creation.

Why this answer

The IAM policy shown in the exhibit likely grants only basic permissions (e.g., roles/aiplatform.user) which do not include the ability to create models. The roles/aiplatform.admin role provides full administrative access, including model creation, deletion, and management across the Vertex AI project. Without this role, Alice lacks the necessary aiplatform.models.create permission, which is why she cannot create a new model.

Exam trap

Google Cloud often tests the misconception that there is a specific 'modelCreator' or 'modelAdmin' role, when in fact Vertex AI uses a flat admin role (roles/aiplatform.admin) for all create/delete operations, and candidates confuse custom role names with predefined ones.

How to eliminate wrong answers

Option A is wrong because roles/aiplatform.modelUser only grants read-only access to models (e.g., deploy and predict), not create permissions. Option C is wrong because roles/aiplatform.modelAdmin does not exist as a predefined role in Vertex AI; the correct role for model administration is roles/aiplatform.admin. Option D is wrong because roles/aiplatform.modelCreator is not a predefined IAM role in Vertex AI; model creation is covered by roles/aiplatform.admin or custom roles with aiplatform.models.create permission.

66
MCQmedium

After fine-tuning a foundation model on company emails, the model outputs confidential information. What is the most likely cause?

A.The prompt is too vague
B.The model is too large
C.The fine-tuning dataset was not anonymized
D.Overfitting to the training data leading to memorization
AnswerC

Unanonymized data can be memorized and reproduced by the model.

Why this answer

Option C is correct because the most likely cause of a fine-tuned model outputting confidential information is that the fine-tuning dataset contained sensitive data that was not anonymized. During fine-tuning, the model learns patterns and can memorize specific sequences, including confidential details like names, addresses, or proprietary information, which it then reproduces in responses. This is a well-known data leakage risk in fine-tuning workflows.

Exam trap

Google Cloud often tests the distinction between a model's inherent behavior (like overfitting) and the root cause in the data pipeline, so candidates mistakenly choose overfitting (Option D) instead of recognizing that the dataset itself was the source of the confidential information.

How to eliminate wrong answers

Option A is wrong because a vague prompt may lead to irrelevant or generic outputs, but it does not directly cause the model to output specific confidential information that was not present in the training data. Option B is wrong because model size (number of parameters) does not inherently cause memorization of confidential data; memorization is a function of training data exposure and fine-tuning methodology, not model scale alone. Option D is wrong because while overfitting can lead to memorization, the root cause is the presence of unanonymized confidential data in the fine-tuning dataset; overfitting is a symptom, not the primary cause, and the question asks for the 'most likely cause'.

67
MCQeasy

A startup is building a customer service chatbot that generates responses in real-time. They want the model to have up-to-date information on the latest product catalog but cannot afford frequent fine-tuning. Which technique should they use to inject current data into the model without retraining?

A.Rely on the model's zero-shot capabilities to infer product details.
B.Use retrieval-augmented generation (RAG) to fetch relevant documents from a vector database at inference time.
C.Craft detailed system prompts that include the entire product catalog in the prompt.
D.Fine-tune the base model weekly on the latest product catalog.
AnswerB

RAG enables the model to access external, up-to-date information without retraining.

Why this answer

Retrieval-Augmented Generation (RAG) is the correct technique because it allows the chatbot to fetch the most current product catalog entries from an external vector database at inference time, without requiring any model retraining. This keeps responses grounded in up-to-date information while avoiding the cost and latency of frequent fine-tuning.

Exam trap

Google Cloud often tests the distinction between in-context learning (via RAG or prompt engineering) and parametric knowledge (via fine-tuning), trapping candidates who think that simply adding more data to the prompt is scalable or that zero-shot inference can substitute for external retrieval.

How to eliminate wrong answers

Option A is wrong because zero-shot capabilities rely solely on the model's pre-existing knowledge, which cannot incorporate new or updated product catalog details without retraining. Option C is wrong because crafting detailed system prompts with the entire product catalog would exceed the model's context window limits and incur high token costs, making it impractical for real-time inference. Option D is wrong because fine-tuning weekly is expensive, time-consuming, and contradicts the requirement to avoid frequent retraining; it also risks catastrophic forgetting of previously learned information.

68
MCQmedium

A developer wants to build a RAG application using Vertex AI. Which vector database is natively integrated with Vertex AI for storing embeddings?

A.Firestore
B.Vertex AI Vector Search
C.Cloud SQL
D.Bigtable
AnswerB

Vector Search is purpose-built for storing and querying embeddings.

Why this answer

Vertex AI Vector Search is the native vector database integrated with Vertex AI for storing and querying embeddings. It is purpose-built for high-dimensional vector similarity search, enabling efficient retrieval in RAG applications without requiring external infrastructure.

Exam trap

Google Cloud often tests the misconception that any database can store embeddings equally well, but the key differentiator is native vector indexing and ANN search support, which only Vertex AI Vector Search provides among the listed options.

How to eliminate wrong answers

Option A is wrong because Firestore is a NoSQL document database designed for storing structured data, not optimized for vector similarity search or embedding storage. Option C is wrong because Cloud SQL is a relational database service (MySQL, PostgreSQL, SQL Server) that lacks native vector indexing and similarity search capabilities required for RAG. Option D is wrong because Bigtable is a wide-column NoSQL database for large-scale analytical workloads, not designed for low-latency vector similarity queries.

69
MCQmedium

A company wants to build a chatbot that answers questions based on internal documents. Which approach is most appropriate?

A.Use a pre-trained model without any customizations
B.Train a custom model from scratch
C.Fine-tune a model on the documents
D.Use a prompt with the documents in the context
AnswerD

This is the core of RAG: provide relevant documents in the prompt to ground the model's answers.

Why this answer

Option D is correct because Retrieval-Augmented Generation (RAG) allows the chatbot to dynamically include relevant internal documents in the prompt context without modifying the underlying model. This approach leverages the pre-trained model's language understanding while grounding answers in specific, up-to-date internal data, avoiding the cost and latency of fine-tuning or retraining.

Exam trap

Google Cloud often tests the misconception that fine-tuning is the only way to incorporate proprietary data, but RAG is the most appropriate for dynamic, retrieval-based Q&A because it avoids retraining and keeps the model's knowledge current.

How to eliminate wrong answers

Option A is wrong because a pre-trained model without customization lacks access to the company's internal documents, leading to hallucinated or generic answers not grounded in proprietary data. Option B is wrong because training a custom model from scratch is computationally prohibitive and unnecessary; it requires massive labeled datasets and resources, whereas RAG achieves the same goal with far less effort. Option C is wrong because fine-tuning on documents teaches the model to memorize specific content, which is inefficient for large, frequently updated document sets and risks catastrophic forgetting, whereas RAG keeps the model static and retrieves fresh context per query.

70
MCQhard

A team is building a medical diagnosis assistant using a foundation model. To comply with regulations, they need to ensure the model does not make up facts. What is the best approach?

A.Use a small model to hallucinate less
B.Use grounding with Vertex AI Search
C.Reduce temperature to 0
D.Fine-tune on medical journals
AnswerB

Grounding provides verifiable citations and reduces fabrication.

Why this answer

Grounding with Vertex AI Search is the best approach because it connects the foundation model's outputs to a verifiable, curated knowledge base, ensuring factual accuracy and compliance with regulations that prohibit hallucination. By retrieving information from a trusted source (e.g., medical databases) in real time, the model can cite evidence and avoid generating unverified claims.

Exam trap

Google Cloud often tests the misconception that reducing temperature or using a smaller model can eliminate hallucination, when in fact only grounding with external, verifiable data sources can reliably prevent fact fabrication in high-stakes domains.

How to eliminate wrong answers

Option A is wrong because using a smaller model does not inherently reduce hallucination; smaller models have less capacity and may actually hallucinate more due to limited training data and weaker reasoning. Option C is wrong because reducing temperature to 0 makes the model deterministic but does not prevent it from generating plausible-sounding but false information; it still relies on its parametric knowledge, which can be incomplete or outdated. Option D is wrong because fine-tuning on medical journals alone does not guarantee factual accuracy; the model may memorize and reproduce errors, and it cannot dynamically verify facts against a live, authoritative source.

71
MCQeasy

A marketing team wants to generate product descriptions using a text generation model on Vertex AI. They need consistent output style across all descriptions, including tone and length. They have a small set of 10 high-quality example descriptions that capture the desired style. The team has limited ML expertise and wants a quick solution that does not require model retraining. Which approach should they use?

A.Use a pre-built template with no model input.
B.Fine-tune the model on a large external dataset of product descriptions.
C.Use few-shot prompting with the examples in the prompt.
D.Set the temperature to 0.9 to maximize creativity.
AnswerC

Few-shot prompting directly leverages examples to achieve consistent style without retraining.

Why this answer

Few-shot prompting is the correct approach because it allows the team to inject the desired style, tone, and length directly into the prompt using the 10 high-quality examples, without any model retraining. This technique leverages the in-context learning capability of large language models on Vertex AI, enabling consistent output from a small set of demonstrations. It is ideal for teams with limited ML expertise as it requires only prompt engineering, not fine-tuning or infrastructure changes.

Exam trap

Google Cloud often tests the misconception that higher temperature always improves output quality, but the trap here is that temperature controls randomness, not consistency, so candidates may incorrectly choose Option D without understanding that low temperature is required for reproducible style and length.

How to eliminate wrong answers

Option A is wrong because a pre-built template with no model input cannot generate dynamic, context-aware product descriptions; it produces static text that lacks the flexibility and nuance of a generative model. Option B is wrong because fine-tuning on a large external dataset would require significant ML expertise, data preparation, and compute resources, contradicting the requirement for a quick solution without model retraining. Option D is wrong because setting temperature to 0.9 maximizes randomness and creativity, which is the opposite of what is needed for consistent output style; a lower temperature (e.g., 0.2) would be more appropriate for deterministic, reproducible results.

72
MCQhard

A financial institution deploys a chatbot using Gemini Pro in Vertex AI. Compliance requires logging all user inputs and model outputs for audit. Which approach meets this requirement?

A.Capture logs via Cloud Monitoring
B.Enable Vertex AI Endpoint request-response logging
C.Use Cloud Logging sink with a filter for Vertex AI requests
D.Enable Vertex AI Model Registry logging
AnswerB

This captures every request and response for the deployed model, meeting audit requirements.

Why this answer

Vertex AI Endpoint request-response logging captures both the user's input prompt and the model's generated output, which is precisely what compliance auditing requires. This feature logs the exact payloads sent to and received from the deployed model, ensuring a complete audit trail without additional configuration.

Exam trap

The trap here is that candidates confuse Cloud Logging sinks or Cloud Monitoring with the specific Vertex AI feature that must be explicitly enabled on the endpoint, assuming that default logging captures request-response payloads when it does not.

How to eliminate wrong answers

Option A is wrong because Cloud Monitoring is designed for metrics, alerts, and dashboards, not for capturing detailed request-response payloads for audit compliance. Option C is wrong because a Cloud Logging sink with a filter can only export logs that already exist; it does not enable the capture of Vertex AI request-response logs, which must be explicitly enabled on the endpoint. Option D is wrong because Vertex AI Model Registry logging tracks model version metadata and lifecycle events, not the user inputs and model outputs from inference calls.

73
MCQhard

A multimodal generative AI system processes both image and text inputs to produce captions. During inference, the image encoder sometimes produces noisy or missing features. Which architectural design decision best handles such input degradation without retraining?

A.Train a separate variational autoencoder to produce a clean latent representation from the noisy image.
B.Increase the image encoder’s capacity to better extract robust features.
C.Apply standard image preprocessing (e.g., denoising) to all inputs before feeding to the encoder.
D.Introduce a gating mechanism that learns to weigh image features based on confidence scores from the encoder.
AnswerD

Gating allows the model to ignore unreliable features dynamically.

Why this answer

Option D is correct because a gating mechanism dynamically adjusts the contribution of image features based on confidence scores from the encoder, allowing the model to gracefully handle noisy or missing features without retraining. This architectural design learns to suppress unreliable image inputs and rely more on text or other modalities, ensuring robust caption generation under input degradation.

Exam trap

Google Cloud often tests the misconception that preprocessing or model capacity adjustments are the only ways to handle input noise, but the key insight is that architectural mechanisms like gating can adaptively handle degradation at inference time without retraining.

How to eliminate wrong answers

Option A is wrong because training a separate variational autoencoder (VAE) to produce clean latent representations requires additional training data and retraining, which contradicts the 'without retraining' constraint; it also adds complexity without addressing dynamic degradation during inference. Option B is wrong because increasing the image encoder’s capacity does not inherently handle noisy or missing features—it may overfit to training data and still produce unreliable outputs when inputs degrade, and it requires retraining to change capacity. Option C is wrong because standard image preprocessing like denoising is a fixed, non-adaptive approach that cannot compensate for missing features or varying noise levels, and it may discard useful information; it also does not leverage the model’s ability to learn confidence-based weighting.

74
MCQhard

A developer runs the command above to test a text classification model deployed on a Vertex AI endpoint. The model returns an error. What is the most likely cause?

A.The endpoint ID '789' does not exist in the project
B.The model is not deployed to any endpoint
C.The instance schema (e.g., 'content' field) does not match the model's expected input signature
D.The region 'us-central1' does not match the region where the model is deployed
AnswerC

The model expects a different input format (e.g., 'text' field or a structured object), leading to the format error.

Why this answer

Option C is correct because the error 'Model does not support the given instance format' indicates a mismatch between the input schema and what the model expects. Option A (wrong endpoint ID) would produce a 'not found' error. Option B (region mismatch) would give a regional validation error.

Option D (model not deployed) would result in an endpoint not serving error.

75
MCQhard

Refer to the exhibit. A developer sees this error when trying to deploy a model from Vertex AI Model Registry. What is the most likely cause?

A.The region is not supported
B.The developer used the model display name instead of the full resource name
C.The model is not published
D.The model is in a different project
AnswerB

Display name is not a valid model reference; the full resource path is required.

Why this answer

The error occurs because Vertex AI Model Registry requires the full resource name (e.g., 'projects/{project}/locations/{region}/models/{model_id}') to deploy a model, not just the display name. The display name is a human-readable label that is not unique within a project, while the full resource name uniquely identifies the model version. Using the display name causes the API to fail with a 'not found' or 'invalid argument' error.

Exam trap

Google Cloud often tests the distinction between display names (non-unique, human-readable) and resource names (unique, API-required) in cloud services like Vertex AI, where candidates mistakenly assume display names can be used interchangeably with resource identifiers.

How to eliminate wrong answers

Option A is wrong because Vertex AI supports model deployment in all regions where the service is available, and the error message does not indicate a regional restriction. Option C is wrong because a model can be deployed from the registry even if it is not published to the public; publishing is only required for sharing with external users or making it available in the Model Garden. Option D is wrong because the error would reference a cross-project permission issue (e.g., 'permission denied' or 'resource not found in project'), not a display name mismatch.

Page 1 of 2 · 124 questions totalNext →

Ready to test yourself?

Try a timed practice session using only Fundamentals of Generative AI questions.