Knowledge + Practice

Google Cloud Generative AI Leader Generative AI Leader (Generative AI Leader) — Questions 151–225

500 questions total · 7pages · All types, answers revealed

Take a mock exam Exam hub

Page 3 of 7

151

MCQmedium

A healthcare organization is developing a generative AI system to assist doctors with clinical decision support. They are concerned about regulatory compliance (e.g., HIPAA) and potential liability. What is the most important business strategy to mitigate these risks?

A.Limit the system to non-critical administrative tasks only.

B.Use an open-source model to avoid vendor lock-in and reduce costs.

C.Fully automate the system to reduce human error.

D.Implement a human-in-the-loop review process with clear accountability for AI-generated recommendations.

AnswerD

Human oversight ensures compliance and provides a clear chain of responsibility.

Why this answer

Option B is correct because human oversight and clear accountability are essential for high-stakes decisions. Option A is wrong because automation without oversight increases liability. Option C is wrong because open-source models may not comply with privacy requirements.

Option D is wrong because limiting scope reduces utility but does not address accountability.

Full explanation →

152

MCQeasy

A marketing team wants to generate product descriptions using generative AI. They need to ensure factual accuracy and avoid hallucinations. Which approach should they use?

A.Use a code generation model to generate structured descriptions.

B.Fine-tune the model on all product descriptions using supervised learning.

C.Implement a retrieval augmented generation (RAG) system that retrieves product facts from a database.

D.Use a large language model with detailed prompt instructions to be accurate.

AnswerC

RAG grounds the model's output in retrieved facts, improving accuracy.

Why this answer

Retrieval Augmented Generation (RAG) is the correct approach because it grounds the model's output in verifiable, external data sources. By retrieving product facts from a database in real-time, the system ensures that the generated descriptions are based on accurate information, directly mitigating the risk of hallucination. This method combines the generative power of an LLM with a retrieval step that provides factual context, making it ideal for applications where precision is critical.

Exam trap

Google Cloud often tests the misconception that detailed prompting alone (Option D) is sufficient to guarantee factual accuracy, when in reality, without external knowledge retrieval, the model can still generate plausible but incorrect information.

How to eliminate wrong answers

Option A is wrong because code generation models are designed to produce structured code or data formats, not to ensure factual accuracy in natural language product descriptions; they lack a retrieval mechanism to verify facts. Option B is wrong because fine-tuning on all product descriptions using supervised learning can embed training data biases and does not prevent hallucination on unseen or updated product facts; it also requires extensive labeled data and retraining for each change. Option D is wrong because while detailed prompt instructions can guide the model, they do not provide a mechanism to access or verify external facts; the model may still hallucinate based on its parametric knowledge, which can be outdated or incorrect.

Full explanation →

153

MCQmedium

A company needs to fine-tune a foundation model on Vertex AI for a custom text classification task with only 500 labeled examples. They want to minimize cost while achieving high accuracy. What is the MOST cost-effective approach?

A.Fine-tune the foundation model using full fine-tuning on the entire dataset.

B.Use model distillation to train a smaller student model.

C.Use Vertex AI LLM-based evaluation to compare multiple large models and select the best one.

D.Design prompts with few-shot examples and test it with the available data.

AnswerD

Prompt engineering with few-shot examples is low-cost and effective for small datasets.

Why this answer

Option B is correct because with a small dataset, prompt engineering and few-shot examples are often sufficient and much cheaper than fine-tuning. Fine-tuning (C) risks overfitting and higher cost. Model distillation (D) requires a large teacher model.

Evaluation (A) is not a training method.

Full explanation →

154

MCQmedium

A retail company wants to use GenAI to generate product descriptions. They have a small team of data scientists. What is the most efficient approach?

A.Collect more data for several months before starting

B.Train a model from scratch using their product data

C.Use a foundation model API with prompt engineering and few-shot examples

D.Buy a proprietary model from a startup

AnswerC

A foundation model API provides high-quality output with minimal effort; prompt engineering tailors it to product descriptions.

Why this answer

Option C is correct because using a foundation model API with prompt engineering and few-shot examples is the most efficient approach for a small team. It leverages pre-trained models (e.g., GPT-4, Claude) via API calls, requiring no infrastructure or training data, while prompt engineering and few-shot examples allow the model to adapt to the company's product catalog with minimal effort and cost.

Exam trap

Google Cloud often tests the misconception that more data or custom training is always better, but the trap here is that candidates overlook the efficiency and sufficiency of foundation model APIs with prompt engineering for small teams with limited data and compute resources.

How to eliminate wrong answers

Option A is wrong because collecting more data for several months delays deployment unnecessarily; foundation models already have broad language understanding and can generate product descriptions with minimal domain-specific data via few-shot prompting. Option B is wrong because training a model from scratch is computationally expensive, requires large labeled datasets, and demands deep ML expertise, which is inefficient for a small team with limited resources. Option D is wrong because buying a proprietary model from a startup introduces vendor lock-in, potential licensing costs, and may not offer the flexibility or rapid iteration that API-based foundation models provide.

Full explanation →

155

MCQhard

A developer is using Vertex AI Generative AI Studio to fine-tune a PaLM 2 model for code generation. After training, they notice the model generates plausible but incorrect code. What is the most likely cause?

A.Overfitting to training data

B.Insufficient training steps

C.Hallucination due to lack of grounding

D.Prompt format mismatch

AnswerA

Overfitting leads to memorization of training data, including mistakes, reducing generalization.

Why this answer

When a PaLM 2 model generates plausible but incorrect code after fine-tuning, the most likely cause is overfitting to the training data. Overfitting occurs when the model memorizes specific code patterns, syntax, or even bugs from the fine-tuning dataset rather than learning generalizable programming logic. This results in outputs that look syntactically correct and contextually relevant but fail to execute properly or solve the intended problem, because the model has not learned the underlying algorithmic principles.

Exam trap

The trap here is that candidates confuse 'plausible but incorrect code' with hallucination (Option C), but hallucination in code generation typically produces non-existent functions or libraries, whereas overfitting produces code that is syntactically valid and uses real functions but contains logical errors learned from the training data.

How to eliminate wrong answers

Option B is wrong because insufficient training steps typically lead to underfitting, where the model fails to capture even basic patterns from the training data, resulting in incoherent or irrelevant code—not plausible but incorrect code. Option C is wrong because hallucination due to lack of grounding is a phenomenon more associated with factual inaccuracies in text generation (e.g., inventing API names or libraries), not with generating syntactically valid but logically flawed code; fine-tuning on code data directly addresses grounding. Option D is wrong because prompt format mismatch would cause the model to misinterpret the input structure or produce outputs in the wrong format, not generate code that appears correct but is functionally wrong.

Full explanation →

156

MCQhard

You are a Generative AI architect at a large financial services firm. The firm has deployed a custom large language model (LLM) fine-tuned on proprietary financial reports to assist analysts in generating quarterly earnings summaries. The model is hosted on Vertex AI using a dedicated endpoint with autoscaling enabled. Recently, the model's output has exhibited two issues: (1) occasional factual inaccuracies about specific financial figures, and (2) a tendency to produce overly verbose and repetitive text in the summaries, sometimes exceeding the desired length of 200 words. The team has already tried adjusting the temperature parameter from 0.7 to 0.2 and increased the top-k sampling from 40 to 50, but the problems persist. The model's training data includes over 10,000 financial reports, and the fine-tuning process used low-rank adaptation (LoRA) with rank 16. The production environment uses a batch size of 1 for inference. You need to recommend a course of action that most directly addresses both the factual accuracy and verbosity issues without requiring a full retraining of the model. Which approach should you take?

A.Increase the LoRA rank to 32 and fine-tune the model for additional epochs on a curated subset of reports that focus on concise and accurate summaries.

B.Implement a retrieval-augmented generation (RAG) pipeline that queries a vector database of verified financial data, and apply constrained decoding with a maximum token limit and a repetition penalty.

C.Switch to a larger pre-trained model (e.g., PaLM 2 or GPT-4) and use the same fine-tuning data with higher rank LoRA to improve capability, then rely on the larger model's inherent accuracy.

D.Experiment with higher temperature (e.g., 0.9) and lower top-k (e.g., 20) to encourage more diverse and concise outputs, and add a post-processing step to truncate summaries to 200 words.

AnswerB

This directly improves factual accuracy by grounding outputs in retrieved evidence and reduces verbosity through decoding constraints, without retraining.

Why this answer

Option B is correct because it directly addresses both issues without retraining. A RAG pipeline grounds the model's outputs in verified financial data, eliminating factual inaccuracies. Constrained decoding with a maximum token limit and repetition penalty directly curbs verbosity and repetition, which temperature and top-k adjustments failed to fix.

Exam trap

Google Cloud often tests the misconception that adjusting hyperparameters like temperature or top-k can fix factual accuracy and verbosity, when in reality these issues stem from the model's lack of external knowledge and lack of output constraints, which require architectural changes like RAG and constrained decoding.

How to eliminate wrong answers

Option A is wrong because increasing LoRA rank and fine-tuning on a curated subset still relies on the model's parametric memory, which is prone to hallucination and does not guarantee factual accuracy; it also requires retraining, contradicting the 'no full retraining' constraint. Option C is wrong because switching to a larger model does not inherently solve factual inaccuracies (larger models can still hallucinate) and requires full retraining or significant adaptation, violating the constraint. Option D is wrong because higher temperature (0.9) increases randomness, likely worsening factual inaccuracies, and lower top-k (20) reduces diversity, which may not fix verbosity; post-processing truncation does not address the root cause of repetition or inaccuracy.

Full explanation →

157

MCQhard

You are a data scientist at a financial institution. You are using Vertex AI to fine-tune a large language model (LLM) for generating financial reports. You have prepared a dataset of 10,000 examples. During fine-tuning, you notice that the training loss is decreasing steadily, but the validation loss is increasing after 5 epochs. The model's generated reports on the validation set contain many factual errors and nonsensical statements. You suspect overfitting. You have limited compute budget and need to improve generalization. What should you do?

A.Increase the learning rate

B.Increase the number of training epochs to 20

C.Add more training examples from a public dataset

D.Implement early stopping with a patience of 2 epochs

AnswerD

Early stopping prevents overfitting.

Why this answer

Early stopping with a patience of 2 epochs is the correct approach because it directly addresses overfitting by halting training when the validation loss fails to improve for a specified number of epochs. This preserves the model's generalization ability without requiring additional compute or data, which aligns with the limited budget constraint. In Vertex AI, early stopping is a built-in hyperparameter tuning strategy that monitors validation metrics and stops the job to prevent further degradation.

Exam trap

The trap here is that candidates often confuse overfitting with underfitting and choose to add more data or increase epochs, failing to recognize that the validation loss increasing while training loss decreases is the classic sign of overfitting, which requires a regularization technique like early stopping.

How to eliminate wrong answers

Option A is wrong because increasing the learning rate would make the optimizer take larger steps, which can cause the loss to diverge or oscillate, worsening the overfitting and factual errors. Option B is wrong because increasing the number of training epochs to 20 would continue training on the same data, likely exacerbating overfitting as the validation loss is already increasing after 5 epochs. Option C is wrong because adding more training examples from a public dataset may introduce domain mismatch or noise, and it does not address the immediate overfitting issue; it also requires additional compute and data curation resources, contradicting the limited budget constraint.

Full explanation →

158

MCQmedium

Refer to the exhibit. You ran the gcloud command to list a model, but received this error. What is the most likely issue?

A.The model's artifact URI is wrong

B.The model wasn't uploaded correctly

C.The model is missing a serving container

D.The model is in a different project

AnswerC

The error explicitly says 'no serving container image'.

Why this answer

Option B is correct because the error clearly states the model has no serving container image. Option A is wrong because a failed upload would show a different error. Option C is wrong because artifact URI issues would show a different message.

Option D is wrong because cross-project listing would require additional flags.

Full explanation →

159

Multi-Selectmedium

Which TWO techniques effectively reduce bias in generative model outputs? (Choose two.)

Select 2 answers

A.Apply adversarial debiasing during training or fine-tuning.

B.Increase the temperature parameter to introduce more variability.

C.Use a larger model with more parameters.

D.Fine-tune on a dataset with balanced representation across groups.

E.Reduce max output tokens to limit the model's expression.

AnswersA, D

Adversarial methods train the model to ignore protected attributes, reducing bias.

Why this answer

Options A and C are correct. A: fine-tuning on a balanced dataset reduces sampling bias in training data. C: adversarial debiasing actively adjusts the model to remove spurious correlations.

B is wrong because increasing temperature adds randomness without addressing bias. D is wrong because larger models may amplify biases. E is wrong because reducing max tokens does not affect bias.

Full explanation →

160

MCQhard

A media company wants to build a multi-modal generative app that accepts text, image, and video inputs and produces summaries. The app must handle variable-length videos up to 10 minutes. Which architecture is most scalable and cost-effective?

A.Use a pipeline to split videos into short clips, extract key frames, and process with Gemini 1.5 Pro (with context caching) to generate summaries.

B.Use Video Intelligence API to generate video captions, then feed captions to a text model.

C.Convert all inputs to text descriptions and use a text-only model.

D.Deploy a single Vertex AI endpoint with a model that can ingest multi-modal data directly.

AnswerA

B is correct because it handles variable-length content efficiently within model limits.

Why this answer

Option A is correct because splitting videos into short clips and extracting key frames reduces the computational load and token usage, while Gemini 1.5 Pro's context caching efficiently handles variable-length videos up to 10 minutes by reusing processed context across requests. This approach balances scalability (by avoiding processing entire videos at once) and cost-effectiveness (by minimizing API calls and storage), making it ideal for a multi-modal summarization app.

Exam trap

Google Cloud often tests the misconception that a single multi-modal endpoint is inherently scalable, but the trap here is that direct ingestion of raw video without preprocessing (like key frame extraction) leads to prohibitive token costs and latency, making pipeline-based approaches with caching more practical for variable-length inputs.

How to eliminate wrong answers

Option B is wrong because using Video Intelligence API to generate captions and then feeding them to a text model loses visual and temporal information from the video, such as scene transitions and non-verbal cues, which degrades summary quality. Option C is wrong because converting all inputs (images, videos) to text descriptions discards multi-modal richness, forcing a text-only model to infer visual details, which is inaccurate and inefficient for variable-length videos. Option D is wrong because deploying a single Vertex AI endpoint with a multi-modal model directly ingesting raw data would be computationally expensive and unscalable for 10-minute videos, as it requires processing every frame without optimization, leading to high latency and cost.

Full explanation →

161

Multi-Selectmedium

What are THREE best practices for responsible generative AI deployment?

Select 3 answers

A.Monitor model performance and data drift over time

B.Maximize model size for best accuracy

C.Maintain human oversight for critical decisions

D.Implement content filters to block harmful or biased outputs

E.Avoid fine-tuning the model to preserve original capabilities

AnswersA, C, D

Continuous monitoring helps detect degradation and ensures the model remains reliable.

Why this answer

Option A is correct because continuous monitoring of model performance and data drift is essential for maintaining the reliability and safety of generative AI systems. Data drift occurs when the statistical properties of input data change over time, which can degrade model accuracy and introduce unintended biases. Regular monitoring allows teams to detect these shifts early and retrain or adjust the model to sustain responsible behavior.

Exam trap

Google Cloud often tests the misconception that bigger models are always better, but the trap here is that responsible AI deployment focuses on safety, fairness, and reliability rather than raw performance metrics like model size.

Full explanation →

162

Multi-Selectmedium

Which TWO actions are recommended best practices for cost optimization when deploying generative AI models on Vertex AI?

Select 2 answers

A.Use batch prediction for non-real-time workloads

B.Set up autoscaling with a minimum number of replicas to avoid excessive scaling

C.Deploy the model in a single region to reduce network costs

D.Store all model prediction logs indefinitely for auditing

E.Always use GPU instances for inference

AnswersA, B

Batch prediction uses preemptible VMs, reducing cost.

Why this answer

Option A is correct because batch prediction processes predictions asynchronously in large batches, which is significantly more cost-effective than online (real-time) prediction for workloads that do not require immediate responses. Vertex AI batch prediction jobs automatically scale down to zero when not in use, eliminating idle compute costs, and you only pay for the resources consumed during the job execution.

Exam trap

Google Cloud often tests the misconception that single-region deployment always reduces costs, when in reality it can increase network egress charges and latency penalties for global users, making multi-region strategies with traffic management more cost-effective.

Full explanation →

163

MCQeasy

An engineer is testing a generative AI application using the Gemini API. They receive a 400 error with message 'INVALID_ARGUMENT: text has been blocked.' What is the most likely cause?

A.The specified Gemini model version does not exist

B.The input text was flagged by a safety filter

C.The API key is invalid

D.The API quota has been exceeded

AnswerB

Safety filters block inappropriate content and return 400 with blocked message.

Why this answer

Option B is correct because the error indicates the input text triggered a safety filter. Option A (quota exceeded) would give 429. Option C (authentication) gives 401/403.

Option D (model not found) gives 404.

Full explanation →

164

MCQmedium

A team set a budget alert for their GenAI API usage at $10,000. They received the alert with current spend of $12,500. Which business action is most appropriate as a first step?

A.Pause all non-critical use cases immediately

B.Switch to a cheaper model provider

C.Review usage patterns and optimize prompt lengths and frequencies

D.Increase the budget by 50% to $15,000

AnswerC

Optimizing usage is the most cost-effective first step; it can reduce consumption without disrupting operations.

Why this answer

Option C is correct because the first step in responding to a budget overrun should be to analyze usage patterns and optimize prompt lengths and frequencies. This approach identifies inefficiencies (e.g., unnecessarily verbose prompts, excessive retries) that directly reduce token consumption and cost without disrupting critical operations. It aligns with the principle of cost optimization before making architectural or policy changes.

Exam trap

Google Cloud often tests the misconception that immediate cost-cutting actions (like pausing or switching models) are the best first step, when in fact data-driven analysis and optimization should precede any operational or financial changes.

How to eliminate wrong answers

Option A is wrong because pausing all non-critical use cases is a reactive, blunt measure that may disrupt business processes and does not address the root cause of cost overruns; it should be considered only after analysis shows specific non-critical usage is the primary driver. Option B is wrong because switching to a cheaper model provider without understanding current usage patterns risks degrading output quality or compatibility, and may not address inefficiencies like prompt bloat or high-frequency calls. Option D is wrong because increasing the budget without investigating the overrun ignores the underlying issue and can lead to uncontrolled spending; it is a financial workaround, not a cost management strategy.

Full explanation →

165

MCQhard

A financial services firm is deploying a generative AI chatbot for customer inquiries. They have strict compliance requirements: all conversations must be auditable and the model must not use customer data for training. Which Google Cloud offering should they choose?

A.Private Google Access for on-premises connectivity

B.Dialogflow CX with Cloud Logging

C.Cloud AI Platform Pipelines

D.Vertex AI Agent Builder with data governance controls

AnswerD

Vertex AI Agent Builder offers built-in audit logging and data governance to meet compliance requirements.

Why this answer

Vertex AI Agent Builder is correct because it provides built-in data governance controls that prevent customer data from being used for model training, while also supporting full auditability through integration with Cloud Audit Logs and Cloud Logging. This directly addresses the firm's compliance requirements for auditable conversations and data privacy.

Exam trap

The trap here is that candidates may confuse Dialogflow CX (a conversational AI platform) with Vertex AI Agent Builder, not realizing that Dialogflow CX lacks the native data governance controls to prevent customer data from being used for model training, which is the key differentiator for compliance-heavy use cases.

How to eliminate wrong answers

Option A is wrong because Private Google Access is a networking feature that enables on-premises hosts to reach Google APIs over internal IP addresses, but it does not provide any chatbot functionality, audit logging, or data governance controls. Option B is wrong because Dialogflow CX with Cloud Logging provides conversational AI and audit logging, but it lacks the specific data governance controls to prevent customer data from being used for model training, which is a critical compliance requirement. Option C is wrong because Cloud AI Platform Pipelines is a workflow orchestration service for ML pipelines, not a chatbot deployment solution, and it does not offer the required auditability or data governance for customer conversations.

Full explanation →

166

MCQeasy

A team wants to fine-tune a PaLM 2 model with their own data on Vertex AI. What is the recommended way to prepare the training data?

A.TFRecord files

B.JSON Lines file with 'input_text' and 'output_text' keys

C.CSV file with prompt and completion columns

D.Pickle serialized objects

AnswerB

JSONL with the correct keys is required.

Why this answer

Fine-tuning for PaLM expects data in JSON Lines format with 'input_text' and 'output_text' fields.

Full explanation →

167

Multi-Selectmedium

A company is evaluating Google Cloud's generative AI offerings for enterprise use. Which TWO considerations are most important when selecting the right model deployment option?

Select 2 answers

A.Data residency

B.Developer preference

C.Model size

D.Latency requirements

E.Training time

AnswersA, D

Data residency is often a legal or compliance requirement.

Why this answer

Latency requirements and data residency are critical enterprise considerations. Model size and training time are secondary, and developer preference is not a primary factor.

Full explanation →

168

Multi-Selectmedium

What are THREE benefits of using embedding models in a Retrieval Augmented Generation (RAG) system?

Select 3 answers

A.They compress text into dense vectors for efficient retrieval.

B.They allow the model to generate new training data automatically.

C.They enable semantic similarity search beyond keyword matching.

D.They reduce the need for fine-tuning the generator model.

E.They provide deterministic outputs for the same query.

AnswersA, C, D

Vectors allow fast similarity search in vector databases.

Why this answer

Option A is correct because embedding models convert text into dense vector representations that capture semantic meaning, enabling efficient similarity search in vector databases. This compression reduces the dimensionality of the data, allowing the RAG system to quickly retrieve the most relevant documents from a large corpus based on vector distance metrics like cosine similarity.

Exam trap

Google Cloud often tests the misconception that embedding models are used for generating training data or ensuring deterministic outputs, when in fact their primary role is semantic compression and similarity-based retrieval, while output determinism is controlled by the generator model's parameters, not the embedding model.

Full explanation →

169

MCQmedium

A healthcare company is building a chatbot to answer patient queries using Vertex AI Agent Builder. They want to ensure the chatbot only uses approved medical references and does not generate unverified advice. How should they configure the agent?

A.Set up grounding with a private data store containing verified medical documents

B.Enable strict safety filters to block any medical advice

C.Increase the temperature parameter to get more diverse responses

D.Use Vertex AI Model Monitoring to track answer accuracy

AnswerA

Grounding restricts responses to the provided data store, ensuring only approved references are used.

Why this answer

Grounding with a curated data store ensures the chatbot only retrieves information from approved sources. Option B is wrong because safety filters block categories but not unverified content. Option C is wrong because high temperature increases creativity, risking unverified answers.

Option D is wrong because model monitoring detects drift but does not restrict sources.

Full explanation →

170

Multi-Selecteasy

Which TWO statements are true about generative AI models?

Select 2 answers

A.They are typically pre-trained on large datasets.

B.They are deterministic by design.

C.They always produce the same output for the same input.

D.They can generate new content not seen in training.

E.They require no data for training.

AnswersA, D

Pre-training on large corpora is standard.

Why this answer

Option A is correct because generative AI models, such as GPT-4 or DALL-E, are typically pre-trained on vast, diverse datasets (e.g., terabytes of text or images) using unsupervised or self-supervised learning. This pre-training phase allows the model to learn statistical patterns, grammar, and world knowledge, which is then fine-tuned for specific tasks. Without this large-scale pre-training, the model would lack the foundational understanding needed to generate coherent and contextually relevant outputs.

Exam trap

Google Cloud often tests the misconception that generative AI models are deterministic and always produce the same output for the same input, when in fact they are probabilistic by design, especially at non-zero temperature settings.

Full explanation →

171

MCQhard

A company is deploying a Gemini 1.0 Ultra model for a code generation assistant. They have set up Vertex AI Model Evaluation with a custom evaluation dataset to measure pass@1 accuracy. The initial evaluation shows 65% pass@1. They want to improve to 80% without collecting more training data. They have already attempted basic prompt engineering (e.g., 'write correct code') with limited improvement. Which approach is most likely to achieve the desired improvement?

A.Reduce the temperature to 0 and set top_p to 1.

B.Increase the number of output tokens and enable beam search with width 4.

C.Use chain-of-thought prompting with few-shot examples of correct code generation.

D.Apply reinforcement learning from human feedback (RLHF) using a reward model trained on the existing evaluation dataset.

AnswerC

Chain-of-thought elicits reasoning steps, improving accuracy beyond basic prompting.

Why this answer

Chain-of-thought prompting with few-shot examples is the most effective approach because it guides the model through step-by-step reasoning, which is critical for complex code generation tasks. This technique leverages the model's in-context learning ability to improve accuracy without additional training data, directly addressing the need to boost pass@1 from 65% to 80%.

Exam trap

Google Cloud often tests the misconception that hyperparameter tuning (like temperature or beam search) can substitute for structured prompting techniques, when in reality, chain-of-thought prompting directly addresses the reasoning gap that limits pass@1 accuracy in code generation.

How to eliminate wrong answers

Option A is wrong because reducing temperature to 0 and setting top_p to 1 makes the model deterministic, which may reduce diversity but does not inherently improve correctness for complex code generation; it can even cause repetitive or suboptimal outputs. Option B is wrong because increasing output tokens and enabling beam search with width 4 can improve exploration but does not guarantee higher pass@1 accuracy; beam search is more suited for tasks like translation and may not align with the goal of generating a single correct code snippet. Option D is wrong because applying RLHF requires a reward model trained on human preferences, not just the existing evaluation dataset, and this approach demands significant additional data and computational resources, contradicting the constraint of not collecting more training data.

Full explanation →

172

MCQhard

A machine learning engineer is defining a Vertex AI pipeline for model evaluation using the JSON representation shown. The pipeline fails with an error that the 'eval_dataset' parameter is missing. What is the issue?

A.The component 'comp-model-eval' does not accept 'eval_dataset' as input

B.The 'project' parameter should be a pipeline input, not a constant

C.The runtimeConfig parameter values must be strings, not references

D.The pipeline spec does not declare 'eval_dataset' as a pipeline input parameter

AnswerD

The root inputDefinitions is empty, so 'eval_dataset' is not recognized as a pipeline parameter.

Why this answer

The pipeline spec defines 'eval_dataset' as a componentInput, but it is not defined in the root's inputDefinitions (option B). The runtimeConfig has the value, but the pipeline spec does not declare the parameter. The component (A) may be fine.

The constant (C) is for project. The runtimeConfig (D) is correct but the spec is missing the input definition.

Full explanation →

173

MCQmedium

A team is using a pre-trained language model to summarize legal documents. They find that summaries often miss key dates and parties involved. Which technique would most effectively improve factual accuracy?

A.Fine-tune the model on a dataset of legal summaries with annotated key entities.

B.Use top-p sampling with a low p value.

C.Increase the temperature parameter.

D.Use chain-of-thought prompting.

AnswerA

Fine-tuning adapts the model to domain-specific requirements, improving factual accuracy.

Why this answer

Fine-tuning on a dataset of legal summaries with annotated key entities directly teaches the model to recognize and reproduce critical factual elements like dates and parties. This supervised learning approach adjusts the model's weights to prioritize entity extraction and accurate generation, which is the most effective method for improving factual accuracy in domain-specific tasks.

Exam trap

Google Cloud often tests the misconception that inference-time parameters (temperature, top-p) or prompting strategies can substitute for targeted training, when in fact only fine-tuning with domain-specific annotated data reliably improves factual accuracy for structured entities.

How to eliminate wrong answers

Option B is wrong because top-p sampling with a low p value restricts the vocabulary to a small set of high-probability tokens, which can reduce creativity but does not address factual accuracy or entity recall—it may even omit rare but important entities. Option C is wrong because increasing the temperature parameter adds randomness to token selection, which typically reduces factual consistency and can lead to hallucinated or missing details. Option D is wrong because chain-of-thought prompting improves reasoning steps for multi-step tasks but does not inherently enforce factual accuracy for specific entities; it relies on the model's existing knowledge, which may still miss key dates and parties without targeted training.

Full explanation →

174

MCQeasy

A startup is developing a customer support chatbot using Vertex AI PaLM 2 API. They notice that the model sometimes generates plausible-sounding but factually incorrect information about company policies. The chatbot currently uses no external data. To reduce these hallucinations without retraining the model, the team needs a solution that can be implemented quickly and maintains low latency. They have access to the company's internal policy database stored in Cloud SQL. Which approach should they take?

A.Fine-tune the PaLM 2 model on a dataset of company policy documents.

B.Implement grounding by connecting the model to the company's policy database using Vertex AI Grounding.

C.Reduce the temperature parameter to 0 and increase top_k to 50.

D.Use prompt engineering to instruct the model to only answer from its internal knowledge.

AnswerB

Grounding directly ties responses to verified data, reducing hallucinations effectively.

Why this answer

Option B is correct because Vertex AI Grounding connects the PaLM 2 model to the company's policy database in Cloud SQL, allowing the model to retrieve and cite factual information in real time. This approach reduces hallucinations without retraining, meets the low-latency requirement, and leverages existing internal data. Grounding works by augmenting the prompt with retrieved context from the grounding source, ensuring responses are factually grounded.

Exam trap

Google Cloud often tests the misconception that adjusting sampling parameters (temperature, top_k) can fix factual inaccuracies, when in reality those parameters only control creativity and randomness, not knowledge grounding.

How to eliminate wrong answers

Option A is wrong because fine-tuning requires a labeled dataset and retraining, which is time-consuming and does not meet the 'quickly implemented' requirement; it also does not guarantee real-time factual accuracy for dynamic policies. Option C is wrong because reducing temperature to 0 and increasing top_k to 50 only affects output randomness and diversity, not factual grounding—hallucinations stem from lack of external knowledge, not sampling parameters. Option D is wrong because prompt engineering alone cannot prevent the model from generating plausible-sounding falsehoods; the model's internal knowledge is static and may be outdated or incorrect, and it cannot access the Cloud SQL database without a retrieval mechanism.

Full explanation →

175

MCQhard

A company wants to ensure only authorized users can deploy gen AI models. The current policy allows all users in the domain. What is the best practice to restrict deployment?

A.Remove the binding

B.Add more roles

C.Add condition to restrict deployment

D.Use organizational policies

AnswerC

Conditions in IAM allow policies like requiring a specific IP range or MFA for deployment actions.

Why this answer

Option C is correct because adding a condition to restrict deployment (e.g., using IAM conditions in AWS, conditional access policies in Azure, or attribute-based access control in GCP) allows you to limit model deployment to only authorized users based on attributes like user role, project, or resource tags. This is the best practice because it enforces fine-grained access control without removing existing permissions or adding unnecessary roles, directly addressing the requirement to restrict deployment while maintaining existing user access.

Exam trap

Google Cloud often tests the misconception that organizational policies (Option D) are the catch-all for access control, but they are designed for resource-level governance (e.g., disabling service creation), not for user-specific deployment restrictions, which require conditional IAM policies.

How to eliminate wrong answers

Option A is wrong because removing the binding (e.g., an IAM policy binding or role assignment) would revoke all deployment permissions for all users, which is too restrictive and would break legitimate use cases. Option B is wrong because adding more roles does not inherently restrict deployment; it only grants additional permissions, potentially widening the attack surface and violating the principle of least privilege. Option D is wrong because organizational policies (e.g., organization policies in GCP or Azure Policy) are typically used for compliance and governance at the resource hierarchy level, not for fine-grained, user-specific deployment restrictions; they lack the granularity to target individual authorized users.

Full explanation →

176

MCQmedium

An organization uses an IAM policy for Vertex AI as shown. A security audit reveals that engineer@example.com deployed a model that inadvertently exposed sensitive data. What is the most likely reason this happened?

A.Audit logging is not enabled for DATA_WRITE events.

B.The admin user did not review the deployment.

C.The engineer had the aiplatform.user role, which includes permissions to deploy models without additional review.

D.The policy does not include a separation of duties between development and production.

AnswerC

The user role allows deployment, and no approval gate is enforced.

Why this answer

Option C is correct because the `aiplatform.user` role in Vertex AI includes the `aiplatform.model.deploy` permission, which allows any user with that role to deploy models without requiring additional approvals or administrative review. This lack of a secondary authorization step means the engineer could deploy a model that exposed sensitive data, even if the model had not been properly vetted for data leakage.

Exam trap

The trap here is that candidates may focus on operational failures like missing audit logs or lack of review, rather than recognizing that the IAM role itself grants the permission to deploy without any guardrails, which is the direct technical cause of the exposure.

How to eliminate wrong answers

Option A is wrong because audit logging for DATA_WRITE events records actions after they occur, but does not prevent the deployment itself; the exposure happened due to insufficient permissions control, not missing logs. Option B is wrong because the admin user not reviewing the deployment is a process failure, but the root cause is that the IAM policy granted the engineer the ability to deploy without any review being required. Option D is wrong because while separation of duties is a best practice, the specific IAM policy shown does not enforce it; the question asks for the most likely reason the exposure occurred, which is the direct permission granted by the `aiplatform.user` role.

Full explanation →

177

MCQmedium

A team is building a generative AI model for customer support. They notice the model often produces overly polite but unhelpful responses. Which technique would best improve response quality without sacrificing helpfulness?

A.Apply reinforcement learning from human feedback (RLHF)

B.Increase the amount of training data

C.Lower the top_k sampling value

D.Increase the temperature parameter

AnswerA

RLHF tunes the model to align with desired response characteristics.

Why this answer

RLHF directly addresses the misalignment between the model's training objective (e.g., predicting the next token) and the desired outcome (helpful, not just polite). By using human feedback to train a reward model, the system learns to optimize for response quality and helpfulness, reducing sycophantic or overly polite but uninformative outputs.

Exam trap

Google Cloud often tests the misconception that hyperparameter tuning (temperature, top_k) or more data alone can fix alignment issues, when in fact only RLHF directly optimizes for human-judged helpfulness and quality.

How to eliminate wrong answers

Option B is wrong because simply increasing training data does not correct the model's tendency toward polite but unhelpful responses; it may reinforce existing patterns without addressing alignment. Option C is wrong because lowering top_k sampling reduces diversity by restricting token choices to the top k most likely tokens, which can make responses even more generic and less helpful, not more substantive. Option D is wrong because increasing the temperature parameter increases randomness in token selection, which can lead to less coherent or more erratic responses, not more helpful ones.

Full explanation →

178

Multi-Selecteasy

Which THREE components are core to a typical Retrieval Augmented Generation (RAG) system?

Select 3 answers

A.Classifier

B.Vector database

C.Embedding model

D.Rewriter

E.Large language model

AnswersB, C, E

Stores embeddings and enables similarity search.

Why this answer

A vector database (B) is core to a RAG system because it stores and indexes embeddings of external knowledge chunks, enabling efficient similarity search to retrieve the most relevant context for a user query. This retrieved context is then provided to the LLM to ground its response in factual data, reducing hallucinations and improving accuracy.

Exam trap

Google Cloud often tests the distinction between core mandatory components (embedding model, vector DB, LLM) and optional auxiliary components (classifier, rewriter, reranker) to see if candidates understand the minimal viable RAG architecture versus extended pipelines.

Full explanation →

179

Multi-Selecteasy

Which TWO techniques are commonly used to control the style and tone of a generative model's output?

Select 2 answers

A.Adjusting the temperature

B.Modifying the top_k value

C.Fine-tuning on a dataset with desired style

D.Prompt engineering with style instructions

E.Changing the top_p value

AnswersC, D

Fine-tuning adapts the model to a specific style.

Why this answer

Option C is correct because fine-tuning on a dataset that embodies the desired style directly adjusts the model's weights, making it consistently produce outputs with that specific tone and style. This is a fundamental technique for customizing generative models, as it teaches the model the exact patterns, vocabulary, and stylistic nuances present in the training data.

Exam trap

Google Cloud often tests the distinction between sampling parameters (temperature, top_k, top_p) that control output randomness and diversity versus training or conditioning techniques (fine-tuning, prompt engineering) that directly influence style and tone, leading candidates to incorrectly select sampling parameters as style-control methods.

Full explanation →

180

MCQeasy

A developer is configuring a Vertex AI Agent Builder agent to use grounding. They receive the above error when calling the API. What is the most likely cause?

A.The data store was just created and is not yet propagated

B.The agent is not authenticated to access the data store

C.The data store has not been created in the specified project

D.The grounding configuration is missing required permissions

AnswerC

The error indicates the referenced data store does not exist; it needs to be created first.

Why this answer

The error says the data store does not exist. The developer likely created a data store in a different project or did not create one. Option A is wrong because authentication errors give different messages.

Option C is wrong because permissions errors are 403. Option D is wrong because a race condition would not produce 'does not exist'.

Full explanation →

181

MCQeasy

A small marketing agency with 10 employees is exploring generative AI to create personalized ad copy for their clients. They have a limited budget of $5,000 per month and no in-house machine learning expertise. The CEO wants to have a working prototype within two weeks to show to a potential client. The agency's data is sensitive and cannot be shared with unauthorized third parties. Which strategy should they pursue?

A.Hire a team of data scientists to fine-tune an open-source model

B.Use a third-party platform that requires on-premise deployment

C.Build a custom foundation model from scratch using their client data

D.Use Google's Generative AI Studio with pre-trained models via API

AnswerD

Managed service enables quick, low-cost prototyping with data privacy.

Why this answer

Option D is correct because Google's Generative AI Studio provides pre-trained models via API, allowing the agency to quickly prototype personalized ad copy without needing in-house ML expertise. This approach respects the $5,000 budget (API usage is cost-effective for small-scale prototyping), meets the two-week timeline (no training required), and ensures data privacy by using Google Cloud's data governance controls (data is not shared with unauthorized third parties).

Exam trap

Google Cloud often tests the misconception that building or fine-tuning a model from scratch is the only way to achieve customization, when in fact pre-trained APIs with prompt engineering or lightweight fine-tuning can meet business constraints like budget, timeline, and expertise.

How to eliminate wrong answers

Option A is wrong because hiring a team of data scientists to fine-tune an open-source model would exceed the $5,000 monthly budget and the two-week timeline, and the agency lacks the in-house expertise to manage such a team. Option B is wrong because requiring on-premise deployment contradicts the agency's lack of ML expertise and limited budget; on-premise solutions typically involve high upfront costs and ongoing maintenance. Option C is wrong because building a custom foundation model from scratch is prohibitively expensive (often millions of dollars), requires vast amounts of data and compute resources, and cannot be completed within two weeks or within a $5,000 budget.

Full explanation →

182

MCQmedium

A security team wants to prevent prompt injection attacks on their generative AI application hosted on Vertex AI. Which best practice should they implement?

A.Use a custom model instead of a foundation model

B.Disable all logging

C.Use a private endpoint

D.Implement input validation and output filtering

AnswerD

This helps detect and block malicious prompts and undesired outputs.

Why this answer

Input validation and output filtering are key defenses against prompt injection. Custom models do not inherently prevent it; disabling logging reduces visibility; private endpoints do not block injection.

Full explanation →

183

Multi-Selectmedium

Which TWO techniques are most effective for improving the quality of a generative AI model's output when summarizing complex documents?

Select 2 answers

A.Providing few-shot examples of ideal summaries

B.Using a larger, more capable model (e.g., PaLM 2 instead of PaLM)

C.Increasing max output length significantly

D.Setting top_p to 0.1

E.Adjusting temperature to 0.8

AnswersA, B

Few-shot examples teach the model the desired output structure and detail.

Why this answer

Options B and D are correct. Few-shot examples guide the model to desired output format and consistency. Using a larger, more capable model often yields better summaries due to deeper language understanding.

Option A (temperature adjustment) is less critical for summaries. Option C (max output length) affects length, not quality. Option E (low top_p) may restrict output too much.

Full explanation →

184

Multi-Selecthard

A company deploys a Gemini model on Vertex AI for a customer-facing chatbot. They observe the chatbot occasionally produces toxic language. Which TWO measures should they implement immediately to reduce toxic outputs?

Select 2 answers

A.Increase the model's temperature to make outputs more conservative.

B.Use a separate language model to rephrase the outputs before sending to users.

C.Fine-tune the model on a curated dataset of polite conversations.

D.Enable the 'block offensive content' flag in the model's safety configuration.

E.Configure the safety thresholds in the Vertex AI endpoint deployment to block hate speech and toxic content.

AnswersD, E

This flag directly enables content filtering.

Why this answer

Option D is correct because enabling the 'block offensive content' flag directly activates Gemini's built-in safety filters, which are designed to detect and suppress toxic language at the model's output layer. This is an immediate, configuration-level measure that requires no additional training or external services, making it the fastest way to reduce harmful responses in a production chatbot.

Exam trap

Google Cloud often tests the misconception that increasing temperature or fine-tuning are quick fixes for safety issues, when in fact they are either counterproductive or require significant time and resources, whereas safety configuration flags are the immediate, recommended first step.

Full explanation →

185

Multi-Selectmedium

Which TWO factors are most critical when deciding to build a custom GenAI model vs. using a pre-built API? (Select two.)

Select 2 answers

A.Availability of in-house ML talent

B.Need for domain-specific knowledge

C.Number of layers in the model

D.Brand reputation of the model provider

E.Volume of expected inference requests

AnswersA, B

Building a custom model requires significant ML expertise; without it, using an API is more practical.

Why this answer

Option A is correct because building a custom GenAI model requires specialized machine learning expertise, including proficiency in frameworks like PyTorch or TensorFlow, experience with distributed training (e.g., using Horovod or DeepSpeed), and the ability to fine-tune architectures like transformers. Without in-house ML talent, the organization cannot effectively manage data curation, hyperparameter tuning, or model evaluation, making a pre-built API the more viable choice. This factor directly determines whether the organization has the technical capacity to undertake custom development.

Exam trap

Google Cloud often tests the distinction between strategic business factors (like in-house talent and domain specificity) versus operational or vendor-related details (like model layers, brand reputation, or request volume) to see if candidates can separate high-level decision drivers from low-level implementation concerns.

Full explanation →

186

MCQeasy

An organization wants to ensure their generative AI application does not produce toxic or harmful content. Which Vertex AI feature should they implement?

A.Safety filters and content moderation

B.Explainable AI

C.AutoML

D.Model Monitoring

AnswerA

These features are designed to detect and mitigate harmful content in model outputs.

Why this answer

Safety filters and content moderation in Vertex AI allow organizations to define and enforce policies that block or flag toxic, harmful, or inappropriate content generated by the model. This feature uses pre-built and customizable classifiers to evaluate prompts and responses against safety attributes (e.g., hate speech, harassment, sexually explicit content) before returning them to the user, directly addressing the requirement to prevent harmful outputs.

Exam trap

Google Cloud often tests the distinction between features that *analyze* model behavior (like Explainable AI or Model Monitoring) versus features that *actively enforce* safety policies (like Safety Filters), leading candidates to confuse monitoring or interpretability tools with content moderation controls.

How to eliminate wrong answers

Option B (Explainable AI) is wrong because it focuses on interpreting model predictions (e.g., feature attributions) rather than blocking toxic content; it provides transparency but no active content filtering. Option C (AutoML) is wrong because it automates model training and deployment for custom ML tasks, not content moderation or safety enforcement. Option D (Model Monitoring) is wrong because it tracks model performance and drift over time (e.g., prediction skew, data drift), not real-time content safety checks on individual outputs.

Full explanation →

187

MCQhard

A company wants to use generative AI for creative content generation (e.g., marketing copy). They need to ensure the content is original and does not plagiarize existing materials. Which combination of strategies is most effective?

A.Use a model with a high temperature setting and post-process with plagiarism checker.

B.Fine-tune the model on a dataset of already-created content to learn style.

C.Use a retrieval-augmented generation system that explicitly avoids copying.

D.Limit the model to generate only short snippets.

AnswerC

RAG can be configured to paraphrase or generate novel content while staying relevant, reducing plagiarism risk.

Why this answer

Option C is correct because retrieval-augmented generation (RAG) systems explicitly retrieve relevant, non-copyrighted or licensed content from a curated knowledge base and generate outputs grounded in that retrieved data, which inherently reduces the risk of verbatim copying. Unlike simple plagiarism checkers or temperature adjustments, RAG combines retrieval with generation to ensure originality by design, making it the most effective strategy for avoiding plagiarism in creative content generation.

Exam trap

Google Cloud often tests the misconception that randomness (high temperature) or post-processing (plagiarism checkers) can prevent plagiarism, when in fact only retrieval-augmented generation or similar grounding techniques address the root cause of copying from training data.

How to eliminate wrong answers

Option A is wrong because high temperature settings increase randomness and creativity but do not prevent the model from memorizing and reproducing training data verbatim; a post-process plagiarism checker can only detect copying after generation, not prevent it, and may miss paraphrased or structurally similar content. Option B is wrong because fine-tuning on already-created content teaches the model to mimic existing styles and patterns, which increases the risk of overfitting and reproducing copyrighted or plagiarized material, especially if the dataset contains protected works. Option D is wrong because limiting output length does not address the core issue of originality; short snippets can still be direct copies of existing phrases or sentences, and the strategy fails to ensure content is novel or properly attributed.

Full explanation →

188

MCQeasy

A retail company wants to build a customer service chatbot that can handle returns, order status, and FAQs. They need to integrate with their existing backend systems. Which Google Cloud service should they use?

A.Vertex AI Model Garden

B.Vertex AI Agent Builder

C.Vertex AI Search

D.Vertex AI Codey API

AnswerB

Provides tools for building chatbots with backend integration.

Why this answer

Vertex AI Agent Builder is the correct choice because it provides a low-code platform specifically designed for building conversational AI agents (chatbots) that can be integrated with enterprise backend systems via APIs, connectors, and custom tools. It supports grounding in enterprise data, multi-turn dialogue management, and seamless integration with existing systems for handling returns, order status, and FAQs, making it the most suitable service for this use case.

Exam trap

The trap here is that candidates may confuse Vertex AI Agent Builder with Vertex AI Search or Model Garden, assuming any generative AI service can build a chatbot, but only Agent Builder provides the necessary conversational orchestration and backend integration capabilities required for a production customer service chatbot.

How to eliminate wrong answers

Option A is wrong because Vertex AI Model Garden is a repository of pre-trained and foundation models for discovery and deployment, not a service for building conversational agents with backend integration. Option C is wrong because Vertex AI Search is optimized for enterprise search and information retrieval over structured and unstructured data, not for building multi-turn conversational chatbots that require backend system integration. Option D is wrong because Vertex AI Codey API is focused on code generation and code-related tasks (e.g., code completion, chat, and generation), not on building customer service chatbots that interact with backend systems.

Full explanation →

189

MCQhard

For a document summarization task, a team wants to produce concise summaries without losing key information. Which combination of techniques is most effective?

A.Use few-shot examples and reduce max output tokens

B.Set top_p to 0.5 and increase repetition penalty

C.Use prompt caching and increase batch size

D.Increase temperature and use a larger model

AnswerA

Correct: Few-shot demonstrates concise style; max tokens enforces length limit.

Why this answer

Using few-shot examples teaches the desired format, and reducing max tokens enforces conciseness. Other options either encourage verbosity or are off-target.

Full explanation →

190

Multi-Selecteasy

Which TWO are benefits of using pre-trained foundation models instead of training from scratch?

Select 2 answers

A.Complete control over model architecture

B.Lower training cost

C.Eliminates the need for prompt engineering

D.Guaranteed absence of bias

E.Faster time to deployment

AnswersB, E

Pre-trained models require less compute and data, reducing cost.

Why this answer

Pre-trained foundation models have already been trained on vast datasets, so you only need to fine-tune them for your specific task. This dramatically reduces the compute resources, time, and data required compared to training from scratch, directly lowering training cost.

Exam trap

Google Cloud often tests the misconception that pre-trained models eliminate the need for any further engineering (like prompt engineering) or that they are completely bias-free, when in fact they still require careful tuning and can perpetuate biases from their training data.

Full explanation →

191

MCQeasy

Refer to the exhibit. A user wants formal translations from a generative AI model, but the model outputs informal style inconsistently. Which prompt engineering technique would best ensure consistent formal translations?

A.Use context caching

B.Provide a few-shot example with formal and informal pairs

C.Use a longer system prompt with detailed rules

D.Set top_k to 1

AnswerB

Correct: Few-shot examples directly show the expected output format.

Why this answer

Providing a few-shot example that explicitly demonstrates the desired formal translation guides the model to follow that pattern. System instructions can help but are less direct.

Full explanation →

192

MCQmedium

A company uses a text-to-image model to generate marketing visuals. The results often misinterpret the prompt, e.g., 'a red car' generates a blue car. Which technique should they try first to align the output with the prompt?

A.Use a negative prompt to exclude blue

B.Refine the prompt with more adjectives and context, e.g., 'bright red sports car'

C.Upscale the image resolution to 1024x1024

D.Increase the guidance scale to 20

AnswerB

Clearer, more descriptive prompts help the model understand the desired output.

Why this answer

Option B is correct because refining the prompt with more adjectives and context directly addresses the root cause of misalignment: insufficient specificity in the text description. Text-to-image models rely on the semantic richness of the prompt to guide the latent diffusion process; adding 'bright red sports car' provides stronger conditioning signals that steer the model's cross-attention layers toward the intended color and object attributes. This is the most efficient first step before adjusting hyperparameters like guidance scale.

Exam trap

The trap here is that candidates often jump to hyperparameter tuning (guidance scale) or post-processing (upscaling) as a first fix, when the most fundamental and cost-effective step is to improve the input prompt's specificity, which directly controls the conditioning signal in the diffusion process.

How to eliminate wrong answers

Option A is wrong because using a negative prompt to exclude 'blue' is a reactive band-aid that does not fix the core issue of the model failing to associate 'red' with the car; it also risks suppressing other unintended features and can degrade image quality by over-constraining the latent space. Option C is wrong because upscaling resolution to 1024x1024 only increases pixel density and does not alter the semantic alignment between the prompt and the generated image; the model's misinterpretation of 'red' would persist at any resolution. Option D is wrong because increasing the guidance scale to 20 excessively amplifies the prompt's influence, often leading to image saturation, artifacts, and mode collapse, while still not correcting the fundamental misassociation of the color attribute.

Full explanation →

193

MCQmedium

A media company is using Vertex AI's Imagen model to generate images for marketing campaigns. They have a set of prompts that describe desired scenes, but the generated images often contain artifacts such as distorted faces or unnatural lighting. The team has tried varying the prompt wording but the issues persist. They are using the default parameters (no modifications). They have a budget for additional compute resources and want to improve image quality without switching to a more expensive model. The team has access to a small set of high-quality images in the same style as their target outputs. What should the team do?

A.Increase the guidance scale parameter to make the model follow prompts more closely.

B.Use a more detailed prompt style with negative prompts to avoid artifacts.

C.Fine-tune the Imagen model on the small set of high-quality images to improve output quality.

D.Increase the number of images generated per prompt and manually select the best ones.

AnswerC

Fine-tuning adapts the model to produce images with fewer artifacts and desired style.

Why this answer

Option B is correct because fine-tuning Imagen on the small set of high-quality images can improve the model's ability to generate images with fewer artifacts and better style consistency. Option A is wrong because increasing the number of images generated does not improve quality per image. Option C is wrong because adjusting guidance scale without fine-tuning may not address the specific artifacts.

Option D is wrong because using a different prompt style may help but the team already tried varying wording; the issue is likely the model's lack of familiarity with the desired quality.

Full explanation →

194

MCQeasy

A developer wants to use Gemini 1.5 Pro to analyze hour-long video content and generate a summary. Which feature of Gemini 1.5 Pro is most suitable for this task?

A.Long context window (up to 1 million tokens)

B.Multimodal generation from text and images

C.Code generation and debugging

D.Function calling to retrieve external data

AnswerA

The long context allows ingesting entire video content for summarization.

Why this answer

Option D is correct because Gemini 1.5 Pro's long context window (up to 1M tokens) allows processing entire videos. Option A (multimodal generation) is useful but not the key feature. Option B (function calling) is for APIs.

Option C (code generation) is not relevant.

Full explanation →

195

MCQmedium

During a RAG pipeline implementation, the retrieval system frequently returns irrelevant documents, causing the generator to produce incorrect answers. Which change is most likely to improve the relevance of retrieved documents?

A.Add a re-ranking step using a cross-encoder model to refine the top results.

B.Increase the number of documents retrieved from the vector store.

C.Use a different embedding model with higher vector dimension.

D.Decrease the chunk size of documents to reduce noise.

AnswerA

Re-ranking directly improves document relevance by deep semantic matching.

Why this answer

Re-ranking with a cross-encoder model evaluates query-document pairs more deeply, improving precision at the cost of latency. Increasing the number of documents may introduce more noise. Changing embedding dimension or chunk size may help but are less targeted.

Full explanation →

196

MCQhard

You are a machine learning engineer at a healthcare startup. Your team has developed a generative AI model that summarizes patient medical records. The model is deployed on Vertex AI Endpoints using a custom container. You have configured the endpoint with a single n1-standard-4 machine (4 vCPUs, 15 GB memory) without accelerators. The model uses a small transformer architecture. During load testing with 50 concurrent requests, you observe that the average latency is 8 seconds, which exceeds the requirement of 2 seconds. Additionally, some requests time out after 10 seconds. You suspect the CPU is the bottleneck. You also notice that the model inference code uses TensorFlow but is not optimized for inference. Which action should you take to reduce latency?

A.Reduce the model size by pruning and quantization, then redeploy.

B.Increase the request timeout to 30 seconds to accommodate the latency.

C.Enable autoscaling to handle the load with multiple replicas.

D.Deploy the model on a machine with a GPU and use TensorRT for inference optimization.

AnswerD

GPU acceleration and model optimization can drastically reduce latency.

Why this answer

Option D is correct because the CPU is identified as the bottleneck, and deploying on a GPU with TensorRT optimization directly addresses this by accelerating the TensorFlow inference. TensorRT optimizes the model graph and fuses layers, significantly reducing latency for transformer-based models, which is essential to meet the 2-second requirement.

Exam trap

The trap here is that candidates may choose autoscaling (Option C) thinking it handles high concurrency, but they overlook that the per-request latency remains unchanged on CPU, failing to meet the 2-second requirement.

How to eliminate wrong answers

Option A is wrong because pruning and quantization reduce model size and can improve latency, but they may degrade model accuracy and do not address the fundamental CPU bottleneck as effectively as GPU acceleration with TensorRT. Option B is wrong because increasing the timeout to 30 seconds does not reduce latency; it only masks the problem, and requests still exceed the 2-second requirement, leading to poor user experience. Option C is wrong because autoscaling adds more replicas to handle concurrent requests, but each request still runs on a CPU-bound n1-standard-4 machine, so the per-request latency remains high and does not solve the CPU bottleneck.

Full explanation →

197

MCQmedium

A machine learning engineer is building a text-to-image model using Vertex AI. They want to reduce inference latency. Which strategy is most effective?

A.Use a larger image resolution

B.Use a smaller model variant

C.Enable batch processing

D.Increase the number of inference steps

AnswerB

Smaller models are faster.

Why this answer

Option B is correct because using a smaller model variant directly reduces the number of parameters and computational operations required per inference pass, which lowers latency. In text-to-image models like Imagen or Stable Diffusion, the model size is the primary driver of forward-pass time, so a smaller variant (e.g., fewer layers or reduced latent dimensions) yields faster generation.

Exam trap

The trap here is that candidates confuse throughput optimization (batch processing) with latency reduction, or assume that more steps or higher resolution improve quality without considering the latency trade-off.

How to eliminate wrong answers

Option A is wrong because larger image resolution increases the pixel space the model must process, which increases computational load and latency, not reduces it. Option C is wrong because batch processing improves throughput (images per second) but does not reduce per-request latency; it may even increase the time to first token for an individual request. Option D is wrong because increasing the number of inference steps (e.g., diffusion denoising steps) directly increases the sequential computation time, making latency worse.

Full explanation →

198

MCQeasy

A startup wants to quickly integrate a generative AI chatbot into their customer support platform. They need a solution that can answer questions based on their internal knowledge base with minimal setup. Which Google Cloud service should they use?

A.Use Model Garden to deploy a pre-built Q&A model

B.Call the Gemini API directly and implement grounding logic manually

C.Use Cloud AI Notebooks to fine-tune a model on their knowledge base

D.Use Vertex AI Agent Builder to create a conversational agent grounded in their data

AnswerD

Agent Builder offers pre-built components for grounding and conversation flow, enabling rapid deployment.

Why this answer

Option C is correct because Vertex AI Agent Builder provides a no-code/low-code environment to build conversational agents grounded in enterprise data, making it ideal for quick integration with a knowledge base. Option A (Gemini API with manual grounding) requires more development effort. Option B (Cloud AI Notebooks) is for data science, not production deployment.

Option D (Model Garden) is for accessing and deploying models but not for building a complete agent.

Full explanation →

199

MCQhard

A company is deploying a chatbot that uses a foundation model. They want to minimize latency for user queries. Which action is most effective?

A.Use a larger model with more parameters

B.Disable safety filters

C.Increase the number of tokens

D.Use a smaller distilled model

AnswerD

Distilled models are optimized for speed.

Why this answer

Distilled models are smaller, faster versions of larger foundation models, trained to mimic their behavior while requiring fewer computational resources. This directly reduces inference latency because fewer parameters mean faster forward passes through the network, which is critical for real-time chatbot responses.

Exam trap

Google Cloud often tests the misconception that 'bigger is better' for performance, but in latency-constrained scenarios, model size is inversely related to speed, and candidates may overlook distillation as a standard optimization technique.

How to eliminate wrong answers

Option A is wrong because larger models with more parameters increase computational complexity and memory bandwidth requirements, which actually increases latency rather than reducing it. Option B is wrong because disabling safety filters does not affect model inference speed; safety filters are post-processing steps that add negligible latency compared to the model itself. Option C is wrong because increasing the number of tokens (the output length) forces the model to perform more autoregressive generation steps, which linearly increases latency per additional token.

Full explanation →

200

MCQeasy

A data scientist wants to fine-tune a foundation model from Vertex AI Model Garden on their custom dataset. They want to choose a cost-effective method that updates only a small subset of parameters. Which fine-tuning approach should they use?

A.Full fine-tuning

B.Prompt tuning

C.Parameter-Efficient Fine-Tuning (PEFT) like LoRA

D.Reinforcement Learning from Human Feedback (RLHF)

AnswerC

PEFT methods update only a small subset of parameters.

Why this answer

Option C is correct because Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA (Low-Rank Adaptation) are specifically designed to update only a small subset of parameters (e.g., low-rank matrices injected into transformer layers) while keeping the majority of the foundation model frozen. This drastically reduces memory and compute costs compared to full fine-tuning, making it the most cost-effective choice for customizing a model from Vertex AI Model Garden on a custom dataset.

Exam trap

The trap here is that candidates often confuse prompt tuning (which does not update model parameters) with parameter-efficient fine-tuning (which updates a small subset of parameters), leading them to incorrectly select Option B as a cost-effective method for updating parameters.

How to eliminate wrong answers

Option A is wrong because full fine-tuning updates all model parameters, which is computationally expensive and memory-intensive, contradicting the requirement for a cost-effective method that updates only a small subset of parameters. Option B is wrong because prompt tuning is a soft-prompt technique that does not update any model parameters; instead, it learns a small set of virtual tokens prepended to the input, which is not a parameter-efficient fine-tuning method (it is a prompt-based approach). Option D is wrong because Reinforcement Learning from Human Feedback (RLHF) is a training paradigm that uses human preferences to align model behavior, typically requiring multiple models (reward model, policy model) and full or PEFT fine-tuning, and it is not primarily a cost-effective method for updating a small subset of parameters on a custom dataset.

Full explanation →

201

MCQmedium

A company is deploying a chatbot that must ensure customer data remains within the European Union. Which approach should they take?

A.Use Vertex AI Agent Builder with global endpoint

B.Use the Gemini API with a regional endpoint in europe-west4

C.Use Vertex AI with a multi-region endpoint

D.Deploy a custom model on GKE in a specific region

AnswerB

Regional endpoints ensure data remains in the specified region.

Why this answer

The Gemini API offers regional endpoints, such as europe-west4, which can restrict data processing to that region. Vertex AI multi-region endpoints may not guarantee EU-only residency.

Full explanation →

202

MCQmedium

A global e-commerce company wants to translate product descriptions into 50 languages with high accuracy. They need to handle domain-specific terms (e.g., 'size chart', 'return policy'). Which approach should they use?

A.Use the Gemini API with a prompt like 'Translate to French'

B.Build a custom agent with Vertex AI Agent Builder

C.Use Vertex AI Translation with custom glossaries

D.Use Imagen to generate translated images

AnswerC

Custom glossaries ensure domain-specific terms are translated correctly.

Why this answer

Option C is correct because Vertex AI Translation with custom glossaries is specifically designed for high-accuracy, domain-specific translations. Custom glossaries allow you to define precise translations for terms like 'size chart' and 'return policy', ensuring consistency across 50 languages. This approach leverages Google's neural machine translation models while overriding generic translations with your business-specific terminology.

Exam trap

The trap here is that candidates may confuse general-purpose generative AI APIs (like Gemini) with specialized translation services, or assume that any AI model can handle domain-specific translation without customization, when in fact glossaries are required for consistent, accurate terminology.

How to eliminate wrong answers

Option A is wrong because the Gemini API is a general-purpose generative AI model, not a specialized translation service; it lacks built-in support for custom glossaries and may produce inconsistent or hallucinated translations for domain-specific terms. Option B is wrong because Vertex AI Agent Builder is designed for building conversational agents and workflows, not for bulk, high-accuracy translation tasks; it would require significant custom development to replicate glossary-based translation. Option D is wrong because Imagen is a text-to-image generation model, not a translation tool; it cannot translate text and would be irrelevant for translating product descriptions.

Full explanation →

203

MCQhard

A large enterprise runs a production application that uses the Gemini API on Vertex AI for real-time content moderation. They are experiencing occasional 429 (Too Many Requests) errors during peak hours. Their current quota is 1000 requests per minute (RPM) and they are hitting around 950 RPM on average, with spikes up to 1050. They have already implemented exponential backoff and retry logic. They need to reduce the error rate without reducing the quality of moderation. Which additional measure should they take?

A.Deploy the model on a dedicated Vertex AI endpoint with autoscaling.

B.Switch to a lower-tier model like Gemini 1.0 Pro to reduce quota consumption.

C.Implement a local caching layer for common moderation queries.

D.Request a quota increase from Google Cloud support.

AnswerC

Caching eliminates duplicate requests, reducing the request rate and errors.

Why this answer

Option C is correct because implementing a local caching layer for common moderation queries reduces the number of identical requests sent to the Gemini API, directly lowering the effective RPM without compromising moderation quality. Since the enterprise is already using exponential backoff and retry logic, caching addresses the root cause of hitting quota limits by eliminating redundant API calls, which is a standard pattern for rate-limit mitigation in production AI workloads.

Exam trap

Google Cloud often tests the misconception that scaling infrastructure (Option A) or switching models (Option B) solves API quota issues, when the real constraint is the API rate limit itself, which requires reducing the number of calls through caching or other client-side optimizations.

How to eliminate wrong answers

Option A is wrong because deploying a dedicated endpoint with autoscaling does not increase the quota limit; it only scales compute resources, and the 429 errors are due to API quota exhaustion, not endpoint capacity. Option B is wrong because switching to a lower-tier model like Gemini 1.0 Pro would reduce quality of moderation, which violates the requirement to not reduce quality, and it does not address the fundamental issue of hitting the RPM quota. Option D is wrong because requesting a quota increase is a valid long-term solution but does not address the immediate need to reduce error rate without reducing quality; it also assumes Google Cloud will approve the increase, which is not guaranteed, and it does not optimize existing usage.

Full explanation →

204

MCQmedium

A manufacturing company wants to use generative AI to create maintenance manuals from sensor data. The manuals must be accurate and reflect the latest equipment configurations. Which approach best ensures data freshness and consistency?

A.Train the model in real-time as sensor data streams in.

B.Periodically retrain the model with the latest sensor data.

C.Have human technicians review and update the manuals manually.

D.Use a retrieval-augmented generation (RAG) system that queries a live database of sensor configurations.

AnswerD

RAG ensures responses are based on the most current data.

Why this answer

Option D is correct because a retrieval-augmented generation (RAG) system retrieves the most current equipment configurations directly from a live database at inference time, ensuring the generated manual reflects real-time sensor data without requiring model retraining. This approach decouples the static knowledge in the LLM from the dynamic data source, guaranteeing both accuracy and freshness while avoiding the latency and cost of continuous retraining.

Exam trap

Google Cloud often tests the misconception that retraining (Option B) is the only way to keep an LLM current, when in fact RAG provides a more efficient and accurate mechanism for incorporating live data without modifying the model itself.

How to eliminate wrong answers

Option A is wrong because training a model in real-time as sensor data streams in is impractical due to catastrophic forgetting, high computational overhead, and the inability of online learning to guarantee that the model's weights stabilize to reflect the latest configurations without extensive validation. Option B is wrong because periodic retraining introduces a window of staleness between retraining cycles, during which sensor data may change, leading to manuals that are not current; it also requires significant infrastructure for data collection, preprocessing, and model deployment. Option C is wrong because manual review and update by human technicians is slow, error-prone, and cannot scale to the volume and velocity of sensor data, defeating the purpose of using generative AI for automation.

Full explanation →

205

MCQmedium

A company wants to use Generative AI for customer support chatbots. They are concerned about cost and latency. Which deployment option best balances these concerns?

A.Deploy an open-source model on-premise to avoid cloud costs

B.Rely on a third-party chatbot API that abstracts the model

C.Use the largest available foundation model via API for highest accuracy

D.Use a fine-tuned version of a smaller model on Vertex AI with response caching

AnswerD

A tuned smaller model reduces compute cost and caching minimizes repeated inference, lowering latency. Vertex AI provides scalable infrastructure.

Why this answer

Option D is correct because using a fine-tuned smaller model on Vertex AI with response caching reduces both cost and latency. Smaller models require fewer computational resources, and caching avoids redundant inference calls, directly addressing the company's concerns without sacrificing accuracy for the specific task.

Exam trap

Google Cloud often tests the misconception that 'larger model = better accuracy always' or that 'on-premise is always cheaper,' ignoring the total cost of ownership, scaling overhead, and the efficiency gains from fine-tuning and caching for specific use cases.

How to eliminate wrong answers

Option A is wrong because deploying on-premise incurs high upfront hardware and maintenance costs, and may not scale efficiently for variable customer support loads, often increasing total cost of ownership (TCO) despite avoiding cloud fees. Option B is wrong because relying on a third-party chatbot API abstracts the model but does not inherently optimize cost or latency; it may introduce per-call pricing and network overhead, and the provider controls model size and caching. Option C is wrong because using the largest available foundation model via API maximizes accuracy but also maximizes inference cost and latency due to higher parameter count and compute requirements, which is the opposite of balancing cost and latency.

Full explanation →

206

MCQhard

A company is evaluating the ROI of a generative AI project. Which metric is most appropriate?

A.Reduction in time to complete tasks using the generative AI tool

B.Reduction in model error rate on a test set

C.Increase in user satisfaction scores

D.Cost per inference compared to historical average

AnswerA

Time savings directly translate to labor cost reduction or increased throughput, providing a clear ROI.

Why this answer

Option A is correct because the primary business justification for a generative AI project is operational efficiency, measured directly by the reduction in time to complete tasks. Unlike technical metrics such as model error rate, this metric ties the AI's output to tangible productivity gains, which is the core of ROI analysis in a business context. Generative AI tools are designed to augment human workflows, so time savings translate into cost savings and increased throughput, making it the most appropriate metric for evaluating return on investment.

Exam trap

Google Cloud often tests the distinction between technical performance metrics (like model error rate) and business outcome metrics, trapping candidates who default to evaluating AI models as they would in a data science context rather than from a business leadership perspective.

How to eliminate wrong answers

Option B is wrong because reduction in model error rate on a test set is a technical performance metric, not a business ROI metric; it measures model accuracy but does not account for the cost of deployment, user adoption, or actual business value generated. Option C is wrong because increase in user satisfaction scores, while valuable, is a lagging indicator that can be influenced by factors unrelated to the AI's direct impact on productivity or cost, and it does not quantify financial return. Option D is wrong because cost per inference compared to historical average focuses solely on operational cost efficiency, ignoring the revenue or time-saving benefits that the generative AI tool provides, thus failing to capture the full ROI picture.

Full explanation →

207

MCQhard

An enterprise uses a fine-tuned PaLM 2 model for code generation. They want to ensure the generated code passes security audits. Which combination of techniques would be most effective?

A.Integrate a static analysis tool in the pipeline and add a safety filter to reject code containing dangerous functions.

B.Use a few-shot prompt with examples of secure code and set temperature to 1.0.

C.Fine-tune the model on a dataset of insecure code and use top-p=0.9.

D.Increase the model's context window and use a system instruction to 'be secure'.

AnswerA

Static analysis and safety filters directly block insecure code patterns.

Why this answer

Option D is correct because integrating a static analysis tool and a safety filter provides automated, real-time security checks. Option A (few-shot with high temperature) increases risk due to randomness. Option B (fine-tuning on insecure code) is counterproductive.

Option C (increasing context window) does not directly address security.

Full explanation →

208

MCQeasy

A startup wants to generate concise summaries of long news articles using an LLM on Vertex AI. They prioritize low latency and cost. Which model choice is most appropriate?

A.Use Gemini 1.5 Pro for the highest accuracy.

B.Use PaLM 2 Bison, as it is the most economical.

C.Use Vertex AI Text Embeddings, since embeddings can generate summaries.

D.Use Gemini 1.5 Flash, which is designed for high throughput and low cost.

AnswerD

A is correct because Flash balances performance and cost.

Why this answer

Gemini 1.5 Flash is optimized for high-throughput, low-latency, and cost-efficient summarization tasks, making it the ideal choice for a startup that needs to process long news articles quickly without incurring high costs. It balances performance and economy, whereas Gemini 1.5 Pro prioritizes accuracy at higher latency and cost, and PaLM 2 Bison is less efficient for this use case.

Exam trap

The trap here is that candidates often assume the most accurate model (Gemini 1.5 Pro) is always the best choice, overlooking the specific business requirements for low latency and cost, which Gemini 1.5 Flash directly addresses.

How to eliminate wrong answers

Option A is wrong because Gemini 1.5 Pro, while offering high accuracy, has higher latency and cost, which contradicts the startup's priority for low latency and cost. Option B is wrong because PaLM 2 Bison is not the most economical for summarization; it is a general-purpose model that may not provide the optimized throughput and cost-efficiency of Gemini 1.5 Flash, and it is being deprecated in favor of newer models. Option C is wrong because Vertex AI Text Embeddings generate vector representations of text, not natural language summaries; they cannot produce concise textual summaries directly.

Full explanation →

209

Multi-Selecthard

Which THREE factors should be considered when choosing between Gemini 1.5 Pro and Gemini 1.5 Flash for a customer-facing chatbot? (Choose three.)

Select 3 answers

A.Cost constraints: Flash is more cost-effective per token

B.Task complexity: Pro is better for complex reasoning

C.Safety filters: Pro has stricter safety defaults

D.Latency requirements: Flash provides faster responses

E.Multimodal capability: Flash does not support image input

AnswersA, B, D

Flash is cheaper.

Why this answer

Option A is correct because Gemini 1.5 Flash is designed as a cost-optimized model, offering significantly lower per-token pricing compared to Gemini 1.5 Pro. For a customer-facing chatbot with high query volumes, cost efficiency is a primary consideration, making Flash the more economical choice for routine interactions.

Exam trap

The trap here is that candidates often assume Flash lacks multimodal capabilities or that Pro has stricter safety defaults, when in fact both models share the same safety configuration and both support multimodal inputs, with the key differentiators being cost, latency, and task complexity.

Full explanation →

210

Multi-Selecteasy

A company is deploying a large language model (LLM) for customer support using Vertex AI. Which TWO best practices should they follow to ensure high-quality and cost-effective responses?

Select 2 answers

A.Deploy the model on Spot VMs to reduce infrastructure costs

B.Store prompts in plain text files for easy version control

C.Implement prompt optimization techniques to tailor responses

D.Use Vertex AI Model Monitoring to track input drift and response quality

E.Use a single large model for all query types to maintain consistency

AnswersC, D

Prompt optimization helps generate more accurate and relevant responses for specific use cases.

Why this answer

The correct answers are B (implement prompt optimization) and D (use Vertex AI Model Monitoring for drift). Prompt optimization improves response quality, and model monitoring detects performance degradation. Option A is wrong because using a single large model for all queries may be inefficient; smaller specialized models or routing can be better.

Option C is risky for production workloads. Option E exposes sensitive prompts.

Full explanation →

211

Multi-Selecteasy

A company is considering using gen AI for customer support. Which two business strategies are most important for success?

Select 2 answers

A.Measure customer satisfaction metrics

B.Ignore data privacy

C.Deploy without testing

D.Ensure human-in-the-loop for critical interactions

E.Use the cheapest model

AnswersA, D

Metrics help evaluate success and guide improvements.

Why this answer

Measuring customer satisfaction metrics (A) is critical because it provides quantitative feedback on the generative AI system's performance, enabling iterative improvements to the model's responses and alignment with business goals. Without metrics like CSAT or NPS, the company cannot validate whether the AI is reducing resolution time or improving user experience, which are key ROI indicators for gen AI deployments.

Exam trap

Google Cloud often tests the misconception that cost optimization (cheapest model) or speed-to-market (deploy without testing) are primary success factors, when in reality governance, safety, and continuous measurement are the foundational strategies for sustainable gen AI adoption.

Full explanation →

212

Multi-Selectmedium

Which THREE capabilities does Vertex AI Agent Builder provide out of the box? (Select THREE.)

Select 3 answers

A.Automatic escalation to human agents when confidence is low

B.Fine-tuning of foundation models using custom datasets

C.Enterprise search over internal documents

D.Grounding with enterprise data sources like Cloud Storage and BigQuery

E.Pre-built customer service and sales agents

AnswersA, D, E

Agent Builder supports fallback to human agents based on confidence thresholds.

Why this answer

Agent Builder includes pre-built agents (A), grounding (D), and automatic fallback (E). B is done via Model Garden, not Agent Builder directly. C is a feature of Vertex AI Search.

Full explanation →

213

MCQhard

A financial services firm wants to deploy generative AI for automated investment advice. They are subject to strict regulatory oversight requiring explainability and audit trails. Which strategy best meets these requirements?

A.Fine-tune a model on historical trading data without human review.

B.Use a black-box large language model with monitoring.

C.Deploy a rule-based system augmented with generative AI for content generation.

D.Implement human-in-the-loop with full logging of model inputs, outputs, and human decisions.

AnswerD

This provides a transparent audit trail and human accountability, satisfying regulatory demands.

Why this answer

Option D is correct because it directly addresses the regulatory requirements for explainability and audit trails by incorporating human oversight and comprehensive logging. The human-in-the-loop (HITL) mechanism ensures that critical investment decisions are reviewed by qualified professionals, while full logging of model inputs, outputs, and human decisions creates a transparent, auditable record. This approach satisfies financial regulations like MiFID II or SEC rules that mandate explainability and accountability in automated advice systems.

Exam trap

Google Cloud often tests the misconception that monitoring or rule-based augmentation alone is sufficient for regulatory compliance, when in fact strict oversight and complete audit trails are mandatory for explainability in high-stakes domains like finance.

How to eliminate wrong answers

Option A is wrong because fine-tuning a model on historical trading data without human review introduces risks of overfitting to past market conditions and lacks the necessary audit trail and explainability for regulatory compliance. Option B is wrong because using a black-box large language model with monitoring still fails to provide the required explainability, as the internal decision-making process remains opaque and cannot be audited or justified to regulators. Option C is wrong because a rule-based system augmented with generative AI for content generation, while more transparent, still lacks the structured human oversight and full logging of decisions needed to meet strict audit trail requirements, and the generative AI component can introduce unpredictable outputs that undermine explainability.

Full explanation →

214

MCQmedium

An enterprise deploys a large language model (LLM) for internal document summarization. Users complain that summaries sometimes include statements not present in the original document. Which mitigation strategy should the team prioritize to address this hallucination issue?

A.Train a discriminator model to detect hallucinations and perform adversarial training.

B.Implement retrieval-augmented generation (RAG) to ground the model in the original documents and require citations.

C.Apply reinforcement learning from human feedback (RLHF) using a reward model that penalizes hallucinations.

D.Reduce the model's temperature parameter to 0 to make outputs deterministic.

AnswerB

RAG ties outputs to source documents, reducing hallucinations.

Why this answer

Retrieval-Augmented Generation (RAG) is the most direct and effective mitigation for hallucination in document summarization because it forces the LLM to base its output on retrieved chunks of the original document. By requiring citations, the model must reference specific passages, making it verifiable and reducing the likelihood of fabricating content. This grounds the generation in the source material, addressing the root cause of hallucination—lack of factual grounding—rather than relying on post-hoc correction or output tuning.

Exam trap

Google Cloud often tests the misconception that reducing temperature or applying RLHF alone can solve hallucination, when in fact these methods do not provide the explicit grounding that RAG offers for document-specific tasks.

How to eliminate wrong answers

Option A is wrong because training a discriminator model and performing adversarial training is a complex, resource-intensive approach that does not directly prevent hallucinations at inference time; it only improves robustness against adversarial inputs, not factual grounding. Option C is wrong because RLHF with a reward model that penalizes hallucinations can reduce their frequency over time, but it requires extensive human feedback and fine-tuning, and does not guarantee grounding in specific source documents for each summary. Option D is wrong because reducing the temperature parameter to 0 makes outputs deterministic but does not eliminate hallucinations—it only reduces randomness; the model can still confidently generate false statements that were never in the source document.

Full explanation →

215

Multi-Selecteasy

A team is using a language model for customer feedback analysis. They want to improve the accuracy of sentiment extraction. Which TWO techniques should they apply? (Choose two.)

Select 2 answers

A.Increase the temperature to 0.8 to allow more creative interpretations.

B.Provide few-shot examples of correctly labeled sentiment in the prompt.

C.Use the model's built-in sentiment analysis API instead of prompting.

D.Add a system instruction that asks the model to strictly follow JSON output format.

E.Fine-tune the model on a labeled dataset of customer feedback.

AnswersB, E

Few-shot examples guide the model's output format and accuracy.

Why this answer

Options A and C are correct. Few-shot prompting provides examples of correct sentiment labeling, and fine-tuning on a labeled dataset adapts the model to the domain. Option B (increasing temperature) adds randomness and reduces accuracy.

Option D (using a different API) is not a technique for improving the current model. Option E (JSON formatting) helps structure output but does not improve sentiment accuracy.

Full explanation →

216

MCQmedium

A healthcare provider plans to implement gen AI for clinical note summarization. They have limited AI expertise. Which Google Cloud approach best aligns with their business strategy?

A.Hire a team of data scientists

B.Use Vertex AI Agent Builder with pre-built templates

C.Deploy an open-source model on Compute Engine

D.Build a custom model from scratch

AnswerB

Leverages managed services and reduces the need for in-house AI expertise.

Why this answer

Vertex AI Agent Builder provides pre-built templates and a low-code interface specifically designed for organizations with limited AI expertise. It enables rapid deployment of generative AI solutions like clinical note summarization without requiring deep data science skills, directly aligning with the healthcare provider's business strategy of minimizing technical overhead while leveraging AI.

Exam trap

Google Cloud often tests the misconception that 'more technical control' (e.g., custom models or open-source deployment) is always better, but the trap here is that the question explicitly prioritizes business strategy and limited expertise, making low-code/no-code solutions like Vertex AI Agent Builder the correct choice over technically complex alternatives.

How to eliminate wrong answers

Option A is wrong because hiring a team of data scientists contradicts the 'limited AI expertise' constraint and introduces significant cost and time overhead, which is not a strategic fit for rapid implementation. Option C is wrong because deploying an open-source model on Compute Engine requires substantial DevOps, model tuning, and infrastructure management expertise, which the provider lacks. Option D is wrong because building a custom model from scratch demands advanced machine learning skills, large labeled datasets, and extensive training resources, making it impractical for an organization with limited AI expertise.

Full explanation →

217

Multi-Selectmedium

Which TWO techniques are most effective for improving factual accuracy in a generative AI model's responses? (Choose two.)

Select 2 answers

A.Retrieval-Augmented Generation (RAG) with curated datasets.

B.Increasing the model's temperature to 1.5.

C.Grounding with a trusted knowledge base.

D.Using longer system prompts with multiple instructions.

E.Fine-tuning on a large corpus of general text.

AnswersA, C

RAG retrieves relevant, up-to-date documents to inform responses.

Why this answer

Grounding and RAG both provide external authoritative sources to enhance factual accuracy. Fine-tuning on general data doesn't guarantee accuracy, and increasing temperature hurts accuracy. Prompt engineering is helpful but not as robust as retrieval-based methods.

Full explanation →

218

MCQhard

A data scientist sees the above error when trying to deploy a model to an endpoint. What is the most likely cause?

A.The IAM permissions are insufficient

B.The model import into Vertex AI Model Registry is still in progress

C.The endpoint does not exist

D.The model is already deployed to another endpoint

AnswerB

Model is still DEPLOYING, not ready.

Why this answer

The error indicates that the model is not yet fully imported into the Vertex AI Model Registry. Deploying a model to an endpoint requires the model resource to be in an 'ACTIVE' state; if the import is still in progress, the deployment request will fail. This is a common timing issue when a model is uploaded but not yet registered.

Exam trap

Google Cloud often tests the misconception that any deployment failure is due to permissions or missing resources, when in fact the model's lifecycle state (e.g., still importing) is the root cause.

How to eliminate wrong answers

Option A is wrong because insufficient IAM permissions would typically result in a 403 Forbidden error, not a model-not-found or import-in-progress error. Option C is wrong because if the endpoint did not exist, the error would be a 404 Not Found for the endpoint resource, not a model import issue. Option D is wrong because a model can be deployed to multiple endpoints simultaneously; the error would instead mention a conflict or quota limit, not an import status.

Full explanation →

219

MCQhard

A data scientist is using Vertex AI Model Registry to manage multiple versions of a custom text classification model. They need to ensure that only the version that passes all evaluation metrics can be deployed to a Vertex AI Endpoint for online predictions. What deployment strategy should they use?

A.Use A/B testing between versions and manually select the best performer

B.Set up continuous evaluation with model monitoring to auto-promote versions that meet thresholds

C.Manually track version IDs and deploy the latest version

D.Deploy all versions to a single endpoint and route traffic manually

AnswerB

Continuous evaluation automatically checks metrics and can auto-promote versions that pass defined thresholds.

Why this answer

Model Registry supports continuous evaluation and can automatically promote versions that meet thresholds. Option A is wrong because manual version control is error-prone. Option B is wrong because A/B testing is for traffic splitting, not automatic promotion.

Option D is wrong because batch predictions are offline.

Full explanation →

220

MCQhard

A healthcare startup has fine-tuned a Vertex AI PaLM 2 model on a dataset of medical records to generate patient summaries. The model produces fluent text but occasionally fabricates diagnoses not present in the input. The team has already tried increasing the training data size by 20% and adjusting the temperature from 0.7 to 0.2, but hallucinations persist. The summaries must be factually accurate for regulatory compliance. What should the team do next?

A.Increase the maximum output tokens to allow the model to generate more detailed summaries.

B.Implement a RAG pipeline using Vertex AI Search to retrieve relevant medical documents before generation.

C.Add more few-shot examples to the prompt for each generation.

D.Switch the base model to Gemini 1.5 Pro without additional changes.

AnswerB

RAG provides grounded, up-to-date context, reducing hallucinations significantly.

Why this answer

Option B is correct because augmenting the model with a retrieval-augmented generation (RAG) pipeline grounded in a trusted medical knowledge base directly addresses hallucination by forcing the model to reference verified sources. Option A is wrong because changing the base model does not solve the fundamental lack of grounding. Option C is wrong because few-shot examples improve output format but not factual accuracy.

Option D is wrong because increasing the context window does not prevent fabrication; it may even introduce more irrelevant information.

Full explanation →

221

MCQmedium

A startup with $500k in seed funding wants to integrate GenAI into their SaaS product for automated report generation. They have 2 ML engineers and expect 10,000 monthly users initially. They estimate that using a foundation model API (e.g., Gemini) will cost $0.10 per 1K tokens, and each report uses about 5K tokens. Alternatively, they could fine-tune an open-source model on their domain data, estimated at $50k for compute and $20k for engineering time, with inference cost of $0.02 per 1K tokens on a dedicated endpoint. Which approach is more cost-effective over the first 12 months assuming 50,000 reports per month?

A.Use the foundation model API because it has lower upfront cost

B.Use a combination of both depending on report complexity

C.Build a custom model from scratch

D.Fine-tune the open-source model because it has lower per-report cost

AnswerD

Fine-tuning yields lower per-token cost, resulting in $190k total over a year, which is cheaper than the API.

Why this answer

Option D is correct because the total cost of fine-tuning over 12 months is $70,000 upfront plus $0.02 per 1K tokens * 5K tokens per report * 50,000 reports per month * 12 months = $600,000, totaling $670,000. The API approach costs $0.10 per 1K tokens * 5K tokens * 50,000 reports * 12 = $3,000,000, making fine-tuning significantly cheaper at scale despite the upfront investment.

Exam trap

Google Cloud often tests the misconception that lower upfront cost always means lower total cost, ignoring the multiplicative effect of per-unit costs at scale—candidates fixate on the $70k fine-tuning investment versus the API's zero upfront cost without calculating the 12-month total.

How to eliminate wrong answers

Option A is wrong because it ignores the per-report cost at scale; the API's $0.10 per 1K tokens leads to $3M over 12 months, far exceeding the fine-tuning total of $670k. Option B is wrong because a combination approach would not reduce costs—using the API for complex reports still incurs high per-token costs, and the problem does not specify complexity tiers that would justify splitting workloads. Option C is wrong because building a custom model from scratch requires massive data, compute, and engineering resources (often millions of dollars), far beyond the $500k seed funding and small team, making it impractical for a startup.

Full explanation →

222

Multi-Selecthard

An organization is building a search application using Vertex AI Vector Search. They have encoded their documents into embeddings and want to retrieve the most similar documents for a query. Which TWO actions are required to set up a Vector Search index?

Select 2 answers

A.Specify embedding dimension size in the index config.

B.Deploy the index to the IndexEndpoint.

C.Train a custom embedding model.

D.Download the query embeddings to local storage.

E.Create an IndexEndpoint resource.

AnswersB, E

Deployment makes the index available for querying.

Why this answer

A and D are correct: Creating an IndexEndpoint and deploying the index to an endpoint are necessary steps. B is not required because dimensions come from the model, not set here. C is done after deployment.

E is for training, not serving.

Full explanation →

223

MCQeasy

A team notices their text generation model repeats phrases excessively. Which technique would most directly reduce repetition?

A.Use beam search with a beam width of 5

B.Apply a repetition penalty of 1.2

C.Increase top_k to 100

D.Lower temperature to 0.5

AnswerB

Repetition penalty directly discourages repeated tokens.

Why this answer

Using a repetition penalty during decoding discourages the model from repeating tokens. Option A is wrong because it increases randomness, which might reduce repetition but could also reduce coherence. Option B is wrong because beam search can increase repetition.

Option D is wrong because temperature reduction makes output more deterministic, potentially increasing repetition.

Full explanation →

224

MCQmedium

You are the AI lead at an e-commerce company that uses a generative model to write product descriptions from images and key attributes. The model is a multimodal transformer that encodes both image and text (attributes) and decodes a description. Recently, your team deployed a new version of the image encoder that uses a more powerful backbone (ViT-L instead of ViT-B). After deployment, the generated descriptions became longer but often include irrelevant visual details (e.g., background objects) and occasionally misrepresent the product's main features. The model was fine-tuned on the same dataset as before. The descriptions from the old model were concise and focused. What is the most likely cause of the degradation and the best fix?

A.The decoder is now too small relative to the encoder; reduce the encoder's hidden size or increase the decoder's capacity.

B.The new encoder produces less discriminative features; replace it with an older version.

C.Lower the decoder's temperature to reduce diversity and hallucination.

D.The powerful encoder introduces overfitting to the training images; continue fine-tuning with additional loss terms that penalize description of irrelevant details (e.g., using attention regularization).

AnswerD

Attention regularization forces the model to focus on product-relevant regions.

Why this answer

Option D is correct because the more powerful ViT-L encoder captures richer, high-resolution features, including background details, which the decoder then amplifies into longer, less focused descriptions. This is a form of overfitting to irrelevant visual patterns in the training images, not to the labels. Adding attention regularization (e.g., penalizing attention weights on non-salient regions) forces the model to focus on product-relevant features, restoring conciseness and accuracy without reverting to the weaker encoder.

Exam trap

Google Cloud often tests the misconception that a more powerful encoder always improves performance, when in fact it can introduce overfitting to irrelevant features, and the fix is not to downgrade the encoder but to add regularization that guides attention to salient regions.

How to eliminate wrong answers

Option A is wrong because the decoder's capacity is not inherently too small; the issue is that the encoder now provides a richer but noisier representation, and simply resizing encoder/decoder dimensions does not address the root cause of attending to irrelevant details. Option B is wrong because replacing the encoder with an older version would discard the potential benefits of ViT-L (e.g., better attribute recognition) and is a regression, not a fix; the problem is not that features are less discriminative but that they are too detailed and unfocused. Option C is wrong because lowering temperature reduces randomness in token sampling but does not prevent the decoder from faithfully reproducing irrelevant visual details that the encoder already extracted; it would only make the output more deterministic, not more concise or accurate.

Full explanation →

225

MCQhard

A model generates biased output. Which technique is least effective?

A.Use adversarial debiasing

B.Apply safety filters

C.Set frequency penalty to 1.0

D.Fine-tune on diverse data

AnswerC

Affects repetition, not fairness or bias.

Why this answer

Setting frequency penalty to 1.0 reduces token repetition but does not address bias. Fine-tuning on diverse data, adversarial debiasing, and safety filters all directly tackle bias.

Full explanation →

Page 3 of 7

All pages

Practice Generative AI Leader by domain

Target a specific domain to shore up weak areas.

Fundamentals of Generative AI Business Strategies for Generative AI Solutions Google Cloud's Generative AI Offerings Techniques to Improve Generative AI Model Output

See all domains with question counts →