Knowledge + Practice

Google Cloud Generative AI Leader Generative AI Leader (Generative AI Leader) — Questions 226–300

500 questions total · 7pages · All types, answers revealed

Take a mock exam Exam hub

Page 4 of 7

226

MCQhard

A company has a large dataset of proprietary documents and wants to build a Q&A system using a foundation model without exposing the documents to the model. Which approach is most appropriate?

A.Use RAG with Vertex AI Vector Search and embeddings

B.Use a zero-shot model with context in prompt

C.Fine-tune the model on the documents

D.Use prompt engineering to instruct the model

AnswerA

RAG retrieves documents at query time without training on them.

Why this answer

Option A is correct because Retrieval-Augmented Generation (RAG) with Vertex AI Vector Search allows the system to retrieve relevant document chunks via embeddings without exposing the full documents to the foundation model. The model only sees the retrieved context in the prompt, ensuring proprietary data remains isolated and not used for training or memorization.

Exam trap

Google Cloud often tests the misconception that fine-tuning or prompt engineering can solve data privacy concerns, when in reality RAG is the only approach that keeps proprietary documents isolated from the model's training and inference pipeline.

How to eliminate wrong answers

Option B is wrong because a zero-shot model with context in the prompt still requires the entire document content to be included in the prompt, which exposes the proprietary data to the model and may exceed token limits. Option C is wrong because fine-tuning the model on the documents would directly expose the proprietary data to the model during training, risking memorization and data leakage. Option D is wrong because prompt engineering alone cannot retrieve specific information from a large dataset; it only instructs the model on how to respond, not where to find the data.

Full explanation →

227

Multi-Selectmedium

Which TWO options are benefits of using Vertex AI Model Garden compared to using raw pre-trained models from external sources? (Choose two.)

Select 2 answers

A.Lower cost compared to using generic APIs

B.Ability to fine-tune models on custom data

C.Integration with Vertex AI tools like evaluation and monitoring

D.Simplified deployment and scaling with Vertex AI endpoints

E.Guaranteed data privacy and no data sharing

AnswersC, D

Native integration with Vertex AI ecosystem.

Why this answer

Option C is correct because Vertex AI Model Garden is deeply integrated with the Vertex AI ecosystem, providing native access to tools like Vertex AI Evaluation for model performance assessment and Vertex AI Monitoring for drift detection and observability. This integration eliminates the need for custom pipelines to connect external models with these managed services, streamlining the MLOps workflow.

Exam trap

Google Cloud often tests the distinction between inherent platform benefits (like integration and managed deployment) versus features that are not exclusive to Model Garden (like fine-tuning or cost), leading candidates to mistakenly select options that are generally true for any model but not unique advantages of Model Garden.

Full explanation →

228

Multi-Selecthard

Which THREE capabilities are provided by Vertex AI Agent Builder? (Choose three.)

Select 3 answers

A.Automated model hyperparameter tuning.

B.Integration with Dialogflow CX for conversational flows.

C.Support for multimodal (text, image, video) input processing in agents.

D.Creating custom agents with memory and tool integration.

E.Built-in grounding with Google Search to improve answer accuracy.

AnswersB, D, E

Agent Builder can leverage Dialogflow CX for advanced conversational design.

Why this answer

Options A, B, and D are correct. Vertex AI Agent Builder allows creating custom agents with memory and tools (A), integrates with Dialogflow CX (B), and provides built-in grounding with Google Search (D). Automated hyperparameter tuning (C) is a feature of Vertex AI Training, not Agent Builder.

Multimodal inputs (E) are supported by Gemini models but not a built-in capability of Agent Builder.

Full explanation →

229

Multi-Selecthard

Which THREE techniques are commonly used to improve the overall quality and coherence of generative model outputs? (Choose three.)

Select 3 answers

A.Using self-consistency or iterative refinement to choose the best output.

B.In-context learning (few-shot prompting) with relevant examples.

C.Applying output safety filters to remove inappropriate content.

D.Prompt chaining to decompose complex tasks into simpler sub-tasks.

E.Random sampling to increase output diversity.

AnswersA, B, D

Iterative methods improve reliability and coherence by selecting the most consistent response.

Why this answer

Options A, B, and E are correct. A: few-shot prompting provides examples that improve output structure. B: prompt chaining breaks complex tasks into steps, enhancing coherence.

E: iterative refinement (e.g., self-consistency) improves quality by generating multiple outputs and selecting the best. C is wrong because random sampling degrades quality. D is wrong because safety filters only block harmful content, not improve quality.

Full explanation →

230

MCQhard

A retail company is building a generative AI chatbot to assist customers with product recommendations and order tracking. The chatbot uses Vertex AI with Gemini 1.5 Pro, and the development team has implemented a Retrieval-Augmented Generation (RAG) pipeline using Vertex AI Search for grounding. The pipeline uses a vector store containing product descriptions and order history. During testing, the team observes that the chatbot sometimes provides incorrect order statuses—for example, claiming an order is 'shipped' when it is actually 'pending'. The team suspects the issue is related to how context is retrieved and used. The RAG pipeline currently retrieves the top 5 chunks based on cosine similarity from the vector store, and passes them as context to the model. The team is considering several changes to improve factual accuracy. Which single action would most effectively reduce hallucinations in this scenario?

A.Switch from Vertex AI Search to a different vector database like Pinecone.

B.Reduce the model temperature to 0.0 to make outputs more deterministic.

C.Increase the similarity score threshold for retrieval to 0.85 to filter out less relevant chunks.

D.Increase the top-K retrieval value to 10 to provide more context to the model.

AnswerC

Option A is correct because a higher threshold reduces irrelevant context, directly improving factual grounding.

Why this answer

Option C is correct because increasing the similarity score threshold to 0.85 ensures that only highly relevant chunks are passed to the Gemini 1.5 Pro model, directly reducing the risk of the model generating responses based on irrelevant or low-confidence context. In a RAG pipeline using Vertex AI Search, low-similarity chunks can contain order statuses from different customers or products, leading to hallucinations like incorrect order statuses. Filtering out these less relevant chunks improves the factual grounding of the model's output.

Exam trap

Google Cloud often tests the misconception that simply adding more context (higher top-K) or making the model more deterministic (lower temperature) will fix hallucinations, when the real issue is the relevance and quality of the retrieved context in a RAG pipeline.

How to eliminate wrong answers

Option A is wrong because switching to a different vector database like Pinecone does not address the core issue of retrieval relevance; the problem lies in the similarity threshold and chunk selection, not the database technology. Option B is wrong because reducing temperature to 0.0 makes the model more deterministic but does not fix the underlying issue of irrelevant or incorrect context being retrieved; the model will still confidently generate incorrect answers based on poor context. Option D is wrong because increasing top-K to 10 would retrieve more chunks, potentially including even more low-relevance or noisy context, which could worsen hallucinations rather than improve factual accuracy.

Full explanation →

231

MCQeasy

Refer to the exhibit. What is the most likely cause of this error?

A.The user does not have the required IAM role

B.The model is too large

C.The network is down

D.The project ID is incorrect

AnswerA

The error is a permission denial, meaning the user's IAM roles do not include 'aiplatform.models.upload'.

Why this answer

The error shown in the exhibit is an HTTP 403 Forbidden response, which indicates that the server understood the request but refuses to authorize it. In Google Cloud, this is most commonly caused by the user's identity lacking the necessary IAM role or permission to call the specific API or access the resource. Even if the project ID is correct and the network is functional, a missing IAM role (e.g., `aiplatform.user` or `roles/aiplatform.user`) will result in this exact error.

Exam trap

Google Cloud often tests the distinction between authentication (who you are) and authorization (what you can do), and the trap here is that candidates confuse a 403 Forbidden with a 404 Not Found or a network error, leading them to pick 'The project ID is incorrect' or 'The network is down' instead of recognizing the IAM permission failure.

How to eliminate wrong answers

Option B is wrong because model size does not cause an HTTP 403 error; a model that is too large would typically result in a 413 Payload Too Large or a resource-exhausted error, not an authorization failure. Option C is wrong because a network outage would produce a connectivity error (e.g., timeout, DNS resolution failure, or HTTP 502/503), not a 403 Forbidden response which requires a successful TCP connection and HTTP request to reach the server. Option D is wrong because an incorrect project ID would cause a 404 Not Found or a 400 Bad Request (e.g., 'Project not found'), not a 403 Forbidden; the 403 specifically indicates the request was received and the project exists, but the caller lacks authorization.

Full explanation →

232

MCQmedium

A developer is using Vertex AI Gemini API for a chatbot. The chatbot sometimes outputs harmful content. What is the best first step to mitigate this?

A.Fine-tune the model on curated safe data

B.Add a human-in-the-loop review

C.Use safety filters and safety settings in the API request

D.Switch to a smaller model

AnswerC

Safety settings directly filter harmful content at inference time.

Why this answer

Option C is correct because the Vertex AI Gemini API provides built-in safety filters and configurable safety settings (e.g., `safety_settings` parameter with categories like `HARM_CATEGORY_HARASSMENT` and thresholds like `BLOCK_ONLY_HIGH`) that allow developers to block harmful outputs at inference time without retraining. This is the fastest and most direct first step to mitigate harmful content, as it requires no additional infrastructure or model modification.

Exam trap

Google Cloud often tests the misconception that the first step to mitigate harmful content is to fine-tune the model, when in reality the immediate, low-cost, and recommended first step is to leverage the API's built-in safety filters and settings.

How to eliminate wrong answers

Option A is wrong because fine-tuning on curated safe data is a resource-intensive, secondary step that does not address immediate harmful outputs during inference and may not cover all edge cases of harmful content. Option B is wrong because adding a human-in-the-loop review introduces latency and cost, and is a reactive measure rather than a proactive first step to block harmful content at the API level. Option D is wrong because switching to a smaller model does not inherently reduce harmful outputs; smaller models can still generate harmful content and may have reduced capabilities for safe response generation.

Full explanation →

233

MCQhard

An organization wants to use a generative model to automatically generate legal contracts. The model must produce clauses that are not only grammatically correct but also legally enforceable and consistent with current jurisdiction laws. Which combination of techniques best ensures legal compliance?

A.Fine-tune a small model exclusively on legal contracts from a single jurisdiction and use it for generation.

B.Implement retrieval-augmented generation (RAG) with a vector database of all relevant laws.

C.Fine-tune a model on a diverse set of enforceable contracts and incorporate an external compliance verifier that uses rule-based checks.

D.Use a large instruction-tuned model with carefully engineered prompts describing jurisdiction details.

AnswerC

Fine-tuning imparts domain knowledge, and the verifier ensures legal correctness.

Why this answer

Option D is correct because fine-tuning on curated legal documents teaches domain-specific language and enforceability, while a verifier (an external logic/rule system) checks compliance with laws. Option A is incorrect because prompt engineering is unreliable for precise legal reasoning. Option B is incorrect because small model likely lacks legal knowledge.

Option C is incorrect because RAG retrieves but does not verify enforceability.

Full explanation →

234

MCQmedium

A financial services company is building a customer-facing chatbot using Vertex AI Gemini API to answer questions about account balances, transactions, and branch locations. The chatbot must adhere to strict data privacy regulations (e.g., GDPR) that prohibit sending personally identifiable information (PII) to the model provider. The architecture uses a retrieval-augmented generation (RAG) approach where customer queries are passed to a Cloud Run service, which queries a BigQuery database for relevant data and then sends the context along with the query to the Gemini API. The team is concerned that the context may contain PII. They want to minimize modifications to the existing architecture. What step should the team take to ensure compliance?

A.Build a separate anonymization pipeline using Cloud Data Loss Prevention to remove PII before sending context.

B.Modify the chatbot to reject any query that might contain PII based on regex patterns.

C.Route all requests through a third-party proxy that strips PII before sending to Gemini.

D.Configure the Gemini API to disable data logging and use the Enterprise tier that ensures data stays within Google Cloud's controls.

AnswerD

Enterprise features of Vertex AI provide data residency and no logging, satisfying compliance with minimal changes.

Why this answer

Option B is correct. Vertex AI Gemini API can be configured to not log prompts and responses, and data is processed within Google Cloud's infrastructure (data residency). Option A (third-party proxy) adds complexity and latency.

Option C (always reject PII queries) would break functionality. Option D (anonymizing pipeline) is costly and not minimal.

Full explanation →

235

MCQmedium

A company wants to use GenAI to automate customer support. They have a large knowledge base. Which approach maximizes ROI in the first 6 months?

A.Deploy a general-purpose chatbot without customization

B.Use a pre-built conversational AI platform with Retrieval-Augmented Generation (RAG)

C.Build a custom LLM from scratch using their data

D.Fine-tune a foundation model on historical support tickets

AnswerB

A pre-built platform with RAG allows rapid deployment and leverages existing knowledge base, maximizing ROI in the short term.

Why this answer

Option B maximizes ROI in the first 6 months because it leverages a pre-built conversational AI platform integrated with Retrieval-Augmented Generation (RAG). RAG allows the model to dynamically retrieve relevant information from the existing knowledge base at inference time, providing accurate, context-aware responses without the need for costly retraining or custom model development. This approach balances rapid deployment, low upfront investment, and high accuracy, making it the most cost-effective solution for automating customer support quickly.

Exam trap

Google Cloud often tests the misconception that fine-tuning is always the best way to incorporate proprietary data, but the trap here is that fine-tuning does not provide real-time access to a dynamic knowledge base and is far more resource-intensive than RAG, which is the optimal strategy for rapid, cost-effective deployment in customer support scenarios.

How to eliminate wrong answers

Option A is wrong because deploying a general-purpose chatbot without customization would rely solely on the model's pre-trained knowledge, which lacks access to the company's specific knowledge base, leading to frequent hallucinations and incorrect answers that degrade customer trust and require extensive human oversight. Option C is wrong because building a custom LLM from scratch using their data is prohibitively expensive (often millions of dollars) and time-consuming (typically 12+ months), far exceeding the 6-month ROI window and requiring massive computational resources and specialized ML teams. Option D is wrong because fine-tuning a foundation model on historical support tickets alone does not incorporate the live knowledge base; it only adapts the model to past conversation patterns, which may become stale or miss updated information, and still requires significant compute and data preparation costs without the real-time retrieval capability that RAG provides.

Full explanation →

236

MCQhard

A retail company has deployed a customer support chatbot using Vertex AI Agent Builder. The chatbot is configured with a knowledge base stored in BigQuery (user manuals) and Cloud Storage (product images). The agent uses a Gemini 1.5 Pro model for response generation. Users report that the chatbot frequently gives incorrect answers and sometimes does not reference the knowledge base at all. Logs show high latency (average response time > 10 seconds) and many responses are generic or hallucinated. The agent's grounding configuration currently uses the default settings. The development team is considering the following actions: A) Switch to a smaller model like Gemini 1.5 Flash to reduce latency. B) Increase the context window of the model to allow more knowledge base content. C) Enable Vertex AI Search for grounding and configure a search aggregation strategy that retrieves relevant documents from the knowledge base. D) Fine-tune the Gemini model with the company's historical chat logs to improve domain-specific responses. Which action should the team take FIRST to address the issues?

A.Switch to a smaller model like Gemini 1.5 Flash to reduce latency.

B.Enable Vertex AI Search for grounding and configure a search aggregation strategy that retrieves relevant documents from the knowledge base.

C.Increase the context window of the model to allow more knowledge base content.

D.Fine-tune the Gemini model with the company's historical chat logs to improve domain-specific responses.

AnswerB

This directly improves retrieval accuracy and ensures the model references the knowledge base, addressing both hallucination and latency (by retrieving only relevant content).

Why this answer

Option C is correct because the symptoms indicate the agent is not effectively retrieving and leveraging the knowledge base. Enabling grounding with Vertex AI Search and configuring search aggregation directly addresses the incorrect answers and lack of knowledge base usage. Reducing model size (A) might help latency but not accuracy.

Increasing context window (B) could hurt performance further. Fine-tuning (D) is costly and may not fix retrieval issues without proper grounding.

Full explanation →

237

MCQhard

A financial institution wants to use generative AI to generate personalized investment advice. They face strict regulatory requirements on explainability and bias. Which approach should they take?

A.Use a foundation model with prompt engineering

B.Use a custom model trained from scratch

C.Use a RAG system with curated proprietary data

D.Use a closed-source model with vendor lock-in

AnswerC

Enables control, explainability, and bias auditing.

Why this answer

Option C is correct because a Retrieval-Augmented Generation (RAG) system allows the financial institution to ground generative AI outputs in curated, proprietary data sources (e.g., regulatory guidelines, client risk profiles, historical performance). This approach enhances explainability by enabling traceable citations back to specific documents, and reduces bias by controlling the data fed to the model, which is critical for meeting strict regulatory requirements like GDPR or SEC rules on algorithmic fairness.

Exam trap

Google Cloud often tests the misconception that prompt engineering alone can solve domain-specific compliance needs, when in reality RAG is required to ground outputs in curated, auditable data for regulated industries.

How to eliminate wrong answers

Option A is wrong because prompt engineering alone on a foundation model does not guarantee explainability or bias control; the model may still generate outputs based on its pre-trained, opaque weights, making it impossible to trace advice to specific regulatory or proprietary data. Option B is wrong because training a custom model from scratch requires massive amounts of labeled, unbiased data and computational resources, and still risks hidden biases in the training process, while also lacking the built-in retrieval mechanism for transparent, auditable citations. Option D is wrong because a closed-source model with vendor lock-in limits the institution's ability to audit the model's internal logic, customize bias mitigation, or ensure compliance with evolving regulations, as the vendor controls all updates and data handling.

Full explanation →

238

MCQeasy

A data scientist is using the Gemini API to generate product descriptions for an e-commerce site. The descriptions are often too verbose and include speculative claims that are not in the product specifications. The scientist wants to reduce hallucinations and control the length of the output without retraining the model. What should they do?

A.Increase the max output token count to 2048 and decrease temperature to 0.1.

B.Refine the prompt to be concise and include instructions to stick to facts and limit output to 50 words.

C.Add three few-shot examples of short, factual descriptions.

D.Set temperature to 0.0 and top_k to 1.

AnswerB

Clear constraints in the prompt directly control length and hallucination.

Why this answer

Option A is correct because verbose prompts often lead to verbose output; simplifying the prompt reduces length. Adding explicit constraints like 'only use provided facts' reduces hallucination. Option B is wrong because increasing token count would make descriptions longer, not shorter.

Option C is wrong because lower temperature reduces randomness but doesn't prevent speculation. Option D is wrong because few-shot examples may help but do not directly enforce length or factuality.

Full explanation →

239

MCQmedium

A company fine-tunes a text model on internal HR policies. After deployment, the model sometimes outputs sensitive employee information. What is the most likely cause?

A.The fine-tuning dataset contained personally identifiable information that was not removed.

B.The model was not trained with reinforcement learning from human feedback (RLHF).

C.The model has insufficient parameters to generalize properly.

D.The prompt engineering was too verbose and included misleading instructions.

AnswerA

Models can memorize training data; including sensitive information leads to leakage.

Why this answer

The most likely cause is that the fine-tuning dataset contained personally identifiable information (PII) that was not properly scrubbed. During fine-tuning, the model learns patterns and memorizes specific sequences from the training data. If the dataset includes sensitive employee records, the model can reproduce that information verbatim when prompted, leading to data leakage.

This is a well-known risk in fine-tuning, as models can overfit to rare or unique examples in the training set.

Exam trap

Google Cloud often tests the misconception that RLHF or prompt engineering can fix data leakage issues, but the trap here is that the root cause is always the training data itself—no amount of post-hoc alignment or prompt tweaking can prevent the model from reproducing memorized sensitive content.

How to eliminate wrong answers

Option B is wrong because RLHF is a technique used to align model outputs with human preferences, not to prevent memorization of training data; it does not address the root cause of data leakage from the fine-tuning dataset. Option C is wrong because insufficient parameters would typically cause underfitting or poor generalization, not the exact reproduction of sensitive information; memorization is more likely with larger models that have higher capacity to store training examples. Option D is wrong because verbose or misleading prompt engineering might degrade output quality but cannot cause the model to output specific employee data that was not present in its training or fine-tuning data; the model can only generate information it has learned.

Full explanation →

240

MCQmedium

A retail company is deploying a generative AI chatbot on Vertex AI to provide product recommendations. The chatbot uses a base foundation model with no fine-tuning. Users report that the chatbot sometimes gives offensive or insensitive responses. The team must quickly implement safety controls without modifying the model. They also want to reduce irrelevant off-topic answers. Which combination of techniques should they apply?

A.Fine-tune the model on a curated dataset of safe retail conversations.

B.Set temperature to 0.0 and top_p to 0.1.

C.Enable Vertex AI Safety Filters and craft system instructions defining appropriate behavior.

D.Provide 50 few-shot examples of safe interactions.

AnswerC

Safety filters block harmful output and system instructions guide the model's tone and relevance.

Why this answer

Option D is correct because safety filters (e.g., Vertex AI Safety Settings) block harmful content, and prompt engineering with system instructions keeps the model on topic and respectful. Option A is wrong because temperature adjustment alone does not prevent toxicity. Option B is wrong because few-shot examples may not cover all safety scenarios.

Option C is wrong because fine-tuning is not allowed per the constraint (no model modification).

Full explanation →

241

MCQmedium

A financial institution wants to deploy a generative AI solution for contract analysis. They need to ensure compliance with regulations. Which approach is best?

A.Deploy a large open-source model fine-tuned on public legal documents

B.Use a general-purpose pre-trained model with no modifications to minimize risk

C.Fine-tune a model on a curated dataset of past contracts and implement human-in-the-loop review

D.Implement retrieval-augmented generation (RAG) with the company's legal document database

AnswerC

Fine-tuning on relevant data improves accuracy, and human review catches any regulatory violations before finalization.

Why this answer

Option C is best because fine-tuning on a curated dataset of past contracts ensures the model learns domain-specific language and compliance patterns, while human-in-the-loop review provides a critical safety net for regulatory adherence. This combination directly addresses the need for accuracy and accountability in contract analysis, where errors can have legal consequences.

Exam trap

Google Cloud often tests the misconception that retrieval-augmented generation (RAG) alone is sufficient for domain-specific compliance, when in fact it requires fine-tuning or strict validation to prevent misinterpretation of retrieved legal texts.

How to eliminate wrong answers

Option A is wrong because deploying a large open-source model fine-tuned on public legal documents introduces risks from unvetted, potentially outdated or jurisdictionally inappropriate data, and lacks the controlled curation needed for compliance. Option B is wrong because a general-purpose pre-trained model with no modifications will lack the specialized knowledge of contract law, regulatory terms, and clause structures, leading to high error rates and non-compliance. Option D is wrong because retrieval-augmented generation (RAG) with the company's legal document database, while useful for grounding responses, does not inherently train the model on compliance patterns and still requires careful prompt engineering and validation to avoid hallucinations in critical contract analysis.

Full explanation →

242

MCQmedium

Refer to the exhibit. A team attempted to start a model tuning job but received the error 'Quota limit exceeded for tuning jobs in region us-central1'. What is the most appropriate action?

A.Request a quota increase for tuning jobs in us-central1

B.Change the region to us-west1 and retry

C.Reduce the size of the training data

D.Use a different base model

AnswerA

Correct: Quota issues are resolved by requesting a higher limit.

Why this answer

The error indicates that the quota for tuning jobs in us-central1 has been reached. Requesting a quota increase directly addresses the issue. Using a different region or model might work but is not the first recommended step.

Full explanation →

243

MCQeasy

A developer wants to generate product descriptions from a list of features using Vertex AI. Which model type is best suited for this task?

A.An embedding model (e.g., textembedding-gecko@001).

B.A chat model (e.g., chat-bison@001).

C.A text generation model (e.g., text-bison@001).

D.A code generation model (e.g., code-bison@001).

AnswerC

Text generation models are ideal for generative tasks from prompts.

Why this answer

Option C is correct because text-bison@001 is a dedicated text generation model optimized for tasks like summarization, translation, and content creation from structured inputs. It can take a list of features as a prompt and generate coherent, descriptive product descriptions without needing conversational context or code-specific outputs.

Exam trap

The trap here is that candidates may confuse 'text generation' with 'chat' or 'embedding' models, assuming any generative model can handle the task, but Vertex AI separates these by specialization, and the exam tests awareness of which model class is purpose-built for non-conversational, non-code text creation.

How to eliminate wrong answers

Option A is wrong because embedding models like textembedding-gecko@001 are designed to convert text into numerical vectors for similarity search or clustering, not for generating new text. Option B is wrong because chat models like chat-bison@001 are optimized for multi-turn conversational interactions, not for single-turn structured generation tasks like producing descriptions from a feature list. Option D is wrong because code generation models like code-bison@001 are specialized for generating programming code, not natural language product descriptions.

Full explanation →

244

MCQmedium

A developer receives the above JSON response from a Vertex AI PaLM API call for a medical advice application. What should the developer be most concerned about?

A.The safety score is very low (0.01)

B.The deployed model ID is not recognized

C.The output falls under the 'health' category, which may require compliance with regulations

D.The prediction content is incorrect

AnswerC

Health-related outputs need careful review.

Why this answer

Option C is correct because the JSON response includes a 'category' field with the value 'health', which triggers stringent regulatory compliance requirements such as HIPAA in the US or GDPR in Europe. For a medical advice application, the developer must ensure data handling, model transparency, and output validation meet these legal standards, as failure could result in severe penalties. The PaLM API's safety attributes and category labels are designed to flag such sensitive domains, making compliance the primary concern over other technical issues.

Exam trap

Google Cloud often tests the misconception that low safety scores or incorrect content are the primary risks, when in fact regulatory compliance for sensitive categories like 'health' is the most critical and non-obvious concern that developers must address first.

How to eliminate wrong answers

Option A is wrong because a safety score of 0.01 is not inherently concerning; it may indicate low confidence in the safety assessment rather than actual unsafe content, and the PaLM API uses separate blocking thresholds (e.g., safety_settings) to filter harmful outputs. Option B is wrong because the deployed model ID (e.g., 'text-bison@001') is a standard identifier for the PaLM model version, and unrecognized IDs typically cause API errors or fallback behavior, not a primary concern for a valid response. Option D is wrong because the prediction content's correctness is a secondary validation issue that can be addressed through prompt engineering or post-processing, whereas regulatory compliance is a non-negotiable legal requirement that must be handled before deployment.

Full explanation →

245

Multi-Selectmedium

A data scientist is selecting a base model for generating Python code. Which TWO factors are most important to consider?

Select 2 answers

A.Model's license (proprietary vs open-source).

B.Model's performance on coding benchmarks like HumanEval.

C.Model's support for multiple programming languages.

D.Model's training data recency.

E.Model's parameter count (size).

AnswersA, B

License determines usage rights and compliance.

Why this answer

Option A is correct because the model's license determines whether the generated code can be used in commercial products without violating copyright or requiring attribution. Proprietary models may impose restrictions on output usage, while open-source models (e.g., CodeLlama, StarCoder) offer more flexibility for enterprise deployment. This is critical for compliance and intellectual property management in production environments.

Exam trap

Google Cloud often tests the misconception that larger parameter counts or broader language support are more important than licensing and benchmark performance, leading candidates to overlook the legal and functional constraints of deploying a code generation model in a business context.

Full explanation →

246

MCQhard

A company is using Vertex AI for multimodal generative AI to analyze images and text. They need to ensure that the model's outputs are auditable and can be traced back to the input data. Which feature should they enable?

A.Vertex AI Feature Store

B.Vertex AI Experiments

C.Cloud Logging

D.Vertex AI Model Monitoring with Explainable AI

AnswerD

Model Monitoring with Explainable AI provides attribution and traceability.

Why this answer

Option D is correct because Vertex AI Model Monitoring with Explainable AI provides feature attributions that map model predictions back to specific input features (e.g., pixels in images or tokens in text). This creates an auditable trail by quantifying how each input contributed to the output, enabling traceability for compliance and debugging. The other options lack the direct input-to-output attribution required for auditability.

Exam trap

The trap here is that candidates confuse operational logging (Cloud Logging) or experiment tracking (Vertex AI Experiments) with the specific need for input-to-output attribution, which only Explainable AI provides for auditability.

How to eliminate wrong answers

Option A is wrong because Vertex AI Feature Store is a centralized repository for storing, serving, and sharing feature data, but it does not provide per-prediction attribution or traceability from model outputs back to specific inputs. Option B is wrong because Vertex AI Experiments tracks training runs, hyperparameters, and metrics, but it focuses on model development history, not on explaining individual inference outputs. Option C is wrong because Cloud Logging captures operational logs (e.g., API calls, errors) but does not generate feature-level explanations or attributions that link a specific output to its input data.

Full explanation →

247

MCQhard

A company is deploying a large language model on Vertex AI for real-time inference. They observe high latency and want to optimize. They have already enabled model caching. What next step should they take to reduce latency?

A.Add more GPUs to the prediction endpoint

B.Use a larger, more accurate model variant

C.Increase the batch size for inference requests

D.Apply model quantization to reduce precision

AnswerD

Quantization speeds up inference with minimal accuracy loss.

Why this answer

Option D is correct because reducing model precision (e.g., int8 quantization) can significantly reduce compute time and latency. Option A is wrong because increasing batch size increases latency per request. Option B is wrong while scaling predicts may help, it's not a model optimization.

Option C is wrong because using a larger model increases latency.

Full explanation →

248

MCQhard

A company has fine-tuned a foundation model on proprietary data. During evaluation, they find the model performs well on seen examples but poorly on unseen but similar tasks. What is the problem?

A.Underfitting

B.Catastrophic forgetting

C.Distribution shift

D.Domain shift between fine-tuning and deployment

AnswerD

Domain shift causes poor generalization to similar but different tasks.

Why this answer

Option D is correct because the model performs well on seen examples (fine-tuning distribution) but poorly on unseen but similar tasks (deployment distribution), which is a classic symptom of domain shift. This occurs when the fine-tuning data does not fully represent the deployment environment, causing the model to fail on inputs that differ in subtle but systematic ways from the training distribution. The model has not generalized to the target domain despite being well-fitted to the source domain.

Exam trap

Google Cloud often tests the distinction between distribution shift (a broad category) and domain shift (a specific type), so candidates mistakenly pick 'distribution shift' without recognizing that the question explicitly describes a domain mismatch between fine-tuning and deployment.

How to eliminate wrong answers

Option A is wrong because underfitting would cause poor performance on both seen and unseen examples, not good performance on seen examples alone. Option B is wrong because catastrophic forgetting refers to a model losing previously learned knowledge when fine-tuned on new data, but here the model retains performance on seen examples, indicating no forgetting occurred. Option C is wrong because distribution shift is a broad term that includes domain shift, but the specific scenario described—good performance on seen tasks but poor on similar unseen tasks—is precisely domain shift between fine-tuning and deployment, not a general covariate or label shift.

Full explanation →

249

MCQhard

A company is using a fine-tuned LLM for generating financial reports. They need to ensure that the output complies with regulatory standards and does not include speculative content. Which combination of techniques should they implement?

A.Increase the model's safety settings to maximum, use a low top-p value, and limit output tokens.

B.Fine-tune the model on historical compliant reports, use RAG with a regulatory database, and implement a human-in-the-loop review.

C.Use a larger model with more parameters and rely on its inherent knowledge.

D.Use a system instruction to adhere to regulations, set temperature to 0.0, and apply a keyword filter.

AnswerB

Combines domain adaptation, real-time grounding, and human oversight.

Why this answer

Option A is correct because fine-tuning on historical compliant reports, using RAG with a regulatory database, and implementing a human-in-the-loop review provides multiple layers of compliance. Option B (system instruction with low temperature and keyword filter) is insufficient for complex regulations. Option C (safety settings and low top-p) may block non-speculative content and doesn't ensure compliance.

Option D (larger model) does not guarantee compliance.

Full explanation →

250

Multi-Selecthard

A team is evaluating generative AI models for a content moderation system. Which THREE metrics are most important to assess?

Select 3 answers

A.Percentage of outputs flagged by safety filters.

B.Cost per million tokens.

C.Inference latency under expected load.

D.BLEU score against human-written moderation guidelines.

E.Precision and recall on a test set of moderated content.

AnswersA, C, E

Indicates how often the model generates unsafe content.

Why this answer

Option A is correct because safety filters are a primary mechanism for detecting and blocking harmful or policy-violating content in generative AI outputs. In a content moderation system, the percentage of outputs flagged by these filters directly measures the model's tendency to produce unsafe content, which is critical for maintaining platform safety and compliance.

Exam trap

Google Cloud often tests the distinction between metrics that measure model performance on the task (safety, accuracy) versus metrics that measure operational or linguistic qualities (cost, BLEU), leading candidates to mistakenly include cost or BLEU as primary assessment criteria for content moderation.

Full explanation →

251

Multi-Selecthard

Which THREE of the following are key considerations when deploying a generative AI model in a production environment with strict latency requirements? (Choose three.)

Select 3 answers

A.Deploy the largest model variant available to ensure highest quality.

B.Implement speculative decoding to generate candidate tokens with a smaller draft model and verify with the large model.

C.Use model quantization (e.g., int8) to reduce precision and speed up matrix multiplications.

D.Cache the key-value caches from previous decoding steps to avoid redundant computation.

E.Increase the inference batch size to maximize GPU utilization.

AnswersB, C, D

Speculative decoding significantly reduces time per token.

Why this answer

Option B is correct because speculative decoding uses a smaller, faster draft model to generate candidate tokens, which are then verified by the large model in parallel. This reduces the number of sequential autoregressive steps, significantly lowering latency while maintaining output quality.

Exam trap

Google Cloud often tests the distinction between latency and throughput, so the trap here is that candidates confuse batch size (which improves throughput) with latency reduction, or assume larger models always yield better performance without considering inference speed.

Full explanation →

252

MCQhard

A global company deploying gen AI across multiple regions needs to minimize latency and comply with data sovereignty. What architecture should they adopt?

A.Single global deployment with CDN

B.Multi-region deployment with Vertex AI

C.Use a third-party API

D.On-premises deployment only

AnswerB

Vertex AI supports deploying models in multiple regions, reducing latency and enabling data residency compliance.

Why this answer

Option D is correct because multi-region deployment with Vertex AI allows serving models close to users (low latency) while adhering to data residency requirements. Option A is wrong because a single global deployment may violate data sovereignty and increase latency. Option B is wrong because on-premises deployment is costly and limits scalability.

Option C is wrong because third-party APIs may not offer multi-region data control.

Full explanation →

253

MCQmedium

You are an ML engineer at a retail company. You have deployed a generative AI model on Vertex AI to generate product descriptions. The model uses a custom container and is deployed to a single endpoint. Recently, you noticed that inference latency has increased significantly during peak hours, causing timeouts. You have checked the logs and found that the CPU utilization on the deployed instances is consistently above 90% during peak hours. The model is currently deployed with a single machine type (n1-standard-4) and no scaling. You need to reduce latency without incurring excessive cost. What should you do?

A.Optimize the model using quantization and reduce the number of replicas

B.Switch to batch prediction instead of online prediction

C.Change the machine type to n1-standard-8 and enable autoscaling with min replicas=1, max replicas=5

D.Add a GPU accelerator to the existing machine

AnswerC

More CPU and autoscaling handle peak load efficiently.

Why this answer

Option C is correct because upgrading to a larger machine (n1-standard-8) provides more CPU cores to handle the increased inference workload, while enabling autoscaling (min=1, max=5) allows the deployment to dynamically add replicas during peak hours to distribute the load and reduce latency. This combination addresses the high CPU utilization without over-provisioning during off-peak times, thus controlling cost.

Exam trap

The trap here is that candidates often assume adding a GPU (Option D) is always the best way to reduce inference latency, but for CPU-bound models with high utilization, scaling out with more replicas and a larger CPU machine is more cost-effective and directly addresses the bottleneck.

How to eliminate wrong answers

Option A is wrong because quantization reduces model size and can improve latency, but reducing the number of replicas would worsen the bottleneck by decreasing capacity, not help with high CPU utilization. Option B is wrong because batch prediction is designed for asynchronous, non-real-time processing and does not solve online inference latency during peak hours; it would change the use case entirely. Option D is wrong because adding a GPU accelerator to the existing n1-standard-4 machine would not address the CPU bottleneck (inference is CPU-bound in this scenario) and would increase cost unnecessarily without guaranteed latency improvement for a CPU-bound model.

Full explanation →

254

MCQhard

The exhibit shows the deployment configuration for a conversational AI model used in a finance application. Users report that responses are creative but often contain factually incorrect financial advice. Which parameter change would most improve factual accuracy?

A.Add grounding sources, such as "EnterpriseSearch" or "Web"

B.Lower temperature to 0.1

C.Increase topP to 1.0

D.Increase maxOutputTokens to 1024

AnswerA

Grounding forces the model to base responses on real data, directly improving factual accuracy.

Why this answer

Option B is correct because grounding sources (e.g., Google Search or a knowledge base) inject real-world facts into responses, reducing hallucination. Option A (lower temperature) would reduce creativity but not directly fix factual inaccuracies. Option C (increased max tokens) addresses length, not facts.

Option D (top_p increase) would make output even more variable.

Full explanation →

255

MCQhard

A media company is using a generative AI model to create video captions. The model is deployed on Vertex AI with autoscaling. During peak hours, they observe high latency and request timeouts. Which action would most effectively address this issue?

A.Optimize the prompt to reduce output length

B.Reduce the maximum number of replicas to limit resource usage

C.Switch to a GPU-based machine type for faster inference

D.Increase the minimum number of replicas in the autoscaling configuration

AnswerD

Higher minimum replicas reduce cold starts and improve latency during traffic spikes.

Why this answer

Increasing the minimum number of replicas ensures that during peak hours, the model already has a baseline of warm instances ready to handle requests, reducing cold-start latency and preventing timeouts. Autoscaling can take time to spin up new replicas, so a higher minimum replica count directly mitigates the latency spike by pre-provisioning capacity.

Exam trap

The trap here is that candidates confuse performance optimization (faster inference per request) with capacity planning (ensuring enough concurrent replicas), leading them to choose GPU upgrades or prompt tweaks instead of addressing the autoscaling configuration.

How to eliminate wrong answers

Option A is wrong because optimizing the prompt to reduce output length may lower per-request compute time but does not address the root cause of insufficient concurrent serving capacity during traffic spikes. Option B is wrong because reducing the maximum number of replicas would cap the autoscaler's ability to add instances, worsening the bottleneck and increasing timeouts. Option C is wrong because switching to a GPU-based machine type can accelerate inference per request but does not solve the scaling issue; it may even increase cold-start time and cost without guaranteeing enough replicas to handle peak load.

Full explanation →

256

MCQmedium

After deploying a text-to-image model, the output images often contain distorted objects. The team suspects the prompt is too complex. Which prompt engineering technique should they try first?

A.Increase the guidance scale.

B.Add more descriptive adjectives.

C.Use a negative prompt to exclude distortions.

D.Break the prompt into simpler, separate steps.

AnswerD

Simpler prompts reduce the risk of distortion.

Why this answer

Option A is correct because breaking the prompt into simpler, separate steps reduces complexity and helps the model generate each part correctly. Option B (adding more adjectives) increases complexity. Option C (increasing guidance scale) can amplify distortions.

Option D (using a negative prompt) helps but is not the first step for simplifying the prompt.

Full explanation →

257

MCQhard

A financial services company wants to use Vertex AI Grounding with enterprise data to power a regulatory compliance chatbot. They have strict data residency requirements: data must remain in the EU. What should they do?

A.Enable Data Residency by selecting a EU region during data store creation

B.Use a VPN to the US region

C.Convert data to private tokens

D.Use a Vertex AI endpoint in a European region

AnswerA

Data stores for grounding are region-specific; selecting EU ensures data stays in EU.

Why this answer

Option B is correct because when creating a data store for grounding, you must select a region in the EU to meet data residency. Option A is wrong because the endpoint region does not determine data storage location. Option C is wrong because a VPN does not change data residency.

Option D is wrong because private tokens are unrelated to residency.

Full explanation →

258

MCQhard

A team deployed a fine-tuned model for code generation. After training, the model produces syntactically correct but functionally wrong code. What is the most likely cause?

A.Incorrect prompt format

B.Low temperature setting

C.Insufficient training epochs

D.Overfitting to training data

AnswerD

Model memorizes training examples, losing generalization.

Why this answer

Option D is correct because overfitting to training data causes the model to memorize specific code patterns and syntax from the training set without learning the underlying logic or functional requirements. This results in syntactically correct outputs that fail to generalize to new, unseen coding tasks, producing functionally wrong code despite proper syntax.

Exam trap

Google Cloud often tests the misconception that syntactically correct but functionally wrong code is caused by prompt or temperature issues, when in fact it is a classic sign of overfitting where the model memorizes syntax without understanding logic.

How to eliminate wrong answers

Option A is wrong because incorrect prompt format typically leads to malformed or irrelevant outputs, not syntactically correct but functionally wrong code; the model would likely produce gibberish or off-topic responses. Option B is wrong because low temperature setting reduces randomness and makes outputs more deterministic, which would actually improve syntactic correctness and consistency, not cause functional errors. Option C is wrong because insufficient training epochs would result in underfitting, where the model fails to learn even basic syntax and produces incomplete or incoherent code, not syntactically correct but functionally wrong outputs.

Full explanation →

259

Multi-Selectmedium

Which THREE are best practices for responsible deployment of generative AI in a customer-facing application?

Select 3 answers

A.Implement human-in-the-loop review for sensitive outputs

B.Train the model on all available data to maximize coverage

C.Implement content filters to block inappropriate outputs

D.Use only small models to reduce risk

E.Conduct regular bias and fairness audits

AnswersA, C, E

Human review adds accountability and error correction.

Why this answer

Option A is correct because human-in-the-loop (HITL) review ensures that sensitive outputs—such as those involving protected health information (PHI), personally identifiable information (PII), or high-stakes decisions—are vetted by a human before reaching the customer. This mitigates the risk of harmful or biased generations that automated guardrails might miss, aligning with responsible AI principles like accountability and safety.

Exam trap

Google Cloud often tests the misconception that 'more data is always better' or that 'smaller models are safer,' when in fact responsible deployment hinges on data quality, continuous monitoring, and layered safeguards rather than model size or data volume alone.

Full explanation →

260

MCQeasy

Refer to the exhibit. A company has this IAM policy on a Vertex AI project. Alice complains she cannot create a new model. What is the most likely reason?

A.She needs the roles/aiplatform.modelUser role

B.She needs the roles/aiplatform.admin role

C.She needs the roles/aiplatform.modelAdmin role

D.She needs the roles/aiplatform.modelCreator role

AnswerB

Admin role has full access, including model creation.

Why this answer

The IAM policy shown in the exhibit likely grants only basic permissions (e.g., roles/aiplatform.user) which do not include the ability to create models. The roles/aiplatform.admin role provides full administrative access, including model creation, deletion, and management across the Vertex AI project. Without this role, Alice lacks the necessary aiplatform.models.create permission, which is why she cannot create a new model.

Exam trap

Google Cloud often tests the misconception that there is a specific 'modelCreator' or 'modelAdmin' role, when in fact Vertex AI uses a flat admin role (roles/aiplatform.admin) for all create/delete operations, and candidates confuse custom role names with predefined ones.

How to eliminate wrong answers

Option A is wrong because roles/aiplatform.modelUser only grants read-only access to models (e.g., deploy and predict), not create permissions. Option C is wrong because roles/aiplatform.modelAdmin does not exist as a predefined role in Vertex AI; the correct role for model administration is roles/aiplatform.admin. Option D is wrong because roles/aiplatform.modelCreator is not a predefined IAM role in Vertex AI; model creation is covered by roles/aiplatform.admin or custom roles with aiplatform.models.create permission.

Full explanation →

261

MCQmedium

After fine-tuning a foundation model on company emails, the model outputs confidential information. What is the most likely cause?

A.The prompt is too vague

B.The model is too large

C.The fine-tuning dataset was not anonymized

D.Overfitting to the training data leading to memorization

AnswerC

Unanonymized data can be memorized and reproduced by the model.

Why this answer

Option C is correct because the most likely cause of a fine-tuned model outputting confidential information is that the fine-tuning dataset contained sensitive data that was not anonymized. During fine-tuning, the model learns patterns and can memorize specific sequences, including confidential details like names, addresses, or proprietary information, which it then reproduces in responses. This is a well-known data leakage risk in fine-tuning workflows.

Exam trap

Google Cloud often tests the distinction between a model's inherent behavior (like overfitting) and the root cause in the data pipeline, so candidates mistakenly choose overfitting (Option D) instead of recognizing that the dataset itself was the source of the confidential information.

How to eliminate wrong answers

Option A is wrong because a vague prompt may lead to irrelevant or generic outputs, but it does not directly cause the model to output specific confidential information that was not present in the training data. Option B is wrong because model size (number of parameters) does not inherently cause memorization of confidential data; memorization is a function of training data exposure and fine-tuning methodology, not model scale alone. Option D is wrong because while overfitting can lead to memorization, the root cause is the presence of unanonymized confidential data in the fine-tuning dataset; overfitting is a symptom, not the primary cause, and the question asks for the 'most likely cause'.

Full explanation →

262

MCQhard

A healthcare provider wants to use generative AI to automatically draft clinical notes from doctor-patient conversations. They must comply with HIPAA and ensure patient data privacy. Which strategy best meets their requirements?

A.Outsource note generation to a third-party HIPAA-compliant vendor

B.Use Google Cloud Healthcare API integrated with Vertex AI

C.Deploy a custom model on-premises with strict access controls

D.Use a public LLM with a data anonymization pipeline

AnswerB

The Healthcare API is HIPAA-compliant and allows secure AI processing.

Why this answer

Option B is correct because Google Cloud Healthcare API with Vertex AI provides a HIPAA-compliant, managed environment that integrates generative AI capabilities directly with healthcare data. The Healthcare API enforces data residency, access controls, and audit logging, while Vertex AI allows fine-tuning or using foundation models without exposing PHI to public endpoints. This combination ensures patient data privacy and regulatory compliance without requiring on-premises infrastructure.

Exam trap

Google Cloud often tests the misconception that on-premises deployment (Option C) is always the most secure choice, but the trap here is that cloud-native HIPAA-compliant services like Google Cloud Healthcare API can offer superior security, compliance, and scalability when properly configured with BAAs and data residency controls.

How to eliminate wrong answers

Option A is wrong because outsourcing to a third-party vendor introduces additional risk of data exposure during transmission and requires extensive Business Associate Agreements (BAAs) and due diligence, which may not fully align with the provider's direct control over privacy. Option C is wrong because deploying a custom model on-premises, while secure, is often cost-prohibitive and lacks the scalability and managed compliance features of cloud-native solutions like Google Cloud Healthcare API, which already handles HIPAA requirements. Option D is wrong because using a public LLM with a data anonymization pipeline is risky; anonymization is not foolproof and can be reversed via re-identification attacks, and public LLMs typically do not offer HIPAA-compliant data processing guarantees, violating privacy requirements.

Full explanation →

263

MCQmedium

A company is building a search application that requires grounding answers in their internal knowledge base. They want to use Vertex AI Search and Conversation with a custom datastore. Which configuration is essential to ensure the model only answers based on their documents?

A.Enable streaming responses to get real-time answers.

B.Fine-tune the model on the company's documents.

C.Configure the answer generation to use grounding with the enterprise datastore as the source.

D.Set the model's temperature to 0 to make responses deterministic.

AnswerC

C is correct because grounding ties the answer to the datastore content.

Why this answer

Option C is correct because Vertex AI Search and Conversation provides a built-in grounding capability that explicitly ties answer generation to a specified enterprise datastore. By configuring grounding with the custom datastore as the source, the model is constrained to retrieve and synthesize answers exclusively from the indexed documents, preventing reliance on its parametric knowledge or external sources.

Exam trap

Google Cloud often tests the distinction between techniques that influence output style (temperature, streaming) versus those that control knowledge sources (grounding), leading candidates to confuse deterministic generation with factual grounding.

How to eliminate wrong answers

Option A is wrong because enabling streaming responses controls the delivery mechanism (real-time token-by-token output) but does not restrict the model's knowledge source; it can still generate answers from its training data. Option B is wrong because fine-tuning adapts the model's weights to the company's documents, which can improve relevance but does not guarantee grounding—the model may still hallucinate or use pre-training knowledge, and Vertex AI Search does not require fine-tuning for retrieval-augmented generation. Option D is wrong because setting temperature to 0 makes responses deterministic (low randomness) but does not enforce grounding; the model can still confidently produce incorrect answers from its internal knowledge.

Full explanation →

264

Multi-Selecthard

Which THREE factors should be considered when selecting a foundation model for a generative AI application in a regulated industry?

Select 3 answers

A.Transparency of the model's training data and sources

B.Support for data residency and sovereignty requirements

C.Latency and throughput requirements

D.Size of the model in terms of parameters

E.Bias and fairness evaluation results

AnswersA, B, E

Regulated industries require understanding of data provenance to ensure compliance.

Why this answer

Option A is correct because in regulated industries (e.g., healthcare, finance), transparency of training data and sources is critical for compliance with regulations like GDPR or HIPAA. Without knowing the provenance and composition of the training data, an organization cannot audit for prohibited content, verify consent, or ensure the model does not inadvertently expose sensitive information. This transparency directly impacts the ability to perform due diligence and meet legal obligations for data usage.

Exam trap

Google Cloud often tests the misconception that technical performance metrics (like latency or parameter count) are primary selection criteria for regulated industries, when in fact governance factors like transparency, data residency, and bias evaluation are the non-negotiable requirements.

Full explanation →

265

Multi-Selectmedium

A team wants to reduce hallucinations in a question-answering model. Which THREE techniques should they consider?

Select 3 answers

A.Fine-tune the model on a curated factual dataset

B.Use retrieval-augmented generation (RAG)

C.Apply prompt engineering with specific instructions to cite sources

D.Reduce the number of tokens in output

E.Increase the temperature parameter

AnswersA, B, C

Fine-tuning on factual data improves accuracy.

Why this answer

Fine-tuning on a curated factual dataset directly adjusts the model's weights to prioritize accurate, domain-specific knowledge, reducing the likelihood of generating unsupported or hallucinated content. This technique anchors the model's output in verified data, making it more reliable for question-answering tasks.

Exam trap

Google Cloud often tests the misconception that reducing output length or increasing randomness (temperature) can improve factual accuracy, when in reality these parameters control style and creativity, not truthfulness.

Full explanation →

266

MCQmedium

A financial technology company has deployed a custom-tuned PaLM 2 model on Vertex AI to generate personalized investment recommendations for retail clients. The model was fine-tuned on a corpus of historical market data and advisory transcripts. Recently, the compliance team flagged that several recommendations contradicted SEC guidelines, and the model sometimes repeated prohibited statements from outdated training materials. The team has already implemented safety filters (e.g., blocking toxic content) and adjusted the model's system instructions to be more conservative. However, the issues persist. The model's deployment parameters are: temperature=0.4, top_p=0.9, max_output_tokens=500, and no grounding. The company must maintain compliance without significantly increasing latency. What should they do next?

A.Increase temperature to 0.7 to allow more diverse responses, and add a second model to verify outputs

B.Perform an additional fine-tuning round exclusively on the most recent SEC regulatory filings and compliance-approved content

C.Implement a chain-of-thought prompting technique that requires the model to explain its reasoning step by step

D.Configure Vertex AI grounding using a curated data store of real-time SEC regulations and market data

AnswerD

Grounding with an authoritative, live data source directly ensures outputs comply with current regulations and eliminates reliance on outdated training data.

Why this answer

Option D is correct because grounding with real-time authoritative sources (e.g., up-to-date SEC regulations and market data) ensures outputs are based on current, compliant information, directly addressing the root cause of outdated or prohibited content. Option A (fine-tuning) may introduce bias and doesn't guarantee real-time accuracy. Option B (temperature increase) would worsen variability.

Option C (chain-of-thought) can improve reasoning but does not anchor to current compliance data.

Full explanation →

267

MCQeasy

A startup is building a customer service chatbot that generates responses in real-time. They want the model to have up-to-date information on the latest product catalog but cannot afford frequent fine-tuning. Which technique should they use to inject current data into the model without retraining?

A.Rely on the model's zero-shot capabilities to infer product details.

B.Use retrieval-augmented generation (RAG) to fetch relevant documents from a vector database at inference time.

C.Craft detailed system prompts that include the entire product catalog in the prompt.

D.Fine-tune the base model weekly on the latest product catalog.

AnswerB

RAG enables the model to access external, up-to-date information without retraining.

Why this answer

Retrieval-Augmented Generation (RAG) is the correct technique because it allows the chatbot to fetch the most current product catalog entries from an external vector database at inference time, without requiring any model retraining. This keeps responses grounded in up-to-date information while avoiding the cost and latency of frequent fine-tuning.

Exam trap

Google Cloud often tests the distinction between in-context learning (via RAG or prompt engineering) and parametric knowledge (via fine-tuning), trapping candidates who think that simply adding more data to the prompt is scalable or that zero-shot inference can substitute for external retrieval.

How to eliminate wrong answers

Option A is wrong because zero-shot capabilities rely solely on the model's pre-existing knowledge, which cannot incorporate new or updated product catalog details without retraining. Option C is wrong because crafting detailed system prompts with the entire product catalog would exceed the model's context window limits and incur high token costs, making it impractical for real-time inference. Option D is wrong because fine-tuning weekly is expensive, time-consuming, and contradicts the requirement to avoid frequent retraining; it also risks catastrophic forgetting of previously learned information.

Full explanation →

268

MCQmedium

An ML engineer sees the above deployment output. The business wants to reduce inference cost. Which action should they take?

A.Use a larger model

B.Change to a lower-cost machine type

C.Deploy to multiple regions

D.Increase traffic split

AnswerB

Using a smaller machine type reduces per-request compute cost.

Why this answer

Option B is correct because switching to a lower-cost machine type directly reduces the per-request compute cost without altering the model architecture or inference logic. This is a common cost-optimization strategy in cloud-based ML deployments, where instance types (e.g., from GPU to CPU or from a larger to a smaller GPU) can be selected based on latency and throughput requirements, provided the model fits within the machine's memory and compute constraints.

Exam trap

Google Cloud often tests the misconception that 'more resources' (larger model, more regions) always improves performance, but here the business goal is cost reduction, so the correct action is to downsize infrastructure while maintaining acceptable quality.

How to eliminate wrong answers

Option A is wrong because using a larger model increases both memory footprint and compute operations per inference, which raises cost and latency—the opposite of the business goal. Option C is wrong because deploying to multiple regions adds infrastructure overhead, data transfer costs, and management complexity, increasing rather than reducing inference cost. Option D is wrong because increasing traffic split (e.g., routing more requests to a shadow or canary deployment) does not reduce cost; it may increase resource utilization or require additional compute capacity.

Full explanation →

269

MCQeasy

A retail company wants to build a chatbot that answers product questions and provides personalized recommendations. They have a small labeled dataset and limited ML expertise. Which approach should they take?

A.Fine-tune Gemini with their product data using Vertex AI Generative AI Studio.

B.Build a custom transformer model using TensorFlow on Vertex AI Workbench.

C.Use BigQuery ML to train a classification model on customer queries.

D.Use Vertex AI Agent Builder with a pre-built agent and integrate their product catalog via Search and Conversation.

AnswerD

A is correct because it leverages managed services with minimal ML effort.

Why this answer

Option D is correct because Vertex AI Agent Builder provides a pre-built agent framework that integrates with Search and Conversation, allowing the company to quickly deploy a chatbot using their product catalog without needing extensive ML expertise. This approach leverages Google's foundation models and retrieval-augmented generation (RAG) to answer product questions and generate personalized recommendations, making it ideal for a small labeled dataset and limited ML resources.

Exam trap

Google Cloud often tests the misconception that fine-tuning or custom model building is necessary for domain-specific tasks, when in fact pre-built agent frameworks with RAG can achieve the same goal with far less data and expertise.

How to eliminate wrong answers

Option A is wrong because fine-tuning Gemini with a small labeled dataset risks overfitting and requires significant ML expertise to manage the fine-tuning pipeline, which the company lacks. Option B is wrong because building a custom transformer model from scratch using TensorFlow on Vertex AI Workbench demands deep ML expertise and large datasets, contradicting the company's constraints. Option C is wrong because BigQuery ML is designed for structured data classification (e.g., SQL-based models), not for building conversational chatbots that handle natural language queries and recommendations.

Full explanation →

270

MCQeasy

A retailer wants to use generative AI to write product descriptions automatically. They have a large dataset of existing product descriptions and need to customize a foundation model for their brand voice. Which Vertex AI feature should they use?

A.Vertex AI Search with grounding

B.Prompt design with the Gemini API directly

C.Vertex AI Model Evaluation

D.Vertex AI custom model tuning

AnswerD

Tuning adapts the model to the retailer's brand voice using their dataset.

Why this answer

Tuning (like supervised fine-tuning) allows customizing a pretrained model with your own data. Option A is wrong because prompt design is a technique, not a Vertex AI feature. Option B is wrong because Vertex AI Search is for search, not text generation customization.

Option D is wrong because Model Evaluation assesses performance, not customization.

Full explanation →

271

MCQeasy

A company is evaluating whether to build a custom generative AI solution from scratch or use a pre-built API from a cloud provider. Which factor most strongly supports the build-from-scratch approach?

A.The team has limited machine learning expertise.

B.Speed to market is the top priority.

C.Minimizing initial development cost is critical.

D.The solution requires deep integration with proprietary data and unique domain-specific outputs.

AnswerD

Custom models can be fine-tuned on proprietary data for unique needs.

Why this answer

Building a custom generative AI solution from scratch is most strongly supported when deep integration with proprietary data and unique domain-specific outputs is required. Pre-built APIs are typically trained on general data and may not capture the nuances of specialized domains, whereas a custom model can be fine-tuned or trained from scratch on proprietary datasets to achieve higher accuracy and relevance for unique business needs.

Exam trap

The trap here is that candidates may confuse 'minimizing cost' (Option C) with long-term total cost of ownership, but Cisco specifically tests the immediate strategic driver for build vs. buy, which is the need for proprietary data integration and unique outputs.

How to eliminate wrong answers

Option A is wrong because limited ML expertise would favor using a pre-built API to avoid the complexity of model training, infrastructure management, and hyperparameter tuning. Option B is wrong because speed to market is a key advantage of pre-built APIs, which offer immediate access to generative capabilities without the months of development required for a custom solution. Option C is wrong because minimizing initial development cost typically favors pre-built APIs, which have lower upfront investment compared to the significant costs of data preparation, compute resources, and specialized talent needed for building from scratch.

Full explanation →

272

MCQmedium

A data scientist is trying to get online predictions from a Vertex AI endpoint but receives the error shown. What is the most likely cause?

A.The region in the request does not match the endpoint region

B.The model has not been deployed to the specified endpoint

C.The endpoint ID is incorrect

D.The model ID is incorrect

AnswerB

The error message directly states the model is not deployed to the endpoint.

Why this answer

The error indicates that the model is not deployed to the endpoint. In Vertex AI, an endpoint is a resource that hosts one or more deployed models. If a model has not been deployed to the endpoint, any prediction request to that endpoint will fail with a 'model not found' or similar error, even if the endpoint ID and region are correct.

Exam trap

Google Cloud often tests the distinction between endpoint existence and model deployment, where candidates confuse a valid endpoint ID with the requirement that a model must be explicitly deployed to that endpoint before predictions can be served.

How to eliminate wrong answers

Option A is wrong because if the region in the request did not match the endpoint region, the error would typically be a 'region mismatch' or 'not found' error at the API routing level, not a model deployment error. Option C is wrong because an incorrect endpoint ID would result in a '404 Not Found' or 'endpoint not found' error, not a model deployment error. Option D is wrong because the model ID is not directly used in the prediction request to an endpoint; the endpoint routes to the deployed model, so an incorrect model ID would not cause this specific error unless the model was never deployed.

Full explanation →

273

MCQmedium

A developer wants to build a RAG application using Vertex AI. Which vector database is natively integrated with Vertex AI for storing embeddings?

A.Firestore

B.Vertex AI Vector Search

C.Cloud SQL

D.Bigtable

AnswerB

Vector Search is purpose-built for storing and querying embeddings.

Why this answer

Vertex AI Vector Search is the native vector database integrated with Vertex AI for storing and querying embeddings. It is purpose-built for high-dimensional vector similarity search, enabling efficient retrieval in RAG applications without requiring external infrastructure.

Exam trap

Google Cloud often tests the misconception that any database can store embeddings equally well, but the key differentiator is native vector indexing and ANN search support, which only Vertex AI Vector Search provides among the listed options.

How to eliminate wrong answers

Option A is wrong because Firestore is a NoSQL document database designed for storing structured data, not optimized for vector similarity search or embedding storage. Option C is wrong because Cloud SQL is a relational database service (MySQL, PostgreSQL, SQL Server) that lacks native vector indexing and similarity search capabilities required for RAG. Option D is wrong because Bigtable is a wide-column NoSQL database for large-scale analytical workloads, not designed for low-latency vector similarity queries.

Full explanation →

274

MCQmedium

A developer deployed a large language model on Vertex AI for real-time chat. Users report slow response times. The model generates sentences one word at a time. Which optimization should be applied to reduce latency?

A.Batch multiple user queries together.

B.Deploy the model with more accelerators.

C.Enable prompt caching to reuse previous queries.

D.Use streaming responses to start output earlier.

AnswerD

Streaming sends tokens as they are generated, reducing the wait for the full response.

Why this answer

Enabling streaming allows the model to output tokens progressively, reducing perceived latency. Option A is wrong because prompt caching doesn't speed up generation. Option C is wrong because batching increases latency for real-time.

Option D is wrong because more accelerators may actually increase overhead without optimization.

Full explanation →

275

MCQmedium

A company wants to build a chatbot that answers questions based on internal documents. Which approach is most appropriate?

A.Use a pre-trained model without any customizations

B.Train a custom model from scratch

C.Fine-tune a model on the documents

D.Use a prompt with the documents in the context

AnswerD

This is the core of RAG: provide relevant documents in the prompt to ground the model's answers.

Why this answer

Option D is correct because Retrieval-Augmented Generation (RAG) allows the chatbot to dynamically include relevant internal documents in the prompt context without modifying the underlying model. This approach leverages the pre-trained model's language understanding while grounding answers in specific, up-to-date internal data, avoiding the cost and latency of fine-tuning or retraining.

Exam trap

Google Cloud often tests the misconception that fine-tuning is the only way to incorporate proprietary data, but RAG is the most appropriate for dynamic, retrieval-based Q&A because it avoids retraining and keeps the model's knowledge current.

How to eliminate wrong answers

Option A is wrong because a pre-trained model without customization lacks access to the company's internal documents, leading to hallucinated or generic answers not grounded in proprietary data. Option B is wrong because training a custom model from scratch is computationally prohibitive and unnecessary; it requires massive labeled datasets and resources, whereas RAG achieves the same goal with far less effort. Option C is wrong because fine-tuning on documents teaches the model to memorize specific content, which is inefficient for large, frequently updated document sets and risks catastrophic forgetting, whereas RAG keeps the model static and retrieves fresh context per query.

Full explanation →

276

MCQmedium

A company wants to build a customer support chatbot that answers based on internal documentation. They use Vertex AI Search and want to ensure the model only uses retrieved documents. What should they do?

A.Fine-tune the model on the documentation

B.Enable grounding with Vertex AI Search

C.Increase max output tokens

D.Set temperature to 0.0

AnswerB

Ground forces the model to answer based on provided context.

Why this answer

Grounding with Vertex AI Search restricts the model to the retrieved context, preventing it from using internal knowledge. Setting temperature to 0.0 reduces creativity but doesn't enforce retrieval. Fine-tuning is unnecessary, and increasing max tokens doesn't affect retrieval.

Full explanation →

277

MCQhard

A team is building a medical diagnosis assistant using a foundation model. To comply with regulations, they need to ensure the model does not make up facts. What is the best approach?

A.Use a small model to hallucinate less

B.Use grounding with Vertex AI Search

C.Reduce temperature to 0

D.Fine-tune on medical journals

AnswerB

Grounding provides verifiable citations and reduces fabrication.

Why this answer

Grounding with Vertex AI Search is the best approach because it connects the foundation model's outputs to a verifiable, curated knowledge base, ensuring factual accuracy and compliance with regulations that prohibit hallucination. By retrieving information from a trusted source (e.g., medical databases) in real time, the model can cite evidence and avoid generating unverified claims.

Exam trap

Google Cloud often tests the misconception that reducing temperature or using a smaller model can eliminate hallucination, when in fact only grounding with external, verifiable data sources can reliably prevent fact fabrication in high-stakes domains.

How to eliminate wrong answers

Option A is wrong because using a smaller model does not inherently reduce hallucination; smaller models have less capacity and may actually hallucinate more due to limited training data and weaker reasoning. Option C is wrong because reducing temperature to 0 makes the model deterministic but does not prevent it from generating plausible-sounding but false information; it still relies on its parametric knowledge, which can be incomplete or outdated. Option D is wrong because fine-tuning on medical journals alone does not guarantee factual accuracy; the model may memorize and reproduce errors, and it cannot dynamically verify facts against a live, authoritative source.

Full explanation →

278

MCQhard

A media company uses generative AI to produce personalized news summaries. They notice that summaries occasionally contain factual errors and biased language. What business strategy should they implement to address these issues while maintaining user engagement?

A.Disable personalization and serve generic summaries to all users.

B.Allow users to flag errors and manually correct summaries in real-time.

C.Implement a human review layer for high-risk topics and use automated fact-checking for all content, with a feedback loop for model improvement.

D.Replace AI with entirely human-written summaries.

AnswerC

This ensures accuracy and allows continuous improvement.

Why this answer

Option C is correct because it balances accuracy and engagement by combining automated fact-checking with human review for high-risk topics. This hybrid approach reduces factual errors and biased language while maintaining the personalization that drives user engagement. The feedback loop continuously improves the model, addressing root causes rather than just symptoms.

Exam trap

Google Cloud often tests the misconception that either full automation or full human oversight is the only solution, when the correct answer is a hybrid approach that leverages the strengths of both AI and human judgment.

How to eliminate wrong answers

Option A is wrong because disabling personalization eliminates the core value proposition of generative AI for news summaries, likely reducing user engagement significantly without addressing the underlying model flaws. Option B is wrong because allowing real-time manual corrections by users is impractical at scale, introduces latency, and does not prevent errors from reaching users in the first place; it also lacks a systematic feedback mechanism for model improvement. Option D is wrong because replacing AI with entirely human-written summaries is cost-prohibitive, slow, and defeats the purpose of using generative AI for scalability and personalization.

Full explanation →

279

MCQeasy

A marketing team wants to generate product descriptions using a text generation model on Vertex AI. They need consistent output style across all descriptions, including tone and length. They have a small set of 10 high-quality example descriptions that capture the desired style. The team has limited ML expertise and wants a quick solution that does not require model retraining. Which approach should they use?

A.Use a pre-built template with no model input.

B.Fine-tune the model on a large external dataset of product descriptions.

C.Use few-shot prompting with the examples in the prompt.

D.Set the temperature to 0.9 to maximize creativity.

AnswerC

Few-shot prompting directly leverages examples to achieve consistent style without retraining.

Why this answer

Few-shot prompting is the correct approach because it allows the team to inject the desired style, tone, and length directly into the prompt using the 10 high-quality examples, without any model retraining. This technique leverages the in-context learning capability of large language models on Vertex AI, enabling consistent output from a small set of demonstrations. It is ideal for teams with limited ML expertise as it requires only prompt engineering, not fine-tuning or infrastructure changes.

Exam trap

Google Cloud often tests the misconception that higher temperature always improves output quality, but the trap here is that temperature controls randomness, not consistency, so candidates may incorrectly choose Option D without understanding that low temperature is required for reproducible style and length.

How to eliminate wrong answers

Option A is wrong because a pre-built template with no model input cannot generate dynamic, context-aware product descriptions; it produces static text that lacks the flexibility and nuance of a generative model. Option B is wrong because fine-tuning on a large external dataset would require significant ML expertise, data preparation, and compute resources, contradicting the requirement for a quick solution without model retraining. Option D is wrong because setting temperature to 0.9 maximizes randomness and creativity, which is the opposite of what is needed for consistent output style; a lower temperature (e.g., 0.2) would be more appropriate for deterministic, reproducible results.

Full explanation →

280

MCQeasy

A data scientist wants to generate realistic product images for an online catalog using Google Cloud's generative AI. Which service should they use?

A.Imagen on Vertex AI

B.Codey API for code generation

C.Gemini API with text-to-text prompts

D.Vertex AI Model Garden without a specific model

AnswerA

Imagen is purpose-built for image generation.

Why this answer

Option B is correct because Imagen is Google Cloud's image generation model. Option A is wrong because Gemini is a multimodal model but not specialized for image generation. Option C is wrong because Codey is for code generation.

Option D is wrong because Vertex AI Model Garden is a model repository, not a specific generation service.

Full explanation →

281

Multi-Selectmedium

Which TWO techniques are effective for reducing bias in generative AI model outputs?

Select 2 answers

A.Increasing model size to learn more patterns

B.Training on diverse and representative datasets

C.Relying solely on post-hoc filters

D.Using adversarial debiasing methods during fine-tuning

E.Limiting the model to only factual prompts

AnswersB, D

Correct: Diverse data helps reduce biased associations.

Why this answer

Option B is correct because training on diverse and representative datasets directly reduces sampling bias and coverage gaps in the training distribution, which are primary sources of stereotypical or skewed outputs. By ensuring the model sees balanced examples across demographics, contexts, and edge cases, it learns more equitable representations and reduces the likelihood of generating biased content.

Exam trap

Google Cloud often tests the misconception that increasing model size or adding post-hoc filters is sufficient to mitigate bias, when in reality these approaches fail to address the root causes of bias in training data and model representations.

Full explanation →

282

MCQhard

A financial institution deploys a chatbot using Gemini Pro in Vertex AI. Compliance requires logging all user inputs and model outputs for audit. Which approach meets this requirement?

A.Capture logs via Cloud Monitoring

B.Enable Vertex AI Endpoint request-response logging

C.Use Cloud Logging sink with a filter for Vertex AI requests

D.Enable Vertex AI Model Registry logging

AnswerB

This captures every request and response for the deployed model, meeting audit requirements.

Why this answer

Vertex AI Endpoint request-response logging captures both the user's input prompt and the model's generated output, which is precisely what compliance auditing requires. This feature logs the exact payloads sent to and received from the deployed model, ensuring a complete audit trail without additional configuration.

Exam trap

The trap here is that candidates confuse Cloud Logging sinks or Cloud Monitoring with the specific Vertex AI feature that must be explicitly enabled on the endpoint, assuming that default logging captures request-response payloads when it does not.

How to eliminate wrong answers

Option A is wrong because Cloud Monitoring is designed for metrics, alerts, and dashboards, not for capturing detailed request-response payloads for audit compliance. Option C is wrong because a Cloud Logging sink with a filter can only export logs that already exist; it does not enable the capture of Vertex AI request-response logs, which must be explicitly enabled on the endpoint. Option D is wrong because Vertex AI Model Registry logging tracks model version metadata and lifecycle events, not the user inputs and model outputs from inference calls.

Full explanation →

283

Multi-Selecthard

A financial analyst uses generative AI to summarize earnings reports. The summaries vary in style. Which THREE methods can improve consistency? (Choose three.)

Select 3 answers

A.Set temperature to 0.2

B.Increase max output tokens

C.Enable citation mode

D.Use few-shot prompting with fixed examples

E.Fine-tune on a curated dataset of desired summaries

AnswersA, D, E

Reduces output randomness.

Why this answer

Fine-tuning on curated data aligns the model to a desired style, few-shot prompting provides consistent examples, and low temperature reduces randomness. Increasing max tokens does not affect style, and citation mode adds references but not style consistency.

Full explanation →

284

Multi-Selecteasy

A developer is using the Vertex AI PaLM API to generate code. They want to ensure the output is safe and adheres to company policies. Which THREE attributes can they configure in the safety_settings parameter?

Select 3 answers

A.Language detection

B.Sentiment analysis

C.Toxicity

D.Harassment

E.Sexually explicit content

AnswersC, D, E

Toxicity is a standard safety category.

Why this answer

B, D, and E are standard safety categories for PaLM API. A and C are not available as safety categories.

Full explanation →

285

MCQmedium

A user reports that the model's response to the same prompt varies significantly across different calls. Which parameter change would most likely reduce variability?

A.Decrease topK to 10.

B.Decrease temperature to 0.2.

C.Increase candidateCount to 3.

D.Increase maxOutputTokens to 2000.

AnswerB

Lower temperature reduces randomness, making outputs more consistent.

Why this answer

Option A is correct because decreasing temperature reduces randomness and makes outputs more deterministic. Option B (increase maxOutputTokens) does not affect variability. Option C (increase candidateCount) returns multiple candidates but each still varies.

Option D (decrease topK) reduces diversity but temperature has a stronger effect.

Full explanation →

286

MCQhard

A multimodal generative AI system processes both image and text inputs to produce captions. During inference, the image encoder sometimes produces noisy or missing features. Which architectural design decision best handles such input degradation without retraining?

A.Train a separate variational autoencoder to produce a clean latent representation from the noisy image.

B.Increase the image encoder’s capacity to better extract robust features.

C.Apply standard image preprocessing (e.g., denoising) to all inputs before feeding to the encoder.

D.Introduce a gating mechanism that learns to weigh image features based on confidence scores from the encoder.

AnswerD

Gating allows the model to ignore unreliable features dynamically.

Why this answer

Option D is correct because a gating mechanism dynamically adjusts the contribution of image features based on confidence scores from the encoder, allowing the model to gracefully handle noisy or missing features without retraining. This architectural design learns to suppress unreliable image inputs and rely more on text or other modalities, ensuring robust caption generation under input degradation.

Exam trap

Google Cloud often tests the misconception that preprocessing or model capacity adjustments are the only ways to handle input noise, but the key insight is that architectural mechanisms like gating can adaptively handle degradation at inference time without retraining.

How to eliminate wrong answers

Option A is wrong because training a separate variational autoencoder (VAE) to produce clean latent representations requires additional training data and retraining, which contradicts the 'without retraining' constraint; it also adds complexity without addressing dynamic degradation during inference. Option B is wrong because increasing the image encoder’s capacity does not inherently handle noisy or missing features—it may overfit to training data and still produce unreliable outputs when inputs degrade, and it requires retraining to change capacity. Option C is wrong because standard image preprocessing like denoising is a fixed, non-adaptive approach that cannot compensate for missing features or varying noise levels, and it may discard useful information; it also does not leverage the model’s ability to learn confidence-based weighting.

Full explanation →

287

MCQhard

After fine-tuning a model on customer support data, the model starts using profanity. What is the most effective mitigation?

A.Add profanity to training data as negative examples

B.Reduce learning rate and retrain

C.Increase temperature to reduce confidence

D.Enable a safety attribute filter

AnswerD

Blocks profanity in real-time without retraining.

Why this answer

Enabling a safety attribute filter blocks profanity at inference time. Reducing learning rate may not help, adding profanity as negative examples could help but is less immediate, and increasing temperature increases randomness, potentially worsening the issue.

Full explanation →

288

MCQhard

A research lab is fine-tuning a large language model on a small dataset of medical records. They observe that the model overfits, memorizing specific patient details and producing outputs that violate privacy regulations. Which technique should they apply to improve generalization and reduce memorization?

A.Increase the batch size to 64

B.Increase the number of training epochs

C.Use early stopping based on validation loss

D.Apply differential privacy (DP-SGD) during fine-tuning

AnswerD

DP-SGD bounds the influence of any single example, reducing memorization and improving privacy.

Why this answer

Differential privacy (DP-SGD) is the correct technique because it directly addresses memorization of sensitive patient data by adding calibrated noise to the gradient updates during fine-tuning. This bounds the model's ability to encode any single individual's information, improving generalization and ensuring compliance with privacy regulations like HIPAA.

Exam trap

Google Cloud often tests the misconception that early stopping or batch size adjustments can prevent memorization, when in fact only techniques like differential privacy directly bound the influence of individual training examples.

How to eliminate wrong answers

Option A is wrong because increasing batch size to 64 reduces gradient variance but does not prevent memorization of specific patient details; it may even accelerate overfitting on a small dataset. Option B is wrong because increasing the number of training epochs exacerbates overfitting, causing the model to memorize more training examples and worsen privacy violations. Option C is wrong because early stopping based on validation loss only halts training when validation performance degrades, but it does not impose any privacy guarantee or fundamentally limit memorization of unique patient records.

Full explanation →

289

MCQmedium

A team wants to improve the factual accuracy of their chatbot responses regarding internal company policies. What is the most effective approach?

A.Use few-shot prompting with example Q&A pairs

B.Increase the model's maximum tokens

C.Fine-tune the model on policy documents

D.Use RAG with Vertex AI Search indexing the policies

AnswerD

Correct: RAG retrieves fresh data from indexed policies, ensuring factual accuracy.

Why this answer

RAG with Vertex AI Search retrieves current policy documents, providing authoritative context. Fine-tuning may not capture frequent updates, and other options do not integrate live knowledge.

Full explanation →

290

MCQeasy

A company notices that their AI chatbot occasionally generates incorrect information. Which technique can best reduce hallucinations without retraining?

A.Use a longer system prompt without examples

B.Use system instructions to constrain the model to only answer from provided context

C.Set top_p to 0.1

D.Increase temperature to 0.9

AnswerB

Correct: This confines the model to the given context, minimizing hallucination.

Why this answer

Using system instructions to constrain the model to the provided context directly reduces fabrication. Increasing temperature or setting top_p low would not specifically target hallucinations.

Full explanation →

291

Multi-Selecthard

Which THREE benefits does Vertex AI Agent Builder provide over building a custom conversational agent from scratch?

Select 3 answers

A.Automatic scaling and load balancing

B.Pre-built integration for grounding on enterprise data sources

C.Full control over the underlying ML model architecture

D.Built-in safety filters and guardrails

E.Guaranteed lower inference latency

AnswersA, B, D

Managed service scales according to demand without manual intervention.

Why this answer

Options B, C, and E are correct. Pre-built grounding with your data reduces development effort; built-in safety filters ensure compliance; automatic scaling handles traffic without manual ops. Option A (full control over ML models) is more true for custom builds.

Option D (lower latency) is not guaranteed; custom builds can optimize latency.

Full explanation →

292

MCQeasy

A company wants to estimate the total cost of ownership (TCO) for a gen AI solution on Google Cloud. Which factors are most important?

A.Only model training cost

B.Compute, storage, and API call costs

C.Only inference cost

D.Only compute cost

AnswerB

These three categories cover the primary cost drivers in a gen AI solution.

Why this answer

Option B is correct because the total cost of ownership (TCO) for a generative AI solution on Google Cloud encompasses all operational expenses, including compute (e.g., TPU/GPU instances for training and inference), storage (e.g., Cloud Storage for datasets and model artifacts), and API call costs (e.g., Vertex AI prediction requests). Focusing on a single cost component, such as training or inference alone, ignores the recurring expenses of serving the model and storing data, which often dominate long-term TCO.

Exam trap

Google Cloud often tests the misconception that TCO is dominated by a single cost factor (e.g., training), when in reality, inference and API costs frequently surpass training expenses in production deployments.

How to eliminate wrong answers

Option A is wrong because it ignores inference, storage, and API costs, which are significant for production gen AI solutions where models are queried repeatedly. Option C is wrong because inference cost is only one part of TCO; training, storage, and API overhead also contribute heavily, especially with large models like PaLM 2 or Gemini. Option D is wrong because compute cost alone excludes storage (e.g., model checkpoints, training data) and API call fees (e.g., per-token billing for Vertex AI), leading to an incomplete TCO estimate.

Full explanation →

293

MCQmedium

A company is using Vertex AI Model Registry to manage multiple versions of its custom generative model. They want to automatically route a percentage of traffic to a new model version for testing. What should they do?

A.Set up a Cloud Tasks queue to distribute requests

B.Create a new endpoint for each version

C.Deploy both versions to the same endpoint and adjust traffic split settings

D.Use a load balancer in front of the endpoints

AnswerC

Vertex AI endpoints allow splitting traffic percentage across deployed models.

Why this answer

Vertex AI Endpoints support traffic splitting between model versions.

Full explanation →

294

Multi-Selecthard

Which TWO strategies can effectively reduce the operational costs of a generative AI model in production without significantly degrading user experience?

Select 2 answers

A.Use larger batch sizes for inference

B.Increase the frequency of model retraining to improve efficiency

C.Cache frequent prompt completions

D.Adopt a pay-per-use pricing model instead of a flat rate

E.Deploy multiple models and route requests by complexity

AnswersC, D

Caching reduces duplicate inference calls, lowering cost.

Why this answer

Caching frequent prompt completions reduces operational costs by eliminating redundant inference calls for identical or similar user requests. This directly lowers compute usage and latency without degrading user experience, as cached responses are served instantly. It is a common optimization in production LLM deployments, especially for high-traffic applications with repetitive queries.

Exam trap

Google Cloud often tests the misconception that increasing batch sizes or retraining frequency inherently reduces costs, when in fact these actions typically increase resource usage or introduce operational overhead without guaranteeing cost savings.

Full explanation →

295

MCQeasy

A startup wants to build a generative AI application for customer support. Their main concern is cost control while maintaining low latency. Which Google Cloud service is most suitable for deploying their custom model?

A.BigQuery ML

B.Cloud Run

C.Vertex AI Workbench

D.Vertex AI Prediction

AnswerD

Vertex AI Prediction provides autoscaling online prediction endpoints with low latency, ideal for cost-sensitive production.

Why this answer

Vertex AI Prediction is the correct choice because it provides a fully managed, serverless endpoint for deploying custom models with autoscaling to zero, which directly addresses the startup's need for cost control by only charging for compute resources when the endpoint serves predictions. It also supports low latency through optimized prediction containers and can leverage GPUs or TPUs for inference, making it ideal for real-time customer support applications.

Exam trap

The trap here is that candidates often confuse development tools (like Vertex AI Workbench) or batch inference services (like BigQuery ML) with production deployment services, overlooking that Vertex AI Prediction is the only option purpose-built for serving custom models with cost-efficient, low-latency inference.

How to eliminate wrong answers

Option A is wrong because BigQuery ML is designed for training and executing machine learning models using SQL queries directly within BigQuery, not for deploying custom models as low-latency, real-time prediction endpoints; it is more suited for batch inference on large datasets. Option B is wrong because Cloud Run is a serverless compute platform for running stateless containers, but it lacks native support for model serving optimizations like GPU acceleration, model versioning, and autoscaling tailored to inference workloads, which are critical for cost-effective, low-latency predictions. Option C is wrong because Vertex AI Workbench is a Jupyter-based development environment for building and training models, not a deployment service; it does not provide managed prediction endpoints or the infrastructure for serving custom models in production.

Full explanation →

296

MCQmedium

The exhibit shows the output of describing a model on Vertex AI. What does 'modelSource: MODEL_GARDEN' indicate about this model?

A.The model was imported from the Vertex AI Model Garden.

B.The model was trained on Vertex AI from scratch.

C.The model has been exported to Model Garden.

D.The model was fine-tuned using AutoML.

AnswerA

MODEL_GARDEN indicates it's a Model Garden model.

Why this answer

MODEL_GARDEN means the model was imported from Model Garden (a pre-trained model). It does not mean it was trained on Vertex AI, exported, or fine-tuned.

Full explanation →

297

MCQhard

A streaming platform uses a large generative model for personalized content suggestions. Budget constraints require minimizing inference costs without significantly degrading quality. Which approach is most effective?

A.Deploy the model on higher-end accelerators to save time.

B.Use a distilled version of the model.

C.Implement stronger safety filters to reduce output length.

D.Cache frequent prompts to avoid regeneration.

AnswerB

Distilled models are smaller, faster, and cheaper with comparable quality for many tasks.

Why this answer

Using a smaller distilled model reduces compute cost with minimal quality loss for recommendation tasks. Option A is wrong because stronger safety filters don't reduce cost. Option B is wrong because caching is limited.

Option D is wrong because more accelerators increase cost.

Full explanation →

298

MCQeasy

A project manager wants to understand which Google Cloud generative AI services are subject to the 'Prohibited Use' policy. Where can they find the most up-to-date information?

A.Google Cloud documentation

B.Google's AI Principles

C.The Google Cloud Acceptable Use Policy

D.The Gemini Terms of Service

AnswerC

This policy explicitly lists prohibited uses for all Google Cloud services.

Why this answer

The Google Cloud Acceptable Use Policy outlines prohibited uses for all services, including generative AI. Other documents may not be the authoritative source.

Full explanation →

299

MCQeasy

A developer is using the Gemini API to generate code snippets. They notice the outputs often contain deprecated API calls. Which parameter adjustment or prompt strategy would most effectively encourage the model to use current APIs?

A.Add a system instruction specifying 'Use the most recent API version and avoid deprecated functions.'

B.Set top-p to 0.5 to reduce output diversity

C.Provide one few-shot example of a correct API call

D.Set temperature to 1.5 to increase creativity

AnswerA

System instructions provide explicit guidance to the model on desired behavior.

Why this answer

Option A is correct because including context in the system instruction (e.g., 'Use only the latest stable version of the API') directly guides the model. Option B is wrong because temperature affects randomness, not temporal awareness. Option C is wrong because few-shot examples can help if they show current APIs, but the instruction is more direct.

Option D is wrong because top-p is about nucleus sampling, not freshness.

Full explanation →

300

MCQhard

A developer runs the command above to test a text classification model deployed on a Vertex AI endpoint. The model returns an error. What is the most likely cause?

A.The endpoint ID '789' does not exist in the project

B.The model is not deployed to any endpoint

C.The instance schema (e.g., 'content' field) does not match the model's expected input signature

D.The region 'us-central1' does not match the region where the model is deployed

AnswerC

The model expects a different input format (e.g., 'text' field or a structured object), leading to the format error.

Why this answer

Option C is correct because the error 'Model does not support the given instance format' indicates a mismatch between the input schema and what the model expects. Option A (wrong endpoint ID) would produce a 'not found' error. Option B (region mismatch) would give a regional validation error.

Option D (model not deployed) would result in an endpoint not serving error.

Full explanation →

Page 4 of 7

All pages

Practice Generative AI Leader by domain

Target a specific domain to shore up weak areas.

Fundamentals of Generative AI Business Strategies for Generative AI Solutions Google Cloud's Generative AI Offerings Techniques to Improve Generative AI Model Output

See all domains with question counts →