Knowledge + Practice

Google Cloud Generative AI Leader Generative AI Leader (Generative AI Leader) — Questions 1–75

500 questions total · 7pages · All types, answers revealed

Take a mock exam Exam hub

Page 1 of 7

1

MCQhard

A financial services company wants to use generative AI to generate personalized investment advice. They must ensure responses comply with regulatory requirements (e.g., no guarantees of returns). Which Vertex AI safety feature should they primarily use?

A.Vertex AI Grounding with their compliance database.

B.Prompt engineering with instructions to avoid guarantees.

C.Safety filters with a custom blocklist that includes phrases like 'guaranteed return'.

D.Reinforcement learning from human feedback (RLHF) on the model.

AnswerC

Safety filters can block defined categories or custom phrases.

Why this answer

Option C is correct because safety filters with a custom blocklist allow the company to define specific prohibited phrases (e.g., 'guaranteed return') that the model must avoid generating. This provides a deterministic, rule-based enforcement layer that directly addresses regulatory compliance by blocking disallowed content at inference time, without relying on the model's probabilistic behavior.

Exam trap

The trap here is that candidates often confuse grounding (factual retrieval) with compliance enforcement, or assume prompt engineering is sufficient for regulatory guardrails, when in fact only a deterministic blocklist can reliably prevent specific prohibited phrases from appearing in generated outputs.

How to eliminate wrong answers

Option A is wrong because Vertex AI Grounding connects the model to external data sources for factuality, but it does not enforce compliance rules—it retrieves information but does not block specific prohibited phrases. Option B is wrong because prompt engineering is a soft, non-deterministic approach; the model may still generate guarantees despite instructions, especially with adversarial or edge-case inputs. Option D is wrong because RLHF aligns the model based on human preferences over time, but it is not a real-time safety filter and cannot guarantee that specific regulatory phrases are never generated in production.

Full explanation →

2

MCQhard

A healthcare startup uses a generative model fine-tuned on general medical literature to provide preliminary diagnostic suggestions from patient text. The model frequently misses rare diseases and sometimes suggests common conditions that are unlikely given the symptoms. The startup has a curated dataset of rare disease case reports and wants to improve the model’s sensitivity to rare conditions without sacrificing overall accuracy. They cannot afford to retrain the entire model from scratch. The model is deployed on Vertex AI Prediction with low latency requirement. Which approach should they take?

A.Perform continued fine-tuning on the rare disease dataset using a low learning rate.

B.Add a system prompt instructing the model to consider rare diseases more carefully.

C.Reduce top-p sampling to focus on high-probability tokens, assuming rare diseases have lower probability.

D.Implement a human-in-the-loop system: for outputs with low confidence or suspected rare disease, route to a human expert.

AnswerD

Human-in-the-loop catches edge cases without retraining, preserving accuracy for common conditions.

Why this answer

Option D is correct because implementing a human-in-the-loop process for rare disease flags combines AI with expert review, catching misses while maintaining speed for common cases. Option A is wrong because prompt engineering alone may not teach the model about rare diseases. Option B is wrong because increasing top-p restricts vocabulary but doesn't inject knowledge.

Option C is wrong because fine-tuning again might cause catastrophic forgetting of common conditions.

Full explanation →

3

Multi-Selectmedium

A company is establishing governance practices for generative AI models. Which three actions are essential for responsible AI deployment?

Select 3 answers

A.Use model versioning to track changes.

B.Regularly audit model outputs for bias.

C.Monitor for data leakage from training data.

D.Implement a human review process for critical decisions.

E.Open-source the model to ensure transparency.

AnswersA, B, D

Versioning ensures reproducibility and accountability for model updates.

Why this answer

Options A, C, and D are correct. Regular bias audits ensure fairness; model versioning provides traceability; human review processes catch critical errors. Data leakage monitoring is important but not always considered a core governance pillar; open-sourcing is voluntary and not essential.

Full explanation →

4

MCQhard

A team is fine-tuning a large language model on custom data using Vertex AI. They find that the training loss decreases but validation loss increases. What is the best course of action?

A.Increase the number of training epochs.

B.Reduce the model size or add dropout regularization.

C.Increase the learning rate.

D.Switch to a smaller batch size.

AnswerB

Regularization techniques combat overfitting.

Why this answer

The increasing validation loss while training loss decreases is a classic sign of overfitting, where the model memorizes the training data but fails to generalize. Reducing model size or adding dropout regularization directly combats overfitting by limiting the model's capacity or introducing noise during training, which forces the model to learn more robust features. This is the best course of action because it addresses the root cause without further exacerbating the problem.

Exam trap

Google Cloud often tests the distinction between underfitting and overfitting, and the trap here is that candidates may confuse increasing validation loss with underfitting and incorrectly choose to increase epochs or learning rate, rather than recognizing the hallmark divergence of overfitting.

How to eliminate wrong answers

Option A is wrong because increasing the number of training epochs would further overfit the model to the training data, worsening the validation loss. Option C is wrong because increasing the learning rate can cause the model to overshoot minima and destabilize training, potentially increasing both training and validation loss, and does not address overfitting. Option D is wrong because switching to a smaller batch size introduces more noise in gradient estimates, which can sometimes help generalization but is not a direct or reliable remedy for overfitting; it may also slow convergence and is not the primary solution for the described loss divergence.

Full explanation →

5

MCQhard

A government agency is deploying a generative AI chatbot to answer citizen questions about public services. The chatbot must provide accurate and consistent information, scale to handle peak loads during tax season, and comply with strict data sovereignty laws that require all data to stay within the country. The agency has a moderate budget and in-house IT team but limited AI expertise. Which deployment architecture should they choose?

A.Build and host the model on-premises using open-source tools

B.Deploy a pre-trained model on Vertex AI in the required region with auto-scaling

C.Deploy the model on Vertex AI across multiple regions for availability

D.Use a third-party managed generative AI service that guarantees data residency

AnswerB

Keeps data within region, auto-scales, and requires minimal AI expertise.

Why this answer

Option A is correct because using Vertex AI within a single region ensures data sovereignty, and autoscaling handles peak loads. Option B (multi-region) violates data sovereignty. Option C (on-premises) lacks scalability and AI expertise.

Option D (managed service from a third-party) may not meet sovereignty or budget.

Full explanation →

6

MCQmedium

A team built a GenAI chatbot that uses a vector database to retrieve context. Users report irrelevant responses. What is the most likely business strategy issue?

A.The model is too small to generate accurate responses

B.The chatbot is too verbose

C.The system is overfitting to the training data

D.The embedding model is not aligned with the domain vocabulary

AnswerD

If the embeddings do not capture domain-specific meanings, retrieved context will be irrelevant, leading to poor answers.

Why this answer

Option D is correct because irrelevant responses in a RAG (Retrieval-Augmented Generation) chatbot most often stem from the embedding model failing to capture domain-specific semantics. If the embedding model was trained on general text (e.g., Wikipedia) but the chatbot operates in a specialized field like legal or medical, the vector similarity search will retrieve context that is semantically distant from the user's query, leading to irrelevant answers. This is a business strategy issue because the team chose an embedding model that does not align with their domain vocabulary, undermining the entire retrieval pipeline.

Exam trap

Google Cloud often tests the misconception that irrelevant responses are caused by model size or overfitting, when in fact the retrieval stage (embedding model and vector search) is the primary bottleneck in a RAG architecture.

How to eliminate wrong answers

Option A is wrong because model size (number of parameters) primarily affects generation quality and coherence, not the relevance of retrieved context; a small model can still produce accurate responses if the retrieved context is correct. Option B is wrong because verbosity is a stylistic output issue unrelated to the core problem of irrelevant responses; a verbose chatbot might still be accurate. Option C is wrong because overfitting to training data would cause the model to memorize specific examples and fail to generalize, but the symptom here is irrelevant responses due to poor retrieval, not hallucination or memorization of training data.

Full explanation →

7

MCQhard

A company has a generative AI model that is too slow for real-time inference. What architectural change would help?

A.Apply model quantization and deploy on TPUs

B.Switch to a larger, more accurate model

C.Deploy the model on more powerful CPUs

D.Use distributed training across multiple GPUs

AnswerA

Quantization reduces memory footprint and speeds up computation, and TPUs provide high throughput for trained models.

Why this answer

Model quantization reduces the precision of the model's weights (e.g., from FP32 to INT8), which significantly decreases memory footprint and computation time, enabling faster inference. Deploying on TPUs (Tensor Processing Units) further accelerates matrix operations through specialized hardware, making this combination ideal for real-time latency requirements.

Exam trap

Google Cloud often tests the distinction between training optimization (distributed training) and inference optimization (quantization, pruning, hardware acceleration), so the trap here is that candidates confuse improving training speed with improving inference latency.

How to eliminate wrong answers

Option B is wrong because switching to a larger, more accurate model increases computational complexity and latency, worsening the speed problem. Option C is wrong because CPUs are general-purpose processors with limited parallel matrix computation capabilities compared to GPUs or TPUs, so using more powerful CPUs still cannot match the throughput needed for real-time inference. Option D is wrong because distributed training across multiple GPUs addresses training speed, not inference latency; inference is typically a single-pass operation that benefits from model optimization and hardware acceleration, not parallel training techniques.

Full explanation →

8

MCQhard

A financial services firm is developing a GenAI application for investment advice. They need to ensure regulatory compliance. Which business strategy should they prioritize?

A.Rapidly deploy an MVP and iterate based on user feedback

B.Implement strict human-in-the-loop review for all investment recommendations

C.Open-source the model to gain community trust

D.Partner with a cloud provider that offers indemnification for model outputs

AnswerB

Human oversight is required by regulations for financial advice, ensuring accuracy and compliance.

Why this answer

In regulated industries like financial services, GenAI applications must prioritize compliance over speed. Option B is correct because a human-in-the-loop (HITL) review ensures that every investment recommendation is auditable and meets regulatory standards (e.g., SEC or FINRA rules), mitigating risks of hallucinated or non-compliant outputs. This strategy directly addresses the need for accountability and transparency in high-stakes decision-making.

Exam trap

Google Cloud often tests the misconception that speed or technical features (like open-sourcing or indemnification) can substitute for regulatory compliance, but in regulated domains, human oversight and auditability are non-negotiable.

How to eliminate wrong answers

Option A is wrong because rapidly deploying an MVP without rigorous compliance checks risks generating non-compliant or misleading investment advice, which could lead to severe regulatory penalties and loss of client trust. Option C is wrong because open-sourcing the model does not inherently ensure regulatory compliance; it may expose proprietary data or create liability if the model produces biased or inaccurate outputs, and community trust does not substitute for legal adherence. Option D is wrong because cloud provider indemnification covers legal costs for model outputs but does not prevent the generation of non-compliant advice; it is a risk transfer mechanism, not a compliance strategy.

Full explanation →

9

MCQhard

A generative AI model is trained on a dataset containing biased text. The team wants to debias the model without significantly sacrificing performance on the original task. Which approach is most appropriate?

A.Curate a smaller, balanced dataset that is representative of fair outcomes and fine-tune the model using a combination of the original data and this dataset with a regularization penalty on bias metrics.

B.Train an adversarial classifier to predict protected attributes from the model's hidden representations and minimize that prediction accuracy.

C.Filter the original training dataset to remove all sentences containing biased terms or stereotypes.

D.After training, apply a separate classifier on the model's output logits to adjust the final predictions for fairness.

AnswerA

This approach directly reduces bias while retaining task performance through regularization.

Why this answer

Option A is correct because it directly addresses bias in the training data by combining the original dataset with a curated, balanced dataset and applying a regularization penalty on bias metrics. This approach allows the model to retain performance on the original task while explicitly penalizing biased representations during fine-tuning, which is a standard technique in fairness-aware machine learning. The regularization term acts as a constraint that guides the optimization away from biased decision boundaries without requiring full retraining or architectural changes.

Exam trap

Google Cloud often tests the misconception that simply removing biased data or applying post-hoc adjustments is sufficient for debiasing, when in fact these methods fail to address latent biases in model representations and can degrade performance or introduce new biases.

How to eliminate wrong answers

Option B is wrong because training an adversarial classifier to minimize prediction accuracy of protected attributes is a debiasing technique, but it operates on hidden representations and can significantly degrade model performance by removing useful information correlated with protected attributes, often leading to a trade-off that sacrifices task accuracy. Option C is wrong because simply filtering out sentences with biased terms or stereotypes is ineffective; bias can be implicit in non-obvious patterns, and removing data can introduce distribution shift and reduce model robustness without guaranteeing fairness. Option D is wrong because applying a separate classifier on output logits to adjust predictions is a post-processing method that does not address bias in the model's internal representations; it can improve fairness metrics but often at the cost of calibration and may not generalize well across different subgroups.

Full explanation →

10

MCQeasy

A data scientist is using Vertex AI generative AI studio to create a chatbot. The chatbot gives inconsistent answers to similar questions. Which parameter should they adjust to make responses more consistent?

A.Decrease temperature to 0.2

B.Increase top-p to 0.9

C.Increase presence penalty to 0.5

D.Decrease frequency penalty to 0.0

AnswerA

Lower temperature makes the model more deterministic, leading to more consistent outputs.

Why this answer

Option C is correct because lowering temperature makes the model more deterministic, reducing randomness. Option A is wrong because top-p affects diversity but not as directly as temperature. Option B is wrong because presence penalty encourages new topics, increasing variability.

Option D is wrong because frequency penalty reduces repetition but doesn't enforce consistency.

Full explanation →

11

Multi-Selecthard

A team is fine-tuning a large language model for medical advice. Which TWO techniques are most effective for improving the safety and reliability of the model's outputs?

Select 2 answers

A.Constitutional AI

B.Lowering the temperature to 0.0

C.Increasing training data size

D.Increasing top_p to 1.0

E.Reinforcement learning from human feedback (RLHF)

AnswersA, E

Constitutional AI uses predefined rules to guide model behavior.

Why this answer

Constitutional AI (A) is correct because it embeds a set of ethical principles directly into the model's training process, allowing the model to self-critique and revise its outputs to avoid harmful or unsafe medical advice. This technique proactively enforces safety constraints without requiring extensive human labeling, making it highly effective for high-stakes domains like healthcare.

Exam trap

Google Cloud often tests the misconception that hyperparameter tuning (temperature, top_p) or data scaling alone can solve safety issues, when in fact alignment techniques like Constitutional AI and RLHF are specifically designed for that purpose.

Full explanation →

12

MCQhard

During a load test, a Vertex AI endpoint serving a large language model experiences high latency and increased error rates. The endpoint is configured with autoscaling. What is the most likely cause?

A.There is a network bottleneck

B.The model size is too large for the machine type

C.The endpoint is using a global load balancer

D.The autoscaling metric is based on CPU utilization but the model is GPU-bound

AnswerD

GPU-bound models require GPU-based metrics for effective autoscaling.

Why this answer

If autoscaling is based on CPU utilization but the model is GPU-bound, the scaling metric does not reflect the actual load, causing insufficient resources.

Full explanation →

13

Multi-Selecteasy

Which TWO of the following are key differences between generative AI and discriminative AI? (Choose two.)

Select 2 answers

A.Generative models can create new data samples, while discriminative models only assign labels to existing data.

B.Generative models require less training data than discriminative models.

C.Generative models cannot be used for supervised learning tasks like classification.

D.Generative models model the joint probability distribution of inputs and labels, whereas discriminative models model the conditional probability of labels given inputs.

E.Discriminative models always outperform generative models on tasks like image classification.

AnswersA, D

Generation is a hallmark of generative AI.

Why this answer

Option A is correct because generative AI models learn the underlying distribution of the data, enabling them to generate new, realistic samples (e.g., images, text) from the learned distribution. In contrast, discriminative models learn decision boundaries to classify or label existing data without the ability to create new data instances. This fundamental difference in capability—creation versus discrimination—is a core distinction between the two paradigms.

Exam trap

Google Cloud often tests the misconception that generative models are only for unsupervised tasks and cannot perform classification, leading candidates to incorrectly select Option C, while also testing the false assumption that discriminative models are universally superior, as in Option E.

Full explanation →

14

MCQmedium

Despite applying safety filters, a generative AI model still produces toxic outputs in some cases. Which additional technique should be applied?

A.Add more examples of toxic content to training

B.Increase the filter threshold

C.Use RLHF with human feedback to reduce toxicity

D.Decrease the model's temperature

AnswerC

Correct: RLHF explicitly trains the model away from toxic outputs.

Why this answer

RLHF using human feedback to penalize toxic responses directly reduces such outputs. Other options either weaken safety or do not target toxicity specifically.

Full explanation →

15

MCQmedium

A company is building a generative AI chatbot for customer support using Vertex AI. They want to ground the model responses with their internal knowledge base stored in Cloud Storage and BigQuery. Which feature should they use to ensure the model only answers from the provided data and avoids hallucination?

A.Vertex AI Grounding with Vertex AI Search

B.Vertex AI Prediction

C.Vertex AI Pipelines

D.Cloud Functions

AnswerA

Vertex AI Grounding with Search enables grounding on enterprise data sources.

Why this answer

Vertex AI Grounding with Vertex AI Search is the correct feature because it allows the model to retrieve and cite information from a specified data source (such as Cloud Storage and BigQuery) to generate responses. This process, known as grounding, ensures the model's output is based solely on the provided authoritative data, effectively reducing hallucinations by constraining the model to factual, retrieved content rather than relying on its internal parametric knowledge.

Exam trap

The trap here is that candidates may confuse Vertex AI Prediction (a general model serving endpoint) with the grounding feature, mistakenly thinking that simply deploying a model with Vertex AI Prediction will automatically restrict its answers to a specific knowledge base, when in fact grounding requires explicit integration with Vertex AI Search and a configured data store.

How to eliminate wrong answers

Option B is wrong because Vertex AI Prediction is a service for deploying and serving models to generate predictions or responses, but it does not inherently include grounding capabilities to restrict answers to a specific knowledge base; it would require additional integration with a retrieval system. Option C is wrong because Vertex AI Pipelines is an orchestration service for building and managing ML workflows, not a feature for grounding model responses or preventing hallucinations. Option D is wrong because Cloud Functions is a serverless compute service for running event-driven code, and while it could be used to build a custom retrieval pipeline, it is not a native Vertex AI feature for grounding and does not provide the built-in retrieval and citation mechanisms needed to ensure answers come only from the provided data.

Full explanation →

16

Multi-Selecteasy

Which TWO features are available in Vertex AI Studio for prompt engineering? (Choose two.)

Select 2 answers

A.Side-by-side comparison of model outputs

B.One-click deployment to a Vertex AI endpoint

C.Ability to test prompts with different model parameters (temperature, top_p)

D.Fine-tuning models directly in the interface

E.Building conversational agents with drag-and-drop

AnswersA, C

Allows output comparison.

Why this answer

Option A is correct because Vertex AI Studio provides a side-by-side comparison feature that allows prompt engineers to evaluate outputs from multiple model configurations or parameter settings simultaneously. This enables direct visual comparison of responses, helping to identify the most effective prompt phrasing or parameter combination without manual switching.

Exam trap

The trap here is that candidates may confuse Vertex AI Studio's prompt engineering features with those of Vertex AI Agent Builder or Vertex AI Model Registry, leading them to select options like one-click deployment or drag-and-drop agent building that belong to separate services.

Full explanation →

17

Multi-Selecteasy

A team is selecting a foundation model for a text summarization use case. They need to consider factors that affect both model performance and production deployment. Which THREE factors are most critical? (Choose three.)

Select 3 answers

A.Model parameter count (billions of parameters).

B.Inference latency and throughput capabilities.

C.Context window length (maximum input tokens).

D.Training data provenance and licensing.

E.Pricing per token (input + output).

AnswersB, C, E

C is correct because it affects user experience and scaling.

Why this answer

Inference latency and throughput are critical for production deployment because they directly determine the user experience and operational cost. A model with high latency may be unsuitable for real-time summarization, while low throughput limits the number of concurrent requests the system can handle, affecting scalability and cost-efficiency.

Exam trap

Google Cloud often tests the distinction between model-centric factors (like parameter count) and deployment-centric factors (like latency and pricing), trapping candidates who assume bigger models are always better without considering operational constraints.

Full explanation →

18

MCQmedium

A company wants to scale their generative AI application globally with low latency. Which infrastructure configuration is most suitable?

A.Use a CDN to cache responses.

B.Multiple regional endpoints with traffic routing to the nearest region.

C.On-premises deployment for all regions.

D.Single endpoint in us-central1 with high max replicas.

AnswerB

Regional deployment reduces latency by serving from nearby cloud regions.

Why this answer

Option B is correct because deploying multiple regional endpoints with traffic routing to the nearest region minimizes latency by directing user requests to the geographically closest inference endpoint. This architecture leverages global load balancing (e.g., using Anycast DNS or HTTP(S) load balancers with backend services in multiple regions) to reduce round-trip time (RTT) and meet latency SLAs for real-time generative AI applications.

Exam trap

The trap here is that candidates often confuse CDN caching with real-time inference, assuming caching can accelerate dynamic AI responses, but generative AI outputs are unique per request and cannot be pre-cached.

How to eliminate wrong answers

Option A is wrong because a CDN caches static content (e.g., images, CSS) but cannot cache dynamic, context-dependent generative AI responses, which require real-time model inference; thus, it does not reduce latency for API calls. Option C is wrong because on-premises deployment lacks global scalability and introduces high latency for users outside the local region, defeating the purpose of global low-latency access. Option D is wrong because a single endpoint in us-central1 forces all global traffic to traverse long distances, causing high latency for users far from that region, regardless of the number of replicas.

Full explanation →

19

MCQeasy

Which Google Cloud product provides access to pre-trained foundation models like Gemini?

A.Dataflow

B.Vertex AI Generative AI Studio

C.Cloud Translation

D.Vertex AI Model Registry

AnswerB

Generative AI Studio (Model Garden) provides access to a variety of foundation models including Gemini.

Why this answer

Vertex AI Generative AI Studio is the correct answer because it is the Google Cloud service specifically designed to provide access to pre-trained foundation models like Gemini, allowing users to test, customize, and deploy them via a managed interface. Unlike other services, Generative AI Studio directly integrates with Gemini's API and offers prompt engineering, tuning, and model evaluation capabilities.

Exam trap

The trap here is that candidates confuse Vertex AI Model Registry (a model management tool) with Generative AI Studio (the actual interface for accessing and experimenting with foundation models), leading them to pick D instead of B.

How to eliminate wrong answers

Option A is wrong because Dataflow is a fully managed stream and batch data processing service based on Apache Beam, not a platform for accessing or interacting with pre-trained foundation models. Option C is wrong because Cloud Translation is a specialized service for language translation using pre-trained models, but it does not provide access to general-purpose foundation models like Gemini or support for multimodal tasks. Option D is wrong because Vertex AI Model Registry is a metadata management service for storing and versioning models, not a tool for directly accessing or experimenting with pre-trained foundation models like Gemini.

Full explanation →

20

MCQhard

A research team is using a large language model to analyze medical research papers and generate summaries. They need to minimize hallucinations while retaining key details. They have access to a curated database of paper abstracts. Which approach is best?

A.Fine-tune the model on the entire database of papers.

B.Use chain-of-thought prompting to reason step-by-step.

C.Use few-shot prompting with examples of accurate summaries and set temperature=0.0.

D.Implement RAG to retrieve relevant abstracts and incorporate them into the prompt.

AnswerD

RAG provides direct factual context from the database.

Why this answer

Option B is correct because implementing RAG to retrieve relevant abstracts and incorporate them into the prompt directly grounds the output in the curated database, reducing hallucinations. Option A (few-shot with low temperature) does not prevent hallucination if the model lacks knowledge. Option C (fine-tuning on the entire database) is costly and may overfit.

Option D (chain-of-thought) improves reasoning but not factual grounding.

Full explanation →

21

MCQeasy

A retail company wants to use generative AI to generate product descriptions for thousands of items. They need to ensure that the descriptions are consistent with their brand voice and do not contain factual inaccuracies. What is the most effective strategy?

A.Use a rule-based system to generate descriptions from product attributes.

B.Fine-tune a model on historical product descriptions and use prompt engineering with brand guidelines.

C.Use a large language model with no safety filters to maximize output variety.

D.Use a pre-trained model without any customization and rely on post-processing filters.

AnswerB

Fine-tuning tailors the model to the brand's style; prompt engineering reinforces guidelines and reduces hallucinations.

Why this answer

Option B is correct because fine-tuning on historical product descriptions tailors the model to the brand's specific style, and prompt engineering with brand guidelines ensures adherence to voice and reduces hallucinations. Option A is wrong because a generic pre-trained model may not capture brand voice and is more prone to hallucination. Option C is wrong because rule-based systems lack the flexibility and creativity of generative AI.

Option D is wrong because removing safety filters increases the risk of inappropriate or inaccurate content.

Full explanation →

22

MCQhard

A large e-commerce company deploys a generative AI chatbot on Vertex AI for customer service. The chatbot is powered by a fine-tuned model on the company's historical support tickets. Despite high accuracy on training topics, the chatbot frequently gives irrelevant or off-topic answers when customers ask about new products or promotions. The company maintains a comprehensive product catalog and a knowledge base of current promotions. The chatbot's prompts include a system instruction to 'Answer based on your knowledge' and no other retrieval mechanism. The response time requirement is under 3 seconds. Which course of action should the team take?

A.Implement a RAG pipeline that retrieves relevant product and promotion data from the knowledge base and injects it into the prompt.

B.Increase the temperature to encourage the model to generate more diverse answers.

C.Add additional safety filters to block irrelevant responses.

D.Fine-tune the model again on a larger dataset that includes recent support tickets.

AnswerA

RAG provides current, specific context to the model, directly improving relevance for new topics.

Why this answer

Option C is correct because implementing RAG with the product catalog allows real-time retrieval of current information, addressing the irrelevance for new products without needing retraining. Option A is wrong because fine-tuning again on outdated data won't help with new products. Option B is wrong because increasing temperature makes outputs more random and less focused.

Option D is wrong because adding more safety filters doesn't improve topical relevance.

Full explanation →

23

MCQmedium

You are the lead AI engineer at a financial services firm. You have fine-tuned a large language model on historical trade reports to generate daily market summaries. The model is deployed on Google Cloud's Vertex AI using a custom container. A few weeks after deployment, the operations team notices that inference latency has increased by 300%, causing timeouts. You investigate and find that the model's memory consumption has grown unexpectedly, and the GPUs are idling due to high data transfer wait times. The model architecture and code have not changed. Which action is most likely to resolve the latency issue?

A.Upgrade to a more powerful GPU instance (e.g., A100 to H100) to handle the increased memory footprint.

B.Enable preemptible VM instances to reduce cost and redeploy the model on a faster network.

C.Periodically clear the key-value cache between inference requests and implement cache truncation for long sequences.

D.Recompile the model using XLA with optimizations for dynamic shapes.

AnswerC

Clearing and managing the KV cache reduces memory bloat and speeds up inference.

Why this answer

The latency spike is caused by the key-value (KV) cache growing unboundedly across inference requests, leading to excessive memory consumption and data transfer wait times. Periodically clearing the KV cache between requests and truncating it for long sequences directly addresses the root cause by freeing GPU memory and reducing I/O bottlenecks, without requiring hardware upgrades or recompilation.

Exam trap

Google Cloud often tests the misconception that hardware upgrades or compilation optimizations can fix memory management issues, when the real problem is a software-level cache leak that must be handled explicitly in the serving infrastructure.

How to eliminate wrong answers

Option A is wrong because upgrading to a more powerful GPU (e.g., A100 to H100) does not fix the underlying issue of an ever-growing KV cache; it merely masks the symptom with more memory, and the high data transfer wait times would persist due to cache bloat. Option B is wrong because enabling preemptible VMs reduces cost but does not resolve memory growth or data transfer latency; preemptible instances can be terminated at any time, worsening reliability, and a faster network does not address the cache-induced memory pressure. Option D is wrong because recompiling with XLA for dynamic shapes optimizes computation graphs but does not prevent the KV cache from accumulating across requests; the latency issue stems from memory management, not from suboptimal compilation.

Full explanation →

24

Multi-Selectmedium

A company has a generative AI chatbot on Vertex AI that shows high response latency. They want to reduce latency without significantly increasing cost. Which TWO actions should they take? (Choose two.)

Select 2 answers

A.Increase the min_replica_count to keep more instances always warm.

B.Enable streaming responses using server-sent events.

C.Reduce the max_output_tokens parameter in the model configuration.

D.Use machine types with GPUs.

E.Switch to a larger model like Gemini 1.5 Pro for better accuracy.

AnswersB, C

C is correct because streaming gives partial results sooner.

Why this answer

Option B is correct because enabling streaming responses using server-sent events (SSE) allows the chatbot to send tokens incrementally as they are generated, rather than waiting for the full response. This reduces the perceived latency for the end user, as the first token appears much sooner, even though the total generation time may remain similar. This approach directly addresses high response latency without increasing compute cost, as it does not require additional infrastructure or model changes.

Exam trap

The trap here is that candidates often confuse reducing latency with reducing total generation time, but streaming only reduces perceived latency by delivering tokens earlier, while options like reducing max_output_tokens actually cut total generation time and cost by limiting output length.

Full explanation →

25

MCQeasy

A team uses a generative model to summarize lengthy legal documents. The summaries are accurate but often exceed the target length of 200 words, varying widely. Which simple adjustment should be applied to ensure consistent output length?

A.Fine-tune the model on summaries that are exactly 200 words.

B.Set the max output tokens parameter to 200.

C.Add a system prompt that says 'Summarize in exactly 200 words.'

D.Lower the temperature to reduce variability in word choices.

AnswerB

Max token limits directly truncate the output, enforcing the length constraint.

Why this answer

Option B is correct because setting a maximum token limit directly controls the output length. Option A is wrong because temperature affects creativity, not length. Option C is wrong because prompt engineering can request a specific length but the model may not strictly follow it; token limit is more reliable.

Option D is wrong because fine-tuning is heavy and may not be needed when a simple constraint works.

Full explanation →

26

MCQhard

A global corporation with 50,000 employees has seen rapid adoption of GenAI across marketing, product, and engineering teams. Each team selected its own models and cloud accounts, resulting in fragmented governance, unexpected costs, and varying output quality. The CFO demands a unified strategy to control costs and ensure consistency. The Chief AI Officer proposes several solutions. Which course of action best balances control with innovation?

A.Migrate all GenAI workloads to a single on-premises server to reduce cloud costs

B.Establish a GenAI Center of Excellence (CoE) that provides approved models, shared APIs, and best practices, while allowing team-specific customizations

C.Mandate all teams use a single model (e.g., Gemini) via a centralized Vertex AI endpoint with usage quotas

D.Allow teams to continue using their own models but require them to submit monthly cost reports

AnswerB

A CoE promotes standardization and governance while enabling innovation through customization, balancing both needs.

Why this answer

Option C is correct because a GenAI Center of Excellence provides standardized models and best practices while allowing teams to customize as needed, balancing control and flexibility. A (mandate a single model) stifles innovation. B (monthly reports) does not address fragmentation proactively.

D (on-prem) is costly and limits model access.

Full explanation →

27

Multi-Selecthard

Which THREE approaches are effective for reducing bias in generative model outputs? (Choose three.)

Select 3 answers

A.Set temperature to a very high value.

B.Use adversarial training.

C.Use a balanced training dataset.

D.Use prompt engineering to specify neutral tone.

E.Fine-tune on a debiased dataset.

AnswersC, D, E

Balanced data reduces representation bias.

Why this answer

Option C is correct because a balanced training dataset reduces the risk of the model learning spurious correlations or skewed distributions that lead to biased outputs. By ensuring that all demographic groups, topics, or perspectives are represented proportionally, the model's learned probability distribution is less likely to favor one group over another, directly mitigating representation bias at the data level.

Exam trap

The trap here is that candidates confuse randomness (high temperature) with fairness, or mistake adversarial training (a robustness technique) for a bias mitigation method, when in fact bias reduction requires data-level or fine-tuning interventions like balanced datasets, debiased fine-tuning, or prompt engineering.

Full explanation →

28

MCQhard

An organization is using Vertex AI to fine-tune a large language model. They notice training is taking longer than expected and cost is increasing. Which action is most likely to reduce training time and cost without significantly impacting model quality?

A.Increase the number of training steps

B.Increase the batch size

C.Use a higher learning rate

D.Enable mixed-precision training (bfloat16)

AnswerD

Mixed-precision reduces computation and memory, speeding up training on TPUs and GPUs.

Why this answer

Mixed-precision training (bfloat16) reduces memory usage and speeds up computation on compatible hardware while maintaining model quality. Increasing batch size or learning rate risks convergence issues; increasing steps increases cost.

Full explanation →

29

MCQhard

A company is using Gemini Pro for code generation. They want to ensure that the generated code does not contain security vulnerabilities. Which approach should they implement?

A.Enable grounding with security scanning tools

B.Use the Vertex AI Codey API with safety settings

C.Implement a human-in-the-loop review with automated scanning

D.Use a custom safety attribute filter

AnswerC

Combining human review with automated scanning is the recommended approach for code security.

Why this answer

Option D is correct because for security, human review combined with automated scanning is best practice. Option A is wrong because custom safety filters are generic and not code-specific. Option B is wrong because grounding with scanning tools is not a standard feature.

Option C is wrong because safety settings are about content harm, not code vulnerabilities.

Full explanation →

30

MCQmedium

A team is tuning a large language model for a question-answering task. They notice the model gives high confidence scores to answers that are factually incorrect. Which evaluation metric should they primarily use to detect this overconfidence problem?

A.Perplexity

B.Expected Calibration Error (ECE)

C.BLEU score

D.ROUGE-L

AnswerB

ECE directly quantifies how well confidence scores reflect actual correctness.

Why this answer

Expected Calibration Error (ECE) directly measures the alignment between a model's predicted confidence and its actual accuracy. In this scenario, high confidence on incorrect answers indicates miscalibration, and ECE quantifies this mismatch by binning predictions by confidence and computing the average absolute difference between accuracy and confidence per bin.

Exam trap

Google Cloud often tests the distinction between intrinsic evaluation metrics (like perplexity) and calibration metrics, leading candidates to mistakenly choose perplexity when the core issue is confidence miscalibration rather than general model uncertainty.

How to eliminate wrong answers

Option A is wrong because Perplexity measures how well a probability distribution predicts a sample, reflecting model uncertainty over token sequences, but it does not assess calibration of confidence scores against factual correctness. Option C is wrong because BLEU score evaluates n-gram overlap between generated and reference texts for translation quality, not confidence calibration or factual accuracy. Option D is wrong because ROUGE-L measures longest common subsequence recall for summarization tasks, and is unrelated to detecting overconfidence in model predictions.

Full explanation →

31

MCQmedium

A retail company is building a product description generator using a large language model on Vertex AI. They need to ensure the generated descriptions do not contain offensive language. Which strategy should they implement?

A.Fine-tune the model on a dataset of clean product descriptions

B.Implement a content moderation filter (e.g., Perspective API) as a post-processing step

C.Use Vertex AI Model Monitoring to detect anomalies in model predictions

D.Include explicit instructions in the prompt to avoid offensive language

AnswerB

Post-processing filters catch offensive outputs before delivery to users.

Why this answer

Option B is correct because content moderation filters like Perspective API act as a post-processing safeguard that can catch offensive language the model might generate despite prompt engineering or fine-tuning. This approach provides a deterministic, rule-based or ML-based check that is independent of the model's training, ensuring compliance with content policies in production. It is a standard practice for deploying LLMs in customer-facing applications where safety is critical.

Exam trap

Google Cloud often tests the misconception that prompt engineering or fine-tuning alone can guarantee safety, when in practice a dedicated post-processing filter is required for reliable content moderation in production.

How to eliminate wrong answers

Option A is wrong because fine-tuning on clean product descriptions reduces but does not eliminate the risk of generating offensive language; the model can still hallucinate or produce harmful outputs due to biases in the base model or adversarial inputs. Option C is wrong because Vertex AI Model Monitoring detects anomalies in prediction distributions (e.g., drift, data skew) but does not inspect individual outputs for offensive content; it is a monitoring tool, not a content filter. Option D is wrong because including explicit instructions in the prompt is a weak safeguard; LLMs can ignore or misinterpret instructions, especially under prompt injection or when generating long descriptions, making it unreliable as a sole defense.

Full explanation →

32

MCQeasy

A developer is using Vertex AI Studio to prototype a chat application. They want to provide the model with a system instruction to set the tone and style. How should they configure this in the Vertex AI Studio interface?

A.Add the instruction as part of the prompt text

B.Set the temperature parameter to a high value

C.Use the 'System Instruction' field in the model configuration

D.Add the instruction in the 'Context' parameter

AnswerC

Vertex AI Studio has a dedicated field for system instructions.

Why this answer

Option C is correct because Vertex AI Studio provides a dedicated 'System Instruction' field in the model configuration panel, which allows developers to set the tone, style, and behavioral guidelines for the model without mixing them into the user prompt. This field is specifically designed to hold system-level instructions that are prepended to the conversation context, ensuring consistent behavior across multiple turns.

Exam trap

The trap here is that candidates often confuse the 'System Instruction' field with the 'Context' parameter, mistakenly thinking both serve the same purpose, but the 'Context' parameter is designed for providing background knowledge or few-shot examples, not for setting persistent behavioral instructions.

How to eliminate wrong answers

Option A is wrong because adding the instruction as part of the prompt text would mix system-level guidance with user input, making it harder to maintain consistency and potentially causing the model to treat the instruction as part of the conversation rather than a persistent directive. Option B is wrong because the temperature parameter controls randomness in output generation, not the tone or style; a high temperature increases creativity and variability but does not enforce a specific behavioral instruction. Option D is wrong because the 'Context' parameter in Vertex AI Studio is used to provide background information or examples for grounding the model, not for setting system-level behavioral instructions like tone or style.

Full explanation →

33

MCQhard

Which of the following is a best practice when using Vertex AI for prompt engineering?

A.Always set temperature to 0

B.Use consistent formatting and delimiters

C.Avoid using examples in the prompt

D.Use very long prompts to include all possible instructions

AnswerB

Consistent structure helps the model parse instructions and reduces errors.

Why this answer

Consistent formatting and delimiters (e.g., using triple backticks, XML tags, or clear section headers) help the model parse instructions and context reliably, reducing ambiguity and improving output quality. This is a core best practice in prompt engineering on Vertex AI because it leverages the model's attention mechanisms to focus on distinct prompt segments, leading to more predictable and accurate responses.

Exam trap

Google Cloud often tests the misconception that 'more is better' in prompts or that deterministic settings like temperature=0 are universally optimal, leading candidates to overlook the importance of structured, concise formatting.

How to eliminate wrong answers

Option A is wrong because setting temperature to 0 always is not a best practice; temperature controls randomness, and while 0 yields deterministic outputs, many tasks benefit from slight variability (e.g., creative generation or diverse suggestions), and Vertex AI supports a range of 0.0 to 1.0. Option C is wrong because including examples (few-shot prompting) is a powerful technique to guide the model's behavior and improve performance, especially for complex or nuanced tasks; avoiding them would reduce effectiveness. Option D is wrong because very long prompts can exceed context windows, dilute key instructions, and increase latency or cost; Vertex AI models have token limits (e.g., 8,192 tokens for Gemini), and concise, well-structured prompts are more efficient.

Full explanation →

34

Multi-Selecthard

A team is fine-tuning a model for a legal document summarization task. They need to ensure high accuracy and avoid hallucinations. Which TWO approaches should they combine? (Choose two.)

Select 2 answers

A.Use Retrieval-Augmented Generation to retrieve relevant legal texts

B.Increase temperature to 1.5 during inference

C.Implement early stopping during fine-tuning

D.Incorporate a human-in-the-loop review process

E.Use character-level tokenization to improve spelling

AnswersA, D

RAG grounds the summary in actual documents, reducing hallucination.

Why this answer

Correct: A and D. A (RAG) provides source material to ground summaries. D (human-in-the-loop validation) catches errors before final output.

B (increase temperature) is counterproductive. C (early stopping) addresses overfitting but not factuality. E (character-level tokenization) is not relevant.

Full explanation →

35

MCQmedium

Refer to the exhibit. A developer creates a model resource with this YAML config but gets an error that the model is not deployable. What is missing?

A.model_type

B.artifact_uri

C.container_spec

D.description

AnswerC

container_spec is required to tell Vertex AI which container to use.

Why this answer

The error 'model is not deployable' occurs because the YAML config lacks a `container_spec` field. In Vertex AI, a model must specify how to serve predictions—either via a pre-built container (using `container_spec`) or a custom container. Without this, the model has no runtime environment and cannot be deployed to an endpoint.

Exam trap

Google Cloud often tests the misconception that `artifact_uri` is the key requirement for deployment, but the real mandatory field is the container specification that defines the runtime environment.

How to eliminate wrong answers

Option A is wrong because `model_type` is not a required field for deployment; Vertex AI infers the model type from the artifact or container. Option B is wrong because `artifact_uri` is optional—it points to the model artifacts but is not mandatory if the container already includes them. Option D is wrong because `description` is purely metadata and has no impact on deployability.

Full explanation →

36

MCQhard

An e-commerce company fine-tunes a model on customer reviews to generate product feedback summaries. They want to ensure the model does not reproduce toxic language from the training data. Besides filtering the training data, which additional technique is most effective at inference time?

A.Set temperature to 0.0 to reduce variance

B.Set top-k to 10 to limit token choices

C.Pass the model output through a toxicity detection model and conditionally regenerate or block

D.Use beam search with a high beam width

AnswerC

Inference-time filtering is a robust safety layer that catches toxic outputs without retraining.

Why this answer

Option A is correct because a toxicity classifier (e.g., Perspective API) applied to the output can block or flag toxic content. Option B is wrong because temperature reduction does not guarantee avoidance of toxic patterns. Option C is wrong because beam search may produce repetitive, not safer, outputs.

Option D is wrong because top-k sampling reduces randomness but does not filter toxicity.

Full explanation →

37

MCQeasy

A user provides a long document as context for a question-answering task, but the model outputs irrelevant answers. What is the most likely cause?

A.The document exceeds the model's context window, truncating important details.

B.Safety filters are blocking the relevant response.

C.The model's temperature is too low, making it deterministic.

D.The model is not generating any tokens.

AnswerA

Models have a maximum input length; exceeding it truncates the beginning or end.

Why this answer

The model's context window may be exceeded, causing loss of relevant information. Option B is wrong because temperature is unrelated to context length. Option C is wrong because safety filters would block content, not cause irrelevance.

Option D is wrong because tokens are always generated.

Full explanation →

38

MCQeasy

A developer is using Vertex AI with an API key and gets the above error. What is the likely cause?

A.Wrong endpoint

B.Excessive quota

C.Expired API key

D.Insufficient permissions

AnswerC

The error message explicitly says 'API key not valid', common when key is expired.

Why this answer

The error indicates that the API key used to authenticate with Vertex AI is no longer valid. API keys can expire due to a configured expiration policy or if they have been revoked in the Google Cloud Console. Since the developer is using an API key directly (rather than a service account or OAuth token), an expired key is the most direct cause of an authentication failure.

Exam trap

The trap here is that candidates confuse authentication failures (401) with authorization failures (403), leading them to select 'Insufficient permissions' when the actual issue is an expired or invalid API key.

How to eliminate wrong answers

Option A is wrong because a wrong endpoint would result in a DNS resolution or HTTP 404 error, not an authentication-related error. Option B is wrong because excessive quota returns a 429 HTTP status code (RESOURCE_EXHAUSTED), not an authentication failure. Option D is wrong because insufficient permissions would return a 403 HTTP status code (PERMISSION_DENIED), which is distinct from the authentication error caused by an invalid or expired API key.

Full explanation →

39

MCQeasy

A financial services firm wants to use generative AI to summarize lengthy regulatory documents for compliance officers. They need high accuracy and the ability to reference specific source paragraphs. The team is evaluating a retrieval-augmented generation (RAG) approach on Google Cloud. However, they are concerned about latency when querying large documents. Which architecture change would most effectively reduce response time?

A.Switch to a pure vector search without indexing

B.Increase the number of chunks retrieved per query

C.Use a larger embedding model to improve retrieval accuracy

D.Implement semantic chunking with overlapping to reduce document size per retrieval

AnswerD

Smaller, well-structured chunks speed up retrieval and generation.

Why this answer

Semantic chunking with overlapping reduces the size of each retrieved chunk while preserving context, which directly lowers the amount of text processed per query and speeds up the generation step. This architecture change minimizes latency by ensuring the retriever fetches only the most relevant, compact segments, reducing the load on both the embedding and LLM inference stages.

Exam trap

Google Cloud often tests the misconception that improving retrieval accuracy or increasing context always benefits latency, when in fact reducing the per-query data volume through smarter chunking is the most direct way to cut response time.

How to eliminate wrong answers

Option A is wrong because pure vector search without indexing would require a full scan of all document embeddings, drastically increasing retrieval time and negating any latency benefit. Option B is wrong because increasing the number of chunks retrieved per query expands the context window, which increases the LLM's processing time and overall response latency. Option C is wrong because a larger embedding model improves retrieval accuracy but introduces higher computational cost during both indexing and query encoding, which increases latency rather than reducing it.

Full explanation →

40

MCQmedium

A company is building a customer support chatbot using Vertex AI Agent Builder. They want the agent to answer questions based on internal knowledge base documents stored in Cloud Storage. Which feature should they configure to ensure the agent can retrieve relevant information from these documents?

A.Deploy the agent to a Vertex AI endpoint

B.Fine-tune a Gemini model on the knowledge base

C.Enable grounding with a data store

D.Configure a safety filter to block irrelevant queries

AnswerC

Grounding allows the agent to retrieve information from a data store created from Cloud Storage documents.

Why this answer

Grounding connects the agent to external data sources like Cloud Storage, allowing it to retrieve and use information from the knowledge base. Option A is wrong because deploying to an endpoint is for model serving, not data retrieval. Option B is wrong because safety filters control content, not retrieval.

Option D is wrong because custom training is for fine-tuning, not retrieval.

Full explanation →

41

MCQeasy

A social media company uses a generative AI model to moderate user posts. The model occasionally allows offensive content. Which safety technique should be implemented?

A.Use a different tokenizer to avoid offensive words.

B.Configure safety filters on the model endpoint in Vertex AI.

C.Add few-shot examples of safe posts in the prompt.

D.Reduce the temperature to 0.

AnswerB

Safety filters are designed to detect and block harmful content.

Why this answer

Safety filters explicitly block harmful content categories. Option A is wrong because lower temperature may produce bland but not necessarily safe outputs. Option B is wrong because few-shot examples may not cover all offensive patterns.

Option D is wrong because changing tokens does not inherently filter content.

Full explanation →

42

MCQmedium

A news organization is using Vertex AI Gemini to summarize articles. They observe that the summaries sometimes contain hallucinated facts—specifically, dates and statistics that are not in the original article. The team is using the default temperature and top_p settings. They want to reduce hallucinations without making summaries too repetitive or overly conservative. They also need to keep latency low. Which action should they take?

A.Increase the temperature to 1.0 and lower top_p to 0.1.

B.Enable grounding with Google Search to provide factual source context.

C.Fine-tune the model on a large dataset of articles and human-written summaries.

D.Lower the temperature to 0.0 and increase top_p to 1.0.

AnswerB

Grounding connects the model to verified information, reducing hallucination.

Why this answer

Option D is correct. Grounding with Google Search can provide factual references. Option A (lower temperature) may reduce creativity but not eliminate hallucination.

Option B (increase top_p) might increase randomness. Option C (fine-tuning) is expensive and might not generalize.

Full explanation →

43

MCQeasy

A developer is using Vertex AI PaLM API to generate code snippets. The responses sometimes contain security vulnerabilities. What is the best practice to mitigate this?

A.Implement input validation and output filtering with safety attributes

B.Disable safety filters to allow more output

C.Increase the max output tokens

D.Set safety settings to block all categories

AnswerA

Validating inputs and filtering outputs reduces security risks.

Why this answer

Option A is correct because input validation and output filtering with safety attributes directly address security vulnerabilities by sanitizing user inputs and filtering model outputs for harmful content. The Vertex AI PaLM API provides safety attribute scores (e.g., toxicity, harassment) that allow developers to programmatically block or flag responses that exceed defined thresholds, reducing the risk of generating insecure code snippets.

Exam trap

The trap here is that candidates may think increasing token limits or disabling filters improves output quality, when in fact the core issue is controlling content safety through validation and filtering, not adjusting generation parameters.

How to eliminate wrong answers

Option B is wrong because disabling safety filters removes all guardrails, allowing the model to generate potentially harmful or insecure code without any mitigation, which increases security risks. Option C is wrong because increasing max output tokens does not affect the content's security; it only allows longer responses, which could include more vulnerabilities. Option D is wrong because setting safety settings to block all categories is overly restrictive and may prevent legitimate code generation, but more importantly, it does not address the root cause of vulnerabilities—input validation and output filtering are needed to catch context-specific issues like insecure code patterns.

Full explanation →

44

MCQmedium

A fintech startup is building a generative AI application that generates personalized investment advice based on user profiles and market data. They are using Vertex AI Agent Builder to create an agent that retrieves information from a BigQuery table containing user data and from a real-time market data API. The agent needs to ensure that responses comply with financial regulations, meaning the model must not give specific stock recommendations unless the user explicitly requests them after disclaimers. The team has implemented grounding with both sources. During testing, the agent sometimes spontaneously suggests buying a particular stock without being asked, which could lead to regulatory issues. The team wants to enforce strict control over the agent's behavior. What should the team do?

A.Increase the safety filter sensitivity to block any financial recommendations

B.Add more historical data to the BigQuery table to improve grounding accuracy

C.Implement a custom system instruction that explicitly prohibits unsolicited stock recommendations and requires a disclaimer before any advice

D.Fine-tune the model on a dataset of compliant conversations

AnswerC

System instructions provide explicit behavioral constraints.

Why this answer

Option D is correct because a custom system instruction can explicitly constrain the agent's behavior. Option A is wrong because adding more data may not control behavior. Option B is wrong because fine-tuning may not cover all edge cases.

Option C is wrong because a safety filter may not catch subtle recommendations.

Full explanation →

45

MCQmedium

A large enterprise has deployed generative AI assistants in three separate departments (HR, Marketing, and Customer Support) using different tools and models. Over the past quarter, the company has observed escalating cloud costs, inconsistent user experiences, and reports of data leakage in Customer Support logs. The CTO wants to address these issues while maintaining innovation velocity. As the Generative AI Leader, what course of action should you recommend?

A.Standardize on a single model and tool across all departments, restricting usage to one platform.

B.Implement a centralized AI governance platform with cost monitoring, model registry, and security guardrails.

C.Discontinue the Customer Support assistant to eliminate data leakage risk and reduce costs.

D.Allow each department to continue independently but require monthly cost and compliance reports.

AnswerB

Centralized governance addresses cost, security, and consistency while allowing flexibility.

Why this answer

Option C is correct because implementing a centralized AI governance platform provides unified cost management, security controls, and standardization, aligning with business strategies to scale generative AI responsibly. Option A is wrong because creating additional silos would worsen fragmentation. Option B is wrong because it fails to address root causes and misses an opportunity for optimization.

Option D is wrong because regulatory compliance is not optional and ignoring it can lead to severe penalties.

Full explanation →

46

MCQhard

A multinational corporation is using Vertex AI to generate multilingual customer support responses. They have fine-tuned the Gemini model on support tickets in English and now want to extend to 10 additional languages. The fine-tuning dataset for new languages is small (1000 tickets each). During evaluation, the model performs well for common languages (Spanish, French) but poorly for languages like Finnish and Thai. The team needs to improve performance for low-resource languages. They have budget constraints and cannot collect more data quickly. Which approach should they take?

A.Switch to Vertex AI Codey API for generating responses in all languages.

B.Use a multilingual foundation model and fine-tune with cross-lingual transfer learning techniques.

C.Deploy separate fine-tuned models for each language.

D.Collect more training data for low-resource languages via crowdsourcing.

AnswerB

Gemini is inherently multilingual; cross-lingual transfer can boost low-resource performance.

Why this answer

Option B is correct. Cross-lingual transfer learning (like using a multilingual model or fine-tuning with a high-resource language pair) leverages data from similar languages. Option A (collect more data) is not feasible quickly.

Option C (use Codey) is irrelevant. Option D (separate models) increases cost and complexity.

Full explanation →

47

Multi-Selectmedium

A company is deploying a generative AI system that generates customer-facing emails. The system must ensure outputs are not toxic, biased, or harmful. Which TWO techniques are most effective for reducing toxicity in model outputs without significantly affecting performance?

Select 2 answers

A.Increase the maximum output token count to allow more context.

B.Set temperature to a very low value (e.g., 0.1).

C.Fine-tune the model on a dataset of safe emails using reinforcement learning from human feedback (RLHF).

D.Apply a toxicity detection and filtering layer using Vertex AI Safety Filters.

E.Provide 50 few-shot examples of safe emails in every prompt.

AnswersC, D

RLHF aligns model behavior to human preferences, reducing toxicity effectively.

Why this answer

Options A and D are correct. Fine-tuning with RLHF (or using a safety-tuned model) directly aligns the model to avoid toxic outputs. Output filtering (e.g., safety classifiers) provides a robust post-processing layer.

Option B (temperature) does not prevent toxicity, only randomness. Option C (few-shot) is insufficient for safety. Option E (increasing tokens) may increase risk.

Full explanation →

48

MCQhard

A global e-commerce company is using Vertex AI to build a generative AI chatbot for customer support. The chatbot is powered by the Gemini 1.5 Pro model and uses a vector search index for retrieval-augmented generation (RAG) over product documentation. The company has deployed the application in four regions (us-central1, europe-west4, asia-east1, and australia-southeast1) using a multi-region deployment with a global endpoint. The application is critical and requires high availability with a target latency of under 500ms for the RAG pipeline. Recently, users in Australia are experiencing inconsistent latency spikes, with response times exceeding 2 seconds during peak hours. The team suspects that the issue is related to the vector search index's replication and serving configuration. The index has 10 million embeddings with a dimension of 768. It is stored in a single regional bucket in us-central1, and the vector search index endpoint is deployed in all four regions with the same deployed index ID. The team is using the default configuration for index updates and serving. Which action should the team take to resolve the latency issue for Australian users?

A.Move the vector search index to a multi-regional Cloud Storage bucket (e.g., 'us') to reduce latency for index updates.

B.Create a new regional bucket in australia-southeast1 and store a copy of the index there, then redeploy the vector search index endpoint to use the local bucket.

C.Deploy a separate vector search index endpoint for each region with its own index copy stored in a regional bucket in that region.

D.Increase the number of replicas for the vector search index in all regions to improve throughput and reduce latency.

AnswerA

Multi-regional buckets provide better replication and availability across regions, reducing update latency for distant regions.

Why this answer

The latency spike for Australian users is caused by the vector search index being stored in a single regional bucket in us-central1. When the index is updated, the new embeddings must be rebuilt and streamed from us-central1 to the australia-southeast1 endpoint, introducing significant cross-region latency. Moving the index to a multi-regional Cloud Storage bucket (e.g., 'us') allows the index to be served from a location closer to all regions, reducing the update propagation time and improving consistency for Australian users.

Exam trap

Google Cloud often tests the misconception that deploying separate endpoints or increasing replicas solves cross-region latency, when the real issue is the single-region storage bucket causing update propagation delays.

How to eliminate wrong answers

Option B is wrong because creating a separate regional bucket in australia-southeast1 and storing a copy of the index there does not solve the update propagation issue; the index must be rebuilt and streamed from the source bucket (us-central1) to the new bucket, and the endpoint still references the original deployed index ID, so the local copy is not automatically used. Option C is wrong because deploying separate endpoints per region with local index copies increases operational complexity and cost, and does not address the root cause of cross-region latency during index updates; the global endpoint already routes to the nearest region, but the index data still originates from us-central1. Option D is wrong because increasing the number of replicas improves throughput and query handling within a region, but does not reduce the latency caused by cross-region data transfer during index updates; the bottleneck is the update propagation, not query processing capacity.

Full explanation →

49

Multi-Selectmedium

A developer is tuning a text-generation model for creative writing. They want the outputs to be more diverse and less repetitive. Which THREE parameters/changes can help? (Choose three.)

Select 3 answers

A.Increase temperature to 0.9

B.Reduce top-k to 10

C.Increase presence penalty to 0.5

D.Increase top-p to 0.95

E.Reduce frequency penalty to 0.0

AnswersA, C, D

Higher temperature increases randomness and diversity.

Why this answer

Increasing temperature to 0.9 raises the randomness of the probability distribution over the vocabulary, making the model more likely to sample less probable tokens. This directly increases output diversity and reduces repetitiveness by flattening the softmax curve, which is a standard technique for creative generation.

Exam trap

Google Cloud often tests the misconception that reducing top-k or top-p increases diversity, when in fact narrowing the sampling pool (lower top-k or lower top-p) reduces diversity, and the correct approach is to increase these values or increase temperature/penalties.

Full explanation →

50

MCQhard

An enterprise wants to adopt GenAI across departments but faces resistance from legal and compliance. Which strategy should the AI leader prioritize?

A.Outsource the entire initiative to a consulting firm

B.Build a comprehensive governance framework covering data use, review, and monitoring

C.Deploy a single pilot in a low-risk department to demonstrate value

D.Mandate use of GenAI through executive order

AnswerB

A governance framework ensures that GenAI use is compliant, transparent, and aligned with corporate policies, gaining trust from legal and compliance.

Why this answer

Option B is correct because legal and compliance resistance stems from concerns about data privacy, regulatory adherence, and model accountability. A comprehensive governance framework directly addresses these by defining data usage policies, implementing review mechanisms for model outputs, and establishing continuous monitoring to detect drift or bias, which is essential for enterprise-grade GenAI deployment.

Exam trap

Google Cloud often tests the misconception that a low-risk pilot (Option C) is the best first step to overcome resistance, but the trap is that without a governance framework, even a pilot can expose the enterprise to compliance risks, and the question specifically asks for a strategy to address legal and compliance resistance, not just to demonstrate value.

How to eliminate wrong answers

Option A is wrong because outsourcing to a consulting firm does not resolve internal legal and compliance concerns; it shifts responsibility without ensuring the enterprise has control over data governance, model transparency, or audit trails, which are critical for regulatory compliance. Option C is wrong because deploying a single low-risk pilot, while useful for proof-of-concept, does not address the root cause of resistance from legal and compliance—it may demonstrate value but lacks the governance structure needed to satisfy their requirements for data handling, review, and monitoring across all departments. Option D is wrong because mandating use through executive order bypasses the legitimate concerns of legal and compliance teams, likely escalating resistance and risking non-compliance with regulations like GDPR or HIPAA, as GenAI models can inadvertently expose sensitive data or produce unverifiable outputs.

Full explanation →

51

MCQeasy

A company wants to offer a generative AI feature where the output must follow a very specific tone and style as per the brand guidelines. Which strategy is most reliable?

A.Post-process the output with a style transfer algorithm.

B.Use a general-purpose model with a system prompt describing the style.

C.Use a different model for each content type.

D.Fine-tune a model on a dataset of branded content.

AnswerD

Fine-tuning internalizes the style, leading to more reliable and consistent output.

Why this answer

Fine-tuning a model on a dataset of branded content is the most reliable strategy because it adjusts the model's internal weights to consistently produce outputs that match the specific tone and style of the brand. Unlike prompt-based methods, fine-tuning embeds the stylistic constraints directly into the model's parameters, ensuring adherence even for complex or nuanced brand guidelines.

Exam trap

The trap here is that candidates overestimate the reliability of prompt engineering (Option B) for enforcing strict, consistent stylistic constraints, underestimating how easily a general-purpose model can deviate from a system prompt when faced with complex or ambiguous inputs.

How to eliminate wrong answers

Option A is wrong because post-processing with a style transfer algorithm adds latency, can introduce artifacts, and may not preserve the original content's meaning while reliably matching brand-specific tone and style. Option B is wrong because a general-purpose model with a system prompt is fragile—subtle variations in prompt phrasing or model updates can cause the output to drift from the desired style, and the model lacks deep internalization of the brand's unique patterns. Option C is wrong because using a different model for each content type does not guarantee consistent tone and style across types; it increases maintenance overhead and still requires each model to be individually tuned or prompted to follow brand guidelines.

Full explanation →

52

MCQeasy

A startup is deciding between using a pre-trained model via API vs. hosting their own open-source model. Which factor is most critical for their decision?

A.The accuracy on a benchmark dataset

B.The number of parameters in the model

C.The level of community support for the open-source model

D.Total cost of ownership including infrastructure and expertise

AnswerD

A startup must consider API pricing vs. cloud infrastructure and the hiring costs for model maintenance.

Why this answer

Total cost of ownership (TCO) is the most critical factor because it encompasses not only the direct costs of infrastructure (compute, storage, networking) but also the hidden costs of expertise (MLOps engineers, security hardening, ongoing maintenance) and opportunity costs. A pre-trained API may have higher per-token costs but lower upfront investment, while self-hosting an open-source model requires significant capital expenditure on GPUs, cooling, and power, plus the operational burden of scaling inference under variable load. This decision directly impacts the startup's burn rate and runway, making TCO the primary driver for a resource-constrained organization.

Exam trap

Google Cloud often tests the misconception that technical superiority (accuracy or parameter count) is the primary decision factor, when in reality the business context—specifically TCO—drives the choice between API consumption and self-hosting for startups.

How to eliminate wrong answers

Option A is wrong because benchmark accuracy is a static metric that does not account for real-world deployment costs, latency requirements, or data privacy constraints; a model with slightly lower accuracy may be far more cost-effective or compliant. Option B is wrong because the number of parameters is a coarse proxy for model capability but does not directly determine inference cost, latency, or the total cost of ownership; a smaller model with efficient quantization can outperform a larger model in throughput and cost per request. Option C is wrong because community support, while helpful for troubleshooting, does not address the core financial and operational viability of self-hosting; a well-supported model still requires the startup to bear all infrastructure and expertise costs.

Full explanation →

53

Multi-Selecteasy

Which TWO of the following are capabilities of Vertex AI Model Garden? (Choose 2)

Select 2 answers

A.Generate code snippets for common programming tasks.

B.Ability to generate images from text descriptions.

C.Deploy custom container images for model serving.

D.Access to a curated set of foundation models like PaLM and Gemini.

E.Ability to fine-tune and deploy foundation models.

AnswersD, E

Model Garden gives access to foundation models.

Why this answer

Option A and D are correct. Model Garden provides a curated set of foundation models and allows fine-tuning. Option B (generating images) is Imagen's capability.

Option C (generating code) is Codey's capability. Option E (deploying custom containers) is a general Vertex AI feature not specific to Model Garden.

Full explanation →

54

MCQhard

A data scientist fine-tunes a model on a small proprietary dataset. After fine-tuning, the model repeats training examples verbatim. What is the most effective mitigation?

A.Reduce the temperature during inference to 0.

B.Train for more epochs to improve generalization.

C.Use early stopping based on validation loss.

D.Add regularization like dropout and use a smaller learning rate.

AnswerD

Regularization techniques discourage memorization and encourage generalization.

Why this answer

Increasing the learning rate slightly or using dropout can reduce memorization. Option A is wrong because more epochs increase memorization. Option B is wrong because ground truth stopping doesn't prevent memorization.

Option D is wrong because temperature during inference doesn't fix overfitting.

Full explanation →

55

MCQeasy

A marketing agency wants to use Vertex AI to automatically generate social media posts for clients. They plan to use the Gemini API with few-shot prompting. The agency's developers have limited experience with generative AI and want the fastest way to prototype and iterate on prompts. They are already using Google Cloud for other services. Which approach should they take to quickly develop and test prompts?

A.Use a third-party platform like OpenAI Playground and migrate later.

B.Use Google Cloud Shell to invoke the model via curl commands.

C.Use Vertex AI Studio (Gen AI Studio) to design and test prompts interactively.

D.Write Python scripts using the Vertex AI SDK and run them in Airflow.

AnswerC

Vertex AI Studio is designed for rapid prototyping with a visual interface.

Why this answer

Option A is correct. Vertex AI Studio provides a no-code interface for prompt design and testing. Option B (write code) is slower.

Option C (use Cloud Shell) is possible but less user-friendly. Option D (third-party tool) adds complexity.

Full explanation →

56

MCQhard

An organization is using Vertex AI Gemini API for a multimodal chatbot. They notice that the model sometimes provides incorrect information with high confidence. They want to reduce hallucinations without retraining the model. What is the most effective approach?

A.Provide ground-truth context from a knowledge base using grounding

B.Increase the temperature parameter to make the model more creative

C.Reduce the maximum output tokens to force concise answers

D.Adjust safety settings to filter uncertain responses

AnswerA

Grounding supplies factual information that the model can use to generate accurate responses.

Why this answer

Safety settings reduce hallucinations? Actually, hallucinations are not purely safety. Grounding with a knowledge base provides factual retrieval to reduce hallucinations. Option B is wrong because safety settings block harmful content, not necessarily hallucinations.

Option C is wrong because increasing temperature increases randomness. Option D is wrong because reduced token count limits responses but may not reduce hallucinations.

Full explanation →

57

MCQeasy

A company wants to build a chatbot that can answer questions about its internal knowledge base using natural language. Which Google Cloud Generative AI offering should they use to quickly prototype and deploy this chatbot with minimal coding?

A.Generative AI Studio

B.Vertex AI Endpoints

C.Cloud Natural Language API

D.Vertex AI Model Garden

AnswerA

Generative AI Studio offers a drag-and-drop interface for building chatbots.

Why this answer

Generative AI Studio provides a no-code/low-code environment to prototype and deploy chatbots with foundation models.

Full explanation →

58

MCQhard

A gen AI application produces hallucinations (factually incorrect outputs). Which mitigation strategy is LEAST effective?

A.Using prompt templates with constraints

B.Using grounding with a knowledge base

C.Implementing retrieval-augmented generation

D.Increasing model temperature

AnswerD

Higher temperature leads to more diverse but less predictable outputs, exacerbating hallucinations.

Why this answer

Increasing model temperature makes the model more random and creative, which directly increases the likelihood of hallucinations. It does not constrain or ground the output in factual data, making it the least effective mitigation strategy among the options.

Exam trap

Google Cloud often tests the misconception that increasing model temperature improves accuracy by making the model 'more confident,' when in reality it increases randomness and hallucination risk.

How to eliminate wrong answers

Option A is wrong because prompt templates with constraints (e.g., specifying 'only answer from the provided context') reduce the model's freedom to generate unverified content, thereby lowering hallucination risk. Option B is wrong because grounding with a knowledge base ties the model's outputs to verified facts, preventing fabrication by restricting the response to a trusted data source. Option C is wrong because retrieval-augmented generation (RAG) explicitly fetches relevant documents from a knowledge base before generation, ensuring the output is based on retrieved evidence rather than parametric memory alone.

Full explanation →

59

Multi-Selectmedium

A team is evaluating generative AI models on Vertex AI. They need to compare models based on specific criteria. Which TWO criteria are most important for selecting a model for a text summarization task?

Select 2 answers

A.ROUGE scores

B.Training dataset size

C.Cost per token

D.Model size in parameters

E.Latency

AnswersA, E

ROUGE evaluates summary quality against references.

Why this answer

ROUGE scores are the standard evaluation metric for text summarization tasks, measuring the overlap of n-grams, word sequences, and word pairs between generated summaries and reference summaries. This directly quantifies summary quality, making it the most important criterion for model selection.

Exam trap

Google Cloud often tests the misconception that model size or cost are primary selection criteria, when in fact task-specific metrics like ROUGE are the correct focus for evaluating generative model output quality.

Full explanation →

60

MCQmedium

A machine learning engineer is deploying a large generative model on Vertex AI. The model requires a GPU with high memory. Which machine configuration should they choose?

A.c2-standard-16 with no GPU

B.a2-highgpu-4g with 4 A100 GPUs

C.n1-standard-4 with a single T4 GPU

D.n2-standard-8 with a single P4 GPU

AnswerB

A2 machines offer A100s with large memory, suitable for large models.

Why this answer

A2 high-gpu machines with NVIDIA A100 GPUs provide high memory for large models.

Full explanation →

61

MCQhard

A company is using Vertex AI Gemini API to analyze customer feedback. They notice that the model occasionally generates offensive content. They have already set safety settings to block high-probability harmful content. What additional step should they take to further reduce offensive outputs?

A.Set the temperature to 0.0

B.Adjust safety settings to block medium-probability harmful content

C.Enable context caching

D.Fine-tune the model on customer feedback data

AnswerB

Stricter thresholds block more offensive outputs.

Why this answer

Option B is correct because the company has already blocked high-probability harmful content, but offensive outputs can still occur at lower probability thresholds. By adjusting safety settings to block medium-probability harmful content, they tighten the filter to catch more borderline cases without requiring model retraining or sacrificing output diversity. This leverages Vertex AI's configurable safety filters, which operate on likelihood categories (e.g., high, medium, low) rather than just binary blocking.

Exam trap

The trap here is that candidates assume fine-tuning (Option D) is the default fix for any output quality issue, but safety filtering is a separate, configurable layer that should be tuned before retraining, and temperature (Option A) is often mistakenly thought to control safety when it only controls randomness.

How to eliminate wrong answers

Option A is wrong because setting temperature to 0.0 makes the model deterministic and reduces creativity, but it does not filter or block offensive content; temperature controls randomness in token selection, not safety. Option C is wrong because context caching improves latency and cost for repeated prompts by storing context, but it has no effect on content safety or filtering harmful outputs. Option D is wrong because fine-tuning on customer feedback data could inadvertently reinforce biases or offensive patterns in the data, and it does not directly address safety filtering; safety settings are a separate, configurable layer that should be adjusted first.

Full explanation →

62

MCQeasy

Refer to the exhibit. A machine learning engineer is configuring a model using this YAML. What is the purpose of the 'tuningPipeline' field?

A.It specifies a pipeline to fine-tune the base model

B.It configures the model for online prediction

C.It defines the hyperparameters for training from scratch

D.It sets the model for batch prediction

AnswerA

The tuningPipeline references a pipeline that performs supervised fine-tuning of the base model.

Why this answer

The 'tuningPipeline' field in this YAML configuration specifies a dedicated pipeline for fine-tuning the base model, which is a common practice in MLOps frameworks like Vertex AI Pipelines or Kubeflow. It allows the engineer to define a separate workflow for parameter-efficient fine-tuning (e.g., LoRA) or full fine-tuning, distinct from training from scratch or serving. This field is essential for orchestrating the fine-tuning process, including data preprocessing, training, and evaluation steps, without affecting the base model's original weights.

Exam trap

Google Cloud often tests the distinction between 'tuningPipeline' (fine-tuning an existing model) and 'trainingPipeline' (training from scratch), and the trap here is that candidates confuse fine-tuning with full training or assume the field is for inference tasks like prediction.

How to eliminate wrong answers

Option B is wrong because 'tuningPipeline' is not used for online prediction; online prediction is typically configured via a separate 'predict' or 'serving' pipeline or endpoint specification. Option C is wrong because 'tuningPipeline' is specifically for fine-tuning an existing base model, not for training from scratch, which would require a different pipeline definition (e.g., 'trainingPipeline') with full hyperparameter search. Option D is wrong because batch prediction is handled by a distinct 'batchPrediction' or 'batch' pipeline configuration, not by the tuning pipeline, which focuses on model adaptation rather than inference.

Full explanation →

63

MCQmedium

A media company uses Vertex AI to generate video captions. The generated captions sometimes contain factual errors about named entities (e.g., actor names). Which technique would most likely reduce these errors?

A.Enable response caching

B.Increase the temperature parameter

C.Use Vertex AI grounding with a knowledge base of verified entities

D.Decrease top_p to 0.3

AnswerC

Grounding supplies factual context to the model.

Why this answer

Option C is correct because Vertex AI grounding connects the model to a knowledge base of verified entities, allowing it to retrieve authoritative facts during generation. This reduces hallucinations about named entities by constraining outputs to validated data rather than relying solely on the model's parametric knowledge.

Exam trap

The trap here is that candidates confuse techniques that control output randomness (temperature, top_p) with techniques that improve factual accuracy, overlooking the fundamental need for external knowledge retrieval via grounding.

How to eliminate wrong answers

Option A is wrong because response caching stores previous outputs for reuse, which does not correct factual errors—it may even propagate them. Option B is wrong because increasing the temperature parameter increases randomness in token selection, making factual errors more likely, not less. Option D is wrong because decreasing top_p to 0.3 narrows the sampling pool to only the most probable tokens, which can reduce creativity but does not address factual accuracy about named entities—it still relies on the model's internal knowledge, which may be incorrect.

Full explanation →

64

MCQhard

During fine-tuning a model on Vertex AI, the job fails with error 'ResourceExhausted: Out of memory'. What is the most likely cause?

A.Too few training steps

B.Batch size too large

C.Dataset too small

D.Wrong machine type with insufficient memory

AnswerD

Insufficient GPU or TPU memory leads to OOM; selecting a larger machine type often resolves it.

Why this answer

Using a machine type with insufficient memory for the model size and batch size is the most common cause of OOM errors during training.

Full explanation →

65

MCQhard

Refer to the exhibit. A user with this IAM role tries to deploy a model to a Vertex AI Endpoint but fails. What is the most likely reason?

A.The user is not authorized to use Vertex AI at all

B.The model artifact is not in the same region as the endpoint

C.The user needs the roles/aiplatform.deployer role

D.The user needs the roles/aiplatform.admin role

AnswerC

Deploying a model requires the aiplatform.deployer role or equivalent permissions.

Why this answer

The user has an IAM role but lacks the specific permission `aiplatform.deployments.create` required to deploy a model to a Vertex AI Endpoint. The `roles/aiplatform.deployer` role includes this permission, while the user's existing role does not, causing the deployment to fail. Even if the user can use other Vertex AI services, deploying a model to an endpoint is a distinct action that requires this specific role.

Exam trap

Google Cloud often tests the distinction between broad roles like `roles/aiplatform.admin` and specific roles like `roles/aiplatform.deployer`, trapping candidates who assume that any Vertex AI role can perform all actions, when in fact deployment requires a dedicated permission set.

How to eliminate wrong answers

Option A is wrong because the user is able to interact with Vertex AI (they have an IAM role), but the failure is specific to the deploy action, not a blanket denial of all Vertex AI access. Option B is wrong because model artifacts can be deployed to endpoints in any region as long as the endpoint exists; Vertex AI supports cross-region deployment by copying the model artifact to the endpoint's region automatically. Option D is wrong because the `roles/aiplatform.admin` role is overly permissive and includes full administrative access, which is not required for deploying a model; the principle of least privilege dictates that the `roles/aiplatform.deployer` role is sufficient and more appropriate.

Full explanation →

66

MCQhard

A financial institution wants to deploy a gen AI model for fraud detection but must comply with strict regulations regarding explainability. What is the best strategy?

A.Use Vertex AI Explainable AI with a complex model

B.Deploy multiple models and ensemble

C.Use a large black-box model and rely on external auditing

D.Implement a smaller interpretable model with acceptable accuracy

AnswerD

Interpretable models satisfy explainability requirements while maintaining reasonable performance.

Why this answer

Option D is correct because regulatory compliance for fraud detection demands explainability, which complex black-box models cannot provide. A smaller interpretable model (e.g., logistic regression or decision tree) offers transparency into decision factors, satisfying regulations like GDPR's right to explanation while maintaining acceptable accuracy for the use case.

Exam trap

Google Cloud often tests the misconception that post-hoc explainability tools (like Vertex AI Explainable AI) are equivalent to inherent model interpretability, leading candidates to choose complex models with added explanation layers instead of simpler, transparent models.

How to eliminate wrong answers

Option A is wrong because Vertex AI Explainable AI provides post-hoc explanations for complex models, but these approximations may not meet strict regulatory standards for full transparency and can be unreliable. Option B is wrong because ensembling multiple models increases complexity and opacity, making it harder to explain individual predictions and often violating explainability requirements. Option C is wrong because relying on external auditing for a large black-box model does not guarantee inherent explainability; auditors still face the same opacity, and regulations typically require model-inherent interpretability, not just external review.

Full explanation →

67

MCQmedium

A company deployed a Gemini model on Vertex AI for real-time inference. After a week, they notice that some requests return 500 Internal Server Error, and the endpoint is occasionally unreachable. The endpoint is configured with minReplicaCount=1 and maxReplicaCount=2. What is the most likely cause?

A.Autoscaling is disabled, so the endpoint cannot handle traffic spikes.

B.The model was updated while the endpoint was serving requests.

C.The endpoint is under-provisioned: minReplicaCount=1 is too low for peak load, causing the single replica to become saturated.

D.The project has reached its Vertex AI endpoint quota.

AnswerC

C is correct because the single replica cannot handle bursts, leading to errors.

Why this answer

Option C is correct because with minReplicaCount=1 and maxReplicaCount=2, the endpoint starts with a single replica. Under peak load, that single replica can become saturated (CPU/memory exhaustion), causing 500 errors and unreachability. Autoscaling can add a second replica, but if the traffic spike is sudden or the scaling metric takes time to trigger, the single replica is overwhelmed before the second instance is provisioned.

Exam trap

The trap here is that candidates assume autoscaling instantly handles spikes, but they overlook the provisioning delay and the fact that a single replica can be overwhelmed before the second replica is ready.

How to eliminate wrong answers

Option A is wrong because autoscaling is not disabled; the minReplicaCount=1 and maxReplicaCount=2 configuration explicitly enables autoscaling between 1 and 2 replicas. Option B is wrong because updating a model while the endpoint is serving requests does not cause 500 errors or unreachability; Vertex AI supports model updates with zero-downtime via canary deployments or traffic splitting. Option D is wrong because reaching the Vertex AI endpoint quota would result in a 429 (Too Many Requests) or a quota-exceeded error, not a 500 Internal Server Error or intermittent unreachability.

Full explanation →

68

Multi-Selecteasy

Which TWO methods are most effective for improving factual accuracy in a language model's responses? (Choose two.)

Select 2 answers

A.Use prompt engineering to instruct the model to rely on provided facts.

B.Decrease the temperature to make responses more deterministic.

C.Increase top-k sampling to consider a wider range of tokens.

D.Replace the model with a smaller, more focused model.

E.Implement Retrieval-Augmented Generation (RAG) with a trusted knowledge base.

AnswersA, E

Prompt engineering can explicitly direct the model to verify claims or stick to given knowledge.

Why this answer

Options A and C are correct. A: prompt engineering with specific instructions can guide the model to be more careful. C: RAG retrieves verified information from external sources, reducing hallucination.

B is wrong because increasing top-k introduces randomness. D is wrong because decreasing temperature makes output more deterministic but not necessarily accurate. E is wrong because using a smaller model tends to reduce factual accuracy due to limited knowledge.

Full explanation →

69

MCQeasy

A graphic design company wants to generate high-quality synthetic images for product mockups. Which Google Cloud generative AI service is most suitable?

A.AutoML Vision

B.Imagen on Vertex AI

C.Codey APIs for code generation

D.Natural Language API

AnswerB

Imagen is specifically built for image generation and is accessible via Vertex AI.

Why this answer

Imagen on Vertex AI is the correct choice because it is Google Cloud's state-of-the-art text-to-image diffusion model specifically designed to generate high-quality, photorealistic synthetic images from natural language prompts. This directly meets the requirement for creating product mockups, as Imagen can produce custom visuals with fine-grained control over style and composition, and it integrates seamlessly with Vertex AI for deployment and management.

Exam trap

The trap here is that candidates may confuse AutoML Vision's ability to classify or detect objects in images with generative image creation, leading them to select Option A despite it lacking any generative capability.

How to eliminate wrong answers

Option A is wrong because AutoML Vision is a traditional machine learning service for training custom image classification, object detection, or segmentation models on labeled datasets; it does not generate synthetic images from text prompts. Option C is wrong because Codey APIs are specialized for generating code snippets, documentation, and code completions, not for creating visual content like images. Option D is wrong because Natural Language API is designed for analyzing and extracting insights from text (e.g., sentiment, entity recognition), not for generating synthetic images.

Full explanation →

70

MCQmedium

A company uses a generative AI model to generate product descriptions. They notice variations in style and length across products. How can they enforce consistent formatting?

A.Adjust top-k sampling to include more token candidates.

B.Set a system instruction specifying style and structure.

C.Randomly select few-shot examples from a pool of descriptions.

D.Use a high temperature and vary the prompt slightly.

AnswerB

System instructions guide the model's behavior across all responses.

Why this answer

System instructions set tone and format rules for the model. Option B is wrong because temperature range increases randomness. Option C is wrong because random example selection reduces consistency.

Option D is wrong because top-k sampling increases variability.

Full explanation →

71

MCQmedium

A healthcare company wants to use generative AI to summarize patient records. They are concerned about data privacy and HIPAA compliance. Which Google Cloud feature should they use to protect patient data?

A.Cloud Audit Logs

B.Confidential VMs

C.Cloud Data Loss Prevention (DLP) API

D.Customer-managed encryption keys (CMEK) with VPC Service Controls

AnswerD

CMEK ensures data is encrypted with keys controlled by the customer, and VPC-SC prevents data exfiltration.

Why this answer

D is correct because Customer-managed encryption keys (CMEK) with VPC Service Controls provide a defense-in-depth approach for HIPAA compliance. CMEK allows the healthcare company to control and manage the encryption keys used to protect patient data at rest, while VPC Service Controls prevent data exfiltration by restricting data movement outside a defined service perimeter. This combination ensures that even if an attacker gains access, they cannot decrypt the data or move it out of the controlled environment, directly addressing data privacy and HIPAA requirements.

Exam trap

The trap here is that candidates often confuse data discovery and de-identification tools (DLP) with data protection and access control mechanisms (CMEK + VPC Service Controls), leading them to pick Cloud DLP API despite it not providing encryption or perimeter controls required for HIPAA compliance.

How to eliminate wrong answers

Option A is wrong because Cloud Audit Logs only record who did what, when, and where, but do not protect or encrypt patient data; they are a monitoring tool, not a data protection mechanism. Option B is wrong because Confidential VMs encrypt data in use using AMD SEV, but they do not control data exfiltration or provide the perimeter-based access controls needed for HIPAA compliance; they focus on memory encryption, not data movement restrictions. Option C is wrong because Cloud Data Loss Prevention (DLP) API is used for inspecting, classifying, and de-identifying sensitive data, but it does not provide encryption key management or network-level controls to prevent unauthorized data access or exfiltration.

Full explanation →

72

MCQmedium

To improve factuality in generative AI, which is the best approach?

A.Set top_p to 0.1

B.Reduce output length

C.Grounded generation with citations

D.Increase model size

AnswerC

Anchors answers to external evidence.

Why this answer

Grounded generation with citations forces the model to base answers on retrieved evidence, directly improving factuality. Increasing model size may help but not as targeted, reducing length doesn't improve accuracy, and adjusting top_p is unrelated.

Full explanation →

73

Multi-Selecthard

A company is considering monetizing a generative AI-powered product. Which two business models are most common and viable?

Select 2 answers

A.Free with advertising.

B.One-time license fee for the model.

C.Pay-per-use based on tokens consumed.

D.Subscription tiered by usage.

E.Selling user data collected from interactions.

AnswersC, D

Pay-per-use matches costs to usage, common in cloud API services.

Why this answer

Option C is correct because pay-per-use based on tokens consumed aligns directly with the operational cost structure of generative AI models, where each inference incurs compute and memory costs proportional to the number of tokens processed. This model allows customers to pay only for what they use, making it viable for variable workloads and avoiding upfront commitment, while providers can scale revenue with usage. It is the most common monetization strategy for API-based generative AI services, such as OpenAI's GPT-4 or Anthropic's Claude, where pricing is explicitly tied to token counts.

Exam trap

Google Cloud often tests the misconception that one-time licensing (Option B) is viable for AI models, but candidates must recognize that generative AI models are not static software—they require ongoing compute, updates, and scaling, making subscription or pay-per-use models the only sustainable approaches.

Full explanation →

74

MCQmedium

A data science team is fine-tuning a large language model using Vertex AI to generate marketing copy. They notice that the generated text is often repetitive and lacks creativity. Which technique should they apply to improve output diversity?

A.Increase the temperature parameter to 0.9.

B.Decrease the beam search width to 1.

C.Decrease the top-k sampling threshold.

D.Add more examples of repetitive text to the training dataset.

AnswerA

Higher temperature increases randomness and diversity in generated text.

Why this answer

Increasing the temperature parameter to 0.9 raises the randomness of the probability distribution over tokens, allowing less likely tokens to be selected. This directly counteracts repetitive output by encouraging the model to explore more diverse word choices, which is a standard technique for improving creativity in text generation.

Exam trap

Google Cloud often tests the misconception that decreasing sampling thresholds (like top-k or beam width) increases diversity, when in fact they reduce the candidate pool and make output more deterministic.

How to eliminate wrong answers

Option B is wrong because decreasing beam search width to 1 reduces the number of candidate sequences considered, which actually makes output more deterministic and less diverse, worsening repetitiveness. Option C is wrong because decreasing the top-k sampling threshold restricts the model to only the k most likely tokens, which reduces diversity and can increase repetition. Option D is wrong because adding more examples of repetitive text to the training dataset would reinforce the unwanted behavior, making the model more likely to generate repetitive output, not less.

Full explanation →

75

MCQmedium

A team uses Vertex AI to host a large language model. They want to reduce latency for real-time applications. What is the best strategy?

A.Increase number of replicas

B.Switch to a smaller model

C.Use model quantization

D.Use batch prediction instead of online

AnswerC

Quantization reduces model size and speeds up inference.

Why this answer

Option C is correct because model quantization reduces the precision of the model's weights (e.g., from FP32 to INT8), which decreases memory footprint and computational requirements, directly lowering inference latency for real-time applications on Vertex AI. This is a standard optimization technique for deploying large language models with minimal accuracy loss while meeting latency SLAs.

Exam trap

Google Cloud often tests the misconception that scaling resources (replicas) directly reduces latency, when in fact latency optimization requires algorithmic changes like quantization or pruning, not just horizontal scaling.

How to eliminate wrong answers

Option A is wrong because increasing the number of replicas improves throughput and availability but does not reduce per-request latency; it may even add overhead from load balancing. Option B is wrong because switching to a smaller model reduces latency but sacrifices model capability and output quality, which is not a 'best' strategy when the team specifically needs a large language model. Option D is wrong because batch prediction is designed for asynchronous, high-throughput scenarios and introduces higher latency per request, making it unsuitable for real-time applications.

Full explanation →

Page 1 of 7

All pages

Practice Generative AI Leader by domain

Target a specific domain to shore up weak areas.

Fundamentals of Generative AI Business Strategies for Generative AI Solutions Google Cloud's Generative AI Offerings Techniques to Improve Generative AI Model Output

See all domains with question counts →