Knowledge + Practice

Google Cloud Generative AI Leader: Questions 76–150

500 questions total · 7 pages · All types, answers revealed

Page 2 of 7

76

Multi-Select · medium

A team notices the RAG pipeline sometimes retrieves irrelevant documents. Which THREE improvements should they consider? (Choose three.)

Select 3 answers

A. Add a reranking step

B. Use exact keyword matching instead of embedding similarity

C. Increase chunk size of documents

D. Reduce the number of retrieved documents

E. Use a higher-quality embedding model

Answers: A, D, E

Reranks retrieved documents by relevance.

Why this answer

Using a higher quality embedding model improves semantic understanding, adding a reranking step refines results, and reducing the number of retrieved documents reduces noise. Increasing chunk size can dilute relevance, and using exact keyword matching loses semantic context.
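
The retrieve-then-rerank pattern behind options A and D can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a Vertex AI API: the toy two-dimensional embeddings, the `cosine` helper, and `rerank_score` (a word-overlap stand-in for a real reranking model) are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank_score(query, text):
    """Hypothetical reranker: fraction of query words that appear in the text."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & set(text.lower().split())) / len(query_words)


def retrieve_and_rerank(query, query_vec, docs, first_pass_k=3, final_k=1):
    """docs is a list of (text, embedding) pairs."""
    # First pass: cheap embedding similarity over the whole collection.
    candidates = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    candidates = candidates[:first_pass_k]
    # Second pass: rerank the short list and keep fewer documents (less noise).
    reranked = sorted(candidates, key=lambda d: rerank_score(query, d[0]), reverse=True)
    return [text for text, _ in reranked[:final_k]]


docs = [
    ("refund policy for damaged items", [0.9, 0.1]),
    ("shipping times for international orders", [0.8, 0.2]),
    ("company holiday schedule", [0.1, 0.9]),
]
result = retrieve_and_rerank("refund policy", [1.0, 0.0], docs, first_pass_k=2, final_k=1)
```

Here the first pass keeps two candidates by embedding similarity, and the reranker then keeps only the one that best matches the query, so fewer and more relevant documents reach the model.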

77

MCQ · hard

A large enterprise is migrating their on-premises ML workloads to Vertex AI. They have a custom PyTorch model for text classification that they want to serve with minimal code changes. Which Vertex AI capability should they use for model serving?

A. Vertex AI Endpoints with a pre-built PyTorch runtime

B. Vertex AI Prediction with a custom container

C. Vertex AI Model Garden

D. Vertex AI Vector Search for approximate nearest neighbor

Answer: B

Custom containers support any framework and allow minimal code changes.

Why this answer

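Option B is correct because Vertex AI Prediction with a custom container lets them package their PyTorch model together with its dependencies and serving code, so it can be served without rewriting code. Option A is wrong because a pre-built PyTorch runtime expects the model to be packaged in the format that runtime supports, which can require changes to custom inference code. Option C is wrong because Vertex AI Model Garden is a catalog of pre-built models, not a way to serve the team's own model. Option D is wrong because Vertex AI Vector Search is for similarity search over embeddings, not classification.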

78

MCQ · easy

To ensure that a generative AI model uses the most current information from the web for answering user queries, which Vertex AI feature should be enabled?

A. Grounding with Google Search

B. Safety filters

C. Context caching

D. Model tuning

Answer: A

Correct: This feature retrieves current web information to ground responses.

Why this answer

Grounding with Google Search is the correct feature because it enables the model to retrieve and reference real-time information from the web, ensuring responses are based on the most current data available. This is achieved by integrating Google Search results directly into the model's generation process, allowing it to cite live sources and reduce hallucinations from outdated training data.

Exam trap

Google Cloud often tests the distinction between features that improve output quality through external data retrieval (Grounding) versus those that modify the model's internal behavior (tuning, caching, filtering), leading candidates to confuse safety or optimization features with live data access.

How to eliminate wrong answers

Option B is wrong because safety filters are designed to block harmful or inappropriate content, not to fetch current web information. Option C is wrong because context caching stores frequently accessed context to reduce latency and cost, but it does not provide live web data. Option D is wrong because model tuning adjusts the model's parameters on a specific dataset to improve performance on a task, but it does not enable real-time web retrieval.

79

MCQ · easy

A developer is using Vertex AI Studio to test prompts for a text generation model. They want the model to follow a specific output format (JSON). Which prompt engineering approach is most effective?

A. Set stop sequences to '}'.

B. Include a few-shot example of the exact JSON format in the prompt.

C. Set the system instruction to 'Always output JSON.'

D. Set temperature to 0 to make output deterministic.

Answer: B

Providing an example gives the model a concrete template to follow.

Why this answer

Option B is correct because including a few-shot example of the exact JSON format in the prompt provides the model with a concrete pattern to follow, which is the most reliable method for enforcing structured output in generative models. Few-shot prompting leverages in-context learning, where the model uses the provided example to infer the desired schema and formatting rules, reducing ambiguity and improving adherence to the specified JSON structure.

Exam trap

Google Cloud often tests the misconception that system instructions or hyperparameter tuning alone can enforce output format, when in practice, few-shot examples are the most direct and reliable method for guiding model behavior in structured generation tasks.

How to eliminate wrong answers

Option A is wrong because setting stop sequences to '}' would prematurely terminate generation at the first closing brace, which may cut off nested JSON objects or arrays, and does not guarantee the model outputs valid JSON from the start. Option C is wrong because a system instruction like 'Always output JSON' is a high-level directive that models often fail to follow precisely without explicit formatting examples, as they may still produce markdown, extra text, or malformed JSON. Option D is wrong because setting temperature to 0 makes output deterministic but does not enforce a specific output format; the model could still generate non-JSON text or deviate from the required schema, as temperature controls randomness, not structure.
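
A few-shot prompt for JSON output might look like the sketch below. It is only an illustration: the review-extraction task, the field names (`sentiment`, `pros`, `cons`), and the helper functions are assumptions made for the example, and no model call is shown. Because an example guides the model but does not guarantee valid output, the sketch also validates the reply before using it.

```python
import json

EXPECTED_KEYS = {"sentiment", "pros", "cons"}  # hypothetical schema for this example


def build_prompt(review):
    """Build a one-shot prompt that shows the exact JSON shape expected."""
    example_in = "Great battery life, but the screen scratches easily."
    example_out = {"sentiment": "mixed", "pros": ["battery life"], "cons": ["screen scratches"]}
    return (
        "Extract the fields from the review and reply with JSON only.\n\n"
        f"Review: {example_in}\n"
        f"JSON: {json.dumps(example_out)}\n\n"
        f"Review: {review}\n"
        "JSON:"
    )


def parse_reply(reply):
    """Return the parsed object, or None if the reply is not JSON with the expected keys."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or set(data) != EXPECTED_KEYS:
        return None
    return data


prompt = build_prompt("Fast delivery and sturdy packaging.")
```

A reply that fails `parse_reply` can be retried or rejected, which covers the cases where the model still adds extra text around the JSON.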

80

MCQ · hard

A team is using Vertex AI Pipelines to deploy a generative AI model for real-time inference. The model sometimes generates harmful content. They want to implement a safety filter that checks the output before returning it to the user, but they need to minimize latency. Which approach best balances safety and performance?

A. Use a secondary lightweight classifier to filter outputs in real-time.

B. Retrain the model on every flagged harmful output.

C. Manually review all outputs before delivery.

D. Disable safety checks to improve latency.

Answer: A

A small classifier adds minimal latency while providing effective filtering.

Why this answer

Option A is correct because deploying a secondary lightweight classifier (e.g., a distilled BERT or a small logistic regression model) as a post-processing filter allows real-time inference with minimal latency overhead. This approach decouples safety from the primary generative model, enabling fast rejection of harmful outputs without retraining or blocking the main inference pipeline.

Exam trap

Google Cloud often tests the misconception that safety must be integrated into the generative model itself (e.g., via retraining or fine-tuning), when in practice a separate, lightweight post-processing filter is the standard for low-latency production systems.

How to eliminate wrong answers

Option B is wrong because retraining the model on every flagged harmful output is computationally expensive, introduces significant latency, and can lead to catastrophic forgetting or overfitting to specific examples, making it impractical for real-time inference. Option C is wrong because manual review of all outputs introduces unacceptable latency and does not scale, violating the requirement to minimize latency. Option D is wrong because disabling safety checks entirely eliminates the safety requirement, which is explicitly needed, and would expose users to harmful content, failing the core objective.
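
The post-processing filter described for option A can be sketched as a thin wrapper around the generated text. Everything here is hypothetical: `harm_score` is a keyword-ratio stand-in for a real lightweight classifier, and the blocked terms, threshold, and fallback message are placeholder values chosen for illustration.

```python
def harm_score(text, blocked_terms=("attack", "poison")):
    """Hypothetical stand-in for a small classifier: share of words that are blocked terms."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    if not words:
        return 0.0
    return sum(w in blocked_terms for w in words) / len(words)


def filter_output(generated, threshold=0.2, fallback="Sorry, I can't help with that."):
    """Return the generated text unless the classifier score reaches the threshold."""
    if harm_score(generated) >= threshold:
        return fallback
    return generated


safe = filter_output("Here is a recipe for vegetable soup.")
blocked = filter_output("How to poison someone")
```

The generative model is untouched; only its output passes through the cheap check, which is why this design adds little latency compared with retraining or manual review.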

81

MCQ · hard

A financial services firm uses a fine-tuned Gemini model in Vertex AI for regulatory compliance checks. They notice that token usage is high, increasing costs. They want to reduce costs without sacrificing accuracy. Which approach should they take?

A. Switch to a smaller base model like PaLM 2 Bison

B. Enable context caching to reuse previous responses

C. Set max output tokens to a lower value and use more precise prompts

D. Reduce temperature to 0.0

Answer: C

Directly reduces output tokens; precise prompts maintain accuracy.

Why this answer

Option C is correct because reducing max output tokens directly lowers the number of tokens generated per request, which is the primary cost driver in pay-per-token models like Gemini. Using more precise prompts further reduces token waste by guiding the model to produce concise, relevant outputs without sacrificing accuracy, as compliance checks often require specific, structured responses rather than verbose explanations.

Exam trap

The trap here is that candidates often confuse cost-reduction strategies that affect model behavior (like temperature or model size) with those that directly reduce token count, leading them to pick options that change output quality rather than token usage.

How to eliminate wrong answers

Option A is wrong because switching to a smaller base model like PaLM 2 Bison may reduce per-token cost but can degrade accuracy on complex regulatory compliance tasks, as smaller models have less capacity for nuanced understanding and may miss critical compliance details. Option B is wrong because context caching stores repeated input context (for example, a long shared prompt prefix) so that it is not reprocessed on every request; it does not reuse previous responses, and it offers little benefit when each compliance check involves unique input data (e.g., different contracts or transactions). Option D is wrong because setting temperature to 0.0 makes the output more deterministic but does not reduce token usage; temperature controls sampling randomness, not output length.

82

MCQ · medium

An e-commerce company is using Vertex AI PaLM 2 for Text (via Model Garden) to generate product descriptions. They have an existing pipeline that calls the model with a prompt including product attributes. Recently, they migrated to the Gemini API. The team notices that the Gemini model sometimes outputs descriptions that are factually inconsistent with the input (e.g., wrong color or size). This was less frequent with PaLM 2. They have not changed the prompts. What is the most likely cause and solution?

A. Revert to PaLM 2 since it was more reliable for this task.

B. Add negative prompts to discourage incorrect facts.

C. Adjust the prompt to be more explicit about adhering to the input data, and reduce the temperature.

D. Increase the model's temperature to make outputs more deterministic.

Answer: C

Different models may require slight prompt adjustments; lower temperature and clearer instructions improve factual precision.

Why this answer

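Option C is correct because Gemini may follow instructions differently from PaLM 2, so a prompt written for one model may need adjusting for the other; making the prompt explicit about adhering to the input data and lowering the temperature both improve factual consistency. Option A is wrong because switching back to PaLM 2 does not solve the cause. Option B is wrong because negative prompts might not address factual alignment with the input. Option D is wrong because increasing the temperature increases randomness, which worsens consistency rather than making outputs more deterministic.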

83

MCQ · easy

A developer wants to generate Python code using Google Cloud's generative AI. Which model should they invoke?

A. Chirp

B. Codey

C. Imagen

D. Meena

Answer: B

Codey is designed for code generation.

Why this answer

Option B is correct because Codey is Google Cloud's specialized code generation model. Option A is wrong because Chirp is a speech (audio) model. Option C is wrong because Imagen generates images. Option D is wrong because Meena is a chatbot, not code-focused.

84

MCQ · medium

A developer uses the Gemini API to summarize long articles. The summaries often miss key points from the end of the article. Which technique specifically addresses this length-based loss of information?

A. Increase the max output tokens to 2048

B. Break the article into sections and ask the model to summarize each section, then combine

C. Truncate the article to the first 2000 tokens

D. Use a different model with a larger context window

Answer: B

This structured approach ensures each part is summarized, mitigating attention drop-off.

Why this answer

Option B is correct because summarizing each section separately and then combining the partial summaries (a map-reduce style approach) ensures that every part of the document, including the end, is processed. Option A is wrong because increasing max output tokens doesn't change how the model attends to the input. Option C is wrong because truncation worsens the problem by discarding the end of the article. Option D is wrong because a different model with a larger context window could help, but the question asks for a technique that works with the current model.
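
The section-by-section technique can be sketched as follows. This is a minimal illustration, assuming a hypothetical `summarize` callable that wraps whatever model call the developer uses; splitting on a fixed word count is also an assumption, and real pipelines often split on headings or paragraphs instead.

```python
def split_into_sections(text, max_words=200):
    """Split text into consecutive sections of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long(text, summarize, max_words=200):
    """Summarize each section, then summarize the combined partial summaries.

    summarize is a hypothetical callable (str -> str) wrapping a model call.
    """
    sections = split_into_sections(text, max_words)
    partials = [summarize(section) for section in sections]
    if len(partials) == 1:
        return partials[0]
    return summarize("\n".join(partials))


# Toy stand-in for a model: "summarize" by keeping the first word of the input.
article = " ".join(f"w{i}" for i in range(450))
sections = split_into_sections(article, max_words=200)
combined = summarize_long(article, lambda s: s.split()[0], max_words=200)
```

Because every section gets its own call, content near the end of the article is summarized with the same attention as content near the start.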

85

MCQ · hard

The exhibit shows the response from a model deployed on Vertex AI that includes safety attributes. The application must reject any prediction where the toxicity score exceeds 0.8. Based on the response, what action should the application take?

A. Reject the prediction because the toxicity score exceeds 0.8.

B. Retry the request with a lower temperature.

C. Display the prediction because the insult score is below 0.8.

D. Log the prediction but still display it.

Answer: A

Toxicity 0.85 > 0.8, so reject.

Why this answer

The toxicity score is 0.85 which exceeds 0.8, so the application should reject the prediction and not display it.

86

MCQ · medium

A company is using a generative AI model for internal report generation. They notice costs are high because each request processes large amounts of text. Which business strategy would most effectively reduce costs while maintaining quality?

A. Fine-tune a smaller model on a specialized dataset.

B. Use a more powerful model to reduce retries.

C. Implement caching for repeated requests.

D. Increase the batch size for online predictions.

Answer: A

A smaller fine-tuned model can provide sufficient quality at lower cost for specific tasks.

Why this answer

Fine-tuning a smaller model on a specialized dataset reduces computational cost per inference because smaller models have fewer parameters and require less memory and processing power. By tailoring the model to the company's specific domain (e.g., internal reports), it can maintain output quality comparable to a larger general-purpose model, directly addressing the cost-per-request issue without sacrificing accuracy.

Exam trap

Google Cloud often tests the misconception that 'bigger is always better' or that caching universally reduces costs, but the trap here is that candidates overlook the unique nature of generative AI outputs and the cost benefits of model specialization over raw scale or caching.

How to eliminate wrong answers

Option B is wrong because using a more powerful model typically increases per-request cost and latency, and while it may reduce retries, the net cost often rises due to higher compute requirements. Option C is wrong because caching only helps if identical requests are repeated frequently; for generative AI report generation, each request is often unique (different text inputs), making caching ineffective for reducing per-request processing costs. Option D is wrong because increasing batch size for online predictions can reduce per-request cost only if requests are batched together, but online (real-time) predictions usually require low latency and process one request at a time, so larger batch sizes are not applicable and may increase latency.

87

MCQ · hard

A company with limited AI expertise wants to adopt gen AI. They need a solution that integrates with existing data and applications. Which Google Cloud offering is best?

A. Apigee

B. Colab Enterprise

C. BigQuery ML

D. Vertex AI Agent Builder

Answer: D

Provides a low-code platform for building and deploying gen AI agents that integrate with enterprise data and applications.

Why this answer

Option D is correct because Vertex AI Agent Builder is designed for building conversational AI agents with easy integration to enterprise data sources. Option A is wrong because Apigee is an API management platform. Option B is wrong because Colab Enterprise is a notebook environment, not a full solution. Option C is wrong because BigQuery ML is for SQL-based ML, not gen AI agents.

88

MCQ · medium

A global nonprofit organization is deploying a generative AI chatbot to provide educational content in multiple languages to underserved communities. They operate in regions with limited internet connectivity. The chatbot must work offline or with minimal data usage. The team has a moderate budget and limited technical staff. Which deployment strategy should they use?

A. Fine-tune an open-source model and host it on a cloud VM with auto-scaling

B. Deploy a distilled version of the model on edge devices using TensorFlow Lite

C. Host a large foundation model on Google Cloud and use a mobile app to send API requests

D. Deploy a distilled version of a smaller model on Google Cloud VM instances

Answer: B

Enables offline inference with low resource usage.

Why this answer

Option B is correct because deploying a distilled version of the model on edge devices using TensorFlow Lite directly addresses the constraints of offline operation, minimal data usage, and limited technical staff. Distillation reduces model size and computational requirements, enabling inference on local hardware without cloud dependency, which is critical for underserved regions with intermittent connectivity.

Exam trap

The trap here is that candidates confuse 'distillation on edge' with 'distillation on cloud VMs' (Option D), overlooking that edge deployment is the only way to guarantee offline functionality, while cloud VMs still require network access for inference.

How to eliminate wrong answers

Option A is wrong because hosting a fine-tuned model on a cloud VM with auto-scaling requires constant internet connectivity for the chatbot to function, which fails the offline requirement. Option C is wrong because using a large foundation model via API requests from a mobile app incurs high data usage and relies on continuous cloud access, contradicting the need for minimal data usage and offline capability. Option D is wrong because deploying a distilled model on Google Cloud VM instances still requires internet connectivity for inference, missing the offline requirement, and does not leverage edge deployment for local processing.

89

MCQ · medium

Refer to the exhibit. A developer executed the command to list endpoints. They notice that two models are deployed to the same endpoint. What is the most likely reason for this configuration?

A. It is a canary deployment with traffic splitting

B. The endpoint is misconfigured and will cause conflicts

C. The models are from different frameworks

D. It is a batch prediction endpoint

Answer: A

Multiple models on the same endpoint support gradual rollout by splitting traffic.

Why this answer

A is correct because deploying two models to the same endpoint with traffic splitting is a standard canary deployment strategy. In this configuration, a small percentage of inference requests are routed to the new model while the majority go to the stable model, allowing validation of the new model's performance before full rollout. Vertex AI endpoints support this directly: each model deployed to an endpoint is assigned a share of the endpoint's traffic split (e.g., 90% to the stable model and 10% to the canary).

Exam trap

Google Cloud often tests the misconception that deploying two models to the same endpoint is always an error, when in fact it is a deliberate pattern for canary testing or A/B testing with traffic splitting.

How to eliminate wrong answers

Option B is wrong because deploying two models to the same endpoint with traffic splitting is a deliberate, supported configuration, not a misconfiguration; conflicts are avoided by routing traffic based on defined weights. Option C is wrong because, although models from different frameworks can share an endpoint (each runs in its own serving container), a difference in frameworks is not a reason to deploy them together. Option D is wrong because batch prediction in Vertex AI runs as a job against a model resource and does not require deploying the model to an endpoint, so an endpoint with two deployed models is not a batch prediction setup.
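
The effect of a traffic split can be simulated with weighted random routing. The sketch below is not the Vertex AI SDK; the model IDs and the `pick_model` helper are hypothetical, and it only shows how a 90/10 split sends roughly one request in ten to the canary.

```python
import random


def pick_model(traffic_split, rng=random):
    """Choose a deployed model ID according to percentage weights that sum to 100."""
    if sum(traffic_split.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")
    ids = list(traffic_split)
    weights = [traffic_split[model_id] for model_id in ids]
    return rng.choices(ids, weights=weights, k=1)[0]


# Hypothetical deployed-model IDs for illustration.
split = {"stable-model": 90, "canary-model": 10}
rng = random.Random(0)  # seeded so the simulation is repeatable
counts = {"stable-model": 0, "canary-model": 0}
for _ in range(10_000):
    counts[pick_model(split, rng)] += 1
```

Promoting the canary then amounts to shifting the percentages step by step until the new model carries all of the traffic.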

90

MCQ · easy

Which of the following is a key consideration when selecting a GenAI model for a cost-sensitive application?

A. Model size in parameters

B. Latency and throughput requirements

C. Number of training epochs

D. The model's training data source

Answer: B

Latency and throughput directly determine the infrastructure needed and thus the cost per inference.

Why this answer

For cost-sensitive applications, latency and throughput requirements directly impact infrastructure costs, as lower latency often requires more expensive compute resources (e.g., higher GPU memory, faster inference hardware) and higher throughput may necessitate scaling out instances. Model size in parameters is a secondary factor that influences latency and throughput, but the primary cost driver is the operational performance needed to meet service-level agreements (SLAs).

Exam trap

Google Cloud often tests the misconception that model size (parameters) is the primary cost driver, but the exam emphasizes that operational metrics like latency and throughput are the direct determinants of infrastructure cost in production.

How to eliminate wrong answers

Option A is wrong because model size in parameters affects memory and compute requirements but is not the key consideration for cost sensitivity; a smaller model can still be costly if latency or throughput demands are high. Option C is wrong because number of training epochs is a training-time hyperparameter that does not directly influence inference cost or operational cost in a deployed application. Option D is wrong because the model's training data source impacts bias, accuracy, and compliance, but not the direct operational cost of running inference at scale.

91

Multi-Select · hard

An organization is building a generative AI application on Vertex AI. Which THREE actions should they take to ensure responsible AI practices?

Select 3 answers

A. Disable content filtering

B. Implement human review for sensitive outputs

C. Conduct fairness evaluation

D. Create a safety policy and enforce via content filtering

E. Use only Google's foundation models

Answers: B, C, D

Human review ensures appropriate oversight.

Why this answer

Implementing human review, conducting fairness evaluation, and creating a safety policy with content filtering are key responsible AI practices. Using only Google's models does not guarantee responsibility, and disabling filtering is counterproductive.

92

MCQ · easy

A startup wants to embed generative AI features into their mobile app but has limited ML expertise. Which Google Cloud service is best suited for rapid integration with no ML training?

A. Vertex AI Model Garden

B. Vertex AI Agent Builder

C. Gemini API

D. Cloud Run with a custom container

Answer: C

Gemini API is ready-to-use with minimal setup.

Why this answer

Option C is correct because the Gemini API provides a simple REST API endpoint without requiring ML expertise. Option A is wrong because Vertex AI Model Garden still requires some ML knowledge. Option B is wrong because Vertex AI Agent Builder is more complex.

Option D is wrong because Cloud Run is a compute service, not AI-specific.

93

MCQ · easy

A company uses a generative AI model to answer customer queries. The model sometimes returns outdated information. Which technique should they apply to ensure responses rely on current data?

A. Fine-tune the model on historical data.

B. Extend the context window to include more tokens.

C. Increase the model's temperature to encourage novelty.

D. Use grounding with a refreshed knowledge base.

Answer: D

Grounding connects the model to a current, authoritative data source, ensuring recency.

Why this answer

Grounding the model with up-to-date source documents ensures responses are based on current information, reducing outdated outputs. Option A is wrong because fine-tuning on historical data may perpetuate outdated patterns. Option B is wrong because a larger context window does not by itself supply current information. Option C is wrong because temperature adjustment does not affect factual recency.

94

MCQ · easy

A developer is using the Gemini API to generate creative product taglines. The taglines are often bland and uncreative. The developer wants more variety and novelty in the outputs. Which parameter adjustment would most effectively increase the diversity of the generated taglines?

A. Decrease top_p from 1.0 to 0.5.

B. Set frequency_penalty to 2.0.

C. Increase temperature from 0.2 to 0.9.

D. Decrease temperature from 0.7 to 0.2.

Answer: C

Higher temperature increases randomness, leading to more diverse and creative outputs.

Why this answer

Option C is correct because higher temperature increases the randomness and creativity of the output. Option A is wrong because lower top_p reduces diversity. Option B is wrong because a frequency penalty only discourages repeating tokens that have already appeared, and an extreme value can degrade output quality rather than add creativity. Option D is wrong because lower temperature makes output more deterministic and less creative.
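
Why temperature controls diversity can be seen from how it rescales the model's scores before sampling. The sketch below applies a temperature-scaled softmax to three made-up logits; the values are purely illustrative and are not taken from any real model.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
low = softmax_with_temperature(logits, 0.2)   # sharply peaked on the top token
high = softmax_with_temperature(logits, 0.9)  # flatter, so other tokens get sampled
```

At temperature 0.2 almost all probability mass sits on the top-scoring token, while at 0.9 the distribution is much flatter, which is what produces more varied taglines.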

95

MCQ · hard

A research organization is building a generative AI model to assist in drug discovery by generating molecular structures. They have a large dataset of proprietary chemical compounds and want to train a model from scratch. They have extensive ML expertise but limited GPU resources. The organization must comply with strict data privacy regulations that prohibit data from leaving their on-premises environment. Which strategy enables them to train the model efficiently while meeting compliance?

A. Train the model entirely on-premises using existing servers

B. Use Google Cloud Confidential VMs with attached GPUs for secure training

C. Partner with a cloud provider to train the model on their infrastructure

D. Transfer the data to Google Cloud and use standard GPU instances

Answer: B

Confidential VMs encrypt data in use, meeting privacy needs with scalable GPUs.

Why this answer

Google Cloud Confidential VMs with attached GPUs provide hardware-based memory encryption (using AMD SEV or Intel TDX) that protects data in use, enabling secure training on sensitive proprietary chemical data in the cloud. This allows the organization to leverage scalable GPU resources for efficient model training while maintaining compliance with strict data privacy regulations that prohibit data from leaving their on-premises environment.

Exam trap

Google Cloud often tests the misconception that any cloud GPU instance is sufficient for compliance, but the trap here is that standard GPU instances lack in-use memory encryption, which is required when data privacy regulations prohibit data from leaving the on-premises environment.

How to eliminate wrong answers

Option A is wrong because training entirely on-premises using existing servers would be inefficient due to limited GPU resources, leading to excessively long training times for a generative AI model from scratch. Option C is wrong because partnering with a cloud provider without specifying a secure, encrypted compute environment (like Confidential VMs) would expose the proprietary data to potential privacy risks and violate compliance requirements. Option D is wrong because transferring data to Google Cloud and using standard GPU instances does not provide the necessary in-use data encryption, leaving the data vulnerable during processing and failing to meet strict data privacy regulations.

96

MCQ · hard

A financial services firm is using Vertex AI to generate investment reports. They need to ensure that the model outputs are explainable and comply with regulatory requirements. Which Vertex AI feature should they use?

A. Vertex AI Model Registry

B. Vertex Explainable AI

C. Vertex AI Safety Settings

D. Vertex AI AutoML

Answer: B

Explainable AI provides attributions for model predictions, aiding regulatory compliance.

Why this answer

Vertex Explainable AI provides feature importance and explanations for model predictions. Option A is wrong because Model Registry is for version management. Option C is wrong because Safety Settings block harmful content but don't provide explanations. Option D is wrong because AutoML is for model training, not explainability.

97

MCQ · hard

A company deploys a Gemini model on Vertex AI for a healthcare application. They need to ensure that the model does not generate medical advice and that responses are grounded in trusted medical sources. Which combination of safety measures should they implement?

A. Enable safety filters and use Vertex AI Grounding with a labeled medical dataset

B. Use Vertex AI Grounding with a public dataset and disable safety filters

C. Enable safety filters only, without grounding

D. Fine-tune the model on a curated medical dataset and disable safety filters for faster responses

Answer: A

This combination ensures safety and factual grounding.

Why this answer

Option A is correct because safety filters and grounded generation with a labeled dataset ensure compliance. Option B is wrong because using a public dataset without safety filters is risky. Option C is wrong because relying on safety filters without grounding increases hallucination risk. Option D is wrong because fine-tuning alone does not enforce safety filters, and this option disables them.

98

Multi-Select · hard

A global e-commerce company uses generative AI to generate product descriptions in multiple languages. They want to ensure consistency across markets while respecting cultural nuances. Which THREE strategies should they adopt?

Select 3 answers

A. Standardize all descriptions to a neutral tone to avoid cultural issues.

B. Develop region-specific prompt templates that incorporate local cultural references and legal requirements.

C. Engage local marketing teams to review and approve AI-generated descriptions before publication.

D. Use a single global model with a translation layer to convert English descriptions.

E. Use A/B testing to measure engagement metrics per region and iterate on prompts.

Answers: B, C, E

Tailored prompts guide the model to produce culturally appropriate content.

Why this answer

Options B, C, and E are correct. Region-specific prompts, local human review, and A/B testing ensure consistency and cultural sensitivity. Option A is wrong because standardizing to a neutral tone reduces localization. Option D is wrong because direct translation may miss nuances.

99

MCQ · easy

A company wants to build a chatbot using Vertex AI that can answer customer questions based on their internal knowledge base. Which Google Cloud service should they use to store and retrieve the knowledge base efficiently?

A. Cloud Storage

B. Vertex AI Vector Search

C. BigQuery

D. Vertex AI Matching Engine

Answer: B

Vertex AI Vector Search provides scalable vector similarity search for knowledge retrieval.

Why this answer

Vertex AI Vector Search is the correct choice because it is purpose-built for semantic similarity search over embeddings, enabling the chatbot to retrieve relevant chunks from the knowledge base based on meaning rather than exact keyword matches. It integrates natively with Vertex AI and supports high-dimensional vector indexing, making it efficient for large-scale retrieval-augmented generation (RAG) workflows.

Exam trap

The trap here is that Google Cloud often tests the rebranding of Vertex AI Matching Engine to Vertex AI Vector Search, leading candidates to select the outdated service name (Matching Engine) instead of the current correct name (Vector Search).

How to eliminate wrong answers

Option A is wrong because Cloud Storage is an object storage service for unstructured data, not a vector database; it lacks built-in similarity search capabilities and would require additional services to perform semantic retrieval. Option C is wrong because BigQuery is a serverless data warehouse designed for SQL-based analytics on structured data, not for storing and querying dense vector embeddings with approximate nearest neighbor (ANN) search. Option D is wrong because Vertex AI Matching Engine is the previous name for what is now Vertex AI Vector Search; the service was rebranded, so the current correct name is Vector Search, making Matching Engine a deprecated or legacy term in this context.

Full explanation →

100

Multi-Selecthard

A machine learning engineer is tuning a large language model on Vertex AI for question answering. They want to evaluate the model's performance before deployment. Which THREE metrics should they consider?

Select 3 answers

A.Cost per training epoch

B.F1 score

C.Exact match (EM)

D.Training time per epoch

E.ROUGE-L score

AnswersB, C, E

F1 balances precision and recall.

Why this answer

Options B, C, and E are correct for evaluation. Option D is wrong because training time is not a performance metric. Option A is wrong because cost per epoch is a cost metric, not a performance metric.
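As a rough illustration of how these question-answering metrics are computed, the sketch below implements exact match, token-level F1, and a simplified ROUGE-L (an F-measure over the longest common subsequence). The whitespace tokenization and lowercase normalization are simplifying assumptions; evaluation toolkits usually also strip punctuation and articles.

```python
from collections import Counter

def normalize(text):
    """Lowercase and split on whitespace (a simplified normalization)."""
    return text.lower().split()

def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))

def token_f1(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l(prediction, reference):
    """Simplified ROUGE-L: F-measure over the LCS, equal weight on P and R."""
    pred, ref = normalize(prediction), normalize(reference)
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, `token_f1("the city of Paris", "Paris")` gives partial credit where `exact_match` gives none, which is why the two are usually reported together.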

Full explanation →

101

Multi-Selecteasy

Which TWO are advantages of using Retrieval-Augmented Generation (RAG) over fine-tuning?

Select 2 answers

A.No need to retrain the base model

B.Requires less data preparation

C.Lower inference latency

D.More secure because model weights are not modified

E.Better suited for rapidly changing knowledge bases

AnswersA, E

Correct: RAG works with the pre-trained model and a retrieval system.

Why this answer

RAG doesn't require retraining and is easy to update with new information. Inference latency is typically higher for RAG due to retrieval, and data preparation is still needed for indexing.

Full explanation →

102

MCQeasy

A developer is using the Gemini API to generate creative marketing copy. They want the output to be more diverse and unexpected. Which parameter should they increase?

A.Temperature.

B.Presence penalty.

C.Top-p.

D.Frequency penalty.

AnswerA

Higher temperature increases randomness and diversity.

Why this answer

Option A is correct because increasing temperature increases randomness, leading to more diverse and unexpected outputs. Option C (top-p) also affects diversity, but temperature has a more direct effect. Options B and D (presence and frequency penalties) discourage repetition and encourage novelty, but temperature is the primary parameter for randomness.
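The effect of temperature can be sketched as dividing the logits by the temperature before applying softmax. The logits below are made-up values, and this is an illustration of the general technique rather than Gemini's internal implementation.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling by 1 / temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
low = softmax_with_temperature(logits, 0.5)   # sharper distribution
high = softmax_with_temperature(logits, 2.0)  # flatter distribution
print(low, high)
```

At the higher temperature the top token loses probability mass to the less likely tokens, so sampled output becomes more varied.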

Full explanation →

103

MCQhard

An organization uses a fine-tuned model for medical diagnosis and must comply with HIPAA. Which measure is essential when deploying the model on Vertex AI?

A.Store all patient data in Cloud Storage with object versioning.

B.Enable encryption at rest for all resources.

C.Use a publicly accessible endpoint for faster response times.

D.Use a private Google Cloud Access and disable external internet access for the endpoint.

AnswerD

This ensures the endpoint is not publicly accessible, a key requirement for HIPAA.

Why this answer

Option D is correct because HIPAA requires that patient data be protected from unauthorized access during transmission and deployment. Using a private Google Cloud Access endpoint with external internet access disabled ensures that the model endpoint is only reachable within the organization's VPC network, preventing data exposure over the public internet and meeting HIPAA's security rule for safeguarding electronic protected health information (ePHI).

Exam trap

Google Cloud often tests the misconception that encryption at rest or basic data storage features are sufficient for HIPAA compliance, when in fact network-level access controls (like private endpoints) are the critical measure for protecting ePHI during model inference.

How to eliminate wrong answers

Option A is wrong because storing patient data in Cloud Storage with object versioning provides data retention and recovery capabilities but does not address the core requirement of securing the model endpoint or controlling network access, which is essential for HIPAA compliance. Option B is wrong because enabling encryption at rest for all resources is a baseline security practice and is already enabled by default in Google Cloud; it does not specifically address the need to restrict network access to the deployed model endpoint, which is a key HIPAA requirement. Option C is wrong because using a publicly accessible endpoint for faster response times directly violates HIPAA's requirement to protect ePHI from unauthorized access, as a public endpoint exposes the model to the internet and increases the risk of data breaches.

Full explanation →

104

MCQhard

Refer to the exhibit. A Vertex AI endpoint configured with the above deployment is returning HTTP 429 (Too Many Requests) errors during peak traffic. The current CPU utilization reaches 80% consistently. What should the team adjust to resolve this?

A.Increase maxReplicaCount to 10

B.Increase scaleTarget to 0.9

C.Change machineType to n1-highmem-2

D.Increase minReplicaCount to 2

AnswerA

Correct: Higher max allows more replicas to handle traffic spikes.

Why this answer

429 errors indicate insufficient capacity. Increasing maxReplicaCount allows more replicas to be added when load increases. Changing machine type or scale target would not directly address the capacity shortage.

Full explanation →

105

MCQeasy

A developer wants to integrate Gemini multimodal capabilities (text + image) into a mobile app using Python. Which Google Cloud client library should they use?

A.Dialogflow CX

B.Vertex AI client library (google-cloud-aiplatform)

C.Cloud Vision API

D.Natural Language API

AnswerB

The Vertex AI client library supports Gemini API for multimodal generation.

Why this answer

The Google Generative AI client library (or the Vertex AI client library for the Gemini API) supports multimodal inputs. Option C is wrong because Cloud Vision is for image analysis, not generation. Option D is wrong because Natural Language is text-only.

Option A is wrong because Dialogflow is for conversational agents, not direct API calls.

Full explanation →

106

MCQmedium

A bank wants to use LLMs to generate responses for customer support chat. All conversations must be logged, and any PII must be masked. The solution must comply with financial regulations. Which combination of Vertex AI services should be used?

A.Deploy a custom model on Cloud Run and write a Cloud Function to mask PII.

B.Use Vertex AI Prediction with a custom container that masks PII before inference.

C.Use the Gemini API directly with a custom logging solution in Cloud Logging.

D.Use Vertex AI Agent Builder with Data Governance, which can automatically mask PII and log interactions.

AnswerD

Option D is correct because it provides built-in compliance features.

Why this answer

Option D is correct because Vertex AI Agent Builder integrates with Data Governance to automatically mask PII and log interactions, meeting both the logging and compliance requirements without custom development. This managed service ensures adherence to financial regulations by providing built-in data loss prevention (DLP) capabilities and audit trails, unlike the other options which require manual or less integrated approaches.

Exam trap

Google Cloud often tests the misconception that custom development (e.g., Cloud Functions or custom containers) is necessary for PII masking and logging, when in fact managed services like Vertex AI Agent Builder with Data Governance provide a more compliant and integrated solution out of the box.

How to eliminate wrong answers

Option A is wrong because deploying a custom model on Cloud Run with a Cloud Function for PII masking introduces operational complexity and latency, and does not natively integrate with Vertex AI's logging or compliance features, risking gaps in regulatory adherence. Option B is wrong because using Vertex AI Prediction with a custom container that masks PII before inference still requires custom development for logging and does not leverage Vertex AI's built-in data governance, making it harder to ensure consistent compliance across all interactions. Option C is wrong because using the Gemini API directly with a custom logging solution in Cloud Logging lacks automatic PII masking and data governance, forcing manual implementation that is error-prone and may not meet strict financial regulations for auditability and data protection.

Full explanation →

107

Multi-Selectmedium

A company is building a generative AI application that must adhere to strict data residency regulations. Which TWO Google Cloud features can help ensure that data does not leave a specific geographic region?

Select 2 answers

A.Using the regional endpoint for Vertex AI

B.Vertex AI Model Caching

C.Global load balancer with Cloud Armor

D.Cloud CDN for content delivery

E.Deploying models on dedicated VMs in a specific region

AnswersA, E

Regional endpoints ensure API calls stay within the region.

Why this answer

Options A and E are correct. The regional endpoint for Vertex AI and dedicated VMs in a specific region ensure data stays in that region. Option B is wrong because model caching may use regional resources but does not enforce residency.

Option D is wrong because Cloud CDN distributes data globally. Option C is wrong because a global load balancer routes traffic globally.

Full explanation →

108

MCQmedium

Refer to the exhibit. A team has deployed a model to an endpoint with the configuration shown. They notice that during peak traffic, the endpoint frequently returns 429 (Too Many Requests) errors. Which action should they take to resolve this issue?

A.Change MACHINE_TYPE to n1-highmem-4

B.Increase MIN_REPLICA_COUNT to 5

C.Decrease MAX_REPLICA_COUNT to 1

D.Disable autoscaling by setting MIN_REPLICA_COUNT equals MAX_REPLICA_COUNT

AnswerB

More minimum replicas provide capacity for sudden traffic spikes.

Why this answer

Increasing MIN_REPLICA_COUNT ensures a minimum number of replicas are always available to handle traffic bursts, reducing 429 errors. Other options would not help or would worsen the problem.

Full explanation →

109

MCQeasy

You are using Vertex AI Model Garden to deploy a Llama model. Which deployment option provides the best latency for real-time inference?

A.Use Batch Prediction

B.Deploy to a Compute Engine VM

C.Deploy to Vertex AI Endpoint with a fixed number of replicas

D.Use MaaS (Model-as-a-Service) with autoscaling

AnswerC

Fixed replicas ensure always-on instances for low latency.

Why this answer

Option C is correct because a fixed number of replicas avoids cold starts and provides consistent low latency. Option B is wrong because Compute Engine VMs require manual management and may not be optimized. Option D is wrong because MaaS with autoscaling can have cold starts.

Option A is wrong because batch prediction is for asynchronous workloads, not real-time.

Full explanation →

110

MCQmedium

A data scientist notices that a text generation model deployed on Vertex AI returns repetitive outputs after a few turns in a chat application. What is the most likely cause and the best parameter adjustment?

A.The max_output_tokens is too low; increase it to allow more diverse output.

B.The top_p value is too high; reduce top_p to limit token sampling.

C.The model is overfitted; switch to a smaller model.

D.The temperature is too low; increase temperature to add randomness.

AnswerB

Reducing top_p narrows the token pool, reducing repetition.

Why this answer

Repetitive outputs in a chat application after a few turns are typically caused by the model getting stuck in a loop due to high cumulative probability from top-p sampling. Reducing top_p limits the set of tokens considered at each step, forcing the model to explore less likely tokens and breaking the repetition cycle. This directly addresses the issue without sacrificing coherence, unlike temperature adjustments which affect randomness globally.

Exam trap

Google Cloud often tests the misconception that temperature and top-p both control randomness in the same way, but the trap here is that candidates confuse 'increasing randomness' (temperature) with 'limiting the sampling pool' (top-p), leading them to choose D instead of B.

How to eliminate wrong answers

Option A is wrong because max_output_tokens controls the length of the output, not the diversity of token choices; increasing it would allow longer repetitive sequences, not fix the repetition. Option C is wrong because overfitting is a training-phase issue unrelated to inference-time repetition; switching to a smaller model would reduce capacity but not specifically address the sampling behavior causing loops. Option D is wrong because increasing temperature adds randomness to all token probabilities, which can actually worsen repetition by making the model more likely to pick high-probability tokens repeatedly; the problem is too much diversity in the sampling set, not too little.

Full explanation →

111

MCQhard

Refer to the exhibit. A data scientist is fine-tuning a model. The training loss and accuracy are improving each epoch. However, after training, the model performs poorly on a held-out validation set. What is the most likely issue?

A.Underfitting

B.Inappropriate learning rate

C.Data leakage

D.Overfitting

AnswerD

Overfitting leads to good training performance but poor validation.

Why this answer

The model's training loss and accuracy improve each epoch, but performance on the validation set is poor. This classic symptom indicates overfitting, where the model memorizes the training data (including noise) rather than learning generalizable patterns. In fine-tuning, this often occurs when the model is trained for too many epochs or the dataset is too small relative to model capacity.

Exam trap

Google Cloud often tests the distinction between overfitting and underfitting by presenting improving training metrics alongside poor validation performance, which candidates may misinterpret as a learning rate issue or data leakage if they do not recognize the hallmark divergence pattern.

How to eliminate wrong answers

Option A is wrong because underfitting would show poor performance on both training and validation sets, not improving training metrics. Option B is wrong because an inappropriate learning rate typically causes training instability (e.g., loss divergence or stagnation), not a clear divergence between training and validation performance. Option C is wrong because data leakage would cause both training and validation metrics to be artificially high (since validation data leaks into training), not a gap where training is good and validation is poor.
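One common guard against this pattern is early stopping on validation loss. The sketch below is a minimal, framework-free illustration; the loss values and the `patience` setting are hypothetical, and it is not a Vertex AI feature.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch with the best validation loss, stopping the scan
    once the loss has failed to improve for `patience` consecutive epochs."""
    best = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch
    return best_epoch

# Hypothetical run: validation loss starts rising after epoch 2
# even though training loss would keep falling.
val_losses = [0.90, 0.70, 0.60, 0.65, 0.72, 0.80]
stop_at = early_stopping_epoch(val_losses, patience=2)
print(stop_at)
```

Keeping the checkpoint from the returned epoch retains the model before it began memorizing the training data.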

Full explanation →

112

Multi-Selecthard

A company is migrating an on-premises NLP pipeline to Vertex AI. Which three capabilities of Vertex AI align with common MLOps best practices for generative AI? (Choose THREE)

Select 3 answers

A.Automatic model retraining based on performance degradation

B.Local on-premises execution

C.Continuous training with Vertex AI Pipelines

D.Manual data labeling only

E.Model registry for versioning

AnswersA, C, E

Triggering retraining when performance drops is a key MLOps practice.

Why this answer

Option A is correct because Vertex AI's model monitoring can automatically trigger retraining when performance metrics (e.g., prediction drift or data drift) degrade below a threshold. This aligns with MLOps best practices for maintaining generative AI model quality over time without manual intervention.

Exam trap

Google Cloud often tests the misconception that MLOps for generative AI requires on-premises execution or manual-only labeling, but the correct answer emphasizes cloud-native automation and versioning as core best practices.

Full explanation →

113

MCQmedium

A developer uses a code generation model to write Python functions. The output frequently contains syntax errors due to incorrect braces and indentation. Which technique should be used to produce syntactically valid code?

A.Increase the temperature to introduce more varied token choices.

B.Apply constrained decoding techniques that enforce a grammar for the target programming language.

C.Fine-tune the model on a large corpus of syntactically correct Python code.

D.Provide a few-shot example of correct Python function in the prompt.

AnswerB

Constrained decoding ensures the generated tokens follow legal syntax rules.

Why this answer

Option B is correct because constrained decoding (e.g., with guidance or grammar) forces the output to match a formal grammar, preventing syntax errors. Option D is wrong because few-shot examples help with format but do not enforce grammar. Option A is wrong because temperature changes do not fix syntax.

Option C is wrong because fine-tuning is heavy; constrained decoding is a lighter real-time fix.
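A toy sketch of the idea behind constrained decoding: at each step, tokens that would violate the grammar are masked out before the highest-scoring token is chosen. The vocabulary, the balanced-parentheses "grammar", and the score table standing in for model logits are all hypothetical; production systems apply the same masking against a full language grammar.

```python
def allowed_tokens(depth, vocab):
    """Tokens that keep the output a valid prefix of a balanced string."""
    allowed = []
    for tok in vocab:
        if tok == ")" and depth == 0:
            continue  # cannot close a bracket that was never opened
        if tok == "<eos>" and depth != 0:
            continue  # cannot stop while brackets are still open
        allowed.append(tok)
    return allowed

def constrained_greedy_decode(score_fn, vocab, max_steps=10):
    """Greedy decoding restricted to grammar-legal tokens at every step."""
    output, depth = [], 0
    for _ in range(max_steps):
        candidates = allowed_tokens(depth, vocab)
        scores = score_fn(output)
        token = max(candidates, key=lambda t: scores[t])
        if token == "<eos>":
            break
        output.append(token)
        depth += {"(": 1, ")": -1}.get(token, 0)
    return "".join(output)

VOCAB = ["(", ")", "x", "<eos>"]

# Hypothetical per-step scores; unconstrained, the top choice at step 0
# would be ")" and the output would be invalid immediately.
TABLE = [
    {"(": 0.5, ")": 0.9, "x": 0.1, "<eos>": 0.3},
    {"(": 0.1, ")": 0.2, "x": 0.6, "<eos>": 0.9},
    {"(": 0.1, ")": 0.3, "x": 0.2, "<eos>": 0.9},
    {"(": 0.1, ")": 0.2, "x": 0.3, "<eos>": 0.9},
]

def toy_scores(prefix):
    """Stand-in for a model: look up scores by position in the output."""
    return TABLE[min(len(prefix), len(TABLE) - 1)]

result = constrained_greedy_decode(toy_scores, VOCAB)
print(result)
```

Note that this sketch can still return an unbalanced string if `max_steps` runs out while brackets are open; a fuller implementation would force closing tokens near the limit.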

Full explanation →

114

MCQmedium

A marketing agency uses gen AI for content generation and needs consistent branding across outputs. What is a key business consideration?

A.Use only generated content

B.Implement content moderation and brand guidelines

C.Use the most creative model

D.Optimize for speed

AnswerB

Guides the model to produce on-brand content and review outputs.

Why this answer

Option B is correct because consistent branding requires enforcing predefined guidelines on tone, style, and terminology across all generated content. Without content moderation and brand guidelines, a generative AI model may produce off-brand, inconsistent, or even harmful outputs, undermining brand identity. This is a core business strategy for deploying gen AI at scale, ensuring alignment with marketing objectives.

Exam trap

Google Cloud often tests the misconception that generative AI can be deployed autonomously without governance, leading candidates to overvalue raw creativity or speed over the business-critical need for controlled, brand-aligned output.

How to eliminate wrong answers

Option A is wrong because relying solely on generated content without human oversight or curation risks producing factually incorrect, off-brand, or legally problematic material, as generative models lack inherent understanding of brand context. Option C is wrong because the most creative model may prioritize novelty over adherence to brand constraints, leading to unpredictable outputs that violate brand guidelines. Option D is wrong because optimizing for speed can sacrifice output quality and consistency, increasing the likelihood of generating content that fails to meet brand standards or requires extensive post-editing.

Full explanation →

115

MCQhard

A real-time customer support chatbot using Gemini is experiencing high latency. The team must maintain response quality while improving speed. Which technique should they implement?

A.Switch to a larger model

B.Increase the batch size

C.Use context caching for frequent queries

D.Decrease the temperature

AnswerC

Correct: Context caching speeds up responses for repeated patterns while retaining quality.

Why this answer

Context caching reuses context from common intents, reducing processing time without sacrificing quality. Temperature changes or larger models would not help latency or quality.

Full explanation →

116

MCQmedium

A team uses PaLM 2 API to generate product descriptions, but the output sometimes contains factual inaccuracies. What is the best approach to improve accuracy?

A.Increase the temperature parameter

B.Reduce the top_k value

C.Use grounding with Google Search

D.Set the max_output_tokens higher

AnswerC

Grounding supplies factual references, helping the model generate accurate information.

Why this answer

Grounding with Google Search is the correct approach because it allows the PaLM 2 API to retrieve real-time, verifiable information from the web, directly reducing factual inaccuracies in generated product descriptions. Unlike parameter adjustments, grounding provides an external knowledge source that the model can cite, ensuring outputs are based on current and accurate data rather than relying solely on its training data.

Exam trap

Google Cloud often tests the misconception that tuning generation parameters (temperature, top_k, max tokens) can fix factual accuracy issues, when in reality those parameters control randomness and length, not the model's reliance on its training data versus external sources.

How to eliminate wrong answers

Option A is wrong because increasing the temperature parameter makes the model more random and creative, which would likely increase, not decrease, factual inaccuracies. Option B is wrong because reducing the top_k value limits the pool of tokens the model can sample from, which may reduce diversity but does not address the root cause of hallucination or factual errors. Option D is wrong because setting max_output_tokens higher only allows longer responses, which can actually increase the chance of generating more inaccuracies without improving factual correctness.

Full explanation →

117

MCQeasy

A startup wants to generate images from text descriptions for their marketing materials. They prefer a managed service that requires minimal coding. Which Google Cloud generative AI offering should they use?

A.Vertex AI Imagen

B.Natural Language API

C.Document AI

D.Cloud Speech-to-Text

AnswerA

Imagen provides text-to-image generation capabilities via Vertex AI.

Why this answer

Imagen on Vertex AI provides a managed API for text-to-image generation with simple API calls. Option D is wrong because Speech-to-Text is for audio transcription. Option B is wrong because the Natural Language API is for text analysis.

Option C is wrong because Document AI is for document processing.

Full explanation →

118

Multi-Selecthard

Which TWO actions can help reduce latency for an online prediction endpoint served by a large language model on Vertex AI? (Select TWO.)

Select 2 answers

A.Enable response caching for frequent similar queries

B.Increase the max input token limit to capture more context

C.Use automatic scaling to add more replicas

D.Deploy a smaller distilled version of the model

E.Set the model to preemptible instances

AnswersA, D

Caching avoids model inference for duplicate or similar requests, reducing latency.

Why this answer

Deploying a smaller distilled model (D) and enabling response caching (A) can reduce latency. B increases latency because longer inputs take longer to process. C reduces latency but is not a direct action (the platform handles it).

E is not a standard latency optimization.
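Response caching can be sketched as a lookup keyed on a normalized prompt, so repeated queries skip model inference entirely. The `ResponseCache` class and the `fake_generate` stand-in below are hypothetical names for illustration, not a Vertex AI API; matching "similar" rather than identical queries would require an additional step such as embedding-based lookup.

```python
class ResponseCache:
    """Cache model responses keyed on a whitespace- and case-normalized prompt."""

    def __init__(self, generate_fn):
        self._generate = generate_fn
        self._store = {}
        self.hits = 0

    def ask(self, prompt):
        key = " ".join(prompt.lower().split())
        if key in self._store:
            self.hits += 1
            return self._store[key]
        response = self._generate(prompt)  # slow path: call the model
        self._store[key] = response
        return response

calls = []

def fake_generate(prompt):
    """Stand-in for a model call; records each invocation."""
    calls.append(prompt)
    return f"answer to: {prompt.strip().lower()}"

cache = ResponseCache(fake_generate)
first = cache.ask("What is your refund policy?")
second = cache.ask("  what is your   refund policy? ")
print(len(calls), cache.hits)
```

Only the first request reaches the model; the second is served from the cache, which is where the latency saving comes from.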

Full explanation →

119

MCQmedium

A data scientist runs the above command to upload a model to Vertex AI Model Registry. The model is a TensorFlow 2.6 model trained on tabular data. After deployment to an endpoint, the prediction latency is higher than expected. What is the most likely cause?

A.The artifact URI points to a single file instead of a directory

B.The model should be uploaded with a different display name

C.The container image used is CPU-only, but a GPU-accelerated image would improve latency

D.The model is uploaded to the wrong region

AnswerC

Using a CPU-only container for inference can be slower; a GPU image can reduce latency.

Why this answer

The command uses a tf2-cpu image; GPU-optimized images offer faster inference for many models. Option A is wrong because the artifact URI is a directory, as required. Option B is wrong because the display name has no effect on latency.

Option D is wrong because the region is specified.

Full explanation →

120

MCQeasy

A developer wants to quickly experiment with different foundation models available in Google Cloud. Which tool should they use?

A.BigQuery ML

B.Cloud Console Compute Engine

C.Gen AI Studio in Vertex AI

D.Vertex AI Model Registry

AnswerC

Gen AI Studio allows prompt testing and model comparison.

Why this answer

Gen AI Studio provides a user interface to test prompts and compare models. Model Registry is for managing models, Compute Engine is for VMs, BigQuery ML is for SQL ML.

Full explanation →

121

Multi-Selecthard

An organization is developing a GenAI strategy for multiple business units. Which THREE steps should they take to ensure alignment? (Select three.)

Select 3 answers

A.Implement a chargeback model for usage costs

B.Allow each business unit to independently choose models

C.Establish common data governance policies

D.Create a center of excellence (CoE) for GenAI

E.Prioritize use cases based on ROI and risk

AnswersC, D, E

Common policies ensure data consistency, compliance, and reusability across units.

Why this answer

Establishing common data governance policies (C) ensures that all business units adhere to consistent standards for data quality, privacy, and security, which is critical for training and deploying reliable GenAI models. Without unified governance, disparate data practices can lead to model bias, compliance violations, and integration failures across the organization.

Exam trap

Google Cloud often tests the misconception that financial controls (chargeback) or decentralized model selection are sufficient for alignment, when in fact they miss the core need for shared governance, centralized expertise, and risk-based prioritization.

Full explanation →

122

MCQeasy

A company wants to use generative AI to summarize customer support tickets. Which Google Cloud tool is best suited for this task?

A.Vertex AI Text Generation (Gemini)

B.Dialogflow CX

C.Document AI

D.AutoML Tables

AnswerA

Vertex AI text generation supports summarization tasks via Gemini models.

Why this answer

Vertex AI Text Generation (Gemini) is the correct choice because it is a generative AI service specifically designed for natural language understanding and generation tasks, such as summarizing customer support tickets. Gemini models can process long-form text and produce concise, coherent summaries by leveraging transformer-based architectures fine-tuned for instruction-following and text completion. This makes it ideal for extracting key information from support conversations and generating actionable summaries.

Exam trap

The trap here is that candidates may confuse Dialogflow CX (a conversational AI builder) with a generative AI tool, overlooking that Dialogflow is for structured dialogue flows rather than open-ended text generation.

How to eliminate wrong answers

Option B (Dialogflow CX) is wrong because it is a conversational AI platform for building chatbots and virtual agents, not a generative text summarization tool; it focuses on intent recognition and dialogue management rather than free-form text generation. Option C (Document AI) is wrong because it is designed for document processing and extraction of structured data (e.g., OCR, form parsing) from scanned documents, not for generative summarization of unstructured text. Option D (AutoML Tables) is wrong because it is a tabular data modeling service for regression and classification tasks on structured datasets, not a natural language generation tool.

Full explanation →

123

MCQmedium

A developer is using Vertex AI's Generative AI Studio to prototype a text summarization model. The initial results are too verbose. What is the most efficient way to adjust the output length without retraining?

A.Switch to a smaller base model like BERT

B.Use a separate classifier to filter long responses

C.Fine-tune the model with a dataset of concise summaries

D.Modify the prompt with specific length instructions and adjust model parameters

AnswerD

Prompt engineering and parameter tuning can control output.

Why this answer

Option D is correct because adding explicit length instructions to the prompt and adjusting parameters such as max output tokens controls verbosity without retraining. Option C is wrong because fine-tuning is a form of retraining and is unnecessary here. Option B is wrong because building a separate classifier adds complexity.

Option A is wrong because switching to a smaller base model may not yield the desired quality.

Full explanation →

124

MCQmedium

Why is the model responding in English despite the prompt asking for French translation?

A.The model endpoint is configured for English only

B.The temperature is too high, causing random outputs

C.The system instruction to translate to French was not set; the user prompt alone is not sufficient

D.The maxOutputTokens is too low to complete the translation

AnswerC

Gemini requires system instruction for task specification.

Why this answer

Option C is correct because in Google Cloud's Vertex AI and Generative AI offerings, the system instruction is a separate, persistent directive that sets the model's behavior, such as language output. The user prompt alone, even if it asks for a French translation, is not sufficient to override the default language of the model; the system instruction must explicitly specify the target language. Without this instruction, the model defaults to its training language (typically English), regardless of the user's request.

Exam trap

The trap here is that candidates assume a user prompt's explicit instruction (e.g., 'Translate to French') is enough to override the model's default language, but in Google Cloud's Generative AI, the system instruction is the authoritative control for persistent behavior, not the user prompt.

How to eliminate wrong answers

Option A is wrong because model endpoints in Vertex AI are not configured for a specific language; they serve all languages the model supports, and language behavior is controlled via system instructions or prompt engineering, not endpoint configuration. Option B is wrong because a high temperature increases randomness in token selection but does not cause the model to ignore a language instruction; it would still attempt to follow the prompt, albeit with more creative or varied outputs, not systematically output English. Option D is wrong because maxOutputTokens limits the length of the response, not the language; if set too low, the model would produce a truncated translation, not switch to English.

Full explanation →

125

MCQeasy

A company uses a text-to-image model to generate marketing visuals. The outputs often contain distorted human faces. Which technique is most likely to improve face generation?

A.Fine-tune the model on a curated dataset of human faces

B.Increase the output resolution

C.Increase the number of inference steps

D.Reduce the classifier-free guidance scale

AnswerA

Fine-tuning specializes the model for better face generation.

Why this answer

Fine-tuning the model on a high-quality dataset of human faces directly addresses the distortion issue. Option C is wrong because increasing inference steps may improve overall image quality but does not specifically improve faces. Option D is wrong because reducing the CFG scale reduces adherence to the prompt, not face distortion.

Option B is wrong because increasing the output resolution might not fix distortion.

Full explanation →

126

MCQmedium

A data scientist is fine-tuning a large language model using Vertex AI. The training job fails with an out-of-memory error. Which action should they take to resolve this issue?

A.Change the accelerator to TPU

B.Use a larger model

C.Increase the batch size

D.Reduce the batch size

AnswerD

Smaller batch size reduces memory footprint.

Why this answer

Reducing batch size lowers memory consumption per step, directly addressing the OOM error. Option C is wrong because increasing batch size worsens memory usage. Option B is wrong because switching to a larger model increases memory usage.

Option A is wrong because TPUs also have memory limits; reducing batch size works on TPU as well.

Full explanation →

127

MCQeasy

Refer to the exhibit. A user receives this error when trying to get predictions from a Vertex AI endpoint. What is the most likely cause?

A.The endpoint does not exist

B.The endpoint is in a different region

C.The user lacks necessary IAM permissions

D.The model is not deployed

AnswerC

PERMISSION_DENIED indicates missing permissions.

Why this answer

Option C is correct because the error message explicitly says PERMISSION_DENIED, indicating a lack of IAM permissions. Option A (endpoint does not exist) would give a NOT_FOUND error. Option D (model not deployed) would give a different error.

Option B (different region) would also give a different error.

Full explanation →

128

MCQeasy

Refer to the exhibit. What access does the IAM policy grant to developer@example.com?

A.Ability to use Vertex AI models for prediction and view metadata.

B.No effective permissions because the role is incorrect.

C.Ability to deploy and manage models.

D.Full control over all Vertex AI resources.

Answer: A

roles/aiplatform.user grants permissions to predict, explain, and view resources.

Why this answer

Option A is correct because roles/aiplatform.user allows using models for predictions and viewing metadata, but not deploying or deleting models. Option B is wrong because roles/aiplatform.user is a valid predefined role. Options C and D require higher roles.

129

Multi-Select · medium

Which TWO options are best practices for deploying generative AI models on Vertex AI? (Choose two.)

Select 2 answers

A.Disable logging to reduce cost

B.Enable automatic scaling to handle variable traffic

C.Use Vertex AI Model Monitoring to detect drift

D.Manually scale instances based on expected load

E.Serve the model directly without optimization

Answers: B, C

Automatic scaling adjusts resources based on demand.

Why this answer

Option B is correct because Vertex AI's automatic scaling dynamically adjusts the number of serving instances based on incoming traffic, ensuring low latency during spikes and cost efficiency during lulls. This is a best practice for production workloads where traffic patterns are unpredictable, as it eliminates the need for manual capacity planning.

Exam trap

Google Cloud often tests the misconception that manual scaling is more reliable or cost-effective than automatic scaling, but in cloud-native environments, automatic scaling is the standard best practice for variable workloads.

130

MCQ · hard

A generative AI model for code generation sometimes produces syntactically incorrect code. The team wants to reduce syntax errors without retraining the entire model. Which approach is most effective?

A.Implement constrained decoding with grammar rules

B.Run a syntax checker after generation and regenerate

C.Add a system prompt that instructs the model to produce valid code

D.Increase beam search width

Answer: A

Constrained decoding ensures output respects syntax rules.

Why this answer

Constrained decoding with grammar rules directly enforces the syntax of the target programming language during token generation, preventing the model from producing invalid constructs. This approach modifies the decoding process (e.g., using a context-free grammar or a formal syntax specification) to mask or forbid tokens that would lead to a syntax error, without altering the underlying model weights. It is the most effective method because it guarantees syntactically correct output at generation time, rather than relying on post-hoc fixes or probabilistic adjustments.
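
The masking idea can be sketched with a toy example. Everything here is an assumption made for illustration: the four-token vocabulary, the "balanced parentheses" grammar, and `toy_scores`, a stand-in for a model that always prefers an invalid closing parenthesis. A real system would apply the same mask to the logits of an actual model.

```python
from typing import Dict, List

VOCAB = ["(", ")", "x", "<eos>"]


def toy_scores(prefix: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a model: it always ranks ')' highest."""
    return {")": 3.0, "(": 2.0, "x": 1.0, "<eos>": 0.5}


def allowed(prefix: List[str], max_len: int) -> List[str]:
    """Toy grammar: parentheses must end up balanced within max_len tokens."""
    depth = prefix.count("(") - prefix.count(")")
    remaining = max_len - len(prefix)
    ok = []
    if depth > 0:
        ok.append(")")          # closing is legal only when something is open
    if depth == 0:
        ok.append("<eos>")      # stopping is legal only when balanced
    if remaining >= depth + 2:
        ok.append("(")          # needs room for '(' plus every pending closer
    if remaining >= depth + 1:
        ok.append("x")          # needs room for 'x' plus every pending closer
    return ok


def constrained_greedy(max_len: int = 6) -> str:
    prefix: List[str] = []
    while True:
        scores = toy_scores(prefix)
        # Mask step: only grammar-approved tokens compete for the argmax.
        best = max(allowed(prefix, max_len), key=lambda tok: scores[tok])
        if best == "<eos>":
            return "".join(prefix)
        prefix.append(best)


print(constrained_greedy())
```

Without the mask, greedy decoding over `toy_scores` would emit `)` first, which is invalid; with the mask, every emitted string is balanced even though the model's preferences are unchanged.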

Exam trap

The trap here is that candidates often choose a post-hoc correction method (Option B) or a prompt-based approach (Option C) because they seem simpler, but they fail to recognize that only a decoding-time constraint can guarantee syntactic validity without retraining, which is the core requirement of the question.

How to eliminate wrong answers

Option B is wrong because running a syntax checker after generation and regenerating is inefficient and does not prevent errors; it relies on trial-and-error, which can be costly and may still produce invalid code if the model repeatedly generates similar errors. Option C is wrong because adding a system prompt is a soft instruction that the model may not reliably follow, especially for complex or edge-case syntax rules, and it does not enforce constraints at the token level. Option D is wrong because increasing beam search width improves the diversity and likelihood of finding high-probability sequences but does not incorporate any syntactic constraints; it may still produce syntactically incorrect code if the highest-scoring beams violate grammar rules.

131

MCQ · medium

A healthcare startup wants to use generative AI to provide clinical decision support. They must minimize the risk of harmful hallucinations. Which business strategy is most appropriate?

A.Implement retrieval-augmented generation with meticulously curated medical literature.

B.Limit the model's output length to reduce hallucination risk.

C.Deploy a large general-purpose model and rely on post-processing filters.

D.Use a custom fine-tuned model on a proprietary medical dataset.

Answer: A

RAG uses retrieved, vetted documents to generate answers, significantly reducing hallucinations by grounding responses in authoritative sources.

Why this answer

Retrieval-augmented generation (RAG) grounds the model's output in a trusted, external knowledge base—here, curated medical literature—which directly reduces the risk of hallucination by forcing the model to cite or derive answers from verified sources. This is the most effective strategy for clinical decision support because it combines generative flexibility with factual accuracy, unlike methods that only limit output or rely on post-hoc filtering.

Exam trap

Google Cloud often tests the misconception that fine-tuning alone is sufficient for domain-specific accuracy, when in fact RAG is superior for reducing hallucinations because it provides dynamic, verifiable grounding rather than static memorization.

How to eliminate wrong answers

Option B is wrong because limiting output length does not address the root cause of hallucinations; a short response can still be factually incorrect or harmful. Option C is wrong because post-processing filters are reactive and cannot reliably catch subtle or context-dependent hallucinations in a high-stakes medical domain, and large general-purpose models lack domain-specific grounding. Option D is wrong because a custom fine-tuned model on a proprietary dataset may still hallucinate if the dataset is incomplete, biased, or not rigorously curated, and fine-tuning does not inherently provide a retrieval mechanism to verify facts against authoritative sources.

132

MCQ · easy

A company is using a generative AI model to generate product descriptions. They notice the outputs often include factual inaccuracies about product specifications. Which technique would best address this issue without modifying the model's architecture?

A.Implement a Retrieval-Augmented Generation (RAG) pipeline that retrieves product specs from a database

B.Decrease the temperature parameter to 0.1

C.Increase the max output tokens to 1024

D.Use few-shot prompting with 5 examples of correct descriptions

Answer: A

RAG grounds generation in retrieved relevant documents, improving factual accuracy.

Why this answer

Retrieval-Augmented Generation (RAG) is the correct technique because it grounds the model's output in factual, up-to-date product specifications retrieved from an external database. This directly addresses factual inaccuracies without modifying the model's architecture, as the model generates text based on retrieved context rather than relying solely on its parametric knowledge.
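
A minimal sketch of the retrieve-then-prompt flow, assuming a tiny in-memory dictionary in place of a real product database and simple word overlap in place of a real retriever. `PRODUCT_SPECS`, `retrieve`, and `build_prompt` are hypothetical names, and no model is actually called.

```python
from typing import Dict

# Hypothetical product database; in practice this would be a real data store.
PRODUCT_SPECS: Dict[str, str] = {
    "trail backpack": "Capacity: 40 L. Weight: 1.2 kg. Material: ripstop nylon.",
    "steel bottle": "Capacity: 750 ml. Material: stainless steel. Insulated: yes.",
}


def retrieve(query: str) -> str:
    """Return the spec whose product name shares the most words with the query."""
    query_words = set(query.lower().split())
    best = max(PRODUCT_SPECS,
               key=lambda name: len(query_words & set(name.split())))
    return PRODUCT_SPECS[best]


def build_prompt(query: str) -> str:
    """Place the retrieved facts in the prompt so the model need not guess them."""
    return (
        "Write a product description using ONLY these specifications:\n"
        f"{retrieve(query)}\n"
        f"Request: {query}"
    )


prompt = build_prompt("Describe the steel bottle for hikers")
print(prompt)
```

The resulting prompt would then be sent to the unchanged model, which is why this technique needs no architectural modification.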

Exam trap

Google Cloud often tests the misconception that adjusting generation parameters (like temperature or token limits) or providing examples can fix factual accuracy, when in fact only retrieval-augmented methods or fine-tuning on verified data can correct hallucinations without changing the model architecture.

How to eliminate wrong answers

Option B is wrong because decreasing the temperature parameter to 0.1 makes the model more deterministic and reduces randomness, but it does not provide any factual grounding; it can still hallucinate incorrect specifications. Option C is wrong because increasing max output tokens only allows longer generations and does not improve factual accuracy; it may even increase the chance of errors. Option D is wrong because few-shot prompting with examples can guide the style and format but cannot supply specific, dynamic product specs; the model may still invent details not present in the examples.

133

MCQ · medium

A large enterprise is evaluating gen AI for internal knowledge management. They need to ensure accuracy and reduce hallucinations. Which strategy is most effective?

A.Fine-tune a model on domain-specific data

B.Increase model temperature

C.Use Retrieval-Augmented Generation (RAG)

D.Use a larger model without customization

Answer: C

RAG retrieves relevant documents and conditions the model on them, dramatically reducing hallucinations.

Why this answer

Retrieval-Augmented Generation (RAG) is the most effective strategy because it grounds the model's responses in an external, authoritative knowledge base, retrieving relevant documents at inference time to provide factual context. This directly reduces hallucinations by ensuring the generated output is based on retrieved evidence rather than relying solely on the model's parametric memory, which is critical for enterprise knowledge management where accuracy is paramount.

Exam trap

Google Cloud often tests the misconception that fine-tuning is the universal solution for domain adaptation, but the trap here is that fine-tuning does not provide a dynamic, verifiable knowledge source, whereas RAG explicitly decouples knowledge storage from generation, enabling real-time updates and source attribution.

How to eliminate wrong answers

Option A is wrong because fine-tuning on domain-specific data embeds knowledge into the model's weights, which can still lead to hallucinations when the model encounters novel or edge-case queries, and it does not provide a mechanism to cite or verify the source of information. Option B is wrong because increasing model temperature introduces randomness into token selection, which amplifies hallucinations and reduces the determinism required for accurate knowledge retrieval. Option D is wrong because using a larger model without customization does not address the root cause of hallucinations; larger models still rely on parametric memory and can fabricate information, especially for niche or proprietary enterprise data.

134

MCQ · medium

Refer to the exhibit. The endpoint is experiencing high latency during traffic spikes. The team wants to improve response time by reducing queueing. Which change to the configuration would be most effective?

A.Decrease minReplicaCount to 0

B.Change the model version to '2'

C.Decrease the target value in autoscaling metric to 50

D.Increase maxReplicaCount to 10

Answer: D

More replicas handle higher load.

Why this answer

Increasing maxReplicaCount to 10 allows the autoscaler to provision more replicas during traffic spikes, distributing the incoming requests across additional replicas. This directly reduces queueing at each replica because the load is spread over more instances, lowering per-instance latency. The change targets the root cause (insufficient capacity to handle peak load) rather than adjusting thresholds or model versions.
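
The capacity argument can be checked with simple arithmetic. The sketch below uses a deliberately simplified autoscaling model (desired replicas = load divided by the per-replica target, clamped to the min/max range). The exhibit is not reproduced here, so the traffic and target numbers are hypothetical.

```python
import math


def replicas_provisioned(load: float, target_per_replica: float,
                         min_replicas: int, max_replicas: int) -> int:
    """Simplified model: scale to meet the target, but never past the ceiling."""
    wanted = math.ceil(load / target_per_replica)
    return max(min_replicas, min(wanted, max_replicas))


spike = 900.0  # hypothetical requests per second during a spike

capped = replicas_provisioned(spike, 100, min_replicas=1, max_replicas=3)
lower_target = replicas_provisioned(spike, 50, min_replicas=1, max_replicas=3)
raised_max = replicas_provisioned(spike, 100, min_replicas=1, max_replicas=10)

print(capped, lower_target, raised_max)
print(f"load per replica: {spike / capped:.0f} vs {spike / raised_max:.0f}")
```

In this simplified model, lowering the target while the ceiling stays at 3 leaves the replica count unchanged during the spike; only raising the ceiling lets the per-replica load fall back to the target.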

Exam trap

Google Cloud often tests the misconception that lowering the autoscaling target metric (Option C) is the primary fix for high latency, when in fact the maxReplicaCount ceiling is the bottleneck that must be raised to allow sufficient capacity during spikes.

How to eliminate wrong answers

Option A is wrong because decreasing minReplicaCount to 0 would cause the endpoint to scale down to zero replicas during idle periods, leading to cold starts and increased latency when traffic spikes, which worsens queueing. Option B is wrong because changing the model version to '2' does not affect the number of replicas or queueing behavior; it only changes the model artifact, which may have different inference latency but does not address scaling capacity. Option C is wrong because decreasing the target value in the autoscaling metric (e.g., CPU utilization or requests per replica) would cause the autoscaler to add replicas sooner, but without increasing maxReplicaCount, the endpoint may still hit the upper limit and queue requests; the target value adjustment alone does not provide additional capacity during extreme spikes.

135

MCQ · easy

A startup wants to generate product descriptions from a few keywords using a large language model. They have no prior ML experience and need the fastest time-to-market. Which Google Cloud service should they use?

A.Vertex AI Studio

B.Vertex AI Workbench with custom training

C.Vertex AI Agent Builder

D.Vertex AI Model Garden

Answer: A

No-code prompt engineering and testing.

Why this answer

Vertex AI Studio provides a no-code/low-code environment with pre-trained foundation models and prompt templates, enabling rapid generation of product descriptions from keywords without any ML expertise. It offers the fastest time-to-market because it eliminates the need for custom model training, infrastructure setup, or coding, directly leveraging Google's generative AI capabilities through a simple interface.

Exam trap

The trap here is that candidates might confuse Vertex AI Studio with Vertex AI Model Garden, thinking Model Garden offers a faster path because it lists models, but Model Garden still requires deployment and configuration steps, whereas Studio provides immediate generation capabilities.

How to eliminate wrong answers

Option B is wrong because Vertex AI Workbench with custom training requires writing code, selecting models, and managing training jobs, which demands ML experience and significantly increases time-to-market compared to using a pre-built solution. Option C is wrong because Vertex AI Agent Builder is designed for creating conversational agents and chatbots, not for generating product descriptions from keywords; it adds unnecessary complexity and overhead for this simple text generation task. Option D is wrong because Vertex AI Model Garden is a repository of pre-trained models that still requires users to select, deploy, and potentially fine-tune models, which involves ML knowledge and setup time, not offering the fastest path for a non-ML team.

136

MCQ · hard

A law firm uses a generative model to analyze contracts and extract key clauses. The model often outputs irrelevant clauses or misses important ones. They want to improve the relevance of the outputs without retraining the entire model. Which approach is best?

A.Increase the input token limit to provide the entire contract in the prompt.

B.Decrease the temperature to make outputs more deterministic.

C.Implement Retrieval-Augmented Generation (RAG) with a curated legal clause database and a reranker to select the most on-topic passages.

D.Fine-tune the base model on a labeled dataset of contract-clause pairs.

Answer: C

RAG supplies the model with relevant context, and reranking refines the selection, directly boosting relevance.

Why this answer

Option C is correct because RAG with a legal corpus retrieves clause-specific paragraphs, and reranking prioritizes relevant content, improving precision. Option A is wrong because increasing context length may dilute focus. Option B is wrong because temperature adjustment does not improve relevance. Option D is wrong because fine-tuning requires significant data and resources.

137

Multi-Select · easy

Which TWO factors are most important when choosing a base foundation model for fine-tuning on a domain-specific task?

Select 2 answers

A.Model size and architecture

B.Model popularity in the developer community

C.Relevance of the model's training data to the target domain

D.Model license (open-source vs. proprietary)

E.Inference latency of the base model

Answers: A, C

Larger models may have better performance but higher cost; architecture affects fine-tuning ease.

Why this answer

Options A and C are correct. Model size and architecture affect capability and cost; training data relevance ensures domain knowledge transfer. Option B (popularity) is not a technical factor. Option D (model license) is less critical for fine-tuning feasibility. Option E (inference latency) can be optimized after fine-tuning, so it matters less when choosing the base model.

138

MCQ · medium

A marketing team wants to use Vertex AI to generate ad copy. They need the model to follow a specific tone and style. What is the best approach?

A.Use Vertex AI Grounding to retrieve style guides

B.Provide few-shot examples in the prompt and adjust temperature

C.Fine-tune the model on a dataset of past ad copy

D.Enable safety filters to enforce brand guidelines

Answer: B

Few-shot prompting can guide tone effectively.

Why this answer

Option B is correct because prompt engineering with examples is the most flexible and quickest method. Option A is wrong because grounding retrieves external information, not tone. Option C is wrong because fine-tuning requires large labeled datasets. Option D is wrong because safety filters do not control style.
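
A few-shot prompt for tone and style might be assembled as in the sketch below. The helper name `few_shot_prompt`, the instruction wording, and the example ad copy are all hypothetical; the resulting string would be sent to the model together with whatever temperature setting the team chooses.

```python
from typing import List, Tuple


def few_shot_prompt(examples: List[Tuple[str, str]], product: str) -> str:
    """Build a prompt that shows the desired tone through worked examples."""
    parts = ["Write ad copy in the same tone and style as the examples."]
    for name, copy in examples:
        parts.append(f"Product: {name}\nAd copy: {copy}")
    # Leave the final 'Ad copy:' open so the model completes it.
    parts.append(f"Product: {product}\nAd copy:")
    return "\n\n".join(parts)


# Hypothetical past ad copy supplied by the marketing team.
examples = [
    ("Trail backpack", "Pack light. Go far. Come back with stories."),
    ("Steel bottle", "Cold for hours. Ready when you are."),
]

prompt = few_shot_prompt(examples, "Camp stove")
print(prompt)
```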

139

MCQ · medium

A healthcare company is building a chatbot to answer patient queries based on their medical documents stored in Cloud Storage. They want to minimize latency and ensure data residency in the EU. Which Vertex AI service should they use?

A.Vertex AI Model Garden with fine-tuning

B.Vertex AI Search with document grounding

C.Vertex AI Agent Builder with web search

D.Vertex AI Codey APIs

Answer: B

Supports private document indexing and data residency controls.

Why this answer

Vertex AI Search with document grounding is correct because it allows the chatbot to ground responses in the customer's own medical documents stored in Cloud Storage, ensuring low latency through optimized indexing and retrieval, while supporting data residency controls to keep data within the EU. This service is specifically designed for enterprise search and Q&A over private document repositories, making it ideal for healthcare use cases requiring compliance and fast responses.

Exam trap

The trap here is that candidates may confuse Vertex AI Search (which grounds in private documents) with Vertex AI Agent Builder (which defaults to web search), or assume fine-tuning is necessary for domain-specific Q&A when retrieval-augmented generation (RAG) with document grounding is the correct approach for minimizing latency and ensuring data residency.

How to eliminate wrong answers

Option A is wrong because Vertex AI Model Garden with fine-tuning is intended for selecting and customizing foundation models, not for directly grounding answers in specific documents; it would require additional retrieval infrastructure and does not natively enforce data residency. Option C is wrong because Vertex AI Agent Builder with web search grounds responses in public web data, not private medical documents, and cannot guarantee data residency in the EU. Option D is wrong because Vertex AI Codey APIs are specialized for code generation and completion, not for answering queries based on document content.

140

MCQ · hard

A healthcare startup is using Vertex AI Imagen to generate synthetic medical images for training a diagnostic model. The images must comply with HIPAA regulations and cannot contain any real patient data. The team fine-tuned Imagen on a dataset of de-identified medical scans. However, during testing, they notice that some generated images closely resemble specific patients from the original dataset, even though the dataset was de-identified. They suspect that the model memorized some training examples. The team needs to address this issue without losing image quality. They have access to the original training data and Vertex AI tools. What action should they take?

A.Use a post-processing step to blur or distort generated images.

B.Re-tune the model using differential privacy (DP-SGD) to prevent memorization of individual examples.

C.Increase the size of the training dataset by adding more synthetic images.

D.Apply stricter output safety filters to block images that look like any known patient.

Answer: B

Differential privacy limits what the model can learn about specific training examples, reducing memorization.

Why this answer

Option B is correct. Differential privacy during fine-tuning adds noise to prevent memorization. Option A (post-processing) only alters output after generation and doesn't fix memorization. Option C (more data) might not help if the model is overfitting. Option D (stricter safety filters) won't stop the model from reproducing memorized examples.
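
For intuition, the core of a DP-SGD step can be sketched in plain Python: clip each per-example gradient to a maximum L2 norm, sum the clipped gradients, add Gaussian noise scaled to that norm, and average. This is a generic sketch of the commonly described recipe, not a Vertex AI API; the function and parameter names are assumptions.

```python
import math
import random
from typing import List


def clip(grad: List[float], max_norm: float) -> List[float]:
    """Scale a gradient down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_example_grads: List[List[float]], max_norm: float,
                    noise_multiplier: float, rng: random.Random) -> List[float]:
    """Clip per example, sum, add Gaussian noise, then average over the batch."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [n / len(clipped) for n in noisy]


# Hypothetical gradients: the first example is an outlier with a large norm.
grads = [[3.0, 4.0], [0.3, 0.4]]
print(dp_sgd_gradient(grads, max_norm=1.0, noise_multiplier=1.1,
                      rng=random.Random(0)))
```

Clipping bounds how much any single training example can move the weights, and the noise hides what remains of that influence, which is what limits memorization of individual scans.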

141

Multi-Select · medium

Which TWO features are available in Vertex AI Agent Builder to enhance the conversational abilities of an agent? (Choose TWO.)

Select 2 answers

A.Slot filling

B.Sentiment analysis

C.Code execution

D.Intent matching

E.Knowledge base integration

Answers: A, D

Slot filling collects required parameters from user input.

Why this answer

Slot filling is correct because it allows the agent to collect required parameters (slots) from the user during a conversation, enabling multi-turn interactions to fulfill complex requests. In Vertex AI Agent Builder, slot filling is a core feature for conversational agents, as it systematically prompts for missing information (e.g., date, location) until all necessary slots are filled, enhancing the agent's ability to handle dynamic user inputs.
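
The prompting loop described above can be illustrated with a toy sketch. It is not the Agent Builder API; `REQUIRED_SLOTS` and `next_prompt` are hypothetical names, and a real agent would extract slot values from free-form user text rather than receive them as a dictionary.

```python
from typing import Dict, Optional

# Hypothetical slots for a booking-style request.
REQUIRED_SLOTS = ["date", "location"]


def next_prompt(filled: Dict[str, str]) -> Optional[str]:
    """Return a question for the first missing slot, or None when all are filled."""
    for slot in REQUIRED_SLOTS:
        if slot not in filled:
            return f"Which {slot} would you like?"
    return None


state: Dict[str, str] = {}
print(next_prompt(state))           # asks for the date
state["date"] = "2025-03-01"
print(next_prompt(state))           # asks for the location
state["location"] = "Berlin"
print(next_prompt(state))           # nothing left to ask
```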

Exam trap

The trap here is that candidates often confuse 'knowledge base integration' as a core conversational feature, but it is actually a retrieval-augmented generation (RAG) capability for grounding, not a direct mechanism for managing dialogue flow like slot filling or intent matching.

142

MCQ · medium

A retail company uses the Vertex AI Gemini API to generate product descriptions. Recently, the model started producing factually incorrect statements about product specifications, such as wrong dimensions and materials. Which strategy should be implemented to improve factual accuracy?

A.Enable model versioning to automatically roll back to a previous version

B.Fine-tune the model on a dataset of product images and descriptions

C.Increase the temperature parameter to 0.9

D.Use grounding with Vertex AI Search to retrieve verified product data

Answer: D

Grounding the model on authoritative sources improves factual accuracy by providing context from the company's knowledge base.

Why this answer

Option D is correct because grounding with Vertex AI Search retrieves authoritative information from the company's knowledge base, reducing hallucinations. Option A (enabling model versioning) helps with version control but not with correctness of responses. Option B (fine-tuning on product images) does not address factual text inaccuracies. Option C (increasing temperature) would increase randomness, worsening accuracy.

143

MCQ · easy

A company wants to measure the business impact of a GenAI content generation tool. Which metric is most appropriate?

A.Reduction in content production time

B.Number of model parameters

C.Model accuracy on a test set

D.Training loss

Answer: A

This metric directly measures the business value of automation and efficiency.

Why this answer

Option A is correct because the primary business impact of a GenAI content generation tool is operational efficiency, measured by the reduction in content production time. This metric directly correlates to cost savings and faster time-to-market, which are key business outcomes. Unlike technical metrics, it reflects real-world value delivery.

Exam trap

Google Cloud often tests the confusion between technical performance metrics (e.g., accuracy, loss) and business impact metrics (e.g., time savings, cost reduction), leading candidates to select a technically impressive but irrelevant option like model parameters or accuracy.

How to eliminate wrong answers

Option B is wrong because the number of model parameters is a model architecture metric, not a business impact metric; it does not measure how the tool affects content production workflows or ROI. Option C is wrong because model accuracy on a test set evaluates technical performance on a static dataset, not the tool's effectiveness in a dynamic business environment where content quality and relevance vary. Option D is wrong because training loss is a training-phase optimization metric that indicates model convergence, not post-deployment business outcomes like productivity gains.

144

MCQ · easy

A startup wants to leverage Google Cloud's generative AI but has limited ML expertise. Which Google Cloud service allows them to build generative AI applications without deep ML knowledge?

A.Vertex AI Generative AI Studio

B.Cloud TPU

C.TensorFlow

D.Apigee

Answer: A

Design and deploy generative AI apps without coding.

Why this answer

Vertex AI Generative AI Studio is a managed service that provides a low-code/no-code interface for building, testing, and deploying generative AI applications using pre-trained foundation models. It abstracts away the complexities of model training, infrastructure management, and ML pipeline orchestration, enabling teams with limited ML expertise to leverage generative AI capabilities through simple prompts and visual workflows.

Exam trap

The trap here is that candidates confuse infrastructure-level services (Cloud TPU) or developer tools (TensorFlow) with managed application-building platforms, assuming that any ML-related Google Cloud service can be used without expertise, when in fact only Vertex AI Generative AI Studio provides the necessary abstraction for non-ML practitioners.

How to eliminate wrong answers

Option B (Cloud TPU) is wrong because Cloud TPUs are specialized hardware accelerators designed for training and running large-scale ML models, requiring deep expertise in distributed computing, model optimization, and TensorFlow/PyTorch programming — not a service for building generative AI applications without ML knowledge. Option C (TensorFlow) is wrong because TensorFlow is an open-source ML framework that requires programming skills to define, train, and deploy models; it does not provide a managed, no-code interface for generative AI application development. Option D (Apigee) is wrong because Apigee is an API management platform focused on securing, scaling, and analyzing API traffic, not a service for building or deploying generative AI models or applications.

145

MCQ · medium

A company wants to generate images from text descriptions. Which model in Vertex AI Model Garden should they use?

A.Chirp

B.Codey

C.PaLM 2

D.Imagen

Answer: D

Imagen generates images from text.

Why this answer

Imagen is the correct choice because it is a text-to-image model in Vertex AI Model Garden specifically designed to generate high-fidelity images from natural language descriptions. Unlike the other options, Imagen uses a diffusion-based architecture to create photorealistic visuals, making it the only option that directly addresses the requirement of generating images from text.

Exam trap

The trap here is that candidates may assume a general-purpose large language model such as PaLM 2 can also produce images, leading them to incorrectly select PaLM 2 for text-to-image tasks.

How to eliminate wrong answers

Option A is wrong because Chirp is a speech-to-text and text-to-speech model in Vertex AI, focused on audio processing, not image generation. Option B is wrong because Codey is a code generation model specialized in writing and completing code, not generating images from text. Option C is wrong because PaLM 2 is a large language model for text generation, reasoning, and chat, but it does not have the capability to produce visual outputs like images.

146

MCQ · medium

A retail company is building a chatbot for customer service. They need the model to generate product descriptions based on a catalog but also answer questions about store policies. The team wants to minimize latency and cost while maintaining high accuracy. Which Google Cloud generative AI offering should they use?

A.Vertex AI Model Garden with PaLM 2

B.Vertex AI Imagen

C.Vertex AI Codey APIs

D.Vertex AI Gemini API

Answer: D

Gemini offers multimodal capabilities and is optimized for both text generation and comprehension tasks.

Why this answer

Option D is correct because Gemini is a multimodal model that can handle both product descriptions (text) and policy questions, and Vertex AI offers optimized inference for latency and cost. Option A (PaLM 2 via Model Garden) is possible, but Gemini is more modern and efficient. Option B (Imagen) is for image generation, not text. Option C (Codey) is for code generation.

147

MCQ · hard

A team has developed a generative AI model for real-time translation. The evaluation metrics and business requirements are shown. Which business decision is most appropriate given the trade-offs?

A.Accept the model as-is because all other metrics are within limits.

B.Optimize the model for cost efficiency, even if accuracy drops slightly to 90%.

C.Prioritize latency reduction even if it increases cost.

D.Reduce accuracy to 85% to achieve both latency and cost targets.

Answer: B

Cost is the only metric out of range; minor accuracy loss is acceptable.

Why this answer

Option B is correct because the business requirements prioritize cost efficiency as the primary constraint, and the model currently exceeds the cost target. A slight accuracy drop to 90% (still within acceptable limits) allows cost to be reduced, aligning with the core business goal. The trade-off is acceptable since latency and other metrics remain within bounds, and accuracy at 90% still meets the minimum threshold for real-time translation quality.

Exam trap

Google Cloud often tests the ability to prioritize business constraints over model perfection, and the trap here is assuming that accuracy must be preserved at all costs, when in fact cost efficiency is the binding requirement and a small accuracy trade-off is acceptable.

How to eliminate wrong answers

Option A is wrong because accepting the model as-is ignores the fact that cost exceeds the business requirement, which is a critical failure for deployment at scale. Option C is wrong because prioritizing latency reduction, even if it increases cost, directly violates the cost constraint and does not address the primary business need for cost efficiency. Option D is wrong because reducing accuracy to 85% is unnecessary; the cost target can likely be met with a smaller accuracy drop (e.g., to 90%), and 85% may fall below the acceptable quality threshold for real-time translation, risking user trust.

148

MCQ · medium

A company is using Vertex AI to generate marketing copy. They notice that the output sometimes contains factual inaccuracies. Which parameter adjustment is most likely to improve factual accuracy?

A.Decrease the temperature parameter.

B.Increase the max_output_tokens parameter.

C.Increase the top_p parameter.

D.Add a post-processing step to verify facts using a database.

Answer: A

Lower temperature reduces randomness, making output more factual.

Why this answer

Decreasing the temperature parameter reduces the randomness of the model's output, making it more deterministic and less likely to generate creative but factually incorrect content. Lower temperature (e.g., 0.1) forces the model to choose higher-probability tokens, which aligns with more factual and consistent responses, especially in tasks like marketing copy where accuracy is critical.
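
The effect of temperature on token selection can be seen in a small softmax sketch. The logits are made-up numbers for three hypothetical candidate tokens, and the function name is an assumption. Note that a lower temperature only concentrates probability on the tokens the model already favors; it does not add facts the model lacks.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens

default_probs = softmax_with_temperature(logits, 1.0)
cold_probs = softmax_with_temperature(logits, 0.1)

print("T=1.0:", [round(p, 3) for p in default_probs])
print("T=0.1:", [round(p, 3) for p in cold_probs])
```

At the lower temperature nearly all of the probability mass lands on the top-scoring token, which is why sampling becomes close to deterministic.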

Exam trap

Google Cloud often tests the misconception that increasing output length or diversity (via max_output_tokens or top_p) improves quality, when in fact a higher top_p increases randomness and longer outputs leave more room for hallucination, whereas lowering temperature is the direct lever among these parameters for factual accuracy.

How to eliminate wrong answers

Option B is wrong because increasing max_output_tokens only extends the length of the generated text, which can increase the chance of hallucinations as the model continues generating beyond its reliable context window; it does not improve factual accuracy. Option C is wrong because increasing top_p (nucleus sampling) allows the model to consider a larger set of probable tokens, increasing diversity and randomness, which can worsen factual inaccuracies rather than reduce them. Option D is wrong because adding a post-processing step to verify facts using a database is a valid engineering solution but is not a parameter adjustment of the generative model itself; the question specifically asks for a parameter adjustment, and this option represents a workflow change, not a model parameter.

149

MCQ · hard

An organization is deploying a summarization model on Vertex AI and needs to ensure that the model's responses are consistent and avoid hallucinations. They have a labeled dataset of source documents and human-written summaries. Which approach would best align the model with their quality requirements?

A.Deploy the model with a larger max_output_tokens

B.Use prompt engineering with few-shot examples

C.Increase the temperature to 0.9

D.Perform supervised fine-tuning using their labeled dataset

Answer: D

Fine-tuning adapts the model to the specific summarization style and reduces errors.

Why this answer

Supervised fine-tuning on a high-quality dataset specific to the task reduces hallucinations and improves consistency.

150

MCQ · hard

Refer to the exhibit. A team's IAM policy for Vertex AI includes the following binding. They can deploy models but cannot create tuning jobs. Which statement is true?

A.The developer needs the aiplatform.admin role

B.The aiplatform.user role overrides the modelUser role

C.The aiplatform.user role lacks permission to create tuning jobs

D.The policy is missing the aiplatform.specialist role

Answer: C

Missing aiplatform.tuningJobs.create permission.

Why this answer

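Option C is correct because the roles/aiplatform.user role does not include the permission to create tuning jobs (aiplatform.tuningJobs.create). Option A is wrong because the admin role is not needed. Option B is wrong because IAM roles are additive, so one role does not override another. Option D is wrong because the specialist role doesn't exist.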
