Knowledge + Practice

Google Cloud Generative AI Leader Generative AI Leader (Generative AI Leader) — Questions 451–500

500 questions total · 7pages · All types, answers revealed

Take a mock exam Exam hub

Page 7 of 7

451

MCQmedium

A financial services firm is using a foundation model on Vertex AI to generate investment summaries from quarterly reports. The summaries are accurate but often miss key financial metrics and trends. The team cannot afford to fine-tune the model frequently. Which technique should they use to improve the completeness and relevance of the summaries without modifying the model?

A.Increase temperature to 0.9 to encourage more creative outputs.

B.Provide three few-shot examples in the prompt that highlight the desired metrics.

C.Set stop sequences to [' '] to ensure the model finishes each paragraph.

D.Lower top_p to 0.5 to reduce the sampling pool.

AnswerB

Few-shot examples condition the model to replicate the structure and content of the examples.

Why this answer

Option C is correct because adding few-shot examples that specifically include the desired metrics (e.g., revenue growth, profit margins) trains the model to include those details. Option A is wrong because increasing temperature increases randomness, which could omit key facts. Option B is wrong because stopping at newlines doesn't guarantee completeness.

Option D is wrong because adjusting top_p does not target completeness.

Full explanation →

452

Multi-Selectmedium

A healthcare chatbot must avoid hallucinations. Which TWO techniques should the team implement? (Choose two.)

Select 2 answers

A.Set frequency penalty to 0.0

B.Use chain-of-thought prompting

C.Use higher temperature

D.Increase top_k to 50

E.Enable grounding with a knowledge base

AnswersB, E

Encourages step-by-step reasoning, reducing errors.

Why this answer

Grounding with a knowledge base ensures responses are based on retrieved facts, and chain-of-thought prompting improves reasoning steps. Higher temperature increases randomness, top_k increases diversity, and frequency penalty affects repetition, not hallucinations.

Full explanation →

453

MCQmedium

A developer is using the Vertex AI Gemini API to generate product descriptions. They get a 400 error 'INVALID_ARGUMENT: The model's maximum input token limit is 8192.' What is the most likely issue?

A.The prompt is too long

B.The API key is invalid

C.The output tokens are too high

D.The model is not available in the region

AnswerA

The error explicitly states the input token limit is exceeded.

Why this answer

Option A is correct because the error indicates the input prompt exceeds the token limit. Option B is wrong because the error says 'input token limit', not output. Option C is wrong because an invalid API key would give a different error (e.g., PERMISSION_DENIED).

Option D is wrong because model availability would give a different error.

Full explanation →

454

MCQeasy

A company is choosing between Google's Gemini API and an open-source model. Which factor is most important for a business with limited ML expertise?

A.Ease of integration and availability of support

B.Model parameter count

C.Cost per token

D.Community size

AnswerA

Limited ML expertise means the team needs a solution that is easy to integrate and comes with reliable support.

Why this answer

Option B is correct because ease of integration and support reduces the need for in-house ML expertise. Option A (cost) is important but secondary to feasibility. Option C (parameter count) is not relevant to ease of use.

Option D (community size) is helpful but not as critical as managed support.

Full explanation →

455

MCQhard

What is the most likely cause of the error?

A.The predict schema must be stored in the same bucket as the model artifacts and referenced without the full gs:// URI

B.The display name contains a hyphen which is not allowed

C.The container image URI is incorrect

D.The region us-central1 does not support TensorFlow models

AnswerA

The schema should be a relative path within the artifact URI.

Why this answer

The error occurs because the Vertex AI Predict schema must be stored in the same Cloud Storage bucket as the model artifacts, and when referenced in the model upload request, it should use a relative path (without the full `gs://` URI). Using the full URI causes a parsing failure, as Vertex AI expects the schema to be co-located with the model artifacts for validation and deployment.

Exam trap

Google Cloud often tests the nuance that Vertex AI expects schema files to be co-located with model artifacts and referenced without the full `gs://` URI, causing candidates to incorrectly assume the error is due to region limitations or container image issues.

How to eliminate wrong answers

Option B is wrong because hyphens are allowed in display names for Vertex AI models; the constraint is on the model ID (auto-generated) and display names can contain hyphens, underscores, and alphanumeric characters. Option C is wrong because the container image URI is syntactically correct and points to a valid Vertex AI pre-built serving image for TensorFlow; the error is not related to the container URI format. Option D is wrong because us-central1 fully supports TensorFlow models on Vertex AI; it is one of the primary regions for AI Platform and Vertex AI model deployment.

Full explanation →

456

MCQhard

A team configures a Vertex AI prediction request as shown. Users report that the model sometimes produces incoherent or off-topic responses despite moderate settings. What is the most likely cause?

A.The temperature is too high for coherent responses.

B.The maxOutputTokens is too low.

C.The safety threshold blocks too much content.

D.The topK value is too low.

AnswerA

High temperature introduces randomness, reducing coherence.

Why this answer

The combination of high temperature (0.9) and high topK (40) increases randomness and diversity, leading to incoherent outputs. Lowering temperature and topK can improve coherence. Safety settings are unrelated, and maxOutputTokens is a limit, not a cause of incoherence.

Full explanation →

457

MCQhard

A healthcare company is using a fine-tuned version of PaLM 2 on Vertex AI to generate clinical notes from doctor-patient conversations. The model was fine-tuned on a dataset of 10,000 de-identified transcripts and corresponding notes. During testing, the generated notes are grammatically correct and well-structured, but they often contain subtle inaccuracies: for example, they might mention a medication that was not discussed, or omit a key symptom. The team has already tried increasing the training epochs and adjusting learning rates, with minimal improvement. They need a solution that can be implemented quickly to improve factual accuracy without retraining the entire model. The team has access to a large archive of verified clinical notes and a small set of recent conversation-to-note pairs that have been manually reviewed and corrected. The inference pipeline currently uses a single call to the model with the conversation transcript as input. What should the team do?

A.Implement retrieval-augmented generation (RAG) by retrieving similar verified notes from the archive and providing them as context in the prompt.

B.Decrease the temperature to 0.1 to reduce randomness and force the model to stick to the input.

C.Use prompt engineering to instruct the model to only include information explicitly mentioned in the conversation.

D.Add a human-in-the-loop step to review and correct every generated note before use.

AnswerA

RAG grounds the generation in factual examples, directly reducing inaccuracies without retraining.

Why this answer

Option A is correct because retrieval-augmented generation (RAG) directly addresses the core issue of factual inaccuracy without retraining. By retrieving verified clinical notes similar to the current conversation from the archive and injecting them as context in the prompt, the model gains access to ground-truth examples that anchor its output to factual details. This approach leverages the team's existing archive and small set of corrected pairs to provide relevant, accurate context, improving precision without modifying the model's weights.

Exam trap

The trap here is that candidates often assume factual inaccuracy is solely a randomness issue (temperature) or a prompt instruction problem, overlooking that the model's parametric knowledge is insufficient and needs external grounding via retrieval augmentation.

How to eliminate wrong answers

Option B is wrong because decreasing temperature to 0.1 reduces randomness but does not fix factual inaccuracies stemming from the model's training data or lack of context; it may actually cause the model to become overly deterministic and repeat hallucinations from its fine-tuning. Option C is wrong because prompt engineering to instruct the model to only include explicitly mentioned information is a superficial fix that cannot overcome the model's tendency to hallucinate or omit details when the training data or fine-tuning process has embedded those inaccuracies; it lacks the grounding provided by external verified data. Option D is wrong because adding a human-in-the-loop step to review every note is a manual, non-scalable solution that does not improve the model's output quality at inference time and fails to address the root cause of factual inaccuracy; it also contradicts the requirement for a quick implementation without retraining.

Full explanation →

458

MCQeasy

A startup wants to use a pre-trained model to generate product descriptions without training. Which Google Cloud service should they use?

A.Vertex AI Prediction

B.AI Platform Training

C.Cloud AutoML

D.Vertex AI Generative AI Studio

AnswerD

Vertex AI Generative AI Studio is designed for accessing and experimenting with foundation models for generative tasks.

Why this answer

Vertex AI Generative AI Studio provides access to pre-trained foundation models like Gemini for text generation via a user interface and API, making it the easiest choice for generating product descriptions without training.

Full explanation →

459

Multi-Selecthard

A financial services firm must comply with regulations when using gen AI. Which two measures are critical?

Select 2 answers

A.Implement audit trails

B.Deploy without risk assessment

C.Use a closed-source model

D.Use explainable AI

E.Use only synthetic data

AnswersA, D

Audit trails provide accountability and support regulatory reviews.

Why this answer

Audit trails are critical for compliance because they provide a tamper-evident, chronological record of all AI model inputs, outputs, and decisions. This enables firms to demonstrate regulatory adherence (e.g., under GDPR or SOX) by reconstructing the exact sequence of events that led to a specific AI-generated output, which is essential for accountability and forensic review.

Exam trap

Google Cloud often tests the misconception that 'closed-source models are inherently more compliant' or that 'synthetic data eliminates privacy risks,' when in reality, compliance hinges on transparency, auditability, and risk assessment rather than the model's source or data origin.

Full explanation →

460

MCQmedium

A data scientist is using Vertex AI Model-as-a-Service (MaaS) to deploy a fine-tuned open-source model. They notice high latency during inference. What is the most likely cause?

A.The model is too large for the hardware

B.The endpoint is set to autoscaling with a low minimum node count

C.The model is not quantized

D.The region is incorrect

AnswerB

Autoscaling with low min nodes causes cold start latency.

Why this answer

Option C is correct because a low minimum node count in autoscaling can cause cold starts and high latency. Option A is wrong because model size is managed by MaaS and typically handled. Option B is wrong because quantization affects model size and speed, but the issue is more likely autoscaling.

Option D is wrong because region does not directly cause latency spikes.

Full explanation →

461

MCQmedium

Refer to the exhibit. A developer runs this command. What is the primary purpose?

A.Create a training pipeline

B.Deploy a model to an endpoint

C.Train a model

D.Upload a model artifact to Model Registry

AnswerD

The 'models upload' command registers a model with the specified container and artifacts.

Why this answer

The command shown in the exhibit is `az ml model create --name my-model --path ./model.pkl --registry-name myregistry`. This command uploads a local model artifact (model.pkl) to the Azure Machine Learning Model Registry, which is a centralized repository for versioning and managing trained models. It does not initiate training, deployment, or pipeline creation; its sole purpose is to register the model artifact for later use.

Exam trap

Google Cloud often tests the distinction between model registration (uploading a trained artifact) and model training or deployment, so the trap here is that candidates confuse the `az ml model create` command with initiating a training job or deployment, when it only stores the model artifact for versioning and reuse.

How to eliminate wrong answers

Option A is wrong because creating a training pipeline requires a command like `az ml job create` or `az ml pipeline create`, not `az ml model create`. Option B is wrong because deploying a model to an endpoint uses commands such as `az ml online-endpoint create` and `az ml online-deployment create`, which involve specifying compute targets and scoring scripts, not just uploading a model file. Option C is wrong because training a model is performed via a training job (e.g., `az ml job create` with a training script and compute target), not by registering an already-trained artifact.

Full explanation →

462

MCQhard

A media company is using Vertex AI Imagen to generate marketing images. The output frequently contains unrealistic artifacts, especially in human faces. The team has fine-tuned the model using their brand assets. What is the most likely cause and recommended fix?

A.Safety filters are too aggressive; reduce them.

B.Negative prompts are missing; always include 'unrealistic'.

C.The fine-tuning dataset is too small or too homogeneous; augment and diversify the training data.

D.Inference steps are too low; increase to 100.

AnswerC

Overfitting to limited data causes artifacts; more varied data helps generalization.

Why this answer

Option D is correct. Overfitting to a small dataset can cause artifacts. Common symptoms include unrealistic details.

Option A is less likely because safety filters usually block rather than degrade quality. Option B (low inference steps) could cause quality loss but typically not specific artifacts. Option C (negative prompt) might help but is not the root cause.

Full explanation →

463

MCQmedium

A legal firm uses a generative AI to draft contracts. They want the output to follow a specific clause structure. Which technique should they use in the prompt?

A.Include a system instruction that defines the required format.

B.Increase temperature to encourage variance.

C.Use grounding to pull from a database of contracts.

D.Set stop sequences to end generation at certain points.

AnswerA

System instructions can specify structure, ensuring adherence.

Why this answer

System instructions set overarching rules for output structure. Option A is wrong because temperature doesn't enforce structure. Option C is wrong because grounding retrieves facts, not structure.

Option D is wrong because adjusting stop sequences only ends generation.

Full explanation →

464

MCQmedium

A marketing agency uses a generative AI model to create slogans for ad campaigns. The model outputs generic slogans like 'Quality you can trust' that lack originality. The agency has a library of past award-winning slogans and wants to generate more creative and brand-specific outputs. They have a requirement that the model must not produce slogans longer than 15 words. Which technique should they prioritize?

A.Use few-shot prompting with 3-5 examples of award-winning slogans in the prompt.

B.Set max tokens to 15 to force shorter, potentially more punchy slogans.

C.Increase the temperature to 1.2 to encourage more creative word combinations.

D.Fine-tune the model on the library of award-winning slogans.

AnswerA

Few-shot examples teach the desired style and creativity directly.

Why this answer

Option C is correct because providing few-shot examples of award-winning slogans in the prompt directly inspires creativity and style matching. Option A is wrong because increasing temperature may produce nonsense. Option B is wrong because fine-tuning is heavy and not needed.

Option D is wrong because token limit only truncates length, doesn't improve creativity.

Full explanation →

465

MCQmedium

A healthcare organization wants to use generative AI for medical report summaries. What is the primary concern?

A.Ensuring HIPAA compliance and data security when using cloud AI services

B.The model's ability to generate fluent and coherent summaries

C.Minimizing the cost of each API call to stay within budget

D.Latency of responses for real-time use cases

AnswerA

Generative AI models processing PHI must be HIPAA-compliant, requiring a signed Business Associate Agreement (BAA) with Google Cloud.

Why this answer

The primary concern for a healthcare organization using generative AI for medical report summaries is ensuring HIPAA compliance and data security when using cloud AI services. Medical data is protected health information (PHI), and any cloud-based AI service must have a Business Associate Agreement (BAA) in place and enforce encryption at rest and in transit to avoid regulatory penalties and data breaches.

Exam trap

Google Cloud often tests the misconception that technical performance (fluency, cost, latency) is the top priority, when in regulated industries like healthcare, compliance and data security are the non-negotiable primary concerns.

How to eliminate wrong answers

Option B is wrong because while fluency and coherence are important for summary quality, they are secondary to the legal and security obligations of handling PHI; a fluent summary that leaks data is non-compliant. Option C is wrong because cost minimization is an operational concern, not the primary risk; HIPAA violations carry fines up to $50,000 per violation, far outweighing API call costs. Option D is wrong because latency is a performance metric relevant for real-time use, but medical report summarization is typically asynchronous or batch-processed, and compliance takes precedence over speed.

Full explanation →

466

MCQhard

A team is deploying a real-time chat application using Gemini. They need to ensure the model does not generate harmful content. Which safety filter configuration should they use?

A.Set safety_threshold high for all categories

B.Use grounding with a safe knowledge base

C.Implement custom safety attribute filters with low thresholds

D.Fine-tune the model with safe examples

AnswerA

High thresholds block more harmful content, providing stricter safety.

Why this answer

Option D is correct because setting high safety thresholds for all categories blocks more harmful content. Option A is wrong because model tuning is about task adaptation, not safety. Option B is wrong because grounding does not prevent harmful output.

Option C is wrong because custom safety attributes are for additional categories, but still need thresholds.

Full explanation →

467

MCQhard

A developer receives the above JSON response from a Vertex AI language model. The output content is correct, but the developer expected the model to not answer geography questions. What should the developer do to prevent the model from responding to geography queries?

A.Adjust the safety filter thresholds for the 'Toxic' category

B.Enable Vertex AI Grounding with a geography knowledge base

C.Configure a safety filter for the 'Geography' category

D.Add a system instruction to not answer geography questions

AnswerC

Safety filters can block specific categories like geography.

Why this answer

Option C is correct because Vertex AI provides safety filters that can be configured to block model responses in specific categories, including a 'Geography' category. By adjusting the safety filter thresholds for this category, the developer can prevent the model from answering geography queries while still allowing it to respond to other topics. This is a direct and effective method to enforce content restrictions without modifying the model's underlying behavior.

Exam trap

The trap here is that candidates often assume system instructions are sufficient for content restrictions, but Cisco tests the understanding that safety filters are the correct mechanism for enforcing categorical blocks, as they operate at a lower level and cannot be bypassed by prompt engineering.

How to eliminate wrong answers

Option A is wrong because adjusting safety filter thresholds for the 'Toxic' category only controls responses related to toxicity (e.g., hate speech, harassment), not geography-specific content; it does not address the requirement to block geography questions. Option B is wrong because enabling Vertex AI Grounding with a geography knowledge base would actually enhance the model's ability to answer geography queries by providing additional context, which is the opposite of what the developer wants. Option D is wrong because while adding a system instruction to not answer geography questions might influence the model, it is not a guaranteed enforcement mechanism—models can still override or ignore instructions, especially if the prompt is rephrased; safety filters provide a more reliable, configurable block.

Full explanation →

468

MCQeasy

A product team uses a translation model to convert English product descriptions into French. The model mixes formal and informal French dialects. Which simple prompt modification likely solves this?

A.Increase the temperature to encourage more consistent output.

B.Add a system prompt specifying 'Use only formal French with no informal expressions.'

C.Fine-tune the model on a corpus of formal French texts.

D.Provide a few-shot example of a formal French translation in the prompt.

AnswerB

Prompt engineering directly addresses the style issue.

Why this answer

Option D is correct because adding a requirement in the prompt to 'Use formal French' directly instructs the model on the desired style. Option A is wrong because temperature is for creativity, not dialect. Option B is wrong because few-shot with formal examples could help, but it's not a simple modification; a system prompt is simpler.

Option C is wrong because fine-tuning is overkill.

Full explanation →

469

MCQhard

An MLOps engineer wants to implement continuous evaluation of a generative model in production. Which Vertex AI component should they use?

A.Vertex AI Model Monitoring

B.Vertex AI Feature Store

C.Vertex AI Prediction

D.Vertex AI Pipelines

AnswerA

Model Monitoring provides continuous evaluation of model metrics and alerts on degradation.

Why this answer

Vertex AI Model Monitoring is the correct component because it provides continuous evaluation of model performance in production, including detecting prediction drift, data drift, and feature attribution drift. For generative models, it can monitor output quality and safety metrics over time, alerting engineers to degradation or shifts in model behavior without requiring manual intervention.

Exam trap

Google Cloud often tests the distinction between monitoring (ongoing evaluation of deployed models) and serving (handling inference requests), leading candidates to mistakenly choose Vertex AI Prediction for continuous evaluation tasks.

How to eliminate wrong answers

Option B is wrong because Vertex AI Feature Store is designed for managing, storing, and serving feature data for training and predictions, not for monitoring model performance or evaluating outputs in production. Option C is wrong because Vertex AI Prediction handles model serving and inference requests, but it does not include built-in continuous evaluation or drift detection capabilities. Option D is wrong because Vertex AI Pipelines orchestrates ML workflows for training and batch prediction, but it is not a real-time monitoring service for production model evaluation.

Full explanation →

470

Multi-Selecteasy

Which THREE are essential components of a responsible AI strategy for GenAI? (Select three.)

Select 3 answers

A.Use of only open-source models

B.Maximum model size

C.Human oversight for critical decisions

D.Model transparency and explainability

E.Bias detection and mitigation

AnswersC, D, E

Human oversight prevents harmful automated decisions and ensures ethical use.

Why this answer

Human oversight for critical decisions (C) is essential because GenAI models can produce plausible but incorrect or harmful outputs. A responsible AI strategy mandates that a human-in-the-loop reviews high-stakes outputs, such as medical diagnoses or financial approvals, to prevent automated errors from causing real-world harm. This aligns with the principle of human accountability in AI governance frameworks like the NIST AI Risk Management Framework.

Exam trap

Google Cloud often tests the misconception that technical attributes like model size or open-source licensing are core to responsible AI, when in fact the focus is on governance practices like transparency, bias mitigation, and human oversight.

Full explanation →

471

Multi-Selectmedium

Which TWO actions can reduce the cost of using Vertex AI Gemini API? (Choose two.)

Select 2 answers

A.Use batch prediction instead of online

B.Increase the max output tokens

C.Use grounding with Google Search

D.Use a larger model

E.Use context caching

AnswersA, E

Batch prediction is generally cheaper than online prediction.

Why this answer

Options A and B are correct. Context caching reduces repeated input costs, and batch prediction is cheaper than online. Options C, D, and E are incorrect because increasing max output tokens may increase cost, using larger models costs more, and grounding with Google Search incurs additional costs.

Full explanation →

472

MCQmedium

A team is deploying a text generation model for legal document review. They observe that the model occasionally generates factually incorrect legal citations. Which approach best reduces this issue?

A.Implement retrieval-augmented generation (RAG) with a verified legal database.

B.Lower the temperature to 0.0.

C.Use a larger base model.

D.Increase the max output tokens.

AnswerA

RAG retrieves factual information from verified sources, reducing hallucinations.

Why this answer

Option C is correct because retrieval-augmented generation (RAG) with a verified legal database grounds the model in factual, up-to-date sources, directly addressing incorrect citations. Option A (lowering temperature) reduces randomness but does not prevent hallucination. Option B (increasing max tokens) has no effect on factual accuracy.

Option D (using a larger model) may not guarantee correctness without proper grounding.

Full explanation →

473

MCQeasy

A company is deploying a generative AI model for medical advice. What is the most important consideration?

A.Model latency

B.Safety and fairness

C.Model size

D.Cost of inference

AnswerB

Patient safety and avoiding bias are the top priorities.

Why this answer

In medical advice applications, a generative AI model's outputs can directly impact patient health, making safety and fairness the paramount consideration. Incorrect or biased advice could lead to misdiagnosis or harm, outweighing performance metrics like latency or cost. Regulatory frameworks such as HIPAA and FDA guidelines for clinical decision support further mandate rigorous validation of model safety and fairness before deployment.

Exam trap

Google Cloud often tests the misconception that technical performance metrics like latency or cost are the primary concerns in high-stakes domains, when in fact ethical and safety considerations take precedence.

How to eliminate wrong answers

Option A is wrong because model latency, while important for user experience, is secondary to ensuring the advice is safe and unbiased; a fast but harmful response is unacceptable in healthcare. Option C is wrong because model size correlates with computational resources and potential capability, but does not inherently guarantee safety or fairness; a larger model may amplify biases or generate more confident but incorrect advice. Option D is wrong because cost of inference is a business consideration that must be balanced against safety requirements, but it is not the most critical factor when human lives are at stake.

Full explanation →

474

MCQmedium

A travel company fine-tuned a language model on customer chat logs to provide travel recommendations. After deployment, they receive complaints that the model sometimes generates inappropriate or offensive content. What is the most effective approach to improve output safety while preserving overall performance?

A.Modify the system instruction to request polite responses only

B.Retrain the model on a larger dataset of chat logs

C.Reduce the temperature to 0.0

D.Add a post-processing safety classifier that filters or rewrites unsafe outputs

AnswerD

A safety classifier directly catches and mitigates harmful content without modifying the base model.

Why this answer

Option B is correct because applying a safety classifier as an output filter catches harmful content without retraining. Option A is wrong because retraining on more data may not address specific safety issues. Option C is wrong because lowering temperature reduces creativity but not necessarily offensive content.

Option D is wrong because system instruction alone is often insufficient for robust safety.

Full explanation →

475

MCQeasy

A developer is using Vertex AI PaLM 2 to generate product descriptions. The output is often too verbose and includes irrelevant details. Which technique should the developer apply?

A.Set top_p to 0.1

B.Enable safety filters

C.Use few-shot prompting with examples of concise descriptions

D.Increase temperature to 0.9

AnswerC

Guides the model to match the style of provided examples.

Why this answer

Option C is correct because the developer needs to constrain the model's output to be concise and relevant. Few-shot prompting provides the model with explicit examples of the desired output format (concise descriptions), guiding it to mimic that style and length. This directly addresses verbosity and irrelevant details without altering the model's fundamental randomness or safety settings.

Exam trap

The trap here is that candidates confuse hyperparameter tuning (top_p, temperature) with prompt engineering techniques, assuming that reducing randomness (top_p) or increasing creativity (temperature) can fix verbosity, when only explicit examples in the prompt can reliably enforce a specific output style.

How to eliminate wrong answers

Option A is wrong because setting top_p to 0.1 reduces the cumulative probability threshold for token sampling, which makes the output less diverse and more deterministic, but it does not teach the model to be concise or omit irrelevant details—it only narrows the pool of possible next tokens. Option B is wrong because safety filters block harmful or sensitive content (e.g., toxicity, violence), not verbose or irrelevant details; they do not control output length or relevance. Option D is wrong because increasing temperature to 0.9 increases randomness and creativity in token selection, which would likely make the output even more verbose and include more irrelevant details, the opposite of what is needed.

Full explanation →

476

MCQhard

Refer to the exhibit. The team changed the generation parameters to reduce output variability. However, summaries now often repeat the same phrases. Which parameter change is most likely causing the repetition?

A.Reducing top_p from 0.95 to 0.85

B.Reducing temperature from 0.7 to 0.2

C.Using the same model text-bison@002

D.Reducing top_k from 40 to 10

AnswerB

Low temperature increases determinism and repetition.

Why this answer

Lowering temperature to 0.2 makes the model more deterministic, increasing repetition. Option A is wrong because top_k reduction also contributes to determinism. Option B is wrong because top_p reduction also narrows token selection.

Option D is wrong because the model is the same.

Full explanation →

477

Multi-Selecthard

Which THREE considerations are critical when deploying a generative AI model using Vertex AI Endpoints for a latency-sensitive application? (Choose THREE.)

Select 3 answers

A.Model size and architecture

B.Number of model versions

C.GPU type and number

D.Autoscaling configuration

E.Number of model instances

AnswersA, C, D

Larger models introduce higher latency.

Why this answer

Model size and architecture directly impact inference latency because larger models with more parameters require more computation per request. For latency-sensitive applications, choosing a smaller or distilled model (e.g., Gemma 2B vs. 27B) or using quantization can reduce response times. Vertex AI Endpoints serve the model as-is, so the model's inherent computational cost is the primary driver of per-request latency.

Exam trap

Google Cloud often tests the distinction between configuration choices that affect latency (GPU type, autoscaling, model size) versus operational or lifecycle management choices (version count, manual instance count) that do not directly impact per-request response time.

Full explanation →

478

MCQhard

You are the Generative AI lead for a global retail company that is building a customer service chatbot using a large language model (LLM) on Vertex AI. The chatbot will handle order inquiries, returns, and product recommendations. The company has a multi-cloud strategy and uses Google Cloud for AI workloads, but customer data is stored in AWS DynamoDB and on-premises databases. The legal team mandates that no customer personally identifiable information (PII) is sent to the LLM for training or inference, and that the model's responses must comply with GDPR and CCPA. The engineering team has proposed using a fine-tuned version of Gemini with retrieval-augmented generation (RAG) from a vector database. During a pilot, the chatbot occasionally hallucinates and invents order details, and response latency is over 10 seconds for complex queries. The budget for this project is limited, and the team needs to balance cost, compliance, and performance. Which course of action should you recommend?

A.Implement a two-model architecture: a smaller model for simple queries and a larger model for complex queries, with a router based on query complexity.

B.Switch to a purely fine-tuned model without RAG, and rely on fine-tuning data that excludes PII to ensure compliance.

C.Use a larger, more powerful LLM with chain-of-thought prompting to improve reasoning and reduce hallucinations, and cache frequent queries to reduce latency.

D.Ground the model with a curated knowledge base from DynamoDB and on-premises data, and use prompt engineering to explicitly instruct the model not to generate PII. Implement a PII detection and redaction layer before sending queries to the LLM.

AnswerD

Grounding reduces hallucinations by restricting responses to verified data, and prompt engineering with PII detection ensures compliance without significant latency increase or budget overrun.

Why this answer

Option B is correct because grounding the model with a knowledge base and using prompt engineering to restrict PII directly addresses hallucinations and compliance without high cost or latency. Option A is too complex and expensive for limited budget. Option C increases latency further due to multi-hop reasoning.

Option D removes the RAG capability, increasing hallucination risk.

Full explanation →

479

MCQhard

Refer to the exhibit. An administrator creates this IAM policy for a Vertex AI project. What is the effect of this policy?

A.Alice can view models; Bob can delete models

B.Alice can deploy pre-trained models; Bob can create and manage custom model code

C.Both have full access to all Vertex AI resources

D.Alice can train models; Bob can deploy models

AnswerB

aiplatform.user includes deployment permissions; customCodeModelAdmin covers custom code management.

Why this answer

Option B is correct because the IAM policy grants Alice the `aiplatform.models.get` permission (allowing her to view and deploy pre-trained models) and grants Bob the `aiplatform.models.create` and `aiplatform.models.update` permissions (allowing him to create and manage custom model code). The policy uses separate bindings for each user, with specific roles that align with these actions.

Exam trap

Google Cloud often tests the distinction between specific IAM permissions (e.g., `get` vs. `create` vs. `delete`) and the common misconception that viewing a model implies full access or that creating a model implies the ability to deploy it.

How to eliminate wrong answers

Option A is wrong because Alice is granted `aiplatform.models.get`, which allows viewing models but not deleting them; Bob is granted `aiplatform.models.create` and `aiplatform.models.update`, which allow creating and updating models but not deleting them. Option C is wrong because the policy does not grant full access to all Vertex AI resources; it only grants specific permissions on models, and neither user has permissions for other resources like datasets or endpoints. Option D is wrong because Alice's permission (`aiplatform.models.get`) does not include training models, and Bob's permissions (`aiplatform.models.create` and `aiplatform.models.update`) are for custom model code management, not deploying models.

Full explanation →

480

MCQmedium

A company is using Vertex AI to generate email responses. They want to ensure sensitive customer data (PII) is not included in the output. What is the most effective approach?

A.Use a system prompt instructing the model to avoid PII.

B.Fine-tune the model on a dataset that excludes PII.

C.Manually review each output before sending.

D.Configure safety filters to block PII categories.

AnswerD

Safety filters can automatically block PII.

Why this answer

Option D is correct because safety filters in Vertex AI are specifically designed to block categories of harmful content, including PII, at the model's output layer. This provides a deterministic, automated guardrail that prevents sensitive data from being generated, unlike prompt-based instructions which can be overridden by the model's training. Safety filters operate on the model's response before it is returned, ensuring PII is caught even if the model attempts to generate it.

Exam trap

The trap here is that candidates assume a system prompt (Option A) is sufficient to control model behavior, but Cisco tests the understanding that prompts are not enforceable guardrails, whereas safety filters are a hard technical control.

How to eliminate wrong answers

Option A is wrong because system prompts are merely instructions and do not guarantee the model will comply; the model can still generate PII due to its training data or adversarial inputs. Option B is wrong because fine-tuning on a dataset that excludes PII does not prevent the model from generating PII from its pre-trained knowledge, and fine-tuning is costly and may not cover all edge cases. Option C is wrong because manual review is not scalable, introduces latency, and is prone to human error, making it ineffective for high-volume email generation.

Full explanation →

481

MCQhard

Refer to the exhibit. This JSON describes a Vertex AI endpoint with a deployed model. Which statement about scaling is true?

A.The endpoint uses only dedicated resources, no automatic scaling

B.The endpoint will automatically scale based on GPU utilization

C.The endpoint will scale from 1 to 3 replicas based on load using automatic scaling

D.The endpoint can scale to zero when not in use

AnswerA

DedicatedResources with min/max replicas means manual scaling.

Why this answer

Option A is correct because the JSON shows that the endpoint is configured with `dedicatedResources` and no `autoscalingMetricSpecs` or `minReplicaCount`/`maxReplicaCount` fields. In Vertex AI, when you specify only `machineSpec` and a fixed `minReplicaCount` (here implicitly 1) without a `maxReplicaCount` or autoscaling metrics, the endpoint uses dedicated resources with no automatic scaling — the model will always run on exactly the number of replicas you define, regardless of load.

Exam trap

Google Cloud often tests the misconception that any endpoint with a `minReplicaCount` and `maxReplicaCount` automatically enables scaling, but the trap here is that without `autoscalingMetricSpecs`, the endpoint uses dedicated resources and does not scale dynamically — the `maxReplicaCount` is ignored if autoscaling metrics are absent.

How to eliminate wrong answers

Option B is wrong because Vertex AI automatic scaling is based on CPU utilization or custom metrics, not GPU utilization; GPU utilization is not a supported metric for autoscaling in Vertex AI endpoints. Option C is wrong because the JSON does not include `autoscalingMetricSpecs` or a `maxReplicaCount` field, which are required to enable automatic scaling from a minimum to a maximum number of replicas; without these, the endpoint uses a fixed replica count. Option D is wrong because Vertex AI endpoints with dedicated resources cannot scale to zero; scaling to zero is only possible with private endpoints using manual scaling or when using Vertex AI Prediction with a custom container that supports scale-to-zero, but dedicated resources always maintain at least one replica.

Full explanation →

482

MCQeasy

A medical imaging team wants to generate synthetic X-ray images to augment a training dataset for a rare disease. Which type of generative model is most suitable for generating high-fidelity, realistic medical images?

A.Generative Adversarial Network (GAN)

B.Diffusion model

C.Variational Autoencoder (VAE)

D.Autoregressive transformer (e.g., PixelCNN)

AnswerB

Diffusion models currently produce the highest quality images.

Why this answer

Diffusion models are the most suitable for generating high-fidelity, realistic medical images because they iteratively denoise random noise into a coherent image through a learned reverse diffusion process, which produces superior sample quality and diversity compared to GANs, especially for complex, high-dimensional data like X-rays. Their training stability and ability to model fine-grained anatomical details without mode collapse make them the current state-of-the-art for medical image synthesis.

Exam trap

Google Cloud often tests the misconception that GANs are the default choice for image generation due to their popularity, but the trap here is that for high-fidelity medical imaging, diffusion models are preferred because they avoid GANs' mode collapse and training instability, which are critical in safety-sensitive domains.

How to eliminate wrong answers

Option A is wrong because GANs, while capable of generating realistic images, suffer from training instability, mode collapse, and difficulty in capturing the full diversity of medical image distributions, often producing artifacts that are unacceptable in clinical contexts. Option C is wrong because VAEs generate blurry and less detailed images due to their reliance on a variational lower bound and a Gaussian prior, which fails to capture the sharp edges and fine textures critical in X-ray images. Option D is wrong because autoregressive transformers like PixelCNN generate images pixel-by-pixel, which is computationally prohibitive for high-resolution medical images and lacks the global coherence and efficiency of diffusion models.

Full explanation →

483

MCQeasy

A company is using Vertex AI to generate customer support summaries from chat logs. They notice that the summaries sometimes include irrelevant details from the conversation. Which technique should they use to reduce irrelevant details?

A.Use a higher top-k value.

B.Fine-tune the model on a large dataset of general conversations.

C.Add a system instruction to focus on key points.

D.Increase the temperature parameter.

AnswerC

This guides the model to produce concise, relevant summaries.

Why this answer

Option A is correct because adding a system instruction to focus on key points directly guides the model to omit irrelevant details. Option B (increasing temperature) would increase randomness and potentially introduce more irrelevant content. Option C (using a higher top-k value) increases diversity of word choices, not relevance.

Option D (fine-tuning on general conversations) is not targeted and may not resolve the specific issue.

Full explanation →

484

MCQmedium

A company's generative AI model is producing biased outputs. What is the most effective mitigation strategy?

A.Use a larger model with more parameters to improve overall accuracy

B.Fine-tune the model using a balanced, representative dataset and implement output filtering

C.Use prompt engineering to instruct the model to avoid biased language

D.Increase the diversity of input samples by random sampling

AnswerB

Balanced data reduces bias during training, and filters catch biased outputs in production.

Why this answer

Fine-tuning on a balanced, representative dataset directly addresses the root cause of biased outputs by correcting the model's learned associations, while output filtering provides a safety net to catch residual bias. This combination is more effective than superficial fixes because it modifies the model's internal weights rather than just masking outputs.

Exam trap

Google Cloud often tests the misconception that prompt engineering or model scaling alone can fix bias, when in fact only retraining or fine-tuning with balanced data addresses the underlying weight distribution.

How to eliminate wrong answers

Option A is wrong because increasing model size does not inherently reduce bias; larger models can amplify biases present in training data due to higher capacity to memorize spurious correlations. Option C is wrong because prompt engineering only provides a surface-level instruction that the model may ignore or fail to generalize, especially if the bias is deeply embedded in its parameters. Option D is wrong because random sampling of inputs does not address the model's biased internal representations; it only diversifies the prompts, not the training data that caused the bias.

Full explanation →

485

MCQmedium

You are a generative AI architect for a large e-commerce company. Your team has built a product description generator using Vertex AI's text-bison model. The model is accessed via the Vertex AI API from a web application. You have set the temperature to 0.5 and top_k to 40. The team reports that the generated descriptions are often too generic and lack creativity. They want the descriptions to be more diverse and engaging. You are also concerned about cost, as each API call is billed. Which change should you recommend to increase creativity while managing cost?

A.Keep temperature at 0.5 but reduce top_k to 20.

B.Increase the temperature to 0.8 and keep top_k at 40.

C.Switch to a larger model like text-bison@002 and keep same parameters.

D.Decrease the temperature to 0.2 and increase top_k to 60.

AnswerB

Higher temperature increases diversity and creativity.

Why this answer

Increasing the temperature to 0.8 makes the model's output probability distribution flatter, which increases randomness and allows less likely tokens to be selected. This directly addresses the need for more diverse and creative descriptions. Keeping top_k at 40 ensures the model still considers a broad set of candidate tokens, balancing creativity with coherence, and does not increase API call costs since temperature and top_k are inference parameters that do not affect billing.

Exam trap

Google Cloud often tests the misconception that increasing creativity requires a larger model or more expensive resources, when in fact tuning sampling parameters like temperature and top_k is the correct, cost-neutral approach.

How to eliminate wrong answers

Option A is wrong because reducing top_k to 20 narrows the set of candidate tokens, which actually reduces diversity and can make outputs more generic, counteracting the goal of increasing creativity. Option C is wrong because switching to a larger model like text-bison@002 would increase cost per API call (larger models are billed at higher rates) without guaranteeing more creativity; creativity is controlled by sampling parameters, not model size alone. Option D is wrong because decreasing temperature to 0.2 makes the model more deterministic and conservative, reducing creativity, and increasing top_k to 60 does not compensate for the loss of randomness — the net effect is less diverse outputs.

Full explanation →

486

MCQmedium

An organization is using Vertex AI Agent Builder to create a customer service agent. They want the agent to be able to hand off to a human agent when it cannot answer a question. What should they configure in the agent's design?

A.Configure 'Slot filling' to collect more info

B.Implement a 'Confirmation' prompt for the user

C.Add an 'Escalation' intent that triggers a human handoff

D.Use a 'Fallback' intent to route to a human

AnswerC

Escalation intent is designed for human handoff.

Why this answer

Agent Builder supports 'Escalation' intent to hand off to human agents. Option B is wrong because fallback intent is for unrecognized input but not necessarily human handoff. Option C is wrong because confirmation is for confirming actions.

Option D is wrong because slot filling is for collecting parameters.

Full explanation →

487

MCQeasy

A developer is using the Gemini API to build a chatbot. They want the model to always respond in a friendly, professional tone. Which prompt engineering technique should they use?

A.Set system instructions to 'You are a friendly and professional assistant.'

B.Include a few-shot example in every user message.

C.Set the temperature to 0.2.

D.Set max output tokens to 100.

AnswerA

System instructions define the assistant's behavior for the entire session.

Why this answer

Option A is correct because setting system instructions is the most direct and reliable way to define the model's persona and behavioral constraints. In the Gemini API, system instructions act as a persistent, top-level directive that influences every response, ensuring the chatbot consistently adopts a friendly and professional tone without requiring repeated examples or parameter tuning.

Exam trap

Google Cloud often tests the distinction between controlling output style (system instructions) versus controlling output randomness (temperature) or length (max tokens), so the trap here is that candidates may confuse temperature or token limits with persona control, thinking that lowering creativity or capping length will enforce a specific tone.

How to eliminate wrong answers

Option B is wrong because including a few-shot example in every user message is inefficient and not a persistent technique; it would require repeating the example in each turn, increasing token usage and latency, and it does not guarantee consistent tone across all interactions. Option C is wrong because setting the temperature to 0.2 controls randomness and creativity, not tone; a low temperature makes outputs more deterministic but does not enforce a specific persona or style. Option D is wrong because setting max output tokens to 100 limits response length but has no effect on the tone or style of the output; it only truncates the response.

Full explanation →

488

Multi-Selectmedium

A company is designing a prompt engineering strategy for a customer service chatbot using Gemini. Which two practices are recommended for improving response quality? (Choose TWO)

Select 2 answers

A.Use chain-of-thought prompting

B.Always provide multiple examples in the prompt

C.Avoid any context in the prompt

D.Set temperature to 1.0 for maximum creativity

E.Include a system instruction to define the role

AnswersA, E

Chain-of-thought encourages logical reasoning, improving accuracy.

Why this answer

Chain-of-thought prompting (A) is recommended because it guides the model to reason step-by-step, improving accuracy on complex customer service queries by breaking down multi-step problems. This technique leverages Gemini's ability to follow logical sequences, reducing errors in tasks like troubleshooting or escalation decisions.

Exam trap

Google Cloud often tests the misconception that higher temperature always improves creativity, but in customer service, lower temperature is critical for deterministic, safe responses, and candidates may overlook the role of system instructions in defining behavior.

Full explanation →

489

MCQmedium

A data scientist uses Vertex AI Model Evaluation to assess a fine-tuned model for sentiment analysis. The evaluation report shows high precision but low recall on the 'negative' class. What is the best course of action to improve recall without sacrificing too much precision?

A.Adjust the prediction threshold for the negative class

B.Switch to a different model architecture (e.g., from BERT to RoBERTa)

C.Collect more labeled examples of negative sentiment and retrain

D.Use a larger pretrained model from Model Garden

AnswerC

Adding more data for the underperforming class helps the model learn better.

Why this answer

Option B is correct because collecting more negative examples and retraining addresses class imbalance, which is a common cause of low recall. Option A (adjusting threshold) trades off precision and recall, but may not fix underlying imbalance. Option C (changing model architecture) is excessive.

Option D (using a larger base model) may not specifically address recall.

Full explanation →

490

MCQmedium

A startup is building a generative AI content creation tool. They want to minimize operational costs while maintaining low latency for end users. Which deployment strategy should they adopt?

A.Deploy the model on edge devices to reduce cloud dependency.

B.Build an on-premises infrastructure to avoid cloud egress fees.

C.Use a serverless inference endpoint that scales to zero when not in use.

D.Provision dedicated GPU instances for consistent performance.

AnswerC

Serverless aligns cost with usage and auto-scales to meet demand.

Why this answer

Option C is correct because serverless inference endpoints, such as AWS Lambda with SageMaker or Google Cloud Run, automatically scale to zero when idle, eliminating costs during periods of no traffic. This directly addresses the startup's goal of minimizing operational costs while maintaining low latency through rapid cold-start optimizations and provisioned concurrency for burst handling.

Exam trap

Google Cloud often tests the misconception that 'scaling to zero' is only for CPU workloads, but serverless GPU inference endpoints (e.g., AWS SageMaker Serverless Inference) support GPU acceleration and scale to zero, making them cost-effective for variable generative AI workloads.

How to eliminate wrong answers

Option A is wrong because deploying on edge devices introduces significant hardware procurement and maintenance costs, and edge GPUs typically lack the compute power for large generative models, leading to higher latency for complex inference tasks. Option B is wrong because building on-premises infrastructure incurs high upfront capital expenditure and ongoing operational overhead for power, cooling, and maintenance, which contradicts the goal of minimizing operational costs. Option D is wrong because provisioning dedicated GPU instances incurs costs even when idle, as reserved or on-demand instances bill per hour regardless of usage, making it inefficient for variable or low-traffic workloads.

Full explanation →

491

MCQeasy

A marketing agency wants to generate images using Imagen on Vertex AI. They need to ensure the images are unique and avoid copyright issues. Which parameter adjustment is most relevant?

A.Increase training steps

B.Increase seed variability

C.Use negative prompts

D.Set safety threshold

AnswerC

Specifies elements to avoid, reducing copyright risk.

Why this answer

Negative prompts allow the model to exclude specific concepts, styles, or elements from generated images, directly reducing the risk of replicating copyrighted or trademarked content. By explicitly telling Imagen what not to include, the agency can steer outputs away from protected works without needing to modify training data or safety filters.

Exam trap

Google Cloud often tests the distinction between safety filters (which block harmful content) and negative prompts (which control stylistic or conceptual exclusion), leading candidates to mistakenly choose safety threshold adjustments for copyright avoidance.

How to eliminate wrong answers

Option A is wrong because increasing training steps does not affect the uniqueness or copyright compliance of outputs; it only refines model convergence on the existing training distribution. Option B is wrong because seed variability controls randomness in latent noise initialization, not the semantic content of the image, so it cannot prevent copyright infringement. Option D is wrong because safety thresholds filter harmful or policy-violating content (e.g., violence, hate speech), not copyrighted or trademarked elements.

Full explanation →

492

MCQhard

A research team is training a large language model from scratch using TPUs on Google Cloud. Which storage solution provides the highest throughput for training data?

A.Cloud Storage

B.Persistent Disk

C.Cloud Filestore

D.Cloud Spanner

AnswerA

Cloud Storage provides high throughput for large datasets, especially with parallel reads.

Why this answer

Cloud Storage provides the highest throughput for training data because it is designed for high-bandwidth, parallel access from TPU pods via the Google Cloud Storage FUSE or gRPC-based data loading. TPUs benefit from Cloud Storage's ability to serve data at hundreds of GB/s when using the `tf.data` service with `tf.io.gfile` or the `gcloud storage` API, avoiding the I/O bottlenecks of block storage. Persistent Disk and Filestore have lower aggregate throughput limits and are not optimized for the distributed, streaming read patterns typical of large-scale training.

Exam trap

Google Cloud often tests the misconception that local or attached block storage (Persistent Disk) is faster than object storage for ML training, but candidates fail to recognize that TPU training requires distributed, parallel data access that object storage (Cloud Storage) uniquely provides at scale.

How to eliminate wrong answers

Option B is wrong because Persistent Disk is a block storage device with a maximum throughput of ~1.2 GB/s per instance (for pd-ssd), which is far below the multi-GB/s requirements of TPU training and cannot scale horizontally across many workers without complex striping. Option C is wrong because Cloud Filestore is a managed NFS filestore that introduces network latency and has throughput caps (e.g., 1.2 GB/s for the Basic tier, 4.8 GB/s for the High Scale tier), making it unsuitable for the high-throughput, low-latency data streaming needed by TPUs. Option D is wrong because Cloud Spanner is a globally distributed relational database service designed for transactional consistency and ACID compliance, not for high-throughput sequential read of training data; its throughput is limited by node count and query overhead, and it is not a file storage solution.

Full explanation →

493

Multi-Selecthard

Which THREE are valid methods to reduce bias in generative AI outputs?

Select 3 answers

A.Using only English prompts

B.Increasing model size

C.Using a more diverse training dataset

D.Using safety filters

E.Applying prompt engineering to instruct the model to be fair

AnswersC, D, E

Diverse data reduces the risk of model learning biased patterns.

Why this answer

Option C is correct because training on a more diverse dataset reduces representational bias by exposing the model to a wider range of demographics, cultures, and perspectives. This directly mitigates the model's tendency to overrepresent majority groups or underrepresent minorities, which is a root cause of biased outputs in generative AI.

Exam trap

Google Cloud often tests the misconception that increasing model size or using a single language (like English) can solve bias, when in reality these actions can worsen bias by amplifying existing skews or introducing new cultural blind spots.

Full explanation →

494

MCQhard

A company is deploying a generative AI application that generates medical reports. They need to ensure the output is factual and minimizes hallucinations. Which approach is most effective?

A.Fine-tune the model with RLHF

B.Set the temperature to 0.0

C.Implement retrieval-augmented generation (RAG) with a curated knowledge base

D.Use prompt engineering to instruct the model to be accurate

AnswerC

RAG grounds outputs in retrieved facts, reducing hallucinations.

Why this answer

Retrieval-Augmented Generation (RAG) is the most effective approach because it grounds the model's output in a curated, authoritative knowledge base of medical data. By retrieving relevant, verified documents at inference time, RAG directly reduces the model's reliance on its parametric memory, which is the primary source of hallucinations in generative AI. This is especially critical in high-stakes domains like medical reporting, where factual accuracy is paramount.

Exam trap

The trap here is that candidates often choose 'Set the temperature to 0.0' because they confuse reducing randomness with eliminating factual errors, but temperature only controls output variability, not the truthfulness of the model's internal knowledge.

How to eliminate wrong answers

Option A is wrong because RLHF (Reinforcement Learning from Human Feedback) optimizes the model for human preference alignment and helpfulness, but it does not provide a mechanism to retrieve or verify facts from an external source, so it cannot reliably prevent hallucinations in factual domains. Option B is wrong because setting temperature to 0.0 makes the model deterministic (always picking the highest-probability token), but it does not correct factual errors stored in the model's weights; the model can still confidently generate false information. Option D is wrong because prompt engineering instructs the model to be accurate, but it is a soft constraint that the model can easily override; without external grounding, the model has no way to verify its own output against a trusted source.

Full explanation →

495

MCQmedium

A global news agency is using a generative AI model to summarize breaking news articles in real-time. The model is deployed on Vertex AI across multiple regions (us-central1, europe-west4, asia-southeast1) for low latency worldwide. The agency has a Service Level Objective (SLO) of 99.9% availability and p99 latency under 2 seconds. Recently, during a major event, traffic spiked 10x, and the europe-west4 region experienced latency spikes over 5 seconds and some 503 errors. The team suspects the regional endpoint is under-provisioned. Which combination of actions should they take to meet the SLO consistently?

A.Enable the global endpoint feature in Vertex AI with automatic traffic splitting, and increase the minimum replicas for each regional endpoint

B.Increase the maximum replicas for the europe-west4 endpoint and reduce the min replicas in other regions

C.Implement Cloud CDN caching for common summaries and reduce the number of regions to two

D.Configure a global load balancer with a single Vertex AI endpoint and increase max replicas globally

AnswerA

Global endpoint distributes traffic and increases capacity; higher min replicas prevent cold starts during spikes.

Why this answer

Enabling global endpoint with automatic traffic splitting and increasing min replicas per region (option D) provides both failover and capacity. Simply increasing replicas in europe (A) doesn't help if traffic shifts. Global endpoint without min replicas (B) still risks cold starts.

Using Cloud CDN (C) is for static content, not model inference.

Full explanation →

496

MCQhard

A large enterprise runs a generative AI solution serving millions of daily inference requests. To reduce costs, they propose using serverless endpoints (Vertex AI Prediction) with a custom container, but they notice high latency during cold starts. Which strategy best addresses this problem while minimizing cost?

A.Set a minimum number of replicas to maintain a baseline of always-on instances.

B.Upgrade to GPU-accelerated machines for all replicas.

C.Implement client-side request batching to reduce the number of inference calls.

D.Use prewarmed containers by setting an idle timeout to keep instances alive.

AnswerA

B is correct because it eliminates cold starts for the baseline load, and autoscaling handles additional traffic.

Why this answer

Option A is correct because setting a minimum number of replicas ensures that a baseline of always-on instances is maintained, eliminating cold starts for the majority of requests. This directly addresses the latency spike caused by container initialization and model loading in serverless endpoints, while the cost impact is limited to the minimum replicas rather than scaling all instances.

Exam trap

Google Cloud often tests the misconception that prewarming via idle timeout is a configurable parameter in serverless ML services, but in Vertex AI Prediction, the idle timeout is fixed and not user-adjustable, making minimum replicas the correct approach.

How to eliminate wrong answers

Option B is wrong because upgrading to GPU-accelerated machines increases cost significantly without solving cold start latency; GPUs primarily improve per-request throughput, not initialization time. Option C is wrong because client-side request batching reduces the number of inference calls but does not affect cold start latency; it may even increase perceived latency for individual requests. Option D is wrong because setting an idle timeout to keep instances alive is not a supported mechanism in Vertex AI Prediction; the service uses an internal keep-alive policy, and user-configurable idle timeouts are not available, making this option technically infeasible.

Full explanation →

497

Multi-Selectmedium

A financial institution is implementing a generative AI chatbot to handle customer inquiries. The institution must comply with regulatory requirements (e.g., GDPR, SOX) and ensure data privacy. Which TWO actions should the institution take?

Select 2 answers

A.Establish a Center of Excellence (CoE) for AI governance to oversee model deployment and monitoring.

B.Use Vertex AI without additional data governance controls to simplify deployment.

C.Use a pre-trained model without customization to reduce development time.

D.Implement model validation and testing to ensure outputs meet regulatory standards.

E.Deploy the model on-premises only to keep data within local infrastructure.

AnswersA, D

A CoE provides centralized governance and best practices.

Why this answer

Options B and D are correct. B: implementing model validation and testing ensures the model behaves as expected and helps meet compliance requirements. D: establishing a Center of Excellence (CoE) for AI governance provides oversight and standardization.

Option A is wrong because using a pre-trained model without customization may not meet specific compliance needs. Option C is wrong because deploying on-premises only is not necessary and may limit scalability. Option E is wrong because Vertex AI without data governance would not satisfy regulatory demands.

Full explanation →

498

MCQhard

A company has been using an on-premises ML infrastructure for generative AI and wants to migrate to Google Cloud. They have a pipeline that fine-tunes a large language model weekly using a proprietary dataset. The migration must minimize downtime and data transfer costs. Which approach best addresses these requirements?

A.Use Vertex AI Pipelines to orchestrate the fine-tuning process, and use Vertex AI Managed Datasets to incrementally sync new data with BigQuery as the source.

B.Use AutoML to train a new model directly from the dataset without fine-tuning.

C.Deploy the existing pipeline on a Google Kubernetes Engine cluster and use Google Cloud Filestore for shared storage.

D.Use Cloud Storage Transfer Service to move all data to Cloud Storage, then set up a Vertex AI custom training job to run the fine-tuning.

AnswerA

C is correct because it allows incremental sync and automated pipeline execution with minimal disruption.

Why this answer

Option C is correct because Vertex AI Pipelines with Managed Datasets allows incremental data transfer and automates fine-tuning in the cloud, minimizing downtime. Option A is wrong because Cloud Storage Transfer Service is for one-time bulk transfer, causing longer downtime. Option B is wrong because a custom solution on GKE is complex and may not reduce costs.

Option D is wrong because AutoML does not support fine-tuning custom large language models.

Full explanation →

499

MCQhard

A large enterprise is deploying a generative AI-powered code assistant for their developers. The solution uses Vertex AI with a fine-tuned Codey model. The security team requires that all prompts and responses be logged for audit purposes, but the logs must not contain sensitive information such as API keys or passwords. The operations team is concerned about high latency during peak usage. You need to design a solution that meets security requirements without compromising performance. Which approach should you take?

A.Use Cloud Audit Logs to capture all API calls to Vertex AI, but do not log the actual prompts and responses

B.Enable Vertex AI model monitoring with Cloud Logging, and configure a log sink with a custom exclusion filter to redact sensitive patterns before storing

C.Log all prompts and responses to Cloud Storage and use a Cloud DLP job to scan and redact sensitive data periodically

D.Implement a custom proxy that logs all requests after stripping sensitive data, then forward to the model

AnswerB

This ensures all interactions are logged but sensitive data is removed, meeting security without major performance impact.

Why this answer

Option B is correct because it uses Vertex AI model monitoring with Cloud Logging to capture prompts and responses, then applies a custom exclusion filter with a log sink to redact sensitive patterns (e.g., API keys, passwords) in real time before logs are stored. This meets the security requirement for audit logging without sensitive data while avoiding the latency overhead of post-processing or a custom proxy, thus satisfying the operations team's performance concern.

Exam trap

Google Cloud often tests the misconception that post-processing redaction (e.g., Cloud DLP) or custom proxies are acceptable for real-time logging, when in fact native streaming redaction via log sinks is required to meet both security and performance constraints.

How to eliminate wrong answers

Option A is wrong because Cloud Audit Logs capture only administrative actions (e.g., model deployment) and not the actual prompts and responses, failing the audit requirement. Option C is wrong because logging all data to Cloud Storage and running a periodic Cloud DLP job introduces significant latency and potential exposure window between logging and redaction, violating the performance requirement. Option D is wrong because implementing a custom proxy adds network hop latency and operational overhead, degrading performance during peak usage, and does not leverage native Vertex AI logging capabilities.

Full explanation →

500

MCQeasy

Refer to the exhibit. A developer has defined a dynamic action in the Vertex AI Agent Builder agent YAML. The agent is not triggering the action. What is the most likely issue?

A.The action name is misspelled

B.The endpoint returns a 4xx status

C.The endpoint is not HTTPS

D.The agent is not enabled

AnswerA

The action name must exactly match what the agent tries to invoke.

Why this answer

Option B is correct because the action name in the YAML is 'book_flight' but the agent may be expecting a different name (typo). Option A is wrong because the endpoint uses HTTPS, which is correct. Option C is wrong because the agent is defined.

Option D is wrong because a 4xx error would still trigger the action but result in an error response.

Full explanation →

Page 7 of 7

All pages

Practice Generative AI Leader by domain

Target a specific domain to shore up weak areas.

Fundamentals of Generative AI Business Strategies for Generative AI Solutions Google Cloud's Generative AI Offerings Techniques to Improve Generative AI Model Output

See all domains with question counts →