Knowledge + Practice

Google Cloud Generative AI Leader Generative AI Leader (Generative AI Leader) — Questions 751–825

997 questions total · 14pages · All types, answers revealed

Take a mock exam Exam hub

Page 11 of 14

751

MCQhard

A retail company has deployed a generative AI chatbot for customer support. They notice that the model sometimes provides incorrect product information. The team wants to ground the model's responses in their product catalog to improve accuracy. Which Vertex AI feature should they enable?

A.Use Vertex AI RAG Engine

B.Enable Grounding with Google Search

C.Increase the model's temperature setting

D.Fine-tune the model with product catalog updates

AnswerB

Grounding allows the model to retrieve real-time information from a designated data source, ensuring responses are based on the catalog.

Why this answer

Option B is correct because Grounding with Google Search allows the model to retrieve real-time, authoritative information from the product catalog via Vertex AI's grounding service, ensuring responses are based on verified data rather than the model's internal knowledge. This feature directly addresses the need to reduce hallucinations by anchoring the model's output to a trusted source, such as a product database, without requiring custom retrieval infrastructure.

Exam trap

Cisco often tests the distinction between grounding (real-time retrieval from a trusted source) and fine-tuning (static model updates), leading candidates to mistakenly choose fine-tuning when the question emphasizes dynamic accuracy improvements.

How to eliminate wrong answers

Option A is wrong because Vertex AI RAG Engine (Retrieval-Augmented Generation) is a framework for building custom retrieval pipelines, but it requires additional setup and indexing of the product catalog, whereas Grounding with Google Search provides a simpler, out-of-the-box solution for grounding responses in external data. Option C is wrong because increasing the model's temperature setting would make responses more random and creative, which is the opposite of what is needed to improve accuracy and reduce incorrect product information. Option D is wrong because fine-tuning the model with product catalog updates would require retraining the model on static data, which is inefficient for dynamic catalogs and does not guarantee real-time grounding; it also risks catastrophic forgetting and does not leverage Vertex AI's built-in grounding capabilities.

Full explanation →

752

Multi-Selectmedium

Which TWO components are essential for building a multi-turn conversational agent using Vertex AI Agent Builder? (Choose two.)

Select 2 answers

A.BigQuery

B.Vertex AI Prediction

C.Dialogflow CX

D.Agent Builder Agent

E.Cloud Storage

AnswersC, D

Dialogflow CX is used for defining conversational flows.

Why this answer

Dialogflow CX is essential for building multi-turn conversational agents because it provides advanced state management, flow-based design, and natural language understanding (NLU) capabilities specifically optimized for complex, multi-turn interactions. Agent Builder Agent (the core agent component within Vertex AI Agent Builder) is the second essential component, as it serves as the container for the agent's configuration, knowledge bases, and integration settings, enabling the orchestration of conversations across channels.

Exam trap

Cisco often tests the distinction between core conversational components (Dialogflow CX, Agent Builder Agent) and auxiliary services (BigQuery, Cloud Storage, Vertex AI Prediction) that are optional or used for supporting tasks, leading candidates to mistakenly include storage or analytics services as essential.

Full explanation →

753

MCQmedium

A team uses Vertex AI Generative AI Studio to tune a model via RLHF. After tuning, the model outputs are bland. What likely went wrong?

A.Insufficient training data

B.Too many training steps

C.Low temperature during evaluation

D.Reward model overfits to generic responses

AnswerD

Penalizes unique outputs, making them bland.

Why this answer

Option D is correct because when the reward model overfits to generic responses, it assigns high rewards to safe, non-committal outputs, causing the RLHF-tuned model to converge toward bland, uninformative text. This happens because the reward model learns to prefer patterns that are statistically common in the training data rather than genuinely high-quality or diverse responses, directly leading to the 'bland' output described.

Exam trap

Cisco often tests the misconception that bland outputs are caused by inference-time parameters like temperature, rather than by the reward model overfitting during the RLHF training phase.

How to eliminate wrong answers

Option A is wrong because insufficient training data typically causes underfitting or poor generalization, not specifically bland outputs; RLHF can still produce diverse responses if the reward model is well-calibrated. Option B is wrong because too many training steps usually lead to overfitting or reward hacking, where the model exploits the reward model for extreme or repetitive outputs, not blandness. Option C is wrong because low temperature during evaluation reduces randomness and can make outputs more deterministic, but it does not inherently cause blandness; the model would still produce coherent, contextually appropriate responses, just with less creativity.

Full explanation →

754

MCQhard

An AI team is choosing between supervised fine-tuning and reinforcement learning from human feedback (RLHF) for a chatbot. They want the model to follow instructions closely and avoid toxic outputs. Which statement correctly compares these approaches?

A.RLHF is more effective at aligning the model with human values, including reducing toxicity

B.Supervised fine-tuning and RLHF achieve the same results, but RLHF is faster

C.Supervised fine-tuning is better for reducing toxicity because it uses labeled safe examples

D.In-context learning is always superior to both for safety, as it can adapt on the fly

AnswerA

RLHF uses human feedback to reward non-toxic and helpful responses, directly aligning with desired values.

Why this answer

RLHF aligns the model with human preferences by rewarding desired behaviors (e.g., non-toxic, helpful). Supervised fine-tuning teaches format but not safety; in-context learning can steer but is less reliable for safety.

Full explanation →

755

Multi-Selecteasy

A project manager wants to measure the business impact of a GenAI code review tool. Which THREE metrics should they track to evaluate ROI? (Choose 3)

Select 3 answers

A.Total tokens consumed by the GenAI model

B.Developer satisfaction score

C.Average time saved per code review

D.Number of lines of code generated

E.Defect escape rate (bugs found in production)

AnswersB, C, E

Measures team acceptance and morale, which affects long-term adoption and productivity.

Why this answer

Developer satisfaction score (B) is a critical metric for GenAI code review ROI because it directly measures user adoption and perceived value. If developers find the tool frustrating or inaccurate, they will bypass it, negating any potential time savings or defect reduction. High satisfaction correlates with sustained usage, which is necessary for long-term return on investment.

Exam trap

Cisco often tests the distinction between cost/usage metrics (like tokens consumed) and value/outcome metrics (like time saved or defect reduction), leading candidates to mistakenly select operational metrics instead of business impact metrics.

Full explanation →

756

MCQeasy

The exhibit shows a command to deploy a model to a Vertex AI endpoint with GPU. The deployment fails due to a resource constraint. What is the most likely reason?

A.The --model flag points to an autoML model.

B.The accelerator type is misspelled.

C.The machine type n1-standard-4 does not support GPU accelerators.

D.The min-replica-count is greater than the max-replica-count.

AnswerC

n1-standard machines do not have enough PCIe lanes; use n1-highmem or n1-highcpu.

Why this answer

Option C is correct because the n1-standard-4 machine type does not support attaching GPUs. In Vertex AI, GPU accelerators require specific machine series (e.g., n1-highmem-* or n1-highcpu-* for NVIDIA Tesla GPUs, or newer machine families like a2-highgpu-* for A100 GPUs). The n1-standard-4 is a general-purpose machine type that lacks the necessary PCIe lanes and power delivery to accommodate a GPU accelerator, causing a resource constraint failure during deployment.

Exam trap

The trap here is that candidates often assume any n1-standard machine type can support GPUs, but Google Cloud restricts GPU attachments to specific machine types within the N1 family (highmem/highcpu) and newer families like A2 or G2.

How to eliminate wrong answers

Option A is wrong because the --model flag pointing to an AutoML model is not a resource constraint; AutoML models can be deployed to endpoints with GPU accelerators, though they may not benefit from them. Option B is wrong because a misspelled accelerator type would cause a validation error (e.g., 'Invalid accelerator type') at deployment time, not a resource constraint failure. Option D is wrong because min-replica-count being greater than max-replica-count would cause a validation error (e.g., 'min_replica_count must be less than or equal to max_replica_count'), not a resource constraint issue.

Full explanation →

757

Multi-Selecthard

A company wants to build a multimodal AI application that accepts text and image inputs and provides text responses. They need to process sensitive customer data and require that the model be hosted within their own Google Cloud project for data residency. Which TWO components are essential? (Choose 2)

Select 2 answers

A.Vertex AI with private endpoints or VPC-SC

B.Gemini API with multimodal capabilities

C.Codey for code generation

D.Chirp for speech recognition

E.Imagen for image input processing

AnswersA, B

Vertex AI allows deployment within a project and supports data residency controls.

Why this answer

Gemini is a multimodal model that can process text and images. Vertex AI provides a managed environment for deploying models with data residency controls. The other options are not essential for this requirement.

Full explanation →

758

MCQeasy

A company is building a customer support chatbot using Vertex AI Agent Builder. They want the agent to answer questions based on their internal knowledge base. Which feature should they use?

A.Grounding with Google Search

B.Grounding with enterprise data stores

C.Model tuning

D.Prompt engineering

AnswerB

Grounding with enterprise data stores allows the agent to use internal knowledge bases.

Why this answer

Vertex AI Agent Builder supports grounding with enterprise data stores, which allows the agent to retrieve and answer questions based on the company's internal knowledge base (e.g., documents, PDFs, websites) without relying on public web search. This ensures responses are grounded in proprietary, controlled data, making it the correct choice for a customer support chatbot that needs to reference internal policies or product documentation.

Exam trap

The trap here is that candidates may confuse 'grounding with Google Search' (public web) with 'grounding with enterprise data stores' (private data), assuming any grounding feature works for internal knowledge, but only the enterprise data store option provides the necessary data isolation and access control.

How to eliminate wrong answers

Option A is wrong because Grounding with Google Search uses public web data, not the company's internal knowledge base, which could introduce irrelevant or unverified information and violates data privacy requirements. Option C is wrong because model tuning (e.g., fine-tuning a foundation model) adjusts model weights on custom datasets, but it is not designed for real-time retrieval from a specific knowledge base; it also requires significant compute and may not scale for dynamic content. Option D is wrong because prompt engineering involves crafting input prompts to guide model behavior, but it does not provide a mechanism to retrieve and ground answers in a specific enterprise data store; without grounding, the model may hallucinate or rely on its training data.

Full explanation →

759

Multi-Selecteasy

A company is adopting generative AI for customer support. Which TWO strategies should they implement to manage risks related to brand reputation?

Select 2 answers

A.Establish a human-in-the-loop escalation process for sensitive interactions.

B.Publish a disclaimer that the AI may make mistakes.

C.Implement automated monitoring for toxic or off-brand language.

D.Deploy the model without any content filters to maximize helpfulness.

E.Disable customer support AI entirely to avoid any risk.

AnswersA, C

Human oversight ensures appropriate handling of sensitive issues.

Why this answer

Option A is correct because a human-in-the-loop escalation process ensures that sensitive or ambiguous customer interactions are reviewed by a human agent before an AI-generated response is sent. This directly mitigates brand reputation risk by preventing the AI from inadvertently making offensive, legally problematic, or factually incorrect statements that could go viral. The human reviewer acts as a safety net, catching edge cases that automated filters might miss, such as nuanced sarcasm or cultural insensitivity.

Exam trap

Google Cloud often tests the distinction between passive risk communication (like disclaimers) and active risk mitigation (like human-in-the-loop or automated monitoring), trapping candidates who think a disclaimer is sufficient to manage brand reputation risk.

Full explanation →

760

MCQhard

A financial services firm is deploying a GenAI-powered contract analysis tool. The tool must extract key clauses and flag risky language. Which strategy BEST ensures structured, machine-readable output that downstream systems can parse?

A.Ask the model to write a summary of the contract in natural language

B.Fine-tune the model on a dataset of contracts with clause labels

C.Use a few-shot prompt with examples of JSON output containing the desired fields

D.Rely on the model's pre-trained ability to extract clauses without any formatting instructions

AnswerC

Few-shot examples guide the model to consistently produce JSON, enabling automated extraction and integration with downstream systems.

Why this answer

Option C is correct because using a few-shot prompt with JSON output examples directly instructs the model to produce structured, machine-readable data. This approach leverages the model's in-context learning ability to follow a specific schema, ensuring downstream systems can parse the extracted clauses without additional transformation. It balances flexibility and precision without requiring costly fine-tuning or relying on unreliable free-form text.

Exam trap

Cisco often tests the misconception that fine-tuning (Option B) is the only way to achieve structured output, when in fact few-shot prompting with JSON examples can provide a more flexible and cost-effective solution for many business use cases.

How to eliminate wrong answers

Option A is wrong because asking for a natural language summary produces unstructured text that downstream systems cannot reliably parse for key clauses and risk flags, requiring additional NLP processing. Option B is wrong because fine-tuning on labeled contracts improves extraction accuracy but does not guarantee structured output; the model may still return free-form text unless explicitly prompted for a format like JSON. Option D is wrong because relying on the model's pre-trained ability without formatting instructions leads to inconsistent, ad-hoc responses that vary in structure and completeness, making automated parsing impossible.

Full explanation →

761

MCQmedium

A chatbot built with Vertex AI PaLM API often provides outdated information about company policies because the training data is months old. Which approach should the team use?

A.Implement grounding by connecting to a knowledge base of current policies.

B.Use prompt engineering to instruct the model to say 'I don't know' if unsure.

C.Increase the context window to include more history.

D.Fine-tune the model on the latest policy documents.

AnswerA

Grounding retrieves real-time information from the knowledge base.

Why this answer

Option A is correct because grounding connects the PaLM API to a live, authoritative knowledge base (e.g., Cloud Storage, BigQuery, or Vertex AI Search) containing the latest company policies. This allows the model to retrieve and cite current information at inference time without retraining, directly solving the staleness issue. Grounding is the recommended approach in Vertex AI for ensuring factual, up-to-date responses from a foundation model.

Exam trap

Cisco often tests the distinction between static model updates (fine-tuning) and dynamic knowledge injection (grounding), trapping candidates who assume fine-tuning is always the best solution for factual accuracy without considering the need for real-time updates.

How to eliminate wrong answers

Option B is wrong because prompt engineering to say 'I don't know' does not provide the model with current policy data; it only changes the model's refusal behavior, leaving outdated information uncorrected. Option C is wrong because increasing the context window does not introduce new or updated knowledge; it only allows the model to consider more of the conversation history, which does not address stale training data. Option D is wrong because fine-tuning on the latest policy documents would require significant time, cost, and labeled data, and the model would still be static until the next fine-tuning cycle; grounding provides a dynamic, real-time solution without retraining.

Full explanation →

762

MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Fine-tune a base LLM on the policy documents monthly

B.Use a larger foundation model with a longer context window and paste all documents into each prompt

C.Train a custom model from scratch on the policy documents each month

D.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

AnswerD

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

Retrieval-Augmented Generation (RAG) is the most appropriate approach because it allows the chatbot to answer questions based on the latest policy documents without retraining the model. By indexing the documents in a vector store and retrieving relevant chunks at query time, RAG ensures the model uses up-to-date information while keeping the underlying LLM static, which is cost-effective and scalable for monthly updates.

Exam trap

Cisco often tests the misconception that fine-tuning is the only way to incorporate new information, but the trap here is that candidates overlook RAG's ability to handle dynamic data without retraining, confusing 'model adaptation' with 'data retrieval'.

How to eliminate wrong answers

Option A is wrong because fine-tuning a base LLM monthly on new policy documents is expensive, time-consuming, and risks catastrophic forgetting, where the model loses previously learned information. Option B is wrong because pasting all documents into each prompt exceeds the context window limits of even the largest models (e.g., 128K tokens), leading to truncation, high latency, and increased cost per query. Option C is wrong because training a custom model from scratch each month is prohibitively expensive, requires massive computational resources and data, and is unnecessary when a pre-trained LLM with RAG can achieve the same goal.

Full explanation →

763

MCQmedium

A team is using a generative AI model to create marketing copy. They want the responses to be more focused and less random. Which parameter should they adjust?

A.Decrease temperature

B.Increase top-k

C.Decrease context window

D.Increase temperature

AnswerA

Decreasing temperature makes the model more deterministic and focused.

Why this answer

Decreasing the temperature parameter reduces the randomness of the model's token selection by lowering the probability of sampling less likely tokens. This makes the output more focused and deterministic, which is ideal for marketing copy that needs to stay on-brand and consistent.

Exam trap

Cisco often tests the misconception that increasing temperature or top-k makes outputs more focused, when in fact both increase randomness and diversity.

How to eliminate wrong answers

Option B is wrong because increasing top-k would actually increase the diversity of token selection by allowing more high-probability tokens to be considered, making responses less focused. Option C is wrong because decreasing the context window limits the amount of input text the model can reference, which can reduce coherence and relevance, not improve focus. Option D is wrong because increasing temperature amplifies randomness, making outputs more creative but less predictable and focused.

Full explanation →

764

Multi-Selecthard

A company is deploying a Gemini-based application and needs to ensure low latency for real-time user interactions. They also want to reduce cost. Which THREE strategies should they consider? (Select 3)

Select 3 answers

A.Use Gemini 1.5 Flash instead of Pro

B.Implement response caching for common queries

C.Increase the model's max output tokens to ensure comprehensive answers

D.Use full fine-tuning to make the model faster

E.Keep the context window as short as possible by trimming input

AnswersA, B, E

Flash is optimized for speed and lower cost.

Why this answer

Option A is correct because Gemini 1.5 Flash is a lighter, distilled version of the Pro model, designed for lower latency and reduced computational cost while still maintaining strong performance for real-time interactions. Flash models use fewer parameters and optimized inference paths, making them ideal for latency-sensitive applications where cost efficiency is critical.

Exam trap

Cisco often tests the misconception that increasing output tokens or fine-tuning improves speed, when in reality these actions increase computational load or add overhead, making them counterproductive for latency and cost goals.

Full explanation →

765

MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Train a custom model from scratch on the policy documents each month

B.Use a larger foundation model with a longer context window and paste all documents into each prompt

C.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

D.Fine-tune a base LLM on the policy documents monthly

AnswerC

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

RAG (Retrieval-Augmented Generation) allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining. The other options either require expensive retraining for each update or lack document grounding.

Full explanation →

766

MCQeasy

Which feature in Vertex AI allows users to browse over 300 foundation models and deploy them with minimal code?

A.Vertex AI Studio

B.Vertex AI Agent Builder

C.Vertex AI Model Garden

D.Vertex AI Pipelines

AnswerC

Model Garden is a central repository for discovering and deploying foundation models.

Why this answer

Option C, Vertex AI Model Garden, is correct because it provides a centralized repository where users can browse, discover, and deploy over 300 foundation models—including first-party, open-source, and third-party models—with minimal code. It abstracts away infrastructure complexity by offering pre-built deployment templates and one-click integration with Vertex AI endpoints, enabling rapid experimentation and production deployment without deep ML engineering overhead.

Exam trap

The trap here is that candidates confuse Vertex AI Studio's prompt engineering capabilities with Model Garden's model discovery and deployment functionality, leading them to select A instead of C.

How to eliminate wrong answers

Option A is wrong because Vertex AI Studio is a low-code environment for prototyping and tuning generative AI models, but it does not serve as a model repository for browsing and deploying over 300 foundation models; it focuses on prompt design and model customization. Option B is wrong because Vertex AI Agent Builder is designed for creating conversational agents and search experiences using pre-built components, not for browsing and deploying a broad catalog of foundation models. Option D is wrong because Vertex AI Pipelines is an orchestration service for building and managing ML workflows, not a model discovery and deployment interface.

Full explanation →

767

Multi-Selectmedium

A company is using Vertex AI Generative AI Studio to iterate on a prompt template. They want to save and organize multiple versions of prompts. Which TWO features should they use?

Select 2 answers

A.Model Garden

B.Version history

C.Prompt library

D.Parameter sliders

E.Run button

AnswersB, C

Version history tracks changes over time.

Why this answer

Version history (B) is correct because it allows users to track, compare, and revert to previous iterations of a prompt template directly within Generative AI Studio. This feature is essential for managing the iterative development process, as it automatically saves snapshots of each change, enabling teams to audit and restore earlier versions without manual backup.

Exam trap

Cisco often tests the distinction between features that modify model behavior (like parameter sliders) and features that manage prompt artifacts (like version history and prompt library), causing candidates to confuse operational controls with organizational tools.

Full explanation →

768

Multi-Selecteasy

A company is using Vertex AI generative models for a high-volume text summarization service. Which two strategies can reduce operational costs?

Select 2 answers

A.Increase the model's max output tokens to 2048.

B.Implement retry logic with exponential backoff.

C.Lower the temperature parameter to 0.

D.Use batch prediction instead of online prediction.

E.Reduce the size of the model (e.g., switch from text-bison@002 to text-bison-light).

AnswersD, E

Batch prediction has lower per-request cost for large jobs compared to online prediction.

Why this answer

Batch prediction reduces costs by processing multiple requests in a single batch job, which avoids the per-request overhead and idle compute time associated with online prediction. This is especially cost-effective for high-volume, non-real-time workloads like text summarization, as you pay only for the compute time used during the batch job rather than for each individual inference.

Exam trap

Google Cloud often tests the misconception that adjusting inference parameters like temperature or output length can reduce costs, when in reality only reducing model size or switching to batch processing directly lowers operational expenses.

Full explanation →

769

MCQeasy

A marketing team needs to generate personalized email campaigns for thousands of customers. They want to maintain brand tone consistency and avoid manual writing. Which GenAI approach is BEST suited?

A.Use Vertex AI Studio with prompt design and few-shot examples in the prompt

B.Fine-tune a small model on brand guidelines only

C.Embed a rules-based template engine with no AI

D.Train a custom model from scratch on past campaigns

AnswerA

Vertex AI Studio enables rapid prompt iteration. Few-shot examples ensure consistent tone and structure without custom training.

Why this answer

Option A is correct because Vertex AI Studio enables prompt engineering with few-shot examples, allowing the team to generate personalized emails while maintaining brand tone consistency without fine-tuning or custom training. This approach leverages a pre-trained large language model (LLM) with carefully designed prompts that include brand guidelines and a few examples, ensuring output adheres to the desired style and context. It avoids the overhead of fine-tuning or building custom models, making it ideal for rapid deployment and iterative refinement.

Exam trap

Cisco often tests the misconception that fine-tuning or custom training is always necessary for domain-specific tasks, when in fact prompt engineering with few-shot examples can achieve comparable results with far less effort and cost.

How to eliminate wrong answers

Option B is wrong because fine-tuning a small model on brand guidelines only may lead to catastrophic forgetting or insufficient generalization, as the model might overfit to the narrow dataset and lose the broad language understanding needed for diverse customer personalization. Option C is wrong because a rules-based template engine cannot adapt to the nuanced, context-aware personalization required for thousands of unique customers; it would produce rigid, repetitive content that fails to capture brand tone dynamically. Option D is wrong because training a custom model from scratch on past campaigns is resource-intensive, requires massive labeled datasets, and is unnecessary when pre-trained models with prompt engineering can achieve the same goal more efficiently.

Full explanation →

770

MCQmedium

A data scientist is fine-tuning a foundation model for a specialized legal document summarization task. The labeled dataset is only 5,000 examples. Which fine-tuning technique would be MOST efficient to adapt the model without catastrophic forgetting and with minimal computational cost?

A.Low-Rank Adaptation (LoRA)

B.Reinforcement Learning from Human Feedback (RLHF)

C.Full supervised fine-tuning of all model parameters

D.In-context learning with few-shot examples

AnswerA

LoRA inserts trainable low-rank matrices into transformer layers, requiring far fewer parameters to update, which is efficient and reduces forgetting.

Why this answer

LoRA (Low-Rank Adaptation) is an adapter-based method that trains only a small number of added parameters, making it efficient and less prone to catastrophic forgetting compared to full fine-tuning. Supervised fine-tuning full model is expensive; RLHF is for alignment after fine-tuning; in-context learning requires no training but may not suffice.

Full explanation →

771

MCQmedium

A financial institution is deploying a generative AI chatbot for investment advice. According to Google's AI Principles and responsible AI practices, what is a mandatory requirement before this chatbot can be used with customers?

A.The chatbot must use SynthID to watermark its outputs

B.The training data must be publicly available

C.All responses must be reviewed by a human financial advisor before being shown to the customer

D.The chatbot must be deployed on-premises to ensure data residency

AnswerC

High-stakes AI decisions, especially in financial advice, need human review to ensure accountability.

Why this answer

Option C is correct because Google's AI Principles emphasize that high-risk AI applications, such as financial investment advice, must include meaningful human oversight to prevent harm. In this context, a human financial advisor must review and approve each chatbot response before it reaches the customer, ensuring compliance with responsible AI practices and regulatory requirements for fiduciary duty.

Exam trap

Cisco often tests the misconception that technical safeguards like watermarking or deployment location are mandatory for all AI systems, when in fact the critical requirement for high-risk domains is human oversight to mitigate potential harm and ensure accountability.

How to eliminate wrong answers

Option A is wrong because SynthID is a watermarking tool for AI-generated content, but it is not a mandatory requirement for all generative AI deployments; it is used for identifying AI outputs, not for ensuring safety or accuracy in high-stakes financial advice. Option B is wrong because training data does not need to be publicly available; Google's AI Principles require transparency and accountability, but proprietary or private datasets can be used as long as they are responsibly sourced and free from bias. Option D is wrong because on-premises deployment is not a mandatory requirement; data residency concerns can be addressed through cloud-based solutions with proper data governance controls, and Google Cloud offers options like Confidential VMs and data residency regions without requiring on-premises infrastructure.

Full explanation →

772

MCQmedium

A data scientist fine-tunes a large language model on Vertex AI but gets poor results on validation data. What is the most likely cause?

A.Incorrect learning rate

B.Insufficient training data

C.Using wrong model family

D.Overfitting due to too many epochs

AnswerB

Fine-tuning requires enough representative data to adapt the model without overfitting or underfitting.

Why this answer

Fine-tuning a large language model on Vertex AI with poor validation results is most likely due to insufficient training data. Large language models have billions of parameters and require a substantial amount of high-quality, task-specific data to effectively adapt to a new domain or task; without enough examples, the model cannot learn the desired patterns and will perform poorly on unseen data.

Exam trap

The trap here is that candidates often assume hyperparameter tuning (like learning rate) is the primary cause of poor fine-tuning results, but in generative AI, data quantity and quality are the most common bottlenecks, especially when using pre-trained models on Vertex AI.

How to eliminate wrong answers

Option A is wrong because an incorrect learning rate typically causes training instability (e.g., loss divergence or slow convergence) rather than consistently poor validation results, and Vertex AI's default hyperparameters are often reasonable. Option C is wrong because using the wrong model family (e.g., choosing a text generation model for a classification task) would likely cause immediate, obvious failures or mismatches in output format, not just poor validation performance after fine-tuning. Option D is wrong because overfitting due to too many epochs would manifest as high training accuracy with low validation accuracy, but the question states poor results on validation data without mentioning training performance, and overfitting is less likely with insufficient data (the model would underfit instead).

Full explanation →

773

MCQmedium

Refer to the exhibit. A team runs 'gcloud ai models list --filter=displayName:qa-chat-v1' and sees the output. The model was tuned using supervised fine-tuning (SFT) but shows 'state: DEPLOYING' for days. What is the most likely issue?

A.The evaluation metrics are missing, causing deployment to hang

B.The training pipeline failed silently

C.The model is stuck in deployment due to insufficient quota

D.The model has no errors, so it is fine

AnswerC

Quota limits can cause indefinite DEPLOYING state.

Why this answer

The model shows 'state: DEPLOYING' for days, which indicates the deployment process is stuck rather than failing with an error. In Vertex AI, a model stuck in DEPLOYING state for an extended period is typically caused by insufficient quota for the selected machine type (e.g., GPU or TPU accelerators). The deployment process cannot complete because the requested resources are not available, causing it to hang indefinitely.

Exam trap

Cisco often tests the distinction between deployment failure (which shows an error state) and deployment stuck (which shows DEPLOYING state for an extended period), leading candidates to incorrectly assume a silent failure or missing metrics when the real issue is resource quota exhaustion.

How to eliminate wrong answers

Option A is wrong because missing evaluation metrics would not cause deployment to hang; evaluation metrics are used for model evaluation and monitoring, not for the deployment process itself. Option B is wrong because a failed training pipeline would result in no model being created or a model with an error state, not a model that exists and is stuck in DEPLOYING state. Option D is wrong because the model being stuck in DEPLOYING for days is clearly an error condition; a healthy deployment should complete within minutes, not days.

Full explanation →

774

MCQhard

A data scientist is comparing two fine-tuned models on Vertex AI Model Evaluation. They want to choose the model with better factual accuracy for a medical Q&A task. Which evaluation metric should they prioritize?

A.exact_match

B.pairwise_rouge

C.ROUGE-L

D.BLEU

AnswerA

Exact match evaluates if the output is exactly correct, suitable for Q&A.

Why this answer

Exact Match (EM) is the correct metric because it measures whether the model's output exactly matches the ground truth answer, which is critical for factual accuracy in medical Q&A where even minor deviations (e.g., 'aspirin' vs. 'acetylsalicylic acid') could indicate incorrect or incomplete knowledge. Vertex AI Model Evaluation supports EM as a binary metric that penalizes any variation, making it ideal for high-stakes domains requiring precise factual recall.

Exam trap

The trap here is that candidates often confuse ROUGE or BLEU as 'accuracy' metrics because they measure text overlap, but they fail to penalize factual substitutions or omissions that are critical in domain-specific tasks like medical Q&A.

How to eliminate wrong answers

Option B (pairwise_rouge) is wrong because it is a comparative metric used to rank two model outputs relative to each other, not an absolute measure of factual accuracy; it does not directly assess correctness against a known ground truth. Option C (ROUGE-L) is wrong because it measures the longest common subsequence between generated and reference text, which captures fluency and structure but not exact factual correctness—a model could rephrase a fact correctly yet score low on ROUGE-L if the wording differs. Option D (BLEU) is wrong because it evaluates n-gram precision against reference translations, designed for machine translation tasks, and is insensitive to factual errors that do not change n-gram overlap (e.g., swapping 'left' for 'right' in a medical context).

Full explanation →

775

MCQhard

A financial services company is building a customer service agent using Vertex AI Agent Builder. They want the agent to only answer questions based on their approved policy documents, which are stored in Cloud Storage. They also need to ensure that the agent never reveals internal employee names or account numbers. They have set up grounding with the documents but find that the agent sometimes ignores the grounding and generates responses using the model's internal knowledge. What should they do to strictly constrain the agent to only use the provided documents?

A.Add a system instruction that says 'Only answer from the provided documents.'

B.Use the 'vertex-ai-agent-builder' with strict grounding mode and disable fallback to model knowledge.

C.Set the model's temperature to 0 and top_p to 0.1.

D.Fine-tune the model on the policy documents to limit its knowledge.

AnswerB

Strict grounding mode ensures the agent only uses the grounded documents, with no fallback.

Why this answer

Option B is correct because Vertex AI Agent Builder offers a 'strict grounding' mode that, when enabled, forces the agent to rely exclusively on the provided grounding documents (e.g., from Cloud Storage) and disables any fallback to the model's internal knowledge. This directly addresses the requirement to prevent the agent from generating responses based on its pre-trained data, ensuring strict adherence to the approved policy documents.

Exam trap

The trap here is that candidates often confuse prompt engineering techniques (like system instructions) with architectural enforcement mechanisms, assuming a textual directive can reliably constrain model behavior, when in fact only a grounded retrieval system with a strict no-fallback mode can guarantee the agent does not use its internal knowledge.

How to eliminate wrong answers

Option A is wrong because a system instruction is a prompt-level directive that the model can ignore or override, especially if its internal knowledge is strongly activated; it does not provide a technical enforcement mechanism to disable fallback to model knowledge. Option C is wrong because adjusting temperature and top_p controls the randomness and diversity of token sampling, not the source of knowledge; the model can still generate responses from its internal training data even with these parameters set to low values. Option D is wrong because fine-tuning on the policy documents would bias the model's weights toward that data but does not guarantee it will never use other internal knowledge; the model can still hallucinate or retrieve unrelated information, and fine-tuning does not provide a runtime constraint to block fallback to pre-existing knowledge.

Full explanation →

776

MCQeasy

A project manager wants to track the ROI of a generative AI feature that assists customer support agents. Which metric is MOST directly tied to productivity improvement?

A.Adoption rate of the AI tool

B.Customer satisfaction (CSAT) score

C.Average handle time (AHT) per ticket

D.Cost per API call

AnswerC

AHT directly reflects agent efficiency; a decrease indicates productivity improvement.

Why this answer

Average handle time (AHT) directly measures the time agents spend per interaction, so a reduction indicates productivity gain. CSAT measures satisfaction, not efficiency. Cost per API call is a cost metric.

Adoption rate measures usage.

Full explanation →

777

MCQmedium

A company is developing a code generation assistant and wants to ensure the model respects access control policies, e.g., it should not generate code that uses internal APIs that the user is not authorized to access. Which technique is most effective for embedding such policy constraints into the model's behavior?

A.Include a system prompt that instructs the model to never generate code using internal APIs.

B.Use a retriever to fetch policy documents and prepend them to each prompt.

C.Fine-tune the model on a dataset of code snippets that follow access control policies, including negative examples of disallowed API usage.

D.Train a separate classifier to rerank model outputs and reject non-compliant generations.

AnswerC

Fine-tuning directly embeds the policy into model weights.

Why this answer

Fine-tuning on a curated dataset that includes both positive examples (compliant code) and negative examples (disallowed API usage) directly adjusts the model's weights to internalize access control policies. This method is more effective than prompt-based approaches because it modifies the model's underlying behavior rather than relying on fragile instructions that can be ignored or overridden, especially in complex code generation tasks.

Exam trap

Cisco often tests the misconception that prompt engineering or retrieval-augmented generation (RAG) can reliably enforce complex, context-dependent policies, when in fact fine-tuning is required to deeply integrate constraints into the model's behavior.

How to eliminate wrong answers

Option A is wrong because a system prompt is a brittle, instruction-based approach that the model can easily ignore or fail to generalize across diverse code generation contexts, especially when the user's authorization level is dynamic. Option B is wrong because prepending policy documents via a retriever adds context but does not train the model to reliably apply those policies; the model may still generate disallowed code if the policy text is long, ambiguous, or not perfectly attended to. Option D is wrong because a separate classifier for reranking can filter outputs but does not prevent the model from generating non-compliant code in the first place, leading to wasted compute and potential exposure if the classifier has false negatives.

Full explanation →

778

MCQeasy

Which Google AI milestone introduced the Transformer architecture that underpins modern LLMs?

A.AlphaGo

B.Transformer paper

C.AlphaFold

D.BERT

AnswerB

The Transformer paper introduced the foundational architecture for LLMs.

Why this answer

The Transformer architecture, which is the foundational technology behind modern large language models (LLMs) like GPT and BERT, was introduced in the 2017 paper 'Attention Is All You Need' by Vaswani et al. This paper proposed the self-attention mechanism and the encoder-decoder structure that replaced recurrent neural networks (RNNs) and long short-term memory (LSTM) networks, enabling parallelized training and superior handling of long-range dependencies in sequence data.

Exam trap

Cisco often tests the distinction between the original research paper that introduced a concept (the Transformer paper) and later implementations or applications of that concept (like BERT or GPT), causing candidates to confuse the milestone with its derivative products.

How to eliminate wrong answers

Option A is wrong because AlphaGo is a reinforcement learning-based system for playing the board game Go, not a milestone in neural network architecture for language models. Option C is wrong because AlphaFold is a deep learning model for protein structure prediction, not a foundational architecture for LLMs. Option D is wrong because BERT is a pre-trained language model that itself is built on the Transformer architecture, not the original paper that introduced the Transformer.

Full explanation →

779

MCQhard

A developer uses Vertex AI to generate code but the output is not syntactically correct. Which parameter should be adjusted?

A.candidate_count

B.max_output_tokens

C.temperature

D.top_k

AnswerC

Lower temperature (e.g., 0.2) makes the model more focused and likely to produce valid syntax.

Why this answer

Temperature controls the randomness of token selection during generation. A high temperature increases the likelihood of less probable tokens, which can lead to syntactically incorrect code. Lowering temperature makes the model more deterministic and conservative, favoring higher-probability tokens that are more likely to form valid syntax.

Exam trap

Google Cloud often tests the misconception that increasing candidate_count or max_output_tokens will improve output quality, when in fact these parameters only affect quantity or length, not the underlying token selection logic that determines syntactic correctness.

How to eliminate wrong answers

Option A is wrong because candidate_count controls how many different response candidates are generated, not the syntactic correctness of any single output. Option B is wrong because max_output_tokens limits the length of the generated text, not the quality or validity of the syntax. Option D is wrong because top_k limits the number of highest-probability tokens considered at each step; while it can affect output quality, it does not directly address syntactic correctness as effectively as temperature does.

Full explanation →

780

MCQmedium

A company is using Vertex AI Model Garden to discover and test various foundation models. They need a model that can generate code from natural language. Which model should they select?

A.Chirp

B.Codey

C.Med-PaLM

D.Imagen

AnswerB

Codey models are optimized for code-related tasks.

Why this answer

Codey is Google's family of models specifically designed for code generation, including converting natural language descriptions into code. It is built on the PaLM 2 architecture and is optimized for tasks like code completion, code generation, and code chat, making it the correct choice for generating code from natural language.

Exam trap

The trap here is that candidates may confuse Chirp (audio) or Imagen (image) with multimodal models, mistakenly thinking they can handle code generation, when in fact only Codey is purpose-built for code tasks.

How to eliminate wrong answers

Option A is wrong because Chirp is a speech-to-text model designed for audio transcription, not code generation. Option C is wrong because Med-PaLM is a domain-specific model fine-tuned for medical and healthcare applications, not for generating code. Option D is wrong because Imagen is a text-to-image diffusion model for generating images, not code.

Full explanation →

781

MCQmedium

A company wants to use a pre-trained language model for customer support summarization. They need to ensure responses are concise and accurate. Which prompt engineering technique is most effective?

A.Zero-shot prompting

B.Few-shot prompting with examples

C.Chain-of-thought prompting

D.Negative prompting

AnswerB

Few-shot provides examples to guide the model, improving accuracy and conciseness.

Why this answer

Few-shot prompting (B) is most effective because it provides the model with a small set of example input-output pairs (e.g., a customer query and its concise summary), which guides the model to produce outputs that match the desired format, length, and accuracy. This technique is particularly useful for summarization tasks where consistency and adherence to a specific style are critical, as it reduces ambiguity without requiring fine-tuning.

Exam trap

Google Cloud often tests the misconception that zero-shot prompting is sufficient for all tasks, but the trap here is that candidates overlook the need for explicit guidance in format-sensitive tasks like summarization, where few-shot examples provide the necessary constraint for consistency.

How to eliminate wrong answers

Option A (Zero-shot prompting) is wrong because it relies solely on the model's pre-trained knowledge without any examples, which often leads to inconsistent or overly verbose summaries, especially when the task requires a specific format or level of conciseness. Option C (Chain-of-thought prompting) is wrong because it is designed for multi-step reasoning tasks (e.g., arithmetic or logic problems) and is unnecessary for summarization, where the goal is to condense information rather than reason through steps. Option D (Negative prompting) is wrong because it instructs the model on what to avoid (e.g., 'do not include details'), which can be imprecise and may inadvertently suppress relevant information, making it less reliable than providing positive examples of desired outputs.

Full explanation →

782

MCQhard

A financial institution needs to deploy a large language model (LLM) for a customer-facing application that must comply with HIPAA and have strict data residency controls. They also require the ability to ground responses in real-time search results from the web. Which combination of services should they use?

A.Google AI Studio with Gemini Pro

B.Vertex AI with Gemini Pro and Grounding with Google Search

C.Gemini API directly from AI Studio with custom VPC

D.Vertex AI with Gemini and Amazon Bedrock for grounding

AnswerB

Vertex AI offers enterprise-grade security, compliance (HIPAA BAA), and supports grounding with Google Search for real-time information retrieval.

Why this answer

Vertex AI with Gemini Pro and Grounding with Google Search is correct because it provides a HIPAA-compliant platform (Vertex AI) with the ability to ground LLM responses in real-time web search results via Google Search Grounding, while also supporting data residency controls through customer-managed encryption keys and regional endpoints. This combination meets all requirements: compliance, data residency, and real-time grounding.

Exam trap

Cisco often tests the distinction between prototyping tools (AI Studio) and production platforms (Vertex AI), and the trap here is assuming that any Google AI service can be used for HIPAA-compliant deployment without checking for enterprise features like VPC Service Controls and data residency support.

How to eliminate wrong answers

Option A is wrong because Google AI Studio is a prototyping tool, not a production deployment platform, and does not support HIPAA compliance or data residency controls. Option C is wrong because the Gemini API directly from AI Studio lacks enterprise features like VPC Service Controls for data residency and does not offer HIPAA-compliant deployment; custom VPC alone does not satisfy HIPAA requirements. Option D is wrong because Amazon Bedrock is an AWS service, not part of the Google AI ecosystem, and mixing Vertex AI with Bedrock introduces cross-cloud complexity that violates the single-vendor strategy implied by the exam domain and does not provide native Google Search Grounding.

Full explanation →

783

MCQmedium

What is the primary purpose of a system instruction in the Gemini API?

A.Set the model's temperature and top_p

B.Define the overall behavior and constraints for the model

C.Provide few-shot examples for each query

D.Set the maximum output length

AnswerB

Correct: System instructions guide the model's persona and rules.

Why this answer

The system instruction in the Gemini API is the primary mechanism to define the overall behavior, persona, constraints, and guardrails for the model across all interactions. Unlike per-query parameters, it sets a persistent context that shapes how the model interprets every user prompt, ensuring consistent adherence to rules such as tone, format, or safety policies.

Exam trap

Google Cloud often tests the distinction between persistent system-level instructions and per-request parameters, so the trap here is confusing the system instruction (which defines the model's role and constraints) with generation controls like temperature, top_p, or max tokens, which only affect the style or length of a single response.

How to eliminate wrong answers

Option A is wrong because temperature and top_p are sampling parameters that control randomness and diversity of output, not the overarching behavioral constraints set by a system instruction. Option C is wrong because few-shot examples are typically provided in the user prompt or as part of a structured conversation, not as the primary purpose of a system instruction, which is for persistent context rather than per-query demonstrations. Option D is wrong because maximum output length is a generation parameter that limits token count, not a behavioral or constraint-setting mechanism like a system instruction.

Full explanation →

784

Multi-Selectmedium

A data scientist is evaluating how to ground a generative AI model to reduce hallucinations when answering questions about a private knowledge base. Which TWO techniques are most suitable?

Select 2 answers

A.Using a larger model like Gemini Ultra

B.Fine‑tuning on the private knowledge base

C.Increasing the temperature to 0.9

D.Retrieval-Augmented Generation (RAG)

E.Prompt engineering to instruct the model to answer based only on the provided context

AnswersD, E

RAG injects retrieved knowledge into the prompt, grounding the response in the source.

Why this answer

Retrieval-Augmented Generation (RAG) is the most suitable technique because it retrieves relevant, up-to-date chunks from the private knowledge base at inference time and conditions the generative model's output on that retrieved context. This grounds the model in factual data, directly reducing hallucinations by ensuring answers are based on the retrieved evidence rather than the model's parametric memory alone.

Exam trap

Cisco often tests the misconception that fine-tuning is the primary method for grounding a model on private data, when in fact RAG is preferred for dynamic or large knowledge bases because it avoids retraining and allows real-time updates without modifying model weights.

Full explanation →

785

MCQeasy

Which statement best describes the difference between the Gemini Flash and Gemini Pro models on Vertex AI?

A.Gemini Flash is a distilled version of Gemini Pro that requires fine‑tuning before use

B.Gemini Pro is deployed on Google’s TPU v5p chips, while Flash uses TPU v4

C.Gemini Flash is optimized for speed and cost, while Gemini Pro provides higher quality for complex tasks

D.Gemini Flash is only available for image inputs, while Gemini Pro handles text

AnswerC

Flash is a lightweight model for faster, cheaper inference; Pro is more capable for nuanced reasoning.

Why this answer

Option C is correct because Gemini Flash is specifically designed for low-latency, high-throughput, and cost-efficient inference, making it ideal for high-volume, simpler tasks. In contrast, Gemini Pro is a larger, more capable model that delivers superior quality and reasoning for complex, multi-step tasks, though at higher latency and cost. This distinction is fundamental to the Gemini model family on Vertex AI, where Flash serves as the lightweight, fast option and Pro as the premium, high-quality option.

Exam trap

The trap here is that candidates often assume 'Flash' implies a distilled or pruned version of 'Pro' (like a student model), but in reality, Flash is a distinct model trained from scratch with a different architecture optimized for speed, not a compressed version of Pro.

How to eliminate wrong answers

Option A is wrong because Gemini Flash is not a distilled version of Gemini Pro; it is a separate, independently trained model optimized for speed and cost, and it does not require fine-tuning before use—it is available as a pre-trained model for inference. Option B is wrong because both Gemini Flash and Gemini Pro are deployed on Google's TPU v5p chips; the difference in performance is due to model architecture and size, not the underlying TPU generation. Option D is wrong because Gemini Flash handles both text and image inputs (multimodal), just like Gemini Pro; the limitation to image-only inputs is a misconception.

Full explanation →

786

MCQeasy

Which of the following best describes the primary benefit of using Grounding with Google Search when building a GenAI chatbot?

A.It reduces the model's latency by caching responses

B.It enables the model to generate images based on text descriptions

C.It provides fine-tuning capabilities for domain-specific data

D.It allows the model to access real-time information from the internet to reduce hallucinations

AnswerD

Grounding connects the model to live search results, ensuring responses are based on current data.

Why this answer

Grounding with Google Search connects the GenAI chatbot to real-time internet data, allowing it to retrieve current facts and events that the model was not trained on. This reduces hallucinations by ensuring responses are based on verified, up-to-date information rather than relying solely on the model's static training data.

Exam trap

The trap here is that candidates often confuse Grounding (a retrieval-based technique for real-time accuracy) with fine-tuning (a training-based technique for domain adaptation), leading them to select Option C incorrectly.

How to eliminate wrong answers

Option A is wrong because Grounding with Google Search does not cache responses to reduce latency; it introduces additional retrieval latency by querying live search results. Option B is wrong because Grounding is a text-based retrieval mechanism and does not enable image generation, which requires a multimodal model or separate image generation service. Option C is wrong because Grounding is a retrieval-augmented generation (RAG) technique, not a fine-tuning method; fine-tuning adjusts model weights on domain-specific data, whereas Grounding retrieves external data at inference time.

Full explanation →

787

Multi-Selectmedium

A company wants to build a generative AI-powered internal knowledge base for employees. They need to integrate with existing Google Workspace documents (Docs, Slides) and allow natural language queries. Which TWO services should they combine?

Select 2 answers

A.Model Garden

B.Gemini API

C.Vertex AI Agent Builder

D.Vertex AI RAG Engine

E.Document AI (DocAI)

AnswersC, D

Agent Builder creates the conversational interface for the knowledge base.

Why this answer

Vertex AI Agent Builder can create a conversational agent, and RAG Engine can index Google Workspace documents for retrieval. Gemini API alone is not a knowledge base. DocAI is for document parsing, not retrieval.

Model Garden is a model repository.

Full explanation →

788

MCQmedium

A healthcare startup is deploying a generative AI model to assist physicians in diagnosing rare diseases. The model will suggest possible conditions based on patient symptoms and lab results. Which approach best aligns with Google's AI Principles and responsible AI practices?

A.Deploy the model with an override mechanism available only to the development team.

B.Fine-tune the model on a small dataset of rare disease cases to improve accuracy, then deploy without additional safeguards.

C.Allow the model to provide a diagnosis without human review to speed up treatment decisions.

D.Use the model only as a suggestion tool with a mandatory human-in-the-loop review before any diagnosis is communicated to the patient.

AnswerD

This ensures accountability, safety, and aligns with Google's AI Principles, especially for high-stakes domains like healthcare.

Full explanation →

789

MCQmedium

A data scientist fine-tunes a foundation model on customer support transcripts. After evaluation, the model's responses are too formal. Which adjustment during fine-tuning is most likely to make responses more conversational?

A.Increase the batch size to stabilize training.

B.Decrease the number of fine-tuning steps to prevent overfitting.

C.Include examples of informal customer interactions in the fine-tuning data.

D.Use a higher learning rate for faster adaptation.

AnswerC

The training data teaches the model the desired tone; adding conversational examples directly influences style.

Why this answer

The training data directly influences the tone and style of model outputs. Including examples of informal conversations in the fine-tuning dataset teaches the model the desired conversational tone. Other options affect training dynamics but not the style.

Full explanation →

790

MCQhard

A global retailer uses a generative AI model to personalize product recommendations. They need to ensure that customer prompts and responses are not logged for model improvement to meet GDPR data minimization principles. Which configuration should they apply?

A.Use a third-party logging service with data deletion policies

B.Enable data logging for six months and then automatically delete

C.Disable prompt/response logging in Vertex AI endpoint settings

D.Anonymize all customer data before logging

AnswerC

Disabling logging ensures that customer data is not stored for model improvement, aligning with GDPR.

Why this answer

To comply with GDPR data minimization, the retailer should disable prompt/response logging. Vertex AI offers settings to control whether prompts and responses are stored for model improvement.

Full explanation →

791

MCQeasy

Which Google AI research organization is responsible for AlphaFold, a breakthrough in protein structure prediction?

A.Google Research

B.Google DeepMind

C.Google Brain

D.X Development

AnswerB

DeepMind developed AlphaFold.

Why this answer

Google DeepMind is the research lab behind AlphaFold, AlphaCode, and other notable achievements.

Full explanation →

792

Multi-Selecthard

Which TWO of the following are best practices for configuring safety settings in Vertex AI generative models? (Choose 2)

Select 2 answers

A.Disable safety filters for maximum creativity.

B.Adjust safety thresholds based on the specific use case and audience.

C.Use the Vertex AI Safety API to programmatically review generated content.

D.Apply the same safety settings to all models in the organization.

E.Always use the maximum safety threshold to block all potentially harmful content.

AnswersB, C

Different use cases require different levels of filtering.

Why this answer

Option B is correct because safety thresholds in Vertex AI should be calibrated per use case and audience to balance content safety with utility. For example, a medical chatbot may require stricter thresholds than a creative writing tool, and Vertex AI's safety settings allow adjusting thresholds for categories like hate speech or harassment to match specific risk tolerances.

Exam trap

The trap here is that candidates assume maximum safety is always best, but the exam tests understanding that safety settings must be balanced with model utility and that disabling filters is never a best practice.

Full explanation →

793

MCQeasy

Which Google Cloud service provides a managed environment for prompt engineering and model evaluation?

A.AI Platform Notebooks

B.Dialogflow CX

C.Vertex AI Generative AI Studio

D.Cloud Composer

AnswerC

This service provides tools for prompt design and evaluation.

Why this answer

Vertex AI Generative AI Studio is the correct answer because it is a managed service within Vertex AI specifically designed for prompt engineering, model tuning, and evaluation of generative AI models. It provides a no-code interface for testing prompts, comparing model outputs, and iterating on prompt design, directly supporting the workflow described in the question.

Exam trap

The trap here is that candidates may confuse Vertex AI Generative AI Studio with AI Platform Notebooks, assuming that any model development environment supports prompt engineering, when in fact Generative AI Studio is the specialized tool for that purpose.

How to eliminate wrong answers

Option A is wrong because AI Platform Notebooks is a managed Jupyter notebook service for custom model development and training, not a dedicated environment for prompt engineering or model evaluation. Option B is wrong because Dialogflow CX is a conversational AI platform for building chatbots and virtual agents, focused on intent classification and dialogue management, not prompt engineering or model evaluation. Option D is wrong because Cloud Composer is a managed Apache Airflow service for workflow orchestration and scheduling, unrelated to prompt engineering or model evaluation.

Full explanation →

794

MCQmedium

A data scientist is using the Vertex AI PaLM API for text generation. They notice that the model occasionally generates toxic content. Which parameter should they adjust to reduce the likelihood of toxic outputs?

A.max_output_tokens

B.temperature

C.top_k

D.safety_settings

AnswerD

safety_settings can block toxic content based on thresholds.

Why this answer

Safety settings in the Vertex AI PaLM API allow you to configure thresholds for filtering harmful content categories (e.g., toxicity, harassment, hate speech). By adjusting these settings, you can block or reduce the likelihood of toxic outputs before they are returned, directly addressing the problem without altering the model's creativity or randomness.

Exam trap

The trap here is that candidates often confuse parameters that control output randomness (temperature, top_k) with those that enforce content safety, leading them to incorrectly select temperature or top_k instead of the dedicated safety_settings parameter.

How to eliminate wrong answers

Option A is wrong because max_output_tokens controls the maximum length of the generated text, not the content safety or toxicity. Option B is wrong because temperature adjusts the randomness of token sampling, influencing creativity but not filtering toxic content. Option C is wrong because top_k limits the number of highest-probability tokens considered at each step, affecting diversity but not safety filtering.

Full explanation →

795

MCQhard

A machine learning engineer submits the above batch prediction job for a large language model. The job is expected to process 100,000 instances. The job takes much longer than expected. Which change would most likely reduce the execution time?

A.Increase maxReplicaCount to 10

B.Increase startingReplicaCount to 10 without changing maxReplicaCount

C.Increase the machine type to n1-standard-16

D.Decrease the batch size to 1

AnswerA

More replicas allow parallel processing of batch instances, drastically reducing time.

Why this answer

Increasing maxReplicaCount to 10 allows Vertex AI Batch Prediction to scale out to more worker replicas, processing the 100,000 instances in parallel. The default maxReplicaCount is often 1 or a low number, which forces sequential or limited parallel processing, causing long execution times. By raising this limit, the job can leverage horizontal scaling to distribute the workload across multiple machines, significantly reducing wall-clock time.

Exam trap

The trap here is that candidates often confuse vertical scaling (larger machine type) with horizontal scaling (more replicas), and assume that a bigger machine always speeds up batch jobs, whereas for embarrassingly parallel batch inference, increasing the number of workers is the most effective lever.

How to eliminate wrong answers

Option B is wrong because increasing startingReplicaCount without raising maxReplicaCount does not allow the job to scale beyond the existing maximum; the job will still be capped at the original maxReplicaCount, so no additional parallelism is gained. Option C is wrong because upgrading to a larger machine type (n1-standard-16) provides more CPU/memory per replica but does not increase the number of replicas; for batch inference, throughput is often bottlenecked by the number of concurrent workers, not per-worker compute, so this change may not reduce overall execution time proportionally. Option D is wrong because decreasing batch size to 1 eliminates batching entirely, increasing the number of API calls and overhead per instance, which typically increases execution time rather than reducing it.

Full explanation →

796

MCQmedium

A company deploys a fine-tuned text generation model on Vertex AI Endpoints. They want to monitor for data drift and performance degradation over time. Which GCP service should they integrate?

A.Cloud Monitoring

B.Cloud Logging

C.Vertex AI Experiments

D.Vertex AI Model Monitoring

AnswerD

Model Monitoring provides drift detection, anomaly alerts, and performance monitoring for deployed models.

Why this answer

Vertex AI Model Monitoring is the correct choice because it is specifically designed to detect data drift (changes in input data distribution) and feature attribution drift in deployed models, including fine-tuned text generation models on Vertex AI Endpoints. It provides automated alerts when model performance degrades due to shifts in production data, enabling proactive retraining or intervention.

Exam trap

The trap here is that candidates confuse general observability tools (Cloud Monitoring, Cloud Logging) with Vertex AI's purpose-built drift detection service, assuming any monitoring tool can handle model-specific data drift analysis.

How to eliminate wrong answers

Option A is wrong because Cloud Monitoring provides infrastructure-level metrics (e.g., CPU, memory, latency) but does not analyze model input data distributions or detect data drift. Option B is wrong because Cloud Logging captures raw log entries for debugging and auditing, not statistical drift detection or performance degradation analysis. Option C is wrong because Vertex AI Experiments tracks training runs and hyperparameters, not post-deployment monitoring of live endpoints.

Full explanation →

797

MCQhard

A company is using Vertex AI Model Garden to deploy a foundation model for document summarization. They notice that the model sometimes generates summaries that include factual errors. They want to reduce hallucinations without sacrificing latency. Which approach should they try first?

A.Enable Vertex AI Grounding with a curated database of documents

B.Increase the temperature parameter to make the model more confident

C.Add more safety filters to block uncertain responses

D.Fine-tune the model on a high-quality dataset of correct summaries

AnswerA

Grounding retrieves evidence to reduce hallucinations.

Why this answer

Vertex AI Grounding connects the model to a curated database of documents, allowing it to retrieve and cite factual information in real-time. This directly reduces hallucinations by grounding responses in verified sources without adding significant latency, as the retrieval step is optimized for speed. Other approaches either increase latency (fine-tuning), reduce output quality (temperature increase), or do not address factual accuracy (safety filters).

Exam trap

The trap here is that candidates often assume fine-tuning is the default fix for hallucinations, but the question prioritizes latency and immediate factual grounding, making RAG via Vertex AI Grounding the faster and more appropriate first step.

How to eliminate wrong answers

Option B is wrong because increasing the temperature parameter makes the model more random and less confident, which would likely increase hallucinations, not reduce them. Option C is wrong because safety filters block harmful or unsafe content but do not correct factual errors; they are designed for policy compliance, not factual grounding. Option D is wrong because fine-tuning requires substantial time and resources, and while it can improve accuracy, it does not provide real-time grounding against a curated database and may not reduce latency as requested.

Full explanation →

798

MCQmedium

A company wants to use generative AI to create short product videos from text descriptions. Which Google Cloud service should they consider?

A.Chirp

B.Imagen

C.Gemini

D.Veo

AnswerD

Veo is Google's model for generating high-quality videos from text prompts.

Why this answer

Veo is Google's video generation model that creates videos from text. Imagen creates images, Chirp creates audio, and Gemini is multimodal but not specialized for video generation.

Full explanation →

799

MCQhard

An AI company wants to detect whether text was generated by their own model. Which technology developed by Google is specifically designed for this purpose?

A.SynthID

B.Google's confidential computing

C.Differential privacy

D.Federated learning

AnswerA

SynthID embeds an invisible watermark that can be detected later.

Why this answer

SynthID is Google's watermarking framework for AI-generated content, including text, images, and audio.

Full explanation →

800

Multi-Selecthard

A multinational corporation deploys a generative AI chatbot across multiple regions. They need to comply with GDPR and local data residency requirements. Which THREE actions are necessary?

Select 3 answers

A.Obtain explicit consent from every user before collecting any data

B.Anonymize all training data before fine-tuning

C.Store and process data only in approved geographic regions (data residency controls)

D.Implement prompt and response logging with configurable retention policies

E.Encrypt personal data at rest and in transit using customer-managed encryption keys (CMEK)

AnswersC, D, E

Data residency controls ensure data stays within required jurisdictions.

Why this answer

To comply with GDPR and data residency, the company must encrypt personal data, store data in specific regions, and control retention of prompts/responses. Consent management is part of GDPR but not specifically about data residency. Anonymization is one approach but not a universal requirement.

Full explanation →

801

MCQmedium

A company is piloting a GenAI feature that summarizes customer support tickets. They want to measure the impact on agent productivity before rolling out to all teams. Which approach BEST evaluates the pilot?

A.Survey agents on their perception of productivity after using the tool

B.Run an A/B test where half the agents use the tool and half do not, then compare average handling time

C.Compare the cost of the API before and after deployment

D.Measure the number of summaries generated per day

AnswerB

A/B testing provides a statistically sound comparison of actual metrics.

Why this answer

A/B testing with a control group provides a rigorous comparison of productivity metrics. The other options lack a baseline for comparison.

Full explanation →

802

MCQhard

A company is deploying a generative AI model for customer support. They want to reduce hallucinations while maintaining fluency. They have a large dataset of previous support conversations. Which strategy should they prioritize?

A.Increase the beam search width to 10.

B.Implement retrieval-augmented generation (RAG) using the conversation dataset as a knowledge base.

C.Fine-tune the model on the conversation dataset.

D.Set the temperature to 0.1.

AnswerB

RAG retrieves relevant facts from the dataset, reducing hallucinations.

Why this answer

Retrieval-augmented generation (RAG) directly addresses hallucinations by grounding the model's responses in factual, retrieved data from the conversation dataset. This approach allows the model to generate fluent, contextually relevant answers while reducing the risk of inventing information, as it retrieves actual support interactions as evidence before generating a response.

Exam trap

Google Cloud often tests the misconception that tuning generation parameters (like temperature or beam search) can fix hallucinations, when in fact only grounding techniques like RAG or knowledge graph integration address the root cause of factual inaccuracy.

How to eliminate wrong answers

Option A is wrong because increasing beam search width to 10 improves output fluency by exploring more candidate sequences but does not reduce hallucinations; it may even amplify incorrect patterns if the model is prone to hallucination. Option C is wrong because fine-tuning on the conversation dataset can improve domain-specific fluency but risks overfitting to noise or biases in the data, and without retrieval, the model may still hallucinate when faced with novel queries. Option D is wrong because setting temperature to 0.1 makes the model more deterministic and less creative, which can reduce variability but does not prevent hallucinations; it may cause the model to repeat common but incorrect patterns from training data.

Full explanation →

803

MCQeasy

A developer needs to use the Vertex AI PaLM API to generate text embeddings for a large corpus of documents. Which model should they use?

A.codey-bison@001

B.textembedding-gecko@001

C.text-bison@001

D.chat-bison@001

AnswerB

This model is designed for generating embeddings.

Why this answer

Option B is correct because `textembedding-gecko@001` is the specific Vertex AI model designed for generating text embeddings, which convert text into dense vector representations. This model is optimized for semantic similarity, clustering, and retrieval tasks, making it ideal for processing a large corpus of documents. The other models are designed for code generation, text generation, or chat, not embeddings.

Exam trap

The trap here is that candidates may confuse general-purpose text generation models (like `text-bison@001`) with embedding models, assuming any 'text' model can produce embeddings, but only models with 'embedding' in the name are designed for that purpose.

How to eliminate wrong answers

Option A is wrong because `codey-bison@001` is a code generation model, not an embedding model; it generates code snippets or completes code, not vector representations of text. Option C is wrong because `text-bison@001` is a text generation model for tasks like summarization or content creation, not for producing embeddings. Option D is wrong because `chat-bison@001` is a conversational model designed for multi-turn dialogue, not for generating text embeddings.

Full explanation →

804

Multi-Selectmedium

A company wants to use GenAI to generate marketing content such as blog posts and social media updates. They need the content to be on-brand and factually accurate. Which TWO features should they use?

Select 2 answers

A.Provide few-shot examples in the prompt to set the brand tone

B.Use a longer context window to include all brand guidelines

C.Fine-tune the model on a large corpus of past marketing content

D.Enable grounding with Google Search for factual accuracy

E.Use Vertex AI Studio to design prompts with no additional grounding

AnswersA, D

Few-shot learning guides the model to produce content consistent with the brand voice.

Why this answer

Few-shot examples in prompts help maintain brand voice. Grounding with Google Search ensures factual accuracy. Vertex AI Studio is for prompt design but not directly for accuracy.

Fine-tuning may be overkill. Longer context may dilute the message.

Full explanation →

805

MCQmedium

A company is building a multilingual customer support chatbot that needs to understand and respond in 20 languages. Which Google model is most suitable for this task?

A.Chirp

B.Imagen

C.Codey

D.Gemini

AnswerD

Gemini supports many languages and multimodal understanding.

Why this answer

Gemini is a multimodal large language model (LLM) designed for understanding and generating text across multiple languages, making it the most suitable choice for a multilingual customer support chatbot. Unlike specialized models, Gemini's architecture supports over 100 languages natively, enabling it to handle the 20-language requirement without needing separate language-specific models.

Exam trap

Cisco often tests the distinction between specialized models (like Chirp for audio, Imagen for images, Codey for code) and general-purpose multimodal LLMs (like Gemini) that can handle diverse tasks including multilingual text generation.

How to eliminate wrong answers

Option A is wrong because Chirp is a speech-to-text and text-to-speech model focused on audio processing, not on understanding or generating multilingual text for a chatbot. Option B is wrong because Imagen is a text-to-image generation model, not designed for natural language understanding or multilingual text responses. Option C is wrong because Codey is a code generation model specialized in programming languages and code completion, not in handling natural language conversations across multiple human languages.

Full explanation →

806

MCQeasy

A company deploys a sentiment analysis model to classify customer reviews. The model consistently returns overly positive sentiment for all reviews, even when reviews contain negative feedback. Which technique would best resolve this issue?

A.Add a system prompt instructing the model to analyze the review for both positive and negative sentiment and output the overall classification.

B.Fine-tune the model on a dataset with an equal number of positive and negative examples.

C.Reduce the max output tokens to limit the model's tendency to generate positive language.

D.Increase the temperature parameter to reduce model confidence.

AnswerA

Prompt engineering can directly guide the model to consider all sentiment categories.

Why this answer

Option C is correct because using a system prompt with explicit instructions to detect all sentiments, including negative, guides the model to consider the full emotional range. Option A is wrong because increasing temperature adds randomness and doesn't enforce balance. Option B is wrong because adjusting max tokens only affects output length.

Option D is wrong because fine-tuning on a balanced dataset is a good practice but is not the quickest fix; prompt engineering is more immediate.

Full explanation →

807

MCQhard

A company is deploying a GenAI-powered email drafting feature. They want to control costs while maintaining low latency for real-time suggestions. Which strategy is MOST effective?

A.Batch all email drafting requests and run them every hour

B.Implement caching for frequently generated email drafts and use a smaller model variant for real-time requests

C.Use the largest available model and increase the number of tokens per request to generate more complete drafts

D.Use a large model with a longer context window to reduce the number of API calls

AnswerB

Caching avoids repeated inference for common drafts, and a smaller model reduces cost and latency for less common requests.

Why this answer

Caching common prompt-output pairs reduces API calls for repeated inputs. Choosing a smaller model balances speed and cost. Batching is for offline processing, not real-time.

Long context is more expensive.

Full explanation →

808

Multi-Selectmedium

A data architect is designing a system that uses a generative AI model to summarize customer support transcripts. The system must comply with GDPR and company policy requiring data residency in the EU. Which TWO controls should they implement?

Select 2 answers

A.Enable prompt logging and retention for 5 years for auditing

B.Use a model that has been fine-tuned on all customer transcripts globally

C.Use a non-EU region to reduce latency

D.Configure the Vertex AI endpoint to use an EU region for data processing

E.Disable prompt logging entirely to minimize data processing

AnswersD, E

Using an EU region ensures data residency within the EU.

Why this answer

Option D is correct because configuring the Vertex AI endpoint to use an EU region ensures that all data processing, including model inference and any intermediate data handling, occurs within the EU, satisfying GDPR data residency requirements. This control directly enforces geographic data localization without relying on data transfer mechanisms like Standard Contractual Clauses (SCCs).

Exam trap

Cisco often tests the misconception that disabling logging entirely is always the best GDPR control, but in practice, a balanced approach with minimal, purpose-limited logging (e.g., with data masking) is often more compliant and operationally viable than complete elimination of logs.

Full explanation →

809

MCQhard

A startup uses a generative AI model to create marketing content. They plan to sell the generated content commercially. What is the most important legal consideration regarding copyright?

A.The model automatically owns the copyright to all generated content

B.They should apply SynthID watermarking to all generated content

C.They can use any generated content freely as long as they attribute the model

D.They must verify the training data provenance to ensure no copyrighted material was used without permission

AnswerD

Training data provenance is key to avoiding copyright infringement claims on generated outputs.

Why this answer

Option D is correct because under current copyright law, the user of a generative AI system is typically considered the author of the output, but only if the training data was lawfully obtained. If the model was trained on copyrighted works without permission, the generated content may be considered a derivative work, exposing the startup to infringement liability. Verifying training data provenance is therefore the most critical legal step before commercializing AI-generated content.

Exam trap

Cisco often tests the misconception that AI-generated content is automatically free to use or that technical safeguards like watermarking replace legal due diligence, when in fact the core legal risk lies in the provenance of the training data.

How to eliminate wrong answers

Option A is wrong because under current US Copyright Office guidance and most international frameworks, AI models themselves cannot hold copyright; copyright vests in the human user who provides sufficient creative input or direction. Option B is wrong because SynthID watermarking is a technical tool for identifying AI-generated content, not a legal mechanism for copyright ownership or clearance; it does not resolve infringement risks from training data. Option C is wrong because attribution to the model does not grant legal permission to use copyrighted material; fair use or licensing must be established independently, and simply crediting the AI does not satisfy copyright law requirements.

Full explanation →

810

MCQmedium

A developer is using the Vertex AI PaLM API and receives a 429 Resource Exhausted error. What is the most likely cause?

A.The request payload is too large

B.The user has exceeded the allowed number of requests per minute

C.The model is not available in the current region

D.The API key is invalid

AnswerB

429 means too many requests, exceeding quota.

Why this answer

429 errors indicate rate limiting or quota exhaustion for the API.

Full explanation →

811

MCQmedium

A researcher wants to adapt a large language model for a specialized medical terminology domain without retraining the entire model. Which fine-tuning method is MOST parameter-efficient?

A.In-context learning with 50 examples

B.Adapter-based fine-tuning using LoRA

C.RLHF (Reinforcement Learning from Human Feedback)

D.Full supervised fine-tuning of all model weights

AnswerB

LoRA injects trainable low-rank matrices into the model, updating only a tiny fraction of parameters while achieving strong performance.

Why this answer

LoRA (Low-Rank Adaptation) is the most parameter-efficient fine-tuning method because it injects trainable low-rank matrices into the transformer layers, updating only a tiny fraction (often <1%) of the model's parameters while keeping the original weights frozen. This allows the model to adapt to specialized medical terminology without the memory and compute cost of full fine-tuning, making it ideal for domain adaptation with limited resources.

Exam trap

Cisco often tests the distinction between 'fine-tuning' and 'prompt engineering'—the trap here is that candidates mistake in-context learning (Option A) for a fine-tuning method because it adapts behavior, but it does not update model parameters, making it ineligible as a parameter-efficient fine-tuning technique.

How to eliminate wrong answers

Option A is wrong because in-context learning with 50 examples does not modify model weights at all; it relies on the prompt context window, which is limited in length and cannot reliably encode specialized medical terminology for consistent generation, making it a zero-shot/prompting technique, not a fine-tuning method. Option C is wrong because RLHF is a training paradigm that aligns model outputs with human preferences using a reward model, but it is not parameter-efficient—it typically requires full model fine-tuning or at least significant weight updates, and it is designed for alignment, not domain-specific knowledge injection. Option D is wrong because full supervised fine-tuning updates all model weights, which is extremely parameter-inefficient (requires storing and computing gradients for billions of parameters), prone to catastrophic forgetting, and demands substantial computational resources, contradicting the requirement for parameter efficiency.

Full explanation →

812

MCQeasy

A retail company wants to deploy a generative AI chatbot to assist customers with product recommendations. The chatbot must align with the company's brand voice and provide accurate, up-to-date information. Which strategy should the company prioritize when developing this solution?

A.Ground the model with proprietary product data and brand guidelines in a retrieval-augmented generation (RAG) architecture.

B.Use a generic pre-trained model without customization to reduce development time.

C.Deploy a large language model with a feedback loop to iteratively improve responses.

D.Train the model on public customer reviews to capture common preferences.

AnswerA

RAG with curated data ensures responses are accurate, up-to-date, and on-brand.

Why this answer

Option A is correct because retrieval-augmented generation (RAG) allows the chatbot to ground its responses in the company's proprietary product data and brand guidelines, ensuring factual accuracy and brand consistency. By retrieving relevant information from a curated knowledge base at inference time, the model can provide up-to-date recommendations without requiring retraining, which is critical for a retail environment with frequently changing inventory.

Exam trap

Google Cloud often tests the distinction between fine-tuning and RAG, where candidates mistakenly believe that fine-tuning on historical data is sufficient for real-time accuracy, but the trap here is that only RAG can provide up-to-date grounding without retraining.

How to eliminate wrong answers

Option B is wrong because using a generic pre-trained model without customization will produce responses that lack the company's specific brand voice and may hallucinate product details, leading to inaccurate recommendations. Option C is wrong because deploying a large language model with only a feedback loop does not address the need for accurate, up-to-date information; feedback loops improve responses over time but do not ground the model in proprietary data, so initial outputs can still be incorrect. Option D is wrong because training on public customer reviews introduces noise, bias, and outdated opinions, and does not align with the company's brand guidelines or provide accurate product information.

Full explanation →

813

MCQeasy

A team wants to build a GenAI application that can interact with external APIs (e.g., to check inventory or place orders). Which Vertex AI component provides this capability?

A.Grounding with Google Search

B.Model Garden

C.Vertex AI Agent Builder

D.Vertex AI Extensions

AnswerD

Extensions enable agents to connect to external APIs and execute actions.

Why this answer

Extensions allow agents to call external APIs and perform actions beyond text generation. Agent Builder is the platform but Extensions provide the specific API integration. Model Garden and Grounding do not support API calls.

Full explanation →

814

MCQmedium

A retailer is building a product recommendation chatbot using Vertex AI Agent Builder. They want the agent to answer questions about product availability, prices, and promotions, but also to escalate to a human agent when the query is complex. What should they configure in Agent Builder?

A.Create a playbook with a step that transfers to a human via a webhook

B.Define an agent with a 'handoff to human' intent and configure the corresponding flow

C.Integrate a tool that calls a human support API when confidence is low

D.Use Vertex AI Agent Builder's generative fallback to automatically escalate

AnswerB

Agent Builder supports handoff to a human agent through intent and flow configuration.

Why this answer

Option A is correct because Agent Builder allows defining conversation flows with escalation to a live agent. Option B (generative fallback) only handles unknown queries, not escalation. Option C (tool integration) is for external APIs, not human takeover.

Option D (playbooks) define steps but not escalation triggers.

Full explanation →

815

MCQmedium

A company is using Gemini to generate marketing copy. They want the outputs to be more creative and varied. Which generation parameters should they adjust?

A.Set temperature to 0 and top-k to 1

B.Decrease temperature and increase top-k

C.Increase temperature and adjust top-p to a higher value

D.Increase the context window length

AnswerC

Higher temperature increases randomness; higher top-p allows a wider pool of tokens, boosting creativity.

Why this answer

Option C is correct because increasing temperature raises the randomness of token selection, making outputs more creative and varied, while adjusting top-p to a higher value (e.g., 0.9) allows the model to sample from a larger cumulative probability mass of likely tokens, further increasing diversity. Together, these parameters directly control the stochasticity of generation, which is essential for creative marketing copy.

Exam trap

Cisco often tests the misconception that increasing context length or adjusting a single parameter (like top-k) is sufficient for creativity, when in fact temperature and top-p must be increased together to achieve controlled randomness without generating gibberish.

How to eliminate wrong answers

Option A is wrong because setting temperature to 0 and top-k to 1 forces deterministic, greedy decoding (always picking the most probable token), which eliminates creativity and variation entirely. Option B is wrong because decreasing temperature reduces randomness, making outputs more focused and repetitive, which is the opposite of what is needed for creativity. Option D is wrong because increasing the context window length only allows the model to consider more input tokens (e.g., longer prompt or history), but does not affect the randomness or diversity of token selection during generation.

Full explanation →

816

Multi-Selectmedium

A company is deploying a generative AI model for medical diagnosis support. Which THREE considerations are critical for responsible AI?

Select 3 answers

A.Ensure the training data is diverse and representative.

B.Maximize model throughput to handle high volumes.

C.Implement human oversight for all diagnostic suggestions.

D.Provide clear disclaimers about the model's limitations.

E.Use the cheapest model to reduce costs.

AnswersA, C, D

Diverse data reduces bias.

Why this answer

Option A is correct because diverse and representative training data is critical for responsible AI in medical diagnosis. If the data lacks diversity, the model may exhibit bias, leading to inaccurate or harmful diagnoses for underrepresented groups. This directly impacts fairness, safety, and regulatory compliance in healthcare AI.

Exam trap

Google Cloud often tests the distinction between operational metrics (like throughput or cost) and ethical/regulatory requirements (like fairness, transparency, and human oversight) in responsible AI, leading candidates to mistakenly select performance-based options as critical considerations.

Full explanation →

817

MCQeasy

A startup wants to quickly prototype a conversational AI application using Gemini. They need free access during development and do not require VPC controls. Which access tier should they choose?

A.Vertex AI on a pay-as-you-go basis

B.Vertex AI with VPC-SC

C.Gemini API via Cloud Run

D.Google AI Studio free tier

AnswerD

Google AI Studio provides free prototyping without VPC controls.

Why this answer

Option D is correct because Google AI Studio's free tier provides free access to Gemini models for rapid prototyping without requiring VPC controls or any billing setup. This aligns directly with the startup's need for quick, cost-free development iteration before moving to production.

Exam trap

Cisco often tests the misconception that any Google Cloud service requires a billing account, but Google AI Studio's free tier explicitly bypasses this for prototyping, while options like Vertex AI or Cloud Run always incur costs even at low usage.

How to eliminate wrong answers

Option A is wrong because Vertex AI on a pay-as-you-go basis incurs costs from the start, which contradicts the requirement for free access during development. Option B is wrong because Vertex AI with VPC-SC adds unnecessary VPC security controls and costs, which the startup explicitly does not need. Option C is wrong because the Gemini API via Cloud Run requires a billing account and incurs compute and API usage costs, making it not free for prototyping.

Full explanation →

818

Multi-Selecthard

Which THREE factors should be considered when choosing between a fine-tuned model and a prompted foundation model for a generative AI solution? (Select 3)

Select 3 answers

A.Need for domain-specific vocabulary

B.Inference latency requirements

C.Size of training data available

D.Whether the model is open-source

E.Token cost per request

AnswersA, C, E

Fine-tuning can incorporate domain language.

Why this answer

Option A is correct because fine-tuning allows the model to learn domain-specific vocabulary and terminology that may not be well-represented in the foundation model's pre-training data. This is critical for specialized fields like legal, medical, or technical domains where precise language is required for accurate outputs.

Exam trap

Google Cloud often tests the misconception that inference latency is a deciding factor between fine-tuning and prompting, when in reality both can be optimized for speed, and the key differentiators are data availability, domain specificity, and cost per token.

Full explanation →

819

MCQeasy

What is the purpose of grounding in Vertex AI?

A.To improve training speed

B.To connect model outputs to verifiable sources

C.To reduce model size for faster inference

D.To enable multi-modal inputs

AnswerB

Grounding ensures the model's responses are based on authoritative information.

Why this answer

Grounding in Vertex AI connects model outputs to verifiable, external sources of information (such as Google Search, enterprise data sources, or third-party databases) to reduce hallucinations and improve factual accuracy. By referencing grounded sources, the model can provide citations and allow users to verify claims, which is critical for enterprise applications requiring trust and compliance.

Exam trap

Google Cloud often tests grounding by conflating it with fine-tuning or prompt engineering, so the trap here is assuming grounding modifies the model's weights or training process, when in fact it is a retrieval-based augmentation layer applied at inference time.

How to eliminate wrong answers

Option A is wrong because grounding does not improve training speed; it is a runtime technique applied during inference to augment responses with real-time data, not a training optimization. Option C is wrong because grounding does not reduce model size or accelerate inference; it may actually add latency due to the retrieval step. Option D is wrong because grounding is not about enabling multi-modal inputs; it specifically addresses output verification and source attribution, whereas multi-modal support is a separate capability for processing images, audio, or video alongside text.

Full explanation →

820

MCQmedium

A company is building a customer service chatbot using Vertex AI Agent Builder. The chatbot needs to answer questions based on a large internal knowledge base stored in a Cloud Storage bucket. The team wants to ensure the model can reference the latest documents without fine-tuning. Which configuration should they use?

A.Fine-tune a model on the knowledge base documents

B.Use a pre-built model with no additional configuration

C.Store the documents in BigQuery and use a BigQuery connector

D.Ground the model with a Vertex AI Search data store connected to the Cloud Storage bucket

AnswerD

Grounding enables retrieval-augmented generation from the latest documents.

Why this answer

Option C is correct because Vertex AI Agent Builder can use grounding with Cloud Storage to dynamically retrieve information from documents without fine-tuning. Option A is wrong because using a pre-built model without retrieval would not incorporate the knowledge base. Option B is wrong because fine-tuning is not needed and would require retraining.

Option D is wrong because exporting to BigQuery adds unnecessary complexity.

Full explanation →

821

MCQhard

A financial services firm is deploying a generative AI chatbot for customer inquiries. Due to regulatory requirements, all answers must be traceable to specific source documents and must not include information beyond those documents. Which approach BEST satisfies these requirements?

A.Use in-context learning by providing all documents in the prompt each time

B.Use prompt engineering with a strict instruction to only use the provided documents

C.Fine-tune a model on the source documents and use a high temperature for creativity

D.Use RAG with a vector store containing only the approved documents, and enable grounding

AnswerD

RAG retrieves from the approved documents, and grounding links each response to the retrieved sources, ensuring traceability and restricting knowledge.

Why this answer

Option D is correct because Retrieval-Augmented Generation (RAG) with a vector store ensures that the model retrieves content exclusively from the approved source documents, and grounding (e.g., via Azure OpenAI Grounding or AWS Bedrock Knowledge Bases) enforces that the generated response is directly traceable to those retrieved passages. This architecture inherently prevents hallucination or inclusion of external knowledge, satisfying regulatory traceability and scope requirements.

Exam trap

Cisco often tests the misconception that prompt engineering or in-context learning alone can reliably constrain model behavior, when in fact only architectural approaches like RAG with grounding provide the deterministic traceability required for regulated industries.

How to eliminate wrong answers

Option A is wrong because in-context learning with all documents in the prompt is impractical for large document sets due to token limits (e.g., 4K–32K tokens) and does not guarantee the model will not use its internal knowledge; it also lacks a retrieval mechanism to pinpoint specific source passages. Option B is wrong because prompt engineering with a strict instruction is a soft constraint that the model can still violate (e.g., hallucinate or use pre-training knowledge), as it does not enforce retrieval or grounding at the architecture level. Option C is wrong because fine-tuning on source documents with high temperature increases randomness and creativity, which directly contradicts the requirement to avoid generating information beyond the documents; high temperature amplifies the risk of hallucination.

Full explanation →

822

MCQeasy

What is the main advantage of using a model with a larger context window?

A.Better performance on image generation tasks

B.Lower cost per API call

C.Ability to process longer documents or conversations without truncation

D.Faster inference speed

AnswerC

More tokens can be processed in a single forward pass, enabling handling of longer content.

Why this answer

A larger context window allows the model to process and retain more tokens in a single pass, enabling it to handle longer documents, extended conversations, or large codebases without needing to truncate or chunk the input. This is critical for tasks like summarizing entire research papers, maintaining coherent multi-turn dialogues, or analyzing long legal contracts, where preserving full context directly impacts output quality and accuracy.

Exam trap

Cisco often tests the misconception that 'bigger is always better' by pairing a clear benefit (longer context) with attractive but unrelated options like lower cost or faster speed, tempting candidates to conflate model capability with operational efficiency.

How to eliminate wrong answers

Option A is wrong because context window size is a token-length constraint for text and multimodal inputs, not a direct factor in image generation quality; image generation performance depends on model architecture, training data, and diffusion/transformer design, not context length. Option B is wrong because larger context windows typically increase computational cost per API call due to the quadratic scaling of attention mechanisms (e.g., O(n²) in standard transformers), leading to higher latency and cost, not lower. Option D is wrong because processing more tokens in a larger context window generally increases inference time, as the model must attend to a longer sequence; faster inference is achieved through model optimization, quantization, or pruning, not by expanding context length.

Full explanation →

823

Multi-Selecthard

A health-tech startup is fine-tuning a generative AI model on electronic health records (EHR) to assist in clinical decision support. They need to ensure responsible AI practices. Which THREE measures should they implement? (Select three.)

Select 3 answers

A.Automate all decisions to reduce human error

B.Evaluate model outputs for bias across different demographic groups

C.Require a human clinician to review all AI-generated recommendations before action

D.Publish a Model Card that describes the model's intended use, limitations, and performance

E.Train the model exclusively on data from a single hospital to ensure consistency

AnswersB, C, D

Bias evaluation ensures the model is fair across populations, which is critical in healthcare.

Why this answer

Responsible AI in healthcare requires human oversight for high-stakes decisions, evaluating the model for bias (especially across demographic groups), and ensuring transparency about the model's limitations. Training data representativeness is also critical.

Full explanation →

824

MCQeasy

A developer needs to generate embeddings for text data to be used in a semantic search application. Which Google Cloud service should they use?

A.Document AI

B.Cloud Translation API

C.Cloud Speech-to-Text

D.Vertex AI Embeddings API

AnswerD

This API generates text embeddings using foundation models.

Why this answer

Vertex AI Embeddings API is the correct choice because it provides a managed service to generate vector embeddings from text data, which are essential for semantic search applications that rely on understanding meaning rather than exact keyword matches. This API leverages large language models to convert text into high-dimensional vectors, enabling efficient similarity search using vector databases or nearest neighbor algorithms.

Exam trap

The trap here is that candidates may confuse Document AI's ability to extract text from documents with the need to generate embeddings from that text, overlooking that embedding generation is a separate, specialized step required for semantic search.

How to eliminate wrong answers

Option A is wrong because Document AI is designed for document processing tasks like OCR, parsing, and extraction of structured data from documents, not for generating text embeddings. Option B is wrong because Cloud Translation API is used for translating text between languages, not for creating vector representations of text for semantic search. Option C is wrong because Cloud Speech-to-Text converts audio to text, but does not generate embeddings or support semantic search directly.

Full explanation →

825

MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Fine-tune a base LLM on the policy documents monthly

B.Use a larger foundation model with a longer context window and paste all documents into each prompt

C.Train a custom model from scratch on the policy documents each month

D.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

AnswerD

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

Retrieval-Augmented Generation (RAG) is the most appropriate approach because it allows the chatbot to answer questions using the latest policy documents without retraining the model. By indexing the documents in a vector store and retrieving relevant chunks at query time, RAG ensures the model's responses are grounded in the most current information, even as documents are updated monthly. This avoids the cost and complexity of fine-tuning or retraining a model each time the documents change.

Exam trap

Cisco often tests the misconception that fine-tuning or retraining is necessary for domain-specific knowledge, when in fact RAG provides a cost-effective, dynamic alternative that avoids model updates and maintains up-to-date information.

How to eliminate wrong answers

Option A is wrong because fine-tuning a base LLM on the policy documents monthly would require significant computational resources and time for each update, and the model may still produce outdated or hallucinated answers if the fine-tuning data is not perfectly aligned with the latest documents. Option B is wrong because pasting all policy documents into each prompt would exceed the context window limits of even the largest foundation models (e.g., 128K tokens for GPT-4 Turbo), leading to truncated inputs, high latency, and increased cost per query. Option C is wrong because training a custom model from scratch each month is prohibitively expensive, time-consuming, and requires large amounts of training data and expertise, making it impractical for monthly document updates.

Full explanation →

Page 11 of 14

All pages

1 2 3 4 5 6 7 8 9 10 11 12 13 14

Practice Generative AI Leader by domain

Target a specific domain to shore up weak areas.

Fundamentals of Generative AI Business Strategies for Generative AI Solutions Generative AI Concepts and Technologies Google AI Ecosystem and Strategy Responsible AI and Data Governance Google Cloud's Generative AI Offerings Techniques to Improve Generative AI Model Output Applying Generative AI in Business

See all domains with question counts →

Google Cloud Generative AI Leader Generative AI Leader Generative AI Leader Questions 751–825 | Page 11/14 | Courseiva