Knowledge + Practice

Google Cloud Generative AI Leader Generative AI Leader (Generative AI Leader) — Questions 601–675

997 questions total · 14pages · All types, answers revealed

Take a mock exam Exam hub

Page 9 of 14

601

MCQhard

A company deploys a GenAI-powered code review assistant. During evaluation, they find that the assistant often suggests security vulnerabilities as improvements. What is the MOST likely cause?

A.The model was trained on a dataset with many insecure code examples

B.The model's temperature is set too low

C.The model is too small for code generation tasks

D.The prompt does not include a security constraint

AnswerA

Training data bias toward insecure code can cause the model to suggest vulnerabilities.

Why this answer

The most likely cause is that the model was trained on a dataset containing many insecure code examples. A GenAI code review assistant learns patterns from its training data; if that data includes prevalent security vulnerabilities (e.g., SQL injection, buffer overflows), the model will internalize those patterns as 'normal' or even 'desirable' improvements. This leads to the assistant suggesting insecure code changes because it is statistically replicating the flawed logic it was exposed to during training.

Exam trap

Cisco often tests the misconception that prompt engineering alone (e.g., adding a security constraint) can override fundamental training data biases, when in fact the model's learned weights from the training corpus are the dominant factor in output quality.

How to eliminate wrong answers

Option B is wrong because setting the temperature too low (e.g., near 0) makes the model more deterministic and conservative, reducing randomness and the likelihood of suggesting unusual or insecure patterns; it would not cause the model to actively suggest vulnerabilities. Option C is wrong because model size (number of parameters) affects capability and fluency, not the tendency to generate insecure code; a small model can still produce secure suggestions if trained on secure data, while a large model trained on insecure data will replicate those flaws. Option D is wrong because while a missing security constraint in the prompt might fail to guide the model away from vulnerabilities, the root cause is the training data; even with a security constraint, a model trained on insecure examples may still suggest vulnerabilities due to its ingrained patterns, and the question asks for the 'most likely' cause, which is the data quality issue.

Full explanation →

602

MCQeasy

A healthcare startup is developing a generative AI system to assist doctors in diagnosing rare diseases. According to Google's AI Principles, what is the MOST important requirement before deployment?

A.The model must achieve at least 99% accuracy on a held-out test set

B.The model must be trained on the most recent medical literature

C.The startup must publish the model's architecture in a peer-reviewed journal

D.The system must include a mechanism for human review of all diagnostic suggestions

AnswerD

For high-stakes AI decisions like medical diagnoses, human oversight is essential to ensure accountability and patient safety.

Why this answer

Google's AI Principles state that AI systems should be built and tested for safety, especially in high-stakes domains like healthcare, where human oversight is critical.

Full explanation →

603

MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Train a custom model from scratch on the policy documents each month

B.Fine-tune a base LLM on the policy documents monthly

C.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

D.Use a larger foundation model with a longer context window and paste all documents into each prompt

AnswerC

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

RAG (Retrieval-Augmented Generation) allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining. The other options either require expensive retraining for each update or lack document grounding.

Full explanation →

604

MCQeasy

A company is using Vertex AI to deploy a text generation model for a chatbot. They want to reduce the response latency. Which configuration change is most effective?

A.Enable model quantization

B.Use a smaller model variant

C.Increase the number of GPUs

D.Use a larger batch size

AnswerB

Smaller models have faster inference, directly reducing latency.

Why this answer

Option B is correct because using a smaller model variant directly reduces the number of parameters and computational operations required per inference, which lowers latency. In Vertex AI, smaller models like `text-bison@002` have fewer layers and attention heads than larger counterparts, resulting in faster token generation without requiring hardware changes.

Exam trap

Google Cloud often tests the misconception that increasing compute resources (GPUs) or batch size always reduces latency, when in fact these optimizations target throughput, not per-request response time.

How to eliminate wrong answers

Option A is wrong because model quantization (e.g., reducing weights from FP32 to INT8) can reduce memory footprint and improve throughput, but it does not guarantee lower latency per request and may introduce accuracy trade-offs; it is not the most effective single change for latency reduction. Option C is wrong because increasing the number of GPUs can improve throughput for batch processing but does not reduce per-request latency; in fact, it may increase communication overhead and cost without speeding up individual inference. Option D is wrong because using a larger batch size increases throughput for concurrent requests but actually increases the latency for each individual request, as the model processes more sequences together before returning results.

Full explanation →

605

MCQmedium

A company is using Vertex AI Agent Builder to create a travel booking agent. They want the agent to book flights and hotels dynamically. What action type should they use?

A.Dynamic call

B.Static call

C.Webhook

D.Notification

AnswerC

Webhooks allow dynamic external API calls for booking.

Why this answer

Option C is correct because Vertex AI Agent Builder uses webhooks to integrate with external systems for dynamic, real-time operations like booking flights and hotels. A webhook allows the agent to make HTTP calls to external APIs (e.g., a travel booking service) to fetch or update data during a conversation, enabling dynamic booking actions. Static or notification actions cannot handle the two-way, real-time data exchange required for live reservations.

Exam trap

The trap here is that candidates confuse 'dynamic call' (a generic term) with the actual Vertex AI Agent Builder mechanism, or assume 'notification' can handle bidirectional data exchange, when only webhooks provide the required synchronous HTTP callback for real-time operations.

How to eliminate wrong answers

Option A is wrong because 'Dynamic call' is not a recognized action type in Vertex AI Agent Builder; the platform uses webhooks for dynamic interactions, not a separate 'dynamic call' concept. Option B is wrong because 'Static call' refers to predefined, non-interactive responses or data lookups that cannot handle real-time booking logic or external API calls. Option D is wrong because 'Notification' is a one-way push mechanism (e.g., sending alerts) and does not support the request-response pattern needed to execute a booking transaction.

Full explanation →

606

MCQhard

A researcher wants to use Google's AlphaFold for a project. What is the primary capability of AlphaFold?

A.Generating realistic human speech

B.Playing the game of Go at superhuman level

C.Predicting 3D protein structures from amino acid sequences

D.Generating code from natural language descriptions

AnswerC

AlphaFold is known for protein structure prediction.

Why this answer

AlphaFold, developed by Google DeepMind, is specifically designed to predict the 3D structure of proteins from their amino acid sequences. This capability solves a fundamental challenge in biology, as the function of a protein is largely determined by its 3D shape, and experimental methods like X-ray crystallography are time-consuming and expensive. AlphaFold achieves this using a deep learning architecture that integrates multiple sequence alignment (MSA) and pairwise distance predictions to model the spatial coordinates of atoms.

Exam trap

Cisco often tests the distinction between different Google DeepMind projects (AlphaGo vs. AlphaFold vs. AlphaZero), so the trap here is confusing the domain of game-playing AI with the domain of scientific prediction, leading candidates to pick Option B if they recall AlphaGo's fame but not AlphaFold's specific purpose.

How to eliminate wrong answers

Option A is wrong because generating realistic human speech is the primary capability of text-to-speech models like WaveNet or Tacotron, not AlphaFold, which focuses on structural biology. Option B is wrong because playing the game of Go at superhuman level is the achievement of AlphaGo and AlphaZero, which use reinforcement learning and Monte Carlo tree search, not the protein folding task AlphaFold was built for. Option D is wrong because generating code from natural language descriptions is the domain of large language models like Codex or GPT-4, not AlphaFold, which has no code generation functionality.

Full explanation →

607

MCQeasy

A data analyst needs to run a simple regression model directly on data stored in BigQuery without moving data to another platform. Which service should they use?

A.TensorFlow on Compute Engine

B.BigQuery ML

C.Vertex AI Training

D.Google Colab

AnswerB

BigQuery ML enables ML via SQL on BigQuery data.

Why this answer

BigQuery ML (B) is correct because it allows users to create and execute machine learning models using standard SQL syntax directly on data stored in BigQuery, without needing to export data to a separate platform. This service is specifically designed for running regression, classification, and other models natively within BigQuery, leveraging its serverless architecture and built-in ML capabilities.

Exam trap

The trap here is that candidates often confuse Vertex AI Training (a full-featured ML platform) with BigQuery ML, not realizing that Vertex AI requires data export and more setup, while BigQuery ML is purpose-built for in-database modeling with minimal overhead.

How to eliminate wrong answers

Option A is wrong because TensorFlow on Compute Engine requires moving data out of BigQuery to a virtual machine, where you must manually manage infrastructure, install dependencies, and write custom training code, which contradicts the requirement of not moving data. Option C is wrong because Vertex AI Training is a managed ML platform that typically requires exporting data from BigQuery to Cloud Storage or a dataset in Vertex AI, and it involves more complex pipeline setup than a simple regression model. Option D is wrong because Google Colab is a Jupyter notebook environment that runs in the cloud but requires data to be loaded from BigQuery into a DataFrame, moving it out of BigQuery's native storage, and it does not provide a direct SQL-based modeling interface.

Full explanation →

608

MCQeasy

In the transformer architecture, what is the role of the attention mechanism?

A.It normalizes the output of each layer

B.It decides which parts of the input to focus on when generating each token

C.It predicts the next token directly

D.It converts tokens into numerical vectors

AnswerB

Attention computes relevance scores between tokens, allowing the model to focus on relevant parts of the input.

Why this answer

The attention mechanism in the Transformer architecture computes a weighted sum of all input token representations, allowing the model to dynamically focus on the most relevant parts of the input sequence when generating each output token. This is achieved through learned query, key, and value projections that produce attention scores, enabling the model to capture long-range dependencies and contextual relationships. Option B correctly identifies this core function of selectively attending to input elements during token generation.

Exam trap

Cisco often tests the distinction between the attention mechanism's role in focusing on input parts versus the final prediction layer's role in outputting the next token, leading candidates to mistakenly select Option C.

How to eliminate wrong answers

Option A is wrong because normalization of layer outputs is performed by layer normalization, not the attention mechanism; attention computes relevance weights, not normalization statistics. Option C is wrong because predicting the next token directly is the role of the final linear layer and softmax over the vocabulary, while the attention mechanism provides contextualized representations that feed into that prediction. Option D is wrong because converting tokens into numerical vectors is the function of the embedding layer (token embeddings), not the attention mechanism, which operates on those vectors to compute attention scores.

Full explanation →

609

MCQmedium

A data scientist observes that a text generation model consistently produces outputs that stereotype certain genders. According to Google's AI Principles, what is the BEST first step?

A.Evaluate the model's bias using a diverse test set across genders

B.Immediately stop using the model and delete it

C.Fine-tune the model on a gender-balanced dataset

D.Add a disclaimer that the model may exhibit bias

AnswerA

Evaluation is the first step to quantify bias and inform mitigation strategies.

Why this answer

Option A is correct because Google's AI Principles emphasize that the first step in addressing bias is to evaluate and measure it using appropriate tools and diverse datasets. This aligns with Principle #2: 'Avoid creating or reinforcing unfair bias,' which requires testing models across relevant demographic groups before taking corrective action. Without evaluation, any subsequent mitigation steps would lack a baseline and could be ineffective or counterproductive.

Exam trap

Cisco often tests the misconception that mitigation (like fine-tuning or disclaimers) should be the immediate response, rather than the correct first step of systematic evaluation and measurement of bias.

How to eliminate wrong answers

Option B is wrong because immediately stopping use and deleting the model is an overreaction that violates the principle of 'Be socially beneficial' — the model may still provide value if bias is addressed, and deletion prevents any learning from the bias. Option C is wrong because fine-tuning on a gender-balanced dataset is a mitigation step that should only be taken after evaluation to understand the specific nature and extent of the bias; premature fine-tuning could introduce new biases or fail to address root causes. Option D is wrong because adding a disclaimer is a transparency measure, not a first step — it acknowledges bias without measuring or understanding it, which violates the principle of 'Be accountable to people' by avoiding proactive bias detection.

Full explanation →

610

MCQhard

You are a generative AI architect at a social media company. You are tasked with building a content moderation system that uses a generative model to flag toxic comments. The system must have very low false positive rates (i.e., not flag harmless comments) to avoid user backlash, but it must catch nearly all toxic comments. You have a large dataset of labeled toxic and non-toxic comments. You plan to use a pre-trained LLM and fine-tune it for classification. During experimentation, you notice that the model's recall for toxic comments is high (95%) but its precision is low (60%), leading to many false positives. You need to improve precision without substantially reducing recall. Which approach should you try first?

A.Gather additional toxic comments from similar platforms to augment the training data.

B.Apply a higher weight to the toxic class in the loss function during fine-tuning.

C.Use a smaller pre-trained model that is inherently less sensitive to subtle toxic language.

D.Tune the classification threshold on a held-out validation set to a higher value (e.g., require higher probability to classify as toxic).

AnswerD

Increasing the threshold reduces false positives (improves precision) with some loss in recall, which can be fine-tuned.

Why this answer

Option D is correct because tuning the classification threshold to a higher value directly addresses the low precision (high false positive rate) by requiring a higher confidence level before labeling a comment as toxic. This reduces false positives while maintaining high recall, as the model's underlying learned representations remain unchanged. The threshold adjustment is a standard post-hoc calibration technique that trades off precision and recall without retraining or altering the model architecture.

Exam trap

Cisco often tests the misconception that modifying the training data or loss function is the first step to fix precision issues, when in fact a simple threshold adjustment is the most direct and least risky intervention.

How to eliminate wrong answers

Option A is wrong because gathering additional toxic comments from similar platforms would primarily increase the recall (sensitivity) for toxic comments, not precision; it may even exacerbate false positives if the new data introduces noise or shifts the decision boundary. Option B is wrong because applying a higher weight to the toxic class in the loss function during fine-tuning would force the model to focus more on correctly classifying toxic examples, which typically increases recall but can further lower precision by making the model more aggressive in flagging borderline cases. Option C is wrong because using a smaller pre-trained model that is inherently less sensitive to subtle toxic language would likely reduce both recall and precision, as smaller models have lower capacity to capture nuanced patterns, leading to more false negatives and potentially more false positives due to coarser decision boundaries.

Full explanation →

611

MCQeasy

Which Google initiative provides a set of interactive, open-source tools to help UX designers and product managers build human-centered AI products?

A.TensorFlow Privacy

B.Model Cards

C.Datasheets for Datasets

D.People + AI Guidebook (PAIR)

AnswerD

PAIR is exactly that: an interactive guidebook for designing human-centered AI.

Why this answer

The People + AI Guidebook (PAIR) is a Google initiative that provides guidelines, case studies, and design patterns for building human-centered AI.

Full explanation →

612

Multi-Selecteasy

Which TWO Google Cloud services can be used together to implement a RAG (retrieval-augmented generation) pipeline? (Select 2)

Select 2 answers

A.Cloud SQL

B.Vertex AI Vector Search

C.Bigtable

D.Vertex AI PaLM API

E.Cloud Functions

AnswersB, D

Provides vector similarity search for retrieval.

Why this answer

Vertex AI Vector Search (option B) is correct because it provides a managed vector database for storing and querying embeddings, which is essential for the retrieval step in a RAG pipeline. It enables semantic similarity search over large datasets, allowing the system to fetch relevant context documents based on a user query.

Exam trap

Google Cloud often tests the misconception that any database (like Cloud SQL or Bigtable) can serve as a vector store for RAG, but they lack native vector indexing and similarity search, making them unsuitable for efficient retrieval at scale.

Full explanation →

613

MCQmedium

A company wants to embed a generative AI writing assistant into their Google Docs workflow. The assistant should help users draft emails and reports based on prompts. Which Google Workspace feature should they leverage?

A.Google Workspace Add-ons with Vertex AI

B.Vertex AI API integrated via Apps Script

C.Gmail Smart Compose

D.Duet AI in Google Docs (Gemini for Google Workspace)

AnswerD

Duet AI provides native 'Help me write' capabilities in Docs, ideal for drafting.

Why this answer

Duet AI in Docs (now Gemini for Google Workspace) provides 'Help me write' functionality for drafting content. Apps Script can be used for custom add-ons but requires development. Vertex AI API is for external integration.

Smart Compose is for Gmail only.

Full explanation →

614

Multi-Selecteasy

Which TWO are components of the Vertex AI Generative AI Studio?

Select 2 answers

A.Dataflow

B.Model Garden

C.Pipeline templates

D.Cloud Functions

E.Prompt Editor

AnswersB, E

Model Garden is a component for discovering and selecting models.

Why this answer

Model Garden is a core component of Vertex AI Generative AI Studio that provides a curated repository of foundation models, including Google's PaLM and Gemini models, as well as third-party models. It allows users to discover, compare, and deploy these models directly within the studio environment, making it essential for generative AI workflows.

Exam trap

Google Cloud often tests the distinction between core generative AI studio components (like Model Garden and Prompt Editor) and broader GCP services (like Dataflow or Cloud Functions) that are not part of the studio, leading candidates to select familiar but incorrect options.

Full explanation →

615

MCQeasy

What is the primary purpose of Google's Content Safety filters in Vertex AI?

A.To filter out low-quality training data

B.To ensure the model only generates content from a curated set of sources

C.To block generated content that contains hate speech, violence, or sexually explicit material

D.To improve the model's accuracy on safe content

AnswerC

Content Safety filters are designed to block harmful content categories.

Why this answer

Google's Content Safety filters in Vertex AI are designed to block generated content that violates safety policies, specifically targeting hate speech, violence, and sexually explicit material. This is a core component of responsible AI deployment, ensuring that model outputs adhere to ethical guidelines and legal requirements. The filters operate by analyzing the generated text or images against predefined safety categories, not by assessing data quality or source curation.

Exam trap

The trap here is that candidates may confuse Content Safety filters with data quality filters or source restrictions, assuming they improve model accuracy or curate training data, when in fact they are purely safety mechanisms applied at inference time.

How to eliminate wrong answers

Option A is wrong because Content Safety filters are not used to filter out low-quality training data; that function is handled by data preprocessing and curation pipelines, not by inference-time safety filters. Option B is wrong because Content Safety filters do not restrict the model to a curated set of sources; they block specific types of harmful content regardless of source, and the model can still generate from its full training distribution. Option D is wrong because the primary purpose is not to improve accuracy on safe content but to prevent the generation of unsafe content; accuracy improvements are a separate concern addressed by model tuning and evaluation.

Full explanation →

616

Multi-Selectmedium

A tech company wants to ensure that their generative AI model does not produce harmful content. They plan to use Google Cloud's content safety features. Which two methods can they use to customize content safety? (Choose two.)

Select 2 answers

A.Use the default safety filters without any modifications

B.Disable all safety filters for maximum model creativity

C.Adjust safety thresholds for different categories like hate speech and violence

D.Define a custom blocklist of prohibited words or phrases

E.Train a separate model to detect harmful content

AnswersC, D

Why this answer

Vertex AI allows setting safety thresholds per category (B) and defining blocklists for specific terms (D) to customize content filtering.

Full explanation →

617

MCQhard

An AI team is building a customer support chatbot for a telecom company using a fine-tuned LLM on Vertex AI. The model performs well on common issues but fails to answer correctly for rare or novel problems, often providing plausible-sounding but incorrect solutions. The team has a large corpus of internal troubleshooting documents. They want to minimize incorrect answers while keeping latency low. Which approach should they take?

A.Switch to a larger base model (e.g., Gemini Ultra) without any retrieval.

B.Implement a retrieval-augmented generation (RAG) pipeline using Vertex AI Search to fetch relevant documents before generating answers.

C.Collect more data on rare issues and continue fine-tuning the model weekly.

D.Use a few-shot prompt with 10 examples of rare problems and solutions.

AnswerB

RAG dynamically retrieves relevant context, enabling accurate answers for rare issues.

Why this answer

Option B is correct because implementing a RAG pipeline with Vertex AI Search allows the chatbot to retrieve relevant troubleshooting documents from the internal corpus in real-time, grounding the LLM's responses in authoritative sources. This approach directly addresses the problem of plausible-sounding but incorrect answers for rare/novel issues without requiring retraining, and it keeps latency low by fetching only the most relevant documents before generation.

Exam trap

Cisco often tests the misconception that fine-tuning or larger models alone can solve knowledge gaps, when in fact retrieval-augmented generation is the standard approach for grounding LLM outputs in up-to-date, domain-specific documents without retraining.

How to eliminate wrong answers

Option A is wrong because switching to a larger base model without retrieval does not solve the core issue of hallucination on rare/novel problems; larger models can still generate plausible-sounding but incorrect answers when they lack specific knowledge, and they often increase latency and cost. Option C is wrong because collecting more data on rare issues and fine-tuning weekly is resource-intensive, may lead to catastrophic forgetting of common issues, and cannot keep pace with the long tail of novel problems that emerge dynamically. Option D is wrong because a few-shot prompt with 10 examples is insufficient to cover the vast space of rare problems, and the model may still hallucinate when the input does not closely match any example, especially without retrieval grounding.

Full explanation →

618

Multi-Selecthard

A financial services firm is deploying a generative AI model to assist in loan approval decisions. To comply with regulatory requirements for fairness and explainability, which THREE actions should they take? (Choose 3)

Select 3 answers

A.Add SynthID watermarks to all model outputs

B.Increase the model size to improve accuracy

C.Evaluate the model for bias using diverse test sets

D.Implement chain-of-thought reasoning to explain loan decisions

E.Design a human-in-the-loop process with override capability

AnswersC, D, E

Bias evaluation is essential for fairness.

Why this answer

Option C is correct because evaluating the model for bias using diverse test sets is a fundamental step in ensuring fairness in AI-driven loan approvals. This involves testing the model across demographic groups (e.g., race, gender, age) to detect disparate impact, which is required by regulations like the Equal Credit Opportunity Act (ECOA) and Fair Housing Act. Without this evaluation, the model could inadvertently discriminate, leading to legal and ethical violations.

Exam trap

Cisco often tests the distinction between technical safeguards (like watermarks) and governance actions (like bias evaluation and explainability), leading candidates to mistakenly select watermarks as a fairness measure when they are only for content attribution.

Full explanation →

619

MCQmedium

Which Google AI model was the first to demonstrate that transformers could be pre-trained bidirectionally on a large corpus, leading to major improvements in language understanding?

A.GPT-3

B.AlphaGo

C.Transformer (the paper)

D.BERT

AnswerD

BERT is a bidirectional transformer pre-trained on a large corpus, setting new benchmarks for language understanding.

Why this answer

BERT (Bidirectional Encoder Representations from Transformers) was the first model to demonstrate that transformers could be pre-trained bidirectionally on a large corpus (BooksCorpus and English Wikipedia). By using a masked language model (MLM) objective, BERT conditions on both left and right context simultaneously, unlike previous unidirectional models, leading to significant improvements on 11 NLP benchmarks at its release.

Exam trap

Cisco often tests the distinction between introducing the transformer architecture (Option C) and being the first to apply bidirectional pre-training to it (Option D), causing candidates to confuse the original Transformer paper with BERT's specific contribution.

How to eliminate wrong answers

Option A is wrong because GPT-3 is a unidirectional (autoregressive) transformer model that predicts the next token left-to-right, not bidirectionally, and it was released after BERT. Option B is wrong because AlphaGo is a reinforcement learning model for playing the board game Go, not a language model, and it uses convolutional neural networks and Monte Carlo tree search, not bidirectional transformer pre-training. Option C is wrong because the Transformer paper ("Attention Is All You Need") introduced the transformer architecture itself but did not demonstrate bidirectional pre-training on a large corpus; it was a supervised translation model, not a pre-trained language model.

Full explanation →

620

Multi-Selectmedium

Which TWO are benefits of using retrieval-augmented generation (RAG) over fine-tuning?

Select 2 answers

A.No need for training

B.Higher accuracy on all tasks

C.More up-to-date information

D.Reduced model size

E.Lower latency

AnswersA, C

RAG does not require fine-tuning; it works with the base model plus retrieval.

Why this answer

Option A is correct because RAG does not require any training or fine-tuning of the underlying model. It works by retrieving relevant documents from an external knowledge base at inference time and providing them as context to the model, which generates an answer based on that context. This eliminates the need for costly and time-consuming model retraining or parameter updates.

Exam trap

Google Cloud often tests the misconception that RAG reduces latency or model size, when in fact it increases system complexity and inference time due to the retrieval step, while fine-tuning keeps the model unchanged in size and latency.

Full explanation →

621

MCQhard

A developer is building a chatbot for a medical application that discusses sensitive health topics. The chatbot consistently gets its outputs blocked. What should the developer do?

A.Disable the safety filter entirely to allow all topics.

B.Adjust the safety category thresholds to allow VIOLENCE and SEXUAL content since it's medical.

C.Increase the input token limit to 2000.

D.Review and refine the system instructions to avoid triggering safety filters, and consider using a different model endpoint that allows medical contexts.

AnswerD

Refining prompts and using appropriate endpoints can prevent unnecessary blocks.

Why this answer

Option D is correct because safety filters in generative AI models are triggered by content that violates predefined policies, often due to ambiguous or overly broad system instructions. By refining the system instructions to explicitly frame the medical context (e.g., 'This is a clinical discussion for educational purposes'), the developer can reduce false positives. Additionally, using a model endpoint fine-tuned for medical domains (e.g., Med-PaLM 2) bypasses generic safety restrictions while maintaining compliance with ethical guidelines.

Exam trap

Cisco often tests the misconception that adjusting safety thresholds or disabling filters is a valid technical fix, when in reality the correct approach is to refine system instructions and choose appropriate model endpoints to align with domain-specific policies.

How to eliminate wrong answers

Option A is wrong because disabling the safety filter entirely violates responsible AI practices and could expose users to harmful or unmoderated content, which is especially dangerous in a medical application. Option B is wrong because adjusting thresholds to allow VIOLENCE and SEXUAL content, even for medical contexts, is a misuse of category thresholds; these categories are designed to block explicit harm, not to be repurposed for clinical terminology (e.g., anatomical terms may be misclassified). Option C is wrong because increasing the input token limit to 2000 does not address the root cause of output blocking; safety filters operate on output content, not input length, and token limits affect context window capacity, not content moderation.

Full explanation →

622

MCQeasy

A company wants to build a chatbot that answers questions using their internal knowledge base. Which approach is most suitable?

A.Use Retrieval-Augmented Generation (RAG)

B.Fine-tune a model on the knowledge base

C.Train a new model from scratch

D.Use zero-shot prompting with no context

AnswerA

RAG retrieves relevant context and generates answers, perfect for knowledge base Q&A.

Why this answer

Retrieval-Augmented Generation (RAG) combines retrieval of relevant documents from a knowledge base with generative responses, making it ideal for this use case.

Full explanation →

623

MCQeasy

Which of the following is a key principle in Google's AI Principles that directly addresses the need to avoid creating or reinforcing unfair bias?

A.Avoid creating or reinforcing unfair bias

B.Uphold high standards of scientific excellence

C.Be socially beneficial

D.Be accountable to people

AnswerA

This is the exact principle that directly addresses unfair bias.

Why this answer

Option A is correct because Google's AI Principles explicitly state 'Avoid creating or reinforcing unfair bias' as a standalone principle. This principle directly mandates that AI systems must be designed and tested to mitigate biases in training data, model outputs, and deployment contexts, ensuring fairness across demographic groups. It is the most direct response to the question's focus on avoiding unfair bias.

Exam trap

Cisco often tests the distinction between principles that directly address bias versus those that are related but broader, so candidates may confuse 'Be socially beneficial' or 'Be accountable to people' as the correct answer because they seem to cover fairness, but they lack the explicit focus on avoiding unfair bias.

How to eliminate wrong answers

Option B is wrong because 'Uphold high standards of scientific excellence' addresses rigor, reproducibility, and methodological soundness, not the specific mitigation of unfair bias. Option C is wrong because 'Be socially beneficial' is a broader principle about overall positive impact, which includes but does not specifically target the avoidance of unfair bias. Option D is wrong because 'Be accountable to people' focuses on transparency, oversight, and redress mechanisms, not the direct prevention of bias in model design or data.

Full explanation →

624

MCQmedium

Refer to the exhibit. A data scientist runs the gcloud command and sees the model listed. However, when they try to deploy the model to an endpoint, they get an error: 'Model is not deployable'. What is the most likely reason?

A.The model is still in training and not yet ready.

B.The model was imported from a custom container but without a serving specification or artifact.

C.The model does not have the correct IAM permissions assigned to the deployment service account.

D.The region for the endpoint is different from the model's region.

AnswerB

A model must have a serving container or artifacts to be deployable.

Why this answer

Option B is correct because a model imported from a custom container must include a serving specification (e.g., a `predict` route) and an artifact (e.g., a saved model file) to be deployable. Without these, Vertex AI cannot determine how to serve predictions, resulting in the 'Model is not deployable' error. The `gcloud` command listing the model only confirms its registration, not its readiness for deployment.

Exam trap

Google Cloud often tests the misconception that a model listed in the registry is automatically deployable, but the trap here is that Vertex AI separates model registration from deployment readiness, requiring explicit serving configuration for custom containers.

How to eliminate wrong answers

Option A is wrong because if the model were still in training, it would not appear in the model list via `gcloud`; Vertex AI only registers a model after training completes. Option C is wrong because IAM permissions affect the deployment action itself (e.g., who can deploy), not the deployability status of the model; the error 'Model is not deployable' is a model-level validation, not an authorization failure. Option D is wrong because region mismatch between the endpoint and model would cause a resource-location error, not a 'Model is not deployable' error; Vertex AI enforces regional consistency but does not block deployment based on region alone.

Full explanation →

625

Multi-Selecteasy

A company is prompt engineering a model for customer support. They want to reduce hallucination (false information) in responses. Which TWO techniques are most effective? (Choose two.)

Select 2 answers

A.Implement RAG to retrieve relevant documents for context

B.Provide 3 few-shot examples of conversations

C.Reduce max output tokens to 150

D.Add a system instruction: 'Only answer based on the provided context.'

E.Increase temperature to 1.2

AnswersA, D

RAG provides factual grounding, reducing hallucination.

Why this answer

Option A is correct because Retrieval-Augmented Generation (RAG) grounds the model's output in external, verifiable documents retrieved from a knowledge base. By providing relevant context at inference time, RAG significantly reduces the likelihood of the model fabricating information, as it can reference and paraphrase from the retrieved sources rather than relying solely on its parametric memory.

Exam trap

Cisco often tests the misconception that any parameter adjustment (like reducing tokens or increasing temperature) can fix hallucination, when in fact only techniques that constrain the model's knowledge source (like RAG and strict system instructions) are effective for reducing false information.

Full explanation →

626

MCQhard

An enterprise is deploying a customer-facing chatbot using a foundation model on Vertex AI. They need to ensure the model does not produce toxic outputs. Which combination of settings and features should they implement?

A.Use Reinforcement Learning from Human Feedback (RLHF) during fine-tuning

B.Enable Vertex AI Model Monitoring and set up alerts for toxic outputs

C.Reduce the temperature to 0.0 and increase top-k to 50

D.Configure safety filters and safety settings in the model deployment to block harmful categories

AnswerD

Safety filters provide real-time content moderation, blocking toxic outputs before they reach users.

Why this answer

Option D is correct because safety filters and safety settings in Vertex AI model deployment are the direct mechanism to block harmful categories of output at inference time. These settings allow administrators to define thresholds for categories like toxicity, harassment, and hate speech, ensuring the model refuses to generate prohibited content without requiring retraining or post-hoc monitoring.

Exam trap

Cisco often tests the distinction between training-time alignment techniques (like RLHF) and inference-time safety controls (like safety filters), tempting candidates to choose a fine-tuning approach when the question explicitly asks for deployment settings to prevent toxic outputs.

How to eliminate wrong answers

Option A is wrong because RLHF is a fine-tuning technique that aligns model behavior based on human preferences, but it does not provide real-time blocking of toxic outputs during inference; it only influences the model's general behavior after training. Option B is wrong because Vertex AI Model Monitoring is designed for detecting data drift and performance anomalies, not for filtering or blocking toxic content in real-time responses. Option C is wrong because reducing temperature to 0.0 makes the model deterministic and increasing top-k to 50 broadens token selection, which does not prevent toxicity; these parameters control randomness, not content safety.

Full explanation →

627

Multi-Selectmedium

A company is considering whether to use Vertex AI's Generative AI Studio. Which TWO are benefits?

Select 2 answers

A.It is always cheaper than using third-party APIs

B.It integrates seamlessly with Vertex AI Pipelines for MLOps

C.It generates outputs that are always more accurate than custom models

D.It provides built-in tools for prompt engineering and iterative testing

E.It requires no coding or machine learning expertise to use

AnswersB, D

Integration allows automating deployment, monitoring, and retraining.

Why this answer

Option B is correct because Vertex AI Generative AI Studio is designed to work natively with Vertex AI Pipelines, enabling users to incorporate generative models into end-to-end MLOps workflows for automation, monitoring, and retraining. This integration allows seamless orchestration of prompt tuning, model evaluation, and deployment within the same managed environment, reducing operational overhead.

Exam trap

Google Cloud often tests the misconception that 'no-code' tools eliminate the need for any ML expertise, but the trap here is that Generative AI Studio still requires understanding of prompt engineering, model evaluation, and cost trade-offs to avoid poor outputs or unexpected expenses.

Full explanation →

628

MCQmedium

A software company wants to provide users with a clear understanding of when and why their AI system may produce incorrect answers. Which tool from the Responsible AI toolkit should they use to communicate model limitations?

A.People + AI Guidebook

B.PAIR Explorables

C.Model Cards

D.Datasheets for Datasets

AnswerC

Model Cards include sections on limitations, ethical considerations, and intended use.

Why this answer

Model Cards are designed to communicate model performance, intended use, and limitations to stakeholders in a standardized format.

Full explanation →

629

MCQmedium

A company wants to generate high-quality product images from text descriptions for an e-commerce catalog. They need photorealistic results. Which model and approach should they choose?

A.Use Veo for video generation and extract frames

B.Fine-tune Gemini 1.5 Pro on product images

C.Use Imagen on Vertex AI with appropriate prompts

D.Use Codey to generate code that renders images

AnswerC

Imagen is optimized for photorealistic image generation from text.

Why this answer

Imagen on Vertex AI is specifically designed for high-quality, photorealistic text-to-image generation, making it the ideal choice for creating product images from text descriptions. It leverages advanced diffusion models to produce detailed and visually accurate outputs that meet the requirements of an e-commerce catalog.

Exam trap

Cisco often tests the distinction between models specialized for different modalities (text, image, video, code) to see if candidates recognize that a dedicated image generation model like Imagen is required for photorealistic text-to-image tasks, rather than repurposing video or code models.

How to eliminate wrong answers

Option A is wrong because Veo is a video generation model, and extracting frames from video would introduce motion artifacts, temporal inconsistencies, and lower resolution compared to a dedicated image generation model, failing to achieve photorealistic results. Option B is wrong because Gemini 1.5 Pro is a multimodal large language model optimized for understanding and generating text, code, and reasoning, not for high-fidelity image generation; fine-tuning it on product images would not produce photorealistic outputs as it lacks a diffusion-based image generation architecture. Option D is wrong because Codey is a code generation model designed to produce code snippets, not to render images; using it to generate code that renders images would require additional rendering engines and would not directly produce photorealistic images from text descriptions.

Full explanation →

630

Multi-Selectmedium

A company is deploying a chatbot using Gemini 1.5 Pro. They want to reduce the risk of the chatbot generating toxic or harmful content. Which TWO techniques should they implement? (Choose two.)

Select 2 answers

A.Apply Reinforcement Learning from Human Feedback (RLHF) after deployment

B.Include a system prompt that instructs the model to be helpful and harmless

C.Use a RAG system to ground responses in a knowledge base

D.Configure Google's safety filters and thresholds in Vertex AI

E.Fine-tune the model on a curated dataset of safe conversations

AnswersD, E

Safety filters can block categories of harmful content before generation.

Why this answer

Safety filters (e.g., Google's safety settings) block harmful content. Fine-tuning with curated safe examples reduces the likelihood of generating harmful outputs. Prompt engineering alone is insufficient, RLHF is post-training and may not catch all cases, and RAG is for grounding, not safety.

Full explanation →

631

Multi-Selectmedium

A development team is integrating a large language model into a healthcare application. They need to reduce the risk of generating harmful medical advice. Which THREE measures should they implement? (Choose three.)

Select 3 answers

A.Use a safety filter to block outputs containing harmful medical terminology.

B.Implement RAG to retrieve verified medical information from trusted sources.

C.Fine-tune the model on a curated dataset of medical textbooks.

D.Include a disclaimer in the system instruction that the model is not a doctor.

E.Set the temperature to a very high value to ensure diverse outputs.

AnswersA, B, C

Safety filters directly block harmful content at inference time.

Why this answer

Option A is correct because implementing a safety filter that blocks outputs containing harmful medical terminology directly mitigates the risk of generating dangerous advice. This acts as a post-processing guardrail, intercepting model outputs that include terms associated with diagnoses, dosages, or procedures that could lead to patient harm. It is a standard practice in high-stakes domains to layer such filters on top of the generative model.

Exam trap

Cisco often tests the misconception that disclaimers or system instructions alone are sufficient safety measures, when in fact they do not technically prevent the model from generating harmful content—only post-hoc filtering or architectural controls like RAG and fine-tuning can reduce the risk at the output level.

Full explanation →

632

MCQeasy

A small business wants to use Vertex AI to analyze customer reviews and extract sentiment, product mentions, and overall themes. They have a small dataset of 500 reviews in a CSV file. The team is not experienced with machine learning and wants a pre-built solution that requires minimal coding. They want to start quickly and scale later. Which Google Cloud offering should they use?

A.Cloud Natural Language API for pre-trained sentiment and entity extraction.

B.Vertex AI Workbench to build a custom sentiment analysis model.

C.AutoML Natural Language to train a custom model on their data.

D.Vertex AI Gemini API with zero-shot prompting.

AnswerA

This is a pre-built API that requires no ML experience and can be used immediately.

Why this answer

Option A is correct because Cloud Natural Language API provides pre-trained models for sentiment analysis and entity extraction, requiring minimal coding (just API calls) and no ML expertise. This aligns with the business's need for a quick, scalable, pre-built solution for their small dataset of 500 reviews, avoiding the overhead of custom training or complex prompting.

Exam trap

The trap here is that candidates confuse 'pre-built API' (Cloud Natural Language API) with 'custom training' (AutoML) or 'generative AI' (Gemini), assuming that any AI solution requires custom model building or that generative models are suitable for structured NLP tasks like sentiment extraction.

How to eliminate wrong answers

Option B is wrong because Vertex AI Workbench is a Jupyter-based development environment for building custom ML models from scratch, which requires significant coding and ML expertise, contradicting the team's lack of experience and desire for minimal coding. Option C is wrong because AutoML Natural Language requires training a custom model on the user's data, which involves data labeling, training time, and cost overkill for a small 500-review dataset, and still demands more setup than a pre-built API. Option D is wrong because Vertex AI Gemini API with zero-shot prompting is designed for generative tasks (e.g., summarization, generation) and not optimized for structured sentiment and entity extraction from tabular CSV data; it also requires prompt engineering and may produce inconsistent, non-deterministic results compared to a dedicated NLP API.

Full explanation →

633

MCQeasy

A project manager wants to automatically generate weekly status reports from meeting notes and project data. The team uses Google Workspace. Which built-in capability is the QUICKEST to implement?

A.Use Gemini for Workspace (Duet AI) in Google Docs and Google Meet to generate summaries and reports

B.Select a model from Model Garden and deploy it as a private endpoint for report generation

C.Use Vertex AI Studio to design a custom prompt and call the Gemini API from a custom app

D.Write a Google Apps Script to call the Gemini API and format the report

AnswerA

Gemini for Workspace provides built-in AI capabilities that can summarize meeting notes and help write reports directly in Docs, requiring no custom code.

Why this answer

Gemini for Workspace (Duet AI) can generate summaries directly in Google Docs and Meet, leveraging existing data with no custom development. Custom API integration or Model Garden would require more effort. Apps Script is for custom automation, not built-in.

Full explanation →

634

MCQhard

A healthcare organization needs a generative AI model to answer medical questions using proprietary clinical guidelines. They have a large dataset of doctor-patient interactions. Should they fine-tune a pre-trained model or use Retrieval-Augmented Generation (RAG)?

A.Use RAG to reduce inference costs by skipping model updates.

B.Use RAG to retrieve relevant guidelines during inference, avoiding frequent retraining.

C.Use prompt engineering to encode all guidelines into the system prompt.

D.Fine-tune the model on the clinical guidelines and interactions.

AnswerB

RAG dynamically pulls up-to-date guidelines, ensuring accuracy and compliance.

Why this answer

RAG is preferred because it can incorporate the latest guidelines without retraining, crucial for regulatory changes. Fine-tuning may cause overfitting to outdated interactions. Option B is wrong because fine-tuning requires continuous retraining.

Option C is wrong because prompt engineering alone cannot inject proprietary knowledge. Option D is wrong because RAG does not inherently reduce cost.

Full explanation →

635

MCQmedium

A team deployed a custom generative AI model using KServe on Google Kubernetes Engine (GKE) with the above configuration. They notice that the model is taking longer than expected to respond. What is the most likely cause?

A.The CPU resource limits are too low

B.The model is crashing due to insufficient memory

C.The model requires more than 1 GPU for acceptable performance

D.The container image is too large and takes time to pull

AnswerC

Large generative models often need multiple GPUs for low latency.

Why this answer

The configuration specifies 1 GPU, but the model requires more than 1 GPU for acceptable performance. KServe on GKE allocates GPU resources based on the `limits` field; if the model's inference workload exceeds the memory bandwidth or compute capacity of a single GPU, latency increases due to queuing and serialization. This is the most likely cause of the slow response time, as GPU-bound models are sensitive to under-provisioning.

Exam trap

The trap here is that candidates assume slow responses always indicate a resource shortage like CPU or memory, but for GPU-accelerated models, the most common cause of high latency is insufficient GPU compute or memory bandwidth, not CPU or memory limits.

How to eliminate wrong answers

Option A is wrong because CPU resource limits affect non-GPU compute tasks, but the primary bottleneck for a GPU-accelerated model is GPU throughput, not CPU; low CPU limits would cause throttling only if the model has CPU-intensive preprocessing or postprocessing, which is not indicated. Option B is wrong because insufficient memory would cause the pod to be OOMKilled (crash) rather than just slow responses; the model is responding, so memory is sufficient. Option D is wrong because the container image pull happens during pod startup, not during inference; once the pod is running, image size does not affect response latency.

Full explanation →

636

MCQeasy

A data scientist is evaluating a generative AI model for potential gender bias in its outputs. They use a diverse test set that includes names, pronouns, and occupations across genders. Which Google's AI Principle does this practice primarily support?

A.Avoid creating or reinforcing unfair bias

B.Incorporate privacy design principles

C.Be built and tested for safety

D.Be socially beneficial

AnswerA

Using diverse test sets to evaluate bias directly addresses this principle.

Full explanation →

637

MCQeasy

A data scientist needs to generate high-quality images from text prompts using Google Cloud. Which service should they use?

A.Imagen

B.PaLM 2

C.Gemini Pro Vision

D.Codey

AnswerA

Imagen is Google's diffusion-based model for generating images from text prompts.

Why this answer

Imagen is Google Cloud's text-to-image diffusion model. PaLM 2 and Gemini are primarily text models; Codey is for code generation.

Full explanation →

638

Multi-Selectmedium

Which TWO of the following are best practices for prompt engineering?

Select 2 answers

A.Provide context and examples in the prompt

B.Append random noise to prompts to improve creativity

C.Use clear and specific instructions

D.Always use the maximum possible number of tokens

E.Use negative prompts to discourage undesired outputs

AnswersA, C

Context and examples help the model understand the desired output.

Why this answer

Clear and specific instructions help guide the model, and providing context and examples improves output quality. Options B, D, and E are not recommended.

Full explanation →

639

MCQeasy

Which Google resource provides interactive visualizations and exercises to help AI practitioners understand concepts like fairness, interpretability, and privacy?

A.PAIR Explorables

B.Model Cards

C.People + AI Guidebook

D.TensorFlow Fairness Indicators

AnswerA

PAIR Explorables are interactive and educational.

Why this answer

PAIR (People + AI Research) Explorables is the correct answer because it is a Google resource specifically designed to provide interactive visualizations and hands-on exercises that help AI practitioners grasp complex concepts like fairness, interpretability, and privacy. Unlike static documentation or tools, Explorables allow users to manipulate parameters and see real-time effects, making abstract responsible AI principles tangible and actionable.

Exam trap

Cisco often tests the distinction between educational/interactive resources and practical implementation tools, so the trap here is that candidates confuse Model Cards or Fairness Indicators (which are about applying fairness) with PAIR Explorables (which are about learning fairness concepts through interaction).

How to eliminate wrong answers

Option B (Model Cards) is wrong because Model Cards are standardized documentation templates that disclose model performance, intended use, and fairness evaluations, but they do not offer interactive visualizations or exercises for learning concepts. Option C (People + AI Guidebook) is wrong because it is a static design guide with best practices and patterns for human-AI interaction, not an interactive learning tool with visualizations and exercises. Option D (TensorFlow Fairness Indicators) is wrong because it is a suite of tools for computing and visualizing fairness metrics on model evaluations, but it is a practical debugging tool, not an educational resource with interactive exercises to teach underlying concepts.

Full explanation →

640

MCQhard

A research lab is using Vertex AI to generate high-resolution medical images (2560x1920) of cell structures using Imagen. They have fine-tuned the model on their own microscope images. The generated images are sharp but often contain repeating patterns (e.g., identical cell arrangements) that are not biologically plausible. The team suspects the model is overfitting to spatial patterns in the training data. They have already tried increasing the training dataset size and augmenting it with rotations and flips. What additional technique should they try within Vertex AI?

A.Switch to a different foundation model like Stable Diffusion.

B.Add regularization techniques such as dropout layers or data augmentation that randomly crops and blends patches.

C.Use a larger batch size during fine-tuning.

D.Further increase the resolution of training images to 5120x3840.

AnswerB

Regularization helps prevent overfitting to specific spatial patterns.

Why this answer

Option B is correct because the repeating patterns indicate the model is memorizing spatial arrangements rather than learning generalizable features. Adding regularization like dropout layers or data augmentation that randomly crops and blends patches (e.g., CutMix or MixUp) directly reduces overfitting by forcing the model to focus on local, biologically plausible details rather than memorizing entire image layouts. Vertex AI's training pipelines support custom augmentation strategies, making this a practical and targeted fix.

Exam trap

The trap here is that candidates assume increasing data or resolution always helps generalization, but in generative models, overfitting to spatial patterns requires explicit regularization techniques that disrupt memorization of layout, not just more data or higher resolution.

How to eliminate wrong answers

Option A is wrong because switching to a different foundation model like Stable Diffusion does not address the root cause of overfitting to spatial patterns; it merely changes the base model, and the same overfitting issue would likely recur without regularization. Option C is wrong because increasing batch size during fine-tuning can improve training stability but does not prevent the model from memorizing repetitive spatial patterns; it may even exacerbate overfitting by reducing gradient noise. Option D is wrong because further increasing the resolution of training images to 5120x3840 would not solve the overfitting problem and could worsen it by providing more pixel-level details for the model to memorize, while also increasing computational cost and risk of overfitting to high-frequency noise.

Full explanation →

641

MCQeasy

A data scientist needs to fine-tune a foundation model for a sentiment analysis task without managing infrastructure. Which Google Cloud service should they use?

A.Compute Engine

B.BigQuery ML

C.Cloud Run

D.Vertex AI Model Garden

AnswerD

Model Garden offers managed fine-tuning of foundation models without infrastructure overhead.

Why this answer

Vertex AI Model Garden is the correct service because it provides a curated hub of foundation models that can be fine-tuned with managed infrastructure, eliminating the need for the data scientist to provision or manage servers. It supports one-click deployment and fine-tuning workflows for sentiment analysis, directly addressing the requirement to avoid infrastructure management.

Exam trap

The trap here is that candidates often confuse BigQuery ML's ability to train models on tabular data with the capability to fine-tune large language models, but BigQuery ML does not support fine-tuning of foundation models for NLP tasks.

How to eliminate wrong answers

Option A is wrong because Compute Engine is an IaaS offering that requires the user to manually provision, configure, and manage virtual machines, which contradicts the requirement of not managing infrastructure. Option B is wrong because BigQuery ML is designed for creating and executing machine learning models using SQL queries on structured data in BigQuery, not for fine-tuning large foundation models for natural language tasks like sentiment analysis. Option C is wrong because Cloud Run is a serverless container platform for running stateless HTTP-driven applications, but it does not provide native support for fine-tuning foundation models; it would require the user to build and manage the fine-tuning pipeline themselves.

Full explanation →

642

MCQmedium

A company fine-tunes a model using Vertex AI and notices the model's performance drops on the original training task (e.g., language understanding) after fine-tuning for a new task (e.g., summarization). What could be the cause?

A.Data leakage

B.Model quantization

C.Catastrophic forgetting

D.Underfitting

AnswerC

Fine-tuning on a narrow task can overwrite general knowledge, leading to performance degradation on the original task.

Why this answer

Catastrophic forgetting occurs when a neural network loses previously learned knowledge upon being fine-tuned on a new task. In this scenario, fine-tuning the model for summarization overwrites the weights responsible for language understanding, causing performance degradation on the original task. This is a well-known limitation of sequential fine-tuning in deep learning.

Exam trap

Google Cloud often tests the distinction between catastrophic forgetting and underfitting, as candidates may mistakenly think the model simply didn't learn the new task well, rather than recognizing that it forgot the original task due to weight overwriting.

How to eliminate wrong answers

Option A is wrong because data leakage refers to the inadvertent exposure of target information during training, which would typically inflate performance metrics rather than cause a drop on the original task. Option B is wrong because model quantization reduces numerical precision (e.g., from FP32 to INT8) to improve inference speed and memory efficiency, but it does not inherently cause performance loss on a previously learned task; any accuracy loss from quantization is generally uniform across tasks. Option D is wrong because underfitting means the model fails to capture patterns in the training data, resulting in poor performance on both the original and new tasks, not a selective drop on the original task after fine-tuning.

Full explanation →

643

Multi-Selectmedium

A company is using Vertex AI to generate report summaries. They want to measure ROI. Which three metrics should they track? (Choose THREE)

Select 3 answers

A.Error rate (e.g., factual errors in summaries)

B.Number of employees trained on the system

C.Number of active users per month

D.Accuracy score of generated summaries compared to human-written ones

E.Time saved per report (minutes)

AnswersA, D, E

Reducing errors improves trust and saves correction time.

Why this answer

Option A is correct because error rate directly measures the quality and reliability of generated summaries, which is a key factor in determining ROI. If summaries contain factual errors, they require human correction, reducing the time savings and potentially introducing business risks. Tracking error rate helps quantify the cost of inaccuracies against the benefits of automation.

Exam trap

The trap here is that candidates confuse adoption metrics (active users, training count) with ROI metrics, but ROI requires direct financial or efficiency quantification, not just usage statistics.

Full explanation →

644

Multi-Selecthard

A company is deploying a document summarization solution using Vertex AI. They want to minimize cost while maintaining quality. Which three strategies should they implement? (Choose THREE)

Select 3 answers

A.Use prompt compression to cut token usage

B.Choose a smaller, specialized model (e.g., Gemini 1.5 Flash)

C.Implement response caching for repeated queries

D.Use the largest available model for best quality

E.Batch multiple document summarization requests together

AnswersB, C, E

Smaller models are cheaper and often adequate for summarization.

Why this answer

Using caching reduces repeated processing, choosing a smaller model lowers token cost, and batching requests minimizes overhead. Prompt compression is not a standard Vertex AI feature, and using the largest model increases cost.

Full explanation →

645

MCQhard

A developer uses the Vertex AI Python SDK to call a Gemini model for structured JSON output. However, the model often returns malformed JSON. Which parameter should the developer set in the generation configuration to enforce valid JSON output?

A.Set the temperature to a lower value (0.1) to reduce variation.

B.Set the 'response_mime_type' parameter to 'application/json'.

C.Include few-shot examples of the desired JSON format in the system prompt.

D.Switch to a smaller model to reduce complexity.

AnswerB

This parameter forces the model to output valid JSON, supported by Gemini.

Why this answer

Option B is correct because setting `response_mime_type` to `'application/json'` in the generation configuration instructs the Gemini API to constrain the model's output to valid JSON format. This parameter leverages the model's native structured output capability, ensuring the response adheres to JSON syntax without relying on post-processing or prompt engineering.

Exam trap

Google Cloud often tests the misconception that prompt engineering (e.g., few-shot examples or temperature tuning) can reliably enforce structured output, when in fact the correct approach is to use the API's native structured output parameter like `response_mime_type`.

How to eliminate wrong answers

Option A is wrong because lowering temperature reduces randomness but does not enforce structural constraints; the model can still produce malformed JSON due to token-level deviations. Option C is wrong because few-shot examples in the system prompt improve formatting consistency but do not guarantee valid JSON output, as the model may still generate syntax errors or deviate from the schema. Option D is wrong because switching to a smaller model reduces capacity and may increase the likelihood of malformed output, and model size does not address the need for structured output enforcement.

Full explanation →

646

MCQhard

A healthcare startup fine-tunes a model to generate patient education materials. They want to ensure the model never gives medical advice, only information. They add a safety instruction, but the model sometimes still gives advice. What advanced technique should they apply?

A.Hard-code a list of prohibited phrases in a post-processing script

B.Add a secondary classifier to rewrite any detected advice into general information

C.Use semantic similarity to a 'medical advice' embedding and reject if close

D.Apply RLHF with a reward model that penalizes outputs containing medical advice

AnswerD

RLHF directly optimizes the model to avoid undesired behaviors based on human preferences.

Why this answer

RLHF (Reinforcement Learning from Human Feedback) directly addresses the model's behavior by training a reward model that penalizes outputs containing medical advice. This aligns the model's generation with the safety instruction at a fundamental level, rather than relying on brittle post-hoc filters or static embeddings that can be easily circumvented by novel phrasings.

Exam trap

Cisco often tests the misconception that static rule-based or similarity-based filters are sufficient for safety, when in fact only training-time alignment methods like RLHF can fundamentally change the model's behavior to avoid prohibited outputs.

How to eliminate wrong answers

Option A is wrong because hard-coding a list of prohibited phrases is brittle and fails against adversarial or paraphrased advice that doesn't match the exact phrases. Option B is wrong because adding a secondary classifier to rewrite detected advice introduces latency, potential for semantic drift, and cannot handle nuanced contexts where advice is implied rather than explicit. Option C is wrong because semantic similarity to a static 'medical advice' embedding is threshold-dependent and can produce false positives (flagging general information) or false negatives (missing advice phrased differently), and it does not train the model to avoid the behavior.

Full explanation →

647

MCQmedium

An e-commerce company uses a generative AI model to generate marketing copy. They notice that the model occasionally produces off-brand or inappropriate content. What is the best way to mitigate this?

A.Reduce the model's temperature

B.Increase the model's top-k sampling

C.Use a safety filter

D.Fine-tune the model on brand guidelines

AnswerD

Trains the model to adhere to brand style and content.

Why this answer

Fine-tuning the model on brand guidelines directly addresses the root cause of off-brand or inappropriate content by adjusting the model's weights to align with specific stylistic and content constraints. This supervised learning approach teaches the model the desired output patterns, making it inherently less likely to generate violations compared to post-hoc filtering or sampling adjustments.

Exam trap

Cisco often tests the misconception that sampling parameters (temperature, top-k) or safety filters are sufficient for content alignment, when in fact they only control randomness or block explicit violations, not the underlying model behavior that fine-tuning corrects.

How to eliminate wrong answers

Option A is wrong because reducing temperature lowers output randomness but does not enforce brand-specific constraints; it may still produce off-brand content that is simply less diverse. Option B is wrong because increasing top-k sampling restricts token selection to the top k most likely tokens, which can reduce creativity but does not incorporate brand guidelines or safety rules. Option C is wrong because a safety filter is a post-processing step that can catch explicit violations but does not prevent the model from generating subtly off-brand content that passes the filter, and it adds latency.

Full explanation →

648

Multi-Selecteasy

A developer wants to use the Gemini API to generate creative text. Which TWO parameters can they adjust to influence the output?

Select 2 answers

A.Color space

B.Audio sample rate

C.Top-k

D.Image size

E.Temperature

AnswersC, E

Top-k limits the vocabulary sampled.

Why this answer

Option C is correct because Top-k is a sampling parameter that restricts the model to consider only the k most likely next tokens, reducing randomness and focusing the output on higher-probability choices. This directly influences the diversity and coherence of generated text in the Gemini API.

Exam trap

Cisco often tests the distinction between model parameters that affect text generation (like Temperature and Top-k) versus media-specific parameters (like color space or image size), leading candidates to confuse domain-specific settings with generative AI controls.

Full explanation →

649

MCQeasy

A data scientist wants to quickly prototype a text generation application using Google's foundation models. Which Google Cloud service should they use?

A.Generative AI Studio

B.Cloud Natural Language API

C.Vertex AI Prediction

D.AI Platform Training

AnswerA

Generative AI Studio provides a no-code interface to prototype with foundation models.

Why this answer

Generative AI Studio is the correct service because it provides a purpose-built environment for quickly prototyping and experimenting with Google's foundation models, including text generation models like PaLM 2 and Gemini. It offers a no-code interface and SDK access for rapid iteration, directly aligning with the data scientist's goal of fast prototyping without needing to manage infrastructure or training pipelines.

Exam trap

The trap here is that candidates confuse the purpose of Cloud Natural Language API (a non-generative analysis tool) with generative AI capabilities, or assume Vertex AI Prediction is the correct choice for prototyping when it is actually designed for serving deployed models, not interactive experimentation.

How to eliminate wrong answers

Option B is wrong because Cloud Natural Language API is a pre-trained API for analyzing text (e.g., sentiment, entity extraction) and does not support generative text generation or foundation model prototyping. Option C is wrong because Vertex AI Prediction is used for deploying and serving trained models for inference, not for rapid prototyping or interactive experimentation with foundation models. Option D is wrong because AI Platform Training (now part of Vertex AI) is designed for training custom machine learning models, not for quickly prototyping with pre-built foundation models.

Full explanation →

650

MCQmedium

A financial services firm wants to use generative AI to assist employees in drafting emails. They are evaluating Duet AI in Gmail (now Gemini in Workspace). Which capability directly supports this use case?

A.Speaker notes generation in Slides

B.Smart Compose in Gmail

C.Meeting summaries in Meet

D.Help me write in Docs

AnswerB

Smart Compose uses generative AI to suggest complete sentences as the user types, speeding up email drafting.

Why this answer

Smart Compose in Gmail is a generative AI feature that provides real-time, context-aware suggestions to help users draft emails faster. It directly supports the use case of assisting employees in drafting emails by predicting and completing sentences as they type, reducing effort and improving efficiency.

Exam trap

Cisco often tests the distinction between generative AI features across different Workspace apps, and the trap here is confusing 'Help me write in Docs' (a document-focused tool) with the email-specific Smart Compose in Gmail, leading candidates to pick a feature that is not directly integrated into the email drafting workflow.

How to eliminate wrong answers

Option A is wrong because Speaker notes generation in Slides is designed to create presentation notes, not assist with email drafting. Option C is wrong because Meeting summaries in Meet provide post-meeting recaps, not real-time email composition assistance. Option D is wrong because Help me write in Docs is a generative AI feature for document creation in Google Docs, not for drafting emails within Gmail.

Full explanation →

651

MCQeasy

A healthcare company wants to use Gemini to analyze patient records and summarize findings. Which data privacy practice is most critical when using the Gemini API on Vertex AI?

A.Fine-tune Gemini using PHI to improve accuracy.

B.Disable request-response logging in Vertex AI to ensure data is not stored.

C.Enable Vertex AI Data Governance to mask or redact PII before sending to the API.

D.Use the text-davinci-003 model instead of Gemini, as it is more private.

AnswerC

D is correct because Data Governance can automatically protect sensitive data.

Why this answer

Option C is correct because Vertex AI Data Governance allows you to configure data masking or redaction of personally identifiable information (PII) before the data is sent to the Gemini API, ensuring compliance with healthcare regulations like HIPAA. This is the most critical practice because it prevents PHI from being exposed to the model or stored in logs, directly addressing the core privacy requirement. Disabling logging alone (Option B) does not prevent PHI from being processed by the model, and fine-tuning with PHI (Option A) introduces significant compliance risks.

Exam trap

Google Cloud often tests the misconception that disabling logging is sufficient for data privacy, when in fact the critical step is preventing sensitive data from being sent to the API in the first place, which is achieved through data masking or redaction.

How to eliminate wrong answers

Option A is wrong because fine-tuning Gemini using PHI would require storing and processing that data in a training pipeline, which violates HIPAA and other data privacy regulations unless strict de-identification and contractual safeguards are in place; it also increases the attack surface for data breaches. Option B is wrong because disabling request-response logging in Vertex AI prevents storage of API interactions but does not prevent PHI from being sent to and processed by the Gemini model itself, leaving the data exposed during inference. Option D is wrong because text-davinci-003 is an OpenAI model, not available on Vertex AI, and it does not inherently offer better privacy controls; the comparison is irrelevant and the premise is false.

Full explanation →

652

Multi-Selecthard

Which THREE of the following are potential risks when deploying generative AI?

Select 3 answers

A.Hallucinations

B.Memorization of sensitive training data

C.Bias and fairness issues

D.Increased model accuracy

E.Toxic or harmful content generation

AnswersA, C, E

Models can generate false or fabricated information.

Why this answer

Option A is correct because generative AI models, particularly large language models (LLMs), can produce plausible-sounding but factually incorrect or nonsensical outputs, known as hallucinations. This occurs due to the model's probabilistic nature and lack of true understanding, where it generates text based on learned patterns rather than verified facts.

Exam trap

Google Cloud often tests the distinction between risks and benefits, so the trap here is that candidates may mistakenly identify 'increased model accuracy' as a risk, when it is actually a performance improvement and not a deployment risk.

Full explanation →

653

MCQeasy

What is the primary benefit of using foundation models (like Gemini) as opposed to training a model from scratch?

A.They are always faster at inference

B.They guarantee 100% accuracy on all tasks

C.They require less data and compute to adapt to new tasks

D.They are open-source and free to use

AnswerC

Pre-training provides a strong base; fine-tuning or prompting needs fewer resources.

Why this answer

Foundation models like Gemini are pre-trained on vast datasets, capturing general language understanding and patterns. Adapting them to new tasks via fine-tuning requires significantly less task-specific data and computational resources compared to training a model from scratch, which demands enormous datasets and compute for initial training. This transfer learning approach is the primary benefit, enabling efficient customization for specialized applications.

Exam trap

Cisco often tests the misconception that 'pre-trained' means 'free' or 'always faster,' leading candidates to pick options that confuse inference speed or licensing with the core benefit of reduced data and compute for adaptation.

How to eliminate wrong answers

Option A is wrong because inference speed depends on model architecture, size, and optimization, not on whether the model is a foundation model or trained from scratch; a smaller custom model can be faster at inference than a large foundation model. Option B is wrong because no model, including foundation models, can guarantee 100% accuracy on all tasks due to inherent data biases, distribution shifts, and the complexity of real-world tasks. Option D is wrong because while some foundation models are open-source (e.g., Llama), many, including Gemini, are proprietary and require API access or licensing fees, so they are not universally free or open-source.

Full explanation →

654

MCQmedium

A data scientist wants to run large-scale distributed training of a custom deep learning model using Google's custom AI accelerators. Which infrastructure should they choose to minimize cost while leveraging Google's proprietary chips?

A.Cloud TPU v5e

B.Compute Engine with NVIDIA A100 GPUs

C.Vertex AI Workbench with custom machines

D.Google Colab Pro with TPU runtime

AnswerA

Cloud TPU v5e is Google's custom AI accelerator, optimized for large-scale training and cost-efficient for many models.

Why this answer

Cloud TPU v5e is the correct choice because it is Google's proprietary custom AI accelerator designed specifically for large-scale distributed training of deep learning models, offering superior cost-efficiency compared to GPUs for many workloads. TPU v5e provides a balanced price-performance ratio for medium-to-large training tasks, and Google's TPU architecture is optimized for TensorFlow and JAX, enabling efficient scaling across multiple TPU pods. This minimizes cost while leveraging Google's custom chips, as opposed to using NVIDIA GPUs which are not Google's proprietary hardware.

Exam trap

The trap here is that candidates may confuse 'custom AI accelerators' with any high-performance hardware like GPUs, but the question specifically requires Google's proprietary chips (TPUs), and Cloud TPU v5e is the only option that directly provides cost-optimized, large-scale distributed training using Google's own accelerators.

How to eliminate wrong answers

Option B is wrong because Compute Engine with NVIDIA A100 GPUs uses third-party hardware (NVIDIA) rather than Google's proprietary chips, and while powerful, it typically incurs higher costs for large-scale distributed training compared to TPUs for suitable workloads. Option C is wrong because Vertex AI Workbench with custom machines is a development environment for building and training models, not a specific infrastructure choice for leveraging Google's custom AI accelerators; it can use TPUs or GPUs but does not inherently minimize cost with proprietary chips. Option D is wrong because Google Colab Pro with TPU runtime is designed for small-scale experimentation and prototyping, not for large-scale distributed training, and it lacks the scalability and cost efficiency of Cloud TPU v5e for production workloads.

Full explanation →

655

Multi-Selectmedium

A company wants to adopt GenAI for code generation and review. To ensure code quality and security, they plan to implement a change management program. Which THREE actions are most effective?

Select 3 answers

A.Create a code review checklist specifically for AI-generated code

B.Conduct training sessions on prompt engineering for code generation

C.Appoint a single AI champion to manage all code generation

D.Phase out senior developers to rely entirely on AI-generated code

E.Pilot the tool with a small group of developers before company-wide rollout

AnswersA, B, E

A checklist helps reviewers catch common AI errors like security flaws.

Why this answer

Training developers, piloting with a small team, and creating review checklists address skills, risk mitigation, and quality control. Phasing out senior developers would be counterproductive. A single champion is insufficient.

Full explanation →

656

Multi-Selectmedium

A healthcare provider is planning to deploy generative AI for clinical note summarization. Which THREE actions are essential for regulatory compliance (e.g., HIPAA)?

Select 3 answers

A.Implement role-based access controls to limit who can view AI-generated notes.

B.Anonymize patient data before using it for model training or inference.

C.Allow clinicians to share AI-generated summaries with anyone in the organization.

D.Store raw patient data in model training logs for auditing.

E.Ensure data encryption at rest and in transit.

AnswersA, B, E

Access controls ensure only authorized users see sensitive data.

Why this answer

Option A is correct because role-based access controls (RBAC) are a core requirement under HIPAA's Security Rule (45 CFR § 164.312(a)(1)) to ensure that only authorized personnel can access electronic protected health information (ePHI). In the context of generative AI for clinical note summarization, RBAC prevents unauthorized viewing of AI-generated summaries that may contain sensitive patient data, thereby enforcing the minimum necessary standard.

Exam trap

Google Cloud often tests the misconception that sharing AI-generated summaries freely within an organization is acceptable under HIPAA, when in fact the minimum necessary rule strictly limits access to only those who need the information for their job functions.

Full explanation →

657

MCQeasy

Which technique allows a model to incorporate real-time data from external APIs?

A.RAG with tool calling

B.Prompt engineering

C.Fine-tuning

D.Model pruning

AnswerA

Enables dynamic API access during generation.

Why this answer

RAG with tool calling is correct because it enables a generative AI model to query external APIs in real-time, retrieve up-to-date information, and incorporate that data into its response. This technique combines retrieval-augmented generation (RAG) with function calling, where the model outputs a structured request (e.g., a JSON object) to invoke an API, receive the result, and then generate a context-aware answer. Unlike static methods, this allows dynamic data integration without retraining.

Exam trap

Cisco often tests the misconception that prompt engineering alone can achieve real-time data integration, but candidates must recognize that only RAG with tool calling provides the explicit mechanism to execute external API calls and incorporate live results.

How to eliminate wrong answers

Option B is wrong because prompt engineering only modifies the input text to guide model behavior, but it cannot fetch live data from external sources—it relies solely on the model's pre-existing knowledge. Option C is wrong because fine-tuning updates the model's weights on a fixed dataset, which does not enable real-time API access; it only improves performance on static tasks. Option D is wrong because model pruning reduces model size by removing redundant weights, which has no mechanism for external data retrieval or API interaction.

Full explanation →

658

Multi-Selectmedium

Which TWO strategies are effective for reducing latency in a generative AI chat application deployed on Vertex AI? (Select 2)

Select 2 answers

A.Deploy on TPU instead of GPU

B.Use streaming responses

C.Increase the max output tokens

D.Enable model quantization

E.Use larger batch sizes

AnswersB, D

Reduces perceived latency.

Why this answer

Option B is correct because streaming responses reduce perceived latency by sending tokens to the client as they are generated, rather than waiting for the full response. This leverages server-sent events (SSE) or chunked transfer encoding to deliver partial results immediately, improving user experience in chat applications.

Exam trap

Google Cloud often tests the distinction between reducing actual latency (e.g., model optimization) versus reducing perceived latency (e.g., streaming), and candidates mistakenly choose options that increase throughput (like larger batch sizes) without realizing they harm per-request latency.

Full explanation →

659

MCQeasy

A startup is building a customer support chatbot using Vertex AI and wants to ground responses in their product documentation to reduce hallucinations. Which approach should they use?

A.Enable Vertex AI Grounding with a custom enterprise data store containing the documentation.

B.Use the Codey API for text generation.

C.Use the base model without any grounding to maximize flexibility.

D.Fine-tune the model on the documentation and deploy.

AnswerA

Grounding ties responses to specific documents, reducing hallucinations.

Why this answer

Vertex AI Grounding with a custom enterprise data store is the correct approach because it allows the chatbot to retrieve and cite specific chunks from the product documentation in real time, directly reducing hallucinations by constraining responses to verified content. This method uses the underlying grounding service to query a vector-based data store (powered by Vertex AI Search) and append source references to the model's output, ensuring factual accuracy without retraining.

Exam trap

Google Cloud often tests the misconception that fine-tuning is the best way to incorporate domain knowledge, but the trap here is that fine-tuning does not provide dynamic, verifiable grounding with citations, whereas Vertex AI Grounding with a custom data store does, making it the correct choice for reducing hallucinations in a retrieval-augmented generation use case.

How to eliminate wrong answers

Option B is wrong because the Codey API is designed for code generation tasks (e.g., code completion, chat), not for grounding responses in external documents; it lacks the retrieval-augmented generation (RAG) capabilities needed to reduce hallucinations from product documentation. Option C is wrong because using a base model without grounding maximizes flexibility but also maximizes the risk of hallucination, as the model relies solely on its training data and cannot verify facts against the documentation. Option D is wrong because fine-tuning the model on the documentation embeds the content into the model's weights, which is static, costly to update, and does not provide real-time citation or retrieval; it also risks overfitting and does not leverage Vertex AI's built-in grounding infrastructure for dynamic fact-checking.

Full explanation →

660

Multi-Selecthard

A financial institution wants to deploy a generative AI system for automated report generation. They require that the model does NOT expose sensitive information from its training data and that outputs are factually accurate. Which THREE techniques should they combine?

Select 3 answers

A.Use Retrieval-Augmented Generation (RAG) to ground outputs in verified documents

B.Train the model with differential privacy

C.Use a larger model with more parameters

D.Apply Reinforcement Learning from Human Feedback (RLHF) to align model behavior

E.Increase the temperature to 2.0 for creativity

AnswersA, B, D

RAG ensures facts come from trusted sources, reducing hallucinations and exposure of training data.

Why this answer

RAG grounds outputs in verified sources, RLHF can reduce hallucination and harmful outputs, and differential privacy during training prevents memorization of sensitive data.

Full explanation →

661

MCQeasy

A prompt engineer wants to improve the model's adherence to a specific output format (e.g., always start with a greeting). Which technique should they try first?

A.Use a lower temperature to make the output more deterministic.

B.Fine-tune the model on many examples of the desired format.

C.Include a system instruction at the beginning of the prompt that specifies the desired format.

D.Modify the model's tokenizer to encode the format rules.

AnswerC

System instructions set global behavior and are the easiest first step.

Why this answer

Option C is correct because system instructions are the most direct and efficient method to enforce output formatting in large language models. By placing a clear directive at the beginning of the prompt (e.g., 'Always start your response with a greeting'), the model's attention mechanism is guided to prioritize this rule during generation, without requiring retraining or hyperparameter changes.

Exam trap

Google Cloud often tests the misconception that hyperparameter tuning (like temperature) can enforce structural output rules, when in fact it only controls randomness, not format adherence.

How to eliminate wrong answers

Option A is wrong because lowering temperature reduces randomness but does not enforce a specific structural rule like starting with a greeting; it only makes token selection more deterministic, which could still produce varied formats. Option B is wrong because fine-tuning is a resource-intensive process that requires a curated dataset and retraining, making it an overkill for a simple formatting constraint that can be achieved with a prompt instruction. Option D is wrong because modifying the tokenizer would alter how input text is split into tokens, not how the model adheres to output format rules; tokenizers have no mechanism to enforce generation constraints.

Full explanation →

662

MCQhard

A team is training a custom foundation model using JAX on TPUs on Google Cloud. They encounter frequent Out of Memory (OOM) errors. Which action is most effective in resolving the OOM error?

A.Reduce the model size by decreasing the number of layers.

B.Increase the batch size to maximize TPU utilization.

C.Use mixed precision training (bfloat16) to reduce memory footprint.

D.Enable model parallelism using GSPMD to distribute the model across TPU cores.

AnswerD

Model parallelism directly addresses memory constraints by partitioning the model.

Why this answer

Option D is correct because OOM errors when training large foundation models on TPUs often stem from the model exceeding the memory of a single TPU core. GSPMD (Generalized SPMD) enables automatic model parallelism, sharding the model's parameters, gradients, and optimizer states across multiple TPU cores, thereby reducing per-core memory pressure without altering the model architecture or precision.

Exam trap

Google Cloud often tests the misconception that mixed precision (bfloat16) alone is sufficient to resolve OOM errors, when in fact for very large models the memory bottleneck is the model size itself, not just the precision, and model parallelism is required.

How to eliminate wrong answers

Option A is wrong because reducing the number of layers changes the model architecture and may degrade model quality; it is a workaround, not a systematic solution to memory management. Option B is wrong because increasing the batch size increases memory consumption for activations and gradients, exacerbating OOM errors rather than resolving them. Option C is wrong because while mixed precision training (bfloat16) halves the memory footprint of tensors, it does not address the fundamental issue of a model being too large to fit on a single TPU core; it only provides a constant-factor reduction and may still result in OOM for very large models.

Full explanation →

663

MCQeasy

Refer to the exhibit. A developer sees this error when trying to call a Vertex AI endpoint for online prediction. What permission does the requesting identity need to be granted?

A.aiplatform.prediction.predict

B.aiplatform.endpoints.predict

C.aiplatform.endpoints.use

D.aiplatform.models.predict

AnswerB

The error explicitly states this permission is required.

Why this answer

The error occurs when calling a Vertex AI endpoint for online prediction, which requires the `aiplatform.endpoints.predict` permission. This permission is specifically scoped to the endpoint resource, allowing the identity to send prediction requests to a deployed model endpoint. The correct IAM role binding must include this permission for the requesting identity to successfully invoke the endpoint.

Exam trap

Google Cloud often tests the distinction between permissions scoped to endpoints versus models, and candidates mistakenly choose `aiplatform.models.predict` because they think prediction is always tied to the model, not the endpoint serving it.

How to eliminate wrong answers

Option A is wrong because `aiplatform.prediction.predict` is not a valid IAM permission in Vertex AI; the correct permission for prediction is scoped to the endpoint or model resource, not a generic 'prediction' service. Option C is wrong because `aiplatform.endpoints.use` does not exist as a permission; Vertex AI uses `aiplatform.endpoints.predict` for invoking predictions on endpoints. Option D is wrong because `aiplatform.models.predict` is a permission for calling prediction directly on a model resource, not on an endpoint, and the error specifically references an endpoint call, not a model call.

Full explanation →

664

MCQmedium

A healthcare company is using Vertex AI to build a generative AI assistant that helps doctors draft clinical notes. The assistant uses a fine-tuned PaLM 2 model deployed on a private endpoint. Recently, doctors have reported that the assistant takes over 30 seconds to respond, causing workflow delays. Additionally, the monthly Vertex AI costs have increased by 40% without a proportional increase in usage. The model responses are generally accurate but sometimes include irrelevant details. The company wants to improve response time and cost while maintaining acceptable quality. A review of logs shows that most requests are for similar note types (e.g., progress notes, discharge summaries) and that the same prompt is used repeatedly with minor variations. What should the company do first?

A.Switch to a larger model (e.g., Gemini 1.5 Pro) to improve response quality and reduce irrelevant details

B.Increase the Vertex AI endpoint's maximum request quota to handle concurrent requests

C.Apply model quantization (e.g., INT8) to reduce model size and inference time

D.Implement response caching for common queries and batch process similar requests

AnswerD

Caching reduces redundant computations, and batching improves throughput, together cutting latency and cost.

Why this answer

Option D is correct because the logs show that most requests are for similar note types with repeated prompts, making response caching ideal for reducing latency and cost. Caching stores responses for identical or near-identical queries, eliminating redundant inference calls, which directly addresses the 30-second response time and 40% cost increase without sacrificing quality.

Exam trap

The trap here is that candidates confuse performance optimization techniques (quantization, quota increases) with the root cause of redundant requests, leading them to pick options that address symptoms rather than the fundamental pattern of repeated prompts.

How to eliminate wrong answers

Option A is wrong because switching to a larger model (e.g., Gemini 1.5 Pro) would increase inference time and cost, worsening the problem, and the issue is not about quality but latency and cost. Option B is wrong because increasing the endpoint's maximum request quota does not reduce per-request latency or cost; it only allows more concurrent requests, which could further degrade performance under load. Option C is wrong because model quantization (e.g., INT8) reduces model size and inference time but requires retraining or calibration, and it may degrade accuracy for clinical notes where precision is critical; moreover, it does not address the root cause of redundant requests.

Full explanation →

665

MCQmedium

A startup is building a generative AI tool that helps users write code. They want to launch quickly but need to ensure the generated code is secure and does not introduce vulnerabilities. They have a small team of developers with some ML experience. The tool should be cloud-hosted. Which approach balances speed, security, and cost?

A.Deploy the tool without any security checks and rely on manual review

B.Train a custom code generation model from scratch on a large dataset

C.Use a pre-trained code model (e.g., Codey) and add a security filtering layer

D.Use a smaller model and restrict outputs to only simple code patterns

AnswerC

Leverages existing model, adds security checks, fast to deploy.

Why this answer

Option B is correct because using a pre-trained code model with a security filtering layer provides a good balance: quick start, built-in safety checks, and manageable cost. Option A (building from scratch) is too slow. Option C (manual review) doesn't scale.

Option D (restricting outputs) may reduce usefulness.

Full explanation →

666

MCQhard

Refer to the exhibit. This is the IAM policy for a project containing a Vertex AI Agent Builder agent and a data store. The agent is unable to access the data store. What is the most likely cause?

A.The user needs more permissions

B.The agent needs a bigger quota

C.The agent service account needs the data store viewer role

D.The data store is not in the same region

AnswerC

The agent's service account must have access to the data store.

Why this answer

The agent service account must have the Data Store Viewer role (or equivalent permissions) to read data from the data store. Without this role, the agent cannot access the indexed content, even if the user has permissions. This is a common IAM misconfiguration in Vertex AI Agent Builder.

Exam trap

Cisco often tests the distinction between user permissions and service account permissions, trapping candidates who assume the user's IAM role is inherited by the agent.

How to eliminate wrong answers

Option A is wrong because the user's permissions are irrelevant; the agent operates under its own service account identity, not the user's. Option B is wrong because quota limits affect throughput or resource usage, not access control; the issue is authorization, not capacity. Option D is wrong because Vertex AI Agent Builder and data stores can be in different regions; cross-region access is supported and not a typical cause of access failures.

Full explanation →

667

MCQmedium

A data analyst wants to build a regression model in BigQuery to predict sales from historical data without writing any Python code. Which BigQuery ML statement should they use to define the model?

A.CREATE MODEL my_model OPTIONS(model_type='linear_reg') AS SELECT ...

B.INSERT INTO model my_model VALUES ...

C.CREATE ML my_model AS (SELECT ...)

D.SELECT ML.TRAIN('linear_reg', ...)

AnswerA

This is the correct syntax to create a linear regression model in BigQuery ML.

Why this answer

The CREATE MODEL statement with option model_type='linear_reg' is used to create a linear regression model in BigQuery ML.

Full explanation →

668

MCQmedium

A team is deploying a large language model for legal document summarization. They find the model occasionally omits critical legal clauses. Which improvement technique would be most effective?

A.Design a prompt that explicitly lists required sections

B.Increase the top_p value to 1.0

C.Fine-tune the model on legal summaries

D.Lower the temperature to 0.1

AnswerA

A structured prompt with requirements improves completeness.

Why this answer

Using prompt engineering with explicit instructions to include all clauses and possibly a checklist directly addresses omissions. Option A is wrong because fine-tuning would require labeled data of summaries with clauses. Option B is wrong because temperature reduction might make output less creative but doesn't enforce completeness.

Option D is wrong because it adds randomness, making omissions more likely.

Full explanation →

669

MCQhard

A global e-commerce company uses Vertex AI Gemini API for real-time product description generation. They observe that sometimes the model generates text in a language other than the user's language, despite being prompted in English. They need to ensure output language consistency. Which approach is most effective?

A.Set the language parameter in the generation config to 'en'

B.Fine-tune the model on a dataset of English-only product descriptions

C.Configure a safety filter that blocks non-English text

D.Run a language detection model on the output and regenerate if not English

AnswerC

Vertex AI allows custom safety filters; blocking non-English text ensures output language consistency.

Why this answer

Option C is correct because configuring a safety filter that blocks non-English text is the most direct and effective way to enforce output language consistency at the API level. Safety filters in Vertex AI Gemini API can be customized to reject responses that do not meet specified criteria, including language, without requiring retraining or post-processing. This approach ensures that any generated text not in English is blocked before being returned to the user, providing real-time enforcement.

Exam trap

The trap here is that candidates often confuse generation config parameters with safety filters, assuming a 'language' parameter exists in the API, when in reality Vertex AI Gemini API relies on prompt engineering and safety filters for output control, not a dedicated language setting.

How to eliminate wrong answers

Option A is wrong because the Vertex AI Gemini API generation config does not have a 'language' parameter; the model determines output language based on the prompt and training data, not a configurable language flag. Option B is wrong because fine-tuning on English-only product descriptions would bias the model toward English but cannot guarantee that the model will never generate non-English text, especially for ambiguous prompts or edge cases, and it requires significant time and resources. Option D is wrong because running a language detection model on the output and regenerating if not English adds latency, increases cost, and may still produce non-English text on subsequent attempts, making it less efficient and less reliable than blocking at the filter level.

Full explanation →

670

MCQmedium

A developer is building a customer support chatbot using a large language model. The chatbot frequently generates plausible-sounding but incorrect answers to product questions. Which technique should be applied to improve factual accuracy?

A.Provide a few-shot example of correct answers in the prompt.

B.Use a higher temperature setting to encourage more creative responses.

C.Increase the model's context length to include more of the conversation history.

D.Enable Grounding with the company's product knowledge base.

AnswerD

Grounding retrieves live, verified data and injects it into the prompt, directly improving factual accuracy.

Why this answer

Option D is correct because Grounding (e.g., using Vertex AI Grounding with Search) retrieves relevant information from a trusted source in real time, reducing hallucination. Option A is wrong because increasing context length may include more irrelevant information and does not guarantee accuracy. Option B is wrong because higher temperature increases randomness, worsening hallucinations.

Option C is wrong because few-shot prompting can help but only if examples are accurate and relevant; it does not dynamically look up facts.

Full explanation →

671

MCQhard

A retailer wants to generate personalized product descriptions using PaLM API. They have concerns about data privacy. What is the best practice to mitigate these concerns?

A.Train a custom model from scratch on proprietary data stored on-premise

B.Use the PaLM API directly with anonymized customer data

C.Encrypt all data in transit and at rest using customer-managed encryption keys

D.Enable data residency and use prompt engineering to avoid including personally identifiable information

AnswerD

Vertex AI allows data to stay in specific regions, and careful prompt design can generate personalized content without exposing raw PII.

Why this answer

Option D is correct because data residency ensures customer data is processed and stored within a specific geographic region, addressing regulatory compliance, while prompt engineering allows the retailer to avoid sending PII to the PaLM API entirely. This combination mitigates privacy risks without requiring custom model training or relying solely on encryption, which does not prevent the API from processing sensitive data.

Exam trap

Google Cloud often tests the misconception that encryption alone (Option C) is sufficient for data privacy, when in fact it does not prevent the API from accessing or processing the data, which is the core concern in this scenario.

How to eliminate wrong answers

Option A is wrong because training a custom model from scratch on proprietary data is cost-prohibitive, requires extensive ML expertise, and does not leverage the PaLM API's pre-trained capabilities, making it an inefficient solution for generating personalized descriptions. Option B is wrong because using the PaLM API directly with anonymized customer data still transmits data to Google's servers, and anonymization may not be irreversible or sufficient to prevent re-identification, violating privacy policies. Option C is wrong because encrypting data in transit (e.g., TLS 1.3) and at rest (e.g., AES-256) protects against unauthorized access but does not prevent the PaLM API from processing the data, meaning the retailer's privacy concerns about data exposure to the API remain unaddressed.

Full explanation →

672

MCQhard

A healthcare startup is exploring GenAI for clinical note summarization. They have concerns about patient data privacy. Which Google Cloud approach best addresses privacy while still using powerful models?

A.Deploy open-source models on-premises

B.Use a third-party API with anonymization of patient data

C.Use Vertex AI with model customization (fine-tuning)

D.Use Vertex AI with data residency controls and no external data sharing

AnswerD

Vertex AI offers regional endpoints and commitments to not use customer data for training, addressing privacy while providing powerful models.

Why this answer

Vertex AI with data residency controls and no external data sharing ensures that patient data remains within specified geographic boundaries and is not used for model training or improvement, directly addressing healthcare privacy regulations like HIPAA. This approach leverages Google Cloud's powerful models while maintaining strict data governance, unlike options that risk data exposure or lack enterprise-grade controls.

Exam trap

The trap here is that candidates often assume fine-tuning (Option C) inherently provides privacy, but without explicit data residency and no-sharing policies, it fails to meet strict healthcare compliance requirements.

How to eliminate wrong answers

Option A is wrong because deploying open-source models on-premises, while offering data control, often lacks the advanced summarization capabilities and scalability of Vertex AI's foundation models, and still requires significant effort to ensure HIPAA compliance without Google's built-in privacy safeguards. Option B is wrong because using a third-party API, even with anonymization, introduces risks of data leakage or re-identification, and typically does not provide contractual guarantees against model training on patient data, violating many healthcare privacy policies. Option C is wrong because fine-tuning a model on Vertex AI without explicit data residency controls and no external data sharing may still allow Google to process data outside desired regions or use it for service improvements, failing to meet strict data privacy requirements.

Full explanation →

673

MCQmedium

A team monitors their generative AI model on Vertex AI. They notice output quality declining. Which metric is most likely the root cause?

A.Input token count per request is increasing.

B.Output token count is decreasing.

C.Prediction latency is stable.

D.Error rate is less than 1%.

AnswerA

Growing inputs may push the model beyond optimal context length, reducing focus.

Why this answer

A is correct because an increasing input token count per request can degrade output quality by diluting the model's attention across a longer context window. In transformer-based models like those on Vertex AI, the attention mechanism has a fixed capacity; as input tokens grow, the model may lose focus on critical information, leading to less coherent or relevant outputs. This is a common issue in production systems where users gradually add more context without trimming irrelevant tokens.

Exam trap

Cisco often tests the misconception that output quality issues are always due to model errors or latency problems, rather than subtle input-side factors like token count inflation that silently degrade attention focus.

How to eliminate wrong answers

Option B is wrong because a decreasing output token count does not inherently cause quality decline; it may indicate shorter responses, but quality can remain high if the model is well-tuned. Option C is wrong because stable prediction latency suggests consistent infrastructure performance, not a root cause of output quality degradation. Option D is wrong because a low error rate (<1%) indicates the model is responding without failures, but output quality can still suffer from issues like hallucination or incoherence even when error rates are minimal.

Full explanation →

674

Multi-Selecthard

A financial institution is deploying a generative AI solution that generates investment advice. They must ensure fairness, avoid toxic outputs, and comply with regulations like GDPR. Which TWO strategies should they implement? (Choose two.)

Select 2 answers

A.Use Vertex AI Safety Attributes to filter harmful content in both input and output.

B.Set the model temperature to 0 to eliminate creativity and reduce bias.

C.Implement a human review process for any advice above a certain risk threshold.

D.Fine-tune the model exclusively on compliant financial documents.

E.Disable request logging to avoid storing sensitive data.

AnswersA, C

B is correct because it proactively blocks toxic content.

Why this answer

Option A is correct because Vertex AI Safety Attributes provides built-in safety filters that can detect and block harmful content (e.g., hate speech, toxicity, financial misinformation) in both user prompts and model outputs. This directly addresses the need to avoid toxic outputs and comply with regulations like GDPR, which require protecting users from harmful or biased advice.

Exam trap

Cisco often tests the misconception that reducing model temperature or fine-tuning on compliant data alone can ensure safety and regulatory compliance, when in fact these measures do not address dynamic, context-dependent toxic outputs or logging requirements.

Full explanation →

675

MCQmedium

A content generation model for e-commerce product descriptions repeats the same phrases across multiple descriptions (e.g., 'high-quality', 'best-in-class'). The team wants more varied and engaging output. Which parameter adjustment is most appropriate?

A.Increase the frequency penalty parameter to 1.0.

B.Decrease the max output tokens to 50.

C.Increase the temperature parameter to 1.5.

D.Set the top-p value to a very small number like 0.1.

AnswerA

Frequency penalty specifically reduces the model's tendency to repeat tokens, improving lexical diversity.

Why this answer

Increasing the frequency penalty to 1.0 penalizes tokens that have already appeared in the generated text, directly reducing repetition of phrases like 'high-quality' and 'best-in-class'. This encourages the model to use more diverse vocabulary and sentence structures, leading to varied and engaging product descriptions.

Exam trap

Cisco often tests the distinction between frequency penalty and temperature, where candidates mistakenly increase temperature to add variety, not realizing that temperature increases randomness and can break coherence, while frequency penalty directly targets repetition without sacrificing quality.

How to eliminate wrong answers

Option B is wrong because decreasing max output tokens to 50 limits the length of each description but does not address the root cause of phrase repetition; the model can still repeat phrases within the shorter output. Option C is wrong because increasing temperature to 1.5 makes the output more random and less coherent, which can lead to nonsensical descriptions rather than controlled variation. Option D is wrong because setting top-p to a very small number like 0.1 restricts the model to only the most likely tokens, which actually increases repetition and reduces diversity, the opposite of the desired outcome.

Full explanation →

Page 9 of 14

All pages

1 2 3 4 5 6 7 8 9 10 11 12 13 14

Practice Generative AI Leader by domain

Target a specific domain to shore up weak areas.

Fundamentals of Generative AI Business Strategies for Generative AI Solutions Generative AI Concepts and Technologies Google AI Ecosystem and Strategy Responsible AI and Data Governance Google Cloud's Generative AI Offerings Techniques to Improve Generative AI Model Output Applying Generative AI in Business

See all domains with question counts →