Google Cloud Generative AI Leader Generative AI Leader Generative AI Leader Questions 901–975 | Page 13/14

901

MCQmedium

A healthcare startup wants to use Vertex AI to deploy a model that helps doctors diagnose rare diseases. The model must be explainable, showing the reasoning path. Which technique should they implement?

A.Use a black-box ensemble model for higher accuracy

B.Reduce the model size to make it inherently interpretable

C.Disable all safety filters to avoid interfering with model output

D.Implement chain-of-thought prompting to output reasoning steps

AnswerD

Chain-of-thought provides a clear reasoning path, aiding interpretability.

Why this answer

Chain-of-thought reasoning allows the model to generate step-by-step explanations, which is crucial for medical diagnosis transparency.

Full explanation →

902

MCQeasy

A company wants to build a text-to-speech application for generating voiceovers in multiple languages. They need to use a pre-built Google API without training custom models. Which service should they use?

A.Cloud Text-to-Speech API

B.Vertex AI Text-to-Speech with a custom model

C.Gemini API with a custom prompt

D.Cloud Speech-to-Text API

AnswerA

Cloud Text-to-Speech API provides pre-built voices in many languages, no custom training needed.

Why this answer

Cloud Text-to-Speech API is a pre-built service for converting text to natural-sounding speech in many languages. The other options are either for speech recognition or not pre-built APIs.

Full explanation →

903

MCQmedium

A startup is building a GenAI application and must decide between using a pre-built API (e.g., Vertex AI Gemini API) or fine-tuning a custom model. Which factor STRONGLY favors using the pre-built API?

A.The application must process sensitive data that cannot leave the company's VPC

B.The startup has a large dataset of labeled examples and high compute budget

C.The application requires highly accurate, domain-specific terminology

D.The startup needs to launch quickly with minimal ML infrastructure and operational overhead

AnswerD

Pre-built APIs allow rapid integration without managing models or training pipelines.

Why this answer

Option D is correct because using a pre-built API like Vertex AI Gemini API eliminates the need to manage ML infrastructure, handle model training, or operationalize a custom model. This allows the startup to integrate GenAI capabilities rapidly via simple API calls, focusing on application logic rather than the complexities of model deployment, scaling, and maintenance.

Exam trap

Cisco often tests the misconception that 'more control equals better performance,' leading candidates to choose fine-tuning when the question explicitly asks for the factor that favors a pre-built API, which is speed and reduced operational burden.

How to eliminate wrong answers

Option A is wrong because processing sensitive data that cannot leave the VPC actually favors a self-hosted or fine-tuned model within the VPC, not a pre-built API which typically requires data to be sent to an external endpoint. Option B is wrong because having a large dataset of labeled examples and a high compute budget are prerequisites for fine-tuning, not reasons to use a pre-built API; fine-tuning would leverage those resources for domain adaptation. Option C is wrong because achieving highly accurate, domain-specific terminology is a primary reason to fine-tune a model on proprietary data, as a general-purpose pre-built API may lack the specialized vocabulary or context.

Full explanation →

904

MCQeasy

A startup wants to use a pre-trained model to generate product descriptions without training. Which Google Cloud service should they use?

A.Vertex AI Prediction

B.AI Platform Training

C.Cloud AutoML

D.Vertex AI Generative AI Studio

AnswerD

Vertex AI Generative AI Studio is designed for accessing and experimenting with foundation models for generative tasks.

Why this answer

Vertex AI Generative AI Studio is the correct service because it provides a no-code interface to access and experiment with pre-trained generative models, including text generation for product descriptions, without requiring any training or custom model development. It allows users to directly prompt models like PaLM 2 or Gemini for inference tasks, making it ideal for generating content from a pre-trained model without training.

Exam trap

The trap here is that candidates may confuse Vertex AI Prediction (which serves custom models) with Generative AI Studio (which serves pre-trained models), or assume that any generative AI task requires training via AI Platform Training or AutoML, when in fact the question explicitly states 'without training'.

How to eliminate wrong answers

Option A is wrong because Vertex AI Prediction is used for deploying and serving custom-trained models for online or batch predictions, not for directly accessing pre-trained generative models without training. Option B is wrong because AI Platform Training is designed for training custom machine learning models, not for using pre-trained models for inference without training. Option C is wrong because Cloud AutoML is used for training custom models on user-provided data with automated machine learning, not for directly generating content from a pre-trained model without any training.

Full explanation →

905

Multi-Selecthard

A financial services firm must comply with regulations when using gen AI. Which two measures are critical?

Select 2 answers

A.Implement audit trails

B.Deploy without risk assessment

C.Use a closed-source model

D.Use explainable AI

E.Use only synthetic data

AnswersA, D

Audit trails provide accountability and support regulatory reviews.

Why this answer

Audit trails are critical for compliance because they provide a tamper-evident, chronological record of all AI model inputs, outputs, and decisions. This enables firms to demonstrate regulatory adherence (e.g., under GDPR or SOX) by reconstructing the exact sequence of events that led to a specific AI-generated output, which is essential for accountability and forensic review.

Exam trap

Google Cloud often tests the misconception that 'closed-source models are inherently more compliant' or that 'synthetic data eliminates privacy risks,' when in reality, compliance hinges on transparency, auditability, and risk assessment rather than the model's source or data origin.

Full explanation →

906

MCQmedium

A data scientist is using Vertex AI Model-as-a-Service (MaaS) to deploy a fine-tuned open-source model. They notice high latency during inference. What is the most likely cause?

A.The model is too large for the hardware

B.The endpoint is set to autoscaling with a low minimum node count

C.The model is not quantized

D.The region is incorrect

AnswerB

Autoscaling with low min nodes causes cold start latency.

Why this answer

High latency during inference on Vertex AI MaaS is most often caused by the endpoint scaling configuration. When autoscaling is enabled with a low minimum node count, the system may need to provision additional nodes to handle the request load, which introduces cold-start latency. This is especially pronounced for fine-tuned open-source models, which can be large and take time to load onto new nodes.

Exam trap

The trap here is that candidates often assume high latency is due to model size or lack of optimization, but the question specifically describes 'high latency during inference' in a MaaS context, which points to scaling delays rather than compute bottlenecks.

How to eliminate wrong answers

Option A is wrong because Vertex AI MaaS automatically selects appropriate hardware for the model size; if the model were too large, it would fail to deploy rather than cause high latency. Option C is wrong because quantization primarily affects throughput and memory usage, not the latency spike from cold starts; unquantized models may be slower but not cause the intermittent high latency described. Option D is wrong because the region setting affects data residency and network latency, but it would cause consistently higher latency, not the intermittent high latency typical of autoscaling delays.

Full explanation →

907

MCQmedium

Refer to the exhibit. A developer runs this command. What is the primary purpose?

A.Create a training pipeline

B.Deploy a model to an endpoint

C.Train a model

D.Upload a model artifact to Model Registry

AnswerD

The 'models upload' command registers a model with the specified container and artifacts.

Why this answer

The command shown in the exhibit is `az ml model create --name my-model --path ./model.pkl --registry-name myregistry`. This command uploads a local model artifact (model.pkl) to the Azure Machine Learning Model Registry, which is a centralized repository for versioning and managing trained models. It does not initiate training, deployment, or pipeline creation; its sole purpose is to register the model artifact for later use.

Exam trap

Google Cloud often tests the distinction between model registration (uploading a trained artifact) and model training or deployment, so the trap here is that candidates confuse the `az ml model create` command with initiating a training job or deployment, when it only stores the model artifact for versioning and reuse.

How to eliminate wrong answers

Option A is wrong because creating a training pipeline requires a command like `az ml job create` or `az ml pipeline create`, not `az ml model create`. Option B is wrong because deploying a model to an endpoint uses commands such as `az ml online-endpoint create` and `az ml online-deployment create`, which involve specifying compute targets and scoring scripts, not just uploading a model file. Option C is wrong because training a model is performed via a training job (e.g., `az ml job create` with a training script and compute target), not by registering an already-trained artifact.

Full explanation →

908

MCQhard

A media company is using Vertex AI Imagen to generate marketing images. The output frequently contains unrealistic artifacts, especially in human faces. The team has fine-tuned the model using their brand assets. What is the most likely cause and recommended fix?

A.Safety filters are too aggressive; reduce them.

B.Negative prompts are missing; always include 'unrealistic'.

C.The fine-tuning dataset is too small or too homogeneous; augment and diversify the training data.

D.Inference steps are too low; increase to 100.

AnswerC

Overfitting to limited data causes artifacts; more varied data helps generalization.

Why this answer

Option C is correct because unrealistic artifacts in fine-tuned generative models, especially in human faces, typically stem from a training dataset that is too small or lacks diversity. When the dataset is homogeneous, the model overfits to limited patterns and fails to generalize, leading to distorted outputs. Augmenting and diversifying the training data with varied poses, lighting, and ethnicities helps the model learn robust facial features.

Exam trap

The trap here is that candidates confuse inference parameters (like steps or safety filters) with data quality issues, assuming artifacts are due to model settings rather than the fundamental cause of insufficient or non-diverse training data.

How to eliminate wrong answers

Option A is wrong because safety filters in Vertex AI Imagen block harmful content (e.g., violence, hate speech) and do not cause unrealistic artifacts; reducing them would not fix facial distortions and could introduce policy violations. Option B is wrong because negative prompts guide the model to avoid certain concepts, but simply including the word 'unrealistic' is not a technical fix—the model needs diverse training data, not a prompt hack. Option D is wrong because inference steps control the denoising process and image quality, but increasing them to 100 would not address overfitting from a poor dataset; the default steps (typically 50) are sufficient for high-quality outputs.

Full explanation →

909

MCQmedium

A legal firm uses a generative AI to draft contracts. They want the output to follow a specific clause structure. Which technique should they use in the prompt?

A.Include a system instruction that defines the required format.

B.Increase temperature to encourage variance.

C.Use grounding to pull from a database of contracts.

D.Set stop sequences to end generation at certain points.

AnswerA

System instructions can specify structure, ensuring adherence.

Why this answer

A system instruction (or system message) sets the overall behavior and output format for the generative AI model, effectively constraining it to follow a specific clause structure. This is the most direct and reliable technique for enforcing a predefined format in the prompt, as it operates at the model's instruction-following layer.

Exam trap

Cisco often tests the distinction between content control (grounding, temperature) and structural/format control (system instructions), leading candidates to mistakenly choose grounding or stop sequences when the real need is a format-enforcing instruction.

How to eliminate wrong answers

Option B is wrong because increasing temperature encourages more randomness and variance in the output, which is the opposite of what is needed for a consistent, structured clause format. Option C is wrong because grounding (e.g., using Retrieval-Augmented Generation) pulls relevant data from a database but does not enforce a specific output structure; it provides content, not format constraints. Option D is wrong because stop sequences only terminate generation at a specific token or phrase, but they do not guide the model to produce a particular clause structure throughout the entire output.

Full explanation →

910

MCQmedium

A marketing agency uses a generative AI model to create slogans for ad campaigns. The model outputs generic slogans like 'Quality you can trust' that lack originality. The agency has a library of past award-winning slogans and wants to generate more creative and brand-specific outputs. They have a requirement that the model must not produce slogans longer than 15 words. Which technique should they prioritize?

A.Use few-shot prompting with 3-5 examples of award-winning slogans in the prompt.

B.Set max tokens to 15 to force shorter, potentially more punchy slogans.

C.Increase the temperature to 1.2 to encourage more creative word combinations.

D.Fine-tune the model on the library of award-winning slogans.

AnswerA

Few-shot examples teach the desired style and creativity directly.

Why this answer

Few-shot prompting (A) is the most direct and efficient technique because it provides the model with concrete examples of the desired output style (award-winning, creative slogans) within the context window, guiding the model's generation without altering its underlying weights. This approach immediately constrains the output to be brand-specific and creative by leveraging in-context learning, while the 15-word limit can be handled via a simple instruction in the prompt, avoiding the need for fine-tuning or risky parameter changes.

Exam trap

Cisco often tests the misconception that adjusting token limits or temperature is a substitute for providing explicit stylistic guidance, leading candidates to choose B or C, when in fact few-shot prompting directly addresses the need for creative, brand-specific output with minimal overhead.

How to eliminate wrong answers

Option B is wrong because setting max tokens to 15 does not enforce a word count; tokens are subword units, and 15 tokens could produce a slogan far shorter or longer than 15 words, and it does nothing to improve creativity or brand specificity. Option C is wrong because increasing temperature to 1.2 increases randomness, which can lead to nonsensical or irrelevant slogans, not necessarily more creative or brand-specific ones, and it may violate the 15-word requirement by producing longer outputs. Option D is wrong because fine-tuning on a library of award-winning slogans is resource-intensive, requires significant data and compute, and may cause catastrophic forgetting of general language capabilities; it is overkill when few-shot prompting can achieve the goal with zero training.

Full explanation →

911

MCQmedium

A healthcare organization wants to use generative AI for medical report summaries. What is the primary concern?

A.Ensuring HIPAA compliance and data security when using cloud AI services

B.The model's ability to generate fluent and coherent summaries

C.Minimizing the cost of each API call to stay within budget

D.Latency of responses for real-time use cases

AnswerA

Generative AI models processing PHI must be HIPAA-compliant, requiring a signed Business Associate Agreement (BAA) with Google Cloud.

Why this answer

The primary concern for a healthcare organization using generative AI for medical report summaries is ensuring HIPAA compliance and data security when using cloud AI services. Medical data is protected health information (PHI), and any cloud-based AI service must have a Business Associate Agreement (BAA) in place and enforce encryption at rest and in transit to avoid regulatory penalties and data breaches.

Exam trap

Google Cloud often tests the misconception that technical performance (fluency, cost, latency) is the top priority, when in regulated industries like healthcare, compliance and data security are the non-negotiable primary concerns.

How to eliminate wrong answers

Option B is wrong because while fluency and coherence are important for summary quality, they are secondary to the legal and security obligations of handling PHI; a fluent summary that leaks data is non-compliant. Option C is wrong because cost minimization is an operational concern, not the primary risk; HIPAA violations carry fines up to $50,000 per violation, far outweighing API call costs. Option D is wrong because latency is a performance metric relevant for real-time use, but medical report summarization is typically asynchronous or batch-processed, and compliance takes precedence over speed.

Full explanation →

912

Multi-Selectmedium

A data scientist is building a RAG pipeline for a legal document retrieval system. Which TWO components are essential for this system? (Select two.)

Select 2 answers

A.A large language model for final response generation

B.A vector database for similarity search

C.A fine-tuned LLM for generation

D.An embedding model to vectorize documents

E.A diffusion model for document generation

AnswersB, D

Vector database stores and retrieves document embeddings efficiently.

Why this answer

Option B is correct because a vector database is essential for storing and efficiently retrieving document embeddings via similarity search (e.g., cosine similarity or Euclidean distance), which is the core retrieval mechanism in a RAG pipeline. Without it, the system cannot quickly find the most relevant document chunks to augment the LLM's context.

Exam trap

Cisco often tests the misconception that a fine-tuned LLM is required for RAG, when in fact the essential components are the embedding model and vector database for retrieval, while the generator can be any pre-trained LLM.

Full explanation →

913

Multi-Selectmedium

A company is using Vertex AI to build a language model for generating legal documents. They need to ensure the model's outputs are accurate and verifiable. Which TWO features should they use?

Select 2 answers

A.Confidence indicators

B.Safety filters for legal content

C.Chain-of-thought reasoning

D.Grounding with citations to relevant legal texts

E.Model Cards

AnswersC, D

Chain-of-thought makes the model's reasoning process transparent, aiding in verification.

Why this answer

Grounding (citing sources) and chain-of-thought reasoning are the two features that directly improve accuracy and verifiability. Confidence indicators and safety filters are useful but do not directly address accuracy/verifiability.

Full explanation →

914

MCQhard

A team is deploying a real-time chat application using Gemini. They need to ensure the model does not generate harmful content. Which safety filter configuration should they use?

A.Set safety_threshold high for all categories

B.Use grounding with a safe knowledge base

C.Implement custom safety attribute filters with low thresholds

D.Fine-tune the model with safe examples

AnswerA

High thresholds block more harmful content, providing stricter safety.

Why this answer

Option D is correct because setting high safety thresholds for all categories blocks more harmful content. Option A is wrong because model tuning is about task adaptation, not safety. Option B is wrong because grounding does not prevent harmful output.

Option C is wrong because custom safety attributes are for additional categories, but still need thresholds.

Full explanation →

915

MCQeasy

Which of the following best describes how large language models (LLMs) generate text?

A.They retrieve the most similar text from a database and return it

B.They use a rule-based grammar engine to construct sentences

C.They predict the next token in a sequence based on the preceding tokens

D.They randomly select words from a fixed vocabulary

AnswerC

LLMs use the transformer architecture to model the probability distribution of the next token given the context.

Why this answer

LLMs are trained to predict the next token given the preceding tokens. During inference, they generate one token at a time autoregressively.

Full explanation →

916

MCQhard

A developer receives the above JSON response from a Vertex AI language model. The output content is correct, but the developer expected the model to not answer geography questions. What should the developer do to prevent the model from responding to geography queries?

A.Adjust the safety filter thresholds for the 'Toxic' category

B.Enable Vertex AI Grounding with a geography knowledge base

C.Configure a safety filter for the 'Geography' category

D.Add a system instruction to not answer geography questions

AnswerC

Safety filters can block specific categories like geography.

Why this answer

Option C is correct because Vertex AI provides safety filters that can be configured to block model responses in specific categories, including a 'Geography' category. By adjusting the safety filter thresholds for this category, the developer can prevent the model from answering geography queries while still allowing it to respond to other topics. This is a direct and effective method to enforce content restrictions without modifying the model's underlying behavior.

Exam trap

The trap here is that candidates often assume system instructions are sufficient for content restrictions, but Cisco tests the understanding that safety filters are the correct mechanism for enforcing categorical blocks, as they operate at a lower level and cannot be bypassed by prompt engineering.

How to eliminate wrong answers

Option A is wrong because adjusting safety filter thresholds for the 'Toxic' category only controls responses related to toxicity (e.g., hate speech, harassment), not geography-specific content; it does not address the requirement to block geography questions. Option B is wrong because enabling Vertex AI Grounding with a geography knowledge base would actually enhance the model's ability to answer geography queries by providing additional context, which is the opposite of what the developer wants. Option D is wrong because while adding a system instruction to not answer geography questions might influence the model, it is not a guaranteed enforcement mechanism—models can still override or ignore instructions, especially if the prompt is rephrased; safety filters provide a more reliable, configurable block.

Full explanation →

917

MCQeasy

A product team uses a translation model to convert English product descriptions into French. The model mixes formal and informal French dialects. Which simple prompt modification likely solves this?

A.Increase the temperature to encourage more consistent output.

B.Add a system prompt specifying 'Use only formal French with no informal expressions.'

C.Fine-tune the model on a corpus of formal French texts.

D.Provide a few-shot example of a formal French translation in the prompt.

AnswerB

Prompt engineering directly addresses the style issue.

Why this answer

Adding a system prompt that explicitly instructs the model to 'Use only formal French with no informal expressions' directly constrains the output style at inference time without requiring retraining. This leverages the model's instruction-following capability to enforce a specific dialect, which is the simplest and most effective modification for controlling output style in a production translation pipeline.

Exam trap

Cisco often tests the misconception that fine-tuning or few-shot examples are always necessary for style control, when in fact a system prompt is the simplest and most scalable solution for inference-time behavior modification.

How to eliminate wrong answers

Option A is wrong because increasing temperature adds randomness to token sampling, which would make the output less consistent and potentially increase dialect mixing, not solve it. Option C is wrong because fine-tuning requires a curated dataset and significant compute resources, making it far more complex and time-consuming than a simple prompt change for a style preference. Option D is wrong because a few-shot example can bias the model but does not guarantee consistent enforcement across all outputs, especially if the model's training data contains mixed dialects; a system prompt provides a stronger, persistent constraint.

Full explanation →

918

MCQeasy

A healthcare company needs to process medical records (e.g., discharge summaries) to extract structured data. Which AI API is specifically designed for this purpose?

A.Natural Language AI

B.Translation AI

C.Document AI with Healthcare NLP

D.Vision AI

AnswerC

Healthcare AI (via Document AI Healthcare NLP) is specialized for medical records.

Why this answer

Healthcare AI is purpose-built for medical documents, providing features like entity extraction for clinical data.

Full explanation →

919

MCQhard

An MLOps engineer wants to implement continuous evaluation of a generative model in production. Which Vertex AI component should they use?

A.Vertex AI Model Monitoring

B.Vertex AI Feature Store

C.Vertex AI Prediction

D.Vertex AI Pipelines

AnswerA

Model Monitoring provides continuous evaluation of model metrics and alerts on degradation.

Why this answer

Vertex AI Model Monitoring is the correct component because it provides continuous evaluation of model performance in production, including detecting prediction drift, data drift, and feature attribution drift. For generative models, it can monitor output quality and safety metrics over time, alerting engineers to degradation or shifts in model behavior without requiring manual intervention.

Exam trap

Google Cloud often tests the distinction between monitoring (ongoing evaluation of deployed models) and serving (handling inference requests), leading candidates to mistakenly choose Vertex AI Prediction for continuous evaluation tasks.

How to eliminate wrong answers

Option B is wrong because Vertex AI Feature Store is designed for managing, storing, and serving feature data for training and predictions, not for monitoring model performance or evaluating outputs in production. Option C is wrong because Vertex AI Prediction handles model serving and inference requests, but it does not include built-in continuous evaluation or drift detection capabilities. Option D is wrong because Vertex AI Pipelines orchestrates ML workflows for training and batch prediction, but it is not a real-time monitoring service for production model evaluation.

Full explanation →

920

MCQhard

A music streaming service wants to use AI-generated playlists and artwork, but is concerned about potential copyright infringement. They plan to use a generative model that was trained on a large corpus of publicly available music and images. Which action is MOST important to mitigate IP risk?

A.Review the training data provenance and ensure it consists of properly licensed or public domain works

B.Only use models hosted on Google Cloud, as Google assumes liability

C.Add a watermark to all generated content using SynthID

D.Ask the model to self-certify that its outputs are original

AnswerA

Understanding and documenting the training data's legal status helps mitigate IP infringement risk.

Why this answer

Training data provenance is a key IP concern. The service should verify that the training data does not include copyrighted content without permission, and document the data sources to establish a chain of provenance.

Full explanation →

921

Multi-Selecteasy

Which THREE are essential components of a responsible AI strategy for GenAI? (Select three.)

Select 3 answers

A.Use of only open-source models

B.Maximum model size

C.Human oversight for critical decisions

D.Model transparency and explainability

E.Bias detection and mitigation

AnswersC, D, E

Human oversight prevents harmful automated decisions and ensures ethical use.

Why this answer

Human oversight for critical decisions (C) is essential because GenAI models can produce plausible but incorrect or harmful outputs. A responsible AI strategy mandates that a human-in-the-loop reviews high-stakes outputs, such as medical diagnoses or financial approvals, to prevent automated errors from causing real-world harm. This aligns with the principle of human accountability in AI governance frameworks like the NIST AI Risk Management Framework.

Exam trap

Google Cloud often tests the misconception that technical attributes like model size or open-source licensing are core to responsible AI, when in fact the focus is on governance practices like transparency, bias mitigation, and human oversight.

Full explanation →

922

Multi-Selectmedium

Which TWO actions can reduce the cost of using Vertex AI Gemini API? (Choose two.)

Select 2 answers

A.Use batch prediction instead of online

B.Increase the max output tokens

C.Use grounding with Google Search

D.Use a larger model

E.Use context caching

AnswersA, E

Batch prediction is generally cheaper than online prediction.

Why this answer

Batch prediction reduces cost because it processes multiple requests asynchronously in a single batch, allowing Vertex AI to optimize resource utilization and charge lower per-token rates compared to online (real-time) prediction, which requires dedicated infrastructure for low-latency responses.

Exam trap

Cisco often tests the misconception that increasing output tokens or using a larger model improves quality without cost impact, but candidates must remember that both directly increase token consumption and per-token pricing.

Full explanation →

923

MCQmedium

A startup is prototyping a multimodal AI application that processes images and text. They have a limited budget and want the fastest time to market, with minimal infrastructure setup. Which combination of services should they use for prototyping?

A.Vertex AI Prediction and Cloud Storage

B.Google AI Studio (Gemini API) and Colab

C.Cloud Run and Firestore

D.Vertex AI Workbench and BigQuery ML

AnswerB

Google AI Studio offers a free tier for Gemini API, and Colab provides free GPU notebooks — ideal for rapid prototyping with multimodal data.

Why this answer

Option B is correct because Google AI Studio provides immediate access to the Gemini API for multimodal (image+text) processing without any infrastructure setup, and Colab offers a free, managed Jupyter environment with pre-installed libraries for rapid prototyping. This combination minimizes time to market and cost, aligning perfectly with the startup's constraints.

Exam trap

The trap here is that candidates often over-engineer the solution by choosing managed ML platforms like Vertex AI Prediction, forgetting that prototyping prioritizes speed and minimal setup over production-grade scalability.

How to eliminate wrong answers

Option A is wrong because Vertex AI Prediction requires deploying a model to an endpoint, which involves infrastructure setup and ongoing costs, making it slower and more expensive for prototyping. Option C is wrong because Cloud Run and Firestore are serverless compute and database services, not designed for multimodal AI processing; they would require building custom ML logic and lack built-in multimodal capabilities. Option D is wrong because Vertex AI Workbench is a managed notebook environment for model development, and BigQuery ML is for SQL-based ML on tabular data, neither providing direct multimodal AI inference like the Gemini API.

Full explanation →

924

MCQmedium

A team is deploying a text generation model for legal document review. They observe that the model occasionally generates factually incorrect legal citations. Which approach best reduces this issue?

A.Implement retrieval-augmented generation (RAG) with a verified legal database.

B.Lower the temperature to 0.0.

C.Use a larger base model.

D.Increase the max output tokens.

AnswerA

RAG retrieves factual information from verified sources, reducing hallucinations.

Why this answer

Option C is correct because retrieval-augmented generation (RAG) with a verified legal database grounds the model in factual, up-to-date sources, directly addressing incorrect citations. Option A (lowering temperature) reduces randomness but does not prevent hallucination. Option B (increasing max tokens) has no effect on factual accuracy.

Option D (using a larger model) may not guarantee correctness without proper grounding.

Full explanation →

925

Multi-Selecthard

A company is deploying a GenAI application that must meet SOC 2 compliance. Which three Google Cloud offerings can be used in a compliant manner? (Choose three.)

Select 3 answers

A.Colab (consumer version)

B.Google AI Studio (free tier)

C.Vertex AI

D.Document AI

E.BigQuery ML

AnswersC, D, E

Vertex AI supports SOC 2 compliance.

Why this answer

Vertex AI is correct because it operates on Google Cloud's infrastructure, which is SOC 2 compliant when configured properly. It provides managed ML services that can be deployed within a VPC-scoped environment, allowing customers to meet data residency, access control, and audit logging requirements mandated by SOC 2.

Exam trap

Cisco often tests the misconception that any free-tier or consumer-grade Google AI tool (like Colab or AI Studio) can be used for compliance, when in fact only enterprise-grade services within a properly configured Google Cloud environment meet SOC 2 requirements.

Full explanation →

926

MCQmedium

A team is using Vertex AI to fine-tune a large language model on proprietary company data. The data contains personally identifiable information (PII). What is the BEST practice to protect privacy?

A.Use differential privacy during the fine-tuning process

B.Use a different foundation model that was not trained on proprietary data

C.Remove all PII from the dataset before fine-tuning

D.Store the fine-tuned model on-premises only

AnswerA

Differential privacy adds noise to prevent the model from memorizing individual data points.

Why this answer

Differential privacy (Option A) is the best practice because it adds calibrated noise during fine-tuning, mathematically guaranteeing that the model cannot memorize or leak individual PII records even if the training data contains such information. This approach preserves privacy without requiring complete removal of PII, which may be impractical or destroy data utility. Vertex AI supports differential privacy through libraries like TensorFlow Privacy, enabling privacy budget tracking via epsilon (ε) values.

Exam trap

Cisco often tests the misconception that data removal (Option C) is sufficient for privacy, when in fact differential privacy provides a formal mathematical guarantee against inference attacks even if PII is present in the training set.

How to eliminate wrong answers

Option B is wrong because using a different foundation model does not address the privacy risk; the fine-tuning process on proprietary data still exposes PII regardless of the base model chosen. Option C is wrong because removing all PII before fine-tuning is a data preprocessing step that can reduce risk but is not a privacy guarantee—residual PII may remain due to incomplete scrubbing, and the model can still infer sensitive patterns from non-PII fields. Option D is wrong because storing the fine-tuned model on-premises only addresses data residency but does not prevent the model from memorizing and leaking PII during inference; privacy protection requires algorithmic safeguards, not just storage location.

Full explanation →

927

MCQeasy

A company is deploying a generative AI model for medical advice. What is the most important consideration?

A.Model latency

B.Safety and fairness

C.Model size

D.Cost of inference

AnswerB

Patient safety and avoiding bias are the top priorities.

Why this answer

In medical advice applications, a generative AI model's outputs can directly impact patient health, making safety and fairness the paramount consideration. Incorrect or biased advice could lead to misdiagnosis or harm, outweighing performance metrics like latency or cost. Regulatory frameworks such as HIPAA and FDA guidelines for clinical decision support further mandate rigorous validation of model safety and fairness before deployment.

Exam trap

Google Cloud often tests the misconception that technical performance metrics like latency or cost are the primary concerns in high-stakes domains, when in fact ethical and safety considerations take precedence.

How to eliminate wrong answers

Option A is wrong because model latency, while important for user experience, is secondary to ensuring the advice is safe and unbiased; a fast but harmful response is unacceptable in healthcare. Option C is wrong because model size correlates with computational resources and potential capability, but does not inherently guarantee safety or fairness; a larger model may amplify biases or generate more confident but incorrect advice. Option D is wrong because cost of inference is a business consideration that must be balanced against safety requirements, but it is not the most critical factor when human lives are at stake.

Full explanation →

928

MCQmedium

A travel company fine-tuned a language model on customer chat logs to provide travel recommendations. After deployment, they receive complaints that the model sometimes generates inappropriate or offensive content. What is the most effective approach to improve output safety while preserving overall performance?

A.Modify the system instruction to request polite responses only

B.Retrain the model on a larger dataset of chat logs

C.Reduce the temperature to 0.0

D.Add a post-processing safety classifier that filters or rewrites unsafe outputs

AnswerD

A safety classifier directly catches and mitigates harmful content without modifying the base model.

Why this answer

Option D is correct because a post-processing safety classifier acts as a guardrail that can detect and filter or rewrite unsafe outputs without altering the underlying model's weights or training data. This approach preserves the model's overall performance on safe, relevant recommendations while adding a dedicated safety layer that can be independently tuned and updated as new safety requirements emerge. Unlike prompt engineering or hyperparameter adjustments, a classifier provides a robust, policy-enforced mechanism to catch edge cases that the model might otherwise generate.

Exam trap

Cisco often tests the misconception that prompt engineering or hyperparameter tuning alone can reliably fix safety issues, when in fact they are insufficient against learned toxic patterns in the model's weights, and a dedicated safety classifier is the standard industry practice for robust output filtering.

How to eliminate wrong answers

Option A is wrong because modifying the system instruction is a form of prompt engineering that can be easily overridden by the model's learned patterns from training data; it does not guarantee the model will never generate offensive content, especially if such patterns exist in the fine-tuning data. Option B is wrong because retraining on a larger dataset of chat logs does not address the root cause of inappropriate outputs—if the original data contains toxic or biased examples, simply adding more data may dilute but not eliminate the problem, and could even introduce new unsafe patterns. Option C is wrong because reducing temperature to 0.0 makes the model deterministic and greedy, which reduces creativity but does not prevent the model from generating the most likely token sequence that could still be offensive; it also harms performance on diverse, nuanced travel recommendations.

Full explanation →

929

MCQeasy

A developer is using Vertex AI PaLM 2 to generate product descriptions. The output is often too verbose and includes irrelevant details. Which technique should the developer apply?

A.Set top_p to 0.1

B.Enable safety filters

C.Use few-shot prompting with examples of concise descriptions

D.Increase temperature to 0.9

AnswerC

Guides the model to match the style of provided examples.

Why this answer

Option C is correct because the developer needs to constrain the model's output to be concise and relevant. Few-shot prompting provides the model with explicit examples of the desired output format (concise descriptions), guiding it to mimic that style and length. This directly addresses verbosity and irrelevant details without altering the model's fundamental randomness or safety settings.

Exam trap

The trap here is that candidates confuse hyperparameter tuning (top_p, temperature) with prompt engineering techniques, assuming that reducing randomness (top_p) or increasing creativity (temperature) can fix verbosity, when only explicit examples in the prompt can reliably enforce a specific output style.

How to eliminate wrong answers

Option A is wrong because setting top_p to 0.1 reduces the cumulative probability threshold for token sampling, which makes the output less diverse and more deterministic, but it does not teach the model to be concise or omit irrelevant details—it only narrows the pool of possible next tokens. Option B is wrong because safety filters block harmful or sensitive content (e.g., toxicity, violence), not verbose or irrelevant details; they do not control output length or relevance. Option D is wrong because increasing temperature to 0.9 increases randomness and creativity in token selection, which would likely make the output even more verbose and include more irrelevant details, the opposite of what is needed.

Full explanation →

930

MCQhard

Refer to the exhibit. The team changed the generation parameters to reduce output variability. However, summaries now often repeat the same phrases. Which parameter change is most likely causing the repetition?

A.Reducing top_p from 0.95 to 0.85

B.Reducing temperature from 0.7 to 0.2

C.Using the same model text-bison@002

D.Reducing top_k from 40 to 10

AnswerB

Low temperature increases determinism and repetition.

Why this answer

Lowering temperature to 0.2 makes the model more deterministic, increasing repetition. Option A is wrong because top_k reduction also contributes to determinism. Option B is wrong because top_p reduction also narrows token selection.

Option D is wrong because the model is the same.

Full explanation →

931

MCQhard

A company is deploying a GenAI contract analysis system that processes confidential legal documents. They need to ensure that the model does not retain or train on customer data. Which configuration is REQUIRED?

A.Select a model with a context window large enough to hold the entire contract

B.Use a public model endpoint with data encryption in transit

C.Opt out of model logging and data retention in the API settings

D.Use a smaller model to reduce the risk of data memorization

AnswerC

Opting out of logging and data retention ensures the provider does not store or train on your data, fulfilling confidentiality.

Why this answer

Option C is correct because the primary requirement is to prevent the GenAI model from retaining or training on confidential legal documents. Most enterprise GenAI APIs (e.g., OpenAI, Azure OpenAI) provide a data privacy setting that allows customers to opt out of model logging and data retention, ensuring that prompts and responses are not stored or used for model improvement. This configuration directly addresses the compliance need for data confidentiality in contract analysis.

Exam trap

The trap here is that candidates confuse data security measures (encryption, context window, model size) with data privacy controls (opt-out of logging and retention), leading them to select technically valid but irrelevant options for the specific requirement of preventing data retention and training.

How to eliminate wrong answers

Option A is wrong because a large context window does not prevent data retention or training; it only allows the model to process longer inputs, which is unrelated to privacy controls. Option B is wrong because data encryption in transit (e.g., TLS 1.3) protects data during transmission but does not prevent the model provider from logging or retaining the data on their servers. Option D is wrong because using a smaller model does not inherently reduce the risk of data memorization; memorization depends on training data and model architecture, not model size alone, and does not address API-level data retention policies.

Full explanation →

932

Multi-Selectmedium

A media company wants to generate video content from text descriptions. They need a Google Cloud solution that can produce high-quality videos with realistic motion. Which TWO services should they consider?

Select 1 answer

A.Chirp

B.Codey

C.Gemini (multimodal)

D.Veo

E.Imagen

AnswersD

Veo is Google's generative video model that creates videos from text prompts.

Why this answer

Veo is Google Cloud's advanced video generation model that creates high-quality videos with realistic motion from text descriptions. It leverages diffusion transformers and temporal coherence techniques to ensure smooth, lifelike movement across frames, making it the correct choice for this use case.

Exam trap

The trap here is that candidates may confuse Imagen (text-to-image) or Gemini (multimodal) as capable of video generation, but only Veo is purpose-built for high-quality video synthesis with realistic motion in Google Cloud's generative AI portfolio.

Full explanation →

933

Multi-Selectmedium

A developer wants to use Gemini for multimodal tasks involving images and text. Which two features are available via the Gemini API on Vertex AI? (Choose two.)

Select 2 answers

A.Fine-tuning with images

B.Image generation

C.Function calling

D.Code execution

E.Multimodal understanding (image+text)

AnswersC, E

Function calling is supported on Vertex AI.

Why this answer

Gemini on Vertex AI supports multimodal input (image+text) and function calling.

Full explanation →

934

Multi-Selecthard

A research team wants to leverage Google DeepMind's work to accelerate drug discovery. They are interested in using a model that predicts protein structures and another that can generate novel protein sequences with desired properties. Which TWO Google DeepMind achievements are most relevant? (Select 2 options.)

Select 2 answers

A.WaveNet

B.Gemini

C.AlphaCode

D.AlphaFold

E.AlphaProteo

AnswersD, E

Predicts protein 3D structures, fundamental to drug discovery.

Why this answer

AlphaFold (D) is correct because it is Google DeepMind's breakthrough model for predicting protein 3D structures from amino acid sequences, which directly accelerates drug discovery by enabling researchers to understand target proteins. AlphaProteo (E) is correct because it is DeepMind's AI system designed to generate novel protein sequences that bind to specific targets, effectively creating new proteins with desired therapeutic properties.

Exam trap

Cisco often tests the ability to distinguish between general-purpose AI models (like Gemini or WaveNet) and domain-specific scientific models (like AlphaFold and AlphaProteo), where candidates mistakenly select familiar names without verifying their specific application in drug discovery.

Full explanation →

935

Multi-Selecthard

Which THREE considerations are critical when deploying a generative AI model using Vertex AI Endpoints for a latency-sensitive application? (Choose THREE.)

Select 3 answers

A.Model size and architecture

B.Number of model versions

C.GPU type and number

D.Autoscaling configuration

E.Number of model instances

AnswersA, C, D

Larger models introduce higher latency.

Why this answer

Model size and architecture directly impact inference latency because larger models with more parameters require more computation per request. For latency-sensitive applications, choosing a smaller or distilled model (e.g., Gemma 2B vs. 27B) or using quantization can reduce response times. Vertex AI Endpoints serve the model as-is, so the model's inherent computational cost is the primary driver of per-request latency.

Exam trap

Google Cloud often tests the distinction between configuration choices that affect latency (GPU type, autoscaling, model size) versus operational or lifecycle management choices (version count, manual instance count) that do not directly impact per-request response time.

Full explanation →

936

MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Train a custom model from scratch on the policy documents each month

B.Fine-tune a base LLM on the policy documents monthly

C.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

D.Use a larger foundation model with a longer context window and paste all documents into each prompt

AnswerC

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

Retrieval-Augmented Generation (RAG) is the most appropriate approach because it allows the chatbot to answer questions using the latest policy documents without retraining the model. By indexing the documents in a vector store and retrieving relevant chunks at query time, RAG provides up-to-date answers while keeping the underlying LLM static, which is ideal for monthly document updates.

Exam trap

Cisco often tests the misconception that fine-tuning is the only way to adapt an LLM to domain-specific knowledge, but the trap here is that candidates overlook RAG's ability to handle dynamic, frequently updated documents without retraining, which is a core requirement in the question.

How to eliminate wrong answers

Option A is wrong because training a custom model from scratch each month is prohibitively expensive and time-consuming, requiring full GPU-based training cycles and large datasets, which is not feasible for monthly updates. Option B is wrong because fine-tuning a base LLM monthly still requires significant computational resources and risks catastrophic forgetting of previous knowledge, and it does not inherently handle dynamic document updates without retraining. Option D is wrong because pasting all documents into each prompt exceeds typical context window limits (e.g., 4K–128K tokens) and degrades performance due to attention mechanism scaling issues, making it impractical for large or growing document sets.

Full explanation →

937

MCQhard

You are the Generative AI lead for a global retail company that is building a customer service chatbot using a large language model (LLM) on Vertex AI. The chatbot will handle order inquiries, returns, and product recommendations. The company has a multi-cloud strategy and uses Google Cloud for AI workloads, but customer data is stored in AWS DynamoDB and on-premises databases. The legal team mandates that no customer personally identifiable information (PII) is sent to the LLM for training or inference, and that the model's responses must comply with GDPR and CCPA. The engineering team has proposed using a fine-tuned version of Gemini with retrieval-augmented generation (RAG) from a vector database. During a pilot, the chatbot occasionally hallucinates and invents order details, and response latency is over 10 seconds for complex queries. The budget for this project is limited, and the team needs to balance cost, compliance, and performance. Which course of action should you recommend?

A.Implement a two-model architecture: a smaller model for simple queries and a larger model for complex queries, with a router based on query complexity.

B.Switch to a purely fine-tuned model without RAG, and rely on fine-tuning data that excludes PII to ensure compliance.

C.Use a larger, more powerful LLM with chain-of-thought prompting to improve reasoning and reduce hallucinations, and cache frequent queries to reduce latency.

D.Ground the model with a curated knowledge base from DynamoDB and on-premises data, and use prompt engineering to explicitly instruct the model not to generate PII. Implement a PII detection and redaction layer before sending queries to the LLM.

AnswerD

Grounding reduces hallucinations by restricting responses to verified data, and prompt engineering with PII detection ensures compliance without significant latency increase or budget overrun.

Why this answer

Option B is correct because grounding the model with a knowledge base and using prompt engineering to restrict PII directly addresses hallucinations and compliance without high cost or latency. Option A is too complex and expensive for limited budget. Option C increases latency further due to multi-hop reasoning.

Option D removes the RAG capability, increasing hallucination risk.

Full explanation →

938

MCQmedium

A company needs to deploy a chatbot on a mobile device that must work offline. They want to use Gemini for natural language understanding but need minimal latency and no cloud dependency. Which Gemini model variant is most appropriate?

A.Gemini Flash

B.Gemini Nano

C.Gemini Pro

D.Gemini Ultra

AnswerB

Gemini Nano is designed for on-device execution, enabling offline operation.

Why this answer

Gemini Nano is the most appropriate variant because it is specifically designed for on-device deployment, enabling offline operation with minimal latency. It is optimized for mobile devices through quantization and efficient architecture, allowing natural language understanding without any cloud dependency.

Exam trap

The trap here is that candidates often confuse 'lightweight' cloud models like Gemini Flash with truly on-device models like Gemini Nano, assuming that any 'fast' or 'small' model can work offline without understanding the fundamental requirement of local execution.

How to eliminate wrong answers

Option A is wrong because Gemini Flash is a lightweight cloud-based model optimized for speed and cost, but it still requires an internet connection to access Google's servers, making it unsuitable for offline use. Option C is wrong because Gemini Pro is a mid-tier cloud model designed for high-quality responses in cloud environments, not for on-device or offline scenarios. Option D is wrong because Gemini Ultra is the largest and most capable cloud model, intended for complex tasks with cloud infrastructure, and cannot run on a mobile device offline due to its massive computational requirements.

Full explanation →

939

MCQhard

A social media company is using a generative AI model to automatically moderate user-uploaded images for harmful content. They need to comply with the EU AI Act's requirements for high-risk AI systems. Which combination of actions is MOST appropriate?

A.Deploy the model without any filters, but log all decisions for audit

B.Use Google's safety filters, allow users to appeal automated decisions, and document the model's capabilities and limitations

C.Only rely on human moderators and disable AI moderation

D.Use a third-party model that has been certified by a notified body

AnswerB

Safety filters reduce harmful outputs, appeal mechanisms provide human oversight, and documentation satisfies transparency obligations under the EU AI Act.

Why this answer

The EU AI Act mandates risk management, transparency, and human oversight for high-risk systems. Google's AI Principles align with these requirements. The correct answer involves using safety filters for content, maintaining human review for appeals, and documenting the system's purpose and limitations.

Full explanation →

940

MCQhard

Refer to the exhibit. An administrator creates this IAM policy for a Vertex AI project. What is the effect of this policy?

A.Alice can view models; Bob can delete models

B.Alice can deploy pre-trained models; Bob can create and manage custom model code

C.Both have full access to all Vertex AI resources

D.Alice can train models; Bob can deploy models

AnswerB

aiplatform.user includes deployment permissions; customCodeModelAdmin covers custom code management.

Why this answer

Option B is correct because the IAM policy grants Alice the `aiplatform.models.get` permission (allowing her to view and deploy pre-trained models) and grants Bob the `aiplatform.models.create` and `aiplatform.models.update` permissions (allowing him to create and manage custom model code). The policy uses separate bindings for each user, with specific roles that align with these actions.

Exam trap

Google Cloud often tests the distinction between specific IAM permissions (e.g., `get` vs. `create` vs. `delete`) and the common misconception that viewing a model implies full access or that creating a model implies the ability to deploy it.

How to eliminate wrong answers

Option A is wrong because Alice is granted `aiplatform.models.get`, which allows viewing models but not deleting them; Bob is granted `aiplatform.models.create` and `aiplatform.models.update`, which allow creating and updating models but not deleting them. Option C is wrong because the policy does not grant full access to all Vertex AI resources; it only grants specific permissions on models, and neither user has permissions for other resources like datasets or endpoints. Option D is wrong because Alice's permission (`aiplatform.models.get`) does not include training models, and Bob's permissions (`aiplatform.models.create` and `aiplatform.models.update`) are for custom model code management, not deploying models.

Full explanation →

941

MCQmedium

A machine learning engineer is evaluating a generative AI model for bias. They have a diverse test set covering gender, race, and age groups. Which metric would best indicate if the model's performance is systematically worse for certain demographic groups?

A.Model perplexity on held-out data

B.Equalized odds across demographic groups

C.Overall accuracy on the test set

D.Area under the ROC curve (AUC)

AnswerB

Equalized odds checks for fairness by comparing error rates across groups.

Why this answer

Equalized odds measures whether a model's predictions have equal false positive/negative rates across groups. The other options either measure different aspects or are not specific to fairness.

Full explanation →

942

MCQmedium

A company is using Vertex AI to generate email responses. They want to ensure sensitive customer data (PII) is not included in the output. What is the most effective approach?

A.Use a system prompt instructing the model to avoid PII.

B.Fine-tune the model on a dataset that excludes PII.

C.Manually review each output before sending.

D.Configure safety filters to block PII categories.

AnswerD

Safety filters can automatically block PII.

Why this answer

Option D is correct because safety filters in Vertex AI are specifically designed to block categories of harmful content, including PII, at the model's output layer. This provides a deterministic, automated guardrail that prevents sensitive data from being generated, unlike prompt-based instructions which can be overridden by the model's training. Safety filters operate on the model's response before it is returned, ensuring PII is caught even if the model attempts to generate it.

Exam trap

The trap here is that candidates assume a system prompt (Option A) is sufficient to control model behavior, but Cisco tests the understanding that prompts are not enforceable guardrails, whereas safety filters are a hard technical control.

How to eliminate wrong answers

Option A is wrong because system prompts are merely instructions and do not guarantee the model will comply; the model can still generate PII due to its training data or adversarial inputs. Option B is wrong because fine-tuning on a dataset that excludes PII does not prevent the model from generating PII from its pre-trained knowledge, and fine-tuning is costly and may not cover all edge cases. Option C is wrong because manual review is not scalable, introduces latency, and is prone to human error, making it ineffective for high-volume email generation.

Full explanation →

943

MCQhard

Refer to the exhibit. This JSON describes a Vertex AI endpoint with a deployed model. Which statement about scaling is true?

A.The endpoint uses only dedicated resources, no automatic scaling

B.The endpoint will automatically scale based on GPU utilization

C.The endpoint will scale from 1 to 3 replicas based on load using automatic scaling

D.The endpoint can scale to zero when not in use

AnswerA

DedicatedResources with min/max replicas means manual scaling.

Why this answer

Option A is correct because the JSON shows that the endpoint is configured with `dedicatedResources` and no `autoscalingMetricSpecs` or `minReplicaCount`/`maxReplicaCount` fields. In Vertex AI, when you specify only `machineSpec` and a fixed `minReplicaCount` (here implicitly 1) without a `maxReplicaCount` or autoscaling metrics, the endpoint uses dedicated resources with no automatic scaling — the model will always run on exactly the number of replicas you define, regardless of load.

Exam trap

Google Cloud often tests the misconception that any endpoint with a `minReplicaCount` and `maxReplicaCount` automatically enables scaling, but the trap here is that without `autoscalingMetricSpecs`, the endpoint uses dedicated resources and does not scale dynamically — the `maxReplicaCount` is ignored if autoscaling metrics are absent.

How to eliminate wrong answers

Option B is wrong because Vertex AI automatic scaling is based on CPU utilization or custom metrics, not GPU utilization; GPU utilization is not a supported metric for autoscaling in Vertex AI endpoints. Option C is wrong because the JSON does not include `autoscalingMetricSpecs` or a `maxReplicaCount` field, which are required to enable automatic scaling from a minimum to a maximum number of replicas; without these, the endpoint uses a fixed replica count. Option D is wrong because Vertex AI endpoints with dedicated resources cannot scale to zero; scaling to zero is only possible with private endpoints using manual scaling or when using Vertex AI Prediction with a custom container that supports scale-to-zero, but dedicated resources always maintain at least one replica.

Full explanation →

944

MCQmedium

A company uses Vertex AI Agent Builder to create a customer support agent. They need the agent to answer questions about order status by calling an internal API. Which Vertex AI feature should they use?

A.Vertex AI RAG Engine

B.Vertex AI Extensions

C.Grounding with Google Search

D.Vertex AI Model Garden

AnswerB

Extensions enable the agent to call custom APIs as tools.

Why this answer

Extensions in Vertex AI Agent Builder allow the agent to call external APIs (including internal ones) as tools during conversation.

Full explanation →

945

Multi-Selectmedium

A company wants to use Gemini for Google Workspace to improve productivity. They want to generate meeting summaries in Google Meet and draft email replies in Gmail. Which two Duet AI features should they enable? (Choose TWO)

Select 2 answers

A.Meet 'Take notes for me'

B.Gmail Smart Compose

C.Docs 'Help me write'

D.Sheets formula assistance

E.Slides speaker notes generation

AnswersA, B

This feature generates meeting notes and summaries automatically.

Why this answer

Option A is correct because the 'Take notes for me' feature in Google Meet uses Duet AI to automatically capture meeting notes, action items, and summaries, directly addressing the requirement to generate meeting summaries. This feature leverages Gemini's natural language processing to transcribe and summarize conversations in real time, enhancing productivity by eliminating manual note-taking.

Exam trap

Cisco often tests the distinction between features that are specific to a single application (like Docs or Sheets) versus cross-application productivity tools, leading candidates to select features that are technically correct but do not match the exact use case described.

Full explanation →

946

MCQhard

An organization needs to deploy a generative AI application with strict compliance requirements, including data residency and auditability of model decisions. Which Google Cloud feature should they prioritize?

A.Colab Enterprise

B.Gemini API

C.Vertex AI

D.Model Garden

AnswerC

Vertex AI offers deployment options with data residency, audit logging, and governance features.

Why this answer

Vertex AI provides enterprise controls including data residency options, audit logs, and model monitoring for compliance. Other options are important but do not directly address data residency and auditability comprehensively.

Full explanation →

947

MCQeasy

A medical imaging team wants to generate synthetic X-ray images to augment a training dataset for a rare disease. Which type of generative model is most suitable for generating high-fidelity, realistic medical images?

A.Generative Adversarial Network (GAN)

B.Diffusion model

C.Variational Autoencoder (VAE)

D.Autoregressive transformer (e.g., PixelCNN)

AnswerB

Diffusion models currently produce the highest quality images.

Why this answer

Diffusion models are the most suitable for generating high-fidelity, realistic medical images because they iteratively denoise random noise into a coherent image through a learned reverse diffusion process, which produces superior sample quality and diversity compared to GANs, especially for complex, high-dimensional data like X-rays. Their training stability and ability to model fine-grained anatomical details without mode collapse make them the current state-of-the-art for medical image synthesis.

Exam trap

Google Cloud often tests the misconception that GANs are the default choice for image generation due to their popularity, but the trap here is that for high-fidelity medical imaging, diffusion models are preferred because they avoid GANs' mode collapse and training instability, which are critical in safety-sensitive domains.

How to eliminate wrong answers

Option A is wrong because GANs, while capable of generating realistic images, suffer from training instability, mode collapse, and difficulty in capturing the full diversity of medical image distributions, often producing artifacts that are unacceptable in clinical contexts. Option C is wrong because VAEs generate blurry and less detailed images due to their reliance on a variational lower bound and a Gaussian prior, which fails to capture the sharp edges and fine textures critical in X-ray images. Option D is wrong because autoregressive transformers like PixelCNN generate images pixel-by-pixel, which is computationally prohibitive for high-resolution medical images and lacks the global coherence and efficiency of diffusion models.

Full explanation →

948

MCQeasy

Which Google Cloud AI service would you use to transcribe customer service call recordings into text for subsequent analysis?

A.Speech-to-Text

B.Text-to-Speech

C.Translation API

D.Document AI

AnswerA

Speech-to-Text transcribes audio into text.

Why this answer

Speech-to-Text (STT) is the correct service because it is specifically designed to convert audio speech into written text using automatic speech recognition (ASR) models. For customer service call recordings, STT can handle domain-specific vocabulary, multiple speakers, and various audio formats, enabling downstream analysis like sentiment analysis or keyword extraction.

Exam trap

The trap here is confusing Speech-to-Text with Text-to-Speech or assuming that Translation API can handle audio input, when in fact it only works on text, leading candidates to pick a service that does not perform audio transcription.

How to eliminate wrong answers

Option B (Text-to-Speech) is wrong because it converts text into spoken audio, the reverse of what is needed for transcribing recordings. Option C (Translation API) is wrong because it translates text between languages but does not perform speech recognition or transcription from audio. Option D (Document AI) is wrong because it processes scanned documents and PDFs for text extraction and layout analysis, not audio files.

Full explanation →

949

MCQeasy

A company is using Vertex AI to generate customer support summaries from chat logs. They notice that the summaries sometimes include irrelevant details from the conversation. Which technique should they use to reduce irrelevant details?

A.Use a higher top-k value.

B.Fine-tune the model on a large dataset of general conversations.

C.Add a system instruction to focus on key points.

D.Increase the temperature parameter.

AnswerC

This guides the model to produce concise, relevant summaries.

Why this answer

Adding a system instruction to focus on key points is the most direct and effective technique for reducing irrelevant details in generated summaries. System instructions act as a persistent, high-level directive that guides the model's attention and output structure without altering the underlying model weights. This allows the model to filter out extraneous information from the chat logs by explicitly prioritizing key points, which is a standard practice in prompt engineering for Vertex AI.

Exam trap

Cisco often tests the misconception that increasing randomness parameters (top-k or temperature) improves focus, when in fact they reduce determinism and can worsen output quality for summarization tasks.

How to eliminate wrong answers

Option A is wrong because increasing top-k (e.g., from 40 to 100) actually increases the pool of candidate tokens considered at each step, which can introduce more randomness and irrelevant tokens, making summaries less focused. Option B is wrong because fine-tuning on a large dataset of general conversations would dilute the model's specialization for customer support summaries, potentially worsening the inclusion of irrelevant details rather than reducing them. Option D is wrong because increasing the temperature parameter (e.g., from 0.2 to 0.8) increases the randomness of token selection, which would likely amplify the generation of irrelevant details instead of suppressing them.

Full explanation →

950

MCQmedium

A legal department wants to automate contract analysis using GenAI. They need to identify risky clauses and extract key dates. Which Google Cloud service is best suited for this task?

A.Document AI

B.Vertex AI Agent Builder

C.BigQuery

D.Model Garden

AnswerB

Agent Builder allows creating an agent that can ingest contracts, use an LLM for analysis, and extract structured data like clauses and dates.

Why this answer

Vertex AI Agent Builder is the correct choice because it enables the creation of custom generative AI agents that can analyze contract text, identify risky clauses, and extract key dates using large language models (LLMs) and retrieval-augmented generation (RAG). It provides a no-code/low-code environment to build agents that leverage enterprise data, making it ideal for automating contract analysis without extensive ML expertise.

Exam trap

The trap here is that candidates often confuse Document AI's structured extraction capabilities with generative AI's ability to perform nuanced risk analysis, leading them to choose Document AI without recognizing that Vertex AI Agent Builder provides the necessary generative and agentic capabilities for this specific use case.

How to eliminate wrong answers

Option A is wrong because Document AI is a document understanding service focused on OCR, parsing, and structured data extraction (e.g., forms, invoices) using pre-trained models, not generative AI for clause risk analysis or flexible date extraction. Option C is wrong because BigQuery is a serverless data warehouse for SQL-based analytics and large-scale data processing, not a service for building generative AI agents or analyzing contract text. Option D is wrong because Model Garden is a repository of pre-trained models and foundation models for experimentation and fine-tuning, but it does not provide the agent-building framework, RAG integration, or deployment tools needed for this task.

Full explanation →

951

MCQhard

An organization is building a RAG system using Vertex AI Vector Search. They notice that the retrieved documents are not relevant to the user's query. What is the most likely cause?

A.The context window of the LLM is too small

B.The embedding model used does not capture the semantic meaning of the documents effectively

C.The chunk size of the documents is too large

D.The temperature setting in the LLM is too high

AnswerB

Poor embeddings lead to poor similarity matching.

Why this answer

The most likely cause is that the embedding model fails to map the semantic meaning of the documents and queries into a shared vector space effectively. In Vertex AI Vector Search, retrieval quality depends entirely on the cosine similarity between query and document embeddings; if the embeddings are poor, even a perfect vector index will return irrelevant results.

Exam trap

Cisco often tests the distinction between retrieval-stage failures (embedding quality) and generation-stage parameters (temperature, context window), leading candidates to incorrectly blame LLM settings for poor retrieval results.

How to eliminate wrong answers

Option A is wrong because the context window size affects how much of the retrieved text the LLM can process, not the relevance of the retrieved documents themselves. Option C is wrong because chunk size impacts granularity and potential information loss, but the primary cause of irrelevant retrieval is poor embedding quality, not chunk size alone. Option D is wrong because temperature controls the randomness of the LLM's response generation, not the retrieval step; it has no effect on which documents are fetched from the vector index.

Full explanation →

952

MCQmedium

A financial services company uses a generative AI model to summarize customer complaints. They notice that summaries for certain demographics consistently omit negative sentiment. Which responsible AI practice should they apply FIRST to address this bias?

A.Store all prompts and responses in Cloud Logging for auditing

B.Implement SynthID watermarking on all generated summaries

C.Reduce the temperature parameter of the LLM to 0.1 to make outputs more deterministic

D.Evaluate the model's outputs for bias using a diverse test set that represents all customer demographics

AnswerD

Bias evaluation with representative data helps identify and quantify unfair bias, allowing the team to take corrective action.

Why this answer

Evaluating the model's outputs for bias using diverse test sets is essential to identify and mitigate unfair bias, as outlined in Google's AI Principles.

Full explanation →

953

MCQmedium

A company's generative AI model is producing biased outputs. What is the most effective mitigation strategy?

A.Use a larger model with more parameters to improve overall accuracy

B.Fine-tune the model using a balanced, representative dataset and implement output filtering

C.Use prompt engineering to instruct the model to avoid biased language

D.Increase the diversity of input samples by random sampling

AnswerB

Balanced data reduces bias during training, and filters catch biased outputs in production.

Why this answer

Fine-tuning on a balanced, representative dataset directly addresses the root cause of biased outputs by correcting the model's learned associations, while output filtering provides a safety net to catch residual bias. This combination is more effective than superficial fixes because it modifies the model's internal weights rather than just masking outputs.

Exam trap

Google Cloud often tests the misconception that prompt engineering or model scaling alone can fix bias, when in fact only retraining or fine-tuning with balanced data addresses the underlying weight distribution.

How to eliminate wrong answers

Option A is wrong because increasing model size does not inherently reduce bias; larger models can amplify biases present in training data due to higher capacity to memorize spurious correlations. Option C is wrong because prompt engineering only provides a surface-level instruction that the model may ignore or fail to generalize, especially if the bias is deeply embedded in its parameters. Option D is wrong because random sampling of inputs does not address the model's biased internal representations; it only diversifies the prompts, not the training data that caused the bias.

Full explanation →

954

MCQhard

A regulated industry client requires that all AI model predictions be logged with a traceable audit trail, including the model version, input data, and output, for compliance with internal policies. Which Vertex AI feature should they enable?

A.Vertex AI Explainable AI

B.Vertex AI Model Monitoring (with prediction logging)

C.VPC Service Controls

D.Vertex AI Feature Store

AnswerB

Model Monitoring can log predictions, model version, and input/output for audit trails.

Why this answer

Vertex AI Model Monitoring with prediction logging captures model version, input data, and output for every prediction request, storing them in BigQuery or Cloud Logging to create a traceable audit trail. This directly meets compliance requirements for regulated industries by providing immutable logs that can be queried and audited. Other Vertex AI features like Explainable AI or Feature Store do not offer the same comprehensive logging and version tracking needed for audit trails.

Exam trap

The trap here is that candidates confuse 'Explainable AI' (which explains predictions) with 'logging for audit trails' (which records predictions), or they assume VPC Service Controls handle logging, when in fact they only control network access.

How to eliminate wrong answers

Option A is wrong because Vertex AI Explainable AI provides feature attributions and explanations for model predictions, but it does not log prediction data or model versions for audit trails. Option C is wrong because VPC Service Controls enforce network security boundaries and data exfiltration prevention, but they do not capture or store prediction logs. Option D is wrong because Vertex AI Feature Store manages feature data for training and serving, but it does not log model predictions or versions for compliance auditing.

Full explanation →

955

MCQeasy

A data analyst wants to use Gemini in Google Sheets to help with complex formulas. Which feature should they use?

A.Model Garden in Vertex AI

B.Smart Compose in Gmail

C.Gemini for Workspace in Google Sheets

D.Help me write in Google Docs

AnswerC

Gemini in Sheets offers formula suggestions and assistance directly within the spreadsheet.

Why this answer

Gemini for Workspace in Google Sheets provides an AI-powered side panel that can generate, explain, and debug complex formulas directly within the spreadsheet environment. This feature is specifically designed to assist with formula creation and data analysis tasks, making it the correct choice for a data analyst using Gemini in Google Sheets.

Exam trap

The trap here is that candidates may confuse general-purpose AI writing features (like Help me write in Docs or Smart Compose in Gmail) with the specialized, context-aware formula assistance provided by Gemini for Workspace in Sheets, failing to recognize that each Workspace tool has a domain-specific integration.

How to eliminate wrong answers

Option A is wrong because Model Garden in Vertex AI is a repository of foundation models for building and deploying custom AI applications, not a feature integrated into Google Sheets for formula assistance. Option B is wrong because Smart Compose in Gmail is a feature for suggesting complete sentences in email composition, unrelated to spreadsheet formulas or data analysis. Option D is wrong because Help me write in Google Docs is a generative writing assistant for document creation, not designed to handle complex formulas or spreadsheet-specific tasks.

Full explanation →

956

MCQmedium

A company wants to use Gemini to process invoices that contain both text and images (scanned documents). The invoices vary in layout. Which Gemini model version should they use?

A.PaLM 2

B.Gemini 1.5 Pro

C.Gemini 1.0 Pro

D.Gemini Nano

AnswerB

Gemini 1.5 Pro handles multimodal inputs and large context windows, perfect for varied invoice layouts.

Why this answer

Gemini 1.5 Pro is the correct choice because it is a multimodal model capable of processing both text and images (scanned documents) natively, and its long context window (up to 1 million tokens) allows it to handle invoices with varying layouts without requiring preprocessing or layout-specific training. This version excels at understanding mixed-format documents, making it ideal for invoice processing where text and visual elements like tables and logos must be interpreted together.

Exam trap

Cisco often tests the misconception that any multimodal model (like Gemini 1.0 Pro) is sufficient for complex document processing, but the trap here is that candidates overlook the importance of the long context window and advanced multimodal reasoning in Gemini 1.5 Pro for handling variable-layout invoices, leading them to choose a less capable version.

How to eliminate wrong answers

Option A is wrong because PaLM 2 is a text-only large language model that cannot process images or scanned documents, lacking the multimodal capabilities required for this use case. Option C is wrong because Gemini 1.0 Pro, while multimodal, has a shorter context window and less robust handling of complex, variable-layout documents compared to Gemini 1.5 Pro, which offers superior performance for mixed-format invoice processing. Option D is wrong because Gemini Nano is designed for on-device, lightweight tasks with limited context and multimodal capabilities, making it unsuitable for enterprise-grade invoice processing that requires handling diverse layouts and high accuracy.

Full explanation →

957

MCQmedium

You are a generative AI architect for a large e-commerce company. Your team has built a product description generator using Vertex AI's text-bison model. The model is accessed via the Vertex AI API from a web application. You have set the temperature to 0.5 and top_k to 40. The team reports that the generated descriptions are often too generic and lack creativity. They want the descriptions to be more diverse and engaging. You are also concerned about cost, as each API call is billed. Which change should you recommend to increase creativity while managing cost?

A.Keep temperature at 0.5 but reduce top_k to 20.

B.Increase the temperature to 0.8 and keep top_k at 40.

C.Switch to a larger model like text-bison@002 and keep same parameters.

D.Decrease the temperature to 0.2 and increase top_k to 60.

AnswerB

Higher temperature increases diversity and creativity.

Why this answer

Increasing the temperature to 0.8 makes the model's output probability distribution flatter, which increases randomness and allows less likely tokens to be selected. This directly addresses the need for more diverse and creative descriptions. Keeping top_k at 40 ensures the model still considers a broad set of candidate tokens, balancing creativity with coherence, and does not increase API call costs since temperature and top_k are inference parameters that do not affect billing.

Exam trap

Google Cloud often tests the misconception that increasing creativity requires a larger model or more expensive resources, when in fact tuning sampling parameters like temperature and top_k is the correct, cost-neutral approach.

How to eliminate wrong answers

Option A is wrong because reducing top_k to 20 narrows the set of candidate tokens, which actually reduces diversity and can make outputs more generic, counteracting the goal of increasing creativity. Option C is wrong because switching to a larger model like text-bison@002 would increase cost per API call (larger models are billed at higher rates) without guaranteeing more creativity; creativity is controlled by sampling parameters, not model size alone. Option D is wrong because decreasing temperature to 0.2 makes the model more deterministic and conservative, reducing creativity, and increasing top_k to 60 does not compensate for the loss of randomness — the net effect is less diverse outputs.

Full explanation →

958

MCQeasy

A product manager wants to add a feature that drafts meeting summaries automatically in Google Meet. Which Gemini for Google Workspace capability should they use?

A.Vertex AI Model Garden

B.Gemini for Workspace in Google Meet

C.Vertex AI Agent Builder

D.Duet AI in Google Slides

AnswerB

Gemini for Workspace (formerly Duet AI) includes meeting summaries in Google Meet.

Why this answer

Gemini for Workspace in Google Meet provides meeting summaries. Duet AI for Slides does not generate meeting summaries. Vertex AI Agent Builder and Model Garden are not directly integrated into Meet.

Full explanation →

959

MCQmedium

An organization is using Vertex AI Agent Builder to create a customer service agent. They want the agent to be able to hand off to a human agent when it cannot answer a question. What should they configure in the agent's design?

A.Configure 'Slot filling' to collect more info

B.Implement a 'Confirmation' prompt for the user

C.Add an 'Escalation' intent that triggers a human handoff

D.Use a 'Fallback' intent to route to a human

AnswerC

Escalation intent is designed for human handoff.

Why this answer

Agent Builder supports 'Escalation' intent to hand off to human agents. Option B is wrong because fallback intent is for unrecognized input but not necessarily human handoff. Option C is wrong because confirmation is for confirming actions.

Option D is wrong because slot filling is for collecting parameters.

Full explanation →

960

MCQeasy

A developer is using the Gemini API to build a chatbot. They want the model to always respond in a friendly, professional tone. Which prompt engineering technique should they use?

A.Set system instructions to 'You are a friendly and professional assistant.'

B.Include a few-shot example in every user message.

C.Set the temperature to 0.2.

D.Set max output tokens to 100.

AnswerA

System instructions define the assistant's behavior for the entire session.

Why this answer

Option A is correct because setting system instructions is the most direct and reliable way to define the model's persona and behavioral constraints. In the Gemini API, system instructions act as a persistent, top-level directive that influences every response, ensuring the chatbot consistently adopts a friendly and professional tone without requiring repeated examples or parameter tuning.

Exam trap

Google Cloud often tests the distinction between controlling output style (system instructions) versus controlling output randomness (temperature) or length (max tokens), so the trap here is that candidates may confuse temperature or token limits with persona control, thinking that lowering creativity or capping length will enforce a specific tone.

How to eliminate wrong answers

Option B is wrong because including a few-shot example in every user message is inefficient and not a persistent technique; it would require repeating the example in each turn, increasing token usage and latency, and it does not guarantee consistent tone across all interactions. Option C is wrong because setting the temperature to 0.2 controls randomness and creativity, not tone; a low temperature makes outputs more deterministic but does not enforce a specific persona or style. Option D is wrong because setting max output tokens to 100 limits response length but has no effect on the tone or style of the output; it only truncates the response.

Full explanation →

961

Multi-Selectmedium

A company wants to integrate generative AI into their existing CRM workflow to draft personalized email responses. They have limited engineering resources. Which two approaches should they consider? (Choose TWO)

Select 2 answers

A.Use Vertex AI API with a low-code integration platform (e.g., Apigee)

B.Fine-tune a model on historical email data to ensure brand voice

C.Use Gemini API via Google Apps Script to add a custom menu in the CRM

D.Deploy a dedicated GPU cluster for inference

E.Build a custom web UI for the assistant from scratch

AnswersA, C

Low-code platforms reduce the need for custom coding.

Why this answer

Using Gemini API via Apps Script is a lightweight integration, and using Vertex AI API with a low-code tool like Apigee or Cloud Functions can also minimize engineering effort. Building a custom UI or fine-tuning is resource-intensive.

Full explanation →

962

Multi-Selecthard

A manufacturing company wants to use GenAI to generate maintenance reports from sensor data. They need structured output (JSON) for downstream systems, and they want to reduce token costs. Which THREE strategies should they use?

Select 3 answers

A.Use batch API requests for multiple sensor readings

B.Use the largest available model to ensure accuracy

C.Use structured output formatting in the prompt (e.g., 'Return JSON')

D.Choose the smallest model that meets accuracy requirements

E.Include multiple few-shot examples of JSON in every prompt

AnswersA, C, D

Batch requests reduce per-token cost.

Why this answer

Structured output ensures JSON format; batch requests reduce cost; the smallest suitable model minimizes token usage. Few-shot adds tokens; caching may not help for diverse sensor data.

Full explanation →

963

Multi-Selectmedium

A company is designing a prompt engineering strategy for a customer service chatbot using Gemini. Which two practices are recommended for improving response quality? (Choose TWO)

Select 2 answers

A.Use chain-of-thought prompting

B.Always provide multiple examples in the prompt

C.Avoid any context in the prompt

D.Set temperature to 1.0 for maximum creativity

E.Include a system instruction to define the role

AnswersA, E

Chain-of-thought encourages logical reasoning, improving accuracy.

Why this answer

Chain-of-thought prompting (A) is recommended because it guides the model to reason step-by-step, improving accuracy on complex customer service queries by breaking down multi-step problems. This technique leverages Gemini's ability to follow logical sequences, reducing errors in tasks like troubleshooting or escalation decisions.

Exam trap

Google Cloud often tests the misconception that higher temperature always improves creativity, but in customer service, lower temperature is critical for deterministic, safe responses, and candidates may overlook the role of system instructions in defining behavior.

Full explanation →

964

MCQmedium

A company wants to integrate GenAI into their existing customer relationship management (CRM) system. The CRM is a third-party SaaS application. Which implementation pattern is MOST suitable?

A.API-first integration by calling Vertex AI API from the CRM's custom code

B.Building a standalone AI application and exporting data manually

C.Embedding GenAI using Google Workspace add-ons

D.Using Apps Script to extend Google Sheets connected to the CRM

AnswerA

The CRM can make API calls to Vertex AI for predictions, keeping the CRM intact.

Why this answer

API-first integration allows the CRM to call Google Cloud GenAI APIs without modifying the CRM's core. Workspace add-ons are for Google Workspace, not SaaS CRM.

Full explanation →

965

MCQmedium

A data scientist uses Vertex AI Model Evaluation to assess a fine-tuned model for sentiment analysis. The evaluation report shows high precision but low recall on the 'negative' class. What is the best course of action to improve recall without sacrificing too much precision?

A.Adjust the prediction threshold for the negative class

B.Switch to a different model architecture (e.g., from BERT to RoBERTa)

C.Collect more labeled examples of negative sentiment and retrain

D.Use a larger pretrained model from Model Garden

AnswerC

Adding more data for the underperforming class helps the model learn better.

Why this answer

Option B is correct because collecting more negative examples and retraining addresses class imbalance, which is a common cause of low recall. Option A (adjusting threshold) trades off precision and recall, but may not fix underlying imbalance. Option C (changing model architecture) is excessive.

Option D (using a larger base model) may not specifically address recall.

Full explanation →

966

Multi-Selecthard

A company is moving a GenAI proof-of-concept to production. They need to ensure the system can handle variable traffic and maintain low latency. Which THREE practices should they implement? (Choose 3)

Select 3 answers

A.Implement response caching for common queries

B.Enable auto-scaling for the serving infrastructure

C.Reduce the input context length to the absolute minimum

D.Set up monitoring and alerting on latency metrics

E.Use a single, large instance to handle all traffic

AnswersA, B, D

Caching reduces latency and cost by reusing responses for identical requests.

Why this answer

Option A is correct because response caching stores the outputs of frequently requested queries, allowing the system to serve them instantly without recomputation. This drastically reduces latency for repeated requests and offloads the underlying model, which is critical for maintaining responsiveness under variable traffic patterns in production.

Exam trap

Cisco often tests the misconception that minimizing input context length universally improves performance, ignoring the trade-off with output quality, and that a single large instance is simpler and sufficient for production traffic, overlooking scalability and fault tolerance requirements.

Full explanation →

967

MCQeasy

What is the primary benefit of using embeddings and vector search in a generative AI application?

A.They improve the model's ability to generate code

B.They reduce the size of the model by compressing weights

C.They enable efficient retrieval of semantically similar content

D.They allow the model to process images directly

AnswerC

Embeddings allow similarity search in vector space, enabling RAG and other retrieval tasks.

Why this answer

Option C is correct because embeddings convert text into dense vector representations that capture semantic meaning, and vector search enables efficient retrieval of semantically similar content by finding nearest neighbors in vector space. This retrieval-augmented generation (RAG) approach grounds the generative AI model in relevant external knowledge, improving accuracy and reducing hallucinations without retraining.

Exam trap

The trap here is that candidates confuse embeddings and vector search with model optimization or multimodal capabilities, when in fact they are a retrieval mechanism for grounding generative outputs in external knowledge.

How to eliminate wrong answers

Option A is wrong because embeddings and vector search are not specifically designed to improve code generation; they enhance retrieval of any text or data, but code generation benefits more from specialized training data and fine-tuning. Option B is wrong because embeddings and vector search do not reduce model size or compress weights; they operate on the input/output side, storing vectors separately, while model compression is achieved through techniques like pruning or quantization. Option D is wrong because embeddings and vector search primarily handle text or other data types via vector representations, not direct image processing; images require separate vision encoders or multimodal models to be processed directly.

Full explanation →

968

MCQeasy

What is the key advantage of using vector search for retrieval in a RAG system compared to keyword search?

A.Vector search eliminates the need for a foundation model

B.Vector search can find conceptually similar documents even without exact keyword matches

C.Vector search is faster than keyword search

D.Vector search requires no preprocessing of documents

AnswerB

Embeddings represent meaning, so vector search retrieves documents that are semantically related, not just keyword matches.

Why this answer

Vector search in a RAG system encodes documents and queries into dense vector embeddings using a foundation model, then retrieves documents based on semantic similarity in the embedding space. This allows it to find conceptually related documents even when they share no exact keywords with the query, overcoming the lexical gap that limits keyword search.

Exam trap

Cisco often tests the misconception that vector search is faster than keyword search, but the trap is that while vector search excels at semantic matching, it incurs higher latency and computational overhead compared to the simple inverted index lookup of keyword search.

How to eliminate wrong answers

Option A is wrong because vector search actually requires a foundation model (or an embedding model) to generate the vector representations; it does not eliminate the need for one. Option C is wrong because vector search is generally slower than keyword search due to the computational cost of embedding generation and approximate nearest neighbor (ANN) search, though it offers better recall. Option D is wrong because vector search requires preprocessing of documents to generate and store embeddings, which is a significant upfront step.

Full explanation →

969

MCQmedium

A startup is building a generative AI content creation tool. They want to minimize operational costs while maintaining low latency for end users. Which deployment strategy should they adopt?

A.Deploy the model on edge devices to reduce cloud dependency.

B.Build an on-premises infrastructure to avoid cloud egress fees.

C.Use a serverless inference endpoint that scales to zero when not in use.

D.Provision dedicated GPU instances for consistent performance.

AnswerC

Serverless aligns cost with usage and auto-scales to meet demand.

Why this answer

Option C is correct because serverless inference endpoints, such as AWS Lambda with SageMaker or Google Cloud Run, automatically scale to zero when idle, eliminating costs during periods of no traffic. This directly addresses the startup's goal of minimizing operational costs while maintaining low latency through rapid cold-start optimizations and provisioned concurrency for burst handling.

Exam trap

Google Cloud often tests the misconception that 'scaling to zero' is only for CPU workloads, but serverless GPU inference endpoints (e.g., AWS SageMaker Serverless Inference) support GPU acceleration and scale to zero, making them cost-effective for variable generative AI workloads.

How to eliminate wrong answers

Option A is wrong because deploying on edge devices introduces significant hardware procurement and maintenance costs, and edge GPUs typically lack the compute power for large generative models, leading to higher latency for complex inference tasks. Option B is wrong because building on-premises infrastructure incurs high upfront capital expenditure and ongoing operational overhead for power, cooling, and maintenance, which contradicts the goal of minimizing operational costs. Option D is wrong because provisioning dedicated GPU instances incurs costs even when idle, as reserved or on-demand instances bill per hour regardless of usage, making it inefficient for variable or low-traffic workloads.

Full explanation →

970

MCQeasy

A marketing agency wants to generate images using Imagen on Vertex AI. They need to ensure the images are unique and avoid copyright issues. Which parameter adjustment is most relevant?

A.Increase training steps

B.Increase seed variability

C.Use negative prompts

D.Set safety threshold

AnswerC

Specifies elements to avoid, reducing copyright risk.

Why this answer

Negative prompts allow the model to exclude specific concepts, styles, or elements from generated images, directly reducing the risk of replicating copyrighted or trademarked content. By explicitly telling Imagen what not to include, the agency can steer outputs away from protected works without needing to modify training data or safety filters.

Exam trap

Google Cloud often tests the distinction between safety filters (which block harmful content) and negative prompts (which control stylistic or conceptual exclusion), leading candidates to mistakenly choose safety threshold adjustments for copyright avoidance.

How to eliminate wrong answers

Option A is wrong because increasing training steps does not affect the uniqueness or copyright compliance of outputs; it only refines model convergence on the existing training distribution. Option B is wrong because seed variability controls randomness in latent noise initialization, not the semantic content of the image, so it cannot prevent copyright infringement. Option D is wrong because safety thresholds filter harmful or policy-violating content (e.g., violence, hate speech), not copyrighted or trademarked elements.

Full explanation →

971

MCQeasy

A marketing team wants to generate social media posts from product descriptions using Generative AI. They need consistent brand tone and the ability to iterate quickly. Which tool is BEST suited for this task?

A.Duet AI in Google Docs

B.Vertex AI Model Garden

C.Vertex AI Studio

D.Vertex AI Agent Builder

AnswerC

Vertex AI Studio offers a user interface for designing and testing prompts, tuning models, and evaluating outputs — ideal for marketing content creation.

Why this answer

Vertex AI Studio is the correct choice because it provides a purpose-built environment for prompt engineering, model tuning, and rapid iteration with foundation models. It allows the marketing team to experiment with prompts, adjust parameters like temperature and top-p, and maintain consistent brand tone through saved prompt templates and versioning, directly supporting the need for quick iteration.

Exam trap

Cisco often tests the distinction between a general-purpose AI assistant (Duet AI) and a dedicated prompt engineering platform (Vertex AI Studio), leading candidates to choose Duet AI because they confuse document assistance with content generation.

How to eliminate wrong answers

Option A is wrong because Duet AI in Google Docs is an AI-powered assistant for document creation and editing, not a tool for generating social media posts from product descriptions with iterative prompt engineering. Option B is wrong because Vertex AI Model Garden is a repository for discovering and deploying pre-trained models, but it lacks the integrated prompt engineering and iterative testing environment needed for fine-tuning brand tone. Option D is wrong because Vertex AI Agent Builder is designed for building conversational agents and chatbots, not for generating and iterating on social media content from product descriptions.

Full explanation →

972

MCQeasy

A developer needs to integrate a GenAI model into an existing customer relationship management (CRM) system. The CRM exposes REST APIs and runs on-premises. Which integration pattern is MOST suitable?

A.Use Apps Script to call the model from Google Sheets

B.Fine-tune a model on CRM data and deploy it on-premises

C.Implement an API-first integration by calling Vertex AI API from the CRM's backend

D.Build a custom Google Workspace add-on

AnswerC

Vertex AI API can be called via HTTPS from any system that supports REST, making it ideal for integrating with an on-premises CRM.

Why this answer

API-first integration using Vertex AI API allows any system with HTTP capabilities to call the GenAI model. Workspace add-ons and Apps Script are for Google Workspace only. Fine-tuning doesn't help with integration.

Full explanation →

973

MCQhard

A financial services firm wants to use Gemini to analyze customer support transcripts and generate summaries. Compliance requires that the model never output any personally identifiable information (PII). Which combination of techniques should they implement?

A.Configure Gemini safety settings to block PII and use a separate PII detection API for post-processing

B.Fine-tune Gemini on redacted transcripts and rely on the model to not generate PII

C.Use a smaller model that has never seen PII in training

D.Only use prompt instructions telling the model to avoid PII

AnswerA

Safety filters reduce PII in outputs, and a post-processing API (e.g., DLP) redacts any remaining PII.

Why this answer

Using Gemini with safety filters and a post-processing step to redact PII provides defense in depth. Fine-tuning on redacted data might not cover all cases, and prompt instructions alone are not reliable.

Full explanation →

974

MCQhard

A company has deployed a GenAI-powered report generation system using Vertex AI. They notice that the cost is higher than expected. Investigation shows that many requests include very long prompts with repetitive boilerplate text. Which cost optimization strategy is MOST effective?

A.Increase the batch size for batch requests

B.Enable context caching for repeated prompt prefixes

C.Switch to a smaller model size

D.Ignore the cost increase as it will stabilize

AnswerB

Context caching stores repeated prefixes so they are not re-sent with each request, reducing token consumption and cost.

Why this answer

Caching repeated prompt prefixes can significantly reduce token usage and cost. Batch requests help with throughput but not with per-request token savings. Reducing model size may hurt quality.

Ignoring is not a strategy.

Full explanation →

975

MCQhard

A research team is training a large language model from scratch using TPUs on Google Cloud. Which storage solution provides the highest throughput for training data?

A.Cloud Storage

B.Persistent Disk

C.Cloud Filestore

D.Cloud Spanner

AnswerA

Cloud Storage provides high throughput for large datasets, especially with parallel reads.

Why this answer

Cloud Storage provides the highest throughput for training data because it is designed for high-bandwidth, parallel access from TPU pods via the Google Cloud Storage FUSE or gRPC-based data loading. TPUs benefit from Cloud Storage's ability to serve data at hundreds of GB/s when using the `tf.data` service with `tf.io.gfile` or the `gcloud storage` API, avoiding the I/O bottlenecks of block storage. Persistent Disk and Filestore have lower aggregate throughput limits and are not optimized for the distributed, streaming read patterns typical of large-scale training.

Exam trap

Google Cloud often tests the misconception that local or attached block storage (Persistent Disk) is faster than object storage for ML training, but candidates fail to recognize that TPU training requires distributed, parallel data access that object storage (Cloud Storage) uniquely provides at scale.

How to eliminate wrong answers

Option B is wrong because Persistent Disk is a block storage device with a maximum throughput of ~1.2 GB/s per instance (for pd-ssd), which is far below the multi-GB/s requirements of TPU training and cannot scale horizontally across many workers without complex striping. Option C is wrong because Cloud Filestore is a managed NFS filestore that introduces network latency and has throughput caps (e.g., 1.2 GB/s for the Basic tier, 4.8 GB/s for the High Scale tier), making it unsuitable for the high-throughput, low-latency data streaming needed by TPUs. Option D is wrong because Cloud Spanner is a globally distributed relational database service designed for transactional consistency and ACID compliance, not for high-throughput sequential read of training data; its throughput is limited by node count and query overhead, and it is not a file storage solution.

Full explanation →

Google Cloud Generative AI Leader Generative AI Leader (Generative AI Leader) — Questions 901–975