Google Cloud Generative AI Leader: Questions 526-600

997 questions total · 14 pages · All types, answers revealed

Page 8 of 14

526
MCQmedium

A healthcare company needs to generate synthetic medical images for research while ensuring compliance with patient privacy regulations. Which Google Cloud generative AI service should they use?

A.Codey for code generation
B.Chirp for speech recognition
C.Imagen on Vertex AI
D.Gemini 1.5 Pro with multimodal prompting
AnswerC

Imagen is purpose-built for text-to-image generation and can be deployed securely on Vertex AI with compliance controls.

Why this answer

Imagen on Vertex AI is Google's image generation service that can create synthetic images and offers controls for responsible AI and data governance.

527
MCQhard

A legal firm wants to automate contract analysis. They need to extract key clauses (e.g., termination, indemnification) from scanned PDFs. The team expects high accuracy and must maintain data privacy. Which combination of services is most suitable?

A.Use AutoML Tables to train a classification model on text features
B.Use Document AI for OCR and Vertex AI with a custom fine-tuned model for clause extraction
C.Use Gemini API directly with a prompt to analyze PDFs
D.Use AppSheet to create a form for manual entry and then use BigQuery ML
AnswerB

Document AI extracts text from scans; a fine-tuned model on Vertex AI provides high accuracy and data stays in Google Cloud.

Why this answer

Document AI performs OCR and extracts text from scanned PDFs; Vertex AI with a custom fine-tuned model provides high accuracy for clause extraction while keeping data within the customer's project.

528
Multi-Selecthard

Which THREE factors should be considered when selecting a foundation model for a generative AI application in a regulated industry?

Select 3 answers
A.Transparency of the model's training data and sources
B.Support for data residency and sovereignty requirements
C.Latency and throughput requirements
D.Size of the model in terms of parameters
E.Bias and fairness evaluation results
AnswersA, B, E

Regulated industries require understanding of data provenance to ensure compliance.

Why this answer

Option A is correct because in regulated industries (e.g., healthcare, finance), transparency of training data and sources is critical for compliance with regulations like GDPR or HIPAA. Without knowing the provenance and composition of the training data, an organization cannot audit for prohibited content, verify consent, or ensure the model does not inadvertently expose sensitive information. This transparency directly impacts the ability to perform due diligence and meet legal obligations for data usage.

Exam trap

Google Cloud often tests the misconception that technical performance metrics (like latency or parameter count) are primary selection criteria for regulated industries, when in fact governance factors like transparency, data residency, and bias evaluation are the non-negotiable requirements.

529
MCQmedium

A developer is using the Gemini API to generate product descriptions. They want the output to be more focused and less random. Which parameter adjustment would BEST achieve this?

A.Decrease top-p to 0.1
B.Decrease temperature to 0.2
C.Increase top-k to 100
D.Increase temperature to 1.5
AnswerB

Lowering temperature reduces randomness, making the model more deterministic and focused on high-probability tokens.

Why this answer

Lowering temperature makes the model more deterministic and focused. Top-k and top-p control sampling but are secondary; raising temperature increases randomness.
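The effect of temperature can be illustrated with a minimal sketch of temperature-scaled softmax. This is not the Gemini API; the function name and the three logit values are hypothetical, chosen only to show how dividing scores by a smaller temperature concentrates probability on the top token.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]

for t in (0.2, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At the lowest temperature nearly all of the probability mass sits on the highest-scoring token, which is why option B gives more focused output and option D gives less.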

530
MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store
B.Fine-tune a base LLM on the policy documents monthly
C.Use a larger foundation model with a longer context window and paste all documents into each prompt
D.Train a custom model from scratch on the policy documents each month
AnswerA

RAG retrieves the latest document chunks at query time, eliminating the need to retrain.

Why this answer

Retrieval-Augmented Generation (RAG) is the most appropriate approach because it allows the chatbot to retrieve relevant policy document chunks from a vector store at inference time, without requiring model retraining when documents are updated monthly. This decouples the knowledge base from the model weights, enabling dynamic updates by simply re-indexing the new documents into the vector database.
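The decoupling described above can be sketched in a few lines. This is a toy illustration under stated assumptions: `PolicyIndex`, `build_prompt`, and the sample policy text are hypothetical, and simple word overlap stands in for the embedding similarity a real vector store would use.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

class PolicyIndex:
    """Toy stand-in for a vector store: ranks chunks by word overlap."""

    def __init__(self):
        self.chunks = {}

    def upsert(self, chunk_id, text):
        # Re-indexing a chunk replaces the old text; no model is retrained.
        self.chunks[chunk_id] = text

    def retrieve(self, question, k=1):
        q = tokenize(question)
        ranked = sorted(
            self.chunks.values(),
            key=lambda text: len(q & tokenize(text)),
            reverse=True,
        )
        return ranked[:k]

def build_prompt(question, index, k=1):
    """Assemble the prompt that would be sent to the unchanged LLM."""
    context = "\n".join(index.retrieve(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

index = PolicyIndex()
index.upsert("leave", "Employees receive 20 days of annual leave.")
index.upsert("travel", "Travel expenses must be filed within 30 days.")

# Monthly update: only the index changes.
index.upsert("leave", "Employees receive 25 days of annual leave.")

print(build_prompt("How many days of annual leave do employees receive?", index))
```

The model weights never appear in this flow, which is the property the question is testing: updating knowledge means updating the index.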

Exam trap

Google Cloud often tests the misconception that fine-tuning is the only way to incorporate domain knowledge. The key trap here is ignoring the cost and frequency of updates: candidates may pick fine-tuning (B) because it seems 'customized,' without realizing that RAG avoids retraining entirely.

How to eliminate wrong answers

Option B is wrong because fine-tuning a base LLM monthly on updated policy documents is costly, time-consuming, and risks catastrophic forgetting of previous policies, making it impractical for frequent updates. Option C is wrong because pasting all policy documents into each prompt can exceed typical context window limits (e.g., 4K–128K tokens), leading to truncation; even when the documents fit, every request pays the latency and token cost of the full document set. Option D is wrong because training a custom model from scratch each month is prohibitively expensive, requires massive computational resources and data, and is unnecessary when a retrieval-based approach can leverage existing pre-trained LLMs.

531
Multi-Selectmedium

An enterprise is evaluating whether to build a custom fine-tuned model or use a pre-built API for code generation. Which three factors should they consider in the build vs. buy decision? (Choose THREE)

Select 3 answers
A.Data privacy and security requirements
B.Number of developer seats in the organization
C.Compatibility with on-premises legacy systems
D.Level of customization needed for the organization's coding standards
E.Availability of pre-built models for the specific programming language
AnswersA, D, E

If code contains proprietary logic, a custom model deployed within VPC may be necessary.

Why this answer

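The three key factors are data privacy (a custom build may be needed for IP-sensitive code), customization requirements (a build offers more control over the organization's coding standards), and the availability of pre-built models for the target programming language (if a suitable model already exists, buying avoids the build effort).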

532
MCQmedium

During a proof-of-concept for a GenAI document summarization tool, the team wants to evaluate whether the summaries are accurate and retain key information before scaling. Which evaluation approach is most appropriate for this stage?

A.Measure latency and cost as the primary evaluation metrics
B.Deploy to all users and collect feedback via a survey
C.Run an A/B test with a small user group and have domain experts manually review a sample of summaries for accuracy
D.Use ROUGE scores exclusively to compare summaries against human-written ones
AnswerC

A/B testing with manual expert review provides reliable quality assessment before broader rollout.

Why this answer

A/B testing with manual review by domain experts provides qualitative and quantitative feedback on accuracy and completeness, which is crucial for a pilot. Automated metrics alone may not capture business relevance.
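The limitation of automated metrics can be shown with a simplified unigram-recall score in the spirit of ROUGE-1. This is a sketch, not the official ROUGE implementation, and the two sentences are hypothetical examples.

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """Fraction of reference words that also appear in the candidate."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / sum(ref.values())

# Hypothetical human-written reference summary.
reference = "the contract may be terminated with 30 days notice"

# A generated summary that reverses the meaning with a single word.
wrong = "the contract may not be terminated with 30 days notice"

print(rouge1_recall(wrong, reference))
```

Every reference word appears in the incorrect summary, so the overlap score is perfect even though the meaning is reversed. A domain expert would catch this immediately, which is why option C pairs the test with manual review.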

533
Multi-Selectmedium

A team wants to reduce hallucinations in a question-answering model. Which THREE techniques should they consider?

Select 3 answers
A.Fine-tune the model on a curated factual dataset
B.Use retrieval-augmented generation (RAG)
C.Apply prompt engineering with specific instructions to cite sources
D.Reduce the number of tokens in output
E.Increase the temperature parameter
AnswersA, B, C

Fine-tuning on factual data improves accuracy.

Why this answer

Fine-tuning on a curated factual dataset directly adjusts the model's weights to prioritize accurate, domain-specific knowledge, reducing the likelihood of generating unsupported or hallucinated content. This technique anchors the model's output in verified data, making it more reliable for question-answering tasks.

Exam trap

Google Cloud often tests the misconception that reducing output length or increasing randomness (temperature) can improve factual accuracy, when in reality these parameters control style and creativity, not truthfulness.

534
MCQmedium

A financial technology company has deployed a custom-tuned PaLM 2 model on Vertex AI to generate personalized investment recommendations for retail clients. The model was fine-tuned on a corpus of historical market data and advisory transcripts. Recently, the compliance team flagged that several recommendations contradicted SEC guidelines, and the model sometimes repeated prohibited statements from outdated training materials. The team has already implemented safety filters (e.g., blocking toxic content) and adjusted the model's system instructions to be more conservative. However, the issues persist. The model's deployment parameters are: temperature=0.4, top_p=0.9, max_output_tokens=500, and no grounding. The company must maintain compliance without significantly increasing latency. What should they do next?

A.Increase temperature to 0.7 to allow more diverse responses, and add a second model to verify outputs
B.Perform an additional fine-tuning round exclusively on the most recent SEC regulatory filings and compliance-approved content
C.Implement a chain-of-thought prompting technique that requires the model to explain its reasoning step by step
D.Configure Vertex AI grounding using a curated data store of real-time SEC regulations and market data
AnswerD

Grounding with an authoritative, live data source directly ensures outputs comply with current regulations and eliminates reliance on outdated training data.

Why this answer

Option D is correct because configuring Vertex AI grounding with a curated data store of real-time SEC regulations directly addresses the root cause: the model is generating outputs that contradict current compliance rules. Grounding forces the model to base its responses on authoritative, up-to-date sources, which is more effective than safety filters or system instructions alone, and it avoids the latency increase of a second model or the risk of catastrophic forgetting from additional fine-tuning.

Exam trap

Google Cloud often tests the misconception that fine-tuning or prompt engineering alone can solve compliance issues, when in fact grounding with authoritative data sources is the more reliable way to keep outputs aligned with current external regulations without sacrificing latency.

How to eliminate wrong answers

Option A is wrong because increasing temperature to 0.7 would make outputs more random and less deterministic, increasing the likelihood of generating non-compliant statements, and adding a second model for verification would significantly increase latency and cost without fixing the underlying data contamination. Option B is wrong because performing additional fine-tuning on recent SEC filings risks catastrophic forgetting of the original training data and does not guarantee real-time compliance, as fine-tuning is static and cannot adapt to rapidly changing regulations. Option C is wrong because chain-of-thought prompting only improves reasoning transparency but does not constrain the model to use compliant sources; the model could still generate prohibited statements from its outdated training data.

535
Multi-Selectmedium

A retail company wants to generate personalized marketing content (emails, social posts) at scale using generative AI. They need consistent brand voice and the ability to review outputs before publishing. Which two Google Cloud capabilities should they use? (Choose TWO)

Select 2 answers
A.Google Workspace Duet AI in Docs for drafting, then manual review
B.AutoML Tables for predicting customer segments
C.Vertex AI Agent Builder with grounding and human-in-the-loop
D.Pre-trained Gemini model via Vertex AI API with no customization
E.Cloud Vision API for image analysis
AnswersA, C

Duet AI can assist in drafting content quickly, and manual review ensures brand alignment.

Why this answer

Option A is correct because Google Workspace Duet AI in Docs allows marketers to draft personalized content using generative AI while maintaining control over brand voice through iterative editing. The manual review step ensures outputs meet quality and compliance standards before publishing, addressing the need for human oversight in content generation.

Exam trap

The trap here is that candidates may confuse AutoML Tables (a predictive modeling tool) with generative AI capabilities, or overlook that pre-trained models without customization fail to meet brand voice requirements, while Cloud Vision API is irrelevant to text generation tasks.

536
MCQmedium

A financial institution is deploying a generative AI chatbot to provide investment advice. According to regulatory requirements, high-stakes AI decisions must have human review. Which setup BEST satisfies this requirement?

A.Use a separate AI model to review the first model's outputs and flag issues
B.AI generates recommendations, but a human advisor must review and approve before any action, and can override the AI
C.AI provides advice directly to the customer with a disclaimer that it is not financial advice
D.Allow the AI to execute trades automatically, with an audit log for later review
AnswerB

This ensures human oversight with override capability, meeting regulatory expectations.

Why this answer

Option B is correct because it directly implements the regulatory requirement for human-in-the-loop (HITL) oversight in high-stakes AI decisions. In financial advisory contexts, regulations like the EU AI Act or SEC guidelines mandate that a qualified human advisor must review and approve AI-generated recommendations before any action is taken, ensuring accountability and the ability to override erroneous outputs.
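A human-in-the-loop gate of this kind might be sketched as follows. All names here (`Recommendation`, `review`, `execute`, the sample advice text) are hypothetical and only illustrate the pattern: nothing reaches the client until a human has approved it, and the reviewer can override the AI's text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

@dataclass
class Recommendation:
    client_id: str
    text: str
    status: Status = Status.PENDING
    reviewer: Optional[str] = None

def review(rec, reviewer, approve, override_text=None):
    """Record the human decision; the reviewer may replace the AI's text."""
    rec.reviewer = reviewer
    if override_text is not None:
        rec.text = override_text
    rec.status = Status.APPROVED if approve else Status.REJECTED
    return rec

def execute(rec):
    """Refuse to act on anything a human has not approved."""
    if rec.status is not Status.APPROVED:
        raise PermissionError("human approval required before any action")
    return f"Sent to {rec.client_id}: {rec.text}"

# AI-generated draft starts in PENDING and cannot be executed yet.
draft = Recommendation("client-42", "Move 80% of savings into one stock.")

# The advisor overrides the draft and approves the corrected version.
review(draft, "advisor-7", approve=True,
       override_text="Consider a diversified index fund.")
print(execute(draft))
```

The approval check sits in front of the action rather than behind it, which is the difference between option B and the audit-log approach in option D.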

Exam trap

Google Cloud often tests the distinction between 'human review' and 'human oversight'. Candidates mistakenly think that an audit log or a disclaimer satisfies the requirement, but the trap is that regulators require proactive human approval before the action occurs, not after.

How to eliminate wrong answers

Option A is wrong because using a separate AI model to review outputs merely replaces one automated system with another, failing to satisfy the regulatory requirement for human review; this is a form of 'AI oversight of AI' that does not provide the necessary human accountability. Option C is wrong because a disclaimer does not constitute human review; the AI is still directly providing advice to the customer without any human intervention, which violates the requirement for high-stakes decisions. Option D is wrong because allowing the AI to execute trades automatically with only an audit log for later review is a 'human-out-of-the-loop' approach; post-hoc auditing does not prevent harm from occurring in real time, and regulators require proactive human approval before execution.

537
MCQeasy

A startup is building a customer service chatbot that generates responses in real-time. They want the model to have up-to-date information on the latest product catalog but cannot afford frequent fine-tuning. Which technique should they use to inject current data into the model without retraining?

A.Rely on the model's zero-shot capabilities to infer product details.
B.Use retrieval-augmented generation (RAG) to fetch relevant documents from a vector database at inference time.
C.Craft detailed system prompts that include the entire product catalog in the prompt.
D.Fine-tune the base model weekly on the latest product catalog.
AnswerB

RAG enables the model to access external, up-to-date information without retraining.

Why this answer

Retrieval-Augmented Generation (RAG) is the correct technique because it allows the chatbot to fetch the most current product catalog entries from an external vector database at inference time, without requiring any model retraining. This keeps responses grounded in up-to-date information while avoiding the cost and latency of frequent fine-tuning.

Exam trap

Google Cloud often tests the distinction between in-context learning (via RAG or prompt engineering) and parametric knowledge (via fine-tuning), trapping candidates who think that simply adding more data to the prompt is scalable or that zero-shot inference can substitute for external retrieval.

How to eliminate wrong answers

Option A is wrong because zero-shot capabilities rely solely on the model's pre-existing knowledge, which cannot incorporate new or updated product catalog details without retraining. Option C is wrong because crafting detailed system prompts with the entire product catalog would exceed the model's context window limits and incur high token costs, making it impractical for real-time inference. Option D is wrong because fine-tuning weekly is expensive, time-consuming, and contradicts the requirement to avoid frequent retraining; it also risks catastrophic forgetting of previously learned information.

538
MCQmedium

An ML engineer sees the above deployment output. The business wants to reduce inference cost. Which action should they take?

A.Use a larger model
B.Change to a lower-cost machine type
C.Deploy to multiple regions
D.Increase traffic split
AnswerB

Using a smaller machine type reduces per-request compute cost.

Why this answer

Option B is correct because switching to a lower-cost machine type directly reduces the per-request compute cost without altering the model architecture or inference logic. This is a common cost-optimization strategy in cloud-based ML deployments, where instance types (e.g., from GPU to CPU or from a larger to a smaller GPU) can be selected based on latency and throughput requirements, provided the model fits within the machine's memory and compute constraints.

Exam trap

Google Cloud often tests the misconception that 'more resources' (larger model, more regions) always improves performance, but here the business goal is cost reduction, so the correct action is to downsize infrastructure while maintaining acceptable quality.

How to eliminate wrong answers

Option A is wrong because using a larger model increases both memory footprint and compute operations per inference, which raises cost and latency—the opposite of the business goal. Option C is wrong because deploying to multiple regions adds infrastructure overhead, data transfer costs, and management complexity, increasing rather than reducing inference cost. Option D is wrong because increasing traffic split (e.g., routing more requests to a shadow or canary deployment) does not reduce cost; it may increase resource utilization or require additional compute capacity.

539
MCQeasy

A retail company wants to build a chatbot that answers product questions and provides personalized recommendations. They have a small labeled dataset and limited ML expertise. Which approach should they take?

A.Fine-tune Gemini with their product data using Vertex AI Generative AI Studio.
B.Build a custom transformer model using TensorFlow on Vertex AI Workbench.
C.Use BigQuery ML to train a classification model on customer queries.
D.Use Vertex AI Agent Builder with a pre-built agent and integrate their product catalog via Search and Conversation.
AnswerD

D is correct because it leverages managed services with minimal ML effort.

Why this answer

Option D is correct because Vertex AI Agent Builder provides a pre-built agent framework that integrates with Search and Conversation, allowing the company to quickly deploy a chatbot using their product catalog without needing extensive ML expertise. This approach leverages Google's foundation models and retrieval-augmented generation (RAG) to answer product questions and generate personalized recommendations, making it ideal for a small labeled dataset and limited ML resources.

Exam trap

Google Cloud often tests the misconception that fine-tuning or custom model building is necessary for domain-specific tasks, when in fact pre-built agent frameworks with RAG can achieve the same goal with far less data and expertise.

How to eliminate wrong answers

Option A is wrong because fine-tuning Gemini with a small labeled dataset risks overfitting and requires significant ML expertise to manage the fine-tuning pipeline, which the company lacks. Option B is wrong because building a custom transformer model from scratch using TensorFlow on Vertex AI Workbench demands deep ML expertise and large datasets, contradicting the company's constraints. Option C is wrong because BigQuery ML is designed for structured data classification (e.g., SQL-based models), not for building conversational chatbots that handle natural language queries and recommendations.

540
MCQeasy

A retailer wants to use generative AI to write product descriptions automatically. They have a large dataset of existing product descriptions and need to customize a foundation model for their brand voice. Which Vertex AI feature should they use?

A.Vertex AI Search with grounding
B.Prompt design with the Gemini API directly
C.Vertex AI Model Evaluation
D.Vertex AI custom model tuning
AnswerD

Tuning adapts the model to the retailer's brand voice using their dataset.

Why this answer

Vertex AI custom model tuning (Option D) is correct because it allows the retailer to fine-tune a foundation model on their proprietary dataset of existing product descriptions, adapting the model's output to match their specific brand voice and style. This process adjusts the model's weights using supervised learning on the retailer's data, enabling personalized and consistent content generation that generic prompt engineering cannot achieve.

Exam trap

Google Cloud often tests the distinction between prompt engineering (which only changes input instructions) and model tuning (which modifies the model's internal parameters), leading candidates to mistakenly choose prompt design when deep customization is required.

How to eliminate wrong answers

Option A is wrong because Vertex AI Search with grounding is designed for enterprise search and retrieval-augmented generation (RAG) to ground responses in specific data sources, not for fine-tuning a model to adopt a brand voice. Option B is wrong because prompt design with the Gemini API directly only modifies the input instructions without altering the underlying model weights, which is insufficient for deeply customizing the model's writing style to a unique brand voice. Option C is wrong because Vertex AI Model Evaluation is a tool for assessing model performance and detecting issues like bias or drift, not for training or customizing a model's output behavior.

541
MCQeasy

A company is evaluating whether to build a custom generative AI solution from scratch or use a pre-built API from a cloud provider. Which factor most strongly supports the build-from-scratch approach?

A.The team has limited machine learning expertise.
B.Speed to market is the top priority.
C.Minimizing initial development cost is critical.
D.The solution requires deep integration with proprietary data and unique domain-specific outputs.
AnswerD

Custom models can be fine-tuned on proprietary data for unique needs.

Why this answer

Building a custom generative AI solution from scratch is most strongly supported when deep integration with proprietary data and unique domain-specific outputs is required. Pre-built APIs are typically trained on general data and may not capture the nuances of specialized domains, whereas a custom model can be fine-tuned or trained from scratch on proprietary datasets to achieve higher accuracy and relevance for unique business needs.

Exam trap

The trap here is that candidates may confuse 'minimizing cost' (Option C) with long-term total cost of ownership, but the Generative AI Leader exam specifically tests the immediate strategic driver for build vs. buy, which is the need for proprietary data integration and unique outputs.

How to eliminate wrong answers

Option A is wrong because limited ML expertise would favor using a pre-built API to avoid the complexity of model training, infrastructure management, and hyperparameter tuning. Option B is wrong because speed to market is a key advantage of pre-built APIs, which offer immediate access to generative capabilities without the months of development required for a custom solution. Option C is wrong because minimizing initial development cost typically favors pre-built APIs, which have lower upfront investment compared to the significant costs of data preparation, compute resources, and specialized talent needed for building from scratch.

542
MCQmedium

A data scientist is trying to get online predictions from a Vertex AI endpoint but receives the error shown. What is the most likely cause?

A.The region in the request does not match the endpoint region
B.The model has not been deployed to the specified endpoint
C.The endpoint ID is incorrect
D.The model ID is incorrect
AnswerB

The error message directly states the model is not deployed to the endpoint.

Why this answer

The error indicates that the model is not deployed to the endpoint. In Vertex AI, an endpoint is a resource that hosts one or more deployed models. If a model has not been deployed to the endpoint, any prediction request to that endpoint will fail with a 'model not found' or similar error, even if the endpoint ID and region are correct.

Exam trap

Google Cloud often tests the distinction between endpoint existence and model deployment, where candidates confuse a valid endpoint ID with the requirement that a model must be explicitly deployed to that endpoint before predictions can be served.

How to eliminate wrong answers

Option A is wrong because if the region in the request did not match the endpoint region, the error would typically be a 'region mismatch' or 'not found' error at the API routing level, not a model deployment error. Option C is wrong because an incorrect endpoint ID would result in a '404 Not Found' or 'endpoint not found' error, not a model deployment error. Option D is wrong because the model ID is not directly used in the prediction request to an endpoint; the endpoint routes to the deployed model, so an incorrect model ID would not cause this specific error unless the model was never deployed.

543
Multi-Selecthard

An organization is building a multi‑agent workflow on Vertex AI where one agent analyzes an image (e.g., a scanned contract), another agent extracts text from the image, and a third agent answers questions about the contract. The solution must be low‑latency. Which TWO services are most appropriate?

Select 2 answers
A.Imagen
B.Vertex AI Search
C.Gemini on Vertex AI
D.Document AI
E.Cloud Vision API
AnswersC, E

Gemini can process images and text, serving as both the image analyst and the Q&A agent.

Why this answer

Gemini on Vertex AI (C) is the most appropriate service because it is a multimodal model that can natively analyze images, extract text, and answer questions about the content in a single, low-latency inference call. This eliminates the need to chain separate services for image analysis, OCR, and Q&A, reducing overall latency and architectural complexity.

Exam trap

Google Cloud often tests the misconception that multimodal tasks require separate specialized services (e.g., Cloud Vision for OCR + a separate LLM for Q&A), when in fact a single multimodal model like Gemini can perform all steps in one low-latency call.

544
MCQmedium

A developer wants to build a RAG application using Vertex AI. Which vector database is natively integrated with Vertex AI for storing embeddings?

A.Firestore
B.Vertex AI Vector Search
C.Cloud SQL
D.Bigtable
AnswerB

Vector Search is purpose-built for storing and querying embeddings.

Why this answer

Vertex AI Vector Search is the native vector database integrated with Vertex AI for storing and querying embeddings. It is purpose-built for high-dimensional vector similarity search, enabling efficient retrieval in RAG applications without requiring external infrastructure.
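What "vector similarity search" means can be shown with a brute-force sketch. This is not the Vertex AI Vector Search API, which is a managed service; the function names, document IDs, and three-dimensional embeddings below are hypothetical, and real embeddings have hundreds of dimensions and are searched with approximate methods rather than a full scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=2):
    """Return the IDs of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, v), doc_id) for doc_id, v in vectors.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical embeddings for three document chunks.
embeddings = {
    "returns-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "warranty-terms": [0.7, 0.2, 0.3],
}

# Hypothetical embedding of the user's question.
print(top_k([1.0, 0.0, 0.1], embeddings))
```

A general-purpose database can store these lists of numbers, but ranking by similarity at scale needs an index built for it, which is the distinction the question draws.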

Exam trap

Google Cloud often tests the misconception that any database can store embeddings equally well, but the key differentiator is native vector indexing and ANN search support, which only Vertex AI Vector Search provides among the listed options.

How to eliminate wrong answers

Option A is wrong because Firestore is a NoSQL document database designed for storing structured data, not optimized for vector similarity search or embedding storage. Option C is wrong because Cloud SQL is a relational database service (MySQL, PostgreSQL, SQL Server) that lacks native vector indexing and similarity search capabilities required for RAG. Option D is wrong because Bigtable is a wide-column NoSQL database for large-scale analytical workloads, not designed for low-latency vector similarity queries.

545
MCQmedium

A developer deployed a large language model on Vertex AI for real-time chat. Users report slow response times. The model generates sentences one word at a time. Which optimization should be applied to reduce latency?

A.Batch multiple user queries together.
B.Deploy the model with more accelerators.
C.Enable prompt caching to reuse previous queries.
D.Use streaming responses to start output earlier.
AnswerD

Streaming sends tokens as they are generated, reducing the wait for the full response.

Why this answer

Option D is correct because streaming responses allow the model to send tokens to the client as they are generated, rather than waiting for the full sequence to complete. This reduces perceived latency significantly in real-time chat, as users see the first word appear almost immediately, even though the total generation time remains similar.
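The difference between total generation time and time to first output can be simulated with a generator. This sketch does not call any Vertex AI API; `generate_tokens` is a hypothetical stand-in for a model, with `time.sleep` imitating the per-token generation delay.

```python
import time

def generate_tokens(text, delay_s=0.01):
    """Yield one word at a time, pausing to imitate per-token latency."""
    for word in text.split():
        time.sleep(delay_s)
        yield word

def time_to_first_output(text, stream, delay_s=0.01):
    """Measure how long the client waits before it can show anything."""
    start = time.perf_counter()
    tokens = generate_tokens(text, delay_s)
    if stream:
        first = next(tokens)      # show the first token as soon as it exists
    else:
        first = list(tokens)[0]   # wait for the whole response, then show it
    return first, time.perf_counter() - start

reply = "Your order has shipped and should arrive within three business days"

_, streamed_wait = time_to_first_output(reply, stream=True)
_, full_wait = time_to_first_output(reply, stream=False)
print(f"streaming: {streamed_wait:.3f}s, non-streaming: {full_wait:.3f}s")
```

Both paths generate tokens at the same rate; only the moment at which the user first sees text changes, which matches the explanation above that total generation time remains similar.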

Exam trap

The trap here is that candidates often confuse throughput optimization (batching or more accelerators) with latency reduction, failing to recognize that streaming directly minimizes the time users wait for the first visible output in real-time scenarios.

How to eliminate wrong answers

Option A is wrong because batching multiple user queries together increases latency for individual requests, as the system waits to accumulate enough queries before processing, which is counterproductive for real-time chat. Option B is wrong because deploying with more accelerators improves throughput and total generation speed, but does not address the fundamental issue of word-by-word generation latency; the model still outputs one token at a time, and the user must wait for the full response. Option C is wrong because prompt caching reuses previous queries to avoid recomputation, but this optimization targets repeated or similar prompts, not the latency of generating a new response token-by-token.

546
MCQmedium

A company wants to build a chatbot that answers questions based on internal documents. Which approach is most appropriate?

A.Use a pre-trained model without any customizations
B.Train a custom model from scratch
C.Fine-tune a model on the documents
D.Use a prompt with the documents in the context
AnswerD

This is the core of RAG: provide relevant documents in the prompt to ground the model's answers.

Why this answer

Option D is correct because Retrieval-Augmented Generation (RAG) allows the chatbot to dynamically include relevant internal documents in the prompt context without modifying the underlying model. This approach leverages the pre-trained model's language understanding while grounding answers in specific, up-to-date internal data, avoiding the cost and latency of fine-tuning or retraining.

Exam trap

Google Cloud often tests the misconception that fine-tuning is the only way to incorporate proprietary data, but RAG is the most appropriate for dynamic, retrieval-based Q&A because it avoids retraining and keeps the model's knowledge current.

How to eliminate wrong answers

Option A is wrong because a pre-trained model without customization lacks access to the company's internal documents, leading to hallucinated or generic answers not grounded in proprietary data. Option B is wrong because training a custom model from scratch is computationally prohibitive and unnecessary; it requires massive labeled datasets and resources, whereas RAG achieves the same goal with far less effort. Option C is wrong because fine-tuning on documents teaches the model to memorize specific content, which is inefficient for large, frequently updated document sets and risks catastrophic forgetting, whereas RAG keeps the model static and retrieves fresh context per query.
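A minimal sketch of the retrieve-then-prompt pattern described above, assuming a toy word-overlap retriever in place of a real vector search. The function names and the prompt wording are illustrative assumptions, not part of any Google Cloud API.

```python
from typing import List


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Toy retriever: rank documents by word overlap with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_grounded_prompt(query: str, documents: List[str], k: int = 2) -> str:
    """Place the retrieved documents in the prompt so the model can ground on them."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )
```

Because the documents are supplied per query, updating the knowledge base only means updating the document store; the model itself is unchanged.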

547
MCQhard

An enterprise customer needs to ensure that all data sent to the Gemini API is not used by Google for model improvement and must support a HIPAA BAA. Which access tier should they use?

A.Google AI Studio (free tier)
B.Vertex AI (enterprise tier)
C.Gemini API via Google Workspace
D.Gemini API via API key (developer tier)
AnswerB

Enterprise tier provides data isolation, no training on customer data, and HIPAA BAA.

Why this answer

Vertex AI for enterprise offers data isolation, no use of customer data for training, and HIPAA BAA. Google AI Studio and Gemini API (developer) do not provide these guarantees.

548
MCQhard

A data scientist wants to generate photorealistic images of products from text descriptions for an e-commerce catalog. The images must be brand-consistent and avoid generating distorted product features. Which Google Cloud generative AI service should they use?

A.Veo
B.Imagen
C.Chirp
D.Gemini Pro Vision
AnswerB

Imagen is Google's text-to-image diffusion model, built for creating realistic, high-fidelity images from prompts, with features for brand consistency.

Why this answer

Imagen is Google's text-to-image model that produces high-quality, photorealistic images. It is designed for brand consistency and safe image generation.

549
MCQmedium

A company wants to build a customer support chatbot that answers based on internal documentation. They use Vertex AI Search and want to ensure the model only uses retrieved documents. What should they do?

A.Fine-tune the model on the documentation
B.Enable grounding with Vertex AI Search
C.Increase max output tokens
D.Set temperature to 0.0
AnswerB

Grounding constrains the model to answer based on the provided context.

Why this answer

Option B is correct because grounding with Vertex AI Search ensures the model's responses are strictly based on the retrieved documents from the internal documentation, preventing hallucination or reliance on pre-trained knowledge. Grounding works by providing the model with a search result context that it must use as the sole source for generating answers, effectively constraining the output to the provided documents.

Exam trap

Google Cloud often tests the distinction between controlling model behavior (temperature, token limits) and controlling the source of information (grounding, RAG), leading candidates to mistakenly choose temperature or token adjustments as a solution for hallucination prevention.

How to eliminate wrong answers

Option A is wrong because fine-tuning the model on the documentation would embed the knowledge into the model's parameters, but it does not guarantee that the model will only use that knowledge during inference; the model could still generate responses from its pre-trained weights or hallucinate. Option C is wrong because increasing max output tokens only controls the length of the response, not the source of the information; the model could still generate content not found in the retrieved documents. Option D is wrong because setting temperature to 0.0 makes the model deterministic (greedy decoding) but does not restrict the model to only use retrieved documents; it can still produce answers based on its internal knowledge.

550
MCQhard

A team is building a medical diagnosis assistant using a foundation model. To comply with regulations, they need to ensure the model does not make up facts. What is the best approach?

A.Use a small model to hallucinate less
B.Use grounding with Vertex AI Search
C.Reduce temperature to 0
D.Fine-tune on medical journals
AnswerB

Grounding provides verifiable citations and reduces fabrication.

Why this answer

Grounding with Vertex AI Search is the best approach because it connects the foundation model's outputs to a verifiable, curated knowledge base, ensuring factual accuracy and compliance with regulations that prohibit hallucination. By retrieving information from a trusted source (e.g., medical databases) in real time, the model can cite evidence and avoid generating unverified claims.

Exam trap

Google Cloud often tests the misconception that reducing temperature or using a smaller model can eliminate hallucination, when in fact only grounding with external, verifiable data sources can reliably prevent fact fabrication in high-stakes domains.

How to eliminate wrong answers

Option A is wrong because using a smaller model does not inherently reduce hallucination; smaller models have less capacity and may actually hallucinate more due to limited training data and weaker reasoning. Option C is wrong because reducing temperature to 0 makes the model deterministic but does not prevent it from generating plausible-sounding but false information; it still relies on its parametric knowledge, which can be incomplete or outdated. Option D is wrong because fine-tuning on medical journals alone does not guarantee factual accuracy; the model may memorize and reproduce errors, and it cannot dynamically verify facts against a live, authoritative source.

551
MCQhard

A company wants to generate a video from a text description using Google Cloud. Which service is designed for this?

A.Codey
B.Chirp
C.Imagen
D.Veo
AnswerD

Veo is Google's text-to-video generation model.

Why this answer

Veo is Google Cloud's generative AI model specifically designed for creating high-quality videos from text or image prompts. It leverages advanced diffusion and transformer architectures to generate coherent video sequences, making it the correct choice for text-to-video generation.

Exam trap

The trap here is that candidates often confuse Imagen (text-to-image) with Veo (text-to-video), assuming any generative visual model can handle video, but Google Cloud explicitly separates these capabilities into distinct services.

How to eliminate wrong answers

Option A is wrong because Codey is Google's model for code generation and chat, not video creation. Option B is wrong because Chirp is a speech-to-text and text-to-speech model, focused on audio processing. Option C is wrong because Imagen is a text-to-image model, capable of generating static images but not video sequences.

552
MCQhard

A media company uses generative AI to produce personalized news summaries. They notice that summaries occasionally contain factual errors and biased language. What business strategy should they implement to address these issues while maintaining user engagement?

A.Disable personalization and serve generic summaries to all users.
B.Allow users to flag errors and manually correct summaries in real-time.
C.Implement a human review layer for high-risk topics and use automated fact-checking for all content, with a feedback loop for model improvement.
D.Replace AI with entirely human-written summaries.
AnswerC

This ensures accuracy and allows continuous improvement.

Why this answer

Option C is correct because it balances accuracy and engagement by combining automated fact-checking with human review for high-risk topics. This hybrid approach reduces factual errors and biased language while maintaining the personalization that drives user engagement. The feedback loop continuously improves the model, addressing root causes rather than just symptoms.

Exam trap

Google Cloud often tests the misconception that either full automation or full human oversight is the only solution, when the correct answer is a hybrid approach that leverages the strengths of both AI and human judgment.

How to eliminate wrong answers

Option A is wrong because disabling personalization eliminates the core value proposition of generative AI for news summaries, likely reducing user engagement significantly without addressing the underlying model flaws. Option B is wrong because allowing real-time manual corrections by users is impractical at scale, introduces latency, and does not prevent errors from reaching users in the first place; it also lacks a systematic feedback mechanism for model improvement. Option D is wrong because replacing AI with entirely human-written summaries is cost-prohibitive, slow, and defeats the purpose of using generative AI for scalability and personalization.

553
MCQeasy

A marketing team wants to generate product descriptions using a text generation model on Vertex AI. They need consistent output style across all descriptions, including tone and length. They have a small set of 10 high-quality example descriptions that capture the desired style. The team has limited ML expertise and wants a quick solution that does not require model retraining. Which approach should they use?

A.Use a pre-built template with no model input.
B.Fine-tune the model on a large external dataset of product descriptions.
C.Use few-shot prompting with the examples in the prompt.
D.Set the temperature to 0.9 to maximize creativity.
AnswerC

Few-shot prompting directly leverages examples to achieve consistent style without retraining.

Why this answer

Few-shot prompting is the correct approach because it allows the team to inject the desired style, tone, and length directly into the prompt using the 10 high-quality examples, without any model retraining. This technique leverages the in-context learning capability of large language models on Vertex AI, enabling consistent output from a small set of demonstrations. It is ideal for teams with limited ML expertise as it requires only prompt engineering, not fine-tuning or infrastructure changes.

Exam trap

Google Cloud often tests the misconception that higher temperature always improves output quality, but the trap here is that temperature controls randomness, not consistency, so candidates may incorrectly choose Option D without understanding that low temperature is required for reproducible style and length.

How to eliminate wrong answers

Option A is wrong because a pre-built template with no model input cannot generate dynamic, context-aware product descriptions; it produces static text that lacks the flexibility and nuance of a generative model. Option B is wrong because fine-tuning on a large external dataset would require significant ML expertise, data preparation, and compute resources, contradicting the requirement for a quick solution without model retraining. Option D is wrong because setting temperature to 0.9 increases randomness and creativity, which is the opposite of what is needed for consistent output style; a lower temperature (e.g., 0.2) would be more appropriate for consistent, repeatable results.
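Few-shot prompting amounts to assembling the example descriptions into the prompt ahead of the new request. The sketch below assumes a simple "Product / Description" layout; the function name and the instruction wording are hypothetical choices, and any consistent layout would serve.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], new_product: str) -> str:
    """Assemble (product, description) examples followed by the new product."""
    parts = ["Write a product description in the same tone and length as the examples."]
    for product, description in examples:
        parts.append(f"Product: {product}\nDescription: {description}")
    # The final block is left open so the model completes the description.
    parts.append(f"Product: {new_product}\nDescription:")
    return "\n\n".join(parts)
```

The resulting string would be sent as the prompt to the text model; no training job or tuning pipeline is involved.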

554
MCQeasy

A data scientist wants to generate realistic product images for an online catalog using Google Cloud's generative AI. Which service should they use?

A.Imagen on Vertex AI
B.Codey API for code generation
C.Gemini API with text-to-text prompts
D.Vertex AI Model Garden without a specific model
AnswerA

Imagen is purpose-built for image generation.

Why this answer

Imagen on Vertex AI is Google Cloud's specialized service for generating high-quality, photorealistic images from text prompts. It is built on diffusion models and is directly designed for image generation tasks, making it the correct choice for creating product images for an online catalog.

Exam trap

The trap here is that candidates may confuse the general-purpose Gemini API (which can handle multimodal inputs) with a dedicated image generation service, overlooking that Gemini's text-to-text mode does not generate images, while Imagen is purpose-built for that task.

How to eliminate wrong answers

Option B is wrong because Codey API is designed for code generation, not image generation; it uses models specialized in programming languages and cannot produce visual outputs. Option C is wrong because Gemini API with text-to-text prompts is optimized for text-based tasks like summarization or question answering, not for generating images; while Gemini can process images, its primary text-to-text mode does not generate visual content. Option D is wrong because Vertex AI Model Garden is a repository of pre-trained models and frameworks, but without selecting a specific model like Imagen, it cannot directly generate images; it requires explicit model selection and configuration.

555
Multi-Selectmedium

Which TWO techniques are effective for reducing bias in generative AI model outputs?

Select 2 answers
A.Increasing model size to learn more patterns
B.Training on diverse and representative datasets
C.Relying solely on post-hoc filters
D.Using adversarial debiasing methods during fine-tuning
E.Limiting the model to only factual prompts
AnswersB, D

Correct: Diverse data helps reduce biased associations.

Why this answer

Option B is correct because training on diverse and representative datasets directly reduces sampling bias and coverage gaps in the training distribution, which are primary sources of stereotypical or skewed outputs. By ensuring the model sees balanced examples across demographics, contexts, and edge cases, it learns more equitable representations and reduces the likelihood of generating biased content.

Exam trap

Google Cloud often tests the misconception that increasing model size or adding post-hoc filters is sufficient to mitigate bias, when in reality these approaches fail to address the root causes of bias in training data and model representations.

556
MCQhard

A financial institution deploys a chatbot using Gemini Pro in Vertex AI. Compliance requires logging all user inputs and model outputs for audit. Which approach meets this requirement?

A.Capture logs via Cloud Monitoring
B.Enable Vertex AI Endpoint request-response logging
C.Use Cloud Logging sink with a filter for Vertex AI requests
D.Enable Vertex AI Model Registry logging
AnswerB

This captures every request and response for the deployed model, meeting audit requirements.

Why this answer

Vertex AI Endpoint request-response logging captures both the user's input prompt and the model's generated output, which is precisely what compliance auditing requires. This feature logs the exact payloads sent to and received from the deployed model, ensuring a complete audit trail without additional configuration.

Exam trap

The trap here is that candidates confuse Cloud Logging sinks or Cloud Monitoring with the specific Vertex AI feature that must be explicitly enabled on the endpoint, assuming that default logging captures request-response payloads when it does not.

How to eliminate wrong answers

Option A is wrong because Cloud Monitoring is designed for metrics, alerts, and dashboards, not for capturing detailed request-response payloads for audit compliance. Option C is wrong because a Cloud Logging sink with a filter can only export logs that already exist; it does not enable the capture of Vertex AI request-response logs, which must be explicitly enabled on the endpoint. Option D is wrong because Vertex AI Model Registry logging tracks model version metadata and lifecycle events, not the user inputs and model outputs from inference calls.

557
Multi-Selecthard

A financial analyst uses generative AI to summarize earnings reports. The summaries vary in style. Which THREE methods can improve consistency? (Choose three.)

Select 3 answers
A.Set temperature to 0.2
B.Increase max output tokens
C.Enable citation mode
D.Use few-shot prompting with fixed examples
E.Fine-tune on a curated dataset of desired summaries
AnswersA, D, E

Reduces output randomness.

Why this answer

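Setting temperature to 0.2 reduces randomness in token sampling, making the model more deterministic and less likely to produce stylistic variations. Lower temperatures (e.g., 0.1–0.3) sharpen the probability distribution so that the model almost always selects the most likely next token, which directly improves consistency across multiple summaries. Few-shot prompting with fixed examples (D) and fine-tuning on a curated dataset (E) improve consistency in a complementary way, by showing the model the target style in the prompt or training it into the weights.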

Exam trap

Google Cloud often tests the misconception that increasing output tokens or enabling citations improves consistency, when in fact these features address length or attribution, not stylistic uniformity.

558
Multi-Selecteasy

A developer is using the Vertex AI PaLM API to generate code. They want to ensure the output is safe and adheres to company policies. Which THREE attributes can they configure in the safety_settings parameter?

Select 3 answers
A.Language detection
B.Sentiment analysis
C.Toxicity
D.Harassment
E.Sexually explicit content
AnswersC, D, E

Toxicity is a standard safety category.

Why this answer

Option C is correct because the safety_settings parameter in the Vertex AI PaLM API allows developers to filter content based on predefined harm categories, including toxicity. This setting enables the API to block or adjust responses that contain toxic language, ensuring the generated code adheres to company safety policies by preventing harmful or offensive outputs.

Exam trap

The trap here is that candidates may confuse general NLP features (like language detection or sentiment analysis) with the specific safety filtering attributes available in the safety_settings parameter, leading them to select options that are not part of the API's harm category configuration.

559
MCQhard

A data science team wants to build a custom model for generating product descriptions that adhere to specific brand guidelines. They have 5,000 high-quality examples. Which approach balances cost and accuracy?

A.Fine-tune a foundation model (e.g., PaLM 2) using Vertex AI Model Garden
B.Use a pre-built API with prompt engineering and few-shot examples
C.Use Vertex AI Agent Builder with a custom prompt
D.Train a model from scratch using TensorFlow on Vertex AI
AnswerA

Fine-tuning adapts the model to the specific style with a reasonable cost.

Why this answer

Fine-tuning a foundation model on the examples yields high accuracy with moderate cost. Training from scratch is overkill; prompt engineering may not capture all nuances.

560
Multi-Selectmedium

A global e-commerce company wants to build a multilingual customer support chatbot that can understand queries in 50 languages and respond in the same language. They need to process both text and images (e.g., product photos) in the queries. Which TWO Google Cloud services should they consider? (Select 2 options.)

Select 2 answers
A.Translation API
B.Vision API
C.Natural Language API
D.Text-to-Speech
E.Gemini API (multimodal)
AnswersA, E

Translation API can translate between 100+ languages, supplementing Gemini's language support for less common languages.

Why this answer

Translation API is correct because it provides real-time language translation across 100+ languages, enabling the chatbot to understand and respond in 50 languages. It integrates directly with other Google Cloud services to maintain language consistency in a multilingual pipeline.

Exam trap

Google Cloud often tests the distinction between single-purpose APIs (like Vision API or Natural Language API) and multimodal models (like Gemini API) that can handle both text and images natively, leading candidates to over-select specialized services instead of the integrated multimodal solution.

561
MCQmedium

A user reports that the model's response to the same prompt varies significantly across different calls. Which parameter change would most likely reduce variability?

A.Decrease topK to 10.
B.Decrease temperature to 0.2.
C.Increase candidateCount to 3.
D.Increase maxOutputTokens to 2000.
AnswerB

Lower temperature reduces randomness, making outputs more consistent.

Why this answer

Temperature controls the randomness of token sampling. Lowering temperature (e.g., to 0.2) makes the model's output more deterministic by reducing the probability of low-likelihood tokens, thus decreasing variability across calls for the same prompt.

Exam trap

Google Cloud often tests the misconception that topK or candidateCount are the primary controls for output variability, when in fact temperature is the direct parameter governing randomness in token selection.

How to eliminate wrong answers

Option A is wrong because decreasing topK to 10 still allows sampling from a limited set of tokens, which can introduce variability if temperature is not also reduced; topK alone does not control randomness as directly as temperature. Option C is wrong because increasing candidateCount to 3 generates multiple independent responses, which increases variability rather than reducing it. Option D is wrong because increasing maxOutputTokens to 2000 only extends the maximum length of the response, not the consistency of the output; it has no effect on token selection randomness.
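The effect of temperature on token probabilities can be shown with a temperature-scaled softmax. This is a generic sketch of the standard formula; the logits used with it are made-up values, not output from any real model.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

With the same logits, a temperature of 0.2 concentrates far more probability on the top token than a temperature of 1.0, which is why repeated calls produce more similar outputs.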

562
MCQhard

A multimodal generative AI system processes both image and text inputs to produce captions. During inference, the image encoder sometimes produces noisy or missing features. Which architectural design decision best handles such input degradation without retraining?

A.Train a separate variational autoencoder to produce a clean latent representation from the noisy image.
B.Increase the image encoder’s capacity to better extract robust features.
C.Apply standard image preprocessing (e.g., denoising) to all inputs before feeding to the encoder.
D.Introduce a gating mechanism that learns to weigh image features based on confidence scores from the encoder.
AnswerD

Gating allows the model to ignore unreliable features dynamically.

Why this answer

Option D is correct because a gating mechanism dynamically adjusts the contribution of image features based on confidence scores from the encoder, allowing the model to gracefully handle noisy or missing features without retraining. This architectural design learns to suppress unreliable image inputs and rely more on text or other modalities, ensuring robust caption generation under input degradation.

Exam trap

Google Cloud often tests the misconception that preprocessing or model capacity adjustments are the only ways to handle input noise, but the key insight is that architectural mechanisms like gating can adaptively handle degradation at inference time without retraining.

How to eliminate wrong answers

Option A is wrong because training a separate variational autoencoder (VAE) to produce clean latent representations requires additional training data and retraining, which contradicts the 'without retraining' constraint; it also adds complexity without addressing dynamic degradation during inference. Option B is wrong because increasing the image encoder’s capacity does not inherently handle noisy or missing features—it may overfit to training data and still produce unreliable outputs when inputs degrade, and it requires retraining to change capacity. Option C is wrong because standard image preprocessing like denoising is a fixed, non-adaptive approach that cannot compensate for missing features or varying noise levels, and it may discard useful information; it also does not leverage the model’s ability to learn confidence-based weighting.
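A heavily simplified sketch of confidence-based gating follows. In a real system the gate would be a learned function inside the model; here, as a labeled assumption, the encoder's confidence score is used directly as the gate value, and the feature vectors are plain lists of floats of equal length.

```python
from typing import List


def gated_fusion(
    image_features: List[float],
    text_features: List[float],
    image_confidence: float,
) -> List[float]:
    """Blend image and text features, trusting the image only as far as its confidence."""
    gate = min(max(image_confidence, 0.0), 1.0)  # clamp to [0, 1]
    return [
        gate * img + (1.0 - gate) * txt
        for img, txt in zip(image_features, text_features)
    ]
```

When the confidence drops to zero (for example, missing image features), the fused representation falls back entirely to the text features.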

563
MCQmedium

A company is developing a generative AI application that will be used by customers in multiple countries, including those with strict data residency laws. How should they approach data governance?

A.Store all data in a single central data center to simplify management
B.Use a VPN to route data through compliant regions
C.Use data residency controls to keep data in specified regions
D.Anonymize all data before processing to avoid residency issues
AnswerC

Data residency controls ensure compliance by restricting data storage and processing to allowed locations.

Why this answer

Option C is correct because data residency controls, such as those provided by cloud providers (e.g., AWS Organizations SCPs, Azure Policy, or GCP Organization Policies), allow the company to enforce that data is stored and processed only within specified geographic regions. This directly addresses strict data residency laws by preventing data from leaving the jurisdiction, which is a fundamental requirement for compliance with regulations like GDPR or Brazil's LGPD. Unlike workarounds, this approach provides native, auditable enforcement at the infrastructure level.

Exam trap

Google Cloud often tests the misconception that technical workarounds like VPNs or anonymization can substitute for native data residency enforcement, when in fact only infrastructure-level controls provide the auditable, deterministic compliance required by law.

How to eliminate wrong answers

Option A is wrong because storing all data in a single central data center violates data residency laws that require data to remain within specific national or regional boundaries, and it does not provide any mechanism to segregate or control data flow based on user location. Option B is wrong because using a VPN to route data through compliant regions does not change the physical storage location of the data; it only masks the network path, and the data still resides in a non-compliant data center, which fails legal audits. Option D is wrong because anonymization is not a guaranteed solution for data residency; many regulations (e.g., GDPR) still apply to pseudonymized or anonymized data if re-identification is possible, and the data's physical location remains non-compliant unless stored in the required region.

564
MCQhard

After fine-tuning a model on customer support data, the model starts using profanity. What is the most effective mitigation?

A.Add profanity to training data as negative examples
B.Reduce learning rate and retrain
C.Increase temperature to reduce confidence
D.Enable a safety attribute filter
AnswerD

Blocks profanity in real-time without retraining.

Why this answer

Enabling a safety attribute filter is the most effective mitigation because it acts as a post-processing guardrail that blocks profanity at inference time, regardless of the model's training data. This is a standard practice in production LLM deployments, where safety filters (e.g., using keyword matching or classifier models) intercept and redact harmful outputs before they reach the user, providing immediate and reliable control without requiring retraining.

Exam trap

Google Cloud often tests the misconception that modifying training parameters (like learning rate or temperature) can fix output quality issues, when in fact post-processing filters are the standard, immediate solution for content safety in production LLM systems.

How to eliminate wrong answers

Option A is wrong because adding profanity as negative examples in training data can inadvertently reinforce the behavior or cause the model to learn spurious correlations, and it does not guarantee removal of already learned profanity patterns. Option B is wrong because reducing the learning rate and retraining only adjusts the model's weights during fine-tuning, which does not address the root cause of profanity generation and may not eliminate learned toxic patterns without extensive data curation. Option C is wrong because increasing temperature increases randomness in token sampling, which can actually increase the likelihood of generating profanity by making the model less deterministic, not reduce it.
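The keyword-matching style of post-processing filter mentioned above can be sketched as follows. This is a toy illustration, not the Vertex AI safety filter: the function name, the mask string, and the blocked-term list are all assumptions, and production filters typically rely on classifier models rather than word lists.

```python
import re
from typing import Iterable


def redact_blocked_terms(
    text: str,
    blocked_terms: Iterable[str],
    mask: str = "[removed]",
) -> str:
    """Replace whole-word, case-insensitive matches of blocked terms with a mask."""
    for term in blocked_terms:
        pattern = re.compile(rf"\b{re.escape(term)}\b", flags=re.IGNORECASE)
        text = pattern.sub(mask, text)
    return text
```

The filter runs on the model's output at inference time, so it takes effect immediately and does not depend on how the model was fine-tuned.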

565
MCQeasy

A product manager wants to quickly build a conversational agent that can answer FAQs from the company's help center articles. They have limited coding experience. Which Google Cloud service is BEST suited for this task?

A.Vertex AI Pipelines
B.Vertex AI Agent Builder
C.Vertex AI Studio
D.Model Garden
AnswerB

Agent Builder allows building chatbots with a visual interface and easy integration with data stores.

Why this answer

Vertex AI Agent Builder is the best choice because it provides a no-code/low-code interface specifically designed for building conversational agents and search experiences. It allows the product manager to connect help center articles as a data source and automatically generate a FAQ-answering agent without writing code, making it ideal for someone with limited coding experience.

Exam trap

Google Cloud often tests the distinction between tools for building agents (Agent Builder) versus tools for model experimentation (Studio) or model selection (Model Garden), and candidates mistakenly choose Studio or Model Garden because they think any generative AI tool can build a chatbot, ignoring the specific no-code agent-building capability required.

How to eliminate wrong answers

Option A is wrong because Vertex AI Pipelines is a tool for orchestrating and automating ML workflows (e.g., training, deployment) and requires pipeline definition via code or SDK, not for quickly building a conversational agent. Option C is wrong because Vertex AI Studio is a platform for experimenting with and tuning generative AI models (like prompts and foundation models), but it does not provide a built-in agent builder for connecting to help center articles and generating FAQ responses without custom development. Option D is wrong because Model Garden is a repository of pre-trained foundation models and does not include the agent-building or data-connecting capabilities needed to create a conversational FAQ agent.

566
MCQhard

A company wants to run a large-scale training job for a 175B parameter model. They need to minimize training time and cost. Which TPU version and configuration should they choose?

A.TPU v2-8
B.TPU v3-32
C.TPU v4-64
D.TPU v5e-256
AnswerD

v5e provides a good balance of performance and cost for large-scale training.

Why this answer

Option D is correct because the TPU v5e-256 offers the best performance-per-dollar for large-scale training of a 175B parameter model. With 256 chips in a pod, it provides massive parallelism and high memory bandwidth, significantly reducing training time compared to earlier generations while maintaining cost efficiency through optimized architecture.

Exam trap

The trap here is that candidates often assume larger chip count alone (like v4-64) is sufficient, but fail to consider the memory capacity per chip and the cost-efficiency of newer generations, leading them to overlook the v5e-256's superior balance of scale and affordability.

How to eliminate wrong answers

Option A is wrong because a TPU v2-8 is a single small slice with 64 GB of HBM in total, far too little for a 175B parameter model (about 350 GB for the parameters alone at 2 bytes each), and it lacks the interconnect scale needed for efficient distributed training. Option B is wrong because a TPU v3-32 is still a small slice of an older generation; once gradients, optimizer state, and activations are added to the parameters, its aggregate memory and compute are inadequate at this scale, leading to longer training times and higher overall cost. Option C is wrong because a TPU v4-64, while faster than v3, is still a small slice and does not match the cost-efficiency and throughput of v5e-256 at the required scale.
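The roughly 350 GB figure for a 175B parameter model follows from simple arithmetic, sketched below. The 2 bytes per parameter is an assumption corresponding to a 16-bit format such as bfloat16; training needs considerably more memory than this once gradients and optimizer state are included.

```python
def parameter_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the parameters alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9
```

Calling `parameter_memory_gb(175e9)` gives the parameter footprint that any candidate TPU slice must exceed by a wide margin before training is feasible.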

567
Multi-Selecteasy

A developer is using the Gemini API to generate text summaries. They want to control the creativity and diversity of the output. Which THREE parameters can they adjust?

Select 3 answers
A.Context window
B.Top-p
C.Embedding dimension
D.Top-k
E.Temperature
AnswersB, D, E

Top-p (nucleus sampling) selects from the smallest set of tokens whose cumulative probability exceeds p, controlling diversity.

Why this answer

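Option B (Top-p) is correct because it sets the nucleus sampling threshold: the model samples only from the smallest set of tokens whose cumulative probability reaches the specified p value (e.g., 0.9). A lower p shrinks the candidate pool to the most likely tokens, while a higher p admits more low-probability tokens and increases diversity. Option D (Top-k) is correct because it caps the candidate pool at the k most probable tokens. Option E (Temperature) is correct because it rescales the token probability distribution before sampling: low values make the output more deterministic, and higher values make it more random.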

Exam trap

Google Cloud often tests the distinction between parameters that affect input processing (context window, embedding dimension) versus those that control output generation (temperature, top-k, top-p), leading candidates to mistakenly select context window as a creativity parameter.
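
How the three output-generation parameters interact can be sketched with a toy sampler. This is an illustrative implementation of temperature scaling, top-k filtering, and top-p (nucleus) filtering over a small list of logits; the function names are hypothetical and it is not a description of how the Gemini API implements sampling internally.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; temperature must be > 0."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def filter_top_k_top_p(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the smallest prefix of them
    whose cumulative probability reaches top_p; renormalize the survivors."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for index, prob in ranked:
        kept.append((index, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    total = sum(prob for _, prob in kept)
    return {index: prob / total for index, prob in kept}

def sample(logits, temperature=1.0, top_k=40, top_p=0.95, rng=random):
    """Sample one token index using all three controls."""
    dist = filter_top_k_top_p(softmax(logits, temperature), top_k, top_p)
    indices = list(dist)
    return rng.choices(indices, weights=[dist[i] for i in indices], k=1)[0]

if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5, -1.0]
    print(sample(logits, temperature=0.9, top_k=3, top_p=0.95))
```

Lowering any of the three values narrows the set of tokens that can be chosen; raising them widens it.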

568
MCQhard

A research lab is fine-tuning a large language model on a small dataset of medical records. They observe that the model overfits, memorizing specific patient details and producing outputs that violate privacy regulations. Which technique should they apply to improve generalization and reduce memorization?

A.Increase the batch size to 64
B.Increase the number of training epochs
C.Use early stopping based on validation loss
D.Apply differential privacy (DP-SGD) during fine-tuning
AnswerD

DP-SGD bounds the influence of any single example, reducing memorization and improving privacy.

Why this answer

Differential privacy (DP-SGD) is the correct technique because it directly addresses memorization of sensitive patient data: each example's gradient is clipped to a fixed norm and calibrated noise is added to the gradient updates during fine-tuning. This bounds how much any single individual's record can influence the model, which reduces memorization and supports compliance with privacy regulations like HIPAA.

Exam trap

Google Cloud often tests the misconception that early stopping or batch size adjustments can prevent memorization, when in fact only techniques like differential privacy directly bound the influence of individual training examples.

How to eliminate wrong answers

Option A is wrong because increasing batch size to 64 reduces gradient variance but does not prevent memorization of specific patient details; it may even accelerate overfitting on a small dataset. Option B is wrong because increasing the number of training epochs exacerbates overfitting, causing the model to memorize more training examples and worsen privacy violations. Option C is wrong because early stopping based on validation loss only halts training when validation performance degrades, but it does not impose any privacy guarantee or fundamentally limit memorization of unique patient records.
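
The core DP-SGD step can be sketched in a few lines: clip each per-example gradient, sum the clipped gradients, add Gaussian noise scaled to the clipping norm, and average. This is a simplified illustration with hypothetical function names; it omits privacy accounting (tracking the epsilon budget), which a real implementation would need.

```python
import math
import random

def clip(grad, max_norm):
    """Scale a gradient vector down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Return a noisy average gradient for one batch.

    Clipping bounds each example's influence; the Gaussian noise
    (std = noise_multiplier * max_norm) masks what remains of it.
    """
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [n / len(clipped) for n in noisy]

if __name__ == "__main__":
    grads = [[3.0, 4.0], [0.3, 0.4]]  # toy per-example gradients
    print(dp_sgd_gradient(grads, max_norm=1.0, noise_multiplier=1.1,
                          rng=random.Random(42)))
```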

569
MCQmedium

A team wants to improve the factual accuracy of their chatbot responses regarding internal company policies. What is the most effective approach?

A.Use few-shot prompting with example Q&A pairs
B.Increase the model's maximum tokens
C.Fine-tune the model on policy documents
D.Use RAG with Vertex AI Search indexing the policies
AnswerD

Correct: RAG retrieves fresh data from indexed policies, ensuring factual accuracy.

Why this answer

RAG with Vertex AI Search is the most effective approach because it retrieves relevant, up-to-date policy documents from a curated index and injects them into the prompt context at inference time, grounding the chatbot's responses in authoritative sources without modifying the underlying model. This ensures factual accuracy for dynamic or evolving policies, as the model can reference the exact text rather than relying on static training data.

Exam trap

Google Cloud often tests the misconception that fine-tuning (Option C) is the best way to improve factual accuracy for dynamic knowledge, when in reality RAG is superior because it avoids model retraining and provides verifiable source citations.

How to eliminate wrong answers

Option A is wrong because few-shot prompting provides example Q&A pairs but does not guarantee the model will recall or cite the correct policy details, especially for nuanced or updated policies; it relies on the model's parametric memory, which can be incomplete or outdated. Option B is wrong because increasing the maximum tokens only expands the output length, not the factual grounding; it does nothing to improve the accuracy of the content generated. Option C is wrong because fine-tuning on policy documents embeds static knowledge into the model weights, making it difficult to update when policies change and risking catastrophic forgetting of other capabilities; it also does not provide a mechanism to cite specific sources or handle real-time retrieval.
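
The retrieve-then-prompt flow behind RAG can be sketched with a toy keyword retriever. The policy texts, document IDs, and function names below are hypothetical, and the word-overlap scoring only stands in for a real search index such as Vertex AI Search; the point is the shape of the pipeline, not the retrieval quality.

```python
import re

# Hypothetical policy snippets standing in for an indexed document store.
POLICIES = {
    "leave-policy": "Full-time employees accrue 20 vacation days per year.",
    "expense-policy": "Expenses above 500 USD require manager approval before purchase.",
    "remote-policy": "Employees may work remotely up to three days per week.",
}

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_n=1):
    """Rank documents by word overlap with the question (toy scoring)."""
    query = tokenize(question)
    ranked = sorted(documents.items(),
                    key=lambda kv: len(query & tokenize(kv[1])),
                    reverse=True)
    return ranked[:top_n]

def build_grounded_prompt(question, documents, top_n=1):
    """Inject the retrieved passages into the prompt and ask for citations."""
    passages = retrieve(question, documents, top_n)
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer using only the policy excerpts below and cite the excerpt ID. "
        "If the excerpts do not contain the answer, say so.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )

if __name__ == "__main__":
    print(build_grounded_prompt("How many vacation days do employees get?", POLICIES))
```

Because the policies live in the index rather than in the model weights, updating a policy only requires re-indexing the document.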

570
MCQeasy

A company notices that their AI chatbot occasionally generates incorrect information. Which technique can best reduce hallucinations without retraining?

A.Use a longer system prompt without examples
B.Use system instructions to constrain the model to only answer from provided context
C.Set top_p to 0.1
D.Increase temperature to 0.9
AnswerB

Correct: This confines the model to the given context, minimizing hallucination.

Why this answer

Option B is correct because constraining the model to answer only from provided context directly addresses the root cause of hallucinations: the model generating information that is not grounded in verified sources. This technique, often implemented via system instructions or retrieval-augmented generation (RAG) pipelines, directs the model to rely on a trusted knowledge base rather than its parametric memory, substantially reducing unsupported fabrications without requiring retraining.

Exam trap

Google Cloud often tests the misconception that adjusting sampling parameters (like top_p or temperature) can fix hallucinations, when in reality these parameters control randomness, not factual grounding, and the correct solution is to constrain the model's output to a trusted context.

How to eliminate wrong answers

Option A is wrong because using a longer system prompt without examples does not prevent hallucinations; it may actually increase the risk by introducing more ambiguous or conflicting instructions, and without explicit grounding constraints, the model can still generate unverified content. Option C is wrong because setting top_p to 0.1 reduces the diversity of token sampling but does not enforce factual accuracy—it merely makes outputs more deterministic, which can still produce confident hallucinations if the model's internal knowledge is flawed. Option D is wrong because increasing temperature to 0.9 increases randomness and creativity in outputs, which exacerbates hallucination risk by making the model more likely to generate improbable or fabricated information.

571
Multi-Selecthard

Which THREE benefits does Vertex AI Agent Builder provide over building a custom conversational agent from scratch?

Select 3 answers
A.Automatic scaling and load balancing
B.Pre-built integration for grounding on enterprise data sources
C.Full control over the underlying ML model architecture
D.Built-in safety filters and guardrails
E.Guaranteed lower inference latency
AnswersA, B, D

Managed service scales according to demand without manual intervention.

Why this answer

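Vertex AI Agent Builder provides automatic scaling and load balancing as a managed service (Option A), handling infrastructure provisioning and traffic distribution across multiple instances without manual intervention. This eliminates the need to configure Kubernetes clusters or load balancers yourself, which is required when building a custom conversational agent from scratch. It also offers pre-built integrations for grounding agents on enterprise data sources (Option B) and built-in safety filters and guardrails (Option D), both of which would otherwise have to be designed, implemented, and maintained in-house.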

Exam trap

The trap here is that candidates may confuse 'full control' (Option C) with the flexibility of Vertex AI Agent Builder, which actually limits architectural control in favor of managed simplicity, and may assume managed services always provide lower latency (Option E) without considering that custom optimizations can outperform generic managed solutions.

572
Multi-Selectmedium

A developer is using the Gemini API to generate marketing copy. They want the output to be diverse and creative but still relevant to the topic. Which THREE parameter adjustments would help achieve this? (Choose 3)

Select 3 answers
A.Increase temperature to 0.9
B.Increase top-k to 50
C.Decrease temperature to 0.1
D.Decrease top-k to 10
E.Increase top-p to 0.95
AnswersA, B, E

Higher temperature increases randomness, leading to more creative outputs.

Why this answer

Higher temperature increases randomness and creativity. Higher top-k and higher top-p both allow more tokens to be considered, increasing diversity. Lowering these would make output more focused.

573
MCQmedium

A startup develops a generative AI tool for legal document review. To ensure explainability, they want the model to cite specific clauses from source documents when making assertions. Which technique should they use?

A.Fine-tuning on legal documents with citation examples
B.Chain-of-thought prompting
C.Grounding using a retrieval system that provides source documents
D.Prompt engineering to ask for citations
AnswerC

Grounding forces the model to retrieve and cite actual source material, improving explainability.

Why this answer

Grounding connects the model's output to verifiable source documents supplied by a retrieval system, so each assertion can be traced to and cited from a specific clause. This traceability is essential for explainability in domains like law.

574
MCQmedium

A healthcare startup needs to process sensitive patient data using NLP models on Google Cloud. They require HIPAA compliance and the ability to run models within their VPC. Which service should they use to access Gemini models?

A.Gemini API directly via API key
B.BigQuery ML
C.Vertex AI
D.Google AI Studio
AnswerC

Vertex AI offers VPC Service Controls, data isolation, audit logging, and can sign a HIPAA BAA.

Why this answer

Vertex AI provides enterprise-grade features including VPC Service Controls, data isolation, audit logging, and coverage under a HIPAA BAA. Google AI Studio is aimed at prototyping and does not offer these compliance or security controls.

575
MCQmedium

A company is piloting a GenAI feature for email drafting in Gmail. They want to measure productivity improvement. Which metric is MOST directly tied to the business goal of reducing time spent on email composition?

A.Adoption rate of the GenAI feature among the pilot group
B.Increase in employee satisfaction survey scores
C.Reduction in average time-to-send per email
D.Reduction in total tokens consumed per email
AnswerC

This directly measures the productivity improvement (time saved) when drafting emails.

Why this answer

Option C is correct because the primary business goal is to reduce the time employees spend composing emails. Measuring the average time-to-send per email directly quantifies this efficiency gain, as it captures the end-to-end duration from initiation to dispatch, which the GenAI feature aims to shorten by generating draft content.

Exam trap

Google Cloud often tests the distinction between proxy metrics (like token consumption or adoption) and direct business outcome metrics, so the trap here is that candidates confuse technical efficiency (fewer tokens) with user productivity (time saved).

How to eliminate wrong answers

Option A is wrong because adoption rate measures how many users try the feature, not the actual productivity impact; high adoption could occur even if the feature does not save time. Option B is wrong because employee satisfaction scores are a lagging, subjective indicator that can be influenced by factors unrelated to email composition speed, such as overall morale or feature usability. Option D is wrong because reduction in total tokens consumed per email measures model efficiency or verbosity, not the business-relevant time savings; fewer tokens do not guarantee faster composition due to latency or user review time.
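
The time-to-send metric reduces to a simple before-and-after comparison. The sketch below assumes hypothetical per-email timings in seconds collected from a baseline period and from the pilot group; the function name and the sample numbers are illustrative only.

```python
from statistics import mean

def time_to_send_reduction(baseline_seconds, pilot_seconds):
    """Compare average time-to-send before and during the pilot."""
    before = mean(baseline_seconds)
    after = mean(pilot_seconds)
    saved = before - after
    return {
        "before": before,
        "after": after,
        "saved_seconds": saved,
        "saved_percent": 100 * saved / before,
    }

if __name__ == "__main__":
    # Hypothetical measurements, in seconds per email.
    result = time_to_send_reduction([300, 240, 360], [200, 180, 220])
    print(result)
```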

576
MCQeasy

A company wants to estimate the total cost of ownership (TCO) for a gen AI solution on Google Cloud. Which factors are most important?

A.Only model training cost
B.Compute, storage, and API call costs
C.Only inference cost
D.Only compute cost
AnswerB

These three categories cover the primary cost drivers in a gen AI solution.

Why this answer

Option B is correct because the total cost of ownership (TCO) for a generative AI solution on Google Cloud encompasses all operational expenses, including compute (e.g., TPU/GPU instances for training and inference), storage (e.g., Cloud Storage for datasets and model artifacts), and API call costs (e.g., Vertex AI prediction requests). Focusing on a single cost component, such as training or inference alone, ignores the recurring expenses of serving the model and storing data, which often dominate long-term TCO.

Exam trap

Google Cloud often tests the misconception that TCO is dominated by a single cost factor (e.g., training), when in reality, inference and API costs frequently surpass training expenses in production deployments.

How to eliminate wrong answers

Option A is wrong because it ignores inference, storage, and API costs, which are significant for production gen AI solutions where models are queried repeatedly. Option C is wrong because inference cost is only one part of TCO; training, storage, and API overhead also contribute heavily, especially with large models like PaLM 2 or Gemini. Option D is wrong because compute cost alone excludes storage (e.g., model checkpoints, training data) and API call fees (e.g., per-token billing for Vertex AI), leading to an incomplete TCO estimate.
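
A TCO estimate along these lines adds the three cost categories together. The rates and volumes below are placeholder values chosen for the arithmetic, not Google Cloud prices, and the function name is hypothetical.

```python
def monthly_tco(compute_hours, compute_rate, storage_gb, storage_rate,
                api_calls, rate_per_1k_calls):
    """Sum compute, storage, and API call costs for one month."""
    costs = {
        "compute": compute_hours * compute_rate,
        "storage": storage_gb * storage_rate,
        "api": api_calls / 1000 * rate_per_1k_calls,
    }
    costs["total"] = sum(costs.values())
    return costs

if __name__ == "__main__":
    # ASSUMPTION: all rates are made-up illustrative numbers.
    estimate = monthly_tco(compute_hours=100, compute_rate=2.0,
                           storage_gb=500, storage_rate=0.02,
                           api_calls=1_000_000, rate_per_1k_calls=0.5)
    for category, cost in estimate.items():
        print(f"{category}: {cost:.2f}")
```

With these placeholder inputs the API line is the largest, which mirrors the point that serving costs can outweigh other categories in production.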

577
MCQmedium

A company is using Vertex AI Model Registry to manage multiple versions of its custom generative model. They want to automatically route a percentage of traffic to a new model version for testing. What should they do?

A.Set up a Cloud Tasks queue to distribute requests
B.Create a new endpoint for each version
C.Deploy both versions to the same endpoint and adjust traffic split settings
D.Use a load balancer in front of the endpoints
AnswerC

Vertex AI endpoints allow splitting traffic percentage across deployed models.

Why this answer

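Multiple model versions can be deployed to a single Vertex AI endpoint, and the endpoint's traffic split assigns each deployed model a percentage of incoming requests (the percentages must total 100). Sending a small share to the new version lets the team test it on live traffic without creating a separate endpoint or load balancer.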
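
The idea of a percentage-based traffic split can be illustrated with a small routing simulation. This is a conceptual sketch only: the hash-based bucketing and the names `validate_split` and `route` are assumptions for illustration and do not describe how Vertex AI routes requests internally.

```python
import hashlib

def validate_split(split):
    """A traffic split maps model IDs to percentages that must total 100."""
    if sum(split.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")

def route(request_id, split):
    """Pick a model version for a request according to the split."""
    validate_split(split)
    # Hash the request ID into one of 100 buckets, then walk the split.
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    upper = 0
    for model_id, percent in split.items():
        upper += percent
        if bucket < upper:
            return model_id
    raise AssertionError("unreachable when percentages sum to 100")

if __name__ == "__main__":
    split = {"model-v1": 90, "model-v2": 10}  # hypothetical version IDs
    hits = sum(route(f"req-{i}", split) == "model-v2" for i in range(1000))
    print(f"model-v2 received {hits} of 1000 requests")
```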

578
Multi-Selecthard

Which TWO strategies can effectively reduce the operational costs of a generative AI model in production without significantly degrading user experience?

Select 2 answers
A.Use larger batch sizes for inference
B.Increase the frequency of model retraining to improve efficiency
C.Cache frequent prompt completions
D.Adopt a pay-per-use pricing model instead of a flat rate
E.Deploy multiple models and route requests by complexity
AnswersC, D

Caching reduces duplicate inference calls, lowering cost.

Why this answer

Caching frequent prompt completions (Option C) reduces operational costs by eliminating redundant inference calls for identical or similar user requests. This directly lowers compute usage and latency without degrading user experience, as cached responses are served instantly. It is a common optimization in production LLM deployments, especially for high-traffic applications with repetitive queries. Pay-per-use pricing (Option D) ties spend to actual request volume, so capacity that would sit idle under a flat rate is not paid for.

Exam trap

Google Cloud often tests the misconception that increasing batch sizes or retraining frequency inherently reduces costs, when in fact these actions typically increase resource usage or introduce operational overhead without guaranteeing cost savings.
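
A completion cache can be sketched as a dictionary keyed by a normalized prompt. The `PromptCache` class and the `fake_model` stub are hypothetical; in practice the wrapped callable would be a real model call, and a production cache would also need expiry and size limits.

```python
import hashlib

class PromptCache:
    """Serve repeated prompts from memory instead of calling the model again."""

    def __init__(self, generate):
        self._generate = generate  # callable: prompt -> completion
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt):
        # Normalize case and whitespace so trivially different prompts match.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode()).hexdigest()

    def complete(self, prompt):
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        completion = self._generate(prompt)
        self._store[key] = completion
        return completion

def fake_model(prompt):
    """Stand-in for an expensive model call (hypothetical)."""
    return f"completion for: {prompt.strip().lower()}"

if __name__ == "__main__":
    cache = PromptCache(fake_model)
    cache.complete("What is your refund policy?")
    cache.complete("what is your   refund policy?")  # served from cache
    print(cache.hits, cache.misses)
```

Exact-match caching like this only helps when prompts repeat; matching merely similar prompts would need a semantic lookup, which is not shown.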

579
Multi-Selecthard

A company is deploying a generative AI model for medical diagnosis assistance. To comply with both Google's AI Principles and emerging regulations (e.g., EU AI Act), they must ensure appropriate human oversight. Which THREE measures should they implement?

Select 3 answers
A.Use a model with high confidence scores to bypass human review
B.Allow the AI to act autonomously for low-risk cases to reduce workload
C.Require a human clinician to review all AI-generated diagnoses before acting on them
D.Provide an override mechanism that allows the human to reject the AI's recommendation
E.Document the human-in-the-loop process and roles clearly
AnswersC, D, E

High-stakes decisions need human review.

Why this answer

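Option C is correct because requiring a human clinician to review all AI-generated diagnoses before acting on them directly implements the human oversight mandate of Google's AI Principles (specifically the 'be accountable to people' principle) and the EU AI Act's requirement for high-risk AI systems to have meaningful human review. This ensures that the model's output is validated by a domain expert, mitigating risks of false positives or negatives that could harm patients. Option D is correct because an override mechanism lets the clinician reject the AI's recommendation, keeping final authority with a human. Option E is correct because clearly documented human-in-the-loop processes and roles make the oversight auditable.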

Exam trap

The trap here is that candidates mistakenly believe high confidence scores or low-risk classifications can justify removing human oversight, but the exam tests that regulatory frameworks like the EU AI Act require human review for all high-risk AI outputs regardless of confidence or perceived risk level.

580
MCQeasy

A startup wants to build a generative AI application for customer support. Their main concern is cost control while maintaining low latency. Which Google Cloud service is most suitable for deploying their custom model?

A.BigQuery ML
B.Cloud Run
C.Vertex AI Workbench
D.Vertex AI Prediction
AnswerD

Vertex AI Prediction provides autoscaling online prediction endpoints with low latency, ideal for cost-sensitive production.

Why this answer

Vertex AI Prediction is the correct choice because it provides fully managed online prediction endpoints for custom models, with autoscaling that adds or removes replicas as traffic changes. This addresses the startup's need for cost control by avoiding permanently over-provisioned capacity. It also supports low latency through optimized prediction containers and accelerator-backed inference (GPUs or TPUs), making it well suited to real-time customer support applications.

Exam trap

The trap here is that candidates often confuse development tools (like Vertex AI Workbench) or batch inference services (like BigQuery ML) with production deployment services, overlooking that Vertex AI Prediction is the only option purpose-built for serving custom models with cost-efficient, low-latency inference.

How to eliminate wrong answers

Option A is wrong because BigQuery ML is designed for training and executing machine learning models using SQL queries directly within BigQuery, not for deploying custom models as low-latency, real-time prediction endpoints; it is more suited for batch inference on large datasets. Option B is wrong because Cloud Run is a serverless compute platform for running stateless containers, but it lacks native support for model serving optimizations like GPU acceleration, model versioning, and autoscaling tailored to inference workloads, which are critical for cost-effective, low-latency predictions. Option C is wrong because Vertex AI Workbench is a Jupyter-based development environment for building and training models, not a deployment service; it does not provide managed prediction endpoints or the infrastructure for serving custom models in production.

581
MCQeasy

A marketing team wants to generate consistent brand-aligned social media posts using Vertex AI Studio. Which prompt engineering technique should they use to ensure the output tone matches their brand voice?

A.Set the temperature to 0 and use a long system instruction
B.Provide a few-shot prompt with examples of previous brand-aligned posts
C.Use a zero-shot prompt describing the brand voice
D.Use chain-of-thought prompting to explain the reasoning behind each post
AnswerB

Few-shot examples guide the model to replicate the desired tone and style.

Why this answer

Few-shot examples provide the model with clear examples of desired tone, ensuring consistency. Zero-shot or chain-of-thought are less effective for tone adherence.

582
MCQmedium

The exhibit shows the output of describing a model on Vertex AI. What does 'modelSource: MODEL_GARDEN' indicate about this model?

A.The model was imported from the Vertex AI Model Garden.
B.The model was trained on Vertex AI from scratch.
C.The model has been exported to Model Garden.
D.The model was fine-tuned using AutoML.
AnswerA

MODEL_GARDEN indicates it's a Model Garden model.

Why this answer

Option A is correct because 'modelSource: MODEL_GARDEN' explicitly indicates that the model was sourced from Vertex AI Model Garden, which is a curated repository of pre-built and pre-trained foundation models. This field is set when a model is imported from Model Garden, not when it is trained or fine-tuned from scratch within Vertex AI.

Exam trap

The trap here is that candidates confuse 'modelSource' with the model's training or fine-tuning method, assuming 'MODEL_GARDEN' implies the model was trained or fine-tuned on Vertex AI, when in fact it strictly indicates the model was imported from the Model Garden repository.

How to eliminate wrong answers

Option B is wrong because 'modelSource: MODEL_GARDEN' specifically denotes a model sourced from Model Garden, not one trained from scratch; a model produced by custom training on Vertex AI would carry a different source indicator, such as 'CUSTOM'. Option C is wrong because Model Garden is an import source, not an export destination: models are imported from Model Garden, not exported to it. Option D is wrong because a model built with AutoML would carry a different source value (e.g., 'AUTOML'), and Model Garden models are typically pre-trained foundation models that may be fine-tuned later, but the source field reflects the origin, not the fine-tuning method.

583
MCQhard

A streaming platform uses a large generative model for personalized content suggestions. Budget constraints require minimizing inference costs without significantly degrading quality. Which approach is most effective?

A.Deploy the model on higher-end accelerators to save time.
B.Use a distilled version of the model.
C.Implement stronger safety filters to reduce output length.
D.Cache frequent prompts to avoid regeneration.
AnswerB

Distilled models are smaller, faster, and cheaper with comparable quality for many tasks.

Why this answer

Distillation trains a smaller 'student' model to mimic a larger 'teacher' model, reducing parameter count and inference latency while retaining most of the recommendation quality. This directly addresses the budget constraint by lowering compute and memory costs per inference, making it the most effective approach among the options.

Exam trap

Google Cloud often tests the misconception that caching or hardware upgrades are always the most effective cost-saving measures, when for highly personalized requests they either shift costs or fail to address the per-inference computational load that distillation directly reduces.

How to eliminate wrong answers

Option A is wrong because deploying on higher-end accelerators increases hardware cost, not reduces it, and while it may save time, the budget constraint demands minimizing inference costs, not just time. Option C is wrong because stronger safety filters do not reduce output length in a meaningful way for cost savings; they add computational overhead for filtering and may degrade user experience by blocking valid suggestions. Option D is wrong because caching frequent prompts only avoids regeneration for identical inputs, but personalized content suggestions are inherently unique per user session, so cache hit rates are low and the approach does not address the core inference cost per unique request.
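
The training signal used in distillation can be sketched as a divergence between the teacher's and the student's softened output distributions. This is a minimal illustration of the soft-target loss with hypothetical function names; real distillation setups usually combine it with a standard loss on ground-truth labels, which is omitted here.

```python
import math

def softmax(logits, temperature):
    """Softmax with temperature; higher values give softer distributions."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's soft targets to the student's outputs."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)

if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    print(distillation_loss(teacher, [4.0, 1.0, 0.5]))  # student matches teacher
    print(distillation_loss(teacher, [1.0, 2.0, 3.0]))  # student disagrees
```

Minimizing this loss pushes the smaller student model toward the teacher's behavior, which is what lets it retain much of the quality at lower inference cost.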

584
MCQeasy

A project manager wants to understand which Google Cloud generative AI services are subject to the 'Prohibited Use' policy. Where can they find the most up-to-date information?

A.Google Cloud documentation
B.Google's AI Principles
C.The Google Cloud Acceptable Use Policy
D.The Gemini Terms of Service
AnswerC

This policy explicitly lists prohibited uses for all Google Cloud services.

Why this answer

The Google Cloud Acceptable Use Policy (AUP) is the authoritative document that defines prohibited uses of Google Cloud services, including generative AI offerings. It is regularly updated to reflect current legal, ethical, and security requirements, making it the most reliable source for the most up-to-date information on prohibited use cases. The AUP explicitly covers restrictions on generating harmful content, engaging in illegal activities, and violating intellectual property rights, which directly apply to generative AI services.

Exam trap

Google Cloud often tests the distinction between high-level ethical principles (AI Principles) and enforceable policy documents (Acceptable Use Policy), leading candidates to mistakenly choose the broader, aspirational document over the specific, binding one.

How to eliminate wrong answers

Option A is wrong because Google Cloud documentation provides general guidance on service features and best practices but does not serve as the definitive policy document for prohibited use; the AUP is the binding policy. Option B is wrong because Google's AI Principles are high-level ethical commitments that guide AI development and use, but they are not a specific, enforceable policy document detailing prohibited uses of services. Option D is wrong because the Gemini Terms of Service govern the use of the Gemini product specifically, not the broader set of Google Cloud generative AI services, and they do not replace the overarching Acceptable Use Policy.

585
MCQeasy

A developer is using the Gemini API to generate code snippets. They notice the outputs often contain deprecated API calls. Which parameter adjustment or prompt strategy would most effectively encourage the model to use current APIs?

A.Add a system instruction specifying 'Use the most recent API version and avoid deprecated functions.'
B.Set top-p to 0.5 to reduce output diversity
C.Provide one few-shot example of a correct API call
D.Set temperature to 1.5 to increase creativity
AnswerA

System instructions provide explicit guidance to the model on desired behavior.

Why this answer

Option A is correct because adding a system instruction that explicitly directs the model to 'Use the most recent API version and avoid deprecated functions' directly influences the model's behavior at the prompt level. The Gemini API supports system instructions that act as persistent, high-level guidance, steering the model toward preferred output patterns—in this case, avoiding deprecated API calls. This is the most effective and direct method to enforce current API usage without altering sampling parameters or relying on limited examples.

Exam trap

Google Cloud often tests the misconception that adjusting sampling parameters (like temperature or top-p) or providing a single example can reliably enforce content constraints, when in fact system instructions are the designed mechanism for persistent behavioral guidance in production-grade APIs.

How to eliminate wrong answers

Option B is wrong because setting top-p to 0.5 reduces the cumulative probability mass of token choices, which narrows output diversity but does not inherently bias the model toward current APIs; it may even suppress rare but correct modern API tokens. Option C is wrong because a single few-shot example provides only one instance of a correct API call, which is insufficient to override the model's training data bias toward deprecated APIs; the model may still default to older patterns. Option D is wrong because increasing temperature to 1.5 amplifies randomness and creativity, which can increase the likelihood of hallucinated or incorrect API calls, including deprecated ones, rather than encouraging adherence to current standards.

586
MCQhard

A company is required by the EU AI Act to ensure high-risk AI systems are transparent and auditable. They are using a proprietary model from a vendor. Which step is CRITICAL?

A.Implement custom safety filters on the model outputs
B.Ask the vendor to provide a Model Card and datasheets for the training data
C.Use a larger context window to capture all interactions
D.Train an internal model from scratch to replace the vendor model
AnswerB

Vendor-provided documentation is essential for transparency under the EU AI Act.

Why this answer

The vendor must provide documentation (e.g., model cards, datasheets) to enable transparency and audit. Auditing the vendor's training data is not possible without cooperation. The other options are internal measures that do not address vendor transparency.

587
Multi-Selectmedium

A data scientist wants to use Google Cloud AI services to build a solution that transcribes customer support calls, analyzes sentiment, and translates transcripts into multiple languages. Which TWO services are needed?

Select 2 answers
A.Vision AI
B.Natural Language AI
C.Cloud Translation API
D.Document AI
E.Cloud Speech-to-Text API
AnswersC, E

Required for translating transcripts.

Why this answer

Speech-to-Text transcribes audio, Translation API translates text. Natural Language AI could be used for sentiment but is not required for translation. Document AI and Vision AI are irrelevant.

588
Multi-Selectmedium

A company is deploying a generative AI system for medical diagnosis support. To comply with Google's AI Principles and regulatory requirements, which TWO actions are essential? (Select 2)

Select 2 answers
A.Implement a human-in-the-loop review for all diagnostic suggestions
B.Publish a Model Card for the model
C.Ensure the system complies with GDPR and other privacy regulations for patient data
D.Use SynthID to watermark all output
E.Use a larger model to improve accuracy
AnswersA, C

Essential for accountability in high-stakes decisions.

Why this answer

High-stakes medical decisions require human oversight (Principle: be accountable to people) and data privacy (Principle: incorporate privacy design principles). The other options are beneficial but not essential for compliance.

589
MCQhard

A company is building a GenAI application using the Gemini API. They want to minimize latency and cost for a high-volume use case. Which strategy is MOST effective?

A.Use a fine-tuned smaller model for the specific task
B.Increase temperature to generate more diverse responses
C.Use the largest available model for all requests
D.Implement caching for identical or similar user queries
AnswerD

Caching avoids re-computation for repeated inputs, reducing latency and cost.

Why this answer

Caching frequent requests reduces redundant processing, reducing both latency and token costs. Batching can also help, but caching directly addresses repeated queries.

590
MCQeasy

Which Google Cloud generative AI model is specifically designed for code generation tasks?

A.Gemini
B.PaLM 2
C.Imagen
D.Codey
AnswerD

Codey is a family of models fine-tuned from PaLM 2 specifically for code generation, code completion, and code chat.

Why this answer

Codey is Google's model fine-tuned for code generation, completion, and chat. PaLM 2 is a general purpose LLM, Gemini is multimodal, and Imagen is for image generation.

591
MCQhard

A developer runs the command above to test a text classification model deployed on a Vertex AI endpoint. The model returns an error. What is the most likely cause?

A.The endpoint ID '789' does not exist in the project
B.The model is not deployed to any endpoint
C.The instance schema (e.g., 'content' field) does not match the model's expected input signature
D.The region 'us-central1' does not match the region where the model is deployed
AnswerC

The model expects a different input format (e.g., 'text' field or a structured object), leading to the format error.

Why this answer

Option C is correct because the most common cause of inference errors on Vertex AI endpoints is a mismatch between the input instance schema (e.g., the 'content' field in the JSON request) and the model's expected input signature. Vertex AI validates the request payload against the model's saved signature (typically from TensorFlow SavedModel or PyTorch TorchScript), and if the field names, types, or shapes do not match, the endpoint returns an error rather than a prediction.

Exam trap

Google Cloud often tests the misconception that endpoint or deployment configuration errors (like wrong region or missing endpoint) are the primary cause, when in reality the most frequent and subtle failure is a schema mismatch between the request payload and the model's input signature.

How to eliminate wrong answers

Option A is wrong because if the endpoint ID '789' did not exist in the project, the error would be a 404 'not found' HTTP status, not a model-level inference error. Option B is wrong because the command explicitly targets an endpoint (endpoint ID '789'), and if no model were deployed, the endpoint would return a 400 error indicating 'no model deployed', but the question states the model returns an error, implying a model is present but failing. Option D is wrong because Vertex AI endpoints are regional resources; if the region 'us-central1' did not match the deployment region, the API call would fail with a 404 or 403 error before reaching the model, not a model-level error.
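A schema mismatch of this kind can be caught before the request is sent. The sketch below is an illustration only: the expected signature (`{"text": str}`) is hypothetical, since the question does not state what the deployed model actually expects.

```python
# Hypothetical input signature for the deployed model: one string field "text".
EXPECTED_FIELDS = {"text": str}


def validate_instances(instances, expected=EXPECTED_FIELDS):
    """Return a list of human-readable schema problems (empty if none)."""
    errors = []
    for i, inst in enumerate(instances):
        missing = sorted(set(expected) - set(inst))
        unexpected = sorted(set(inst) - set(expected))
        if missing:
            errors.append(f"instance {i}: missing field(s) {missing}")
        if unexpected:
            errors.append(f"instance {i}: unexpected field(s) {unexpected}")
        for name, typ in expected.items():
            if name in inst and not isinstance(inst[name], typ):
                errors.append(f"instance {i}: field '{name}' should be {typ.__name__}")
    return errors
```

Under this assumed signature, a request that sends `{"content": "..."}` is reported as missing `text` and carrying an unexpected `content` field, which mirrors the failure described in option C.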

592
MCQeasy

According to Google's AI Principles, which of the following is a key commitment regarding privacy?

A.AI systems should incorporate privacy design principles, including notice and consent
B.AI systems should share all data with third parties for transparency
C.AI systems should collect as much data as possible to improve accuracy
D.AI systems should store data indefinitely for future analysis
AnswerA

This directly matches Google's AI Principles on privacy.

Why this answer

Option A is correct because Google's AI Principles explicitly commit to incorporating privacy design principles, such as notice and consent, into AI systems. This means that AI systems should be designed with privacy safeguards from the outset, ensuring users are informed about data collection and have control over their data, aligning with frameworks like GDPR and privacy-by-design.

Exam trap

Google Cloud exams often test the misconception that transparency requires full data sharing or that maximizing data collection is always beneficial for AI accuracy, when in fact privacy principles call for data minimization and user consent.

How to eliminate wrong answers

Option B is wrong because sharing all data with third parties for transparency violates core privacy commitments; transparency does not require indiscriminate data sharing, and Google's principles emphasize data minimization and user consent. Option C is wrong because collecting as much data as possible to improve accuracy contradicts the principle of data minimization and could lead to privacy violations; accuracy should be balanced with privacy, not achieved at any cost. Option D is wrong because storing data indefinitely for future analysis violates the principle of data retention limits and user control; AI systems should only retain data as long as necessary for specified purposes, with mechanisms for deletion.

593
MCQmedium

A developer wants to add real-time speech transcription to a customer call center application. They need low latency and high accuracy for multiple languages. Which Google AI API is most appropriate?

A.Speech-to-Text API
B.Natural Language API
C.Text-to-Speech API
D.Translation API
AnswerA

Speech-to-Text API converts audio to text, supporting real-time streaming and multiple languages.

Why this answer

The Speech-to-Text API is the correct choice because it is specifically designed to convert audio into text in real time, supporting over 125 languages and variants with low-latency streaming. It offers features like automatic punctuation, speaker diarization, and domain-specific models (e.g., phone call) that directly meet the requirements of a customer call center application needing high accuracy across multiple languages.

Exam trap

Google Cloud exams often test the distinction between APIs that process text (Natural Language, Translation) and those that process audio (Speech-to-Text, Text-to-Speech). The trap here is confusing the direction of conversion (speech-to-text vs. text-to-speech) or assuming a translation API can handle raw audio input.

How to eliminate wrong answers

Option B is wrong because the Natural Language API analyzes text for entities, sentiment, and syntax, but it does not process audio or perform speech transcription. Option C is wrong because the Text-to-Speech API converts text into spoken audio, which is the opposite direction of the required speech-to-text functionality. Option D is wrong because the Translation API translates text between languages but cannot transcribe speech from audio input.
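For orientation, a streaming transcription request for call audio might be configured roughly as follows. This is a sketch under assumptions: the field names follow the Speech-to-Text v1 `RecognitionConfig` and `StreamingRecognitionConfig` messages as commonly documented, the helper `build_streaming_config` is hypothetical, and the 8 kHz LINEAR16 audio format is an assumed telephony setup. Check the current API reference, including which languages support the `phone_call` model, before relying on it.

```python
def build_streaming_config(language_code: str, sample_rate_hz: int = 8000) -> dict:
    """Build a dict shaped like a StreamingRecognitionConfig (hypothetical helper)."""
    return {
        "config": {
            "encoding": "LINEAR16",                # assumed raw PCM telephony audio
            "sample_rate_hertz": sample_rate_hz,   # 8 kHz is typical for phone lines
            "language_code": language_code,        # e.g. "en-US", "es-ES"
            "enable_automatic_punctuation": True,
            "model": "phone_call",                 # domain-specific model for calls
        },
        # Emit partial transcripts while the caller is still speaking.
        "interim_results": True,
    }
```

The `interim_results` flag is what gives the low-latency behavior the question asks for: partial hypotheses arrive before each utterance is finalized.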

594
Multi-Selectmedium

Which THREE steps are required to secure a generative AI pipeline that uses Vertex AI and involves sensitive customer data?

Select 3 answers
A.Use VPC Service Controls to create a perimeter around Vertex AI resources
B.Apply IAM roles with least privilege and use service accounts for the pipeline
C.Expose the prediction endpoint publicly with an API key
D.Enable data encryption at rest using Cloud KMS
E.Disable audit logging to reduce data exposure
AnswersA, B, D

VPC-SC prevents data from leaking outside the perimeter.

Why this answer

VPC Service Controls are required to create a service perimeter around Vertex AI resources, preventing data exfiltration by restricting data movement across the perimeter boundary. This is critical for sensitive customer data because it mitigates the risk of unauthorized access or leakage, even from within the same project or organization.

Exam trap

The trap here is that candidates may confuse API key authentication (Option C) as a valid security measure, but for sensitive data, API keys lack identity binding and are considered a weak secret, whereas VPC Service Controls and IAM provide defense-in-depth.

595
MCQhard

A company is using a large language model to generate code reviews. They want to reduce token costs while maintaining quality. Which approach is MOST effective?

A.Reduce the max output tokens to 100
B.Use a zero-shot prompt instead of few-shot to reduce prompt tokens
C.Cache frequently used code snippets in a prompt template
D.Use a smaller model fine-tuned for code (e.g., Codey)
AnswerD

Codey is optimized for code tasks and is more cost-effective than a general large model.

Why this answer

A smaller model specialized for code lowers the cost per token while often performing well on code tasks. Placing frequently used snippets in a prompt template does not reduce the number of tokens sent with each request. Capping output at 100 tokens may truncate the review, and dropping few-shot examples may harm quality.
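The trade-off can be made concrete with simple arithmetic. All prices and token counts below are hypothetical placeholders chosen for illustration, not published rates.

```python
def request_cost(prompt_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """Cost of one request given per-1,000-token input and output prices."""
    return (prompt_tokens / 1000) * price_in_per_1k + (output_tokens / 1000) * price_out_per_1k


# Hypothetical prices (currency units per 1,000 tokens).
LARGE_MODEL = {"price_in_per_1k": 0.01, "price_out_per_1k": 0.03}
SMALL_CODE_MODEL = {"price_in_per_1k": 0.001, "price_out_per_1k": 0.002}

# Hypothetical review request: 2,000 prompt tokens with few-shot examples,
# 800 prompt tokens without them, 500 output tokens either way.
few_shot_large = request_cost(2000, 500, **LARGE_MODEL)
zero_shot_large = request_cost(800, 500, **LARGE_MODEL)
few_shot_small = request_cost(2000, 500, **SMALL_CODE_MODEL)
```

Under these assumed numbers, switching to the smaller model cuts the per-request cost far more than removing the few-shot examples does, and it keeps the examples that support output quality.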

596
Multi-Selectmedium

A company needs to deploy a generative AI application on Google Cloud that meets data residency requirements. Which THREE features should they enable? (Select three.)

Select 3 answers
A.Enable VPC Service Controls
B.Use a multi-region storage bucket
C.Use the global endpoint for low latency
D.Enable data residency boundaries in IAM
E.Select a specific region for Vertex AI resources
AnswersA, D, E

VPC-SC helps prevent data exfiltration and restricts data movement.

Why this answer

To satisfy data residency, the company must control where data is stored (region selection), ensure no data leaves that region (data boundaries), and use a service that supports regional endpoints (Vertex AI).
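Region selection can be audited mechanically, because Vertex AI resource names embed their location. The helper below is a hypothetical sketch that flags any resource whose name does not sit in the approved region; the project and resource IDs in the usage note are made up.

```python
import re

# Matches the location segment of names such as
# "projects/<project>/locations/<region>/endpoints/<id>".
_LOCATION = re.compile(r"/locations/([^/]+)/")


def resources_outside_region(resource_names, allowed_region):
    """Return the resource names that are not located in allowed_region."""
    offenders = []
    for name in resource_names:
        match = _LOCATION.search(name)
        if match is None or match.group(1) != allowed_region:
            offenders.append(name)
    return offenders
```

For example, with `allowed_region="europe-west4"`, a name containing `/locations/global/` or `/locations/us-central1/` would be returned as an offender, which corresponds to options B and C being wrong.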

597
MCQeasy

A startup wants to quickly prototype a gen AI application. Which Google Cloud service should they use first?

A.Vertex AI Workbench
B.Cloud TPUs
C.Gen AI Studio
D.Dataflow
AnswerC

Provides a low-code environment for quickly testing and iterating on gen AI models.

Why this answer

Gen AI Studio (since renamed Vertex AI Studio) provides a low-code/no-code interface for quickly prototyping generative AI applications using pre-trained models like PaLM 2 and Gemini. It allows startups to experiment with prompts, tune models, and deploy without managing infrastructure, making it the fastest path from idea to prototype.

Exam trap

The trap here is that candidates confuse Vertex AI Workbench (a general ML IDE) with Gen AI Studio (a generative AI prototyping tool), or assume that rapid prototyping requires custom hardware like TPUs, when Google explicitly designed Gen AI Studio for this purpose.

How to eliminate wrong answers

Option A is wrong because Vertex AI Workbench is a Jupyter-based development environment for building custom ML models, not a rapid prototyping tool for generative AI; it requires more setup and coding. Option B is wrong because Cloud TPUs are specialized hardware accelerators for training large models, not a service for quick prototyping—they involve significant configuration and cost. Option D is wrong because Dataflow is a serverless data processing service for batch and stream pipelines (e.g., ETL), unrelated to generative AI application prototyping.

598
Multi-Selecthard

A company is deploying a code generation assistant for internal developers. They want to ensure the generated code is secure and follows best practices. Which two Vertex AI features should they use? (Choose TWO)

Select 2 answers
A.Vertex AI Agent Builder for conversation
B.Grounding with Google Search for real-time security best practices
C.Vertex AI Model Evaluation to assess code quality metrics
D.Cloud DLP for data loss prevention
E.AutoML Image for code snippet images
AnswersB, C

Grounding retrieves current security guidelines to inform the model.

Why this answer

Grounding with Google Search can retrieve up-to-date security best practices, and Model Evaluation can assess code quality. The other options are not directly relevant to security or best practices.

599
Multi-Selectmedium

A machine learning team is using Vertex AI to train a custom model. They want to optimize hyperparameters automatically. Which TWO steps are necessary to set up hyperparameter tuning in Vertex AI? (Choose TWO)

Select 2 answers
A.Enable Vertex AI Experiments
B.Use a custom container with a GPU
C.Enable distributed training across multiple nodes
D.Define a hyperparameter metric in the training code
E.Create a HyperparameterTuningJob with parameter specifications
AnswersD, E

The training code must report a metric that Vertex AI uses to evaluate hyperparameter trials.

Why this answer

To run hyperparameter tuning, you must specify a hyperparameter metric in the training code and configure the tuning job in Vertex AI with the parameter specifications.
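The two required pieces can be sketched together. The dict below follows the REST shape of a `HyperparameterTuningJob` as commonly documented (`studySpec`, `metrics`, `parameters`, `maxTrialCount`, `parallelTrialCount`, `trialJobSpec`); the helper name, the metric tag `val_accuracy`, and the learning-rate range are hypothetical choices for illustration.

```python
# Hypothetical metric tag; the training code must report a metric under
# exactly this name for each trial. One common way to do that is the
# cloudml-hypertune helper:
#   hypertune.HyperTune().report_hyperparameter_tuning_metric(
#       hyperparameter_metric_tag=METRIC_TAG, metric_value=acc, global_step=epoch)
METRIC_TAG = "val_accuracy"


def build_tuning_job(display_name, metric_tag, trial_job_spec,
                     max_trials=20, parallel_trials=4):
    """Build a dict shaped like a HyperparameterTuningJob request body."""
    return {
        "displayName": display_name,
        "studySpec": {
            # Step D: the metric the service optimizes, matching the training code.
            "metrics": [{"metricId": metric_tag, "goal": "MAXIMIZE"}],
            # Step E: the search space for each hyperparameter.
            "parameters": [
                {
                    "parameterId": "learning_rate",
                    "doubleValueSpec": {"minValue": 1e-4, "maxValue": 1e-1},
                    "scaleType": "UNIT_LOG_SCALE",
                },
            ],
        },
        "maxTrialCount": max_trials,
        "parallelTrialCount": parallel_trials,
        "trialJobSpec": trial_job_spec,  # the custom training job run per trial
    }
```

If the `metricId` in the job and the tag reported by the training code differ, the service has nothing to compare trials on, which is why both steps are required.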

600
MCQhard

Refer to the exhibit. A developer sees this error when trying to deploy a model from Vertex AI Model Registry. What is the most likely cause?

A.The region is not supported
B.The developer used the model display name instead of the full resource name
C.The model is not published
D.The model is in a different project
AnswerB

Display name is not a valid model reference; the full resource path is required.

Why this answer

The error occurs because Vertex AI Model Registry requires the full resource name (e.g., 'projects/{project}/locations/{region}/models/{model_id}') to deploy a model, not just the display name. The display name is a human-readable label that is not unique within a project, while the full resource name uniquely identifies the model. Using the display name causes the API to fail with a 'not found' or 'invalid argument' error.

Exam trap

Google Cloud often tests the distinction between display names (non-unique, human-readable) and resource names (unique, API-required) in cloud services like Vertex AI, where candidates mistakenly assume display names can be used interchangeably with resource identifiers.

How to eliminate wrong answers

Option A is wrong because Vertex AI supports model deployment in all regions where the service is available, and the error message does not indicate a regional restriction. Option C is wrong because a model can be deployed from the registry even if it is not published to the public; publishing is only required for sharing with external users or making it available in the Model Garden. Option D is wrong because the error would reference a cross-project permission issue (e.g., 'permission denied' or 'resource not found in project'), not a display name mismatch.
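A quick client-side check can tell the two kinds of identifier apart before a deploy call is made. This sketch assumes the resource name pattern quoted in the explanation, plus an optional `@version` suffix; the function name and example values are hypothetical.

```python
import re

# projects/{project}/locations/{region}/models/{model_id}, optionally "@{version}".
_MODEL_NAME = re.compile(r"projects/[^/]+/locations/[^/]+/models/[^/@]+(@[^/@]+)?")


def is_full_model_resource_name(value: str) -> bool:
    """Return True if value looks like a full Vertex AI model resource name."""
    return _MODEL_NAME.fullmatch(value) is not None
```

A display name such as `my-classifier` fails this check, while `projects/my-project/locations/us-central1/models/1234567890` passes, so the mistake in option B can be caught before the API returns an error.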
