Knowledge + Practice

Google Cloud Generative AI Leader: Questions 826–900

997 questions total · 14 pages · All types, answers revealed

Page 12 of 14

826

Multi-Select · easy

Which THREE strategies should be combined to effectively reduce biased outputs in a generative AI model? (Choose three.)

Select 3 answers

A. Implement safety filters targeting hate speech and stereotypes.

B. Conduct human evaluation and feedback loops.

C. Use diverse few-shot examples that represent different demographics.

D. Raise the temperature to increase output variability.

E. Fine-tune the model on a biased dataset to learn patterns.

Answers: A, B, C

Safety filters block explicitly biased content.

Why this answer

Option A is correct because implementing safety filters targeting hate speech and stereotypes directly blocks the generation of biased or harmful content at the output layer. These filters use predefined rule sets or trained classifiers to detect and suppress language that reflects demographic or cultural biases, reducing the risk of the model producing offensive or stereotypical responses.

Exam trap

Google Cloud often tests the misconception that increasing randomness (temperature) or training on biased data can somehow reduce bias, when in fact both actions worsen the problem by either amplifying noise or embedding the bias deeper into the model's weights.
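To see why raising the temperature (Option D) adds variability without removing bias, here is a minimal sketch of temperature-scaled softmax. The logits are made-up values for illustration, not output from any real model.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits: assume the first token is a stereotyped completion.
logits = [2.0, 1.0, 0.5]
low = softmax_with_temperature(logits, 0.5)   # sharper distribution
high = softmax_with_temperature(logits, 2.0)  # flatter distribution
```

A higher temperature flattens the distribution, but the ranking of tokens is unchanged, so the stereotyped token in this sketch remains the most likely one.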

827

MCQ · easy

A large e-commerce company is experiencing high costs for their generative AI product recommendation system. The system generates personalized product descriptions for millions of users daily. The team wants to reduce cost while maintaining quality. They are using a fine-tuned version of a large foundation model hosted on Vertex AI. The current cost is driven by the number of tokens processed. Which approach should they take?

A. Optimize prompts to generate shorter, more concise descriptions

B. Switch to a larger, more capable foundation model

C. Retrain the model with more product data to improve efficiency

D. Increase the batch size of inference requests

Answer: A

Shorter outputs use fewer tokens, reducing cost.

Why this answer

Option A is correct because prompt engineering to reduce output length decreases token usage per request, directly lowering cost without model changes. Option B (switching to a larger model) increases cost. Option C (retraining with more data) does not affect inference cost. Option D (increasing batch size) may not reduce per-request cost.
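The token arithmetic behind Option A can be sketched as follows. The request volume, token counts, and per-1,000-token prices are hypothetical placeholders, not actual Vertex AI pricing.

```python
def daily_cost(requests_per_day, input_tokens, output_tokens,
               input_price_per_1k, output_price_per_1k):
    """Estimate daily spend when billing is proportional to tokens processed."""
    per_request = ((input_tokens / 1000) * input_price_per_1k
                   + (output_tokens / 1000) * output_price_per_1k)
    return requests_per_day * per_request

# Hypothetical workload: 1M requests/day, 200 input tokens per request.
before = daily_cost(1_000_000, 200, 300, 0.001, 0.002)  # 300-token descriptions
after = daily_cost(1_000_000, 200, 150, 0.001, 0.002)   # 150-token descriptions
```

Under these assumed numbers, halving the output length lowers the daily cost without touching the model.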

828

Multi-Select · hard

A company is using a large language model for automated translation of legal contracts. They find that the translations sometimes alter the meaning of specific clauses. Which TWO approaches would most effectively preserve the original meaning? (Choose two.)

Select 2 answers

A. Provide the full contract context in a single prompt.

B. Set top-p=0.1 to limit the vocabulary to the most likely tokens.

C. Fine-tune the model on a parallel corpus of legal translations.

D. Use a glossary of key legal terms with their translations.

E. Increase the temperature to allow more creative phrasing.

Answers: C, D

Fine-tuning on domain-specific translations improves accuracy.

Why this answer

Option C is correct because fine-tuning on a parallel corpus of legal translations adapts the model's weights to the specific domain, ensuring that legal terminology and clause structures are translated with high fidelity. This supervised learning approach directly aligns the model's output with ground-truth translations, preserving original meaning by learning the precise mapping between source and target legal texts.

Exam trap

Google Cloud often tests the misconception that prompt engineering alone (Option A) or parameter tweaks like top-p and temperature (Options B and E) can substitute for domain-specific fine-tuning or explicit glossary control, when in fact these methods do not address the root cause of semantic drift in specialized translations.

829

MCQ · medium

A company deployed a generative AI chatbot using Vertex AI PaLM API for customer support. Users report high latency (average 5 seconds per response). They need to reduce latency without significantly affecting response quality. Which design change should they prioritize?

A. Apply model quantization to the deployed model

B. Migrate the chatbot to run on edge devices

C. Increase the batch size of inference requests

D. Switch to a larger, more powerful foundation model

Answer: A

Quantization reduces model size and speeds inference with minor accuracy trade-offs.

Why this answer

Model quantization reduces the precision of the model's weights (e.g., from FP32 to INT8), which decreases the computational load and memory footprint during inference. This directly lowers latency per request on the Vertex AI PaLM API while preserving most of the model's accuracy, making it the most effective single change for reducing response time without significantly degrading quality.

Exam trap

Google Cloud often tests the misconception that increasing computational power (larger model) or batching always improves latency, when in fact these changes can increase per-request delay or degrade quality in interactive applications.

How to eliminate wrong answers

Option B is wrong because edge devices have limited compute and memory, so they cannot host a large foundation model that is served through the cloud-based PaLM API; forcing the workload onto them would typically increase latency and reduce response quality. Option C is wrong because increasing batch size improves throughput for bulk processing but does not reduce per-request latency; in fact, it can increase the time to first token for individual requests. Option D is wrong because switching to a larger, more powerful foundation model increases computational requirements and inference time, directly worsening latency rather than reducing it.
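As a rough illustration of the FP32-to-INT8 idea mentioned above, the following sketch applies symmetric quantization to a handful of made-up weights. It only demonstrates the concept; it is not how any particular Vertex AI model is served.

```python
def quantize_int8(weights):
    """Symmetric quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]

weights = [0.82, -1.27, 0.05, 0.4]  # hypothetical FP32 weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each restored weight differs from the original by at most half a quantization step, which is the "minor accuracy trade-off" the answer refers to.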

830

MCQ · medium

A developer is building a real-time speech transcription application for customer support calls. The audio is streamed, and the transcription must be returned with low latency. Which Google Cloud AI service should they use?

A. Natural Language API

B. Cloud Speech-to-Text with streaming recognition

C. Cloud Text-to-Speech

D. Vertex AI with a custom model

Answer: B

Streaming recognition allows for real-time transcription of audio as it is being captured, meeting low-latency requirements.

Why this answer

Speech-to-Text supports streaming recognition, making it suitable for real-time transcription with low latency.

831

Multi-Select · easy

A company is choosing a generative AI model for code generation. Which TWO considerations are most important?

Select 2 answers

A. The total number of model parameters

B. Whether the model's training data includes the target programming languages

C. The open-source license of the model

D. The maximum context length supported by the model

E. The latency of the model's inference endpoint

Answers: B, D

If the model hasn't seen the language, it will generate poor code.

Why this answer

Option B is correct because a generative AI model for code generation must have been trained on the target programming languages to produce syntactically and semantically correct code. Without such training data, the model cannot understand language-specific syntax, libraries, or idioms, leading to irrelevant or erroneous outputs.

Exam trap

The trap here is that candidates often assume more parameters (A) or lower latency (E) are always better, but Google Cloud tests the understanding that domain-specific training data relevance (B) and context length (D) are critical for code generation accuracy and handling long code sequences.

832

MCQ · medium

A company needs to extract structured data from scanned invoices (invoice number, date, total amount) using a pre-built AI solution. Which Google Cloud service is MOST appropriate?

A. Natural Language AI

B. Translation API

C. Document AI

D. Vision AI

Answer: C

Document AI has pre-trained processors for invoices and other documents.

Why this answer

Document AI is specifically designed for processing documents like invoices and extracting structured data. Vision AI is for general image analysis, Natural Language AI for text, and Translation API for translation.

833

MCQ · medium

A healthcare startup is developing an AI system to assist radiologists in detecting tumors from X-ray images. Which Google AI Principle is MOST directly applicable to this use case?

A. Incorporate privacy design principles

B. Be built and tested for safety

C. Be socially beneficial

D. Avoid creating or reinforcing unfair bias

Answer: B

Safety is paramount in medical applications; the system must be rigorously tested to avoid harm.

Why this answer

The principle 'be built and tested for safety' directly applies to medical applications where incorrect detection could harm patients. The other principles are also relevant but safety is the most directly applicable to a diagnostic tool.

834

MCQ · medium

During model evaluation, a team observes good performance on training data but poor on validation data. Which regularization technique is most appropriate to address this?

A. Add more training data

B. Increase the learning rate

C. Apply dropout

D. Use a larger batch size

Answer: C

Dropout is a regularization method that prevents co-adaptation of neurons, reducing overfitting.

Why this answer

The scenario describes overfitting, where the model memorizes training data but fails to generalize to unseen validation data. Dropout is a regularization technique that randomly deactivates a fraction of neurons during training, forcing the network to learn more robust features and reducing co-adaptation, which directly mitigates overfitting.

Exam trap

Google Cloud often tests the distinction between techniques that improve generalization (regularization) versus those that improve optimization (learning rate, batch size), leading candidates to confuse data augmentation or hyperparameter tuning with regularization methods like dropout.

How to eliminate wrong answers

Option A is wrong because adding more training data can help reduce overfitting but is not a regularization technique; it addresses data scarcity, not the core issue of model complexity. Option B is wrong because increasing the learning rate can cause training instability, divergence, or overshooting of the loss minimum, and does not prevent overfitting. Option D is wrong because using a larger batch size often leads to sharper minima and poorer generalization, potentially worsening overfitting, and is not a regularization method.
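The random deactivation described above can be sketched with "inverted" dropout, one common formulation in which kept activations are rescaled so their expected value is unchanged. The function name and the `rng` parameter are illustrative choices, not part of any specific framework.

```python
import random

def dropout(activations, rate, training=True, rng=random):
    """Inverted dropout: zero each unit with probability `rate` during training."""
    if not training or rate == 0.0:
        return list(activations)  # inference: pass values through unchanged
    keep = 1.0 - rate
    # Kept units are scaled by 1/keep so the expected activation stays the same.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)  # seeded only to make the sketch repeatable
out = dropout([1.0] * 1000, rate=0.5, rng=rng)
```

Because a different random subset of units is dropped on every training step, no single unit can rely on specific neighbors, which is what discourages co-adaptation.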

835

MCQ · hard

A gaming company is using Vertex AI Imagen to create concept art. They have a stable pipeline that generates images based on text prompts. Recently, they introduced a new feature: using a reference image to guide the style (image-to-image generation). However, when using a reference image, the generated images often have unnatural color shifts and artifacts. The team suspects that the reference image is being resized to a resolution that the model wasn't trained on. They are using the default Imagen settings. What is the most likely cause and the best solution?

A. Increase the number of inference steps to improve detail.

B. The reference image is being resized to a non-standard aspect ratio; preprocess the image to the recommended resolution and aspect ratio.

C. Reduce the style weight in the image-to-image prompt.

D. Switch to a different image generation model like Stable Diffusion.

Answer: B

Imagen works best with specific input dimensions; incorrect resizing causes artifacts.

Why this answer

Option B is correct because the default Imagen settings expect input images at specific resolutions (e.g., 256x256, 512x512, or 1024x1024) and a 1:1 aspect ratio. When a reference image is resized to a non-standard resolution or aspect ratio, the model's internal processing can introduce artifacts and unnatural color shifts due to misalignment with its training distribution. Preprocessing the image to the recommended resolution and aspect ratio ensures the model operates within its optimal input space, eliminating these issues.

Exam trap

The trap here is that candidates may confuse image quality issues with model hyperparameters (like inference steps or style weight) rather than recognizing that the fundamental input preprocessing—specifically resolution and aspect ratio—is the most common cause of artifacts in image-to-image generation with Imagen.

How to eliminate wrong answers

Option A is wrong because increasing inference steps primarily refines noise reduction and detail, but it does not address the root cause of resolution mismatch; it may even amplify artifacts from a poorly resized input. Option C is wrong because reducing style weight controls how strongly the reference image influences the output, but it does not fix the fundamental problem of the reference image being at a non-standard resolution or aspect ratio, which causes artifacts regardless of style influence. Option D is wrong because switching to a different model like Stable Diffusion would not resolve the resolution preprocessing issue; the team would still need to properly resize the reference image for that model, and the problem is with their pipeline, not the model itself.
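One way to do the preprocessing in Option B is to crop the reference image to a centered square before resizing it. The helper below is a hypothetical sketch that only computes the crop box; it assumes a 1:1 target aspect ratio as stated in the explanation above.

```python
def center_square_crop_box(width, height):
    """Return (left, top, right, bottom) for the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side

# A 1920x1080 reference image would be cropped to its central 1080x1080 region.
box = center_square_crop_box(1920, 1080)
```

The resulting box can be handed to an image library's crop function, followed by a resize to whatever resolution the model documentation recommends.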

836

MCQ · hard

A healthcare organization is using a generative AI model to assist in diagnosing rare diseases from patient symptoms. They want to ensure that model outputs are explainable and that the clinician can verify the reasoning. Which feature should they prioritize?

A. Confidence indicators

B. Human oversight mechanisms

C. Grounding (citing sources)

D. Chain-of-thought reasoning

Answer: D

Chain-of-thought shows the intermediate reasoning steps, enabling clinicians to verify the logic.

Why this answer

Chain-of-thought reasoning is the correct feature because it enables the generative AI model to produce an explicit, step-by-step logical path from patient symptoms to a diagnosis. This allows clinicians to verify each reasoning step, ensuring explainability and trust in the model's output, which is critical for rare disease diagnosis where transparency is paramount.

Exam trap

Google Cloud often tests the distinction between explainability (how the model reached a conclusion) and interpretability (what the model output means), causing candidates to confuse grounding or confidence indicators with the step-by-step reasoning required for clinical verification.

How to eliminate wrong answers

Option A is wrong because confidence indicators only provide a numerical or probabilistic score (e.g., 85% confidence) without revealing the underlying reasoning, which fails to meet the requirement for verifiable, explainable outputs. Option B is wrong because human oversight mechanisms (e.g., requiring clinician approval) address accountability and safety but do not inherently make the model's internal reasoning transparent or explainable to the clinician. Option C is wrong because grounding (citing sources) ensures factual accuracy by linking outputs to external data, but it does not expose the model's step-by-step reasoning process, which is essential for verifying the diagnostic logic.

837

MCQ · hard

A financial services firm needs to deploy a large language model (LLM) for analyzing sensitive client documents. They require the model to run within their Virtual Private Cloud (VPC) with no internet access and must comply with data residency regulations. Which Google Cloud generative AI offering should they use?

A. Vertex AI Model Garden with private endpoints and VPC Service Controls

B. Vertex AI Search

C. Cloud Run

D. Vertex AI Workbench

Answer: A

This combination allows secure, private deployment of LLMs within a VPC.

Why this answer

Option A is correct because Vertex AI Model Garden with private endpoints and VPC Service Controls allows the LLM to be deployed entirely within the customer's VPC, with no internet egress, and enforces data residency by restricting data movement to the configured VPC boundary. Private endpoints use Private Service Connect to route inference traffic through internal IPs, while VPC Service Controls prevent data exfiltration and ensure compliance with residency regulations.

Exam trap

The trap here is that candidates often confuse Vertex AI Model Garden (a deployment and management service for foundation models) with Vertex AI Workbench (a development environment) or Vertex AI Search (a retrieval service), and overlook the specific requirement for VPC isolation and no internet access, which only Model Garden with private endpoints and VPC Service Controls can satisfy.

How to eliminate wrong answers

Option B is wrong because Vertex AI Search is a managed search service that indexes and retrieves data from external sources (e.g., websites, Cloud Storage) and does not support deploying an LLM within a VPC with no internet access; it relies on Google-managed endpoints and cannot enforce strict VPC isolation. Option C is wrong because Cloud Run is a serverless compute platform that can run custom containers, but it does not natively provide private endpoints for LLM inference or VPC Service Controls to block internet access; it would require additional networking configuration (e.g., VPC connectors) and does not offer the same data residency guarantees as Vertex AI's managed VPC controls. Option D is wrong because Vertex AI Workbench is a Jupyter-based development environment for building and training models, not a deployment service for running LLMs in production; it is designed for experimentation, not for serving inference with VPC isolation and compliance controls.

838

MCQ · easy

A marketing company wants to fine-tune a generative AI model to adopt a specific brand voice. Which tuning method is most appropriate?

A. RLHF with general user feedback

B. Grounding with external knowledge base

C. Supervised fine-tuning with labeled examples of the brand voice

D. Prompt engineering with system instructions

Answer: C

Labeled examples directly teach the model the desired tone and style.

Why this answer

Supervised fine-tuning (SFT) is the most appropriate method because it directly trains the model on a curated dataset of input-output pairs that exemplify the desired brand voice. By adjusting the model's weights through backpropagation on labeled examples, the model learns to mimic the specific tone, vocabulary, and stylistic patterns of the brand, making it the most precise approach for adopting a fixed voice.

Exam trap

Google Cloud often tests the misconception that prompt engineering (Option D) is sufficient for fine-grained style control, when in reality it only provides a weak, non-parametric signal that cannot reliably enforce a consistent brand voice across varied contexts.

How to eliminate wrong answers

Option A is wrong because RLHF with general user feedback optimizes for broad human preferences (e.g., helpfulness, harmlessness) rather than a specific, consistent brand voice; it introduces variance from diverse user ratings that can dilute the target style. Option B is wrong because grounding with an external knowledge base retrieves factual information (e.g., via RAG) to reduce hallucination but does not alter the model's generation style or tone; it cannot teach the model to adopt a brand voice. Option D is wrong because prompt engineering with system instructions provides a static, high-level directive that the model may follow inconsistently, especially for nuanced stylistic constraints; it does not update model weights and fails to embed the brand voice deeply into the model's behavior across diverse prompts.

839

MCQ · hard

A large insurance company is using generative AI to automate claims processing. They have deployed a custom fine-tuned model on Vertex AI that reads claim documents and extracts key information. Recently, they noticed that the model’s performance degrades over time for certain claim types, leading to incorrect payouts. The team needs to detect and address model drift with minimal manual intervention. They have a data pipeline that captures incoming claims and user feedback on predictions. Which approach should they take?

A. Implement a human review process for all claims the model processes

B. Set up continuous evaluation with automated retraining pipelines based on performance metrics

C. Switch to a simpler rule-based system to avoid drift

D. Manually retrain the model monthly using a snapshot of recent claims

Answer: B

Automates drift detection and model updates with minimal manual intervention.

Why this answer

Option B is correct because it establishes a closed-loop MLOps pipeline where continuous evaluation of performance metrics (e.g., precision, recall, or F1-score on streaming data) triggers automated retraining when drift is detected. This minimizes manual intervention while ensuring the model adapts to distribution shifts in claim types, which is critical for maintaining accurate payouts in production.

Exam trap

Google Cloud often tests the misconception that periodic manual retraining (Option D) is sufficient, but the trap here is that it ignores the need for real-time drift detection and automated response, which is essential for production systems handling high-stakes financial decisions.

How to eliminate wrong answers

Option A is wrong because implementing human review for all claims defeats the purpose of automation and introduces significant operational cost and latency, failing the requirement for minimal manual intervention. Option C is wrong because switching to a simpler rule-based system cannot handle the complexity and variability of claim documents, and it will still suffer from drift as claim patterns evolve over time. Option D is wrong because manually retraining monthly on a snapshot ignores real-time drift detection and may miss sudden shifts between retraining cycles, leading to prolonged periods of degraded performance.
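The closed-loop idea behind Option B can be sketched as a rolling accuracy check over user feedback that signals when retraining should start. The class name, the tolerance, and the window size are hypothetical; a production pipeline would wire this signal into its own orchestration tooling.

```python
from collections import deque

class DriftMonitor:
    """Track recent prediction outcomes and flag a drop below the baseline."""

    def __init__(self, baseline_accuracy, tolerance=0.05, window=100):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.results = deque(maxlen=window)  # keeps only the latest outcomes

    def record(self, prediction_was_correct):
        self.results.append(1 if prediction_was_correct else 0)

    def should_retrain(self):
        if len(self.results) < self.results.maxlen:
            return False  # wait until a full window of feedback is available
        accuracy = sum(self.results) / len(self.results)
        return accuracy < self.baseline - self.tolerance
```

In this sketch, each piece of user feedback calls `record`, and a scheduler would poll `should_retrain` to decide whether to launch the retraining job.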

840

Multi-Select · hard

A company is deploying a generative AI chatbot for customer support. They want to ensure that the chatbot does not generate harmful content and that they can customize the safety thresholds. Which TWO features in Vertex AI should they use? (Select 2)

Select 2 answers

A. AutoML Tables

B. Custom safety settings (adjustable thresholds)

C. Model Cards

D. Safety filters

E. People + AI Guidebook

Answers: B, D

Allows customization of safety filter sensitivity.

Why this answer

Safety filters block harmful categories; custom safety settings allow adjustment of thresholds. Model Cards and the People + AI Guidebook are not operational safety controls.
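The relationship between a filter and its adjustable threshold can be illustrated with a small sketch. The category names, scores, and threshold values are invented for the example and do not reflect the actual Vertex AI API or its categories.

```python
def is_blocked(scores, thresholds):
    """Block a response when any category score meets or exceeds its threshold."""
    return any(scores.get(category, 0.0) >= limit
               for category, limit in thresholds.items())

# Hypothetical per-category scores returned by a safety classifier.
scores = {"harassment": 0.35, "dangerous_content": 0.10}

strict = {"harassment": 0.3, "dangerous_content": 0.3}   # blocks more content
lenient = {"harassment": 0.8, "dangerous_content": 0.8}  # blocks less content
```

The same response is blocked under the strict thresholds and allowed under the lenient ones, which is the kind of tuning that custom safety settings make possible.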

841

MCQ · medium

A developer is using Gemini 1.5 Pro and needs to process a 2-hour video to answer questions about its content. The video is stored in Cloud Storage. What is the most efficient approach?

A. Extract frames using Video Intelligence API and then send them as images to Gemini

B. Transcribe the video using Chirp, then analyze the text with Gemini

C. Use a custom model fine-tuned on video understanding tasks

D. Send the video file as part of the prompt to Gemini 1.5 Pro

Answer: D

Gemini 1.5 Pro can directly process video files, understanding both audio and visual content.

Why this answer

Gemini 1.5 Pro supports video input natively; you can pass the video directly (via GCS URI) and ask questions. Transcribing first adds latency and loses visual context.

842

MCQ · medium

A media company uses generative AI to produce personalized news summaries for subscribers. They notice that the summaries sometimes contain factual inaccuracies, leading to customer complaints. The team needs to improve accuracy without slowing down the generation speed. They are using a pre-trained model via Vertex AI. What strategy should they implement?

A. Switch to a larger, more accurate foundation model

B. Fine-tune the model on a dataset of verified news articles

C. Implement retrieval-augmented generation (RAG) with a trusted knowledge base

D. Add a human-in-the-loop review for every summary

Answer: C

RAG provides factual grounding without sacrificing speed.

Why this answer

Option C is correct because retrieval-augmented generation (RAG) grounds the model's output in a trusted, external knowledge base, allowing it to retrieve verified facts in real time without retraining. This directly addresses factual inaccuracies while maintaining generation speed, as the pre-trained model remains unchanged and only the retrieval step is added. RAG avoids the latency of human review and the computational cost of fine-tuning or switching models.

Exam trap

Google Cloud often tests the misconception that fine-tuning is the default solution for accuracy issues, but the trap here is that RAG provides a faster, more scalable way to ground outputs in verified data without retraining, which is critical when speed and accuracy must both be maintained.

How to eliminate wrong answers

Option A is wrong because switching to a larger foundation model would increase inference latency and computational cost, contradicting the requirement to not slow down generation speed, and it does not guarantee improved factual accuracy without additional grounding. Option B is wrong because fine-tuning on a dataset of verified news articles requires significant time, data, and compute resources, and it may not prevent hallucinations on unseen topics, while also risking catastrophic forgetting of the model's general capabilities. Option D is wrong because adding a human-in-the-loop review for every summary introduces unacceptable latency and operational overhead, making it impractical for real-time personalized news generation at scale.
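The retrieve-then-generate flow of RAG can be sketched in a few lines. This toy version ranks documents by word overlap purely for illustration; real systems typically use embedding-based search, and the function names and sample documents here are hypothetical.

```python
def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def build_grounded_prompt(question, documents):
    """Prepend the retrieved sources so the model answers from verified text."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"

docs = [
    "The city council approved the budget on Monday.",
    "The local team won the championship final.",
]
prompt = build_grounded_prompt("Who won the championship final?", docs)
```

The pre-trained model itself is untouched; only the prompt changes, which is why this approach avoids retraining.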

843

Multi-Select · medium

A company is deploying Gemini in an enterprise application and needs to choose between Gemini Pro and Gemini Flash for cost optimization. The application has high throughput and can tolerate lower latency. Which TWO considerations should guide the choice? (Choose 2)

Select 2 answers

A. Gemini Flash runs on-device for latency-sensitive applications

B. Gemini Nano is the best choice for high-throughput server workloads

C. Gemini Pro provides higher quality outputs at a higher cost

D. Gemini Pro is only available in Google AI Studio

E. Gemini Flash is optimized for lower cost and higher throughput

Answers: C, E

Pro is more capable but more expensive; if the task requires higher quality, Pro may be worth the cost.

Why this answer

Gemini Flash is optimized for high throughput and lower cost, while Gemini Pro offers higher quality at a higher cost. For high-volume, cost-efficient inference, Flash is the better fit; if quality is critical, Pro is warranted. On-device models (Nano) are not relevant here.

844

MCQ · hard

A global company wants to deploy a GenAI application that must comply with GDPR and CCPA. They need to ensure that user data submitted to the LLM is not used for model training or improvement. Which combination of actions should they take on Vertex AI?

A. Disable data logging and use a pai-saas-llm model with a contract that prohibits training on inference data

B. Enable data logging for debugging and rely on model cards for compliance

C. Use the default Vertex AI settings, which automatically anonymize data

D. Store all user prompts in a separate BigQuery table for audit trails

Answer: A

Disabling logging prevents data retention; contractual prohibition ensures data is not used for training.

Why this answer

Disabling data logging and using a model that does not train on user data (like a custom deployment) ensures compliance. Default settings do not automatically anonymize data, so relying on them alone is not sufficient.

845

MCQ · hard

A legal firm wants to use a generative AI model to draft contract clauses. They need to ensure the model's outputs cite specific legal precedents and statutes, and that the reasoning behind each clause is transparent. Which combination of explainability techniques should they prioritize?

A. Confidence indicators and model cards

B. Content safety filters and human oversight

C. Grounding and chain-of-thought reasoning

D. Datasheets for Datasets and PAIR Explorables

Answer: C

Grounding ensures outputs cite sources (legal precedents/statutes); chain-of-thought shows reasoning steps, providing transparency.

846

MCQ · hard

A company is comparing Google Cloud Vertex AI vs AWS Bedrock vs Azure OpenAI. Their application requires grounding responses with real-time search results from the internet. Which platform's feature uniquely supports this requirement?

A. Gemini API without grounding

B. Vertex AI with Grounding + Google Search

C. Azure OpenAI with Bing Search grounding

D. AWS Bedrock with Knowledge Bases

Answer: B

Vertex AI uniquely provides Grounding with Google Search, allowing real-time web results to be incorporated into model responses.

Why this answer

Vertex AI with Grounding + Google Search is the correct answer because it uniquely provides native, real-time grounding against live internet search results via Google Search, enabling the model to retrieve and cite up-to-date information from the web. This feature is directly integrated into Vertex AI's model serving pipeline, allowing responses to be grounded in current, publicly available data without requiring external API calls or custom retrieval logic.

Exam trap

Google Cloud often tests the distinction between grounding with static knowledge bases (like AWS Bedrock Knowledge Bases) and dynamic, real-time internet search grounding. Candidates may select Azure OpenAI with Bing Search grounding because it also uses a search engine, but the question asks for the platform's unique feature, and Vertex AI's native integration with Google Search is the only option built directly into the model serving platform without requiring a separate search service subscription or API key.

How to eliminate wrong answers

Option A is wrong because the Gemini API without grounding does not support any form of internet search grounding; it relies solely on the model's pre-trained knowledge, which is static and cannot incorporate real-time search results. Option C is wrong because Azure OpenAI with Bing Search grounding, while capable of grounding with Bing, is not a unique feature of the Azure platform in this context—the question asks for the platform whose feature uniquely supports the requirement, and Vertex AI's Grounding + Google Search is the only one that is natively built into the model serving infrastructure without requiring a separate search service integration. Option D is wrong because AWS Bedrock with Knowledge Bases is designed for grounding against private, static data sources (e.g., documents, databases) and does not support real-time internet search grounding; it lacks the ability to dynamically query live web content.

847

MCQ · easy

What is the key advantage of using adapter-based fine-tuning methods like LoRA compared to full fine-tuning of a large language model?

A. LoRA significantly reduces the number of trainable parameters, making fine-tuning more memory-efficient

B. LoRA is faster at inference time compared to the fully fine-tuned model

C. LoRA eliminates the need for a base model

D. LoRA enables training on a larger dataset than full fine-tuning

Answer: A

LoRA updates only low‑rank matrices, drastically cutting trainable parameters and memory usage while maintaining performance.

Why this answer

LoRA (Low-Rank Adaptation) injects trainable low-rank matrices into the transformer layers while keeping the original model weights frozen. This drastically reduces the number of trainable parameters (by as much as 10,000x for very large models), which lowers GPU memory requirements for storing optimizer states and gradients during training, making fine-tuning feasible on more modest hardware.

Exam trap

Google Cloud often tests the misconception that parameter-efficient methods like LoRA improve inference speed, when in reality they primarily reduce memory during training and do not accelerate inference.

How to eliminate wrong answers

Option B is wrong because LoRA does not change the inference path; the adapter weights are merged into the base model or applied as a separate forward pass, so inference speed is comparable to or slightly slower than a fully fine-tuned model, not faster. Option C is wrong because LoRA is an additive method that requires the original base model to remain frozen; it cannot function without the base model. Option D is wrong because LoRA does not inherently allow training on a larger dataset; the dataset size is independent of the fine-tuning method, and full fine-tuning can also use large datasets if sufficient memory is available.

848

MCQhard

A large enterprise wants to deploy multiple generative AI models across different business units while ensuring cost governance and usage tracking. Which Google Cloud solution is best suited?

A.Use Vertex AI Endpoint with monitoring

B.Deploy each model in a separate project with IAM policies

C.Implement a custom cost allocation using labels

D.Use Cloud Billing budgets and alerts per model

AnswerB

Projects isolate resources and costs per business unit.

Why this answer

Option B is correct because deploying each model in a separate Google Cloud project with IAM policies provides the strongest isolation for cost governance and usage tracking. This approach ensures that each business unit's model usage is billed to its own project, enabling granular cost allocation and independent monitoring without cross-project interference. It also allows per-project budget alerts and usage quotas, directly addressing the enterprise's need for decentralized cost control.

Exam trap

The trap here is that candidates often confuse cost allocation mechanisms (like labels or budgets) with true cost isolation, assuming that tagging or alerting alone can enforce per-business-unit governance without the structural separation that separate projects provide.

How to eliminate wrong answers

Option A is wrong because Vertex AI Endpoint with monitoring tracks model performance and latency but does not inherently isolate costs per business unit; it aggregates usage under a single project, making per-unit cost governance difficult. Option C is wrong because custom cost allocation using labels requires manual tagging and can be inconsistent or incomplete, leading to inaccurate cost tracking; labels are metadata, not a billing boundary. Option D is wrong because Cloud Billing budgets and alerts per model are not natively supported; budgets apply at the project or billing account level, not per model, and cannot enforce cost isolation across multiple business units.

849

MCQmedium

A developer is using the Gemini API to classify customer emails. They want to ensure that the model always returns one of three predefined labels: 'complaint', 'inquiry', or 'feedback'. Which model configuration is MOST appropriate?

A.Set temperature to 1.0 and top-p to 0.9 to allow creativity while constraining via system instructions

B.Fine-tune the model on a dataset of labeled emails to memorize the three classes

C.Use top-k sampling with k=50 and no temperature adjustment

D.Set temperature to 0.0 and use few-shot examples with required labels in the prompt

AnswerD

Low temperature makes the model deterministic. Combined with explicit labels in few-shot examples, it strongly biases output to the allowed set.

Why this answer

Setting temperature to 0.0 makes decoding effectively greedy, minimizing randomness and giving consistent output. Combined with few-shot examples that explicitly show the three required labels ('complaint', 'inquiry', 'feedback') in the prompt, this configuration strongly biases the model toward returning only those labels, which makes it the most appropriate of the listed options for a strict classification task.
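
To see why a low temperature helps, the minimal sketch below applies a temperature-scaled softmax to some made-up token scores and then checks a raw model output against the allowed labels. The scores and the `normalize_label` helper are assumptions for illustration; this is not how the Gemini API is called.

```python
import math

ALLOWED_LABELS = ("complaint", "inquiry", "feedback")


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: all probability on the top score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def normalize_label(raw_output):
    """Return the label if it is one of the allowed values, otherwise None."""
    cleaned = raw_output.strip().strip("'\".").lower()
    return cleaned if cleaned in ALLOWED_LABELS else None


# Hypothetical scores for the three labels plus one off-list token.
logits = [2.0, 1.0, 0.5, 0.1]
print(softmax_with_temperature(logits, 1.0))
print(softmax_with_temperature(logits, 0.0))
print(normalize_label(" Complaint. "))
print(normalize_label("spam"))
```

Even at temperature 0.0 the prompt only biases the output, so a post-check like `normalize_label` is a cheap guard for anything off the list.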

Exam trap

Google Cloud often tests the misconception that higher creativity settings (temperature, top-p) are needed for classification tasks, when in fact deterministic settings (temperature 0.0) combined with prompt engineering are the correct approach for strict label constraints.

How to eliminate wrong answers

Option A is wrong because temperature 1.0 and top-p 0.9 increase sampling randomness and creativity, which is counterproductive for a deterministic classification task where the model must output only three fixed labels. Option B is wrong because fine-tuning on labeled emails would teach the model to generate the labels, but it does not guarantee the model will never output other tokens; fine-tuning is overkill and less reliable than prompt engineering for such a simple constraint. Option C is wrong because top-k sampling with k=50 still introduces randomness and does not force the model to choose only from the three predefined labels; it only limits the pool of candidate tokens to the top 50, which may still include irrelevant tokens.

850

MCQmedium

A product manager wants to generate meeting summaries automatically using Gemini for Google Workspace. They need summaries to be sent to all participants immediately after the meeting ends. Which Gemini feature should they use?

A.Gemini in Google Docs - Help me write

B.Gemini in Gmail - Smart Compose

C.Vertex AI Agent Builder with a meeting transcription model

D.Gemini in Google Meet - Take notes and summaries

AnswerD

This feature captures notes and summaries during or after a meeting and can automatically distribute them.

Why this answer

Gemini in Google Meet can automatically generate meeting summaries and share them after the meeting. The other options are for different Workspace apps or manual use.

851

Multi-Selectmedium

A business leader is developing a gen AI strategy. Which three key components should be included in the strategy?

Select 3 answers

A.Focus solely on technology

B.Plan for responsible AI

C.Establish data governance policies

D.Define clear use cases with ROI

E.Involve stakeholders across departments

AnswersB, C, D

Responsible AI addresses fairness, transparency, and accountability.

Why this answer

Option B is correct because responsible AI is a foundational component of any generative AI strategy, ensuring ethical use, bias mitigation, and compliance with emerging regulations. Without a plan for responsible AI, the organization risks reputational damage, legal liability, and deployment failures due to lack of trust. This goes beyond simple fairness checklists to include continuous monitoring of model outputs for toxicity, hallucination, and privacy violations.

Exam trap

Google Cloud often tests the misconception that stakeholder involvement is a core strategic component, when in fact it is an implementation enabler, while responsible AI, data governance, and defined use cases with ROI are the three pillars that form the strategy itself.

852

MCQhard

A financial institution wants to deploy a custom fine-tuned model for loan approval recommendations. They must ensure compliance with regulatory requirements, including explainability and bias monitoring. Which combination of Google Cloud services and practices best addresses these needs?

A.Use Vertex AI Search with grounding on internal policies and enable AutoML for model training

B.Deploy a pre-built model from Model Garden and use Vertex AI Model Registry

C.Fine-tune a foundation model using a custom training pipeline, then deploy with Vertex AI Model Monitoring and Vertex AI Explainable AI

D.Use Vertex AI AutoML for tabular data to train the model and enable Vertex AI Model Monitoring for bias

AnswerC

This combination offers full control, monitoring, and explainability for compliance.

Why this answer

Option C is correct because it combines custom fine-tuning of a foundation model (allowing domain-specific adaptation for loan approval logic) with Vertex AI Model Monitoring (for bias detection and drift monitoring) and Vertex AI Explainable AI (for feature attribution and regulatory explainability). This combination directly addresses the dual regulatory requirements of explainability and bias monitoring while leveraging generative AI capabilities.

Exam trap

Google Cloud often tests the distinction between traditional ML services (AutoML) and generative AI fine-tuning, where candidates mistakenly choose AutoML for tabular data (Option D) because it seems simpler, but the question explicitly requires a generative AI offering and the explainability features of Vertex AI Explainable AI.

How to eliminate wrong answers

Option A is wrong because Vertex AI Search is designed for enterprise search and grounding on internal policies, not for custom model training or fine-tuning, and AutoML does not provide the fine-grained explainability and bias monitoring required for regulatory compliance. Option B is wrong because deploying a pre-built model from Model Garden without fine-tuning cannot capture the institution's specific loan approval policies, and Vertex AI Model Registry alone does not provide explainability or bias monitoring. Option D is wrong because Vertex AI AutoML for tabular data is a traditional ML approach, not a generative AI offering, and while it includes some monitoring, it lacks the deep explainability features (e.g., feature attributions for generative models) needed for regulatory compliance in a generative AI context.

853

Multi-Selectmedium

A data scientist wants to train a model using BigQuery ML. Which two statements are true about BigQuery ML? (Choose two.)

Select 2 answers

A.It requires a separate Vertex AI training cluster

B.It supports only linear regression models

C.Data must be exported to Cloud Storage before training

D.Models can be trained using SQL directly on BigQuery data

E.It supports both supervised and unsupervised learning

AnswersD, E

BigQuery ML uses SQL for model creation.

Why this answer

BigQuery ML supports supervised and unsupervised algorithms using SQL, and data never needs to leave BigQuery.
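
As a rough illustration of "training with SQL", the sketch below holds two BigQuery ML `CREATE MODEL` statements as Python strings: one supervised (logistic regression) and one unsupervised (k-means). The dataset, table, and column names are hypothetical placeholders, and the script only prints the statements rather than submitting them.

```python
# Dataset, table, and column names below are hypothetical placeholders.
SUPERVISED_SQL = """
CREATE OR REPLACE MODEL `my_dataset.churn_model`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
SELECT tenure_months, monthly_spend, churned
FROM `my_dataset.customers`
"""

UNSUPERVISED_SQL = """
CREATE OR REPLACE MODEL `my_dataset.customer_segments`
OPTIONS (model_type = 'kmeans', num_clusters = 4) AS
SELECT tenure_months, monthly_spend
FROM `my_dataset.customers`
"""

# Print the statements; in practice they would be run in BigQuery itself.
for name, sql in (("supervised", SUPERVISED_SQL), ("unsupervised", UNSUPERVISED_SQL)):
    print(f"-- {name}\n{sql.strip()}\n")
```

Both statements read directly from a BigQuery table, which is the point of options D and E: no export step and no separate training cluster.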

854

MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Fine-tune a base LLM on the policy documents monthly

B.Use a larger foundation model with a longer context window and paste all documents into each prompt

C.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

D.Train a custom model from scratch on the policy documents each month

AnswerC

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

Retrieval-Augmented Generation (RAG) is the most appropriate approach because it allows the chatbot to answer questions based on the latest policy documents without retraining the model. RAG retrieves relevant document chunks from a vector store at query time and injects them into the prompt, enabling the model to use up-to-date information while keeping the underlying LLM static. This avoids the cost and complexity of monthly fine-tuning or retraining.
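
The retrieve-then-prompt flow can be sketched in a few lines. This is a toy under stated assumptions: the "embedding" is a plain bag-of-words count rather than a real embedding model, `PolicyIndex` is a hypothetical in-memory stand-in for a vector store, and the policy sentences are invented.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': bag-of-words counts (a real system would use an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class PolicyIndex:
    """In-memory stand-in for a vector store."""

    def __init__(self):
        self.chunks = []

    def add(self, text):
        self.chunks.append((text, embed(text)))

    def search(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.chunks, key=lambda c: cosine(q, c[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question, index):
    """Inject the best-matching chunk into the prompt sent to the (unchanged) model."""
    context = "\n".join(index.search(question, k=1))
    return f"Answer using only this policy text:\n{context}\n\nQuestion: {question}"


index = PolicyIndex()
index.add("Refunds are accepted within 30 days of purchase with a receipt.")
index.add("Employees may work remotely up to three days per week.")

# A monthly update is just another add(); the language model itself is untouched.
index.add("Shipping is free for orders above 50 dollars.")

print(build_prompt("Are refunds accepted without a receipt?", index))
```

Updating the knowledge base means re-indexing documents, which is far cheaper than fine-tuning or retraining each month.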

Exam trap

Google Cloud often tests the misconception that fine-tuning or retraining is required for domain-specific knowledge, when in fact RAG provides a cost-effective, update-friendly alternative that keeps the model static while dynamically injecting relevant context.

How to eliminate wrong answers

Option A is wrong because fine-tuning a base LLM monthly on updated policy documents is expensive, time-consuming, and risks catastrophic forgetting of previous knowledge; it also requires maintaining separate model versions for each update. Option B is wrong because pasting all documents into each prompt exceeds typical context window limits (e.g., 4K–128K tokens) for large document sets, leading to truncation, high latency, and increased cost per query. Option D is wrong because training a custom model from scratch each month is prohibitively expensive and computationally intensive, requiring massive datasets and GPU resources, and is unnecessary when RAG can achieve the same goal with far less overhead.

855

MCQhard

An enterprise wants to use Gemini 1.5 Flash for a real-time chat application with low latency. Which trade-off should they expect compared to Gemini 1.5 Pro?

A.Higher quality and lower latency

B.Lower latency but potentially lower quality

C.Lower cost but longer context window

D.Higher accuracy but slower responses

AnswerB

Flash is designed for low latency and lower cost, with some trade-off in quality.

Why this answer

Gemini 1.5 Flash is specifically optimized for lower latency and cost efficiency, making it ideal for real-time chat applications. However, this optimization comes at the cost of reduced model capacity and reasoning depth compared to Gemini 1.5 Pro, which prioritizes higher quality and accuracy over speed. Therefore, the expected trade-off is lower latency but potentially lower quality.

Exam trap

Google Cloud often tests the misconception that 'faster' automatically means 'better' or that cost and context window are the primary trade-offs, when in reality the core trade-off is between latency and output quality due to model architecture differences.

How to eliminate wrong answers

Option A is wrong because it claims both higher quality and lower latency, which contradicts the fundamental trade-off between model complexity and speed; Flash sacrifices some quality for speed. Option C is wrong because while Flash does offer lower cost, it does not provide a longer context window than Pro; both models support very long contexts (on the order of 1 million tokens or more), so this is not a distinguishing trade-off. Option D is wrong because it describes higher accuracy but slower responses, which is characteristic of Gemini 1.5 Pro, not the trade-off when choosing Flash over Pro.

856

Multi-Selectmedium

A company is building a conversational AI using the Gemini API on Vertex AI. They want to reduce the chance of generating toxic content while still allowing creative and engaging responses for their gaming community. Which TWO safety settings should they adjust in the safety_settings parameter?

Select 2 answers

A.Set the threshold for 'SEXUALLY_EXPLICIT' category to BLOCK_NONE.

B.Set the threshold for 'SEXUALLY_EXPLICIT' category to BLOCK_LOW_AND_ABOVE.

C.Enable the 'harm_category' filter for 'DANGEROUS_CONTENT' with threshold BLOCK_ONLY_HIGH.

D.Set the threshold for 'HARASSMENT' category to BLOCK_LOW_AND_ABOVE.

E.Set the threshold for 'HATE_SPEECH' category to BLOCK_ONLY_HIGH.

AnswersC, E

Blocks only high probability dangerous content, maintaining safety without stifling creativity.

Why this answer

Option C is correct because setting the 'DANGEROUS_CONTENT' category to BLOCK_ONLY_HIGH allows the model to generate creative and engaging responses for a gaming community while still blocking the most severe dangerous content. This balances safety with creative freedom, as the gaming context may involve simulated conflict or action that is not genuinely harmful.
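
The two selected settings can be written out as data. In the sketch below the category and threshold strings follow the Gemini API's harm-category and block-threshold names, but the plain-dict shape and the `is_stricter` helper are assumptions for illustration; the exact way `safety_settings` is passed depends on the SDK and version in use.

```python
# Plain-dict illustration of answers C and E; not a specific SDK call.
CREATIVE_GAMING_SAFETY = [
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_ONLY_HIGH"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_ONLY_HIGH"},
]

# Thresholds ordered from most permissive to most restrictive.
THRESHOLD_ORDER = [
    "BLOCK_NONE",
    "BLOCK_ONLY_HIGH",
    "BLOCK_MEDIUM_AND_ABOVE",
    "BLOCK_LOW_AND_ABOVE",
]


def is_stricter(a, b):
    """True if threshold a blocks more content than threshold b."""
    return THRESHOLD_ORDER.index(a) > THRESHOLD_ORDER.index(b)


for setting in CREATIVE_GAMING_SAFETY:
    print(f"{setting['category']}: {setting['threshold']}")

# BLOCK_LOW_AND_ABOVE (options B and D) blocks far more than BLOCK_ONLY_HIGH.
print(is_stricter("BLOCK_LOW_AND_ABOVE", "BLOCK_ONLY_HIGH"))
```

Laying the thresholds out in order makes the trade-off explicit: each step up the list blocks more borderline content and leaves less room for creative responses.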

Exam trap

Google Cloud often tests the misconception that stricter blocking (e.g., BLOCK_LOW_AND_ABOVE) is always better for safety, but in creative contexts like gaming, BLOCK_ONLY_HIGH is the correct balance to avoid stifling legitimate content.

857

MCQeasy

A retail company plans to use Vertex AI's generative AI to create product descriptions. They need to ensure descriptions are factually accurate and do not misrepresent products. Which strategy should they prioritize?

A.Implement human-in-the-loop review

B.Use prompt engineering

C.Use a larger model

D.Increase temperature parameter

AnswerA

Humans can verify and correct factual errors.

Why this answer

Human-in-the-loop (HITL) review is the correct strategy because it directly addresses the need for factual accuracy and prevention of misrepresentation. While generative AI can produce fluent text, it lacks a reliable grounding mechanism for product-specific facts, making human oversight essential to catch hallucinations, verify claims, and ensure compliance with advertising standards. This approach aligns with responsible AI practices and is a core recommendation for high-stakes content generation.

Exam trap

Google Cloud often tests the misconception that prompt engineering or model size alone can solve factual accuracy issues, when in reality, generative AI's inherent lack of ground truth makes human validation indispensable for high-stakes content.

How to eliminate wrong answers

Option B is wrong because prompt engineering, while useful for guiding output style and structure, does not guarantee factual accuracy; it cannot prevent the model from generating plausible-sounding but incorrect product details. Option C is wrong because using a larger model may improve fluency and reduce some errors, but it does not eliminate hallucinations or misrepresentations, and can even introduce more subtle inaccuracies. Option D is wrong because increasing the temperature parameter makes the model's output more random and creative, which increases the risk of generating factually incorrect or misleading descriptions, the opposite of what is needed.

858

MCQeasy

A retail company wants to use gen AI for customer service chatbots. They have a large volume of customer interactions. What is the primary business consideration for deploying a gen AI solution?

A.Minimizing latency at any cost

B.Using open-source models only

C.Choosing the most complex model

D.Ensuring data privacy and compliance

AnswerD

Data privacy and regulatory compliance are top business considerations for handling customer data.

Why this answer

Option D is correct because ensuring data privacy and compliance is critical when handling customer data. Option A is wrong because minimizing latency at any cost can be too expensive. Option B is wrong because open-source models may not meet all requirements. Option C is wrong because complexity doesn't guarantee success.

859

MCQhard

An organization is running a large-scale training job for a custom NLP model with a batch size of 2048 and sequence length of 512. They need to minimize training time while keeping costs predictable. Which Google Cloud hardware should they choose?

A.Cloud TPU v5e pods

B.Compute Engine with NVIDIA A100 GPUs

C.Edge TPU devices

D.Compute Engine with NVIDIA T4 GPUs

AnswerA

TPU v5e pods are optimized for large-scale training, providing high throughput and predictable cost, ideal for large batch sizes and sequence lengths.

Why this answer

Cloud TPU v5e pods are purpose-built for large-scale training of transformer-based NLP models, offering high-throughput matrix multiplication and efficient scaling across multiple chips. With a batch size of 2048 and sequence length of 512, TPU v5e pods deliver superior training speed and predictable pricing via reserved capacity, minimizing time-to-train compared to GPU alternatives.

Exam trap

The trap here is that candidates often default to choosing NVIDIA A100 GPUs due to their general popularity, overlooking that TPU pods are specifically optimized for large-scale transformer training with predictable pricing and superior scaling efficiency.

How to eliminate wrong answers

Option B is wrong because NVIDIA A100 GPUs, while powerful, are general-purpose accelerators that lack the dedicated matrix-multiply units (MXU) and high-bandwidth interconnects of TPU pods, leading to higher cost and slower training for large-batch transformer workloads. Option C is wrong because Edge TPU devices are designed for low-power inference at the edge, not for large-scale training, and cannot handle batch sizes of 2048 or sequence lengths of 512. Option D is wrong because NVIDIA T4 GPUs are mid-range inference and training GPUs with lower memory bandwidth and fewer tensor cores, making them unsuitable for large-batch NLP training and resulting in significantly longer training times.

860

MCQhard

A data scientist fine-tunes a generative image captioning model to describe medical images. The model outputs safe but very generic captions (e.g., 'An image of cells'). The goal is to produce more specific, clinically relevant descriptions. Which approach is most effective?

A.Perform incremental fine-tuning on a curated dataset of detailed medical image captions.

B.Use diverse beam search during decoding to generate multiple caption candidates.

C.Adjust top-k sampling to restrict the vocabulary to medical terms only.

D.Increase the temperature to encourage the model to output longer, more varied captions.

AnswerA

Fine-tuning with domain-specific examples teaches the model to generate precise clinical descriptions.

Why this answer

Option A is correct because incremental fine-tuning on a high-quality dataset of specific medical captions directly teaches the model the desired level of detail. Option B is wrong because diverse beam search produces more varied candidates but does not address specificity. Option C is wrong because top-k sampling can reduce the output space but does not guarantee medical accuracy. Option D is wrong because increasing temperature may add irrelevant words, not specific clinical terms.

861

MCQmedium

Which command correctly updates the traffic split?

A.gcloud ai endpoints update my-endpoint --region=us-central1 --remove-deployed-model model-v1 --add-deployed-model model-v2 --traffic-split=20

B.gcloud ai models update sentiment-model-v2 --traffic-split=20

C.gcloud ai endpoints update-traffic-split my-endpoint --region=us-central1 --traffic-split=model-v2=20,model-v1=80

D.gcloud ai endpoints update my-endpoint --region=us-central1 --update-traffic-split=model-v2=20,model-v1=80

AnswerC

This is the correct command to update the traffic split for an endpoint.

Why this answer

Option C is correct because the `gcloud ai endpoints update-traffic-split` command is the dedicated command for modifying traffic splits between deployed models on a Vertex AI endpoint. It uses the `--traffic-split` flag with key-value pairs (model_id=percentage) to assign traffic percentages, ensuring the total sums to 100. This command directly updates the routing configuration without redeploying models.

Exam trap

The trap here is that candidates confuse the `gcloud ai endpoints update` command (used for general endpoint configuration) with the specific `gcloud ai endpoints update-traffic-split` subcommand, leading them to choose options with incorrect flags like `--update-traffic-split` or `--traffic-split` on the wrong command.

How to eliminate wrong answers

Option A is wrong because `gcloud ai endpoints update` with `--remove-deployed-model` and `--add-deployed-model` is used to change the set of deployed models, not to update traffic splits; the `--traffic-split` flag here is invalid and would cause a syntax error. Option B is wrong because `gcloud ai models update` operates on model versions, not endpoints, and does not support a `--traffic-split` flag; traffic splitting is an endpoint-level configuration. Option D is wrong because `gcloud ai endpoints update` does not accept a `--update-traffic-split` flag; the correct flag is `--traffic-split` on the `update-traffic-split` subcommand, not on `update`.

862

MCQhard

Refer to the exhibit. This IAM policy is applied to a Vertex AI project. A user 'test@example.com' reports they cannot create a ModelEvaluationPipelineJob. Which action should the administrator take?

A.Grant the user roles/aiplatform.specialist at the project level.

B.Add the user roles/aiplatform.user at the model level to allow pipeline creation.

C.Add the user to the roles/aiplatform.admin role at the project level.

D.Remove the service account from roles/aiplatform.admin.

AnswerC

Admin role includes permissions to create pipeline jobs.

Why this answer

In this scenario, the roles/aiplatform.user role does not have the permissions to create pipeline jobs; it only allows viewing and using models and endpoints. The roles/aiplatform.admin role has full control, so adding the user to this role is the simplest fix. The other options fail: there is no roles/aiplatform.specialist role, removing the service account would not help, and granting a role at the model level is insufficient for creating pipeline jobs.

863

MCQhard

A global financial services firm wants to deploy generative AI for personalized investment recommendations. They must comply with regulations in multiple jurisdictions, including GDPR and the SEC's Marketing Rule. The solution must also be auditable. Which approach best balances regulatory compliance, scalability, and cost?

A.Build a centralized model in a cloud region with the most stringent regulations and apply it globally.

B.Use a single global model with a unified compliance layer applied post-generation.

C.Deploy separate, jurisdiction-specific models with tailored guardrails and audit trails for each region.

D.Rely on a third-party API with built-in compliance for all regions.

AnswerC

This ensures compliance with local regulations and provides auditable logs.

Why this answer

Option C is correct because deploying separate, jurisdiction-specific models allows each model to be trained and governed with guardrails and audit trails that directly map to local regulations like GDPR (data minimization, right to erasure) and the SEC Marketing Rule (fair, clear, and not misleading disclosures). This approach avoids the compliance conflicts that arise when a single model must satisfy contradictory requirements across regions, and it scales cost-effectively by only applying the necessary compliance overhead to each region's data and inference pipeline.

Exam trap

Google Cloud often tests the misconception that a single global model with a post-generation compliance layer is sufficient, but the trap is that post-generation filtering cannot undo model outputs that already violate local regulations, and it fails to provide the granular audit trails required for each jurisdiction's specific rules.

How to eliminate wrong answers

Option A is wrong because building a centralized model in the most stringent region and applying it globally would force all jurisdictions to comply with that region's rules, potentially violating local laws (e.g., GDPR's data localization requirements) and increasing latency and cost for regions with less strict regulations. Option B is wrong because a single global model with a unified compliance layer applied post-generation cannot retroactively fix model outputs that violate jurisdiction-specific rules (e.g., SEC Marketing Rule's prohibition of misleading statements), and it creates an audit trail that is difficult to map to individual regulatory frameworks. Option D is wrong because relying on a third-party API with built-in compliance for all regions assumes a one-size-fits-all solution that rarely exists; third-party APIs often lack granular control over jurisdiction-specific guardrails and audit logging, and they introduce vendor lock-in and data sovereignty risks.

864

Multi-Selectmedium

Which THREE of the following are features of Vertex AI Studio (Gen AI Studio)? (Choose 3)

Select 3 answers

A.Configure pre-built safety filters for generated content.

B.Deploy custom container images to Vertex AI endpoints.

C.Compare responses from different models side-by-side.

D.Fine-tune models with custom datasets using a visual interface.

E.Design and test prompts for various foundation models.

AnswersC, D, E

Studio has a comparison feature for model outputs.

Why this answer

Option C is correct because Vertex AI Studio (formerly Gen AI Studio) provides a built-in interface that allows users to run the same prompt against multiple foundation models simultaneously, enabling direct side-by-side comparison of outputs to evaluate quality, tone, and accuracy before selecting a model for deployment.

Exam trap

The trap here is that candidates confuse Vertex AI Studio with Vertex AI's broader deployment and safety management tools, mistakenly attributing endpoint deployment or safety filter configuration to the prompt-design interface, when in fact those are separate services under the Vertex AI umbrella.

865

MCQmedium

A company wants to deploy a generative AI chatbot for customer service but is concerned about cost unpredictability due to variable usage. Which pricing model should they choose to best manage costs?

A.Committed use discounts

B.Free tier

C.Pay-as-you-go

D.Provisioned throughput

AnswerD

Provides fixed capacity with predictable monthly cost.

Why this answer

Option D is correct because provisioned throughput provides fixed capacity with a predictable monthly cost, ideal for managing cost uncertainty. Option A (committed use discounts) lowers the rate in exchange for a commitment, but spend still varies when usage exceeds the committed amount. Option B (free tier) is too limited. Option C (pay-as-you-go) is variable by design.

866

Multi-Selectmedium

A data scientist wants to improve the performance of a text classification model for customer feedback. They have a small labeled dataset of 500 examples and a large unlabeled corpus of 100,000 feedback messages. Which TWO strategies would be most effective? (Choose 2)

Select 2 answers

A.Increase the context window of the model

B.Apply semi-supervised learning by pseudo-labeling the unlabeled data

C.Use RAG to retrieve similar examples from the unlabeled corpus during inference

D.Train a model from scratch on the labeled data only

E.Use a pre-trained LLM (e.g., Gemini) and fine-tune on the labeled data

AnswersB, E

Semi-supervised learning can use the unlabeled data to improve the model by generating pseudo-labels.

Why this answer

Option B is correct because semi-supervised learning with pseudo-labeling leverages the large unlabeled corpus to augment the small labeled dataset. The model first trains on the 500 labeled examples, then generates pseudo-labels for the unlabeled data, and retrains on the combined set, effectively increasing the training signal without requiring manual annotation.
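
One round of that train, pseudo-label, retrain loop can be sketched with a deliberately tiny word-count classifier. Everything here is an assumption for illustration: the classifier, the 0.8 confidence threshold, and the example feedback strings are hypothetical, and a real project would use a proper model.

```python
from collections import Counter, defaultdict


def train(examples):
    """Count how often each word appears under each label."""
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts


def predict(counts, text):
    """Return (label, confidence); confidence is the winning label's share of word votes."""
    words = text.lower().split()
    votes = {label: sum(c[w] for w in words) for label, c in counts.items()}
    total = sum(votes.values())
    if total == 0:
        return None, 0.0  # no known words, so no prediction
    label = max(votes, key=votes.get)
    return label, votes[label] / total


def pseudo_label(labeled, unlabeled, threshold=0.8):
    """Keep only confident predictions on unlabeled text, then retrain on the combined set."""
    model = train(labeled)
    confident = []
    for text in unlabeled:
        label, confidence = predict(model, text)
        if label is not None and confidence >= threshold:
            confident.append((text, label))
    return train(labeled + confident), confident


labeled = [
    ("the app keeps crashing", "negative"),
    ("terrible slow support", "negative"),
    ("love the new design", "positive"),
    ("great fast checkout", "positive"),
]
unlabeled = [
    "support is terrible and slow",
    "great design love it",
    "the checkout",      # ambiguous: below the confidence threshold
    "delivery arrived",  # unseen words: no prediction
]

model, added = pseudo_label(labeled, unlabeled)
print(added)
```

The confidence threshold matters: accepting low-confidence pseudo-labels feeds the model its own mistakes, so only the clear cases are added.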

Exam trap

Google Cloud often tests the misconception that RAG is a universal solution for any data scarcity problem, but in this context, RAG does not improve the model's training signal and is instead used for retrieval during inference, not for semi-supervised learning.

867

Multi-Selecthard

A company is deploying a GenAI system that generates product descriptions. During A/B testing, the new system shows a 20% increase in click-through rate (CTR) but a 15% increase in average cost per query due to the model size. The team wants to optimize cost without sacrificing the CTR gain. Which THREE actions should they take? (Choose three.)

Select 3 answers

A.Batch similar requests to reduce per-request overhead

B.Use a larger model with higher accuracy to further increase CTR

C.Increase the number of few-shot examples in the prompt

D.Switch to a smaller model and re-A/B test to confirm CTR impact

E.Implement response caching for repeated product SKUs

AnswersA, D, E

Batching reduces the number of API calls and can lower cost.

Why this answer

Option A is correct because batching similar requests reduces the per-request overhead by combining multiple inference calls into a single batch, which amortizes the fixed costs (e.g., model loading, token processing) across more outputs. This directly lowers the average cost per query while preserving the model architecture and CTR gains, as the model's output quality remains unchanged.

Exam trap

Google Cloud often tests the misconception that adding more few-shot examples always improves output quality. In reality, extra examples increase token costs and can degrade performance due to context window limits or irrelevant examples.
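Options A and E can be combined in a few lines. The sketch below is only an illustration under stated assumptions: `generate_batch` stands in for whatever batched model call the team actually uses, and `fake_generate_batch` is a hypothetical stub so the example runs without a model.

```python
from typing import Callable, Dict, List

def describe_products(skus: List[str],
                      generate_batch: Callable[[List[str]], List[str]],
                      cache: Dict[str, str],
                      batch_size: int = 8) -> List[str]:
    """Return one description per requested SKU, calling the model only for uncached SKUs."""
    # Deduplicate while preserving order, then drop SKUs already in the cache.
    missing = [s for s in dict.fromkeys(skus) if s not in cache]
    # Send the remaining SKUs in batches instead of one call per SKU.
    for start in range(0, len(missing), batch_size):
        chunk = missing[start:start + batch_size]
        for sku, text in zip(chunk, generate_batch(chunk)):
            cache[sku] = text
    return [cache[s] for s in skus]

# Hypothetical stand-in for a batched model call; it records each batch it receives.
calls: List[List[str]] = []

def fake_generate_batch(batch: List[str]) -> List[str]:
    calls.append(list(batch))
    return [f"Description for {sku}" for sku in batch]
```

A real cache would also need an invalidation rule (for example, when a product's attributes change), which this sketch leaves out.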

868

MCQmedium

A research team wants to use Google's AI to generate video content from text prompts for a creative project. Which Google Cloud generative AI model should they use?

A.Imagen

B.Codey

C.Veo

D.Gemini

AnswerC

Veo is a generative video model that can create videos from text and image prompts.

Why this answer

Veo is Google's video generation model. Imagen generates images, Gemini is multimodal but not primarily for video generation, and Codey is for code.

869

Multi-Selecthard

Which THREE factors should be considered when choosing between fine-tuning and prompt engineering for a generative AI task? (Choose three.)

Select 3 answers

A.Availability of labeled training data

B.Cost of API calls per request

C.Latency requirements for the application

D.Degree of task specialization required

E.Size of the base model

AnswersA, C, D

Fine-tuning needs labeled data.

Why this answer

Option A is correct because fine-tuning requires a labeled dataset specific to the target task to adjust model weights via supervised learning, whereas prompt engineering relies on the model's existing knowledge without additional training data. Without sufficient labeled data, prompt engineering is often the only viable approach, as fine-tuning would risk overfitting or poor generalization.

Exam trap

Google Cloud often tests the misconception that cost or model size are primary decision factors, when in reality the core trade-off is between data availability (labeled vs. unlabeled) and the degree of task specialization required.

870

MCQeasy

Which Google Cloud service allows you to run machine learning models directly using SQL queries on data in BigQuery?

A.Cloud Functions

B.BigQuery ML

C.Vertex AI

D.Dataflow

AnswerB

BigQuery ML allows SQL-based ML directly in BigQuery.

Why this answer

BigQuery ML enables users to create, train, and deploy ML models using standard SQL, eliminating the need to move data to a separate environment.
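For orientation, the two statement shapes involved look roughly like the following, held here as Python strings so they could be passed to any BigQuery client. The dataset, table, and column names are hypothetical; only the `CREATE MODEL ... OPTIONS(...) AS SELECT` and `ML.PREDICT` forms are the point.

```python
# Hypothetical dataset, table, and column names; only the statement shapes matter.
CREATE_MODEL_SQL = """
CREATE OR REPLACE MODEL `mydataset.churn_model`
OPTIONS (model_type = 'LOGISTIC_REG', input_label_cols = ['churned']) AS
SELECT tenure_months, monthly_spend, churned
FROM `mydataset.customers`
"""

# Batch prediction runs inside BigQuery as well, so rows are not exported.
PREDICT_SQL = """
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.churn_model`,
  (SELECT tenure_months, monthly_spend FROM `mydataset.new_customers`)
)
"""
```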

871

Multi-Selectmedium

A data scientist is fine-tuning a generative AI model for customer sentiment analysis. To ensure the fine-tuned model does not inadvertently memorize and reproduce personally identifiable information (PII) from the training data, which THREE practices should they follow? (Select 3)

Select 3 answers

A.Apply differential privacy during fine-tuning

B.Apply model quantization to reduce model size

C.Remove or anonymize all PII from the training data

D.Use only a subset of data that is necessary for the task (data minimization)

E.Use k-fold cross-validation to evaluate the model

AnswersA, C, D

Differential privacy bounds the influence of any single data point.

Why this answer

Differential privacy limits memorization; PII removal prevents it from being present; data minimization reduces risk. Quantization and k-fold cross-validation do not directly address PII memorization.
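Two of these practices, PII removal and data minimization, can be sketched as a preprocessing step. This is a rough illustration only: the regular expressions are simplistic assumptions that catch common email and phone formats, and the field names are hypothetical. Production pipelines typically rely on a dedicated PII detection service rather than hand-written patterns.

```python
import re

# Simplistic patterns (assumptions); they will miss many real-world PII formats.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_pii(text: str) -> str:
    """Replace email addresses and phone numbers with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

def minimize(record: dict, keep: set) -> dict:
    """Data minimization: keep only the fields the task needs, then redact string values."""
    return {k: redact_pii(v) if isinstance(v, str) else v
            for k, v in record.items() if k in keep}

# Hypothetical raw training record.
raw = {
    "customer_name": "Jane Doe",
    "review": "Great service! Reach me at jane.doe@example.com or +1 (555) 010-9999.",
    "rating": 5,
}
clean = minimize(raw, keep={"review", "rating"})
```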

872

Multi-Selecteasy

Which TWO are key business considerations when adopting generative AI solutions?

Select 2 answers

A.Training duration on public datasets

B.Number of model parameters

C.Data privacy and compliance requirements

D.Model accuracy on benchmarks

E.Cost of inference per request

AnswersC, E

Privacy and compliance are critical business and legal considerations.

Why this answer

Data privacy and compliance requirements (Option C) are a key business consideration because generative AI models often process sensitive or proprietary data, and regulations like GDPR, HIPAA, or CCPA mandate strict controls on data handling, storage, and model training. Failure to address these can result in legal penalties, reputational damage, and loss of customer trust, making it a top priority for enterprise adoption.

Exam trap

Google Cloud often tests the distinction between technical metrics (like training duration, parameter count, and benchmark accuracy) and true business considerations (like compliance, cost, and scalability), leading candidates to confuse model performance indicators with strategic business drivers.

873

MCQmedium

A company is deploying a custom large language model for internal use and requires the lowest cost for inference while maintaining reasonable quality. They can tolerate slight latency. Which Gemini model variant should they choose?

A.Gemini Ultra

B.Gemini Flash

C.Gemini Nano

D.Gemini Pro

AnswerB

Optimized for lower cost and fast inference with decent quality.

Why this answer

Gemini Flash is designed for low-latency and cost-efficient inference while maintaining strong quality, making it the ideal choice for internal deployments where cost is the primary concern and slight latency is acceptable. It balances performance and expense better than the other variants for this specific requirement.

Exam trap

The trap here is that candidates often assume 'Pro' is always the best cost-quality trade-off, but Google specifically positions Flash as the cost-optimized variant for high-volume, latency-tolerant inference, making it the correct answer when lowest cost is the priority.

How to eliminate wrong answers

Option A is wrong because Gemini Ultra is the largest and most capable model, optimized for maximum quality and complex reasoning, but it incurs the highest inference cost and latency, which contradicts the need for lowest cost. Option C is wrong because Gemini Nano is the smallest model designed for on-device deployment with minimal resource usage, but it sacrifices too much quality and capability for internal enterprise use cases that require reasonable quality. Option D is wrong because Gemini Pro offers a good balance of quality and cost but is more expensive than Flash for inference, as Flash is specifically optimized for faster and cheaper serving while maintaining competitive quality.

874

MCQhard

An enterprise deploys a generative AI chatbot that must comply with GDPR right to deletion. Users can request deletion of their personal data. The chatbot uses a RAG pipeline with a vector database. What is the MOST effective way to handle deletion requests?

A.Delete the user's documents from the vector index and original storage, then rebuild the index

B.Update the user's records in the vector index with anonymized placeholders

C.Add a filter to the chat application to block the user's name from appearing in responses

D.Retrain the LLM from scratch without the user's data

AnswerA

This removes the data from retrieval so the chatbot cannot access it, satisfying GDPR deletion requirements.

Why this answer

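GDPR requires that personal data be erased when requested. In a RAG system, the user's source documents must be deleted from both the vector index and the original storage so they can no longer be retrieved. Retraining the LLM (Option D) is unnecessary because in a RAG pipeline the personal data lives in the retrieval store rather than in the model weights. Updating only the vector index (Option B) leaves the original storage untouched, and an output filter (Option C) leaves the underlying data in place.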
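The deletion flow can be sketched with an in-memory stand-in for the vector database. Everything here is a hypothetical toy: `ToyVectorStore`, its methods, and the two-dimensional embeddings are assumptions for illustration. A real vector database exposes its own delete and reindex operations.

```python
class ToyVectorStore:
    """In-memory stand-in for original storage plus a vector index."""

    def __init__(self):
        self.documents = {}  # doc_id -> (owner_id, text): the "original storage"
        self.index = {}      # doc_id -> embedding: the "vector index"

    def add(self, doc_id, owner_id, text, embedding):
        self.documents[doc_id] = (owner_id, text)
        self.index[doc_id] = embedding

    def search(self, query, k=2):
        """Return the texts of the k documents with the highest dot-product score."""
        def dot(a, b):
            return sum(x * y for x, y in zip(a, b))
        ranked = sorted(self.index, key=lambda d: dot(query, self.index[d]), reverse=True)
        return [self.documents[d][1] for d in ranked[:k]]

    def erase_user(self, user_id):
        """Delete a user's documents from BOTH storage and index; return the erased ids."""
        doomed = [d for d, (owner, _) in self.documents.items() if owner == user_id]
        for d in doomed:
            del self.documents[d]
            self.index.pop(d, None)
        return doomed
```

In this toy the index entry is removed directly; some index types instead require a rebuild after deletion, which is why Option A mentions rebuilding.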

875

Multi-Selectmedium

A company wants to build a GenAI-powered customer support chatbot. They require the chatbot to provide accurate answers based on the latest product documentation, and they need to control costs by minimizing token usage. Which TWO strategies should they use?

Select 2 answers

A.Use the largest available model for maximum accuracy

B.Cache frequent queries and their responses

C.Implement Retrieval-Augmented Generation (RAG) to retrieve relevant document chunks

D.Use a longer context window to include entire documents in the prompt

E.Fine-tune a large language model on the product documentation

AnswersB, C

Caching reduces repeated API calls, saving tokens and cost.

Why this answer

RAG ensures answers are grounded in latest docs. Caching common queries reduces token usage. Fine-tuning is expensive and not dynamic.

Using a large model increases cost. Long context windows increase token usage.
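The token-saving effect of RAG comes from sending only the most relevant chunks rather than whole documents. A minimal sketch, assuming a naive word-overlap score in place of real embeddings and made-up documentation chunks:

```python
import re

def tokens(text):
    """Lowercase word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_grounded_prompt(query, chunks, k=2):
    """Keep only the k chunks that share the most words with the query."""
    q = tokens(query)
    top = sorted(chunks, key=lambda ch: len(q & tokens(ch)), reverse=True)[:k]
    context = "\n".join(f"- {ch}" for ch in top)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

# Made-up documentation chunks for illustration.
CHUNKS = [
    "The X200 router warranty lasts two years.",
    "Reset the X200 by holding the power button for ten seconds.",
    "Our office is closed on public holidays.",
]
prompt = build_grounded_prompt("How long is the X200 warranty", CHUNKS)
```

The unrelated chunk never reaches the model, so it costs no input tokens, and updating `CHUNKS` changes the answers without retraining anything.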

876

MCQhard

A company is using Vertex AI to generate personalized marketing emails. The model sometimes produces biased content. What is the most effective way to detect and mitigate bias?

A.Add more diverse training data

B.Manually review all generated emails before sending

C.Switch to a different generative model

D.Use Vertex AI Explainable AI to analyze predictions and detect bias in training data

AnswerD

Explainable AI helps identify bias sources.

Why this answer

Vertex AI Explainable AI provides feature attributions and model explanations that help identify which input features (e.g., demographic attributes, phrasing patterns) contribute most to biased outputs. By analyzing these attributions against training data, you can pinpoint and mitigate bias at the source, rather than relying on post-hoc manual review or model swapping.

Exam trap

Google Cloud often tests the misconception that bias mitigation is solely a data quantity problem, leading candidates to choose 'add more diverse training data' without recognizing the need for diagnostic tools like Explainable AI to first detect and understand the bias.

How to eliminate wrong answers

Option A is wrong because simply adding more diverse training data does not guarantee detection of existing bias; it may reduce bias over time but lacks the diagnostic capability to identify specific biased patterns. Option B is wrong because manual review of all generated emails is not scalable, introduces human bias, and does not address the root cause of bias in the model's training or architecture. Option C is wrong because switching to a different generative model does not inherently detect or mitigate bias; the new model may have similar or different biases, and the underlying issue of biased training data or model behavior remains unaddressed.

877

MCQmedium

A company is building a document summarization tool using Vertex AI Gemini API. They notice that the model sometimes returns incomplete summaries that miss key points. Which approach is most likely to improve summary quality without increasing token usage significantly?

A.Refine the system instruction to specify the desired summary format and key elements to include

B.Increase the context window to include more of the document

C.Switch to a larger Gemini model (e.g., from 1.0 Pro to 1.5 Pro)

D.Increase the max output token limit to allow longer summaries

AnswerA

Better prompting guides the model to produce more complete summaries without extra tokens.

Why this answer

Refining the system instruction directly addresses the root cause of incomplete summaries by providing explicit guidance on the desired output format and key elements to include. This approach improves the model's adherence to the task without increasing the number of input or output tokens, as it only modifies the instruction text, not the document length or generation limits.

Exam trap

Google Cloud often tests the misconception that increasing model size or token limits directly improves output quality, when in fact prompt engineering, and system instructions in particular, is a more efficient and cost-effective lever for controlling model behavior.

How to eliminate wrong answers

Option B is wrong because increasing the context window adds more tokens from the document, which increases token usage without guaranteeing that the model focuses on key points; it can even dilute attention. Option C is wrong because switching to a larger Gemini model (e.g., from 1.0 Pro to 1.5 Pro) typically raises the cost per token but does not fix the instruction quality; the issue is prompt design, not model capacity. Option D is wrong because increasing the max output token limit allows longer summaries but does not ensure the model includes the missing key points; it may simply produce more verbose text without addressing the root cause of omission.

878

Multi-Selectmedium

A company is developing a generative AI application that will be used by customers in the EU. To comply with GDPR, which TWO measures are REQUIRED? (Choose 2)

Select 2 answers

A.Provide users the ability to request deletion of their personal data

B.Implement end-to-end encryption for all data in transit and at rest

C.Publish a Model Card for the AI model

D.Ensure all data is stored in a data center within the EU

E.Obtain explicit consent from users before processing their personal data

AnswersA, E

Right to erasure is a core GDPR right.

Why this answer

Option A is correct because GDPR grants data subjects the 'right to erasure' (Article 17), requiring organizations to delete personal data upon request without undue delay. For a generative AI application, this means the company must be able to remove specific training data or user inputs from the model's memory or logs, which is technically challenging but legally mandatory.

Exam trap

Google Cloud often tests the distinction between GDPR's explicit legal requirements (like consent and erasure) and common security or transparency practices (like encryption or Model Cards) that are recommended but not mandated, leading candidates to over-select options that seem 'good' but are not legally required.

879

Multi-Selectmedium

A company is deploying a generative AI system for medical diagnosis. Which TWO measures are essential for responsible AI in this high-stakes domain?

Select 2 answers

A.Allow the AI to make autonomous decisions in time-sensitive emergencies

B.Ensure a human medical professional reviews all AI-generated diagnoses

C.Publish all patient data used for training to ensure transparency

D.Provide model documentation (Model Cards) to clinicians detailing the system's limitations

E.Use the AI only for administrative tasks, not diagnosis

AnswersB, D

Human oversight is critical for high-stakes decisions to catch errors and maintain accountability.

Why this answer

High-stakes domains require human oversight and transparency. Human review ensures accountability, and Model Cards communicate capabilities and limitations to stakeholders.

880

MCQeasy

A developer wants to improve the factual accuracy of the model's summaries. Based on the exhibit, what should they do?

A.Enable the support engine.

B.Increase the model's context window.

C.Configure grounding with a knowledge base.

D.Re-train the model with a dataset of facts.

AnswerC

Grounding provides factual context, improving accuracy.

Why this answer

Grounding with a knowledge base is the correct approach because it anchors the model's output to a trusted, external source of facts, directly improving factual accuracy without modifying the model's weights. This technique uses retrieval-augmented generation (RAG) to fetch relevant documents from the knowledge base and inject them into the prompt context, ensuring the summary is based on verified information rather than relying solely on the model's parametric memory.

Exam trap

Google Cloud often tests the distinction between modifying the model itself (retraining or fine-tuning) and augmenting the input (grounding or RAG). Candidates may incorrectly choose re-training (Option D) on the assumption that improving factual accuracy always requires changing the model's weights, when grounding is a more practical and scalable solution for many enterprise use cases.

How to eliminate wrong answers

Option A is wrong because enabling the support engine typically refers to a customer support or troubleshooting tool, not a mechanism for improving factual accuracy in generative AI summaries; it does not provide a knowledge base for grounding. Option B is wrong because increasing the model's context window only allows the model to process more tokens in a single request, but it does not introduce new factual information or correct hallucinations; it may even amplify errors if the additional context is unverified. Option D is wrong because re-training the model with a dataset of facts is a costly, time-consuming process that requires significant computational resources and expertise, and it does not guarantee factual accuracy for unseen or evolving information; grounding with a knowledge base is a more efficient and dynamic solution.

881

MCQeasy

You are a generative AI lead at a healthcare startup developing a system to summarize patient medical records for quick review by doctors. The system uses a fine-tuned LLM. After deployment, doctors report that the summaries often miss critical details like medication dosages and allergy information. The current pipeline preprocesses patient records by extracting text from EHR, feeding it to the LLM, and outputting a summary. The team has limited time and budget. They cannot retrain the model because it is hosted as a managed API. Which action should you take to most effectively improve the summarization quality without changing the model?

A.Increase the maximum output token limit to force the model to include more details.

B.Replace the LLM with a simpler extractive summarization model that selects sentences from the original document.

C.Implement a retrieval-augmented generation (RAG) system that pulls supplementary data from external drug databases.

D.Revise the prompt to explicitly ask for medication dosages and allergies, and format the input text by adding headings (e.g., '### Medications') to emphasize important sections.

AnswerD

Prompt engineering is a low-cost, no-model-change solution that can emphasize key information.

Why this answer

Option D is correct because prompt engineering is the most effective and cost-efficient way to improve LLM output without retraining or changing the model. By explicitly instructing the model to include medication dosages and allergies, and by structuring the input with clear headings, you guide the model's attention to critical sections, directly addressing the missing details. This approach leverages the LLM's existing capabilities and requires no changes to the hosted API or additional infrastructure.
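As a rough sketch of what Option D looks like in practice, the helper below adds explicit instructions and section headings before the text is sent to the hosted model. The function name, the instruction wording, and the sample record are all hypothetical.

```python
def build_summary_prompt(sections: dict) -> str:
    """Prefix explicit instructions and mark each record section with a '###' heading."""
    instructions = (
        "Summarize the patient record for a doctor. "
        "Always list every medication with its dosage and every allergy. "
        "If a section is empty, say so explicitly."
    )
    body = "\n\n".join(f"### {title}\n{text}" for title, text in sections.items())
    return f"{instructions}\n\n{body}"

# Made-up record used only to show the resulting structure.
prompt = build_summary_prompt({
    "Medications": "Drug A 10 mg once daily",
    "Allergies": "Penicillin",
    "Visit notes": "Routine follow-up, no new complaints.",
})
```

Only the text sent to the API changes; the managed model itself is untouched, which matches the scenario's constraint.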

Exam trap

Google Cloud often tests the misconception that increasing output length or adding external data automatically improves quality, when in fact the most direct and cost-effective fix is to refine the input prompt to guide the model's focus.

How to eliminate wrong answers

Option A is wrong because increasing the maximum output token limit does not force the model to include specific missing details; it only allows longer responses, which may still omit critical information if the prompt does not direct the model's focus. Option B is wrong because replacing the LLM with an extractive summarization model would require retraining or deploying a new model, contradicting the constraint of not changing the model, and extractive methods cannot generate new text to explicitly mention dosages or allergies if they are not present in the original text. Option C is wrong because implementing a RAG system to pull from external drug databases adds complexity, cost, and latency, and does not address the core issue of missing details from the patient's own records; the problem is about extracting existing information, not supplementing with external data.

882

Multi-Selecthard

A company is fine-tuning a Gemma model using Vertex AI. They observe that the model overfits. Which TWO actions should they take to mitigate overfitting?

Select 2 answers

A.Use a larger batch size

B.Increase the number of training epochs

C.Use more diverse data

D.Reduce the learning rate

E.Add dropout during fine-tuning

AnswersC, E

More diverse training data reduces overfitting to narrow patterns.

Why this answer

Option C is correct because introducing more diverse data helps the model generalize better by exposing it to a wider variety of patterns, reducing the risk of memorizing noise from a limited dataset. Option E is correct because dropout randomly deactivates a fraction of neurons during fine-tuning, which prevents co-adaptation and acts as a regularization technique to combat overfitting in transformer-based models like Gemma.
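To make the dropout mechanism concrete, here is a minimal pure-Python sketch of inverted dropout on a list of activations. It illustrates the general technique only; it is not how Vertex AI or Gemma implements it, and the function signature is an assumption.

```python
import random

def dropout(activations, rate, training=True, rng=random):
    """Inverted dropout: zero each value with probability `rate`, rescale the survivors.

    Rescaling by 1 / (1 - rate) keeps the expected activation unchanged,
    so nothing needs to be adjusted at inference time. Requires 0 <= rate < 1.
    """
    if not training or rate == 0.0:
        return list(activations)  # dropout is disabled during inference
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# Seeded generator so the run is repeatable.
dropped = dropout([1.0] * 100, rate=0.5, rng=random.Random(0))
```

Because a different random subset of units is silenced on each training step, the network cannot rely on any single unit, which is the regularizing effect the explanation refers to.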

Exam trap

Google Cloud often tests the misconception that reducing the learning rate or increasing batch size are universal fixes for overfitting, when in fact these hyperparameters primarily affect optimization dynamics rather than regularization.

883

MCQeasy

A retail company wants to integrate generative AI into its customer service chatbot to handle routine inquiries. They have a limited budget and want to launch quickly. Which strategy is most appropriate?

A.Partner with a generative AI vendor for a custom solution

B.Use pre-trained models via Google Cloud's Generative AI Studio API

C.Fine-tune an open-source model on their customer service logs

D.Build a custom LLM from scratch using the company's own data

AnswerB

Using pre-trained models via API is cost-effective and fast to implement.

Why this answer

Option B is correct because using pre-trained models via Google Cloud's Generative AI Studio API allows the company to leverage existing, powerful models without the high cost and time investment of custom development or fine-tuning. This approach enables rapid deployment on a limited budget by simply integrating the API into their chatbot, handling routine inquiries effectively without requiring extensive machine learning expertise or infrastructure.

Exam trap

Google Cloud often tests the misconception that fine-tuning or custom models are always better for domain-specific tasks, but the trap here is that for routine inquiries with limited budget and time, pre-trained APIs offer the fastest and most cost-effective solution without sacrificing quality.

How to eliminate wrong answers

Option A is wrong because partnering with a generative AI vendor for a custom solution typically involves significant upfront costs, long development cycles, and vendor lock-in, which contradicts the company's limited budget and need for quick launch. Option C is wrong because fine-tuning an open-source model on customer service logs requires substantial computational resources, data preparation, and machine learning expertise, making it slower and more expensive than using a pre-trained API. Option D is wrong because building a custom LLM from scratch is extremely resource-intensive, requiring massive datasets, specialized hardware, and months of training, which is impractical for a company with limited budget and a need for speed.

884

Multi-Selecteasy

A developer is using Vertex AI Studio to test a text generation model. Which two actions can be performed in Vertex AI Studio? (Choose TWO)

Select 2 answers

A.Manage IAM roles

B.Monitor model cost

C.Create a dataset

D.Deploy a model to an endpoint

E.Fine-tune a model

AnswersD, E

From Studio, you can deploy a fine-tuned model directly to an endpoint.

Why this answer

Option D is correct because Vertex AI Studio provides a direct interface to deploy a text generation model to an endpoint for serving predictions. This action is a core capability of the platform, allowing developers to test and then operationalize their models without leaving the Studio environment.

Exam trap

Google Cloud often tests the distinction between actions performed within a specific tool (Vertex AI Studio) versus broader platform capabilities (IAM, cost monitoring, dataset creation) to see if candidates understand the scope and purpose of each service.

885

MCQmedium

A research team wants to document the intended uses, limitations, and ethical considerations of their newly trained image classification model. Which Google Cloud tool should they use?

A.Explainable AI SDK

B.Datasheets for Datasets

C.Model Card Toolkit

D.What-If Tool

AnswerC

Model Card Toolkit is designed to create Model Cards that document model details, intended use, and ethical considerations.

Why this answer

The Model Card Toolkit is specifically designed to document the intended uses, limitations, and ethical considerations of machine learning models, including image classification models. It generates a structured model card that provides transparency and accountability, which aligns directly with the team's goal of documenting these aspects.

Exam trap

Google Cloud often tests the distinction between tools that explain individual predictions (Explainable AI SDK) and tools that document the model's overall purpose and limitations (Model Card Toolkit), causing candidates to confuse local interpretability with global documentation.

How to eliminate wrong answers

Option A is wrong because the Explainable AI SDK focuses on providing feature attributions and explanations for individual predictions, not on documenting the model's intended uses, limitations, or ethical considerations. Option B is wrong because Datasheets for Datasets is a tool for documenting datasets, not models; it covers dataset characteristics, collection methods, and biases, but does not address model-level documentation. Option D is wrong because the What-If Tool is an interactive visualization tool for exploring model behavior and fairness across different slices of data, but it does not generate a static documentation artifact like a model card.

886

MCQhard

A company is deploying a chatbot using Vertex AI and wants to ensure that the model's responses are grounded in Google Search results to reduce hallucinations. Which feature should they enable?

A.Vertex AI Agent Builder

B.Retrieval-Augmented Generation (RAG) with internal documents

C.Gemini API with custom fine-tuning

D.Vertex AI Search for grounding

AnswerD

Vertex AI provides Google Search grounding to use search results as a knowledge source.

Why this answer

Grounding with Google Search in Vertex AI lets the model retrieve up-to-date information from Google Search and base its answers on it. RAG with internal documents (Option B) grounds on private data, not web search. Vertex AI Agent Builder (Option A) includes grounding but is a product for building agents, not the grounding feature itself.

887

MCQeasy

An e-commerce company is using a generative AI model to recommend products. They notice that the recommendations are often irrelevant. What is the most likely cause?

A.Using an outdated model version

B.Incorrect regional endpoint configuration

C.Inadequate prompt engineering

D.Overfitting on training data

AnswerC

The model's output quality heavily depends on the prompt; poor prompts lead to irrelevant responses.

Why this answer

Inadequate prompt engineering is the most likely cause because generative AI models rely heavily on the quality and specificity of the input prompt to produce relevant outputs. If the prompts used to generate product recommendations are vague, poorly structured, or lack context (e.g., not including user preferences or historical behavior), the model will return generic or irrelevant suggestions. This is a common failure point in recommendation systems where the prompt acts as the primary interface for steering model behavior.

Exam trap

Google Cloud often tests the misconception that model performance issues are always due to training data or model version problems, when in fact prompt engineering is the most immediate and common cause of output irrelevance in generative AI systems.

How to eliminate wrong answers

Option A is wrong because using an outdated model version may affect performance or feature availability, but it does not directly cause irrelevant recommendations; the model would still generate outputs consistent with its training, and relevance is more tied to prompt quality. Option B is wrong because incorrect regional endpoint configuration would cause connectivity or latency issues (e.g., API timeouts or routing errors), not irrelevant content generation; the model's output relevance is independent of the endpoint's geographic location. Option D is wrong because overfitting on training data would cause the model to memorize specific patterns and perform poorly on new or diverse inputs, but in a recommendation context, overfitting typically leads to overly narrow or repetitive suggestions, not broadly irrelevant ones; the primary issue with irrelevant outputs is prompt misalignment, not training data memorization.

888

MCQmedium

A financial services firm is using a foundation model on Vertex AI to generate investment summaries from quarterly reports. The summaries are accurate but often miss key financial metrics and trends. The team cannot afford to fine-tune the model frequently. Which technique should they use to improve the completeness and relevance of the summaries without modifying the model?

A.Increase temperature to 0.9 to encourage more creative outputs.

B.Provide three few-shot examples in the prompt that highlight the desired metrics.

C.Set stop sequences to ['\n\n'] to ensure the model finishes each paragraph.

D.Lower top_p to 0.5 to reduce the sampling pool.

AnswerB

Few-shot examples condition the model to replicate the structure and content of the examples.

Why this answer

Option B is correct because few-shot prompting provides the model with concrete examples of desired output structure and content, guiding it to include key financial metrics and trends without retraining. This technique leverages in-context learning, where the model generalizes from the examples in the prompt to produce more complete and relevant summaries, while avoiding the cost and latency of fine-tuning.
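A few-shot prompt of this kind could be assembled as sketched below. The helper name, the instruction text, and the example reports with their figures are all made up for illustration.

```python
def build_few_shot_prompt(examples, report):
    """Place worked (report, summary) pairs before the new report so the model copies the pattern."""
    parts = [
        "Summarize the quarterly report. Always state revenue, margin, "
        "and the trend versus the prior quarter."
    ]
    for ex_report, ex_summary in examples:
        parts.append(f"Report:\n{ex_report}\nSummary:\n{ex_summary}")
    # End with an open 'Summary:' slot for the model to complete.
    parts.append(f"Report:\n{report}\nSummary:")
    return "\n\n".join(parts)

# Made-up examples; real ones would come from summaries the team already approves of.
EXAMPLES = [
    ("Q1 text ...", "Revenue: $10M (+5% QoQ). Margin: 30%. Trend: improving."),
    ("Q2 text ...", "Revenue: $11M (+10% QoQ). Margin: 31%. Trend: improving."),
    ("Q3 text ...", "Revenue: $9M (-18% QoQ). Margin: 27%. Trend: declining."),
]
prompt = build_few_shot_prompt(EXAMPLES, "Q4 text ...")
```

Each example adds input tokens to every request, so three short, representative examples are usually a better trade-off than many long ones.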

Exam trap

The trap here is that candidates confuse hyperparameter tuning (temperature, top_p) with prompt engineering, assuming that increasing randomness or restricting token selection will improve output quality, when in fact few-shot examples directly teach the model the desired output structure without modifying the model.

How to eliminate wrong answers

Option A is wrong because increasing temperature to 0.9 encourages randomness and creativity, which would likely make summaries less focused and more prone to missing key metrics, not more complete. Option C is wrong because setting stop sequences to ['\n\n'] only controls when the model stops generating text, but does not influence the content or inclusion of specific financial metrics within the output. Option D is wrong because lowering top_p to 0.5 reduces the sampling pool to only the most likely tokens, which can make outputs more repetitive and less likely to include diverse or specific metrics, not improve completeness.

889

Multi-Selectmedium

A healthcare chatbot must avoid hallucinations. Which TWO techniques should the team implement? (Choose two.)

Select 2 answers

A.Set frequency penalty to 0.0

B.Use chain-of-thought prompting

C.Use higher temperature

D.Increase top_k to 50

E.Enable grounding with a knowledge base

AnswersB, E

Encourages step-by-step reasoning, reducing errors.

Why this answer

Chain-of-thought prompting (B) reduces hallucinations by forcing the model to reason step-by-step, which improves factual accuracy and consistency in complex tasks like medical triage. Enabling grounding with a knowledge base (E) anchors the model's output to verified external data, directly preventing fabrication by restricting responses to retrieved facts.

Exam trap

Google Cloud often tests the misconception that increasing randomness (temperature, top_k) or disabling penalties improves output quality, when in fact these parameters increase hallucination risk in safety-critical applications like healthcare.

890

MCQ · medium

A developer is using the Vertex AI Gemini API to generate product descriptions. They get a 400 error 'INVALID_ARGUMENT: The model's maximum input token limit is 8192.' What is the most likely issue?

A. The prompt is too long

B. The API key is invalid

C. The output tokens are too high

D. The model is not available in the region

Answer: A

The error explicitly states the input token limit is exceeded.

Why this answer

The 400 error 'INVALID_ARGUMENT: The model's maximum input token limit is 8192' explicitly indicates that the combined token count of the prompt (system instructions, user input, and any conversation history) exceeds the 8192-token context window of the Gemini model being used. This is a hard limit enforced by the Vertex AI Gemini API, and the error is triggered before any generation begins. Therefore, the most likely issue is that the prompt is too long.

Exam trap

The trap here is that candidates confuse input token limits with output token limits or general API authentication errors, but the specific error message 'maximum input token limit' directly points to prompt length as the root cause.

How to eliminate wrong answers

Option B is wrong because an invalid API key would result in a 401 Unauthorized or 403 Forbidden error, not a 400 INVALID_ARGUMENT error related to token limits. Option C is wrong because the error message specifically mentions 'input token limit', not output tokens; output token limits are enforced separately (e.g., via max_output_tokens parameter) and would produce a different error. Option D is wrong because model availability in a region would cause a 404 or 403 error (e.g., 'Model not found' or 'Permission denied'), not a token-limit-related INVALID_ARGUMENT error.
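
One way to avoid this error is to check the prompt size before sending the request. The sketch below drops the oldest conversation turns until the combined prompt fits under the 8192-token limit quoted in the error message. It is only an illustration: `estimate_tokens` is a crude whitespace-based stand-in for a real tokenizer, and `trim_history` is a hypothetical helper, not part of any Google SDK.

```python
MAX_INPUT_TOKENS = 8192  # limit quoted in the error message from the question


def estimate_tokens(text):
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def trim_history(system, history, user_input, limit=MAX_INPUT_TOKENS):
    """Drop the oldest history turns until the combined prompt fits under the limit."""
    kept = list(history)
    fixed = estimate_tokens(system) + estimate_tokens(user_input)
    if fixed > limit:
        raise ValueError("system instructions plus user input alone exceed the limit")
    while kept and fixed + sum(estimate_tokens(turn) for turn in kept) > limit:
        kept.pop(0)  # the oldest turn is removed first
    return kept


# Three hypothetical history turns of 6000, 3000, and 100 words.
history = ["word " * 6000, "word " * 3000, "word " * 100]
kept = trim_history("You write product descriptions.", history, "Describe a kettle.")
print(len(kept), "turns kept")
```

In a real pipeline the word count would be replaced by the model's own token count, since tokens and words rarely match one to one.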

891

MCQ · medium

A healthcare organization needs to run ML models on patient data stored in BigQuery while ensuring data never leaves the database. Which service allows them to create and execute ML models directly in BigQuery SQL?

A. Cloud Functions

B. Vertex AI Prediction

C. Vertex AI Workbench

D. BigQuery ML

Answer: D

BigQuery ML allows users to create, train, and evaluate ML models using SQL queries directly on BigQuery data.

Why this answer

BigQuery ML enables SQL-based ML model creation and execution directly on data in BigQuery, meeting the requirement without data movement.
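
To make the "SQL-based" point concrete, the sketch below builds the two kinds of statement BigQuery ML relies on: `CREATE MODEL` for training and `ML.PREDICT` for inference. The helper functions and the dataset, table, and column names (`clinic.visits`, `readmitted`, and so on) are hypothetical. The code only prints the SQL; it would still have to be submitted through the BigQuery console or a client library.

```python
def build_create_model_sql(model, source_table, label_col, feature_cols):
    """Build a BigQuery ML CREATE MODEL statement that trains on source_table in place."""
    features = ", ".join(feature_cols)
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'logistic_reg', input_label_cols = ['{label_col}']) AS\n"
        f"SELECT {features}, {label_col}\n"
        f"FROM `{source_table}`"
    )


def build_predict_sql(model, source_table, feature_cols):
    """Build an ML.PREDICT query that scores rows without exporting them."""
    features = ", ".join(feature_cols)
    return (
        f"SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model}`,\n"
        f"  (SELECT {features} FROM `{source_table}`))"
    )


# Hypothetical dataset, table, and column names.
FEATURES = ["age", "length_of_stay"]
train_sql = build_create_model_sql(
    "clinic.readmission_model", "clinic.visits", "readmitted", FEATURES
)
predict_sql = build_predict_sql("clinic.readmission_model", "clinic.visits", FEATURES)
print(train_sql)
print(predict_sql)
```

Both statements run inside BigQuery, which is why no data movement is needed.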

892

MCQ · easy

A company is choosing between Google's Gemini API and an open-source model. Which factor is most important for a business with limited ML expertise?

A. Ease of integration and availability of support

B. Model parameter count

C. Cost per token

D. Community size

Answer: A

Limited ML expertise means the team needs a solution that is easy to integrate and comes with reliable support.

Why this answer

For a business with limited ML expertise, ease of integration and availability of support are paramount because they reduce the need for in-house machine learning engineering talent. Google's Gemini API offers managed infrastructure, pre-built SDKs, and enterprise-grade support (e.g., SLA-backed uptime, dedicated account management), which directly lowers the barrier to entry and operational risk. In contrast, open-source models require significant expertise for deployment, scaling, and troubleshooting, making them unsuitable for teams without deep ML skills.

Exam trap

Google Cloud often tests the misconception that technical metrics like parameter count or cost per token are the primary decision factors, when in reality, for a non-expert team, operational simplicity and vendor support are the critical success factors that determine whether a GenAI project can be delivered at all.

How to eliminate wrong answers

Option B is wrong because model parameter count (e.g., 7B vs 175B) is a technical metric that does not directly address the business's lack of ML expertise; a larger parameter count can actually increase complexity and resource requirements, making it harder to integrate without expert knowledge. Option C is wrong because cost per token, while important for budgeting, is secondary to the ability to actually use the model; without easy integration and support, even a low-cost model can become expensive due to hidden engineering costs and downtime. Option D is wrong because community size, though helpful for troubleshooting, does not provide the structured, guaranteed support and SLAs that a business with limited ML expertise needs; community forums lack accountability and may not offer timely or accurate solutions for production-critical issues.

893

MCQ · hard

What is the most likely cause of the error?

A. The predict schema must be stored in the same bucket as the model artifacts and referenced without the full gs:// URI

B. The display name contains a hyphen which is not allowed

C. The container image URI is incorrect

D. The region us-central1 does not support TensorFlow models

Answer: A

The schema should be a relative path within the artifact URI.

Why this answer

The error occurs because the Vertex AI Predict schema must be stored in the same Cloud Storage bucket as the model artifacts, and when referenced in the model upload request, it should use a relative path (without the full `gs://` URI). Using the full URI causes a parsing failure, as Vertex AI expects the schema to be co-located with the model artifacts for validation and deployment.

Exam trap

Google Cloud often tests the nuance that Vertex AI expects schema files to be co-located with model artifacts and referenced without the full `gs://` URI, causing candidates to incorrectly assume the error is due to region limitations or container image issues.

How to eliminate wrong answers

Option B is wrong because hyphens are allowed in display names for Vertex AI models; the constraint is on the model ID (auto-generated) and display names can contain hyphens, underscores, and alphanumeric characters. Option C is wrong because the container image URI is syntactically correct and points to a valid Vertex AI pre-built serving image for TensorFlow; the error is not related to the container URI format. Option D is wrong because us-central1 fully supports TensorFlow models on Vertex AI; it is one of the primary regions for AI Platform and Vertex AI model deployment.

894

MCQ · easy

Which Google Cloud AI service provides a unified ML platform for building, deploying, and managing ML models in production?

A. AI Platform

B. Vertex AI

C. BigQuery ML

D. Cloud AutoML

Answer: B

Vertex AI is the single platform for all ML activities, including training, deployment, and management.

Why this answer

Vertex AI is the correct answer because it is Google Cloud's unified ML platform that integrates data engineering, data science, and ML engineering workflows into a single service. It provides end-to-end capabilities for building, training, deploying, and managing ML models in production, including AutoML, custom training, model registry, and MLOps features like continuous evaluation and monitoring.

Exam trap

The trap here is that candidates often confuse AI Platform (the legacy service) with Vertex AI, not realizing that Vertex AI is the successor that consolidates all ML capabilities into a single, unified platform, making AI Platform a deprecated option in the context of current Google Cloud ML strategy.

How to eliminate wrong answers

Option A is wrong because AI Platform (now legacy) was the predecessor to Vertex AI; it lacked the unified integration of AutoML and custom training under a single API and did not provide the same level of MLOps tooling, such as model monitoring and feature store. Option C is wrong because BigQuery ML is a service that allows users to create and execute ML models using SQL queries directly in BigQuery, but it is not a unified ML platform for building, deploying, and managing models in production—it is limited to in-database ML and does not support custom training frameworks or production deployment pipelines. Option D is wrong because Cloud AutoML is a subset of Vertex AI that focuses on training high-quality models with minimal effort using Google's transfer learning and neural architecture search, but it does not provide the full platform capabilities for custom model development, deployment, and management that Vertex AI offers.

895

MCQ · hard

A team configures a Vertex AI prediction request as shown. Users report that the model sometimes produces incoherent or off-topic responses despite moderate settings. What is the most likely cause?

A. The temperature is too high for coherent responses.

B. The maxOutputTokens is too low.

C. The safety threshold blocks too much content.

D. The topK value is too low.

Answer: A

High temperature introduces randomness, reducing coherence.

Why this answer

A is correct because temperature controls the randomness of token selection; a high temperature (e.g., >0.8) increases the probability of sampling low-probability tokens, leading to incoherent or off-topic responses even when the other settings are moderate. Default temperatures vary by model, so raising the value without careful tuning can cause semantic drift.

Exam trap

Google Cloud often tests the misconception that maxOutputTokens or safety thresholds cause incoherence, when in fact temperature is the primary sampling parameter controlling output randomness and topical consistency.

How to eliminate wrong answers

Option B is wrong because maxOutputTokens limits the length of the response, not its coherence; a low value would truncate output prematurely but not cause off-topic or incoherent content. Option C is wrong because safety thresholds block harmful or sensitive content entirely, returning a default refusal or empty response, not incoherent or off-topic text. Option D is wrong because topK limits the number of highest-probability tokens considered at each step; a low topK (e.g., 1) makes output more deterministic and focused, not less coherent.
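
The effect of temperature and topK can be seen with a small numerical sketch. The code below applies temperature scaling to a softmax over four hypothetical token scores, then applies top-k filtering. It illustrates the general technique only; it is not Vertex AI's implementation, and the logits are invented for the example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize; the rest get zero probability."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


logits = [4.0, 2.0, 1.0, 0.0]  # hypothetical scores for four candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

# With k=1 only the single most likely token can ever be chosen.
print(top_k_filter(softmax_with_temperature(logits, 1.0), 1))
```

As the temperature rises, probability mass moves from the top token toward the unlikely ones, which is where off-topic output comes from. A low topK does the opposite and narrows the choice.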

896

Multi-Select · medium

A company wants to use a generative AI model to automatically generate marketing content. They are concerned about copyright infringement if the model reproduces copyrighted text. Which TWO strategies should they employ? (Choose two.)

Select 2 answers

A. Implement output filters to block known copyrighted phrases

B. Create a Model Card documenting the training data sources

C. Require human review of every generated piece of content

D. Use SynthID to watermark all generated content

E. Ensure training data does not include copyrighted material without proper licensing

Answers: A, E

Filters can catch and block direct reproduction of copyrighted text.

Why this answer

Ensuring that training data excludes copyrighted material without proper licensing (E) addresses the risk at its source. Output filtering (A) can prevent reproduction of known copyrighted phrases. Watermarking with SynthID (D) marks content as AI-generated but does not prevent infringement.

Model Cards (B) document training data sources but do not prevent reproduction. Human review of every piece (C) is costly and is not a direct prevention strategy.
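
A minimal sketch of the output-filter idea follows, assuming the company maintains its own list of protected phrases. The phrases, the draft text, and the function names are invented for illustration. A production filter would likely need fuzzy matching and a much larger list.

```python
import re


def normalize(text):
    """Lowercase and collapse punctuation and whitespace so trivial edits do not evade the filter."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def find_blocked_phrases(output, blocklist):
    """Return the blocklist entries that appear in the generated output."""
    haystack = f" {normalize(output)} "
    return [phrase for phrase in blocklist if f" {normalize(phrase)} " in haystack]


# Hypothetical protected phrases, made up for this example.
BLOCKLIST = ["taste the midnight sun", "where silver rivers sing"]

draft = "Summer is here. Taste the Midnight Sun, today!"
hits = find_blocked_phrases(draft, BLOCKLIST)
if hits:
    print("Blocked, contains:", hits)
else:
    print("Draft passed the filter")
```

A filter like this catches verbatim reproduction only, which is why it is paired with controls on the training data.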

897

MCQ · hard

A startup is building a generative AI legal document assistant for small law firms. They want to ensure that the model's outputs are accurate and can be traced back to specific legal statutes. Which approach best supports this requirement?

A. Fine-tune the model on a large corpus of legal documents

B. Apply a high temperature setting to encourage diverse outputs

C. Use a model larger than 70B parameters

D. Use a RAG architecture that retrieves relevant statutes and includes them as citations in the model's response

Answer: D

RAG with citations provides traceability to specific sources.

Why this answer

Option D is correct because Retrieval-Augmented Generation (RAG) architecture retrieves specific legal statutes from a trusted external knowledge base and includes them as citations in the model's response. This ensures both accuracy (by grounding outputs in verifiable sources) and traceability (by providing direct references to the statutes used). Fine-tuning alone cannot guarantee that the model will cite specific statutes correctly, as it may hallucinate or misremember legal references.

Exam trap

Google Cloud often tests the misconception that larger models or fine-tuning alone can guarantee factual accuracy and traceability, when in fact retrieval-augmented generation is required for verifiable, source-grounded outputs.

How to eliminate wrong answers

Option A is wrong because fine-tuning on a large corpus of legal documents improves general legal knowledge but does not provide a mechanism to retrieve and cite specific, up-to-date statutes; the model may still hallucinate or produce outdated references. Option B is wrong because applying a high temperature setting increases randomness and diversity in outputs, which reduces accuracy and makes traceability to specific statutes impossible. Option C is wrong because using a model larger than 70B parameters does not inherently improve the ability to cite specific statutes; larger models can still hallucinate and lack a retrieval mechanism for grounded citations.
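
The retrieve-then-cite pattern can be sketched in a few lines. The example below ranks a tiny in-memory set of statutes by word overlap with the question and builds a prompt that asks for citations by statute ID. Everything here is an assumption for illustration: the statutes are invented, word overlap stands in for the vector search a real system would probably use, and no model is called.

```python
def tokenize(text):
    """Split text into a set of lowercase words with surrounding punctuation removed."""
    return {word.strip(".,;:?!").lower() for word in text.split()}


def retrieve(query, statutes, top_n=2):
    """Rank statutes by word overlap with the query (a stand-in for vector search)."""
    q = tokenize(query)
    ranked = sorted(statutes.items(), key=lambda kv: len(q & tokenize(kv[1])), reverse=True)
    return [(sid, text) for sid, text in ranked[:top_n] if q & tokenize(text)]


def build_grounded_prompt(query, passages):
    """Put retrieved passages in the prompt and ask for citations by statute ID."""
    sources = "\n".join(f"[{sid}] {text}" for sid, text in passages)
    return (
        "Answer using only the sources below and cite the statute ID "
        "in square brackets after each claim.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}\nAnswer:"
    )


# Hypothetical statutes, invented for illustration only.
STATUTES = {
    "S-101": "A residential lease longer than one year must be in writing.",
    "S-202": "A landlord must return the security deposit within 30 days.",
    "S-303": "Commercial signage requires a municipal permit.",
}

query = "When must a landlord return the security deposit?"
passages = retrieve(query, STATUTES)
print(build_grounded_prompt(query, passages))
```

Because the statute IDs travel with the retrieved text, each claim in the response can be checked against its source, which is the traceability the question asks for.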

898

MCQ · hard

A healthcare company is using a fine-tuned version of PaLM 2 on Vertex AI to generate clinical notes from doctor-patient conversations. The model was fine-tuned on a dataset of 10,000 de-identified transcripts and corresponding notes. During testing, the generated notes are grammatically correct and well-structured, but they often contain subtle inaccuracies: for example, they might mention a medication that was not discussed, or omit a key symptom. The team has already tried increasing the training epochs and adjusting learning rates, with minimal improvement. They need a solution that can be implemented quickly to improve factual accuracy without retraining the entire model. The team has access to a large archive of verified clinical notes and a small set of recent conversation-to-note pairs that have been manually reviewed and corrected. The inference pipeline currently uses a single call to the model with the conversation transcript as input. What should the team do?

A. Implement retrieval-augmented generation (RAG) by retrieving similar verified notes from the archive and providing them as context in the prompt.

B. Decrease the temperature to 0.1 to reduce randomness and force the model to stick to the input.

C. Use prompt engineering to instruct the model to only include information explicitly mentioned in the conversation.

D. Add a human-in-the-loop step to review and correct every generated note before use.

Answer: A

RAG grounds the generation in factual examples, directly reducing inaccuracies without retraining.

Why this answer

Option A is correct because retrieval-augmented generation (RAG) directly addresses the core issue of factual inaccuracy without retraining. By retrieving verified clinical notes similar to the current conversation from the archive and injecting them as context in the prompt, the model gains access to ground-truth examples that anchor its output to factual details. This approach leverages the team's existing archive and small set of corrected pairs to provide relevant, accurate context, improving precision without modifying the model's weights.

Exam trap

The trap here is that candidates often assume factual inaccuracy is solely a randomness issue (temperature) or a prompt instruction problem, overlooking that the model's parametric knowledge is insufficient and needs external grounding via retrieval augmentation.

How to eliminate wrong answers

Option B is wrong because decreasing temperature to 0.1 reduces randomness but does not fix factual inaccuracies stemming from the model's training data or lack of context; it may actually cause the model to become overly deterministic and repeat hallucinations from its fine-tuning. Option C is wrong because prompt engineering to instruct the model to only include explicitly mentioned information is a superficial fix that cannot overcome the model's tendency to hallucinate or omit details when the training data or fine-tuning process has embedded those inaccuracies; it lacks the grounding provided by external verified data. Option D is wrong because adding a human-in-the-loop step to review every note is a manual, non-scalable solution that does not improve the model's output quality at inference time and fails to address the root cause of factual inaccuracy; reviewing every note by hand also works against the goal of a quick, automated improvement.

899

MCQ · easy

What is the primary purpose of Google's Datasheets for Datasets?

A. To serve as a legal contract for data sharing

B. To list all models trained on the dataset

C. To document the dataset's creation, composition, and intended use

D. To provide a template for labeling data

Answer: C

Datasheets provide a structured format for documenting datasets.

Why this answer

Datasheets for Datasets are designed to document the motivation, composition, collection process, and other details of a dataset to promote transparency and reproducibility.

900

MCQ · hard

A hospital wants to summarize patient-doctor conversations into structured clinical notes using GenAI. They need high accuracy and must avoid hallucinated medical information. Which combination of techniques is BEST?

A. Use RAG with medical textbooks and a low temperature setting

B. Fine-tune a medical-specific model on de-identified transcripts and use a structured output format in the prompt

C. Use a large model with zero-shot prompting and post-process the output with rule-based checks

D. Use a Gemini model with a high temperature setting to encourage creativity

Answer: B

Fine-tuning on medical data improves accuracy; structured output reduces hallucination.

Why this answer

Fine-tuning on domain transcripts, combined with a structured output format (e.g., a JSON schema) and a strict prompt, helps the model produce accurate, consistently formatted notes and reduces hallucinations.
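
On the consuming side, a structured format also lets the pipeline reject malformed output before it reaches a clinician. The sketch below parses a model response as JSON and checks it against a fixed set of fields. The field names and the sample response are hypothetical. A check like this catches structural problems only; it cannot tell whether a listed medication was actually discussed.

```python
import json

# Hypothetical note schema: field name -> expected Python type.
REQUIRED_FIELDS = {
    "chief_complaint": str,
    "symptoms": list,
    "medications": list,
    "plan": str,
}


def parse_clinical_note(raw):
    """Parse model output as JSON and check it has exactly the expected fields and types."""
    note = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed output
    if not isinstance(note, dict):
        raise ValueError("note must be a JSON object")
    missing = REQUIRED_FIELDS.keys() - note.keys()
    extra = note.keys() - REQUIRED_FIELDS.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} unexpected={sorted(extra)}")
    for field, expected in REQUIRED_FIELDS.items():
        if not isinstance(note[field], expected):
            raise ValueError(f"{field} should be {expected.__name__}")
    return note


# Hypothetical model response used to exercise the validator.
raw_response = (
    '{"chief_complaint": "cough", "symptoms": ["cough", "fever"], '
    '"medications": [], "plan": "rest and fluids"}'
)
note = parse_clinical_note(raw_response)
print(note["symptoms"])
```

Rejecting unexpected fields as well as missing ones keeps the model from adding free-text sections that bypass the schema.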
