Knowledge + Practice

Google Cloud Generative AI Leader Generative AI Leader (Generative AI Leader) — Questions 676–750

997 questions total · 14pages · All types, answers revealed

Take a mock exam Exam hub

Page 10 of 14

676

Multi-Selecteasy

A data scientist is using Vertex AI's Generative AI Studio to experiment with prompt designs. Which THREE features are available in the studio?

Select 3 answers

A.Grounding configuration

B.Model parameter adjustments (temperature, top_p, etc.)

C.Automated hyperparameter tuning

D.Prompt templates

E.A/B testing of multiple prompt versions

AnswersA, B, D

Grounding can be set up in the studio.

Why this answer

Option A is correct because Vertex AI Generative AI Studio includes a grounding configuration feature that allows you to connect prompts to external data sources (e.g., Vertex AI Search, BigQuery, or private datasets) to ground responses in factual, up-to-date information, reducing hallucinations. This is a core capability for enterprise use cases requiring retrieval-augmented generation (RAG).

Exam trap

Cisco often tests the distinction between features available in Generative AI Studio (prompt design, model parameters, grounding, templates) versus those in Vertex AI Training or Prediction (hyperparameter tuning, A/B testing), so candidates mistakenly assume all ML workflow features are present in the studio.

Full explanation →

677

MCQeasy

Which Vertex AI feature allows a business user to explore and test over 300 foundation models from Google and partners, including Gemma, Llama, and Claude, without writing code?

A.Vertex AI Pipelines

B.Vertex AI Studio

C.Vertex AI Agent Builder

D.Model Garden

AnswerD

Model Garden offers over 300 foundation models for exploration, testing, and deployment.

Why this answer

Model Garden is the central hub in Vertex AI that provides access to a wide variety of foundation models for exploration and testing.

Full explanation →

678

MCQmedium

Refer to the exhibit. A data scientist runs this command to upload a custom model to Vertex AI. What is the primary purpose of the --container-image-uri flag?

A.To indicate the model artifact location

B.To set the training container

C.To specify the base image for model serving

D.To define the prediction container

AnswerC

Defines the serving environment for predictions.

Why this answer

The --container-image-uri flag in the `gcloud ai models upload` command specifies the custom container image that Vertex AI will use to serve predictions. This is the base image for model serving, not for training, because Vertex AI uses this image to create the serving environment that hosts the model and handles prediction requests.

Exam trap

The trap here is that candidates confuse the --container-image-uri flag with the training container (Option B) because both involve custom containers, but Vertex AI separates training and serving containers, and this flag is exclusively for serving.

How to eliminate wrong answers

Option A is wrong because the model artifact location is specified via the --artifact-uri flag, not --container-image-uri. Option B is wrong because the training container is set during model training (e.g., via `gcloud ai custom-jobs`), not during model upload; --container-image-uri is for serving. Option D is wrong because while the flag does define the container used for predictions, the correct technical term in Vertex AI is 'serving container' or 'prediction container' is a misnomer; the flag sets the base image for the serving container, not the prediction container itself (which is built from this base image).

Full explanation →

679

MCQhard

A company deployed a large language model on Vertex AI using the configuration shown in the exhibit. During peak usage, users report high latency. Which change is most likely to improve latency?

A.Remove the accelerator to simplify deployment.

B.Increase minReplicaCount to 3.

C.Switch to a GPU with more memory, such as NVIDIA_TESLA_A100.

D.Change machineType to n1-standard-4 to reduce cost.

AnswerB

More replicas ready at all times reduces cold-start and scaling latency.

Why this answer

Increasing minReplicaCount to 3 ensures that at least three instances of the model are always running and ready to serve requests. This reduces cold-start latency and distributes the load across multiple replicas, directly addressing high latency during peak usage by providing more concurrent serving capacity.

Exam trap

Cisco often tests the misconception that upgrading hardware (GPU memory or type) is the primary fix for latency, when in fact scaling out replicas is the more direct solution for handling concurrent request load.

How to eliminate wrong answers

Option A is wrong because removing the accelerator (GPU/TPU) would force the model to run on CPU, drastically increasing inference latency, especially for large language models. Option C is wrong because switching to a GPU with more memory (NVIDIA_TESLA_A100) does not directly improve latency; it addresses memory capacity issues, not the throughput bottleneck caused by insufficient replicas. Option D is wrong because changing machineType to n1-standard-4 reduces CPU and memory resources, which would likely increase latency rather than improve it, and cost reduction is not the goal here.

Full explanation →

680

MCQmedium

An e-commerce company is using a generative AI model to write product descriptions. They want to ensure that the model does not generate harmful content such as hate speech or violence. Which Google Cloud feature should they configure?

A.Cloud DLP (Data Loss Prevention)

B.Vertex AI Explainable AI

C.Vertex AI Safety Filters

D.Vertex AI Model Monitoring

AnswerC

Safety filters are specifically designed to block harmful content categories in model inputs and outputs.

Why this answer

Google Cloud's Vertex AI provides built-in safety filters that can be configured to block harmful content categories like hate speech, violence, sexual content, and dangerous instructions.

Full explanation →

681

MCQmedium

A financial services firm needs to generate synthetic data for training models while ensuring that no real customer data leaks. Which technique should they use?

A.Using the Vertex AI PII redaction service

B.Using a public foundation model without fine-tuning

C.Data masking before training

D.Differential privacy during fine-tuning

AnswerD

Differential privacy adds noise to protect individual data.

Why this answer

Differential privacy during fine-tuning is the correct technique because it adds calibrated noise to the training process, ensuring that the synthetic data generated does not reveal information about any individual real customer record. This approach provides a formal mathematical guarantee of privacy, making it suitable for generating synthetic data that preserves statistical properties while preventing data leakage. In contrast, other methods like redaction, masking, or using a public model do not inherently prevent the model from memorizing and reproducing sensitive information.

Exam trap

The trap here is that candidates confuse data masking or redaction (which only hide data in the training set) with techniques that prevent model memorization, overlooking that models can still leak sensitive information through inference even when the input data is obfuscated.

How to eliminate wrong answers

Option A is wrong because Vertex AI PII redaction service only removes or obscures personally identifiable information from existing text, but does not generate synthetic data; the underlying real data remains and could still be leaked through model memorization. Option B is wrong because using a public foundation model without fine-tuning does not generate synthetic data specific to the firm's domain; it may produce generic outputs that lack the required statistical fidelity, and it does not provide any privacy guarantee against leaking real customer data. Option C is wrong because data masking before training only obscures fields in the training dataset, but the model can still memorize and reconstruct masked values through inference attacks, especially if the masking is deterministic or reversible.

Full explanation →

682

MCQeasy

A retail company with a large FAQ database wants to build a generative AI customer service chatbot that can answer questions accurately with up-to-date information. Which business strategy should they prioritize?

A.Use retrieval-augmented generation (RAG) with vector search on the FAQ database.

B.Train a new model from scratch using the FAQ data.

C.Fine-tune a foundational model on the entire FAQ dataset.

D.Use a general-purpose language model without any customization.

AnswerA

RAG retrieves current, relevant information from the database, providing accurate and fresh responses without model retraining.

Why this answer

Option A is correct because retrieval-augmented generation (RAG) with vector search allows the chatbot to dynamically retrieve the most relevant, up-to-date FAQ entries from a large database at inference time, grounding the generative model's responses in verified content without requiring retraining. This approach combines the flexibility of a pre-trained language model with the accuracy of real-time information retrieval, ensuring answers reflect the latest FAQ updates.

Exam trap

Google Cloud often tests the misconception that fine-tuning is the best way to inject domain knowledge, but the trap here is that fine-tuning cannot efficiently handle frequently changing data, whereas RAG provides a modular, update-friendly architecture that avoids retraining costs.

How to eliminate wrong answers

Option B is wrong because training a new model from scratch on FAQ data is computationally prohibitive, requires massive datasets and resources, and still cannot guarantee up-to-date answers without frequent retraining. Option C is wrong because fine-tuning a foundational model on the entire FAQ dataset risks catastrophic forgetting of general language capabilities and does not inherently handle dynamic updates; any FAQ change would require re-fine-tuning. Option D is wrong because a general-purpose language model without customization lacks domain-specific knowledge and cannot access the company's proprietary FAQ database, leading to hallucinated or outdated answers.

Full explanation →

683

Multi-Selecthard

Which TWO techniques can help reduce latency for a real-time generative AI application? (Choose two.)

Select 2 answers

A.Use streaming responses to send tokens as generated.

B.Quantize the model to a lower precision.

C.Deploy more model replicas to handle load.

D.Enable prompt caching for repeated queries.

E.Batch multiple user requests together.

AnswersA, B

Streaming eliminates waiting for the full output, reducing perceived latency.

Why this answer

Streaming and model quantization directly reduce response time. Batching is for offline, and more deploy replicas can increase throughput but not necessarily reduce latency for a single request. Prompt caching can help if prompts repeat, but not generally.

Full explanation →

684

MCQmedium

A software development team builds an internal code assistant using a generative model. The assistant writes Python functions that often contain security vulnerabilities such as SQL injection or command injection. The team wants to mitigate these vulnerabilities without adding a manual review step for every code snippet, as that would slow development. They have access to a static analysis security scanner API. Which approach best addresses the vulnerabilities while maintaining developer velocity?

A.Increase top-k sampling to generate a wider variety of code tokens.

B.After each generation, automatically run the code through the static analysis scanner, and if vulnerabilities are found, send the output back to the model for revision with the scanner's feedback.

C.Fine-tune the model on a corpus of secure code examples.

D.Add a system prompt: 'Do not generate code with security vulnerabilities.'

AnswerB

This iterative process catches and corrects security issues without manual intervention, keeping velocity high.

Why this answer

Option B is correct because it creates an automated feedback loop: the static analysis scanner detects vulnerabilities in the generated code, and the model revises the output based on that feedback. This approach directly mitigates security flaws without requiring manual review, preserving developer velocity. It leverages the scanner's precise, rule-based detection to iteratively improve the model's output, which is more reliable than relying on the model's inherent safety.

Exam trap

Cisco often tests the misconception that a simple prompt or fine-tuning alone can guarantee safety, when in reality, a closed-loop validation with a dedicated security tool is required for reliable mitigation of injection vulnerabilities.

How to eliminate wrong answers

Option A is wrong because increasing top-k sampling broadens token selection, which can actually introduce more unpredictable and insecure code patterns, not reduce vulnerabilities. Option C is wrong because fine-tuning on secure code examples improves the model's baseline but does not guarantee that every generated snippet will be free of vulnerabilities, especially for novel or context-specific injection attacks. Option D is wrong because a system prompt is a weak, non-enforceable instruction; the model lacks true understanding of security and can easily generate vulnerable code despite the prompt, as it does not perform actual validation.

Full explanation →

685

MCQmedium

A company wants to build an internal knowledge base that allows employees to ask questions about company policies in natural language. The knowledge base is stored in a Google Cloud SQL database. Which architecture should they use?

A.Use Gemini API with a prompt that includes all policies

B.Use AutoML Natural Language to classify questions

C.Export Cloud SQL to BigQuery and use BigQuery ML

D.Use Vertex AI Agent Builder with Grounding to connect to Cloud SQL

AnswerD

Agent Builder can ground answers in database content via search or RAG.

Why this answer

Vertex AI Agent Builder with Grounding allows connecting to the database and answering questions based on its content, providing a natural language interface.

Full explanation →

686

MCQeasy

A developer uses a generative AI model with the system instruction shown. The response is correct but very brief. Which parameter adjustment could encourage more detail without losing accuracy?

A.Add 'Provide a detailed response' to the system instruction.

B.Set temperature to 0 to make output deterministic.

C.Set topK to 1 to focus on most likely tokens.

D.Increase temperature to 1.5 to encourage creativity.

AnswerA

System instructions can guide verbosity while maintaining accuracy.

Why this answer

Option A is correct because modifying the system instruction to explicitly request a detailed response directly influences the model's output behavior without altering its underlying probability distribution. This approach preserves accuracy by keeping temperature, topK, and other sampling parameters at their default values, ensuring the model remains faithful to the training data while simply prompting for more elaboration.

Exam trap

Cisco often tests the misconception that increasing randomness (temperature) or restricting token selection (topK) can improve detail, when in fact these parameters trade off accuracy for diversity or determinism, and the correct approach is to use prompt engineering to guide output length and style.

How to eliminate wrong answers

Option B is wrong because setting temperature to 0 makes the model deterministic by always selecting the highest-probability token, which typically results in shorter, repetitive, and less detailed responses—the opposite of what is needed. Option C is wrong because setting topK to 1 restricts token selection to only the single most likely token at each step, which similarly reduces output diversity and detail, often leading to generic or truncated answers. Option D is wrong because increasing temperature to 1.5 increases randomness in token sampling, which can introduce hallucinations, factual errors, or irrelevant content, thereby sacrificing accuracy for creativity.

Full explanation →

687

MCQmedium

An organization wants to use generative AI to automatically generate code snippets from natural language descriptions. The solution must be integrated into their existing CI/CD pipeline on Google Cloud. Which service should they use?

A.Chirp on Vertex AI

B.Codey on Vertex AI

C.Imagen on Vertex AI

D.Gemini 1.5 Flash on Vertex AI

AnswerB

Codey is a family of models for code generation, code chat, and code completion, directly integrated with Vertex AI.

Why this answer

Codey (Codey for code generation) is designed for code generation and is available via Vertex AI. Gemini can also generate code but Codey is specialized. Imagen and Chirp are for images and speech, respectively.

Full explanation →

688

MCQmedium

A healthcare company is building a clinical decision support system using Gemini 1.5 Pro on Vertex AI. They need responses that are highly accurate and comply with medical regulations, including traceability to source documents. They have a large corpus of curated medical guidelines stored in PDFs in Cloud Storage. Their team has experience with both fine-tuning and prompt engineering. Which approach best ensures regulatory compliance and accuracy?

A.Use a combination of grounding to the medical guidelines and prompt engineering with system instructions specifying compliance requirements.

B.Use prompt engineering with system instructions and few-shot examples, but no grounding.

C.Use grounding to the medical guidelines but rely on prompt engineering only for compliance instructions.

D.Fine-tune the model on the medical guidelines corpus to internalize the knowledge.

AnswerA

Grounding ensures traceability to source documents, and prompt engineering enforces regulatory language, together meeting compliance.

Why this answer

Option A is correct because grounding the model to the curated medical guidelines in Cloud Storage ensures responses are directly traceable to source documents, which is critical for medical regulatory compliance. Combining this with system instructions that specify compliance requirements (e.g., HIPAA, FDA guidelines) enforces behavioral constraints without altering the model's weights, maintaining accuracy and auditability.

Exam trap

Cisco often tests the misconception that fine-tuning is the best way to ensure accuracy and compliance for domain-specific tasks, but the trap here is that fine-tuning sacrifices traceability and can introduce staleness, whereas grounding with system instructions preserves source attribution and regulatory compliance.

How to eliminate wrong answers

Option B is wrong because relying solely on prompt engineering without grounding provides no mechanism to enforce traceability to specific source documents, making it impossible to meet medical regulatory requirements for evidence-based responses. Option C is wrong because while grounding provides source traceability, relying on prompt engineering alone for compliance instructions is insufficient; system instructions must be explicitly set in the model configuration to ensure consistent enforcement of regulatory constraints across all interactions. Option D is wrong because fine-tuning the model on the medical guidelines corpus internalizes knowledge into the model weights, which can lead to hallucination or outdated information over time, and critically, it breaks traceability to specific source documents since the model cannot cite exact PDF locations or versions.

Full explanation →

689

Multi-Selecteasy

Which THREE of the following are generative AI modalities supported by Google Cloud services?

Select 3 answers

A.Image generation

B.Code generation

C.Text generation

D.Tabular data generation

E.Speech generation

AnswersA, B, C

Imagen supports image generation.

Why this answer

Option A is correct because Google Cloud's Vertex AI and Imagen APIs provide image generation capabilities, allowing users to create and edit images from text prompts. This is a core generative AI modality supported by Google Cloud services.

Exam trap

Cisco often tests the distinction between core generative AI modalities (text, code, image) and other AI services (like speech or tabular data) that are not considered primary generative AI capabilities in the context of Vertex AI foundation models.

Full explanation →

690

MCQmedium

A company wants to generate high-quality images from text descriptions for their marketing materials. They need the ability to edit specific regions of an image without regenerating the entire image. Which Google Cloud service should they use?

A.Gemini

B.Codey

C.Imagen

D.Veo

AnswerC

Imagen supports text-to-image generation and includes features like inpainting and outpainting for region-specific editing.

Why this answer

Imagen offers inpainting and outpainting capabilities for editing specific regions. Gemini is multimodal but not optimized for image editing. Veo generates video, not still images.

Codey is for code.

Full explanation →

691

MCQhard

A generative AI model for chatbot responses sometimes produces toxic language. The team wants to reduce toxicity without significantly affecting the model's helpfulness. Which approach is best?

A.Increase the temperature parameter

B.Reduce the maximum output tokens

C.Fine-tune with a dataset of non-toxic responses and use RLHF

D.Apply a toxicity classifier as a post-processing filter

AnswerC

Fine-tuning combined with RLHF aligns model behavior effectively.

Why this answer

Fine-tuning with a curated dataset of non-toxic responses directly adjusts the model's weights to reduce the likelihood of generating toxic language, while RLHF (Reinforcement Learning from Human Feedback) further aligns the model with human preferences for helpfulness and safety. This combined approach addresses the root cause of toxicity in the model's behavior without the blunt trade-offs of other methods, preserving the model's utility.

Exam trap

Google Cloud often tests the misconception that post-processing filters (like toxicity classifiers) are sufficient for safety, when in fact they fail to address the model's learned behavior and can degrade helpfulness due to false positives, making fine-tuning with RLHF the superior alignment technique.

How to eliminate wrong answers

Option A is wrong because increasing the temperature parameter increases randomness in token selection, which can actually amplify the probability of generating toxic or nonsensical outputs, not reduce them. Option B is wrong because reducing the maximum output tokens limits response length but does not influence the content or safety of the generated tokens, leaving toxicity unchanged. Option D is wrong because applying a toxicity classifier as a post-processing filter only masks toxic outputs after generation, wasting computational resources and potentially blocking helpful responses that contain false-positive flagged terms, without fixing the underlying model behavior.

Full explanation →

692

MCQeasy

A startup with limited budget wants to quickly test a generative AI use case for personalized email marketing. Which approach minimizes time-to-market and cost?

A.Hire a team of AI researchers to build a solution.

B.Develop a custom model from scratch.

C.Fine-tune a large open-source model on internal data.

D.Use a managed API like the PaLM API with prompt engineering.

AnswerD

Quick to implement, pay-per-use, no infrastructure management.

Why this answer

Option D is correct because using a managed API like the PaLM API with prompt engineering eliminates the need for infrastructure setup, model training, and data preparation. This approach leverages a pre-trained model via a simple REST API call, allowing the startup to iterate on prompts and achieve personalized email content in hours rather than weeks, minimizing both time-to-market and cost.

Exam trap

Google Cloud often tests the misconception that fine-tuning (Option C) is always the fastest and cheapest path for customization, but the trap here is that fine-tuning still requires significant compute and data preparation, whereas prompt engineering on a managed API is truly zero-infrastructure and pay-per-use, making it the optimal choice for a quick, low-cost test.

How to eliminate wrong answers

Option A is wrong because hiring a team of AI researchers is expensive and time-consuming, requiring salaries, compute resources, and months of development, which contradicts the limited budget and quick testing goal. Option B is wrong because developing a custom model from scratch demands vast amounts of labeled data, significant GPU/TPU compute, and deep expertise, making it cost-prohibitive and slow for a rapid proof-of-concept. Option C is wrong because fine-tuning a large open-source model still requires substantial compute for training (e.g., GPU hours for LoRA or full fine-tuning), data curation, and deployment overhead, which exceeds the minimal cost and speed constraints of a quick test.

Full explanation →

693

MCQmedium

A healthcare organization wants to use generative AI to draft email responses to patient inquiries. They need to ensure that the model never generates medical advice and always includes a disclaimer. Where should they enforce these constraints?

A.Select a model from Model Garden that is pre-trained on medical data

B.Fine-tune the model with a dataset that only includes disclaimers

C.Use Grounding with Google Search to restrict knowledge to approved sources

D.Configure safety settings and system instructions in the Vertex AI API call

AnswerD

Safety settings and system instructions provide runtime constraints to prevent unwanted outputs and enforce disclaimers.

Why this answer

Safety settings and system instructions are applied at the API call level to constrain model behavior. Fine-tuning would embed the rules but is heavy. Grounding doesn't enforce constraints.

Model Garden is for model selection.

Full explanation →

694

MCQmedium

A healthcare company wants to use generative AI to summarize patient records but must comply with HIPAA. Which deployment option should they choose?

A.Use Vertex AI on Google Cloud with data residency

B.Use Google Workspace AI

C.Use an on-premises deployment of open-source model

D.Use a third-party API

AnswerC

Full control over data and compliance.

Why this answer

Option C is correct because an on-premises deployment of an open-source model ensures that all patient data remains within the organization's controlled infrastructure, never leaving the local network. This eliminates any risk of data transmission to external cloud services, which is critical for HIPAA compliance where protected health information (PHI) must be safeguarded against unauthorized access or breaches. On-premises solutions allow the organization to implement its own security controls, encryption, and audit trails without relying on a third-party's compliance posture.

Exam trap

The trap here is that candidates assume cloud providers like Google Cloud or AWS are automatically HIPAA-compliant with data residency, but they overlook the shared responsibility model and the need for a BAA, which still exposes data to the provider's infrastructure and potential third-party risks, making on-premises the only option that guarantees full data control.

How to eliminate wrong answers

Option A is wrong because Vertex AI on Google Cloud, even with data residency, still involves data processing on Google's infrastructure, which requires a Business Associate Agreement (BAA) and may not satisfy all HIPAA requirements if the organization cannot fully control data access or auditing. Option B is wrong because Google Workspace AI is a SaaS offering that processes data on Google's servers, and while it can be HIPAA-compliant with a BAA, it introduces shared responsibility and potential data exposure risks that an on-premises solution avoids. Option D is wrong because using a third-party API means sending PHI to an external service, which requires the third-party to be HIPAA-compliant and sign a BAA, but it still exposes data to network transmission and external processing, increasing the attack surface and compliance burden.

Full explanation →

695

MCQhard

A global bank wants to deploy a generative AI assistant for employees across multiple European countries, each with strict data residency laws. Which deployment strategy is most compliant?

A.Deploy separate model instances in each country's cloud region.

B.Use a federated learning approach where data stays on-premises.

C.Deploy a single model in a US region and use data masking.

D.Use a third-party API that processes data outside Europe.

AnswerA

Ensures data never leaves the country, meeting local compliance requirements.

Why this answer

Option A is correct because deploying separate model instances in each country's cloud region ensures that data never crosses national borders, directly complying with strict data residency laws like the GDPR's data localization requirements. This strategy uses regional cloud infrastructure (e.g., AWS eu-central-1, Azure westeurope) to keep both training and inference data within the specific jurisdiction, avoiding any cross-border data transfer.

Exam trap

Google Cloud often tests the misconception that data masking or anonymization alone satisfies data residency laws, but the trap here is that data residency requires the data to physically remain within the jurisdiction, not just be obfuscated.

How to eliminate wrong answers

Option B is wrong because federated learning only keeps training data on-premises, but the model parameters or gradients must still be exchanged with a central server, which can violate data residency if that server is outside the country. Option C is wrong because deploying a single model in a US region and using data masking does not prevent the underlying data from being processed or stored in the US, which violates EU data residency laws like GDPR. Option D is wrong because using a third-party API that processes data outside Europe directly violates data residency requirements, as the data physically leaves the European Economic Area (EEA) without adequate safeguards.

Full explanation →

696

MCQhard

A financial services firm needs to analyze thousands of legal contracts to extract key clauses (e.g., termination, indemnification) with high accuracy. They plan to use GenAI but are concerned about data privacy because contracts contain sensitive information. They want a solution where data is not used for model training and remains in their own Google Cloud project. Which approach best meets these requirements?

A.Access a foundation model through Model Garden and use it via a Colab notebook

B.Deploy a fine-tuned model on Vertex AI and query via Vertex AI API with data residency in the same region

C.Use Gemini for Google Workspace to open each contract in Docs and ask Gemini to summarize clauses

D.Use Vertex AI Studio's prompt library with few-shot examples, ensuring prompts are saved in the project

AnswerB

Vertex AI API allows you to deploy models with data residency controls and contracts prohibit use of data for training. Fine-tuning can be done on compliant infrastructure.

Why this answer

Vertex AI API with data residency controls and no training data sharing provides the privacy and security needed. Vertex AI Studio and Model Garden are not inherently private, and Gemini for Workspace may use data for improvement unless disabled.

Full explanation →

697

Multi-Selecthard

Which THREE factors should you consider when selecting a foundation model from Model Garden? (Choose three.)

Select 3 answers

A.Number of model versions

B.The color of the model card

C.Model size

D.Model accuracy on benchmarks

E.Model license

AnswersC, D, E

Size impacts cost, latency, and deployment requirements.

Why this answer

Model size (C) is a critical factor because it directly impacts computational requirements, latency, and cost. Larger models generally offer higher capability but require more memory and processing power, which influences deployment decisions on Vertex AI.

Exam trap

Cisco often tests candidates' ability to distinguish between superficial UI elements (like card color) and substantive technical criteria (like model size, accuracy, and license) that directly affect deployment and compliance.

Full explanation →

698

MCQhard

A cloud architect is designing a generative AI pipeline that must comply with the EU AI Act for high-risk AI systems. Which of the following is a mandatory requirement under the Act?

A.The system must be explainable using chain-of-thought reasoning

B.The system must achieve a minimum accuracy of 90% on validation data

C.The system must be trained on data that is representative of the target population

D.The system must undergo a conformity assessment before deployment

AnswerD

High-risk AI systems must undergo conformity assessment to ensure compliance with the Act.

Why this answer

Under the EU AI Act, high-risk AI systems must undergo a conformity assessment before deployment to ensure compliance with requirements such as risk management, data governance, and transparency. This is a mandatory procedural step, not a performance metric or specific reasoning technique. The assessment may involve self-evaluation or third-party review depending on the system's risk category.

Exam trap

Cisco often tests the distinction between aspirational best practices (like explainability or accuracy thresholds) and actual legal mandates, leading candidates to pick a plausible-sounding but non-mandatory option like A or B instead of the procedural requirement in D.

How to eliminate wrong answers

Option A is wrong because the EU AI Act does not mandate a specific explainability technique like chain-of-thought reasoning; it requires general transparency and interpretability, but the method is left to the provider. Option B is wrong because the Act does not prescribe a fixed accuracy threshold like 90%; it requires appropriate levels of accuracy based on the system's intended purpose and risk, validated against representative data. Option C is wrong because while data representativeness is a key principle under the Act's data governance requirements, it is not a standalone mandatory requirement; the Act mandates that training, validation, and testing datasets be relevant, representative, and free from biases, but this is part of broader data governance obligations, not a single checkbox.

Full explanation →

699

MCQmedium

A team is developing a mobile app that must run AI inference on-device for low latency and offline capability. Which Gemini model variant is designed specifically for on-device deployment?

A.Gemini Pro

B.Gemini Nano

C.Gemini Ultra

D.Gemini Flash

AnswerB

Gemini Nano is specifically designed for on-device inference, offering efficiency with low latency and offline capability.

Why this answer

Gemini Nano is the smallest and most efficient model in the Gemini family, specifically optimized for on-device deployment. It is designed to run directly on mobile devices (e.g., Android phones) using hardware acceleration like Google's Pixel Neural Core or Qualcomm's AI Engine, enabling low-latency inference and offline capability without requiring a cloud connection.

Exam trap

The trap here is that candidates confuse 'lightweight cloud model' (Gemini Flash) with 'on-device model' (Gemini Nano), assuming any 'fast' or 'small' variant is suitable for mobile deployment, but Flash still requires cloud connectivity and is not optimized for local hardware constraints.

How to eliminate wrong answers

Option A is wrong because Gemini Pro is a mid-size model intended for cloud-based, high-performance tasks such as complex reasoning and multimodal analysis, not for on-device deployment due to its larger memory and compute requirements. Option C is wrong because Gemini Ultra is the largest and most capable model, designed for enterprise-scale cloud workloads and advanced research, making it unsuitable for resource-constrained mobile devices. Option D is wrong because Gemini Flash is a lightweight cloud model optimized for speed and cost in cloud inference, but it is not purpose-built for on-device execution and still requires a network connection.

Full explanation →

700

MCQeasy

A business wants to build a generative AI application but has limited data science resources. What is the recommended path?

A.Use Vertex AI's AutoML and pre-built APIs to accelerate development

B.Hire a team of ML engineers to develop an in-house solution

C.Purchase a third-party generative AI SaaS product off-the-shelf

D.Build a custom model from scratch using TensorFlow

AnswerA

AutoML abstracts away model building complexity, and APIs provide ready-to-use functionality.

Why this answer

Vertex AI's AutoML and pre-built APIs are the recommended path because they allow the business to leverage Google's managed infrastructure and pre-trained models, significantly reducing the need for in-house data science expertise. AutoML automates model training, tuning, and deployment, while pre-built APIs (e.g., for vision, language) provide immediate access to generative capabilities without custom development. This approach accelerates time-to-market and lowers the barrier to entry for organizations with limited ML resources.

Exam trap

Cisco often tests the misconception that 'limited data science resources' means the business should outsource all AI work (Option C) or build from scratch (Option D), when the correct answer leverages managed services that reduce the need for in-house expertise while still allowing customization.

How to eliminate wrong answers

Option B is wrong because hiring a full team of ML engineers is resource-intensive and contradicts the premise of limited data science resources; it also introduces significant overhead in recruitment, management, and infrastructure. Option C is wrong because purchasing a third-party SaaS product off-the-shelf may not offer the customization, data privacy controls, or integration flexibility needed for a generative AI application, and it can lock the business into a vendor's roadmap. Option D is wrong because building a custom model from scratch using TensorFlow requires deep ML expertise, extensive training data, and computational resources, which is impractical for a team with limited data science capabilities and would delay deployment.

Full explanation →

701

MCQmedium

A developer is using Vertex AI Studio to experiment with prompts. They want to ensure that the model's responses are grounded in factual information from a trusted knowledge base. Which feature should they enable?

A.Safety filters

B.Temperature setting reduction

C.Chain-of-thought prompting

D.Grounding with a Vertex AI Search data store

AnswerD

Grounding uses a search data store to retrieve and cite relevant information.

Why this answer

Vertex AI's grounding feature allows the model to cite sources from a provided knowledge base, improving factual accuracy and verifiability.

Full explanation →

702

Multi-Selecthard

Which THREE are best practices for designing prompts for a generative AI model?

Select 3 answers

A.Provide few-shot examples for complex tasks

B.Include specific and clear instructions

C.Break the task into smaller steps

D.Use negative prompts to avoid undesired outputs

E.Always set temperature to 1.0 for creativity

AnswersA, B, C

Correct: Examples guide the model toward desired outputs.

Why this answer

Option A is correct because providing few-shot examples (e.g., 2-5 input-output pairs) helps the model infer the desired pattern, reducing ambiguity for complex tasks like classification or structured extraction. This technique leverages in-context learning, where the model uses the examples as a template without fine-tuning.

Exam trap

Cisco often tests the misconception that negative prompts are a reliable control mechanism, when in fact they can lead to the 'forbidden token' problem where the model still generates the undesired content due to tokenization and probability smoothing.

Full explanation →

703

Multi-Selectmedium

Which TWO techniques can help improve the factual accuracy of a language model's outputs? (Choose two.)

Select 2 answers

A.Decrease the max output tokens.

B.Increase the temperature parameter.

C.Fine-tune on a domain-specific curated dataset.

D.Implement retrieval-augmented generation (RAG).

E.Use top-k random sampling.

AnswersC, D

Fine-tuning adapts the model to domain facts.

Why this answer

Fine-tuning on a domain-specific curated dataset (C) directly adjusts the model's weights using high-quality, verified examples, teaching it to produce factually correct outputs for that domain. This reduces hallucinations by grounding the model in accurate, relevant data rather than relying solely on its pre-training distribution.

Exam trap

Google Cloud often tests the misconception that adjusting decoding parameters (like temperature, top-k, or max tokens) can improve factual accuracy, when in reality these only control output style, length, or randomness, not the correctness of the underlying information.

Full explanation →

704

MCQmedium

A legal firm wants to use GenAI for contract analysis. They need to extract key clauses and flag risky terms. Which combination of services on Vertex AI is BEST suited?

A.Use Vertex AI RAG Engine with a vector store of example clauses

B.Fine-tune a foundation model on a dataset of labeled contracts and deploy it via Vertex AI Endpoint

C.Use Vertex AI Model Garden to select a model and then deploy it to an endpoint for real-time analysis

D.Use Vertex AI Agent Builder with a tool for document analysis

AnswerB

Fine-tuning on labeled contracts improves extraction and risk flagging accuracy.

Why this answer

Vertex AI's foundation models can parse contracts, and a fine-tuned model on legal documents would improve accuracy for clause extraction and risk flagging.

Full explanation →

705

MCQhard

A company wants to use AI to make hiring decisions. They are concerned about bias against certain demographic groups. According to Google's AI Principles, which approach is MOST aligned?

A.Pre-train the model on a dataset that is balanced across all demographics

B.Blind the model to demographic features to ensure fairness

C.Only use the model for initial resume screening, with final decisions by humans

D.Evaluate the model using diverse test sets and adjust if bias is found

AnswerD

Evaluation on diverse data helps identify bias, and adjustments can then be made.

Why this answer

The principle 'avoid creating or reinforcing unfair bias' requires proactive identification and mitigation. Evaluating the model on diverse test sets is a standard way to detect and address bias before deployment.

Full explanation →

706

MCQeasy

A developer wants to add a GenAI feature to their existing web application. They need to integrate with the app's backend using REST APIs. Which integration pattern is MOST appropriate?

A.Use Apps Script to call the Gemini API

B.Use Vertex AI Agent Builder to create an agent and embed it via iframe

C.Integrate via Vertex AI API or Gemini API

D.Build a Google Workspace add-on

AnswerC

REST APIs are the standard way to call GenAI models from any backend.

Why this answer

API-first integration using Vertex AI API or Gemini API is the standard way to add GenAI to existing applications. Workspace add-ons are for Google Workspace apps. Apps Script is for automating Workspace.

Agent Builder is for building conversational agents, not general API integration.

Full explanation →

707

MCQeasy

A data scientist is using a large language model to generate product descriptions. The descriptions are often too verbose. Which parameter adjustment is most appropriate?

A.Decrease the top-k value.

B.Increase the max output tokens.

C.Decrease the temperature.

D.Increase the frequency penalty.

AnswerD

Frequency penalty reduces repetitive phrases, encouraging conciseness.

Why this answer

Increasing the frequency penalty reduces the likelihood of the model repeating the same phrases or ideas, which directly addresses verbosity by discouraging repetitive or overly detailed descriptions. This parameter penalizes tokens that have already appeared in the generated text, promoting more concise and varied output. Other adjustments like temperature or top-k affect randomness and diversity but do not specifically target repetition or length.

Exam trap

Cisco often tests the distinction between parameters that control randomness (temperature, top-k) versus those that control repetition (frequency penalty, presence penalty), and the trap here is that candidates confuse 'less verbose' with 'less random' and incorrectly choose temperature or top-k adjustments.

How to eliminate wrong answers

Option A is wrong because decreasing the top-k value restricts the model to a smaller set of high-probability tokens, which can actually make output more predictable and potentially more repetitive, not less verbose. Option B is wrong because increasing the max output tokens allows the model to generate longer text, which would exacerbate verbosity rather than reduce it. Option C is wrong because decreasing the temperature makes the model more deterministic and conservative, often leading to safer but not necessarily shorter or less repetitive text; it does not directly penalize repetition or length.

Full explanation →

708

MCQhard

A model generates responses that frequently repeat phrases or words. Which parameter adjustment is most likely to fix this?

A.Increase top_k

B.Increase temperature

C.Increase repetition penalty

D.Increase max output tokens

AnswerC

Correct: Repetition penalty specifically reduces the likelihood of repeating tokens.

Why this answer

Increasing the repetition penalty directly discourages the model from selecting tokens that have already appeared in the generated sequence, thereby reducing repetitive phrases or words. This parameter works by subtracting a fixed penalty from the logits of previously generated tokens before applying the softmax function, making them less likely to be chosen again.

Exam trap

The trap here is that candidates often confuse repetition penalty with diversity-promoting parameters like temperature or top_k, mistakenly believing that increasing randomness or narrowing token selection will fix repetition, when in fact those adjustments can worsen the problem.

How to eliminate wrong answers

Option A is wrong because increasing top_k limits the sampling pool to the k most likely next tokens, which can actually increase repetition by narrowing the diversity of choices. Option B is wrong because increasing temperature flattens the probability distribution, making all tokens more equally likely, which can lead to more random and potentially more repetitive outputs, not less. Option D is wrong because increasing max output tokens only extends the length of the generated response; it does not address the underlying cause of repetition and may even exacerbate it by allowing more opportunities for the model to loop on repeated phrases.

Full explanation →

709

MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Use a larger foundation model with a longer context window and paste all documents into each prompt

B.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

C.Fine-tune a base LLM on the policy documents monthly

D.Train a custom model from scratch on the policy documents each month

AnswerB

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

Retrieval-Augmented Generation (RAG) is the most appropriate approach because it allows the chatbot to retrieve relevant policy document chunks from a vector store at inference time, eliminating the need to retrain the model when documents are updated monthly. This keeps the system cost-effective and scalable, as only the vector index needs to be refreshed, not the underlying LLM.

Exam trap

Cisco often tests the misconception that fine-tuning or training from scratch is necessary for domain-specific knowledge, when in fact RAG provides a dynamic, cost-effective alternative that avoids retraining for frequently updated data.

How to eliminate wrong answers

Option A is wrong because pasting all documents into each prompt would exceed the context window limits of even large models (e.g., 128K tokens), leading to high latency, cost, and potential loss of relevant information due to truncation. Option C is wrong because fine-tuning a base LLM monthly on policy documents is expensive, time-consuming, and risks catastrophic forgetting of prior knowledge, making it impractical for frequent updates. Option D is wrong because training a custom model from scratch each month is prohibitively expensive and resource-intensive, requiring massive compute, data, and expertise, which is unnecessary when RAG can achieve the same goal with far less overhead.

Full explanation →

710

MCQmedium

A startup wants to generate realistic product videos from text descriptions for social media ads. Which Google Cloud service should they use?

A.Imagen

B.Gemini Pro Vision

C.Veo

D.Codey

AnswerC

Veo is Google's generative video model, capable of producing high-quality videos from text descriptions.

Why this answer

Veo is Google Cloud's advanced video generation model that can create high-quality, realistic videos from text prompts, making it the ideal choice for generating product videos for social media ads. Unlike other services, Veo is specifically designed for video synthesis, offering capabilities like style control and cinematic effects directly from text descriptions.

Exam trap

The trap here is that candidates often confuse multimodal understanding (Gemini Pro Vision) with generative creation (Veo), or assume that image generation (Imagen) can be trivially extended to video without understanding the distinct temporal modeling required.

How to eliminate wrong answers

Option A is wrong because Imagen is a text-to-image generation model, not a video generation service; it produces static images, not dynamic video content. Option B is wrong because Gemini Pro Vision is a multimodal model that can analyze and understand images and videos, but it does not generate new video content from text descriptions. Option D is wrong because Codey is a code generation model designed for assisting with programming tasks, not for generating visual media like videos.

Full explanation →

711

MCQmedium

Refer to the exhibit. A sudden surge of traffic reaches 15,000 requests per second, but the endpoint can only handle 1,000 req/s per replica. What will happen to new requests?

A.They will be processed, and replicas will exceed maxReplicaCount.

B.They will be redirected to a different model.

C.They will receive HTTP 429 (Too Many Requests) errors.

D.They will be queued until capacity becomes available.

AnswerC

Once max replicas are reached, new requests get a 429 status code.

Why this answer

Option C is correct because when a surge of 15,000 requests per second hits an endpoint configured with a maxReplicaCount (e.g., 10 replicas at 1,000 req/s each = 10,000 req/s capacity), any excess requests beyond that capacity are rejected with an HTTP 429 (Too Many Requests) status code. This is standard behavior in autoscaling systems: once the replica count reaches its maximum limit, the service cannot scale further, and new requests are throttled to prevent overload.

Exam trap

The trap here is that candidates assume autoscaling can handle any traffic surge indefinitely, ignoring the hard limit of maxReplicaCount, and thus incorrectly choose Option A or D, failing to recognize that HTTP 429 is the standard throttling mechanism when capacity is exhausted.

How to eliminate wrong answers

Option A is wrong because the maxReplicaCount is a hard upper limit; replicas cannot exceed this configured value, so new requests are not processed beyond that capacity. Option B is wrong because traffic redirection to a different model is not a standard behavior for capacity overflow; it would require explicit routing rules or a load balancer configured for failover, which is not implied in the scenario. Option D is wrong because queuing is not the default behavior for HTTP-based endpoints in this context; while some systems support request queuing (e.g., with message brokers), the exhibit describes a direct endpoint handling, and HTTP 429 is the standard response for rate limiting per RFC 6585.

Full explanation →

712

MCQhard

A team is building a multi-modal agent that needs to accept a user's image of a handwritten note, convert it to text, and then run a sentiment analysis. They want to minimize latency and cost. Which approach is best?

A.Fine-tune Gemini Pro on handwritten notes and sentiment labels

B.Use Document AI for OCR and then call Codey for sentiment analysis

C.Use Gemini 1.5 Flash with a prompt that includes the image and asks for sentiment analysis in one call

D.Use Cloud Vision API for OCR, then feed the text to a sentiment analysis model via Vertex AI

AnswerC

Gemini Flash is optimized for low latency and cost, and its multimodal capability handles both tasks in one inference.

Why this answer

Option C is correct because Gemini 1.5 Flash is a multimodal model that can directly process images and perform sentiment analysis in a single API call, eliminating the need for separate OCR and NLP services. This minimizes both latency (by reducing the number of sequential calls) and cost (by using a single, efficient model instead of multiple specialized services).

Exam trap

Cisco often tests the candidate's ability to recognize that multimodal models like Gemini 1.5 Flash can replace multi-step pipelines (OCR + NLP) in a single call, and the trap here is that candidates default to traditional separate-service architectures (like Cloud Vision + Vertex AI) without considering the latency and cost benefits of a unified multimodal approach.

How to eliminate wrong answers

Option A is wrong because fine-tuning Gemini Pro on handwritten notes and sentiment labels is overkill for this task, incurring high training costs and latency, and Gemini Pro is a larger, more expensive model than needed for simple OCR and sentiment analysis. Option B is wrong because Document AI is designed for structured document extraction, not general handwritten note OCR, and Codey is a code generation model, not a sentiment analysis model, making this combination technically mismatched. Option D is wrong because using Cloud Vision API for OCR followed by a separate sentiment analysis model via Vertex AI introduces additional latency and cost from multiple API calls, whereas Gemini 1.5 Flash can achieve the same result in one step.

Full explanation →

713

Multi-Selectmedium

A company is developing an AI-powered interview assistant that screens job applicants. The responsible AI team wants to ensure the model does not discriminate based on gender, race, or age. Which TWO practices should they implement?

Select 2 answers

A.Deploy the model without human oversight to ensure consistency.

B.Regularly evaluate the model's outputs for bias using intersectional test sets.

C.Use SynthID watermarking on all model outputs.

D.Remove all demographic attributes from the training data to ensure fairness.

E.Use a diverse and representative training dataset that includes candidates from various demographics.

AnswersB, E

Ongoing evaluation helps detect and address bias that may emerge.

Full explanation →

714

MCQmedium

An enterprise needs to generate natural-sounding speech from text for a voice assistant. They require low latency and support for custom voice models. Which service should they use?

A.Text-to-Speech API

B.Cloud Translation API

C.Vertex AI Text Generation

D.Speech-to-Text API

AnswerA

Text-to-Speech API converts text into natural-sounding speech, supports low latency and custom voice models.

Why this answer

The Text-to-Speech API (A) is correct because it is specifically designed to convert text into natural-sounding speech with low latency, and it supports custom voice models through features like Custom Voice and WaveNet voices. This directly meets the enterprise's requirements for a voice assistant that needs real-time, high-quality speech synthesis.

Exam trap

The trap here is confusing the Text-to-Speech API with the Speech-to-Text API, as candidates often mix up the direction of conversion (text-to-audio vs. audio-to-text) under time pressure.

How to eliminate wrong answers

Option B (Cloud Translation API) is wrong because it translates text between languages, not text to speech, and does not generate audio output. Option C (Vertex AI Text Generation) is wrong because it generates text content (e.g., chat responses, summaries) rather than synthesizing speech from text. Option D (Speech-to-Text API) is wrong because it performs the inverse operation—converting audio speech into text—and does not produce speech output.

Full explanation →

715

MCQeasy

Refer to the exhibit. A developer runs this command but forgets to specify the model name. What will happen?

A.The command will fail with an error

B.The command will prompt for a name

C.The model will be uploaded with a default name

D.The command will succeed but the model will be unlisted

AnswerA

Missing required --display-name causes an error.

Why this answer

In the context of the `gcloud ai models upload` command (or similar model deployment commands in Vertex AI), the model name is a required positional argument. If omitted, the CLI will fail with an error because it cannot proceed without a unique identifier to register the model in the model registry. The command does not default to any name or prompt interactively; it strictly validates required parameters before execution.

Exam trap

Google Cloud often tests the misconception that cloud CLI tools will either prompt for missing required parameters or apply a sensible default, when in reality they fail fast with a clear error to enforce explicit configuration.

How to eliminate wrong answers

Option B is wrong because the command does not prompt for a name; it expects the name as a positional argument in the initial command string, and if missing, it immediately returns a usage error. Option C is wrong because there is no default name mechanism; model names must be explicitly provided to avoid collisions and ensure traceability in the registry. Option D is wrong because the command will not succeed at all; it fails before any upload occurs, so no model is created in any state (listed or unlisted).

Full explanation →

716

MCQeasy

You want to use a Google foundation model to generate text summaries of news articles. Which Vertex AI service should you use?

A.Vertex AI Prediction

B.Vertex AI Model Registry

C.Vertex AI Generative AI Studio

D.Vertex AI Feature Store

AnswerC

Generative AI Studio allows testing and using foundation models like text-bison@002.

Why this answer

Vertex AI Generative AI Studio (now part of Vertex AI Agent Builder) provides a no-code/low-code environment to access, test, and tune Google's foundation models, including PaLM 2 and Gemini, specifically for generative tasks like text summarization. It offers built-in prompt templates and safety settings tailored for summarization use cases, making it the correct service for this task.

Exam trap

The trap here is that candidates confuse Vertex AI Prediction (a general model serving service) with the specialized generative AI studio, assuming any model inference task uses Prediction, but Google explicitly separates foundation model access into Generative AI Studio for prompt-based generative workloads.

How to eliminate wrong answers

Option A is wrong because Vertex AI Prediction is designed for deploying and serving custom-trained models or AutoML models for online predictions, not for directly accessing Google's foundation models for generative tasks. Option B is wrong because Vertex AI Model Registry is a metadata store for managing and versioning your own models, not a service for interacting with foundation models or generating summaries. Option D is wrong because Vertex AI Feature Store is a managed repository for storing, serving, and sharing feature data for ML training and online inference, unrelated to text generation or foundation model access.

Full explanation →

717

MCQeasy

Which Google Cloud feature enables you to experiment with different prompts and model parameters interactively, and also supports model tuning without writing code?

A.Vertex AI Agent Builder

B.Google Cloud Console

C.Model Garden

D.Vertex AI Studio

AnswerD

Vertex AI Studio offers a visual interface for prompt engineering, model tuning, and evaluation without requiring coding.

Why this answer

Vertex AI Studio is the correct answer because it provides an interactive, code-free environment for experimenting with prompts and model parameters, and also supports model tuning through a graphical interface. This aligns directly with the question's requirement for interactive experimentation and no-code tuning, which are core features of Vertex AI Studio within the Google Cloud AI Platform.

Exam trap

The trap here is that candidates may confuse Model Garden's model discovery and selection capabilities with the interactive experimentation and tuning features that are exclusive to Vertex AI Studio.

How to eliminate wrong answers

Option A is wrong because Vertex AI Agent Builder is designed for creating conversational agents and search experiences, not for interactive prompt experimentation or model tuning. Option B is wrong because Google Cloud Console is a general management interface for all GCP services, lacking the specialized, interactive prompt engineering and tuning capabilities of Vertex AI Studio. Option C is wrong because Model Garden is a repository for discovering and accessing pre-trained models, but it does not provide an interactive environment for prompt experimentation or direct model tuning without code.

Full explanation →

718

MCQhard

A company is evaluating SLA guarantees for a generative AI model deployed on Vertex AI. They require at least 99.9% uptime for production inference. Which SLA tier should they select?

A.Serverless endpoint with min replicas=0

B.Batch prediction job

C.Regional endpoint with at least one node

D.Global endpoint with automatic scaling

AnswerC

Regional endpoints with at least one node qualify for 99.9% SLA for online prediction.

Why this answer

Regional endpoints with at least one node guarantee a minimum of 99.9% uptime for production inference because they maintain a dedicated, always-on compute resource. This SLA applies to regional endpoints with a minimum of one replica, ensuring the model is continuously available for serving requests without cold-start delays.

Exam trap

The trap here is that candidates confuse automatic scaling or serverless endpoints with high availability, not realizing that only regional endpoints with a minimum of one replica meet the 99.9% SLA requirement for production inference.

How to eliminate wrong answers

Option A is wrong because a serverless endpoint with min replicas=0 can scale down to zero, introducing cold-start latency and not guaranteeing 99.9% uptime as the endpoint may be unavailable during scale-up. Option B is wrong because batch prediction jobs are not designed for real-time inference and do not offer an uptime SLA; they are asynchronous and may fail or queue without availability guarantees. Option D is wrong because global endpoints with automatic scaling do not have a specific SLA tier for 99.9% uptime; they are optimized for latency and load distribution but lack the dedicated replica requirement needed for the highest SLA commitment.

Full explanation →

719

MCQmedium

A company is evaluating ROI for a GenAI-based code review assistant. Which metric set BEST captures both productivity and quality improvements?

A.Number of reviews completed per day and lines of code written

B.Model inference latency and cost per token

C.Developer satisfaction score and reduction in code churn (percentage of code rewritten)

D.Time saved per code review and bug detection rate (percentage of bugs caught before deployment)

AnswerD

Time saved measures productivity; bug detection rate measures quality improvement. Together they provide a balanced view of the assistant's value.

Why this answer

Time saved per review (productivity) and bug detection rate (quality) directly measure the tool's impact. Code churn and developer satisfaction are secondary. Defect escape rate is important but harder to measure directly for code review.

Full explanation →

720

MCQmedium

An enterprise is concerned about the cost of using a large LLM for a high-volume customer support chatbot. They want to reduce token consumption while maintaining response quality. Which strategy would be MOST effective?

A.Use a multimodal model to handle both text and images

B.Always use the largest available model to ensure best quality

C.Increase the max output tokens to capture more detail

D.Implement response caching for common questions and batch similar requests

AnswerD

Caching eliminates redundant model calls for frequent queries; batching reduces per-call overhead.

Why this answer

Caching frequent queries reduces costs because the response is served from cache without model inference. Batching requests also saves on per-request overhead. Choosing a smaller model may reduce quality.

Full explanation →

721

MCQhard

A financial institution is deploying an AI model to approve loans. To comply with the EU AI Act, which requirement is MANDATORY for this high-risk AI system?

A.The system must include a human review and override mechanism

B.The model must be trained exclusively on EU citizen data

C.The model must be deployed on Google Cloud infrastructure within the EU

D.The model must provide explanations for all loan rejections

AnswerA

Human oversight is a key requirement for high-risk AI systems under the EU AI Act.

Why this answer

The EU AI Act requires human oversight for high-risk AI systems, including the ability to override the system's decisions. The other options are not specifically mandated by the Act.

Full explanation →

722

Multi-Selectmedium

An enterprise wants to use Gemini for a customer-facing application. They require the following: data isolation in a VPC, audit logging, and SLA guarantees. Which THREE features of Vertex AI satisfy these requirements?

Select 3 answers

A.Vertex AI SLA (Service Level Agreement)

B.Gemini Nano on-device

C.VPC Service Controls

D.Cloud Audit Logs integration

E.Google AI Studio free tier

AnswersA, C, D

SLA guarantees uptime and performance.

Why this answer

Option A is correct because Vertex AI offers a defined Service Level Agreement (SLA) that guarantees uptime and performance metrics for enterprise customers, which is a core requirement for customer-facing applications. The SLA provides contractual assurances, typically covering availability and response times, ensuring the enterprise can meet its own service commitments.

Exam trap

Cisco often tests the distinction between development tools (like AI Studio free tier) and production-ready enterprise features, and candidates mistakenly assume that any Google AI offering includes SLA and VPC controls by default.

Full explanation →

723

MCQmedium

A data scientist notices that a Gemini model generates inconsistent responses to similar prompts. What is the likely cause?

A.Model is not fine-tuned enough

B.The prompt is too short

C.The temperature setting is too low

D.The top_p or temperature parameters are set too high causing randomness

AnswerD

High temperature or top_p increases randomness and variability.

Why this answer

Option D is correct because high temperature (e.g., >1.0) or high top_p (e.g., >0.9) increases the randomness of token sampling, causing the model to select less probable tokens. This directly leads to inconsistent responses for similar prompts, as the model's output distribution becomes more uniform and less deterministic.

Exam trap

Google Cloud often tests the misconception that fine-tuning or prompt length is the primary cause of output inconsistency, when in fact the sampling parameters (temperature and top_p) directly control randomness and are the most common culprit.

How to eliminate wrong answers

Option A is wrong because fine-tuning adjusts the model's weights for a specific task, but it does not control the randomness of token generation; even a fully fine-tuned model will produce inconsistent outputs if sampling parameters are set too high. Option B is wrong because prompt length affects context and specificity, not the inherent randomness of the generation process; a short prompt can still yield consistent responses if temperature and top_p are low. Option C is wrong because a low temperature setting (e.g., 0.1) actually reduces randomness, making outputs more deterministic and consistent, not inconsistent.

Full explanation →

724

MCQmedium

A financial services firm needs to fine-tune a large language model on proprietary financial data. They require data to never leave their VPC and need full audit logging. Which Gemini access method should they use?

A.Vertex AI

B.Google AI Studio

C.Gemini API directly via Cloud Endpoints

D.Model Garden in Colab

AnswerA

Vertex AI offers VPC Service Controls, audit logging, and data isolation.

Why this answer

Vertex AI is the correct access method because it is the only option that allows fine-tuning of Gemini models within a customer's VPC (Virtual Private Cloud) with full audit logging via Cloud Audit Logs. This ensures proprietary financial data never leaves the secure network boundary, meeting strict compliance and data residency requirements.

Exam trap

The trap here is that candidates may confuse Google AI Studio's free-tier accessibility with enterprise-grade security, overlooking that Vertex AI is the only option with VPC controls and audit logging for fine-tuning proprietary data.

How to eliminate wrong answers

Option B is wrong because Google AI Studio is a web-based prototyping tool that does not support VPC-scoped fine-tuning or enterprise-grade audit logging; data is processed on Google's infrastructure outside the customer's VPC. Option C is wrong because the Gemini API directly via Cloud Endpoints does not provide native VPC controls or fine-tuning capabilities; it is a stateless API call without persistent model customization. Option D is wrong because Model Garden in Colab is a discovery and experimentation environment that lacks VPC isolation, audit logging, and fine-tuning support for production workloads.

Full explanation →

725

Multi-Selectmedium

Which THREE of the following are common techniques to reduce harmful biases in generative AI models? (Choose three.)

Select 3 answers

A.Use reinforcement learning from human feedback (RLHF) with a reward model that penalizes biased or unfair outputs.

B.Curate diverse and balanced training datasets that overrepresent underrepresented groups.

C.Decrease the model's temperature parameter to make outputs more deterministic.

D.Apply adversarial training to remove protected attribute information from hidden representations.

E.Conduct a legal review of all generated outputs before release.

AnswersA, B, D

RLHF can shape model behavior to avoid biased generations.

Why this answer

A is correct because RLHF uses a reward model trained on human preferences to score model outputs, and explicitly penalizing biased or unfair outputs during fine-tuning directly reduces harmful biases. This technique aligns the model's behavior with human values by optimizing against a learned reward signal that captures bias-related concerns.

Exam trap

Google Cloud often tests the distinction between hyperparameter tuning (like temperature) and actual bias mitigation techniques, so candidates mistakenly think lowering temperature reduces bias when it only affects output randomness.

Full explanation →

726

MCQmedium

A company wants to adopt GenAI for internal knowledge management. They plan to start with a small pilot team, gather feedback, and then expand. Which change management approach is MOST aligned with this strategy?

A.Pilot with a small group, identify AI champions, collect feedback, and iteratively expand

B.Conduct mandatory training for all employees before the rollout

C.Deploy the solution to the entire organization at once with a communication campaign

D.Focus solely on the technical deployment without change management activities

AnswerA

This approach minimizes risk, builds advocates, and allows for improvements based on real usage.

Why this answer

Iterative rollout with a pilot group, AI champions, and measuring adoption is a proven change management pattern. Starting with the entire organization is risky. Mandatory training may cause resistance.

Focusing only on technical deployment ignores the human side.

Full explanation →

727

MCQmedium

A developer is using Gemini 1.5 Flash for a real-time chat application and notices that responses are sometimes too slow. Which model parameter or configuration change would MOST likely reduce latency without significantly harming quality?

A.Decrease the top-k value to 10

B.Decrease the max output tokens setting

C.Increase the temperature to 1.5

D.Increase the top-p value to 1.0

AnswerB

Fewer generated tokens means faster response completion, directly reducing latency.

Why this answer

Lowering max output tokens reduces the number of tokens generated, directly decreasing response time. Temperature and top-p affect creativity, not latency. Reducing top-k may slightly speed up sampling but has a minimal effect compared to output length.

Full explanation →

728

MCQmedium

A company uses Vertex AI PaLM for code generation. The code often contains security vulnerabilities. Which improvement should be applied?

A.Set top_k to 1

B.Include a security-focused system instruction

C.Use Codey model instead

D.Increase temperature to 0.8

AnswerB

Guides the model to prioritize security practices.

Why this answer

Option B is correct because including a security-focused system instruction directly guides the model to prioritize secure coding practices, such as input validation and proper error handling, reducing vulnerabilities. This leverages prompt engineering to shape model behavior without altering parameters like temperature or top_k, which control randomness, not security awareness.

Exam trap

Cisco often tests the misconception that parameter tuning (like temperature or top_k) can fix content quality issues, when in fact prompt engineering—such as system instructions—is the primary tool for guiding model behavior toward specific goals like security.

How to eliminate wrong answers

Option A is wrong because setting top_k to 1 makes the model deterministic (always picks the highest-probability token), which can reduce output diversity but does not address security vulnerabilities—it may even amplify insecure patterns if they are common in training data. Option C is wrong because Codey is a specialized model for code generation, but it does not inherently include security guardrails; the same vulnerabilities can appear if the prompt lacks security context. Option D is wrong because increasing temperature to 0.8 increases randomness and creativity, which can introduce more unpredictable and potentially insecure code, worsening the vulnerability issue.

Full explanation →

729

MCQhard

Refer to the exhibit. A developer receives this error when trying to call a model for prediction. What is the most likely cause?

A.The project has exceeded its prediction quota.

B.The developer's service account lacks the required IAM role.

C.The model version has been deprecated.

D.The model is not deployed on an endpoint.

AnswerB

The 403 error is a standard permission denied response from IAM.

Why this answer

The error when calling a model for prediction most likely stems from the developer's service account lacking the required IAM role. In Google Cloud AI Platform, the 'aiplatform.user' or 'aiplatform.predictor' role is necessary to invoke prediction endpoints; without it, the API returns a permission-denied error. This is a common misconfiguration when service accounts are created without explicit roles attached.

Exam trap

Google Cloud often tests the misconception that quota limits are the default cause of prediction errors, but the trap here is that permission-denied errors are more frequently due to missing IAM roles rather than quota exhaustion, especially in multi-service-account environments.

How to eliminate wrong answers

Option A is wrong because exceeding the prediction quota would return a '429 RESOURCE_EXHAUSTED' or 'Quota exceeded' error, not a generic permission-denied error. Option C is wrong because a deprecated model version would still be accessible for predictions until it is deleted, and the error would typically indicate 'Model version not found' rather than an authorization failure. Option D is wrong because if the model is not deployed on an endpoint, the error would be 'Model not deployed' or 'Endpoint not found', not a permission error.

Full explanation →

730

MCQhard

A large enterprise is deploying a multi-modal generative AI application that processes customer support emails (text) and attached screenshots (images). They need to run inference on over 10,000 requests per minute with strict latency requirements (p99 < 500ms). They have already selected Gemini 1.5 Pro as the model and deployed it on Vertex AI using a GPU-based endpoint with autoscaling. During testing, they observe that the p99 latency spikes to over 2 seconds during peak traffic. The application is stateless and requests are independent. The team has access to Cloud Observability and can modify the deployment configuration. Which course of action should the team take to meet the latency requirements while minimizing cost?

A.Increase the maximum number of replicas in the autoscaling configuration to handle spikes

B.Enable Vertex AI Model Caching and deploy the endpoint on a managed instance group with larger GPU nodes (e.g., A100 40GB)

C.Use preemptible VMs for the endpoint to get priority scheduling

D.Switch to a CPU-based ml.c5 instance to reduce GPU contention

AnswerB

Caching reduces computation for repeated prompts, and larger GPUs accelerate inference.

Why this answer

Vertex AI Model Caching reduces latency by caching the model's weights in GPU memory, eliminating the need to reload them for each request. Deploying on larger GPU nodes (A100 40GB) provides higher memory bandwidth and compute capacity, which directly addresses the p99 latency spike by ensuring the model can process more requests per second without queueing. This combination minimizes cost because it optimizes existing GPU utilization rather than simply adding more replicas, which would increase cost without fixing the root cause of latency.

Exam trap

The trap here is that candidates often assume adding more replicas (Option A) is the universal fix for latency, but they overlook that the bottleneck is per-request inference time and cold-start delays, which require model caching and more powerful GPU nodes to reduce, not just horizontal scaling.

How to eliminate wrong answers

Option A is wrong because increasing the maximum number of replicas does not address the underlying latency per request; it only adds more endpoints, which can increase cost and may still suffer from high latency if each replica is overloaded or has cold-start delays. Option C is wrong because preemptible VMs are designed for cost savings on fault-tolerant workloads, not for latency-sensitive inference; they can be terminated at any time, causing request failures and violating the strict p99 < 500ms requirement. Option D is wrong because switching to CPU-based instances would dramatically increase inference latency for a large multi-modal model like Gemini 1.5 Pro, as CPUs lack the parallel processing power needed for GPU-accelerated models, making it impossible to meet the sub-500ms p99 target.

Full explanation →

731

Multi-Selecthard

A company is planning an iterative rollout of a GenAI code generation tool for developers. They want to ensure adoption and minimize resistance. Which THREE change management practices are most effective?

Select 3 answers

A.Start with a small pilot group and gather feedback before expanding

B.Provide mandatory training sessions for all developers before rollout

C.Identify AI champions within the team to advocate and support peers

D.Immediately replace existing code review processes with AI-generated reviews

E.Monitor only the number of prompts submitted as the success metric

AnswersA, B, C

Pilot allows learning and refinement before full rollout.

Why this answer

Identifying AI champions, providing hands-on training, and starting with a small pilot group are proven strategies for AI adoption.

Full explanation →

732

Multi-Selecthard

A company deploys a GenAI chatbot for customer support using Vertex AI Agent Builder. The chatbot sometimes gives incorrect answers. The team wants to improve accuracy without retraining the underlying model. Which THREE actions should they take? (Choose 3)

Select 3 answers

A.Enable Grounding with company knowledge base documents

B.Use Vertex AI Prompt Tuning to optimize the system prompt

C.Increase the model's temperature setting for more creative responses

D.Fine-tune the underlying model on past conversations

E.Implement a human-in-the-loop escalation for low-confidence responses

AnswersA, B, E

Grounding allows the agent to retrieve answers from authoritative sources.

Why this answer

Grounding with company documents (RAG) provides authoritative knowledge. Prompt tuning guides the model. Adding a human-in-the-loop fallback catches errors.

Fine-tuning would require retraining. Increasing temperature would worsen accuracy.

Full explanation →

733

MCQeasy

A startup wants to deploy a custom-tuned large language model for real-time inference on Vertex AI. They need the lowest possible latency for end users. What deployment strategy should they choose?

A.Use Vertex AI Model Garden to deploy the base PaLM 2 model.

B.Wrap the model in a Cloud Function and invoke via HTTP.

C.Deploy the tuned model to a Vertex AI endpoint with GPU acceleration and autoscaling.

D.Use Vertex AI Batch Prediction to process requests in batches.

AnswerC

Dedicated endpoints with GPUs provide the lowest latency for real-time inference.

Why this answer

Option C is correct because deploying a custom-tuned model to a Vertex AI endpoint with GPU acceleration and autoscaling provides the lowest possible latency for real-time inference. GPU acceleration enables parallel processing of inference requests, while autoscaling ensures sufficient compute resources are available to handle traffic spikes without cold starts. This combination minimizes both compute and network latency, which is critical for real-time user-facing applications.

Exam trap

Cisco often tests the distinction between real-time and batch inference strategies, and the trap here is that candidates may confuse 'lowest possible latency' with 'high throughput' or 'cost efficiency,' leading them to choose batch prediction (D) or serverless options (B) without recognizing that GPU-accelerated endpoints are specifically designed for sub-second inference.

How to eliminate wrong answers

Option A is wrong because using Vertex AI Model Garden to deploy the base PaLM 2 model does not incorporate the custom tuning, so the model would not reflect the startup's specific data or use case, and the base model may not achieve the desired accuracy or latency for the custom task. Option B is wrong because wrapping the model in a Cloud Function introduces additional cold-start latency and HTTP overhead, and Cloud Functions are not optimized for GPU-accelerated inference, leading to higher per-request latency compared to a dedicated endpoint. Option D is wrong because Vertex AI Batch Prediction is designed for asynchronous, high-throughput processing of large datasets, not for real-time inference; it introduces significant latency due to job queuing and batch processing, making it unsuitable for low-latency end-user requests.

Full explanation →

734

MCQhard

An organization uses a generative AI model to automatically approve or reject loan applications. To comply with the EU AI Act's requirements for high-risk AI systems, what must they implement?

A.A human-in-the-loop review for all loan decisions

B.Publish the model's accuracy metrics on a public website

C.A fully automated decision process with no human involvement

D.Regular bias audits without human review of individual decisions

AnswerA

Human oversight ensures accountability and compliance with the EU AI Act.

Why this answer

The EU AI Act mandates that high-risk AI systems, such as those used for credit scoring and loan approvals, must include human oversight to mitigate risks of automated bias and errors. A human-in-the-loop (HITL) review ensures that each loan decision is subject to human judgment, allowing for intervention in edge cases or when the model's confidence is low. This directly satisfies the Act's requirement for meaningful human control over high-risk AI outputs.

Exam trap

Cisco often tests the misconception that transparency measures (like publishing metrics) or bias audits alone are sufficient for compliance, when the EU AI Act specifically requires human oversight for high-risk systems, making human-in-the-loop review the mandatory control.

How to eliminate wrong answers

Option B is wrong because publishing accuracy metrics on a public website is a transparency measure, not a mandated control for high-risk systems under the EU AI Act; the Act focuses on risk management, documentation, and human oversight, not public disclosure of metrics. Option C is wrong because a fully automated decision process with no human involvement directly violates the EU AI Act's explicit requirement for human oversight in high-risk AI systems, such as loan approval. Option D is wrong because regular bias audits without human review of individual decisions fail to meet the Act's requirement for human-in-the-loop oversight; bias audits are a complementary measure, but they do not replace the need for human intervention in each specific decision.

Full explanation →

735

Multi-Selecteasy

Which TWO safety features are available in Vertex AI Gemini API? (Select TWO.)

Select 2 answers

A.Safety filters for categories like hate speech and harassment

B.Content restrictions based on configurable thresholds

C.Model-level encryption at rest

D.Automatic redaction of personally identifiable information (PII)

E.Integration with Cloud Data Loss Prevention (DLP)

AnswersA, B

Gemini API includes built-in safety filters for harmful content categories.

Why this answer

Option A is correct because the Vertex AI Gemini API includes built-in safety filters that automatically detect and block harmful content across categories such as hate speech, harassment, sexually explicit material, and dangerous content. These filters operate at the API level, analyzing both input prompts and model responses to enforce Google's AI safety policies before returning results to the user.

Exam trap

Cisco often tests the distinction between native API safety features (like configurable safety filters) and general Google Cloud security services (like encryption at rest or DLP integration) that are not part of the Gemini API's safety functionality.

Full explanation →

736

MCQmedium

A machine learning engineer wants to convert text into numerical vectors for similarity search. Which Google Cloud service should they use?

A.Vertex AI Embeddings API

B.Natural Language API

C.Vector Search

D.Gemini API

AnswerA

This API generates text embeddings for downstream tasks.

Why this answer

The Vertex AI Embeddings API is the correct choice because it is specifically designed to convert text (and other data types) into dense numerical vectors (embeddings) that capture semantic meaning. These embeddings are the fundamental input for similarity search, enabling efficient comparison of text based on conceptual closeness rather than exact keyword matching.

Exam trap

The trap here is that candidates confuse the Natural Language API's text analysis capabilities (like entity extraction) with the embedding generation required for similarity search, or they assume Vector Search or Gemini API can generate embeddings directly when they are actually downstream or generative tools.

How to eliminate wrong answers

Option B is wrong because the Natural Language API performs entity extraction, sentiment analysis, and syntax analysis, but it does not generate embeddings for similarity search. Option C is wrong because Vector Search is a service for indexing and querying embeddings at scale, not for generating them from raw text. Option D is wrong because the Gemini API is a multimodal generative model for chat and content generation, not a dedicated embedding service for converting text into vectors.

Full explanation →

737

MCQmedium

A company uses a generative model to produce product descriptions. The descriptions are factually inconsistent with the product specs. Which technique would best ensure factual accuracy?

A.Enhance the system prompt with product details

B.Implement retrieval-augmented generation (RAG) with product database

C.Lower the temperature to 0.0

D.Fine-tune the model on product descriptions

AnswerB

RAG grounds generation in factual data.

Why this answer

Retrieval-augmented generation (RAG) is the best technique because it dynamically retrieves relevant, up-to-date product specifications from a trusted database at inference time, grounding the model's output in verified facts. This directly addresses factual inconsistency by ensuring the generated description is based on authoritative source data rather than relying solely on the model's parametric memory.

Exam trap

Google Cloud often tests the misconception that prompt engineering alone (Option A) or deterministic sampling (Option C) can solve factual grounding issues, when in reality they do not provide external knowledge retrieval to correct hallucinations.

How to eliminate wrong answers

Option A is wrong because enhancing the system prompt with product details only provides static context that the model may still hallucinate or misinterpret; it does not enforce retrieval of current or specific factual data. Option C is wrong because lowering the temperature to 0.0 makes the output more deterministic but does not prevent the model from generating factually incorrect content that is confidently wrong. Option D is wrong because fine-tuning on product descriptions can improve style and consistency but does not guarantee factual accuracy for new or updated product specs, and it risks overfitting or memorizing inaccuracies from the training data.

Full explanation →

738

MCQeasy

A company wants to generate images from text descriptions using Google Cloud. Which service should they use?

A.Vertex AI Imagen

B.Vertex AI Gemini

C.Cloud Vision API

D.AutoML Vision

AnswerA

Imagen is the dedicated text-to-image service.

Why this answer

Vertex AI Imagen is Google Cloud's purpose-built service for generating high-fidelity images from text descriptions using diffusion models. It directly addresses the requirement of text-to-image generation, offering capabilities like image editing, upscaling, and style transfer, which are not available in other Vertex AI or Vision services.

Exam trap

The trap here is that candidates may confuse Vertex AI Gemini's multimodal capabilities (understanding images) with generative image creation, or assume that Cloud Vision API or AutoML Vision can be repurposed for generation, when in fact they are strictly analysis or custom training tools.

How to eliminate wrong answers

Option B is wrong because Vertex AI Gemini is a multimodal large language model (LLM) that can process text, images, audio, and video, but it is not optimized or primarily designed for generating images from text; its strength lies in understanding and reasoning across modalities, not in image synthesis. Option C is wrong because Cloud Vision API is a pre-trained model for analyzing and extracting information from images (e.g., object detection, OCR, label detection), not for generating images from text. Option D is wrong because AutoML Vision is a service for training custom image classification or object detection models on labeled datasets, not for generative text-to-image tasks.

Full explanation →

739

MCQeasy

A company uses a text generation model for customer support but notices it occasionally provides outdated information. Which technique should they implement to improve output accuracy?

A.Increase max output tokens

B.Implement retrieval-augmented generation (RAG)

C.Fine-tune the model with more historical support data

D.Increase model temperature to 1.0

AnswerB

RAG retrieves current information, making outputs accurate and up-to-date.

Why this answer

Retrieval-augmented generation (RAG) is the correct technique because it grounds the model's output in real-time, external knowledge sources (e.g., a vector database or document index) rather than relying solely on static training data. This directly addresses the problem of outdated information by allowing the model to retrieve and synthesize current facts at inference time, ensuring accuracy without requiring retraining.

Exam trap

The trap here is that candidates often confuse fine-tuning (which adapts the model's weights to a static dataset) with RAG (which dynamically retrieves external knowledge), leading them to choose fine-tuning as a 'deeper' fix when the core issue is stale information, not model capability.

How to eliminate wrong answers

Option A is wrong because increasing max output tokens only extends the length of the generated response, not its factual accuracy or timeliness; it may even introduce more hallucinated content. Option C is wrong because fine-tuning with more historical support data would reinforce outdated patterns and biases, making the model more likely to repeat stale information rather than adapt to current knowledge. Option D is wrong because increasing model temperature to 1.0 increases randomness and creativity in outputs, which degrades factual precision and reliability, the opposite of what is needed for accurate customer support.

Full explanation →

740

MCQmedium

A startup wants to integrate a GenAI assistant into Google Workspace (Docs, Gmail, Sheets) to help employees draft emails and create charts. Which Google AI offering is designed for this purpose?

A.Gemini for Workspace

B.Colab Enterprise

C.Vertex AI Agent Builder

D.NotebookLM

AnswerA

Gemini for Workspace provides AI assistance directly in Workspace applications.

Why this answer

Gemini for Workspace (formerly Duet AI) is Google's AI assistant embedded across Workspace apps for tasks like drafting and analysis.

Full explanation →

741

MCQeasy

A company wants to ensure that its generative AI application complies with the GDPR right to erasure (right to be forgotten) for user data used in model fine-tuning. What is the best approach?

A.Store data with expiration dates and automatically delete after a set period

B.Maintain a mapping of user identities to training data, and upon request, remove the specific data points and retrain the model

C.Use a broad data deletion request on all training data

D.Implement differential privacy during training to prevent memorization

AnswerB

This allows targeted removal and retraining, fulfilling the right to erasure.

Why this answer

Only option B fully addresses GDPR compliance by identifying and removing specific user data from the training set, then retraining. The other options do not effectively erase the user's influence from the model.

Full explanation →

742

MCQmedium

A data scientist is using Vertex AI to fine-tune a Gemini model for a specialized legal document summarization task. They have a small set of labeled examples (200 pairs). Which fine-tuning method is MOST cost-effective and likely to perform well?

A.Full fine-tuning of all model parameters

B.Adapter-based fine-tuning (e.g., LoRA)

C.Training a small custom model from scratch

D.Prompt engineering with few-shot examples only

AnswerB

LoRA updates low-rank matrices, preserving the base model and reducing memory/storage requirements while adapting to the new task.

Why this answer

Adapter-based fine-tuning (like LoRA) updates only a small fraction of parameters, making it efficient with small datasets and low cost, while still adapting the model to the task.

Full explanation →

743

MCQeasy

Refer to the exhibit. A data scientist sends a prediction request to a text generation model with the following parameters and receives repetitive output. Which parameter should be changed?

A.Decrease topP to 0.5

B.Increase topK to 100

C.Decrease maxOutputTokens

D.Increase temperature to 0.5

AnswerD

Introduces randomness to avoid repetition.

Why this answer

Temperature 0.0 makes the model deterministic, leading to repetitive text. Increasing temperature to 0.5 introduces randomness. Decreasing topP may help but temperature is the direct cause.

Increasing topK adds diversity but less effect, decreasing max tokens doesn't fix repetition.

Full explanation →

744

MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Train a custom model from scratch on the policy documents each month

B.Use a larger foundation model with a longer context window and paste all documents into each prompt

C.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

D.Fine-tune a base LLM on the policy documents monthly

AnswerC

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

Retrieval-Augmented Generation (RAG) is the most appropriate approach because it allows the chatbot to answer questions based on the latest policy documents without retraining the model. By indexing the documents in a vector store and retrieving relevant chunks at query time, RAG ensures the model uses up-to-date information while keeping the underlying LLM static, which is both cost-effective and scalable for monthly updates.

Exam trap

Cisco often tests the misconception that fine-tuning is the only way to incorporate new information into an LLM, leading candidates to overlook the efficiency and flexibility of RAG for dynamic, frequently updated knowledge bases.

How to eliminate wrong answers

Option A is wrong because training a custom model from scratch each month is prohibitively expensive and time-consuming, requiring significant computational resources and data preparation, and it does not leverage the benefits of a pre-trained foundation model. Option B is wrong because pasting all policy documents into each prompt would exceed the context window limits of even the largest models (e.g., 128K tokens for GPT-4 Turbo), leading to truncated inputs, high token costs, and degraded performance due to irrelevant context. Option D is wrong because fine-tuning a base LLM monthly on the policy documents would still require retraining the model each time, which is costly and introduces the risk of catastrophic forgetting, where the model loses previously learned knowledge from the fine-tuning process.

Full explanation →

745

MCQeasy

What is the primary purpose of the temperature parameter when configuring a generative AI model?

A.Adjusts the number of highest-probability tokens considered at each step

B.Controls the diversity of the output by scaling the log probabilities before sampling

C.Specifies the minimum probability threshold for token selection

D.Sets the maximum number of tokens in the response

AnswerB

Temperature is applied to the logits before softmax; higher values flatten the distribution, making lower‑probability tokens more likely.

Why this answer

Temperature controls the randomness of token selection. Higher temperature increases creativity/diversity; lower temperature produces more deterministic and focused responses.

Full explanation →

746

Multi-Selectmedium

A company is building a customer support agent that can answer questions about product manuals and also generate images of the products from descriptions. Which TWO Google Cloud services should they combine? (Select 2)

Select 2 answers

A.Codey

B.Chirp

C.Cloud Vision API

D.Imagen on Vertex AI

E.Gemini on Vertex AI

AnswersD, E

Imagen generates images from text descriptions.

Why this answer

Imagen on Vertex AI is correct because it is Google Cloud's text-to-image generation service, capable of creating product images from textual descriptions. This directly fulfills the requirement to 'generate images of the products from descriptions'.

Exam trap

The trap here is that candidates may confuse Cloud Vision API (which analyzes images) with Imagen (which generates images), or assume that a single model like Gemini can handle both text and image generation natively, when in fact Gemini is primarily a multimodal understanding model and Imagen is the dedicated image generation service.

Full explanation →

747

MCQhard

A company is using generative AI for code generation and wants to evaluate the quality of generated code for security vulnerabilities. Which metric is most appropriate?

A.BLEU score

B.Automatic static analysis

C.Human evaluation

D.Perplexity

AnswerB

Scans for common vulnerabilities efficiently.

Why this answer

Option C is correct because automatic static analysis can scan code for security issues efficiently. Option A (BLEU score) measures text similarity, not security. Option B (human evaluation) is subjective and expensive.

Option D (perplexity) measures language model confidence, not code security.

Full explanation →

748

MCQeasy

A non-profit organization uses generative AI to produce reports on climate change. They want to ensure that the model's outputs are scientifically accurate. Which Google AI Principle is most relevant?

A.Be built and tested for safety

B.Uphold high standards of scientific excellence

C.Be accountable to people

D.Be socially beneficial

AnswerB

This principle requires that AI systems are built on sound scientific methods and produce accurate outputs.

Why this answer

The Google AI Principle 'Uphold high standards of scientific excellence' directly addresses the need for generative AI outputs to be scientifically accurate, especially in domains like climate change reporting where factual precision is critical. This principle emphasizes rigorous validation, peer review, and adherence to established scientific methodologies to ensure the model's outputs are reliable and trustworthy.

Exam trap

Cisco often tests the distinction between broad ethical principles (like safety or social benefit) and the specific principle that mandates factual and methodological rigor, causing candidates to pick 'Be socially beneficial' because they conflate 'good for society' with 'scientifically accurate'.

How to eliminate wrong answers

Option A is wrong because 'Be built and tested for safety' focuses on preventing harmful or unsafe behaviors (e.g., avoiding dangerous instructions), not on ensuring scientific accuracy of factual content. Option C is wrong because 'Be accountable to people' relates to transparency, feedback mechanisms, and human oversight, but does not specifically mandate scientific rigor or factual correctness. Option D is wrong because 'Be socially beneficial' is a broad principle about overall positive societal impact, which does not inherently require the model to produce scientifically accurate outputs—it could be socially beneficial but still factually incorrect.

Full explanation →

749

MCQeasy

Which Google Cloud AI service is specifically designed for extracting structured data from scanned documents, such as invoices and receipts?

A.Document AI

B.Natural Language AI

C.Translation AI

D.Vision AI

AnswerA

Document AI is purpose-built for extracting structured data from scanned documents using OCR and parsing.

Why this answer

Document AI is the correct answer because it is purpose-built for understanding and extracting structured data from unstructured documents like invoices, receipts, and forms. It uses specialized processors (e.g., the Invoice Parser or Expense Parser) that combine optical character recognition (OCR) with natural language understanding and machine learning models trained on document layouts, enabling it to output structured fields such as vendor name, total amount, and line items.

Exam trap

The trap here is that candidates often confuse Vision AI’s general OCR capability with Document AI’s specialized document understanding, overlooking that Vision AI cannot natively extract structured fields like line items or totals without extensive custom coding.

How to eliminate wrong answers

Option B is wrong because Natural Language AI is designed for analyzing and extracting insights from text (e.g., sentiment, entity recognition, syntax analysis), not for processing scanned document images or extracting structured data from forms. Option C is wrong because Translation AI is a neural machine translation service that converts text between languages, with no capability to parse scanned documents or extract structured fields. Option D is wrong because Vision AI provides general-purpose image analysis (e.g., object detection, OCR for text extraction), but it lacks the specialized document understanding and pre-trained models for extracting structured data from invoices and receipts that Document AI offers.

Full explanation →

750

MCQhard

An enterprise is comparing Google Cloud Vertex AI vs AWS Bedrock vs Azure OpenAI for a generative AI application. Which unique Google differentiator allows the model to reference up-to-date web information and private data with managed retrieval?

A.Vertex AI Agent Builder with search grounding

B.TPU availability

C.Integration with Google Workspace

D.Multimodal understanding

AnswerA

Agent Builder provides managed search grounding.

Why this answer

Vertex AI offers grounding with Google Search and private data sources, a capability not directly matched by AWS Bedrock or Azure OpenAI.

Full explanation →

Page 10 of 14

All pages

1 2 3 4 5 6 7 8 9 10 11 12 13 14

Practice Generative AI Leader by domain

Target a specific domain to shore up weak areas.

Fundamentals of Generative AI Business Strategies for Generative AI Solutions Generative AI Concepts and Technologies Google AI Ecosystem and Strategy Responsible AI and Data Governance Google Cloud's Generative AI Offerings Techniques to Improve Generative AI Model Output Applying Generative AI in Business

See all domains with question counts →