Knowledge + Practice

CCNA Google Cloud Gen Ai Offerings Questions

75 of 127 questions · Page 1/2 · Google Cloud Gen Ai Offerings topic · Answers revealed

Practice these questions Exam hub All questions

1

MCQhard

During a load test, a Vertex AI endpoint serving a large language model experiences high latency and increased error rates. The endpoint is configured with autoscaling. What is the most likely cause?

A.There is a network bottleneck

B.The model size is too large for the machine type

C.The endpoint is using a global load balancer

D.The autoscaling metric is based on CPU utilization but the model is GPU-bound

AnswerD

GPU-bound models require GPU-based metrics for effective autoscaling.

Why this answer

If autoscaling is based on CPU utilization but the model is GPU-bound, the scaling metric does not reflect the actual load, causing insufficient resources.

Practice this question →

2

MCQmedium

A company is building a generative AI chatbot for customer support using Vertex AI. They want to ground the model responses with their internal knowledge base stored in Cloud Storage and BigQuery. Which feature should they use to ensure the model only answers from the provided data and avoids hallucination?

A.Vertex AI Grounding with Vertex AI Search

B.Vertex AI Prediction

C.Vertex AI Pipelines

D.Cloud Functions

AnswerA

Vertex AI Grounding with Search enables grounding on enterprise data sources.

Why this answer

Vertex AI Grounding with Vertex AI Search is the correct feature because it allows the model to retrieve and cite information from a specified data source (such as Cloud Storage and BigQuery) to generate responses. This process, known as grounding, ensures the model's output is based solely on the provided authoritative data, effectively reducing hallucinations by constraining the model to factual, retrieved content rather than relying on its internal parametric knowledge.

Exam trap

The trap here is that candidates may confuse Vertex AI Prediction (a general model serving endpoint) with the grounding feature, mistakenly thinking that simply deploying a model with Vertex AI Prediction will automatically restrict its answers to a specific knowledge base, when in fact grounding requires explicit integration with Vertex AI Search and a configured data store.

How to eliminate wrong answers

Option B is wrong because Vertex AI Prediction is a service for deploying and serving models to generate predictions or responses, but it does not inherently include grounding capabilities to restrict answers to a specific knowledge base; it would require additional integration with a retrieval system. Option C is wrong because Vertex AI Pipelines is an orchestration service for building and managing ML workflows, not a feature for grounding model responses or preventing hallucinations. Option D is wrong because Cloud Functions is a serverless compute service for running event-driven code, and while it could be used to build a custom retrieval pipeline, it is not a native Vertex AI feature for grounding and does not provide the built-in retrieval and citation mechanisms needed to ensure answers come only from the provided data.

Practice this question →

3

Multi-Selecteasy

Which TWO features are available in Vertex AI Studio for prompt engineering? (Choose two.)

Select 2 answers

A.Side-by-side comparison of model outputs

B.One-click deployment to a Vertex AI endpoint

C.Ability to test prompts with different model parameters (temperature, top_p)

D.Fine-tuning models directly in the interface

E.Building conversational agents with drag-and-drop

AnswersA, C

Allows output comparison.

Why this answer

Option A is correct because Vertex AI Studio provides a side-by-side comparison feature that allows prompt engineers to evaluate outputs from multiple model configurations or parameter settings simultaneously. This enables direct visual comparison of responses, helping to identify the most effective prompt phrasing or parameter combination without manual switching.

Exam trap

The trap here is that candidates may confuse Vertex AI Studio's prompt engineering features with those of Vertex AI Agent Builder or Vertex AI Model Registry, leading them to select options like one-click deployment or drag-and-drop agent building that belong to separate services.

Practice this question →

4

MCQhard

An organization is using Vertex AI to fine-tune a large language model. They notice training is taking longer than expected and cost is increasing. Which action is most likely to reduce training time and cost without significantly impacting model quality?

A.Increase the number of training steps

B.Increase the batch size

C.Use a higher learning rate

D.Enable mixed-precision training (bfloat16)

AnswerD

Mixed-precision reduces computation and memory, speeding up training on TPUs and GPUs.

Why this answer

Mixed-precision training (bfloat16) reduces memory usage and speeds up computation on compatible hardware while maintaining model quality. Increasing batch size or learning rate risks convergence issues; increasing steps increases cost.

Practice this question →

5

MCQhard

A company is using Gemini Pro for code generation. They want to ensure that the generated code does not contain security vulnerabilities. Which approach should they implement?

A.Enable grounding with security scanning tools

B.Use the Vertex AI Codey API with safety settings

C.Implement a human-in-the-loop review with automated scanning

D.Use a custom safety attribute filter

AnswerC

Combining human review with automated scanning is the recommended approach for code security.

Why this answer

Option D is correct because for security, human review combined with automated scanning is best practice. Option A is wrong because custom safety filters are generic and not code-specific. Option B is wrong because grounding with scanning tools is not a standard feature.

Option C is wrong because safety settings are about content harm, not code vulnerabilities.

Practice this question →

6

MCQeasy

A developer is using Vertex AI Studio to prototype a chat application. They want to provide the model with a system instruction to set the tone and style. How should they configure this in the Vertex AI Studio interface?

A.Add the instruction as part of the prompt text

B.Set the temperature parameter to a high value

C.Use the 'System Instruction' field in the model configuration

D.Add the instruction in the 'Context' parameter

AnswerC

Vertex AI Studio has a dedicated field for system instructions.

Why this answer

Option C is correct because Vertex AI Studio provides a dedicated 'System Instruction' field in the model configuration panel, which allows developers to set the tone, style, and behavioral guidelines for the model without mixing them into the user prompt. This field is specifically designed to hold system-level instructions that are prepended to the conversation context, ensuring consistent behavior across multiple turns.

Exam trap

The trap here is that candidates often confuse the 'System Instruction' field with the 'Context' parameter, mistakenly thinking both serve the same purpose, but the 'Context' parameter is designed for providing background knowledge or few-shot examples, not for setting persistent behavioral instructions.

How to eliminate wrong answers

Option A is wrong because adding the instruction as part of the prompt text would mix system-level guidance with user input, making it harder to maintain consistency and potentially causing the model to treat the instruction as part of the conversation rather than a persistent directive. Option B is wrong because the temperature parameter controls randomness in output generation, not the tone or style; a high temperature increases creativity and variability but does not enforce a specific behavioral instruction. Option D is wrong because the 'Context' parameter in Vertex AI Studio is used to provide background information or examples for grounding the model, not for setting system-level behavioral instructions like tone or style.

Practice this question →

7

MCQmedium

A company is building a customer support chatbot using Vertex AI Agent Builder. They want the agent to answer questions based on internal knowledge base documents stored in Cloud Storage. Which feature should they configure to ensure the agent can retrieve relevant information from these documents?

A.Deploy the agent to a Vertex AI endpoint

B.Fine-tune a Gemini model on the knowledge base

C.Enable grounding with a data store

D.Configure a safety filter to block irrelevant queries

AnswerC

Grounding allows the agent to retrieve information from a data store created from Cloud Storage documents.

Why this answer

Grounding connects the agent to external data sources like Cloud Storage, allowing it to retrieve and use information from the knowledge base. Option A is wrong because deploying to an endpoint is for model serving, not data retrieval. Option B is wrong because safety filters control content, not retrieval.

Option D is wrong because custom training is for fine-tuning, not retrieval.

Practice this question →

8

MCQmedium

A news organization is using Vertex AI Gemini to summarize articles. They observe that the summaries sometimes contain hallucinated facts—specifically, dates and statistics that are not in the original article. The team is using the default temperature and top_p settings. They want to reduce hallucinations without making summaries too repetitive or overly conservative. They also need to keep latency low. Which action should they take?

A.Increase the temperature to 1.0 and lower top_p to 0.1.

B.Enable grounding with Google Search to provide factual source context.

C.Fine-tune the model on a large dataset of articles and human-written summaries.

D.Lower the temperature to 0.0 and increase top_p to 1.0.

AnswerB

Grounding connects the model to verified information, reducing hallucination.

Why this answer

Option D is correct. Grounding with Google Search can provide factual references. Option A (lower temperature) may reduce creativity but not eliminate hallucination.

Option B (increase top_p) might increase randomness. Option C (fine-tuning) is expensive and might not generalize.

Practice this question →

9

MCQmedium

A fintech startup is building a generative AI application that generates personalized investment advice based on user profiles and market data. They are using Vertex AI Agent Builder to create an agent that retrieves information from a BigQuery table containing user data and from a real-time market data API. The agent needs to ensure that responses comply with financial regulations, meaning the model must not give specific stock recommendations unless the user explicitly requests them after disclaimers. The team has implemented grounding with both sources. During testing, the agent sometimes spontaneously suggests buying a particular stock without being asked, which could lead to regulatory issues. The team wants to enforce strict control over the agent's behavior. What should the team do?

A.Increase the safety filter sensitivity to block any financial recommendations

B.Add more historical data to the BigQuery table to improve grounding accuracy

C.Implement a custom system instruction that explicitly prohibits unsolicited stock recommendations and requires a disclaimer before any advice

D.Fine-tune the model on a dataset of compliant conversations

AnswerC

System instructions provide explicit behavioral constraints.

Why this answer

Option D is correct because a custom system instruction can explicitly constrain the agent's behavior. Option A is wrong because adding more data may not control behavior. Option B is wrong because fine-tuning may not cover all edge cases.

Option C is wrong because a safety filter may not catch subtle recommendations.

Practice this question →

10

MCQhard

A multinational corporation is using Vertex AI to generate multilingual customer support responses. They have fine-tuned the Gemini model on support tickets in English and now want to extend to 10 additional languages. The fine-tuning dataset for new languages is small (1000 tickets each). During evaluation, the model performs well for common languages (Spanish, French) but poorly for languages like Finnish and Thai. The team needs to improve performance for low-resource languages. They have budget constraints and cannot collect more data quickly. Which approach should they take?

A.Switch to Vertex AI Codey API for generating responses in all languages.

B.Use a multilingual foundation model and fine-tune with cross-lingual transfer learning techniques.

C.Deploy separate fine-tuned models for each language.

D.Collect more training data for low-resource languages via crowdsourcing.

AnswerB

Gemini is inherently multilingual; cross-lingual transfer can boost low-resource performance.

Why this answer

Option B is correct. Cross-lingual transfer learning (like using a multilingual model or fine-tuning with a high-resource language pair) leverages data from similar languages. Option A (collect more data) is not feasible quickly.

Option C (use Codey) is irrelevant. Option D (separate models) increases cost and complexity.

Practice this question →

11

MCQhard

A global e-commerce company is using Vertex AI to build a generative AI chatbot for customer support. The chatbot is powered by the Gemini 1.5 Pro model and uses a vector search index for retrieval-augmented generation (RAG) over product documentation. The company has deployed the application in four regions (us-central1, europe-west4, asia-east1, and australia-southeast1) using a multi-region deployment with a global endpoint. The application is critical and requires high availability with a target latency of under 500ms for the RAG pipeline. Recently, users in Australia are experiencing inconsistent latency spikes, with response times exceeding 2 seconds during peak hours. The team suspects that the issue is related to the vector search index's replication and serving configuration. The index has 10 million embeddings with a dimension of 768. It is stored in a single regional bucket in us-central1, and the vector search index endpoint is deployed in all four regions with the same deployed index ID. The team is using the default configuration for index updates and serving. Which action should the team take to resolve the latency issue for Australian users?

A.Move the vector search index to a multi-regional Cloud Storage bucket (e.g., 'us') to reduce latency for index updates.

B.Create a new regional bucket in australia-southeast1 and store a copy of the index there, then redeploy the vector search index endpoint to use the local bucket.

C.Deploy a separate vector search index endpoint for each region with its own index copy stored in a regional bucket in that region.

D.Increase the number of replicas for the vector search index in all regions to improve throughput and reduce latency.

AnswerA

Multi-regional buckets provide better replication and availability across regions, reducing update latency for distant regions.

Why this answer

The latency spike for Australian users is caused by the vector search index being stored in a single regional bucket in us-central1. When the index is updated, the new embeddings must be rebuilt and streamed from us-central1 to the australia-southeast1 endpoint, introducing significant cross-region latency. Moving the index to a multi-regional Cloud Storage bucket (e.g., 'us') allows the index to be served from a location closer to all regions, reducing the update propagation time and improving consistency for Australian users.

Exam trap

Google Cloud often tests the misconception that deploying separate endpoints or increasing replicas solves cross-region latency, when the real issue is the single-region storage bucket causing update propagation delays.

How to eliminate wrong answers

Option B is wrong because creating a separate regional bucket in australia-southeast1 and storing a copy of the index there does not solve the update propagation issue; the index must be rebuilt and streamed from the source bucket (us-central1) to the new bucket, and the endpoint still references the original deployed index ID, so the local copy is not automatically used. Option C is wrong because deploying separate endpoints per region with local index copies increases operational complexity and cost, and does not address the root cause of cross-region latency during index updates; the global endpoint already routes to the nearest region, but the index data still originates from us-central1. Option D is wrong because increasing the number of replicas improves throughput and query handling within a region, but does not reduce the latency caused by cross-region data transfer during index updates; the bottleneck is the update propagation, not query processing capacity.

Practice this question →

12

Multi-Selecteasy

Which TWO of the following are capabilities of Vertex AI Model Garden? (Choose 2)

Select 2 answers

A.Generate code snippets for common programming tasks.

B.Ability to generate images from text descriptions.

C.Deploy custom container images for model serving.

D.Access to a curated set of foundation models like PaLM and Gemini.

E.Ability to fine-tune and deploy foundation models.

AnswersD, E

Model Garden gives access to foundation models.

Why this answer

Option A and D are correct. Model Garden provides a curated set of foundation models and allows fine-tuning. Option B (generating images) is Imagen's capability.

Option C (generating code) is Codey's capability. Option E (deploying custom containers) is a general Vertex AI feature not specific to Model Garden.

Practice this question →

13

MCQeasy

A marketing agency wants to use Vertex AI to automatically generate social media posts for clients. They plan to use the Gemini API with few-shot prompting. The agency's developers have limited experience with generative AI and want the fastest way to prototype and iterate on prompts. They are already using Google Cloud for other services. Which approach should they take to quickly develop and test prompts?

A.Use a third-party platform like OpenAI Playground and migrate later.

B.Use Google Cloud Shell to invoke the model via curl commands.

C.Use Vertex AI Studio (Gen AI Studio) to design and test prompts interactively.

D.Write Python scripts using the Vertex AI SDK and run them in Airflow.

AnswerC

Vertex AI Studio is designed for rapid prototyping with a visual interface.

Why this answer

Option A is correct. Vertex AI Studio provides a no-code interface for prompt design and testing. Option B (write code) is slower.

Option C (use Cloud Shell) is possible but less user-friendly. Option D (third-party tool) adds complexity.

Practice this question →

14

MCQhard

An organization is using Vertex AI Gemini API for a multimodal chatbot. They notice that the model sometimes provides incorrect information with high confidence. They want to reduce hallucinations without retraining the model. What is the most effective approach?

A.Provide ground-truth context from a knowledge base using grounding

B.Increase the temperature parameter to make the model more creative

C.Reduce the maximum output tokens to force concise answers

D.Adjust safety settings to filter uncertain responses

AnswerA

Grounding supplies factual information that the model can use to generate accurate responses.

Why this answer

Safety settings reduce hallucinations? Actually, hallucinations are not purely safety. Grounding with a knowledge base provides factual retrieval to reduce hallucinations. Option B is wrong because safety settings block harmful content, not necessarily hallucinations.

Option C is wrong because increasing temperature increases randomness. Option D is wrong because reduced token count limits responses but may not reduce hallucinations.

Practice this question →

15

MCQeasy

A company wants to build a chatbot that can answer questions about its internal knowledge base using natural language. Which Google Cloud Generative AI offering should they use to quickly prototype and deploy this chatbot with minimal coding?

A.Generative AI Studio

B.Vertex AI Endpoints

C.Cloud Natural Language API

D.Vertex AI Model Garden

AnswerA

Generative AI Studio offers a drag-and-drop interface for building chatbots.

Why this answer

Generative AI Studio provides a no-code/low-code environment to prototype and deploy chatbots with foundation models.

Practice this question →

16

MCQmedium

A machine learning engineer is deploying a large generative model on Vertex AI. The model requires a GPU with high memory. Which machine configuration should they choose?

A.c2-standard-16 with no GPU

B.a2-highgpu-4g with 4 A100 GPUs

C.n1-standard-4 with a single T4 GPU

D.n2-standard-8 with a single P4 GPU

AnswerB

A2 machines offer A100s with large memory, suitable for large models.

Why this answer

A2 high-gpu machines with NVIDIA A100 GPUs provide high memory for large models.

Practice this question →

17

MCQhard

A company is using Vertex AI Gemini API to analyze customer feedback. They notice that the model occasionally generates offensive content. They have already set safety settings to block high-probability harmful content. What additional step should they take to further reduce offensive outputs?

A.Set the temperature to 0.0

B.Adjust safety settings to block medium-probability harmful content

C.Enable context caching

D.Fine-tune the model on customer feedback data

AnswerB

Stricter thresholds block more offensive outputs.

Why this answer

Option B is correct because the company has already blocked high-probability harmful content, but offensive outputs can still occur at lower probability thresholds. By adjusting safety settings to block medium-probability harmful content, they tighten the filter to catch more borderline cases without requiring model retraining or sacrificing output diversity. This leverages Vertex AI's configurable safety filters, which operate on likelihood categories (e.g., high, medium, low) rather than just binary blocking.

Exam trap

The trap here is that candidates assume fine-tuning (Option D) is the default fix for any output quality issue, but safety filtering is a separate, configurable layer that should be tuned before retraining, and temperature (Option A) is often mistakenly thought to control safety when it only controls randomness.

How to eliminate wrong answers

Option A is wrong because setting temperature to 0.0 makes the model deterministic and reduces creativity, but it does not filter or block offensive content; temperature controls randomness in token selection, not safety. Option C is wrong because context caching improves latency and cost for repeated prompts by storing context, but it has no effect on content safety or filtering harmful outputs. Option D is wrong because fine-tuning on customer feedback data could inadvertently reinforce biases or offensive patterns in the data, and it does not directly address safety filtering; safety settings are a separate, configurable layer that should be adjusted first.

Practice this question →

18

MCQmedium

A media company uses Vertex AI to generate video captions. The generated captions sometimes contain factual errors about named entities (e.g., actor names). Which technique would most likely reduce these errors?

A.Enable response caching

B.Increase the temperature parameter

C.Use Vertex AI grounding with a knowledge base of verified entities

D.Decrease top_p to 0.3

AnswerC

Grounding supplies factual context to the model.

Why this answer

Option C is correct because Vertex AI grounding connects the model to a knowledge base of verified entities, allowing it to retrieve authoritative facts during generation. This reduces hallucinations about named entities by constraining outputs to validated data rather than relying solely on the model's parametric knowledge.

Exam trap

The trap here is that candidates confuse techniques that control output randomness (temperature, top_p) with techniques that improve factual accuracy, overlooking the fundamental need for external knowledge retrieval via grounding.

How to eliminate wrong answers

Option A is wrong because response caching stores previous outputs for reuse, which does not correct factual errors—it may even propagate them. Option B is wrong because increasing the temperature parameter increases randomness in token selection, making factual errors more likely, not less. Option D is wrong because decreasing top_p to 0.3 narrows the sampling pool to only the most probable tokens, which can reduce creativity but does not address factual accuracy about named entities—it still relies on the model's internal knowledge, which may be incorrect.

Practice this question →

19

MCQhard

A large enterprise is migrating their on-premise ML workloads to Vertex AI. They have a custom PyTorch model for text classification that they want to serve with minimal code changes. Which Vertex AI capability should they use for model serving?

A.Vertex AI Endpoints with a pre-built PyTorch runtime

B.Vertex AI Prediction with a custom container

C.Vertex AI Model Garden

D.Vertex AI Vector Search for approximate nearest neighbor

AnswerB

Custom containers support any framework and allow minimal code changes.

Why this answer

Option C is correct because Vertex AI Prediction with a custom container allows them to package their PyTorch model with dependencies and serve it without rewriting code. Option A (Vertex AI Endpoints) is an older term; custom containers are the way. Option B (Vertex AI Model Garden) hosts pre-built models.

Option D (Vertex AI Vector Search) is for embeddings, not classification.

Practice this question →

20

MCQhard

A financial services firm uses a fine-tuned Gemini model in Vertex AI for regulatory compliance checks. They notice that token usage is high, increasing costs. They want to reduce costs without sacrificing accuracy. Which approach should they take?

A.Switch to a smaller base model like PaLM 2 Bison

B.Enable context caching to reuse previous responses

C.Set max output tokens to a lower value and use more precise prompts

D.Reduce temperature to 0.0

AnswerC

Directly reduces output tokens; precise prompts maintain accuracy.

Why this answer

Option C is correct because reducing max output tokens directly lowers the number of tokens generated per request, which is the primary cost driver in pay-per-token models like Gemini. Using more precise prompts further reduces token waste by guiding the model to produce concise, relevant outputs without sacrificing accuracy, as compliance checks often require specific, structured responses rather than verbose explanations.

Exam trap

The trap here is that candidates often confuse cost-reduction strategies that affect model behavior (like temperature or model size) with those that directly reduce token count, leading them to pick options that change output quality rather than token usage.

How to eliminate wrong answers

Option A is wrong because switching to a smaller base model like PaLM 2 Bison may reduce per-token cost but can degrade accuracy on complex regulatory compliance tasks, as smaller models have less capacity for nuanced understanding and may miss critical compliance nuances. Option B is wrong because context caching is designed to reduce latency and cost for repeated identical prompts by reusing cached responses, but it does not help when each compliance check involves unique input data (e.g., different contracts or transactions), making cache hits unlikely. Option D is wrong because setting temperature to 0.0 makes the model deterministic but does not reduce token usage; it may even increase token count if the model becomes overly repetitive or verbose in its attempts to be precise.

Practice this question →

21

MCQmedium

An e-commerce company is using Vertex AI PaLM 2 for Text (via Model Garden) to generate product descriptions. They have an existing pipeline that calls the model with a prompt including product attributes. Recently, they migrated to the Gemini API. The team notices that the Gemini model sometimes outputs descriptions that are factually inconsistent with the input (e.g., wrong color or size). This was less frequent with PaLM 2. They have not changed the prompts. What is the most likely cause and solution?

A.Revert to PaLM 2 since it was more reliable for this task.

B.Add negative prompts to discourage incorrect facts.

C.Adjust the prompt to be more explicit about adhering to the input data, and reduce the temperature.

D.Increase the model's temperature to make outputs more deterministic.

AnswerC

Different models may require slight prompt adjustments; lower temperature and clearer instructions improve factual precision.

Why this answer

Option C is correct. Gemini may follow instructions differently; adjusting the prompt or temperature can help. Option A (increase model temperature) may increase randomness, worsening consistency.

Option B (add negative prompts) might not address factual alignment. Option D (switch back to PaLM 2) does not solve the cause.

Practice this question →

22

MCQeasy

A developer wants to generate Python code using Google Cloud's generative AI. Which model should they invoke?

A.Chirp

B.Codey

C.Imagen

D.Meena

AnswerB

Codey is designed for code generation.

Why this answer

Option A is correct because Codey is Google Cloud's specialized code generation model. Option B is wrong because Imagen generates images. Option C is wrong because Chirp is for audio.

Option D is wrong because Meena is a chatbot, not code-focused.

Practice this question →

23

MCQhard

The exhibit shows the response from a model deployed on Vertex AI that includes safety attributes. The application must reject any prediction where the toxicity score exceeds 0.8. Based on the response, what action should the application take?

A.Reject the prediction because the toxicity score exceeds 0.8.

B.Retry the request with a lower temperature.

C.Display the prediction because the insult score is below 0.8.

D.Log the prediction but still display it.

AnswerA

Toxicity 0.85 > 0.8, so reject.

Why this answer

The toxicity score is 0.85 which exceeds 0.8, so the application should reject the prediction and not display it.

Practice this question →

24

Multi-Selecthard

An organization is building a generative AI application on Vertex AI. Which THREE actions should they take to ensure responsible AI practices?

Select 3 answers

A.Disable content filtering

B.Implement human review for sensitive outputs

C.Conduct fairness evaluation

D.Create a safety policy and enforce via content filtering

E.Use only Google's foundation models

AnswersB, C, D

Human review ensures appropriate oversight.

Why this answer

Implementing human review, conducting fairness evaluation, and creating a safety policy with content filtering are key responsible AI practices. Using only Google's models does not guarantee responsibility, and disabling filtering is counterproductive.

Practice this question →

25

MCQeasy

A startup wants to embed generative AI features into their mobile app but has limited ML expertise. Which Google Cloud service is best suited for rapid integration with no ML training?

A.Vertex AI Model Garden

B.Vertex AI Agent Builder

C.Gemini API

D.Cloud Run with a custom container

AnswerC

Gemini API is ready-to-use with minimal setup.

Why this answer

Option C is correct because the Gemini API provides a simple REST API endpoint without requiring ML expertise. Option A is wrong because Vertex AI Model Garden still requires some ML knowledge. Option B is wrong because Vertex AI Agent Builder is more complex.

Option D is wrong because Cloud Run is a compute service, not AI-specific.

Practice this question →

26

MCQhard

A financial services firm is using Vertex AI to generate investment reports. They need to ensure that the model outputs are explainable and comply with regulatory requirements. Which Vertex AI feature should they use?

A.Vertex AI Model Registry

B.Vertex Explainable AI

C.Vertex AI Safety Settings

D.Vertex AI AutoML

AnswerB

Explainable AI provides attributions for model predictions, aiding regulatory compliance.

Why this answer

Vertex Explainable AI provides feature importance and explanations for model predictions. Option A is wrong because Model Registry is for version management. Option B is wrong because Safety Settings block harmful content but don't provide explanations.

Option D is wrong because AutoML is for model training, not explainability.

Practice this question →

27

MCQhard

A company deploys a Gemini model on Vertex AI for a healthcare application. They need to ensure that the model does not generate medical advice and that responses are grounded in trusted medical sources. Which combination of safety measures should they implement?

A.Enable safety filters and use Vertex AI Grounding with a labeled medical dataset

B.Use Vertex AI Grounding with a public dataset and disable safety filters

C.Enable safety filters only, without grounding

D.Fine-tune the model on a curated medical dataset and disable safety filters for faster responses

AnswerA

This combination ensures safety and factual grounding.

Why this answer

Option D is correct because safety filters and grounded generation with a labeled dataset ensure compliance. Option A is wrong because fine-tuning alone does not enforce safety filters. Option B is wrong because disabling grounding increases hallucination risk.

Option C is wrong because using a public dataset without safety filters is risky.

Practice this question →

28

Multi-Selecthard

A machine learning engineer is tuning a large language model on Vertex AI for question answering. They want to evaluate the model's performance before deployment. Which THREE metrics should they consider?

Select 3 answers

A.Cost per training epoch

B.F1 score

C.Exact match (EM)

D.Training time per epoch

E.ROUGE-L score

AnswersB, C, E

F1 balances precision and recall.

Why this answer

Options A, B, and D are correct for evaluation. Option C is wrong because training time is not a performance metric. Option E is wrong because cost per epoch is a cost metric, not performance.

Practice this question →

29

MCQeasy

A developer wants to integrate Gemini multimodal capabilities (text + image) into a mobile app using Python. Which Google Cloud client library should they use?

A.Dialogflow CX

B.Vertex AI client library (google-cloud-aiplatform)

C.Cloud Vision API

D.Natural Language API

AnswerB

The Vertex AI client library supports Gemini API for multimodal generation.

Why this answer

The Google Generative AI client library (or Vertex AI client library for Gemini API) supports multimodal inputs. Option A is wrong because Cloud Vision is for image analysis, not generative. Option B is wrong because Natural Language is text-only.

Option D is wrong because Dialogflow is for conversational agents, not direct API calls.

Practice this question →

30

Multi-Selectmedium

A company is building a generative AI application that must adhere to strict data residency regulations. Which TWO Google Cloud features can help ensure that data does not leave a specific geographic region?

Select 2 answers

A.Using the regional endpoint for Vertex AI

B.Vertex AI Model Caching

C.Global load balancer with Cloud Armor

D.Cloud CDN for content delivery

E.Deploying models on dedicated VMs in a specific region

AnswersA, E

Regional endpoints ensure API calls stay within the region.

Why this answer

Options B and D are correct. Dedicated VMs and the regional endpoint for Vertex AI ensure data stays in a specified region. Option A is wrong because model caching may use regional resources but does not enforce residency.

Option C is wrong because Cloud CDN distributes data globally. Option E is wrong because Global Load Balancer routes traffic globally.

Practice this question →

31

MCQeasy

You are using Vertex AI Model Garden to deploy a Llama model. Which deployment option provides the best latency for real-time inference?

A.Use Batch Prediction

B.Deploy to a Compute Engine VM

C.Deploy to Vertex AI Endpoint with a fixed number of replicas

D.Use MaaS (Model-as-a-Service) with autoscaling

AnswerC

Fixed replicas ensure always-on instances for low latency.

Why this answer

Option C is correct because a fixed number of replicas avoids cold starts and provides consistent low latency. Option A is wrong because Compute Engine VMs require manual management and may not be optimized. Option B is wrong because MaaS with autoscaling can have cold starts.

Option D is wrong because batch prediction is for asynchronous workloads, not real-time.

Practice this question →

32

MCQeasy

A startup wants to generate images from text descriptions for their marketing materials. They prefer a managed service that requires minimal coding. Which Google Cloud generative AI offering should they use?

A.Vertex AI Imagen

B.Natural Language API

C.Document AI

D.Cloud Speech-to-Text

AnswerA

Imagen provides text-to-image generation capabilities via Vertex AI.

Why this answer

Imagen on Vertex AI provides a managed API for text-to-image generation with simple API calls. Option A is wrong because Speech-to-Text is for audio transcription. Option B is wrong because Natural Language AI is for text analysis.

Option D is wrong because Document AI is for document processing.

Practice this question →

33

Multi-Selecthard

Which TWO actions can help reduce latency for an online prediction endpoint served by a large language model on Vertex AI? (Select TWO.)

Select 2 answers

A.Enable response caching for frequent similar queries

B.Increase the max input token limit to capture more context

C.Use automatic scaling to add more replicas

D.Deploy a smaller distilled version of the model

E.Set the model to preemptible instances

AnswersA, D

Caching avoids model inference for duplicate or similar requests, reducing latency.

Why this answer

Using a smaller model (A) and enabling response caching (D) can reduce latency. B increases latency. C reduces latency but is not a direct action (the platform handles it).

E does not exist as a standard optimization.

Practice this question →

34

MCQmedium

A data scientist runs the above command to upload a model to Vertex AI Model Registry. The model is a TensorFlow 2.6 model trained on tabular data. After deployment to an endpoint, the prediction latency is higher than expected. What is the most likely cause?

A.The artifact URI points to a single file instead of a directory

B.The model should be uploaded with a different display name

C.The container image used is CPU-only, but a GPU-accelerated image would improve latency

D.The model is uploaded to the wrong region

AnswerC

Using a CPU-only container for inference can be slower; a GPU image can reduce latency.

Why this answer

The command uses a tf2-cpu image; GPU-optimized images offer faster inference for many models. Option A is wrong because the command is correct. Option B is wrong because the artifact URI is a directory, and the command is correct.

Option C is wrong because the region is specified.

Practice this question →

35

MCQmedium

A developer is using Vertex AI's Generative AI Studio to prototype a text summarization model. The initial results are too verbose. What is the most efficient way to adjust the output length without retraining?

A.Switch to a smaller base model like BERT

B.Use a separate classifier to filter long responses

C.Fine-tune the model with a dataset of concise summaries

D.Modify the prompt with specific length instructions and adjust model parameters

AnswerD

Prompt engineering and parameter tuning can control output.

Why this answer

Option B is correct because adjusting parameters like max output tokens, temperature, and top_p directly controls verbosity in the prompt design. Option A is wrong because retraining is unnecessary. Option C is wrong because building a separate classifier adds complexity.

Option D is wrong because switching to a smaller base model may not yield desired quality.

Practice this question →

36

MCQmedium

Why is the model responding in English despite the prompt asking for French translation?

A.The model endpoint is configured for English only

B.The temperature is too high, causing random outputs

C.The system instruction to translate to French was not set; the user prompt alone is not sufficient

D.The maxOutputTokens is too low to complete the translation

AnswerC

Gemini requires system instruction for task specification.

Why this answer

Option C is correct because in Google Cloud's Vertex AI and Generative AI offerings, the system instruction is a separate, persistent directive that sets the model's behavior, such as language output. The user prompt alone, even if it asks for a French translation, is not sufficient to override the default language of the model; the system instruction must explicitly specify the target language. Without this instruction, the model defaults to its training language (typically English), regardless of the user's request.

Exam trap

The trap here is that candidates assume a user prompt's explicit instruction (e.g., 'Translate to French') is enough to override the model's default language, but in Google Cloud's Generative AI, the system instruction is the authoritative control for persistent behavior, not the user prompt.

How to eliminate wrong answers

Option A is wrong because model endpoints in Vertex AI are not configured for a specific language; they serve all languages the model supports, and language behavior is controlled via system instructions or prompt engineering, not endpoint configuration. Option B is wrong because a high temperature increases randomness in token selection but does not cause the model to ignore a language instruction; it would still attempt to follow the prompt, albeit with more creative or varied outputs, not systematically output English. Option D is wrong because maxOutputTokens limits the length of the response, not the language; if set too low, the model would produce a truncated translation, not switch to English.

Practice this question →

37

MCQeasy

A startup wants to generate product descriptions from a few keywords using a large language model. They have no prior ML experience and need the fastest time-to-market. Which Google Cloud service should they use?

A.Vertex AI Studio

B.Vertex AI Workbench with custom training

C.Vertex AI Agent Builder

D.Vertex AI Model Garden

AnswerA

No-code prompt engineering and testing.

Why this answer

Vertex AI Studio provides a no-code/low-code environment with pre-trained foundation models and prompt templates, enabling rapid generation of product descriptions from keywords without any ML expertise. It offers the fastest time-to-market because it eliminates the need for custom model training, infrastructure setup, or coding, directly leveraging Google's generative AI capabilities through a simple interface.

Exam trap

The trap here is that candidates might confuse Vertex AI Studio with Vertex AI Model Garden, thinking Model Garden offers a faster path because it lists models, but Model Garden still requires deployment and configuration steps, whereas Studio provides immediate generation capabilities.

How to eliminate wrong answers

Option B is wrong because Vertex AI Workbench with custom training requires writing code, selecting models, and managing training jobs, which demands ML experience and significantly increases time-to-market compared to using a pre-built solution. Option C is wrong because Vertex AI Agent Builder is designed for creating conversational agents and chatbots, not for generating product descriptions from keywords; it adds unnecessary complexity and overhead for this simple text generation task. Option D is wrong because Vertex AI Model Garden is a repository of pre-trained models that still requires users to select, deploy, and potentially fine-tune models, which involves ML knowledge and setup time, not offering the fastest path for a non-ML team.

Practice this question →

38

Multi-Selecteasy

Which TWO factors are most important when choosing a base foundation model for fine-tuning on a domain-specific task?

Select 2 answers

A.Model size and architecture

B.Model popularity in the developer community

C.Relevance of the model's training data to the target domain

D.Model license (open-source vs. proprietary)

E.Inference latency of the base model

AnswersA, C

Larger models may have better performance but higher cost; architecture affects fine-tuning ease.

Why this answer

Options A and D are correct. Model size/architecture affects capability and cost; training data relevance ensures domain knowledge transfer. Option B (model license) is less critical for fine-tuning feasibility.

Option C (popularity) is not a technical factor. Option E (inference latency) can be optimized post-fine-tuning, but choice of base model matters less.

Practice this question →

39

MCQmedium

A marketing team wants to use Vertex AI to generate ad copy. They need the model to follow a specific tone and style. What is the best approach?

A.Use Vertex AI Grounding to retrieve style guides

B.Provide few-shot examples in the prompt and adjust temperature

C.Fine-tune the model on a dataset of past ad copy

D.Enable safety filters to enforce brand guidelines

AnswerB

Few-shot prompting can guide tone effectively.

Why this answer

Option A is correct because prompt engineering with examples is the most flexible and quick method. Option B is wrong because fine-tuning requires large labeled datasets. Option C is wrong because grounding retrieves external info, not tone.

Option D is wrong because safety filters do not control style.

Practice this question →

40

MCQmedium

A healthcare company is building a chatbot to answer patient queries based on their medical documents stored in Cloud Storage. They want to minimize latency and ensure data residency in the EU. Which Vertex AI service should they use?

A.Vertex AI Model Garden with fine-tuning

B.Vertex AI Search with document grounding

C.Vertex AI Agent Builder with web search

D.Vertex AI Codey APIs

AnswerB

Supports private document indexing and data residency controls.

Why this answer

Vertex AI Search with document grounding is correct because it allows the chatbot to ground responses in the customer's own medical documents stored in Cloud Storage, ensuring low latency through optimized indexing and retrieval, while supporting data residency controls to keep data within the EU. This service is specifically designed for enterprise search and Q&A over private document repositories, making it ideal for healthcare use cases requiring compliance and fast responses.

Exam trap

The trap here is that candidates may confuse Vertex AI Search (which grounds in private documents) with Vertex AI Agent Builder (which defaults to web search), or assume fine-tuning is necessary for domain-specific Q&A when retrieval-augmented generation (RAG) with document grounding is the correct approach for minimizing latency and ensuring data residency.

How to eliminate wrong answers

Option A is wrong because Vertex AI Model Garden with fine-tuning is intended for selecting and customizing foundation models, not for directly grounding answers in specific documents; it would require additional retrieval infrastructure and does not natively enforce data residency. Option C is wrong because Vertex AI Agent Builder with web search grounds responses in public web data, not private medical documents, and cannot guarantee data residency in the EU. Option D is wrong because Vertex AI Codey APIs are specialized for code generation and completion, not for answering queries based on document content.

Practice this question →

41

MCQhard

A healthcare startup is using Vertex AI Imagen to generate synthetic medical images for training a diagnostic model. The images must comply with HIPAA regulations and cannot contain any real patient data. The team fine-tuned Imagen on a dataset of de-identified medical scans. However, during testing, they notice that some generated images closely resemble specific patients from the original dataset, even though the dataset was de-identified. They suspect that the model memorized some training examples. The team needs to address this issue without losing image quality. They have access to the original training data and Vertex AI tools. What action should they take?

A.Use a post-processing step to blur or distort generated images.

B.Re-tune the model using differential privacy (DP-SGD) to prevent memorization of individual examples.

C.Increase the size of the training dataset by adding more synthetic images.

D.Apply stricter output safety filters to block images that look like any known patient.

AnswerB

Differential privacy limits what the model can learn about specific training examples, reducing memorization.

Why this answer

Option D is correct. Differential privacy during fine-tuning adds noise to prevent memorization. Option A (more data) might not help if the model is overfitting.

Option B (stronger safety filters) won't prevent uniqueness recall. Option C (post-processing) only alters output after generation, doesn't fix memorization.

Practice this question →

42

Multi-Selectmedium

Which TWO features are available in Vertex AI Agent Builder to enhance the conversational abilities of an agent? (Choose TWO.)

Select 2 answers

A.Slot filling

B.Sentiment analysis

C.Code execution

D.Intent matching

E.Knowledge base integration

AnswersA, D

Slot filling collects required parameters from user input.

Why this answer

Slot filling is correct because it allows the agent to collect required parameters (slots) from the user during a conversation, enabling multi-turn interactions to fulfill complex requests. In Vertex AI Agent Builder, slot filling is a core feature for conversational agents, as it systematically prompts for missing information (e.g., date, location) until all necessary slots are filled, enhancing the agent's ability to handle dynamic user inputs.

Exam trap

The trap here is that candidates often confuse 'knowledge base integration' as a core conversational feature, but it is actually a retrieval-augmented generation (RAG) capability for grounding, not a direct mechanism for managing dialogue flow like slot filling or intent matching.

Practice this question →

43

MCQmedium

A retail company is building a chatbot for customer service. They need the model to generate product descriptions based on a catalog but also answer questions about store policies. The team wants to minimize latency and cost while maintaining high accuracy. Which Google Cloud generative AI offering should they use?

A.Vertex AI Model Garden with PaLM 2

B.Vertex AI Imagen

C.Vertex AI Codey APIs

D.Vertex AI Gemini API

AnswerD

Gemini offers multimodal capabilities and is optimized for both text generation and comprehension tasks.

Why this answer

Option B is correct because Gemini is a multimodal model that can handle both product descriptions (text) and policy questions, and Vertex AI offers optimized inference for latency and cost. Option A (Imagen) is for image generation, not text. Option C (Codey) is for code generation.

Option D (PaLM 2 via Model Garden) is possible but Gemini is more modern and efficient.

Practice this question →

44

MCQhard

An organization is deploying a summarization model on Vertex AI and needs to ensure that the model's responses are consistent and avoid hallucinations. They have a labeled dataset of source documents and human-written summaries. Which approach would best align the model with their quality requirements?

A.Deploy the model with a larger max_output_tokens

B.Use prompt engineering with few-shot examples

C.Increase the temperature to 0.9

D.Perform supervised fine-tuning using their labeled dataset

AnswerD

Fine-tuning adapts the model to the specific summarization style and reduces errors.

Why this answer

Supervised fine-tuning on a high-quality dataset specific to the task reduces hallucinations and improves consistency.

Practice this question →

45

MCQmedium

A company needs to fine-tune a foundation model on Vertex AI for a custom text classification task with only 500 labeled examples. They want to minimize cost while achieving high accuracy. What is the MOST cost-effective approach?

A.Fine-tune the foundation model using full fine-tuning on the entire dataset.

B.Use model distillation to train a smaller student model.

C.Use Vertex AI LLM-based evaluation to compare multiple large models and select the best one.

D.Design prompts with few-shot examples and test it with the available data.

AnswerD

Prompt engineering with few-shot examples is low-cost and effective for small datasets.

Why this answer

Option B is correct because with a small dataset, prompt engineering and few-shot examples are often sufficient and much cheaper than fine-tuning. Fine-tuning (C) risks overfitting and higher cost. Model distillation (D) requires a large teacher model.

Evaluation (A) is not a training method.

Practice this question →

46

MCQmedium

Refer to the exhibit. You ran the gcloud command to list a model, but received this error. What is the most likely issue?

A.The model's artifact URI is wrong

B.The model wasn't uploaded correctly

C.The model is missing a serving container

D.The model is in a different project

AnswerC

The error explicitly says 'no serving container image'.

Why this answer

Option B is correct because the error clearly states the model has no serving container image. Option A is wrong because a failed upload would show a different error. Option C is wrong because artifact URI issues would show a different message.

Option D is wrong because cross-project listing would require additional flags.

Practice this question →

47

MCQeasy

An engineer is testing a generative AI application using the Gemini API. They receive a 400 error with message 'INVALID_ARGUMENT: text has been blocked.' What is the most likely cause?

A.The specified Gemini model version does not exist

B.The input text was flagged by a safety filter

C.The API key is invalid

D.The API quota has been exceeded

AnswerB

Safety filters block inappropriate content and return 400 with blocked message.

Why this answer

Option B is correct because the error indicates the input text triggered a safety filter. Option A (quota exceeded) would give 429. Option C (authentication) gives 401/403.

Option D (model not found) gives 404.

Practice this question →

48

MCQeasy

A team wants to fine-tune a PaLM 2 model with their own data on Vertex AI. What is the recommended way to prepare the training data?

A.TFRecord files

B.JSON Lines file with 'input_text' and 'output_text' keys

C.CSV file with prompt and completion columns

D.Pickle serialized objects

AnswerB

JSONL with the correct keys is required.

Why this answer

Fine-tuning for PaLM expects data in JSON Lines format with 'input_text' and 'output_text' fields.

Practice this question →

49

Multi-Selectmedium

A company is evaluating Google Cloud's generative AI offerings for enterprise use. Which TWO considerations are most important when selecting the right model deployment option?

Select 2 answers

A.Data residency

B.Developer preference

C.Model size

D.Latency requirements

E.Training time

AnswersA, D

Data residency is often a legal or compliance requirement.

Why this answer

Latency requirements and data residency are critical enterprise considerations. Model size and training time are secondary, and developer preference is not a primary factor.

Practice this question →

50

MCQmedium

A healthcare company is building a chatbot to answer patient queries using Vertex AI Agent Builder. They want to ensure the chatbot only uses approved medical references and does not generate unverified advice. How should they configure the agent?

A.Set up grounding with a private data store containing verified medical documents

B.Enable strict safety filters to block any medical advice

C.Increase the temperature parameter to get more diverse responses

D.Use Vertex AI Model Monitoring to track answer accuracy

AnswerA

Grounding restricts responses to the provided data store, ensuring only approved references are used.

Why this answer

Grounding with a curated data store ensures the chatbot only retrieves information from approved sources. Option B is wrong because safety filters block categories but not unverified content. Option C is wrong because high temperature increases creativity, risking unverified answers.

Option D is wrong because model monitoring detects drift but does not restrict sources.

Practice this question →

51

MCQeasy

A developer is configuring a Vertex AI Agent Builder agent to use grounding. They receive the above error when calling the API. What is the most likely cause?

A.The data store was just created and is not yet propagated

B.The agent is not authenticated to access the data store

C.The data store has not been created in the specified project

D.The grounding configuration is missing required permissions

AnswerC

The error indicates the referenced data store does not exist; it needs to be created first.

Why this answer

The error says the data store does not exist. The developer likely created a data store in a different project or did not create one. Option A is wrong because authentication errors give different messages.

Option C is wrong because permissions errors are 403. Option D is wrong because a race condition would not produce 'does not exist'.

Practice this question →

52

MCQmedium

A security team wants to prevent prompt injection attacks on their generative AI application hosted on Vertex AI. Which best practice should they implement?

A.Use a custom model instead of a foundation model

B.Disable all logging

C.Use a private endpoint

D.Implement input validation and output filtering

AnswerD

This helps detect and block malicious prompts and undesired outputs.

Why this answer

Input validation and output filtering are key defenses against prompt injection. Custom models do not inherently prevent it; disabling logging reduces visibility; private endpoints do not block injection.

Practice this question →

53

MCQeasy

A retail company wants to build a customer service chatbot that can handle returns, order status, and FAQs. They need to integrate with their existing backend systems. Which Google Cloud service should they use?

A.Vertex AI Model Garden

B.Vertex AI Agent Builder

C.Vertex AI Search

D.Vertex AI Codey API

AnswerB

Provides tools for building chatbots with backend integration.

Why this answer

Vertex AI Agent Builder is the correct choice because it provides a low-code platform specifically designed for building conversational AI agents (chatbots) that can be integrated with enterprise backend systems via APIs, connectors, and custom tools. It supports grounding in enterprise data, multi-turn dialogue management, and seamless integration with existing systems for handling returns, order status, and FAQs, making it the most suitable service for this use case.

Exam trap

The trap here is that candidates may confuse Vertex AI Agent Builder with Vertex AI Search or Model Garden, assuming any generative AI service can build a chatbot, but only Agent Builder provides the necessary conversational orchestration and backend integration capabilities required for a production customer service chatbot.

How to eliminate wrong answers

Option A is wrong because Vertex AI Model Garden is a repository of pre-trained and foundation models for discovery and deployment, not a service for building conversational agents with backend integration. Option C is wrong because Vertex AI Search is optimized for enterprise search and information retrieval over structured and unstructured data, not for building multi-turn conversational chatbots that require backend system integration. Option D is wrong because Vertex AI Codey API is focused on code generation and code-related tasks (e.g., code completion, chat, and generation), not on building customer service chatbots that interact with backend systems.

Practice this question →

54

MCQeasy

A developer wants to use Gemini 1.5 Pro to analyze hour-long video content and generate a summary. Which feature of Gemini 1.5 Pro is most suitable for this task?

A.Long context window (up to 1 million tokens)

B.Multimodal generation from text and images

C.Code generation and debugging

D.Function calling to retrieve external data

AnswerA

The long context allows ingesting entire video content for summarization.

Why this answer

Option D is correct because Gemini 1.5 Pro's long context window (up to 1M tokens) allows processing entire videos. Option A (multimodal generation) is useful but not the key feature. Option B (function calling) is for APIs.

Option C (code generation) is not relevant.

Practice this question →

55

MCQhard

You are a machine learning engineer at a healthcare startup. Your team has developed a generative AI model that summarizes patient medical records. The model is deployed on Vertex AI Endpoints using a custom container. You have configured the endpoint with a single n1-standard-4 machine (4 vCPUs, 15 GB memory) without accelerators. The model uses a small transformer architecture. During load testing with 50 concurrent requests, you observe that the average latency is 8 seconds, which exceeds the requirement of 2 seconds. Additionally, some requests time out after 10 seconds. You suspect the CPU is the bottleneck. You also notice that the model inference code uses TensorFlow but is not optimized for inference. Which action should you take to reduce latency?

A.Reduce the model size by pruning and quantization, then redeploy.

B.Increase the request timeout to 30 seconds to accommodate the latency.

C.Enable autoscaling to handle the load with multiple replicas.

D.Deploy the model on a machine with a GPU and use TensorRT for inference optimization.

AnswerD

GPU acceleration and model optimization can drastically reduce latency.

Why this answer

Option D is correct because the CPU is identified as the bottleneck, and deploying on a GPU with TensorRT optimization directly addresses this by accelerating the TensorFlow inference. TensorRT optimizes the model graph and fuses layers, significantly reducing latency for transformer-based models, which is essential to meet the 2-second requirement.

Exam trap

The trap here is that candidates may choose autoscaling (Option C) thinking it handles high concurrency, but they overlook that the per-request latency remains unchanged on CPU, failing to meet the 2-second requirement.

How to eliminate wrong answers

Option A is wrong because pruning and quantization reduce model size and can improve latency, but they may degrade model accuracy and do not address the fundamental CPU bottleneck as effectively as GPU acceleration with TensorRT. Option B is wrong because increasing the timeout to 30 seconds does not reduce latency; it only masks the problem, and requests still exceed the 2-second requirement, leading to poor user experience. Option C is wrong because autoscaling adds more replicas to handle concurrent requests, but each request still runs on a CPU-bound n1-standard-4 machine, so the per-request latency remains high and does not solve the CPU bottleneck.

Practice this question →

56

MCQeasy

A startup wants to quickly integrate a generative AI chatbot into their customer support platform. They need a solution that can answer questions based on their internal knowledge base with minimal setup. Which Google Cloud service should they use?

A.Use Model Garden to deploy a pre-built Q&A model

B.Call the Gemini API directly and implement grounding logic manually

C.Use Cloud AI Notebooks to fine-tune a model on their knowledge base

D.Use Vertex AI Agent Builder to create a conversational agent grounded in their data

AnswerD

Agent Builder offers pre-built components for grounding and conversation flow, enabling rapid deployment.

Why this answer

Option C is correct because Vertex AI Agent Builder provides a no-code/low-code environment to build conversational agents grounded in enterprise data, making it ideal for quick integration with a knowledge base. Option A (Gemini API with manual grounding) requires more development effort. Option B (Cloud AI Notebooks) is for data science, not production deployment.

Option D (Model Garden) is for accessing and deploying models but not for building a complete agent.

Practice this question →

57

MCQeasy

A data scientist wants to fine-tune a foundation model from Vertex AI Model Garden on their custom dataset. They want to choose a cost-effective method that updates only a small subset of parameters. Which fine-tuning approach should they use?

A.Full fine-tuning

B.Prompt tuning

C.Parameter-Efficient Fine-Tuning (PEFT) like LoRA

D.Reinforcement Learning from Human Feedback (RLHF)

AnswerC

PEFT methods update only a small subset of parameters.

Why this answer

Option C is correct because Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA (Low-Rank Adaptation) are specifically designed to update only a small subset of parameters (e.g., low-rank matrices injected into transformer layers) while keeping the majority of the foundation model frozen. This drastically reduces memory and compute costs compared to full fine-tuning, making it the most cost-effective choice for customizing a model from Vertex AI Model Garden on a custom dataset.

Exam trap

The trap here is that candidates often confuse prompt tuning (which does not update model parameters) with parameter-efficient fine-tuning (which updates a small subset of parameters), leading them to incorrectly select Option B as a cost-effective method for updating parameters.

How to eliminate wrong answers

Option A is wrong because full fine-tuning updates all model parameters, which is computationally expensive and memory-intensive, contradicting the requirement for a cost-effective method that updates only a small subset of parameters. Option B is wrong because prompt tuning is a soft-prompt technique that does not update any model parameters; instead, it learns a small set of virtual tokens prepended to the input, which is not a parameter-efficient fine-tuning method (it is a prompt-based approach). Option D is wrong because Reinforcement Learning from Human Feedback (RLHF) is a training paradigm that uses human preferences to align model behavior, typically requiring multiple models (reward model, policy model) and full or PEFT fine-tuning, and it is not primarily a cost-effective method for updating a small subset of parameters on a custom dataset.

Practice this question →

58

MCQmedium

A company is deploying a chatbot that must ensure customer data remains within the European Union. Which approach should they take?

A.Use Vertex AI Agent Builder with global endpoint

B.Use the Gemini API with a regional endpoint in europe-west4

C.Use Vertex AI with a multi-region endpoint

D.Deploy a custom model on GKE in a specific region

AnswerB

Regional endpoints ensure data remains in the specified region.

Why this answer

The Gemini API offers regional endpoints, such as europe-west4, which can restrict data processing to that region. Vertex AI multi-region endpoints may not guarantee EU-only residency.

Practice this question →

59

MCQmedium

A global e-commerce company wants to translate product descriptions into 50 languages with high accuracy. They need to handle domain-specific terms (e.g., 'size chart', 'return policy'). Which approach should they use?

A.Use the Gemini API with a prompt like 'Translate to French'

B.Build a custom agent with Vertex AI Agent Builder

C.Use Vertex AI Translation with custom glossaries

D.Use Imagen to generate translated images

AnswerC

Custom glossaries ensure domain-specific terms are translated correctly.

Why this answer

Option C is correct because Vertex AI Translation with custom glossaries is specifically designed for high-accuracy, domain-specific translations. Custom glossaries allow you to define precise translations for terms like 'size chart' and 'return policy', ensuring consistency across 50 languages. This approach leverages Google's neural machine translation models while overriding generic translations with your business-specific terminology.

Exam trap

The trap here is that candidates may confuse general-purpose generative AI APIs (like Gemini) with specialized translation services, or assume that any AI model can handle domain-specific translation without customization, when in fact glossaries are required for consistent, accurate terminology.

How to eliminate wrong answers

Option A is wrong because the Gemini API is a general-purpose generative AI model, not a specialized translation service; it lacks built-in support for custom glossaries and may produce inconsistent or hallucinated translations for domain-specific terms. Option B is wrong because Vertex AI Agent Builder is designed for building conversational agents and workflows, not for bulk, high-accuracy translation tasks; it would require significant custom development to replicate glossary-based translation. Option D is wrong because Imagen is a text-to-image generation model, not a translation tool; it cannot translate text and would be irrelevant for translating product descriptions.

Practice this question →

60

Multi-Selecthard

Which THREE factors should be considered when choosing between Gemini 1.5 Pro and Gemini 1.5 Flash for a customer-facing chatbot? (Choose three.)

Select 3 answers

A.Cost constraints: Flash is more cost-effective per token

B.Task complexity: Pro is better for complex reasoning

C.Safety filters: Pro has stricter safety defaults

D.Latency requirements: Flash provides faster responses

E.Multimodal capability: Flash does not support image input

AnswersA, B, D

Flash is cheaper.

Why this answer

Option A is correct because Gemini 1.5 Flash is designed as a cost-optimized model, offering significantly lower per-token pricing compared to Gemini 1.5 Pro. For a customer-facing chatbot with high query volumes, cost efficiency is a primary consideration, making Flash the more economical choice for routine interactions.

Exam trap

The trap here is that candidates often assume Flash lacks multimodal capabilities or that Pro has stricter safety defaults, when in fact both models share the same safety configuration and both support multimodal inputs, with the key differentiators being cost, latency, and task complexity.

Practice this question →

61

Multi-Selectmedium

Which THREE capabilities does Vertex AI Agent Builder provide out of the box? (Select THREE.)

Select 3 answers

A.Automatic escalation to human agents when confidence is low

B.Fine-tuning of foundation models using custom datasets

C.Enterprise search over internal documents

D.Grounding with enterprise data sources like Cloud Storage and BigQuery

E.Pre-built customer service and sales agents

AnswersA, D, E

Agent Builder supports fallback to human agents based on confidence thresholds.

Why this answer

Agent Builder includes pre-built agents (A), grounding (D), and automatic fallback (E). B is done via Model Garden, not Agent Builder directly. C is a feature of Vertex AI Search.

Practice this question →

62

MCQhard

A data scientist is using Vertex AI Model Registry to manage multiple versions of a custom text classification model. They need to ensure that only the version that passes all evaluation metrics can be deployed to a Vertex AI Endpoint for online predictions. What deployment strategy should they use?

A.Use A/B testing between versions and manually select the best performer

B.Set up continuous evaluation with model monitoring to auto-promote versions that meet thresholds

C.Manually track version IDs and deploy the latest version

D.Deploy all versions to a single endpoint and route traffic manually

AnswerB

Continuous evaluation automatically checks metrics and can auto-promote versions that pass defined thresholds.

Why this answer

Model Registry supports continuous evaluation and can automatically promote versions that meet thresholds. Option A is wrong because manual version control is error-prone. Option B is wrong because A/B testing is for traffic splitting, not automatic promotion.

Option D is wrong because batch predictions are offline.

Practice this question →

63

Multi-Selecthard

An organization is building a search application using Vertex AI Vector Search. They have encoded their documents into embeddings and want to retrieve the most similar documents for a query. Which TWO actions are required to set up a Vector Search index?

Select 2 answers

A.Specify embedding dimension size in the index config.

B.Deploy the index to the IndexEndpoint.

C.Train a custom embedding model.

D.Download the query embeddings to local storage.

E.Create an IndexEndpoint resource.

AnswersB, E

Deployment makes the index available for querying.

Why this answer

A and D are correct: Creating an IndexEndpoint and deploying the index to an endpoint are necessary steps. B is not required because dimensions come from the model, not set here. C is done after deployment.

E is for training, not serving.

Practice this question →

64

Multi-Selectmedium

Which TWO options are benefits of using Vertex AI Model Garden compared to using raw pre-trained models from external sources? (Choose two.)

Select 2 answers

A.Lower cost compared to using generic APIs

B.Ability to fine-tune models on custom data

C.Integration with Vertex AI tools like evaluation and monitoring

D.Simplified deployment and scaling with Vertex AI endpoints

E.Guaranteed data privacy and no data sharing

AnswersC, D

Native integration with Vertex AI ecosystem.

Why this answer

Option C is correct because Vertex AI Model Garden is deeply integrated with the Vertex AI ecosystem, providing native access to tools like Vertex AI Evaluation for model performance assessment and Vertex AI Monitoring for drift detection and observability. This integration eliminates the need for custom pipelines to connect external models with these managed services, streamlining the MLOps workflow.

Exam trap

Google Cloud often tests the distinction between inherent platform benefits (like integration and managed deployment) versus features that are not exclusive to Model Garden (like fine-tuning or cost), leading candidates to mistakenly select options that are generally true for any model but not unique advantages of Model Garden.

Practice this question →

65

Multi-Selecthard

Which THREE capabilities are provided by Vertex AI Agent Builder? (Choose three.)

Select 3 answers

A.Automated model hyperparameter tuning.

B.Integration with Dialogflow CX for conversational flows.

C.Support for multimodal (text, image, video) input processing in agents.

D.Creating custom agents with memory and tool integration.

E.Built-in grounding with Google Search to improve answer accuracy.

AnswersB, D, E

Agent Builder can leverage Dialogflow CX for advanced conversational design.

Why this answer

Options A, B, and D are correct. Vertex AI Agent Builder allows creating custom agents with memory and tools (A), integrates with Dialogflow CX (B), and provides built-in grounding with Google Search (D). Automated hyperparameter tuning (C) is a feature of Vertex AI Training, not Agent Builder.

Multimodal inputs (E) are supported by Gemini models but not a built-in capability of Agent Builder.

Practice this question →

66

MCQmedium

A financial services company is building a customer-facing chatbot using Vertex AI Gemini API to answer questions about account balances, transactions, and branch locations. The chatbot must adhere to strict data privacy regulations (e.g., GDPR) that prohibit sending personally identifiable information (PII) to the model provider. The architecture uses a retrieval-augmented generation (RAG) approach where customer queries are passed to a Cloud Run service, which queries a BigQuery database for relevant data and then sends the context along with the query to the Gemini API. The team is concerned that the context may contain PII. They want to minimize modifications to the existing architecture. What step should the team take to ensure compliance?

A.Build a separate anonymization pipeline using Cloud Data Loss Prevention to remove PII before sending context.

B.Modify the chatbot to reject any query that might contain PII based on regex patterns.

C.Route all requests through a third-party proxy that strips PII before sending to Gemini.

D.Configure the Gemini API to disable data logging and use the Enterprise tier that ensures data stays within Google Cloud's controls.

AnswerD

Enterprise features of Vertex AI provide data residency and no logging, satisfying compliance with minimal changes.

Why this answer

Option B is correct. Vertex AI Gemini API can be configured to not log prompts and responses, and data is processed within Google Cloud's infrastructure (data residency). Option A (third-party proxy) adds complexity and latency.

Option C (always reject PII queries) would break functionality. Option D (anonymizing pipeline) is costly and not minimal.

Practice this question →

67

MCQhard

A retail company has deployed a customer support chatbot using Vertex AI Agent Builder. The chatbot is configured with a knowledge base stored in BigQuery (user manuals) and Cloud Storage (product images). The agent uses a Gemini 1.5 Pro model for response generation. Users report that the chatbot frequently gives incorrect answers and sometimes does not reference the knowledge base at all. Logs show high latency (average response time > 10 seconds) and many responses are generic or hallucinated. The agent's grounding configuration currently uses the default settings. The development team is considering the following actions: A) Switch to a smaller model like Gemini 1.5 Flash to reduce latency. B) Increase the context window of the model to allow more knowledge base content. C) Enable Vertex AI Search for grounding and configure a search aggregation strategy that retrieves relevant documents from the knowledge base. D) Fine-tune the Gemini model with the company's historical chat logs to improve domain-specific responses. Which action should the team take FIRST to address the issues?

A.Switch to a smaller model like Gemini 1.5 Flash to reduce latency.

B.Enable Vertex AI Search for grounding and configure a search aggregation strategy that retrieves relevant documents from the knowledge base.

C.Increase the context window of the model to allow more knowledge base content.

D.Fine-tune the Gemini model with the company's historical chat logs to improve domain-specific responses.

AnswerB

This directly improves retrieval accuracy and ensures the model references the knowledge base, addressing both hallucination and latency (by retrieving only relevant content).

Why this answer

Option C is correct because the symptoms indicate the agent is not effectively retrieving and leveraging the knowledge base. Enabling grounding with Vertex AI Search and configuring search aggregation directly addresses the incorrect answers and lack of knowledge base usage. Reducing model size (A) might help latency but not accuracy.

Increasing context window (B) could hurt performance further. Fine-tuning (D) is costly and may not fix retrieval issues without proper grounding.

Practice this question →

68

MCQhard

A company is using Vertex AI for multimodal generative AI to analyze images and text. They need to ensure that the model's outputs are auditable and can be traced back to the input data. Which feature should they enable?

A.Vertex AI Feature Store

B.Vertex AI Experiments

C.Cloud Logging

D.Vertex AI Model Monitoring with Explainable AI

AnswerD

Model Monitoring with Explainable AI provides attribution and traceability.

Why this answer

Option D is correct because Vertex AI Model Monitoring with Explainable AI provides feature attributions that map model predictions back to specific input features (e.g., pixels in images or tokens in text). This creates an auditable trail by quantifying how each input contributed to the output, enabling traceability for compliance and debugging. The other options lack the direct input-to-output attribution required for auditability.

Exam trap

The trap here is that candidates confuse operational logging (Cloud Logging) or experiment tracking (Vertex AI Experiments) with the specific need for input-to-output attribution, which only Explainable AI provides for auditability.

How to eliminate wrong answers

Option A is wrong because Vertex AI Feature Store is a centralized repository for storing, serving, and sharing feature data, but it does not provide per-prediction attribution or traceability from model outputs back to specific inputs. Option B is wrong because Vertex AI Experiments tracks training runs, hyperparameters, and metrics, but it focuses on model development history, not on explaining individual inference outputs. Option C is wrong because Cloud Logging captures operational logs (e.g., API calls, errors) but does not generate feature-level explanations or attributions that link a specific output to its input data.

Practice this question →

69

MCQhard

A company is deploying a large language model on Vertex AI for real-time inference. They observe high latency and want to optimize. They have already enabled model caching. What next step should they take to reduce latency?

A.Add more GPUs to the prediction endpoint

B.Use a larger, more accurate model variant

C.Increase the batch size for inference requests

D.Apply model quantization to reduce precision

AnswerD

Quantization speeds up inference with minimal accuracy loss.

Why this answer

Option D is correct because reducing model precision (e.g., int8 quantization) can significantly reduce compute time and latency. Option A is wrong because increasing batch size increases latency per request. Option B is wrong while scaling predicts may help, it's not a model optimization.

Option C is wrong because using a larger model increases latency.

Practice this question →

70

MCQhard

A financial services company wants to use Vertex AI Grounding with enterprise data to power a regulatory compliance chatbot. They have strict data residency requirements: data must remain in the EU. What should they do?

A.Enable Data Residency by selecting a EU region during data store creation

B.Use a VPN to the US region

C.Convert data to private tokens

D.Use a Vertex AI endpoint in a European region

AnswerA

Data stores for grounding are region-specific; selecting EU ensures data stays in EU.

Why this answer

Option B is correct because when creating a data store for grounding, you must select a region in the EU to meet data residency. Option A is wrong because the endpoint region does not determine data storage location. Option C is wrong because a VPN does not change data residency.

Option D is wrong because private tokens are unrelated to residency.

Practice this question →

71

MCQeasy

A retailer wants to use generative AI to write product descriptions automatically. They have a large dataset of existing product descriptions and need to customize a foundation model for their brand voice. Which Vertex AI feature should they use?

A.Vertex AI Search with grounding

B.Prompt design with the Gemini API directly

C.Vertex AI Model Evaluation

D.Vertex AI custom model tuning

AnswerD

Tuning adapts the model to the retailer's brand voice using their dataset.

Why this answer

Tuning (like supervised fine-tuning) allows customizing a pretrained model with your own data. Option A is wrong because prompt design is a technique, not a Vertex AI feature. Option B is wrong because Vertex AI Search is for search, not text generation customization.

Option D is wrong because Model Evaluation assesses performance, not customization.

Practice this question →

72

MCQeasy

A data scientist wants to generate realistic product images for an online catalog using Google Cloud's generative AI. Which service should they use?

A.Imagen on Vertex AI

B.Codey API for code generation

C.Gemini API with text-to-text prompts

D.Vertex AI Model Garden without a specific model

AnswerA

Imagen is purpose-built for image generation.

Why this answer

Option B is correct because Imagen is Google Cloud's image generation model. Option A is wrong because Gemini is a multimodal model but not specialized for image generation. Option C is wrong because Codey is for code generation.

Option D is wrong because Vertex AI Model Garden is a model repository, not a specific generation service.

Practice this question →

73

Multi-Selecteasy

A developer is using the Vertex AI PaLM API to generate code. They want to ensure the output is safe and adheres to company policies. Which THREE attributes can they configure in the safety_settings parameter?

Select 3 answers

A.Language detection

B.Sentiment analysis

C.Toxicity

D.Harassment

E.Sexually explicit content

AnswersC, D, E

Toxicity is a standard safety category.

Why this answer

B, D, and E are standard safety categories for PaLM API. A and C are not available as safety categories.

Practice this question →

74

Multi-Selecthard

Which THREE benefits does Vertex AI Agent Builder provide over building a custom conversational agent from scratch?

Select 3 answers

A.Automatic scaling and load balancing

B.Pre-built integration for grounding on enterprise data sources

C.Full control over the underlying ML model architecture

D.Built-in safety filters and guardrails

E.Guaranteed lower inference latency

AnswersA, B, D

Managed service scales according to demand without manual intervention.

Why this answer

Options B, C, and E are correct. Pre-built grounding with your data reduces development effort; built-in safety filters ensure compliance; automatic scaling handles traffic without manual ops. Option A (full control over ML models) is more true for custom builds.

Option D (lower latency) is not guaranteed; custom builds can optimize latency.

Practice this question →

75

MCQmedium

A company is using Vertex AI Model Registry to manage multiple versions of its custom generative model. They want to automatically route a percentage of traffic to a new model version for testing. What should they do?

A.Set up a Cloud Tasks queue to distribute requests

B.Create a new endpoint for each version

C.Deploy both versions to the same endpoint and adjust traffic split settings

D.Use a load balancer in front of the endpoints

AnswerC

Vertex AI endpoints allow splitting traffic percentage across deployed models.

Why this answer

Vertex AI Endpoints support traffic splitting between model versions.

Practice this question →

Page 1 of 2 · 127 questions totalNext →

Ready to test yourself?

Try a timed practice session using only Google Cloud Gen Ai Offerings questions.

Start 20-question session