Generative AI Leader Practice Test 4 — 25 Questions

Question 1

A global e-commerce company is using Vertex AI to build a generative AI chatbot for customer support. The chatbot is powered by the Gemini 1.5 Pro model and uses a vector search index for retrieval-augmented generation (RAG) over product documentation. The company has deployed the application in four regions (us-central1, europe-west4, asia-east1, and australia-southeast1) using a multi-region deployment with a global endpoint. The application is critical and requires high availability with a target latency of under 500ms for the RAG pipeline. Recently, users in Australia are experiencing inconsistent latency spikes, with response times exceeding 2 seconds during peak hours. The team suspects that the issue is related to the vector search index's replication and serving configuration. The index has 10 million embeddings with a dimension of 768. It is stored in a single regional bucket in us-central1, and the vector search index endpoint is deployed in all four regions with the same deployed index ID. The team is using the default configuration for index updates and serving. Which action should the team take to resolve the latency issue for Australian users?

Accepted Answer

Move the vector search index to a multi-regional Cloud Storage bucket (e.g., 'us') to reduce latency for index updates.. The latency spike for Australian users is caused by the vector search index being stored in a single regional bucket in us-central1. When the index is updated, the new embeddings must be rebuilt and streamed from us-central1 to the australia-southeast1 endpoint, introducing significant cross-region latency. Moving the index to a multi-regional Cloud Storage bucket (e.g., 'us') allows the index to be served from a location closer to all regions, reducing the update propagation time and improving consistency for Australian users.

Answer

Create a new regional bucket in australia-southeast1 and store a copy of the index there, then redeploy the vector search index endpoint to use the local bucket.

Answer

Deploy a separate vector search index endpoint for each region with its own index copy stored in a regional bucket in that region.

Answer

Increase the number of replicas for the vector search index in all regions to improve throughput and reduce latency.

Question 2

A large enterprise is deploying a generative AI-powered code assistant for their developers. The solution uses Vertex AI with a fine-tuned Codey model. The security team requires that all prompts and responses be logged for audit purposes, but the logs must not contain sensitive information such as API keys or passwords. The operations team is concerned about high latency during peak usage. You need to design a solution that meets security requirements without compromising performance. Which approach should you take?

Accepted Answer

Enable Vertex AI model monitoring with Cloud Logging, and configure a log sink with a custom exclusion filter to redact sensitive patterns before storing. Option B is correct because it uses Vertex AI model monitoring with Cloud Logging to capture prompts and responses, then applies a custom exclusion filter with a log sink to redact sensitive patterns (e.g., API keys, passwords) in real time before logs are stored. This meets the security requirement for audit logging without sensitive data while avoiding the latency overhead of post-processing or a custom proxy, thus satisfying the operations team's performance concern.

Answer

Use Cloud Audit Logs to capture all API calls to Vertex AI, but do not log the actual prompts and responses

Answer

Log all prompts and responses to Cloud Storage and use a Cloud DLP job to scan and redact sensitive data periodically

Answer

Implement a custom proxy that logs all requests after stripping sensitive data, then forward to the model

Question 3

A large enterprise runs a production application that uses the Gemini API on Vertex AI for real-time content moderation. They are experiencing occasional 429 (Too Many Requests) errors during peak hours. Their current quota is 1000 requests per minute (RPM) and they are hitting around 950 RPM on average, with spikes up to 1050. They have already implemented exponential backoff and retry logic. They need to reduce the error rate without reducing the quality of moderation. Which additional measure should they take?

Accepted Answer

Implement a local caching layer for common moderation queries.. Option C is correct because implementing a local caching layer for common moderation queries reduces the number of identical requests sent to the Gemini API, directly lowering the effective RPM without compromising moderation quality. Since the enterprise is already using exponential backoff and retry logic, caching addresses the root cause of hitting quota limits by eliminating redundant API calls, which is a standard pattern for rate-limit mitigation in production AI workloads.

Answer

Deploy the model on a dedicated Vertex AI endpoint with autoscaling.

Answer

Switch to a lower-tier model like Gemini 1.0 Pro to reduce quota consumption.

Answer

Request a quota increase from Google Cloud support.

Question 4

A developer is using the Gemini API to build a chatbot. They want the model to always respond in a friendly, professional tone. Which prompt engineering technique should they use?

Accepted Answer

Set system instructions to 'You are a friendly and professional assistant.'. Option A is correct because setting system instructions is the most direct and reliable way to define the model's persona and behavioral constraints. In the Gemini API, system instructions act as a persistent, top-level directive that influences every response, ensuring the chatbot consistently adopts a friendly and professional tone without requiring repeated examples or parameter tuning.

Answer

Include a few-shot example in every user message.

Answer

Set the temperature to 0.2.

Answer

Set max output tokens to 100.

Question 5

Refer to the exhibit. What is the most likely cause of this error?

Accepted Answer

The user does not have the required IAM role. The error shown in the exhibit is an HTTP 403 Forbidden response, which indicates that the server understood the request but refuses to authorize it. In Google Cloud, this is most commonly caused by the user's identity lacking the necessary IAM role or permission to call the specific API or access the resource. Even if the project ID is correct and the network is functional, a missing IAM role (e.g., `aiplatform.user` or `roles/aiplatform.user`) will result in this exact error.

Answer

The model is too large

Answer

The network is down

Answer

The project ID is incorrect

Question 6

An MLOps engineer wants to implement continuous evaluation of a generative model in production. Which Vertex AI component should they use?

Accepted Answer

Vertex AI Model Monitoring. Vertex AI Model Monitoring is the correct component because it provides continuous evaluation of model performance in production, including detecting prediction drift, data drift, and feature attribution drift. For generative models, it can monitor output quality and safety metrics over time, alerting engineers to degradation or shifts in model behavior without requiring manual intervention.

Answer

Vertex AI Feature Store

Answer

Vertex AI Prediction

Answer

Vertex AI Pipelines

Question 7

Refer to the exhibit. A developer runs this command but forgets to specify the model name. What will happen?

Accepted Answer

The command will fail with an error. In the context of the `gcloud ai models upload` command (or similar model deployment commands in Vertex AI), the model name is a required positional argument. If omitted, the CLI will fail with an error because it cannot proceed without a unique identifier to register the model in the model registry. The command does not default to any name or prompt interactively; it strictly validates required parameters before execution.

Answer

The command will prompt for a name

Answer

The model will be uploaded with a default name

Answer

The command will succeed but the model will be unlisted

Question 8

A company is building a conversational AI using the Gemini API on Vertex AI. They want to reduce the chance of generating toxic content while still allowing creative and engaging responses for their gaming community. Which TWO safety settings should they adjust in the safety_settings parameter?

Accepted Answer

Enable the 'harm_category' filter for 'DANGEROUS_CONTENT' with threshold BLOCK_ONLY_HIGH.. Option C is correct because setting the 'DANGEROUS_CONTENT' category to BLOCK_ONLY_HIGH allows the model to generate creative and engaging responses for a gaming community while still blocking the most severe dangerous content. This balances safety with creative freedom, as the gaming context may involve simulated conflict or action that is not genuinely harmful.

Answer

Set the threshold for 'SEXUALLY_EXPLICIT' category to BLOCK_NONE.

Answer

Set the threshold for 'SEXUALLY_EXPLICIT' category to BLOCK_LOW_AND_ABOVE.

Answer

Set the threshold for 'HARASSMENT' category to BLOCK_LOW_AND_ABOVE.

Question 9

Which TWO strategies are effective for reducing latency in a generative AI chat application deployed on Vertex AI? (Select 2)

Accepted Answer

Use streaming responses. Option B is correct because streaming responses reduce perceived latency by sending tokens to the client as they are generated, rather than waiting for the full response. This leverages server-sent events (SSE) or chunked transfer encoding to deliver partial results immediately, improving user experience in chat applications.

Answer

Deploy on TPU instead of GPU

Answer

Increase the max output tokens

Answer

Use larger batch sizes

Question 10

A research lab is fine-tuning a large language model on a small dataset of medical records. They observe that the model overfits, memorizing specific patient details and producing outputs that violate privacy regulations. Which technique should they apply to improve generalization and reduce memorization?

Accepted Answer

Apply differential privacy (DP-SGD) during fine-tuning. Differential privacy (DP-SGD) is the correct technique because it directly addresses memorization of sensitive patient data by adding calibrated noise to the gradient updates during fine-tuning. This bounds the model's ability to encode any single individual's information, improving generalization and ensuring compliance with privacy regulations like HIPAA.

Answer

Increase the batch size to 64

Answer

Increase the number of training epochs

Answer

Use early stopping based on validation loss

Question 11

Which THREE factors should be considered when choosing between fine-tuning and prompt engineering for a generative AI task? (Choose three.)

Accepted Answer

Availability of labeled training data. Option A is correct because fine-tuning requires a labeled dataset specific to the target task to adjust model weights via supervised learning, whereas prompt engineering relies on the model's existing knowledge without additional training data. Without sufficient labeled data, prompt engineering is often the only viable approach, as fine-tuning would risk overfitting or poor generalization.

Answer

Cost of API calls per request

Answer

Size of the base model

Question 12

A developer uses Vertex AI to generate code but the output is not syntactically correct. Which parameter should be adjusted?

Accepted Answer

temperature. Temperature controls the randomness of token selection during generation. A high temperature increases the likelihood of less probable tokens, which can lead to syntactically incorrect code. Lowering temperature makes the model more deterministic and conservative, favoring higher-probability tokens that are more likely to form valid syntax.

Answer

candidate_count

Answer

max_output_tokens

Answer

top_k

Question 13

Which THREE are valid methods to reduce bias in generative AI outputs?

Accepted Answer

Using a more diverse training dataset. Option C is correct because training on a more diverse dataset reduces representational bias by exposing the model to a wider range of demographics, cultures, and perspectives. This directly mitigates the model's tendency to overrepresent majority groups or underrepresent minorities, which is a root cause of biased outputs in generative AI.

Answer

Using only English prompts

Answer

Increasing model size

Question 14

A team is fine-tuning a large language model for medical advice. Which TWO techniques are most effective for improving the safety and reliability of the model's outputs?

Accepted Answer

Constitutional AI. Constitutional AI (A) is correct because it embeds a set of ethical principles directly into the model's training process, allowing the model to self-critique and revise its outputs to avoid harmful or unsafe medical advice. This technique proactively enforces safety constraints without requiring extensive human labeling, making it highly effective for high-stakes domains like healthcare.

Answer

Lowering the temperature to 0.0

Answer

Increasing training data size

Answer

Increasing top_p to 1.0

Question 15

A team is fine-tuning a large language model on custom data using Vertex AI. They find that the training loss decreases but validation loss increases. What is the best course of action?

Accepted Answer

Reduce the model size or add dropout regularization.. The increasing validation loss while training loss decreases is a classic sign of overfitting, where the model memorizes the training data but fails to generalize. Reducing model size or adding dropout regularization directly combats overfitting by limiting the model's capacity or introducing noise during training, which forces the model to learn more robust features. This is the best course of action because it addresses the root cause without further exacerbating the problem.

Answer

Increase the number of training epochs.

Answer

Increase the learning rate.

Answer

Switch to a smaller batch size.

Question 16

A team is using a pre-trained language model to summarize legal documents. They find that summaries often miss key dates and parties involved. Which technique would most effectively improve factual accuracy?

Accepted Answer

Fine-tune the model on a dataset of legal summaries with annotated key entities.. Fine-tuning on a dataset of legal summaries with annotated key entities directly teaches the model to recognize and reproduce critical factual elements like dates and parties. This supervised learning approach adjusts the model's weights to prioritize entity extraction and accurate generation, which is the most effective method for improving factual accuracy in domain-specific tasks.

Answer

Use top-p sampling with a low p value.

Answer

Increase the temperature parameter.

Answer

Use chain-of-thought prompting.

Question 17

A company is adopting generative AI for customer support. Which TWO strategies should they implement to manage risks related to brand reputation?

Accepted Answer

Establish a human-in-the-loop escalation process for sensitive interactions.. Option A is correct because a human-in-the-loop escalation process ensures that sensitive or ambiguous customer interactions are reviewed by a human agent before an AI-generated response is sent. This directly mitigates brand reputation risk by preventing the AI from inadvertently making offensive, legally problematic, or factually incorrect statements that could go viral. The human reviewer acts as a safety net, catching edge cases that automated filters might miss, such as nuanced sarcasm or cultural insensitivity.

Answer

Publish a disclaimer that the AI may make mistakes.

Answer

Deploy the model without any content filters to maximize helpfulness.

Answer

Disable customer support AI entirely to avoid any risk.

Question 18

A company wants to use GenAI to automate customer support. They have a large knowledge base. Which approach maximizes ROI in the first 6 months?

Accepted Answer

Use a pre-built conversational AI platform with Retrieval-Augmented Generation (RAG). Option B maximizes ROI in the first 6 months because it leverages a pre-built conversational AI platform integrated with Retrieval-Augmented Generation (RAG). RAG allows the model to dynamically retrieve relevant information from the existing knowledge base at inference time, providing accurate, context-aware responses without the need for costly retraining or custom model development. This approach balances rapid deployment, low upfront investment, and high accuracy, making it the most cost-effective solution for automating customer support quickly.

Answer

Deploy a general-purpose chatbot without customization

Answer

Build a custom LLM from scratch using their data

Answer

Fine-tune a foundation model on historical support tickets

Question 19

A business leader is developing a gen AI strategy. Which three key components should be included in the strategy?

Accepted Answer

Plan for responsible AI. Option B is correct because responsible AI is a foundational component of any generative AI strategy, ensuring ethical use, bias mitigation, and compliance with emerging regulations. Without a plan for responsible AI, the organization risks reputational damage, legal liability, and deployment failures due to lack of trust. This goes beyond simple fairness checklists to include continuous monitoring of model outputs for toxicity, hallucination, and privacy violations.

Answer

Focus solely on technology

Answer

Involve stakeholders across departments

Question 20

A startup with limited budget wants to quickly test a generative AI use case for personalized email marketing. Which approach minimizes time-to-market and cost?

Accepted Answer

Use a managed API like the PaLM API with prompt engineering.. Option D is correct because using a managed API like the PaLM API with prompt engineering eliminates the need for infrastructure setup, model training, and data preparation. This approach leverages a pre-trained model via a simple REST API call, allowing the startup to iterate on prompts and achieve personalized email content in hours rather than weeks, minimizing both time-to-market and cost.

Answer

Hire a team of AI researchers to build a solution.

Answer

Develop a custom model from scratch.

Answer

Fine-tune a large open-source model on internal data.