Sample questions
Google Cloud Generative AI Leader practice questions
A data scientist is trying to get online predictions from a Vertex AI endpoint but receives the error shown. What is the most likely cause?
Exhibit
Refer to the exhibit.
```
ERROR: Prediction failed: model 'projects/my-project/locations/us-central1/models/123' is not deployed to endpoint 'projects/my-project/locations/us-central1/endpoints/456'. Deploy the model to the endpoint before sending prediction requests.
```
Trap 1: The region in the request does not match the endpoint region
The region matches; the error would be different if regions mismatched.
Trap 2: The endpoint ID is incorrect
The endpoint ID is correctly referenced; the error is about the model not being deployed.
Trap 3: The model ID is incorrect
The model ID is correctly referenced; the error is about deployment status.
- A
The region in the request does not match the endpoint region
Why wrong: The region matches; the error would be different if regions mismatched.
- B
The model has not been deployed to the specified endpoint
The error message directly states the model is not deployed to the endpoint.
- C
The endpoint ID is incorrect
Why wrong: The endpoint ID is correctly referenced; the error is about the model not being deployed.
- D
The model ID is incorrect
Why wrong: The model ID is correctly referenced; the error is about deployment status.
A data scientist notices that a text generation model deployed on Vertex AI returns repetitive outputs after a few turns in a chat application. What is the most likely cause and the best parameter adjustment?
Trap 1: The max_output_tokens is too low; increase it to allow more diverse…
Max tokens controls length, not repetition.
Trap 2: The model is overfitted; switch to a smaller model.
Overfitting is unlikely in pre-trained models; repetition is a decoding issue.
Trap 3: The temperature is too low; increase temperature to add randomness.
Low temperature makes output more deterministic, increasing repetition.
- A
The max_output_tokens is too low; increase it to allow more diverse output.
Why wrong: Max tokens controls length, not repetition.
- B
The top_p value is too high; reduce top_p to limit token sampling.
Reducing top_p narrows the token pool, reducing repetition.
- C
The model is overfitted; switch to a smaller model.
Why wrong: Overfitting is unlikely in pre-trained models; repetition is a decoding issue.
- D
The temperature is too low; increase temperature to add randomness.
Why wrong: Low temperature makes output more deterministic, increasing repetition.
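Study note: the decoding parameters named in the options above can be illustrated with a toy sampler. This is a minimal sketch in plain Python, not the Vertex AI API; the logits and the helper names `apply_temperature` and `top_p_filter` are made up for illustration. It shows that a lower temperature concentrates probability on the top token, and that a smaller top_p shrinks the pool of candidate tokens.

```python
import math

def apply_temperature(logits, temperature):
    """Softmax over logits after dividing each by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    return {i: probs[i] / mass for i in kept}

# Hypothetical logits for a four-token vocabulary.
logits = [2.0, 1.0, 0.5, 0.1]

low_t = apply_temperature(logits, 0.2)   # sharply peaked distribution
high_t = apply_temperature(logits, 1.5)  # flatter distribution

base = apply_temperature(logits, 1.0)
narrow = top_p_filter(base, 0.5)   # small candidate pool
wide = top_p_filter(base, 0.95)    # large candidate pool

print("top-token probability:", round(low_t[0], 3), "vs", round(high_t[0], 3))
print("candidate pool sizes:", len(narrow), "vs", len(wide))
```

Real serving stacks apply these filters per generated token; the sketch applies them once to a single distribution to keep the effect visible.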
A company is deploying a generative AI model for medical diagnosis support. Which THREE considerations are critical for responsible AI?
Trap 1: Maximize model throughput to handle high volumes.
Throughput is secondary to safety.
Trap 2: Use the cheapest model to reduce costs.
Cost should not compromise quality or safety.
- A
Ensure the training data is diverse and representative.
Diverse data reduces bias.
- B
Maximize model throughput to handle high volumes.
Why wrong: Throughput is secondary to safety.
- C
Implement human oversight for all diagnostic suggestions.
Human in the loop ensures safety.
- D
Provide clear disclaimers about the model's limitations.
Transparency is essential for responsible AI.
- E
Use the cheapest model to reduce costs.
Why wrong: Cost should not compromise quality or safety.
Which THREE considerations are critical when deploying a generative AI model using Vertex AI Endpoints for a latency-sensitive application? (Choose THREE.)
Trap 1: Number of model versions
Model versioning does not directly affect latency.
Trap 2: Number of model instances
While important, autoscaling handles instance count dynamically.
- A
Model size and architecture
Larger models introduce higher latency.
- B
Number of model versions
Why wrong: Model versioning does not directly affect latency.
- C
GPU type and number
GPU selection impacts inference speed.
- D
Autoscaling configuration
Proper autoscaling ensures low latency under varying load.
- E
Number of model instances
Why wrong: While important, autoscaling handles instance count dynamically.
A company is deploying a generative AI model for customer support. They want to reduce hallucinations while maintaining fluency. They have a large dataset of previous support conversations. Which strategy should they prioritize?
Trap 1: Increase the beam search width to 10.
Wider beam search improves fluency but not factual accuracy.
Trap 2: Fine-tune the model on the conversation dataset.
Fine-tuning may help but doesn't guarantee factual grounding for new queries.
Trap 3: Set the temperature to 0.1.
Low temperature reduces creativity but doesn't add factual grounding.
- A
Increase the beam search width to 10.
Why wrong: Wider beam search improves fluency but not factual accuracy.
- B
Implement retrieval-augmented generation (RAG) using the conversation dataset as a knowledge base.
RAG retrieves relevant facts from the dataset, reducing hallucinations.
- C
Fine-tune the model on the conversation dataset.
Why wrong: Fine-tuning may help but doesn't guarantee factual grounding for new queries.
- D
Set the temperature to 0.1.
Why wrong: Low temperature reduces creativity but doesn't add factual grounding.
Which TWO techniques are commonly used to control the style and tone of a generative model's output?
Trap 1: Adjusting the temperature
Temperature controls randomness, not style.
Trap 2: Modifying the top_k value
Top_k affects token selection diversity.
Trap 3: Changing the top_p value
Top_p controls nucleus sampling.
- A
Adjusting the temperature
Why wrong: Temperature controls randomness, not style.
- B
Modifying the top_k value
Why wrong: Top_k affects token selection diversity.
- C
Fine-tuning on a dataset with desired style
Fine-tuning adapts the model to a specific style.
- D
Prompt engineering with style instructions
Prompts can specify desired style.
- E
Changing the top_p value
Why wrong: Top_p controls nucleus sampling.
A startup is building a generative AI content creation tool. They want to minimize operational costs while maintaining low latency for end users. Which deployment strategy should they adopt?
Trap 1: Deploy the model on edge devices to reduce cloud dependency.
Edge devices may lack the compute power for large models and increase maintenance.
Trap 2: Build an on-premises infrastructure to avoid cloud egress fees.
On-premises requires significant capital expenditure and may not scale efficiently.
Trap 3: Provision dedicated GPU instances for consistent performance.
Dedicated GPUs are costly and may be underutilized.
- A
Deploy the model on edge devices to reduce cloud dependency.
Why wrong: Edge devices may lack the compute power for large models and increase maintenance.
- B
Build an on-premises infrastructure to avoid cloud egress fees.
Why wrong: On-premises requires significant capital expenditure and may not scale efficiently.
- C
Use a serverless inference endpoint that scales to zero when not in use.
Serverless aligns cost with usage and auto-scales to meet demand.
- D
Provision dedicated GPU instances for consistent performance.
Why wrong: Dedicated GPUs are costly and may be underutilized.
A manufacturing company wants to use generative AI to create maintenance manuals from sensor data. The manuals must be accurate and reflect the latest equipment configurations. Which approach best ensures data freshness and consistency?
Trap 1: Train the model in real-time as sensor data streams in.
Real-time training is computationally expensive and may cause instability.
Trap 2: Periodically retrain the model with the latest sensor data.
Retraining cycles may lead to outdated information between updates.
Trap 3: Have human technicians review and update the manuals manually.
Manual updates are slow and error-prone.
- A
Train the model in real-time as sensor data streams in.
Why wrong: Real-time training is computationally expensive and may cause instability.
- B
Periodically retrain the model with the latest sensor data.
Why wrong: Retraining cycles may lead to outdated information between updates.
- C
Have human technicians review and update the manuals manually.
Why wrong: Manual updates are slow and error-prone.
- D
Use a retrieval-augmented generation (RAG) system that queries a live database of sensor configurations.
RAG ensures responses are based on the most current data.
A marketing agency wants to generate images using Imagen on Vertex AI. They need to ensure the images are unique and avoid copyright issues. Which parameter adjustment is most relevant?
Trap 1: Increase training steps
Not applicable to inference-time generation.
Trap 2: Increase seed variability
Increases output diversity but doesn't avoid copyright.
Trap 3: Set safety threshold
Filters offensive content, not copyright.
- A
Increase training steps
Why wrong: Not applicable to inference-time generation.
- B
Increase seed variability
Why wrong: Increases output diversity but doesn't avoid copyright.
- C
Use negative prompts
Specifies elements to avoid, reducing copyright risk.
- D
Set safety threshold
Why wrong: Filters offensive content, not copyright.
A retail company with a large FAQ database wants to build a generative AI customer service chatbot that can answer questions accurately with up-to-date information. Which business strategy should they prioritize?
Trap 1: Train a new model from scratch using the FAQ data.
Training from scratch is expensive, time-consuming, and still requires retraining for updates.
Trap 2: Fine-tune a foundational model on the entire FAQ dataset.
Fine-tuning would require frequent retraining to keep up with updates and may still fail on out-of-distribution queries.
Trap 3: Use a general-purpose language model without any customization.
A general-purpose model may hallucinate or provide generic answers that are not aligned with the company's specific knowledge base.
- A
Use retrieval-augmented generation (RAG) with vector search on the FAQ database.
RAG retrieves current, relevant information from the database, providing accurate and fresh responses without model retraining.
- B
Train a new model from scratch using the FAQ data.
Why wrong: Training from scratch is expensive, time-consuming, and still requires retraining for updates.
- C
Fine-tune a foundational model on the entire FAQ dataset.
Why wrong: Fine-tuning would require frequent retraining to keep up with updates and may still fail on out-of-distribution queries.
- D
Use a general-purpose language model without any customization.
Why wrong: A general-purpose model may hallucinate or provide generic answers that are not aligned with the company's specific knowledge base.
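Study note: the retrieve-then-generate flow behind the RAG option can be sketched in a few lines. Everything here is an assumption for illustration: the FAQ entries are invented, and `embed` is a bag-of-words stand-in for a real embedding model and vector search service. The point is only that the answer is grounded in whatever the FAQ store holds at query time, so updating the store changes answers without retraining.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    cleaned = text.lower().replace("?", " ").replace(".", " ")
    return Counter(cleaned.split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, docs, k=1):
    """Return the k documents most similar to the question."""
    q = embed(question)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(question, docs):
    """Assemble a grounded prompt from the retrieved context."""
    context = "\n".join(retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Hypothetical FAQ entries; in practice these would live in a vector store.
FAQ = [
    "Returns are accepted within 30 days of purchase.",
    "Standard shipping takes 5 business days.",
    "Gift cards never expire.",
]

prompt = build_prompt("How long does shipping take?", FAQ)
print(prompt)
```

The assembled prompt would then be sent to a foundation model; that call is omitted because it depends on the provider.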
A company wants to measure the business impact of a GenAI content generation tool. Which metric is most appropriate?
Trap 1: Number of model parameters
Parameter count is irrelevant to business impact.
Trap 2: Model accuracy on a test set
Accuracy does not directly translate to business impact; it measures model performance, not efficiency or revenue.
Trap 3: Training loss
Training loss is used during model development, not for measuring business outcome.
- A
Reduction in content production time
This metric directly measures the business value of automation and efficiency.
- B
Number of model parameters
Why wrong: Parameter count is irrelevant to business impact.
- C
Model accuracy on a test set
Why wrong: Accuracy does not directly translate to business impact; it measures model performance, not efficiency or revenue.
- D
Training loss
Why wrong: Training loss is used during model development, not for measuring business outcome.
A retail company wants to integrate generative AI into its customer service chatbot to handle routine inquiries. They have a limited budget and want to launch quickly. Which strategy is most appropriate?
Trap 1: Partner with a generative AI vendor for a custom solution
Partnering often involves longer timelines and higher costs.
Trap 2: Fine-tune an open-source model on their customer service logs
Fine-tuning requires ML expertise and may not be quick to deploy.
Trap 3: Build a custom LLM from scratch using the company's own data
Building from scratch is prohibitively expensive and time-consuming for a small budget.
- A
Partner with a generative AI vendor for a custom solution
Why wrong: Partnering often involves longer timelines and higher costs.
- B
Use pre-trained models via Google Cloud's Generative AI Studio API
Using pre-trained models via API is cost-effective and fast to implement.
- C
Fine-tune an open-source model on their customer service logs
Why wrong: Fine-tuning requires ML expertise and may not be quick to deploy.
- D
Build a custom LLM from scratch using the company's own data
Why wrong: Building from scratch is prohibitively expensive and time-consuming for a small budget.
A team set a budget alert for their GenAI API usage at $10,000. They received the alert with current spend of $12,500. Which business action is most appropriate as a first step?
Exhibit
Refer to the exhibit.
```
{
  "budgetDisplayName": "genai-budget",
  "alertThresholdExceeded": 1.0,
  "costAmount": 12500,
  "budgetAmount": 10000,
  "alertName": "projects/123456789/budgets/12345"
}
```
Trap 1: Pause all non-critical use cases immediately
Pausing without analyzing which use cases are driving cost could halt valuable projects unnecessarily.
Trap 2: Switch to a cheaper model provider
Switching providers is a major undertaking and may not be necessary if usage can be optimized on the current platform.
Trap 3: Increase the budget by 50% to $15,000
Increasing the budget without analysis may lead to continued overspend and does not address the underlying usage growth.
- A
Pause all non-critical use cases immediately
Why wrong: Pausing without analyzing which use cases are driving cost could halt valuable projects unnecessarily.
- B
Switch to a cheaper model provider
Why wrong: Switching providers is a major undertaking and may not be necessary if usage can be optimized on the current platform.
- C
Review usage patterns and optimize prompt lengths and frequencies
Optimizing usage is the most cost-effective first step; it can reduce consumption without disrupting operations.
- D
Increase the budget by 50% to $15,000
Why wrong: Increasing the budget without analysis may lead to continued overspend and does not address the underlying usage growth.
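Study note: the arithmetic behind the exhibit, plus a first pass at "review usage patterns", can be sketched as below. The alert payload is copied from the exhibit; the per-use-case cost breakdown is hypothetical and only illustrates ranking cost drivers before deciding what to optimize.

```python
import json

# Payload copied from the exhibit.
alert = json.loads("""
{
  "budgetDisplayName": "genai-budget",
  "alertThresholdExceeded": 1.0,
  "costAmount": 12500,
  "budgetAmount": 10000,
  "alertName": "projects/123456789/budgets/12345"
}
""")

def overspend(alert):
    """Return (spend as a fraction of budget, amount over budget)."""
    ratio = alert["costAmount"] / alert["budgetAmount"]
    excess = alert["costAmount"] - alert["budgetAmount"]
    return ratio, excess

ratio, excess = overspend(alert)
print(f"Spend is {ratio:.0%} of budget, {excess} over")

# Hypothetical cost breakdown by use case (not part of the exhibit).
cost_by_use_case = {"chatbot": 7000, "summaries": 4000, "experiments": 1500}

# Rank use cases by cost so optimization starts with the largest driver.
ranked = sorted(cost_by_use_case.items(), key=lambda kv: kv[1], reverse=True)
top_driver = ranked[0][0]
print("Largest cost driver:", top_driver)
```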
A company is considering monetizing a generative AI-powered product. Which two business models are most common and viable?
Trap 1: Free with advertising.
Ad-supported models are rare for specialized AI tools and may not generate sufficient revenue.
Trap 2: One-time license fee for the model.
One-time fees are uncommon for cloud-hosted AI; ongoing costs for inference and maintenance require recurring revenue.
Trap 3: Selling user data collected from interactions.
Selling user data raises privacy concerns and is illegal in many jurisdictions.
- A
Free with advertising.
Why wrong: Ad-supported models are rare for specialized AI tools and may not generate sufficient revenue.
- B
One-time license fee for the model.
Why wrong: One-time fees are uncommon for cloud-hosted AI; ongoing costs for inference and maintenance require recurring revenue.
- C
Pay-per-use based on tokens consumed.
Pay-per-use matches costs to usage, common in cloud API services.
- D
Subscription tiered by usage.
Subscription provides predictable recurring revenue and aligns with user consumption.
- E
Selling user data collected from interactions.
Why wrong: Selling user data raises privacy concerns and is illegal in many jurisdictions.
A team built a GenAI chatbot that uses a vector database to retrieve context. Users report irrelevant responses. What is the most likely business strategy issue?
Trap 1: The model is too small to generate accurate responses
Model size affects generation quality, but irrelevant responses often stem from poor retrieval, not generation capability.
Trap 2: The chatbot is too verbose
Verbosity does not cause irrelevance; it is a style parameter.
Trap 3: The system is overfitting to the training data
Overfitting would cause the model to memorize training data, not retrieve irrelevant context.
- A
The model is too small to generate accurate responses
Why wrong: Model size affects generation quality, but irrelevant responses often stem from poor retrieval, not generation capability.
- B
The chatbot is too verbose
Why wrong: Verbosity does not cause irrelevance; it is a style parameter.
- C
The system is overfitting to the training data
Why wrong: Overfitting would cause the model to memorize training data, not retrieve irrelevant context.
- D
The embedding model is not aligned with the domain vocabulary
If the embeddings do not capture domain-specific meanings, retrieved context will be irrelevant, leading to poor answers.
An organization uses a fine-tuned model for medical diagnosis and must comply with HIPAA. Which measure is essential when deploying the model on Vertex AI?
Trap 1: Store all patient data in Cloud Storage with object versioning.
Data storage is separate from model deployment; versioning does not address access restrictions.
Trap 2: Enable encryption at rest for all resources.
Encryption at rest is necessary but not sufficient; network access control is also required.
Trap 3: Use a publicly accessible endpoint for faster response times.
Public endpoints violate HIPAA by exposing PHI to the internet.
- A
Store all patient data in Cloud Storage with object versioning.
Why wrong: Data storage is separate from model deployment; versioning does not address access restrictions.
- B
Enable encryption at rest for all resources.
Why wrong: Encryption at rest is necessary but not sufficient; network access control is also required.
- C
Use a publicly accessible endpoint for faster response times.
Why wrong: Public endpoints violate HIPAA by exposing PHI to the internet.
- D
Use Private Google Access and disable external internet access for the endpoint.
This ensures the endpoint is not publicly accessible, a key requirement for HIPAA.
A prompt engineer wants to improve the model's adherence to a specific output format (e.g., always start with a greeting). Which technique should they try first?
Trap 1: Use a lower temperature to make the output more deterministic.
Lower temperature reduces randomness but does not enforce a specific format.
Trap 2: Fine-tune the model on many examples of the desired format.
Fine-tuning is effective but more costly and should be considered after prompt engineering.
Trap 3: Modify the model's tokenizer to encode the format rules.
Tokenizer modification is not a standard or practical approach.
- A
Use a lower temperature to make the output more deterministic.
Why wrong: Lower temperature reduces randomness but does not enforce a specific format.
- B
Fine-tune the model on many examples of the desired format.
Why wrong: Fine-tuning is effective but more costly and should be considered after prompt engineering.
- C
Include a system instruction at the beginning of the prompt that specifies the desired format.
System instructions set global behavior and are the easiest first step.
- D
Modify the model's tokenizer to encode the format rules.
Why wrong: Tokenizer modification is not a standard or practical approach.
Refer to the exhibit. A user with this IAM role tries to deploy a model to a Vertex AI Endpoint but fails. What is the most likely reason?
Exhibit
```
{
  "bindings": [
    {
      "role": "roles/aiplatform.user",
      "members": [
        "user:user@example.com"
      ]
    }
  ]
}
```
Trap 1: The user is not authorized to use Vertex AI at all
The aiplatform.user role does authorize use of Vertex AI, just not deployment.
Trap 2: The model artifact is not in the same region as the endpoint
While region mismatch can cause issues, the most common IAM-related deployment failure is missing permissions.
Trap 3: The user needs the roles/aiplatform.admin role
Admin role is more permissive but not specifically required for deployment.
- A
The user is not authorized to use Vertex AI at all
Why wrong: The aiplatform.user role does authorize use of Vertex AI, just not deployment.
- B
The model artifact is not in the same region as the endpoint
Why wrong: While region mismatch can cause issues, the most common IAM-related deployment failure is missing permissions.
- C
The user needs the roles/aiplatform.deployer role
Deploying a model requires the aiplatform.deployer role or equivalent permissions.
- D
The user needs the roles/aiplatform.admin role
Why wrong: Admin role is more permissive but not specifically required for deployment.
What is the purpose of grounding in Vertex AI?
Trap 1: To improve training speed
Grounding is used during inference, not training.
Trap 2: To reduce model size for faster inference
Grounding does not affect model size; it augments the generation process.
Trap 3: To enable multi-modal inputs
Multi-modal inputs are handled by specific models, not grounding.
- A
To improve training speed
Why wrong: Grounding is used during inference, not training.
- B
To connect model outputs to verifiable sources
Grounding ensures the model's responses are based on authoritative information.
- C
To reduce model size for faster inference
Why wrong: Grounding does not affect model size; it augments the generation process.
- D
To enable multi-modal inputs
Why wrong: Multi-modal inputs are handled by specific models, not grounding.
A company's generative AI model is producing biased outputs. What is the most effective mitigation strategy?
Trap 1: Use a larger model with more parameters to improve overall accuracy
Larger models can still be biased; parameter count does not address bias directly.
Trap 2: Use prompt engineering to instruct the model to avoid biased…
Prompt engineering can reduce but not eliminate bias if the model's training data is skewed.
Trap 3: Increase the diversity of input samples by random sampling
Random sampling does not guarantee balanced representation of all groups.
- A
Use a larger model with more parameters to improve overall accuracy
Why wrong: Larger models can still be biased; parameter count does not address bias directly.
- B
Fine-tune the model using a balanced, representative dataset and implement output filtering
Balanced data reduces bias during training, and filters catch biased outputs in production.
- C
Use prompt engineering to instruct the model to avoid biased language
Why wrong: Prompt engineering can reduce but not eliminate bias if the model's training data is skewed.
- D
Increase the diversity of input samples by random sampling
Why wrong: Random sampling does not guarantee balanced representation of all groups.
A company wants to use Generative AI for customer support chatbots. They are concerned about cost and latency. Which deployment option best balances these concerns?
Trap 1: Deploy an open-source model on-premise to avoid cloud costs
On-premise deployment incurs significant infrastructure and maintenance costs, and latency may not be optimized without cloud TPUs.
Trap 2: Rely on a third-party chatbot API that abstracts the model
Third-party APIs can be convenient but often have per-query costs that scale linearly and may lack customization for specific business needs.
Trap 3: Use the largest available foundation model via API for highest…
Larger models cost more per token and have higher latency, which may not be necessary for simple chatbot tasks.
- A
Deploy an open-source model on-premise to avoid cloud costs
Why wrong: On-premise deployment incurs significant infrastructure and maintenance costs, and latency may not be optimized without cloud TPUs.
- B
Rely on a third-party chatbot API that abstracts the model
Why wrong: Third-party APIs can be convenient but often have per-query costs that scale linearly and may lack customization for specific business needs.
- C
Use the largest available foundation model via API for highest accuracy
Why wrong: Larger models cost more per token and have higher latency, which may not be necessary for simple chatbot tasks.
- D
Use a fine-tuned version of a smaller model on Vertex AI with response caching
A tuned smaller model reduces compute cost and caching minimizes repeated inference, lowering latency. Vertex AI provides scalable infrastructure.
Which TWO of the following are key differences between generative AI and discriminative AI? (Choose two.)
Trap 1: Generative models require less training data than discriminative…
Generative models often require more data to capture the full distribution.
Trap 2: Generative models cannot be used for supervised learning tasks like…
Generative models can be used for classification by computing P(Y|X) via Bayes rule.
Trap 3: Discriminative models always outperform generative models on tasks…
Performance varies; generative models can sometimes excel.
- A
Generative models can create new data samples, while discriminative models only assign labels to existing data.
Generation is a hallmark of generative AI.
- B
Generative models require less training data than discriminative models.
Why wrong: Generative models often require more data to capture the full distribution.
- C
Generative models cannot be used for supervised learning tasks like classification.
Why wrong: Generative models can be used for classification by computing P(Y|X) via Bayes rule.
- D
Generative models model the joint probability distribution of inputs and labels, whereas discriminative models model the conditional probability of labels given inputs.
This is a fundamental theoretical distinction.
- E
Discriminative models always outperform generative models on tasks like image classification.
Why wrong: Performance varies; generative models can sometimes excel.
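Study note: the claim that a generative model can classify via Bayes' rule can be checked on a tiny example. The joint probabilities below are invented for illustration. Given a joint table P(Y, X), the conditional P(Y | X = x) is the joint entry divided by the evidence P(X = x).

```python
# Hypothetical joint distribution P(label, word) over two labels and two words.
joint = {
    ("spam", "offer"): 0.30,
    ("spam", "hello"): 0.10,
    ("ham", "offer"): 0.05,
    ("ham", "hello"): 0.55,
}

def posterior(joint, x):
    """P(Y | X = x) obtained from the joint: P(y, x) / sum over y' of P(y', x)."""
    evidence = sum(p for (y, xi), p in joint.items() if xi == x)
    return {y: p / evidence for (y, xi), p in joint.items() if xi == x}

post = posterior(joint, "offer")
predicted = max(post, key=post.get)  # classify by the most probable label
print(post, predicted)
```

A discriminative model would learn P(Y | X) directly and never represent the joint table at all.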
A team uses Vertex AI to host a large language model. They want to reduce latency for real-time applications. What is the best strategy?
Trap 1: Increase number of replicas
More replicas improve throughput, not per-request latency.
Trap 2: Switch to a smaller model
A smaller model may sacrifice quality; quantization typically preserves quality better.
Trap 3: Use batch prediction instead of online
Batch prediction is meant for offline workloads and is not suitable for real-time applications.
- A
Increase number of replicas
Why wrong: More replicas improve throughput, not per-request latency.
- B
Switch to a smaller model
Why wrong: A smaller model may sacrifice quality; quantization typically preserves quality better.
- C
Use model quantization
Quantization reduces model size and speeds up inference.
- D
Use batch prediction instead of online
Why wrong: Batch prediction is meant for offline workloads and is not suitable for real-time applications.
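Study note: the idea behind quantization can be sketched on a handful of weights. This is a toy symmetric int8 scheme in plain Python with made-up weights, not what any particular serving stack does. It shows the trade: each weight is stored as a small integer plus one shared scale, at the cost of a bounded rounding error.

```python
def quantize_int8(weights):
    """Symmetric quantization: map floats to integers in [-127, 127] with one shared scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # fall back to 1.0 if all weights are zero
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the integers."""
    return [q * scale for q in quantized]

# Hypothetical weights for illustration.
weights = [0.42, -1.27, 0.05, 0.9]

quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Rounding error per weight is at most half of one quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, scale, max_error)
```

Smaller integer weights mean less memory traffic per token, which is where the inference speedup comes from.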
Which THREE of the following are common techniques to reduce harmful biases in generative AI models? (Choose three.)
Trap 1: Decrease the model's temperature parameter to make outputs more…
Temperature does not address bias; it affects randomness.
Trap 2: Conduct a legal review of all generated outputs before release.
This is a process after deployment, not a technique to train the model.
- A
Use reinforcement learning from human feedback (RLHF) with a reward model that penalizes biased or unfair outputs.
RLHF can shape model behavior to avoid biased generations.
- B
Curate diverse and balanced training datasets, oversampling underrepresented groups where needed.
Balanced data reduces model bias toward majority groups.
- C
Decrease the model's temperature parameter to make outputs more deterministic.
Why wrong: Temperature does not address bias; it affects randomness.
- D
Apply adversarial training to remove protected attribute information from hidden representations.
Adversarial debiasing forces the model to not encode sensitive attributes.
- E
Conduct a legal review of all generated outputs before release.
Why wrong: This is a process after deployment, not a technique to train the model.