Knowledge + Practice

CCNA Techniques to Improve Generative AI Model Output Questions

75 of 121 questions · Page 1/2 · Techniques to Improve Generative AI Model Output · Answers revealed

Practice these questions Domain overview All questions

1

MCQhard

A healthcare startup uses a generative model fine-tuned on general medical literature to provide preliminary diagnostic suggestions from patient text. The model frequently misses rare diseases and sometimes suggests common conditions that are unlikely given the symptoms. The startup has a curated dataset of rare disease case reports and wants to improve the model’s sensitivity to rare conditions without sacrificing overall accuracy. They cannot afford to retrain the entire model from scratch. The model is deployed on Vertex AI Prediction with low latency requirement. Which approach should they take?

A.Perform continued fine-tuning on the rare disease dataset using a low learning rate.

B.Add a system prompt instructing the model to consider rare diseases more carefully.

C.Reduce top-p sampling to focus on high-probability tokens, assuming rare diseases have lower probability.

D.Implement a human-in-the-loop system: for outputs with low confidence or suspected rare disease, route to a human expert.

AnswerD

Human-in-the-loop catches edge cases without retraining, preserving accuracy for common conditions.

Why this answer

Option D is correct because implementing a human-in-the-loop process for rare disease flags combines AI with expert review, catching misses while maintaining speed for common cases. Option A is wrong because prompt engineering alone may not teach the model about rare diseases. Option B is wrong because increasing top-p restricts vocabulary but doesn't inject knowledge.

Option C is wrong because fine-tuning again might cause catastrophic forgetting of common conditions.

Practice this question →

2

MCQeasy

A data scientist is using Vertex AI generative AI studio to create a chatbot. The chatbot gives inconsistent answers to similar questions. Which parameter should they adjust to make responses more consistent?

A.Decrease temperature to 0.2

B.Increase top-p to 0.9

C.Increase presence penalty to 0.5

D.Decrease frequency penalty to 0.0

AnswerA

Lower temperature makes the model more deterministic, leading to more consistent outputs.

Why this answer

Option C is correct because lowering temperature makes the model more deterministic, reducing randomness. Option A is wrong because top-p affects diversity but not as directly as temperature. Option B is wrong because presence penalty encourages new topics, increasing variability.

Option D is wrong because frequency penalty reduces repetition but doesn't enforce consistency.

Practice this question →

3

Multi-Selecthard

A team is fine-tuning a large language model for medical advice. Which TWO techniques are most effective for improving the safety and reliability of the model's outputs?

Select 2 answers

A.Constitutional AI

B.Lowering the temperature to 0.0

C.Increasing training data size

D.Increasing top_p to 1.0

E.Reinforcement learning from human feedback (RLHF)

AnswersA, E

Constitutional AI uses predefined rules to guide model behavior.

Why this answer

Constitutional AI (A) is correct because it embeds a set of ethical principles directly into the model's training process, allowing the model to self-critique and revise its outputs to avoid harmful or unsafe medical advice. This technique proactively enforces safety constraints without requiring extensive human labeling, making it highly effective for high-stakes domains like healthcare.

Exam trap

Google Cloud often tests the misconception that hyperparameter tuning (temperature, top_p) or data scaling alone can solve safety issues, when in fact alignment techniques like Constitutional AI and RLHF are specifically designed for that purpose.

Practice this question →

4

MCQmedium

Despite applying safety filters, a generative AI model still produces toxic outputs in some cases. Which additional technique should be applied?

A.Add more examples of toxic content to training

B.Increase the filter threshold

C.Use RLHF with human feedback to reduce toxicity

D.Decrease the model's temperature

AnswerC

Correct: RLHF explicitly trains the model away from toxic outputs.

Why this answer

RLHF using human feedback to penalize toxic responses directly reduces such outputs. Other options either weaken safety or do not target toxicity specifically.

Practice this question →

5

MCQhard

A research team is using a large language model to analyze medical research papers and generate summaries. They need to minimize hallucinations while retaining key details. They have access to a curated database of paper abstracts. Which approach is best?

A.Fine-tune the model on the entire database of papers.

B.Use chain-of-thought prompting to reason step-by-step.

C.Use few-shot prompting with examples of accurate summaries and set temperature=0.0.

D.Implement RAG to retrieve relevant abstracts and incorporate them into the prompt.

AnswerD

RAG provides direct factual context from the database.

Why this answer

Option B is correct because implementing RAG to retrieve relevant abstracts and incorporate them into the prompt directly grounds the output in the curated database, reducing hallucinations. Option A (few-shot with low temperature) does not prevent hallucination if the model lacks knowledge. Option C (fine-tuning on the entire database) is costly and may overfit.

Option D (chain-of-thought) improves reasoning but not factual grounding.

Practice this question →

6

MCQhard

A large e-commerce company deploys a generative AI chatbot on Vertex AI for customer service. The chatbot is powered by a fine-tuned model on the company's historical support tickets. Despite high accuracy on training topics, the chatbot frequently gives irrelevant or off-topic answers when customers ask about new products or promotions. The company maintains a comprehensive product catalog and a knowledge base of current promotions. The chatbot's prompts include a system instruction to 'Answer based on your knowledge' and no other retrieval mechanism. The response time requirement is under 3 seconds. Which course of action should the team take?

A.Implement a RAG pipeline that retrieves relevant product and promotion data from the knowledge base and injects it into the prompt.

B.Increase the temperature to encourage the model to generate more diverse answers.

C.Add additional safety filters to block irrelevant responses.

D.Fine-tune the model again on a larger dataset that includes recent support tickets.

AnswerA

RAG provides current, specific context to the model, directly improving relevance for new topics.

Why this answer

Option C is correct because implementing RAG with the product catalog allows real-time retrieval of current information, addressing the irrelevance for new products without needing retraining. Option A is wrong because fine-tuning again on outdated data won't help with new products. Option B is wrong because increasing temperature makes outputs more random and less focused.

Option D is wrong because adding more safety filters doesn't improve topical relevance.

Practice this question →

7

MCQeasy

A team uses a generative model to summarize lengthy legal documents. The summaries are accurate but often exceed the target length of 200 words, varying widely. Which simple adjustment should be applied to ensure consistent output length?

A.Fine-tune the model on summaries that are exactly 200 words.

B.Set the max output tokens parameter to 200.

C.Add a system prompt that says 'Summarize in exactly 200 words.'

D.Lower the temperature to reduce variability in word choices.

AnswerB

Max token limits directly truncate the output, enforcing the length constraint.

Why this answer

Option B is correct because setting a maximum token limit directly controls the output length. Option A is wrong because temperature affects creativity, not length. Option C is wrong because prompt engineering can request a specific length but the model may not strictly follow it; token limit is more reliable.

Option D is wrong because fine-tuning is heavy and may not be needed when a simple constraint works.

Practice this question →

8

Multi-Selecthard

Which THREE approaches are effective for reducing bias in generative model outputs? (Choose three.)

Select 3 answers

A.Set temperature to a very high value.

B.Use adversarial training.

C.Use a balanced training dataset.

D.Use prompt engineering to specify neutral tone.

E.Fine-tune on a debiased dataset.

AnswersC, D, E

Balanced data reduces representation bias.

Why this answer

Option C is correct because a balanced training dataset reduces the risk of the model learning spurious correlations or skewed distributions that lead to biased outputs. By ensuring that all demographic groups, topics, or perspectives are represented proportionally, the model's learned probability distribution is less likely to favor one group over another, directly mitigating representation bias at the data level.

Exam trap

The trap here is that candidates confuse randomness (high temperature) with fairness, or mistake adversarial training (a robustness technique) for a bias mitigation method, when in fact bias reduction requires data-level or fine-tuning interventions like balanced datasets, debiased fine-tuning, or prompt engineering.

Practice this question →

9

Multi-Selecthard

A team is fine-tuning a model for a legal document summarization task. They need to ensure high accuracy and avoid hallucinations. Which TWO approaches should they combine? (Choose two.)

Select 2 answers

A.Use Retrieval-Augmented Generation to retrieve relevant legal texts

B.Increase temperature to 1.5 during inference

C.Implement early stopping during fine-tuning

D.Incorporate a human-in-the-loop review process

E.Use character-level tokenization to improve spelling

AnswersA, D

RAG grounds the summary in actual documents, reducing hallucination.

Why this answer

Correct: A and D. A (RAG) provides source material to ground summaries. D (human-in-the-loop validation) catches errors before final output.

B (increase temperature) is counterproductive. C (early stopping) addresses overfitting but not factuality. E (character-level tokenization) is not relevant.

Practice this question →

10

MCQhard

An e-commerce company fine-tunes a model on customer reviews to generate product feedback summaries. They want to ensure the model does not reproduce toxic language from the training data. Besides filtering the training data, which additional technique is most effective at inference time?

A.Set temperature to 0.0 to reduce variance

B.Set top-k to 10 to limit token choices

C.Pass the model output through a toxicity detection model and conditionally regenerate or block

D.Use beam search with a high beam width

AnswerC

Inference-time filtering is a robust safety layer that catches toxic outputs without retraining.

Why this answer

Option A is correct because a toxicity classifier (e.g., Perspective API) applied to the output can block or flag toxic content. Option B is wrong because temperature reduction does not guarantee avoidance of toxic patterns. Option C is wrong because beam search may produce repetitive, not safer, outputs.

Option D is wrong because top-k sampling reduces randomness but does not filter toxicity.

Practice this question →

11

MCQeasy

A user provides a long document as context for a question-answering task, but the model outputs irrelevant answers. What is the most likely cause?

A.The document exceeds the model's context window, truncating important details.

B.Safety filters are blocking the relevant response.

C.The model's temperature is too low, making it deterministic.

D.The model is not generating any tokens.

AnswerA

Models have a maximum input length; exceeding it truncates the beginning or end.

Why this answer

The model's context window may be exceeded, causing loss of relevant information. Option B is wrong because temperature is unrelated to context length. Option C is wrong because safety filters would block content, not cause irrelevance.

Option D is wrong because tokens are always generated.

Practice this question →

12

MCQeasy

A social media company uses a generative AI model to moderate user posts. The model occasionally allows offensive content. Which safety technique should be implemented?

A.Use a different tokenizer to avoid offensive words.

B.Configure safety filters on the model endpoint in Vertex AI.

C.Add few-shot examples of safe posts in the prompt.

D.Reduce the temperature to 0.

AnswerB

Safety filters are designed to detect and block harmful content.

Why this answer

Safety filters explicitly block harmful content categories. Option A is wrong because lower temperature may produce bland but not necessarily safe outputs. Option B is wrong because few-shot examples may not cover all offensive patterns.

Option D is wrong because changing tokens does not inherently filter content.

Practice this question →

13

Multi-Selectmedium

A company is deploying a generative AI system that generates customer-facing emails. The system must ensure outputs are not toxic, biased, or harmful. Which TWO techniques are most effective for reducing toxicity in model outputs without significantly affecting performance?

Select 2 answers

A.Increase the maximum output token count to allow more context.

B.Set temperature to a very low value (e.g., 0.1).

C.Fine-tune the model on a dataset of safe emails using reinforcement learning from human feedback (RLHF).

D.Apply a toxicity detection and filtering layer using Vertex AI Safety Filters.

E.Provide 50 few-shot examples of safe emails in every prompt.

AnswersC, D

RLHF aligns model behavior to human preferences, reducing toxicity effectively.

Why this answer

Options A and D are correct. Fine-tuning with RLHF (or using a safety-tuned model) directly aligns the model to avoid toxic outputs. Output filtering (e.g., safety classifiers) provides a robust post-processing layer.

Option B (temperature) does not prevent toxicity, only randomness. Option C (few-shot) is insufficient for safety. Option E (increasing tokens) may increase risk.

Practice this question →

14

Multi-Selectmedium

A developer is tuning a text-generation model for creative writing. They want the outputs to be more diverse and less repetitive. Which THREE parameters/changes can help? (Choose three.)

Select 3 answers

A.Increase temperature to 0.9

B.Reduce top-k to 10

C.Increase presence penalty to 0.5

D.Increase top-p to 0.95

E.Reduce frequency penalty to 0.0

AnswersA, C, D

Higher temperature increases randomness and diversity.

Why this answer

Increasing temperature to 0.9 raises the randomness of the probability distribution over the vocabulary, making the model more likely to sample less probable tokens. This directly increases output diversity and reduces repetitiveness by flattening the softmax curve, which is a standard technique for creative generation.

Exam trap

Google Cloud often tests the misconception that reducing top-k or top-p increases diversity, when in fact narrowing the sampling pool (lower top-k or lower top-p) reduces diversity, and the correct approach is to increase these values or increase temperature/penalties.

Practice this question →

15

MCQhard

A data scientist fine-tunes a model on a small proprietary dataset. After fine-tuning, the model repeats training examples verbatim. What is the most effective mitigation?

A.Reduce the temperature during inference to 0.

B.Train for more epochs to improve generalization.

C.Use early stopping based on validation loss.

D.Add regularization like dropout and use a smaller learning rate.

AnswerD

Regularization techniques discourage memorization and encourage generalization.

Why this answer

Increasing the learning rate slightly or using dropout can reduce memorization. Option A is wrong because more epochs increase memorization. Option B is wrong because ground truth stopping doesn't prevent memorization.

Option D is wrong because temperature during inference doesn't fix overfitting.

Practice this question →

16

Multi-Selecteasy

Which TWO methods are most effective for improving factual accuracy in a language model's responses? (Choose two.)

Select 2 answers

A.Use prompt engineering to instruct the model to rely on provided facts.

B.Decrease the temperature to make responses more deterministic.

C.Increase top-k sampling to consider a wider range of tokens.

D.Replace the model with a smaller, more focused model.

E.Implement Retrieval-Augmented Generation (RAG) with a trusted knowledge base.

AnswersA, E

Prompt engineering can explicitly direct the model to verify claims or stick to given knowledge.

Why this answer

Options A and C are correct. A: prompt engineering with specific instructions can guide the model to be more careful. C: RAG retrieves verified information from external sources, reducing hallucination.

B is wrong because increasing top-k introduces randomness. D is wrong because decreasing temperature makes output more deterministic but not necessarily accurate. E is wrong because using a smaller model tends to reduce factual accuracy due to limited knowledge.

Practice this question →

17

MCQmedium

A company uses a generative AI model to generate product descriptions. They notice variations in style and length across products. How can they enforce consistent formatting?

A.Adjust top-k sampling to include more token candidates.

B.Set a system instruction specifying style and structure.

C.Randomly select few-shot examples from a pool of descriptions.

D.Use a high temperature and vary the prompt slightly.

AnswerB

System instructions guide the model's behavior across all responses.

Why this answer

System instructions set tone and format rules for the model. Option B is wrong because temperature range increases randomness. Option C is wrong because random example selection reduces consistency.

Option D is wrong because top-k sampling increases variability.

Practice this question →

18

MCQmedium

To improve factuality in generative AI, which is the best approach?

A.Set top_p to 0.1

B.Reduce output length

C.Grounded generation with citations

D.Increase model size

AnswerC

Anchors answers to external evidence.

Why this answer

Grounded generation with citations forces the model to base answers on retrieved evidence, directly improving factuality. Increasing model size may help but not as targeted, reducing length doesn't improve accuracy, and adjusting top_p is unrelated.

Practice this question →

19

MCQmedium

A data science team is fine-tuning a large language model using Vertex AI to generate marketing copy. They notice that the generated text is often repetitive and lacks creativity. Which technique should they apply to improve output diversity?

A.Increase the temperature parameter to 0.9.

B.Decrease the beam search width to 1.

C.Decrease the top-k sampling threshold.

D.Add more examples of repetitive text to the training dataset.

AnswerA

Higher temperature increases randomness and diversity in generated text.

Why this answer

Increasing the temperature parameter to 0.9 raises the randomness of the probability distribution over tokens, allowing less likely tokens to be selected. This directly counteracts repetitive output by encouraging the model to explore more diverse word choices, which is a standard technique for improving creativity in text generation.

Exam trap

Google Cloud often tests the misconception that decreasing sampling thresholds (like top-k or beam width) increases diversity, when in fact they reduce the candidate pool and make output more deterministic.

How to eliminate wrong answers

Option B is wrong because decreasing beam search width to 1 reduces the number of candidate sequences considered, which actually makes output more deterministic and less diverse, worsening repetitiveness. Option C is wrong because decreasing the top-k sampling threshold restricts the model to only the k most likely tokens, which reduces diversity and can increase repetition. Option D is wrong because adding more examples of repetitive text to the training dataset would reinforce the unwanted behavior, making the model more likely to generate repetitive output, not less.

Practice this question →

20

Multi-Selectmedium

A team notices the RAG pipeline sometimes retrieves irrelevant documents. Which THREE improvements should they consider? (Choose three.)

Select 3 answers

A.Add a reranking step

B.Use exact keyword matching instead of embedding similarity

C.Increase chunk size of documents

D.Reduce the number of retrieved documents

E.Use a higher quality embedding model

AnswersA, D, E

Reranks retrieved documents by relevance.

Why this answer

Using a higher quality embedding model improves semantic understanding, adding a reranking step refines results, and reducing the number of retrieved documents reduces noise. Increasing chunk size can dilute relevance, and using exact keyword matching loses semantic context.

Practice this question →

21

MCQeasy

To ensure that a generative AI model uses the most current information from the web for answering user queries, which Vertex AI feature should be enabled?

A.Grounding with Google Search

B.Safety filters

C.Context caching

D.Model tuning

AnswerA

Correct: This feature retrieves current web information to ground responses.

Why this answer

Grounding with Google Search is the correct feature because it enables the model to retrieve and reference real-time information from the web, ensuring responses are based on the most current data available. This is achieved by integrating Google Search results directly into the model's generation process, allowing it to cite live sources and reduce hallucinations from outdated training data.

Exam trap

Google Cloud often tests the distinction between features that improve output quality through external data retrieval (Grounding) versus those that modify the model's internal behavior (tuning, caching, filtering), leading candidates to confuse safety or optimization features with live data access.

How to eliminate wrong answers

Option B is wrong because safety filters are designed to block harmful or inappropriate content, not to fetch current web information. Option C is wrong because context caching stores frequently accessed context to reduce latency and cost, but it does not provide live web data. Option D is wrong because model tuning adjusts the model's parameters on a specific dataset to improve performance on a task, but it does not enable real-time web retrieval.

Practice this question →

22

MCQeasy

A developer is using Vertex AI Studio to test prompts for a text generation model. They want the model to follow a specific output format (JSON). Which prompt engineering approach is most effective?

A.Set stop sequences to '}'.

B.Include a few-shot example of the exact JSON format in the prompt.

C.Set the system instruction to 'Always output JSON.'

D.Set temperature to 0 to make output deterministic.

AnswerB

Providing an example gives the model a concrete template to follow.

Why this answer

Option B is correct because including a few-shot example of the exact JSON format in the prompt provides the model with a concrete pattern to follow, which is the most reliable method for enforcing structured output in generative models. Few-shot prompting leverages in-context learning, where the model uses the provided example to infer the desired schema and formatting rules, reducing ambiguity and improving adherence to the specified JSON structure.

Exam trap

Google Cloud often tests the misconception that system instructions or hyperparameter tuning alone can enforce output format, when in practice, few-shot examples are the most direct and reliable method for guiding model behavior in structured generation tasks.

How to eliminate wrong answers

Option A is wrong because setting stop sequences to '}' would prematurely terminate generation at the first closing brace, which may cut off nested JSON objects or arrays, and does not guarantee the model outputs valid JSON from the start. Option C is wrong because a system instruction like 'Always output JSON' is a high-level directive that models often fail to follow precisely without explicit formatting examples, as they may still produce markdown, extra text, or malformed JSON. Option D is wrong because setting temperature to 0 makes output deterministic but does not enforce a specific output format; the model could still generate non-JSON text or deviate from the required schema, as temperature controls randomness, not structure.

Practice this question →

23

MCQhard

A team is using Vertex AI Pipelines to deploy a generative AI model for real-time inference. The model sometimes generates harmful content. They want to implement a safety filter that checks the output before returning it to the user, but they need to minimize latency. Which approach best balances safety and performance?

A.Use a secondary lightweight classifier to filter outputs in real-time.

B.Retrain the model on every flagged harmful output.

C.Manually review all outputs before delivery.

D.Disable safety checks to improve latency.

AnswerA

A small classifier adds minimal latency while providing effective filtering.

Why this answer

Option A is correct because deploying a secondary lightweight classifier (e.g., a distilled BERT or a small logistic regression model) as a post-processing filter allows real-time inference with minimal latency overhead. This approach decouples safety from the primary generative model, enabling fast rejection of harmful outputs without retraining or blocking the main inference pipeline.

Exam trap

Google Cloud often tests the misconception that safety must be integrated into the generative model itself (e.g., via retraining or fine-tuning), when in practice a separate, lightweight post-processing filter is the standard for low-latency production systems.

How to eliminate wrong answers

Option B is wrong because retraining the model on every flagged harmful output is computationally expensive, introduces significant latency, and can lead to catastrophic forgetting or overfitting to specific examples, making it impractical for real-time inference. Option C is wrong because manual review of all outputs introduces unacceptable latency and does not scale, violating the requirement to minimize latency. Option D is wrong because disabling safety checks entirely eliminates the safety requirement, which is explicitly needed, and would expose users to harmful content, failing the core objective.

Practice this question →

24

MCQmedium

A developer uses the Gemini API to summarize long articles. The summaries often miss key points from the end of the article. Which technique specifically addresses this length-based loss of information?

A.Increase the max output tokens to 2048

B.Break the article into sections and ask the model to summarize each section, then combine

C.Truncate the article to the first 2000 tokens

D.Use a different model with a larger context window

AnswerB

This structured approach ensures each part is summarized, mitigating attention drop-off.

Why this answer

Option D is correct because instructing the model to 'summarize each section and then combine' (chain-of-thought style) helps it process the full document. Option A is wrong because increasing max tokens doesn't change how the model attends to the input. Option B is wrong because truncation worsens the problem.

Option C is wrong because a different model with a longer context window could help, but the question is about technique for the current model.

Practice this question →

25

MCQeasy

A company uses a generative AI model to answer customer queries. The model sometimes returns outdated information. Which technique should they apply to ensure responses rely on current data?

A.Fine-tune the model on historical data.

B.Extend the context window to include more tokens.

C.Increase the model's temperature to encourage novelty.

D.Use grounding with a refreshed knowledge base.

AnswerD

Grounding connects the model to a current, authoritative data source, ensuring recency.

Why this answer

Grounding the model with up-to-date source documents ensures responses are based on current information, reducing outdated outputs. Option A is wrong because prompt engineering alone cannot guarantee recency. Option B is wrong because fine-tuning with old data may perpetuate outdated patterns.

Option D is wrong because temperature adjustment does not affect factual recency.

Practice this question →

26

MCQeasy

A developer is using the Gemini API to generate creative product taglines. The taglines are often bland and uncreative. The developer wants more variety and novelty in the outputs. Which parameter adjustment would most effectively increase the diversity of the generated taglines?

A.Decrease top_p from 1.0 to 0.5.

B.Set frequency_penalty to 2.0.

C.Increase temperature from 0.2 to 0.9.

D.Decrease temperature from 0.7 to 0.2.

AnswerC

Higher temperature increases randomness, leading to more diverse and creative outputs.

Why this answer

Option A is correct because higher temperature increases the randomness and creativity of the output. Option B is wrong because lower temperature makes output more deterministic and less creative. Option C is wrong because lower top_p reduces diversity.

Option D is wrong because a high frequency penalty may discourage novel words, reducing creativity.

Practice this question →

27

Multi-Selecteasy

Which TWO are advantages of using Retrieval-Augmented Generation (RAG) over fine-tuning?

Select 2 answers

A.No need to retrain the base model

B.Requires less data preparation

C.Lower inference latency

D.More secure because model weights are not modified

E.Better suited for rapidly changing knowledge bases

AnswersA, E

Correct: RAG works with the pre-trained model and a retrieval system.

Why this answer

RAG doesn't require retraining and is easy to update with new information. Inference latency is typically higher for RAG due to retrieval, and data preparation is still needed for indexing.

Practice this question →

28

MCQeasy

A developer is using the Gemini API to generate creative marketing copy. They want the output to be more diverse and unexpected. Which parameter should they increase?

A.Temperature.

B.Presence penalty.

C.Top-p.

D.Frequency penalty.

AnswerA

Higher temperature increases randomness and diversity.

Why this answer

Option A is correct because increasing temperature increases randomness, leading to more diverse and unexpected outputs. Option B (top-p) also affects diversity but temperature has a more direct effect. Options C and D (frequency and presence penalties) discourage repetition and encourage novelty, but temperature is the primary parameter for randomness.

Practice this question →

29

MCQhard

Refer to the exhibit. A Vertex AI endpoint configured with the above deployment is returning HTTP 429 (Too Many Requests) errors during peak traffic. The current CPU utilization reaches 80% consistently. What should the team adjust to resolve this?

A.Increase maxReplicaCount to 10

B.Increase scaleTarget to 0.9

C.Change machineType to n1-highmem-2

D.Increase minReplicaCount to 2

AnswerA

Correct: Higher max allows more replicas to handle traffic spikes.

Why this answer

429 errors indicate insufficient capacity. Increasing maxReplicaCount allows more replicas to be added when load increases. Changing machine type or scale target would not directly address the capacity shortage.

Practice this question →

30

MCQmedium

A developer uses a code generation model to write Python functions. The output frequently contains syntax errors due to incorrect braces and indentation. Which technique should be used to produce syntactically valid code?

A.Increase the temperature to introduce more varied token choices.

B.Apply constrained decoding techniques that enforce a grammar for the target programming language.

C.Fine-tune the model on a large corpus of syntactically correct Python code.

D.Provide a few-shot example of correct Python function in the prompt.

AnswerB

Constrained decoding ensures the generated tokens follow legal syntax rules.

Why this answer

Option A is correct because constrained decoding (e.g., with guidance or grammar) forces the output to match a formal grammar, preventing syntax errors. Option B is wrong because few-shot helps with format but does not enforce grammar. Option C is wrong because temperature changes do not fix syntax.

Option D is wrong because fine-tuning is heavy; constrained decoding is a lighter real-time fix.

Practice this question →

31

MCQhard

A real-time customer support chatbot using Gemini is experiencing high latency. The team must maintain response quality while improving speed. Which technique should they implement?

A.Switch to a larger model

B.Increase the batch size

C.Use context caching for frequent queries

D.Decrease the temperature

AnswerC

Correct: Context caching speeds up responses for repeated patterns while retaining quality.

Why this answer

Context caching reuses context from common intents, reducing processing time without sacrificing quality. Temperature changes or larger models would not help latency or quality.

Practice this question →

32

MCQeasy

A company uses a text-to-image model to generate marketing visuals. The outputs often contain distorted human faces. Which technique is most likely to improve face generation?

A.Fine-tune the model on a curated dataset of human faces

B.Increase the output resolution

C.Increase the number of inference steps

D.Reduce the classifier-free guidance scale

AnswerA

Fine-tuning specializes the model for better face generation.

Why this answer

Fine-tuning the model on a high-quality dataset of human faces directly addresses the distortion issue. Option B is wrong because increasing inference steps may improve image quality but not specifically faces. Option C is wrong because reducing CFG scale reduces adherence to the prompt, not face quality.

Option D is wrong because increasing image size might not fix distortion.

Practice this question →

33

MCQhard

A generative AI model for code generation sometimes produces syntactically incorrect code. The team wants to reduce syntax errors without retraining the entire model. Which approach is most effective?

A.Implement constrained decoding with grammar rules

B.Run a syntax checker after generation and regenerate

C.Add a system prompt that instructs the model to produce valid code

D.Increase beam search width

AnswerA

Constrained decoding ensures output respects syntax rules.

Why this answer

Constrained decoding with grammar rules directly enforces the syntax of the target programming language during token generation, preventing the model from producing invalid constructs. This approach modifies the decoding process (e.g., using a context-free grammar or a formal syntax specification) to mask or forbid tokens that would lead to a syntax error, without altering the underlying model weights. It is the most effective method because it guarantees syntactically correct output at generation time, rather than relying on post-hoc fixes or probabilistic adjustments.

Exam trap

The trap here is that candidates often choose a post-hoc correction method (Option B) or a prompt-based approach (Option C) because they seem simpler, but they fail to recognize that only a decoding-time constraint can guarantee syntactic validity without retraining, which is the core requirement of the question.

How to eliminate wrong answers

Option B is wrong because running a syntax checker after generation and regenerating is inefficient and does not prevent errors; it relies on trial-and-error, which can be costly and may still produce invalid code if the model repeatedly generates similar errors. Option C is wrong because adding a system prompt is a soft instruction that the model may not reliably follow, especially for complex or edge-case syntax rules, and it does not enforce constraints at the token level. Option D is wrong because increasing beam search width improves the diversity and likelihood of finding high-probability sequences but does not incorporate any syntactic constraints; it may still produce syntactically incorrect code if the highest-scoring beams violate grammar rules.

Practice this question →

34

MCQeasy

A company is using a generative AI model to generate product descriptions. They notice the outputs often include factual inaccuracies about product specifications. Which technique would best address this issue without modifying the model's architecture?

A.Implement a Retrieval-Augmented Generation (RAG) pipeline that retrieves product specs from a database

B.Decrease the temperature parameter to 0.1

C.Increase the max output tokens to 1024

D.Use few-shot prompting with 5 examples of correct descriptions

AnswerA

RAG grounds generation in retrieved relevant documents, improving factual accuracy.

Why this answer

Retrieval-Augmented Generation (RAG) is the correct technique because it grounds the model's output in factual, up-to-date product specifications retrieved from an external database. This directly addresses factual inaccuracies without modifying the model's architecture, as the model generates text based on retrieved context rather than relying solely on its parametric knowledge.

Exam trap

Google Cloud often tests the misconception that adjusting generation parameters (like temperature or token limits) or providing examples can fix factual accuracy, when in fact only retrieval-augmented methods or fine-tuning on verified data can correct hallucinations without changing the model architecture.

How to eliminate wrong answers

Option B is wrong because decreasing the temperature parameter to 0.1 makes the model more deterministic and reduces randomness, but it does not provide any factual grounding; it can still hallucinate incorrect specifications. Option C is wrong because increasing max output tokens only allows longer generations and does not improve factual accuracy; it may even increase the chance of errors. Option D is wrong because few-shot prompting with examples can guide the style and format but cannot supply specific, dynamic product specs; the model may still invent details not present in the examples.

Practice this question →

35

MCQmedium

Refer to the exhibit. The endpoint is experiencing high latency during traffic spikes. The team wants to improve response time by reducing queueing. Which change to the configuration would be most effective?

A.Decrease minReplicaCount to 0

B.Change the model version to '2'

C.Decrease the target value in autoscaling metric to 50

D.Increase maxReplicaCount to 10

AnswerD

More replicas handle higher load.

Why this answer

Increasing maxReplicaCount to 10 allows the autoscaler to provision more replicas during traffic spikes, distributing the incoming requests across additional endpoints. This directly reduces queueing at each replica because the load is spread over more instances, lowering per-instance latency. The change targets the root cause—insufficient capacity to handle peak load—rather than adjusting thresholds or model versions.

Exam trap

Google Cloud often tests the misconception that lowering the autoscaling target metric (Option C) is the primary fix for high latency, when in fact the maxReplicaCount ceiling is the bottleneck that must be raised to allow sufficient capacity during spikes.

How to eliminate wrong answers

Option A is wrong because decreasing minReplicaCount to 0 would cause the endpoint to scale down to zero replicas during idle periods, leading to cold starts and increased latency when traffic spikes, which worsens queueing. Option B is wrong because changing the model version to '2' does not affect the number of replicas or queueing behavior; it only changes the model artifact, which may have different inference latency but does not address scaling capacity. Option C is wrong because decreasing the target value in the autoscaling metric (e.g., CPU utilization or requests per replica) would cause the autoscaler to add replicas sooner, but without increasing maxReplicaCount, the endpoint may still hit the upper limit and queue requests; the target value adjustment alone does not provide additional capacity during extreme spikes.

Practice this question →

36

MCQhard

A law firm uses a generative model to analyze contracts and extract key clauses. The model often outputs irrelevant clauses or misses important ones. They want to improve the relevance of the outputs without retraining the entire model. Which approach is best?

A.Increase the input token limit to provide the entire contract in the prompt.

B.Decrease the temperature to make outputs more deterministic.

C.Implement Retrieval-Augmented Generation (RAG) with a curated legal clause database and a reranker to select the most on-topic passages.

D.Fine-tune the base model on a labeled dataset of contract-clause pairs.

AnswerC

RAG supplies the model with relevant context, and reranking refines the selection, directly boosting relevance.

Why this answer

Option C is correct because RAG with a legal corpus retrieves clause-specific paragraphs, and reranking prioritizes relevant content, improving precision. Option A is wrong because temperature adjustment does not improve relevance. Option B is wrong because increasing context length may dilute focus.

Option D is wrong because fine-tuning requires significant data and resources.

Practice this question →

37

MCQhard

Refer to the exhibit. A team's IAM policy for Vertex AI includes the following binding. They can deploy models but cannot create tuning jobs. Which statement is true?

A.The developer needs the aiplatform.admin role

B.The aiplatform.user role overrides the modelUser role

C.The aiplatform.user role lacks permission to create tuning jobs

D.The policy is missing the aiplatform.specialist role

AnswerC

Missing aiplatform.tuningJobs.create permission.

Why this answer

The roles/aiplatform.user role does not include permission to create tuning jobs (aiplatform.tuningJobs.create). The modelUser role does not override the user role, admin role is not needed, and specialist role doesn't exist.

Practice this question →

38

MCQhard

You are a Generative AI architect at a large financial services firm. The firm has deployed a custom large language model (LLM) fine-tuned on proprietary financial reports to assist analysts in generating quarterly earnings summaries. The model is hosted on Vertex AI using a dedicated endpoint with autoscaling enabled. Recently, the model's output has exhibited two issues: (1) occasional factual inaccuracies about specific financial figures, and (2) a tendency to produce overly verbose and repetitive text in the summaries, sometimes exceeding the desired length of 200 words. The team has already tried adjusting the temperature parameter from 0.7 to 0.2 and increased the top-k sampling from 40 to 50, but the problems persist. The model's training data includes over 10,000 financial reports, and the fine-tuning process used low-rank adaptation (LoRA) with rank 16. The production environment uses a batch size of 1 for inference. You need to recommend a course of action that most directly addresses both the factual accuracy and verbosity issues without requiring a full retraining of the model. Which approach should you take?

A.Increase the LoRA rank to 32 and fine-tune the model for additional epochs on a curated subset of reports that focus on concise and accurate summaries.

B.Implement a retrieval-augmented generation (RAG) pipeline that queries a vector database of verified financial data, and apply constrained decoding with a maximum token limit and a repetition penalty.

C.Switch to a larger pre-trained model (e.g., PaLM 2 or GPT-4) and use the same fine-tuning data with higher rank LoRA to improve capability, then rely on the larger model's inherent accuracy.

D.Experiment with higher temperature (e.g., 0.9) and lower top-k (e.g., 20) to encourage more diverse and concise outputs, and add a post-processing step to truncate summaries to 200 words.

AnswerB

This directly improves factual accuracy by grounding outputs in retrieved evidence and reduces verbosity through decoding constraints, without retraining.

Why this answer

Option B is correct because it directly addresses both issues without retraining. A RAG pipeline grounds the model's outputs in verified financial data, eliminating factual inaccuracies. Constrained decoding with a maximum token limit and repetition penalty directly curbs verbosity and repetition, which temperature and top-k adjustments failed to fix.

Exam trap

Google Cloud often tests the misconception that adjusting hyperparameters like temperature or top-k can fix factual accuracy and verbosity, when in reality these issues stem from the model's lack of external knowledge and lack of output constraints, which require architectural changes like RAG and constrained decoding.

How to eliminate wrong answers

Option A is wrong because increasing LoRA rank and fine-tuning on a curated subset still relies on the model's parametric memory, which is prone to hallucination and does not guarantee factual accuracy; it also requires retraining, contradicting the 'no full retraining' constraint. Option C is wrong because switching to a larger model does not inherently solve factual inaccuracies (larger models can still hallucinate) and requires full retraining or significant adaptation, violating the constraint. Option D is wrong because higher temperature (0.9) increases randomness, likely worsening factual inaccuracies, and lower top-k (20) reduces diversity, which may not fix verbosity; post-processing truncation does not address the root cause of repetition or inaccuracy.

Practice this question →

39

Multi-Selectmedium

Which TWO techniques effectively reduce bias in generative model outputs? (Choose two.)

Select 2 answers

A.Apply adversarial debiasing during training or fine-tuning.

B.Increase the temperature parameter to introduce more variability.

C.Use a larger model with more parameters.

D.Fine-tune on a dataset with balanced representation across groups.

E.Reduce max output tokens to limit the model's expression.

AnswersA, D

Adversarial methods train the model to ignore protected attributes, reducing bias.

Why this answer

Options A and C are correct. A: fine-tuning on a balanced dataset reduces sampling bias in training data. C: adversarial debiasing actively adjusts the model to remove spurious correlations.

B is wrong because increasing temperature adds randomness without addressing bias. D is wrong because larger models may amplify biases. E is wrong because reducing max tokens does not affect bias.

Practice this question →

40

MCQmedium

A team is using a pre-trained language model to summarize legal documents. They find that summaries often miss key dates and parties involved. Which technique would most effectively improve factual accuracy?

A.Fine-tune the model on a dataset of legal summaries with annotated key entities.

B.Use top-p sampling with a low p value.

C.Increase the temperature parameter.

D.Use chain-of-thought prompting.

AnswerA

Fine-tuning adapts the model to domain-specific requirements, improving factual accuracy.

Why this answer

Fine-tuning on a dataset of legal summaries with annotated key entities directly teaches the model to recognize and reproduce critical factual elements like dates and parties. This supervised learning approach adjusts the model's weights to prioritize entity extraction and accurate generation, which is the most effective method for improving factual accuracy in domain-specific tasks.

Exam trap

Google Cloud often tests the misconception that inference-time parameters (temperature, top-p) or prompting strategies can substitute for targeted training, when in fact only fine-tuning with domain-specific annotated data reliably improves factual accuracy for structured entities.

How to eliminate wrong answers

Option B is wrong because top-p sampling with a low p value restricts the vocabulary to a small set of high-probability tokens, which can reduce creativity but does not address factual accuracy or entity recall—it may even omit rare but important entities. Option C is wrong because increasing the temperature parameter adds randomness to token selection, which typically reduces factual consistency and can lead to hallucinated or missing details. Option D is wrong because chain-of-thought prompting improves reasoning steps for multi-step tasks but does not inherently enforce factual accuracy for specific entities; it relies on the model's existing knowledge, which may still miss key dates and parties without targeted training.

Practice this question →

41

MCQmedium

A team is building a generative AI model for customer support. They notice the model often produces overly polite but unhelpful responses. Which technique would best improve response quality without sacrificing helpfulness?

A.Apply reinforcement learning from human feedback (RLHF)

B.Increase the amount of training data

C.Lower the top_k sampling value

D.Increase the temperature parameter

AnswerA

RLHF tunes the model to align with desired response characteristics.

Why this answer

RLHF directly addresses the misalignment between the model's training objective (e.g., predicting the next token) and the desired outcome (helpful, not just polite). By using human feedback to train a reward model, the system learns to optimize for response quality and helpfulness, reducing sycophantic or overly polite but uninformative outputs.

Exam trap

Google Cloud often tests the misconception that hyperparameter tuning (temperature, top_k) or more data alone can fix alignment issues, when in fact only RLHF directly optimizes for human-judged helpfulness and quality.

How to eliminate wrong answers

Option B is wrong because simply increasing training data does not correct the model's tendency toward polite but unhelpful responses; it may reinforce existing patterns without addressing alignment. Option C is wrong because lowering top_k sampling reduces diversity by restricting token choices to the top k most likely tokens, which can make responses even more generic and less helpful, not more substantive. Option D is wrong because increasing the temperature parameter increases randomness in token selection, which can lead to less coherent or more erratic responses, not more helpful ones.

Practice this question →

42

Multi-Selecteasy

Which TWO techniques are commonly used to control the style and tone of a generative model's output?

Select 2 answers

A.Adjusting the temperature

B.Modifying the top_k value

C.Fine-tuning on a dataset with desired style

D.Prompt engineering with style instructions

E.Changing the top_p value

AnswersC, D

Fine-tuning adapts the model to a specific style.

Why this answer

Option C is correct because fine-tuning on a dataset that embodies the desired style directly adjusts the model's weights, making it consistently produce outputs with that specific tone and style. This is a fundamental technique for customizing generative models, as it teaches the model the exact patterns, vocabulary, and stylistic nuances present in the training data.

Exam trap

Google Cloud often tests the distinction between sampling parameters (temperature, top_k, top_p) that control output randomness and diversity versus training or conditioning techniques (fine-tuning, prompt engineering) that directly influence style and tone, leading candidates to incorrectly select sampling parameters as style-control methods.

Practice this question →

43

Multi-Selectmedium

Which TWO techniques are most effective for improving the quality of a generative AI model's output when summarizing complex documents?

Select 2 answers

A.Providing few-shot examples of ideal summaries

B.Using a larger, more capable model (e.g., PaLM 2 instead of PaLM)

C.Increasing max output length significantly

D.Setting top_p to 0.1

E.Adjusting temperature to 0.8

AnswersA, B

Few-shot examples teach the model the desired output structure and detail.

Why this answer

Options B and D are correct. Few-shot examples guide the model to desired output format and consistency. Using a larger, more capable model often yields better summaries due to deeper language understanding.

Option A (temperature adjustment) is less critical for summaries. Option C (max output length) affects length, not quality. Option E (low top_p) may restrict output too much.

Practice this question →

44

MCQhard

For a document summarization task, a team wants to produce concise summaries without losing key information. Which combination of techniques is most effective?

A.Use few-shot examples and reduce max output tokens

B.Set top_p to 0.5 and increase repetition penalty

C.Use prompt caching and increase batch size

D.Increase temperature and use a larger model

AnswerA

Correct: Few-shot demonstrates concise style; max tokens enforces length limit.

Why this answer

Using few-shot examples teaches the desired format, and reducing max tokens enforces conciseness. Other options either encourage verbosity or are off-target.

Practice this question →

45

MCQeasy

Refer to the exhibit. A user wants formal translations from a generative AI model, but the model outputs informal style inconsistently. Which prompt engineering technique would best ensure consistent formal translations?

A.Use context caching

B.Provide a few-shot example with formal and informal pairs

C.Use a longer system prompt with detailed rules

D.Set top_k to 1

AnswerB

Correct: Few-shot examples directly show the expected output format.

Why this answer

Providing a few-shot example that explicitly demonstrates the desired formal translation guides the model to follow that pattern. System instructions can help but are less direct.

Practice this question →

46

MCQmedium

A company uses a text-to-image model to generate marketing visuals. The results often misinterpret the prompt, e.g., 'a red car' generates a blue car. Which technique should they try first to align the output with the prompt?

A.Use a negative prompt to exclude blue

B.Refine the prompt with more adjectives and context, e.g., 'bright red sports car'

C.Upscale the image resolution to 1024x1024

D.Increase the guidance scale to 20

AnswerB

Clearer, more descriptive prompts help the model understand the desired output.

Why this answer

Option B is correct because refining the prompt with more adjectives and context directly addresses the root cause of misalignment: insufficient specificity in the text description. Text-to-image models rely on the semantic richness of the prompt to guide the latent diffusion process; adding 'bright red sports car' provides stronger conditioning signals that steer the model's cross-attention layers toward the intended color and object attributes. This is the most efficient first step before adjusting hyperparameters like guidance scale.

Exam trap

The trap here is that candidates often jump to hyperparameter tuning (guidance scale) or post-processing (upscaling) as a first fix, when the most fundamental and cost-effective step is to improve the input prompt's specificity, which directly controls the conditioning signal in the diffusion process.

How to eliminate wrong answers

Option A is wrong because using a negative prompt to exclude 'blue' is a reactive band-aid that does not fix the core issue of the model failing to associate 'red' with the car; it also risks suppressing other unintended features and can degrade image quality by over-constraining the latent space. Option C is wrong because upscaling resolution to 1024x1024 only increases pixel density and does not alter the semantic alignment between the prompt and the generated image; the model's misinterpretation of 'red' would persist at any resolution. Option D is wrong because increasing the guidance scale to 20 excessively amplifies the prompt's influence, often leading to image saturation, artifacts, and mode collapse, while still not correcting the fundamental misassociation of the color attribute.

Practice this question →

47

MCQmedium

A media company is using Vertex AI's Imagen model to generate images for marketing campaigns. They have a set of prompts that describe desired scenes, but the generated images often contain artifacts such as distorted faces or unnatural lighting. The team has tried varying the prompt wording but the issues persist. They are using the default parameters (no modifications). They have a budget for additional compute resources and want to improve image quality without switching to a more expensive model. The team has access to a small set of high-quality images in the same style as their target outputs. What should the team do?

A.Increase the guidance scale parameter to make the model follow prompts more closely.

B.Use a more detailed prompt style with negative prompts to avoid artifacts.

C.Fine-tune the Imagen model on the small set of high-quality images to improve output quality.

D.Increase the number of images generated per prompt and manually select the best ones.

AnswerC

Fine-tuning adapts the model to produce images with fewer artifacts and desired style.

Why this answer

Option B is correct because fine-tuning Imagen on the small set of high-quality images can improve the model's ability to generate images with fewer artifacts and better style consistency. Option A is wrong because increasing the number of images generated does not improve quality per image. Option C is wrong because adjusting guidance scale without fine-tuning may not address the specific artifacts.

Option D is wrong because using a different prompt style may help but the team already tried varying wording; the issue is likely the model's lack of familiarity with the desired quality.

Practice this question →

48

MCQhard

An enterprise uses a fine-tuned PaLM 2 model for code generation. They want to ensure the generated code passes security audits. Which combination of techniques would be most effective?

A.Integrate a static analysis tool in the pipeline and add a safety filter to reject code containing dangerous functions.

B.Use a few-shot prompt with examples of secure code and set temperature to 1.0.

C.Fine-tune the model on a dataset of insecure code and use top-p=0.9.

D.Increase the model's context window and use a system instruction to 'be secure'.

AnswerA

Static analysis and safety filters directly block insecure code patterns.

Why this answer

Option D is correct because integrating a static analysis tool and a safety filter provides automated, real-time security checks. Option A (few-shot with high temperature) increases risk due to randomness. Option B (fine-tuning on insecure code) is counterproductive.

Option C (increasing context window) does not directly address security.

Practice this question →

49

Multi-Selecteasy

A team is using a language model for customer feedback analysis. They want to improve the accuracy of sentiment extraction. Which TWO techniques should they apply? (Choose two.)

Select 2 answers

A.Increase the temperature to 0.8 to allow more creative interpretations.

B.Provide few-shot examples of correctly labeled sentiment in the prompt.

C.Use the model's built-in sentiment analysis API instead of prompting.

D.Add a system instruction that asks the model to strictly follow JSON output format.

E.Fine-tune the model on a labeled dataset of customer feedback.

AnswersB, E

Few-shot examples guide the model's output format and accuracy.

Why this answer

Options A and C are correct. Few-shot prompting provides examples of correct sentiment labeling, and fine-tuning on a labeled dataset adapts the model to the domain. Option B (increasing temperature) adds randomness and reduces accuracy.

Option D (using a different API) is not a technique for improving the current model. Option E (JSON formatting) helps structure output but does not improve sentiment accuracy.

Practice this question →

50

Multi-Selectmedium

Which TWO techniques are most effective for improving factual accuracy in a generative AI model's responses? (Choose two.)

Select 2 answers

A.Retrieval-Augmented Generation (RAG) with curated datasets.

B.Increasing the model's temperature to 1.5.

C.Grounding with a trusted knowledge base.

D.Using longer system prompts with multiple instructions.

E.Fine-tuning on a large corpus of general text.

AnswersA, C

RAG retrieves relevant, up-to-date documents to inform responses.

Why this answer

Grounding and RAG both provide external authoritative sources to enhance factual accuracy. Fine-tuning on general data doesn't guarantee accuracy, and increasing temperature hurts accuracy. Prompt engineering is helpful but not as robust as retrieval-based methods.

Practice this question →

51

MCQhard

A healthcare startup has fine-tuned a Vertex AI PaLM 2 model on a dataset of medical records to generate patient summaries. The model produces fluent text but occasionally fabricates diagnoses not present in the input. The team has already tried increasing the training data size by 20% and adjusting the temperature from 0.7 to 0.2, but hallucinations persist. The summaries must be factually accurate for regulatory compliance. What should the team do next?

A.Increase the maximum output tokens to allow the model to generate more detailed summaries.

B.Implement a RAG pipeline using Vertex AI Search to retrieve relevant medical documents before generation.

C.Add more few-shot examples to the prompt for each generation.

D.Switch the base model to Gemini 1.5 Pro without additional changes.

AnswerB

RAG provides grounded, up-to-date context, reducing hallucinations significantly.

Why this answer

Option B is correct because augmenting the model with a retrieval-augmented generation (RAG) pipeline grounded in a trusted medical knowledge base directly addresses hallucination by forcing the model to reference verified sources. Option A is wrong because changing the base model does not solve the fundamental lack of grounding. Option C is wrong because few-shot examples improve output format but not factual accuracy.

Option D is wrong because increasing the context window does not prevent fabrication; it may even introduce more irrelevant information.

Practice this question →

52

MCQeasy

A team notices their text generation model repeats phrases excessively. Which technique would most directly reduce repetition?

A.Use beam search with a beam width of 5

B.Apply a repetition penalty of 1.2

C.Increase top_k to 100

D.Lower temperature to 0.5

AnswerB

Repetition penalty directly discourages repeated tokens.

Why this answer

Using a repetition penalty during decoding discourages the model from repeating tokens. Option A is wrong because it increases randomness, which might reduce repetition but could also reduce coherence. Option B is wrong because beam search can increase repetition.

Option D is wrong because temperature reduction makes output more deterministic, potentially increasing repetition.

Practice this question →

53

MCQhard

A model generates biased output. Which technique is least effective?

A.Use adversarial debiasing

B.Apply safety filters

C.Set frequency penalty to 1.0

D.Fine-tune on diverse data

AnswerC

Affects repetition, not fairness or bias.

Why this answer

Setting frequency penalty to 1.0 reduces token repetition but does not address bias. Fine-tuning on diverse data, adversarial debiasing, and safety filters all directly tackle bias.

Practice this question →

54

Multi-Selecthard

Which THREE techniques are commonly used to improve the overall quality and coherence of generative model outputs? (Choose three.)

Select 3 answers

A.Using self-consistency or iterative refinement to choose the best output.

B.In-context learning (few-shot prompting) with relevant examples.

C.Applying output safety filters to remove inappropriate content.

D.Prompt chaining to decompose complex tasks into simpler sub-tasks.

E.Random sampling to increase output diversity.

AnswersA, B, D

Iterative methods improve reliability and coherence by selecting the most consistent response.

Why this answer

Options A, B, and E are correct. A: few-shot prompting provides examples that improve output structure. B: prompt chaining breaks complex tasks into steps, enhancing coherence.

E: iterative refinement (e.g., self-consistency) improves quality by generating multiple outputs and selecting the best. C is wrong because random sampling degrades quality. D is wrong because safety filters only block harmful content, not improve quality.

Practice this question →

55

MCQeasy

A data scientist is using the Gemini API to generate product descriptions for an e-commerce site. The descriptions are often too verbose and include speculative claims that are not in the product specifications. The scientist wants to reduce hallucinations and control the length of the output without retraining the model. What should they do?

A.Increase the max output token count to 2048 and decrease temperature to 0.1.

B.Refine the prompt to be concise and include instructions to stick to facts and limit output to 50 words.

C.Add three few-shot examples of short, factual descriptions.

D.Set temperature to 0.0 and top_k to 1.

AnswerB

Clear constraints in the prompt directly control length and hallucination.

Why this answer

Option A is correct because verbose prompts often lead to verbose output; simplifying the prompt reduces length. Adding explicit constraints like 'only use provided facts' reduces hallucination. Option B is wrong because increasing token count would make descriptions longer, not shorter.

Option C is wrong because lower temperature reduces randomness but doesn't prevent speculation. Option D is wrong because few-shot examples may help but do not directly enforce length or factuality.

Practice this question →

56

MCQmedium

A retail company is deploying a generative AI chatbot on Vertex AI to provide product recommendations. The chatbot uses a base foundation model with no fine-tuning. Users report that the chatbot sometimes gives offensive or insensitive responses. The team must quickly implement safety controls without modifying the model. They also want to reduce irrelevant off-topic answers. Which combination of techniques should they apply?

A.Fine-tune the model on a curated dataset of safe retail conversations.

B.Set temperature to 0.0 and top_p to 0.1.

C.Enable Vertex AI Safety Filters and craft system instructions defining appropriate behavior.

D.Provide 50 few-shot examples of safe interactions.

AnswerC

Safety filters block harmful output and system instructions guide the model's tone and relevance.

Why this answer

Option D is correct because safety filters (e.g., Vertex AI Safety Settings) block harmful content, and prompt engineering with system instructions keeps the model on topic and respectful. Option A is wrong because temperature adjustment alone does not prevent toxicity. Option B is wrong because few-shot examples may not cover all safety scenarios.

Option C is wrong because fine-tuning is not allowed per the constraint (no model modification).

Practice this question →

57

MCQmedium

Refer to the exhibit. A team attempted to start a model tuning job but received the error 'Quota limit exceeded for tuning jobs in region us-central1'. What is the most appropriate action?

A.Request a quota increase for tuning jobs in us-central1

B.Change the region to us-west1 and retry

C.Reduce the size of the training data

D.Use a different base model

AnswerA

Correct: Quota issues are resolved by requesting a higher limit.

Why this answer

The error indicates that the quota for tuning jobs in us-central1 has been reached. Requesting a quota increase directly addresses the issue. Using a different region or model might work but is not the first recommended step.

Practice this question →

58

MCQhard

A company is using a fine-tuned LLM for generating financial reports. They need to ensure that the output complies with regulatory standards and does not include speculative content. Which combination of techniques should they implement?

A.Increase the model's safety settings to maximum, use a low top-p value, and limit output tokens.

B.Fine-tune the model on historical compliant reports, use RAG with a regulatory database, and implement a human-in-the-loop review.

C.Use a larger model with more parameters and rely on its inherent knowledge.

D.Use a system instruction to adhere to regulations, set temperature to 0.0, and apply a keyword filter.

AnswerB

Combines domain adaptation, real-time grounding, and human oversight.

Why this answer

Option A is correct because fine-tuning on historical compliant reports, using RAG with a regulatory database, and implementing a human-in-the-loop review provides multiple layers of compliance. Option B (system instruction with low temperature and keyword filter) is insufficient for complex regulations. Option C (safety settings and low top-p) may block non-speculative content and doesn't ensure compliance.

Option D (larger model) does not guarantee compliance.

Practice this question →

59

MCQhard

The exhibit shows the deployment configuration for a conversational AI model used in a finance application. Users report that responses are creative but often contain factually incorrect financial advice. Which parameter change would most improve factual accuracy?

A.Add grounding sources, such as "EnterpriseSearch" or "Web"

B.Lower temperature to 0.1

C.Increase topP to 1.0

D.Increase maxOutputTokens to 1024

AnswerA

Grounding forces the model to base responses on real data, directly improving factual accuracy.

Why this answer

Option B is correct because grounding sources (e.g., Google Search or a knowledge base) inject real-world facts into responses, reducing hallucination. Option A (lower temperature) would reduce creativity but not directly fix factual inaccuracies. Option C (increased max tokens) addresses length, not facts.

Option D (top_p increase) would make output even more variable.

Practice this question →

60

MCQmedium

After deploying a text-to-image model, the output images often contain distorted objects. The team suspects the prompt is too complex. Which prompt engineering technique should they try first?

A.Increase the guidance scale.

B.Add more descriptive adjectives.

C.Use a negative prompt to exclude distortions.

D.Break the prompt into simpler, separate steps.

AnswerD

Simpler prompts reduce the risk of distortion.

Why this answer

Option A is correct because breaking the prompt into simpler, separate steps reduces complexity and helps the model generate each part correctly. Option B (adding more adjectives) increases complexity. Option C (increasing guidance scale) can amplify distortions.

Option D (using a negative prompt) helps but is not the first step for simplifying the prompt.

Practice this question →

61

MCQhard

A team deployed a fine-tuned model for code generation. After training, the model produces syntactically correct but functionally wrong code. What is the most likely cause?

A.Incorrect prompt format

B.Low temperature setting

C.Insufficient training epochs

D.Overfitting to training data

AnswerD

Model memorizes training examples, losing generalization.

Why this answer

Option D is correct because overfitting to training data causes the model to memorize specific code patterns and syntax from the training set without learning the underlying logic or functional requirements. This results in syntactically correct outputs that fail to generalize to new, unseen coding tasks, producing functionally wrong code despite proper syntax.

Exam trap

Google Cloud often tests the misconception that syntactically correct but functionally wrong code is caused by prompt or temperature issues, when in fact it is a classic sign of overfitting where the model memorizes syntax without understanding logic.

How to eliminate wrong answers

Option A is wrong because incorrect prompt format typically leads to malformed or irrelevant outputs, not syntactically correct but functionally wrong code; the model would likely produce gibberish or off-topic responses. Option B is wrong because low temperature setting reduces randomness and makes outputs more deterministic, which would actually improve syntactic correctness and consistency, not cause functional errors. Option C is wrong because insufficient training epochs would result in underfitting, where the model fails to learn even basic syntax and produces incomplete or incoherent code, not syntactically correct but functionally wrong outputs.

Practice this question →

62

Multi-Selectmedium

A team wants to reduce hallucinations in a question-answering model. Which THREE techniques should they consider?

Select 3 answers

A.Fine-tune the model on a curated factual dataset

B.Use retrieval-augmented generation (RAG)

C.Apply prompt engineering with specific instructions to cite sources

D.Reduce the number of tokens in output

E.Increase the temperature parameter

AnswersA, B, C

Fine-tuning on factual data improves accuracy.

Why this answer

Fine-tuning on a curated factual dataset directly adjusts the model's weights to prioritize accurate, domain-specific knowledge, reducing the likelihood of generating unsupported or hallucinated content. This technique anchors the model's output in verified data, making it more reliable for question-answering tasks.

Exam trap

Google Cloud often tests the misconception that reducing output length or increasing randomness (temperature) can improve factual accuracy, when in reality these parameters control style and creativity, not truthfulness.

Practice this question →

63

MCQmedium

A financial technology company has deployed a custom-tuned PaLM 2 model on Vertex AI to generate personalized investment recommendations for retail clients. The model was fine-tuned on a corpus of historical market data and advisory transcripts. Recently, the compliance team flagged that several recommendations contradicted SEC guidelines, and the model sometimes repeated prohibited statements from outdated training materials. The team has already implemented safety filters (e.g., blocking toxic content) and adjusted the model's system instructions to be more conservative. However, the issues persist. The model's deployment parameters are: temperature=0.4, top_p=0.9, max_output_tokens=500, and no grounding. The company must maintain compliance without significantly increasing latency. What should they do next?

A.Increase temperature to 0.7 to allow more diverse responses, and add a second model to verify outputs

B.Perform an additional fine-tuning round exclusively on the most recent SEC regulatory filings and compliance-approved content

C.Implement a chain-of-thought prompting technique that requires the model to explain its reasoning step by step

D.Configure Vertex AI grounding using a curated data store of real-time SEC regulations and market data

AnswerD

Grounding with an authoritative, live data source directly ensures outputs comply with current regulations and eliminates reliance on outdated training data.

Why this answer

Option D is correct because grounding with real-time authoritative sources (e.g., up-to-date SEC regulations and market data) ensures outputs are based on current, compliant information, directly addressing the root cause of outdated or prohibited content. Option A (fine-tuning) may introduce bias and doesn't guarantee real-time accuracy. Option B (temperature increase) would worsen variability.

Option C (chain-of-thought) can improve reasoning but does not anchor to current compliance data.

Practice this question →

64

MCQmedium

A developer deployed a large language model on Vertex AI for real-time chat. Users report slow response times. The model generates sentences one word at a time. Which optimization should be applied to reduce latency?

A.Batch multiple user queries together.

B.Deploy the model with more accelerators.

C.Enable prompt caching to reuse previous queries.

D.Use streaming responses to start output earlier.

AnswerD

Streaming sends tokens as they are generated, reducing the wait for the full response.

Why this answer

Enabling streaming allows the model to output tokens progressively, reducing perceived latency. Option A is wrong because prompt caching doesn't speed up generation. Option C is wrong because batching increases latency for real-time.

Option D is wrong because more accelerators may actually increase overhead without optimization.

Practice this question →

65

MCQmedium

A company wants to build a customer support chatbot that answers based on internal documentation. They use Vertex AI Search and want to ensure the model only uses retrieved documents. What should they do?

A.Fine-tune the model on the documentation

B.Enable grounding with Vertex AI Search

C.Increase max output tokens

D.Set temperature to 0.0

AnswerB

Ground forces the model to answer based on provided context.

Why this answer

Grounding with Vertex AI Search restricts the model to the retrieved context, preventing it from using internal knowledge. Setting temperature to 0.0 reduces creativity but doesn't enforce retrieval. Fine-tuning is unnecessary, and increasing max tokens doesn't affect retrieval.

Practice this question →

66

Multi-Selectmedium

Which TWO techniques are effective for reducing bias in generative AI model outputs?

Select 2 answers

A.Increasing model size to learn more patterns

B.Training on diverse and representative datasets

C.Relying solely on post-hoc filters

D.Using adversarial debiasing methods during fine-tuning

E.Limiting the model to only factual prompts

AnswersB, D

Correct: Diverse data helps reduce biased associations.

Why this answer

Option B is correct because training on diverse and representative datasets directly reduces sampling bias and coverage gaps in the training distribution, which are primary sources of stereotypical or skewed outputs. By ensuring the model sees balanced examples across demographics, contexts, and edge cases, it learns more equitable representations and reduces the likelihood of generating biased content.

Exam trap

Google Cloud often tests the misconception that increasing model size or adding post-hoc filters is sufficient to mitigate bias, when in reality these approaches fail to address the root causes of bias in training data and model representations.

Practice this question →

67

Multi-Selecthard

A financial analyst uses generative AI to summarize earnings reports. The summaries vary in style. Which THREE methods can improve consistency? (Choose three.)

Select 3 answers

A.Set temperature to 0.2

B.Increase max output tokens

C.Enable citation mode

D.Use few-shot prompting with fixed examples

E.Fine-tune on a curated dataset of desired summaries

AnswersA, D, E

Reduces output randomness.

Why this answer

Fine-tuning on curated data aligns the model to a desired style, few-shot prompting provides consistent examples, and low temperature reduces randomness. Increasing max tokens does not affect style, and citation mode adds references but not style consistency.

Practice this question →

68

MCQmedium

A user reports that the model's response to the same prompt varies significantly across different calls. Which parameter change would most likely reduce variability?

A.Decrease topK to 10.

B.Decrease temperature to 0.2.

C.Increase candidateCount to 3.

D.Increase maxOutputTokens to 2000.

AnswerB

Lower temperature reduces randomness, making outputs more consistent.

Why this answer

Option A is correct because decreasing temperature reduces randomness and makes outputs more deterministic. Option B (increase maxOutputTokens) does not affect variability. Option C (increase candidateCount) returns multiple candidates but each still varies.

Option D (decrease topK) reduces diversity but temperature has a stronger effect.

Practice this question →

69

MCQhard

After fine-tuning a model on customer support data, the model starts using profanity. What is the most effective mitigation?

A.Add profanity to training data as negative examples

B.Reduce learning rate and retrain

C.Increase temperature to reduce confidence

D.Enable a safety attribute filter

AnswerD

Blocks profanity in real-time without retraining.

Why this answer

Enabling a safety attribute filter blocks profanity at inference time. Reducing learning rate may not help, adding profanity as negative examples could help but is less immediate, and increasing temperature increases randomness, potentially worsening the issue.

Practice this question →

70

MCQhard

A research lab is fine-tuning a large language model on a small dataset of medical records. They observe that the model overfits, memorizing specific patient details and producing outputs that violate privacy regulations. Which technique should they apply to improve generalization and reduce memorization?

A.Increase the batch size to 64

B.Increase the number of training epochs

C.Use early stopping based on validation loss

D.Apply differential privacy (DP-SGD) during fine-tuning

AnswerD

DP-SGD bounds the influence of any single example, reducing memorization and improving privacy.

Why this answer

Differential privacy (DP-SGD) is the correct technique because it directly addresses memorization of sensitive patient data by adding calibrated noise to the gradient updates during fine-tuning. This bounds the model's ability to encode any single individual's information, improving generalization and ensuring compliance with privacy regulations like HIPAA.

Exam trap

Google Cloud often tests the misconception that early stopping or batch size adjustments can prevent memorization, when in fact only techniques like differential privacy directly bound the influence of individual training examples.

How to eliminate wrong answers

Option A is wrong because increasing batch size to 64 reduces gradient variance but does not prevent memorization of specific patient details; it may even accelerate overfitting on a small dataset. Option B is wrong because increasing the number of training epochs exacerbates overfitting, causing the model to memorize more training examples and worsen privacy violations. Option C is wrong because early stopping based on validation loss only halts training when validation performance degrades, but it does not impose any privacy guarantee or fundamentally limit memorization of unique patient records.

Practice this question →

71

MCQmedium

A team wants to improve the factual accuracy of their chatbot responses regarding internal company policies. What is the most effective approach?

A.Use few-shot prompting with example Q&A pairs

B.Increase the model's maximum tokens

C.Fine-tune the model on policy documents

D.Use RAG with Vertex AI Search indexing the policies

AnswerD

Correct: RAG retrieves fresh data from indexed policies, ensuring factual accuracy.

Why this answer

RAG with Vertex AI Search retrieves current policy documents, providing authoritative context. Fine-tuning may not capture frequent updates, and other options do not integrate live knowledge.

Practice this question →

72

MCQeasy

A company notices that their AI chatbot occasionally generates incorrect information. Which technique can best reduce hallucinations without retraining?

A.Use a longer system prompt without examples

B.Use system instructions to constrain the model to only answer from provided context

C.Set top_p to 0.1

D.Increase temperature to 0.9

AnswerB

Correct: This confines the model to the given context, minimizing hallucination.

Why this answer

Using system instructions to constrain the model to the provided context directly reduces fabrication. Increasing temperature or setting top_p low would not specifically target hallucinations.

Practice this question →

73

MCQhard

A streaming platform uses a large generative model for personalized content suggestions. Budget constraints require minimizing inference costs without significantly degrading quality. Which approach is most effective?

A.Deploy the model on higher-end accelerators to save time.

B.Use a distilled version of the model.

C.Implement stronger safety filters to reduce output length.

D.Cache frequent prompts to avoid regeneration.

AnswerB

Distilled models are smaller, faster, and cheaper with comparable quality for many tasks.

Why this answer

Using a smaller distilled model reduces compute cost with minimal quality loss for recommendation tasks. Option A is wrong because stronger safety filters don't reduce cost. Option B is wrong because caching is limited.

Option D is wrong because more accelerators increase cost.

Practice this question →

74

MCQeasy

A developer is using the Gemini API to generate code snippets. They notice the outputs often contain deprecated API calls. Which parameter adjustment or prompt strategy would most effectively encourage the model to use current APIs?

A.Add a system instruction specifying 'Use the most recent API version and avoid deprecated functions.'

B.Set top-p to 0.5 to reduce output diversity

C.Provide one few-shot example of a correct API call

D.Set temperature to 1.5 to increase creativity

AnswerA

System instructions provide explicit guidance to the model on desired behavior.

Why this answer

Option A is correct because including context in the system instruction (e.g., 'Use only the latest stable version of the API') directly guides the model. Option B is wrong because temperature affects randomness, not temporal awareness. Option C is wrong because few-shot examples can help if they show current APIs, but the instruction is more direct.

Option D is wrong because top-p is about nucleus sampling, not freshness.

Practice this question →

75

MCQhard

An AI team is building a customer support chatbot for a telecom company using a fine-tuned LLM on Vertex AI. The model performs well on common issues but fails to answer correctly for rare or novel problems, often providing plausible-sounding but incorrect solutions. The team has a large corpus of internal troubleshooting documents. They want to minimize incorrect answers while keeping latency low. Which approach should they take?

A.Switch to a larger base model (e.g., Gemini Ultra) without any retrieval.

B.Implement a retrieval-augmented generation (RAG) pipeline using Vertex AI Search to fetch relevant documents before generating answers.

C.Collect more data on rare issues and continue fine-tuning the model weekly.

D.Use a few-shot prompt with 10 examples of rare problems and solutions.

AnswerB

RAG dynamically retrieves relevant context, enabling accurate answers for rare issues.

Why this answer

Option B is correct because RAG uses the troubleshooting documents as a knowledge base, providing grounded answers for rare issues without retraining. Option A is wrong because more fine-tuning on common issues won't help with rare ones. Option C is wrong because a larger model may increase latency and cost without solving the grounding problem.

Option D is wrong because few-shot examples cannot cover all rare scenarios.

Practice this question →

Page 1 of 2 · 121 questions totalNext →

Ready to test yourself?

Try a timed practice session using only Techniques to Improve Generative AI Model Output questions.

Start 20-question session