Knowledge + Practice

CCNA Techniques to Improve Generative AI Model Output Questions

46 of 121 questions · Page 2/2 · Techniques to Improve Generative AI Model Output · Answers revealed

Practice these questions Domain overview All questions

76

MCQhard

A developer is building a chatbot for a medical application that discusses sensitive health topics. The chatbot consistently gets its outputs blocked. What should the developer do?

A.Disable the safety filter entirely to allow all topics.

B.Adjust the safety category thresholds to allow VIOLENCE and SEXUAL content since it's medical.

C.Increase the input token limit to 2000.

D.Review and refine the system instructions to avoid triggering safety filters, and consider using a different model endpoint that allows medical contexts.

AnswerD

Refining prompts and using appropriate endpoints can prevent unnecessary blocks.

Why this answer

Option C is correct because reviewing and refining system instructions to avoid triggering safety filters, and possibly using a different model endpoint that allows medical contexts, directly addresses the filter blocks. Option A (disable safety filter) violates policy. Option B (adjust thresholds to allow VIOLENCE/SEXUAL) is inappropriate for medical context and may violate guidelines.

Option D (increase token limit) does not help with safety blocks.

Practice this question →

77

Multi-Selecteasy

A company is prompt engineering a model for customer support. They want to reduce hallucination (false information) in responses. Which TWO techniques are most effective? (Choose two.)

Select 2 answers

A.Implement RAG to retrieve relevant documents for context

B.Provide 3 few-shot examples of conversations

C.Reduce max output tokens to 150

D.Add a system instruction: 'Only answer based on the provided context.'

E.Increase temperature to 1.2

AnswersA, D

RAG provides factual grounding, reducing hallucination.

Why this answer

Correct options: B and D. B (Retrieval-Augmented Generation) grounds the model in real data. D (specify in system instruction to only use provided facts) instructs the model to rely on context.

A (increase temperature) increases creativity, worsening hallucination. C (few-shot examples) helps format but not factuality. E (reduce max tokens) only limits length.

Practice this question →

78

Multi-Selectmedium

A development team is integrating a large language model into a healthcare application. They need to reduce the risk of generating harmful medical advice. Which THREE measures should they implement? (Choose three.)

Select 3 answers

A.Use a safety filter to block outputs containing harmful medical terminology.

B.Implement RAG to retrieve verified medical information from trusted sources.

C.Fine-tune the model on a curated dataset of medical textbooks.

D.Include a disclaimer in the system instruction that the model is not a doctor.

E.Set the temperature to a very high value to ensure diverse outputs.

AnswersA, B, C

Safety filters directly block harmful content at inference time.

Why this answer

Options A, B, and C are correct. A safety filter blocks outputs containing harmful terminology, fine-tuning on a curated medical dataset improves domain knowledge and safety, and RAG with trusted sources grounds outputs in verified information. Option D (high temperature) increases randomness and risk.

Option E (disclaimer) does not reduce the generation of harmful advice.

Practice this question →

79

MCQhard

A healthcare organization needs a generative AI model to answer medical questions using proprietary clinical guidelines. They have a large dataset of doctor-patient interactions. Should they fine-tune a pre-trained model or use Retrieval-Augmented Generation (RAG)?

A.Use RAG to reduce inference costs by skipping model updates.

B.Use RAG to retrieve relevant guidelines during inference, avoiding frequent retraining.

C.Use prompt engineering to encode all guidelines into the system prompt.

D.Fine-tune the model on the clinical guidelines and interactions.

AnswerB

RAG dynamically pulls up-to-date guidelines, ensuring accuracy and compliance.

Why this answer

RAG is preferred because it can incorporate the latest guidelines without retraining, crucial for regulatory changes. Fine-tuning may cause overfitting to outdated interactions. Option B is wrong because fine-tuning requires continuous retraining.

Option C is wrong because prompt engineering alone cannot inject proprietary knowledge. Option D is wrong because RAG does not inherently reduce cost.

Practice this question →

80

MCQhard

A healthcare startup fine-tunes a model to generate patient education materials. They want to ensure the model never gives medical advice, only information. They add a safety instruction, but the model sometimes still gives advice. What advanced technique should they apply?

A.Hard-code a list of prohibited phrases in a post-processing script

B.Add a secondary classifier to rewrite any detected advice into general information

C.Use semantic similarity to a 'medical advice' embedding and reject if close

D.Apply RLHF with a reward model that penalizes outputs containing medical advice

AnswerD

RLHF directly optimizes the model to avoid undesired behaviors based on human preferences.

Why this answer

Option C is correct because reinforcement learning from human feedback (RLHF) with a reward model that penalizes advice can steer the model away from that behavior. Option A is wrong because hard-coded rules may not cover all cases. Option B is wrong because embedding distance is not effective for controlling output content.

Option D is wrong because output filtering can block but does not prevent generation of advice in the first place.

Practice this question →

81

MCQeasy

Which technique allows a model to incorporate real-time data from external APIs?

A.RAG with tool calling

B.Prompt engineering

C.Fine-tuning

D.Model pruning

AnswerA

Enables dynamic API access during generation.

Why this answer

RAG with tool calling enables the model to query external APIs in real-time to retrieve current data. Fine-tuning uses static data, prompt engineering alone doesn't fetch data, and model pruning reduces size.

Practice this question →

82

MCQmedium

A team is deploying a large language model for legal document summarization. They find the model occasionally omits critical legal clauses. Which improvement technique would be most effective?

A.Design a prompt that explicitly lists required sections

B.Increase the top_p value to 1.0

C.Fine-tune the model on legal summaries

D.Lower the temperature to 0.1

AnswerA

A structured prompt with requirements improves completeness.

Why this answer

Using prompt engineering with explicit instructions to include all clauses and possibly a checklist directly addresses omissions. Option A is wrong because fine-tuning would require labeled data of summaries with clauses. Option B is wrong because temperature reduction might make output less creative but doesn't enforce completeness.

Option D is wrong because it adds randomness, making omissions more likely.

Practice this question →

83

MCQmedium

A developer is building a customer support chatbot using a large language model. The chatbot frequently generates plausible-sounding but incorrect answers to product questions. Which technique should be applied to improve factual accuracy?

A.Provide a few-shot example of correct answers in the prompt.

B.Use a higher temperature setting to encourage more creative responses.

C.Increase the model's context length to include more of the conversation history.

D.Enable Grounding with the company's product knowledge base.

AnswerD

Grounding retrieves live, verified data and injects it into the prompt, directly improving factual accuracy.

Why this answer

Option D is correct because Grounding (e.g., using Vertex AI Grounding with Search) retrieves relevant information from a trusted source in real time, reducing hallucination. Option A is wrong because increasing context length may include more irrelevant information and does not guarantee accuracy. Option B is wrong because higher temperature increases randomness, worsening hallucinations.

Option C is wrong because few-shot prompting can help but only if examples are accurate and relevant; it does not dynamically look up facts.

Practice this question →

84

MCQmedium

A team monitors their generative AI model on Vertex AI. They notice output quality declining. Which metric is most likely the root cause?

A.Input token count per request is increasing.

B.Output token count is decreasing.

C.Prediction latency is stable.

D.Error rate is less than 1%.

AnswerA

Growing inputs may push the model beyond optimal context length, reducing focus.

Why this answer

The increasing input token count suggests users are providing more context, which may exceed the model's effective context window or dilute relevant information, degrading quality. Latency and error rate are fine.

Practice this question →

85

MCQmedium

A content generation model for e-commerce product descriptions repeats the same phrases across multiple descriptions (e.g., 'high-quality', 'best-in-class'). The team wants more varied and engaging output. Which parameter adjustment is most appropriate?

A.Increase the frequency penalty parameter to 1.0.

B.Decrease the max output tokens to 50.

C.Increase the temperature parameter to 1.5.

D.Set the top-p value to a very small number like 0.1.

AnswerA

Frequency penalty specifically reduces the model's tendency to repeat tokens, improving lexical diversity.

Why this answer

Option B is correct because increasing the frequency penalty discourages the model from repeating tokens, directly reducing repetition. Option A is wrong because higher temperature increases randomness but may not specifically target repetition. Option C is wrong because focusing top-p only on a small set may increase repetition.

Option D is wrong because decreasing max tokens truncates output but doesn't reduce repetition.

Practice this question →

86

Multi-Selecthard

Which TWO techniques can help reduce latency for a real-time generative AI application? (Choose two.)

Select 2 answers

A.Use streaming responses to send tokens as generated.

B.Quantize the model to a lower precision.

C.Deploy more model replicas to handle load.

D.Enable prompt caching for repeated queries.

E.Batch multiple user requests together.

AnswersA, B

Streaming eliminates waiting for the full output, reducing perceived latency.

Why this answer

Streaming and model quantization directly reduce response time. Batching is for offline, and more deploy replicas can increase throughput but not necessarily reduce latency for a single request. Prompt caching can help if prompts repeat, but not generally.

Practice this question →

87

MCQmedium

A software development team builds an internal code assistant using a generative model. The assistant writes Python functions that often contain security vulnerabilities such as SQL injection or command injection. The team wants to mitigate these vulnerabilities without adding a manual review step for every code snippet, as that would slow development. They have access to a static analysis security scanner API. Which approach best addresses the vulnerabilities while maintaining developer velocity?

A.Increase top-k sampling to generate a wider variety of code tokens.

B.After each generation, automatically run the code through the static analysis scanner, and if vulnerabilities are found, send the output back to the model for revision with the scanner's feedback.

C.Fine-tune the model on a corpus of secure code examples.

D.Add a system prompt: 'Do not generate code with security vulnerabilities.'

AnswerB

This iterative process catches and corrects security issues without manual intervention, keeping velocity high.

Why this answer

Option D is correct because post-generation automatic scanning with the security scanner catches vulnerabilities and can request regeneration with suggestions, maintaining speed. Option A is wrong because fine-tuning may not eliminate all vulnerabilities. Option B is wrong because top-k only affects output diversity, not security.

Option C is wrong because system prompts are not reliably followed for security.

Practice this question →

88

MCQeasy

A developer uses a generative AI model with the system instruction shown. The response is correct but very brief. Which parameter adjustment could encourage more detail without losing accuracy?

A.Add 'Provide a detailed response' to the system instruction.

B.Set temperature to 0 to make output deterministic.

C.Set topK to 1 to focus on most likely tokens.

D.Increase temperature to 1.5 to encourage creativity.

AnswerA

System instructions can guide verbosity while maintaining accuracy.

Why this answer

Adding a length constraint in a system instruction (e.g., 'Provide detailed responses') is effective. Lower temperature may reduce creativity. Higher temperature could introduce errors.

Changing topK doesn't directly control length.

Practice this question →

89

MCQhard

A generative AI model for chatbot responses sometimes produces toxic language. The team wants to reduce toxicity without significantly affecting the model's helpfulness. Which approach is best?

A.Increase the temperature parameter

B.Reduce the maximum output tokens

C.Fine-tune with a dataset of non-toxic responses and use RLHF

D.Apply a toxicity classifier as a post-processing filter

AnswerC

Fine-tuning combined with RLHF aligns model behavior effectively.

Why this answer

Fine-tuning with a curated dataset of non-toxic responses directly adjusts the model's weights to reduce the likelihood of generating toxic language, while RLHF (Reinforcement Learning from Human Feedback) further aligns the model with human preferences for helpfulness and safety. This combined approach addresses the root cause of toxicity in the model's behavior without the blunt trade-offs of other methods, preserving the model's utility.

Exam trap

Google Cloud often tests the misconception that post-processing filters (like toxicity classifiers) are sufficient for safety, when in fact they fail to address the model's learned behavior and can degrade helpfulness due to false positives, making fine-tuning with RLHF the superior alignment technique.

How to eliminate wrong answers

Option A is wrong because increasing the temperature parameter increases randomness in token selection, which can actually amplify the probability of generating toxic or nonsensical outputs, not reduce them. Option B is wrong because reducing the maximum output tokens limits response length but does not influence the content or safety of the generated tokens, leaving toxicity unchanged. Option D is wrong because applying a toxicity classifier as a post-processing filter only masks toxic outputs after generation, wasting computational resources and potentially blocking helpful responses that contain false-positive flagged terms, without fixing the underlying model behavior.

Practice this question →

90

Multi-Selecthard

Which THREE are best practices for designing prompts for a generative AI model?

Select 3 answers

A.Provide few-shot examples for complex tasks

B.Include specific and clear instructions

C.Break the task into smaller steps

D.Use negative prompts to avoid undesired outputs

E.Always set temperature to 1.0 for creativity

AnswersA, B, C

Correct: Examples guide the model toward desired outputs.

Why this answer

Clear instructions, few-shot examples, and task decomposition improve the model's understanding and output quality. Negative prompting is less reliable, and fixed temperature is not a universal best practice.

Practice this question →

91

Multi-Selectmedium

Which TWO techniques can help improve the factual accuracy of a language model's outputs? (Choose two.)

Select 2 answers

A.Decrease the max output tokens.

B.Increase the temperature parameter.

C.Fine-tune on a domain-specific curated dataset.

D.Implement retrieval-augmented generation (RAG).

E.Use top-k random sampling.

AnswersC, D

Fine-tuning adapts the model to domain facts.

Why this answer

Fine-tuning on a domain-specific curated dataset (C) directly adjusts the model's weights using high-quality, verified examples, teaching it to produce factually correct outputs for that domain. This reduces hallucinations by grounding the model in accurate, relevant data rather than relying solely on its pre-training distribution.

Exam trap

Google Cloud often tests the misconception that adjusting decoding parameters (like temperature, top-k, or max tokens) can improve factual accuracy, when in reality these only control output style, length, or randomness, not the correctness of the underlying information.

Practice this question →

92

MCQeasy

A data scientist is using a large language model to generate product descriptions. The descriptions are often too verbose. Which parameter adjustment is most appropriate?

A.Decrease the top-k value.

B.Increase the max output tokens.

C.Decrease the temperature.

D.Increase the frequency penalty.

AnswerD

Frequency penalty reduces repetitive phrases, encouraging conciseness.

Why this answer

Option A is correct because increasing the frequency penalty discourages repetition and can make output more concise. Option B (decreasing temperature) reduces randomness but not verbosity. Option C (decreasing top-k) limits word choice but not length.

Option D (increasing max tokens) makes descriptions longer.

Practice this question →

93

MCQhard

A model generates responses that frequently repeat phrases or words. Which parameter adjustment is most likely to fix this?

A.Increase top_k

B.Increase temperature

C.Increase repetition penalty

D.Increase max output tokens

AnswerC

Correct: Repetition penalty specifically reduces the likelihood of repeating tokens.

Why this answer

Increasing the repetition penalty directly discourages the model from selecting tokens that have already appeared in the generated sequence, thereby reducing repetitive phrases or words. This parameter works by subtracting a fixed penalty from the logits of previously generated tokens before applying the softmax function, making them less likely to be chosen again.

Exam trap

The trap here is that candidates often confuse repetition penalty with diversity-promoting parameters like temperature or top_k, mistakenly believing that increasing randomness or narrowing token selection will fix repetition, when in fact those adjustments can worsen the problem.

How to eliminate wrong answers

Option A is wrong because increasing top_k limits the sampling pool to the k most likely next tokens, which can actually increase repetition by narrowing the diversity of choices. Option B is wrong because increasing temperature flattens the probability distribution, making all tokens more equally likely, which can lead to more random and potentially more repetitive outputs, not less. Option D is wrong because increasing max output tokens only extends the length of the generated response; it does not address the underlying cause of repetition and may even exacerbate it by allowing more opportunities for the model to loop on repeated phrases.

Practice this question →

94

MCQmedium

A company uses Vertex AI PaLM for code generation. The code often contains security vulnerabilities. Which improvement should be applied?

A.Set top_k to 1

B.Include a security-focused system instruction

C.Use Codey model instead

D.Increase temperature to 0.8

AnswerB

Guides the model to prioritize security practices.

Why this answer

Including a security-focused system instruction (e.g., 'Write secure code that avoids SQL injection') directly guides the model to produce safer output. Increasing temperature worsens security, setting top_k to 1 reduces diversity but doesn't address security, and using Codey alone without instructions may not suffice.

Practice this question →

95

MCQmedium

A company uses a generative model to produce product descriptions. The descriptions are factually inconsistent with the product specs. Which technique would best ensure factual accuracy?

A.Enhance the system prompt with product details

B.Implement retrieval-augmented generation (RAG) with product database

C.Lower the temperature to 0.0

D.Fine-tune the model on product descriptions

AnswerB

RAG grounds generation in factual data.

Why this answer

Retrieval-augmented generation (RAG) is the best technique because it dynamically retrieves relevant, up-to-date product specifications from a trusted database at inference time, grounding the model's output in verified facts. This directly addresses factual inconsistency by ensuring the generated description is based on authoritative source data rather than relying solely on the model's parametric memory.

Exam trap

Google Cloud often tests the misconception that prompt engineering alone (Option A) or deterministic sampling (Option C) can solve factual grounding issues, when in reality they do not provide external knowledge retrieval to correct hallucinations.

How to eliminate wrong answers

Option A is wrong because enhancing the system prompt with product details only provides static context that the model may still hallucinate or misinterpret; it does not enforce retrieval of current or specific factual data. Option C is wrong because lowering the temperature to 0.0 makes the output more deterministic but does not prevent the model from generating factually incorrect content that is confidently wrong. Option D is wrong because fine-tuning on product descriptions can improve style and consistency but does not guarantee factual accuracy for new or updated product specs, and it risks overfitting or memorizing inaccuracies from the training data.

Practice this question →

96

MCQeasy

A company uses a text generation model for customer support but notices it occasionally provides outdated information. Which technique should they implement to improve output accuracy?

A.Increase max output tokens

B.Implement retrieval-augmented generation (RAG)

C.Fine-tune the model with more historical support data

D.Increase model temperature to 1.0

AnswerB

RAG retrieves current information, making outputs accurate and up-to-date.

Why this answer

Option B is correct because RAG (Retrieval-Augmented Generation) retrieves current information from a knowledge base, ensuring factual accuracy. Option A (temperature increase) would increase randomness, making outputs less reliable. Option C (fine-tuning with historical data) might not include recent updates.

Option D (max tokens) only affects length, not accuracy.

Practice this question →

97

MCQeasy

Refer to the exhibit. A data scientist sends a prediction request to a text generation model with the following parameters and receives repetitive output. Which parameter should be changed?

A.Decrease topP to 0.5

B.Increase topK to 100

C.Decrease maxOutputTokens

D.Increase temperature to 0.5

AnswerD

Introduces randomness to avoid repetition.

Why this answer

Temperature 0.0 makes the model deterministic, leading to repetitive text. Increasing temperature to 0.5 introduces randomness. Decreasing topP may help but temperature is the direct cause.

Increasing topK adds diversity but less effect, decreasing max tokens doesn't fix repetition.

Practice this question →

98

MCQmedium

A team uses Vertex AI Generative AI Studio to tune a model via RLHF. After tuning, the model outputs are bland. What likely went wrong?

A.Insufficient training data

B.Too many training steps

C.Low temperature during evaluation

D.Reward model overfits to generic responses

AnswerD

Penalizes unique outputs, making them bland.

Why this answer

If the reward model overfits to generic responses, it penalizes creativity, leading to bland output. Too many training steps can cause overfitting to the reward model, but the most direct cause is the reward model itself. Low temperature during evaluation is a parameter issue, not tuning.

Insufficient data may cause underfitting.

Practice this question →

99

MCQmedium

A chatbot built with Vertex AI PaLM API often provides outdated information about company policies because the training data is months old. Which approach should the team use?

A.Implement grounding by connecting to a knowledge base of current policies.

B.Use prompt engineering to instruct the model to say 'I don't know' if unsure.

C.Increase the context window to include more history.

D.Fine-tune the model on the latest policy documents.

AnswerA

Grounding retrieves real-time information from the knowledge base.

Why this answer

Option A is correct because implementing grounding by connecting to a knowledge base of current policies ensures the chatbot retrieves up-to-date information at runtime. Option B (fine-tuning on latest documents) is time-consuming and requires continuous updates. Option C (prompting to say 'I don't know') doesn't provide correct info.

Option D (increasing context window) doesn't update knowledge.

Practice this question →

100

MCQmedium

Refer to the exhibit. A team runs 'gcloud ai models list --filter=displayName:qa-chat-v1' and sees the output. The model was tuned using supervised fine-tuning (SFT) but shows 'state: DEPLOYING' for days. What is the most likely issue?

A.The evaluation metrics are missing, causing deployment to hang

B.The training pipeline failed silently

C.The model is stuck in deployment due to insufficient quota

D.The model has no errors, so it is fine

AnswerC

Quota limits can cause indefinite DEPLOYING state.

Why this answer

The model is stuck in DEPLOYING state with no errors, suggesting a resource issue like hitting the deployment quota. Silent failures would show errors, missing evaluation metrics don't block deployment, and no errors doesn't mean it's fine.

Practice this question →

101

MCQmedium

What is the primary purpose of a system instruction in the Gemini API?

A.Set the model's temperature and top_p

B.Define the overall behavior and constraints for the model

C.Provide few-shot examples for each query

D.Set the maximum output length

AnswerB

Correct: System instructions guide the model's persona and rules.

Why this answer

The system instruction in the Gemini API is the primary mechanism to define the overall behavior, persona, constraints, and guardrails for the model across all interactions. Unlike per-query parameters, it sets a persistent context that shapes how the model interprets every user prompt, ensuring consistent adherence to rules such as tone, format, or safety policies.

Exam trap

Google Cloud often tests the distinction between persistent system-level instructions and per-request parameters, so the trap here is confusing the system instruction (which defines the model's role and constraints) with generation controls like temperature, top_p, or max tokens, which only affect the style or length of a single response.

How to eliminate wrong answers

Option A is wrong because temperature and top_p are sampling parameters that control randomness and diversity of output, not the overarching behavioral constraints set by a system instruction. Option C is wrong because few-shot examples are typically provided in the user prompt or as part of a structured conversation, not as the primary purpose of a system instruction, which is for persistent context rather than per-query demonstrations. Option D is wrong because maximum output length is a generation parameter that limits token count, not a behavioral or constraint-setting mechanism like a system instruction.

Practice this question →

102

MCQhard

A company is deploying a generative AI model for customer support. They want to reduce hallucinations while maintaining fluency. They have a large dataset of previous support conversations. Which strategy should they prioritize?

A.Increase the beam search width to 10.

B.Implement retrieval-augmented generation (RAG) using the conversation dataset as a knowledge base.

C.Fine-tune the model on the conversation dataset.

D.Set the temperature to 0.1.

AnswerB

RAG retrieves relevant facts from the dataset, reducing hallucinations.

Why this answer

Retrieval-augmented generation (RAG) directly addresses hallucinations by grounding the model's responses in factual, retrieved data from the conversation dataset. This approach allows the model to generate fluent, contextually relevant answers while reducing the risk of inventing information, as it retrieves actual support interactions as evidence before generating a response.

Exam trap

Google Cloud often tests the misconception that tuning generation parameters (like temperature or beam search) can fix hallucinations, when in fact only grounding techniques like RAG or knowledge graph integration address the root cause of factual inaccuracy.

How to eliminate wrong answers

Option A is wrong because increasing beam search width to 10 improves output fluency by exploring more candidate sequences but does not reduce hallucinations; it may even amplify incorrect patterns if the model is prone to hallucination. Option C is wrong because fine-tuning on the conversation dataset can improve domain-specific fluency but risks overfitting to noise or biases in the data, and without retrieval, the model may still hallucinate when faced with novel queries. Option D is wrong because setting temperature to 0.1 makes the model more deterministic and less creative, which can reduce variability but does not prevent hallucinations; it may cause the model to repeat common but incorrect patterns from training data.

Practice this question →

103

MCQeasy

A company deploys a sentiment analysis model to classify customer reviews. The model consistently returns overly positive sentiment for all reviews, even when reviews contain negative feedback. Which technique would best resolve this issue?

A.Add a system prompt instructing the model to analyze the review for both positive and negative sentiment and output the overall classification.

B.Fine-tune the model on a dataset with an equal number of positive and negative examples.

C.Reduce the max output tokens to limit the model's tendency to generate positive language.

D.Increase the temperature parameter to reduce model confidence.

AnswerA

Prompt engineering can directly guide the model to consider all sentiment categories.

Why this answer

Option C is correct because using a system prompt with explicit instructions to detect all sentiments, including negative, guides the model to consider the full emotional range. Option A is wrong because increasing temperature adds randomness and doesn't enforce balance. Option B is wrong because adjusting max tokens only affects output length.

Option D is wrong because fine-tuning on a balanced dataset is a good practice but is not the quickest fix; prompt engineering is more immediate.

Practice this question →

104

Multi-Selecteasy

Which THREE strategies should be combined to effectively reduce biased outputs in a generative AI model? (Choose three.)

Select 3 answers

A.Implement safety filters targeting hate speech and stereotypes.

B.Conduct human evaluation and feedback loops.

C.Use diverse few-shot examples that represent different demographics.

D.Raise the temperature to increase output variability.

E.Fine-tune the model on a biased dataset to learn patterns.

AnswersA, B, C

Safety filters block explicitly biased content.

Why this answer

Diverse few-shot examples, safety filters, and human-in-the-loop evaluation directly address bias. Increasing temperature may amplify bias, and fine-tuning on biased data perpetuates bias.

Practice this question →

105

Multi-Selecthard

A company is using a large language model for automated translation of legal contracts. They find that the translations sometimes alter the meaning of specific clauses. Which TWO approaches would most effectively preserve the original meaning? (Choose two.)

Select 2 answers

A.Provide the full contract context in a single prompt.

B.Set top-p=0.1 to limit the vocabulary to the most likely tokens.

C.Fine-tune the model on a parallel corpus of legal translations.

D.Use a glossary of key legal terms with their translations.

E.Increase the temperature to allow more creative phrasing.

AnswersC, D

Fine-tuning on domain-specific translations improves accuracy.

Why this answer

Options B and D are correct. Using a glossary of key legal terms with their translations ensures consistent terminology, and fine-tuning on a parallel corpus of legal translations adapts the model to the domain. Option A (full context in one prompt) may exceed token limits.

Option C (high temperature) increases creativity and risk of altering meaning. Option E (low top-p) restricts vocabulary but does not preserve meaning of specific clauses.

Practice this question →

106

MCQeasy

A marketing company wants to fine-tune a generative AI model to adopt a specific brand voice. Which tuning method is most appropriate?

A.RLHF with general user feedback

B.Grounding with external knowledge base

C.Supervised fine-tuning with labeled examples of the brand voice

D.Prompt engineering with system instructions

AnswerC

Correct: Labeled examples directly teach the model the desired tone and style.

Why this answer

Supervised fine-tuning with labeled examples directly teaches the desired style. RLHF is for broader alignment, and grounding or prompt engineering are not as precise for tone.

Practice this question →

107

MCQhard

A data scientist fine-tunes a generative image captioning model to describe medical images. The model outputs safe but very generic captions (e.g., 'An image of cells'). The goal is to produce more specific, clinically relevant descriptions. Which approach is most effective?

A.Perform incremental fine-tuning on a curated dataset of detailed medical image captions.

B.Use diverse beam search during decoding to generate multiple caption candidates.

C.Adjust top-k sampling to restrict the vocabulary to medical terms only.

D.Increase the temperature to encourage the model to output longer, more varied captions.

AnswerA

Fine-tuning with domain-specific examples teaches the model to generate precise clinical descriptions.

Why this answer

Option A is correct because incremental fine-tuning on a high-quality dataset of specific medical captions directly teaches the model the desired level of detail. Option B is wrong because increasing temperature may add irrelevant words, not specific clinical terms. Option C is wrong because top-k sampling can reduce output space but does not guarantee medical accuracy.

Option D is wrong because beam search is for diverse output but does not address specificity.

Practice this question →

108

MCQeasy

A developer wants to improve the factual accuracy of the model's summaries. Based on the exhibit, what should they do?

A.Enable the support engine.

B.Increase the model's context window.

C.Configure grounding with a knowledge base.

D.Re-train the model with a dataset of facts.

AnswerC

Grounding provides factual context, improving accuracy.

Why this answer

Option A is correct because GROUNDING_CONFIG is NONE, so enabling grounding with a knowledge base would allow the model to retrieve factual information. Option B (enable support engine) is a different feature. Option C (re-train) is possible but more resource-intensive.

Option D (increase context window) does not directly improve factual accuracy.

Practice this question →

109

MCQmedium

A financial services firm is using a foundation model on Vertex AI to generate investment summaries from quarterly reports. The summaries are accurate but often miss key financial metrics and trends. The team cannot afford to fine-tune the model frequently. Which technique should they use to improve the completeness and relevance of the summaries without modifying the model?

A.Increase temperature to 0.9 to encourage more creative outputs.

B.Provide three few-shot examples in the prompt that highlight the desired metrics.

C.Set stop sequences to [' '] to ensure the model finishes each paragraph.

D.Lower top_p to 0.5 to reduce the sampling pool.

AnswerB

Few-shot examples condition the model to replicate the structure and content of the examples.

Why this answer

Option C is correct because adding few-shot examples that specifically include the desired metrics (e.g., revenue growth, profit margins) trains the model to include those details. Option A is wrong because increasing temperature increases randomness, which could omit key facts. Option B is wrong because stopping at newlines doesn't guarantee completeness.

Option D is wrong because adjusting top_p does not target completeness.

Practice this question →

110

Multi-Selectmedium

A healthcare chatbot must avoid hallucinations. Which TWO techniques should the team implement? (Choose two.)

Select 2 answers

A.Set frequency penalty to 0.0

B.Use chain-of-thought prompting

C.Use higher temperature

D.Increase top_k to 50

E.Enable grounding with a knowledge base

AnswersB, E

Encourages step-by-step reasoning, reducing errors.

Why this answer

Grounding with a knowledge base ensures responses are based on retrieved facts, and chain-of-thought prompting improves reasoning steps. Higher temperature increases randomness, top_k increases diversity, and frequency penalty affects repetition, not hallucinations.

Practice this question →

111

MCQhard

A team configures a Vertex AI prediction request as shown. Users report that the model sometimes produces incoherent or off-topic responses despite moderate settings. What is the most likely cause?

A.The temperature is too high for coherent responses.

B.The maxOutputTokens is too low.

C.The safety threshold blocks too much content.

D.The topK value is too low.

AnswerA

High temperature introduces randomness, reducing coherence.

Why this answer

The combination of high temperature (0.9) and high topK (40) increases randomness and diversity, leading to incoherent outputs. Lowering temperature and topK can improve coherence. Safety settings are unrelated, and maxOutputTokens is a limit, not a cause of incoherence.

Practice this question →

112

MCQhard

A healthcare company is using a fine-tuned version of PaLM 2 on Vertex AI to generate clinical notes from doctor-patient conversations. The model was fine-tuned on a dataset of 10,000 de-identified transcripts and corresponding notes. During testing, the generated notes are grammatically correct and well-structured, but they often contain subtle inaccuracies: for example, they might mention a medication that was not discussed, or omit a key symptom. The team has already tried increasing the training epochs and adjusting learning rates, with minimal improvement. They need a solution that can be implemented quickly to improve factual accuracy without retraining the entire model. The team has access to a large archive of verified clinical notes and a small set of recent conversation-to-note pairs that have been manually reviewed and corrected. The inference pipeline currently uses a single call to the model with the conversation transcript as input. What should the team do?

A.Implement retrieval-augmented generation (RAG) by retrieving similar verified notes from the archive and providing them as context in the prompt.

B.Decrease the temperature to 0.1 to reduce randomness and force the model to stick to the input.

C.Use prompt engineering to instruct the model to only include information explicitly mentioned in the conversation.

D.Add a human-in-the-loop step to review and correct every generated note before use.

AnswerA

RAG grounds the generation in factual examples, directly reducing inaccuracies without retraining.

Why this answer

Option A is correct because retrieval-augmented generation (RAG) directly addresses the core issue of factual inaccuracy without retraining. By retrieving verified clinical notes similar to the current conversation from the archive and injecting them as context in the prompt, the model gains access to ground-truth examples that anchor its output to factual details. This approach leverages the team's existing archive and small set of corrected pairs to provide relevant, accurate context, improving precision without modifying the model's weights.

Exam trap

The trap here is that candidates often assume factual inaccuracy is solely a randomness issue (temperature) or a prompt instruction problem, overlooking that the model's parametric knowledge is insufficient and needs external grounding via retrieval augmentation.

How to eliminate wrong answers

Option B is wrong because decreasing temperature to 0.1 reduces randomness but does not fix factual inaccuracies stemming from the model's training data or lack of context; it may actually cause the model to become overly deterministic and repeat hallucinations from its fine-tuning. Option C is wrong because prompt engineering to instruct the model to only include explicitly mentioned information is a superficial fix that cannot overcome the model's tendency to hallucinate or omit details when the training data or fine-tuning process has embedded those inaccuracies; it lacks the grounding provided by external verified data. Option D is wrong because adding a human-in-the-loop step to review every note is a manual, non-scalable solution that does not improve the model's output quality at inference time and fails to address the root cause of factual inaccuracy; it also contradicts the requirement for a quick implementation without retraining.

Practice this question →

113

MCQmedium

A legal firm uses a generative AI to draft contracts. They want the output to follow a specific clause structure. Which technique should they use in the prompt?

A.Include a system instruction that defines the required format.

B.Increase temperature to encourage variance.

C.Use grounding to pull from a database of contracts.

D.Set stop sequences to end generation at certain points.

AnswerA

System instructions can specify structure, ensuring adherence.

Why this answer

System instructions set overarching rules for output structure. Option A is wrong because temperature doesn't enforce structure. Option C is wrong because grounding retrieves facts, not structure.

Option D is wrong because adjusting stop sequences only ends generation.

Practice this question →

114

MCQmedium

A marketing agency uses a generative AI model to create slogans for ad campaigns. The model outputs generic slogans like 'Quality you can trust' that lack originality. The agency has a library of past award-winning slogans and wants to generate more creative and brand-specific outputs. They have a requirement that the model must not produce slogans longer than 15 words. Which technique should they prioritize?

A.Use few-shot prompting with 3-5 examples of award-winning slogans in the prompt.

B.Set max tokens to 15 to force shorter, potentially more punchy slogans.

C.Increase the temperature to 1.2 to encourage more creative word combinations.

D.Fine-tune the model on the library of award-winning slogans.

AnswerA

Few-shot examples teach the desired style and creativity directly.

Why this answer

Option C is correct because providing few-shot examples of award-winning slogans in the prompt directly inspires creativity and style matching. Option A is wrong because increasing temperature may produce nonsense. Option B is wrong because fine-tuning is heavy and not needed.

Option D is wrong because token limit only truncates length, doesn't improve creativity.

Practice this question →

115

MCQeasy

A product team uses a translation model to convert English product descriptions into French. The model mixes formal and informal French dialects. Which simple prompt modification likely solves this?

A.Increase the temperature to encourage more consistent output.

B.Add a system prompt specifying 'Use only formal French with no informal expressions.'

C.Fine-tune the model on a corpus of formal French texts.

D.Provide a few-shot example of a formal French translation in the prompt.

AnswerB

Prompt engineering directly addresses the style issue.

Why this answer

Option D is correct because adding a requirement in the prompt to 'Use formal French' directly instructs the model on the desired style. Option A is wrong because temperature is for creativity, not dialect. Option B is wrong because few-shot with formal examples could help, but it's not a simple modification; a system prompt is simpler.

Option C is wrong because fine-tuning is overkill.

Practice this question →

116

MCQmedium

A team is deploying a text generation model for legal document review. They observe that the model occasionally generates factually incorrect legal citations. Which approach best reduces this issue?

A.Implement retrieval-augmented generation (RAG) with a verified legal database.

B.Lower the temperature to 0.0.

C.Use a larger base model.

D.Increase the max output tokens.

AnswerA

RAG retrieves factual information from verified sources, reducing hallucinations.

Why this answer

Option C is correct because retrieval-augmented generation (RAG) with a verified legal database grounds the model in factual, up-to-date sources, directly addressing incorrect citations. Option A (lowering temperature) reduces randomness but does not prevent hallucination. Option B (increasing max tokens) has no effect on factual accuracy.

Option D (using a larger model) may not guarantee correctness without proper grounding.

Practice this question →

117

MCQmedium

A travel company fine-tuned a language model on customer chat logs to provide travel recommendations. After deployment, they receive complaints that the model sometimes generates inappropriate or offensive content. What is the most effective approach to improve output safety while preserving overall performance?

A.Modify the system instruction to request polite responses only

B.Retrain the model on a larger dataset of chat logs

C.Reduce the temperature to 0.0

D.Add a post-processing safety classifier that filters or rewrites unsafe outputs

AnswerD

A safety classifier directly catches and mitigates harmful content without modifying the base model.

Why this answer

Option B is correct because applying a safety classifier as an output filter catches harmful content without retraining. Option A is wrong because retraining on more data may not address specific safety issues. Option C is wrong because lowering temperature reduces creativity but not necessarily offensive content.

Option D is wrong because system instruction alone is often insufficient for robust safety.

Practice this question →

118

MCQeasy

A developer is using Vertex AI PaLM 2 to generate product descriptions. The output is often too verbose and includes irrelevant details. Which technique should the developer apply?

A.Set top_p to 0.1

B.Enable safety filters

C.Use few-shot prompting with examples of concise descriptions

D.Increase temperature to 0.9

AnswerC

Guides the model to match the style of provided examples.

Why this answer

Option C is correct because the developer needs to constrain the model's output to be concise and relevant. Few-shot prompting provides the model with explicit examples of the desired output format (concise descriptions), guiding it to mimic that style and length. This directly addresses verbosity and irrelevant details without altering the model's fundamental randomness or safety settings.

Exam trap

The trap here is that candidates confuse hyperparameter tuning (top_p, temperature) with prompt engineering techniques, assuming that reducing randomness (top_p) or increasing creativity (temperature) can fix verbosity, when only explicit examples in the prompt can reliably enforce a specific output style.

How to eliminate wrong answers

Option A is wrong because setting top_p to 0.1 reduces the cumulative probability threshold for token sampling, which makes the output less diverse and more deterministic, but it does not teach the model to be concise or omit irrelevant details—it only narrows the pool of possible next tokens. Option B is wrong because safety filters block harmful or sensitive content (e.g., toxicity, violence), not verbose or irrelevant details; they do not control output length or relevance. Option D is wrong because increasing temperature to 0.9 increases randomness and creativity in token selection, which would likely make the output even more verbose and include more irrelevant details, the opposite of what is needed.

Practice this question →

119

MCQhard

Refer to the exhibit. The team changed the generation parameters to reduce output variability. However, summaries now often repeat the same phrases. Which parameter change is most likely causing the repetition?

A.Reducing top_p from 0.95 to 0.85

B.Reducing temperature from 0.7 to 0.2

C.Using the same model text-bison@002

D.Reducing top_k from 40 to 10

AnswerB

Low temperature increases determinism and repetition.

Why this answer

Lowering temperature to 0.2 makes the model more deterministic, increasing repetition. Option A is wrong because top_k reduction also contributes to determinism. Option B is wrong because top_p reduction also narrows token selection.

Option D is wrong because the model is the same.

Practice this question →

120

MCQeasy

A company is using Vertex AI to generate customer support summaries from chat logs. They notice that the summaries sometimes include irrelevant details from the conversation. Which technique should they use to reduce irrelevant details?

A.Use a higher top-k value.

B.Fine-tune the model on a large dataset of general conversations.

C.Add a system instruction to focus on key points.

D.Increase the temperature parameter.

AnswerC

This guides the model to produce concise, relevant summaries.

Why this answer

Option A is correct because adding a system instruction to focus on key points directly guides the model to omit irrelevant details. Option B (increasing temperature) would increase randomness and potentially introduce more irrelevant content. Option C (using a higher top-k value) increases diversity of word choices, not relevance.

Option D (fine-tuning on general conversations) is not targeted and may not resolve the specific issue.

Practice this question →

121

MCQeasy

A developer is using the Gemini API to build a chatbot. They want the model to always respond in a friendly, professional tone. Which prompt engineering technique should they use?

A.Set system instructions to 'You are a friendly and professional assistant.'

B.Include a few-shot example in every user message.

C.Set the temperature to 0.2.

D.Set max output tokens to 100.

AnswerA

System instructions define the assistant's behavior for the entire session.

Why this answer

Option A is correct because setting system instructions is the most direct and reliable way to define the model's persona and behavioral constraints. In the Gemini API, system instructions act as a persistent, top-level directive that influences every response, ensuring the chatbot consistently adopts a friendly and professional tone without requiring repeated examples or parameter tuning.

Exam trap

Google Cloud often tests the distinction between controlling output style (system instructions) versus controlling output randomness (temperature) or length (max tokens), so the trap here is that candidates may confuse temperature or token limits with persona control, thinking that lowering creativity or capping length will enforce a specific tone.

How to eliminate wrong answers

Option B is wrong because including a few-shot example in every user message is inefficient and not a persistent technique; it would require repeating the example in each turn, increasing token usage and latency, and it does not guarantee consistent tone across all interactions. Option C is wrong because setting the temperature to 0.2 controls randomness and creativity, not tone; a low temperature makes outputs more deterministic but does not enforce a specific persona or style. Option D is wrong because setting max output tokens to 100 limits response length but has no effect on the tone or style of the output; it only truncates the response.

Practice this question →

← PreviousPage 2 of 2 · 121 questions total

Ready to test yourself?

Try a timed practice session using only Techniques to Improve Generative AI Model Output questions.

Start 20-question session