CCNA Gcl Genai Concepts Tech Questions

75 of 122 questions · Page 1/2 · Gcl Genai Concepts Tech topic · Answers revealed

1
MCQmedium

A developer is using the Gemini API for text generation and finds that the outputs are too repetitive. Which parameter adjustment is most likely to increase output diversity?

A.Increase the top-k value from 20 to 50
B.Increase the temperature from 0.7 to 1.2
C.Decrease the temperature from 0.7 to 0.2
D.Decrease the top-p value from 0.9 to 0.5
AnswerB

Higher temperature increases randomness, reducing repetitiveness.

Why this answer

Increasing the temperature parameter from 0.7 to 1.2 increases the randomness of token selection by scaling the logits before applying the softmax function, which flattens the probability distribution. This makes lower-probability tokens more likely to be chosen, directly reducing repetitiveness in generated text. In the Gemini API, temperature values above 1.0 amplify diversity, while values below 1.0 make outputs more deterministic and repetitive.

Exam trap

Cisco often tests the misconception that decreasing top-p increases diversity, when in fact it restricts the token pool and reduces randomness. Increasing top-k does widen the pool, but temperature is the direct parameter for controlling output variability.

How to eliminate wrong answers

Option A is wrong because increasing top-k from 20 to 50 actually expands the pool of candidate tokens from the 20 most likely to the 50 most likely, which can increase diversity but is less effective than temperature for reducing repetition, and may introduce irrelevant tokens without addressing the core probability distribution. Option C is wrong because decreasing temperature from 0.7 to 0.2 sharpens the probability distribution, making the model more deterministic and likely to repeat high-probability tokens, thus worsening repetitiveness. Option D is wrong because decreasing top-p from 0.9 to 0.5 narrows the cumulative probability threshold, restricting token selection to a smaller, higher-probability set, which reduces diversity and increases repetitiveness.
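The effect of temperature described above can be illustrated with a small sketch. It assumes the common formulation in which logits are divided by the temperature before the softmax; the logit values are hypothetical and the function name is illustrative, not part of the Gemini API.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for four candidate tokens (not taken from any real model).
logits = [2.0, 1.0, 0.5, 0.1]

for t in (0.2, 0.7, 1.2):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature rises, the share of probability held by the top token shrinks and the lower-ranked tokens gain, which is why sampled outputs become less repetitive.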

2
Multi-Selectmedium

A company is fine‑tuning a large language model for a domain‑specific task. They have a limited budget and want to minimize the cost of fine‑tuning. Which TWO approaches are most cost‑effective?

Select 2 answers
A.Use a smaller base model like Gemini Flash
B.Use full fine‑tuning for better quality
C.Increase the number of training epochs
D.Use adapter‑based fine‑tuning (LoRA)
E.Use a larger base model like Gemini Ultra
AnswersA, D

Smaller models require less compute for both training and inference.

Why this answer

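Option A is correct because using a smaller base model like Gemini Flash reduces the computational resources (GPU memory, training time) required for fine-tuning, directly lowering cost while still achieving adequate performance for domain-specific tasks. Smaller models have fewer parameters, which means fewer floating-point operations (FLOPs) per training step, making them more budget-friendly. Option D is correct because adapter-based fine-tuning such as LoRA freezes the base model weights and trains only small low-rank adapter matrices, which cuts the memory and compute needed compared with full fine-tuning.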

Exam trap

Cisco often tests the misconception that larger models or full fine-tuning always yield better results, but the trap here is that cost-effectiveness prioritizes resource efficiency over raw quality, and adapter methods like LoRA provide a practical trade-off that candidates overlook.

3
MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Train a custom model from scratch on the policy documents each month
B.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store
C.Fine-tune a base LLM on the policy documents monthly
D.Use a larger foundation model with a longer context window and paste all documents into each prompt
AnswerB

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

RAG (Retrieval-Augmented Generation) allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining. Training from scratch or fine-tuning requires expensive retraining for each update, and pasting all documents into every prompt is limited by the context window and adds cost and latency to each request.
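The retrieve-then-prompt flow can be sketched in a few lines. This is a toy illustration only: it assumes word-count vectors in place of a real embedding model and an in-memory list in place of a vector store, and the policy sentences and function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]

def build_prompt(query, chunks, k=1):
    """Assemble a grounded prompt from the retrieved chunks."""
    context = "\n".join(retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical policy snippets standing in for the indexed documents.
policy_chunks = [
    "Employees may work remotely up to three days per week.",
    "Expense reports must be submitted within 30 days of purchase.",
    "Annual leave requests require manager approval two weeks in advance.",
]

print(build_prompt("How many days can I work remotely?", policy_chunks))
```

Updating the chatbot's knowledge here means replacing entries in `policy_chunks`; the model itself is never retrained.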

4
MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Train a custom model from scratch on the policy documents each month
B.Fine-tune a base LLM on the policy documents monthly
C.Use a larger foundation model with a longer context window and paste all documents into each prompt
D.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store
AnswerD

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

RAG (Retrieval-Augmented Generation) allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining. Training from scratch or fine-tuning requires expensive retraining for each update, and pasting all documents into every prompt is limited by the context window and adds cost and latency to each request.

5
Multi-Selectmedium

A data scientist wants to build a question-answering system over a large corpus of scientific papers. They want to minimize hallucinations and keep the knowledge current. Which TWO techniques should they combine?

Select 2 answers
A.Fine-tuning the model on the corpus
B.Retrieval-Augmented Generation (RAG)
C.Using a zero-shot prompt
D.Using only the largest Gemini model with no retrieval
E.Increasing the model's temperature to 1.5
AnswersA, B

Fine-tuning adapts the model to the domain, improving answer quality while RAG provides fresh knowledge.

Why this answer

Fine-tuning the model on the corpus (A) adapts the model's weights to the specific domain and style of scientific papers, improving relevance and reducing factual errors. Retrieval-Augmented Generation (RAG) (B) grounds each answer in retrieved, up-to-date passages from the corpus, directly countering hallucinations and enabling knowledge updates without retraining. Together, they combine domain adaptation with dynamic retrieval for accurate, current responses.

Exam trap

Cisco often tests the misconception that a larger model alone or higher temperature can solve hallucinations, when in fact grounding via retrieval and domain adaptation are the proven mitigations.

6
MCQmedium

A financial institution wants to use Gemini to analyze customer support transcripts and generate summaries. They need to ensure that personally identifiable information (PII) is not included in the summaries. Which approach should they take?

A.Preprocess the transcripts with Cloud DLP API to redact PII before sending to Gemini
B.Use a carefully engineered prompt instructing Gemini not to include PII
C.Post‑process the generated summaries with a regex filter to remove PII
D.Fine‑tune Gemini to avoid generating PII
AnswerA

Redacting PII upstream ensures the model never receives sensitive data, providing a robust solution for compliance.

Why this answer

Option A is correct because the Cloud Data Loss Prevention (DLP) API provides a purpose-built, scalable service to detect and redact PII from text before it reaches the Gemini model. This ensures that sensitive data is removed at the source, preventing any possibility of leakage in the generated summary, regardless of model behavior or prompt engineering.

Exam trap

Cisco often tests the misconception that prompt engineering or post-processing can reliably handle security requirements, when in fact redacting PII with a dedicated service such as Cloud DLP before model inference is the more robust approach.

How to eliminate wrong answers

Option B is wrong because prompt engineering alone cannot guarantee PII removal; Gemini may still inadvertently include PII due to model hallucinations, context misinterpretation, or adversarial inputs. Option C is wrong because post-processing with a regex filter is brittle and cannot reliably catch all PII formats (e.g., context-dependent identifiers, non-standard patterns), and PII may have already been exposed in the model's output. Option D is wrong because fine-tuning Gemini to avoid generating PII is impractical and risky; it requires extensive labeled data, may degrade model performance, and cannot guarantee complete PII avoidance across all edge cases.
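The ordering that matters here (redact first, then call the model) can be sketched as follows. Everything in this block is a stand-in: `redact_pii` uses two simple patterns purely to make the example runnable and is not the Cloud DLP API, and `fake_model` is a hypothetical placeholder for a Gemini call.

```python
import re

# Deliberately simple patterns, used only so the sketch runs on its own.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

def redact_pii(text):
    """Hypothetical stand-in for a managed inspection and redaction service."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

def summarize(transcript, model_call):
    """Redact upstream so the model only ever receives sanitized text."""
    safe = redact_pii(transcript)
    return model_call(f"Summarize this support transcript:\n{safe}")

seen_prompts = []

def fake_model(prompt):
    """Placeholder model that records what it was sent."""
    seen_prompts.append(prompt)
    return "summary placeholder"

summarize("Call me at 555-123-4567 or jane@example.com about my card.", fake_model)
print(seen_prompts[0])
```

In a real pipeline the stand-in would be replaced by a call to the managed redaction service; the point of the sketch is only that the raw transcript never reaches the model.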

7
MCQhard

A financial services firm needs to deploy a generative AI model that generates reports from structured and unstructured data. The solution must ensure that outputs never contain sensitive customer information. Which combination of Google Cloud services should they use?

A.Vertex AI Gemini API with DLP integration + IAM roles + VPC Service Controls
B.Vertex AI Gemini API + Cloud Key Management Service (KMS) + Secret Manager
C.Vertex AI Gemini API + Cloud Data Loss Prevention (DLP) standalone + Cloud NAT
D.Vertex AI Gemini API + Cloud DLP + Cloud Armor
AnswerA

DLP integration inspects and redacts sensitive data in prompts/responses; IAM and VPC-SC enforce access controls and network perimeter.

Why this answer

Vertex AI's DLP integration can inspect and redact PII from prompts and responses. Combined with IAM roles and VPC-SC, this provides comprehensive data protection. The other options either lack DLP integration or do not cover both prompt and response inspection.

8
MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Use a larger foundation model with a longer context window and paste all documents into each prompt
B.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store
C.Train a custom model from scratch on the policy documents each month
D.Fine-tune a base LLM on the policy documents monthly
AnswerB

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

RAG (Retrieval-Augmented Generation) allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining. Training from scratch or fine-tuning requires expensive retraining for each update, and pasting all documents into every prompt is limited by the context window and adds cost and latency to each request.

9
MCQeasy

What is the primary purpose of the temperature parameter in a generative language model?

A.It defines the context window size for the prompt
B.It controls the tradeoff between creativity and determinism
C.It limits the vocabulary to the top K tokens
D.It sets the maximum number of tokens in the output
AnswerB

Temperature adjusts the probability distribution: lower values sharpen the distribution (more likely tokens), higher values flatten it (more diverse outputs).

Why this answer

Option B is correct because the temperature parameter directly controls the probability distribution over the next token. A lower temperature (e.g., 0.1) makes the model more deterministic by favoring high-probability tokens, while a higher temperature (e.g., 1.5) flattens the distribution, increasing randomness and creative output. This tradeoff is fundamental to balancing coherence with novelty in generative models.

Exam trap

Cisco often tests the confusion between temperature and other sampling parameters (top-k, top-p) — candidates mistakenly think temperature controls output length or vocabulary size, when it strictly governs the randomness of token selection.

How to eliminate wrong answers

Option A is wrong because the context window size is determined by the model's architecture (e.g., 1024 tokens for GPT-2, 8192 for the base GPT-4 model), not by the temperature parameter. Option C is wrong because limiting vocabulary to the top K tokens is the function of the top-k sampling parameter, not temperature. Option D is wrong because the maximum number of output tokens is set by a separate output-length parameter in the API call (for example, max_tokens or max_output_tokens, depending on the API), not by temperature.

10
MCQmedium

A data science team wants to compare semantic similarity between thousands of customer reviews to identify emerging themes. Which Google Cloud service and approach should they use?

A.Use Vertex AI Embeddings API to generate embeddings, index them in Vertex AI Vector Search, and query for nearest neighbors
B.Use BigQuery ML to train a custom similarity model on the reviews
C.Use the Gemini API to compute semantic similarity directly
D.Use Vertex AI Embeddings API to generate embeddings and then compute cosine similarity pairwise
AnswerA

This scalable approach allows efficient similarity search across large datasets.

Why this answer

Option A is correct because Vertex AI Embeddings API generates dense vector representations of text, which can be indexed in Vertex AI Vector Search for efficient approximate nearest neighbor (ANN) search. This approach scales to thousands of reviews and allows the team to query for the most semantically similar reviews, enabling theme discovery without pairwise computation.

Exam trap

Cisco often tests the distinction between generating embeddings and efficiently querying them at scale; the trap here is that candidates may think pairwise cosine similarity is sufficient, ignoring the scalability requirement implied by 'thousands of customer reviews.'

How to eliminate wrong answers

Option B is wrong because training a custom similarity model in BigQuery ML adds unnecessary training effort when a pretrained embedding model already captures semantic similarity. Option C is wrong because the Gemini API is a multimodal generative model, not a dedicated similarity computation service; using it directly for pairwise similarity would be inefficient and not designed for large-scale nearest neighbor queries. Option D is wrong because computing cosine similarity pairwise requires O(n²) comparisons, which grows quickly as the review set expands and provides no index for repeated queries; Vertex AI Vector Search provides efficient indexing and retrieval.
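To make the scaling argument concrete, here is a minimal sketch of brute-force nearest-neighbor lookup with cosine similarity, plus a count of how many comparisons an all-pairs approach needs. The three-dimensional vectors are hypothetical stand-ins for real embeddings, and the exhaustive search shown is what an approximate nearest neighbor index is meant to avoid.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(query_vec, vectors, k=2):
    """Exhaustively rank all vectors by similarity to the query."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine_similarity(query_vec, vectors[i]),
        reverse=True,
    )
    return ranked[:k]

def pairwise_comparisons(n):
    """Number of distinct pairs among n items."""
    return n * (n - 1) // 2

# Hypothetical embeddings for three reviews.
embeddings = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.1, 0.9],
]

print(nearest([1.0, 0.0, 0.0], embeddings, k=2))
print(pairwise_comparisons(5000))
```

The pair count grows quadratically with the number of reviews, whereas an indexed search answers each query without scanning every stored vector.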

11
Multi-Selecthard

A data scientist is evaluating the output quality of a text generation model. They observe that the model often repeats phrases and produces very generic responses. Which THREE parameter adjustments could help increase diversity and reduce repetition? (Choose three.)

Select 3 answers
A.Increase the top-k value from 40 to 100
B.Increase the temperature from 0.5 to 0.9
C.Decrease the max output tokens from 1024 to 512
D.Increase the top-p value from 0.8 to 0.95
E.Decrease the frequency penalty to 0.0
AnswersA, B, D

Higher top-k considers more candidate tokens, increasing the chance of less common tokens.

Why this answer

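Higher temperature increases randomness, reducing repetition. Higher top-k and top-p admit a wider set of candidate tokens, increasing diversity. Reducing max output tokens only shortens the response and does not affect diversity. Decreasing the frequency penalty to 0.0 removes the penalty on already-used tokens, so it does not help; increasing the frequency penalty is what reduces repetition directly.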
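A minimal sketch of how top-k and top-p narrow or widen the candidate pool, assuming the usual definitions (top-k keeps the k most likely tokens; top-p keeps the smallest set of top tokens whose cumulative probability reaches p). The token probabilities are hypothetical.

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    kept = []
    cumulative = 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}

# Hypothetical next-token distribution.
probs = {"the": 0.5, "a": 0.25, "this": 0.125, "that": 0.0625, "one": 0.0625}

print(len(top_k_filter(probs, 2)), len(top_k_filter(probs, 4)))
print(len(top_p_filter(probs, 0.8)), len(top_p_filter(probs, 0.95)))
```

Raising either value lets more tokens survive the filter, so sampling has more alternatives to choose from.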

12
MCQeasy

A startup wants to build a text-to-speech application for generating audiobooks. Which Google Cloud generative AI service is best suited for this task?

A.Imagen
B.Chirp
C.Codey
D.Gemini
AnswerB

Chirp is Google's model for generating natural-sounding speech from text.

Why this answer

Chirp is Google Cloud's text-to-speech AI model. Imagen generates images, Codey generates code, and Gemini is a multimodal model not specialized for high-quality speech synthesis.

13
MCQhard

A financial services firm is deploying a Gemini-based application that must comply with GDPR. The application processes customer queries and may include personal data. Which Google Cloud capability should the firm use to ensure that the model does not expose personally identifiable information (PII) in its responses?

A.Set the model's safety filters to block all categories at maximum threshold
B.Integrate Cloud Sensitive Data Protection (DLP) to inspect and redact PII from prompts and responses
C.Use Vertex AI's Model Garden with a restricted access policy
D.Fine-tune the model on a dataset that has no PII
AnswerB

DLP can be used to scan for PII and redact it before sending to the model or after receiving the response, ensuring GDPR compliance.

Why this answer

Sensitive Data Protection (formerly DLP) can be integrated into the GenAI pipeline to inspect and redact PII from prompts and responses, ensuring compliance.

14
Multi-Selecthard

A team is evaluating whether to use reinforcement learning from human feedback (RLHF) or in-context learning for a chatbot. Which TWO statements correctly describe trade-offs? (Select two.)

Select 2 answers
A.RLHF can align model behavior without any human input.
B.In-context learning is generally more cost-effective than RLHF for one-off tasks.
C.RLHF requires a large dataset of human preferences and additional training.
D.In-context learning permanently modifies model weights.
E.In-context learning can handle longer contexts than RLHF fine-tuned models.
AnswersB, C

In-context learning requires no training, so it's cheaper for small-scale use.

Why this answer

Option B is correct because in-context learning (ICL) uses examples provided in the prompt at inference time, requiring no additional training or data collection, making it far more cost-effective for one-off tasks compared to RLHF, which demands a large dataset of human preference labels and a full fine-tuning pipeline. RLHF's upfront cost in data labeling and compute is only justified when the model needs persistent alignment across many interactions.

Exam trap

Cisco often tests the misconception that in-context learning modifies model weights or that RLHF can operate without human input, so candidates must remember that ICL is purely prompt-based and RLHF is inherently human-dependent.

15
MCQmedium

A developer is building a code‑generation assistant using the Codey API on Vertex AI. The assistant should generate Python functions based on natural language descriptions. However, the generated code sometimes contains syntax errors. Which parameter adjustment would MOST directly help reduce syntax errors?

A.Lower the temperature (e.g., from 0.8 to 0.2)
B.Increase the context window
C.Set top-k to 1
D.Increase the max output tokens
AnswerA

Lower temperature makes outputs more conservative and less random, reducing the likelihood of generating invalid syntax.

Why this answer

Reducing temperature makes the model more deterministic, which typically reduces creative but incorrect outputs like syntax errors. Prompt engineering can also help, but adjusting temperature is the simplest direct fix. Increasing max tokens or changing top-k does not directly address syntax correctness.

16
MCQmedium

A developer is building a code generation assistant using Codey. They notice that the generated code sometimes contains deprecated API calls. What is the most likely cause?

A.The top-p sampling is too low, limiting the model's vocabulary
B.The temperature setting is too high, causing creative but incorrect outputs
C.The context window is too short to include relevant API documentation
D.Codey's training data has a knowledge cutoff date before the deprecation
AnswerD

The model's pre-training data cutoff means it does not know about recent deprecations.

Why this answer

Option D is correct because Codey, like all large language models, is trained on a static dataset with a specific knowledge cutoff date. If the training data predates the deprecation of certain APIs, the model will not be aware of the newer, recommended alternatives and will continue to generate code using the deprecated calls. This is a fundamental limitation of the model's training data recency, not a parameter tuning issue.

Exam trap

The trap here is that candidates often confuse model parameter settings (like temperature or top-p) with the model's training data limitations, leading them to incorrectly select options A or B instead of recognizing the knowledge cutoff as the root cause.

How to eliminate wrong answers

Option A is wrong because top-p sampling controls the cumulative probability threshold for token selection, affecting output diversity, not the model's knowledge of deprecated APIs. Option B is wrong because temperature controls the randomness of token selection; while high temperature can increase creativity, it does not cause the model to generate deprecated APIs it is unaware of from its training data. Option C is wrong because the context window length determines how much input text the model can consider at once, but it does not affect the model's inherent knowledge of API deprecation dates; even with a long context, the model will still generate deprecated calls if its training data lacks that information.

17
Multi-Selecthard

A company is deploying a generative AI application using Vertex AI. They need to minimize latency for real‑time inference while maintaining high quality. Which TWO actions are most effective?

Select 2 answers
A.Batch multiple inference requests together
B.Use a smaller model like Gemini Flash
C.Use Gemini Pro instead of Gemini Flash to ensure quality
D.Reduce the max output tokens to the minimum acceptable length
E.Increase the temperature to 1.0 for more creative outputs
AnswersB, D

Flash is optimized for lower latency.

Why this answer

Using a smaller model (e.g., Gemini Flash) reduces latency, and reducing the max output tokens limits generation time. Increasing temperature does not affect latency; using a larger model increases latency; batching is for throughput, not single‑request latency.
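A rough back-of-the-envelope model shows why both levers help. It assumes single-request latency is approximately the time to first token plus a fixed decode time per output token; every number below is hypothetical and not a measurement of any Gemini model.

```python
def estimate_latency_ms(time_to_first_token_ms, ms_per_output_token, output_tokens):
    """Approximate latency as startup time plus sequential decode time."""
    return time_to_first_token_ms + ms_per_output_token * output_tokens

# Hypothetical figures chosen only to illustrate the trend.
larger_model_long = estimate_latency_ms(300, 20, 1024)
larger_model_short = estimate_latency_ms(300, 20, 256)
smaller_model_short = estimate_latency_ms(150, 8, 256)

print(larger_model_long, larger_model_short, smaller_model_short)
```

Because tokens are generated one after another, capping output length reduces the decode term directly, while a smaller model lowers the cost of each token.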

18
Multi-Selectmedium

A data scientist wants to apply reinforcement learning from human feedback (RLHF) to improve a chatbot's helpfulness. Which TWO steps are part of the RLHF process? (Select 2)

Select 2 answers
A.Use prompt engineering to tune the model without retraining
B.Collect human-annotated demonstrations of ideal responses
C.Collect human rankings or preferences on multiple model outputs
D.Deploy the model in A/B testing to gather implicit feedback
E.Train a reward model based on human preferences
AnswersC, E

Human feedback is used to train a reward model.

Why this answer

RLHF typically involves collecting human rankings of model outputs and then training a reward model to score outputs; the reward model is then used to fine-tune the chatbot with a reinforcement learning algorithm such as PPO.
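The reward-model step can be sketched with a pairwise preference loss. This assumes one common formulation, in which the loss is the negative log-sigmoid of the reward margin between the preferred and rejected outputs; the reward values are hypothetical scalars rather than outputs of a trained model.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Negative log-sigmoid of the margin between chosen and rejected rewards."""
    margin = reward_chosen - reward_rejected
    return math.log(1.0 + math.exp(-margin))

# The loss is small when the reward model agrees with the human ranking
# and large when it scores the rejected output higher.
print(round(preference_loss(2.0, 0.0), 4))
print(round(preference_loss(0.0, 0.0), 4))
print(round(preference_loss(0.0, 2.0), 4))
```

Minimizing this loss over many human-ranked pairs pushes the reward model to score preferred outputs higher, which is what the later reinforcement learning step optimizes against.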

19
MCQeasy

What is the primary function of embeddings in the context of generative AI?

A.To control the creativity of the model's output
B.To define the maximum length of the generated text
C.To convert tokens into a fixed-length vector that captures semantic meaning
D.To determine the next token in a sequence during generation
AnswerC

Embeddings map words, sentences, or images into vectors that represent semantic meaning, used for similarity search and retrieval.

Why this answer

Embeddings are numerical representations of data that capture semantic meaning, enabling vector search and semantic similarity. They are not used for token generation directly, and they are not the same as temperature or context window.

20
Multi-Selecteasy

A team is deciding between using fine-tuning and in-context learning for a document classification task. They have 500 labeled examples and need low latency. Which TWO statements are accurate?

Select 2 answers
A.In-context learning always has lower latency than fine-tuning
B.In-context learning can only be used with models that have a context window smaller than 1000 tokens
C.Fine-tuning eliminates the need for a validation dataset
D.Fine-tuning generally improves accuracy more than in-context learning when sufficient labeled data is available
E.In-context learning requires no training step and can be used immediately
AnswersD, E

With 500 examples, fine-tuning can adapt the model better to the task, often yielding higher accuracy.

Why this answer

Option D is correct because fine-tuning updates the model's weights on a labeled dataset, which generally leads to higher accuracy than in-context learning when sufficient labeled data (like 500 examples) is available. In-context learning relies on the model's pre-existing knowledge and a few examples in the prompt, which often yields lower accuracy for complex classification tasks. Option E is correct because in-context learning supplies examples in the prompt at inference time, so no training step is needed before first use.

Exam trap

Cisco often tests the misconception that in-context learning always has lower latency than fine-tuning, but the trap is that latency depends on prompt length and model architecture, not just the absence of a training step.

21
MCQhard

A company uses Gemini 1.5 Pro to analyze customer call transcripts and generate summaries. They notice that the summaries occasionally include fabricated details that were not in the transcript. Which technique is specifically designed to reduce such hallucinations?

A.Ground the model responses by implementing Retrieval-Augmented Generation (RAG) with the transcript as a source
B.Decrease the temperature to 0.0
C.Use a system prompt that instructs the model not to make up information
D.Fine-tune the model on a larger dataset of call transcripts
AnswerA

RAG retrieves relevant transcript segments and provides them to the model as context, grounding the output in actual data and reducing hallucinations.

Why this answer

Grounding with RAG retrieves factual data from trusted sources to condition the model's response, directly reducing hallucinations. Prompt engineering can help but does not guarantee factual accuracy; fine-tuning may reduce hallucinations but does not eliminate them; and lowering the temperature makes output more deterministic but does not ground it in the source data.

22
MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store
B.Use a larger foundation model with a longer context window and paste all documents into each prompt
C.Train a custom model from scratch on the policy documents each month
D.Fine-tune a base LLM on the policy documents monthly
AnswerA

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

RAG (Retrieval-Augmented Generation) allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining. Training from scratch or fine-tuning requires expensive retraining for each update, and pasting all documents into every prompt is limited by the context window and adds cost and latency to each request.

23
MCQhard

A company is fine-tuning a large language model for a domain-specific legal document summarization task. They have limited labeled data but want to adapt the model efficiently without catastrophic forgetting. Which technique is most suitable?

A.Reinforcement Learning from Human Feedback (RLHF)
B.In-context learning with few-shot examples in prompts
C.Supervised fine-tuning using LoRA adapters
D.Full fine-tuning of all model parameters
AnswerC

LoRA adapters are parameter-efficient, reducing risk of forgetting and requiring less data.

Why this answer

LoRA (Low-Rank Adaptation) is the most suitable technique because it enables parameter-efficient fine-tuning by injecting trainable low-rank matrices into the transformer layers, drastically reducing the number of updated parameters. This preserves the pre-trained knowledge and prevents catastrophic forgetting, even with limited labeled data, while efficiently adapting the model to domain-specific legal summarization.

Exam trap

Cisco often tests the misconception that in-context learning (Option B) is sufficient for domain adaptation, but the trap here is that it does not modify model weights and thus cannot achieve the deep, consistent specialization required for tasks like legal document summarization.

How to eliminate wrong answers

Option A is wrong because RLHF is a post-training alignment technique that uses human preferences to optimize model behavior, not a method for efficient domain adaptation with limited labeled data; it requires extensive human feedback and does not directly address catastrophic forgetting during fine-tuning. Option B is wrong because in-context learning with few-shot examples does not update model parameters, so it cannot achieve deep domain adaptation for a specialized task like legal summarization, and it is limited by context window size and prompt sensitivity. Option D is wrong because full fine-tuning updates all model parameters, which is computationally expensive, requires large labeled datasets, and significantly increases the risk of catastrophic forgetting when data is scarce.
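The parameter savings behind LoRA can be checked with simple arithmetic. The sketch assumes the standard low-rank form, in which a frozen weight matrix of shape d_out × d_in is adapted by two trainable matrices of shapes d_out × r and r × d_in; the layer size and rank below are hypothetical.

```python
def full_finetune_params(d_out, d_in):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in

def lora_params(d_out, d_in, rank):
    """Trainable parameters for the two low-rank adapter matrices."""
    return rank * (d_out + d_in)

# Hypothetical projection layer and adapter rank.
d_out, d_in, rank = 4096, 4096, 8

full = full_finetune_params(d_out, d_in)
adapter = lora_params(d_out, d_in, rank)
print(full, adapter, adapter / full)
```

With far fewer trainable parameters per layer and the base weights left untouched, the adapter has less room to overwrite pre-trained knowledge, which is the intuition behind the reduced risk of catastrophic forgetting.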

24
MCQeasy

A developer wants to generate high-quality images from text descriptions using Google Cloud. Which service should they use?

A.Chirp
B.Imagen
C.Codey
D.Gemini
AnswerB

Imagen is Google's image generation model.

Why this answer

Imagen is Google Cloud's text-to-image diffusion model, specifically designed to generate high-quality images from natural language descriptions. It leverages deep learning to produce photorealistic outputs, making it the correct choice for this use case.

Exam trap

The trap here is that candidates may confuse Gemini's multimodal capabilities with a dedicated text-to-image service, overlooking that Imagen is the specialized tool for high-quality image generation from text.

How to eliminate wrong answers

Option A is wrong because Chirp is a speech model (speech recognition and synthesis), not an image generation model. Option C is wrong because Codey is a code generation model for writing and completing code, not for creating images. Option D is wrong because Gemini is a multimodal large language model that can process text, images, and code, but it is not optimized or primarily designed for text-to-image generation; Imagen is the dedicated service for that task.

25
MCQeasy

What is the primary purpose of the temperature parameter when using a generative language model?

A.It sets the maximum number of tokens the model can generate in a single response
B.It determines the number of candidate tokens considered at each step
C.It adjusts the similarity threshold for vector search retrieval
D.It controls the randomness of token selection; lower values make output more deterministic
AnswerD

Temperature affects the probability distribution — low temperature sharpens the distribution, making high-probability tokens more likely.

Why this answer

Temperature controls the randomness of token selection. Lower values produce more deterministic and conservative outputs, while higher values increase diversity and creativity.

26
MCQmedium

A developer is building a multimodal application that needs to analyze images, understand spoken language, and generate text responses. Which Google Cloud generative AI model is BEST suited for this task?

A.PaLM 2 for text generation
B.Imagen for image generation
C.Chirp for speech recognition
D.Gemini 1.5 Pro
AnswerD

Gemini 1.5 Pro is a multimodal model capable of processing text, images, audio, and video in a single model.

Why this answer

Gemini models are natively multimodal, accepting text, image, audio, and video inputs. PaLM 2 is text-only. Imagen is for image generation. Chirp is for speech recognition/synthesis.

27
MCQhard

A machine learning engineer is fine-tuning a large language model using LoRA (Low-Rank Adaptation) to reduce memory usage. During training, they notice that the model's performance on the downstream task is not improving. What is the most likely issue?

A.The rank r of the LoRA adapter is too low to capture the task complexity
B.The learning rate is too high, causing the loss to oscillate
C.The LoRA adapter is applied to all layers, which slows convergence
D.The base model is too large for the dataset, causing overfitting
AnswerA

A very low rank may not provide enough capacity for the adaptation, leading to underfitting.

Why this answer

When fine-tuning with LoRA, the rank r determines the expressiveness of the low-rank adaptation matrices. If r is too low, the adapter lacks the capacity to learn the necessary task-specific features, causing the model's performance to stagnate. This is the most likely issue because the engineer observes no improvement, indicating the adapter's representational power is insufficient for the downstream task complexity.
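
To make the capacity argument concrete, the following is a minimal sketch of how the rank r bounds the number of trainable parameters in one LoRA adapter. It assumes the common formulation in which a frozen d x k weight matrix receives an update B·A, with B of shape d x r and A of shape r x k. The function names and the 4096 x 4096 layer size are illustrative assumptions, not values from the question.

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Trainable parameters in one LoRA adapter: B is d x r, A is r x k."""
    return r * (d + k)


def full_params(d: int, k: int) -> int:
    """Parameters in the frozen d x k weight matrix that the adapter augments."""
    return d * k


if __name__ == "__main__":
    d = k = 4096  # hypothetical layer size, chosen only for illustration
    for r in (1, 8, 64):
        trainable = lora_trainable_params(d, k, r)
        share = trainable / full_params(d, k)
        print(f"r={r:>2}: {trainable:>7} trainable params ({share:.2%} of the layer)")
```

Because the update B·A has rank at most r, a very small r restricts both the parameter count and the kinds of weight changes the adapter can express, which is the underfitting risk the explanation describes.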

Exam trap

Cisco often tests the misconception that LoRA's rank is a hyperparameter that only affects memory and speed, not model capacity, leading candidates to overlook rank as the root cause of poor performance when the adapter is too constrained.

How to eliminate wrong answers

Option B is wrong because a high learning rate typically causes loss oscillation or divergence, not a complete lack of improvement; the engineer would see fluctuating loss values. Option C is wrong because applying LoRA to all layers is standard practice and does not inherently slow convergence; in fact, it can improve adaptation by allowing more layers to be fine-tuned. Option D is wrong because a large base model with a small dataset usually leads to overfitting, which would show improving training performance but poor validation, not a lack of improvement in downstream task performance.

28
MCQmedium

A company is building a chatbot that must answer questions based on a large internal knowledge base that is updated weekly. They want to avoid retraining the model frequently. Which technique should they use?

A.Use Retrieval-Augmented Generation (RAG) with a vector database
B.Use prompt engineering to instruct the model to ignore outdated information
C.Increase the model's context window and include all documents in the prompt
D.Fine-tune the model weekly on the updated knowledge base
AnswerA

RAG retrieves relevant documents from the latest knowledge base at query time, ensuring up-to-date responses.

Why this answer

RAG retrieves relevant documents at inference time, keeping answers up-to-date without retraining. Fine-tuning would require frequent retraining; prompt engineering alone cannot incorporate new knowledge.
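
The retrieve-then-generate flow can be sketched as follows. This is a toy illustration only: word overlap stands in for embedding similarity and a vector database, the knowledge base contents are made up, and the function names are hypothetical. The final prompt would be sent to whichever model the chatbot uses.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], top_n: int = 1) -> list[str]:
    """Rank documents by word overlap with the query (a stand-in for vector search)."""
    query_tokens = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_n]


def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved passages ahead of the question so the model can ground its answer."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {query}"


if __name__ == "__main__":
    # Hypothetical knowledge base; editing this list changes answers without retraining.
    knowledge_base = [
        "The VPN client must be updated to version 5 by Friday.",
        "Expense reports are due on the last day of each month.",
    ]
    question = "When are expense reports due?"
    print(build_prompt(question, retrieve(question, knowledge_base)))
```

The weekly update in the scenario corresponds to refreshing the indexed documents; the model weights are never touched.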

29
MCQmedium

An enterprise is deploying a generative AI solution that must comply with GDPR data residency requirements. They plan to use Vertex AI with Gemini. Which configuration is necessary?

A.Use the default settings on Vertex AI with Gemini, as Google automatically handles data residency
B.Fine‑tune Gemini on a private Google Cloud Storage bucket in the target region
C.Use the Gemini API directly with a restricted API key
D.Deploy Gemini on Vertex AI with VPC Service Controls and Customer-Managed Encryption Keys (CMEK)
AnswerD

VPC Service Controls place Vertex AI resources inside a service perimeter that limits data exfiltration; CMEK gives the customer control over encryption keys. Both are applied to resources created in the chosen region.

Why this answer

Option D is correct because GDPR data residency requires that data remain within a specific geographic boundary. VPC Service Controls provide a security perimeter that prevents data exfiltration from the chosen region, while CMEK ensures encryption keys are managed by the customer, not Google, meeting the control requirements for data residency compliance.

Exam trap

Cisco often tests the misconception that default settings or simple access controls (like API keys) are sufficient for data residency, when in fact explicit network and encryption controls are required to meet regulatory boundaries.

How to eliminate wrong answers

Option A is wrong because default settings on Vertex AI with Gemini do not automatically enforce data residency; Google's default infrastructure may replicate or process data across regions, violating GDPR requirements. Option B is wrong because fine-tuning Gemini on a private GCS bucket in the target region addresses storage residency but does not control data processing or inference runtime location, which could still occur outside the required region. Option C is wrong because using the Gemini API directly with a restricted API key only controls access, not data residency; the API may process requests in any Google Cloud region, failing to guarantee geographic data boundaries.

30
MCQmedium

A developer wants to generate Python code to extract data from a CSV file using a generative AI model on Google Cloud. Which model is specifically designed for code generation?

A.Gemini
B.Codey
C.Imagen
D.PaLM 2
AnswerB

Codey is Google's specialized code generation model.

Why this answer

Codey is the correct choice because it is Google Cloud's family of models specifically designed and fine-tuned for code generation tasks, including generating Python code from natural language prompts. Unlike general-purpose language models, Codey is trained on a large corpus of source code and can produce syntactically correct code snippets, making it ideal for extracting data from a CSV file.

Exam trap

Cisco often tests the distinction between general-purpose LLMs (like PaLM 2 or Gemini) and specialized models (like Codey for code or Imagen for images), so the trap is assuming any model that can handle code is the correct answer, when the question explicitly asks for a model 'specifically designed for code generation'.

How to eliminate wrong answers

Option A is wrong because Gemini is a multimodal model capable of understanding text, images, audio, and code, but it is not specifically designed or optimized for code generation as its primary function; Codey is the dedicated code generation model. Option C is wrong because Imagen is a text-to-image generation model, not designed for code generation tasks. Option D is wrong because PaLM 2 is a general-purpose large language model that can generate code but is not specifically designed or fine-tuned for code generation like Codey is.

31
MCQmedium

A startup is building a multimodal application that allows users to upload a photo of a plant and ask questions about its care. They want to use Google Cloud generative AI services. Which combination of services is MOST suitable?

A.Use Cloud Vision API to extract text from the image, then send that text to a text‑only model
B.Use Vertex AI Agent Builder with a text‑only model and ignore the image
C.Use Gemini on Vertex AI with a prompt that includes the image and the user’s question
D.Use Imagen to generate a description of the plant, then feed that description to a text model
AnswerC

Gemini natively handles image and text inputs, so a single prompt can process the photo and answer the question.

Why this answer

Option C is correct because Gemini on Vertex AI natively supports multimodal inputs, allowing the model to directly process the uploaded plant image alongside the user's care question in a single prompt. This eliminates the need for intermediate text extraction or image description generation, providing the most accurate and context-aware response for the application's requirements.

Exam trap

Cisco often tests the misconception that multimodal tasks require separate vision and language models chained together, when in fact a single multimodal model like Gemini can handle both modalities natively, offering superior accuracy and simplicity.

How to eliminate wrong answers

Option A is wrong because Cloud Vision API extracts text from images (e.g., OCR), not plant characteristics like species or health, and sending that text to a text-only model loses visual context critical for plant care advice. Option B is wrong because Vertex AI Agent Builder with a text-only model ignores the image entirely, making it impossible to answer questions based on the plant's visual appearance. Option D is wrong because Imagen is a text-to-image generation model, not an image-to-text description model; using it to generate a description would be technically incorrect and inefficient, and feeding that description to a text model introduces potential information loss and latency.

32
MCQmedium

A research team wants to generate high-quality images from text descriptions for a marketing campaign. They need the ability to edit specific regions of generated images while preserving the rest. Which Google Cloud AI service should they use?

A.Gemini 1.5 Pro with image input
B.Codey on Vertex AI
C.Veo on Vertex AI
D.Imagen on Vertex AI
AnswerD

Imagen is a text-to-image model that supports inpainting and editing capabilities through mask-based region editing.

Why this answer

Imagen supports inpainting (region editing via masks) natively. Gemini is multimodal but not primarily for image generation. Veo generates video, not images. Codey is for code.

33
MCQmedium

A developer is using the Gemini 1.5 Pro model via Vertex AI and needs to process a large PDF document (500 pages) to generate a summary. The developer tries to send the entire PDF in a single prompt but gets an error. What is the most likely cause?

A.The model does not support PDF input
B.The temperature setting is too low
C.The model requires the PDF to be converted to text first
D.The total token count of the PDF exceeds the context window limit
AnswerD

A 500-page PDF can easily exceed the context window, even with Gemini's large capacity.

Why this answer

Option D is correct because Gemini 1.5 Pro has a context window of up to 1 million tokens, but a 500-page PDF can easily exceed this limit depending on the density of text, images, and embedded content. When the total token count surpasses the model's maximum context window, the API returns an error, as the model cannot process the entire input in a single request.

Exam trap

Cisco often tests the misconception that models like Gemini 1.5 Pro have unlimited context windows or that PDFs must be pre-converted to text, when in fact the limitation is purely token-based and the model can natively handle PDF input.

How to eliminate wrong answers

Option A is wrong because Gemini 1.5 Pro natively supports PDF input directly via Vertex AI, including text extraction and multimodal understanding. Option B is wrong because the temperature setting controls randomness in output generation, not the ability to process large inputs or token limits. Option C is wrong because Gemini 1.5 Pro can process PDFs directly without requiring manual conversion to text; it handles PDF parsing internally.

34
MCQhard

A company is using a fine-tuned LLM for generating technical support responses. After deployment, they notice that the model sometimes produces incorrect but plausible-sounding answers (hallucinations). They have a large repository of verified technical manuals. Which technique would BEST reduce hallucinations while minimizing the need for additional training?

A.Increase the temperature parameter to make the model more conservative
B.Use a larger base model with a longer context window
C.Fine-tune the model again with a larger dataset of verified responses
D.Implement RAG by indexing the verified manuals and retrieving relevant sections during inference
AnswerD

RAG provides the model with factual context from the manuals at inference time, significantly reducing hallucinations without retraining.

Why this answer

RAG with the verified manuals as the knowledge base allows the model to ground its responses in authoritative sources without retraining. Fine-tuning again requires additional training, and the model might still hallucinate if the data is insufficient. Increasing the temperature makes output more random, not more conservative. Prompt engineering alone cannot eliminate hallucinations. Using a larger context window may help but is not as reliable as RAG.

35
MCQhard

A company is deploying a generative AI application that must comply with GDPR's right to explanation. The application must be able to justify its decisions. Which model or approach provides the MOST inherent interpretability?

A.Use Gemini 1.5 Pro with a system prompt asking for explanations
B.Use a smaller, simpler model that is inherently more interpretable, such as a logistic regression or decision tree
C.Use Retrieval-Augmented Generation (RAG) to ground responses in source documents
D.Use PaLM 2 with prompt engineering to provide step-by-step reasoning
AnswerB

Simpler models are transparent and decisions can be directly traced, satisfying the right to explanation more reliably.

Why this answer

Smaller, simpler models are inherently more interpretable. Large black-box models (Gemini, PaLM) are difficult to explain. RAG improves factual grounding but not interpretability of the model's reasoning. Prompting does not make the model's internal logic transparent.

36
MCQeasy

What is the fundamental difference between a foundation model and a fine-tuned model?

A.A foundation model is pre-trained on a large, diverse corpus; a fine-tuned model is adapted from a foundation model on a specific domain or task
B.Foundation models only generate text; fine-tuned models can generate images
C.Foundation models are open-source; fine-tuned models are proprietary
D.Foundation models require no inference infrastructure; fine-tuned models do
AnswerA

Foundation models like Gemini are pre-trained broadly; fine-tuning adapts them for specialized tasks.

Why this answer

A foundation model is pre-trained on broad data for general tasks, while a fine-tuned model is further trained on a specific dataset to specialize for a particular use case.

37
MCQhard

A data scientist is using Vertex AI to fine‑tune a PaLM 2 model for a legal document summarization task. They have 10,000 labeled document‑summary pairs. After supervised fine‑tuning, the model performs well on the training set but often hallucinates names and dates on unseen documents. Which next step is MOST likely to improve factual accuracy?

A.Increase the number of fine‑tuning epochs to 10
B.Use a larger base model like Gemini Ultra without fine‑tuning
C.Apply reinforcement learning from human feedback (RLHF) using a preference dataset that penalizes factual inaccuracies
D.Add more training examples from a different domain
AnswerC

RLHF can directly optimize for factual correctness by rewarding accurate summaries and penalizing hallucinations.

Why this answer

RLHF (Reinforcement Learning from Human Feedback) is specifically designed to align model outputs with human preferences, so a preference dataset that penalizes factually incorrect generations can reduce hallucinations. More data or longer training may not fix the underlying alignment issue. RAG is a separate approach and is not among the options; RLHF addresses the hallucination in the model's own behavior.

38
MCQhard

A data scientist is fine-tuning a large language model for a legal document summarization task. The dataset contains only 500 examples, and the model must not forget its general language capabilities. Which fine-tuning method is most suitable?

A.Retraining the model from scratch on the legal dataset
B.Adapter-based fine-tuning using LoRA
C.In-context learning with a few examples in the prompt
D.Full fine-tuning of all model parameters
AnswerB

LoRA fine-tunes a small set of adapter parameters, preserving general knowledge while adapting to the specific task with a small dataset.

Why this answer

LoRA (Low-Rank Adaptation) is an adapter-based fine-tuning method that updates a small number of parameters while keeping the base model frozen. This prevents catastrophic forgetting and works well with small datasets. Full fine-tuning would risk overfitting and losing general capabilities.

39
MCQeasy

What is the primary advantage of using embeddings and vector search for semantic search over traditional keyword search?

A.Faster retrieval speed
B.Lower storage requirements
C.No need for indexing
D.Ability to find documents with similar meaning even without exact keyword matches
AnswerD

Semantic search using embeddings captures context and synonyms.

Why this answer

Option D is correct because embeddings and vector search capture semantic meaning by converting text into high-dimensional vectors, enabling retrieval of documents with similar meaning even when they lack exact keyword matches. This is the primary advantage over traditional keyword search, which relies on literal term matching and fails with synonyms, paraphrases, or conceptual similarity.

Exam trap

Cisco often tests the misconception that vector search is faster or requires less storage than keyword search, but the real advantage is semantic understanding, not performance or resource efficiency.

How to eliminate wrong answers

Option A is wrong because vector search is typically slower than keyword search due to the computational cost of approximate nearest neighbor (ANN) algorithms and high-dimensional distance calculations, whereas keyword search uses inverted indexes for near-instant term lookups. Option B is wrong because embeddings and vector indexes often require more storage than keyword indexes, as each document is represented by a dense vector (e.g., 768 or 1536 dimensions) plus the index structure, while keyword indexes store sparse term-document mappings. Option C is wrong because vector search still requires indexing—specifically, building an ANN index (e.g., HNSW, IVF) to enable efficient similarity search; without indexing, a brute-force scan of all vectors would be impractical at scale.
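
The idea of matching by meaning rather than by keyword can be sketched with cosine similarity over vectors. The three-dimensional vectors below are hand-written stand-ins for real embeddings (which typically have hundreds of dimensions), and the brute-force scan is only practical for tiny collections; both are assumptions made for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; values near 1.0 mean similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vec: list[float], index: dict[str, list[float]]) -> str:
    """Brute-force scan: return the document whose vector is most similar to the query."""
    return max(index, key=lambda doc: cosine_similarity(query_vec, index[doc]))


if __name__ == "__main__":
    # Toy vectors standing in for embeddings produced by a real embedding model.
    index = {
        "How to reset your password": [0.9, 0.1, 0.0],
        "Quarterly sales report": [0.0, 0.2, 0.9],
    }
    # Hypothetical embedding of "I forgot my login credentials" (shares no keywords).
    query_vec = [0.8, 0.3, 0.1]
    print(nearest(query_vec, index))
```

A keyword search for "forgot login credentials" would find no literal match in either title, while the vector comparison still ranks the password document first. At scale, the brute-force loop would be replaced by an ANN index, as the explanation for Option C notes.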

40
MCQhard

A team is building a multilingual customer support chatbot using Gemini. They notice that for low-resource languages, the model frequently produces grammatically incorrect responses. Which strategy would MOST effectively improve quality for these languages without sacrificing latency?

A.Implement a separate translation step: translate user input to English, generate response in English, then translate back
B.Fine-tune the model on a large parallel corpus for each low-resource language
C.Use few-shot prompting with high-quality examples in the target language
D.Increase temperature to 1.5 to encourage more diverse grammar
AnswerC

Few-shot examples can improve output by demonstrating correct grammar and style, and do not add latency beyond the prompt length.

Why this answer

Few-shot prompting with high-quality examples in the target language directly conditions the model to produce grammatically correct responses for low-resource languages, adding no latency beyond the longer prompt. This approach leverages Gemini's in-context learning capability, allowing it to adapt to the target language's grammar patterns without requiring additional model training or translation round-trips.

Exam trap

Cisco often tests the misconception that translation pipelines or fine-tuning are the only ways to handle low-resource languages, but the trap here is that few-shot prompting can achieve comparable quality improvements with negligible additional latency and minimal data requirements.

How to eliminate wrong answers

Option A is wrong because implementing a separate translation step (translate to English, generate, translate back) introduces significant latency from two additional inference calls and risks compounding translation errors, especially for low-resource languages where translation quality is poor. Option B is wrong because fine-tuning on a large parallel corpus for each low-resource language is resource-intensive and requires substantial labeled data that may not exist; fine-tuning would not change inference latency, but the data requirement makes it impractical. Option D is wrong because increasing temperature to 1.5 encourages more diverse but often less coherent outputs, which would exacerbate grammatical errors in low-resource languages rather than improve them.
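
A few-shot prompt of the kind described can be assembled as in the sketch below. The function name, the "Customer:"/"Agent:" labels, and the instruction line are illustrative assumptions; the placeholder strings would be replaced with vetted example exchanges in the target language.

```python
def build_few_shot_prompt(examples: list[tuple[str, str]], user_message: str) -> str:
    """Prepend worked customer/agent pairs so the model can imitate their grammar and style."""
    lines = ["Reply in the same language and register as the examples."]
    for customer, agent in examples:
        lines.append(f"Customer: {customer}")
        lines.append(f"Agent: {agent}")
    # The new message goes last, with an open "Agent:" slot for the model to complete.
    lines.append(f"Customer: {user_message}")
    lines.append("Agent:")
    return "\n".join(lines)


if __name__ == "__main__":
    examples = [
        ("<customer message 1 in target language>", "<well-formed reply 1>"),
        ("<customer message 2 in target language>", "<well-formed reply 2>"),
    ]
    print(build_few_shot_prompt(examples, "<new customer message>"))
```

The whole exchange is sent in a single request, so the only extra cost is the additional prompt tokens.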

41
MCQmedium

A developer is building an application that generates code snippets based on natural language descriptions. They want to minimize latency and cost while maintaining high accuracy. Which Google Cloud service should they choose?

A.Codey APIs on Vertex AI
B.Gemini 1.5 Flash
C.Chirp
D.Imagen
AnswerB

Gemini 1.5 Flash is a faster, more cost-effective variant than Pro and handles code generation well. Codey is more specialized, but Flash is the better fit when latency and cost are the priority.

Why this answer

Gemini 1.5 Flash is designed for high-throughput, low-latency tasks like code generation from natural language, offering a balance of speed and cost while maintaining strong accuracy. It is the optimized choice for real-time applications where minimizing latency and cost per request is critical, unlike larger models that prioritize maximum capability over efficiency.

Exam trap

Cisco often tests the misconception that specialized APIs (like Codey) are always the best choice for domain-specific tasks, when in fact a general-purpose but optimized model (Gemini 1.5 Flash) can meet accuracy needs while better satisfying latency and cost constraints.

How to eliminate wrong answers

Option A is wrong because Codey APIs on Vertex AI, while specialized for code generation, are based on larger, more computationally expensive models that incur higher latency and cost compared to Gemini 1.5 Flash, making them suboptimal for minimizing both factors. Option C is wrong because Chirp is a speech recognition model (ASR) for transcribing audio, not a text-to-code generation service, so it is irrelevant to the use case. Option D is wrong because Imagen is a text-to-image generation model, not designed for code snippets, and would fail to produce accurate code outputs.

42
MCQeasy

Which parameter controls the randomness of a language model's output?

A.Embedding dimension
B.Context window
C.Top-k
D.Temperature
AnswerD

Temperature directly controls the randomness of token selection.

Why this answer

Temperature is the parameter that directly controls the randomness of a language model's output by scaling the logits before applying the softmax function. A higher temperature (e.g., >1.0) makes the probability distribution more uniform, increasing randomness, while a lower temperature (e.g., <1.0) sharpens the distribution, making the model more deterministic and focused on high-probability tokens.
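
The effect of scaling logits before the softmax can be checked numerically. The sketch below is a generic illustration rather than any particular API's implementation; the three logits are made-up values, and the function rejects a temperature of zero, which some APIs treat as greedy decoding.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
    for t in (0.2, 1.0, 2.0):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
```

As the temperature rises, the probability of the top token falls and the distribution moves toward uniform; as it falls, nearly all the mass concentrates on the top token.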

Exam trap

Cisco often tests the distinction between sampling parameters (temperature, top-k, top-p) and architectural parameters (embedding dimension, context window), so candidates may confuse top-k as a randomness control when it actually restricts the candidate pool rather than adjusting the probability distribution's entropy.

How to eliminate wrong answers

Option A is wrong because embedding dimension controls the size of the vector representation for tokens, affecting the model's capacity to capture semantic relationships, not the randomness of output. Option B is wrong because context window defines the maximum number of tokens the model can consider as input for generating the next token, influencing coherence and memory, not randomness. Option C is wrong because top-k is a sampling strategy that limits the next token selection to the k most likely tokens, which reduces randomness by cutting off the tail of the distribution, but it does not control the overall randomness of the output like temperature does.

43
MCQmedium

A company is using a large language model for a customer-facing chat application. They notice that the model sometimes generates plausible-sounding but incorrect information. Which strategy is most effective to reduce this issue?

A.Fine-tune the model on a dataset of corrected conversations.
B.Reduce the context window to limit the information the model considers.
C.Implement Retrieval-Augmented Generation (RAG) to retrieve relevant documents before generating a response.
D.Increase the temperature parameter to make the model more deterministic.
AnswerC

RAG grounds the model's output in retrieved evidence, directly reducing hallucinations.

Why this answer

RAG grounds responses by retrieving relevant, verified information from a trusted source, reducing hallucinations. Fine-tuning on corrected conversations does not ground answers in a trusted source at inference time; increasing temperature increases randomness rather than making the model more deterministic; reducing the context window may discard relevant information.

44
MCQhard

A research team is using Imagen to generate images for a marketing campaign. They notice that the generated images sometimes contain distorted faces or unnatural object placements. They want to improve consistency without sacrificing image diversity. Which approach should they try first?

A.Reduce the top-p value (e.g., from 1.0 to 0.9).
B.Increase the temperature to 1.5
C.Switch from Imagen to Gemini Pro for image generation
D.Set top-k to 1
AnswerA

Lowering top-p cuts off the tail of unlikely tokens, reducing artifacts while still allowing a range of plausible tokens for diversity.

Why this answer

Lowering top-p (nucleus sampling) reduces the set of possible tokens to the most probable ones, which often removes low‑probability artifacts while preserving diversity. Increasing the temperature to 1.5 adds randomness and tends to make artifacts worse, and setting top-k to 1 removes diversity altogether. Prompt engineering and fine‑tuning can also help but are more involved. Temperature reduction may lower diversity too much.
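
Nucleus (top-p) filtering in general can be sketched as below. The example operates on a generic token distribution with made-up probabilities; it is not a description of how any specific image model exposes or applies such a parameter.

```python
def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the smallest set of highest-probability tokens whose mass reaches top_p, then renormalize."""
    kept: dict[str, float] = {}
    cumulative = 0.0
    for token, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break  # the remaining low-probability tail is discarded
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


if __name__ == "__main__":
    probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}  # hypothetical distribution
    print(top_p_filter(probs, 0.9))
    print(top_p_filter(probs, 1.0))
```

With top_p at 0.9 the least likely token is dropped while three candidates remain, which is the sense in which lowering top-p trims the tail without collapsing to a single choice.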

45
MCQeasy

Which of the following best describes the transformer architecture's key innovation that enabled modern large language models?

A.Recurrent connections that process sequences one element at a time
B.Self-attention mechanism that captures dependencies between all words in parallel
C.Memory-augmented neural networks
D.Convolutional layers that extract local features
AnswerB

Self-attention allows each word to attend to every other word, enabling parallel computation and capturing long-range dependencies.

Why this answer

The transformer architecture's key innovation is the self-attention mechanism, which allows the model to compute attention scores between every pair of tokens in the input sequence simultaneously, rather than processing tokens sequentially. This parallelization enables the model to capture long-range dependencies efficiently and scale to massive datasets, which is the foundation for modern large language models like GPT and BERT.
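
The all-pairs computation can be illustrated with a small scaled dot-product self-attention sketch. To stay short it uses the input vectors directly as queries, keys, and values, omitting the learned projection matrices and multiple heads of a real transformer; the input values are arbitrary.

```python
import math


def softmax(row: list[float]) -> list[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: list[list[float]]) -> tuple[list[list[float]], list[list[float]]]:
    """Scaled dot-product self-attention with queries = keys = values = x.

    Returns (outputs, weights). Learned projections are omitted for brevity.
    """
    d = len(x[0])
    # Score every token against every token; each score is independent of the others,
    # which is what makes the computation parallelizable.
    scores = [[sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d) for kj in x] for qi in x]
    weights = [softmax(row) for row in scores]
    # Each output vector is a weighted average of all token vectors.
    outputs = [[sum(w * v[c] for w, v in zip(row, x)) for c in range(d)] for row in weights]
    return outputs, weights


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token vectors
    outputs, weights = self_attention(tokens)
    for row in weights:
        print([round(w, 3) for w in row])
```

Each row of the weight matrix shows how much one token attends to every other token, including distant ones, with no recurrence between positions.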

Exam trap

Cisco often tests the misconception that transformers are just an evolution of RNNs or that their innovation is about memory or local feature extraction, when the true breakthrough is the parallel self-attention mechanism that eliminates sequential processing.

How to eliminate wrong answers

Option A is wrong because recurrent connections (as in RNNs/LSTMs) process sequences one element at a time, which introduces sequential bottlenecks and difficulty capturing long-range dependencies, unlike the transformer's parallel self-attention. Option C is wrong because memory-augmented neural networks (e.g., Neural Turing Machines) are a separate line of research that adds external memory, but they are not the core innovation that enabled modern LLMs; transformers achieve this without explicit external memory. Option D is wrong because convolutional layers extract local features via sliding windows, which are effective for spatial data like images but inherently limited in capturing global dependencies across a sequence, whereas self-attention directly models all pairwise interactions.

46
Multi-Selecteasy

A developer wants to use Google Cloud generative AI to build a multimodal application that can answer questions about images and text. Which TWO services are most appropriate?

Select 2 answers
A.Chirp
B.Vertex AI
C.Imagen
D.Codey
E.Gemini Pro Vision
AnswersB, E

Vertex AI provides a unified platform to deploy and serve models like Gemini for multimodal applications.

Why this answer

Vertex AI is correct because it provides a unified platform for building, deploying, and managing multimodal generative AI models, including support for both image and text inputs. It offers access to foundational models like Gemini Pro Vision, enabling developers to create applications that answer questions about images and text through a single API endpoint.

Exam trap

The trap here is that candidates often confuse specialized single-modal services (like Imagen for image generation or Codey for code) with the multimodal platform (Vertex AI) that integrates multiple capabilities, leading them to select a single-purpose service instead of the correct platform and its multimodal model.

47
Multi-Selectmedium

An enterprise wants to use generative AI to help employees search through internal documents, including text, scanned PDFs, and images. They need to index the content and enable semantic search. Which TWO Google Cloud services should they use? (Choose two.)

Select 2 answers
A.Document AI
B.Vertex AI Vector Search
C.Vertex AI Embeddings API
D.Cloud Storage with Object Versioning
E.Gemini API for content generation
AnswersB, C

Vertex AI Vector Search performs similarity search over embeddings for semantic retrieval.

Why this answer

Vertex AI Embeddings API generates vector embeddings from text and images. Vertex AI Vector Search allows semantic search over those embeddings. Document AI is for document processing but not for embeddings or search. Gemini API is for generation, not indexing. Cloud Storage is for storage, not search.

48
MCQmedium

A developer is using Gemini 1.5 Pro and needs to process a large PDF (500 pages). The model's context window is 1 million tokens. However, the API returns an error that the input exceeds the context window. What is the most likely cause?

A.The PDF is not properly formatted as text; Gemini only accepts raw text
B.The token count of the PDF exceeds 1 million tokens
C.The model version is actually Gemini 1.0 which has a smaller context window
D.The temperature is set too high causing the model to reject long inputs
AnswerB

A 500-page document with images and dense text can easily exceed 1M tokens.

Why this answer

Option B is correct because the error indicates the input exceeds the model's context window. While Gemini 1.5 Pro supports a 1 million token context window, a 500-page PDF can easily surpass this limit when considering that a single page of dense text may contain 2,000–3,000 tokens, and images or embedded tables further inflate token counts. The API strictly enforces the token budget, and exceeding it triggers the error regardless of the model's theoretical capacity.

Exam trap

Cisco often tests the misconception that a 1 million token context window can absorb any document, but the trap here is that candidates overlook how multimodal content (images, tables) dramatically increases token consumption beyond simple text estimates.

How to eliminate wrong answers

Option A is wrong because Gemini 1.5 Pro accepts PDFs directly (including images and text) via multimodal processing, not just raw text; the error is about token limits, not format. Option C is wrong because the question explicitly states the developer is using Gemini 1.5 Pro with a 1 million token context window, so a model version mismatch is not the issue here. Option D is wrong because temperature controls output randomness, not input length validation; the API rejects inputs based on token count, not sampling parameters.
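
The arithmetic behind the answer can be worked through using the per-page range quoted in the explanation (2,000 to 3,000 tokens for dense text). This is a rough estimate under that assumption only; real token counts depend on the tokenizer and on how images and tables are counted, and the function names are hypothetical.

```python
def estimated_tokens(pages: int, tokens_per_page: int) -> int:
    """Rough token estimate for a text-heavy document."""
    return pages * tokens_per_page


def fits_context(pages: int, tokens_per_page: int, context_window: int = 1_000_000) -> bool:
    """True only if the estimate is strictly below the window, leaving room for the prompt itself."""
    return estimated_tokens(pages, tokens_per_page) < context_window


if __name__ == "__main__":
    for per_page in (2_000, 3_000):
        total = estimated_tokens(500, per_page)
        print(f"{per_page} tokens/page -> {total:,} tokens, fits: {fits_context(500, per_page)}")
```

At 2,000 tokens per page the document alone already consumes the full 1,000,000 token budget, and at 3,000 it reaches 1,500,000, so the request fails in both cases once any instructions are added.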

49
Multi-Selectmedium

A machine learning engineer wants to reduce the latency of a Gemini-based chatbot running in production. Which TWO strategies would be MOST effective?

Select 2 answers
A.Switch from Gemini Pro to Gemini Flash
B.Reduce the max_output_tokens parameter
C.Enable streaming mode
D.Fine-tune the model on the specific task
E.Increase the temperature to 1.0
AnswersA, B

Flash is optimized for lower latency and cost compared to Pro, and a lower max_output_tokens cap shortens generation time.

Why this answer

Option A is correct because Gemini Flash is a lighter, more efficient model variant designed for lower latency and higher throughput compared to Gemini Pro. By switching to Flash, the engineer reduces the computational overhead per request, directly decreasing response time for the chatbot.

Exam trap

Cisco often tests the misconception that streaming reduces total latency, but it only improves perceived latency (time-to-first-token) while total processing time remains unchanged.
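
The distinction between total and perceived latency can be expressed with a simple model: total time is roughly the time to first token plus a per-token generation cost. The model and the timing numbers below are illustrative assumptions, not measurements of any Gemini model.

```python
def total_latency(ttft_s: float, per_token_s: float, output_tokens: int) -> float:
    """Approximate total response time: time to first token plus per-token generation time."""
    return ttft_s + per_token_s * output_tokens


def perceived_latency(ttft_s: float, per_token_s: float, output_tokens: int, streaming: bool) -> float:
    """With streaming the user sees output at the first token; otherwise only after the full response."""
    if streaming:
        return ttft_s
    return total_latency(ttft_s, per_token_s, output_tokens)


if __name__ == "__main__":
    ttft, per_token = 0.5, 0.02  # hypothetical timings in seconds
    for cap in (400, 100):
        print(f"max_output_tokens={cap}: total {total_latency(ttft, per_token, cap):.1f}s")
    print("streaming, perceived:", perceived_latency(ttft, per_token, 400, streaming=True))
    print("streaming, total:    ", total_latency(ttft, per_token, 400))
```

In this model, lowering the output cap shrinks the total time, while turning on streaming changes only when the user first sees text; a faster model variant would lower both timing inputs.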

50
MCQeasy

Which Google Cloud service is specifically designed for generating code from natural language descriptions?

A.Codey
B.Imagen
C.Chirp
D.Gemini
AnswerA

Codey is a family of models specialized for code generation, completion, and chat.

Why this answer

Codey is the correct answer because it is Google Cloud's foundational code generation model, specifically designed to accept natural language prompts and produce source code in various programming languages. It powers features like code completion, code generation, and chat-based code assistance within Vertex AI, directly addressing the requirement of generating code from natural language descriptions.

Exam trap

The trap here is that candidates may confuse Gemini's broad multimodal capabilities with the specific, dedicated code generation service (Codey), leading them to select Gemini because it can also generate code, but the question asks for the service 'specifically designed' for that purpose.

How to eliminate wrong answers

Option B (Imagen) is wrong because Imagen is a text-to-image generation model, not a code generation model; it converts natural language descriptions into images. Option C (Chirp) is wrong because Chirp is a speech-to-text and speech recognition model, focused on transcribing audio, not generating code. Option D (Gemini) is wrong because while Gemini is a multimodal model capable of code generation, it is a broader foundation model for general-purpose tasks, not the service specifically designed and optimized for code generation from natural language; Codey is the dedicated service for that purpose.

51
MCQeasy

What is the primary purpose of the temperature parameter when generating text with an LLM?

A.It sets the maximum number of tokens in the response
B.It controls the randomness or creativity of the output
C.It determines the number of most likely tokens considered at each step
D.It sets the cumulative probability threshold for token selection
AnswerB

Temperature scales the logits before sampling; higher values produce more diverse outputs.

Why this answer

Temperature controls the randomness of token selection. Higher temperature increases creativity, lower temperature makes outputs more deterministic.
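A small numeric sketch may help show what "scaling the logits" does. The logits below are made up, and the function is a plain-Python illustration of a temperature-scaled softmax, not the Gemini implementation.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At the low temperature almost all probability lands on the top token; at the higher one the distribution flattens and the other tokens become more likely to be sampled.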

52
MCQhard

A healthcare startup needs to generate synthetic patient records for research. They require accurate output that adheres to medical syntax and semantics, and they must be able to explain why the model produces certain outputs for regulatory compliance. Which combination of techniques should they use?

A.Use RAG with a vector store of medical literature
B.Supervised fine-tuning on a large medical corpus, followed by RLHF
C.Use Gemini in a zero-shot prompt with a strict system instruction
D.Apply LoRA adapter fine-tuning with a small medical dataset
AnswerB

Supervised fine-tuning adapts the model to medical knowledge, and RLHF helps align outputs with desired behavior and can improve the model's ability to explain its reasoning.

Why this answer

Supervised fine-tuning on medical data adapts the model to the domain, while RLHF aligns outputs with human preferences and can improve interpretability. LoRA is efficient but doesn't directly help with explainability. RAG is for knowledge retrieval, not explainability.

Zero-shot prompting with a system instruction relies on in-context learning alone, which may not be reliable enough for regulatory compliance.

53
MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Fine-tune a base LLM on the policy documents monthly
B.Use a larger foundation model with a longer context window and paste all documents into each prompt
C.Train a custom model from scratch on the policy documents each month
D.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store
AnswerD

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

Retrieval-Augmented Generation (RAG) is the most appropriate approach because it allows the chatbot to answer questions by retrieving relevant chunks from the policy documents stored in a vector store, without requiring model retraining. When documents are updated monthly, only the vector index needs to be refreshed, which is far more cost-effective and faster than fine-tuning or retraining a model. This approach leverages the base LLM's language understanding while grounding responses in the latest policy content.

Exam trap

Cisco often tests the misconception that fine-tuning or retraining is necessary for domain-specific knowledge, when in fact RAG provides a dynamic, cost-effective solution that avoids retraining and handles frequently updated documents.

How to eliminate wrong answers

Option A is wrong because fine-tuning a base LLM monthly on policy documents is expensive, time-consuming, and introduces risk of catastrophic forgetting, where the model may lose general language capabilities. Option B is wrong because pasting all documents into each prompt is impractical due to context window limits (even large models like GPT-4 have ~128K tokens, which is insufficient for extensive policy libraries) and leads to high inference costs and latency. Option C is wrong because training a custom model from scratch each month is prohibitively expensive and computationally intensive, requiring massive datasets and GPU resources, and is unnecessary when RAG can achieve the same goal with far less overhead.
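The "refresh the index, not the model" idea can be sketched with a toy retriever. Everything here is a stand-in: the bag-of-words "embedding" replaces a real embedding model, the dictionary replaces a vector store, and the policy text is invented.

```python
from collections import Counter
import math


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, index: dict) -> str:
    """Return the id of the indexed chunk most similar to the query."""
    q = embed(query)
    return max(index, key=lambda doc_id: cosine(q, index[doc_id]))


# Hypothetical policy chunks; re-indexing replaces entries, no retraining.
index = {
    "leave": embed("employees receive 20 days of annual leave"),
    "travel": embed("travel expenses require manager approval"),
}
print(retrieve("how many days of annual leave", index))

index["leave"] = embed("employees receive 25 days of annual leave")  # monthly update
print(retrieve("how many days of annual leave", index))
```

The monthly update touches only the index entry; the retrieval function and the language model behind it stay unchanged.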

54
MCQmedium

A healthcare organization wants to use generative AI to draft patient education materials. They are concerned about the model generating incorrect medical information. Which combination of Google Cloud services should they use to ground the model's responses in trusted medical literature?

A.Use Retrieval-Augmented Generation (RAG) with Vertex AI Search and Gemini
B.Use Imagen to generate visual aids and combine with Gemini for text, then manually review
C.Fine-tune Gemini on trusted medical literature and deploy with Vertex AI Endpoints
D.Use only Gemini Pro with careful prompt engineering and system instructions
AnswerA

RAG retrieves relevant, up-to-date medical documents from a search index, grounding Gemini's responses in trusted sources, reducing hallucination.

Why this answer

Option A is correct because Retrieval-Augmented Generation (RAG) with Vertex AI Search allows the model to retrieve and cite information from a curated corpus of trusted medical literature before generating responses. This grounds the output in verified sources, reducing the risk of hallucination or incorrect medical advice, while Gemini provides the generative capabilities.

Exam trap

Cisco often tests the misconception that fine-tuning or careful prompting alone can ensure factual accuracy, when in reality RAG provides a more reliable grounding mechanism by retrieving and citing external, up-to-date sources.

How to eliminate wrong answers

Option B is wrong because Imagen generates visual aids but does not address the core requirement of grounding text responses in trusted medical literature; manual review is not a scalable or reliable grounding mechanism. Option C is wrong because fine-tuning Gemini on trusted literature embeds knowledge into the model weights, which can still lead to outdated or hallucinated information if the training data is static, and it does not provide real-time retrieval or citation of specific sources. Option D is wrong because using only Gemini Pro with prompt engineering and system instructions does not guarantee factual accuracy; without retrieval from a trusted corpus, the model may still generate plausible but incorrect medical information.

55
MCQeasy

What is the primary purpose of the 'top-p' (nucleus sampling) parameter in text generation?

A.To define a cumulative probability threshold for token selection
B.To set a fixed number of highest probability tokens to consider
C.To limit the maximum number of tokens in the output
D.To control the randomness of the output by scaling logits
AnswerA

Top-p chooses tokens until the sum of their probabilities reaches the threshold p, adapting the number of tokens considered.

Why this answer

The top-p parameter, also known as nucleus sampling, selects the smallest set of tokens whose cumulative probability exceeds the threshold p (e.g., 0.9). This dynamically adapts the candidate pool based on the model's confidence, allowing more diverse tokens when the distribution is flat and fewer when it is peaked. It directly implements a cumulative probability cutoff, not a fixed count or scaling factor.

Exam trap

Cisco often tests the confusion between top-p and top-k, where candidates mistakenly think top-p selects a fixed number of tokens rather than a probability-based dynamic set.

How to eliminate wrong answers

Option B is wrong because it describes top-k sampling, which selects a fixed number of highest probability tokens regardless of their cumulative probability. Option C is wrong because it refers to a maximum output length parameter (such as max_output_tokens), which controls output length, not token selection diversity. Option D is wrong because it describes temperature scaling, which divides the logits by the temperature before the softmax is applied, not a cumulative probability threshold.
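The way top-p adapts the candidate pool can be seen in a few lines. This is a minimal sketch with invented distributions and a hypothetical function name; it keeps tokens until the running total reaches p.

```python
def top_p_filter(probs: dict, p: float) -> list:
    """Return the smallest set of tokens whose cumulative probability reaches p."""
    kept, cumulative = [], 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append(token)
        cumulative += prob
        if cumulative >= p:
            break
    return kept


peaked = {"the": 0.95, "a": 0.03, "an": 0.02}
flat = {"red": 0.3, "blue": 0.25, "green": 0.25, "grey": 0.2}

print(top_p_filter(peaked, 0.9))
print(top_p_filter(flat, 0.9))
```

With the same threshold, the peaked distribution keeps a single token while the flat one keeps all four, which is the dynamic behaviour that separates top-p from a fixed top-k count.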

56
MCQmedium

A company has a Gemini-based application that sometimes produces factually incorrect answers. They want to improve accuracy without retraining the model. Which technique should they implement?

A.Reduce the top-k value to 10
B.Implement Retrieval-Augmented Generation (RAG) with a curated knowledge base
C.Use prompt engineering to instruct the model to 'be more accurate'
D.Increase the temperature to 1.0 for more diverse outputs
AnswerB

RAG provides relevant context from trusted sources, reducing hallucinations.

Why this answer

Retrieval-Augmented Generation (RAG) is the correct technique because it grounds the model's responses in a curated, external knowledge base, providing factual context that reduces hallucinations without modifying the model's weights. This directly addresses the need for improved accuracy in a Gemini-based application while avoiding costly retraining.

Exam trap

Cisco often tests the misconception that prompt engineering alone can fix factual accuracy issues, when in reality, without external knowledge grounding, the model remains reliant on its parametric memory which is prone to hallucination.

How to eliminate wrong answers

Option A is wrong because reducing top-k to 10 limits the number of candidate tokens considered during generation, which can reduce diversity but does not inherently improve factual accuracy; it may even cause the model to miss correct but less probable tokens. Option C is wrong because instructing the model to 'be more accurate' via prompt engineering is a vague directive that does not provide new factual information; the model cannot correct its own knowledge gaps through simple instruction. Option D is wrong because increasing temperature to 1.0 increases output randomness and diversity, which typically worsens factual accuracy by encouraging the model to explore less probable, often incorrect, token sequences.
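Grounding works because the retrieved text is placed in the prompt itself. The sketch below shows one way such a prompt might be assembled; the wording of the instruction, the passages, and the function name are all assumptions for illustration, and the retrieval step is taken as given.

```python
def build_grounded_prompt(question: str, passages: list) -> str:
    """Assemble a prompt that asks the model to answer only from the passages."""
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(passages, start=1))
    return (
        "Answer using only the numbered passages below. "
        "If they do not contain the answer, say so.\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical passages, as if returned by a retrieval step.
passages = [
    "Refunds are issued within 14 days of a return.",
    "Gift cards cannot be refunded.",
]
prompt = build_grounded_prompt("Can I get a refund on a gift card?", passages)
print(prompt)
```

Unlike an instruction to "be more accurate", this gives the model new factual material to draw on, and the model weights are never touched.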

57
MCQmedium

A company is building a legal document review assistant using Gemini 1.5 Pro. They want to ensure the model can handle large documents of up to 500 pages in a single prompt. Which feature of Gemini 1.5 Pro is MOST important for this requirement?

A.Temperature
B.Top-k sampling
C.Fine-tuning
D.Large context window
AnswerD

Gemini 1.5 Pro supports up to 1 million tokens, enabling processing of hundreds of pages in a single prompt.

Why this answer

Gemini 1.5 Pro has a large context window (up to 1 million tokens), allowing it to process large documents. Temperature, top-k, and fine-tuning do not directly address context length.

58
MCQmedium

An e-commerce company wants to generate realistic product images from text descriptions using Google Cloud AI. Which service should they use?

A.Imagen on Vertex AI
B.Gemini 1.5 Pro
C.Chirp
D.Vertex AI Codey
AnswerA

Imagen is Google's text-to-image model available on Vertex AI.

Why this answer

Imagen on Vertex AI is the correct service because it is specifically designed for text-to-image generation, allowing the e-commerce company to create realistic product images from textual descriptions. It leverages Google's advanced diffusion models to produce high-fidelity visuals, making it the ideal choice for this use case.

Exam trap

The trap here is that candidates may confuse Gemini's multimodal capabilities (which can process images but not generate them from scratch) with a dedicated image generation service, leading them to select Gemini 1.5 Pro instead of Imagen.

How to eliminate wrong answers

Option B (Gemini 1.5 Pro) is wrong because it is a multimodal large language model that accepts text, code, and images as input but produces text output; it is not a dedicated text-to-image generation service. Option C (Chirp) is wrong because it is a speech-to-text and text-to-speech model for audio processing, not for generating images. Option D (Vertex AI Codey) is wrong because it is a code generation and completion model for software development, not for creating visual content.

59
MCQmedium

A company uses a Gemini 1.5 Pro model with a 1 million token context window. They want to process a large 500-page PDF for Q&A. What is the MAIN advantage of using the long context window over a RAG approach?

A.Simpler architecture with no need for chunking or retrieval systems
B.Lower cost per API call
C.Faster inference speed
D.Higher accuracy for all queries
AnswerA

The entire PDF can be placed in the prompt, avoiding the complexity of building a vector index and retrieval pipeline.

Why this answer

Option A is correct because the primary advantage of using a 1 million token context window is architectural simplicity. By ingesting the entire 500-page PDF as a single prompt, the company eliminates the need for document chunking, embedding generation, and a retrieval system (RAG). This reduces system complexity, maintenance overhead, and potential failure points, as the model can directly attend to all content in one pass.

Exam trap

Cisco often tests the misconception that a larger context window is always cheaper, faster, or more accurate, when in reality it buys simplicity at the price of higher cost, slower inference, and potential accuracy degradation for mid-context information.

How to eliminate wrong answers

Option B is wrong because a 1 million token context window incurs significantly higher cost per API call due to the massive input token count, making it more expensive than a RAG approach that only sends relevant chunks. Option C is wrong because processing a 1 million token context window is slower than RAG, as the model must compute attention over all tokens, leading to higher latency. Option D is wrong because a long context window does not guarantee higher accuracy for all queries; RAG can be more accurate for specific, localized questions by retrieving only relevant information, while long context models may suffer from 'lost in the middle' effects where information in the middle of the context is less attended to.
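The cost argument against option B comes down to input token counts. The comparison below is a rough sketch: the price is a placeholder rather than a real rate, and the document and chunk sizes are assumed.

```python
PRICE_PER_1K_INPUT_TOKENS = 0.001  # hypothetical price, not a real rate


def input_cost(tokens: int) -> float:
    """Input cost for one call at the placeholder price."""
    return tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS


full_document = 1_000_000      # whole PDF in every prompt (assumed size)
rag_chunks = 5 * 800           # five retrieved chunks of ~800 tokens (assumed)

print(f"long context: {input_cost(full_document):.4f} per call")
print(f"RAG:          {input_cost(rag_chunks):.4f} per call")
print(f"ratio:        {full_document // rag_chunks}x more input tokens")
```

Whatever the actual price, the ratio of input tokens carries over to the per-call cost, which is why the long-context approach wins on simplicity rather than on cost.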

60
Multi-Selecthard

A company is deploying a generative AI application that must comply with GDPR. They need to ensure user data is not used for model training and that responses do not contain personal data. Which THREE measures should they implement?

Select 3 answers
A.Use prompt engineering to instruct the model not to output personal data
B.Implement output filtering with a PII detection API (e.g., DLP)
C.Set temperature to 0 to ensure deterministic outputs
D.Configure data retention policies to prevent Google from using prompts for model improvement
E.Use a dedicated model instance to avoid data mixing with other customers
AnswersB, D, E

Post-processing redacts personal data from responses, reducing compliance risk.

Why this answer

Data retention policies prevent storing user data for training; output filtering redacts personal data; using a dedicated instance isolates the model from other tenants. Prompt engineering is not reliable for compliance.
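Output filtering is a post-processing step applied to the model's response before it is returned. The sketch below uses two hand-written regular expressions purely to show the shape of that step; it is not the Cloud DLP API, and the patterns and sample text are invented.

```python
import re

# Deliberately simple patterns; a production system would rely on a
# dedicated PII detection service rather than hand-written regexes.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace anything matching a pattern with a [LABEL] placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


response = "Contact Jane at jane.doe@example.com or +44 20 7946 0958."
print(redact(response))
```

Because the filter runs on the output regardless of what the prompt asked for, it does not depend on the model following an instruction, which is the weakness of option A.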

61
MCQmedium

A healthcare company needs to generate synthetic medical images for research while ensuring compliance with patient privacy regulations. Which Google Cloud generative AI service should they use?

A.Codey for code generation
B.Chirp for speech recognition
C.Imagen on Vertex AI
D.Gemini 1.5 Pro with multimodal prompting
AnswerC

Imagen is purpose-built for text-to-image generation and can be deployed securely on Vertex AI with compliance controls.

Why this answer

Imagen on Vertex AI is Google's image generation service that can create synthetic images and offers controls for responsible AI and data governance.

62
MCQmedium

A developer is using the Gemini API to generate product descriptions. They want the output to be more focused and less random. Which parameter adjustment would BEST achieve this?

A.Decrease top-p to 0.1
B.Decrease temperature to 0.2
C.Increase top-k to 100
D.Increase temperature to 1.5
AnswerB

Lowering temperature reduces randomness, making the model more deterministic and focused on high-probability tokens.

Why this answer

Lowering temperature makes the model more deterministic and focused. Top-k and top-p control sampling but are secondary; raising temperature increases randomness.

63
Multi-Selecthard

An organization is building a multi‑agent workflow on Vertex AI where one agent analyzes an image (e.g., a scanned contract), another agent extracts text from the image, and a third agent answers questions about the contract. The solution must be low‑latency. Which TWO services are most appropriate?

Select 2 answers
A.Imagen
B.Vertex AI Search
C.Gemini on Vertex AI
D.Document AI
E.Cloud Vision API
AnswersC, E

Gemini can process images and text, serving as both the image analyst and the Q&A agent.

Why this answer

Gemini on Vertex AI (C) is the most appropriate service because it is a multimodal model that can natively analyze images, extract text, and answer questions about the content in a single, low-latency inference call. This eliminates the need to chain separate services for image analysis, OCR, and Q&A, reducing overall latency and architectural complexity.

Exam trap

Cisco often tests the misconception that multimodal tasks require separate specialized services (e.g., Cloud Vision for OCR + a separate LLM for Q&A), when in fact a single multimodal model like Gemini can perform all steps in one low-latency call.

64
MCQhard

A data scientist wants to generate photorealistic images of products from text descriptions for an e-commerce catalog. The images must be brand-consistent and avoid generating distorted product features. Which Google Cloud generative AI service should they use?

A.Veo
B.Imagen
C.Chirp
D.Gemini Pro Vision
AnswerB

Imagen is Google's text-to-image diffusion model, built for creating realistic, high-fidelity images from prompts, with features for brand consistency.

Why this answer

Imagen is Google's text-to-image model that produces high-quality, photorealistic images. It is designed for brand consistency and safe image generation.

65
MCQhard

A company wants to generate a video from a text description using Google Cloud. Which service is designed for this?

A.Codey
B.Chirp
C.Imagen
D.Veo
AnswerD

Veo is Google's text-to-video generation model.

Why this answer

Veo is Google Cloud's generative AI model specifically designed for creating high-quality videos from text or image prompts. It leverages advanced diffusion and transformer architectures to generate coherent video sequences, making it the correct choice for text-to-video generation.

Exam trap

The trap here is that candidates often confuse Imagen (text-to-image) with Veo (text-to-video), assuming any generative visual model can handle video, but Google Cloud explicitly separates these capabilities into distinct services.

How to eliminate wrong answers

Option A is wrong because Codey is Google's model for code generation and chat, not video creation. Option B is wrong because Chirp is a speech-to-text and text-to-speech model, focused on audio processing. Option C is wrong because Imagen is a text-to-image model, capable of generating static images but not video sequences.

66
Multi-Selecteasy

A developer is using the Gemini API to generate text summaries. They want to control the creativity and diversity of the output. Which THREE parameters can they adjust?

Select 3 answers
A.Context window
B.Top-p
C.Embedding dimension
D.Top-k
E.Temperature
AnswersB, D, E

Top-p (nucleus sampling) selects from the smallest set of tokens whose cumulative probability exceeds p, controlling diversity.

Why this answer

Option B (Top-p) is correct because it controls the nucleus sampling threshold, where the model considers only the smallest set of tokens whose cumulative probability exceeds the specified p value (e.g., 0.9). This directly influences the diversity of the generated text by limiting the token pool to the most likely candidates, reducing the chance of sampling very low-probability tokens.

Exam trap

Cisco often tests the distinction between parameters that affect input processing (context window, embedding dimension) versus those that control output generation (temperature, top-k, top-p), leading candidates to mistakenly select context window as a creativity parameter.

67
Multi-Selectmedium

A developer is using the Gemini API to generate marketing copy. They want the output to be diverse and creative but still relevant to the topic. Which THREE parameter adjustments would help achieve this? (Choose 3)

Select 3 answers
A.Increase temperature to 0.9
B.Increase top-k to 50
C.Decrease temperature to 0.1
D.Decrease top-k to 10
E.Increase top-p to 0.95
AnswersA, B, E

Higher temperature increases randomness, leading to more creative outputs.

Why this answer

Higher temperature increases randomness and creativity. Higher top-k and higher top-p both allow more tokens to be considered, increasing diversity. Lowering these would make output more focused.
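How top-k widens or narrows the candidate pool can be shown with a toy distribution. The probabilities and the function name below are invented for illustration.

```python
def top_k_filter(probs: dict, k: int) -> dict:
    """Keep the k most probable tokens and renormalise their probabilities."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {token: p / total for token, p in top}


# Made-up next-token distribution.
probs = {"fast": 0.4, "quick": 0.3, "rapid": 0.2, "speedy": 0.1}

print(top_k_filter(probs, 2))
print(top_k_filter(probs, 4))
```

A larger k leaves more alternatives available for sampling, while a smaller k concentrates the choice on the most likely tokens.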

68
MCQeasy

Which Google Cloud generative AI model is specifically designed for code generation tasks?

A.Gemini
B.PaLM 2
C.Imagen
D.Codey
AnswerD

Codey is a family of models fine-tuned from PaLM 2 specifically for code generation, code completion, and code chat.

Why this answer

Codey is Google's model fine-tuned for code generation, completion, and chat. PaLM 2 is a general purpose LLM, Gemini is multimodal, and Imagen is for image generation.

69
Multi-Selectmedium

A company needs to deploy a generative AI application on Google Cloud that meets data residency requirements. Which THREE features should they enable? (Select three.)

Select 3 answers
A.Enable VPC Service Controls
B.Use a multi-region storage bucket
C.Use the global endpoint for low latency
D.Enable data residency boundaries in IAM
E.Select a specific region for Vertex AI resources
AnswersA, D, E

VPC-SC helps prevent data exfiltration and restricts data movement.

Why this answer

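To satisfy data residency, the company must control where resources and data live (selecting a specific region for Vertex AI resources, E), keep data from leaving that boundary (data residency boundaries, D), and guard against exfiltration with a service perimeter (VPC Service Controls, A). A multi-region bucket and the global endpoint both allow data to be stored or processed outside the chosen region.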

70
MCQhard

A researcher wants to use Google's AlphaFold for a project. What is the primary capability of AlphaFold?

A.Generating realistic human speech
B.Playing the game of Go at superhuman level
C.Predicting 3D protein structures from amino acid sequences
D.Generating code from natural language descriptions
AnswerC

AlphaFold is known for protein structure prediction.

Why this answer

AlphaFold, developed by Google DeepMind, is specifically designed to predict the 3D structure of proteins from their amino acid sequences. This capability solves a fundamental challenge in biology, as the function of a protein is largely determined by its 3D shape, and experimental methods like X-ray crystallography are time-consuming and expensive. AlphaFold achieves this using a deep learning architecture that integrates multiple sequence alignment (MSA) and pairwise distance predictions to model the spatial coordinates of atoms.

Exam trap

Cisco often tests the distinction between different Google DeepMind projects (AlphaGo vs. AlphaFold vs. AlphaZero), so the trap here is confusing the domain of game-playing AI with the domain of scientific prediction, leading candidates to pick Option B if they recall AlphaGo's fame but not AlphaFold's specific purpose.

How to eliminate wrong answers

Option A is wrong because generating realistic human speech is the primary capability of text-to-speech models like WaveNet or Tacotron, not AlphaFold, which focuses on structural biology. Option B is wrong because playing the game of Go at superhuman level is the achievement of AlphaGo and AlphaZero, which use reinforcement learning and Monte Carlo tree search, not the protein folding task AlphaFold was built for. Option D is wrong because generating code from natural language descriptions is the domain of large language models like Codex or GPT-4, not AlphaFold, which has no code generation functionality.

71
MCQeasy

In the transformer architecture, what is the role of the attention mechanism?

A.It normalizes the output of each layer
B.It decides which parts of the input to focus on when generating each token
C.It predicts the next token directly
D.It converts tokens into numerical vectors
AnswerB

Attention computes relevance scores between tokens, allowing the model to focus on relevant parts of the input.

Why this answer

The attention mechanism in the Transformer architecture computes a weighted sum of all input token representations, allowing the model to dynamically focus on the most relevant parts of the input sequence when generating each output token. This is achieved through learned query, key, and value projections that produce attention scores, enabling the model to capture long-range dependencies and contextual relationships. Option B correctly identifies this core function of selectively attending to input elements during token generation.

Exam trap

Cisco often tests the distinction between the attention mechanism's role in focusing on input parts versus the final prediction layer's role in outputting the next token, leading candidates to mistakenly select Option C.

How to eliminate wrong answers

Option A is wrong because normalization of layer outputs is performed by layer normalization, not the attention mechanism; attention computes relevance weights, not normalization statistics. Option C is wrong because predicting the next token directly is the role of the final linear layer and softmax over the vocabulary, while the attention mechanism provides contextualized representations that feed into that prediction. Option D is wrong because converting tokens into numerical vectors is the function of the embedding layer (token embeddings), not the attention mechanism, which operates on those vectors to compute attention scores.
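The weighted-sum description of attention can be worked through for a single query. This is a minimal plain-Python sketch of scaled dot-product attention with toy vectors; it omits the learned projections, multiple heads, and batching that a real Transformer layer has.

```python
import math


def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Weighted sum of the value vectors, one output dimension at a time.
    output = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(len(values[0]))]
    return weights, output


# Toy 2-dimensional vectors for three input tokens.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attention([1.0, 0.0], keys, values)
print([round(w, 3) for w in weights], [round(o, 3) for o in output])
```

The tokens whose keys align with the query receive the larger weights, so their values dominate the output; that output is a contextualized representation, not a next-token prediction.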

72
MCQmedium

Which Google AI model was the first to demonstrate that transformers could be pre-trained bidirectionally on a large corpus, leading to major improvements in language understanding?

A.GPT-3
B.AlphaGo
C.Transformer (the paper)
D.BERT
AnswerD

BERT is a bidirectional transformer pre-trained on a large corpus, setting new benchmarks for language understanding.

Why this answer

BERT (Bidirectional Encoder Representations from Transformers) was the first model to demonstrate that transformers could be pre-trained bidirectionally on a large corpus (BooksCorpus and English Wikipedia). By using a masked language model (MLM) objective, BERT conditions on both left and right context simultaneously, unlike previous unidirectional models, leading to significant improvements on 11 NLP benchmarks at its release.

Exam trap

Cisco often tests the distinction between introducing the transformer architecture (Option C) and being the first to apply bidirectional pre-training to it (Option D), causing candidates to confuse the original Transformer paper with BERT's specific contribution.

How to eliminate wrong answers

Option A is wrong because GPT-3 is a unidirectional (autoregressive) transformer model that predicts the next token left-to-right, not bidirectionally, and it was released after BERT. Option B is wrong because AlphaGo is a reinforcement learning model for playing the board game Go, not a language model, and it uses convolutional neural networks and Monte Carlo tree search, not bidirectional transformer pre-training. Option C is wrong because the Transformer paper ("Attention Is All You Need") introduced the transformer architecture itself but did not demonstrate bidirectional pre-training on a large corpus; it was a supervised translation model, not a pre-trained language model.
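The contrast between unidirectional and bidirectional context is easy to show on a token list. The sketch below only illustrates which tokens are visible when predicting one position; it does not train or run a model, and the sentence and function names are invented.

```python
def causal_context(tokens, i):
    """Left-to-right model: only tokens before position i are visible."""
    return tokens[:i]


def masked_context(tokens, i, mask="[MASK]"):
    """Masked language model: position i is hidden, both sides stay visible."""
    return tokens[:i] + [mask] + tokens[i + 1:]


tokens = "the bank raised interest rates".split()
print(causal_context(tokens, 1))   # what a unidirectional model sees for 'bank'
print(masked_context(tokens, 1))   # what a masked LM sees for 'bank'
```

Here the words to the right of the masked position are what disambiguate "bank", and only the masked objective lets the model condition on them.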

73
MCQhard

An enterprise is deploying a customer-facing chatbot using a foundation model on Vertex AI. They need to ensure the model does not produce toxic outputs. Which combination of settings and features should they implement?

A.Use Reinforcement Learning from Human Feedback (RLHF) during fine-tuning
B.Enable Vertex AI Model Monitoring and set up alerts for toxic outputs
C.Reduce the temperature to 0.0 and increase top-k to 50
D.Configure safety filters and safety settings in the model deployment to block harmful categories
AnswerD

Safety filters provide real-time content moderation, blocking toxic outputs before they reach users.

Why this answer

Option D is correct because safety filters and safety settings in Vertex AI model deployment are the direct mechanism to block harmful categories of output at inference time. These settings allow administrators to define thresholds for categories like toxicity, harassment, and hate speech, ensuring the model refuses to generate prohibited content without requiring retraining or post-hoc monitoring.

Exam trap

Cisco often tests the distinction between training-time alignment techniques (like RLHF) and inference-time safety controls (like safety filters), tempting candidates to choose a fine-tuning approach when the question explicitly asks for deployment settings to prevent toxic outputs.

How to eliminate wrong answers

Option A is wrong because RLHF is a fine-tuning technique that aligns model behavior based on human preferences, but it does not provide real-time blocking of toxic outputs during inference; it only influences the model's general behavior after training. Option B is wrong because Vertex AI Model Monitoring is designed for detecting data drift and performance anomalies, not for filtering or blocking toxic content in real-time responses. Option C is wrong because reducing temperature to 0.0 makes the model deterministic and increasing top-k to 50 broadens token selection, which does not prevent toxicity; these parameters control randomness, not content safety.
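The idea of per-category thresholds applied at inference time can be sketched as a simple gate. The category names, scores, and thresholds below are hypothetical, and the code is not the Vertex AI safety-settings API; it only illustrates the blocking logic that such settings imply.

```python
# Hypothetical category names, scores and thresholds; this is not the
# Vertex AI safety-settings API, only the blocking logic it implies.
THRESHOLDS = {"toxicity": 0.5, "harassment": 0.5, "hate_speech": 0.3}


def blocked_categories(scores: dict, thresholds: dict = THRESHOLDS) -> list:
    """Return the categories whose score meets or exceeds its threshold."""
    return [c for c, limit in thresholds.items() if scores.get(c, 0.0) >= limit]


def deliver(response: str, scores: dict) -> str:
    """Pass the response through unless any category is blocked."""
    if blocked_categories(scores):
        return "[response blocked by safety filter]"
    return response


print(deliver("Happy to help with your order.", {"toxicity": 0.02}))
print(deliver("<some abusive text>", {"toxicity": 0.91, "harassment": 0.4}))
```

The gate acts on each response as it is produced, which is what distinguishes it from training-time alignment and from monitoring after the fact.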

74
MCQmedium

A company wants to generate high-quality product images from text descriptions for an e-commerce catalog. They need photorealistic results. Which model and approach should they choose?

A.Use Veo for video generation and extract frames
B.Fine-tune Gemini 1.5 Pro on product images
C.Use Imagen on Vertex AI with appropriate prompts
D.Use Codey to generate code that renders images
AnswerC

Imagen is optimized for photorealistic image generation from text.

Why this answer

Imagen on Vertex AI is specifically designed for high-quality, photorealistic text-to-image generation, making it the ideal choice for creating product images from text descriptions. It leverages advanced diffusion models to produce detailed and visually accurate outputs that meet the requirements of an e-commerce catalog.

Exam trap

The exam often tests the distinction between models specialized for different modalities (text, image, video, code) to see if candidates recognize that a dedicated image generation model like Imagen is required for photorealistic text-to-image tasks, rather than repurposing video or code models.

How to eliminate wrong answers

Option A is wrong because Veo is a video generation model; extracting frames from generated video is an indirect workaround that risks motion artifacts and temporal inconsistencies, and it is less suited to still product shots than a dedicated image generation model. Option B is wrong because Gemini 1.5 Pro is a multimodal large language model optimized for understanding multimodal input and for generating text and code, not for high-fidelity image generation; fine-tuning it on product images would not turn it into a photorealistic text-to-image model. Option D is wrong because Codey is a code generation model; code that renders images would still need a separate rendering engine and would not directly produce photorealistic images from text descriptions.
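As a rough illustration of "Imagen with appropriate prompts", the sketch below turns product attributes into a descriptive prompt and wraps it in a predict-style request body. The instances/parameters layout and the sampleCount and aspectRatio fields are assumed to match the Imagen on Vertex AI REST API and should be checked against the current reference; the helpers product_prompt and build_imagen_request and the sample product are hypothetical.

```python
import json


def product_prompt(name, material, setting):
    """Hypothetical helper: compose a photorealistic product prompt."""
    return (
        f"Photorealistic studio photograph of a {material} {name}, "
        f"{setting}, soft lighting, high detail"
    )


def build_imagen_request(prompt, sample_count=1, aspect_ratio="1:1"):
    """Build a predict-style body (field names are assumptions)."""
    return {
        "instances": [{"prompt": prompt}],
        "parameters": {
            "sampleCount": sample_count,
            "aspectRatio": aspect_ratio,
        },
    }


prompt = product_prompt("water bottle", "stainless steel", "on a white background")
request_body = build_imagen_request(prompt, sample_count=2)
print(json.dumps(request_body, indent=2))
```

Keeping the prompt template separate from the request builder makes it easy to generate consistent images across a whole catalog by varying only the product attributes.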

75
Multi-Selectmedium

A company is deploying a chatbot using Gemini 1.5 Pro. They want to reduce the risk of the chatbot generating toxic or harmful content. Which TWO techniques should they implement? (Choose two.)

Select 2 answers
A.Apply Reinforcement Learning from Human Feedback (RLHF) after deployment
B.Include a system prompt that instructs the model to be helpful and harmless
C.Use a RAG system to ground responses in a knowledge base
D.Configure Google's safety filters and thresholds in Vertex AI
E.Fine-tune the model on a curated dataset of safe conversations
AnswersD, E

Safety filters screen prompts and responses and can block categories of harmful content before it reaches the user.

Why this answer

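Safety filters (e.g., Google's safety settings in Vertex AI) block harmful content at inference time, and fine-tuning on curated safe examples reduces the likelihood that the model generates harmful output in the first place. The other options fall short: a system prompt alone is insufficient because instructions can be bypassed; RLHF is a training-time alignment technique, not something a company applies after deployment; and RAG grounds responses in a knowledge base, which improves factual accuracy rather than safety.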
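The way a blocking threshold interacts with per-category ratings can be sketched with a toy model. The level and threshold names mirror the Vertex AI enums, but the function is_blocked and the sample ratings are hypothetical, and the real filter logic is not public; this only illustrates the idea of "block at or above a level".

```python
# Harm probability levels, ordered from lowest to highest.
LEVELS = ["NEGLIGIBLE", "LOW", "MEDIUM", "HIGH"]

# Lowest level that each threshold blocks; None blocks nothing.
THRESHOLD_FLOOR = {
    "BLOCK_LOW_AND_ABOVE": "LOW",
    "BLOCK_MEDIUM_AND_ABOVE": "MEDIUM",
    "BLOCK_ONLY_HIGH": "HIGH",
    "BLOCK_NONE": None,
}


def is_blocked(ratings, threshold):
    """Toy check: block if any category is rated at or above the floor."""
    floor = THRESHOLD_FLOOR[threshold]
    if floor is None:
        return False
    cutoff = LEVELS.index(floor)
    return any(LEVELS.index(level) >= cutoff for level in ratings.values())


# Made-up ratings for a single response.
ratings = {
    "HARM_CATEGORY_HARASSMENT": "MEDIUM",
    "HARM_CATEGORY_HATE_SPEECH": "NEGLIGIBLE",
}
print(is_blocked(ratings, "BLOCK_MEDIUM_AND_ABOVE"))  # True
print(is_blocked(ratings, "BLOCK_ONLY_HIGH"))         # False
```

A stricter threshold blocks more borderline responses, which is why filters are usually combined with a model that has been tuned to avoid harmful output rather than used as the only safeguard.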
