CCNA Gcl Genai Concepts Tech Questions — Page 2 of 2

MCQ (easy)

A data scientist needs to generate high-quality images from text prompts using Google Cloud. Which service should they use?

A. Imagen

B. PaLM 2

C. Gemini Pro Vision

D. Codey

Answer: A

Imagen is Google's diffusion-based model for generating images from text prompts.

Why this answer

Imagen is Google Cloud's text-to-image diffusion model. PaLM 2 is a text model, Gemini Pro Vision accepts images as input but returns text, and Codey is for code generation.

MCQ (easy)

What is the primary benefit of using foundation models (like Gemini) as opposed to training a model from scratch?

A. They are always faster at inference

B. They guarantee 100% accuracy on all tasks

C. They require less data and compute to adapt to new tasks

D. They are open-source and free to use

Answer: C

Pre-training provides a strong base; fine-tuning or prompting needs fewer resources.

Why this answer

Foundation models like Gemini are pre-trained on vast datasets, capturing general language understanding and patterns. Adapting them to new tasks via fine-tuning requires significantly less task-specific data and computational resources compared to training a model from scratch, which demands enormous datasets and compute for initial training. This transfer learning approach is the primary benefit, enabling efficient customization for specialized applications.

Exam trap

The exam often tests the misconception that 'pre-trained' means 'free' or 'always faster,' leading candidates to pick options that confuse inference speed or licensing with the core benefit of reduced data and compute for adaptation.

How to eliminate wrong answers

Option A is wrong because inference speed depends on model architecture, size, and optimization, not on whether the model is a foundation model or trained from scratch; a smaller custom model can be faster at inference than a large foundation model. Option B is wrong because no model, including foundation models, can guarantee 100% accuracy on all tasks due to inherent data biases, distribution shifts, and the complexity of real-world tasks. Option D is wrong because while some foundation models are open-source (e.g., Llama), many, including Gemini, are proprietary and require API access or licensing fees, so they are not universally free or open-source.

Multi-Select (hard)

A financial institution wants to deploy a generative AI system for automated report generation. They require that the model does NOT expose sensitive information from its training data and that outputs are factually accurate. Which THREE techniques should they combine?

Select 3 answers

A. Use Retrieval-Augmented Generation (RAG) to ground outputs in verified documents

B. Train the model with differential privacy

C. Use a larger model with more parameters

D. Apply Reinforcement Learning from Human Feedback (RLHF) to align model behavior

E. Increase the temperature to 2.0 for creativity

Answers: A, B, D

RAG grounds answers in trusted sources, which reduces hallucinations and reliance on memorized training data.

Why this answer

RAG grounds outputs in verified sources, RLHF can reduce hallucinated and harmful outputs, and differential privacy during training limits how much the model can memorize about any individual training record.
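
To make the differential privacy option more concrete, here is a minimal sketch of one common approach, DP-SGD style gradient clipping with Gaussian noise. The source does not say which method is intended, so this is an assumption, and the function names, gradients, and noise settings are hypothetical.

```python
import math
import random

def clip_gradient(grad, max_norm):
    """Scale one example's gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum them, add Gaussian noise, then average."""
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm  # noise is calibrated to the clipping bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]

# Hypothetical per-example gradients; the third one is an outlier.
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
rng = random.Random(0)
update = private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng)
print(update)
```

Clipping bounds how much any single record can move the update, and the noise masks what remains, which is why this kind of training limits memorization of individual examples.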

MCQ (medium)

An organization wants to use generative AI to automatically generate code snippets from natural language descriptions. The solution must be integrated into their existing CI/CD pipeline on Google Cloud. Which service should they use?

A. Chirp on Vertex AI

B. Codey on Vertex AI

C. Imagen on Vertex AI

D. Gemini 1.5 Flash on Vertex AI

Answer: B

Codey is a family of models for code generation, code chat, and code completion, directly integrated with Vertex AI.

Why this answer

Codey is designed for code generation and is available via Vertex AI. Gemini can also generate code, but Codey is the specialized option. Imagen and Chirp are for images and speech, respectively.

Multi-Select (easy)

Which THREE of the following are generative AI modalities supported by Google Cloud services?

Select 3 answers

A. Image generation

B. Code generation

C. Text generation

D. Tabular data generation

E. Speech generation

Answers: A, B, C

Imagen supports image generation.

Why this answer

Option A is correct because Google Cloud's Vertex AI and Imagen APIs provide image generation capabilities, allowing users to create and edit images from text prompts. This is a core generative AI modality supported by Google Cloud services. Options B and C are correct as well: Codey and Gemini on Vertex AI generate code, and Gemini generates text.

Exam trap

The exam often tests the distinction between core generative AI modalities (text, code, image) and other AI services (like speech or tabular data) that are not considered primary generative AI capabilities in the context of Vertex AI foundation models.

MCQ (medium)

A company wants to generate high-quality images from text descriptions for their marketing materials. They need the ability to edit specific regions of an image without regenerating the entire image. Which Google Cloud service should they use?

A. Gemini

B. Codey

C. Imagen

D. Veo

Answer: C

Imagen supports text-to-image generation and includes features like inpainting and outpainting for region-specific editing.

Why this answer

Imagen offers inpainting and outpainting capabilities for editing specific regions. Gemini is multimodal but not optimized for image editing. Veo generates video, not still images, and Codey is for code.

MCQ (medium)

A startup wants to generate realistic product videos from text descriptions for social media ads. Which Google Cloud service should they use?

A. Imagen

B. Gemini Pro Vision

C. Veo

D. Codey

Answer: C

Veo is Google's generative video model, capable of producing high-quality videos from text descriptions.

Why this answer

Veo is Google Cloud's advanced video generation model that can create high-quality, realistic videos from text prompts, making it the ideal choice for generating product videos for social media ads. Unlike other services, Veo is specifically designed for video synthesis, offering capabilities like style control and cinematic effects directly from text descriptions.

Exam trap

The trap here is that candidates often confuse multimodal understanding (Gemini Pro Vision) with generative creation (Veo), or assume that image generation (Imagen) can be trivially extended to video without understanding the distinct temporal modeling required.

How to eliminate wrong answers

Option A is wrong because Imagen is a text-to-image generation model, not a video generation service; it produces static images, not dynamic video content. Option B is wrong because Gemini Pro Vision is a multimodal model that can analyze and understand images and videos, but it does not generate new video content from text descriptions. Option D is wrong because Codey is a code generation model designed for assisting with programming tasks, not for generating visual media like videos.

MCQ (hard)

A team is building a multi-modal agent that needs to accept a user's image of a handwritten note, convert it to text, and then run a sentiment analysis. They want to minimize latency and cost. Which approach is best?

A. Fine-tune Gemini Pro on handwritten notes and sentiment labels

B. Use Document AI for OCR and then call Codey for sentiment analysis

C. Use Gemini 1.5 Flash with a prompt that includes the image and asks for sentiment analysis in one call

D. Use Cloud Vision API for OCR, then feed the text to a sentiment analysis model via Vertex AI

Answer: C

Gemini Flash is optimized for low latency and cost, and its multimodal capability handles both tasks in one inference.

Why this answer

Option C is correct because Gemini 1.5 Flash is a multimodal model that can directly process images and perform sentiment analysis in a single API call, eliminating the need for separate OCR and NLP services. This minimizes both latency (by reducing the number of sequential calls) and cost (by using a single, efficient model instead of multiple specialized services).

Exam trap

The exam often tests the candidate's ability to recognize that multimodal models like Gemini 1.5 Flash can replace multi-step pipelines (OCR + NLP) in a single call. The trap is that candidates default to traditional separate-service architectures (like Cloud Vision + Vertex AI) without considering the latency and cost benefits of a unified multimodal approach.

How to eliminate wrong answers

Option A is wrong because fine-tuning Gemini Pro on handwritten notes and sentiment labels is overkill for this task, incurring training cost and delay, and Gemini Pro is a larger, more expensive model than needed for simple OCR and sentiment analysis. Option B is wrong because Codey is a code generation model, not a sentiment analysis model, so the second step is mismatched even if Document AI extracts the text, and the two-service pipeline adds an extra call. Option D is wrong because using Cloud Vision API for OCR followed by a separate sentiment analysis model via Vertex AI introduces additional latency and cost from multiple API calls, whereas Gemini 1.5 Flash can achieve the same result in one step.

MCQ (medium)

A developer is using Gemini 1.5 Flash for a real-time chat application and notices that responses are sometimes too slow. Which model parameter or configuration change would MOST likely reduce latency without significantly harming quality?

A. Decrease the top-k value to 10

B. Decrease the max output tokens setting

C. Increase the temperature to 1.5

D. Increase the top-p value to 1.0

Answer: B

Fewer generated tokens means faster response completion, directly reducing latency.

Why this answer

Lowering max output tokens reduces the number of tokens generated, directly decreasing response time. Temperature and top-p affect creativity, not latency. Reducing top-k may slightly speed up sampling but has a minimal effect compared to output length.
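
The relationship between output length and response time can be sketched with a back-of-the-envelope model. The time-to-first-token and throughput figures below are hypothetical placeholders, not measured Gemini numbers.

```python
def estimated_latency_s(output_tokens, time_to_first_token_s=0.3, tokens_per_second=100.0):
    """Rough latency model: a fixed startup cost plus sequential decoding time."""
    return time_to_first_token_s + output_tokens / tokens_per_second

# Tokens are generated one after another, so the cap bounds the decoding time.
for cap in (1024, 256, 64):
    print(cap, round(estimated_latency_s(cap), 2))
```

Under these assumed numbers the decoding term dominates, which is why capping output length has a much larger effect than changing a sampling parameter.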

MCQ (medium)

A machine learning engineer wants to convert text into numerical vectors for similarity search. Which Google Cloud service should they use?

A. Vertex AI Embeddings API

B. Natural Language API

C. Vector Search

D. Gemini API

Answer: A

This API generates text embeddings for downstream tasks.

Why this answer

The Vertex AI Embeddings API is the correct choice because it is specifically designed to convert text (and other data types) into dense numerical vectors (embeddings) that capture semantic meaning. These embeddings are the fundamental input for similarity search, enabling efficient comparison of text based on conceptual closeness rather than exact keyword matching.

Exam trap

The trap here is that candidates confuse the Natural Language API's text analysis capabilities (like entity extraction) with the embedding generation required for similarity search, or they assume Vector Search or Gemini API can generate embeddings directly when they are actually downstream or generative tools.

How to eliminate wrong answers

Option B is wrong because the Natural Language API performs entity extraction, sentiment analysis, and syntax analysis, but it does not generate embeddings for similarity search. Option C is wrong because Vector Search is a service for indexing and querying embeddings at scale, not for generating them from raw text. Option D is wrong because the Gemini API is a multimodal generative model for chat and content generation, not a dedicated embedding service for converting text into vectors.
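
As a small illustration of what similarity search does with embeddings once they exist, the sketch below ranks texts by cosine similarity. The three-dimensional vectors are hand-written stand-ins for real embeddings, which would come from an embedding model and have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings (hypothetical values).
corpus = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "return an item": [0.8, 0.2, 0.1],
}
query_vector = [1.0, 0.0, 0.0]

# Rank documents from most to least similar to the query.
ranked = sorted(corpus, key=lambda name: cosine_similarity(query_vector, corpus[name]), reverse=True)
print(ranked)
```

A vector index performs the same kind of ranking at scale, which is why generating embeddings and searching over them are separate steps.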

MCQ (medium)

A startup wants to integrate a GenAI assistant into Google Workspace (Docs, Gmail, Sheets) to help employees draft emails and create charts. Which Google AI offering is designed for this purpose?

A. Gemini for Workspace

B. Colab Enterprise

C. Vertex AI Agent Builder

D. NotebookLM

Answer: A

Gemini for Workspace provides AI assistance directly in Workspace applications.

Why this answer

Gemini for Workspace (formerly Duet AI) is Google's AI assistant embedded across Workspace apps for tasks like drafting and analysis.

MCQ (medium)

A data scientist is using Vertex AI to fine-tune a Gemini model for a specialized legal document summarization task. They have a small set of labeled examples (200 pairs). Which fine-tuning method is MOST cost-effective and likely to perform well?

A. Full fine-tuning of all model parameters

B. Adapter-based fine-tuning (e.g., LoRA)

C. Training a small custom model from scratch

D. Prompt engineering with few-shot examples only

Answer: B

LoRA updates low-rank matrices, preserving the base model and reducing memory/storage requirements while adapting to the new task.

Why this answer

Adapter-based fine-tuning (like LoRA) updates only a small fraction of parameters, making it efficient with small datasets and low cost, while still adapting the model to the task.
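
The "small fraction of parameters" claim can be checked with simple arithmetic. The sketch below compares the trainable parameters of one full weight matrix with those of a LoRA adapter for the same matrix; the layer size and rank are illustrative assumptions, not the dimensions of any Gemini model.

```python
def full_params(d_out, d_in):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in

def lora_params(d_out, d_in, rank):
    """Trainable parameters for a LoRA adapter: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)

d_out, d_in, rank = 4096, 4096, 8  # hypothetical layer size and adapter rank
full = full_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(full, lora, f"{lora / full:.2%}")
```

For this assumed layer the adapter trains 65,536 values instead of 16,777,216, which is under half a percent of the matrix.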

MCQ (easy)

What is the primary purpose of the temperature parameter when configuring a generative AI model?

A. Adjusts the number of highest-probability tokens considered at each step

B. Controls the diversity of the output by scaling the log probabilities before sampling

C. Specifies the minimum probability threshold for token selection

D. Sets the maximum number of tokens in the response

Answer: B

Temperature is applied to the logits before softmax; higher values flatten the distribution, making lower‑probability tokens more likely.

Why this answer

Temperature controls the randomness of token selection. Higher temperature increases creativity/diversity; lower temperature produces more deterministic and focused responses.
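
A minimal sketch of temperature scaling, assuming the usual formulation in which logits are divided by the temperature before the softmax. The logit values are made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.5)  # sharper distribution
hot = softmax_with_temperature(logits, 2.0)   # flatter distribution
print([round(p, 3) for p in cold])
print([round(p, 3) for p in hot])
```

At the lower temperature most of the probability mass sits on the top token, while the higher temperature gives the least likely token a noticeably larger share.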

Multi-Select (medium)

A company is building a customer support agent that can answer questions about product manuals and also generate images of the products from descriptions. Which TWO Google Cloud services should they combine? (Select 2)

Select 2 answers

A. Codey

B. Chirp

C. Cloud Vision API

D. Imagen on Vertex AI

E. Gemini on Vertex AI

Answers: D, E

Imagen generates images from text descriptions.

Why this answer

Imagen on Vertex AI is correct because it is Google Cloud's text-to-image generation service, capable of creating product images from textual descriptions. This directly fulfills the requirement to 'generate images of the products from descriptions'. Gemini on Vertex AI covers the other requirement: answering questions about the product manuals.

Exam trap

The trap here is that candidates may confuse Cloud Vision API (which analyzes images) with Imagen (which generates images), or assume that a single model like Gemini can handle both text and image generation natively, when in fact Gemini is primarily a multimodal understanding model and Imagen is the dedicated image generation service.

MCQ (hard)

An AI team is choosing between supervised fine-tuning and reinforcement learning from human feedback (RLHF) for a chatbot. They want the model to follow instructions closely and avoid toxic outputs. Which statement correctly compares these approaches?

A. RLHF is more effective at aligning the model with human values, including reducing toxicity

B. Supervised fine-tuning and RLHF achieve the same results, but RLHF is faster

C. Supervised fine-tuning is better for reducing toxicity because it uses labeled safe examples

D. In-context learning is always superior to both for safety, as it can adapt on the fly

Answer: A

RLHF uses human feedback to reward non-toxic and helpful responses, directly aligning with desired values.

Why this answer

RLHF aligns the model with human preferences by rewarding desired behaviors (e.g., non-toxic, helpful). Supervised fine-tuning teaches the model to imitate example responses but does not directly penalize undesired outputs; in-context learning can steer behavior but is less reliable for safety.

Multi-Select (hard)

A company wants to build a multimodal AI application that accepts text and image inputs and provides text responses. They need to process sensitive customer data and require that the model be hosted within their own Google Cloud project for data residency. Which TWO components are essential? (Choose 2)

Select 2 answers

A. Vertex AI with private endpoints or VPC-SC

B. Gemini API with multimodal capabilities

C. Codey for code generation

D. Chirp for speech recognition

E. Imagen for image input processing

Answers: A, B

Vertex AI allows deployment within a project and supports data residency controls.

Why this answer

Gemini is a multimodal model that can process text and images. Vertex AI provides a managed environment for deploying models with data residency controls. The other options are not essential for this requirement.

MCQ (medium)

A team is using a generative AI model to create marketing copy. They want the responses to be more focused and less random. Which parameter should they adjust?

A. Decrease temperature

B. Increase top-k

C. Decrease context window

D. Increase temperature

Answer: A

Decreasing temperature makes the model more deterministic and focused.

Why this answer

Decreasing the temperature parameter reduces the randomness of the model's token selection by lowering the probability of sampling less likely tokens. This makes the output more focused and deterministic, which is ideal for marketing copy that needs to stay on-brand and consistent.

Exam trap

The exam often tests the misconception that increasing temperature or top-k makes outputs more focused, when in fact both increase randomness and diversity.

How to eliminate wrong answers

Option B is wrong because increasing top-k widens the pool of candidate tokens at each step, letting lower-ranked tokens be sampled and making responses less focused. Option C is wrong because decreasing the context window limits the amount of input text the model can reference, which can reduce coherence and relevance, not improve focus. Option D is wrong because increasing temperature amplifies randomness, making outputs more creative but less predictable and focused.
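
The effect of top-k described above can be seen in a minimal sketch that keeps the k most probable tokens and renormalizes. The token probabilities are invented for illustration.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {token: p / total for token, p in top}

# Hypothetical next-token distribution.
probs = {"the": 0.5, "a": 0.25, "this": 0.15, "zebra": 0.1}

print(top_k_filter(probs, 1))  # only the single most likely token remains
print(top_k_filter(probs, 3))  # a wider pool, so less likely tokens can be sampled
```

With k = 1 the choice is fully determined, and each increase in k adds a lower-ranked candidate to the sampling pool.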

Multi-Select (hard)

A company is deploying a Gemini-based application and needs to ensure low latency for real-time user interactions. They also want to reduce cost. Which THREE strategies should they consider? (Select 3)

Select 3 answers

A. Use Gemini 1.5 Flash instead of Pro

B. Implement response caching for common queries

C. Increase the model's max output tokens to ensure comprehensive answers

D. Use full fine-tuning to make the model faster

E. Keep the context window as short as possible by trimming input

Answers: A, B, E

Flash is optimized for speed and lower cost.

Why this answer

Option A is correct because Gemini 1.5 Flash is a lighter, distilled version of the Pro model, designed for lower latency and reduced computational cost while still maintaining strong performance for real-time interactions. Flash models use fewer parameters and optimized inference paths, making them ideal for latency-sensitive applications where cost efficiency is critical. Options B and E are correct as well: caching avoids repeated model calls for common queries, and trimming the input means fewer tokens to process and pay for.

Exam trap

The exam often tests the misconception that increasing output tokens or fine-tuning improves speed, when in reality these actions increase computational load or add overhead, making them counterproductive for latency and cost goals.
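
The caching and input-trimming strategies can be sketched as follows. `fake_model` is a hypothetical stand-in for a real model call, and the normalization rule (lowercasing and collapsing whitespace) is just one possible choice of cache key.

```python
class CachedResponder:
    """Serve repeated prompts from memory instead of calling the model again."""

    def __init__(self, generate):
        self._generate = generate
        self._cache = {}
        self.model_calls = 0

    def ask(self, prompt):
        # Normalize case and whitespace so trivially different prompts share a key.
        key = " ".join(prompt.lower().split())
        if key not in self._cache:
            self.model_calls += 1
            self._cache[key] = self._generate(prompt)
        return self._cache[key]

def trim_history(messages, max_messages):
    """Keep only the most recent messages (max_messages must be at least 1)."""
    return messages[-max_messages:]

def fake_model(prompt):
    # Hypothetical placeholder for a real model request.
    return f"echo: {prompt}"

responder = CachedResponder(fake_model)
responder.ask("What are your hours?")
responder.ask("  what are   your hours? ")
print(responder.model_calls)
print(trim_history(["m1", "m2", "m3", "m4"], 2))
```

The second question is answered from the cache, so only one model call is made, and the trimmed history sends two messages instead of four.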

MCQ (medium)

A data scientist is fine-tuning a foundation model for a specialized legal document summarization task. The labeled dataset is only 5,000 examples. Which fine-tuning technique would be MOST efficient to adapt the model without catastrophic forgetting and with minimal computational cost?

A. Low-Rank Adaptation (LoRA)

B. Reinforcement Learning from Human Feedback (RLHF)

C. Full supervised fine-tuning of all model parameters

D. In-context learning with few-shot examples

Answer: A

LoRA inserts trainable low-rank matrices into transformer layers, requiring far fewer parameters to update, which is efficient and reduces forgetting.

Why this answer

LoRA (Low-Rank Adaptation) is an adapter-based method that trains only a small number of added parameters, making it efficient and less prone to catastrophic forgetting compared to full fine-tuning. Full supervised fine-tuning is expensive; RLHF is for alignment after fine-tuning; in-context learning requires no training but may not suffice.

MCQ (easy)

Which Google AI milestone introduced the Transformer architecture that underpins modern LLMs?

A. AlphaGo

B. Transformer paper

C. AlphaFold

D. BERT

Answer: B

The Transformer paper introduced the foundational architecture for LLMs.

Why this answer

The Transformer architecture, which is the foundational technology behind modern large language models (LLMs) like GPT and BERT, was introduced in the 2017 paper 'Attention Is All You Need' by Vaswani et al. This paper proposed the self-attention mechanism and the encoder-decoder structure that replaced recurrent neural networks (RNNs) and long short-term memory (LSTM) networks, enabling parallelized training and superior handling of long-range dependencies in sequence data.
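
The core operation of the paper, scaled dot-product attention, is softmax(QK^T / sqrt(d_k)) V. Below is a minimal pure-Python sketch of that formula for a single head, without the learned projections, masking, or multi-head structure of a full Transformer. The input vectors are toy values.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each query gets a weighted average of the values."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

# Two identical keys receive equal weight, so the output is the mean of the values.
result = attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[2.0, 0.0], [0.0, 4.0]],
)
print(result)
```

Every query is scored against every key independently, which is what allows the computation to be parallelized across positions, unlike the step-by-step processing of an RNN.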

Exam trap

The exam often tests the distinction between the original research paper that introduced a concept (the Transformer paper) and later implementations or applications of that concept (like BERT or GPT), causing candidates to confuse the milestone with its derivative products.

How to eliminate wrong answers

Option A is wrong because AlphaGo is a reinforcement learning-based system for playing the board game Go, not a milestone in neural network architecture for language models. Option C is wrong because AlphaFold is a deep learning model for protein structure prediction, not a foundational architecture for LLMs. Option D is wrong because BERT is a pre-trained language model that itself is built on the Transformer architecture, not the original paper that introduced the Transformer.

Multi-Select (medium)

A data scientist is evaluating how to ground a generative AI model to reduce hallucinations when answering questions about a private knowledge base. Which TWO techniques are most suitable?

Select 2 answers

A. Using a larger model like Gemini Ultra

B. Fine-tuning on the private knowledge base

C. Increasing the temperature to 0.9

D. Retrieval-Augmented Generation (RAG)

E. Prompt engineering to instruct the model to answer based only on the provided context

Answers: D, E

RAG injects retrieved knowledge into the prompt, grounding the response in the source.

Why this answer

Retrieval-Augmented Generation (RAG) is the most suitable technique because it retrieves relevant, up-to-date chunks from the private knowledge base at inference time and conditions the generative model's output on that retrieved context. This grounds the model in factual data, directly reducing hallucinations by ensuring answers are based on the retrieved evidence rather than the model's parametric memory alone. Instructing the model to answer only from the provided context (Option E) complements RAG by discouraging answers drawn from outside the retrieved passages.

Exam trap

The exam often tests the misconception that fine-tuning is the primary method for grounding a model on private data, when in fact RAG is preferred for dynamic or large knowledge bases because it avoids retraining and allows real-time updates without modifying model weights.
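
The two correct techniques fit together as retrieve-then-prompt. The sketch below uses naive keyword overlap in place of a real embedding-based retriever, and the document IDs and texts are invented; it only illustrates how retrieved passages and a context-only instruction end up in the same prompt.

```python
def retrieve(question, documents, top_n=1):
    """Rank documents by shared words with the question (a stand-in for vector search)."""
    question_terms = set(question.lower().split())
    ranked = sorted(
        documents.items(),
        key=lambda item: len(question_terms & set(item[1].lower().split())),
        reverse=True,
    )
    return ranked[:top_n]

def build_grounded_prompt(question, passages):
    """Place the retrieved passages in the prompt with a context-only instruction."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Hypothetical private knowledge base.
documents = {
    "kb-1": "The warranty period for the X100 router is two years.",
    "kb-2": "Returns are accepted within 30 days of purchase.",
    "kb-3": "The office cafeteria opens at 8 am.",
}
question = "What is the warranty period for the X100 router?"
passages = retrieve(question, documents)
prompt = build_grounded_prompt(question, passages)
print(prompt)
```

Updating the knowledge base only means changing `documents`; no model weights are touched, which is the advantage over fine-tuning that the trap above describes.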

MCQ (easy)

Which statement best describes the difference between the Gemini Flash and Gemini Pro models on Vertex AI?

A. Gemini Flash is a distilled version of Gemini Pro that requires fine-tuning before use

B. Gemini Pro is deployed on Google's TPU v5p chips, while Flash uses TPU v4

C. Gemini Flash is optimized for speed and cost, while Gemini Pro provides higher quality for complex tasks

D. Gemini Flash is only available for image inputs, while Gemini Pro handles text

Answer: C

Flash is a lightweight model for faster, cheaper inference; Pro is more capable for nuanced reasoning.

Why this answer

Option C is correct because Gemini Flash is specifically designed for low-latency, high-throughput, and cost-efficient inference, making it ideal for high-volume, simpler tasks. In contrast, Gemini Pro is a larger, more capable model that delivers superior quality and reasoning for complex, multi-step tasks, though at higher latency and cost. This distinction is fundamental to the Gemini model family on Vertex AI, where Flash serves as the lightweight, fast option and Pro as the premium, high-quality option.

Exam trap

The trap here is the second half of Option A. Google has described Gemini 1.5 Flash as distilled from the larger 1.5 Pro model, so the 'distilled' wording sounds plausible, but Flash is served as a ready-to-use pre-trained model and does not require fine-tuning before use.

How to eliminate wrong answers

Option A is wrong because Gemini Flash does not require fine-tuning before use; it is available as a pre-trained model for inference. Option B is wrong because the difference between Flash and Pro lies in model size and design, not in a documented split between TPU generations. Option D is wrong because Gemini Flash handles both text and image inputs (multimodal), just like Gemini Pro; the limitation to image-only inputs is a misconception.

MCQ (medium)

A company wants to use generative AI to create short product videos from text descriptions. Which Google Cloud service should they consider?

A. Chirp

B. Imagen

C. Gemini

D. Veo

Answer: D

Veo is Google's model for generating high-quality videos from text prompts.

Why this answer

Veo is Google's video generation model that creates videos from text. Imagen creates images, Chirp is a speech model, and Gemini is multimodal but not specialized for video generation.

MCQ (medium)

A company is building a multilingual customer support chatbot that needs to understand and respond in 20 languages. Which Google model is most suitable for this task?

A. Chirp

B. Imagen

C. Codey

D. Gemini

Answer: D

Gemini supports many languages and multimodal understanding.

Why this answer

Gemini is a multimodal large language model (LLM) designed for understanding and generating text across multiple languages, making it the most suitable choice for a multilingual customer support chatbot. Unlike specialized models, Gemini supports a broad range of languages natively, enabling it to handle the 20-language requirement without needing separate language-specific models.

Exam trap

The exam often tests the distinction between specialized models (like Chirp for audio, Imagen for images, Codey for code) and general-purpose multimodal LLMs (like Gemini) that can handle diverse tasks including multilingual text generation.

How to eliminate wrong answers

Option A is wrong because Chirp is a speech-to-text and text-to-speech model focused on audio processing, not on understanding or generating multilingual text for a chatbot. Option B is wrong because Imagen is a text-to-image generation model, not designed for natural language understanding or multilingual text responses. Option C is wrong because Codey is a code generation model specialized in programming languages and code completion, not in handling natural language conversations across multiple human languages.

MCQ (medium)

A researcher wants to adapt a large language model for a specialized medical terminology domain without retraining the entire model. Which fine-tuning method is MOST parameter-efficient?

A. In-context learning with 50 examples

B. Adapter-based fine-tuning using LoRA

C. RLHF (Reinforcement Learning from Human Feedback)

D. Full supervised fine-tuning of all model weights

Answer: B

LoRA injects trainable low-rank matrices into the model, updating only a tiny fraction of parameters while achieving strong performance.

Why this answer

LoRA (Low-Rank Adaptation) is the most parameter-efficient fine-tuning method because it injects trainable low-rank matrices into the transformer layers, updating only a tiny fraction (often <1%) of the model's parameters while keeping the original weights frozen. This allows the model to adapt to specialized medical terminology without the memory and compute cost of full fine-tuning, making it ideal for domain adaptation with limited resources.
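
The mechanism described above can be sketched as a forward pass in which the frozen weight W is combined with a trainable low-rank product: h = W x + (alpha / r) * B A x. The matrices below are tiny hand-picked examples and `alpha` is the usual LoRA scaling hyperparameter; none of these values come from a real model.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, rank):
    """Frozen base projection plus a scaled low-rank update."""
    base = matvec(W, x)                 # W stays frozen during fine-tuning
    low_rank = matvec(B, matvec(A, x))  # only A (r x d_in) and B (d_out x r) are trained
    scale = alpha / rank
    return [b + scale * l for b, l in zip(base, low_rank)]

W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]  # frozen 2 x 3 weight
A = [[0.5, 0.5, 0.0]]                   # rank-1 down-projection
x = [1.0, 2.0, 3.0]

# With B initialized to zero, the adapted layer matches the base layer exactly.
print(lora_forward(W, A, [[0.0], [0.0]], x, alpha=2.0, rank=1))
# After training moves B away from zero, the output shifts while W is untouched.
print(lora_forward(W, A, [[1.0], [-1.0]], x, alpha=2.0, rank=1))
```

Because the base weights never change, removing the adapter restores the original model, which is one reason the approach is less prone to catastrophic forgetting.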

Exam trap

The exam often tests the distinction between 'fine-tuning' and 'prompt engineering'. The trap here is that candidates mistake in-context learning (Option A) for a fine-tuning method because it adapts behavior, but it does not update model parameters, making it ineligible as a parameter-efficient fine-tuning technique.

How to eliminate wrong answers

Option A is wrong because in-context learning with 50 examples does not modify model weights at all; it relies on the prompt context window, which is limited in length and cannot reliably encode specialized medical terminology for consistent generation, making it a few-shot prompting technique, not a fine-tuning method. Option C is wrong because RLHF is a training paradigm that aligns model outputs with human preferences using a reward model, but it is not parameter-efficient: it typically requires full model fine-tuning or at least significant weight updates, and it is designed for alignment, not domain-specific knowledge injection. Option D is wrong because full supervised fine-tuning updates all model weights, which is extremely parameter-inefficient (requires storing and computing gradients for billions of parameters), prone to catastrophic forgetting, and demands substantial computational resources, contradicting the requirement for parameter efficiency.

MCQ (medium)

A company is using Gemini to generate marketing copy. They want the outputs to be more creative and varied. Which generation parameters should they adjust?

A. Set temperature to 0 and top-k to 1

B. Decrease temperature and increase top-k

C. Increase temperature and adjust top-p to a higher value

D. Increase the context window length

Answer: C

Higher temperature increases randomness; higher top-p allows a wider pool of tokens, boosting creativity.

Why this answer

Option C is correct because increasing temperature raises the randomness of token selection, making outputs more creative and varied, while adjusting top-p to a higher value (e.g., 0.9) allows the model to sample from a larger cumulative probability mass of likely tokens, further increasing diversity. Together, these parameters directly control the stochasticity of generation, which is essential for creative marketing copy.

Exam trap

The exam often tests the misconception that increasing context length makes outputs more creative, when in fact the sampling parameters, chiefly temperature and top-p, control how random the output is.

How to eliminate wrong answers

Option A is wrong because setting temperature to 0 and top-k to 1 forces deterministic, greedy decoding (always picking the most probable token), which eliminates creativity and variation entirely. Option B is wrong because decreasing temperature reduces randomness, making outputs more focused and repetitive, which is the opposite of what is needed for creativity. Option D is wrong because increasing the context window length only allows the model to consider more input tokens (e.g., longer prompt or history), but does not affect the randomness or diversity of token selection during generation.
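
A minimal sketch of top-p (nucleus) filtering shows why a higher value widens the pool: tokens are added in order of probability until their cumulative mass reaches p. The token probabilities are invented for illustration.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

# Hypothetical next-token distribution.
probs = {"the": 0.5, "a": 0.25, "this": 0.15, "zebra": 0.1}

print(sorted(top_p_filter(probs, 0.6)))   # a narrow pool of likely tokens
print(sorted(top_p_filter(probs, 0.95)))  # a wider pool that includes rare tokens
```

At p = 0.6 only the two most likely tokens survive, while p = 0.95 lets all four through, so sampling can produce more varied text.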

MCQ (hard)

A financial services firm is deploying a generative AI chatbot for customer inquiries. Due to regulatory requirements, all answers must be traceable to specific source documents and must not include information beyond those documents. Which approach BEST satisfies these requirements?

A. Use in-context learning by providing all documents in the prompt each time

B. Use prompt engineering with a strict instruction to only use the provided documents

C. Fine-tune a model on the source documents and use a high temperature for creativity

D. Use RAG with a vector store containing only the approved documents, and enable grounding

Answer: D

RAG retrieves from the approved documents, and grounding links each response to the retrieved sources, ensuring traceability and restricting knowledge.

Why this answer

Option D is correct because Retrieval-Augmented Generation (RAG) with a vector store that contains only the approved documents limits retrieval to those sources, and grounding (e.g., Vertex AI grounding against your own data) ties the generated response to the retrieved passages so each answer can be traced to a source. This architecture sharply reduces the risk of hallucination or of external knowledge leaking into answers, which is what the regulatory traceability and scope requirements call for.

Exam trap

Cisco often tests the misconception that prompt engineering or in-context learning alone can reliably constrain model behavior, when in fact architectural approaches like RAG with grounding are what provide the source-level traceability required in regulated industries.

How to eliminate wrong answers

Option A is wrong because in-context learning with all documents in the prompt is impractical for large document sets due to token limits (e.g., 4K–32K tokens) and does not guarantee the model will not use its internal knowledge; it also lacks a retrieval mechanism to pinpoint specific source passages. Option B is wrong because prompt engineering with a strict instruction is a soft constraint that the model can still violate (e.g., hallucinate or use pre-training knowledge), as it does not enforce retrieval or grounding at the architecture level. Option C is wrong because fine-tuning on source documents with high temperature increases randomness and creativity, which directly contradicts the requirement to avoid generating information beyond the documents; high temperature amplifies the risk of hallucination.
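The traceability idea can be sketched as follows: retrieve only from an approved set, return source IDs with every answer, and decline when nothing relevant is found. Everything here is a toy stand-in: the document names, the bag-of-words `embed` function, and the threshold are assumptions, and a real pipeline would use an embedding model, a vector store, and an LLM for the final wording.

```python
import math
import re
from collections import Counter

APPROVED_DOCS = {  # hypothetical approved sources
    "fees-2024.pdf": "Wire transfers cost 25 dollars per outgoing transfer.",
    "cards-policy.pdf": "Lost cards must be reported within 2 business days.",
}

def embed(text):
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, min_score=0.2):
    """Return (source_id, passage) pairs scoring above the threshold, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), doc_id, text) for doc_id, text in APPROVED_DOCS.items()]
    return [(doc_id, text) for score, doc_id, text in sorted(scored, reverse=True) if score >= min_score]

def grounded_answer(query):
    hits = retrieve(query)
    if not hits:
        # Nothing in the approved set supports an answer, so decline.
        return {"answer": None, "sources": []}
    # A real pipeline would pass `hits` to the LLM; here the passage stands in for the answer.
    return {"answer": hits[0][1], "sources": [doc_id for doc_id, _ in hits]}
```

Returning the source list alongside the answer is what makes each response auditable, and the empty result for unsupported questions keeps the bot inside the approved scope.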

103

MCQeasy

What is the main advantage of using a model with a larger context window?

A.Better performance on image generation tasks

B.Lower cost per API call

C.Ability to process longer documents or conversations without truncation

D.Faster inference speed

AnswerC

More tokens can be processed in a single forward pass, enabling handling of longer content.

Why this answer

A larger context window allows the model to process and retain more tokens in a single pass, enabling it to handle longer documents, extended conversations, or large codebases without needing to truncate or chunk the input. This is critical for tasks like summarizing entire research papers, maintaining coherent multi-turn dialogues, or analyzing long legal contracts, where preserving full context directly impacts output quality and accuracy.

Exam trap

Cisco often tests the misconception that 'bigger is always better' by pairing a clear benefit (longer context) with attractive but unrelated options like lower cost or faster speed, tempting candidates to conflate model capability with operational efficiency.

How to eliminate wrong answers

Option A is wrong because context window size is a token-length constraint for text and multimodal inputs, not a direct factor in image generation quality; image generation performance depends on model architecture, training data, and diffusion/transformer design, not context length. Option B is wrong because larger context windows typically increase computational cost per API call due to the quadratic scaling of attention mechanisms (e.g., O(n²) in standard transformers), leading to higher latency and cost, not lower. Option D is wrong because processing more tokens in a larger context window generally increases inference time, as the model must attend to a longer sequence; faster inference is achieved through model optimization, quantization, or pruning, not by expanding context length.
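The quadratic scaling mentioned above is easy to check with arithmetic. The sketch below counts the pairwise attention scores for one head in one layer of standard self-attention; the token counts are arbitrary examples, and real systems often use optimizations that change the constants.

```python
def attention_score_entries(num_tokens: int) -> int:
    """Pairwise attention scores per head per layer in standard self-attention."""
    return num_tokens * num_tokens

for n in (4_000, 32_000, 128_000):
    print(f"{n:>7} tokens -> {attention_score_entries(n):,} score entries")
```

Doubling the sequence length quadruples the number of scores, which is why longer contexts tend to cost more and run slower rather than the reverse.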

104

MCQmedium

A developer is using Gemini 1.5 Pro and needs to process a 2-hour video to answer questions about its content. The video is stored in Cloud Storage. What is the most efficient approach?

A.Extract frames using Video Intelligence API and then send them as images to Gemini

B.Transcribe the video using Chirp, then analyze the text with Gemini

C.Use a custom model fine-tuned on video understanding tasks

D.Send the video file as part of the prompt to Gemini 1.5 Pro

AnswerD

Gemini 1.5 Pro can directly process video files, understanding both audio and visual content.

Why this answer

Gemini 1.5 Pro supports video input natively; you can pass the video directly (via GCS URI) and ask questions. Transcribing first adds latency and loses visual context.

105

MCQeasy

What is the key advantage of using adapter-based fine-tuning methods like LoRA compared to full fine-tuning of a large language model?

A.LoRA significantly reduces the number of trainable parameters, making fine-tuning more memory-efficient

B.LoRA is faster at inference time compared to the fully fine-tuned model

C.LoRA eliminates the need for a base model

D.LoRA enables training on a larger dataset than full fine-tuning

AnswerA

LoRA updates only low‑rank matrices, drastically cutting trainable parameters and memory usage while maintaining performance.

Why this answer

LoRA (Low-Rank Adaptation) injects trainable low-rank matrices into the transformer layers while keeping the original model weights frozen. This drastically reduces the number of trainable parameters (by up to roughly 10,000x in the original LoRA experiments on GPT-3), which lowers GPU memory requirements for storing optimizer states and gradients during training and makes fine-tuning feasible on far more modest hardware.

Exam trap

Cisco often tests the misconception that parameter-efficient methods like LoRA improve inference speed, when in reality they primarily reduce memory during training and do not accelerate inference.

How to eliminate wrong answers

Option B is wrong because LoRA does not change the inference path; the adapter weights are merged into the base model or applied as a separate forward pass, so inference speed is comparable to or slightly slower than a fully fine-tuned model, not faster. Option C is wrong because LoRA is an additive method that requires the original base model to remain frozen; it cannot function without the base model. Option D is wrong because LoRA does not inherently allow training on a larger dataset; the dataset size is independent of the fine-tuning method, and full fine-tuning can also use large datasets if sufficient memory is available.
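The parameter saving follows directly from the shapes involved. As a rough sketch for a single weight matrix, full fine-tuning trains all d_out x d_in entries, while LoRA trains two thin matrices of rank r. The hidden size and rank below are illustrative assumptions, not values from any specific model.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix W is updated."""
    return d_out * d_in

def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters when W stays frozen and only B (d_out x r) and A (r x d_in) train."""
    return rank * (d_out + d_in)

d = 4096  # hypothetical hidden size
r = 8     # hypothetical LoRA rank

print(full_params(d, d))                          # 16777216
print(lora_params(d, d, r))                       # 65536
print(full_params(d, d) // lora_params(d, d, r))  # 256
```

The reduction for one matrix here is 256x; the much larger figures quoted for whole models come from also leaving most weight matrices without any adapter at all.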

106

MCQmedium

A developer is using the Gemini API to classify customer emails. They want to ensure that the model always returns one of three predefined labels: 'complaint', 'inquiry', or 'feedback'. Which model configuration is MOST appropriate?

A.Set temperature to 1.0 and top-p to 0.9 to allow creativity while constraining via system instructions

B.Fine-tune the model on a dataset of labeled emails to memorize the three classes

C.Use top-k sampling with k=50 and no temperature adjustment

D.Set temperature to 0.0 and use few-shot examples with required labels in the prompt

AnswerD

Low temperature makes the model deterministic. Combined with explicit labels in few-shot examples, it strongly biases output to the allowed set.

Why this answer

Setting temperature to 0.0 makes decoding effectively greedy, minimizing randomness so that the same email yields consistent output. Combined with few-shot examples that explicitly list the three required labels ('complaint', 'inquiry', 'feedback') in the prompt, this configuration strongly biases the model toward returning only those labels, which makes it the most appropriate of the listed options for a strict classification task.

Exam trap

Cisco often tests the misconception that higher creativity settings (temperature, top-p) are needed for classification tasks, when in fact deterministic settings (temperature 0.0) combined with prompt engineering are the correct approach for strict label constraints.

How to eliminate wrong answers

Option A is wrong because temperature 1.0 and top-p 0.9 increase randomness and creativity, which is counterproductive for a classification task where the model must output only three fixed labels. Option B is wrong because fine-tuning on labeled emails would teach the model to generate the labels, but it does not guarantee the model will never output other tokens; fine-tuning is overkill and less reliable than prompt engineering for such a simple constraint. Option C is wrong because top-k sampling with k=50 still introduces randomness and does not force the model to choose only from the three predefined labels; it only limits the pool of candidate tokens to the top 50, which may still include irrelevant tokens.
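A few-shot prompt plus a strict output check could look like the sketch below. The example emails, `build_prompt`, and `parse_label` are hypothetical helpers; the actual model call is left out, and the validation step is an extra safeguard added here as an assumption, since prompting alone biases the output but cannot strictly guarantee it.

```python
ALLOWED_LABELS = ("complaint", "inquiry", "feedback")

FEW_SHOT = [  # hypothetical labeled examples
    ("My order arrived broken and nobody answers.", "complaint"),
    ("Do you ship to Canada?", "inquiry"),
    ("Love the new dashboard layout!", "feedback"),
]

def build_prompt(email: str) -> str:
    """Assemble a few-shot classification prompt that ends where the label goes."""
    lines = ["Classify the email as one of: " + ", ".join(ALLOWED_LABELS) + ".", ""]
    for text, label in FEW_SHOT:
        lines += [f"Email: {text}", f"Label: {label}", ""]
    lines += [f"Email: {email}", "Label:"]
    return "\n".join(lines)

def parse_label(raw_output: str):
    """Normalize model output; return None if it is not an allowed label."""
    cleaned = raw_output.strip().strip(".'\"").lower()
    return cleaned if cleaned in ALLOWED_LABELS else None
```

A caller would send `build_prompt(email)` to the model with temperature 0.0 and treat a `None` from `parse_label` as a signal to retry or route the email for review.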

107

MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Fine-tune a base LLM on the policy documents monthly

B.Use a larger foundation model with a longer context window and paste all documents into each prompt

C.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

D.Train a custom model from scratch on the policy documents each month

AnswerC

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

Retrieval-Augmented Generation (RAG) is the most appropriate approach because it allows the chatbot to answer questions based on the latest policy documents without retraining the model. RAG retrieves relevant document chunks from a vector store at query time and injects them into the prompt, enabling the model to use up-to-date information while keeping the underlying LLM static. This avoids the cost and complexity of monthly fine-tuning or retraining.

Exam trap

Cisco often tests the misconception that fine-tuning or retraining is required for domain-specific knowledge, when in fact RAG provides a cost-effective, update-friendly alternative that keeps the model static while dynamically injecting relevant context.

How to eliminate wrong answers

Option A is wrong because fine-tuning a base LLM monthly on updated policy documents is expensive, time-consuming, and risks catastrophic forgetting of previous knowledge; it also requires maintaining separate model versions for each update. Option B is wrong because pasting all documents into each prompt exceeds typical context window limits (e.g., 4K–128K tokens) for large document sets, leading to truncation, high latency, and increased cost per query. Option D is wrong because training a custom model from scratch each month is prohibitively expensive and computationally intensive, requiring massive datasets and GPU resources, and is unnecessary when RAG can achieve the same goal with far less overhead.

108

MCQhard

An enterprise wants to use Gemini 1.5 Flash for a real-time chat application with low latency. Which trade-off should they expect compared to Gemini 1.5 Pro?

A.Higher quality and lower latency

B.Lower latency but potentially lower quality

C.Lower cost but longer context window

D.Higher accuracy but slower responses

AnswerB

Flash is designed for low latency and lower cost, with some trade-off in quality.

Why this answer

Gemini 1.5 Flash is specifically optimized for lower latency and cost efficiency, making it ideal for real-time chat applications. However, this optimization comes at the cost of reduced model capacity and reasoning depth compared to Gemini 1.5 Pro, which prioritizes higher quality and accuracy over speed. Therefore, the expected trade-off is lower latency but potentially lower quality.

Exam trap

Cisco often tests the misconception that 'faster' automatically means 'better' or that cost and context window are the primary trade-offs, when in reality the core trade-off is between latency and output quality due to model architecture differences.

How to eliminate wrong answers

Option A is wrong because it claims both higher quality and lower latency, which contradicts the fundamental trade-off between model complexity and speed; Flash sacrifices quality for speed. Option C is wrong because while Flash does offer lower cost, it does not provide a longer context window: Flash supports up to 1 million tokens, and Pro supports at least as many (up to 2 million), so a longer context is not something gained by choosing Flash. Option D is wrong because it describes higher accuracy but slower responses, which is characteristic of Gemini 1.5 Pro, not the trade-off when choosing Flash over Pro.

109

Multi-Selectmedium

A data scientist wants to improve the performance of a text classification model for customer feedback. They have a small labeled dataset of 500 examples and a large unlabeled corpus of 100,000 feedback messages. Which TWO strategies would be most effective? (Choose 2)

Select 2 answers

A.Increase the context window of the model

B.Apply semi-supervised learning by pseudo-labeling the unlabeled data

C.Use RAG to retrieve similar examples from the unlabeled corpus during inference

D.Train a model from scratch on the labeled data only

E.Use a pre-trained LLM (e.g., Gemini) and fine-tune on the labeled data

AnswersB, E

Semi-supervised learning can use the unlabeled data to improve the model by generating pseudo-labels.

Why this answer

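Option B is correct because semi-supervised learning with pseudo-labeling leverages the large unlabeled corpus to augment the small labeled dataset. The model first trains on the 500 labeled examples, then generates pseudo-labels for the unlabeled data, and retrains on the combined set, effectively increasing the training signal without requiring manual annotation. Option E is correct because a pre-trained LLM already encodes general language understanding, so fine-tuning it on the 500 labeled examples needs far less task-specific data than training a classifier from scratch.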

Exam trap

Cisco often tests the misconception that RAG is a universal solution for any data scarcity problem, but in this context, RAG does not improve the model's training signal and is instead used for retrieval during inference, not for semi-supervised learning.
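The pseudo-labeling loop described for Option B can be sketched with a deliberately tiny word-count classifier. The classifier, the confidence measure, and the 0.8 threshold are all illustrative assumptions; a real project would use a proper model and calibrated confidence scores.

```python
from collections import Counter, defaultdict

def train(examples):
    """Count word frequencies per label (a deliberately tiny classifier)."""
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def predict(model, text):
    """Return (label, confidence), where confidence is the winner's share of word hits."""
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in model.items()}
    total = sum(scores.values())
    if total == 0:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / total

def pseudo_label(labeled, unlabeled, threshold=0.8):
    """Train, label the confident unlabeled texts, then retrain on both sets."""
    model = train(labeled)
    confident = []
    for text in unlabeled:
        label, conf = predict(model, text)
        if label is not None and conf >= threshold:
            confident.append((text, label))
    # Retrain on the original labels plus the confident pseudo-labels.
    return train(labeled + confident), confident
```

Only predictions above the threshold are kept, which limits how much label noise the second training round absorbs.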

110

MCQmedium

A research team wants to use Google's AI to generate video content from text prompts for a creative project. Which Google Cloud generative AI model should they use?

A.Imagen

B.Codey

C.Veo

D.Gemini

AnswerC

Veo is a generative video model that can create videos from text and image prompts.

Why this answer

Veo is Google's video generation model. Imagen generates images, Gemini is multimodal but not primarily for video generation, and Codey is for code.

111

MCQhard

An enterprise deploys a generative AI chatbot that must comply with GDPR right to deletion. Users can request deletion of their personal data. The chatbot uses a RAG pipeline with a vector database. What is the MOST effective way to handle deletion requests?

A.Delete the user's documents from the vector index and original storage, then rebuild the index

B.Update the user's records in the vector index with anonymized placeholders

C.Add a filter to the chat application to block the user's name from appearing in responses

D.Retrain the LLM from scratch without the user's data

AnswerA

This removes the data from retrieval so the chatbot cannot access it, satisfying GDPR deletion requirements.

Why this answer

GDPR requires that personal data be erased when requested. In a RAG system, the source documents must be deleted from the vector store and the original storage. Fine-tuning or model training is not the right approach.
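A minimal sketch of the deletion flow, assuming an in-memory store that stands in for both the original storage and the vector index. The class and method names are hypothetical; a managed vector database would expose its own delete or rebuild operations.

```python
class TinyRagStore:
    """In-memory stand-in for source storage plus a vector index."""

    def __init__(self):
        self.documents = {}  # doc_id -> {"owner": user_id, "text": str}
        self.index = {}      # doc_id -> embedding vector

    def add(self, doc_id, owner, text, vector):
        self.documents[doc_id] = {"owner": owner, "text": text}
        self.index[doc_id] = vector

    def delete_user(self, user_id):
        """Remove a user's documents from both storage and the index."""
        doomed = [d for d, meta in self.documents.items() if meta["owner"] == user_id]
        for doc_id in doomed:
            del self.documents[doc_id]
            self.index.pop(doc_id, None)
        return doomed
```

Deleting from both places matters: removing only the index entry leaves the personal data in storage, and removing only the stored file leaves a retrievable embedding behind.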

112

Multi-Selectmedium

A data scientist is building a RAG pipeline for a legal document retrieval system. Which TWO components are essential for this system? (Select two.)

Select 2 answers

A.A large language model for final response generation

B.A vector database for similarity search

C.A fine-tuned LLM for generation

D.An embedding model to vectorize documents

E.A diffusion model for document generation

AnswersB, D

Vector database stores and retrieves document embeddings efficiently.

Why this answer

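Option B is correct because a vector database is essential for storing and efficiently retrieving document embeddings via similarity search (e.g., cosine similarity or Euclidean distance), which is the core retrieval mechanism in a RAG pipeline. Without it, the system cannot quickly find the most relevant document chunks to augment the LLM's context. Option D is correct because an embedding model is what converts both documents and queries into the vectors that the database stores and compares; without it there is nothing to index or search.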

Exam trap

Cisco often tests the misconception that a fine-tuned LLM is required for RAG, when in fact the essential components are the embedding model and vector database for retrieval, while the generator can be any pre-trained LLM.

113

MCQeasy

Which of the following best describes how large language models (LLMs) generate text?

A.They retrieve the most similar text from a database and return it

B.They use a rule-based grammar engine to construct sentences

C.They predict the next token in a sequence based on the preceding tokens

D.They randomly select words from a fixed vocabulary

AnswerC

LLMs use the transformer architecture to model the probability distribution of the next token given the context.

Why this answer

LLMs are trained to predict the next token given the preceding tokens. During inference, they generate one token at a time autoregressively.
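The autoregressive loop can be illustrated with a toy lookup table in place of a neural network. The probabilities are invented, and unlike a real LLM this toy conditions only on the last token rather than on the whole preceding sequence.

```python
NEXT_TOKEN_PROBS = {  # hypothetical toy "model"
    "<s>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"</s>": 1.0},
}

def generate(max_tokens=10):
    """Greedy autoregressive decoding: append the most likely next token each step."""
    tokens = ["<s>"]
    for _ in range(max_tokens):
        dist = NEXT_TOKEN_PROBS.get(tokens[-1])
        if dist is None:
            break
        next_token = max(dist, key=dist.get)
        if next_token == "</s>":
            break
        tokens.append(next_token)
    return tokens[1:]

print(generate())  # ['the', 'cat', 'sat']
```

Each generated token is fed back in as context for the next prediction, which is the defining feature of autoregressive generation.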

114

Multi-Selectmedium

A media company wants to generate video content from text descriptions. They need a Google Cloud solution that can produce high-quality videos with realistic motion. Which service should they consider?

Select 1 answer

A.Chirp

B.Codey

C.Gemini (multimodal)

D.Veo

E.Imagen

AnswersD

Veo is Google's generative video model that creates videos from text prompts.

Why this answer

Veo is Google Cloud's advanced video generation model that creates high-quality videos with realistic motion from text descriptions. It leverages diffusion transformers and temporal coherence techniques to ensure smooth, lifelike movement across frames, making it the correct choice for this use case.

Exam trap

The trap here is that candidates may confuse Imagen (text-to-image) or Gemini (multimodal) as capable of video generation, but only Veo is purpose-built for high-quality video synthesis with realistic motion in Google Cloud's generative AI portfolio.

115

MCQhard

An organization needs to deploy a generative AI application with strict compliance requirements, including data residency and auditability of model decisions. Which Google Cloud feature should they prioritize?

A.Colab Enterprise

B.Gemini API

C.Vertex AI

D.Model Garden

AnswerC

Vertex AI offers deployment options with data residency, audit logging, and governance features.

Why this answer

Vertex AI provides enterprise controls including data residency options, audit logs, and model monitoring for compliance. Other options are important but do not directly address data residency and auditability comprehensively.

116

MCQhard

An organization is building a RAG system using Vertex AI Vector Search. They notice that the retrieved documents are not relevant to the user's query. What is the most likely cause?

A.The context window of the LLM is too small

B.The embedding model used does not capture the semantic meaning of the documents effectively

C.The chunk size of the documents is too large

D.The temperature setting in the LLM is too high

AnswerB

Poor embeddings lead to poor similarity matching.

Why this answer

The most likely cause is that the embedding model fails to map the semantic meaning of the documents and queries into a shared vector space effectively. In Vertex AI Vector Search, retrieval quality depends heavily on the similarity (for example, cosine or dot-product distance) between query and document embeddings; if the embeddings are poor, even a perfect vector index will return irrelevant results.

Exam trap

Cisco often tests the distinction between retrieval-stage failures (embedding quality) and generation-stage parameters (temperature, context window), leading candidates to incorrectly blame LLM settings for poor retrieval results.

How to eliminate wrong answers

Option A is wrong because the context window size affects how much of the retrieved text the LLM can process, not the relevance of the retrieved documents themselves. Option C is wrong because chunk size impacts granularity and potential information loss, but the primary cause of irrelevant retrieval is poor embedding quality, not chunk size alone. Option D is wrong because temperature controls the randomness of the LLM's response generation, not the retrieval step; it has no effect on which documents are fetched from the vector index.

117

MCQmedium

A company wants to use Gemini to process invoices that contain both text and images (scanned documents). The invoices vary in layout. Which Gemini model version should they use?

A.PaLM 2

B.Gemini 1.5 Pro

C.Gemini 1.0 Pro

D.Gemini Nano

AnswerB

Gemini 1.5 Pro handles multimodal inputs and large context windows, perfect for varied invoice layouts.

Why this answer

Gemini 1.5 Pro is the correct choice because it is a multimodal model capable of processing both text and images (scanned documents) natively, and its long context window (up to 1 million tokens) allows it to handle invoices with varying layouts without requiring preprocessing or layout-specific training. This version excels at understanding mixed-format documents, making it ideal for invoice processing where text and visual elements like tables and logos must be interpreted together.

Exam trap

Cisco often tests the misconception that any multimodal model (like Gemini 1.0 Pro) is sufficient for complex document processing, but the trap here is that candidates overlook the importance of the long context window and advanced multimodal reasoning in Gemini 1.5 Pro for handling variable-layout invoices, leading them to choose a less capable version.

How to eliminate wrong answers

Option A is wrong because PaLM 2 is a text-only large language model that cannot process images or scanned documents, lacking the multimodal capabilities required for this use case. Option C is wrong because Gemini 1.0 Pro, while multimodal, has a shorter context window and less robust handling of complex, variable-layout documents compared to Gemini 1.5 Pro, which offers superior performance for mixed-format invoice processing. Option D is wrong because Gemini Nano is designed for on-device, lightweight tasks with limited context and multimodal capabilities, making it unsuitable for enterprise-grade invoice processing that requires handling diverse layouts and high accuracy.

118

MCQeasy

What is the primary benefit of using embeddings and vector search in a generative AI application?

A.They improve the model's ability to generate code

B.They reduce the size of the model by compressing weights

C.They enable efficient retrieval of semantically similar content

D.They allow the model to process images directly

AnswerC

Embeddings allow similarity search in vector space, enabling RAG and other retrieval tasks.

Why this answer

Option C is correct because embeddings convert text into dense vector representations that capture semantic meaning, and vector search enables efficient retrieval of semantically similar content by finding nearest neighbors in vector space. This retrieval-augmented generation (RAG) approach grounds the generative AI model in relevant external knowledge, improving accuracy and reducing hallucinations without retraining.

Exam trap

The trap here is that candidates confuse embeddings and vector search with model optimization or multimodal capabilities, when in fact they are a retrieval mechanism for grounding generative outputs in external knowledge.

How to eliminate wrong answers

Option A is wrong because embeddings and vector search are not specifically designed to improve code generation; they enhance retrieval of any text or data, but code generation benefits more from specialized training data and fine-tuning. Option B is wrong because embeddings and vector search do not reduce model size or compress weights; they operate on the input/output side, storing vectors separately, while model compression is achieved through techniques like pruning or quantization. Option D is wrong because embeddings and vector search primarily handle text or other data types via vector representations, not direct image processing; images require separate vision encoders or multimodal models to be processed directly.

119

MCQeasy

What is the key advantage of using vector search for retrieval in a RAG system compared to keyword search?

A.Vector search eliminates the need for a foundation model

B.Vector search can find conceptually similar documents even without exact keyword matches

C.Vector search is faster than keyword search

D.Vector search requires no preprocessing of documents

AnswerB

Embeddings represent meaning, so vector search retrieves documents that are semantically related, not just keyword matches.

Why this answer

Vector search in a RAG system encodes documents and queries into dense vector embeddings using a foundation model, then retrieves documents based on semantic similarity in the embedding space. This allows it to find conceptually related documents even when they share no exact keywords with the query, overcoming the lexical gap that limits keyword search.

Exam trap

Cisco often tests the misconception that vector search is faster than keyword search, but the trap is that while vector search excels at semantic matching, it incurs higher latency and computational overhead compared to the simple inverted index lookup of keyword search.

How to eliminate wrong answers

Option A is wrong because vector search actually requires a foundation model (or an embedding model) to generate the vector representations; it does not eliminate the need for one. Option C is wrong because vector search is generally slower than keyword search due to the computational cost of embedding generation and approximate nearest neighbor (ANN) search, though it offers better recall. Option D is wrong because vector search requires preprocessing of documents to generate and store embeddings, which is a significant upfront step.
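The lexical gap can be shown with a two-document toy corpus. The 2-d vectors below are hand-made stand-ins for what an embedding model might produce (the values are assumptions chosen so that "car" and "automobile" land close together); no real embedding model is called.

```python
import math

DOCS = {
    "doc1": "How to service your automobile engine",
    "doc2": "Best recipes for summer salads",
}

# Hand-made 2-d vectors standing in for embedding-model output (hypothetical values).
DOC_VECTORS = {"doc1": (0.9, 0.1), "doc2": (0.1, 0.9)}
QUERY = "car maintenance"
QUERY_VECTOR = (0.8, 0.2)

def keyword_search(query):
    """Return documents that share at least one exact word with the query."""
    terms = set(query.lower().split())
    return [d for d, text in DOCS.items() if terms & set(text.lower().split())]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def vector_search(query_vector):
    """Return the document whose vector is most similar to the query vector."""
    return max(DOC_VECTORS, key=lambda d: cosine(query_vector, DOC_VECTORS[d]))

print(keyword_search(QUERY))        # []
print(vector_search(QUERY_VECTOR))  # doc1
```

Keyword search finds nothing because no word overlaps, while the vector comparison still surfaces the automobile document.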

120

MCQhard

A financial services firm wants to use Gemini to analyze customer support transcripts and generate summaries. Compliance requires that the model never output any personally identifiable information (PII). Which combination of techniques should they implement?

A.Configure Gemini safety settings to block PII and use a separate PII detection API for post-processing

B.Fine-tune Gemini on redacted transcripts and rely on the model to not generate PII

C.Use a smaller model that has never seen PII in training

D.Only use prompt instructions telling the model to avoid PII

AnswerA

Safety filters reduce PII in outputs, and a post-processing API (e.g., DLP) redacts any remaining PII.

Why this answer

Using Gemini with safety filters and a post-processing step to redact PII provides defense in depth. Fine-tuning on redacted data might not cover all cases, and prompt instructions alone are not reliable.
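The post-processing layer can be sketched as a redaction pass over the model's summary. The two regular expressions below are illustrative only and are an assumption standing in for a dedicated PII detection service; they catch simple emails and one phone format, and they miss names and many other identifiers.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each matched pattern with a bracketed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Reach Ana at ana.lee@example.com or 555-123-4567."))
# Reach Ana at [EMAIL] or [PHONE].
```

Note that the name "Ana" survives, which is exactly why a purpose-built detection API is the safer choice for the second layer.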

121

MCQmedium

A data scientist is fine-tuning a large language model for a specialized domain using limited labeled data. To avoid catastrophic forgetting and reduce computational cost, which approach is recommended?

A.Using prompt engineering with in-context learning

B.Full fine-tuning of all model parameters

C.Training a new model from scratch on the domain data

D.Adapter-based fine-tuning using LoRA

AnswerD

LoRA is parameter-efficient, cost-effective, and reduces forgetting.

Why this answer

Adapter-based fine-tuning using LoRA (Low-Rank Adaptation) is recommended because it freezes the pre-trained model weights and injects trainable low-rank matrices into the transformer layers. This approach drastically reduces the number of parameters to update (by up to roughly 10,000x in the original LoRA experiments on GPT-3), lowering memory and compute requirements, while leaving the original weights untouched, which helps prevent catastrophic forgetting when training on limited domain data.

Exam trap

Cisco often tests the misconception that prompt engineering (Option A) is a form of fine-tuning, when in fact it is a zero-shot or few-shot inference technique that does not modify model parameters, making it unsuitable for persistent domain adaptation with limited labeled data.

How to eliminate wrong answers

Option A is wrong because prompt engineering with in-context learning does not update model weights, so it cannot adapt the model to a specialized domain with limited labeled data in a persistent manner; it relies on the model's existing knowledge and context window, which is insufficient for deep domain adaptation. Option B is wrong because full fine-tuning of all model parameters updates the entire model, which is computationally expensive and, with limited labeled data, risks catastrophic forgetting of the original pre-trained knowledge. Option C is wrong because training a new model from scratch on domain data requires a massive amount of labeled data and compute resources, defeating the purpose of leveraging a pre-trained LLM and reducing cost.

122

MCQeasy

What is the primary purpose of the transformer architecture in large language models (LLMs)?

A.To generate images from text descriptions

B.To enable parallel processing of tokens and capture long-range dependencies through self-attention

C.To convert text into numerical embeddings for downstream tasks

D.To store and retrieve information from a vector database

AnswerB

Self-attention allows each token to attend to all others, enabling parallelism and long-range context.

Why this answer

The transformer architecture's primary purpose is to enable parallel processing of all tokens in a sequence while capturing long-range dependencies through its self-attention mechanism. Unlike recurrent neural networks (RNNs) that process tokens sequentially, transformers compute attention scores between every pair of tokens simultaneously, allowing the model to weigh the relevance of distant tokens without the vanishing gradient problem. This parallelization and global context capture are the foundational innovations that make large language models (LLMs) scalable and effective for tasks like text generation and understanding.

Exam trap

The trap here is that candidates confuse the transformer's core innovation (parallel self-attention for sequence modeling) with auxiliary tasks like embedding generation or retrieval, which are separate components in an LLM pipeline, leading them to pick C or D as plausible but incorrect answers.

How to eliminate wrong answers

Option A is wrong because generating images from text descriptions is the job of text-to-image models such as Imagen or Stable Diffusion, which are typically diffusion-based; it is not the purpose the transformer serves in an LLM. Option C is wrong because converting text into numerical embeddings is an input step (tokenization and embedding layers) that occurs before the transformer blocks process tokens; the transformer's core role is to model relationships between those embeddings via self-attention, not merely to create embeddings. Option D is wrong because storing and retrieving information from a vector database is part of retrieval-augmented generation (RAG), which supplements LLMs with external knowledge; the transformer architecture itself does not function as a database.
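Scaled dot-product self-attention, the mechanism behind the correct answer, can be sketched in pure Python. This simplified version omits the learned query, key, and value projections, multiple heads, and masking that a real transformer layer uses; it only shows how every position attends to every other position.

```python
import math

def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    # Each query position is computed independently, so this loop could run in parallel.
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs
```

Because each output is a weighted average over all value vectors, a token at the start of the sequence can influence one at the end in a single step, which is how long-range dependencies are captured.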