
CCNA OCI GenAI LLM Fundamentals Questions

75 of 145 questions · Page 1/2 · OCI GenAI LLM Fundamentals topic · Answers revealed


1

Multi-Select · medium

An enterprise is building a document Q&A application with OCI Generative AI. They want to minimize hallucinations. Which TWO techniques should they implement? (Choose two.)

Select 2 answers

A. Fine-tune the model on domain-specific data

B. Increase the model's context window to its maximum

C. Use Retrieval-Augmented Generation (RAG)

D. Use a higher temperature in sampling

E. Apply greedy decoding

Answers: A, C

Fine-tuning improves factual accuracy for the domain, reducing hallucinations.

Why this answer

RAG grounds answers in retrieved documents, and fine-tuning on domain-specific data reduces errors. Increasing context window alone does not reduce hallucinations; greedy decoding reduces creativity but not factual errors.


2

MCQ · hard

A data scientist is comparing BLEU, ROUGE, and BERTScore to evaluate a summarization model. The client cares most about whether the summary captures all key facts from the source document. Which metric is most aligned with this requirement?

A. Perplexity

B. BERTScore

C. BLEU

D. ROUGE

Answer: D

ROUGE recall (e.g., ROUGE-1, ROUGE-L) measures overlap with the reference, directly indicating how many key facts are captured.

Why this answer

ROUGE is recall-oriented, measuring how much of the reference content appears in the generated summary, which aligns with capturing key facts. BLEU is precision-oriented, and BERTScore measures semantic similarity but is not specifically recall-focused.


3

MCQ · easy

Which of the following is a known limitation of large language models where the model generates plausible-sounding but factually incorrect information?

A. Bias

B. Context window limitation

C. Hallucination

D. Knowledge cutoff

Answer: C

Hallucination describes the generation of plausible but false information.

Why this answer

Hallucination is the term used when an LLM produces content that is not grounded in training data or provided context, appearing confident but being incorrect.


4

MCQ · medium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A. Train a custom model from scratch on the policy documents each month

B. Use a larger foundation model with a longer context window and paste all documents into each prompt

C. Fine-tune a base LLM on the policy documents monthly

D. Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

Answer: D

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

RAG (Retrieval-Augmented Generation) allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining. The other options either require expensive retraining for each update or lack document grounding.
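The retrieve-then-prompt flow can be sketched in a few lines. This is only an illustration under stated assumptions: a toy word-overlap retriever stands in for a real embedding model and vector store, and the names `retrieve`, `build_prompt`, and the sample policy sentences are hypothetical, not part of any OCI API.

```python
def retrieve(query, documents, top_n=1):
    """Rank documents by words shared with the query.

    Assumption: word overlap is a stand-in for vector similarity search.
    """
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_n]


def build_prompt(query, documents):
    """Place the retrieved passages in the prompt so the LLM answers from them."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical policy snippets; updating this list needs no model retraining.
policies = [
    "Remote work requires manager approval.",
    "Expense reports are due within 30 days.",
    "Annual leave is 25 days per year.",
]

prompt = build_prompt("When are expense reports due?", policies)
print(prompt)
```

Because the documents live outside the model, replacing an entry in `policies` changes what the next prompt contains without touching model weights.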


5

MCQ · hard

When evaluating a summarization model, the team notices that the ROUGE-L score is high but human evaluators rate the summaries poorly for coherence. What does this discrepancy MOST likely indicate?

A. The model is suffering from hallucination

B. The reference summaries are too short

C. ROUGE-L is not sensitive to coherence and fluency, only to n-gram overlap

D. The human evaluators are biased

Answer: C

ROUGE-L measures longest common subsequence; it does not evaluate readability or logical flow.

Why this answer

ROUGE-L measures longest common subsequence (LCS) overlap with the reference, so it rewards words that appear in the same relative order as in the reference. It does not capture coherence, factual consistency, or fluency. High ROUGE-L with low human scores suggests the summaries are lexically similar to the references but not well-formed.
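A small sketch of why this happens, assuming whitespace tokenization and recall only (published ROUGE-L combines precision and recall into an F-measure). The function names and example sentences are illustrative.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, tok_a in enumerate(a, start=1):
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def rouge_l_recall(candidate, reference):
    """Fraction of reference tokens covered by the LCS with the candidate."""
    cand, ref = candidate.split(), reference.split()
    return lcs_length(cand, ref) / len(ref)


reference = "the cat sat on the mat"
incoherent = "the cat the sat mat on the mat"

# The garbled candidate still contains the whole reference as a subsequence.
print(rouge_l_recall(incoherent, reference))
```

The candidate is not fluent English, yet every reference token is matched in order, which is exactly the gap between ROUGE-L and human coherence ratings.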


6

MCQ · medium

A developer is using the OCI Generative AI service and notices that the cost per API call is higher than expected. Which factor contributes MOST to the cost of an LLM inference call?

A. Number of input and output tokens

B. Model size (number of parameters)

C. Context window size

D. Temperature setting

Answer: A

Pricing is usually based on token usage; more tokens mean higher cost.

Why this answer

Most LLM APIs charge based on the number of input and output tokens. The token count directly affects cost. Model size, context window, and temperature are related but the direct billing metric is token count.


7

MCQ · medium

A developer is implementing a text generation pipeline using OCI Generative AI and needs to produce diverse, creative outputs for a marketing campaign. Which sampling strategy should they choose?

A. Beam search

B. Temperature sampling with temperature=0.0

C. Top-p sampling with p=0.9

D. Greedy decoding

Answer: C

Top-p sampling introduces controlled randomness, making outputs more diverse and creative while maintaining coherence.

Why this answer

Top-p (nucleus) sampling samples from the smallest set of most-probable tokens whose cumulative probability reaches p, allowing diversity while cutting off the long tail of improbable tokens.
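A minimal sketch of the nucleus filter, assuming the next-token distribution is given as a plain dictionary. The token names and probabilities are made up for illustration, and `top_p_filter` is a hypothetical helper, not an OCI SDK call.

```python
import random


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalize so the kept probabilities sum to 1.
    total = sum(prob for _, prob in nucleus)
    return {token: prob / total for token, prob in nucleus}


# Illustrative next-token distribution.
next_token_probs = {"bright": 0.5, "bold": 0.3, "fresh": 0.15, "zebra": 0.05}

nucleus = top_p_filter(next_token_probs, p=0.9)
choice = random.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0]
print(sorted(nucleus), choice)
```

With p=0.9 the unlikely "zebra" is excluded, but three candidates remain, so repeated calls can still produce different tokens.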


8

MCQ · medium

An OCI user wants to generate embeddings for a large corpus of technical documents to enable semantic search. Which type of model should they use?

A. A summarization model

B. A classification model

C. A generation model like Cohere Command

D. An embedding model like Cohere Embed

Answer: D

Cohere Embed is designed to create dense vector embeddings that represent the semantic meaning of text, ideal for semantic search.

Why this answer

Embedding models are specifically designed to produce dense vector representations that capture semantic meaning. They are distinct from generation models. For semantic search, embeddings from an embedding model are compared using cosine similarity.
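The comparison step can be illustrated as follows. The three-dimensional vectors are hand-written stand-ins for what an embedding model would return (real embeddings have hundreds or thousands of dimensions), and the document titles are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Assumption: these vectors were produced by an embedding model beforehand.
doc_vectors = {
    "reset password": [0.9, 0.1, 0.0],
    "configure vpn": [0.1, 0.9, 0.2],
    "holiday schedule": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]

# Semantic search returns the document whose vector is closest to the query.
best_match = max(
    doc_vectors, key=lambda name: cosine_similarity(query_vector, doc_vectors[name])
)
print(best_match)
```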


9

MCQ · hard

A team is deploying a chatbot that must never output harmful or biased statements. They plan to use a pre-trained LLM with in-context learning. Which additional measure is MOST effective at reducing harmful outputs without retraining?

A. Apply a larger context window to include more safety instructions

B. Fine-tune the model on a curated dataset of safe dialogues

C. Use beam search with a high beam width

D. Include few-shot examples in the system prompt that demonstrate appropriate responses

Answer: D

In-context learning with few-shot examples can bias the model toward desired behavior without any model update.

Why this answer

Providing few-shot examples of desired behavior in the system prompt (in-context learning) can guide the model toward safe responses. Fine-tuning would require retraining, and prompt engineering is broader; few-shot examples are a specific, effective technique.


10

Multi-Select · hard

An OCI customer is deploying a chatbot using a pre-trained LLM. They are concerned about the model generating biased or harmful content. Which TWO strategies should they implement as part of their responsible AI approach? (Choose two.)

Select 2 answers

A. Train the model from scratch on a curated dataset

B. Increase the context window to include more examples

C. Set up a human-in-the-loop review for sensitive queries

D. Use top-k sampling with k=1

E. Implement a content filtering layer to detect and block harmful outputs

Answers: C, E

Human review ensures oversight for high-risk interactions.

Why this answer

Content filtering and human review are direct mitigations. Training from scratch is impractical; modifying sampling does not address bias; increasing context window is irrelevant.


11

MCQ · hard

An organization is evaluating two LLMs for a code generation task. Model A has a perplexity of 1.5 on the validation set, and Model B has a perplexity of 3.0. However, Model A generates more syntactically incorrect code. Which conclusion is MOST valid?

A. Model A's perplexity indicates overfitting, so Model B is preferable

B. Model B is better because higher perplexity correlates with more diverse outputs

C. Perplexity alone is insufficient; evaluate with task-specific metrics like BLEU or human review

D. Model A is better because lower perplexity always indicates higher quality outputs

Answer: C

Perplexity does not capture correctness, fluency, or functional validity. Code generation requires additional metrics or manual inspection.

Why this answer

Perplexity measures how well a language model predicts a sequence, but it does not directly assess code correctness or syntactic validity. Model A's lower perplexity (1.5) indicates it is more confident in its predictions, yet it produces more syntactically incorrect code, showing that perplexity alone is not a reliable indicator of code generation quality. Task-specific metrics like BLEU (for n-gram overlap) or human review are necessary to evaluate functional and syntactic correctness, making option C the most valid conclusion.

Exam trap

This exam often tests the misconception that lower perplexity is always better; the trap here is that perplexity measures prediction confidence, not task-specific quality such as code syntax or correctness.

How to eliminate wrong answers

Option A is wrong because lower perplexity does not necessarily indicate overfitting; overfitting is diagnosed by a large gap between training and validation perplexity, not by the absolute value. Option B is wrong because higher perplexity does not correlate with better or more diverse outputs in a way that guarantees code quality; it simply indicates higher uncertainty in predictions. Option D is wrong because lower perplexity does not always indicate higher quality outputs, as demonstrated by Model A generating more syntactically incorrect code despite lower perplexity.
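For reference, perplexity is the exponential of the average negative log-likelihood the model assigns to the actual tokens. The sketch below assumes the per-token probabilities are already available; the probability lists are invented for illustration.

```python
import math


def perplexity(token_probs):
    """exp of the mean negative log-probability assigned to the true tokens."""
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


# Illustrative probabilities a model assigned to each correct next token.
confident_model = [0.9, 0.8, 0.7, 0.6]
uncertain_model = [0.4, 0.3, 0.35, 0.3]

print(perplexity(confident_model), perplexity(uncertain_model))
```

Nothing in this calculation checks whether generated code parses or runs, which is why a low value does not guarantee syntactically valid output.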


12

MCQ · easy

Which of the following is a known limitation of large language models that Retrieval-Augmented Generation (RAG) aims to address?

A. High cost of training

B. Slow inference speed

C. Hallucination of facts

D. Lack of multilingual support

Answer: C

RAG provides external knowledge to reduce fabricated content.

Why this answer

RAG directly addresses the hallucination problem by grounding LLM outputs in retrieved, factual documents from an external knowledge base. Instead of relying solely on the model's parametric memory, RAG fetches relevant context at inference time, reducing the likelihood of generating plausible-sounding but incorrect facts.

Exam trap

This exam often tests the misconception that RAG improves inference speed or reduces training cost, when in fact its primary purpose is to enhance factual accuracy by mitigating hallucinations through external knowledge retrieval.

How to eliminate wrong answers

Option A is wrong because RAG does not reduce training cost; in fact, it adds a retrieval component that may increase infrastructure cost, while the core LLM still requires expensive pre-training. Option B is wrong because RAG typically increases inference latency due to the additional retrieval step (e.g., vector database lookup and document re-ranking), so it does not address slow inference speed. Option D is wrong because RAG is language-agnostic and can be applied to any language supported by the retriever and LLM; it does not specifically target or solve lack of multilingual support.


13

MCQ · easy

Which of the following is a key limitation of large language models that RAG (Retrieval-Augmented Generation) aims to address?

A. Hallucinations (factual errors)

B. Context length constraints

C. Bias in training data

D. Knowledge cutoff date

Answer: A

RAG retrieves factual documents from a knowledge base and provides them as context, significantly reducing the likelihood of the model generating incorrect facts.

Why this answer

RAG addresses hallucinations by grounding the model's output in retrieved documents that contain factual information. Knowledge cutoff, bias, and context length constraints are separate issues that RAG may partially help with, but its primary purpose is to reduce factual errors.


14

MCQ · hard

An OCI Generative AI practitioner observes that a Cohere Command model generates responses with outdated information about a recent event. The model was fine-tuned six months ago. Which technique should be applied to incorporate new knowledge without retraining the model?

A. Use a longer context window and include all new articles in the prompt

B. Fine-tune the model again with the new data

C. Implement a RAG pipeline that indexes the latest documents into a vector store and retrieves relevant passages at query time

D. Increase the temperature parameter to encourage more creative outputs

Answer: C

RAG solves knowledge staleness without retraining.

Why this answer

RAG retrieves relevant, current documents at inference time, providing up-to-date context without modifying the model parameters.


15

MCQ · medium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A. Train a custom model from scratch on the policy documents each month

B. Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

C. Fine-tune a base LLM on the policy documents monthly

D. Use a larger foundation model with a longer context window and paste all documents into each prompt

Answer: B

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

RAG (Retrieval-Augmented Generation) allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining. The other options either require expensive retraining for each update or lack document grounding.


16

MCQ · hard

A team is building a code generation assistant using OCI Generative AI. They notice that the model occasionally produces code with subtle security vulnerabilities. Which approach would most effectively reduce this risk without compromising the assistant's usefulness?

A. Use a larger context window to include all project files in every prompt

B. Switch to a model with more parameters

C. Use greedy decoding to reduce randomness in code generation

D. Fine-tune the model on a dataset of secure code examples and security best practices

Answer: D

Fine-tuning on secure examples helps the model learn to generate safer code by adjusting its weights.

Why this answer

Fine-tuning on a curated dataset of secure code examples can teach the model to avoid common vulnerability patterns while retaining its general coding ability. RAG with security docs could also help, but fine-tuning directly addresses the model's behavior more comprehensively.


17

MCQ · hard

A researcher is comparing BLEU and ROUGE scores for a machine translation model. They notice that the BLEU score is high but the ROUGE score is low. Which scenario is MOST consistent with this observation?

A. The model outputs very long and verbose translations

B. The model outputs concise translations that capture key words but miss some reference phrases

C. The model is overfitting to the training data

D. The reference translations are of poor quality

Answer: B

Concise translations achieve high precision (high BLEU) but low recall (low ROUGE) because they miss some n-grams from the reference.

Why this answer

High BLEU and low ROUGE indicate that the generated text has high precision (many n-grams match reference) but low recall (missing many reference n-grams). This often occurs when the output is short or overly cautious.
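The precision/recall asymmetry can be seen with unigram counts alone. This is a deliberately simplified sketch: full BLEU also uses higher-order n-grams and a brevity penalty, and full ROUGE has several variants, none of which are modeled here. The sentences are invented for illustration.

```python
from collections import Counter


def unigram_precision_recall(candidate, reference):
    """Clipped unigram overlap divided by candidate length and by reference length."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum((cand & ref).values())  # each match is clipped to the reference count
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return precision, recall


reference = "the meeting was moved to friday because the venue was unavailable"
concise = "meeting moved to friday"

precision, recall = unigram_precision_recall(concise, reference)
print(precision, recall)
```

Every word of the short candidate appears in the reference, so precision is perfect, while most reference words are missing from the candidate, so recall is low.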


18

MCQ · easy

Which of the following best describes the difference between an encoder-only model (e.g., BERT) and a decoder-only model (e.g., GPT)?

A. Encoder-only uses bidirectional attention and is suited for classification or NER; decoder-only uses causal attention and is suited for text generation

B. Encoder-only is trained for text generation; decoder-only is trained for classification

C. Both use the same attention pattern but differ in number of layers

D. Encoder-only uses causal attention; decoder-only uses bidirectional attention

Answer: A

Correct distinction between the two architectures.

Why this answer

Option A is correct because encoder-only models like BERT employ bidirectional attention, allowing each token to attend to all other tokens in both directions, which is ideal for tasks requiring full context understanding such as classification or named entity recognition (NER). In contrast, decoder-only models like GPT use causal (masked) attention, where each token can only attend to previous tokens, making them suitable for autoregressive text generation.

Exam trap

This exam often tests the reversal of attention patterns (bidirectional vs. causal) as a common trap, leading candidates to confuse which architecture is suited for generation versus understanding tasks.

How to eliminate wrong answers

Option B is wrong because encoder-only models are not trained for text generation; they are typically trained for understanding tasks (e.g., masked language modeling) and fine-tuned for classification or NER, while decoder-only models are trained for text generation via autoregressive language modeling. Option C is wrong because encoder-only and decoder-only models use fundamentally different attention patterns (bidirectional vs. causal), not the same pattern with differing layer counts. Option D is wrong because it reverses the attention patterns: encoder-only uses bidirectional attention, not causal, and decoder-only uses causal attention, not bidirectional.
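The two attention patterns differ only in which positions each token is allowed to see. A minimal sketch, with `attention_mask` as a hypothetical helper that returns a boolean matrix rather than any particular framework's tensor:

```python
def attention_mask(seq_len, causal):
    """mask[i][j] is True when position i may attend to position j."""
    return [
        [(j <= i) if causal else True for j in range(seq_len)]
        for i in range(seq_len)
    ]


bidirectional = attention_mask(3, causal=False)  # encoder-style: every token sees all tokens
causal = attention_mask(3, causal=True)          # decoder-style: no access to later tokens

for row in causal:
    print(["x" if allowed else "." for allowed in row])
```

The causal mask is lower-triangular, which is what lets a decoder-only model generate text one token at a time without peeking at tokens it has not produced yet.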


19

Multi-Select · medium

Which THREE of the following are known limitations of LLMs that practitioners must account for?

Select 3 answers

A. Knowledge cutoff: the model only knows information up to its training data date

B. Bias: training data may contain societal biases that the model can amplify

C. Hallucination: generating plausible-sounding but factually incorrect information

D. Inability to produce creative text

E. Complete lack of understanding of language syntax

Answers: A, B, C

LLMs have no inherent knowledge of events after their training cutoff.

Why this answer

LLMs can produce hallucinations (factual errors), have a knowledge cutoff date, and can exhibit bias from training data. They do not inherently lack creativity, and they model language syntax well, so options D and E do not describe real limitations.


20

MCQ · easy

Which of the following best describes the difference between pre-training and fine-tuning?

A. Pre-training uses labeled data; fine-tuning uses unlabeled data

B. Pre-training requires a GPU; fine-tuning can be done on CPU

C. Pre-training trains a model from scratch on a general corpus; fine-tuning adapts the model to a specific task

D. Pre-training is only for encoder models; fine-tuning is only for decoder models

Answer: C

Pre-training is the initial phase, fine-tuning is the task-specific phase.

Why this answer

Pre-training involves training a large language model from scratch on a vast, general corpus of unlabeled data to learn language patterns, grammar, and world knowledge. Fine-tuning then takes this pre-trained model and further trains it on a smaller, labeled dataset specific to a downstream task, such as sentiment analysis or question answering, adapting the model's weights for that particular objective. Option C correctly captures this fundamental distinction in the LLM workflow.

Exam trap

This exam often tests the misconception that pre-training uses labeled data (like supervised learning) and fine-tuning uses unlabeled data, reversing the actual roles of data types in these phases.

How to eliminate wrong answers

Option A is wrong because pre-training typically uses unlabeled data in a self-supervised manner (e.g., masked language modeling), while fine-tuning uses labeled data for supervised learning on a specific task. Option B is wrong because hardware is not what separates the two phases: pre-training is far more compute-intensive, but fine-tuning an LLM is also normally run on GPUs. Option D is wrong because both encoder models (e.g., BERT) and decoder models (e.g., GPT) can undergo pre-training and fine-tuning; the distinction is not tied to architecture type.


21

MCQ · medium

Which of the following sampling strategies is most likely to produce the most diverse and creative text?

A. Beam search with width 5

B. Greedy decoding

C. Top-p sampling with p=0.1

D. Temperature sampling with temperature=1.2

Answer: D

High temperature flattens the distribution, increasing diversity.

Why this answer

Temperature sampling with a higher temperature (>1) increases the probability of less likely tokens, promoting creativity and diversity.
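The flattening effect can be checked numerically. The sketch below assumes three made-up logits and applies temperature by dividing the logits before the softmax.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for greedy decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # illustrative scores for three candidate tokens

cool = softmax_with_temperature(logits, 0.5)
warm = softmax_with_temperature(logits, 1.2)
print(cool, warm)
```

At temperature 1.2 the top token loses probability mass and the least likely token gains it, compared with temperature 0.5, so sampling visits unusual tokens more often.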


22

MCQ · medium

When using an LLM for code generation, a developer notices the model occasionally produces syntactically incorrect code. Which approach is most likely to reduce syntax errors while still allowing diverse output?

A. Use top-k sampling with k=100

B. Increase the context window size

C. Set temperature to 0 and use greedy decoding

D. Increase the temperature to 1.5

Answer: C

Greedy decoding (temperature=0) is deterministic and lowers syntax errors.

Why this answer

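Lowering temperature reduces randomness, making outputs more deterministic and less prone to syntax errors. Note that at temperature 0 decoding is greedy and fully deterministic, so diversity is given up; among the listed options it is the only one that reduces errors.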


23

MCQ · easy

An organization wants to use an LLM for translation tasks but is concerned about data privacy and wants to keep all data within OCI. Which model family is natively available in OCI Generative AI service?

A. Gemini from Google

B. GPT-4 from OpenAI

C. Claude from Anthropic

D. Oracle's own models (e.g., OCI Generative AI models)

Answer: D

OCI Generative AI service includes Oracle's own models along with Cohere, Llama, etc.

Why this answer

Oracle's OCI Generative AI service natively provides Oracle's own LLM models, such as the OCI Generative AI models, which are hosted entirely within OCI's infrastructure. This ensures that all data processing and storage remain within OCI, addressing the organization's data privacy concerns. Third-party models like Gemini, GPT-4, and Claude are not natively available in OCI Generative AI and would require external API calls, violating the data residency requirement.

Exam trap

This exam often tests the misconception that OCI Generative AI natively hosts popular third-party models like GPT-4 or Gemini; among the listed options, only D names models served within OCI, which is what keeps the data inside OCI.

How to eliminate wrong answers

Option A is wrong because Gemini from Google is a third-party model not natively hosted within OCI Generative AI; using it would require data to leave OCI, breaking data privacy. Option B is wrong because GPT-4 from OpenAI is an external model accessed via API, which would transmit data outside OCI, conflicting with the requirement to keep all data within OCI. Option C is wrong because Claude from Anthropic is also a third-party model not natively integrated into OCI Generative AI, necessitating external data transfer.


24

Multi-Select · medium

A data scientist is evaluating an LLM for a summarization task. They have a set of human-written reference summaries. Which THREE metrics are commonly used to evaluate summarization quality? (Choose three.)

Select 3 answers

A. BLEU

B. Cosine similarity

C. Perplexity

D. BERTScore

E. ROUGE

Answers: A, D, E

BLEU is precision-oriented and often used alongside ROUGE.

Why this answer

ROUGE, BLEU, and BERTScore are all used for summarization evaluation. Perplexity measures model confidence, and cosine similarity is for embedding comparison.


25

MCQ · medium

Which of the following best describes the difference between an embedding model and a generation model?

A. Embedding models cannot be used with RAG; generation models can

B. Embedding models output a single vector per input; generation models output a sequence of tokens

C. Embedding models are used only for classification; generation models are used only for translation

D. Embedding models require fine-tuning; generation models do not

Answer: B

Embedding models map text to a fixed-size vector; generation models produce variable-length token sequences.

Why this answer

Option B is correct because embedding models transform input data (text, images, etc.) into a fixed-size vector representation (a single vector per input) that captures semantic meaning, while generation models (like GPT or LLaMA) produce variable-length sequences of tokens as output, predicting one token at a time. This fundamental architectural difference defines their distinct roles: embeddings for similarity search and retrieval, generation for producing coherent text.

Exam trap

This exam often tests the misconception that embedding models are only for classification or that they cannot be used in RAG, when in fact they are essential for the retrieval step, and the key differentiator is the output type: a single vector versus a sequence of tokens.

How to eliminate wrong answers

Option A is wrong because embedding models are actually a core component of RAG (Retrieval-Augmented Generation), used to encode documents and queries into vectors for retrieval; generation models then use the retrieved context to produce answers. Option C is wrong because embedding models are not limited to classification—they are used for clustering, semantic search, and as input to other models—and generation models are not limited to translation; they handle summarization, question answering, creative writing, and more. Option D is wrong because both embedding models and generation models can be used without fine-tuning (e.g., via API calls or pre-trained weights), and fine-tuning is optional for both depending on the use case.


26

Multi-Select · medium

A developer is building a text generation application using OCI Generative AI and wants to control the creativity of the output. Which THREE sampling parameters can they adjust? (Choose three.)

Select 3 answers

A. Top-k

B. Top-p

C. Max tokens

D. Temperature

E. Beam search width

Answers: A, B, D

Top-k limits the sampling pool to the k most likely tokens, controlling diversity.

Why this answer

Temperature scales the logits before softmax. Top-k limits the sampling pool to the k most likely tokens. Top-p (nucleus) sampling selects from the smallest set of tokens whose cumulative probability reaches p.

Beam search is a decoding strategy (not a sampling parameter), and max tokens controls output length rather than creativity.
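Top-k filtering is the simplest of the three to sketch. The distribution below is invented, and `top_k_filter` is a hypothetical helper rather than a service API.

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize their probabilities."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(prob for _, prob in ranked)
    return {token: prob / total for token, prob in ranked}


next_token_probs = {"a": 0.6, "b": 0.3, "c": 0.1}  # illustrative values

pool_of_two = top_k_filter(next_token_probs, k=2)
pool_of_one = top_k_filter(next_token_probs, k=1)
print(pool_of_two, pool_of_one)
```

With k=1 only the single most likely token survives, so sampling from the pool behaves like greedy decoding; larger k leaves more room for variety.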


27

MCQ · medium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A. Train a custom model from scratch on the policy documents each month

B. Fine-tune a base LLM on the policy documents monthly

C. Use a larger foundation model with a longer context window and paste all documents into each prompt

D. Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

Answer: D

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

RAG (Retrieval-Augmented Generation) allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining. The other options either require expensive retraining for each update or lack document grounding.


28

Multi-Select · medium

A data scientist is evaluating an LLM's performance on a summarization task. Which TWO metrics are most suitable for this evaluation?

Select 2 answers

A. Perplexity

B. Human evaluation

C. BLEU

D. ROUGE

E. BERTScore

Answers: D, E

ROUGE is specifically designed for summarization evaluation, measuring n-gram overlap and recall.

Why this answer

ROUGE measures recall-oriented overlap between generated and reference summaries, suitable for summarization. BERTScore uses semantic similarity via embeddings. BLEU is for translation, perplexity for language modeling, and human evaluation is qualitative.


29

MCQ · medium

A team is deploying an LLM-based application that must adhere to strict data residency requirements. All processing must occur within a specific OCI region. Which OCI service should they use to host and serve the LLM?

A. OCI Streaming

B. OCI Functions

C. OCI Object Storage

D. OCI Data Science with a model deployment endpoint

Answer: D

OCI Data Science provides model deployment capabilities within a chosen region, ensuring data stays in that region.

Why this answer

OCI Data Science with a model deployment endpoint is the correct choice because it provides a managed infrastructure for hosting and serving LLMs within a specific OCI region, ensuring all processing and data remain within that region to meet strict data residency requirements. The model deployment endpoint runs on dedicated compute resources in the chosen region, allowing low-latency inference while adhering to regional data boundaries.

Exam trap

This exam often tests the distinction between storage, compute, and serving services, and the trap here is that candidates may confuse OCI Object Storage (which stores model artifacts) with the actual serving infrastructure needed to run inference, or assume OCI Functions can handle LLM workloads despite its stateless, memory-constrained design.

How to eliminate wrong answers

Option A is wrong because OCI Streaming is a real-time data ingestion and processing service for streaming data (e.g., logs, events), not designed to host or serve LLMs; it lacks the compute and inference capabilities required for model serving. Option B is wrong because OCI Functions is a serverless compute service for running stateless code snippets (functions) in response to events, but it is not optimized for hosting large language models due to cold start latency, limited memory, and lack of GPU support. Option C is wrong because OCI Object Storage is a scalable storage service for unstructured data (e.g., model artifacts, datasets), but it cannot serve inference requests or run compute workloads; it is used to store model files, not to host or serve the LLM.


30

Multi-Select · easy

Which TWO of the following are true about positional encoding in transformer models?

Select 2 answers

A. It is learned independently for each position in the training data

B. It provides the model with information about the order of tokens in a sequence

C. It is added to the token embeddings before passing through the transformer layers

D. It replaces the need for self-attention

E. It is only used in encoder-only architectures like BERT

Answers: B, C

Since self-attention is permutation-invariant, positional encodings are necessary to capture sequence order.

Why this answer

Positional encoding provides information about token order and is added to the input embeddings. It is used across encoder-only, encoder-decoder, and decoder-only models, not just BERT-style encoders. It does not replace attention, and in the original Transformer formulation it is not learned per position (it uses fixed sine/cosine functions).
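The fixed sine/cosine scheme from the original Transformer formulation can be sketched in plain Python. The embedding values are placeholders, and the small `d_model` of 4 is chosen only to keep the output readable.

```python
import math


def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed encodings: sine on even dimensions, cosine on odd dimensions."""
    encoding = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            encoding[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                encoding[pos][i + 1] = math.cos(angle)
    return encoding


pe = sinusoidal_positional_encoding(seq_len=3, d_model=4)

# Placeholder token embeddings; the encoding is added element-wise before the first layer.
token_embeddings = [[0.1, 0.2, 0.3, 0.4]] * 3
model_inputs = [
    [e + p for e, p in zip(emb_row, pe_row)]
    for emb_row, pe_row in zip(token_embeddings, pe)
]
print(model_inputs[0], model_inputs[1])
```

The three input rows start from identical embeddings but end up different, which is how the model can tell positions apart.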

31

Multi-Selecthard

An OCI user wants to reduce the cost of running a generative AI model while maintaining output quality. Which THREE strategies can help achieve this?

Select 3 answers

A.Use greedy decoding instead of sampling

B.Use a smaller model from the same family

C.Increase max tokens to ensure complete answers

D.Implement caching for repeated queries

E.Optimize prompts to be more concise

AnswersB, D, E

Smaller models are cheaper to run.

Why this answer

Option B is correct because using a smaller model from the same family (e.g., switching from Llama 3 70B to Llama 3 8B) reduces the number of parameters and computational resources required per inference, directly lowering cost. Smaller models often retain strong performance on many tasks, especially when the task complexity does not demand the full capacity of the larger model, thus maintaining output quality while reducing token processing costs.

Exam trap

Cisco often tests the misconception that greedy decoding reduces cost (it does not—it only changes the decoding strategy, not the model size or token count) and that increasing max tokens improves quality (it actually increases cost and can degrade output by encouraging rambling).
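Option D (caching repeated queries) can be sketched in a few lines. This is a minimal in-memory example using Python's standard `functools.lru_cache`; `call_model` is a hypothetical stand-in for a billed inference call, and a cache like this is only safe when identical prompts are expected to return identical answers.

```python
from functools import lru_cache

def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a billed LLM inference request."""
    return f"response to: {prompt}"

@lru_cache(maxsize=1024)
def cached_generate(prompt: str) -> str:
    """Serve repeated prompts from memory instead of re-running inference."""
    return call_model(prompt)

# The second identical prompt is answered from the cache, not the model.
cached_generate("What is OCI Generative AI?")
cached_generate("What is OCI Generative AI?")
stats = cached_generate.cache_info()  # one miss (first call), one hit (second call)
```

In a real deployment the cache would more likely live in a shared store, and the cache key might normalize whitespace or casing before lookup.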

32

MCQeasy

Which tokenization algorithm is commonly used by models like GPT and RoBERTa, and works by merging frequently occurring character pairs iteratively?

A.Morpheme-based tokenization

B.Byte-Pair Encoding (BPE)

C.SentencePiece

D.WordPiece

AnswerB

BPE iteratively merges the most frequent character pairs to build a vocabulary of subword tokens.

Why this answer

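Byte-Pair Encoding (BPE) starts with characters and repeatedly merges the most frequent adjacent pair to create subword units. WordPiece (the tokenizer used by BERT) builds its vocabulary with a similar merge procedure but chooses merges by a likelihood-based score rather than raw pair frequency. SentencePiece is a framework that can use BPE or unigram.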
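The merge loop described above can be sketched as follows. This is a simplified, character-level illustration (production tokenizers usually work on bytes and add pre-tokenization); the function names and the toy word-frequency corpus are assumptions for this example.

```python
from collections import Counter

def most_frequent_pair(words: dict[tuple[str, ...], int]) -> tuple[str, str]:
    """Count adjacent symbol pairs, weighted by word frequency, and return the top one."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words: dict[tuple[str, ...], int], pair: tuple[str, str]) -> dict[tuple[str, ...], int]:
    """Replace every occurrence of the pair with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged

def learn_bpe(corpus: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Learn an ordered list of merges from a word -> frequency table."""
    words = {tuple(word): freq for word, freq in corpus.items()}
    merges = []
    for _ in range(num_merges):
        if all(len(symbols) < 2 for symbols in words):
            break  # nothing left to merge
        pair = most_frequent_pair(words)
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges

merges = learn_bpe({"hug": 10, "pug": 5, "pun": 12, "bun": 4, "hugs": 5}, 3)
# merges == [("u", "g"), ("u", "n"), ("h", "ug")]
```

Each learned merge becomes a vocabulary entry, so frequent fragments such as "ug" end up as single tokens while rare words are still representable from smaller pieces.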

33

MCQhard

A data scientist is evaluating a summarization model on a news article dataset. They compute ROUGE-L and BLEU scores. The ROUGE-L score is high, but the BLEU score is low. Which of the following best explains this discrepancy?

A.The BLEU score is unreliable for summarization tasks

B.The model is overfitting to the training data

C.The summaries are very short and use different vocabulary from the reference

D.The summaries are too long and contain many irrelevant n-grams

AnswerC

Short summaries that capture key content (high ROUGE-L) but with different word choices (low n-gram overlap) cause low BLEU.

Why this answer

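ROUGE-L is based on the longest common subsequence (words that appear in the same order, though not necessarily contiguously), so it rewards recall of key content. BLEU measures n-gram precision and applies a brevity penalty, so it penalizes outputs that are short or that phrase things differently from the reference. A high ROUGE-L with a low BLEU therefore suggests the summaries capture the main ideas but use different phrasing or are too short.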
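To make the longest-common-subsequence idea concrete, here is a minimal sketch of ROUGE-L recall using whitespace tokenization. Real ROUGE implementations add stemming, an F-measure, and other details, so treat the function names and the tokenization as simplifying assumptions.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence (in-order, gaps allowed)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_recall(candidate: str, reference: str) -> float:
    """Fraction of reference words covered by the longest common subsequence."""
    cand, ref = candidate.split(), reference.split()
    return lcs_length(cand, ref) / len(ref) if ref else 0.0

# "the cat ... on the mat" is shared in order: 5 of the 6 reference words.
score = rouge_l_recall("the cat lay on the mat", "the cat sat on the mat")
```

Note that the one changed word breaks several bigrams and longer n-grams, which is the kind of difference that pulls BLEU down while ROUGE-L stays high.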

34

Multi-Selectmedium

An enterprise is deploying an LLM application on OCI and must minimize hallucinations. Which TWO strategies should they implement? (Choose two.)

Select 2 answers

A.Apply prompt engineering techniques such as asking the model to cite sources

B.Increase the temperature parameter to encourage more diverse outputs

C.Use a smaller model to reduce complexity

D.Fine-tune the model on a dataset of factual question-answer pairs

E.Implement Retrieval-Augmented Generation (RAG) with a curated knowledge base

AnswersA, E

Prompt engineering can guide the model to rely on provided context and cite sources, reducing hallucinations.

Why this answer

RAG grounds the model in retrieved documents, and prompt engineering (e.g., asking the model to cite sources) can reduce hallucinations. Fine-tuning on factual data helps but may not eliminate hallucinations entirely. Increasing temperature increases randomness, which can worsen hallucinations. Using a smaller model typically reduces capability, making hallucinations more likely.
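One way options A and E might be combined is a prompt template that injects retrieved passages and asks for citations. The wording of the instructions and the function name below are illustrative assumptions, not an OCI-specific API.

```python
def build_grounded_prompt(question: str, passages: list[str]) -> str:
    """Assemble a prompt that restricts the model to numbered sources and asks for citations."""
    sources = "\n".join(f"[{i}] {text}" for i, text in enumerate(passages, 1))
    return (
        "Answer using only the numbered sources below. "
        "Cite the source number after each claim. "
        "If the sources do not contain the answer, say you do not know.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_grounded_prompt(
    "How many vacation days do employees get?",
    ["Employees receive 20 vacation days per year.", "Unused days expire in March."],
)
```

The explicit "say you do not know" instruction gives the model an acceptable alternative to inventing an answer when retrieval comes back empty or off-topic.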

35

Multi-Selectmedium

A team wants to compare the semantic similarity between two sentences using embeddings. Which THREE steps are required?

Select 3 answers

A.Use a text generation model to compare the sentences

B.Normalize the resulting vectors to unit length

C.Compute the cosine similarity between the two vectors

D.Train a new neural network on the two sentences

E.Pass both sentences through an embedding model to obtain dense vector representations

AnswersB, C, E

Normalization ensures that cosine similarity is equivalent to the dot product of the normalized vectors.

Why this answer

Generate embeddings for both sentences, then compute cosine similarity. Normalizing vectors ensures cosine similarity equals the dot product. Training a new model is unnecessary, and using a generation model is not appropriate for embeddings.
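Steps B and C can be sketched in plain Python. The vectors here are made-up stand-ins for the output of an embedding model (step E), which is not called in this example.

```python
import math

def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity computed as the dot product of unit-length vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

# Hypothetical embeddings for two sentences; [2, 4] points the same way as [1, 2].
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])   # close to 1.0
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])       # 0.0
```

Because the vectors are normalized first, their magnitudes drop out and only direction affects the score.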

36

Multi-Selectmedium

An ML engineer is choosing an LLM for a code generation assistant. The model must generate syntactically correct code, handle multiple programming languages, and be cost-efficient. Which THREE characteristics should the engineer prioritize?

Select 3 answers

A.Model with a large context window (e.g., 128K tokens)

B.Model optimized for text summarization tasks

C.Model that achieves high pass@k on coding benchmarks like HumanEval

D.Model trained on a diverse multilingual code corpus

E.Model with the highest number of parameters regardless of performance

AnswersA, C, D

Large context allows the model to consider entire files, improving correctness.

Why this answer

A large context window (e.g., 128K tokens) is critical for code generation because it allows the model to process entire code files, long dependency chains, and multi-file projects in a single pass. This reduces the risk of syntax errors caused by missing imports or incomplete function definitions, and it supports the generation of coherent, syntactically correct code across multiple languages without truncation.

Exam trap

Cisco often tests the misconception that more parameters always mean better performance, but in code generation, benchmark scores (like pass@k), training data diversity, and efficiency matter more than raw parameter count.
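For readers unfamiliar with pass@k, the sketch below shows the unbiased estimator commonly reported with HumanEval-style benchmarks: generate n samples per problem, count the c samples that pass the unit tests, and estimate the chance that at least one of k drawn samples is correct. The function and argument names are illustrative.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n generated samples, c of which passed the tests."""
    if n - c < k:
        return 1.0  # every size-k subset must contain at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 samples and 3 passing, pass@1 is 0.3 and pass@5 is considerably higher.
p1 = pass_at_k(10, 3, 1)
p5 = pass_at_k(10, 3, 5)
```

The per-problem estimates are then averaged over the benchmark to get the reported score.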

37

MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

B.Fine-tune a base LLM on the policy documents monthly

C.Use a larger foundation model with a longer context window and paste all documents into each prompt

D.Train a custom model from scratch on the policy documents each month

AnswerA

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

RAG (Retrieval-Augmented Generation) allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining. The other options either require expensive retraining for each update or lack document grounding.
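The "knowledge stays current without retraining" point can be illustrated with a toy in-memory index: updating a document only replaces its stored vector. The bag-of-words `embed` function is a deliberately crude stand-in for a real embedding model, and `PolicyIndex` is a hypothetical class, not a real vector store API.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class PolicyIndex:
    """Hypothetical in-memory vector store keyed by document id."""

    def __init__(self) -> None:
        self.docs: dict[str, tuple[str, Counter]] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        # Re-indexing a changed document is just an overwrite; no model weights change.
        self.docs[doc_id] = (text, embed(text))

    def retrieve(self, query: str, top_k: int = 1) -> list[str]:
        q = embed(query)
        ranked = sorted(self.docs.values(), key=lambda d: cosine(q, d[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]

index = PolicyIndex()
index.upsert("leave", "employees receive 20 days of annual leave")
index.upsert("travel", "travel expenses require manager approval")
index.upsert("leave", "employees receive 25 days of annual leave")  # monthly policy update
context = index.retrieve("how many days of annual leave")
```

The retrieved text would then be placed in the prompt, so the next answer reflects the updated policy immediately.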

38

MCQeasy

Which of the following is a recognized limitation of large language models?

A.They always require internet access

B.They cannot perform translation between languages

C.They can generate factually incorrect information (hallucinations)

D.They can only process numerical data

AnswerC

Hallucinations are a key limitation where models produce confident but false statements.

Why this answer

Option C is correct because large language models (LLMs) are known to generate factually incorrect information, often called 'hallucinations.' This occurs because LLMs are probabilistic models that predict the next token based on training data patterns, not verified facts. They lack a grounding mechanism to validate outputs against real-world truth, making hallucinations a fundamental limitation.

Exam trap

Cisco often tests the misconception that LLMs are infallible or always correct, leading candidates to overlook the well-documented hallucination problem in favor of incorrect assumptions about connectivity or data types.

How to eliminate wrong answers

Option A is wrong because many LLMs can run locally without internet access (e.g., offline inference with models like Llama 2 or tools like GPT4All), though some cloud-based APIs require connectivity. Option B is wrong because LLMs excel at translation between languages, as demonstrated by models like GPT-4 and Google's PaLM 2 supporting dozens of language pairs. Option D is wrong because LLMs are not limited to numerical data; they process text tokens (words, subwords) via transformer architectures and handle natural language, code, and symbolic inputs.

39

MCQmedium

A data scientist is fine-tuning a Llama 2 model on a custom dataset for a summarization task. After fine-tuning, the model produces summaries that are too similar to the input text, often copying sentences verbatim. Which adjustment is MOST likely to reduce copying and improve abstractive summarization?

A.Increase the temperature to 0.8 and use top-p sampling

B.Switch to greedy decoding

C.Increase the beam search width

D.Reduce the context window size

AnswerA

Higher temperature flattens the probability distribution, making the model more likely to generate novel phrases rather than copying.

Why this answer

Raising the temperature increases randomness, reducing the likelihood of exact copying. Greedy decoding and beam search both favor high-probability tokens, which can lead to copy behavior. Top-k and top-p control sampling but do not directly address copying if temperature is too low.

40

Multi-Selecthard

A developer is debugging a RAG pipeline where the LLM frequently ignores retrieved documents and produces hallucinations. Which THREE factors could contribute to this problem?

Select 3 answers

A.The generation temperature is set to 0

B.The prompt does not explicitly instruct the model to base its answer on the provided context

C.The retrieval top-K parameter is set too low

D.The embedding model's output dimension is too large

E.The chunking strategy uses zero overlap between consecutive chunks

AnswersB, C, E

If the prompt does not instruct the model to use the retrieved documents, it may default to its internal knowledge.

Why this answer

Zero chunk overlap can split context across chunk boundaries, missing prompt instructions may lead the model to ignore retrieved content, and a small top-K retrieval may miss relevant documents. A large embedding dimension does not by itself hurt retrieval, and a temperature of 0 makes output deterministic but does not cause the model to ignore documents.
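The effect of chunk overlap (option E) is easy to see with a small word-based chunker. Splitting on whitespace and measuring sizes in words are simplifying assumptions; real pipelines often chunk by tokens or sentences.

```python
def chunk_words(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into word chunks where consecutive chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

text = "refunds are issued within 14 days of purchase"
no_overlap = chunk_words(text, 4, 0)    # "within 14 days" is split across two chunks
with_overlap = chunk_words(text, 4, 2)  # some chunk contains "within 14 days" intact
```

With zero overlap, a fact that straddles a boundary may not appear whole in any single chunk, so retrieval can return text that looks relevant but lacks the detail the model needs.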

41

MCQmedium

A data scientist needs to compare two summaries generated by different models against a reference summary. Which metric focuses on recall of n-grams and is commonly used for summarization evaluation?

A.BLEU

B.Perplexity

C.ROUGE

D.BERTScore

AnswerC

ROUGE emphasizes recall and is widely used for summarization.

Why this answer

ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is the correct metric because it focuses on recall of n-grams, measuring how many of the reference summary's n-grams appear in the generated summary. This makes it the standard metric for summarization evaluation, as it directly assesses content coverage.

Exam trap

Cisco often tests the distinction between BLEU (precision-focused, for translation) and ROUGE (recall-focused, for summarization), so the trap here is confusing BLEU's n-gram precision with ROUGE's n-gram recall, especially since both use n-gram overlap.

How to eliminate wrong answers

Option A is wrong because BLEU (Bilingual Evaluation Understudy) emphasizes precision of n-grams, not recall, and was designed for machine translation evaluation, not summarization. Option B is wrong because Perplexity measures how well a language model predicts a sequence, not the overlap of n-grams between summaries, and is used for language model evaluation, not summarization quality. Option D is wrong because BERTScore uses contextual embeddings from BERT to compute semantic similarity, not n-gram recall, and while it can be used for summarization, it is not the metric that focuses on recall of n-grams.
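The recall-versus-precision distinction can be shown with the same overlap count divided by two different denominators. This is a bare-bones sketch with whitespace tokenization; it is not a full ROUGE or BLEU implementation, and the function names are illustrative.

```python
from collections import Counter

def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def overlap_scores(candidate: str, reference: str, n: int = 1) -> tuple[float, float]:
    """Return (recall, precision) of candidate n-grams against the reference."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    matched = sum((cand & ref).values())  # overlap, clipped to the smaller count
    recall = matched / sum(ref.values()) if ref else 0.0        # ROUGE-N style
    precision = matched / sum(cand.values()) if cand else 0.0   # BLEU-style building block
    return recall, precision

# A short summary: everything it says is in the reference, but it covers only half of it.
recall, precision = overlap_scores("the cat sat", "the cat sat on the mat")
# recall == 0.5, precision == 1.0
```

A summary that omits key facts keeps a high precision but loses recall, which is why recall-oriented ROUGE is the usual choice for checking content coverage.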

42

MCQmedium

A developer notices that a text generation model produces repetitive phrases when using greedy decoding. Which sampling strategy would best introduce controlled randomness to reduce repetition while maintaining coherence?

A.Use greedy decoding but with a repetition penalty

B.Apply temperature sampling with temperature > 1

C.Increase the beam width in beam search

D.Disable top-k sampling

AnswerB

Higher temperature flattens the probability distribution, making less likely tokens more probable, reducing repetition while maintaining coherence.

Why this answer

Temperature sampling scales the logits before applying softmax, controlling the randomness of token selection. A temperature > 1 increases diversity, reducing repetition, while still allowing the model to generate coherent text.
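The scaling step described above looks like this in code. The logits are made-up values and the function name is illustrative; real inference stacks apply the same idea to the full vocabulary.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, 0.5)  # mass concentrates on the top token
flat = softmax_with_temperature(logits, 2.0)   # lower-ranked tokens gain probability
```

With a temperature above 1 the distribution flattens, so the sampler is less likely to keep picking the same high-probability continuation; pushing it too high trades coherence for diversity.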

43

MCQmedium

A company needs to generate embeddings for a large corpus of legal documents to enable semantic search. Which type of model should they use?

A.An encoder-only embedding model like Cohere Embed

B.A decoder-only generation model like GPT

C.A text-to-speech model

D.A machine translation model

AnswerA

Embedding models are specifically trained to output high-quality embeddings for similarity.

Why this answer

An encoder-only embedding model like Cohere Embed is designed to convert text into dense vector representations (embeddings) that capture semantic meaning, which is exactly what is needed for semantic search over a large corpus of legal documents. These models use a bidirectional transformer architecture to encode context from both directions, producing fixed-size embeddings that can be efficiently compared using cosine similarity or other distance metrics.

Exam trap

Cisco often tests the misconception that any large language model (LLM) can generate embeddings, but the trap here is that decoder-only models (like GPT) are fundamentally designed for generation, not for producing fixed-size, bidirectional embeddings suitable for semantic search.

How to eliminate wrong answers

Option B is wrong because decoder-only generation models like GPT are optimized for autoregressive text generation, not for producing fixed-size embeddings; they lack the bidirectional context needed for high-quality semantic representations. Option C is wrong because text-to-speech models convert text into audio waveforms, which is irrelevant for generating text embeddings for semantic search. Option D is wrong because machine translation models are designed to map text from one language to another, not to produce general-purpose embeddings for similarity search.

44

MCQmedium

A data scientist wants to reduce the cost of token usage when summarizing large documents using an LLM on OCI. Which tokenization approach is MOST likely to lower token count for English text?

A.Use a model that tokenizes with Byte-Pair Encoding (BPE)

B.Use a model that tokenizes with WordPiece

C.Use a model that tokenizes with SentencePiece

D.Use a model that tokenizes at the character level

AnswerA

BPE splits frequent words into single tokens and rare words into subword units, minimizing total token count.

Why this answer

Byte-Pair Encoding (BPE) is a subword tokenization algorithm that iteratively merges the most frequent byte pairs, creating a fixed-size vocabulary of common subwords. For English text, BPE efficiently represents common words as single tokens and splits rare words into meaningful subword units, resulting in fewer tokens overall compared to other methods. This directly reduces token usage and associated costs when processing large documents with an LLM on OCI.

Exam trap

Cisco often tests the misconception that the named tokenizers are interchangeable labels, but the trap here is treating SentencePiece as a competing algorithm: it is a framework that runs BPE or unigram on raw text, so the contrast that matters for cost is between subword tokenization such as BPE and character-level tokenization, which inflates token count.

How to eliminate wrong answers

Option B is wrong because WordPiece is another subword method that selects merges with a likelihood-based score; on English text it typically yields token counts comparable to BPE rather than fewer, so it offers no cost advantage. Option C is wrong because SentencePiece is a tokenization framework rather than a separate algorithm: it operates on raw text without pre-tokenization, marks whitespace with a meta symbol, and internally applies BPE or unigram, so choosing it does not by itself lower token count. Option D is wrong because character-level tokenization turns every character into a separate token, dramatically increasing token count (e.g., a 1000-character document becomes roughly 1000 tokens), which is the least efficient approach for cost reduction.

45

MCQhard

A machine learning engineer needs to select an embedding model to compute semantic similarity between customer reviews. Which property is MOST important for the embedding model to produce useful similarity scores?

A.The embedding model should produce high-dimensional vectors that can be compared using cosine similarity

B.The embedding model should be a generative model like GPT

C.The embedding model should be fine-tuned on a classification task

D.The embedding model should have a large context window to encode long reviews

AnswerA

Embedding models output dense vectors; cosine similarity is the standard measure for comparing them.

Why this answer

Cosine similarity measures the angle between vectors, making it ideal for comparing semantic similarity regardless of vector magnitude. Embedding models such as text-embedding-ada-002 or all-MiniLM-L6-v2 are trained so that cosine similarity between their output vectors correlates with semantic relatedness. This property is fundamental for tasks like clustering or retrieval-augmented generation (RAG) where relative direction, not absolute distance, matters.

Exam trap

Cisco often tests the misconception that generative models (like GPT) can double as embedding models, but they lack the necessary contrastive training and output structure for direct similarity computation.

How to eliminate wrong answers

Option B is wrong because generative models like GPT are autoregressive language models optimized for text generation, not for producing fixed-size, semantically meaningful embeddings; they lack a dedicated pooling layer for similarity tasks. Option C is wrong because fine-tuning on a classification task may distort the embedding space to separate classes, reducing its general semantic similarity performance; contrastive learning (e.g., SimCSE) is more appropriate. Option D is wrong because while a large context window helps encode long reviews, it does not guarantee useful similarity scores; the embedding model must still produce comparable vectors (e.g., via normalized embeddings and cosine similarity), and truncation or chunking strategies are often used instead.

46

Multi-Selecthard

A team is designing a text classification system using OCI Generative AI. They have a small labeled dataset of 200 examples per class. Which THREE techniques can help improve model performance without requiring additional labeled data?

Select 3 answers

A.Generate synthetic labeled examples using an LLM and include them in training

B.Increase the context window of the model to include all training examples

C.Use an embedding model to extract features and train a simple classifier (e.g., SVM)

D.Fine-tune a large decoder-only model on the 200 examples

E.Use in-context learning with a few labeled examples in the prompt

AnswersA, C, E

Data augmentation with an LLM can expand the dataset and improve robustness.

Why this answer

Option A is correct because generating synthetic labeled examples using an LLM is a data augmentation technique that expands the small dataset without requiring new human annotations. This approach leverages the LLM's ability to produce diverse, class-consistent text, which helps the model generalize better and reduces overfitting when training on only 200 examples per class.

Exam trap

Cisco often tests the misconception that simply increasing the context window or fine-tuning a large model on tiny data will magically improve performance, when in reality these approaches either do not add training signal or cause overfitting without sufficient data.

47

Multi-Selectmedium

A data scientist is evaluating BERTScore to compare model-generated summaries with reference summaries. Which TWO statements about BERTScore are correct?

Select 2 answers

A.BERTScore uses n-gram overlap to measure recall

B.BERTScore can capture paraphrases better than BLEU because of its semantic nature

C.BERTScore is fully deterministic and always yields the same result as ROUGE

D.BERTScore requires the generation model's training data to compute scores

E.BERTScore leverages pre-trained contextual embeddings to compute semantic similarity

AnswersB, E

BERTScore's use of embeddings allows it to recognize paraphrases that BLEU (n-gram precision) would miss.

Why this answer

Option B is correct because BERTScore leverages pre-trained contextual embeddings from BERT to compute token-level similarity between candidate and reference texts, allowing it to capture semantic equivalence even when surface forms differ. This makes it more robust to paraphrases than BLEU, which relies on exact n-gram matching and cannot recognize synonymous or rephrased content.
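The token-level matching can be sketched as follows: each candidate token is matched to its most similar reference token (precision), each reference token to its most similar candidate token (recall), and the two are combined into F1. The two-dimensional vectors are hypothetical stand-ins for contextual embeddings, and this sketch omits the optional IDF weighting and baseline rescaling of the full metric.

```python
import math

def cos(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def bertscore(cand_vecs: list[list[float]], ref_vecs: list[list[float]]) -> tuple[float, float, float]:
    """Greedy-matching precision, recall, and F1 over token embeddings."""
    precision = sum(max(cos(c, r) for r in ref_vecs) for c in cand_vecs) / len(cand_vecs)
    recall = sum(max(cos(r, c) for c in cand_vecs) for r in ref_vecs) / len(ref_vecs)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical embeddings: the candidate has one token matching the reference and one unrelated token.
candidate = [[1.0, 0.0], [0.0, 1.0]]
reference = [[1.0, 0.0]]
p, r, f1 = bertscore(candidate, reference)  # p == 0.5, r == 1.0
```

Because matching happens in embedding space, a paraphrase whose tokens land near the reference tokens still scores well even when no surface n-grams overlap.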

Exam trap

Cisco often tests the distinction between semantic metrics like BERTScore and lexical metrics like BLEU/ROUGE, and the trap here is that candidates may confuse BERTScore's embedding-based approach with n-gram overlap or assume it requires training data, when in fact it is a reference-based metric that needs only a pre-trained embedding model, not the generation model's training data.

48

Multi-Selectmedium

A data scientist is building a text summarization system using an LLM. They want to evaluate the model's output against human-written summaries. Which TWO metrics are most appropriate for this evaluation? (Choose two.)

Select 2 answers

A.Human evaluation rubrics

B.BLEU

C.ROUGE

D.Perplexity

E.BERTScore

AnswersC, E

ROUGE is recall-oriented and widely used for summarization evaluation.

Why this answer

ROUGE is recall-oriented and measures overlap of n-grams, making it a standard metric for summarization. BERTScore measures semantic similarity using embeddings, which can capture meaning even when wording differs. BLEU is used mainly for translation, perplexity measures how well a language model predicts text rather than comparing output with a reference, and human evaluation rubrics are a manual assessment process rather than an automatic metric.

49

MCQmedium

An organization needs to deploy a model that can both understand and generate text, such as for a translation task where the input is in English and output is in French. Which model architecture is most suitable?

A.Encoder-decoder (e.g., T5)

B.Decoder-only (e.g., GPT)

C.Encoder-only (e.g., BERT)

D.Mixture of Experts (MoE)

AnswerA

T5 is an encoder-decoder model specifically designed for text-to-text tasks like translation, where the full input is encoded and the decoder generates the output.

Why this answer

Encoder-decoder architectures like T5 are designed for sequence-to-sequence tasks. The encoder processes the input sequence, and the decoder generates the output sequence, making it ideal for translation.

50

MCQmedium

Which of the following metrics is most suitable for evaluating a translation model's output against multiple reference translations?

A.ROUGE

B.Perplexity

C.BERTScore

D.BLEU

AnswerD

BLEU is the standard metric for machine translation.

Why this answer

BLEU (Bilingual Evaluation Understudy) is the most suitable metric for evaluating a translation model's output against multiple reference translations because it measures n-gram precision between the candidate translation and one or more reference translations. It directly quantifies how many words and phrases in the candidate match those in the references, making it the standard metric for machine translation tasks.
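How multiple references enter the score can be shown with BLEU's clipped ("modified") n-gram precision: each candidate n-gram is credited at most as many times as it appears in any single reference. This sketch covers only that building block; full BLEU combines several n-gram orders and a brevity penalty. Names and whitespace tokenization are illustrative.

```python
from collections import Counter

def ngram_counts(tokens: list[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate: str, references: list[str], n: int = 1) -> float:
    """Clipped n-gram precision of a candidate against one or more references."""
    cand = ngram_counts(candidate.split(), n)
    max_ref = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref.split(), n).items():
            max_ref[gram] = max(max_ref[gram], count)  # best count across references
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return clipped / total if total else 0.0

refs = ["the cat is on the mat", "there is a cat on the mat"]
degenerate = modified_precision("the the the the the the the", refs)  # 2/7, not 7/7
good = modified_precision("the cat is on the mat", refs)              # 1.0
```

Clipping is what stops a candidate from scoring well just by repeating a common reference word, and taking the maximum across references lets any acceptable translation supply the match.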

Exam trap

The trap here is that candidates often confuse ROUGE (recall-based) with BLEU (precision-based) or assume BERTScore's semantic matching is better for translation, but BLEU is explicitly the standard for multi-reference translation evaluation in the NLP community.

How to eliminate wrong answers

Option A is wrong because ROUGE (Recall-Oriented Understudy for Gisting Evaluation) focuses on recall of n-grams and is primarily designed for summarization evaluation, not translation. Option B is wrong because Perplexity measures how well a language model predicts a sequence of tokens, but it does not compare against reference translations and is not a direct evaluation metric for translation quality. Option C is wrong because BERTScore uses contextual embeddings from BERT to compute similarity between candidate and reference, but it is a semantic similarity metric and not specifically optimized for evaluating translation output against multiple references; BLEU remains the standard for this task.

51

Multi-Selectmedium

A data scientist is debugging a RAG system where the generated answers are not relevant to the retrieved documents. Which TWO factors are MOST likely causing this issue?

Select 2 answers

A.The retriever is returning irrelevant chunks due to poor embeddings or low similarity threshold

B.The generation model is not conditioned on the retrieved chunks, possibly because the prompt does not instruct it to use them

C.The context window of the generation model is smaller than the retrieved chunks

D.The temperature is set too low, making outputs deterministic

E.The chunking strategy produces chunks that are too small, losing context

AnswersA, B

Irrelevant chunks lead to irrelevant answers; this is a common cause.

Why this answer

If the retriever returns irrelevant chunks or the generation model ignores the context, the answers will be off-topic. Checking retrieval relevance and prompt instruction is key.

52

MCQmedium

Which evaluation metric is designed to measure the overlap of n-grams between a generated summary and a reference summary, focusing on recall of content words?

A.Perplexity

B.ROUGE

C.BERTScore

D.BLEU

AnswerB

ROUGE (Recall-Oriented Understudy for Gisting Evaluation) measures recall of n-grams between generated and reference summaries.

Why this answer

ROUGE-N measures n-gram recall (how many n-grams from the reference appear in the generated text). BLEU measures n-gram precision. BERTScore leverages contextual embeddings. Perplexity measures likelihood under the model.

53

MCQmedium

A practitioner wants to evaluate an LLM-generated summary against a human-written reference using a metric that focuses on recall of key information. Which metric is most appropriate?

A.BLEU

B.Perplexity

C.Cosine similarity

D.ROUGE

AnswerD

ROUGE metrics (especially ROUGE-L) are recall-oriented.

Why this answer

ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is the most appropriate metric because it specifically measures recall of key information by comparing n-gram overlap between the generated summary and a reference summary. This aligns directly with the practitioner's goal of evaluating how well the LLM-generated summary captures the essential content from the human-written reference.

Exam trap

Cisco often tests the distinction between precision-focused metrics (BLEU) and recall-focused metrics (ROUGE), and the trap here is that candidates mistakenly choose BLEU because it is well-known for text evaluation, without recognizing that the question explicitly asks for a metric focusing on recall of key information.

How to eliminate wrong answers

Option A is wrong because BLEU (Bilingual Evaluation Understudy) is precision-focused, measuring how many n-grams in the generated text appear in the reference; apart from its brevity penalty it does not directly penalize missing content, and it was designed for machine translation, not summary recall. Option B is wrong because Perplexity measures how well a language model predicts a sequence of tokens, not the overlap or recall of key information between two texts, and is used for intrinsic model evaluation, not summary comparison. Option C is wrong because Cosine similarity measures the cosine of the angle between two vector embeddings, capturing semantic similarity but not explicitly focusing on recall of key information; it can be influenced by irrelevant content and does not provide a direct recall score like ROUGE.

54

MCQmedium

A team wants to use an LLM to answer questions about a private codebase that is updated hourly. They cannot afford to fine-tune every hour. Which OCI feature or approach is most suitable?

A.Use a long-context model with full codebase in prompt

B.Implement Retrieval-Augmented Generation (RAG) with a vector database

C.Fine-tune a model on the codebase daily

D.Use a smaller model with faster inference

AnswerB

RAG provides up-to-date retrieval without retraining.

Why this answer

Retrieval-Augmented Generation (RAG) with a vector database is the most suitable approach because it allows the LLM to answer questions about a frequently updated private codebase without retraining. RAG retrieves relevant code snippets from a vector index at query time, ensuring the model always has access to the latest code without the cost and latency of hourly fine-tuning.

Exam trap

The trap here is that candidates often assume fine-tuning is the only way to incorporate private or dynamic data, overlooking RAG's ability to provide real-time, cost-effective access to frequently updated information without retraining.

How to eliminate wrong answers

Option A is wrong because a long-context model with the full codebase in the prompt is impractical: the codebase is updated hourly and likely exceeds the model's context window, leading to truncation, high token costs, and degraded performance. Option C is wrong because fine-tuning a model daily (or hourly) is too expensive and time-consuming for a rapidly changing codebase, and it does not support real-time updates without retraining. Option D is wrong because using a smaller model with faster inference does not solve the core problem of accessing up-to-date private code; it only addresses inference speed, not knowledge freshness or retrieval.

55

MCQmedium

A practitioner is using a Cohere Command model on OCI for a translation task. They notice that the output is often incomplete and cuts off mid-sentence. Which parameter should they adjust to address this?

A.Temperature

B.Max tokens

C.Frequency penalty

D.Top-p

AnswerB

Max tokens sets the maximum length of the generated output.

Why this answer

The 'Max tokens' parameter controls the maximum length of the generated output. When a model cuts off mid-sentence, it means the token limit has been reached before the model could complete its response. Increasing this value allows the model to generate more tokens, thus completing the translation.
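A toy decoding loop makes the truncation behavior visible. The scripted token stream stands in for a model, and the `"stop"` / `"length"` labels are hypothetical names for the reason generation ended; they are not taken from the OCI or Cohere APIs.

```python
def generate(token_stream: list[str], max_tokens: int, stop_token: str = "<eos>") -> tuple[list[str], str]:
    """Emit tokens until the stop token appears or the max_tokens budget is used up."""
    output: list[str] = []
    for token in token_stream:
        if token == stop_token:
            return output, "stop"      # the model finished on its own
        if len(output) == max_tokens:
            return output, "length"    # budget exhausted: output is cut off
        output.append(token)
    return output, "stop"

scripted = ["Bonjour", "le", "monde", ".", "<eos>"]
truncated = generate(scripted, max_tokens=2)   # (["Bonjour", "le"], "length")
complete = generate(scripted, max_tokens=10)   # (["Bonjour", "le", "monde", "."], "stop")
```

Temperature, top-p, and frequency penalty would change which tokens appear in the stream, but only the token budget decides whether the loop is allowed to reach the end of the sentence.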

Exam trap

Cisco often tests the misconception that temperature or top-p controls output length, when in fact they only affect token selection probability and diversity, not the maximum number of tokens generated.

How to eliminate wrong answers

Option A is wrong because Temperature controls the randomness of the output, not the length; lowering it makes the model more deterministic but does not prevent truncation. Option C is wrong because Frequency penalty reduces repetition by penalizing tokens that have already appeared, but it does not affect the total number of tokens generated. Option D is wrong because Top-p (nucleus sampling) controls the cumulative probability threshold for token selection, influencing diversity, not the output length.

56

Multi-Selectmedium

Which TWO components are essential in a Retrieval-Augmented Generation (RAG) pipeline?

Select 2 answers

A.Chunking the documents into smaller pieces

B.Embedding the chunks into a vector space

C.Fine-tuning the LLM on the documents

D.Knowledge distillation

E.Beam search decoding

AnswersA, B

Chunking is necessary for indexing.

Why this answer

Chunking splits documents, embedding converts chunks to vectors, retrieval fetches relevant chunks, and generation produces the answer.
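
The four stages named above can be sketched end to end in a few lines. This is an illustration under loud assumptions: the word-count `embed` function is a stand-in for a real embedding model, chunking is naive sentence splitting, and every function name is hypothetical.

```python
# Toy RAG retrieval sketch. The bag-of-words "embedding" stands in for a real
# embedding model; all function names here are hypothetical.
import math
import re
from collections import Counter
from typing import List

def chunk(text: str) -> List[str]:
    """Chunking: split the document into one chunk per sentence."""
    return [s.strip() for s in text.split(".") if s.strip()]

def embed(text: str) -> Counter:
    """Embedding stand-in: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: List[str], k: int = 1) -> List[str]:
    """Retrieval: return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

document = ("Employees accrue vacation days every month. "
            "Expense reports must be filed within thirty days. "
            "Remote work requires manager approval in advance.")
chunks = chunk(document)
query = "When must expense reports be filed?"
top = retrieve(query, chunks)

# Generation: the retrieved text is placed in the prompt sent to the LLM.
prompt = f"Context:\n{top[0]}\n\nQuestion: {query}"
print(prompt)
```

Fine-tuning, distillation, and beam search do not appear anywhere in this flow, which is why they are not essential components.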

57

Multi-Selectmedium

Which TWO of the following are characteristics of the transformer decoder-only architecture (e.g., GPT)?

Select 2 answers

A.It uses masked self-attention to prevent attending to future tokens

B.It generates text autoregressively, one token at a time

C.It uses bidirectional self-attention to capture context from both directions

D.It processes all tokens in parallel during generation

E.It implements cross-attention between encoder and decoder layers

AnswersA, B

Causal masking ensures each token can only attend to previous tokens, which is essential for autoregressive generation.

Why this answer

Option A is correct because the transformer decoder-only architecture, such as GPT, employs masked self-attention (also known as causal attention) so that each token can attend only to itself and earlier tokens in the sequence. This masking prevents the model from 'seeing' future tokens during training and generation, which preserves the autoregressive property that predictions depend only on past context. Option B is correct because generation is autoregressive: the model emits one token at a time, and each new token is appended to the context before the next one is predicted.

Exam trap

The exam often tests the distinction between attention mechanisms across architectures. The trap here is that candidates confuse the parallel training capability (which is true for all transformers) with parallel generation (which is false for autoregressive models), leading them to incorrectly select Option D.
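
Causal masking is easy to see in a small example. The sketch below uses plain Python and equal raw scores, so the only thing shaping the attention weights is the mask itself; `causal_mask` and `masked_softmax` are illustrative names, not functions from any library.

```python
# Causal (masked) attention weights for a 4-token sequence, in plain Python.
import math
from typing import List

def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_softmax(scores: List[float], allowed: List[bool]) -> List[float]:
    """Softmax over allowed positions; masked positions get weight 0."""
    exps = [math.exp(s) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]

mask = causal_mask(4)
# Equal raw scores, so each row spreads weight evenly over the visible tokens.
weights = [masked_softmax([0.0] * 4, row) for row in mask]
for row in weights:
    print([round(w, 2) for w in row])
```

Every entry above the diagonal is zero: token 1 sees only itself, token 2 sees tokens 1 and 2, and so on. All rows can be computed at once during training, but at generation time row i+1 cannot exist until token i+1 has been produced.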

58

MCQhard

A practitioner needs to choose a pre-trained model for a sentiment analysis task on customer reviews. The model must be efficient for inference and capable of handling multiple languages. Which architecture is MOST suitable?

A.Encoder-only BERT model

B.Encoder-decoder T5 model

C.Decoder-only GPT model

D.Mixture of Experts model

AnswerA

BERT is designed for understanding tasks like classification; it produces contextual embeddings efficiently and has multilingual versions.

Why this answer

Encoder-only models like BERT are well-suited for classification tasks (e.g., sentiment analysis) because they produce contextualized representations for each token and can be fine-tuned with a classification head. BERT supports multiple languages via multilingual variants.

59

MCQeasy

Which of the following is a distinguishing feature of in-context learning compared to fine-tuning?

A.In-context learning modifies the model's weights based on examples

B.In-context learning does not update the model's weights; instead, examples are provided in the prompt

C.In-context learning is only possible with encoder-only models

D.In-context learning requires additional training on a labeled dataset

AnswerB

In-context learning uses examples in the prompt at inference time without any weight updates.

Why this answer

In-context learning does not update model weights; it provides examples in the prompt at inference time. Fine-tuning updates the model weights through additional training on a dataset.
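
In practice, in-context learning amounts to string construction: the labeled examples travel inside the prompt and the model is left untouched. A minimal sketch follows, where `build_few_shot_prompt` and the review texts are hypothetical.

```python
# Few-shot prompt construction: examples go into the prompt, no weights change.
# build_few_shot_prompt is a hypothetical helper, not part of any SDK.
from typing import List, Tuple

def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The final item is left unlabeled for the model to complete.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)

examples = [
    ("The delivery was fast and the product works well.", "positive"),
    ("It broke after two days and support never replied.", "negative"),
]
prompt = build_few_shot_prompt(examples, "Great value for the price.")
print(prompt)
```

Changing the task means changing the examples in the prompt. Fine-tuning would instead run a training job and produce a new set of weights.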

60

MCQeasy

What is the key advantage of multi-head attention over single-head attention in transformer models?

A.It eliminates the need for positional encoding

B.It reduces the total number of parameters

C.It allows the model to focus on different parts of the sequence simultaneously from different representation subspaces

D.It makes the model non-autoregressive

AnswerC

Each head learns different attention patterns, improving model capacity.

Why this answer

Multi-head attention allows the model to attend to information from different representation subspaces at different positions, capturing a richer understanding.

61

MCQeasy

What is the primary difference between pre-training and fine-tuning in the context of large language models?

A.Pre-training trains from scratch, fine-tuning updates all weights on a new dataset

B.Pre-training uses a smaller dataset, fine-tuning uses a larger dataset

C.Pre-training produces embeddings, fine-tuning produces text generation

D.Pre-training is unsupervised, fine-tuning is always supervised

AnswerA

Pre-training involves training from random initialization on a large corpus; fine-tuning starts from pre-trained weights and updates them on a smaller dataset.

Why this answer

Pre-training trains a model on a large, general corpus to learn language representations; fine-tuning adapts the pre-trained model to a specific task or domain using a smaller labeled dataset.

62

MCQmedium

A developer is using OCI Generative AI with a Cohere Command model for text generation. They want the output to be more creative and diverse, but still relevant. Which sampling strategy should they use?

A.Temperature sampling (temperature > 1)

B.Top-k sampling

C.Top-p (nucleus) sampling

D.Greedy decoding

AnswerC

Top-p sampling dynamically chooses the set of tokens with cumulative probability p, balancing creativity and relevance by adapting to the model's confidence.

Why this answer

Top-p (nucleus) sampling selects from the smallest set of tokens whose cumulative probability exceeds p. Because that set shrinks when the model is confident and grows when it is uncertain, it allows diversity while maintaining relevance. Greedy decoding is deterministic, temperature above 1 flattens the entire distribution (including very unlikely tokens), and top-k fixes the number of candidates regardless of how confident the model is.
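
The adaptive behaviour of nucleus sampling shows up clearly on two made-up distributions. This is a sketch of the filtering step only; `top_p_filter` and the probabilities are illustrative, and a real sampler would then draw a token from the renormalized result.

```python
# Nucleus (top-p) filtering on toy next-token distributions.
from typing import Dict

def top_p_filter(probs: Dict[str, float], p: float) -> Dict[str, float]:
    """Keep the most probable tokens until their mass reaches p, then renormalize."""
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {token: prob / total for token, prob in kept.items()}

confident = {"Paris": 0.90, "Lyon": 0.05, "Nice": 0.03, "Rome": 0.02}
uncertain = {"red": 0.30, "blue": 0.28, "green": 0.22, "pink": 0.20}

print(top_p_filter(confident, p=0.75))  # a single candidate survives
print(top_p_filter(uncertain, p=0.75))  # three candidates survive
```

With the same p, the confident distribution keeps one token and the uncertain one keeps three. A fixed top-k would keep the same number in both cases.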

63

Multi-Selectmedium

A data scientist needs to evaluate the quality of a text summarization model. Which TWO metrics are appropriate for this task?

Select 2 answers

A.Perplexity

B.BLEU

C.F1 score

D.BERTScore

E.ROUGE

AnswersD, E

BERTScore captures semantic similarity.

Why this answer

BERTScore is correct because it leverages contextual embeddings from BERT to compute semantic similarity between generated and reference summaries, capturing meaning beyond exact n-gram overlap. This makes it well suited to evaluating summarization models, where paraphrasing and semantic equivalence are critical. ROUGE is correct because it measures n-gram and longest-common-subsequence overlap with a reference summary and is the standard recall-oriented metric for summarization.

Exam trap

The exam often tests the distinction between metrics designed for generation tasks (like summarization) and those for classification or language modeling, trapping candidates who confuse BLEU (translation) or perplexity (language modeling) with summarization-specific metrics like ROUGE and BERTScore.
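
A hand-rolled ROUGE-1 shows both why the metric fits summarization and where it falls short. This is a simplified sketch (whitespace tokenization, no stemming, single reference); the `rouge1` helper and the sentences are made up, and production work would use an established ROUGE package.

```python
# Simplified ROUGE-1: unigram overlap between a candidate and one reference.
from collections import Counter
from typing import Tuple

def rouge1(candidate: str, reference: str) -> Tuple[float, float]:
    """Return (recall, precision) based on clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    return overlap / sum(ref.values()), overlap / sum(cand.values())

reference = "the company reported record profit this quarter"
extractive = "the company reported record profit"
paraphrase = "the firm announced its best earnings ever"

print(rouge1(extractive, reference))
print(rouge1(paraphrase, reference))
```

The extractive summary recovers 5 of the 7 reference words, while the paraphrase matches only "the" even though it conveys much the same meaning. That gap is what embedding-based metrics such as BERTScore are meant to close.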

64

MCQeasy

Which component of the Transformer architecture allows the model to focus on different parts of the input sequence when generating each output token?

A.Self-attention mechanism

B.Positional encoding

C.Feed-forward network

D.Layer normalization

AnswerA

Self-attention allows each token to attend to all other tokens.

Why this answer

Self-attention computes attention scores between all pairs of positions, enabling the model to weigh the importance of different input tokens.
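
The pairwise scoring can be written out directly. The sketch below is a simplification in plain Python: it uses the token vectors themselves as queries, keys, and values, whereas a real transformer first applies learned projection matrices. The function names and the three toy vectors are illustrative.

```python
# Simplified scaled dot-product self-attention in plain Python.
# Assumption: token vectors serve directly as queries, keys and values;
# a real transformer applies learned projections first.
import math
from typing import List, Tuple

Matrix = List[List[float]]

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x: Matrix) -> Tuple[Matrix, Matrix]:
    """Return (outputs, weights); weights[i][j] is how much token i attends to token j."""
    d = len(x[0])
    weights: Matrix = []
    outputs: Matrix = []
    for q in x:
        # Score this token against every token, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        row = softmax(scores)
        weights.append(row)
        # Output is the attention-weighted average of all token vectors.
        outputs.append([sum(w * v[j] for w, v in zip(row, x)) for j in range(d)])
    return outputs, weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and gives more weight to tokens whose vectors point in a similar direction, which is the "weighing the importance of different input tokens" described above.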

65

MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Fine-tune a base LLM on the policy documents monthly

B.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

C.Use a larger foundation model with a longer context window and paste all documents into each prompt

D.Train a custom model from scratch on the policy documents each month

AnswerB

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

RAG (Retrieval-Augmented Generation) allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining. The other options either require expensive retraining for each update or lack document grounding.

66

Multi-Selecthard

An OCI practitioner is comparing BERTScore with traditional n-gram metrics (ROUGE, BLEU) for evaluating summarization. Which THREE statements about BERTScore are true?

Select 3 answers

A.BERTScore uses pre-trained contextual embeddings from BERT

B.BERTScore does not require a reference text

C.BERTScore computes similarity based on exact n-gram overlap

D.BERTScore typically has higher correlation with human judgment than ROUGE

E.BERTScore is more robust to paraphrasing than ROUGE or BLEU

AnswersA, D, E

It leverages BERT's embeddings to compute semantic similarity.

Why this answer

BERTScore uses contextual embeddings, captures semantic similarity, and correlates better with human judgment than n-gram metrics.

67

MCQmedium

An organization wants to deploy a model that can summarize long financial reports (5000+ tokens) without losing context. Which model architecture is best suited for this requirement?

A.Encoder-decoder model (e.g., T5)

B.Mixture-of-experts model

C.Decoder-only model (e.g., GPT)

D.Encoder-only model (e.g., BERT)

AnswerA

Encoder-decoder architecture excels at summarization and can handle long inputs via the encoder.

Why this answer

Encoder-decoder models like T5 or BART are designed for sequence-to-sequence tasks such as summarization, and can handle long inputs with their encoder.

68

MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Train a custom model from scratch on the policy documents each month

B.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

C.Use a larger foundation model with a longer context window and paste all documents into each prompt

D.Fine-tune a base LLM on the policy documents monthly

AnswerB

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

RAG (Retrieval-Augmented Generation) allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining. The other options either require expensive retraining for each update or lack document grounding.

69

MCQmedium

An organization needs to select a tokenisation algorithm for a multilingual LLM that will process English, Chinese, and Korean text efficiently. Which tokenisation method is BEST suited for this requirement?

A.WordPiece

B.Byte-Pair Encoding (BPE)

C.SentencePiece

D.Character-level tokenisation

AnswerC

SentencePiece treats the input as a raw sequence of Unicode characters (with whitespace handled as an ordinary symbol) and does not require pre-tokenisation, making it well suited to multilingual corpora including CJK languages.

Why this answer

SentencePiece is language-agnostic, works directly on raw text without requiring pre-tokenisation (e.g., whitespace splitting), and handles languages such as Chinese, which is written without spaces between words. Standard BPE and WordPiece pipelines typically assume pre-tokenised input, making them less suitable for CJK languages.

70

Multi-Selectmedium

A data scientist is building a RAG pipeline on OCI. Which TWO components are essential for the retrieval step?

Select 2 answers

A.Fine-tuned generation model

B.Embedding model to convert chunks into vectors

C.Document chunking

D.Beam search decoder

E.Human feedback loop

AnswersB, C

Embeddings are required for vector search.

Why this answer

Chunking splits documents into manageable pieces, and embedding converts them into vectors for similarity search.

71

MCQhard

An OCI user observes that their Mistral model produces very repetitive text when temperature is set to 0.9 and top-p to 1.0. Which adjustment is most likely to reduce repetition?

A.Decrease top-p to 0.5

B.Increase temperature to 1.5

C.Enable frequency penalty

D.Set top-k to 1

AnswerC

Frequency penalty penalizes tokens based on their frequency in the generated text.

Why this answer

Enabling frequency penalty directly reduces the likelihood of the model repeating the same tokens by subtracting a penalty proportional to the token's existing frequency in the generated text. This is the most targeted way to combat repetitive output without altering the core sampling parameters (temperature and top-p) that control randomness.

Exam trap

The exam often tests the misconception that reducing randomness (lower top-p or top-k) will fix repetition, when in fact it exacerbates it, and that increasing temperature is a safe fix, when it actually risks incoherence.

How to eliminate wrong answers

Option A is wrong because decreasing top-p to 0.5 narrows the set of tokens considered for sampling, which can actually increase repetition by focusing on the most probable tokens. Option B is wrong because increasing temperature to 1.5 makes the output more random and chaotic, which may reduce repetition but often at the cost of coherence and relevance. Option D is wrong because setting top-k to 1 forces greedy decoding (always picking the single most probable token), which dramatically increases repetition and is the opposite of what is needed.
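
The effect of a frequency penalty can be shown on toy logits. This is a sketch of the general idea stated above (subtract an amount proportional to how often a token has already appeared); the exact formula used by any particular OCI-hosted model is an assumption here, and `apply_frequency_penalty` and the numbers are hypothetical.

```python
# Frequency penalty on toy logits: repeated tokens are pushed down.
# The linear formula is an assumed, simplified form of the technique.
from collections import Counter
from typing import Dict, List

def apply_frequency_penalty(
    logits: Dict[str, float], generated: List[str], penalty: float
) -> Dict[str, float]:
    """Subtract penalty * (times the token already appeared) from each logit."""
    counts = Counter(generated)
    return {token: logit - penalty * counts[token] for token, logit in logits.items()}

logits = {"very": 2.0, "good": 1.5, "service": 1.0}
generated = ["very", "very", "very"]   # the model keeps repeating itself

adjusted = apply_frequency_penalty(logits, generated, penalty=0.5)
print(max(logits, key=logits.get))      # most likely token before the penalty
print(max(adjusted, key=adjusted.get))  # most likely token after the penalty
```

Before the penalty the repeated token is still the favourite. After three occurrences its logit drops from 2.0 to 0.5 and a different token takes the lead, without touching temperature or top-p.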

72

MCQhard

An LLM generates a response that contains a plausible-sounding but factually incorrect statement about a historical event. This is an example of which known limitation?

A.Knowledge cutoff

B.Hallucination

C.Bias in training data

D.Context length constraint

AnswerB

Hallucination is when the model produces factually incorrect or nonsensical content that appears plausible.

Why this answer

Option B is correct because hallucination in LLMs refers to the generation of content that is plausible-sounding but factually incorrect or nonsensical. This occurs when the model's probabilistic next-token prediction produces statements that are not grounded in its training data or real-world facts, often due to overgeneralization or lack of factual recall mechanisms.

Exam trap

The exam often tests the distinction between hallucination and knowledge cutoff. Candidates mistakenly think that incorrect facts are due to the model not knowing recent events, but hallucination can occur for any topic regardless of the training data cutoff date.

How to eliminate wrong answers

Option A is wrong because knowledge cutoff refers to the date after which the model has no training data, not to the generation of incorrect facts within its training period. Option C is wrong because bias in training data leads to systematic skewing of outputs (e.g., stereotypes), not to isolated factually incorrect statements about specific events. Option D is wrong because context length constraint limits the amount of input text the model can process at once, not the factual accuracy of its generated responses.

73

Multi-Selecthard

A practitioner is choosing a model for a code generation assistant that must run on OCI with low latency. Which THREE considerations are most important?

Select 3 answers

A.Model size (number of parameters)

B.Whether the model is available as a managed endpoint or requires self-hosting on OCI GPU shapes

C.The cost of fine-tuning the model on internal codebases

D.Availability of a model specifically fine-tuned for code (e.g., Code Llama, StarCoder)

E.The model's context window length

AnswersA, B, D

Smaller models generally have lower inference latency, which is critical for real-time code generation.

Why this answer

Smaller models typically have lower latency, code-specialized models (e.g., Code Llama, StarCoder) are trained specifically for code generation, and the deployment choice (managed endpoint versus self-hosting on OCI GPU shapes) affects latency. Context window size and fine-tuning cost are secondary or not directly relevant to latency.

74

MCQeasy

Which of the following is a known limitation of large language models?

A.They can generate text in multiple languages

B.They sometimes produce plausible but incorrect information (hallucinations)

C.They support few-shot learning via in-context examples

D.They can be fine-tuned for domain-specific tasks

AnswerB

Hallucinations are a well-known limitation where models generate factually incorrect content.

Why this answer

Option B is correct because a well-documented limitation of large language models (LLMs) is their tendency to generate plausible-sounding but factually incorrect or nonsensical information, commonly referred to as 'hallucinations'. This occurs because LLMs are trained to predict the next token based on statistical patterns in their training data, not to verify facts or maintain a ground-truth database. This limitation is a core challenge in deploying LLMs for tasks requiring high factual accuracy, such as legal or medical advice.

Exam trap

The exam often tests the distinction between a model's capabilities and its limitations. The trap here is that candidates may confuse a desirable feature (like multilingual generation or few-shot learning) with a limitation, when the question specifically asks for a known drawback.

How to eliminate wrong answers

Option A is wrong because generating text in multiple languages is a capability, not a limitation, of large language models; it stems from multilingual training data and tokenization. Option C is wrong because supporting few-shot learning via in-context examples is a feature of LLMs, not a limitation; it allows them to adapt to tasks without fine-tuning by conditioning on a few examples in the prompt. Option D is wrong because being fine-tunable for domain-specific tasks is a strength of LLMs, enabling specialization through transfer learning, not a limitation.

75

MCQhard

An OCI user is comparing two embedding models: one with 768 dimensions and another with 1024 dimensions. Which of the following trade-offs is most relevant?

A.The 1024-dimensional model always yields better accuracy with no additional cost

B.The 768-dimensional model is always faster and more storage-efficient

C.Higher dimensions may capture more fine-grained semantic information but require more storage and slower similarity search

D.Dimension size has no impact on performance or storage

AnswerC

Larger embeddings can encode more detail but cost more to store and to search.

Why this answer

Higher dimensions can capture more nuanced semantics but require more storage and may increase retrieval latency.
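
The storage side of the trade-off is simple arithmetic. The sketch below assumes uncompressed 32-bit floats in a flat index and a hypothetical corpus of one million chunks; real vector stores add index overhead and may quantize vectors.

```python
# Back-of-the-envelope storage for a flat float32 vector index.
# Assumptions: 4 bytes per value, no compression, 1,000,000 vectors (hypothetical).
def index_size_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    return num_vectors * dims * bytes_per_value

num_vectors = 1_000_000
small = index_size_bytes(num_vectors, 768)
large = index_size_bytes(num_vectors, 1024)

print(f"768 dims:  {small / 1e9:.2f} GB")
print(f"1024 dims: {large / 1e9:.2f} GB")
print(f"ratio:     {large / small:.2f}x")
```

Under these assumptions the 1024-dimensional index needs about a third more space, and a brute-force similarity search does proportionally more multiply-add work per comparison. Whether the extra dimensions buy better retrieval quality has to be measured on the actual data.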
