CCNA Fundamentals of Large Language Models Questions

53 of 128 questions · Page 2/2 · Fundamentals of Large Language Models · Answers revealed

76
Multi-Selecthard

A developer is evaluating OCI GenAI model families. Which three are correct characteristics of the available models? (Choose three.)

Select 3 answers
A.Llama models are open-source and available for fine-tuning
B.All models support real-time streaming of tokens
C.Cohere embedding models produce vector representations
D.OCI GenAI provides both hosted and dedicated deployment options
E.Cohere Command models are optimized for multilingual tasks
AnswersA, C, D

Meta's Llama models are open-source and supported by OCI GenAI for fine-tuning.

Why this answer

Llama models, such as Llama 2 and Llama 3, are open-source large language models originally developed by Meta. OCI GenAI provides them as pre-built models that developers can fine-tune using their own datasets, enabling customization for domain-specific tasks without training from scratch.

Exam trap

Oracle often tests the misconception that all models in a platform share the same capabilities, such as streaming or multilingual optimization, when in reality each model family (e.g., Llama, Cohere Command, Cohere Embed) has distinct design goals and feature sets.

77
MCQhard

A company fine-tunes an LLM on internal support tickets. After deployment, the model hallucinates company-specific product names. What is the most effective mitigation?

A.Switch to a smaller model to reduce hallucination risk
B.Use prompt engineering to remind the model to be accurate
C.Implement RAG with a verified product database
D.Fine-tune further with more ticket data
AnswerC

RAG provides factual grounding, reducing hallucinations.

Why this answer

RAG (Retrieval-Augmented Generation) grounds the LLM's output in a verified product database, providing factual context that prevents hallucination of company-specific product names. Unlike fine-tuning, which only adjusts model weights and can still produce plausible but incorrect names, RAG retrieves exact records at inference time, ensuring accuracy for proprietary terminology.

Exam trap

Oracle often tests the misconception that fine-tuning alone can fix factual accuracy for domain-specific entities, when in reality RAG is required to ground outputs in a verifiable external knowledge source.

How to eliminate wrong answers

Option A is wrong because switching to a smaller model reduces capacity and often increases hallucination risk due to lower parameter count and less memorization ability. Option B is wrong because prompt engineering is a fragile, surface-level fix that cannot enforce factual accuracy for specific product names; the model may still generate plausible but incorrect names. Option D is wrong because further fine-tuning with more ticket data risks overfitting and does not guarantee elimination of hallucinated product names, as the model can still invent names not present in the training distribution.

78
MCQhard

A healthcare startup is building a chatbot to answer patient inquiries using a large language model (LLM) deployed on OCI Data Science AI Quick Actions. The chatbot must comply with HIPAA regulations, so all patient data must remain within the OCI tenancy and never be sent to third-party APIs. The team has fine-tuned a Llama 2 7B model on de-identified medical records using OCI Data Science notebooks. The model is deployed as a managed endpoint via AI Quick Actions. Early testing shows that the chatbot sometimes generates responses containing specific patient names or dates of birth that were present in the fine-tuning dataset. Moreover, the model occasionally hallucinates medication dosages that are not medically accurate. Which course of action should the team take to address both issues while maintaining HIPAA compliance?

A.Deploy a rule-based post-processing script that checks each response against a list of known patient names and medication dosages, and rejects any response containing them.
B.Switch to a larger model (e.g., Llama 2 70B) to improve accuracy and reduce hallucinations, and apply output filtering to remove any detected PII from responses.
C.Increase the fine-tuning dataset size with more varied de-identified records to reduce overfitting, and apply a temperature setting of 0 to make outputs deterministic.
D.Re-fine-tune the model using differential privacy to limit memorization of training data, and implement retrieval-augmented generation (RAG) with a curated medical knowledge base to ground medication-related responses.
AnswerD

Differential privacy during training reduces the risk of memorizing private data, and RAG grounds responses in a trusted knowledge base, reducing hallucinations. This combination addresses both issues effectively.

Why this answer

Option D is correct because it addresses both memorization of PII and hallucination of medication dosages while maintaining HIPAA compliance. Differential privacy during fine-tuning limits the model's ability to memorize specific patient data, and retrieval-augmented generation (RAG) grounds responses in a curated medical knowledge base, reducing hallucinations without sending data outside the OCI tenancy.

Exam trap

Oracle often tests the misconception that simply filtering outputs or increasing model size can solve memorization and hallucination issues, when in fact only training-time techniques like differential privacy and inference-time grounding like RAG address the root causes.

How to eliminate wrong answers

Option A is wrong because a rule-based post-processing script cannot catch all variations of patient names or hallucinated dosages (e.g., misspellings, new names), and rejecting responses containing known names does not prevent the model from generating them in the first place. Option B is wrong because switching to a larger model (Llama 2 70B) does not inherently reduce memorization of training data or hallucinations; it may even increase both, and output filtering alone cannot guarantee removal of all PII without risking false positives or missing subtle leaks. Option C is wrong because increasing dataset size does not guarantee reduced overfitting or memorization, and setting temperature to 0 makes outputs deterministic but does not prevent the model from reproducing memorized PII or hallucinating dosages; it only removes randomness.

79
MCQeasy

An OCI GenAI practitioner wants to deploy a model that can generate code from natural language descriptions. Which type of model is most suitable?

A.T5
B.ResNet
C.BERT
D.GPT
AnswerD

GPT (decoder-only) excels at autoregressive text generation, ideal for code generation.

Why this answer

GPT models (decoder-only) are designed for text generation, including code generation. BERT is encoder-only, T5 is encoder-decoder but not as optimized for code, and ResNet is for images.

80
Multi-Selecteasy

Which two factors are essential for calculating the cost of using OCI Generative AI for text generation? (Choose two.)

Select 2 answers
A.Model architecture (encoder-only vs decoder-only).
B.Number of API calls per minute.
C.Temperature setting.
D.Number of input tokens.
E.Number of output tokens.
AnswersD, E

Input tokens are a direct factor in cost calculation.

Why this answer

The cost of using OCI Generative AI for text generation is primarily determined by the number of input tokens (the prompt you send) and the number of output tokens (the generated response). OCI charges per token processed, making these two factors essential for cost calculation.

Exam trap

Oracle often tests the misconception that API call frequency or model architecture parameters directly influence cost, when in reality only token counts (input and output) are the billing units.

81
Multi-Selecthard

Which THREE factors should be considered when choosing between a fine-tuning and a prompt engineering approach?

Select 3 answers
A.Latency requirements
B.Need for model personalization
C.Availability of foundation model in OCI
D.Amount of labeled data available
E.Budget for GPU compute
AnswersB, D, E

Fine-tuning is necessary for deep personalization.

Why this answer

Option B is correct because model personalization is a key driver for choosing fine-tuning over prompt engineering. Fine-tuning modifies the model's weights to adapt it to a specific domain or task, enabling deeper customization that prompt engineering alone cannot achieve, especially when the desired behavior requires learning new patterns or knowledge not present in the base model.

Exam trap

Oracle often tests the misconception that latency or model availability are primary differentiators, when in fact the core trade-off is between the need for deep personalization (fine-tuning) versus the ease and speed of prompt engineering, with labeled data and compute budget being practical constraints.

82
MCQhard

Refer to the exhibit. A developer encounters this error. Which action should they take to resolve the issue?

A.Wait and retry after some time.
B.Change the model to cohere.command-light.
C.Increase the max-tokens value.
D.Decrease the temperature to 0.0.
AnswerA

Rate limit errors require waiting for the quota to reset, typically after a short period. Automatic retries with backoff are recommended.

Why this answer

The error indicates a rate limit or throttling issue, typically returned by the OCI Generative AI service when the API request quota is exceeded. Waiting and retrying after the cooldown period allows the rate limit to reset, which is the correct resolution for transient throttling errors.

Exam trap

Oracle often tests the misconception that model parameters (like temperature or max-tokens) can resolve API-level errors, when in fact throttling errors require waiting or implementing retry logic with backoff.

How to eliminate wrong answers

Option B is wrong because changing the model to cohere.command-light does not address rate limiting; it only changes the underlying LLM, which may have different quotas but does not resolve the current throttling error. Option C is wrong because increasing max-tokens affects the length of generated responses, not the request rate or quota limits. Option D is wrong because decreasing temperature to 0.0 controls output randomness and determinism, not API request throttling or rate limits.

83
MCQeasy

A data scientist is using OCI Data Science to fine-tune a Cohere command model on domain-specific documents. They observe that the fine-tuned model generates repetitive text. What is the most likely cause?

A.The number of epochs was insufficient.
B.The training dataset lacked diversity.
C.The learning rate was too high.
D.The batch size was too small.
AnswerB

Lack of diversity in training data leads to overfitting and repetitive outputs.

Why this answer

Repetitive text in fine-tuned models is a classic symptom of overfitting to a narrow or homogeneous training dataset. When the domain-specific documents lack diversity in phrasing, topics, or contexts, the model learns to latch onto the most common patterns and repeats them, rather than generalizing. This is not a hyperparameter tuning issue but a data quality issue.

Exam trap

The trap here is that candidates often blame hyperparameters (epochs, learning rate, batch size) for overfitting symptoms, but Cisco specifically tests the understanding that data diversity is the root cause of repetitive generation in fine-tuned LLMs.

How to eliminate wrong answers

Option A is wrong because insufficient epochs typically cause underfitting, not repetitive text; the model would fail to learn patterns at all. Option C is wrong because a learning rate that is too high usually leads to training instability or divergence, not repetitive outputs. Option D is wrong because a batch size that is too small increases gradient noise and can slow convergence, but it does not directly cause repetitive text generation.

84
MCQeasy

Which of the following best describes the role of attention in transformer models?

A.It assigns equal weight to all words in the input.
B.It is used only during training, not inference.
C.It allows the model to focus on relevant parts of the input sequence when generating output.
D.It replaces the need for positional encoding.
AnswerC

This is the core function of attention: it enables the model to selectively attend to important input parts.

Why this answer

Option C is correct because the attention mechanism in transformer models dynamically computes a weighted sum of all input tokens, allowing the model to focus on the most relevant parts of the input sequence when generating each output token. This is achieved through scaled dot-product attention, which assigns higher weights to tokens that are more contextually important, enabling the model to capture long-range dependencies effectively.

Exam trap

Oracle often tests the misconception that attention is only for training or that it replaces positional encoding, so candidates must remember that attention is inherently order-agnostic and requires positional encoding to capture sequence order, and that it is used in both training and inference phases.

How to eliminate wrong answers

Option A is wrong because attention does not assign equal weight to all words; instead, it computes a distribution of weights (attention scores) that vary based on the relevance of each token to the current query, with some tokens receiving much higher weights than others. Option B is wrong because attention is used during both training and inference; during inference, the model still computes attention over the input sequence to generate each output token, though the key-value cache may be used for efficiency. Option D is wrong because attention does not replace the need for positional encoding; the self-attention operation is permutation-invariant (it treats the input as a set), so positional encodings are required to inject information about the order of tokens in the sequence.

85
MCQhard

A multinational corporation is deploying a generative AI chatbot for customer support using Oracle Cloud Infrastructure's Generative AI service. The chatbot is powered by a large language model (LLM) accessed via the on-demand serving mode. During initial testing, the chatbot provides accurate answers for well-known products but frequently hallucinates or gives incorrect specifications for niche products. The company maintains a comprehensive internal database of product specifications, updated daily. The support team prefers not to fine-tune the LLM due to cost and maintenance overhead. Additionally, the chatbot must respond within 2 seconds to maintain a good customer experience. The team considers several approaches: A. Increasing the 'temperature' parameter to make the model more creative, hoping it will generate more accurate responses when unsure. B. Using few-shot prompting with three manually curated examples of correct product specifications included in every prompt. C. Implementing a Retrieval Augmented Generation (RAG) pipeline that retrieves relevant product documents from the internal database and prepends them to the prompt before inference. D. Reducing the 'topP' parameter to 0.1 to force the model to sample only from the highest probability tokens, thereby reducing randomness. Which approach best meets the requirements of improving factual accuracy while maintaining low latency?

A.Reduce the 'topP' parameter to 0.1 to force the model to sample only from the highest probability tokens, thereby reducing randomness.
B.Implement a Retrieval Augmented Generation (RAG) pipeline that retrieves relevant product documents from the internal database and prepends them to the prompt before inference.
C.Use few-shot prompting with three manually curated examples of correct product specifications included in every prompt.
D.Increase the 'temperature' parameter to make the model more creative, hoping it will generate more accurate responses when unsure.
AnswerB

RAG injects accurate, domain-specific context, improving factual accuracy without fine-tuning, and can be implemented with efficient retrieval for low latency.

Why this answer

Option A is correct because Retrieval Augmented Generation (RAG) provides relevant, up-to-date context from the internal database, improving factual accuracy without fine-tuning, and can be optimized for low latency. Option B (few-shot) is limited by context window size and increases token usage, potentially increasing latency. Option C (increasing temperature) is counterproductive as it increases randomness.

Option D (reducing topP) does not add factual knowledge and may reduce output quality.

86
MCQhard

A company uses RAG (Retrieval-Augmented Generation) with OCI OpenSearch and OCI Generative AI. The system retrieves irrelevant documents. What is the first step to debug?

A.Use a different LLM
B.Increase the number of retrieved documents
C.Check the embeddings quality
D.Lower the temperature
AnswerC

Embeddings directly impact retrieval relevance; low-quality embeddings cause irrelevant results.

Why this answer

Option A is correct because poor quality embeddings often cause retrieval of irrelevant documents. Checking and improving embeddings (e.g., using a better model or fine-tuning) should be the first step. Option B (increasing retrieved documents) may include more noise.

Option C (different LLM) does not address retrieval. Option D (lower temperature) affects generation but not retrieval.

87
MCQhard

An AI assistant needs to solve complex math word problems step by step. Which prompting technique is most suitable?

A.Chain-of-thought prompting with few-shot examples.
B.Zero-shot prompting with the problem only.
C.Prompting with a high temperature setting.
D.Using a model with a larger context window.
AnswerA

Correct: CoT with examples guides reasoning.

Why this answer

Chain-of-thought prompting with few-shot examples is most suitable because it guides the LLM to break down complex math word problems into intermediate reasoning steps, mimicking human problem-solving. Few-shot examples provide a template for the desired reasoning structure, which significantly improves accuracy on multi-step arithmetic tasks compared to direct answer generation.

Exam trap

Oracle often tests the misconception that simply increasing model capacity (context window) or randomness (temperature) can substitute for structured reasoning, when in fact the prompting strategy itself is the critical factor for multi-step tasks.

How to eliminate wrong answers

Option B is wrong because zero-shot prompting lacks the explicit reasoning structure needed for multi-step math problems, often leading to incorrect or incomplete answers. Option C is wrong because a high temperature setting increases randomness in token selection, which is counterproductive for deterministic math tasks requiring precise calculations. Option D is wrong because a larger context window does not inherently improve reasoning quality; it only allows more input tokens, but without structured prompting the model may still fail to perform step-by-step logic.

88
MCQmedium

A developer sends this request but receives an error: "modelId not found". Which is the most likely cause?

A.The temperature parameter is out of range.
B.The compartment ID is incorrect.
C.The modelId "cohere.command" is deprecated.
D.The presencePenalty parameter is misspelled.
AnswerC

Deprecated model IDs return 'not found' errors; the correct ID should be used.

Why this answer

The error indicates the model ID is not recognized. "cohere.command" is a legacy model ID; the correct current ID might be "cohere.command-r" or similar. Compartment issues, temperature range, or parameter spelling would produce different errors.

89
MCQmedium

Refer to the exhibit. A developer runs this command and sees that the 'cohere.embed-english-v3.0' model is INACTIVE. What is the most likely cause?

A.The model is not supported in the current region.
B.The API call lacks the required OCI policy for the model.
C.The model has been deprecated and is no longer available.
D.The compartment does not have access to the model.
AnswerC

An INACTIVE state indicates the model has been deprecated or retired, making it unavailable for new inference requests.

Why this answer

The 'cohere.embed-english-v3.0' model is listed as INACTIVE because Oracle Cloud Infrastructure (OCI) has deprecated it, meaning it is no longer available for inference. When a model is deprecated, its status changes to INACTIVE, and any attempt to invoke it will fail, even if the region, policies, and compartment permissions are correctly configured.

Exam trap

Oracle often tests the distinction between model lifecycle states (INACTIVE vs. ACTIVE) and common operational errors (policy, region, compartment), leading candidates to confuse a deprecation event with a configuration or permission issue.

How to eliminate wrong answers

Option A is wrong because if the model were unsupported in the current region, the command would typically return a 'not found' or 'unsupported' error, not an INACTIVE status. Option B is wrong because a missing OCI policy would result in a 403 Forbidden or authorization error, not an INACTIVE model status. Option D is wrong because compartment access issues would produce a permissions error, not an INACTIVE status; the model's availability is independent of compartment-level access.

90
Multi-Selectmedium

Which TWO of the following are benefits of using OCI Generative AI service compared to self-hosting an LLM?

Select 2 answers
A.Lower latency always
B.No data egress costs
C.Built-in content safety filters
D.Automatic scaling
E.Full control over model weights
AnswersC, D

OCI Generative AI includes safety filters.

Why this answer

Options A and C are correct. OCI Generative AI provides automatic scaling (A) and built-in content safety filters (C). Self-hosting gives full control over weights (B) and may have lower latency if optimized (D is not always true).

Data egress costs (E) may still apply.

91
MCQhard

A data scientist is using the OCI Generative AI SDK to create embeddings for a large corpus of legal documents. They want to perform semantic search. Which endpoint should they use?

A./v1/classify
B./v1/embed
C./v1/generate
D./v1/chat
AnswerB

The /v1/embed endpoint returns embeddings that can be stored in a vector database and used for semantic search.

Why this answer

The /v1/embed endpoint is specifically designed to generate vector embeddings from input text, which are numerical representations that capture semantic meaning. For semantic search over a large corpus of legal documents, embeddings must be created to enable similarity comparisons, making this the correct choice.

Exam trap

Oracle often tests the distinction between embedding endpoints and generation/classification endpoints, trapping candidates who confuse the purpose of semantic search (which requires embeddings) with text generation or classification tasks.

How to eliminate wrong answers

Option A is wrong because /v1/classify is used for text classification tasks (e.g., sentiment analysis or topic labeling), not for generating embeddings. Option C is wrong because /v1/generate is for text generation (e.g., completing a prompt or producing new content), not for creating vector representations. Option D is wrong because /v1/chat is designed for conversational interactions with a chat model, not for producing embeddings for semantic search.

92
MCQeasy

A developer is using the OCI Generative AI API to generate text. The responses are often too short and incomplete. Which parameter adjustment is most likely to produce longer, more complete responses?

A.Decrease the max_tokens parameter.
B.Increase the max_tokens parameter.
C.Increase the top_p parameter.
D.Decrease the frequency_penalty parameter.
AnswerB

Increasing max_tokens gives the model more room to generate a complete response, directly addressing the issue of short outputs.

Why this answer

The max_tokens parameter controls the maximum number of tokens (words or subwords) the model can generate in a single response. By increasing max_tokens, the model is allowed to produce longer sequences, which directly addresses the issue of responses being too short and incomplete. In the OCI Generative AI API, this is the primary parameter for capping output length.

Exam trap

Oracle often tests the distinction between parameters that control output length (max_tokens) versus those that control output diversity or repetition (top_p, frequency_penalty), leading candidates to confuse 'more complete' with 'more creative' or 'less repetitive'.

How to eliminate wrong answers

Option A is wrong because decreasing max_tokens would further restrict the output length, making responses even shorter and more incomplete. Option C is wrong because increasing top_p adjusts nucleus sampling (the cumulative probability threshold for token selection) to control randomness and diversity, not the length of the output. Option D is wrong because decreasing frequency_penalty reduces the penalty for repeating tokens, which may increase repetition but does not directly extend the overall length or completeness of the response.

93
MCQmedium

Refer to the exhibit. A user in group GenAIGroup cannot see models in the Production compartment using OCI Generative AI. What is the most likely issue?

A.Statement should be 'use' instead of 'read'
B.Missing 'inspect' permission
C.Policy syntax incorrect (missing quotes)
D.Resource type should be 'generative-ai-model'
AnswerD

The correct resource type is 'generative-ai-model' for OCI Generative AI models.

Why this answer

The resource type in the policy is specified as 'oci-generativeai:model' but the correct resource type for OCI Generative AI models is 'generative-ai-model'. The policy syntax is invalid, so access is denied. Option B is correct.

94
Multi-Selecthard

Which THREE of the following are known limitations of large language models that practitioners must consider?

Select 3 answers
A.Hallucination of facts not present in the input.
B.Generation of toxic or harmful language.
C.Limited to processing only one language at a time.
D.Bias amplification from training data.
E.Inability to process inputs longer than a few hundred tokens.
AnswersA, B, D

LLMs often generate plausible but false information.

Why this answer

Option A is correct because large language models (LLMs) are prone to hallucination, where they generate plausible-sounding but factually incorrect information that was not present in the input. This occurs because LLMs are next-token predictors without a built-in fact-checking mechanism, and they can invent details, citations, or events to maintain coherence. Practitioners must implement retrieval-augmented generation (RAG) or external verification to mitigate this risk.

Exam trap

Oracle often tests the misconception that LLMs have a hard token limit of a few hundred tokens, but the trap is that modern models have large context windows (e.g., 128K tokens) and the real limitation is the quadratic computational cost of attention, not a strict inability to process longer inputs.

95
MCQhard

An architect needs to ensure that an LLM deployed in OCI does not reveal sensitive information in its outputs. Which technique should be used?

A.Limiting max tokens
B.OCI Data Safe masking
C.Output filtering via custom inference wrapper
D.Input sanitization
AnswerC

A custom wrapper can filter outputs to remove sensitive information.

Why this answer

Option C is correct because output filtering via a custom inference wrapper allows the architect to inspect and sanitize the model's generated text before it reaches the user, preventing the leakage of sensitive information such as PII, credentials, or internal data. This technique operates at the application layer, intercepting the LLM's response and applying rules or regex patterns to redact or block prohibited content, which is essential for compliance and data security in production deployments.

Exam trap

Oracle often tests the distinction between input-side controls (like sanitization) and output-side controls (like filtering), and the trap here is that candidates confuse input sanitization with output filtering, assuming that cleaning the input is sufficient to prevent data leakage from the model's training or internal knowledge.

How to eliminate wrong answers

Option A is wrong because limiting max tokens only restricts the length of the output, not its content, and does nothing to prevent sensitive information from appearing within the allowed token count. Option B is wrong because OCI Data Safe masking is designed for structured databases and relational data, not for unstructured text generated by an LLM; it cannot be applied to model outputs in real-time. Option D is wrong because input sanitization focuses on cleaning user prompts before they reach the model, which is important for prompt injection prevention but does not control what the model generates in its response.

96
MCQmedium

A machine learning engineer is fine-tuning a model on OCI Data Science and notices that the training loss decreases but then suddenly increases. What is the most likely cause?

A.Reduce model size
B.Add dropout
C.Increase batch size
D.Increase learning rate
AnswerA

Reducing model size reduces capacity and helps prevent overfitting, making it the best solution among given options.

Why this answer

The sudden increase in training loss after a period of decrease is a classic sign of gradient explosion, often caused by an excessively large learning rate. When the learning rate is too high, the optimizer overshoots the minima, causing the loss to diverge. Reducing the learning rate stabilizes training by ensuring smaller, more controlled weight updates.

Exam trap

Oracle often tests the misconception that overfitting is the cause of loss divergence, leading candidates to choose regularization techniques like dropout, when the actual issue is an unstable learning rate causing gradient explosion.

How to eliminate wrong answers

Option A is correct because reducing the model size does not directly address the loss divergence caused by an overly large learning rate; it may even reduce capacity. Option B is wrong because adding dropout is a regularization technique to prevent overfitting, not to fix gradient explosion or learning rate issues. Option C is wrong because increasing batch size can improve gradient stability but does not prevent the loss from spiking due to a high learning rate.

Option D is wrong because increasing the learning rate would exacerbate the problem, making the loss diverge further.

97
MCQhard

An OCI GenAI model generates English to French translation. Which metric is most appropriate to evaluate its quality?

A.Perplexity
B.ROUGE
C.F1 score
D.BLEU
AnswerD

BLEU is the standard metric for translation tasks.

Why this answer

BLEU (Bilingual Evaluation Understudy) is the standard metric for machine translation tasks because it measures the n-gram overlap between the generated translation and one or more reference translations, directly assessing fluency and adequacy. For English-to-French translation, BLEU correlates well with human judgment of translation quality, making it the most appropriate choice.

Exam trap

Oracle often tests the distinction between metrics for generation tasks (BLEU for translation, ROUGE for summarization, perplexity for language modeling) and classification metrics (F1 score), leading candidates to confuse their appropriate domains.

How to eliminate wrong answers

Option A is wrong because perplexity measures how well a language model predicts a sequence of tokens, not the quality of a translation against a reference. Option B is wrong because ROUGE is designed for summarization tasks, focusing on recall of n-grams and longest common subsequences, not translation accuracy. Option C is wrong because F1 score is a classification metric (precision and recall) that does not capture the sequential and lexical alignment required for evaluating translation output.

98
MCQeasy

A researcher wants to compare the performance of two LLMs on OCI Generative AI: a base model and an instruct model. They notice the instruct model often refuses to generate certain types of content. Which factor most likely explains this behavior?

A.The base model was programmed to follow stricter rules.
B.The instruct model has been fine-tuned with reinforcement learning from human feedback (RLHF) to align with safety guidelines.
C.The instruct model was trained on a smaller dataset.
D.The base model rejects content more often.
AnswerB

RLHF makes instruct models more likely to reject unsafe requests.

Why this answer

Option B is correct because instruct models are typically fine-tuned using reinforcement learning from human feedback (RLHF) to align with safety guidelines and ethical constraints. This fine-tuning process teaches the model to refuse generating harmful, biased, or unsafe content, which explains why the instruct model refuses certain types of content while the base model does not.

Exam trap

Oracle often tests the misconception that refusal behavior is due to dataset size or rule-based programming, when in fact it is a direct result of RLHF-based safety alignment in instruct models.

How to eliminate wrong answers

Option A is wrong because base models are not programmed with explicit rule-based filters; they are trained on large text corpora without specific refusal mechanisms. Option C is wrong because the training dataset size does not directly cause refusal behavior; instruct models are often fine-tuned on smaller, curated datasets but the refusal stems from RLHF alignment, not dataset size. Option D is wrong because base models typically do not reject content more often; they generate outputs freely without the safety alignment that instruct models undergo.

99
MCQmedium

An AI engineer is testing a large language model on OCI Generative AI and receives this error: 'Token limit exceeded. Maximum context length is 4096 tokens.' The prompt is 4000 tokens long. What is the most effective way to resolve the issue without losing important context?

A.Reduce the prompt length by summarizing or trimming less relevant information.
B.Switch to a model with a larger context window, if available.
C.Increase the max_tokens parameter in the API call.
D.Split the prompt into multiple requests and combine outputs.
AnswerA

Reducing prompt length ensures it fits within the token limit while preserving key context.

Why this answer

Option A is correct because the error indicates that the combined prompt and generated output exceed the model's maximum context length of 4096 tokens. Since the prompt alone is 4000 tokens, there is very little room for the model to generate a response. Trimming or summarizing less relevant parts of the prompt directly reduces the token count, allowing the model to produce a complete output without exceeding the limit.

This approach preserves the most critical context while staying within the model's constraints.

Exam trap

Oracle often tests the misconception that increasing max_tokens or switching models can bypass the token limit, but the core issue is the total context length, which is a fixed architectural constraint of the model.

How to eliminate wrong answers

Option B is wrong because switching to a model with a larger context window may not be available in the current environment or may introduce additional costs and latency; the question asks for the most effective way to resolve the issue without losing important context, and reducing the prompt is a more direct and universally applicable solution. Option C is wrong because increasing the max_tokens parameter does not change the total context length limit; it only controls the maximum number of tokens the model can generate, and if the prompt already consumes 4000 tokens, increasing max_tokens would still cause the total to exceed 4096. Option D is wrong because splitting the prompt into multiple requests and combining outputs can lead to loss of coherence and context across the separate calls, and the model does not maintain state between requests, so important relationships between parts of the prompt would be lost.

100
MCQhard

A data scientist is designing a prompt to extract structured information (e.g., JSON) from text using an instruct model on OCI Generative AI. The model sometimes outputs additional text beyond the JSON, breaking parsing. Which prompt engineering technique is most effective to enforce structured output?

A.Use a base model instead of an instruct model.
B.Set the temperature to 0.0 to reduce randomness.
C.Include a few-shot example of the expected JSON output in the prompt.
D.Increase max_tokens to allow for additional output.
AnswerC

Few-shot examples teach the model to output precisely in the desired format.

Why this answer

Option C is correct because few-shot prompting provides explicit examples of the desired output format, which instructs the model to follow the exact JSON structure and reduces the likelihood of extraneous text. This technique leverages the model's in-context learning ability to adhere to formatting constraints, making it the most effective for enforcing structured output in OCI Generative AI instruct models.

Exam trap

Oracle often tests the misconception that lowering temperature or increasing tokens can enforce output format, when in reality only explicit formatting examples (few-shot) reliably constrain the model's output structure.

How to eliminate wrong answers

Option A is wrong because base models lack instruction-following capabilities and are more prone to generating unstructured or irrelevant text, making them less suitable for structured output tasks. Option B is wrong because setting temperature to 0.0 reduces randomness but does not prevent the model from outputting additional explanatory text beyond the JSON; it only makes outputs more deterministic, not format-compliant. Option D is wrong because increasing max_tokens allows more room for additional output, which would exacerbate the problem of extra text beyond the JSON, not solve it.

101
MCQeasy

An OCI AI Language text classification request returns the output shown. Which conclusion is most accurate?

A.The model is uncertain about the sentiment.
B.The text is classified as Positive with high confidence.
C.The API endpoint is misconfigured.
D.The --endpoint parameter is optional.
AnswerB

The label 'Positive' with score 0.98 confirms high-confidence classification.

Why this answer

The output shows a sentiment label of 'Positive' with a confidence score of 0.98, indicating the model is highly confident in its classification. Option B correctly identifies this as a positive sentiment with high confidence, which is the most accurate conclusion based on the provided data.

Exam trap

The trap here is that candidates may misinterpret a high confidence score as uncertainty (Option A) due to a common misconception that AI models always express doubt, but in OCI AI Language, a score near 1.0 explicitly indicates high certainty.

How to eliminate wrong answers

Option A is wrong because a confidence score of 0.98 indicates the model is very certain, not uncertain, about the sentiment. Option C is wrong because the API endpoint is not misconfigured; the request returned a valid response with a sentiment label and confidence score, which would not happen if the endpoint were misconfigured. Option D is wrong because the --endpoint parameter is not optional; it is required to specify the OCI AI Language endpoint for the API call, and its absence would cause a request failure.

102
MCQmedium

A model generates code with security issues. Which approach is best to mitigate this?

A.Reduce max_tokens
B.Increase temperature
C.Use a different model
D.Add a system prompt with security guidelines
AnswerD

System prompts can guide the model to produce secure code.

Why this answer

Adding a system prompt with security guidelines (option C) instructs the model to follow best practices, directly addressing security concerns without changing model training.

103
MCQhard

Refer to the exhibit. A developer runs the OCI CLI command and receives the output. However, the text "Hello, how are you?" is actually a mix of English and French words. Why does the model assign only 0.03 to French?

A.The text is overwhelmingly English, so the model assigns a low probability to French.
B.The model is limited to identifying a single language per query.
C.The model cannot detect multiple languages in a single text.
D.The model's scores are normalized to sum to 1, so a high English score forces low others.
AnswerA

The phrase is mostly English, so the model is confident it is English.

Why this answer

Option A is correct because the model's output shows a probability distribution over languages, and the text is predominantly English with only a few French words. The model assigns a low probability (0.03) to French because the overwhelming majority of tokens are English, making the text far more likely to be classified as English. This reflects how language identification models evaluate the overall composition of the input.

Exam trap

Oracle often tests the misconception that normalized probabilities force a single language to dominate, but the trap here is that candidates may think the low French score is an artifact of normalization rather than a reflection of the actual token distribution in the text.

How to eliminate wrong answers

Option B is wrong because the model can output probabilities for multiple languages simultaneously, as shown in the exhibit where both English and French scores are present. Option C is wrong because the model can detect multiple languages in a single text, as evidenced by the non-zero probability assigned to French; it does not have a hard limit of one language per query. Option D is wrong because while scores are normalized to sum to 1, the low French score is due to the actual token composition, not merely a forced consequence of normalization; normalization reflects the relative likelihoods, but the model could assign high scores to multiple languages if the text were genuinely multilingual.

104
MCQeasy

What is the role of the softmax function in the output layer of an LLM?

A.Apply attention
B.Tokenize input
C.Compute gradients
D.Convert logits to probabilities
AnswerD

Softmax normalizes logits into a probability distribution.

Why this answer

The softmax function in the output layer of an LLM converts the raw, unnormalized scores (logits) produced by the final linear layer into a probability distribution over the vocabulary. This allows the model to output a valid probability for each token, where all probabilities sum to 1, enabling sampling or greedy decoding for next-token prediction.

Exam trap

The trap here is that candidates may confuse the role of softmax with other transformer components like attention or tokenization, especially since all are critical to LLM operation, but only softmax directly converts logits to probabilities in the output layer.

How to eliminate wrong answers

Option A is wrong because attention is a mechanism within the transformer architecture (e.g., self-attention in the encoder/decoder blocks) that computes weighted sums of values based on queries and keys, not a function applied in the output layer. Option B is wrong because tokenization is a preprocessing step that splits input text into tokens (e.g., using BPE or WordPiece) before the model processes them, not a function of the output layer. Option C is wrong because gradient computation is part of the backpropagation algorithm during training, not an inference-time operation of the output layer; softmax itself is differentiable but its role is to produce probabilities, not compute gradients.

105
MCQeasy

A user has a prompt that exceeds the model's token limit. What is the best practice to handle this?

A.Summarize the earlier parts of the prompt and include the summary.
B.Increase the max tokens parameter in the API call.
C.Truncate the prompt and hope the model understands.
D.Split the input into multiple calls and merge results.
AnswerA

Correct: Summarization preserves context while reducing token count.

Why this answer

Option A is correct because when a prompt exceeds the model's token limit, the best practice is to summarize the earlier parts of the prompt and include the summary. This preserves the essential context without exceeding the token limit, as the model's context window is fixed (e.g., 4,096 tokens for GPT-3.5 or 8,192 for GPT-4). Summarization reduces token count while retaining key information, enabling the model to process the entire input within its constraints.

Exam trap

Oracle often tests the misconception that increasing the max tokens parameter can extend the input capacity, when in reality it only affects output length, not the fixed context window.

How to eliminate wrong answers

Option B is wrong because increasing the max tokens parameter does not expand the model's context window; it only controls the length of the generated response, not the input prompt limit. Option C is wrong because truncating the prompt arbitrarily removes potentially critical context, leading to incomplete or incorrect model understanding, as the model cannot infer missing information. Option D is wrong because splitting the input into multiple calls and merging results breaks the conversational context; the model has no memory across separate API calls, so the merged output would lack coherence and continuity.

106
Multi-Selectmedium

Which TWO of the following are common applications of large language models in enterprise settings?

Select 2 answers
A.Summarizing lengthy legal documents.
B.Performing real-time signal processing for audio streams.
C.Generating boilerplate code from natural language descriptions.
D.Replacing relational databases for data storage.
E.Enhancing low-resolution images through super-resolution.
AnswersA, C

LLMs are effective for text summarization.

Why this answer

Option A is correct because large language models (LLMs) excel at abstractive summarization, which involves condensing lengthy legal documents into concise summaries while preserving key facts and legal reasoning. This is a common enterprise application for legal departments, as LLMs can process large volumes of text and generate coherent, context-aware summaries without requiring manual reading.

Exam trap

Oracle often tests the distinction between LLMs' text-based capabilities and specialized AI tasks (e.g., signal processing, image enhancement), leading candidates to mistakenly assume LLMs can handle any AI task due to their broad 'general intelligence' appearance.

107
MCQeasy

Based on the exhibit, what is the primary action the developer must take to successfully make the inference request?

A.Increase max_new_tokens to 5000 to get a longer response.
B.Ignore the error and retry the request.
C.Reduce max_new_tokens to 2000 to stay within the context length.
D.Switch to a model in a different region.
AnswerC

This reduces total tokens to 8000, within 8192 limit.

Why this answer

Option B is correct because the total (6000 + 4000 = 10000) exceeds 8192. Reducing max_new_tokens or prompt length lowers the total. Option A (increase max_new_tokens) worsens it.

Option C (change region) is irrelevant. Option D (ignore and retry) will fail again.

108
MCQhard

An architect is designing a multi-tenant application using OCI Generative AI. Each tenant has custom instructions and data. To minimize cost while maintaining isolation, which deployment approach is recommended?

A.Dedicated fine-tuned endpoint per tenant.
B.Shared base model with per-tenant system prompts and retrieval.
C.On-premises deployment of open-source models.
D.Single large fine-tuned model with conditional logic.
AnswerB

This approach uses a shared model with tenant-specific prompts and RAG, balancing cost and isolation.

Why this answer

A shared base model with per-tenant customization via system prompts and retrieval (RAG) is cost-effective and provides isolation through prompt engineering and data segregation. Dedicated fine-tuned endpoints are expensive, a single large model with conditional logic risks prompt injection, and on-premises deployment may not be feasible or scalable.

109
MCQmedium

A healthcare startup is building an AI assistant to help doctors draft clinical notes from patient-physician conversations. They have a large language model that is fine-tuned on medical data. During testing, they notice the model occasionally generates plausible-sounding but incorrect medical recommendations. The startup wants to deploy the assistant to assist doctors, not replace them. They have the following options: (A) Deploy the model as-is and rely on doctors to catch errors, (B) Add a disclaimer that the model may make mistakes, (C) Implement a fact-checking pipeline that cross-references outputs with a trusted medical knowledge base before presenting to doctors, (D) Reduce the model's temperature to 0 to ensure deterministic outputs. Which option best balances safety and utility?

A.Implement a fact-checking pipeline that cross-references outputs with a trusted medical knowledge base.
B.Add a disclaimer that the model may make mistakes.
C.Deploy the model as-is and rely on doctors to catch errors.
D.Reduce the model's temperature to 0 to ensure deterministic outputs.
AnswerA

Fact-checking reduces hallucinations and ensures accuracy.

Why this answer

Option C is correct because it directly addresses the factual accuracy issue by validating outputs. Option A is wrong because relying on doctors to catch all errors is unsafe and burdensome. Option B is wrong because a disclaimer does not prevent harm.

Option D is wrong because deterministic outputs do not guarantee correctness; the model can still be confidently wrong.

110
Multi-Selecthard

Which THREE techniques are commonly used to improve the quality of text generation?

Select 3 answers
A.Temperature scaling
B.Top-k sampling
C.Greedy decoding
D.Random sampling
E.Beam search
AnswersA, B, E

Temperature scaling smooths token probabilities and can improve the quality-diversity trade-off.

Why this answer

Temperature scaling is correct because it controls the randomness of token probability distributions by dividing logits before softmax; lower temperatures (e.g., 0.1) make the model more deterministic, while higher temperatures (e.g., 1.5) increase diversity. This directly influences the quality of generated text by balancing coherence and creativity.

Exam trap

Oracle often tests the misconception that greedy decoding or random sampling are valid quality-improvement techniques, when in fact they either cause repetition (greedy) or incoherence (random) without the controlled stochasticity of temperature, top-k, or the global optimization of beam search.

111
MCQmedium

An organization is concerned about the safety of generated content. Which OCI feature allows them to define custom policies to block inappropriate outputs?

A.OCI IAM policies
B.Content filtering and safety controls in Generative AI
C.OCI Audit logs
D.OCI Vault
AnswerB

The Generative AI service includes configurable safety filters that can block inappropriate content based on defined categories and thresholds.

Why this answer

Option B is correct because OCI Generative AI includes built-in content filtering and safety controls that allow organizations to define custom policies to block inappropriate or harmful outputs. These controls operate at the model inference layer, enabling fine-grained filtering based on categories such as toxicity, hate speech, or personally identifiable information (PII). This directly addresses the concern about generated content safety.

Exam trap

The trap here is that candidates often confuse IAM policies (access control) with content safety policies, or assume that logging (Audit) or encryption (Vault) can prevent inappropriate outputs, when in fact only the Generative AI service's built-in content filtering provides that capability.

How to eliminate wrong answers

Option A is wrong because OCI IAM policies govern access control and permissions for OCI resources, not the filtering or safety of generated content from AI models. Option C is wrong because OCI Audit logs capture API calls and operational events for compliance and monitoring, but they do not provide any mechanism to block or filter inappropriate outputs in real time. Option D is wrong because OCI Vault is a key management service for storing and managing secrets, encryption keys, and certificates; it has no role in content safety or output filtering for generative AI.

112
MCQhard

A financial institution uses an LLM for generating investment advice. They are concerned about hallucinations. Which method is most effective?

A.Fine-tune on general financial data.
B.Use RAG with a verified corpus of regulations and reports.
C.Increase the temperature to get more creative responses.
D.Use a larger model to improve accuracy.
AnswerB

Correct: Grounding in trusted data reduces hallucinations.

Why this answer

Option B is correct because Retrieval-Augmented Generation (RAG) grounds the LLM's output in a verified, external knowledge base (e.g., regulations and reports). By retrieving relevant documents at inference time, RAG reduces the model's reliance on its parametric memory, directly mitigating hallucinations in high-stakes domains like financial advice.

Exam trap

Oracle often tests the misconception that simply fine-tuning or scaling a model can fix hallucinations, when in fact grounding via retrieval (RAG) is the most effective technique for factual accuracy in domain-specific applications.

How to eliminate wrong answers

Option A is wrong because fine-tuning on general financial data does not provide a mechanism to verify or update the model's knowledge at inference time; it only adjusts weights on static data, leaving the model prone to hallucinating outdated or fabricated details. Option C is wrong because increasing temperature makes the output more random and creative, which amplifies the risk of hallucinations rather than reducing them. Option D is wrong because using a larger model does not inherently solve hallucination; larger models can still confidently generate false information, and without a retrieval or grounding mechanism, they remain susceptible to fabricating details.

113
MCQhard

A data engineer wants to migrate a large corpus of PDFs to OCI for use with GenAI. Which storage and preprocessing approach is most efficient for RAG?

A.Store PDFs in OCI Object Storage, then use OCI AI Document Understanding to extract text and create embeddings.
B.Convert PDFs to text locally, upload to OCI Database, use SQL queries to retrieve.
C.Use OCI Data Flow to process in batch and store in NoSQL.
D.Store PDFs in OCI File Storage, mount to compute, run offline extraction.
AnswerA

This leverages cloud-native services for scalable extraction and embedding, ideal for RAG.

Why this answer

Option A is correct because OCI Object Storage is optimized for large-scale, unstructured data like PDFs, and OCI AI Document Understanding provides a managed service to extract text from PDFs, which can then be directly fed into embedding pipelines for RAG. This eliminates the need for manual preprocessing or local compute, ensuring scalability and integration with GenAI services.

Exam trap

Oracle often tests the misconception that any storage service (like File Storage or Database) can be used for RAG, but the key is that Object Storage combined with a managed AI extraction service is the most efficient for unstructured data at scale, avoiding local processing overhead.

How to eliminate wrong answers

Option B is wrong because converting PDFs to text locally introduces a bottleneck and inefficiency for large corpora, and storing text in OCI Database with SQL queries is not designed for vector search or RAG workflows, lacking native embedding support. Option C is wrong because OCI Data Flow (Apache Spark) is for batch processing but storing in NoSQL does not provide the vector indexing or retrieval capabilities required for RAG, and it adds unnecessary complexity. Option D is wrong because OCI File Storage is a shared file system for compute instances, not optimized for high-throughput object access, and running offline extraction on a mounted compute instance is manual, lacks scalability, and does not leverage managed AI services.

114
MCQmedium

A data scientist wants to improve the accuracy of a summarization model on medical texts. Which OCI service feature is most suitable?

A.OCI Data Flow
B.OCI Language service
C.OCI Generative AI fine-tuning
D.OCI Anomaly Detection
AnswerC

Fine-tuning adapts a model to domain-specific data, improving accuracy.

Why this answer

C is correct because OCI Generative AI fine-tuning allows a data scientist to adapt a pre-trained large language model (LLM) specifically for medical text summarization by training it on domain-specific data. This improves accuracy by aligning the model's outputs with the terminology, context, and nuances of medical literature, which generic models may not capture well.

Exam trap

The trap here is that candidates may confuse the OCI Language service's pre-built summarization capabilities with the ability to customize a model for a specialized domain, overlooking that fine-tuning is required for significant accuracy improvements on niche text like medical records.

How to eliminate wrong answers

Option A is wrong because OCI Data Flow is a serverless Apache Spark-based data processing service for ETL and big data analytics, not designed for fine-tuning or improving summarization model accuracy. Option B is wrong because OCI Language service provides pre-trained NLP capabilities like sentiment analysis and entity extraction but does not support custom fine-tuning of generative models for summarization tasks. Option D is wrong because OCI Anomaly Detection is used for identifying unusual patterns in time-series data, such as equipment failures or fraud, and has no relevance to improving text summarization accuracy.

115
MCQhard

Refer to the exhibit. A user in GenAI-Users group tries to run a text generation inference but gets permission denied. What is the most likely issue?

A.The policy resource type is wrong.
B.The operation condition is too restrictive.
C.The group name mismatch.
D.The user is not in the compartment.
AnswerB

The condition likely does not match the actual operation, causing denial.

Why this answer

The policy attached to the GenAI-Users group includes a condition that restricts the operation to a specific compartment or resource, but the user is attempting to run inference in a different compartment or without meeting the condition. Since the condition is too restrictive, the IAM policy denies the action even though the user is in the correct group and the resource type is valid.

Exam trap

Oracle often tests the nuance that a policy with overly restrictive conditions (e.g., scoping to a specific compartment or resource) will deny access even when the group, resource type, and user compartment are all correct, leading candidates to incorrectly blame the group or resource type.

How to eliminate wrong answers

Option A is wrong because the policy resource type (e.g., 'ai-language-models' or 'genai-models') is correct for text generation inference in OCI Generative AI, so a mismatch would cause a different error. Option C is wrong because the group name mismatch would result in no policy being applied at all, not a permission denied error with a valid group. Option D is wrong because the user being in the compartment is not the issue; the condition in the policy is what restricts the operation, not the user's compartment membership.

116
MCQmedium

An AI specialist is troubleshooting why a fine-tuned model produces inconsistent results across different inference calls. What is the most likely cause?

A.The base model is not suitable
B.The temperature is set too high
C.The model is overfitted
D.The fine-tuning dataset is too small
AnswerB

High temperature increases randomness, causing variable outputs.

Why this answer

Temperature controls the randomness of token sampling during inference. A high temperature (e.g., >1.0) increases the probability of selecting less likely tokens, causing the model to produce varied outputs for the same input across different calls. This is the most direct cause of inconsistent results when the base model and fine-tuning are otherwise sound.

Exam trap

Oracle often tests the misconception that overfitting (Option C) causes inconsistency, but overfitting actually reduces variance by memorizing patterns; the trap is confusing output variability with poor generalization.

How to eliminate wrong answers

Option A is wrong because an unsuitable base model would cause consistently poor or biased outputs, not inconsistency across calls; the base model's suitability affects overall quality, not per-call variance. Option C is wrong because overfitting leads to memorization of training data, producing deterministic or near-identical outputs for similar inputs, not random inconsistency. Option D is wrong because a small fine-tuning dataset typically causes underfitting or poor generalization, not random variation across inference calls; inconsistency from small data would manifest as high variance across different inputs, not across repeated calls with the same input.

117
Multi-Selecthard

An organization is planning to use OCI Generative AI for sensitive customer data. Which three OCI services or features should they consider for data governance and security?

Select 3 answers
A.OCI Vault for managing API keys
B.OCI Data Safe for data masking and encryption
C.OCI IAM for access control
D.OCI Data Labeling for annotating data
E.OCI Audit for logging API calls
AnswersA, C, E

Secure storage of API keys and secrets is crucial for authentication to Generative AI endpoints.

Why this answer

Option A is correct because OCI Vault is a dedicated service for securely storing and managing secrets, including API keys used to authenticate to OCI Generative AI. By centralizing API key management in Vault, organizations can enforce rotation policies, access controls, and audit trails, which is critical for protecting sensitive customer data when invoking generative AI models.

Exam trap

Oracle often tests the misconception that database security services like Data Safe apply to all data in OCI, but candidates must recognize that Generative AI operates through API calls and does not use a relational database, making Data Safe irrelevant here.

118
MCQhard

An engineer sets beam search width to 1 during inference on OCI Generative AI. What is the most likely effect on output?

A.More memory usage
B.More diverse outputs
C.Better quality
D.Faster inference
AnswerD

Greedy decoding is the fastest decoding method as it considers only one candidate path.

Why this answer

Beam search width of 1 corresponds to greedy decoding, which selects the highest probability token at each step. This results in deterministic but often less diverse and potentially lower quality outputs compared to wider beam search. Option B is correct.

119
MCQmedium

During fine-tuning of a Cohere model on OCI Data Science, the loss curve shows a sharp spike after epoch 3. What is the most appropriate action?

A.Gradient clipping.
B.Reduce learning rate.
C.Add more training data.
D.Increase batch size.
AnswerA

Gradient clipping limits gradient values, preventing explosion and stabilizing training.

Why this answer

A sharp spike in the loss curve after epoch 3 during fine-tuning indicates a gradient explosion, where the gradients become excessively large and destabilize the model's weights. Gradient clipping is the most appropriate action because it directly caps the gradient norm (e.g., using `max_grad_norm=1.0` in Cohere's fine-tuning API) to prevent these spikes, ensuring stable training without altering the learning dynamics.

Exam trap

Oracle often tests the distinction between gradient explosion (sharp spikes) and learning rate divergence (gradual increase), leading candidates to incorrectly choose reducing the learning rate instead of gradient clipping.

How to eliminate wrong answers

Option B is wrong because reducing the learning rate addresses gradual divergence or oscillation, not sudden spikes; a sharp spike is a sign of gradient explosion, not a learning rate that is too high. Option C is wrong because adding more training data improves generalization and reduces overfitting but does not mitigate gradient instability during training. Option D is wrong because increasing batch size can stabilize gradient estimates but may also increase memory usage and does not directly prevent individual gradient values from becoming too large; it can even exacerbate gradient explosion by averaging over more samples.

120
MCQhard

You are a machine learning engineer at a large e-commerce company. You have been tasked with deploying a large language model to power a customer service chatbot that handles product returns and refunds. The model will answer customer queries based on a knowledge base of return policies and FAQs. The company has strict requirements: (1) responses must be factually accurate and grounded in the knowledge base, (2) the system must be cost-effective, and (3) latency should be under 2 seconds per response. You decide to use a pre-trained LLM from OCI Data Science and implement retrieval-augmented generation (RAG). You have two options for the retriever: a dense embedding-based retriever (e.g., using OCI AI Language embeddings) or a sparse keyword-based retriever (e.g., BM25). You also need to decide on the generation model size: a 7B parameter model or a 70B parameter model. You run a pilot test: with the dense retriever + 7B model, average latency is 1.8 seconds and accuracy is 85%. With the sparse retriever + 7B model, latency is 1.2 seconds but accuracy drops to 75%. With the 70B model (any retriever), latency exceeds 5 seconds. Which combination should you choose to meet all requirements?

A.Sparse retriever + 70B model.
B.Dense retriever + 70B model.
C.Sparse retriever + 7B model.
D.Dense retriever + 7B model.
AnswerD

Meets both latency and accuracy requirements.

Why this answer

Option D (dense retriever + 7B model) is correct because it meets all three requirements: factual accuracy (85% accuracy from dense retrieval grounding), latency under 2 seconds (1.8 seconds), and cost-effectiveness (7B model is cheaper to run than 70B). The dense retriever provides better semantic matching for nuanced return policy queries, while the 7B model keeps inference fast and affordable.

Exam trap

Oracle often tests the trade-off between retrieval accuracy and model size, where candidates mistakenly prioritize a larger model (70B) for better generation quality, ignoring that the latency constraint makes it infeasible, or choose a sparse retriever thinking it's faster, but overlook the critical accuracy requirement for grounded responses.

How to eliminate wrong answers

Option A is wrong because the 70B model with any retriever exceeds 5 seconds latency, violating the 2-second requirement. Option B is wrong because the 70B model also exceeds 5 seconds latency, failing the latency requirement. Option C is wrong because the sparse retriever (BM25) with the 7B model yields only 75% accuracy, which is below the acceptable factual accuracy threshold given the strict requirement for grounded responses.

121
Multi-Selectmedium

Which three techniques are commonly used to reduce the risk of prompt injection in LLM applications? (Choose three.)

Select 3 answers
A.Enabling prompt validation against regex patterns.
B.Output filtering.
C.Increasing temperature.
D.Input sanitization.
E.Using role-based system prompts.
AnswersB, D, E

Filtering outputs can block dangerous responses.

Why this answer

Output filtering (B) is correct because it acts as a post-processing defense that scans the LLM's generated output for malicious content, such as leaked system prompts or injected commands, before it reaches the user. This technique helps mitigate the impact of successful prompt injections by catching and neutralizing harmful outputs that bypass input controls.

Exam trap

Oracle often tests the distinction between security controls and model parameters, so the trap here is that candidates mistakenly think adjusting model settings like temperature can reduce injection risk, when in fact only input/output controls and system prompt design are effective.

122
MCQeasy

An organization wants to use an LLM to summarize legal documents. Which consideration is most important for ensuring accurate summaries?

A.Fine-tune the model on a curated legal corpus
B.Use the largest available general-purpose model
C.Rely on zero-shot summarization with careful prompting
D.Pre-train a new model from scratch on legal texts
AnswerA

Domain-specific fine-tuning teaches the model legal terminology and reasoning.

Why this answer

Legal documents require precise understanding, so fine-tuning on legal data is critical. Option B is wrong because larger models don't guarantee domain accuracy. Option C is wrong because pre-training from scratch is expensive and unnecessary.

Option D is wrong because zero-shot may miss legal nuances.

123
MCQhard

During fine-tuning of a large language model on OCI, you notice that the model's performance on the validation set is not improving after several epochs, but the training loss continues to decrease. What is the most likely cause?

A.The learning rate is too high.
B.The validation set is not representative.
C.The model is overfitting to the training data.
D.The training data is too small.
AnswerC

Overfitting occurs when the model memorizes training examples, causing training loss to drop while validation performance plateaus or declines. This is the most likely cause.

Why this answer

When training loss decreases but validation performance stagnates or worsens, the model is overfitting to the training data. It memorizes the training examples but fails to generalize. A high learning rate might cause divergence, not this pattern.

Too small training data can contribute to overfitting but is not the direct symptom. An unrepresentative validation set could cause mismatch, but the described pattern is classic overfitting.

124
MCQeasy

A healthcare company is using OCI GenAI to generate patient summaries from clinical notes. The model output sometimes includes hallucinated medical facts, such as incorrect dosages or diagnoses, which could be dangerous. The team needs to improve factual accuracy while maintaining data privacy. They have a large collection of internal medical knowledge bases (clinical guidelines, drug databases) that are stored in OCI Object Storage. The current implementation uses a zero-shot prompt with the base Cohere Command model. The data science team has limited GPU resources and wants to avoid building a complex pipeline. Which course of action best addresses the hallucination problem?

A.Increase the temperature parameter to 0.9 to encourage more deterministic outputs.
B.Use prompt engineering to add 'Only provide facts that are absolutely certain.'
C.Implement a RAG pipeline that retrieves relevant documents from the internal knowledge bases and includes them in the prompt.
D.Fine-tune the Cohere model on a publicly available medical dataset like PubMed.
AnswerC

RAG grounds generation in retrieved facts, significantly reducing hallucinations.

Why this answer

Option C is correct because a Retrieval-Augmented Generation (RAG) pipeline directly addresses hallucination by grounding the model's output in verified, internal medical knowledge bases stored in OCI Object Storage. This approach retrieves relevant clinical guidelines or drug database entries and includes them in the prompt, providing factual context without requiring fine-tuning or complex GPU-intensive pipelines. It also preserves data privacy by keeping sensitive medical data within OCI and avoids exposing it to external model training.

Exam trap

Oracle often tests the misconception that prompt engineering alone can reliably eliminate hallucinations, but the trap here is that without external knowledge injection (RAG), the model cannot overcome its inherent tendency to fabricate facts, especially in high-stakes domains like healthcare.

How to eliminate wrong answers

Option A is wrong because increasing the temperature to 0.9 actually increases randomness and creativity, making outputs less deterministic and more prone to hallucinations, not less. Option B is wrong because prompt engineering with a vague instruction like 'Only provide facts that are absolutely certain' does not supply the model with actual factual data; the model still relies on its internal parametric knowledge, which is the source of hallucinations. Option D is wrong because fine-tuning on a publicly available dataset like PubMed introduces public, non-confidential data that may not align with the company's internal medical knowledge, and it requires significant GPU resources and complex pipeline management, which the team explicitly wants to avoid.

125
MCQmedium

A developer runs an OCI GenAI chat request with system prompt "You are a sarcastic assistant." The output is offensive. How can the developer enforce safety policies?

A.Use the OCI GenAI content moderation filter.
B.Change model to LLAMA.
C.Increase maxTokens.
D.Set temperature to 0.
AnswerA

Content moderation filters explicitly block harmful or offensive content in outputs.

Why this answer

Option A is correct because the OCI GenAI content moderation filter is specifically designed to enforce safety policies by detecting and blocking offensive, harmful, or policy-violating content in both input prompts and model outputs. By enabling this filter, the developer can prevent the model from generating offensive responses even when a system prompt like 'You are a sarcastic assistant' encourages undesirable behavior.

Exam trap

Oracle often tests the misconception that adjusting model parameters (like temperature or maxTokens) or switching model families can substitute for explicit content moderation, when in fact safety enforcement requires dedicated filtering mechanisms that operate independently of model behavior.

How to eliminate wrong answers

Option B is wrong because changing the model to LLAMA does not inherently enforce safety policies; LLAMA models have their own safety risks and require separate content moderation or fine-tuning to block offensive outputs. Option C is wrong because increasing maxTokens only extends the maximum length of the generated response, which does nothing to prevent offensive content—it may even allow the model to produce more harmful text. Option D is wrong because setting temperature to 0 makes the model deterministic (greedy decoding) but does not filter or moderate content; it can still generate offensive responses if the training data or system prompt encourages such behavior.

126
MCQeasy

A company is building a chatbot using OCI Generative AI service. They want to ensure that the model responses are grounded in their internal knowledge base. Which approach should they use?

A.Prompt engineering with few-shot examples
B.Fine-tuning the model on the internal knowledge base
C.Model distillation to compress the knowledge base
D.Retrieval-Augmented Generation (RAG)
AnswerD

RAG retrieves relevant documents from a knowledge base and uses them to generate grounded responses.

Why this answer

Retrieval-Augmented Generation (RAG) is the correct approach because it retrieves relevant documents from the company's internal knowledge base at inference time and provides them as context to the LLM, ensuring the model's responses are grounded in verifiable, up-to-date information without modifying the model itself. This directly addresses the requirement to ground responses in an internal knowledge base while avoiding the cost and complexity of retraining.

Exam trap

The trap here is that candidates often confuse fine-tuning (Option B) as the only way to incorporate proprietary data, overlooking that RAG provides a more flexible, cost-effective, and updatable method for grounding responses in a dynamic knowledge base without altering model weights.

How to eliminate wrong answers

Option A is wrong because prompt engineering with few-shot examples only provides a handful of static examples in the prompt, which cannot dynamically retrieve or incorporate the full breadth of an internal knowledge base, leading to hallucinations on unseen or specific internal data. Option B is wrong because fine-tuning the model on the internal knowledge base would embed that data into the model's weights, making it expensive to update, prone to catastrophic forgetting, and unable to guarantee factual grounding for new or changing documents without retraining. Option C is wrong because model distillation compresses a larger model into a smaller one for efficiency, but it does not introduce external knowledge retrieval; it merely replicates the behavior of the teacher model, which still lacks access to the internal knowledge base.

127
MCQhard

A machine learning team is fine-tuning a 7B parameter Llama 2 model on a custom dataset of 10,000 documents using OCI Data Science and GPU instances. They encounter out-of-memory (OOM) errors during the fine-tuning process. They are using a batch size of 8 and a sequence length of 2048. They cannot increase the GPU memory. Which change should they prioritize to resolve the OOM?

A.Enable gradient accumulation with steps of 4 or more.
B.Use mixed precision training (FP16).
C.Reduce the model size by using a 3B parameter version.
D.Decrease the number of training epochs.
AnswerA

Correct: Gradient accumulation reduces memory per step without changing effective batch size.

Why this answer

Option B is correct because enabling gradient accumulation allows the effective batch size to be maintained while reducing per-step memory usage. Option A changes the model entirely, Option C may not fix the memory issue, and Option D helps but may still OOM if the batch size is too high; gradient accumulation is more directly targeted.

128
MCQeasy

A developer is using OCI GenAI to generate structured data. They often get responses that include additional commentary or markdown. Which prompt engineering technique should they use to ensure only JSON output?

A.Set top_p to 0.1.
B.Use a model with a larger context window.
C.Add 'Return only JSON' at the end of the prompt.
D.Increase the temperature to 1.5.
AnswerC

Correct: Direct instruction enforces format.

Why this answer

Option C is correct because explicitly instructing the model to 'Return only JSON' directly constrains the output format, reducing the likelihood of extraneous commentary or markdown. This technique leverages prompt engineering to guide the model's behavior without altering inference parameters like temperature or top_p, which control randomness rather than output structure.

Exam trap

Oracle often tests the misconception that adjusting sampling parameters (like temperature or top_p) can enforce output format, when in fact these parameters control randomness and diversity, not structural constraints—leading candidates to overlook the direct prompt engineering solution.

How to eliminate wrong answers

Option A is wrong because setting top_p to 0.1 reduces the nucleus sampling threshold, making the model more deterministic but not preventing it from generating additional text or markdown; it controls token selection diversity, not output format. Option B is wrong because a larger context window allows the model to process more input tokens but does not enforce a specific output structure; it addresses memory limitations, not format constraints. Option D is wrong because increasing temperature to 1.5 raises randomness, which can actually increase the likelihood of unpredictable or verbose responses, including unwanted commentary, rather than ensuring strict JSON output.

← PreviousPage 2 of 2 · 128 questions total

Ready to test yourself?

Try a timed practice session using only Fundamentals of Large Language Models questions.