Knowledge + Practice

CCNA Llm Fundamentals Questions

75 of 128 questions · Page 1/2 · Llm Fundamentals topic · Answers revealed

Practice these questions Exam hub All questions

1

MCQeasy

Refer to the exhibit. A user deployed a custom model via OCI Data Science and registered it in the Model Catalog. They use the correct OCID but get this error. What is the most likely issue?

A.The model is not fine-tuned

B.The model is not deployed to an endpoint

C.The compartment OCID is missing

D.The model is in a different region

AnswerB

Deployment is required to serve inference requests; registration is not sufficient.

Why this answer

A model must be deployed to an endpoint to be accessible via the inference API. Registration in the Model Catalog alone does not create an endpoint. Option A is correct.

Practice this question →

2

Multi-Selectmedium

Which three factors most significantly affect the quality of an LLM's output? (Select THREE)

Select 3 answers

A.Model's context window size

B.Clarity of the prompt

C.Number of GPUs used during inference

D.Temperature setting

E.Quality of training data

AnswersB, D, E

Correct: Clear prompts yield more accurate responses.

Why this answer

Option B is correct because the clarity of the prompt directly determines how well the LLM interprets the user's intent. A well-structured, unambiguous prompt reduces ambiguity and guides the model toward generating relevant and coherent responses, while a vague or poorly worded prompt often leads to off-target or nonsensical output.

Exam trap

Oracle often tests the misconception that hardware resources like GPU count directly improve output quality, whereas in reality they only affect performance metrics like latency and throughput, not the semantic quality of the generated text.

Practice this question →

3

Multi-Selecteasy

Which TWO are advantages of using retrieval-augmented generation (RAG) over fine-tuning for incorporating new knowledge?

Select 2 answers

A.Better at capturing domain-specific writing style

B.Enables the model to access up-to-date information without retraining

C.Eliminates the need for a vector database

D.Reduces token usage and latency compared to fine-tuning

E.Cost-effective for large corpora that change frequently

AnswersB, E

RAG retrieves fresh data from external sources.

Why this answer

Option B is correct because RAG retrieves relevant, up-to-date information from an external knowledge base at inference time, allowing the model to answer questions about recent events or proprietary data without requiring any retraining. This is a key advantage over fine-tuning, which would need a new training cycle to incorporate the same new knowledge.

Exam trap

Oracle often tests the misconception that RAG is always faster or cheaper than fine-tuning, when in reality RAG introduces retrieval latency and higher token usage, making it less suitable for low-latency or high-throughput scenarios.

Practice this question →

4

MCQeasy

A startup needs to deploy an LLM for a simple FAQ chatbot on OCI with low latency. Which model choice is most appropriate?

A.Use a medium-sized model with high precision.

B.Use an ensemble of models.

C.Use the largest available model for best quality.

D.Use a smaller, task-specific fine-tuned model.

AnswerD

Correct: Small models are fast and adequate for simple tasks.

Why this answer

Option B is correct because a smaller fine-tuned model offers faster inference and sufficient accuracy for simple FAQs. Option A is overkill and slow, Option C may still be large, and Option D adds unnecessary complexity.

Practice this question →

5

MCQeasy

A user wants to use OCI Generative AI to generate marketing copy. They want the output to be more creative and varied. Which parameter should they adjust?

A.Set temperature to 0.

B.Increase the temperature parameter.

C.Decrease the temperature parameter.

D.Increase the max_tokens parameter.

AnswerB

Higher temperature increases randomness, leading to more creative and varied text generation.

Why this answer

Increasing the temperature parameter makes the model's output more random and diverse, which is ideal for creative tasks like generating marketing copy. A higher temperature (e.g., 0.7–1.0) increases the probability of sampling less likely tokens, leading to more varied and imaginative text. Setting temperature to 0 would make the output deterministic and repetitive, which is the opposite of what the user wants.

Exam trap

Oracle often tests the misconception that increasing max_tokens or adjusting other parameters like top_p can substitute for temperature when the goal is to increase creativity, but only temperature directly controls randomness and diversity in token selection.

How to eliminate wrong answers

Option A is wrong because setting temperature to 0 forces the model to always choose the most likely token, resulting in deterministic, repetitive, and less creative output. Option C is wrong because decreasing the temperature reduces randomness, making the output more conservative and less varied, which contradicts the goal of creativity. Option D is wrong because increasing max_tokens only extends the maximum length of the generated text; it does not affect the creativity or variability of the output.

Practice this question →

6

Multi-Selecteasy

Which TWO statements about tokens in large language models are correct?

Select 2 answers

A.Common tokenization methods include word-based and subword-based.

B.All tokens have the same embedding size.

C.Tokens are only used during training.

D.Tokens are always whole words.

E.The maximum number of tokens a model can process is called the context window.

AnswersA, E

Word-based and subword-based are standard tokenization approaches.

Why this answer

Option A is correct because common tokenization methods in large language models include word-based tokenization (splitting text into whole words) and subword-based tokenization (like Byte-Pair Encoding or WordPiece), which handle out-of-vocabulary words and morphological variations more effectively. Subword tokenization is widely used in models like GPT and BERT to balance vocabulary size and coverage.

Exam trap

Oracle often tests the distinction between tokenization methods and the fixed embedding dimension, leading candidates to incorrectly assume that tokens vary in embedding size or that tokens must be whole words.

Practice this question →

7

MCQhard

A research team is experimenting with few-shot prompting to improve a model's performance on a complex reasoning task. They find that the model's performance degrades when the few-shot examples are too similar to each other. What is the likely cause and best remedy?

A.The model has not seen enough examples. Increase the number of few-shot examples.

B.The examples are presented in a confusing order. Reorder them by difficulty.

C.The examples lack diversity, causing the model to overfit to a narrow pattern. Use more diverse examples.

D.The temperature is too low, making the model too deterministic. Increase temperature slightly.

AnswerC

Diverse examples reduce bias and improve generalization.

Why this answer

When few-shot examples are too similar, the model overfits to a narrow pattern, reducing its ability to generalize to the diverse reasoning paths required by the task. This is a known limitation of in-context learning: the model treats the examples as a template rather than as diverse demonstrations. Using more diverse examples exposes the model to a wider range of reasoning patterns, improving robustness.

Exam trap

Oracle often tests the misconception that more examples always improve performance, when in fact diversity is critical to prevent overfitting in few-shot prompting.

How to eliminate wrong answers

Option A is wrong because the issue is not the quantity of examples but their lack of diversity; adding more similar examples would worsen overfitting. Option B is wrong because the order of examples (by difficulty) does not address the core problem of pattern overfitting; confusing order may affect performance but is not the likely cause here. Option D is wrong because temperature controls randomness in token sampling, not the model's sensitivity to example diversity; a low temperature would make outputs more deterministic but does not cause or fix overfitting to narrow patterns.

Practice this question →

8

MCQmedium

An application using OCI GenAI experiences high response times. Which change will most directly reduce latency?

A.Reduce the number of output tokens requested.

B.Increase the max tokens parameter.

C.Switch to a fine-tuned version of the same model.

D.Enable batched inference.

AnswerA

Correct: Fewer tokens means faster generation.

Why this answer

Option A is correct because reducing the number of output tokens directly speeds up generation. Option B increases latency, Option C may not affect latency, and Option D helps throughput but not individual request latency.

Practice this question →

9

Multi-Selectmedium

Which TWO factors most directly impact the consistency of text generated by an LLM when the same prompt is used multiple times?

Select 2 answers

A.Top_p

B.Batch size

C.Max_tokens

D.Temperature

E.Seed

AnswersA, D

Top_p controls the nucleus of tokens considered; lower values make output more focused.

Why this answer

Top_p (nucleus sampling) directly impacts consistency by controlling the cumulative probability threshold for token selection. A lower Top_p (e.g., 0.1) restricts the model to only the most probable tokens, reducing randomness and making outputs more deterministic across repeated prompts. This parameter, along with Temperature, is a primary lever for managing output variability in LLMs.

Exam trap

Oracle often tests the distinction between inference-time parameters (Temperature, Top_p) and training/hardware parameters (Batch size), or between parameters that control randomness (Temperature, Top_p) versus those that control output length (Max_tokens), leading candidates to mistakenly select Seed as a primary consistency factor.

Practice this question →

10

MCQeasy

A startup needs to deploy a large language model for a customer support chatbot that requires low latency and cost efficiency. They are evaluating OCI Generative AI models. Which model type is most appropriate?

A.Embedding model (e.g., cohere.embed)

B.Instruct model (e.g., cohere.command)

C.Image generation model

D.Base model (e.g., cohere.base)

AnswerB

Instruct models are fine-tuned to follow instructions, making them ideal for chatbots.

Why this answer

The startup requires low latency and cost efficiency for a customer support chatbot. Instruct models like cohere.command are specifically fine-tuned to follow conversational instructions and generate concise, task-oriented responses, making them ideal for interactive chatbot applications. They balance performance and cost better than base models, which lack instruction-following capability, and embedding models, which are designed for semantic search rather than text generation.

Exam trap

Oracle often tests the distinction between base models and instruct models, trapping candidates who assume a base model can be used directly for task-specific applications without fine-tuning or instruction alignment.

How to eliminate wrong answers

Option A is wrong because embedding models (e.g., cohere.embed) are designed to convert text into vector representations for tasks like semantic search or clustering, not for generating conversational responses. Option C is wrong because image generation models are used for creating or editing images, not for text-based customer support interactions. Option D is wrong because base models (e.g., cohere.base) are general-purpose language models that have not been fine-tuned for instruction following, leading to less relevant and less controllable outputs for a chatbot use case.

Practice this question →

11

MCQeasy

A team wants to deploy an LLM for real-time inference with low latency. Which OCI deployment option is best?

A.OCI Data Science Model Deployment with GPU shapes

B.OCI Functions with CPU

C.OCI Events

D.OCI Streaming

AnswerA

GPU shapes provide the compute power needed for low-latency LLM inference.

Why this answer

OCI Data Science Model Deployment with GPU shapes is the best option because it provides managed, scalable, low-latency inference endpoints for LLMs. GPU shapes (e.g., VM.GPU.A10) are essential for the parallel matrix computations required by transformer-based models, and the deployment service supports auto-scaling and load balancing to maintain real-time response times.

Exam trap

The trap here is that candidates may confuse OCI Functions (a serverless compute service) with a viable inference platform, overlooking the GPU requirement for LLM workloads, or mistakenly think OCI Streaming can process inference requests because of its 'real-time' label.

How to eliminate wrong answers

Option B (OCI Functions with CPU) is wrong because OCI Functions is a serverless compute service designed for short-lived, stateless functions, and CPU-only execution cannot meet the low-latency requirements of LLM inference due to the lack of GPU acceleration for large matrix operations. Option C (OCI Events) is wrong because OCI Events is a notification and orchestration service for reacting to infrastructure changes, not a compute platform for running inference workloads. Option D (OCI Streaming) is wrong because OCI Streaming is a real-time data ingestion and messaging service (based on Apache Kafka) for handling event streams, not for executing LLM inference.

Practice this question →

12

MCQmedium

A developer is building a code generation assistant. The model occasionally produces syntactically correct but semantically wrong code. Which technique directly addresses semantic correctness?

A.Expand the token vocabulary

B.Lower the temperature to 0

C.Apply RLHF using human-validated code examples

D.Increase beam search width

AnswerC

RLHF directly optimizes for desired outcomes like semantic correctness.

Why this answer

Reinforcement Learning from Human Feedback (RLHF) directly addresses semantic correctness by fine-tuning the model using human-validated code examples. This process teaches the model to prefer outputs that are not only syntactically valid but also logically correct and aligned with developer intent, reducing semantically wrong code generation.

Exam trap

Oracle often tests the misconception that adjusting decoding parameters (temperature, beam search) or tokenization can fix semantic errors, when in fact only training techniques like RLHF that incorporate human feedback can directly improve semantic correctness.

How to eliminate wrong answers

Option A is wrong because expanding the token vocabulary increases the range of tokens the model can generate but does not improve the model's ability to reason about code semantics or correct logical errors. Option B is wrong because lowering the temperature to 0 makes the model deterministic, reducing randomness but not fixing underlying semantic misunderstandings; it may still produce the same incorrect logic repeatedly. Option D is wrong because increasing beam search width explores more candidate sequences during decoding, which can improve syntactic fluency but does not directly address semantic correctness or logical accuracy.

Practice this question →

13

MCQeasy

Which technique allows an LLM to be adapted to a new task with only a few examples?

A.Few-shot learning

B.Fine-tuning

C.Pre-training

D.Prompt engineering

AnswerA

Few-shot learning provides examples in the prompt to adapt the model to a new task.

Why this answer

Few-shot learning (option A) uses a handful of examples in the prompt to guide the model's behavior without fine-tuning. This is a form of prompt engineering specifically designed for few-shot scenarios.

Practice this question →

14

MCQmedium

A developer is using the OCI Generative AI Python SDK. They receive a 400 error 'InvalidParameter'. What is the most likely reason?

A.Exceeded token limit

B.Network timeout

C.Invalid model name

D.Missing API key

AnswerC

An invalid model name is a parameter error, resulting in 400 InvalidParameter.

Why this answer

Option C is correct because an invalid model name is a common cause of InvalidParameter errors. Option A (missing API key) would cause authentication errors (401). Option B (exceeded token limit) may cause a different error.

Option D (network timeout) would cause a timeout error.

Practice this question →

15

MCQeasy

Refer to the exhibit. A user in group GenAIUsers reports that they cannot call the OCI Generative AI API. What is the most likely issue?

A.The policy statement is missing the 'inspect' verb.

B.The policy is in INACTIVE state.

C.The compartment ID in the policy does not match the user's compartment.

D.The user is not in the group GenAIUsers.

AnswerC

The policy applies to 'ExampleCompartment' by name, but the user may be in a different compartment. The compartment OCID in the policy header does not match the compartment name in the statement, indicating a mismatch.

Why this answer

The policy is scoped to a specific compartment ID, but the user's compartment does not match that ID. For OCI IAM policies to grant access to resources like the Generative AI API, the policy must be written for the compartment where the resource resides or where the user operates. Since the user is in a different compartment, the policy does not apply, causing the API call to fail.

Exam trap

Oracle often tests the misconception that a user's group membership is the sole factor for policy applicability, ignoring that the compartment scope in the policy statement must match the user's compartment or resource compartment for the policy to take effect.

How to eliminate wrong answers

Option A is wrong because the 'inspect' verb is not required for calling the Generative AI API; the policy uses 'allow group GenAIUsers to manage generative-ai-family in compartment ...', which includes all verbs (inspect, read, use, manage) and is sufficient. Option B is wrong because the exhibit shows the policy is in ACTIVE state, not INACTIVE; an INACTIVE policy would be explicitly marked and would not enforce any rules. Option D is wrong because the user reports being in group GenAIUsers, and the policy targets that group; if the user were not in the group, the error would be an authorization failure, but the issue here is compartment mismatch.

Practice this question →

16

MCQhard

The job fails with "InvalidParameter: trainingDatasetUri". What should the administrator check first?

A.Whether the compartment has sufficient budget.

B.Whether the model ID supports fine-tuning.

C.Whether the bucket exists and the file is accessible.

D.Whether the parameters JSON is correctly formatted.

AnswerC

Invalid URI errors often stem from missing buckets or incorrect paths.

Why this answer

The error indicates the training dataset URI is invalid. The most common cause is that the bucket does not exist or the file path is incorrect. Model compatibility, budget, and parameter format would yield different errors.

Practice this question →

17

MCQhard

During inference with OCI Generative AI, you notice that the model is generating repetitive phrases. Which combination of parameters can help reduce repetition?

A.Top_p = 0.1, frequency_penalty = 0.5

B.Top_p = 0.9, frequency_penalty = 0.5

C.Top_p = 0.9, frequency_penalty = 0.0

D.Top_p = 1.0, frequency_penalty = 0.0

AnswerB

This combination applies a gentle penalty on repeated tokens while keeping token selection diverse, effectively reducing repetition.

Why this answer

Option B is correct because a high Top_p value (0.9) allows the model to consider a diverse set of tokens, reducing the chance of getting stuck in repetitive loops, while a positive frequency_penalty (0.5) actively penalizes tokens that have already been generated, discouraging the model from repeating the same phrases. Together, these parameters balance creativity and repetition suppression.

Exam trap

Oracle often tests the misconception that lowering Top_p (making it more restrictive) reduces repetition, when in fact it can worsen repetition by limiting the model to only the most probable tokens, which are often the same ones already used.

How to eliminate wrong answers

Option A is wrong because Top_p = 0.1 is too restrictive, forcing the model to sample from only the top 10% of probable tokens, which actually increases the likelihood of repetitive patterns by narrowing the token pool. Option C is wrong because frequency_penalty = 0.0 means no penalty is applied for repeated tokens, so even with a high Top_p, the model has no disincentive to repeat phrases. Option D is wrong because Top_p = 1.0 (no nucleus sampling) combined with frequency_penalty = 0.0 provides no mechanism to reduce repetition, effectively using raw probability sampling without any diversity-enhancing constraints.

Practice this question →

18

MCQhard

An architect is optimizing an LLM application that processes long documents. The model has a 4096 token limit, but the documents are often 8000 tokens. They are using a chunking strategy. However, model responses sometimes miss key information that spans across chunks. Which technique most directly addresses this issue?

A.Randomly select parts of the document to include.

B.Increase the max_tokens parameter for longer outputs.

C.Use overlapping chunks to maintain context continuity.

D.Use a model with a larger context window.

AnswerC

Overlapping ensures that information at chunk boundaries is not lost.

Why this answer

Option C is correct because overlapping chunks ensure that tokens at the boundaries of one chunk are also present at the start of the next, preserving context continuity. This prevents the model from losing information that spans across chunk boundaries, which is a common issue when processing documents longer than the model's 4096-token context window.

Exam trap

Oracle often tests the misconception that increasing output length (max_tokens) can compensate for input context limitations, but the trap here is that max_tokens only affects the response length, not the model's ability to see the full document.

How to eliminate wrong answers

Option A is wrong because randomly selecting parts of the document discards structured information and introduces unpredictability, making it impossible to reliably capture cross-chunk dependencies. Option B is wrong because increasing the max_tokens parameter only controls the length of the generated output, not the input context size; the model still cannot process the full 8000-token document at once. Option D is wrong because while using a model with a larger context window would solve the problem, it is not a chunking strategy and may not be feasible due to cost, latency, or model availability; the question specifically asks for a technique that addresses the issue within the current chunking approach.

Practice this question →

19

Multi-Selecthard

Which TWO factors most significantly influence the computational cost of fine-tuning a large language model?

Select 2 answers

A.Batch size

B.Number of model parameters

C.Maximum sequence length

D.Quantization bits

E.Dataset size

AnswersB, C

More parameters increase compute and memory requirements.

Why this answer

Model size (parameters) directly determines FLOPs and memory. Training sequence length affects memory and compute per step. Option C is wrong because batch size affects throughput but not fundamental cost per token.

Option D is wrong because quantization usually reduces cost. Option E is wrong because dataset size affects total steps but per-step cost is dominated by model size and sequence length.

Practice this question →

20

MCQmedium

A multi-turn chatbot needs to maintain context across user queries. The context window is limited. What design should be used?

A.Use a summary of previous turns and add new input.

B.Store context in a separate database and retrieve each time.

C.Reset context after each turn.

D.Keep the entire conversation history in each request.

AnswerA

Correct: Summarization preserves context within limits.

Why this answer

Option A is correct because summarizing previous turns and appending the new input efficiently manages the limited context window of large language models (LLMs). This approach preserves essential conversational context without exceeding token limits, ensuring coherent multi-turn interactions.

Exam trap

Oracle often tests the misconception that storing context externally (Option B) bypasses the context window limit, but the retrieved data must still be injected into the model's input, which is constrained by the same token budget.

How to eliminate wrong answers

Option B is wrong because storing context in a separate database and retrieving it each time introduces latency and does not inherently solve the context window limitation; the retrieved context still needs to fit into the model's input. Option C is wrong because resetting context after each turn breaks conversational continuity, making the chatbot unable to reference prior exchanges. Option D is wrong because keeping the entire conversation history in each request quickly exceeds the context window's token limit, causing truncation or errors.

Practice this question →

21

MCQeasy

A company wants to use OCI Generative AI to summarize customer reviews. Which model parameter should be adjusted to control the creativity of the summary?

A.Temperature

B.Frequency penalty

C.Top-k

D.Presence penalty

AnswerA

Temperature directly controls randomness and creativity.

Why this answer

Temperature controls the randomness of token selection in the model's output distribution. A higher temperature (e.g., 0.9) makes the summary more creative and diverse, while a lower temperature (e.g., 0.1) makes it more deterministic and focused. For summarizing customer reviews, adjusting temperature directly influences how novel or conservative the generated text will be.

Exam trap

Oracle often tests the distinction between parameters that control randomness (temperature) versus those that control repetition (frequency/presence penalties) or sampling pool size (top-k), leading candidates to confuse diversity with creativity.

How to eliminate wrong answers

Option B (Frequency penalty) is wrong because it reduces the likelihood of repeating the same tokens or phrases, which controls redundancy rather than creativity. Option C (Top-k) is wrong because it limits the sampling pool to the k most likely next tokens, which affects diversity but not the overall creativity or randomness of the output. Option D (Presence penalty) is wrong because it penalizes tokens that have already appeared in the text, encouraging the model to introduce new topics, but this does not directly control the creativity of the summary.

Practice this question →

22

MCQeasy

An OCI administrator wants to limit which users can invoke a specific LLM endpoint. Which resource type should be used?

A.OCI Audit

B.OCI Vault

C.Network security groups

D.IAM policies

AnswerD

IAM policies define who can perform actions on resources.

Why this answer

Option A is correct because IAM policies control access to OCI resources, including Generative AI endpoints. Option B (Network security groups) control network traffic, not user access. Option C (Vault) manages secrets.

Option D (Audit) logs events but does not enforce access.

Practice this question →

23

Multi-Selecthard

Which three statements about transformer architecture are correct? (Choose three.)

Select 3 answers

A.The softmax function is used in the attention mechanism to normalize attention scores.

B.The feed-forward network applies a different set of weights for each token position.

C.Positional encodings are necessary because the model is not recurrent.

D.The self-attention layer allows the model to weigh the importance of different tokens.

E.The encoder-decoder structure is used in GPT models.

AnswersA, C, D

Softmax converts attention scores into probabilities.

Why this answer

Option A is correct because the softmax function is applied to the raw attention scores (the dot products between queries and keys) to convert them into a probability distribution that sums to 1. This normalization allows the model to assign a relative weight to each token in the sequence, ensuring that the weighted sum of values is stable and interpretable.

Exam trap

Oracle often tests the distinction between encoder-decoder and decoder-only architectures, trapping candidates who assume all transformer-based models follow the original encoder-decoder design, when in fact GPT and other autoregressive models use only the decoder stack.

Practice this question →

24

MCQeasy

Which OCI service provides pre-trained models for custom text classification without requiring fine-tuning?

A.OCI Generative AI

B.OCI AI Language

C.OCI Data Science

D.OCI Vision

AnswerB

OCI AI Language provides pre-trained models for text classification without fine-tuning.

Why this answer

B is correct because OCI AI Language provides pre-trained models that can perform custom text classification out-of-the-box without requiring fine-tuning. It offers built-in models for common NLP tasks like sentiment analysis, entity extraction, and text classification, allowing users to classify text into custom categories defined by their own labels without additional training.

Exam trap

Oracle often tests the distinction between pre-trained models that require no fine-tuning versus platforms that require custom model training, leading candidates to mistakenly choose OCI Data Science or OCI Generative AI when the question specifically asks for a service that provides pre-trained models for custom text classification without fine-tuning.

How to eliminate wrong answers

Option A is wrong because OCI Generative AI focuses on generating text, images, and code using large language models, not on pre-trained models for custom text classification without fine-tuning. Option C is wrong because OCI Data Science is a platform for building, training, and deploying custom machine learning models, requiring users to fine-tune or train models from scratch rather than providing pre-trained classification models. Option D is wrong because OCI Vision is designed for image analysis tasks such as object detection and image classification, not for text classification.

Practice this question →

25

MCQeasy

A team wants to evaluate an LLM's performance on a text classification task. Which metric is most appropriate for a balanced dataset?

A.BLEU score

B.Perplexity

C.Accuracy

D.ROUGE score

AnswerC

Accuracy directly measures correct predictions, appropriate for balanced data.

Why this answer

Accuracy is the most appropriate metric for evaluating an LLM on a text classification task with a balanced dataset because it directly measures the proportion of correctly predicted labels out of total predictions. For balanced classes, accuracy provides a reliable and intuitive performance indicator without the distortion caused by class imbalance.

Exam trap

Oracle often tests the distinction between metrics for generation tasks (BLEU, ROUGE, perplexity) versus classification tasks (accuracy, F1-score), and the trap here is assuming a language model metric like perplexity applies to any NLP task, when it is specific to probabilistic language modeling.

How to eliminate wrong answers

Option A is wrong because BLEU score is designed for evaluating machine translation quality by comparing n-gram overlap between generated and reference text, not for classification tasks. Option B is wrong because perplexity measures how well a language model predicts a sequence of tokens, typically used for language modeling or generation, not for discrete label classification. Option D is wrong because ROUGE score is used for summarization evaluation by measuring recall-oriented overlap of n-grams, not for classification accuracy.

Practice this question →

26

MCQeasy

A developer notices that an LLM's responses are too verbose. Which parameter adjustment would most effectively reduce verbosity?

A.Increase frequency_penalty

B.Increase top_p

C.Decrease max_tokens

D.Decrease temperature

AnswerC

Max_tokens directly controls the maximum output length, reducing verbosity.

Why this answer

Decreasing max_tokens directly limits the maximum length of the LLM's response, which is the most straightforward way to reduce verbosity. This parameter caps the number of tokens the model can generate, forcing it to produce shorter completions. Other parameters like frequency_penalty, top_p, and temperature influence the style, diversity, or randomness of the output but do not directly control response length.

Exam trap

The trap here is that candidates confuse parameters that affect output style (temperature, top_p, frequency_penalty) with the one that directly controls output length (max_tokens), leading them to choose a parameter that changes how the model says something rather than how much it says.

How to eliminate wrong answers

Option A is wrong because increasing frequency_penalty reduces repetition by penalizing tokens that have already appeared, which can actually make responses more varied and potentially longer as the model avoids reusing words. Option B is wrong because increasing top_p (nucleus sampling) considers a larger set of probable tokens, which can increase diversity and often leads to longer, more exploratory responses. Option D is wrong because decreasing temperature makes the model more deterministic and focused on high-probability tokens, but it does not cap the length of the response; the model can still generate verbose text if it deems it likely.

Practice this question →

27

MCQmedium

An enterprise is deploying a chat application using a large language model. Users report that the model sometimes generates toxic or biased responses. Which best practice should be applied to mitigate this issue?

A.Use few-shot prompting with examples of toxic responses so the model learns to avoid them.

B.Increase the max_tokens parameter to allow the model more context to correct itself.

C.Disable the temperature parameter to make outputs deterministic.

D.Implement a content filtering layer using a safety classifier to detect and block toxic outputs.

AnswerD

Safety classifiers directly filter toxic content.

Why this answer

Option D is correct because implementing a content filtering layer using a safety classifier is a proven best practice to detect and block toxic or biased outputs in real-time. This approach acts as a guardrail, intercepting harmful responses before they reach users, and is independent of the model's internal parameters or training data.

Exam trap

Oracle often tests the misconception that adjusting model parameters (like temperature or max_tokens) can fix safety issues, when in reality, safety requires external guardrails like content filters.

How to eliminate wrong answers

Option A is wrong because few-shot prompting with examples of toxic responses would not teach the model to avoid them; instead, it could inadvertently reinforce undesirable patterns, as the model may learn to mimic the toxic examples rather than suppress them. Option B is wrong because increasing the max_tokens parameter does not help the model correct its own toxicity; it simply allows longer outputs, which could include more harmful content. Option C is wrong because disabling the temperature parameter (setting it to 0) makes outputs deterministic but does not address the underlying issue of toxic or biased generation; the model can still produce harmful responses consistently.

Practice this question →

28

Multi-Selectmedium

An organization is implementing a RAG system using OCI GenAI. Which two are best practices for optimizing retrieval and generation? (Choose two.)

Select 2 answers

A.Use the same embedding model for both retrieval and generation

B.Store all documents in a single large index

C.Use semantic search (embeddings) for document retrieval

D.Implement caching for frequently asked questions

E.Disable summarization to save inference costs

AnswersC, D

Semantic search captures meaning beyond keywords, improving relevance.

Why this answer

Option C is correct because semantic search using embeddings retrieves documents based on meaning rather than keyword matching, which significantly improves the relevance of context provided to the LLM in a RAG system. This aligns with best practices for OCI GenAI, where embedding models convert text into vector representations for similarity search in a vector database.

Exam trap

Oracle often tests the misconception that retrieval and generation should share the same model, but in practice they are optimized separately, and candidates may confuse 'embedding model' with 'generation model' in a RAG context.

Practice this question →

29

MCQmedium

A developer is reviewing the model card for an LLM on OCI Generative AI and notices it was trained on a dataset that is predominantly English. The application will serve users in multiple languages. What is the most likely limitation of using this model without additional steps?

A.The embedding vectors will be less accurate for any language.

B.The model may produce lower quality responses in non-English languages.

C.The model will hallucinate facts more frequently.

D.The context window size will be effectively reduced.

AnswerB

Training data imbalance leads to weaker performance on underrepresented languages.

Why this answer

Option B is correct because a model trained mainly on English may perform poorly on non-English inputs due to biased language representations. Option A (always hallucinating) is not specific to language. Option C (token limit reduced) is unrelated.

Option D (embedding quality drop) is a possibility but the primary limitation is language coverage.

Practice this question →

30

Multi-Selectmedium

A data scientist is evaluating different models for a summarization task. Which two metrics are commonly used to evaluate the quality of generated summaries?

Select 2 answers

A.F1 score

B.Mean Average Precision

C.ROUGE

D.Perplexity

E.BLEU

AnswersC, E

ROUGE measures overlap of n-grams between generated and reference summaries, commonly used for summarization.

Why this answer

ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a standard metric for summarization that measures the overlap of n-grams, word sequences, or word pairs between the generated summary and reference summaries. It focuses on recall, making it well-suited for evaluating how well the generated summary captures the key content from the reference.

Exam trap

Oracle often tests the distinction between metrics used for summarization (ROUGE) versus translation (BLEU) versus language modeling (Perplexity), and candidates may confuse BLEU as a summarization metric because it also evaluates text generation, but it is primarily designed for translation tasks.

Practice this question →

31

MCQhard

Refer to the exhibit. The output is very short and cuts off mid-sentence. Which parameter is most likely the cause?

A.Max-tokens is too low

B.Temperature is too high

C.Model ID incorrect

D.Top-p is too high

AnswerA

If the output exceeds max-tokens, it gets truncated, causing the cut-off.

Why this answer

The 'max-tokens' parameter limits the number of tokens in the generated response. Setting it to 500, while typically sufficient, might still cause truncation if the model's context window is nearly full or if the prompt is long. However, among options, 'max-tokens' is the direct control for output length.

Option C is correct.

Practice this question →

32

MCQmedium

A company uses OCI Generative AI service to power a chatbot. After deployment, the chatbot starts generating inappropriate responses. Which action should be taken first?

A.Increase the temperature parameter.

B.Fine-tune the model on customer-specific data.

C.Switch to a larger model.

D.Adjust the prompt template to include safety instructions.

AnswerD

Adding safety instructions in the prompt is a quick and effective safeguard.

Why this answer

Option D is correct because adjusting the prompt template to include safety instructions is the fastest and most direct way to mitigate inappropriate responses without retraining or changing model parameters. In OCI Generative AI, prompt engineering—including explicit safety guidelines—can immediately constrain the model's output behavior by providing clear guardrails in the context window.

Exam trap

Oracle often tests the misconception that safety issues require model retraining or parameter tuning, when in fact prompt engineering is the first-line, low-cost intervention recommended in OCI documentation.

How to eliminate wrong answers

Option A is wrong because increasing the temperature parameter would make the model's output more random and creative, likely worsening inappropriate responses rather than fixing them. Option B is wrong because fine-tuning on customer-specific data requires significant time, cost, and labeled data, and does not directly address safety issues—it is an over-engineered solution for a problem that can be solved with prompt adjustments. Option C is wrong because switching to a larger model does not inherently improve safety; larger models may even generate more complex or unexpected inappropriate content without proper guardrails.

Practice this question →

33

MCQhard

A team uses OCI Generative AI's summarization feature to condense legal documents. The summaries sometimes omit critical clauses. Which parameter adjustment is most likely to improve completeness?

A.Adjust frequencyPenalty.

B.Increase temperature.

C.Decrease topP.

D.Increase maxTokens.

AnswerD

A larger token limit enables longer summaries, helping to include critical clauses.

Why this answer

Increasing maxTokens (option D) is the most direct way to improve completeness because it extends the maximum length of the generated summary, allowing the model to include more content from the source legal document. Critical clauses are often omitted when the token limit truncates the output before the model can cover all essential sections. This parameter controls the output length, not the style or randomness of the generation.

Exam trap

Oracle often tests the misconception that randomness parameters (temperature, topP) or repetition penalties control output length, when in fact only maxTokens directly determines how much text the model can produce.

How to eliminate wrong answers

Option A is wrong because frequencyPenalty reduces repetition by penalizing tokens that have already appeared, which does not address the omission of critical clauses—it only discourages the model from repeating itself. Option B is wrong because increasing temperature adds randomness to token selection, which can make the summary less coherent and more likely to skip important details, not improve completeness. Option C is wrong because decreasing topP narrows the set of candidate tokens to only the most probable ones, which can make the output more conservative and even more likely to omit less common but critical clauses.

Practice this question →

34

MCQhard

A developer in the GenAIDevelopers group tries to call the OCI Generative AI inference API but receives an unauthorized error. Which statement best explains the issue?

A.The developer lacks the required permission to 'use generative-ai-inference' to invoke the model.

B.The developer does not have permission to use the generative-ai-model-family.

C.The policy does not allow any operations in the compartment.

D.The compartment name in the policy is incorrect.

AnswerA

They only have 'create', but inference invocation requires 'use'.

Why this answer

Option C is correct. The policy only allows creating inference requests but not to 'inspect' or 'use' inference. The developer needs the 'use' permission on the inference resource to actually call it.

Option A is wrong because model-family is allowed. Option B is wrong because the policy allows operations. Option D is wrong because the compartment is correct.

Practice this question →

35

Multi-Selecteasy

Which two are essential components of the Transformer architecture? (Select TWO)

Select 2 answers

A.Pooling layers

B.Recurrent connections

C.Self-attention mechanism

D.Feed-forward neural network

E.Convolutional layers

AnswersC, D

Correct: Core component of Transformers.

Why this answer

The self-attention mechanism is essential because it allows each token in the input sequence to attend to every other token, capturing long-range dependencies without the sequential bottleneck of RNNs. This mechanism computes attention scores using queries, keys, and values, enabling parallel processing and forming the core of the Transformer's ability to model context.

Exam trap

Oracle often tests the misconception that Transformers still use recurrence or convolution for sequence processing, when in fact they rely solely on self-attention and feed-forward networks.

Practice this question →

36

MCQmedium

Refer to the exhibit. What is the solution?

A.Use a different base model that supports fine-tuning.

B.Change the learning rate.

C.Increase the training epochs.

D.Use a different compartment.

AnswerA

The error indicates the base model does not support fine-tuning; switch to a supported model.

Why this answer

The exhibit indicates that the base model does not support fine-tuning, which is a prerequisite for adapting a large language model to a specific task or domain. Using a different base model that supports fine-tuning allows the model to be customized through supervised learning on task-specific data, enabling it to learn new patterns and improve performance. This is the correct solution because without fine-tuning capability, the model cannot be effectively adapted regardless of other hyperparameter adjustments.

Exam trap

Oracle often tests the distinction between hyperparameter tuning (learning rate, epochs) and fundamental model capability (fine-tuning support), leading candidates to mistakenly choose a hyperparameter adjustment when the core issue is that the model cannot be fine-tuned at all.

How to eliminate wrong answers

Option B is wrong because changing the learning rate only affects the optimization process during training, but if the base model does not support fine-tuning, no amount of learning rate adjustment will enable the model to be trained on new data. Option C is wrong because increasing the training epochs will not help if the model cannot be fine-tuned at all; epochs only matter when the model is actually being trained or fine-tuned. Option D is wrong because using a different compartment (a tenancy or organizational boundary in Oracle Cloud Infrastructure) does not change the underlying model's architecture or its ability to be fine-tuned; it only affects resource isolation and access control.

Practice this question →

37

MCQhard

A research team is using OCI Data Science and OCI GenAI to build a multilingual chatbot for customer service. They have training data in English, Spanish, and French. The model currently struggles with code-switching—users often mix languages in a single query (e.g., 'Quiero cancel my order'), and the model responds inconsistently, sometimes in English, sometimes mixing incorrectly. The team wants to improve performance on code-switching while maintaining fluency in each language. They have limited compute resources and cannot deploy separate models per language. Which approach should they take?

A.Train separate fine-tuned models for each language and route queries based on detected language.

B.Fine-tune a multilingual model on a combined dataset that includes code-switching examples.

C.Use language detection to route the query to a specific language model, then translate the response.

D.Use a multilingual embedding model for retrieval to improve context understanding.

AnswerB

This directly trains the model to handle mixed-language inputs and outputs.

Why this answer

Option C is correct because fine-tuning a multilingual model (e.g., Cohere Command with multilingual support) on a combined dataset that includes code-switching examples directly teaches the model to handle mixed-language inputs. Option A is wrong because multilingual embeddings improve retrieval but do not address generation fluency for code-switching. Option B is wrong because training separate models per language would prevent any code-switching capability.

Option D is wrong because language detection and routing is complex, may not handle mixed queries, and could lose cross-lingual context.

Practice this question →

38

MCQhard

During multi-turn conversation with an OCI GenAI model, the model repeats user messages from earlier turns. What is the most likely cause?

A.Low top-p

B.High temperature

C.Low presence penalty

D.High frequency penalty

AnswerC

Low presence penalty means the model is less penalized for repeating topics, leading to repetition.

Why this answer

A low presence penalty reduces the model's incentive to avoid repeating previously mentioned content. In multi-turn conversations, this can cause the model to echo user messages from earlier turns because the penalty is too weak to discourage repetition of tokens that have already appeared in the context window.

Exam trap

Oracle often tests the distinction between presence penalty (which penalizes any occurrence) and frequency penalty (which penalizes based on count), leading candidates to mistakenly think a high frequency penalty causes repetition when it actually prevents it.

How to eliminate wrong answers

Option A is wrong because low top-p limits the cumulative probability mass for token sampling, which reduces diversity but does not directly cause repetition of earlier user messages; it may instead make outputs more deterministic. Option B is wrong because high temperature increases randomness in token selection, which can lead to more creative or even nonsensical outputs, not specifically the repetition of prior user messages. Option D is wrong because a high frequency penalty actively discourages the model from using tokens that have already appeared, which would reduce repetition, not cause it.

Practice this question →

39

MCQhard

A company uses an LLM to generate product descriptions. The outputs are consistently too verbose and include irrelevant details. The prompt includes a simple instruction: 'Describe the product.' Which adjustment to the prompt is most likely to yield concise, relevant descriptions?

A.Set temperature to 0.

B.Increase max_tokens to 500.

C.Add constraints like 'Max 30 words. Focus on key features.'

D.Include a few examples of desired short descriptions.

AnswerC

Explicit constraints directly limit length and scope.

Why this answer

Option C is correct because adding explicit constraints like 'Max 30 words. Focus on key features.' directly instructs the LLM to limit verbosity and prioritize relevant details. This technique, known as prompt engineering with constraints, is the most effective way to control output length and content without altering model parameters or relying on examples that may not generalize.

Exam trap

Oracle often tests the misconception that adjusting model parameters (temperature or max_tokens) is the primary way to control output quality, when in fact prompt engineering with explicit constraints is a more direct and reliable method for achieving specific formatting or length requirements.

How to eliminate wrong answers

Option A is wrong because setting temperature to 0 makes the model deterministic (greedy decoding), which reduces randomness but does not inherently shorten or focus the output—it may still produce verbose descriptions. Option B is wrong because increasing max_tokens to 500 actually allows the model to generate longer responses, which is counterproductive to achieving concise descriptions. Option D is wrong because including a few examples (few-shot prompting) can guide the style but does not guarantee brevity; the model may still extrapolate irrelevant details or exceed the desired length without explicit constraints.

Practice this question →

40

MCQmedium

An e-commerce company fine-tuned a Cohere Command model on their product catalog to generate product descriptions. During inference, they notice the model outputs are too repetitive: it often repeats similar phrases across different products, and the descriptions lack diversity. The team wants to increase the variety of the generated text without sacrificing relevance. They are currently using temperature=0.8, top_p=0.9, frequency_penalty=0, and presence_penalty=0. Which parameter adjustment should they make to most effectively increase diversity?

A.Decrease temperature from 0.8 to 0.5.

B.Set frequency_penalty to a negative value (e.g., -0.5).

C.Increase max_tokens from 200 to 500.

D.Increase top_p from 0.9 to 0.95.

AnswerD

Higher top_p includes more tokens in the sampling pool, increasing diversity.

Why this answer

Increasing top_p from 0.9 to 0.95 expands the nucleus of tokens considered during sampling, allowing the model to select from a wider set of plausible next tokens. This directly increases output diversity while still maintaining relevance, as tokens outside the top 90% probability mass are now included. The current settings already have moderate temperature and no penalties, so broadening top_p is the most effective single adjustment to reduce repetitiveness.

Exam trap

Oracle often tests the misconception that increasing temperature always increases diversity, when in fact decreasing temperature reduces randomness, and the most effective lever for diversity in a fine-tuned model is often adjusting top-p or adding a positive frequency penalty.

How to eliminate wrong answers

Option A is wrong because decreasing temperature from 0.8 to 0.5 makes the model more deterministic, reducing randomness and likely increasing repetitiveness, which is the opposite of the desired outcome. Option B is wrong because setting frequency_penalty to a negative value (e.g., -0.5) encourages the model to repeat tokens, exacerbating the repetitiveness problem rather than solving it. Option C is wrong because increasing max_tokens from 200 to 500 only extends the length of generated text; it does not alter the sampling strategy, so the model will continue to repeat phrases within the longer output.

Practice this question →

41

MCQmedium

A financial institution uses OCI GenAI to power a customer support chatbot. The compliance team requires that responses are strictly consistent with regulatory guidelines and approved responses. The company has a curated set of question-answer pairs that cover common scenarios. They want to ensure that the chatbot never deviates from these approved answers. The data science team is considering various approaches to enforce this consistency. Which approach is most effective?

A.Few-shot prompting with three example responses in every query.

B.Fine-tuning the model on the curated dataset of question-answer pairs.

C.Using a large context window to include all regulatory guidelines in the prompt.

D.Setting a low temperature (0.1) to make outputs deterministic.

AnswerB

Fine-tuning adapts the model to mimic the approved responses, providing strong consistency.

Why this answer

Option B is correct because fine-tuning the model on the curated dataset of approved responses teaches the model to output similar responses for related questions, ensuring consistency. Option A is wrong because few-shot prompting may fail for unseen variations and does not guarantee strict adherence. Option C is wrong because using a large context window does not enforce specific content.

Option D is wrong because setting a low temperature reduces randomness but does not guarantee the model will choose approved responses.

Practice this question →

42

MCQhard

An enterprise wants to deploy a large language model for processing sensitive internal documents. They must ensure that data does not leave their OCI tenancy. Which OCI GenAI deployment option meets this requirement?

A.Using a third-party model via OCI Marketplace

B.Accessing models through OCI Console only

C.Dedicated AI Cluster with on-demand model hosting

D.Using the OCI GenAI API with the default endpoint

AnswerC

A dedicated cluster runs in your own tenancy, providing complete data isolation.

Why this answer

Option B is correct because a Dedicated AI Cluster provides isolated compute resources within the customer's tenancy, ensuring data stays within tenancy boundaries. Option A is wrong because the default API endpoint may use shared infrastructure. Option C is wrong because third-party models via Marketplace may not guarantee data isolation.

Option D is wrong because the console is just a management interface, not a deployment option.

Practice this question →

43

MCQeasy

A company wants to build a customer support chatbot using OCI Generative AI. They have a large number of historical support tickets. Which approach is most effective for leveraging this data to improve the chatbot's responses?

A.Use a pre-loaded prompt template from the OCI console.

B.Fine-tune the Cohere Command model on the historical tickets using OCI Data Science.

C.Increase the temperature parameter to 1.0 to encourage diverse responses.

D.Use zero-shot prompting with the base model and include few-shot examples in the prompt.

AnswerB

Fine-tuning on the company's own support tickets adapts the model to the specific language, context, and resolutions, significantly improving response quality.

Why this answer

Fine-tuning the Cohere Command model on the historical support tickets using OCI Data Science is the most effective approach because it adapts the model's weights to the specific domain language, terminology, and resolution patterns found in the company's data. This supervised learning process creates a specialized model that can generate accurate, context-aware responses for customer support queries, unlike generic prompting methods that lack deep domain adaptation.

Exam trap

Oracle often tests the misconception that increasing temperature or using few-shot examples can substitute for fine-tuning when adapting a model to proprietary domain data, but in reality only fine-tuning modifies model weights to deeply learn domain-specific patterns from large datasets.

How to eliminate wrong answers

Option A is wrong because pre-loaded prompt templates in the OCI console are generic and not trained on the company's specific historical ticket data, so they cannot capture domain-specific nuances or improve response accuracy beyond basic instruction following. Option C is wrong because increasing the temperature parameter to 1.0 maximizes randomness in token selection, which reduces coherence and factual reliability—exactly the opposite of what is needed for a customer support chatbot that requires consistent, accurate answers. Option D is wrong because zero-shot prompting with few-shot examples only provides a few static examples in the context window, which does not modify the model's underlying weights and cannot match the depth of learning achieved by fine-tuning on thousands of historical tickets.

Practice this question →

44

MCQhard

A data scientist observes that their fine-tuned LLM performs well on training data but generates repetitive and dull responses in production. What is the most likely cause and best solution?

A.The model is overfitted; apply stronger regularization

B.The temperature is set too low; increase temperature during inference

C.The training data lacks diversity; add more varied examples

D.The model has too many layers; reduce model size

AnswerB

Low temperature makes outputs deterministic and repetitive; increasing it adds variability.

Why this answer

The model's repetitive and dull responses indicate that the temperature parameter is too low, causing the model to always select the most probable tokens, leading to deterministic and monotonous outputs. Increasing temperature during inference introduces randomness into token sampling, allowing for more diverse and creative responses. This is a common issue in production LLMs where low temperature settings optimized for training metrics fail to produce engaging real-world outputs.

Exam trap

Oracle often tests the misconception that poor production performance is always due to overfitting or data issues, when in fact inference-time hyperparameters like temperature are the direct cause of repetitive/dull outputs.

How to eliminate wrong answers

Option A is wrong because overfitting would cause poor generalization to new inputs, not specifically repetitive/dull outputs; regularization reduces overfitting but does not address the deterministic token selection caused by low temperature. Option C is wrong because while training data diversity affects model knowledge, the described symptom of repetitive outputs in production despite good training performance points to inference-time sampling issues, not data diversity. Option D is wrong because having too many layers might cause overfitting or computational inefficiency, but it does not directly cause repetitive or dull responses; reducing model size would not fix the temperature-related sampling behavior.

Practice this question →

45

MCQeasy

A company wants to build a retrieval-augmented generation (RAG) system using OCI Generative AI and a vector database. Which model type should they use to convert documents into vector embeddings?

A.Instruct model (e.g., cohere.command)

B.Image generation model

C.Embedding model (e.g., cohere.embed)

D.Base model (e.g., cohere.base)

AnswerC

Embedding models produce vector embeddings for similarity search.

Why this answer

Option C is correct because embedding models are specifically designed to generate vector representations of text for retrieval. Option A (instruct models) are for generation. Option B (base models) are for general text generation.

Option D (image models) are for images.

Practice this question →

46

MCQhard

A team is fine-tuning a large language model for a domain-specific Q&A application. After fine-tuning, they observe that the model performs well on the training distribution but struggles with out-of-distribution (OOD) questions. Which approach would best improve OOD robustness?

A.Include a diverse set of examples from related domains in the fine-tuning dataset.

B.Use early stopping based on training loss to avoid overfitting.

C.Reduce the model size to prevent overfitting to the training data.

D.Increase the learning rate during fine-tuning to adapt faster to new patterns.

AnswerA

Diverse data improves generalization and OOD performance.

Why this answer

Option C is correct because incorporating diverse data during fine-tuning helps the model generalize to OOD inputs. Option A is wrong because increasing learning rate may cause catastrophic forgetting. Option B is wrong because reducing model size reduces capacity.

Option D is wrong because early stopping on training loss may not help OOD.

Practice this question →

47

MCQeasy

A developer is testing the OCI Generative AI API by sending a request to generate text using the Cohere Command R model. The request returns the following error: 'The model 'cohere.command-r-08-2024' is not available in this region. Please check the model availability in your region.' The developer is using the us-ashburn-1 region. What is the most likely cause of this error?

A.The request body format is incorrect.

B.The model is not deployed in the us-ashburn-1 region.

C.The model name is misspelled (e.g., 'cohere.command-r-08-2024' vs 'cohere.command-r-08-2024').

D.The API key used in the request is invalid.

AnswerB

Cohere Command R may not be available in all regions; check supported regions in OCI documentation.

Why this answer

The error message explicitly states that the model 'cohere.command-r-08-2024' is not available in the region. OCI Generative AI models are deployed regionally, and the Cohere Command R model is not available in the us-ashburn-1 (Ashburn) region. The developer must select a supported region, such as us-chicago-1, where this model is deployed.

Exam trap

Oracle often tests the misconception that model names must be perfectly spelled or that API keys are the cause of all errors, but here the trap is that candidates overlook regional availability and assume the error is due to a typo or authentication failure.

How to eliminate wrong answers

Option A is wrong because an incorrect request body format would typically result in a 400 Bad Request or validation error, not a model availability error. Option C is wrong because the model name in the error matches the one sent, so a misspelling would cause a different error (e.g., 'model not found'), not a region availability error. Option D is wrong because an invalid API key would result in a 401 Unauthorized or 403 Forbidden error, not a model availability error.

Practice this question →

48

MCQmedium

Based on the exhibit, which model is best suited for a conversational chatbot that needs to handle multi-turn dialogues?

A.cohere.embed

B.A model with embeddings capability

C.cohere.base

D.cohere.command

AnswerD

Has 'chat' capability, ideal for multi-turn dialogue.

Why this answer

Option A is correct because cohere.command has the 'chat' capability explicitly listed. Options B and C only have text-generation or embeddings. Option D (embed) is not for generation.

Practice this question →

49

MCQmedium

A company is deploying a large language model for a customer service chatbot. The model needs to understand industry-specific jargon and maintain low latency. Which approach best balances these requirements?

A.Employ retrieval-augmented generation (RAG) with a general model

B.Rely solely on prompt engineering with a general model

C.Use a large general-purpose LLM with zero-shot prompting

D.Fine-tune a small open-source LLM on domain-specific data

AnswerD

Fine-tuning adapts the model to jargon and a smaller model keeps latency low.

Why this answer

Fine-tuning a small open-source LLM on domain-specific data is the best approach because it adapts the model to understand industry-specific jargon while keeping the model small enough to maintain low latency. Unlike larger models, a fine-tuned small model can run efficiently on local hardware, reducing inference time and avoiding the overhead of external API calls or large model sizes.

Exam trap

Oracle often tests the misconception that larger models always perform better or that RAG alone solves domain adaptation, ignoring the latency and efficiency trade-offs that make fine-tuning a smaller model the optimal choice for production systems with strict response time requirements.

How to eliminate wrong answers

Option A is wrong because retrieval-augmented generation (RAG) with a general model still relies on a general model that may not inherently understand industry-specific jargon, and the retrieval step adds latency, which conflicts with the low-latency requirement. Option B is wrong because relying solely on prompt engineering with a general model does not embed domain-specific knowledge into the model weights, so the model may still misinterpret or fail to generate accurate responses for niche jargon, and it often requires longer prompts that increase latency. Option C is wrong because a large general-purpose LLM with zero-shot prompting has high inference latency due to its size and lacks domain-specific training, making it unsuitable for both understanding jargon and meeting low-latency constraints.

Practice this question →

50

MCQeasy

A developer is using a large language model to generate code snippets. The model often produces code that is syntactically correct but functionally incorrect. What is the most effective way to improve the functional correctness of the generated code?

A.Provide few-shot examples of correct code in the prompt.

B.Increase the temperature parameter to generate more creative solutions.

C.Ask the model to only output syntactically valid code.

D.Set max_tokens to a very high value to allow the model more room to think.

AnswerA

Few-shot examples help the model understand the expected output.

Why this answer

Option A is correct because providing few-shot examples of correct code in the prompt directly demonstrates the desired functional behavior to the model. This technique, known as few-shot prompting, grounds the model's output in concrete examples, significantly improving the likelihood that the generated code will be functionally correct by aligning the model's pattern completion with the intended logic, not just syntax.

Exam trap

Oracle often tests the misconception that increasing model parameters like temperature or max_tokens can improve output quality, when in fact these parameters control randomness and length, not functional correctness, which is best addressed through prompt engineering techniques like few-shot learning.

How to eliminate wrong answers

Option B is wrong because increasing the temperature parameter makes the model's output more random and creative, which typically reduces functional correctness by increasing the chance of generating plausible but incorrect logic. Option C is wrong because asking the model to only output syntactically valid code does not address functional correctness; the model already generates syntactically valid code by default, and this instruction does not guide it toward correct logic. Option D is wrong because setting max_tokens to a very high value does not improve reasoning quality; it only allows longer outputs, which can actually increase the risk of generating more irrelevant or incorrect code without improving functional correctness.

Practice this question →

51

Multi-Selecteasy

Which TWO statements about large language model (LLM) capabilities are correct?

Select 2 answers

A.LLMs have a fixed context window that cannot be extended.

B.LLMs can perform zero-shot learning without any task-specific training.

C.LLMs understand and reason about code as well as natural language.

D.LLMs always produce factually accurate outputs.

E.LLMs require fine-tuning for every new task.

AnswersB, C

Zero-shot learning is a key capability of LLMs.

Why this answer

Option A is correct because LLMs can perform zero-shot learning without task-specific training, generalizing to unseen tasks. Option C is correct because LLMs like Codex are trained on code and understand programming languages. Option B is incorrect because context windows can be extended via techniques like sliding window or ALiBi.

Option D is incorrect due to hallucination risks. Option E is incorrect because few-shot prompting often suffices without fine-tuning.

Practice this question →

52

MCQmedium

A customer support company uses Cohere Command on OCI to answer user queries. They have enabled grounding with a knowledge base of product manuals. However, for about 20% of queries, the model provides incorrect product recommendations that are not in the manuals. The team has verified the knowledge base is up to date. What is the most likely cause and solution?

A.The model's temperature is too high, causing creative responses. Lower temperature to 0.

B.The model is hallucinating; switch to a larger model.

C.The query phrasing may not match the knowledge base; improve the retrieval system or use query rewriting.

D.The grounding settings are too restrictive; increase the number of retrieved documents.

AnswerC

Correct: Query mismatch causes retrieval of irrelevant content, leading to incorrect recommendations.

Why this answer

Option D is correct because query phrasing may not match the knowledge base, leading to retrieval of irrelevant documents. Improving retrieval or using query rewriting bridges the gap. Option A might help if temperature were high, but the core issue is retrieval.

Option B could introduce noise, and Option C may not solve the grounding issue.

Practice this question →

53

MCQeasy

Refer to the exhibit. What is the primary reason the response is incomplete?

A.The temperature is not set.

B.The model-id is incorrect.

C.The max-tokens limit is too low.

D.The prompt is too short.

AnswerC

Setting max-tokens to 100 restricts the output length, causing truncation.

Why this answer

The response is incomplete because the max-tokens limit is too low, causing the model to truncate its output before completing the full answer. When the token budget is exhausted, the generation stops mid-sentence or mid-thought, leaving the response unfinished regardless of prompt length or other parameters.

Exam trap

Oracle often tests the distinction between parameters that affect output quality (temperature, top_p) versus those that constrain output length (max_tokens, stop sequences), and the trap here is that candidates mistake a short prompt or missing temperature for the cause of truncation when the real culprit is the token budget.

How to eliminate wrong answers

Option A is wrong because the temperature parameter controls randomness in token selection, not the length or completeness of the response; a missing temperature would default to 1.0 and still allow full output. Option B is wrong because the model-id identifies which LLM to use (e.g., gpt-3.5-turbo or cohere.command-text-v14) and does not affect whether the response is truncated; an incorrect model-id would either fail to load or produce different output, not an incomplete one. Option D is wrong because a short prompt can still yield a complete response; prompt length influences context and relevance, but the max-tokens parameter is the direct limiter of output length.

Practice this question →

54

MCQmedium

A company is deploying a large language model in a customer-facing chatbot. The model's responses must be both accurate and safe. Which combination of techniques should be employed?

A.Use only a system prompt instructing the model to be accurate and safe.

B.Use retrieval-augmented generation (RAG) for factual accuracy and a content safety filter for safe outputs.

C.Use a high temperature for creativity and a safety classifier for blocking toxic outputs.

D.Fine-tune the model on all historical chat logs and use a high temperature.

AnswerB

RAG improves accuracy; safety filter ensures safety.

Why this answer

Option B is correct because RAG grounds the model's responses in a verified external knowledge base, reducing hallucinations and improving factual accuracy, while a content safety filter (e.g., a classifier or guardrail) actively blocks toxic or unsafe outputs before they reach the user. This combination addresses both accuracy and safety independently, unlike a single system prompt which is easily bypassed.

Exam trap

Oracle often tests the misconception that a single technique (like a system prompt or fine-tuning) can simultaneously guarantee both accuracy and safety, when in practice they require separate, complementary mechanisms.

How to eliminate wrong answers

Option A is wrong because a system prompt alone is a static instruction that can be overridden by user input or model behavior, providing no enforcement mechanism for accuracy or safety. Option C is wrong because a high temperature increases randomness and creativity, which is counterproductive for accuracy and can amplify unsafe outputs; a safety classifier is a partial solution but does not address factual grounding. Option D is wrong because fine-tuning on all historical chat logs may introduce biases, errors, or unsafe patterns from the data, and a high temperature further degrades reliability.

Practice this question →

55

MCQmedium

Refer to the exhibit. A data scientist runs this inference request and receives a response that is incomplete and seems to stop mid-sentence. Which parameter should be adjusted to allow the model to generate longer outputs?

A.maxTokens

B.temperature

C.topP

D.frequencyPenalty

E.presencePenalty

AnswerA

maxTokens sets the maximum number of tokens to generate; increasing it yields longer outputs.

Why this answer

Option B is correct because maxTokens directly limits the number of tokens generated; increasing it allows the model to produce longer responses. Option A (temperature) affects randomness, not length. Option C (topP) affects token selection diversity.

Options D and E affect repetition penalties, not output length.

Practice this question →

56

MCQmedium

A team is fine-tuning an LLM on OCI Generative AI for a domain-specific task. They have a dataset of 10,000 labeled examples. What is a best practice to avoid catastrophic forgetting during fine-tuning?

A.Increase the learning rate to speed up adaptation.

B.Use only the new domain-specific data for fine-tuning.

C.Reduce the number of training epochs to the minimum.

D.Include a small percentage of general-domain data in the training mix.

AnswerD

General data acts as a regularizer to maintain base knowledge.

Why this answer

Option D is correct because catastrophic forgetting occurs when a fine-tuned model loses previously learned general knowledge. By including a small percentage (e.g., 5–10%) of general-domain data in the training mix, the model retains its broad capabilities while adapting to the new domain-specific task. This technique, often called 'replay' or 'experience replay,' is a standard practice in continual learning for LLMs.

Exam trap

Oracle often tests the misconception that fine-tuning should exclusively use the new dataset, whereas the best practice is to blend in general data to preserve prior knowledge.

How to eliminate wrong answers

Option A is wrong because increasing the learning rate can cause the model to overfit to the new domain data and accelerate forgetting, not prevent it. Option B is wrong because using only domain-specific data removes all exposure to general knowledge, which is the primary cause of catastrophic forgetting. Option C is wrong because reducing epochs to the minimum may prevent the model from learning the new task adequately, but it does not address the retention of general knowledge; the model can still forget if the new data dominates the gradient updates.

Practice this question →

57

Multi-Selecteasy

Which TWO are advantages of using LoRA for fine-tuning?

Select 2 answers

A.Requires less GPU memory

B.Guarantees higher accuracy

C.Reduces number of trainable parameters

D.Increases model size

E.Improves inference speed

AnswersA, C

Fewer trainable parameters means lower memory usage during training.

Why this answer

LoRA (Low-Rank Adaptation) reduces GPU memory requirements because it freezes the original model weights and injects trainable low-rank matrices into specific layers. This means only a tiny fraction of parameters need gradients and optimizer states, drastically lowering memory consumption during fine-tuning compared to full fine-tuning.

Exam trap

Oracle often tests the misconception that reducing trainable parameters automatically improves inference speed, but LoRA's memory and parameter savings apply only to training, not to inference latency.

Practice this question →

58

MCQmedium

An LLM-based application must comply with data privacy regulations by not memorizing personally identifiable information (PII). Which technique best reduces memorization of PII?

A.Use a larger model with more parameters

B.Decrease the temperature during inference

C.Train with differential privacy

D.Increase the number of training epochs

AnswerC

Differential privacy bounds the influence of any single data point, reducing memorization.

Why this answer

Differential privacy (DP) is the correct technique because it directly limits the model's ability to memorize training data, including PII, by adding calibrated noise to the gradient updates during training. This ensures that the model's parameters do not encode specific individual records, providing a formal mathematical guarantee against memorization. Other options like model size, temperature, or training epochs do not address the root cause of memorization in the training process.

Exam trap

Oracle often tests the misconception that inference-time parameters like temperature or model size affect training data memorization, when in fact memorization is a training-phase phenomenon that must be addressed during training itself.

How to eliminate wrong answers

Option A is wrong because increasing model parameters generally increases the model's capacity to memorize training data, making PII leakage more likely, not less. Option B is wrong because temperature controls the randomness of output token sampling during inference, not the memorization of training data; it has no effect on whether PII is stored in the model weights. Option D is wrong because increasing training epochs typically leads to overfitting and greater memorization of training examples, including PII, as the model sees the data more times.

Practice this question →

59

MCQmedium

A company wants to deploy a private instance of a large language model on OCI for sensitive data processing. What is the recommended approach?

A.Use OCI Data Science with a publicly accessible model.

B.Use the OCI Generative AI public endpoint with data encryption.

C.Use OCI Dedicated AI Clusters for a private endpoint.

D.Use third-party model hosting outside OCI.

AnswerC

Dedicated AI Clusters offer isolated compute with private networking, meeting security and compliance needs.

Why this answer

Option C is correct because OCI Dedicated AI Clusters provide a fully isolated, private endpoint for deploying large language models, ensuring that sensitive data never traverses the public internet. This approach meets the requirement for private inference with no data leaving the customer's tenancy, unlike public endpoints or third-party hosting.

Exam trap

Oracle often tests the misconception that encryption alone (Option B) is sufficient for private deployment, but the trap is that encryption does not eliminate the need for network isolation when processing sensitive data in a shared infrastructure environment.

How to eliminate wrong answers

Option A is wrong because OCI Data Science with a publicly accessible model exposes the model endpoint to the internet, violating the requirement for private, sensitive data processing. Option B is wrong because the OCI Generative AI public endpoint, even with data encryption, still routes traffic through OCI's shared infrastructure and public IP space, which does not guarantee the isolation needed for sensitive data. Option D is wrong because third-party model hosting outside OCI would require data to leave the OCI tenancy, breaking the requirement for a private deployment within OCI.

Practice this question →

60

Multi-Selecthard

Which THREE are known challenges when deploying large language models in production?

Select 3 answers

A.Bias in training data perpetuating stereotypes

B.High computational cost for inference

C.Hallucination of plausible but incorrect information

D.Fast inference speed due to parallelization

E.Low memory footprint

AnswersA, B, C

Models can reflect and amplify biases from training data.

Why this answer

Option A is correct because large language models (LLMs) are trained on vast, unfiltered internet text corpora that inherently contain societal biases. These biases are learned and can be amplified during inference, leading to outputs that perpetuate harmful stereotypes, which is a well-documented production challenge.

Exam trap

Oracle often tests the distinction between known challenges (bias, cost, hallucination) and desirable properties (fast inference, low memory) that are actually false for LLMs, trapping candidates who confuse optimization goals with current limitations.

Practice this question →

61

MCQhard

A development team wants to generate code snippets from natural language. Which model strategy should they adopt?

A.Use a code-specific model like Code Llama.

B.Use a general-purpose LLM like Llama 2.

C.Use a multimodal model.

D.Use an embedding model for text.

AnswerA

Correct: Code-specific models are fine-tuned for code generation.

Why this answer

Code Llama is a specialized variant of Llama 2 that has been fine-tuned on code datasets, enabling it to generate syntactically and semantically correct code from natural language prompts. This makes it the optimal choice for code generation tasks, as general-purpose LLMs lack the targeted training on code structures and programming languages.

Exam trap

Oracle often tests the distinction between general-purpose and domain-specific models, and the trap here is that candidates assume any large language model can handle code generation equally well, overlooking the critical fine-tuning on code corpora that makes Code Llama superior for this task.

How to eliminate wrong answers

Option B is wrong because a general-purpose LLM like Llama 2 is trained on diverse text but not specifically optimized for code, leading to higher rates of syntax errors and logical inconsistencies in generated code. Option C is wrong because multimodal models process images, audio, and text, but code generation from natural language does not require multiple modalities and would add unnecessary complexity without improving code quality. Option D is wrong because embedding models are designed to convert text into vector representations for similarity search or clustering, not for generating new text or code snippets.

Practice this question →

62

MCQmedium

Refer to the exhibit. A developer ran the OCI CLI command shown and received the JSON output. What does the output indicate about the model's confidence and why?

A.The model is uncertain because all scores are roughly equal.

B.The model is neutral because the neutral score is lowest.

C.The model is unsure because the scores are probabilities that sum to 1.

D.The model is highly confident the text is positive, as indicated by the 0.98 score.

AnswerD

0.98 is very close to 1, indicating high confidence.

Why this answer

Option D is correct because the JSON output shows a sentiment score of 0.98 for 'positive', which is very close to 1.0, indicating the model is highly confident that the text is positive. In sentiment analysis models, scores represent probabilities for each class, and a value near 1.0 for one class with much lower scores for others reflects strong confidence.

Exam trap

Oracle often tests the distinction between the sum of probabilities equaling 1 (a mathematical property) and the actual confidence level indicated by the distribution of those probabilities, leading candidates to mistakenly choose option C.

How to eliminate wrong answers

Option A is wrong because the scores are not roughly equal; the positive score (0.98) is significantly higher than the negative (0.01) and neutral (0.01) scores, indicating high confidence, not uncertainty. Option B is wrong because the neutral score being lowest does not imply neutrality; the model is highly confident the text is positive, not neutral. Option C is wrong because while the scores do sum to 1 (as probabilities should), this fact alone does not indicate uncertainty; the distribution of probabilities matters, and here the high positive score shows confidence.

Practice this question →

63

MCQmedium

A data scientist is using OCI Data Science with the Generative AI service to fine-tune a Cohere Command model on a custom dataset of customer support tickets. After training, the model produces poor, irrelevant responses. What is the most likely cause?

A.Incorrect tokenizer configuration

B.Insufficient training data quality or quantity

C.Too many epochs causing overfitting

D.Model architecture mismatch between fine-tuned and base model

AnswerB

Cohere models need clean, diverse, and task-relevant data; poor data leads to poor fine-tuning.

Why this answer

Insufficient training data quality or quantity is the most likely cause because fine-tuning a Cohere Command model on a custom dataset of customer support tickets requires a sufficiently large and representative dataset to teach the model domain-specific patterns. If the dataset is too small, noisy, or lacks diversity, the model will fail to generalize and produce irrelevant responses, even with correct tokenization and training hyperparameters.

Exam trap

Oracle often tests the misconception that overfitting (Option C) is the primary cause of poor model output after fine-tuning, but in this scenario the irrelevance points to data insufficiency rather than memorization of training examples.

How to eliminate wrong answers

Option A is wrong because incorrect tokenizer configuration would typically cause tokenization errors or mismatched vocabulary, not poor semantic relevance; the Cohere Command model uses a fixed tokenizer that is automatically applied during fine-tuning in OCI Data Science. Option C is wrong because too many epochs causing overfitting would result in the model memorizing training examples and producing overly specific or repetitive responses, not generally irrelevant ones; overfitting typically degrades performance on unseen data but does not cause broad irrelevance. Option D is wrong because model architecture mismatch between fine-tuned and base model is not possible in OCI Data Science's Generative AI service, as the fine-tuning process uses the same architecture as the base model; the service enforces compatibility.

Practice this question →

64

Multi-Selecthard

Which three characteristics of LLMs can lead to hallucinations? (Select THREE)

Select 3 answers

A.Overconfidence in predictions

B.Ability to generate plausible-sounding text

C.Lack of real-world grounding

D.Gaps in training data coverage

E.Large vocabulary size

AnswersB, C, D

Correct: Fluency can mask inaccuracies.

Why this answer

Option B is correct because LLMs are trained to generate text that is statistically plausible and coherent, but they lack mechanisms to verify factual accuracy. This means they can produce sentences that sound convincing and grammatically correct while being entirely false, which is a direct cause of hallucinations.

Exam trap

Oracle often tests the distinction between symptoms and root causes, so the trap here is that candidates might confuse 'overconfidence in predictions' (a symptom) with a direct cause of hallucinations, or mistakenly think 'large vocabulary size' contributes to hallucinations when it is merely an enabler of the model's generative capability.

Practice this question →

65

MCQeasy

A data scientist is using a large language model to summarize customer support tickets. The model occasionally generates summaries that include hallucinated details not present in the original ticket. Which technique would best reduce hallucinations while maintaining summary quality?

A.Implement retrieval-augmented generation (RAG) to ground the model in relevant documents.

B.Use a longer system prompt instructing the model to be factual.

C.Fine-tune the model on a large corpus of general text to improve its knowledge.

D.Increase the temperature parameter to 0.9 to encourage more deterministic outputs.

AnswerA

RAG provides factual context, reducing hallucinations.

Why this answer

Retrieval-Augmented Generation (RAG) reduces hallucinations by grounding the model's output in external, verifiable documents retrieved from a knowledge base. Instead of relying solely on the model's parametric memory, RAG fetches relevant context (e.g., the original ticket) at inference time, ensuring the summary is factually aligned with the source. This maintains summary quality because the model can still generate fluent text while being constrained to the retrieved evidence.

Exam trap

Oracle often tests the misconception that simply instructing the model to be factual (Option B) or fine-tuning (Option C) can eliminate hallucinations, when in reality grounding via retrieval (RAG) is the only technique that directly supplies external evidence to constrain generation.

How to eliminate wrong answers

Option B is wrong because a longer system prompt instructing the model to be factual does not provide new factual data; it only changes the model's behavior via instruction tuning, which cannot correct hallucinations stemming from missing or incorrect parametric knowledge. Option C is wrong because fine-tuning on a large corpus of general text would not specifically address hallucinations in customer support tickets; it might even dilute domain-specific accuracy and does not provide a retrieval mechanism to verify facts. Option D is wrong because increasing the temperature parameter to 0.9 actually increases randomness and creativity, making outputs less deterministic and more prone to hallucination, not less.

Practice this question →

66

MCQeasy

A developer integrates OCI GenAI into a mobile app to provide product descriptions. The responses sometimes include explanations or questions instead of the requested format. The developer is using a simple prompt: 'Describe product X.' The app expects a single paragraph. Which corrective action should the developer take?

A.Add a structured prompt with format instructions and an example.

B.Lower the temperature to 0 to make responses deterministic.

C.Increase the max tokens to allow longer responses.

D.Switch to a different model with better language understanding.

AnswerA

Correct: Structured prompts effectively enforce output format.

Why this answer

Option B is correct because adding a structured prompt with format instructions and an example guides the model to output exactly as needed. Option A may increase irrelevant content, Option C may not fix the format issue, and Option D could make responses repetitive but still not enforce the format.

Practice this question →

67

MCQhard

A company is using OCI GenAI with a Dedicated AI Cluster to serve a large language model for real-time chat applications. They notice high inference latency (average 2 seconds per response) and want to reduce it to under 500 milliseconds without significantly degrading the quality of responses. The cluster is configured with NVIDIA A100 GPUs. The model is the base Cohere Command model (52B parameters). They have explored increasing batch size, but that increases latency for interactive use cases. Which action should they take?

A.Deploy the model with inference optimization frameworks like vLLM, TensorRT, or ONNX Runtime.

B.Increase batch size to process multiple queries at once.

C.Swap the model to a smaller variant, such as Cohere Command Light (6B).

D.Enable model quantization (e.g., int8) to reduce memory and computation.

AnswerA

These frameworks optimize GPU utilization and reduce latency without changing the model.

Why this answer

Option D is correct because deploying the model with optimization techniques like vLLM or TensorRT leverages GPU acceleration specifically for inference, reducing latency significantly. Option A is wrong because increasing batch size is not suitable for real-time, single-query scenarios. Option B is wrong because using a smaller model (e.g., 6B) would reduce latency but also degrade quality, which they want to avoid.

Option C is wrong because model quantization can reduce model size and latency but may degrade output quality, especially at lower precision.

Practice this question →

68

MCQmedium

A company wants to create a chatbot that answers questions based on a large internal document set that is updated weekly. They have limited ML expertise. Which approach is recommended?

A.Fine-tune a model on the entire document set.

B.Train a custom model from scratch.

C.Include all documents in the system prompt.

D.Use retrieval-augmented generation (RAG) with a vector database.

AnswerD

Correct: RAG handles dynamic data without retraining.

Why this answer

Option C is correct because retrieval-augmented generation (RAG) allows dynamic updates without retraining and is easier to implement. Option A requires frequent retraining, Option B may exceed context limits, and Option D is too resource-intensive.

Practice this question →

69

MCQmedium

A company uses OCI GenAI to build a content moderation system that filters toxic language in user-generated comments. They have a small labeled dataset of 1,000 comments (500 toxic, 500 non-toxic) and need an efficient solution that balances accuracy, cost, and latency. They are considering different model options: fine-tuning a large LLM (e.g., Cohere Command), using a pre-trained LLM with prompting, fine-tuning a smaller BERT-based classifier, or building a rule-based system. The team has moderate ML experience and wants to deploy using OCI Data Science. Which approach is most efficient for this binary classification task?

A.Fine-tune a BERT-based classifier (e.g., 'bert-base-uncased') on the dataset.

B.Develop a rule-based system using regular expressions and keyword lists.

C.Use a pre-trained LLM with a toxic/non-toxic prompt.

D.Fine-tune the Cohere Command model on the labeled dataset.

AnswerA

BERT is efficient for classification, fine-tunes quickly on small data, and has low inference cost.

Why this answer

Fine-tuning a BERT-based classifier (e.g., 'bert-base-uncased') is the most efficient approach because BERT is specifically designed for text classification tasks, requiring far fewer computational resources and lower latency than large LLMs. With only 1,000 labeled samples, BERT can achieve high accuracy through transfer learning, while keeping inference costs minimal—ideal for a production content moderation system on OCI Data Science.

Exam trap

Oracle often tests the misconception that larger LLMs (like Cohere Command) are always superior for classification tasks, ignoring the practical constraints of small datasets, cost, and latency that make fine-tuned BERT models the optimal choice for binary classification.

How to eliminate wrong answers

Option B is wrong because rule-based systems using regex and keyword lists cannot generalize to nuanced toxic language (e.g., sarcasm, misspellings, or context-dependent toxicity) and require constant manual maintenance, leading to poor accuracy and high operational overhead. Option C is wrong because using a pre-trained LLM with prompting (e.g., Cohere Command) incurs high per-token inference costs and latency, and with only 1,000 examples, few-shot prompting may not reliably capture the specific toxicity patterns in the dataset. Option D is wrong because fine-tuning a large LLM like Cohere Command on a tiny dataset of 1,000 samples risks catastrophic forgetting and overfitting, while also being computationally expensive and slower for real-time moderation compared to a smaller BERT model.

Practice this question →

70

Multi-Selecteasy

Which THREE are essential steps in the prompt engineering process for an LLM?

Select 3 answers

A.Test the prompt with a variety of input examples

B.Fine-tune the model on a domain corpus

C.Define the desired output format and constraints

D.Quantize the model to INT8

E.Iteratively refine the prompt based on model responses

AnswersA, C, E

Testing ensures robustness across different inputs.

Why this answer

Option A is correct because testing the prompt with a variety of input examples is essential to evaluate the LLM's generalization, robustness, and sensitivity to different phrasing or contexts. This step helps identify edge cases, biases, or inconsistencies in the model's responses before deployment.

Exam trap

Oracle often tests the distinction between prompt engineering (input-side optimization) and model modification (fine-tuning, quantization) to trap candidates who confuse these fundamentally different processes.

Practice this question →

71

MCQmedium

A developer uses OCI Generative AI's chat endpoint with a system message placed after user messages. The model ignores the system message. What is the most likely reason?

A.The system message is too long

B.Temperature is set too high

C.The model has not been fine-tuned for instruction following

D.The system message is placed after user messages

AnswerD

The standard order is system first, then user; otherwise the model may misinterpret.

Why this answer

In OCI Generative AI's chat endpoint, the system message must be placed before user messages to establish the model's behavior and context. When placed after user messages, the model treats it as part of the conversation history rather than a directive, causing it to be ignored. This ordering is a fundamental requirement for the chat API's message structure.

Exam trap

Oracle often tests the specific API message ordering requirement, where candidates mistakenly attribute the failure to model limitations or hyperparameters rather than the structural placement of the system message.

How to eliminate wrong answers

Option A is wrong because the system message being too long would cause a token limit error or truncation, not silent ignoring. Option B is wrong because temperature controls randomness in output, not whether instructions are followed; a high temperature might produce varied responses but does not cause the model to ignore the system message. Option C is wrong because OCI Generative AI models are pre-trained for instruction following without requiring fine-tuning; the issue is purely about message ordering, not model capability.

Practice this question →

72

MCQmedium

A company uses OCI Generative AI to power a chatbot for customer support. They notice that the model's responses sometimes contain factual inaccuracies. Which strategy would best reduce hallucination?

A.Implementing Retrieval-Augmented Generation (RAG).

B.Increasing the temperature parameter.

C.Reducing the max token limit.

D.Fine-tuning the model on a larger general corpus.

AnswerA

RAG retrieves relevant facts from a knowledge base, grounding the output and reducing hallucination.

Why this answer

Retrieval-Augmented Generation (RAG) grounds the model's responses in retrieved factual information, directly reducing hallucination. Increasing temperature increases randomness, fine-tuning on a larger corpus may not fix factual accuracy, and reducing max tokens does not affect correctness.

Practice this question →

73

Multi-Selectmedium

Which TWO techniques are commonly used to reduce the memory footprint of LLM inference?

Select 2 answers

A.Quantization

B.Increasing batch size

C.KV cache optimization

D.Gradient checkpointing

E.Using full precision (FP32)

AnswersA, C

Reduces memory by using lower precision weights.

Why this answer

Quantization reduces the memory footprint by lowering the precision of model weights and activations from FP32 to lower bit-widths like INT8 or FP16, which directly decreases the memory required to store and compute with the model. KV cache optimization reduces memory usage by efficiently managing the key-value cache during autoregressive decoding, often through techniques like shared memory, pruning, or compression, which is critical for long-context inference.

Exam trap

Oracle often tests the distinction between training and inference techniques, so candidates mistakenly apply gradient checkpointing (a training memory saver) to inference, or confuse batch size scaling with memory reduction.

Practice this question →

74

Multi-Selectmedium

Which TWO factors are most likely to cause hallucinations in LLMs?

Select 2 answers

A.High temperature

B.Short context window

C.Excessive fine-tuning

D.Low top-p

E.Inadequate training data

AnswersA, E

High temperature increases randomness, leading to less factual outputs.

Why this answer

A high temperature setting increases the randomness of token sampling, making the model more likely to generate plausible-sounding but factually incorrect or nonsensical outputs. This directly contributes to hallucinations by encouraging the model to deviate from the most probable, grounded responses.

Exam trap

Oracle often tests the misconception that low top-p or short context windows are primary causes of hallucinations, when in fact high temperature and insufficient training data are the two most direct factors that increase the likelihood of generating false or fabricated content.

Practice this question →

75

Multi-Selecteasy

Which TWO techniques can help reduce bias in LLM outputs?

Select 2 answers

A.Setting temperature to 0

B.Using only English data

C.Using diverse training data

D.Increasing model size

E.Applying adversarial debiasing

AnswersC, E

Diverse data reduces representation bias.

Why this answer

Option C is correct because using diverse training data helps the model learn from a wide range of perspectives, reducing the risk of over-representing any single group or viewpoint. This directly mitigates bias by ensuring the training distribution is more representative of the real world, rather than skewed toward a dominant demographic or cultural norm.

Exam trap

Oracle often tests the misconception that lowering temperature or increasing model size can fix bias, when in reality these parameters affect randomness and capacity, not the underlying distributional fairness of the training data.

Practice this question →

Page 1 of 2 · 128 questions totalNext →

Ready to test yourself?

Try a timed practice session using only Llm Fundamentals questions.

Start 20-question session