Oracle Cloud Infrastructure Generative AI Professional 1Z0-1127 1Z0-1127 Questions 676–750 | Page 10/14

676

MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Fine-tune a base LLM on the policy documents monthly

B.Train a custom model from scratch on the policy documents each month

C.Use a larger foundation model with a longer context window and paste all documents into each prompt

D.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

AnswerD

RAG retrieves relevant chunks at query time, avoiding retraining.

Why this answer

RAG allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining.

Full explanation →

677

MCQmedium

A.Train a custom model from scratch on the policy documents each month

B.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

C.Use a larger foundation model with a longer context window and paste all documents into each prompt

D.Fine-tune a base LLM on the policy documents monthly

AnswerB

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

RAG (Retrieval-Augmented Generation) allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining. The other options either require expensive retraining for each update or lack document grounding.

Full explanation →

678

MCQeasy

Which LangChain component is responsible for storing and retrieving message history across multiple turns in a conversation?

A.Memory

B.PromptTemplate

C.Chain

D.Tool

AnswerA

Memory stores and retrieves conversation history, enabling context-aware multi-turn interactions.

Why this answer

Memory is the correct component because it is specifically designed to store and retrieve conversation history across multiple turns, enabling the model to maintain context. Unlike stateless components, Memory persists past interactions (e.g., via buffer or summary mechanisms) and feeds them into the prompt for subsequent LLM calls.

Exam trap

Cisco often tests the misconception that Chain inherently remembers conversation history, but Chain is stateless by default and requires explicit Memory attachment to retain context across turns.

How to eliminate wrong answers

Option B (PromptTemplate) is wrong because it only defines the structure of the input prompt (e.g., placeholders for variables) and has no capability to store or retrieve historical messages. Option C (Chain) is wrong because it orchestrates sequences of calls (e.g., LLM + tools) but does not inherently persist state; any memory must be explicitly attached via a Memory object. Option D (Tool) is wrong because it represents an external function or API (e.g., a search engine or calculator) that the agent can invoke, not a mechanism for storing conversation history.

Full explanation →

679

MCQhard

A developer is using OCI Generative AI for a question-answering system. The model frequently provides outdated information because the training data cutoff is over a year old. Which approach would most effectively address this issue?

A.Implement a Retrieval-Augmented Generation (RAG) pipeline that retrieves up-to-date documents from an external knowledge base

B.Increase the context window to include more of the user's prompt

C.Fine-tune the model on a dataset that includes recent information up to today

D.Switch to a larger model that has a more recent knowledge cutoff

AnswerA

RAG allows the model to access current information dynamically, solving the cutoff problem.

Why this answer

Retrieval-Augmented Generation (RAG) directly addresses the problem of stale training data by dynamically retrieving current documents from an external knowledge base at inference time. This allows the model to generate answers grounded in up-to-date information without requiring retraining or a larger model, making it the most effective and practical solution for a question-answering system.

Exam trap

Cisco often tests the misconception that simply increasing model size or context length can solve knowledge staleness, when in fact only retrieval-based methods like RAG provide a scalable, real-time solution to keep answers current without retraining.

How to eliminate wrong answers

Option B is wrong because increasing the context window only allows the model to process more of the user's prompt, but it does not inject new or recent information into the model's responses — the model's parametric knowledge remains frozen at its training cutoff. Option C is wrong because fine-tuning on recent data up to today would require a new, curated dataset and significant compute resources, and the model would still be limited to the knowledge in that dataset; moreover, fine-tuning is not a real-time solution and cannot adapt to information that changes after the fine-tuning process. Option D is wrong because switching to a larger model with a more recent knowledge cutoff only shifts the staleness problem forward in time — the model will still eventually become outdated, and it does not provide a mechanism to access live or continuously updated information.

Full explanation →

680

Multi-Selecthard

A team is iteratively refining a prompt for a summarization task. Which THREE activities are essential for effective iterative prompt refinement?

Select 3 answers

A.Establish evaluation criteria (e.g., accuracy, coherence, conciseness)

B.Test the prompt with a static set of examples only

C.Increase max_tokens gradually

D.A/B test different prompt variants on a held-out set

E.Test the prompt with diverse and edge-case inputs

AnswersA, D, E

Criteria guide objective assessment.

Why this answer

Testing with diverse inputs, A/B testing variants, and establishing evaluation criteria are key to systematic refinement.

Full explanation →

681

MCQeasy

Which OCI Generative AI service model family supports fine-tuning with custom datasets?

A.Cohere Command

B.Cohere Embed

C.Cohere Summarize

D.GPT-3

AnswerA

Cohere Command models are designed for text generation and support fine-tuning.

Why this answer

Cohere Command is the model family within OCI Generative AI that supports fine-tuning with custom datasets, allowing users to adapt the model for domain-specific tasks like summarization or classification. In contrast, Cohere Embed is designed for generating text embeddings, Cohere Summarize is a specialized endpoint for summarization without fine-tuning support, and GPT-3 is not natively available in OCI Generative AI for fine-tuning.

Exam trap

Oracle often tests the misconception that all Cohere model families (Embed, Summarize, Command) support fine-tuning, but only Command is designed for customization with custom datasets.

How to eliminate wrong answers

Option B (Cohere Embed) is wrong because it is optimized for creating vector embeddings of text, not for generative tasks, and does not support fine-tuning with custom datasets. Option C (Cohere Summarize) is wrong because it is a pre-configured summarization endpoint that does not allow model customization or fine-tuning. Option D (GPT-3) is wrong because it is an OpenAI model not offered within the OCI Generative AI service; OCI uses Cohere and Meta Llama models, and GPT-3 cannot be fine-tuned through OCI.

Full explanation →

682

MCQeasy

A developer wants to use OCI Generative AI Service to summarize long documents. Which endpoint should they use to send the document content?

A./generate

B./classify

C./embed

D./chat

AnswerD

The /chat endpoint accepts a conversation history, suitable for summarization tasks.

Why this answer

Option D is correct because the /chat endpoint in OCI Generative AI Service is designed for conversational interactions and can handle long document summarization by accepting the document content as part of the chat context. This endpoint supports multi-turn dialogues and large input payloads, making it suitable for processing and summarizing lengthy documents.

Exam trap

Oracle often tests the misconception that /generate is the correct endpoint for all text generation tasks, including summarization, but the /chat endpoint is specifically optimized for interactive and context-aware tasks like document summarization.

How to eliminate wrong answers

Option A is wrong because /generate is used for text generation tasks like content creation or completion, not specifically for summarization of long documents. Option B is wrong because /classify is intended for text classification tasks such as sentiment analysis or topic labeling, not summarization. Option C is wrong because /embed is used to generate vector embeddings for text, which are useful for semantic search or similarity comparisons, not for producing summaries.

Full explanation →

683

Multi-Selecthard

Which TWO are common causes of poor answer quality in a RAG system built on OCI Generative AI? (Choose two.)

Select 2 answers

A.Mismatch between the embedding model's training data and the domain of the documents.

B.Using a generation model that is too large for the task.

C.Setting the temperature parameter too low, causing overly deterministic outputs.

D.Insufficient number of relevant chunks in the document corpus for the given query.

E.Using only vector search without keyword-based fallback.

AnswersA, D

Domain mismatch leads to poor semantic alignment and irrelevant retrieval.

Why this answer

Option A is correct because the embedding model's training data determines the semantic space in which documents and queries are represented. If the model was trained on general text (e.g., Wikipedia) but the documents are from a specialized domain (e.g., medical or legal), the embeddings will fail to capture domain-specific nuances, leading to poor retrieval relevance and thus poor answer quality in the RAG system.

Exam trap

Oracle often tests the distinction between retrieval-side failures (like embedding mismatch or insufficient chunks) and generation-side parameters (like temperature or model size), so candidates mistakenly attribute poor answer quality to generation settings rather than the retrieval pipeline.

Full explanation →

684

MCQeasy

A startup is using OCI Generative AI serverless inference for a text generation application. They notice that the latency is high during peak hours. They have a budget to increase costs moderately. Which action would most effectively reduce latency?

A.Switch to dedicated AI cluster.

B.Enable content filtering.

C.Increase the number of concurrent requests.

D.Use a smaller model.

AnswerA

Dedicated clusters offer predictable, low-latency inference.

Why this answer

Option A is correct. Switching to a dedicated AI cluster provides consistent low latency compared to serverless inference. Option B is wrong because using a smaller model might reduce latency but could degrade quality.

Option C is wrong because enabling content filtering does not affect latency. Option D is wrong because increasing concurrent requests may increase load without improving latency.

Full explanation →

685

MCQhard

A multinational corporation is deploying a generative AI chatbot for customer support using Oracle Cloud Infrastructure's Generative AI service. The chatbot is powered by a large language model (LLM) accessed via the on-demand serving mode. During initial testing, the chatbot provides accurate answers for well-known products but frequently hallucinates or gives incorrect specifications for niche products. The company maintains a comprehensive internal database of product specifications, updated daily. The support team prefers not to fine-tune the LLM due to cost and maintenance overhead. Additionally, the chatbot must respond within 2 seconds to maintain a good customer experience. The team considers several approaches: A. Increasing the 'temperature' parameter to make the model more creative, hoping it will generate more accurate responses when unsure. B. Using few-shot prompting with three manually curated examples of correct product specifications included in every prompt. C. Implementing a Retrieval Augmented Generation (RAG) pipeline that retrieves relevant product documents from the internal database and prepends them to the prompt before inference. D. Reducing the 'topP' parameter to 0.1 to force the model to sample only from the highest probability tokens, thereby reducing randomness. Which approach best meets the requirements of improving factual accuracy while maintaining low latency?

A.Reduce the 'topP' parameter to 0.1 to force the model to sample only from the highest probability tokens, thereby reducing randomness.

B.Implement a Retrieval Augmented Generation (RAG) pipeline that retrieves relevant product documents from the internal database and prepends them to the prompt before inference.

C.Use few-shot prompting with three manually curated examples of correct product specifications included in every prompt.

D.Increase the 'temperature' parameter to make the model more creative, hoping it will generate more accurate responses when unsure.

AnswerB

RAG injects accurate, domain-specific context, improving factual accuracy without fine-tuning, and can be implemented with efficient retrieval for low latency.

Why this answer

Option B is correct because Retrieval Augmented Generation (RAG) directly addresses the hallucination problem by providing the LLM with up-to-date, factual product specifications from the internal database as context in the prompt. This approach improves factual accuracy without fine-tuning, and because retrieval can be optimized (e.g., using vector search with approximate nearest neighbor algorithms), it can meet the 2-second latency requirement. RAG leverages the LLM's existing knowledge while grounding responses in authoritative external data.

Exam trap

The trap here is that candidates may confuse parameter tuning (temperature, topP) with a method to improve factual accuracy, when in fact these parameters control randomness, not knowledge grounding; the real solution is to provide the model with external factual context via RAG.

How to eliminate wrong answers

Option A is wrong because increasing the 'temperature' parameter makes the model more random and creative, which would increase the likelihood of hallucinations, not reduce them. Option C is wrong because few-shot prompting with only three examples is insufficient to cover the vast number of niche products and does not incorporate the dynamic, daily-updated internal database, leading to stale or missing information. Option D is wrong because reducing 'topP' to 0.1 forces the model to sample only from the highest probability tokens, which reduces randomness but does not provide the model with any factual context about niche products; the model will still hallucinate if the correct answer is not in its training data.

Full explanation →

686

MCQhard

A company uses RAG (Retrieval-Augmented Generation) with OCI OpenSearch and OCI Generative AI. The system retrieves irrelevant documents. What is the first step to debug?

A.Use a different LLM

B.Increase the number of retrieved documents

C.Check the embeddings quality

D.Lower the temperature

AnswerC

Embeddings directly impact retrieval relevance; low-quality embeddings cause irrelevant results.

Why this answer

When RAG retrieves irrelevant documents, the most common root cause is poor embedding quality—vectors that fail to capture semantic similarity between the query and the documents. Checking embeddings (e.g., cosine similarity scores, dimensionality, or model used) is the logical first step before adjusting retrieval parameters or the LLM itself.

Exam trap

Cisco often tests the misconception that retrieval issues are caused by the LLM or its parameters, when in fact the root cause is almost always in the embedding or indexing pipeline.

How to eliminate wrong answers

Option A is wrong because swapping the LLM changes only the generation step, not the retrieval step; irrelevant documents will still be fed to any LLM. Option B is wrong because increasing the number of retrieved documents only adds more noise if the embeddings are poor, worsening the problem. Option D is wrong because lowering the temperature affects the randomness of the LLM's output, not the relevance of retrieved documents.

Full explanation →

687

MCQmedium

A company wants to use OCI Generative AI Agents for a question-answering system over their internal knowledge base stored in OCI Object Storage. The data consists of PDF and Word documents. What is the first step to make this data usable by the agent?

A.Use the Embedding API to generate embeddings for all documents

B.Create a dedicated AI cluster for inference

C.Create an OCI Generative AI Agent

D.Create a knowledge base and associate it with the Object Storage bucket

AnswerD

The knowledge base indexes the documents from Object Storage, enabling the agent to retrieve relevant content for answering questions.

Why this answer

A knowledge base must be created and linked to the data source (Object Storage bucket) so the agent can index and retrieve the content. Creating an agent first without a knowledge base would not work. The Embedding API is lower-level; the agent service abstracts this.

Full explanation →

688

MCQeasy

Which of the following is a primary limitation of large language models that can lead to generating factually incorrect information?

A.Bias in training data

B.Hallucinations

C.Context window limitation

D.Knowledge cutoff

AnswerB

Hallucinations occur when the model generates content that is not factually accurate or grounded in the training data.

Why this answer

Hallucinations are a primary limitation of large language models because they cause the model to generate text that is factually incorrect, nonsensical, or not grounded in the training data. This occurs due to the probabilistic nature of token prediction, where the model prioritizes fluency and coherence over factual accuracy, especially when the prompt lacks sufficient context or the model is asked to recall specific facts not well-represented in its training.

Exam trap

Cisco often tests the distinction between hallucinations and other limitations like bias or context windows, so the trap here is that candidates confuse 'bias in training data' with factual inaccuracy, when bias is about systematic prejudice, not random or confident fabrication of false facts.

How to eliminate wrong answers

Option A is wrong because bias in training data leads to skewed or prejudiced outputs, not necessarily factually incorrect information; it affects fairness and representation rather than factual accuracy. Option C is wrong because context window limitation restricts the amount of input the model can process at once, which can cause loss of context but does not directly cause the generation of factually incorrect information—it may lead to incomplete or irrelevant responses. Option D is wrong because knowledge cutoff refers to the date after which the model has no training data, meaning it cannot answer about events after that date, but it does not cause the model to fabricate facts; it simply limits the temporal scope of knowledge.

Full explanation →

689

MCQmedium

A.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

B.Fine-tune a base LLM on the policy documents monthly

C.Train a custom model from scratch on the policy documents each month

D.Use a larger foundation model with a longer context window and paste all documents into each prompt

AnswerA

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

Retrieval-Augmented Generation (RAG) is the most appropriate approach because it allows the chatbot to answer questions by retrieving relevant chunks from the policy documents stored in a vector store at inference time, without requiring any model retraining. When documents are updated monthly, you only need to re-index the new content into the vector store, and the LLM can use the retrieved context to generate accurate answers. This decouples knowledge updates from model training, making it cost-effective and agile for frequently changing internal documents.

Exam trap

Cisco often tests the misconception that fine-tuning or training from scratch is necessary for domain-specific knowledge, when in fact RAG provides a more efficient and maintainable solution for dynamic document sets.

How to eliminate wrong answers

Option B is wrong because fine-tuning a base LLM monthly on the policy documents would require significant compute resources, time, and expertise for each update, and it risks catastrophic forgetting of previous policy versions. Option C is wrong because training a custom model from scratch each month is prohibitively expensive and impractical for a task that only requires answering questions from a relatively small set of documents. Option D is wrong because pasting all documents into each prompt would exceed typical context window limits (even with large models), incur high token costs, and degrade performance due to the model struggling to attend to the most relevant information among thousands of tokens.

Full explanation →

690

MCQmedium

A legal firm needs an AI assistant that can answer questions based on a large corpus of internal regulations that change quarterly. The firm also requires high accuracy and the ability to cite sources. Which approach should the firm choose?

A.Build a RAG application with vector search and citation generation

B.Use a pre-trained model without customization

C.Implement a rule-based search engine

D.Fine-tune a pre-trained model on the current regulations

AnswerA

RAG retrieves relevant documents and can cite sources, and updating the knowledge base is straightforward.

Why this answer

Option A is correct because Retrieval-Augmented Generation (RAG) with vector search allows the legal firm to index its quarterly-changing regulations into a vector database, retrieve the most relevant chunks for each query, and generate answers with source citations. This approach ensures high accuracy by grounding the LLM's output in the current, authoritative documents without requiring retraining, and citation generation provides the necessary source traceability for legal compliance.

Exam trap

Cisco often tests the misconception that fine-tuning is the best way to incorporate domain-specific knowledge, but the trap here is that fine-tuning cannot handle frequently changing data and does not provide source citations, whereas RAG with vector search is purpose-built for dynamic, citation-required use cases.

How to eliminate wrong answers

Option B is wrong because a pre-trained model without customization has no access to the firm's specific internal regulations, leading to hallucinated or outdated answers and no ability to cite sources. Option C is wrong because a rule-based search engine relies on static keyword matching and cannot understand semantic meaning or generate natural language answers, making it unsuitable for complex legal queries and dynamic content. Option D is wrong because fine-tuning on the current regulations would require retraining every quarter when regulations change, which is resource-intensive and does not inherently provide source citation; moreover, fine-tuning risks catastrophic forgetting of prior regulations and cannot dynamically retrieve the latest documents.

Full explanation →

691

Multi-Selectmedium

A company uses OCI Generative AI to generate product descriptions in XML format. The engineer wants to improve adherence to the XML schema. Which THREE prompt components are most critical? (Select three.)

Select 3 answers

A.Setting temperature to 0.9

B.Context/background about the company's product line

C.Output format specification: 'Use the following XML schema: <product><name>...</name></product>'

D.Task instruction: 'Generate a product description as XML'

E.Few-shot examples of valid XML product descriptions

AnswersC, D, E

Explicit format specification guides the model to produce valid XML.

Why this answer

Task instruction tells the model what to do, output format specification tells it the structure, and few-shot examples provide a concrete reference. Context/background is less critical, and temperature does not enforce schema.

Full explanation →

692

Multi-Selecthard

Which THREE components are essential for a production-grade generative AI deployment on OCI? (Select THREE)

Select 3 answers

A.OCI Logging for audit

B.OCI Vault for secrets

C.OCI Data Flow for data processing

D.Dedicated AI cluster

E.OCI IAM policies for access control

AnswersA, D, E

Logging is critical for monitoring and compliance.

Why this answer

A is correct because OCI Logging provides centralized audit logging for all API calls and resource changes in the generative AI deployment. This is essential for compliance, security monitoring, and troubleshooting in a production environment, as it captures detailed logs of model invocations, data access, and configuration changes.

Exam trap

Oracle often tests the distinction between 'essential' components for deployment versus 'useful but optional' services, leading candidates to select OCI Vault or OCI Data Flow because they are commonly used in AI pipelines, but they are not mandatory for a production-grade deployment.

Full explanation →

693

Multi-Selecthard

An OCI administrator is configuring access control for OCI Generative AI. Which three IAM components are required to allow a group of data scientists to call the GenerateText API? (Choose three.)

Select 3 answers

A.An IAM group for the data scientists

B.A local peering gateway

C.A policy granting ai-services-generative-ai-family in a compartment

D.A dynamic group

E.A compartment for the AI resources

AnswersA, C, E

The group is the subject of the policy.

Why this answer

An IAM group is required to organize the data scientists into a logical set of principals. IAM policies are then attached to this group to grant permissions, ensuring only members of the group can call the GenerateText API. Without a group, you cannot apply a policy to a collection of users.

Exam trap

The trap here is that candidates confuse dynamic groups (for resources) with IAM groups (for users), or mistakenly think a networking component like a local peering gateway is required for API access control.

Full explanation →

694

Multi-Selectmedium

Which TWO parameters directly control the randomness and diversity of generated tokens?

Select 2 answers

A.Temperature

B.Stop sequences

C.Frequency penalty

D.Top-p

E.Max tokens

AnswersA, D

Temperature scales logits to affect randomness.

Why this answer

Temperature and top-p (nucleus sampling) are the primary parameters that influence randomness and diversity.

Full explanation →

695

MCQmedium

A data scientist is evaluating two LLMs for a summarization task. Model X scores 45 on ROUGE-L, while Model Y scores 42. However, in human evaluation, Model Y is preferred 60% of the time. What is the most likely explanation?

A.Human evaluators are biased and cannot be trusted for objective assessment

B.Model Y overfits to the training data, causing poor generalisation

C.ROUGE-L measures lexical overlap, which may not capture the semantic quality that humans value

D.ROUGE-L is not a reliable metric for summarization because it only measures recall

AnswerC

ROUGE relies on n-gram overlap; Model Y might produce more concise or coherent summaries that humans prefer but that share fewer exact n-grams with the reference.

Why this answer

ROUGE-L measures the longest common subsequence (LCS) between generated and reference summaries, focusing on lexical (word-level) overlap. It does not assess semantic meaning, fluency, or factual correctness. Human evaluators often prefer summaries that are coherent and capture key ideas, even if they use different wording, which explains why Model Y can score lower on ROUGE-L but be preferred 60% of the time.

Exam trap

Cisco often tests the distinction between lexical metrics (like ROUGE) and semantic quality, trapping candidates who assume higher automated scores always indicate better performance without considering human preferences.

How to eliminate wrong answers

Option A is wrong because human evaluators are not inherently biased in this context; their preference reflects subjective quality (e.g., coherence, relevance) that automated metrics may miss. Option B is wrong because overfitting would typically cause poor performance on unseen data, but here Model Y performs worse on ROUGE-L yet is preferred by humans, suggesting it generalizes better in terms of human-perceived quality. Option D is wrong because ROUGE-L measures both precision and recall via the F1-score of the LCS, not just recall; the issue is its reliance on lexical overlap, not a limitation to recall.

Full explanation →

696

MCQhard

Refer to the exhibit. A developer sends this JSON payload to the /chat endpoint. The response includes an error that 'maxTokens' must be an integer. What is the issue?

A.The compartmentId is missing

B.The temperature value is too low

C.The parameter should be 'max_tokens' instead of 'maxTokens'

D.The model name 'cohere.command-light' is incorrect

AnswerC

The API expects snake_case parameters.

Why this answer

The OCI Generative AI service expects the parameter name 'max_tokens' (snake_case) for specifying the maximum number of tokens in the response, not 'maxTokens' (camelCase). The error message indicates that the value is not being recognized as an integer because the JSON key itself is incorrect, causing the service to fail validation.

Exam trap

Oracle often tests the difference between snake_case and camelCase parameter names in OCI services, and the trap here is that candidates familiar with OpenAI's API conventions might assume 'maxTokens' is correct, overlooking OCI's strict snake_case requirement.

How to eliminate wrong answers

Option A is wrong because the compartmentId is not required for the /chat endpoint when using a model that is accessible via the service's default compartment or when the request is authenticated via API keys that have the necessary permissions. Option B is wrong because a temperature value of 0.5 is within the valid range (typically 0.0 to 1.0) and does not cause an error about 'maxTokens' needing to be an integer. Option D is wrong because 'cohere.command-light' is a valid model name in OCI Generative AI, and the error message specifically points to the 'maxTokens' parameter, not the model name.

Full explanation →

697

MCQeasy

Which OCI Generative AI API is used to send a message and receive a model-generated response while maintaining a conversation history?

A.InferenceClient

B.Chat API

C.Embeddings API

D.Generate API

AnswerB

Chat API handles multi-turn conversations, system prompts, and history.

Why this answer

The Chat API is specifically designed for multi-turn conversations, supporting system prompts and history. The Generate API is for single-turn text generation. InferenceClient is a client class, not an API endpoint.

The Embeddings API is for vector generation.

Full explanation →

698

MCQhard

A prompt engineer wants the model to adopt a strict, professional tone for a financial report generation task. Which prompt component should be used to set this tone and persona?

A.Few-shot examples

B.User message

C.System prompt

D.Stop sequences

AnswerC

System prompt sets the assistant's persona and behavioral guidelines for the session.

Why this answer

The system prompt is the correct component because it is specifically designed to set the model's behavior, tone, and persona at the beginning of a conversation. In the context of a financial report generation task, a system prompt like 'You are a professional financial analyst. Use a strict, formal tone.' instructs the model to adopt that persona for all subsequent interactions, ensuring consistency across the entire session.

Exam trap

The trap here is that candidates often confuse the system prompt with the user message, thinking they can simply include tone instructions in the user message, but the system prompt is the designated mechanism for persistent persona and tone control in the model's API design.

How to eliminate wrong answers

Option A is wrong because few-shot examples provide the model with input-output pairs to guide the format or style of a response, but they do not establish a persistent persona or tone across the entire conversation; they are more for in-context learning of a specific task pattern. Option B is wrong because the user message is the input from the user that contains the current request or query, and while it can include tone instructions, it is not the designated component for setting a system-level persona that persists across multiple turns. Option D is wrong because stop sequences are tokens or strings that tell the model when to stop generating further output, such as 'END' or a newline, and have no role in defining the model's tone or persona.

Full explanation →

699

Multi-Selecteasy

Which TWO operations are supported by the OCI Generative AI inference API?

Select 2 answers

A.EmbedText

B.SummarizeText

C.TranslateText

D.GenerateText

E.ChatCompletion

AnswersA, D

The embed_text endpoint creates vector embeddings for input text.

Why this answer

The OCI Generative AI inference API directly supports EmbedText for generating vector embeddings from input text, and GenerateText for producing natural language completions. These are the two core operations exposed by the service for embedding and text generation tasks.

Exam trap

Cisco often tests the distinction between dedicated API operations and prompt-based capabilities, leading candidates to mistakenly select SummarizeText or ChatCompletion as separate endpoints when they are actually implemented via GenerateText with specific prompts.

Full explanation →

700

MCQmedium

A developer is building a LangChain-powered application that must maintain conversation history across multiple turns. They want to store the chat history in Oracle Database. Which memory type and persistence approach should they use?

A.ConversationSummaryMemory with in-memory storage

B.ConversationBufferWindowMemory with Redis persistence

C.ConversationKGMemory with file-based storage

D.ConversationBufferMemory with a custom chat message history class backed by Oracle Database

AnswerD

BufferMemory preserves the complete conversation; a custom Oracle-backed history class persists it durably across sessions.

Why this answer

Option D is correct because the requirement explicitly states storing chat history in Oracle Database. ConversationBufferMemory stores the full conversation history without summarization or windowing, and by implementing a custom chat message history class backed by Oracle Database, the developer can persist the conversation history directly in Oracle, meeting both the memory type and persistence requirements.

Exam trap

The trap here is that candidates may choose a memory type like ConversationSummaryMemory or ConversationBufferWindowMemory for efficiency, overlooking the explicit requirement to store the full history in Oracle Database, and instead defaulting to simpler in-memory or Redis-based solutions.

How to eliminate wrong answers

Option A is wrong because ConversationSummaryMemory stores a summarized version of the conversation, which loses detail, and in-memory storage does not persist data to Oracle Database. Option B is wrong because ConversationBufferWindowMemory only retains a fixed window of recent messages, discarding older context, and Redis persistence does not use Oracle Database. Option C is wrong because ConversationKGMemory stores a knowledge graph of entities and relationships, not the raw conversation history, and file-based storage does not use Oracle Database.

Full explanation →

701

MCQmedium

An organization wants to use OCI Generative AI to build a summarization tool but must ensure that all inference requests are logged for audit purposes. Which approach should they take?

A.Implement a custom proxy with logging

B.Enable OCI Audit service

C.Enable OCI Logging on the generative AI endpoint

D.Use OCI Vault to store logs

AnswerC

OCI Logging can capture detailed request and response data for audit.

Why this answer

Option C is correct because OCI Logging can be enabled directly on the Generative AI endpoint to capture all inference requests and responses as logs, which can then be used for audit purposes. This is the native, recommended approach for logging API calls without introducing additional infrastructure or complexity.

Exam trap

Oracle often tests the distinction between management-plane logging (OCI Audit) and data-plane logging (OCI Logging on the service endpoint), leading candidates to mistakenly choose OCI Audit for inference request auditing.

How to eliminate wrong answers

Option A is wrong because implementing a custom proxy with logging introduces unnecessary complexity, latency, and potential security gaps, and is not a native OCI solution for logging inference requests. Option B is wrong because the OCI Audit service captures only management-plane events (e.g., create, update, delete operations on resources), not data-plane events like individual inference API calls. Option D is wrong because OCI Vault is designed for storing secrets (e.g., API keys, passwords), not for storing logs; logs should be stored in OCI Logging or Object Storage.

Full explanation →

702

Multi-Selectmedium

Which TWO actions are required to use a custom fine-tuned model via OCI Generative AI? (Choose two.)

Select 2 answers

A.Deploy the model to an endpoint

B.Provision a private endpoint for the model

C.Enable cross-region replication

D.Grant access to other tenancies

E.Complete the fine-tuning job successfully

AnswersA, E

A deployed endpoint is needed to invoke the model.

Why this answer

Options B and D are required. B: Fine-tuning must be complete. D: Model endpoint must be deployed.

A is optional (private endpoint). C is not needed if within same region. E is not required unless cross-tenant.

Full explanation →

703

Multi-Selecteasy

A developer wants to persist chat history for a LangChain application so that conversations survive application restarts. Which TWO approaches are appropriate?

Select 2 answers

A.Use a Python dictionary as the memory store

B.Store messages in a database and load them on startup

C.Use AgentExecutor with a custom tool for history

D.Save the conversation to a local .txt file

E.Use Redis as a backend for ConversationBufferMemory

AnswersB, E

A database provides durable storage; messages can be reloaded into memory when the app restarts.

Why this answer

Option B is correct because storing messages in a database (e.g., SQLite, PostgreSQL) and loading them on startup provides durable, persistent storage that survives application restarts. This approach decouples conversation history from the application's in-memory state, ensuring data is not lost when the process terminates. LangChain's memory classes like `ConversationBufferMemory` can be initialized with a `chat_memory` parameter backed by a database, enabling seamless restoration of history.

Exam trap

Cisco often tests the distinction between ephemeral in-memory storage (like a Python dict) and durable persistence mechanisms (like databases or Redis with persistence enabled), trapping candidates who assume any file-based or in-memory solution is sufficient for restart survival.

Full explanation →

704

MCQhard

A data scientist is fine-tuning a generative AI model on OCI Data Science using a custom container with GPU resources. The training job fails with an out-of-memory error despite the GPU instance having sufficient memory. The job works fine on a smaller dataset. What is the most likely cause?

A.The training script has a memory leak

B.The GPU instance is not supported by OCI Data Science

C.The model is not compatible with the PyTorch version

D.The batch size is too large for the GPU memory

AnswerD

Large batch size can cause OOM errors; reducing batch size resolves it.

Why this answer

The most likely cause is that the batch size is too large for the GPU memory. Even though the GPU instance has sufficient total memory, a batch size that exceeds the available GPU memory (after accounting for model parameters, gradients, and optimizer states) will trigger an out-of-memory (OOM) error. Reducing the batch size allows the model to fit within the GPU's memory limits, which explains why the job works on a smaller dataset but fails on a larger one.

Exam trap

Oracle often tests the misconception that 'sufficient instance memory' guarantees no OOM errors, ignoring that GPU memory is a separate, finite resource that must accommodate both the model and the batch data simultaneously.

How to eliminate wrong answers

Option A is wrong because a memory leak would cause gradual memory consumption over time, not a consistent OOM error that correlates with dataset size; the error occurs immediately with a larger dataset, not after prolonged execution. Option B is wrong because OCI Data Science supports a wide range of GPU instances (e.g., VM.GPU.A10.1, VM.GPU.A100.1), and if the instance were unsupported, the job would fail with a different error (e.g., 'unsupported instance shape') rather than an OOM error. Option C is wrong because model compatibility with PyTorch version would typically cause import or runtime errors (e.g., 'module not found' or 'operator not implemented'), not an OOM error; PyTorch version mismatches do not directly affect memory allocation.

Full explanation →

705

MCQhard

In the self-attention mechanism, what is the role of the 'scaling factor' (division by sqrt(d_k)) in the softmax computation?

A.To make the attention mechanism translation invariant

B.To prevent the softmax from saturating and producing small gradients

C.To increase the variance of attention scores

D.To ensure the sum of attention weights equals 1

AnswerB

Scaling avoids large values that cause softmax saturation.

Why this answer

Scaling prevents the dot products from growing too large in magnitude, which would push softmax into regions with extremely small gradients.

Full explanation →

706

Multi-Selectmedium

A developer is building a multilingual search application and needs to generate embeddings for user queries in multiple languages. Which two options are correct? (Select TWO)

Select 2 answers

A.Use the embed-english-v3.0 model

B.Set the input type to 'search_document'

C.Use the embed-multilingual-v3.0 model

D.Set the input type to 'search_query'

E.Set the input type to 'classification'

AnswersC, D

This model supports multiple languages for embedding.

Why this answer

OCI Generative AI provides embed-multilingual-v3.0 for multilingual support. The input type 'search_query' is designed for query embeddings. Embed-english-v3.0 is English-only. 'search_document' is for documents, not queries. 'classification' is for classification tasks.

Full explanation →

707

MCQeasy

Which OCI Generative AI model is specifically designed to generate embeddings for English text?

A.Meta Llama 3 8B

B.Cohere embed-multilingual-v3.0

C.Cohere embed-english-v3.0

D.Cohere Command R

AnswerC

This is the dedicated English embedding model.

Why this answer

Cohere embed-english-v3.0 is the embedding model for English text. The other options are either generative models or multilingual embedding models.

Full explanation →

708

MCQeasy

Which of the following best describes the role of positional encoding in the Transformer architecture?

A.To compress the input sequence length

B.To increase the dimensionality of the hidden states

C.To reduce the effect of vanishing gradients

D.To provide information about the order of tokens in the input sequence

AnswerD

Positional encodings inject positional information so the model can use word order.

Why this answer

Positional encoding is essential in the Transformer architecture because the self-attention mechanism processes all tokens in parallel and has no inherent sense of order. By adding sinusoidal or learned positional vectors to the input embeddings, the model gains information about the relative or absolute position of each token in the sequence, enabling it to understand word order and sequence structure.

Exam trap

Cisco often tests the misconception that positional encoding is used to increase model capacity or dimensionality, when in fact it is purely a mechanism to inject sequence order information into a permutation-invariant attention mechanism.

How to eliminate wrong answers

Option A is wrong because positional encoding does not compress the input sequence length; sequence length compression is handled by pooling or stride mechanisms, not by positional encoding. Option B is wrong because positional encoding adds information to the existing embedding dimension but does not increase the dimensionality of the hidden states; it is added element-wise to the input embeddings of the same dimension. Option C is wrong because positional encoding does not address vanishing gradients; vanishing gradients are mitigated by residual connections and layer normalization in the Transformer, not by positional encoding.

Full explanation →

709

MCQmedium

When using LangChain's RetrievalQA chain with `chain_type="stuff"`, what happens if the retrieved documents exceed the model's context window?

A.The chain automatically truncates the documents to fit

B.The chain raises an error because the input is too long

C.The chain uses a sliding window to summarize documents

D.The chain switches to a different model with larger context

AnswerB

Stuff chain will fail due to context length exceeded.

Why this answer

The "stuff" chain type simply concatenates all retrieved documents into the prompt. If the total exceeds the context window, a TokenLimitError or similar error occurs. Other chain types like "map_reduce" or "refine" can handle larger contexts.

Full explanation →

710

MCQeasy

What is the primary purpose of the self-attention mechanism in a Transformer model?

A.To generate token embeddings in parallel

B.To reduce the dimensionality of token embeddings

C.To encode positional information of tokens

D.To compute a weighted sum of all token representations based on pairwise relevance

AnswerD

Self-attention computes attention scores between all pairs and aggregates information.

Why this answer

Self-attention allows each token to attend to every other token in the sequence, capturing contextual relationships regardless of distance.

Full explanation →

711

MCQhard

A developer is implementing a text generation pipeline and wants to produce diverse, creative outputs. They set temperature=1.2, top_k=50, and top_p=1.0. What is the MOST likely effect of this combination?

A.The output will be identical to greedy decoding because top_p=1.0 disables sampling

B.The output will be mostly factual because top_k filters out unlikely tokens

C.The output will be diverse and creative, but may occasionally be incoherent or off-topic

D.The output will be highly deterministic and repetitive

AnswerC

High temperature increases randomness, and the relaxed cutoffs allow less likely tokens, yielding creative but sometimes nonsensical outputs.

Why this answer

Temperature >1 flattens the probability distribution, making low-probability tokens more likely. top_k=50 restricts to top 50 tokens, but top_p=1.0 imposes no cumulative probability cutoff. The combination yields diverse but potentially incoherent outputs.

Full explanation →

712

Multi-Selecteasy

Which TWO of the following sampling strategies introduce randomness into text generation?

Select 2 answers

A.Beam search

B.Greedy decoding

C.Temperature sampling

D.Top-k sampling

E.Top-p (nucleus) sampling

AnswersC, E

Temperature scales the logits before softmax, affecting the randomness of the distribution.

Why this answer

Temperature sampling and top-p (nucleus) sampling both introduce randomness by adjusting the probability distribution. Greedy decoding and beam search are deterministic or near-deterministic. Top-k sampling also introduces randomness but top-p is more dynamic.

Full explanation →

713

MCQhard

During deployment of a generative AI model, the inference endpoint returns high latency and timeouts. The model is deployed on a dedicated AI cluster with multiple nodes. What is the most likely cause?

A.The inference request batch size is too small

B.The model is too large for the cluster memory

C.The cluster nodes are configured with insufficient parallelism or the model is not properly parallelized across nodes

D.The client-side network is slow

AnswerC

Correct: Without proper model parallelism, nodes may be underutilized leading to high per-request latency.

Why this answer

High latency and timeouts in a distributed AI inference deployment typically indicate that the model workload is not efficiently distributed across the cluster nodes. Option C is correct because insufficient parallelism—either due to misconfigured node resources (e.g., insufficient vCPUs, GPU cores, or memory bandwidth) or improper model sharding/parallelization—causes some nodes to become bottlenecks while others remain underutilized, leading to queuing delays and eventual timeouts.

Exam trap

Oracle often tests the misconception that high latency is always due to insufficient resources (e.g., memory or batch size), but the real trap here is that candidates overlook the critical role of parallelization configuration in distributed inference—assuming that simply adding more nodes automatically distributes the workload.

How to eliminate wrong answers

Option A is wrong because a batch size that is too small would actually reduce latency per request (though it might lower throughput), not cause high latency or timeouts; the issue here is overload, not underutilization. Option B is wrong because if the model were too large for the cluster memory, the deployment would fail to load or would crash immediately, not return high latency and timeouts during inference. Option D is wrong because client-side network slowness would manifest as high network round-trip time or packet loss, not as server-side timeouts from the inference endpoint; the problem is explicitly on the deployment side.

Full explanation →

714

MCQeasy

Refer to the exhibit. A user receives this error when using the OCI CLI to chat with a model. What is the most likely cause?

A.The model is not deployed.

B.The model ID is incorrect.

C.The OCI CLI is not configured with the correct region.

D.The user does not have the required IAM policy to invoke the model.

AnswerD

Correct: The 'AuthorizationFailure' error indicates insufficient permissions.

Why this answer

The error occurs because the user lacks the necessary IAM policy to invoke the model. In OCI, even if the model is deployed and the CLI is correctly configured, the IAM policy must grant the user or group the 'inference' permission on the specific model or model family. Without this policy, the OCI CLI returns an authorization error when attempting to chat with the model.

Exam trap

The trap here is that candidates often assume the error is due to a misconfiguration (region or model ID) rather than a missing IAM policy, because the CLI error message may not explicitly say 'authorization' and instead show a generic 'service error'.

How to eliminate wrong answers

Option A is wrong because if the model were not deployed, the error would typically indicate that the model endpoint is unavailable or not found, not an authorization failure. Option B is wrong because an incorrect model ID would result in a 'model not found' or 'invalid parameter' error, not an authorization error. Option C is wrong because an incorrect region configuration would cause connectivity or endpoint resolution errors, such as 'region not found' or 'endpoint unreachable', not an IAM permission error.

Full explanation →

715

MCQmedium

A healthcare company wants to use OCI Generative AI to summarize patient medical records while ensuring PHI compliance. Which OCI service feature should they enable?

A.Configure a Virtual Cloud Network (VCN) with private subnets

B.Deploy a Web Application Firewall (WAF) in front of the API

C.Set up Identity and Access Management (IAM) policies to restrict access

D.Use the data masking capability within OCI Generative AI

AnswerD

OCI Generative AI supports data masking to redact sensitive information like PHI.

Why this answer

Option D is correct because OCI Generative AI includes a built-in data masking capability specifically designed to automatically detect and redact sensitive information such as Protected Health Information (PHI) from input prompts and generated outputs. This feature ensures compliance with healthcare regulations like HIPAA by preventing PHI from being stored, logged, or exposed during inference, without requiring external preprocessing or post-processing.

Exam trap

Cisco often tests the misconception that network security (VCN) or access control (IAM) alone can satisfy PHI compliance, when in fact content-level data masking is required to prevent sensitive data from being processed or stored by the AI model.

How to eliminate wrong answers

Option A is wrong because configuring a VCN with private subnets controls network-level isolation and traffic routing, but it does not inspect or mask PHI within the content processed by the Generative AI service; it only secures the network path. Option B is wrong because a Web Application Firewall (WAF) protects against web-based attacks like SQL injection or XSS, but it lacks the ability to detect or redact sensitive data patterns such as medical record numbers or patient names within API payloads. Option C is wrong because IAM policies govern user and resource permissions (who can call the API), but they do not perform content-level data masking or PHI redaction within the AI model's inputs or outputs.

Full explanation →

716

MCQhard

Your company uses OCI Data Science for model development and deployment. You have a generative AI model that requires dynamic batching for efficient inference. You deployed the model using the OCI Model Deployment service with a custom inference script in a Docker container. However, you notice that the batch size is fixed at 1, leading to low throughput. The model can process multiple requests together efficiently. You want to implement dynamic batching to increase throughput without significantly increasing latency for individual requests. What is the best approach?

A.Modify the model deployment to use a larger GPU shape to handle larger batches

B.Enable the model deployment's built-in request batching feature

C.Use OCI Streaming service to buffer requests and then invoke the model in batches from a consumer

D.Implement a queuing mechanism in the inference script that collects incoming requests and processes them in batches

AnswerD

This is a common pattern for dynamic batching and can be done within the custom container.

Why this answer

Option D is correct because dynamic batching must be implemented at the application level within the custom inference script when using OCI Model Deployment. The service does not provide built-in request batching; instead, you need to collect incoming requests in a queue and process them together in a single forward pass, which maximizes GPU utilization while controlling latency via a timeout or max batch size.

Exam trap

The trap here is that candidates assume OCI Model Deployment has a built-in batching feature similar to some cloud ML services, but OCI requires you to implement batching logic yourself in the custom inference script.

How to eliminate wrong answers

Option A is wrong because simply using a larger GPU shape does not change the fact that the inference script processes one request at a time; throughput gains require batching logic, not just more compute. Option B is wrong because OCI Model Deployment does not have a built-in request batching feature; this is a common misconception—the service routes each request individually to the container. Option C is wrong because OCI Streaming is designed for asynchronous, durable message buffering and would introduce significant latency and complexity; it is not suitable for real-time inference where low latency is critical.

Full explanation →

717

MCQhard

An AI assistant needs to solve complex math word problems step by step. Which prompting technique is most suitable?

A.Chain-of-thought prompting with few-shot examples.

B.Zero-shot prompting with the problem only.

C.Prompting with a high temperature setting.

D.Using a model with a larger context window.

AnswerA

Correct: CoT with examples guides reasoning.

Why this answer

Chain-of-thought prompting with few-shot examples is most suitable because it guides the LLM to break down complex math word problems into intermediate reasoning steps, mimicking human problem-solving. Few-shot examples provide a template for the desired reasoning structure, which significantly improves accuracy on multi-step arithmetic tasks compared to direct answer generation.

Exam trap

Oracle often tests the misconception that simply increasing model capacity (context window) or randomness (temperature) can substitute for structured reasoning, when in fact the prompting strategy itself is the critical factor for multi-step tasks.

How to eliminate wrong answers

Option B is wrong because zero-shot prompting lacks the explicit reasoning structure needed for multi-step math problems, often leading to incorrect or incomplete answers. Option C is wrong because a high temperature setting increases randomness in token selection, which is counterproductive for deterministic math tasks requiring precise calculations. Option D is wrong because a larger context window does not inherently improve reasoning quality; it only allows more input tokens, but without structured prompting the model may still fail to perform step-by-step logic.

Full explanation →

718

MCQhard

An ML engineer is selecting a pre-trained model for a code generation task. The model must be able to generate syntactically correct code in multiple programming languages. Which model family is BEST suited for this task?

A.Meta Llama (Code Llama variant)

B.BERT

C.Cohere Command

D.Mistral

AnswerA

Code Llama is a variant of Llama fine-tuned on code, making it well-suited for code generation across languages.

Why this answer

Models like Code Llama (a variant of Llama) are specifically fine-tuned on code and are known for strong code generation capabilities. While other models can generate code, Code Llama is the best fit among the options.

Full explanation →

719

Multi-Selectmedium

A data scientist is designing a RAG system using OCI Data Science and OCI Generative AI. Which two considerations are critical for optimal retrieval quality? (Choose 2.)

Select 2 answers

A.Use the same embedding model for both indexing and querying.

B.Increase the chunk size to the maximum allowed to capture more context.

C.Fine-tune the generation model on domain-specific data.

D.Apply metadata filtering to restrict search domain before vector search.

E.Use a hierarchical index structure for faster search.

AnswersA, D

Mismatched embeddings reduce similarity accuracy.

Why this answer

Option A is correct because using the same embedding model for both indexing and querying ensures that the vector representations of documents and queries lie in the same semantic space, which is critical for accurate cosine similarity comparisons. If different models are used, the embeddings would be misaligned, leading to poor retrieval quality even if the models are similar in architecture.

Exam trap

Cisco often tests the distinction between retrieval quality and generation quality, so candidates mistakenly choose fine-tuning (Option C) or performance optimizations (Option E) when the question explicitly asks about retrieval quality.

Full explanation →

720

MCQhard

A company is building a customer support chatbot that uses Retrieval-Augmented Generation (RAG) with OCI Generative AI. They need low-latency responses and the ability to update the knowledge base daily. Which architecture best meets these requirements?

A.Store embeddings in OCI Object Storage and use OCI Functions to perform similarity search.

B.Use OCI Data Science Notebook Sessions to run the RAG pipeline with a managed Cohere model.

C.Use OCI Streaming to ingest documents and OCI Data Flow to update a knowledge base in OCI Object Storage.

D.Use OCI Search with OpenSearch for the vector database, OCI Generative AI for inference, and Oracle Database for metadata.

AnswerD

OpenSearch provides low-latency vector search and supports daily indexing updates.

Why this answer

Option D is correct because it combines OCI Search with OpenSearch as a vector database for efficient similarity search, OCI Generative AI for inference, and Oracle Database for metadata management. This architecture provides low-latency responses by leveraging OpenSearch's optimized vector indexing and allows daily knowledge base updates through Oracle Database's robust data management capabilities.

Exam trap

Cisco often tests the misconception that any storage service (like Object Storage) can serve as a vector database, but candidates must recognize that low-latency similarity search requires a purpose-built vector database like OpenSearch.

How to eliminate wrong answers

Option A is wrong because OCI Object Storage is a blob store, not a vector database; using OCI Functions for similarity search would be slow and unscalable due to lack of optimized indexing. Option B is wrong because OCI Data Science Notebook Sessions are designed for development and experimentation, not production-grade low-latency inference, and they lack a vector database for efficient retrieval. Option C is wrong because OCI Streaming is for real-time data ingestion, not for updating a knowledge base, and OCI Data Flow is a batch processing service that does not provide the low-latency query capability required for RAG.

Full explanation →

721

MCQeasy

A developer is building a RAG application using OCI Generative AI. They notice that the generated responses often contain outdated information even though the knowledge base is updated daily. What is the most likely cause?

A.The embedding model is not fine-tuned on the latest data.

B.The vector database index is not rebuilt after data updates.

C.The retrieval top-k is set too high.

D.The chunk size is too small, causing loss of context.

AnswerB

If the index is not refreshed, new data is not searchable, leading to outdated results.

Why this answer

Option B is correct because in a RAG pipeline, the vector database index is a static snapshot of the embedded knowledge base. When the knowledge base is updated daily, the index must be rebuilt or incrementally updated to reflect the new data. Without rebuilding, the retrieval step will still search the old index, returning outdated chunks and causing the LLM to generate stale responses.

Exam trap

Cisco often tests the misconception that embedding model fine-tuning or chunk size adjustments are the primary cause of outdated responses, when in fact the root cause is the failure to rebuild or update the vector index after data changes.

How to eliminate wrong answers

Option A is wrong because fine-tuning the embedding model on the latest data is not required for RAG; embeddings are typically generated by a pre-trained model and the retrieval quality depends on the index reflecting the current data, not on model fine-tuning. Option C is wrong because setting top-k too high would retrieve more chunks, potentially including irrelevant ones, but it would not cause the responses to contain outdated information—the retrieved chunks would still come from the old index. Option D is wrong because a small chunk size may cause loss of context, leading to incomplete or fragmented answers, but it does not directly cause the use of outdated information; the core issue is the index not being refreshed.

Full explanation →

722

MCQmedium

A developer wants to deploy a RAG application using OCI Generative AI for both embedding and text generation while minimizing costs. Which strategy is most effective?

A.Use a larger generation model

B.Cache frequent queries and their embeddings

C.Reduce chunk size to decrease embedding calls

D.Use a larger embedding model for better accuracy

AnswerB

Caching reduces redundant embedding API calls, lowering costs.

Why this answer

Caching embeddings for frequent queries eliminates repeated embedding API calls, directly reducing cost.

Full explanation →

723

MCQhard

A team is fine-tuning a Llama 3 model using OCI Generative AI. The training dataset contains 10,000 prompt-completion pairs in JSONL format. After submitting the fine-tuning job, it fails with a 'Data validation error'. What is the most likely cause?

A.The base model selected does not support fine-tuning

B.The dataset is in the wrong compartment

C.The fine-tuning job does not have enough model units allocated

D.The JSONL file uses 'input' and 'output' as key names instead of 'prompt' and 'completion'

AnswerD

OCI Generative AI expects exact key names 'prompt' and 'completion' in the JSONL training file; using alternate keys causes a data validation error.

Why this answer

OCI fine-tuning requires a specific JSONL format with 'prompt' and 'completion' keys. If the keys are named differently (e.g., 'input'/'output'), validation fails. Model units, compartment, and base model are not typically validation issues.

Full explanation →

724

MCQmedium

A developer is using Cohere Command to answer questions grounded in internal technical manuals. They want to ensure the model only answers based on the provided documents and does not use its pre-trained knowledge. Which Cohere-specific technique should be applied?

A.Fine-tune the model on the technical manuals

B.Set temperature to 0 in the generation parameters

C.Use the document-grounded generation syntax by providing documents in the chat history with explicit citation instructions

D.Use the preamble to instruct the model to answer only from the documents

AnswerC

Cohere supports document-grounded generation where you supply documents and specify they are the only source.

Why this answer

Cohere's document-grounded generation syntax allows you to supply search results or documents and instruct the model to answer solely from those documents, reducing hallucination.

Full explanation →

725

MCQeasy

Which of the following is NOT an available model in OCI Generative AI service?

A.Cohere Embed English v3.0

B.OpenAI GPT-4

C.Meta Llama 3

D.Cohere Command R+

AnswerB

GPT-4 is not part of OCI Generative AI service.

Why this answer

OCI GenAI offers Cohere Command R, Command R+, Meta Llama 3, Cohere Embed models, and Cohere Rerank. GPT-4 is an OpenAI model and is not available in OCI GenAI.

Full explanation →

726

MCQmedium

A developer sends this request but receives an error: "modelId not found". Which is the most likely cause?

A.The temperature parameter is out of range.

B.The compartment ID is incorrect.

C.The modelId "cohere.command" is deprecated.

D.The presencePenalty parameter is misspelled.

AnswerC

Deprecated model IDs return 'not found' errors; the correct ID should be used.

Why this answer

The error 'modelId not found' indicates that the model identifier specified in the request does not exist or is no longer available. In Oracle Cloud Infrastructure (OCI) Generative AI service, model IDs like 'cohere.command' are deprecated and replaced with newer versions (e.g., 'cohere.command-r'). Using a deprecated modelId causes the service to reject the request because it cannot locate a valid model for that identifier.

Exam trap

Cisco often tests the distinction between model-level errors (like 'modelId not found') and parameter-level errors (like out-of-range or misspelled parameters), tempting candidates to confuse a deprecated model ID with a simple configuration mistake.

How to eliminate wrong answers

Option A is wrong because the temperature parameter being out of range would produce a validation error (e.g., 'Invalid parameter: temperature'), not a 'modelId not found' error. Option B is wrong because an incorrect compartment ID would result in an authorization or permission error (e.g., 'Not authorized' or 'Compartment not found'), not a model lookup failure. Option D is wrong because a misspelled parameter like 'presencePenalty' would cause a parameter validation error (e.g., 'Unrecognized parameter'), not a model ID resolution issue.

Full explanation →

727

MCQhard

A large organization is deploying a multi-tenant RAG application on OCI, where each tenant has its own set of documents. They use a shared OCI OpenSearch cluster with tenant_id metadata to filter documents. They observe that occasionally, queries from one tenant return results from another tenant's documents. The security team requires strict isolation. They have verified that the metadata filter is correctly applied in the search request. What is the most likely root cause?

A.The OpenSearch index has not been refreshed after ingestion of new documents.

B.The tenant_id field is not indexed as a keyword, causing incorrect filtering.

C.The embedding model has been trained on data from multiple tenants, causing cross-tenant leakage.

D.The metadata filter is being applied after the vector search instead of before.

AnswerB

If the field is not indexed properly, the filter may not match correctly, returning results from other tenants.

Why this answer

Option D is correct because if the tenant_id field is not indexed as a keyword, the filter may not be applied correctly, leading to cross-tenant results. Option A (index not refreshed) affects availability, not isolation. Option B (order of filter) does not matter.

Option C (embedding model training) does not cause retrieval leakage.

Full explanation →

728

MCQmedium

A company has deployed a generative AI model endpoint on OCI. They want to monitor token usage and latency for cost optimization. Which OCI service should they use to collect these metrics?

A.OCI Monitoring

B.OCI Events

C.OCI Notifications

D.OCI Logging

AnswerA

OCI Monitoring collects and visualizes metrics such as token count and latency.

Why this answer

A is correct because OCI Monitoring is the native telemetry service that collects and stores metrics such as token usage (e.g., input/output token counts) and latency (e.g., model inference latency) from OCI Generative AI endpoints. These metrics are automatically emitted by the OCI Generative AI service and can be queried via the Monitoring API or visualized in the Console, enabling cost optimization by tracking consumption patterns.

Exam trap

The trap here is that candidates confuse OCI Logging (which collects unstructured logs) with OCI Monitoring (which collects structured metrics), leading them to select Logging for numeric performance data like token counts and latency.

How to eliminate wrong answers

Option B (OCI Events) is wrong because OCI Events is a notification service that triggers actions based on changes in OCI resources (e.g., state transitions), not a service for collecting time-series metrics like token usage or latency. Option C (OCI Notifications) is wrong because OCI Notifications is a pub/sub messaging service for distributing alerts and messages, not a metric collection or storage service. Option D (OCI Logging) is wrong because OCI Logging captures log data (e.g., text-based audit logs, error logs) from resources, not structured numeric metrics; metrics require OCI Monitoring's custom or predefined metric streams.

Full explanation →

729

Multi-Selectmedium

A developer is using LangChain to build a RAG pipeline that processes PDF documents. They need to split the text into chunks and store embeddings in Oracle Database 23ai. Which two components are essential? (Choose TWO.)

Select 2 answers

A.WebBaseLoader

B.RecursiveCharacterTextSplitter

C.OCIGenAIEmbeddings

D.OracleVS

E.OCIGenAI

AnswersB, D

This splitter is commonly used to break PDF text into manageable chunks for embedding.

Why this answer

RecursiveCharacterTextSplitter is used to split documents into chunks, and OracleVS is the LangChain wrapper for Oracle AI Vector Search to store embeddings and perform similarity search. The other options are either not essential (WebBaseLoader loads web pages, not PDFs) or not for storage (OCIGenAI is an LLM wrapper).

Full explanation →

730

MCQhard

During fine-tuning, a user notices the loss does not decrease after several epochs. The dataset is a JSONL file with 500 prompt/completion pairs. What is the MOST likely cause?

A.The JSONL format is incorrect because it lacks system prompts

B.The base model is not compatible with the T-Few technique

C.The dataset is too small; T-Few fine-tuning generally needs at least 1000 examples

D.The learning rate is too high, causing the model to diverge

AnswerC

T-Few is efficient but still requires a minimum dataset size to learn effectively.

Why this answer

Fine-tuning with T-Few typically requires at least 1000 examples for meaningful learning. The dataset size is likely insufficient.

Full explanation →

731

MCQhard

A multinational corporation uses OCI Generative AI to power a customer support chatbot. The chatbot uses a fine-tuned model deployed on a dedicated AI cluster in the us-ashburn-1 region. The application is used globally, and users in Europe are experiencing high latency (over 2 seconds) compared to users in North America (under 500 ms). The company has a requirement to keep all data within the US due to compliance, so they cannot deploy in Europe. The latency is not due to network bandwidth but due to the inference time. The monitoring shows that the cluster is at 80% utilization during peak hours. The team wants to reduce the latency for European users without violating data residency. What is the best course of action?

A.Optimize the model using techniques like quantization or pruning to reduce inference time.

B.Implement an edge caching layer in Europe to serve common queries.

C.Increase the number of nodes in the cluster to distribute the load.

D.Deploy an additional endpoint in a European region and use a global load balancer.

AnswerA

Model optimization directly reduces per-request latency without moving data.

Why this answer

Option A is correct because the latency issue is explicitly due to inference time, not network bandwidth or cluster utilization. Model optimization techniques like quantization (reducing precision of weights from FP32 to INT8) and pruning (removing redundant neurons) directly reduce the computational cost per inference, thereby lowering the response time without moving data or changing the deployment region. This approach satisfies the data residency constraint while addressing the root cause of high latency for European users.

Exam trap

The trap here is that candidates may confuse latency caused by inference time with latency caused by network distance or cluster load, leading them to choose scaling or caching solutions that do not address the fundamental computational bottleneck.

How to eliminate wrong answers

Option B is wrong because an edge caching layer in Europe would only serve cached responses for common queries; it does not reduce inference time for unique or dynamic queries, and caching introduces stale data risks for a customer support chatbot that may require real-time accuracy. Option C is wrong because increasing the number of nodes in the cluster addresses throughput (handling more concurrent requests) but does not reduce the per-request inference time; with 80% utilization, the cluster is not saturated, so adding nodes would not lower latency for individual inference calls. Option D is wrong because deploying an additional endpoint in a European region would violate the compliance requirement to keep all data within the US; even with a global load balancer, inference would still require data processing in Europe, which is not permitted.

Full explanation →

732

MCQhard

A data scientist is fine-tuning a model on OCI Generative AI with a custom dataset. They receive a "QuotaExceeded" error during training. What is the most likely cause?

A.Exceeded the training compute unit quota

B.Exceeded the API call rate limit

C.Exceeded the model storage limit

D.Exceeded the data transfer out limit

AnswerA

Fine-tuning uses training compute units; quota may be exceeded.

Why this answer

The 'QuotaExceeded' error during fine-tuning on OCI Generative AI specifically indicates that the training job has consumed more compute units than allocated in the service limit. Fine-tuning requires dedicated training compute units (TCUs) which are a separate quota from inference or API calls. When this quota is exhausted, the service rejects new training jobs with this error.

Exam trap

Cisco often tests the distinction between different types of quotas (compute vs. API rate vs. storage vs. egress) to see if candidates understand that 'QuotaExceeded' in the context of training specifically refers to compute resource limits, not API throttling or storage caps.

How to eliminate wrong answers

Option B is wrong because API call rate limits apply to inference requests (e.g., generating text), not to training compute resources; exceeding them would return a '429 Too Many Requests' error, not 'QuotaExceeded'. Option C is wrong because model storage limits apply to the number or size of models you can store in the OCI Generative AI model catalog, not to the compute resources used during training. Option D is wrong because data transfer out limits are related to egress traffic from OCI to the internet, not to internal training operations within the service.

Full explanation →

733

MCQmedium

Which prompting technique involves generating multiple independent reasoning paths and then selecting the most common answer?

A.Chain-of-thought prompting

B.Few-shot prompting

C.Self-consistency prompting

D.Zero-shot prompting

AnswerC

Self-consistency generates multiple reasoning chains and aggregates the results to increase robustness.

Why this answer

Self-consistency runs chain-of-thought multiple times and aggregates answers (e.g., by majority vote) to improve reliability. The other options are different techniques.

Full explanation →

734

MCQmedium

Refer to the exhibit. A developer runs this command and sees that the 'cohere.embed-english-v3.0' model is INACTIVE. What is the most likely cause?

A.The model is not supported in the current region.

B.The API call lacks the required OCI policy for the model.

C.The model has been deprecated and is no longer available.

D.The compartment does not have access to the model.

AnswerC

An INACTIVE state indicates the model has been deprecated or retired, making it unavailable for new inference requests.

Why this answer

The 'cohere.embed-english-v3.0' model is listed as INACTIVE because Oracle Cloud Infrastructure (OCI) has deprecated it, meaning it is no longer available for inference. When a model is deprecated, its status changes to INACTIVE, and any attempt to invoke it will fail, even if the region, policies, and compartment permissions are correctly configured.

Exam trap

Oracle often tests the distinction between model lifecycle states (INACTIVE vs. ACTIVE) and common operational errors (policy, region, compartment), leading candidates to confuse a deprecation event with a configuration or permission issue.

How to eliminate wrong answers

Option A is wrong because if the model were unsupported in the current region, the command would typically return a 'not found' or 'unsupported' error, not an INACTIVE status. Option B is wrong because a missing OCI policy would result in a 403 Forbidden or authorization error, not an INACTIVE model status. Option D is wrong because compartment access issues would produce a permissions error, not an INACTIVE status; the model's availability is independent of compartment-level access.

Full explanation →

735

Multi-Selectmedium

Which TWO of the following are benefits of using OCI Generative AI service compared to self-hosting an LLM?

Select 2 answers

A.Lower latency always

B.No data egress costs

C.Built-in content safety filters

D.Automatic scaling

E.Full control over model weights

AnswersC, D

OCI Generative AI includes safety filters.

Why this answer

Option C is correct because OCI Generative AI service includes built-in content safety filters that automatically detect and block harmful or inappropriate content (e.g., hate speech, violence, sexual content) without requiring manual configuration. This is a key advantage over self-hosting, where you must implement and maintain your own content moderation pipeline.

Exam trap

The trap here is that candidates often assume a managed service always provides lower latency or eliminates all data transfer costs, but in reality, self-hosting can be optimized for latency and egress costs depend on network architecture, not just the service model.

Full explanation →

736

MCQhard

A researcher is evaluating two LLMs for a summarization task. Model A achieves a ROUGE-L score of 0.45 and a BERTScore of 0.92. Model B achieves a ROUGE-L score of 0.50 and a BERTScore of 0.88. Which model is likely better for producing summaries that are semantically faithful to the source, even if not using the exact same words?

A.Neither model is acceptable because ROUGE-L is below 0.6

B.Both are equally good because the scores are close

C.Model B because ROUGE-L is higher

D.Model A because BERTScore is higher

AnswerD

Higher BERTScore suggests better semantic alignment with the source, which is more important for faithfulness.

Why this answer

BERTScore measures semantic similarity using contextual embeddings, while ROUGE-L measures n-gram overlap. Higher BERTScore indicates better semantic faithfulness even without exact phrase matches.

Full explanation →

737

MCQhard

In Oracle AI Vector Search, which index type is designed for approximate nearest neighbor search and employs a hierarchical navigable small world graph, offering high recall and fast search speeds for high-dimensional data?

A.BTREE

B.IVF

C.HNSW

D.VECTOR

AnswerC

HNSW uses a multi-layer graph structure for fast approximate nearest neighbor search with high recall.

Why this answer

HNSW (Hierarchical Navigable Small World) is a graph-based index that provides efficient approximate nearest neighbor search. IVF (Inverted File) uses clustering, and BTREE is for scalar data. VECTOR is a data type, not an index.

Full explanation →

738

MCQhard

A security team requires that all OCI GenAI API calls be logged and audited. Despite enabling Audit logs in OCI, they do not see GenAI API calls. What is the most likely reason?

A.The audit log retention policy is too short and logs were overwritten.

B.The user is not a tenancy administrator.

C.OCI Audit currently only records control-plane operations; data-plane operations like inference are not logged.

D.The API calls are made by an OCI function, which is not logged.

AnswerC

Data-plane calls (e.g., model inference) are not captured by Audit; use Service Connector Hub for logging.

Why this answer

C is correct because OCI Audit service is designed to log control-plane operations (e.g., creating, updating, or deleting resources) but does not log data-plane operations such as inference API calls to the Generative AI service. The GenAI inference calls (e.g., generating text) are data-plane operations that occur on the service endpoint, not on the OCI control-plane API, so they are not captured by Audit logs. To log data-plane operations, you would need to use a different mechanism, such as OCI Vault for key usage or custom logging via API Gateway.

Exam trap

The trap here is that candidates assume enabling Audit logs captures all API activity, but OCI Audit explicitly excludes data-plane operations, which is a common misconception tested in the 1Z0-1127 exam.

How to eliminate wrong answers

Option A is wrong because audit log retention policies affect how long logs are kept, not whether specific API calls are recorded in the first place; if the calls were never logged, retention is irrelevant. Option B is wrong because tenancy administrator privileges are not required to view Audit logs; any user with the appropriate IAM policies (e.g., Audit Log Readers) can access them, and the issue is about logging scope, not permissions. Option D is wrong because OCI Functions calls are logged if they are control-plane operations; the fact that an API call originates from a function does not exclude it from Audit logging—the exclusion is based on whether the call is control-plane or data-plane.

Full explanation →

739

Multi-Selecthard

An OCI user is troubleshooting a prompt that sometimes produces outputs containing offensive language. The prompt uses a system prompt to set a professional tone. Which THREE steps should the user take to mitigate this issue? (Select three.)

Select 3 answers

A.Apply a frequency penalty to discourage repetition of offensive phrases

B.Increase temperature to 0.9 to dilute the offending outputs

C.Test the prompt with diverse inputs including adversarial examples

D.Add a constraint in the system prompt: 'Do not use offensive or inappropriate language.'

E.Remove the system prompt to avoid overriding model's safety training

AnswersA, C, D

Frequency penalty reduces token repetition, which can help if offensive language appears repeatedly.

Why this answer

Adding explicit constraints in the system prompt, setting frequency/presence penalties to reduce undesirable patterns, and testing with adversarial inputs are effective safeguards. Using high temperature or removing the system prompt would worsen the problem.

Full explanation →

740

Multi-Selectmedium

Which TWO of the following are required to fine-tune a model using OCI Generative AI Service?

Select 2 answers

A.A training dataset in the required format

B.The base model identifier

C.A compartment with sufficient quota

D.An OCI API key

E.A dedicated AI cluster

AnswersA, B

Training data is essential for fine-tuning.

Why this answer

A is correct because fine-tuning a model in OCI Generative AI Service requires a training dataset in the required format (JSONL with prompt-completion pairs) to provide the task-specific examples that adjust the model's weights. B is correct because you must specify the base model identifier (e.g., 'cohere.command-light-14-07-2024') to indicate which pre-trained model to fine-tune, as the service uses this to load the correct architecture and initial parameters.

Exam trap

Oracle often tests the misconception that you need a dedicated AI cluster or an API key for every operation, but OCI Generative AI Service abstracts infrastructure management and supports multiple authentication methods, making those options distractors.

Full explanation →

741

Multi-Selectmedium

A developer is building a RAG application with LangChain and Oracle AI Vector Search. They need to choose a text splitter that respects semantic boundaries (e.g., paragraphs, sentences) and a vector store that supports transactional updates in Oracle Database 23ai. Which TWO options should they select?

Select 2 answers

A.OracleVS

B.FAISS

C.CharacterTextSplitter

D.TokenTextSplitter

E.RecursiveCharacterTextSplitter

AnswersA, E

OracleVS wraps Oracle AI Vector Search, providing transactional vector storage in Oracle Database.

Why this answer

OracleVS (Option A) is correct because it is the LangChain vector store integration specifically designed for Oracle Database 23ai, supporting transactional updates (INSERT, UPDATE, DELETE) via the Oracle AI Vector Search feature. This allows the RAG application to maintain consistency with ACID properties when modifying vector embeddings.

Exam trap

Cisco often tests the distinction between text splitters that respect semantic boundaries (RecursiveCharacterTextSplitter) versus those that split purely by character or token count, and between vector stores with transactional database support (OracleVS) versus in-memory or non-transactional stores (FAISS).

Full explanation →

742

MCQhard

A data scientist is using the OCI Generative AI SDK to create embeddings for a large corpus of legal documents. They want to perform semantic search. Which endpoint should they use?

A./v1/classify

B./v1/embed

C./v1/generate

D./v1/chat

AnswerB

The /v1/embed endpoint returns embeddings that can be stored in a vector database and used for semantic search.

Why this answer

The /v1/embed endpoint is specifically designed to generate vector embeddings from input text, which are numerical representations that capture semantic meaning. For semantic search over a large corpus of legal documents, embeddings must be created to enable similarity comparisons, making this the correct choice.

Exam trap

Oracle often tests the distinction between embedding endpoints and generation/classification endpoints, trapping candidates who confuse the purpose of semantic search (which requires embeddings) with text generation or classification tasks.

How to eliminate wrong answers

Option A is wrong because /v1/classify is used for text classification tasks (e.g., sentiment analysis or topic labeling), not for generating embeddings. Option C is wrong because /v1/generate is for text generation (e.g., completing a prompt or producing new content), not for creating vector representations. Option D is wrong because /v1/chat is designed for conversational interactions with a chat model, not for producing embeddings for semantic search.

Full explanation →

743

MCQhard

A company is using Oracle Database 23ai AI Vector Search for their RAG pipeline. They notice that similarity search often returns chunks that are semantically unrelated but syntactically similar due to token overlap. Which vector index type should they consider to improve semantic relevance?

A.IVF_SQ8 index

B.IVF_FLAT index

C.HNSW index

D.Use the default index type, which is IVF_FLAT

AnswerC

HNSW builds a hierarchical graph that captures semantic neighborhood better, reducing token overlap effects.

Why this answer

HNSW (Hierarchical Navigable Small World) indexes are designed to prioritize semantic similarity over exact token overlap by constructing a multi-layer graph that navigates based on vector distance in the embedding space. Unlike IVF-based indexes, which rely on coarse quantization and can be misled by high-frequency token patterns, HNSW's graph-based search inherently captures the global structure of the embedding space, making it more robust to syntactically similar but semantically unrelated chunks.

Exam trap

Cisco often tests the misconception that IVF indexes are always faster or more memory-efficient, leading candidates to overlook that HNSW provides superior semantic relevance in high-dimensional spaces where token overlap is a problem.

How to eliminate wrong answers

Option A is wrong because IVF_SQ8 (Inverted File with Scalar Quantization 8-bit) reduces precision of stored vectors to save memory, which can further degrade semantic relevance when token overlap is already misleading. Option B is wrong because IVF_FLAT (Inverted File with Flat centroids) partitions the vector space into Voronoi cells based on coarse centroids, and searches only the nearest cells; this can still return syntactically similar results if the centroids are dominated by high-frequency tokens. Option D is wrong because the default index type in Oracle Database 23ai AI Vector Search is actually HNSW, not IVF_FLAT, and using the default would not address the stated problem of token overlap.

Full explanation →

744

MCQeasy

A developer is using the OCI Generative AI API to generate text. The responses are often too short and incomplete. Which parameter adjustment is most likely to produce longer, more complete responses?

A.Decrease the max_tokens parameter.

B.Increase the max_tokens parameter.

C.Increase the top_p parameter.

D.Decrease the frequency_penalty parameter.

AnswerB

Increasing max_tokens gives the model more room to generate a complete response, directly addressing the issue of short outputs.

Why this answer

The max_tokens parameter controls the maximum number of tokens (words or subwords) the model can generate in a single response. By increasing max_tokens, the model is allowed to produce longer sequences, which directly addresses the issue of responses being too short and incomplete. In the OCI Generative AI API, this is the primary parameter for capping output length.

Exam trap

Oracle often tests the distinction between parameters that control output length (max_tokens) versus those that control output diversity or repetition (top_p, frequency_penalty), leading candidates to confuse 'more complete' with 'more creative' or 'less repetitive'.

How to eliminate wrong answers

Option A is wrong because decreasing max_tokens would further restrict the output length, making responses even shorter and more incomplete. Option C is wrong because increasing top_p adjusts nucleus sampling (the cumulative probability threshold for token selection) to control randomness and diversity, not the length of the output. Option D is wrong because decreasing frequency_penalty reduces the penalty for repeating tokens, which may increase repetition but does not directly extend the overall length or completeness of the response.

Full explanation →

745

MCQmedium

Refer to the exhibit. A user in group GenAIGroup cannot see models in the Production compartment using OCI Generative AI. What is the most likely issue?

A.Statement should be 'use' instead of 'read'

B.Missing 'inspect' permission

C.Policy syntax incorrect (missing quotes)

D.Resource type should be 'generative-ai-model'

AnswerD

The correct resource type is 'generative-ai-model' for OCI Generative AI models.

Why this answer

Option D is correct because the resource type in the policy statement must match the exact OCI resource type for Generative AI models, which is 'generative-ai-model'. Using a generic or incorrect resource type (e.g., 'generative-ai' or 'model') will cause the policy to not apply, preventing the user from seeing models even if they have the appropriate permissions.

Exam trap

Cisco often tests the exact resource type naming convention for OCI services, where candidates mistakenly use a generic or intuitive name (like 'model' or 'generative-ai') instead of the precise API resource type 'generative-ai-model'.

How to eliminate wrong answers

Option A is wrong because 'use' is not a valid verb in OCI IAM policies; the correct verb for viewing resources is 'read', and the issue is not about the verb but the resource type. Option B is wrong because 'inspect' permission is not required to see models; 'read' permission on the resource type is sufficient, and the problem is the resource type mismatch. Option C is wrong because the policy syntax does not require quotes around resource types; the syntax is correct without quotes, and the issue is the resource type name itself.

Full explanation →

746

Multi-Selectmedium

Which THREE are valid considerations when designing a RAG pipeline that uses OCI Generative AI and OCI OpenSearch? (Choose three.)

Select 3 answers

A.OCI OpenSearch only supports Euclidean distance for vector similarity.

B.Each document must be converted to a single vector for efficient retrieval.

C.The quality of the text extraction from OCI Document Understanding directly impacts retrieval accuracy.

D.The generation model's context window size limits the number of chunks that can be included in the prompt.

E.The chunk size and overlap must be tuned based on the document type and query patterns.

AnswersC, D, E

Poor extraction leads to noisy embeddings and irrelevant results.

Why this answer

Option C is correct because OCI Document Understanding performs text extraction from documents (e.g., PDFs, images). If the extraction is poor (e.g., missing text, OCR errors), the resulting chunks will be inaccurate, directly degrading the quality of vector embeddings and thus retrieval accuracy in the RAG pipeline.

Exam trap

Oracle often tests the misconception that vector databases only support one similarity metric (like Euclidean) or that documents must be stored as single vectors, when in practice they support multiple metrics and chunking is essential for effective retrieval.

Full explanation →

747

MCQhard

A RAG system returns irrelevant chunks even though the embedding model and vector index are correctly configured. After reviewing, the chunks are too large and contain extraneous information. Which combination of adjustments should be made to improve relevance?

A.Increase chunk overlap only.

B.Decrease chunk size and increase chunk overlap.

C.Use semantic chunking and adjust topK.

D.Reduce chunk size, increase overlap, and adjust topK.

AnswerD

All three adjustments can help refine the retrieved context.

Why this answer

Option D is correct because reducing chunk size removes extraneous information, increasing overlap ensures context continuity across smaller chunks, and adjusting topK limits the number of retrieved chunks to the most relevant ones. This combination directly addresses the problem of large chunks containing irrelevant data while maintaining retrieval precision.

Exam trap

Cisco often tests the misconception that only one parameter (like chunk size or topK) needs adjustment, when in reality a combination of chunk size, overlap, and topK tuning is required to address both chunk granularity and retrieval count.

How to eliminate wrong answers

Option A is wrong because increasing chunk overlap alone does not reduce chunk size or remove extraneous information, so irrelevant content persists. Option B is wrong because while decreasing chunk size and increasing overlap helps, it fails to adjust topK, which may still return too many chunks and dilute relevance. Option C is wrong because semantic chunking improves chunk boundaries but does not guarantee smaller chunks or control the number of retrieved chunks; adjusting topK alone without reducing chunk size still allows large chunks with extraneous data.

Full explanation →

748

MCQhard

A financial company deploys a generative AI model for document analysis. They need to ensure that the model does not expose sensitive information in its responses. Which OCI service should they use to implement content filtering?

A.OCI Data Safe

B.OCI Vault

C.OCI WAF

D.OCI AI Content Moderation

AnswerD

This service can filter sensitive content in model inputs and outputs.

Why this answer

OCI AI Content Moderation is the correct service because it provides pre-trained models and APIs specifically designed to detect and filter sensitive content such as personally identifiable information (PII), profanity, and other unsafe text in generative AI outputs. This allows the financial company to enforce content safety policies on document analysis responses, preventing exposure of sensitive information.

Exam trap

The trap here is that candidates often confuse security services like Data Safe or Vault with content moderation, assuming any 'security' service can filter AI outputs, but OCI AI Content Moderation is the only service purpose-built for analyzing and filtering the semantic content of text generated by AI models.

How to eliminate wrong answers

Option A is wrong because OCI Data Safe is a database security service focused on data masking, auditing, and user risk assessment for Oracle databases, not for filtering content generated by AI models. Option B is wrong because OCI Vault is a key management service for storing and managing encryption keys and secrets, not for content moderation or filtering of AI responses. Option C is wrong because OCI WAF (Web Application Firewall) protects web applications from common attacks like SQL injection and cross-site scripting at the HTTP/HTTPS layer, but it does not inspect or filter the semantic content of generative AI outputs.

Full explanation →

749

MCQmedium

An application using OCI Generative AI produces inconsistent responses to the same user query. The developer suspects the model's output variability is too high. Which parameter adjustment would most directly reduce output randomness?

A.Increase the max tokens parameter.

B.Increase the top_p parameter.

C.Change the model to a smaller variant.

D.Decrease the temperature parameter.

AnswerD

Lower temperature reduces randomness, making responses more consistent.

Why this answer

Temperature directly controls the randomness of token sampling in the model's output distribution. Lowering temperature (e.g., from 0.7 to 0.2) makes the model more deterministic by concentrating probability mass on the most likely next tokens, thus reducing output variability for the same query.

Exam trap

The trap here is that candidates often confuse top_p and temperature, assuming both control randomness similarly, but top_p controls the diversity of the candidate pool while temperature directly sharpens or flattens the probability distribution.

How to eliminate wrong answers

Option A is wrong because increasing max tokens only extends the length limit of the response, not the randomness of token selection; it can even introduce more variability by allowing longer, less constrained sequences. Option B is wrong because increasing top_p (nucleus sampling) expands the cumulative probability threshold for token selection, which actually increases randomness by allowing more low-probability tokens to be considered. Option C is wrong because changing to a smaller variant may reduce model capacity and coherence, but it does not directly control the sampling randomness; variability can persist or even increase due to less confident probability distributions.

Full explanation →

750

MCQmedium

A data scientist wants to compare the semantic similarity between two sentences generated by an LLM. Which evaluation metric is most suitable for this purpose?

A.ROUGE-L

B.BLEU

C.BERTScore

D.Perplexity

AnswerC

BERTScore uses contextual embeddings to evaluate semantic similarity.

Why this answer

BERTScore computes cosine similarity between contextual embeddings, capturing semantic meaning better than surface-level n-gram metrics.

Full explanation →

Oracle Cloud Infrastructure Generative AI Professional 1Z0-1127 (1Z0-1127) — Questions 676–750