Oracle Cloud Infrastructure Generative AI Professional 1Z0-1127 1Z0-1127 Questions 976–991 | Page 14/14

976

MCQmedium

A user repeatedly gets the same phrase output by the model. Which parameter adjustment is MOST likely to reduce such repetitive patterns?

A.Decrease max tokens

B.Increase temperature

C.Increase frequency penalty

D.Increase top-p

AnswerC

Frequency penalty penalizes tokens that have been used, reducing repetition.

Why this answer

Frequency penalty reduces the likelihood of repeating tokens that have already appeared, directly combating repetition.

Full explanation →

977

MCQeasy

Which component of the transformer architecture allows the model to weigh the importance of different words in a sentence when processing input?

A.Layer normalization

B.Positional encoding

C.Self-attention mechanism

D.Feed-forward neural network

AnswerC

Self-attention computes pairwise relevance scores and produces context-aware representations for each token.

Why this answer

The self-attention mechanism is the core component of the transformer architecture that enables the model to dynamically assign weights to each word in a sentence relative to every other word. This allows the model to capture contextual relationships and dependencies, such as determining which words are most relevant to the current word being processed, regardless of their distance in the sequence.

Exam trap

Cisco often tests the distinction between components that provide positional information (positional encoding) versus those that compute relational importance (self-attention), leading candidates to confuse the role of positional encoding with the weighting of word significance.

How to eliminate wrong answers

Option A is wrong because layer normalization stabilizes training by normalizing activations across features, not by weighing word importance. Option B is wrong because positional encoding adds information about the order of words in a sequence, but does not perform any weighting of importance. Option D is wrong because the feed-forward neural network applies non-linear transformations to each position independently after attention has been computed, and does not weigh word importance.

Full explanation →

978

MCQeasy

What is the primary purpose of an embedding model in a RAG pipeline?

A.To convert text into numerical vectors.

B.To generate human-like responses.

C.To rank search results.

D.To summarize long documents.

AnswerA

Embedding models encode text semantically into vectors.

Why this answer

The primary purpose of an embedding model in a RAG pipeline is to convert text into numerical vectors (embeddings) that capture semantic meaning. These vectors enable the retrieval component to efficiently find relevant documents by measuring similarity (e.g., cosine similarity) between the query and stored document embeddings. Without this conversion, the system cannot perform semantic search over unstructured text.

Exam trap

Cisco often tests the distinction between the embedding model's role (conversion to vectors) and the LLM's role (generation), so candidates may mistakenly attribute response generation or summarization to the embedding model.

How to eliminate wrong answers

Option B is wrong because generating human-like responses is the role of the large language model (LLM) in the generation step, not the embedding model. Option C is wrong because ranking search results is typically performed by a reranker or the retrieval algorithm (e.g., using vector similarity scores), not by the embedding model itself. Option D is wrong because summarizing long documents is a task for the LLM or a dedicated summarization model, not the embedding model, which only produces vector representations.

Full explanation →

979

MCQhard

An AI team is fine-tuning a large language model using OCI Data Science and plans to deploy the fine-tuned model using the Generative AI service's custom model deployment. What is the required format for the model artifacts?

A.A Git repository URL

B.A single .pth file

C.A Docker image with the model and inference code

D.A .zip archive containing model weights and configuration files

AnswerD

The custom model deployment requires a zip archive with all necessary files.

Why this answer

The OCI Generative AI service requires custom model artifacts to be packaged as a .zip archive containing the model weights, configuration files (e.g., config.json, tokenizer files), and any necessary inference code. This format ensures the service can extract and load the model correctly into its managed inference infrastructure, aligning with the standard Hugging Face model repository structure.

Exam trap

The trap here is that candidates may confuse OCI Generative AI's custom model deployment with OCI Data Science model deployment, which does support Docker images, leading them to incorrectly select Option C.

How to eliminate wrong answers

Option A is wrong because a Git repository URL is not a supported artifact format for OCI Generative AI custom model deployment; the service expects a static artifact file, not a live repository reference. Option B is wrong because a single .pth file contains only PyTorch model weights without the required configuration files (e.g., config.json, tokenizer.json) and inference code, making it incomplete for deployment. Option C is wrong because OCI Generative AI custom model deployment does not accept Docker images; it uses a serverless, managed inference environment that expects a .zip archive of model artifacts, not a containerized application.

Full explanation →

980

MCQhard

A team fine-tuned a Cohere Command R model in OCI GenAI and validated it. They now need to deploy it for production inference with a dedicated endpoint. What is the correct sequence of steps?

A.Create an endpoint, deploy the model, then provision a Dedicated AI Cluster

B.Deploy the model to a shared endpoint, then provision a Dedicated AI Cluster for scaling

C.Create an endpoint, then provision a Dedicated AI Cluster, then deploy the model

D.Provision a Dedicated AI Cluster, deploy the model to the cluster, then create an endpoint

AnswerD

The correct order: allocate cluster, deploy model, then expose via endpoint.

Why this answer

Option D is correct because in OCI Generative AI, the correct sequence for deploying a fine-tuned model to a dedicated endpoint is: first provision a Dedicated AI Cluster (which provides the isolated compute infrastructure), then deploy the model to that cluster, and finally create an endpoint that exposes the deployed model for inference. This ensures the model is hosted on dedicated resources before the endpoint is created.

Exam trap

The trap here is that candidates often confuse the order of provisioning infrastructure versus deploying the model, mistakenly thinking the endpoint can be created first and then attached to a cluster later, but OCI GenAI requires the cluster to exist and the model to be deployed before the endpoint can be created.

How to eliminate wrong answers

Option A is wrong because it attempts to create an endpoint before the Dedicated AI Cluster is provisioned, which would fail as the endpoint requires an existing cluster to attach to. Option B is wrong because it incorrectly suggests deploying to a shared endpoint first, which is not the intended path for dedicated production inference; dedicated endpoints require a Dedicated AI Cluster, not a shared one. Option C is wrong because it creates an endpoint before the Dedicated AI Cluster is provisioned and before the model is deployed, which is invalid since the endpoint must reference an already deployed model on an existing cluster.

Full explanation →

981

MCQhard

A developer implements a RAG chatbot using OCI Generative AI with streaming enabled. The chatbot fails to remember earlier conversation turns during a session. What is the most likely cause?

A.The max_tokens parameter is set too low.

B.The streaming endpoint does not support conversation history.

C.The application does not include previous messages in the request.

D.The temperature parameter is too high.

AnswerC

Session memory requires the client to send the conversation history in the messages list.

Why this answer

Option C is correct because a RAG chatbot with streaming enabled still requires the application to manage conversation state by including previous messages in each request. The OCI Generative AI streaming endpoint processes each request independently; without explicitly passing the conversation history, the model has no context of prior turns, causing it to fail to remember earlier interactions.

Exam trap

Cisco often tests the misconception that streaming endpoints inherently preserve conversation history, when in fact they are stateless and require explicit inclusion of prior messages in each request.

How to eliminate wrong answers

Option A is wrong because max_tokens controls the maximum length of the generated response, not the retention of conversation history; a low max_tokens might truncate output but does not cause loss of memory across turns. Option B is wrong because the OCI Generative AI streaming endpoint does support conversation history when the application includes previous messages in the request; streaming itself does not disable context. Option D is wrong because temperature affects the randomness of token selection, not the model's ability to recall prior context; a high temperature might produce more varied responses but does not erase conversation memory.

Full explanation →

982

MCQeasy

A retail company uses OCI Generative AI Service to build a RAG chatbot for product recommendations. The chatbot should consider both the user's query and the retrieved product descriptions. Which component of the RAG pipeline is responsible for combining these inputs before sending to the LLM?

A.Reranker

B.Document retriever

C.Embedding model

D.Prompt template

AnswerD

Merges user query and context into a single prompt.

Why this answer

The prompt template is the component in a RAG pipeline that structures the final input to the LLM by combining the user's query with the retrieved product descriptions. It defines the format and instructions (e.g., 'Based on these product descriptions, recommend...') that the LLM uses to generate a coherent response. Without a prompt template, the raw query and documents would be sent without context, leading to poor or irrelevant outputs.

Exam trap

Oracle often tests the misconception that the embedding model or retriever handles input combination, when in fact those components only deal with vector representation and retrieval, not prompt assembly.

How to eliminate wrong answers

Option A is wrong because a reranker reorders retrieved documents based on relevance scores after initial retrieval, but it does not combine inputs with the user query for the LLM. Option B is wrong because the document retriever fetches relevant documents from the vector store using similarity search, but it does not merge them with the query into a single prompt. Option C is wrong because the embedding model converts text into vector representations for search, but it plays no role in assembling the final input to the LLM.

Full explanation →

983

MCQeasy

Which LangChain document loader would be most appropriate to load content from a public website for inclusion in a knowledge base?

A.PDFLoader

B.CSVLoader

C.TextLoader

D.WebBaseLoader

AnswerD

WebBaseLoader fetches content from a given URL and loads it as a Document.

Why this answer

WebBaseLoader is specifically designed to load documents from web URLs, fetching the HTML content and converting it to LangChain Document objects. PDFLoader, CSVLoader, and TextLoader are for local files of specific formats.

Full explanation →

984

MCQmedium

An organization wants to use OCI Generative AI for summarizing long legal documents. Which OCI Generative AI service component is specifically designed for this task?

A.Chat API

B.Embedding API

C.Generate API

D.Summarisation

AnswerD

Summarisation is a dedicated component for summarizing documents.

Why this answer

Option D is correct because the Summarisation API in OCI Generative AI is a dedicated endpoint optimized for condensing long texts into concise summaries. It uses specialized model configurations and prompt engineering to handle the context window and extraction requirements of legal documents, unlike general-purpose generation endpoints.

Exam trap

Cisco often tests the distinction between a general-purpose generation API (Generate API) and a task-specific API (Summarisation), leading candidates to incorrectly choose the Generate API because they assume any text generation endpoint can handle summarization equally well.

How to eliminate wrong answers

Option A is wrong because the Chat API is designed for multi-turn conversational interactions, not for single-document summarization tasks, and lacks the specific prompt templates and length controls needed for legal document summarization. Option B is wrong because the Embedding API converts text into vector representations for semantic search or clustering, not for generating summaries. Option C is wrong because the Generate API is a general-purpose text generation endpoint that can produce summaries but is not specifically optimized or designed for summarization tasks, lacking the built-in summarization-specific parameters and model tuning that the Summarisation API provides.

Full explanation →

985

MCQeasy

A company is building a chatbot using OCI Generative AI service. They want to ensure that the model responses are grounded in their internal knowledge base. Which approach should they use?

A.Prompt engineering with few-shot examples

B.Fine-tuning the model on the internal knowledge base

C.Model distillation to compress the knowledge base

D.Retrieval-Augmented Generation (RAG)

AnswerD

RAG retrieves relevant documents from a knowledge base and uses them to generate grounded responses.

Why this answer

Retrieval-Augmented Generation (RAG) is the correct approach because it retrieves relevant documents from the company's internal knowledge base at inference time and provides them as context to the LLM, ensuring the model's responses are grounded in verifiable, up-to-date information without modifying the model itself. This directly addresses the requirement to ground responses in an internal knowledge base while avoiding the cost and complexity of retraining.

Exam trap

The trap here is that candidates often confuse fine-tuning (Option B) as the only way to incorporate proprietary data, overlooking that RAG provides a more flexible, cost-effective, and updatable method for grounding responses in a dynamic knowledge base without altering model weights.

How to eliminate wrong answers

Option A is wrong because prompt engineering with few-shot examples only provides a handful of static examples in the prompt, which cannot dynamically retrieve or incorporate the full breadth of an internal knowledge base, leading to hallucinations on unseen or specific internal data. Option B is wrong because fine-tuning the model on the internal knowledge base would embed that data into the model's weights, making it expensive to update, prone to catastrophic forgetting, and unable to guarantee factual grounding for new or changing documents without retraining. Option C is wrong because model distillation compresses a larger model into a smaller one for efficiency, but it does not introduce external knowledge retrieval; it merely replicates the behavior of the teacher model, which still lacks access to the internal knowledge base.

Full explanation →

986

Multi-Selectmedium

In a LangChain RAG pipeline using OCI Generative AI, which THREE components are essential for ingesting documents into a vector store?

Select 3 answers

A.Retriever (e.g., vectorstore.as_retriever())

B.Text splitter (e.g., RecursiveCharacterTextSplitter)

C.LLM (e.g., ChatOCIGenAI)

D.Document loader (e.g., PDFLoader)

E.Embedding model (e.g., OCIGenAIEmbeddings)

AnswersB, D, E

Text splitter divides documents into manageable chunks for embedding and indexing.

Why this answer

Option B is correct because text splitters like RecursiveCharacterTextSplitter are essential for breaking large documents into smaller, manageable chunks that fit within the context window limits of embedding models and LLMs. Without chunking, the vector store cannot effectively index and retrieve relevant passages, making it a core component of the ingestion pipeline.

Exam trap

Cisco often tests the distinction between the ingestion pipeline (loader, splitter, embeddings) and the retrieval/generation pipeline (retriever, LLM), leading candidates to incorrectly include the retriever or LLM as essential for ingestion.

Full explanation →

987

MCQhard

A machine learning team is fine-tuning a 7B parameter Llama 2 model on a custom dataset of 10,000 documents using OCI Data Science and GPU instances. They encounter out-of-memory (OOM) errors during the fine-tuning process. They are using a batch size of 8 and a sequence length of 2048. They cannot increase the GPU memory. Which change should they prioritize to resolve the OOM?

A.Enable gradient accumulation with steps of 4 or more.

B.Use mixed precision training (FP16).

C.Reduce the model size by using a 3B parameter version.

D.Decrease the number of training epochs.

AnswerA

Correct: Gradient accumulation reduces memory per step without changing effective batch size.

Why this answer

Option B is correct because enabling gradient accumulation allows the effective batch size to be maintained while reducing per-step memory usage. Option A changes the model entirely, Option C may not fix the memory issue, and Option D helps but may still OOM if the batch size is too high; gradient accumulation is more directly targeted.

Full explanation →

988

MCQeasy

A developer is using OCI GenAI to generate structured data. They often get responses that include additional commentary or markdown. Which prompt engineering technique should they use to ensure only JSON output?

A.Set top_p to 0.1.

B.Use a model with a larger context window.

C.Add 'Return only JSON' at the end of the prompt.

D.Increase the temperature to 1.5.

AnswerC

Correct: Direct instruction enforces format.

Why this answer

Option C is correct because explicitly instructing the model to 'Return only JSON' directly constrains the output format, reducing the likelihood of extraneous commentary or markdown. This technique leverages prompt engineering to guide the model's behavior without altering inference parameters like temperature or top_p, which control randomness rather than output structure.

Exam trap

Oracle often tests the misconception that adjusting sampling parameters (like temperature or top_p) can enforce output format, when in fact these parameters control randomness and diversity, not structural constraints—leading candidates to overlook the direct prompt engineering solution.

How to eliminate wrong answers

Option A is wrong because setting top_p to 0.1 reduces the nucleus sampling threshold, making the model more deterministic but not preventing it from generating additional text or markdown; it controls token selection diversity, not output format. Option B is wrong because a larger context window allows the model to process more input tokens but does not enforce a specific output structure; it addresses memory limitations, not format constraints. Option D is wrong because increasing temperature to 1.5 raises randomness, which can actually increase the likelihood of unpredictable or verbose responses, including unwanted commentary, rather than ensuring strict JSON output.

Full explanation →

989

MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Train a custom model from scratch on the policy documents each month

B.Use a larger foundation model with a longer context window and paste all documents into each prompt

C.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

D.Fine-tune a base LLM on the policy documents monthly

AnswerC

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

RAG (Retrieval-Augmented Generation) allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining. The other options either require expensive retraining for each update or lack document grounding.

Full explanation →

990

MCQhard

An OCI Generative AI user notices that a model generates repetitive phrases when summarizing technical articles. Which parameter adjustment is MOST likely to reduce this repetition?

A.Decrease max tokens

B.Increase the frequency penalty

C.Set top-p to 0.95

D.Increase temperature to 0.9

AnswerB

Frequency penalty penalizes tokens that have already appeared, discouraging the model from repeating phrases.

Why this answer

Frequency penalty reduces the likelihood of repeating tokens that have already appeared, directly targeting repetition. Presence penalty also helps but frequency penalty is stronger for repeated phrases.

Full explanation →

991

MCQmedium

A team has deployed a generative AI model and needs to monitor inference performance and set up alerts for increased error rates. Which OCI service should they integrate with?

A.OCI Monitoring

B.OCI Cloud Guard

C.OCI Events

D.OCI Logging

AnswerA

Correct: Monitoring provides metrics and alerting for inference endpoints.

Why this answer

OCI Monitoring is the correct service because it provides metrics and alarms for tracking inference performance (e.g., latency, throughput) and error rates from deployed generative AI models. It allows you to set up threshold-based alerts on custom or predefined metrics, enabling proactive incident response. This directly addresses the requirement to monitor inference performance and alert on increased error rates.

Exam trap

Oracle often tests the distinction between monitoring (metrics/alarms) and logging (raw events) — candidates mistakenly choose OCI Logging because they think 'error rates' require log analysis, but OCI Monitoring is designed for metric-based alerting with thresholds.

How to eliminate wrong answers

Option B is wrong because OCI Cloud Guard is a security posture management service that detects misconfigurations and security threats, not a real-time performance monitoring or alerting tool for inference metrics. Option C is wrong because OCI Events is a notification service that reacts to state changes in OCI resources (e.g., object creation, instance termination) but does not natively track or alert on time-series performance metrics like error rates. Option D is wrong because OCI Logging collects and stores log data for audit and troubleshooting, but it lacks built-in metric-based alerting capabilities for monitoring inference performance trends or setting threshold alarms.

Full explanation →

Oracle Cloud Infrastructure Generative AI Professional 1Z0-1127 (1Z0-1127) — Questions 976–991