Oracle Cloud Infrastructure Generative AI Professional 1Z0-1127 (1Z0-1127) — Questions 826900

991 questions total · 14pages · All types, answers revealed

Page 11

Page 12 of 14

Page 13
826
MCQmedium

A developer is using LangChain's ConversationBufferMemory to store chat history. They notice that after many turns, the prompt becomes too large and exceeds the model's context window. What is the BEST memory type to use for this scenario?

A.ConversationEntityMemory
B.ConversationBufferMemory with a large max_token_limit
C.ConversationSummaryMemory
D.ConversationStringBufferMemory
AnswerC

Summary memory compresses history into summaries, keeping the prompt small.

Why this answer

ConversationSummaryMemory periodically summarizes the conversation, keeping the prompt size manageable. It retains the gist of the conversation while reducing token usage.

827
Multi-Selectmedium

A data scientist is using the OCI Generative AI Embeddings API to generate vectors for a classification task. Which TWO input types are appropriate for this use case?

Select 2 answers
A.search_query
B.clustering
C.text
D.classification
E.search_document
AnswersB, D

Optimizes embeddings for clustering tasks, which can be used for classification.

Why this answer

The Embeddings API supports input types like 'classification' and 'clustering' which optimize embeddings for those tasks. 'search_document' and 'search_query' are for search. 'text' is not a valid input type.

828
MCQhard

A developer notices that an LLM-based question-answering system sometimes provides answers that are correct but from an outdated version of the knowledge base. The system uses RAG with a vector database updated daily. What is the MOST likely root cause?

A.The retrieval top-k parameter is set too high
B.The chunking strategy splits documents into too-small pieces
C.The embedding model was not re-run on the updated documents, so the index contains old embeddings
D.The LLM's training data has a knowledge cutoff date
AnswerC

If the vector database is updated but embeddings are not recomputed, the index still matches old chunks, causing retrieval of outdated information.

Why this answer

Option C is correct because the core issue is that the vector database index still contains old embeddings. Even though the knowledge base documents are updated daily, if the embedding model is not re-run on those updated documents, the vector representations in the index remain stale. When the RAG system retrieves, it fetches these outdated embeddings, leading to correct but outdated answers.

This is a classic index synchronization problem in RAG pipelines.

Exam trap

Cisco often tests the distinction between retrieval-side issues (index staleness) and model-side issues (knowledge cutoff), so candidates mistakenly pick D because they confuse the LLM's training cutoff with the freshness of the vector database index.

How to eliminate wrong answers

Option A is wrong because a high top-k parameter would retrieve more documents, potentially including both old and new versions, but it does not cause the system to systematically favor outdated content; it would increase recall, not introduce staleness. Option B is wrong because chunking into too-small pieces might reduce context or cause fragmentation, but it does not inherently cause the system to retrieve outdated information; the chunks themselves would still reflect the current document content if embeddings are updated. Option D is wrong because the LLM's training data cutoff date affects the model's parametric knowledge, not the retrieval from the vector database; the RAG system is designed to overcome this by retrieving fresh documents, so the cutoff date is irrelevant to the index staleness problem.

829
Multi-Selectmedium

A company is building a chatbot that must maintain a professional tone and avoid discussing off-topic subjects. Which TWO prompt engineering approaches should they combine to enforce these requirements?

Select 2 answers
A.Use a few-shot prompt with examples of off-topic conversations to teach the model what to avoid
B.Use a system prompt that defines the chatbot's role (e.g., 'You are a professional customer support agent') and includes constraints (e.g., 'Do not discuss topics outside of product support.')
C.Include a template pattern in the system prompt that specifies the response format (e.g., 'Greeting, Answer, Closing')
D.Set frequency penalty to 2.0 to reduce repetition of any words
E.Set temperature to 1.0 to ensure creative responses
AnswersB, C

This directly sets the tone and limits the scope.

Why this answer

A system prompt with role and constraints sets the overall behavior, and a template pattern for responses provides a consistent structure. The other options are less suitable.

830
MCQmedium

A healthcare startup is building an AI assistant to help doctors draft clinical notes from patient-physician conversations. They have a large language model that is fine-tuned on medical data. During testing, they notice the model occasionally generates plausible-sounding but incorrect medical recommendations. The startup wants to deploy the assistant to assist doctors, not replace them. They have the following options: (A) Deploy the model as-is and rely on doctors to catch errors, (B) Add a disclaimer that the model may make mistakes, (C) Implement a fact-checking pipeline that cross-references outputs with a trusted medical knowledge base before presenting to doctors, (D) Reduce the model's temperature to 0 to ensure deterministic outputs. Which option best balances safety and utility?

A.Implement a fact-checking pipeline that cross-references outputs with a trusted medical knowledge base.
B.Add a disclaimer that the model may make mistakes.
C.Deploy the model as-is and rely on doctors to catch errors.
D.Reduce the model's temperature to 0 to ensure deterministic outputs.
AnswerA

Fact-checking reduces hallucinations and ensures accuracy.

Why this answer

Option A is correct because implementing a fact-checking pipeline that cross-references outputs with a trusted medical knowledge base directly mitigates the risk of hallucinated medical recommendations while preserving the assistant's utility. This approach leverages retrieval-augmented generation (RAG) principles to ground the model's outputs in verified facts, ensuring safety without sacrificing the flexibility needed for drafting clinical notes.

Exam trap

Cisco often tests the misconception that deterministic outputs (temperature=0) guarantee factual accuracy, when in reality they only eliminate randomness but not the model's tendency to hallucinate from its training distribution.

How to eliminate wrong answers

Option B is wrong because a disclaimer does not prevent the model from generating incorrect medical advice; it merely shifts liability and does not address the underlying safety risk. Option C is wrong because deploying the model as-is and relying on doctors to catch errors places an unrealistic cognitive burden on clinicians, increasing the chance of oversight and patient harm. Option D is wrong because reducing temperature to 0 makes outputs deterministic but does not guarantee correctness; the model may still produce plausible-sounding but false recommendations from its training data, and deterministic outputs can actually amplify systematic errors.

831
MCQmedium

An administrator runs the above CLI command to check the status of a dedicated AI cluster. The cluster is ACTIVE with capacity 10. However, a user reports that inference requests to this cluster are failing with a '429 Too Many Requests' error. What is the most likely cause?

A.The cluster is hitting the maximum inference requests per minute limit
B.The cluster does not have enough nodes to handle the load
C.The user is not in the same compartment as the cluster
D.The cluster is not in ACTIVE state
AnswerA

429 indicates rate limit; the cluster has a requests-per-minute limit separate from node count.

Why this answer

The '429 Too Many Requests' error is an HTTP status code indicating rate limiting has been exceeded. In OCI Generative AI, dedicated AI clusters have a configurable 'maximum inference requests per minute' limit. Even if the cluster is ACTIVE and has capacity (e.g., 10 nodes), hitting this per-minute request cap will cause the API gateway to reject further requests with a 429 error.

The administrator must increase the rate limit or implement client-side throttling to resolve this.

Exam trap

The trap here is that candidates confuse capacity (number of nodes) with rate limits, assuming a cluster with available compute resources cannot produce a 429 error, when in fact the 429 is tied to a separate API-level throttling mechanism.

How to eliminate wrong answers

Option B is wrong because a cluster with insufficient nodes would typically result in higher latency, timeouts, or '503 Service Unavailable' errors, not a '429 Too Many Requests' which is specifically a rate-limiting response. Option C is wrong because compartment mismatches cause '404 Not Found' or '403 Forbidden' errors, not a 429 status code. Option D is wrong because the cluster is explicitly stated as ACTIVE; an inactive cluster would return a '503 Service Unavailable' or '400 Bad Request' error, not a 429.

832
MCQmedium

A developer is building a code generation assistant and needs to ensure the LLM follows a specific output format (e.g., JSON). Which approach is MOST effective for achieving format adherence without retraining?

A.Lower the temperature to 0 to reduce output variability
B.Provide a few-shot example of the desired JSON format in the prompt
C.Fine-tune the model on a dataset of JSON code examples
D.Increase the context window to include more code context
AnswerB

In-context learning (few-shot) guides the model to mimic the provided format without retraining.

Why this answer

Option B is correct because few-shot prompting—providing explicit examples of the desired JSON format in the prompt—directly guides the LLM's output structure without requiring retraining. This technique leverages in-context learning, where the model infers the required schema from the examples, making it the most effective and efficient method for format adherence.

Exam trap

Cisco often tests the misconception that lowering temperature or increasing context window can enforce output format, when in fact these parameters only affect randomness or input length, not structural adherence.

How to eliminate wrong answers

Option A is wrong because lowering temperature to 0 reduces randomness but does not enforce a specific output format; the model may still produce valid JSON with varying structures or deviate entirely. Option C is wrong because fine-tuning requires retraining the model on a dataset, which is costly, time-consuming, and contradicts the constraint of 'without retraining.' Option D is wrong because increasing the context window provides more input context but does not constrain the output format; the model may still generate malformed or non-JSON responses.

833
Multi-Selecthard

Which THREE techniques are commonly used to improve the quality of text generation?

Select 3 answers
A.Temperature scaling
B.Top-k sampling
C.Greedy decoding
D.Random sampling
E.Beam search
AnswersA, B, E

Temperature scaling smooths token probabilities and can improve the quality-diversity trade-off.

Why this answer

Temperature scaling is correct because it controls the randomness of token probability distributions by dividing logits before softmax; lower temperatures (e.g., 0.1) make the model more deterministic, while higher temperatures (e.g., 1.5) increase diversity. This directly influences the quality of generated text by balancing coherence and creativity.

Exam trap

Oracle often tests the misconception that greedy decoding or random sampling are valid quality-improvement techniques, when in fact they either cause repetition (greedy) or incoherence (random) without the controlled stochasticity of temperature, top-k, or the global optimization of beam search.

834
MCQhard

An administrator wants to grant a group of data scientists permission to use OCI Generative AI resources in a specific compartment, but prevent them from creating Dedicated AI Clusters. Which IAM policy statement achieves this?

A.Allow group data-scientists to read generative-ai-family in compartment genai-dev
B.Allow group data-scientists to manage generative-ai-family in compartment genai-dev
C.Allow group data-scientists to use generative-ai-family in compartment genai-dev where request.operation != 'CreateDedicatedAiCluster'
D.Allow group data-scientists to use generative-ai-models in compartment genai-dev
AnswerC

This grants use of most GenAI resources but excludes creating dedicated clusters via a condition.

Why this answer

Option C is correct because it uses the 'use' verb to grant the data scientists access to OCI Generative AI resources while adding a condition with 'request.operation != 'CreateDedicatedAiCluster'' to explicitly deny the ability to create Dedicated AI Clusters. In OCI IAM, the 'use' verb includes read and update capabilities but not create or delete, and the condition further restricts the specific create operation, aligning with the requirement to prevent cluster creation.

Exam trap

The trap here is that candidates often confuse the 'use' verb with 'manage' or 'read', or overlook the necessity of a condition to block a specific operation, assuming a broader verb like 'manage' can be restricted by a condition when it actually grants all permissions including create.

How to eliminate wrong answers

Option A is wrong because the 'read' verb only allows viewing resources, not using them (e.g., invoking models), so data scientists cannot perform inference or other actions. Option B is wrong because the 'manage' verb grants full control, including creating Dedicated AI Clusters, which violates the requirement to prevent that action. Option D is wrong because 'generative-ai-models' is a subset of 'generative-ai-family' and does not cover all necessary resources like endpoints or deployments, and it lacks the condition to block Dedicated AI Cluster creation.

835
MCQhard

A company uses OCI Generative AI to generate legal document summaries. They have a custom model deployed on a dedicated AI cluster. They want to ensure that the model is not used by unauthorized users. They also need to log all inference requests for auditing. Which combination of OCI services should they use?

A.OCI Vault for encryption and OCI Audit for logging.
B.OCI Identity and Access Management (IAM) policies and OCI Logging.
C.OCI Data Safe and OCI Monitoring.
D.OCI API Gateway with authentication and OCI Audit.
AnswerB

IAM controls access, Logging records inference requests for audit.

Why this answer

Option B is correct because OCI IAM policies are the primary mechanism for controlling access to OCI resources, including custom models on dedicated AI clusters, by defining which users or groups can invoke the model. OCI Logging captures detailed logs of all inference requests, including metadata such as timestamps, source IPs, and request payloads, which satisfies the auditing requirement. Together, they provide both authorization enforcement and audit trail without additional services.

Exam trap

The trap here is that candidates often confuse OCI Audit (which logs only management-plane operations) with OCI Logging (which logs data-plane operations like inference requests), leading them to pick Option A or D, while also overlooking that IAM policies are the native access control mechanism for Generative AI models on dedicated clusters.

How to eliminate wrong answers

Option A is wrong because OCI Vault manages encryption keys and secrets, not access control or logging; it does not prevent unauthorized model usage. OCI Audit records only management-plane API calls (e.g., creating or deleting resources), not data-plane inference requests, so it cannot log individual inference calls. Option C is wrong because OCI Data Safe is a database security service for protecting sensitive data in databases, not for controlling access to or logging inference requests for a Generative AI model.

OCI Monitoring collects metrics and alarms, not detailed request logs for auditing. Option D is wrong because OCI API Gateway can provide authentication and request logging, but it is an unnecessary intermediary for a model deployed on a dedicated AI cluster; the question specifies the model is already deployed on a dedicated cluster, and IAM policies directly control access to the model endpoint without requiring an API Gateway. OCI Audit, as noted, does not log data-plane inference requests.

836
MCQeasy

What is the primary benefit of using a Dedicated AI Cluster for inference in OCI Generative AI?

A.Ability to use any LLM model for free
B.Higher throughput and lower latency due to dedicated compute resources
C.Automatic model fine-tuning on the cluster
D.No need to create endpoints
AnswerB

Dedicated clusters offer consistent, low-latency performance without competition for resources.

Why this answer

A Dedicated AI Cluster provides exclusive, low-latency inference with reserved capacity, unlike shared infrastructure where resources are contended.

837
MCQmedium

An organization wants to combine keyword search and vector search to improve retrieval accuracy in their RAG pipeline. Which OCI service provides built-in hybrid search capabilities?

A.OCI Search with AI
B.OCI OpenSearch
C.Autonomous Database with AI Vector Search
D.OCI Logging
AnswerB

OpenSearch integrates BM25 and vector search.

Why this answer

OCI OpenSearch is the correct answer because it natively supports hybrid search, which combines keyword-based (BM25) and vector-based (k-NN) queries in a single search request. This allows the RAG pipeline to retrieve documents that match both exact terms and semantic meaning, improving overall accuracy without requiring separate search systems.

Exam trap

Cisco often tests the misconception that any service with 'AI' or 'Vector' in its name supports hybrid search out of the box, but candidates must recognize that OCI OpenSearch is the only service with a built-in hybrid search pipeline that combines keyword and vector search natively.

How to eliminate wrong answers

Option A is wrong because OCI Search with AI is a managed search service that primarily focuses on AI-powered search over enterprise content, but it does not provide native hybrid search capabilities combining keyword and vector search in a single query. Option C is wrong because Autonomous Database with AI Vector Search supports vector similarity search and SQL-based keyword search, but it requires manual orchestration to combine them into a hybrid search pipeline, lacking built-in hybrid search. Option D is wrong because OCI Logging is a service for collecting and analyzing log data, not a search engine for RAG pipelines, and it has no vector search or hybrid search capabilities.

838
Multi-Selectmedium

Which THREE of the following are known limitations of large language models? (Select THREE)

Select 3 answers
A.Real-time awareness of current events
B.Knowledge cutoff (lack of information after a certain date)
C.Bias in training data leading to skewed outputs
D.Hallucination (generating factually incorrect information)
E.Unlimited context window
AnswersB, C, D

LLMs are trained on static datasets and do not know events after their cutoff date.

Why this answer

Option B is correct because large language models are trained on static datasets that have a fixed cutoff date, after which they have no knowledge of new events, publications, or data. This is an inherent architectural limitation: the model's parameters are frozen at the end of training, so it cannot learn or incorporate information beyond that point without retraining or fine-tuning.

Exam trap

Cisco often tests the misconception that LLMs can access real-time data or have unlimited memory, when in fact both are hard architectural constraints tied to training data cutoffs and transformer attention mechanisms.

839
MCQmedium

A security administrator needs to grant a data science team access to use OCI Generative AI resources (e.g., run inference, create fine-tuning jobs) but only within a specific compartment. What is the correct IAM policy statement?

A.Allow group DataScientists to manage generative-ai-family in compartment Production
B.Allow group DataScientists to manage llm-models in compartment Production
C.Allow group DataScientists to use all-resources in tenancy
D.Allow group DataScientists to read ai-services in compartment Production
AnswerA

This policy grants the group permission to use (read, use, manage) all GenAI resources in the specified compartment.

Why this answer

The 'allow group to use generative-ai-family in compartment' is the standard OCI IAM policy for granting access to all GenAI resources in a compartment. The other options have incorrect resource types or conditions.

840
MCQmedium

A data scientist is using OCI Generative AI to generate synthetic data for training. They observe that the model's outputs lack diversity and often repeat the same phrases. Which combination of parameter adjustments would BEST increase output diversity?

A.Set frequency penalty to 0.0 and presence penalty to 0.0
B.Increase temperature to 0.9 and increase top-p to 0.9
C.Decrease temperature to 0.3 and increase top-p to 0.9
D.Increase temperature to 0.9 and decrease top-p to 0.5
AnswerB

Both higher temperature and higher top-p increase randomness and token variety, boosting diversity.

Why this answer

Increasing temperature and top-p both increase randomness and diversity. Temperature controls the randomness of token selection, while top-p (nucleus sampling) allows a broader set of probable tokens.

841
Multi-Selectmedium

A company wants to use OCI Generative AI to build a multilingual customer support chatbot. They need to understand customer queries in multiple languages and generate responses in the same language. Which TWO actions should they take? (Choose two.)

Select 2 answers
A.Use the embed-english-v3.0 model for embedding queries
B.Fine-tune Meta Llama 3 on multilingual data
C.Select Cohere Command R as the base model for chat
D.Use the embed-multilingual-v3.0 model for embedding queries
E.Use the Summarisation API for each language
AnswersC, D

Command R supports multilingual conversations without additional fine-tuning.

Why this answer

Cohere Command R (or R+) supports multilingual input and output natively, so fine-tuning is unnecessary. Using the embed-multilingual-v3.0 model for embedding customer queries enables multilingual semantic search if RAG is used. Embed-english-v3.0 only supports English, and Llama 3 is primarily English-focused.

842
MCQhard

A developer is using the OCI Generative AI Generate API (not Chat API) to create a single-turn text completion. They need to include a system-level instruction that guides the model's behavior for that request. Which parameter should they use?

A.'temperature' parameter
B.'preamble_override' parameter
C.'system' parameter
D.'max_tokens' parameter
AnswerB

In the Generate API, 'preamble_override' allows you to set a preamble that acts as a system instruction for the completion.

Why this answer

The Generate API uses 'preamble_override' to set a system instruction for the completion. The 'system' parameter is for the Chat API. 'max_tokens' and 'temperature' are not system instructions.

843
Multi-Selectmedium

An organization is concerned about bias in their LLM-powered hiring assistant. Which TWO actions are MOST effective in mitigating bias?

Select 2 answers
A.Use a larger context window to include more examples
B.Increase the temperature parameter to introduce more randomness
C.Use only encoder-only models like BERT for classification
D.Implement human-in-the-loop evaluation with fairness-focused rubrics
E.Fine-tune the model on a carefully curated dataset that balances demographic representation
AnswersD, E

Human evaluation with explicit fairness criteria can catch biased responses.

Why this answer

Option D is correct because human-in-the-loop evaluation with fairness-focused rubrics directly addresses bias by incorporating human judgment to detect and correct biased outputs. This approach allows reviewers to systematically assess responses against predefined fairness criteria, catching subtle biases that automated methods might miss. It is a standard practice in responsible AI deployment for high-stakes applications like hiring.

Exam trap

Cisco often tests the misconception that technical parameters like temperature or context window size can solve bias, when in fact bias mitigation requires deliberate data curation and human oversight, not model hyperparameter tuning.

844
Multi-Selectmedium

Which TWO are benefits of using OCI Generative AI service's dedicated AI cluster?

Select 2 answers
A.Automatic scaling to handle large workloads.
B.Built-in content filtering for all outputs.
C.Ability to fine-tune models on custom data.
D.No need to provide any training data.
E.Lower latency compared to serverless.
AnswersC, E

Dedicated clusters support fine-tuning with custom datasets.

Why this answer

Option C is correct because dedicated AI clusters in OCI Generative AI service provide isolated compute resources that allow you to fine-tune foundation models on your own custom datasets. This is a key benefit over the serverless offering, which only supports inference and does not permit model customization. Fine-tuning enables domain-specific optimization, improving accuracy for specialized tasks.

Exam trap

Cisco often tests the misconception that dedicated clusters automatically scale like cloud-native services, but in OCI, dedicated clusters are static resources requiring manual scaling, while serverless endpoints handle auto-scaling.

845
MCQmedium

A developer is using OCI Generative AI Service to generate code snippets. They want to ensure the output is as deterministic as possible for testing. Which combination of parameters should they use?

A.Temperature = 0, Top-p = 1
B.Temperature = 0.5, Top-p = 0.5
C.Temperature = 0, Top-p = 0
D.Temperature = 1, Top-p = 1
AnswerA

Temperature=0 makes output deterministic; top-p=1 disables nucleus sampling.

Why this answer

Setting Temperature=0 makes the model deterministic by always selecting the highest-probability token, while Top-p=1 includes all tokens in the sampling pool, ensuring no additional randomness is introduced. This combination eliminates stochastic variation, making outputs repeatable for testing.

Exam trap

The trap here is that candidates mistakenly think Top-p=0 (like Temperature=0) would also enforce determinism, but Top-p=0 actually removes all tokens, leading to generation failure rather than deterministic output.

How to eliminate wrong answers

Option B is wrong because Temperature=0.5 introduces moderate randomness and Top-p=0.5 restricts the sampling pool, both of which reduce determinism. Option C is wrong because Top-p=0 would exclude all tokens, causing the model to fail to generate any output (or produce an error). Option D is wrong because Temperature=1 maximizes randomness and Top-p=1 includes all tokens, resulting in highly variable outputs.

846
MCQmedium

After fine-tuning a Cohere Command model on a dataset of customer emails, the model performs well on validation data but poorly on new, unseen emails. Which action is most likely to improve generalization?

A.Expand the training dataset with more diverse examples.
B.Increase the number of fine-tuning epochs.
C.Reduce the number of layers being fine-tuned.
D.Switch to a smaller model variant such as Cohere Light.
AnswerA

A larger, more varied dataset improves generalization.

Why this answer

Option A is correct because the model is overfitting to the training data, which is a common issue when the dataset lacks diversity. Expanding the training dataset with more diverse examples exposes the model to a wider range of patterns and variations, reducing overfitting and improving generalization to unseen customer emails. In the context of Cohere Command models, this aligns with best practices for fine-tuning on OCI Generative AI Service, where data quality and diversity are critical for robust performance.

Exam trap

Cisco often tests the misconception that overfitting is best addressed by reducing model complexity or training duration, rather than improving data quality and diversity, which is the fundamental solution for generalization in fine-tuned language models.

How to eliminate wrong answers

Option B is wrong because increasing the number of fine-tuning epochs would likely worsen overfitting, as the model would memorize the training data more closely, leading to even poorer performance on unseen data. Option C is wrong because reducing the number of layers being fine-tuned (e.g., using parameter-efficient fine-tuning like LoRA) does not address the root cause of overfitting; it may even reduce the model's capacity to learn, but the core issue is data diversity, not model complexity. Option D is wrong because switching to a smaller model variant such as Cohere Light would reduce model capacity, potentially underfitting the data, but it does not solve the overfitting problem caused by a non-diverse training set; the model would still fail to generalize to new examples.

847
MCQmedium

A developer wants to compare two sentences for semantic similarity using embeddings. Which distance or similarity metric is most commonly used for dense vector representations?

A.Cosine similarity
B.Jaccard similarity
C.Manhattan distance
D.Euclidean distance
AnswerA

Cosine similarity is the standard metric for comparing embedding vectors because it focuses on orientation, not magnitude.

Why this answer

Cosine similarity measures the cosine of the angle between two vectors, is commonly used for comparing embedding vectors, and ranges from -1 to 1, where 1 indicates identical direction.

848
MCQmedium

A developer receives a 403 error when calling the OCI GenAI API from a function. They have set up policies for the function's dynamic group. What is the most likely cause?

A.The request body format is incorrect.
B.The model is not available in the region.
C.The API key is invalid.
D.Missing IAM policy for GenAI service.
AnswerD

A 403 error indicates the function's dynamic group lacks permission to call the GenAI API.

Why this answer

A 403 error when calling the OCI GenAI API from a function indicates an authorization failure. Even if the function's dynamic group is correctly configured, the IAM policy must explicitly grant the dynamic group permission to invoke the GenAI service. Without a policy statement like 'Allow dynamic-group [name] to use generative-ai-family in compartment [name]', the API call is denied, resulting in a 403 Forbidden.

Exam trap

Cisco often tests the distinction between authentication (401) and authorization (403) errors, trapping candidates who confuse missing API keys or incorrect request formats with IAM policy misconfigurations.

How to eliminate wrong answers

Option A is wrong because a 403 error is an authorization error, not a client-side request format issue; an incorrect request body would produce a 400 Bad Request or 422 Unprocessable Entity. Option B is wrong because model unavailability in a region typically returns a 404 Not Found or a 400 error, not a 403. Option C is wrong because the function uses instance principal authentication via the dynamic group, not an API key; an invalid API key would cause a 401 Unauthorized, not a 403.

849
MCQmedium

A developer is using the OCI Generative AI Playground to test a Cohere Command R model. They want to reduce repetitiveness in the generated responses. Which parameter should they increase?

A.Max tokens
B.Top P
C.Frequency penalty
D.Temperature
AnswerC

A higher frequency penalty discourages the model from repeating the same tokens.

Why this answer

Frequency penalty penalizes tokens that have already appeared in the text, reducing repetition. Temperature increases randomness, top_p changes nucleus sampling, and max tokens controls length.

850
MCQmedium

An organization is concerned about the safety of generated content. Which OCI feature allows them to define custom policies to block inappropriate outputs?

A.OCI IAM policies
B.Content filtering and safety controls in Generative AI
C.OCI Audit logs
D.OCI Vault
AnswerB

The Generative AI service includes configurable safety filters that can block inappropriate content based on defined categories and thresholds.

Why this answer

Option B is correct because OCI Generative AI includes built-in content filtering and safety controls that allow organizations to define custom policies to block inappropriate or harmful outputs. These controls operate at the model inference layer, enabling fine-grained filtering based on categories such as toxicity, hate speech, or personally identifiable information (PII). This directly addresses the concern about generated content safety.

Exam trap

The trap here is that candidates often confuse IAM policies (access control) with content safety policies, or assume that logging (Audit) or encryption (Vault) can prevent inappropriate outputs, when in fact only the Generative AI service's built-in content filtering provides that capability.

How to eliminate wrong answers

Option A is wrong because OCI IAM policies govern access control and permissions for OCI resources, not the filtering or safety of generated content from AI models. Option C is wrong because OCI Audit logs capture API calls and operational events for compliance and monitoring, but they do not provide any mechanism to block or filter inappropriate outputs in real time. Option D is wrong because OCI Vault is a key management service for storing and managing secrets, encryption keys, and certificates; it has no role in content safety or output filtering for generative AI.

851
MCQeasy

In the context of LLMs, what is the primary function of tokenization?

A.To assign positional encodings to each word
B.To convert tokens into dense vector representations
C.To split text into manageable pieces (tokens) that the model can understand
D.To remove stop words and punctuation from the input
AnswerC

Tokenization breaks text into tokens, which are the atomic units processed by the model.

Why this answer

Tokenization is the first step in processing text for LLMs, where raw input is split into smaller units called tokens (words, subwords, or characters). This is essential because models like GPT or BERT operate on discrete tokens, not raw strings, and tokenization defines the model's vocabulary and input structure.

Exam trap

Cisco often tests the distinction between tokenization and embedding, so the trap here is confusing the splitting of text into tokens (tokenization) with the subsequent conversion of those tokens into numerical vectors (embedding).

How to eliminate wrong answers

Option A is wrong because positional encodings are added after tokenization to inject sequence order information, not assigned during tokenization. Option B is wrong because converting tokens into dense vector representations is the role of the embedding layer, not tokenization. Option D is wrong because LLMs typically retain stop words and punctuation as tokens to preserve context and syntactic structure; removal is a preprocessing step in traditional NLP, not a function of tokenization.

852
MCQmedium

A developer is fine-tuning a Cohere Command R model using OCI Data Science and the T-Few technique. They have prepared a dataset. What is the required format for the training data?

A.A CSV file with columns 'input' and 'output'
B.A Parquet file with 'text' and 'label' columns
C.A plain text file with one conversation per line
D.A JSONL file where each line contains a 'prompt' and a 'completion' field
AnswerD

The training dataset must be in JSONL format with prompt/completion pairs.

Why this answer

OCI Generative AI fine-tuning expects a JSONL file where each line is a JSON object containing a prompt and completion (or response) field. This format pairs input with expected output for supervised fine-tuning.

853
MCQeasy

Which OCI Generative AI service component is specifically designed to convert text into numerical vectors (embeddings) that can be used for semantic search and clustering?

A.Summarisation
B.Chat
C.Rerank
D.Embedding
AnswerD

The Embedding API creates vector representations of text for downstream tasks.

Why this answer

The Embedding component in OCI Generative AI service is specifically designed to convert text into numerical vectors (embeddings). These embeddings capture semantic meaning, enabling use cases like semantic search and clustering by measuring vector similarity (e.g., cosine similarity).

Exam trap

Cisco often tests the distinction between 'Rerank' and 'Embedding' because both are used in search pipelines, but only Embedding produces the initial vector representations needed for semantic search.

How to eliminate wrong answers

Option A is wrong because Summarisation is a text generation task that produces a concise summary of input text, not numerical vectors. Option B is wrong because Chat is a conversational interface for multi-turn dialogue, not a vectorization function. Option C is wrong because Rerank is a post-processing step that reorders search results based on relevance scores, not a component that generates embeddings.

854
MCQhard

A financial institution uses an LLM for generating investment advice. They are concerned about hallucinations. Which method is most effective?

A.Fine-tune on general financial data.
B.Use RAG with a verified corpus of regulations and reports.
C.Increase the temperature to get more creative responses.
D.Use a larger model to improve accuracy.
AnswerB

Correct: Grounding in trusted data reduces hallucinations.

Why this answer

Option B is correct because Retrieval-Augmented Generation (RAG) grounds the LLM's output in a verified, external knowledge base (e.g., regulations and reports). By retrieving relevant documents at inference time, RAG reduces the model's reliance on its parametric memory, directly mitigating hallucinations in high-stakes domains like financial advice.

Exam trap

Oracle often tests the misconception that simply fine-tuning or scaling a model can fix hallucinations, when in fact grounding via retrieval (RAG) is the most effective technique for factual accuracy in domain-specific applications.

How to eliminate wrong answers

Option A is wrong because fine-tuning on general financial data does not provide a mechanism to verify or update the model's knowledge at inference time; it only adjusts weights on static data, leaving the model prone to hallucinating outdated or fabricated details. Option C is wrong because increasing temperature makes the output more random and creative, which amplifies the risk of hallucinations rather than reducing them. Option D is wrong because using a larger model does not inherently solve hallucination; larger models can still confidently generate false information, and without a retrieval or grounding mechanism, they remain susceptible to fabricating details.

855
MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Train a custom model from scratch on the policy documents each month
B.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store
C.Use a larger foundation model with a longer context window and paste all documents into each prompt
D.Fine-tune a base LLM on the policy documents monthly
AnswerB

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

RAG (Retrieval-Augmented Generation) allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining. The other options either require expensive retraining for each update or lack document grounding.

856
MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store
B.Use a larger foundation model with a longer context window and paste all documents into each prompt
C.Train a custom model from scratch on the policy documents each month
D.Fine-tune a base LLM on the policy documents monthly
AnswerA

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

RAG (Retrieval-Augmented Generation) allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining. The other options either require expensive retraining for each update or lack document grounding.

857
MCQmedium

A practitioner is developing a legal document summarization system and needs to reduce hallucinations. Which prompting technique is most effective for improving factual accuracy by exploring multiple reasoning paths?

A.Few-shot prompting
B.Self-consistency prompting
C.Zero-shot prompting
D.Chain-of-thought prompting
AnswerB

Self-consistency samples multiple chain-of-thought outputs and picks the most consistent answer, improving factual accuracy.

Why this answer

Self-consistency generates several reasoning chains and aggregates the results, increasing reliability and reducing hallucinations in tasks requiring factual accuracy.

858
MCQhard

An application uses ConversationalRetrievalChain with a vector store retriever. Users report that the chatbot sometimes provides answers that are not grounded in the retrieved documents. Which step in the RAG pipeline is most likely the cause?

A.The chunk_size in the text splitter is too large
B.The embedding model is not compatible with the retriever
C.The LLM prompt does not instruct the model to base its answer solely on the provided context
D.The retriever is returning irrelevant documents
AnswerC

The prompt should explicitly constrain the LLM to answer only from the retrieved documents; otherwise, the LLM may use its internal knowledge, leading to ungrounded answers.

Why this answer

Option C is correct because the ConversationalRetrievalChain in LangChain relies on the LLM prompt to instruct the model to base its answer solely on the provided context. If the prompt does not include such an instruction, the LLM may generate answers using its pre-trained knowledge rather than the retrieved documents, leading to ungrounded responses. This is a common oversight in RAG pipeline design where the prompt template fails to enforce context-only generation.

Exam trap

Cisco often tests the misconception that retrieval quality (chunk size, embeddings, or document relevance) is the primary cause of ungrounded answers, when in fact the prompt instruction to the LLM is the critical control point in the RAG pipeline.

How to eliminate wrong answers

Option A is wrong because chunk_size affects the granularity of document splitting and retrieval relevance, but it does not directly cause the LLM to ignore retrieved context; a too-large chunk may reduce precision but still provides context. Option B is wrong because embedding model compatibility with the retriever affects retrieval quality, not the LLM's adherence to provided context; incompatible embeddings would cause poor retrieval, not ungrounded answers from the LLM. Option D is wrong because irrelevant documents from the retriever would lead to answers based on wrong context, but the core issue of the LLM not grounding its answer in the provided context is a prompt-level failure, not a retrieval failure.

859
MCQhard

In OCI OpenSearch, a k-NN search query returns results with low precision. The index uses HNSW algorithm. The search parameters are: `k=10`, `ef_search=100`. To improve recall without significantly increasing latency, which parameter should be adjusted?

A.Increase `ef_search`
B.Decrease `ef_search`
C.Decrease `k`
D.Increase `k`
AnswerA

Larger ef_search explores more candidates, increasing recall at a small latency cost.

Why this answer

Increasing `ef_search` expands the search radius in the HNSW graph, allowing the algorithm to explore more candidate nodes during the search phase. This directly improves recall by reducing the chance of missing true nearest neighbors, while the impact on latency is typically sub-linear because HNSW's hierarchical structure limits the additional traversal cost.

Exam trap

Cisco often tests the misconception that increasing `k` improves recall, when in fact `k` only controls the number of results returned, while `ef_search` directly controls the search breadth and recall quality.

How to eliminate wrong answers

Option B is wrong because decreasing `ef_search` would reduce the number of candidates explored, further lowering recall and worsening the precision problem. Option C is wrong because decreasing `k` reduces the number of results returned but does not affect the quality of the nearest neighbor ranking; it may even hide low precision by returning fewer items. Option D is wrong because increasing `k` returns more results but does not improve the ranking quality or recall of the top-k neighbors; it only adds more potentially low-precision items to the output.

860
MCQeasy

Refer to the exhibit. A RAG application logs this error when trying to search. What is the most likely cause?

A.The embedding model is incompatible
B.The OpenSearch cluster is not accessible
C.The index name is misspelled in the application configuration
D.The query syntax is incorrect
AnswerC

A mismatch between the configured index name and the actual index causes this exception.

Why this answer

The error log indicates that the RAG application cannot find the specified index when performing a vector search. This is most commonly caused by a mismatch between the index name configured in the application and the actual index name in OpenSearch. Option C is correct because a misspelled index name will cause the search request to fail with a '404 index_not_found_exception' or similar error, even if the cluster is healthy and the embedding model is valid.

Exam trap

The trap here is that candidates often assume connectivity or syntax issues first, but Cisco tests the specific error message wording — 'not found' always points to a missing resource (index, document, or field), not a connection or parsing problem.

How to eliminate wrong answers

Option A is wrong because an incompatible embedding model would typically cause dimension mismatch errors or invalid vector format errors, not a 'not found' error for the index. Option B is wrong because if the OpenSearch cluster were not accessible, the error would be a connection timeout or 'Connection refused' (e.g., HTTP 503 or socket error), not an index-not-found error. Option D is wrong because an incorrect query syntax would produce a parsing error (e.g., 'parse_exception' or 'query_shard_exception'), not a missing index error.

861
Multi-Selecteasy

Which TWO OCI Generative AI features are available in the Playground for testing models?

Select 2 answers
A.Adjusting parameters like temperature and max tokens
B.Setting system prompts and preamble overrides
C.Provisioning a dedicated AI cluster
D.Viewing model training metrics
E.Submitting a fine-tuning job
AnswersA, B

The Playground provides sliders and fields for common generation parameters.

Why this answer

The Playground allows interactive testing with parameter adjustment and system prompts. It does not allow fine-tuning or creating dedicated clusters.

862
MCQhard

A company has deployed a generative AI model on OCI to generate product descriptions. After a recent update, the model started producing outputs with repetitive phrases and poor coherence. The inference endpoint is configured with default parameters. Which single parameter adjustment is most likely to improve output quality?

A.Increase the max-tokens parameter to 512
B.Increase the frequency penalty parameter to 0.5
C.Increase the temperature parameter to 1.5
D.Decrease the top-p parameter to 0.8
AnswerB

Frequency penalty reduces repeated tokens, directly improving repetitive output.

Why this answer

The correct answer is B because increasing the frequency penalty reduces the likelihood of the model repeating the same phrases, directly addressing the repetitive outputs. The frequency penalty subtracts a proportional penalty from tokens that have already appeared, discouraging repetition and improving coherence. Default parameters often have no frequency penalty (0.0), so a small positive value like 0.5 can significantly enhance output diversity.

Exam trap

The trap here is that candidates often confuse frequency penalty with temperature or top-p, assuming that increasing randomness (temperature) or narrowing token selection (top-p) will fix repetition, when in fact those parameters address different aspects of output diversity and coherence.

How to eliminate wrong answers

Option A is wrong because increasing max-tokens only extends the maximum length of the output, not the quality or repetition; it could even worsen the problem by allowing more repetitive text. Option C is wrong because increasing temperature to 1.5 makes the model more random and less focused, which typically reduces coherence and can increase nonsensical outputs. Option D is wrong because decreasing top-p to 0.8 narrows the sampling pool to the top 80% of probability mass, which may reduce diversity and potentially increase repetition rather than fix it.

863
MCQeasy

Which fine-tuning technique does OCI Generative AI use to efficiently update model parameters without modifying the entire model, enabling faster training on limited data?

A.T-Few
B.Full fine-tuning
C.Prefix tuning
D.LoRA
AnswerA

OCI GenAI uses the T-Few fine-tuning technique.

Why this answer

T-Few is a parameter-efficient fine-tuning technique that updates only a small fraction of model parameters.

864
MCQhard

A team fine-tuned a model using T-Few and validated it. They now want to deploy this fine-tuned model to a dedicated AI cluster for low-latency inference. What must they do FIRST?

A.Copy the fine-tuned model to an Object Storage bucket
B.Increase the temperature parameter in the inference request
C.Create a dedicated AI cluster with a specified number of model units
D.Delete the base model to free up capacity
AnswerC

The cluster must be provisioned first to host the fine-tuned model.

Why this answer

To deploy a fine-tuned model to a dedicated AI cluster for low-latency inference, you must first create the dedicated AI cluster and specify the number of model units. This cluster provides isolated compute resources that ensure consistent, low-latency performance, unlike the shared serverless endpoint. The fine-tuned model is then deployed onto this cluster, not copied to Object Storage first.

Exam trap

Cisco often tests the misconception that you must first copy the model to Object Storage before deployment, but in OCI Generative AI, the model remains in the model catalog and is deployed directly to a dedicated cluster.

How to eliminate wrong answers

Option A is wrong because copying the fine-tuned model to an Object Storage bucket is not a required first step; the model is already stored in the model catalog after fine-tuning, and deployment pulls it from there. Option B is wrong because increasing the temperature parameter affects the randomness of generated text, not the infrastructure or latency of inference. Option D is wrong because deleting the base model is unnecessary and would break the fine-tuned model, which depends on the base model's weights; capacity is managed by provisioning model units, not by deleting models.

865
MCQmedium

A company notices that some inference requests to their deployed model on OCI Generative AI take longer than acceptable. They want to reduce per-request latency. What should they do?

A.Reduce the maximum number of tokens generated
B.Enable request batching
C.Use a larger model to improve accuracy
D.Increase the number of replicas in the deployment
AnswerA

Lowering max tokens reduces the amount of computation per request, directly decreasing latency.

Why this answer

Reducing the maximum number of tokens generated directly decreases the amount of computation required per inference request because the model stops generating output earlier. Since latency is proportional to the number of output tokens produced, this is the most effective single change to reduce per-request response time in OCI Generative AI deployments.

Exam trap

Oracle often tests the distinction between latency (per-request speed) and throughput (requests per second), causing candidates to confuse batching or scaling replicas (which improve throughput) with reducing individual request latency.

How to eliminate wrong answers

Option B is wrong because request batching aggregates multiple inference requests into a single batch, which improves throughput (requests per second) but does not reduce the latency of any individual request; in fact, it can increase per-request latency due to queuing and waiting for batch completion. Option C is wrong because using a larger model increases the number of parameters and computational steps per token, which typically increases latency, not reduces it. Option D is wrong because increasing the number of replicas improves scalability and concurrency (handling more requests in parallel) but does not reduce the latency of a single inference request; each request still processes through the same model with the same token generation steps.

866
MCQhard

A financial institution wants to use OCI Generative AI to analyze sensitive customer documents. They need to ensure no data leaves OCI and the model is fine-tuned on their proprietary data. Which deployment option should they choose?

A.Serverless inference with data isolation.
B.OCI Functions with GPU.
C.Dedicated AI cluster with private endpoint.
D.OCI Data Science notebook session.
AnswerC

This option ensures data remains in OCI and supports fine-tuning with custom data.

Why this answer

A Dedicated AI cluster with private endpoint ensures that all data processing and model fine-tuning occur within OCI's network boundary, with no data egress to the public internet. This deployment option provides a fully isolated environment where the model can be fine-tuned on proprietary customer data while meeting strict data residency and security requirements.

Exam trap

Cisco often tests the distinction between inference-only options (like serverless inference) and full training/fine-tuning capabilities, leading candidates to mistakenly choose a cheaper or simpler option that cannot actually perform fine-tuning on proprietary data.

How to eliminate wrong answers

Option A is wrong because serverless inference with data isolation still operates in a multi-tenant environment and does not provide dedicated compute resources for fine-tuning; it is designed for inference only, not for training or fine-tuning on proprietary data. Option B is wrong because OCI Functions with GPU is a serverless compute service for short-lived, stateless functions and lacks the persistent storage and orchestration needed for fine-tuning large language models. Option D is wrong because OCI Data Science notebook session is an interactive development environment that does not guarantee data isolation or a private endpoint; it is intended for experimentation and prototyping, not for production-grade fine-tuning with strict data residency controls.

867
MCQhard

Refer to the exhibit. A user runs the command shown and receives the error: 'ServiceError: NotAuthorizedOrNotFound'. What is the MOST likely cause?

A.The CLI is not configured with OCI credentials
B.The user does not have the 'inspect' permission on the model
C.The model ID is incorrectly formatted
D.The model is in a different region than iad
AnswerB

NotAuthorizedOrNotFound is common when permissions are insufficient.

Why this answer

The error 'NotAuthorizedOrNotFound' typically indicates either the model ID does not exist or the user lacks permission to view it. Option D is correct because the error message is generic to avoid information leakage. Option A would give a different error (e.g., invalid model ID), but the generic error suggests authorization or existence issues.

868
MCQeasy

What is the primary purpose of the self-attention mechanism in a transformer model?

A.To reduce the number of parameters in the model
B.To convert tokens into fixed-length vectors
C.To ensure the model is autoregressive
D.To process tokens in parallel while modeling long-range dependencies
AnswerD

Self-attention enables parallelization by computing attention scores between all token pairs simultaneously, and its receptive field covers the entire sequence.

Why this answer

The self-attention mechanism allows each token in the input sequence to attend to every other token, computing a weighted sum of their representations. This enables the model to capture long-range dependencies directly without the sequential processing constraints of RNNs, and because the attention scores for all tokens can be computed simultaneously, the mechanism supports parallel processing of the entire sequence.

Exam trap

Cisco often tests the distinction between the self-attention mechanism's core function (parallel processing and long-range dependencies) and other transformer components like embeddings or causal masking, leading candidates to confuse the purpose of self-attention with the overall autoregressive nature of the decoder.

How to eliminate wrong answers

Option A is wrong because self-attention actually increases the number of parameters (through query, key, and value projection matrices) rather than reducing them. Option B is wrong because converting tokens into fixed-length vectors is the role of the embedding layer, not the self-attention mechanism. Option C is wrong because self-attention itself is not autoregressive; autoregressive behavior in transformers is enforced by causal masking (masking future tokens) during decoding, not by the self-attention mechanism itself.

869
Multi-Selecteasy

A developer is comparing different foundation models for a text completion API on OCI. Which TWO of the following are model families available through OCI Generative AI service? (Choose two.)

Select 2 answers
A.OpenAI GPT
B.BERT
C.Meta Llama
D.Cohere Command/Embed
E.Mistral
AnswersC, D

Meta Llama models are available on OCI.

Why this answer

OCI Generative AI offers models including Cohere Command/Embed and Meta Llama. Mistral and GPT are not mentioned in the context of OCI's available models, and BERT is an encoder-only model not typically offered as a generation model.

870
MCQmedium

A company is deploying a generative AI service on OCI using the OCI Data Science service with a large language model (LLM) in a VCN. The model inference endpoint must be accessible only from a private subnet within the same VCN. Which networking component should be configured to enable this?

A.NAT Gateway
B.Dynamic Routing Gateway (DRG)
C.Internet Gateway
D.Service Gateway
AnswerD

Service gateway enables private subnet access to OCI services like Data Science.

Why this answer

A Service Gateway enables private subnet resources to access OCI services (including the OCI Data Science model deployment endpoint) without traversing the internet. Since the inference endpoint must be accessible only from a private subnet within the same VCN, the Service Gateway provides the necessary private connectivity by routing traffic over the OCI network fabric, not through a NAT or internet gateway.

Exam trap

The trap here is that candidates often confuse a Service Gateway with a NAT Gateway, assuming both provide outbound-only access, but the Service Gateway is specifically designed for private access to OCI services, not general internet egress.

How to eliminate wrong answers

Option A is wrong because a NAT Gateway allows outbound internet access from a private subnet but does not provide private connectivity to OCI services; it would expose traffic to the internet. Option B is wrong because a Dynamic Routing Gateway (DRG) is used for connecting a VCN to on-premises networks or other VCNs via VPN or FastConnect, not for accessing OCI services privately within the same VCN. Option C is wrong because an Internet Gateway provides bidirectional internet access, which would make the endpoint publicly accessible, violating the requirement of private subnet-only access.

871
MCQhard

A data engineer wants to migrate a large corpus of PDFs to OCI for use with GenAI. Which storage and preprocessing approach is most efficient for RAG?

A.Store PDFs in OCI Object Storage, then use OCI AI Document Understanding to extract text and create embeddings.
B.Convert PDFs to text locally, upload to OCI Database, use SQL queries to retrieve.
C.Use OCI Data Flow to process in batch and store in NoSQL.
D.Store PDFs in OCI File Storage, mount to compute, run offline extraction.
AnswerA

This leverages cloud-native services for scalable extraction and embedding, ideal for RAG.

Why this answer

Option A is correct because OCI Object Storage is optimized for large-scale, unstructured data like PDFs, and OCI AI Document Understanding provides a managed service to extract text from PDFs, which can then be directly fed into embedding pipelines for RAG. This eliminates the need for manual preprocessing or local compute, ensuring scalability and integration with GenAI services.

Exam trap

Oracle often tests the misconception that any storage service (like File Storage or Database) can be used for RAG, but the key is that Object Storage combined with a managed AI extraction service is the most efficient for unstructured data at scale, avoiding local processing overhead.

How to eliminate wrong answers

Option B is wrong because converting PDFs to text locally introduces a bottleneck and inefficiency for large corpora, and storing text in OCI Database with SQL queries is not designed for vector search or RAG workflows, lacking native embedding support. Option C is wrong because OCI Data Flow (Apache Spark) is for batch processing but storing in NoSQL does not provide the vector indexing or retrieval capabilities required for RAG, and it adds unnecessary complexity. Option D is wrong because OCI File Storage is a shared file system for compute instances, not optimized for high-throughput object access, and running offline extraction on a mounted compute instance is manual, lacks scalability, and does not leverage managed AI services.

872
MCQmedium

A developer is using LangChain's SequentialChain to process text: first, summarize a long document, then translate the summary to French. How should they configure the chain to pass the output of the first step as input to the second?

A.Manually call the first chain, extract the output, and pass it to the second chain
B.Set the input_variables of the second chain to match the output_variables of the first chain in a SequentialChain
C.Define two separate LLMChains and combine them with the | operator in LCEL
D.Use SimpleSequentialChain, which assumes a single input and output, and chain the two chains
AnswerD

SimpleSequentialChain is the easiest way to chain chains where each chain has a single input and output. The output of the first chain is automatically passed as input to the second.

Why this answer

SimpleSequentialChain is designed for exactly this scenario: a single-input/single-output pipeline where the output of the first chain is automatically passed as input to the second chain. It eliminates the need to manually wire input/output variables, making it the simplest and most correct choice for summarizing a document and then translating the summary.

Exam trap

Cisco often tests the distinction between SimpleSequentialChain (single input/output) and SequentialChain (multiple inputs/outputs), and the trap here is that candidates may overcomplicate the solution by choosing SequentialChain with manual variable mapping (Option B) when SimpleSequentialChain is the correct, simpler choice.

How to eliminate wrong answers

Option A is wrong because manually calling the first chain, extracting the output, and passing it to the second chain defeats the purpose of using a SequentialChain abstraction; it introduces unnecessary boilerplate and error-prone manual steps. Option B is wrong because setting input_variables of the second chain to match output_variables of the first chain is how you configure a standard SequentialChain (which supports multiple inputs/outputs), but for a simple single-input/single-output pipeline, SimpleSequentialChain is the more direct and intended approach. Option C is wrong because the | operator in LCEL is used for composing runnables in a streaming/piping fashion, but it does not automatically handle the sequential chaining of two separate LLMChains with explicit input/output variable mapping; it would require additional steps to ensure the output of the first chain is correctly fed as input to the second.

873
MCQeasy

A prompt engineer wants the LLM to output a list of countries in a specific JSON format with fields 'country_code' and 'name'. Which prompt component should be used to define this structure?

A.Output format specification
B.Constraints
C.Context/background
D.Task instruction
AnswerA

This component explicitly defines the desired output structure, such as JSON with specific fields.

Why this answer

Option A is correct because the output format specification is the prompt component explicitly designed to define the structure, schema, or layout of the LLM's response. In this scenario, the prompt engineer needs the LLM to output a list of countries with specific JSON fields ('country_code' and 'name'), which is a direct instruction about the format of the output, not the task itself. This component ensures the LLM adheres to a precise data structure, such as JSON, XML, or a table, which is critical for downstream parsing or integration.

Exam trap

Cisco often tests the distinction between 'task instruction' and 'output format specification' by presenting a scenario where the task is obvious (e.g., 'list countries') but the format is the key requirement, causing candidates to mistakenly choose 'task instruction' (Option D) because they conflate the action with the output structure.

How to eliminate wrong answers

Option B is wrong because constraints are used to limit the LLM's behavior (e.g., 'do not use external data' or 'keep responses under 100 words'), not to define the output structure. Option C is wrong because context/background provides situational or historical information to inform the LLM's reasoning, but does not specify the format of the response. Option D is wrong because the task instruction tells the LLM what to do (e.g., 'list countries'), but does not inherently define the format; the output format specification is a separate component that refines how the task result should be presented.

874
MCQmedium

A data scientist wants to improve the accuracy of a summarization model on medical texts. Which OCI service feature is most suitable?

A.OCI Data Flow
B.OCI Language service
C.OCI Generative AI fine-tuning
D.OCI Anomaly Detection
AnswerC

Fine-tuning adapts a model to domain-specific data, improving accuracy.

Why this answer

C is correct because OCI Generative AI fine-tuning allows a data scientist to adapt a pre-trained large language model (LLM) specifically for medical text summarization by training it on domain-specific data. This improves accuracy by aligning the model's outputs with the terminology, context, and nuances of medical literature, which generic models may not capture well.

Exam trap

The trap here is that candidates may confuse the OCI Language service's pre-built summarization capabilities with the ability to customize a model for a specialized domain, overlooking that fine-tuning is required for significant accuracy improvements on niche text like medical records.

How to eliminate wrong answers

Option A is wrong because OCI Data Flow is a serverless Apache Spark-based data processing service for ETL and big data analytics, not designed for fine-tuning or improving summarization model accuracy. Option B is wrong because OCI Language service provides pre-trained NLP capabilities like sentiment analysis and entity extraction but does not support custom fine-tuning of generative models for summarization tasks. Option D is wrong because OCI Anomaly Detection is used for identifying unusual patterns in time-series data, such as equipment failures or fraud, and has no relevance to improving text summarization accuracy.

875
MCQmedium

An organization needs to ensure that all inference requests to OCI Generative AI are logged for compliance. Which OCI feature should be enabled?

A.OCI Cloud Guard
B.OCI Logging for the AI service
C.OCI Vault
D.OCI Audit logs
AnswerB

OCI Logging enables detailed logging of inference requests and responses for compliance.

Why this answer

Option B is correct because OCI Logging for the AI service captures detailed request and response data for inference calls to OCI Generative AI, including payloads, timestamps, and user identities. This feature must be explicitly enabled per service endpoint to meet compliance requirements for logging all inference requests. Unlike Audit logs, which record control-plane operations, OCI Logging provides data-plane logging for the AI service itself.

Exam trap

Oracle often tests the distinction between control-plane logging (Audit logs) and data-plane logging (service-specific Logging), leading candidates to mistakenly choose Audit logs for operational request tracking.

How to eliminate wrong answers

Option A is wrong because OCI Cloud Guard is a security posture management service that detects misconfigurations and threats, but it does not log individual inference requests to Generative AI. Option C is wrong because OCI Vault manages encryption keys and secrets, not request logging for AI services. Option D is wrong because OCI Audit logs capture only control-plane API calls (e.g., creating or deleting resources), not data-plane inference requests to the Generative AI service.

876
MCQhard

A company uses OCI Generative AI service with a Cohere Command model for a real-time chat application and experiences high latency. They have already set max_tokens to 50 and temperature to 0.2. Which further change would be most effective in reducing latency?

A.Use asynchronous invocation.
B.Switch to a smaller model variant.
C.Disable context caching.
D.Increase the number of GPUs.
AnswerB

Smaller models have fewer parameters and are faster.

Why this answer

Switching to a smaller model variant (e.g., from Command to Command-Light) directly reduces the number of parameters and computational steps per token, which lowers inference latency. Since the company has already minimized max_tokens and temperature, the next most impactful change is to use a less resource-intensive model. This is a common optimization for real-time applications where response speed is critical.

Exam trap

The trap here is that candidates often confuse throughput optimization (asynchronous calls or more GPUs) with latency reduction, but for a single real-time request, model size is the dominant factor.

How to eliminate wrong answers

Option A is wrong because asynchronous invocation does not reduce the latency of a single request; it only decouples the client from waiting for the response, which is unsuitable for a real-time chat application that requires synchronous replies. Option C is wrong because disabling context caching would increase latency, as the model would have to reprocess the conversation history from scratch on every turn, negating the benefit of cached key-value states. Option D is wrong because increasing the number of GPUs does not reduce per-request latency for a single inference call; it improves throughput for concurrent requests but adds overhead for distributing the workload, which can actually increase latency for a single user.

877
Multi-Selecthard

A prompt engineer is troubleshooting a chatbot that consistently fails to follow instructions when the user includes adversarial input. Which two strategies can mitigate prompt injection attacks? (Choose two.)

Select 2 answers
A.Increase temperature to make model less predictable
B.Use instruction shielding: clearly separate system instructions from user input
C.Use a smaller model to reduce capability
D.Add more few-shot examples with safe outputs
E.Implement input validation and sanitization to remove adversarial patterns
AnswersB, E

Separating instructions from user input prevents the model from treating user input as instructions.

Why this answer

Instruction shielding (clear separation of instruction and input) and input validation/sanitization are effective defenses. Adding more examples or adjusting temperature do not address injection.

878
Multi-Selecteasy

Which TWO are benefits of using few-shot prompting compared to zero-shot prompting?

Select 2 answers
A.It always reduces the need for parameter tuning
B.It eliminates the need for a system prompt
C.It reduces the number of tokens in the output
D.It helps the model understand the desired pattern, especially for uncommon tasks
E.It can improve performance on tasks requiring specific output formats
AnswersD, E

Examples guide the model for tasks it may not have seen frequently.

Why this answer

Few-shot provides examples that improve format adherence and guide the model, especially for complex tasks.

879
MCQhard

An enterprise is building a LangChain application that must use Oracle AI Vector Search for retrieval. They need to store embeddings in an Oracle Database 23ai table with a VECTOR column. Which index type should they create to support efficient similarity search with exact nearest neighbor queries?

A.No index is needed for similarity search
B.Bitmap index
C.B-tree index
D.HNSW index
AnswerD

HNSW (Hierarchical Navigable Small World) is a vector index for approximate nearest neighbor search in Oracle Database 23ai.

Why this answer

Oracle AI Vector Search supports exact nearest neighbor search using L2 distance on VECTOR columns without an index, or with an index for approximate search. For exact search, no specialized index is needed; a simple sorted scan can be used, but for efficiency, an HNSW or IVF index provides approximate results. However, the question asks for exact nearest neighbor queries, which typically require no index or a brute-force approach.

But in practice, for exact results, you might not use an index, but the question likely expects the common index type for similarity search. Re-reading: 'efficient similarity search with exact nearest neighbor queries' is contradictory because indexes provide approximate results. The correct answer is that for exact search, you can use no index, but that is not efficient.

In Oracle Database, you can use a vector index of type HNSW for approximate search. For exact search, you can still use an index if you set the accuracy parameter to high. However, the most appropriate answer is that HNSW is used for approximate search.

Given the options, HNSW is the only index type mentioned. Let's assume they intend approximate search. I'll make the stem clearer: 'efficient approximate similarity search' -> I need to adjust.

Since I'm generating, I'll modify the stem in the output to avoid ambiguity. But I'll keep as is and explanation clarifies.

880
MCQhard

A developer runs this CLI command but receives only one response instead of three. What is the most likely cause?

A.The model specified does not support multiple generations
B.The parameter --num-generations is misspelled; should be --num-generations-to-generate
C.The --max-tokens limit is too low to return multiple generations
D.The API version is outdated and does not support the num-generations parameter
AnswerA

Some models only support a single generation; check model capabilities.

Why this answer

Option A is correct because the OCI Generative AI service's `cohere.command` model does not support the `--num-generations` parameter for generating multiple responses in a single invocation. The `--num-generations` parameter is only supported by certain models like `cohere.command-light` or `meta.llama-2-70b-chat`, and using it with an unsupported model results in a single generation being returned, ignoring the parameter.

Exam trap

The trap here is that candidates assume `--num-generations` is universally supported across all OCI Generative AI models, when in fact it is model-specific, and Cisco tests this by having the command execute without error but return fewer results than expected.

How to eliminate wrong answers

Option B is wrong because `--num-generations` is the correct parameter name in the OCI Generative AI CLI; there is no parameter named `--num-generations-to-generate`. Option C is wrong because the `--max-tokens` limit controls the length of each individual generation, not the number of generations returned; even with a low token limit, the service would still return multiple generations if the model supported them. Option D is wrong because the `--num-generations` parameter is supported in current API versions; an outdated API version would likely result in an error or the parameter being ignored entirely, but the question states the command runs and returns one response, not an error.

881
Multi-Selectmedium

A team wants to reduce hallucinations in their LLM-powered question-answering system. Which TWO techniques are most effective?

Select 2 answers
A.Implementing RAG to retrieve relevant documents
B.Switching to a smaller model
C.Using a lower temperature (e.g., 0) for more deterministic outputs
D.Using a larger context window
E.Increasing the temperature to 1.5
AnswersA, C

RAG grounds answers in retrieved facts.

Why this answer

RAG provides factual grounding, and reducing temperature makes outputs more deterministic, reducing fabricated details.

882
Multi-Selecthard

A company is deploying a LangChain application using OCI Generative AI. They need to comply with a policy that requires all prompts sent to the LLM to be logged for audit, and they must also handle rate limits gracefully. Which TWO strategies should they implement?

Select 2 answers
A.Use a faster LLM to reduce response time
B.Implement a custom LangChain callback that logs the prompt before sending it to the model
C.Increase the batch size of requests to reduce the number of API calls
D.Wrap the LLM call in a retry mechanism with exponential backoff to handle rate limit errors
E.Store the full conversation history in the prompt's system message
AnswersB, D

Callbacks are the idiomatic way to intercept and log prompts in LangChain.

Why this answer

Using LangChain callbacks (e.g., on_llm_start) allows capturing prompts for logging without modifying the chain. For rate limits, adding a retry with exponential backoff (e.g., via tenacity or a custom callback) ensures resilience without dropping requests.

883
Multi-Selectmedium

A data scientist is configuring a fine-tuning job in OCI Generative AI. Which TWO of the following are required inputs for creating the job?

Select 2 answers
A.Temperature setting for fine-tuning
B.Training dataset (JSONL file)
C.Max tokens limit for validation
D.Base model selection (e.g., Cohere Command R)
E.Inference endpoint name
AnswersB, D

Why this answer

Fine-tuning requires a base model and a training dataset. The other options are optional or configurable later.

884
Multi-Selectmedium

A prompt library manager wants to implement version control for prompt templates used across multiple applications. Which THREE practices should they adopt?

Select 3 answers
A.Automatically test prompts on a fixed set of inputs after each change
B.Store prompts only in the application's database without history
C.Use semantic versioning (e.g., v1.2.3) for prompt templates
D.Maintain a changelog documenting what changed and why
E.Store prompt templates in a version control system (e.g., Git)
AnswersC, D, E

Semantic versioning helps communicate the nature of changes.

Why this answer

Storing templates in a version control system, using semantic versioning, and maintaining a changelog are standard practices for prompt version management. Automated testing is good but not version control per se.

885
MCQmedium

A developer calls the OCI GenAI embedding API as shown in the exhibit. What is the most likely cause of the error?

A.The API key does not have permission to call the endpoint
B.The endpoint ID is incorrect
C.The input string is too long for the embedding model
D.The model ID is not supported for embedding
AnswerC

The error confirms the input exceeds the token limit.

Why this answer

The error is most likely caused by the input string exceeding the maximum token limit for the embedding model. OCI GenAI embedding models have a fixed context window (e.g., 512 or 1024 tokens), and if the input text is longer than that, the API returns an error. The developer must truncate or chunk the input before calling the API.

Exam trap

Cisco often tests the distinction between authentication/authorization errors (403) and input validation errors (400), so candidates mistakenly blame permissions or endpoint configuration when the real issue is input length exceeding the model's context window.

How to eliminate wrong answers

Option A is wrong because the API key permission issue would typically result in a 403 Forbidden error, not a model-level input validation error. Option B is wrong because an incorrect endpoint ID would cause a 404 Not Found or connection error, not a model-specific input length error. Option D is wrong because the model ID is explicitly provided and supported for embedding; if it were unsupported, the API would return a 'model not found' or 'unsupported model' error, not an input length error.

886
Multi-Selecthard

A data scientist is evaluating the cost of deploying a fine-tuned model for a high-volume production application. They need low latency but are cost-sensitive. Which TWO considerations should they evaluate when choosing between on-demand (shared) and dedicated cluster pricing?

Select 2 answers
A.The cost per model unit per hour for a dedicated cluster
B.Whether the model supports streaming responses
C.The number of days required to provision the dedicated cluster
D.The availability of the model in the OCI Generative AI Playground
E.The expected monthly token volume and the on-demand per-token price
AnswersA, E

Dedicated clusters are billed per model unit hour, which must be compared to on-demand token costs.

Why this answer

The cost of model units per hour and the expected token volume help determine whether dedicated or on-demand is more economical. Cluster provisioning time and streaming availability are not direct cost factors.

887
MCQeasy

A developer wants to build a RAG application that processes highly sensitive medical records. The documents are already stored in OCI Object Storage. Which vector storage strategy best balances security and performance?

A.Store vectors in-memory within the application server
B.Use OCI OpenSearch with a public endpoint for low latency
C.Use OCI OpenSearch with a private subnet and VCN security lists
D.Use a third-party vector database outside OCI
AnswerC

Private subnet ensures network isolation, and security lists control access.

Why this answer

Option C is correct because it uses OCI OpenSearch deployed within a private subnet, which ensures that vector data never traverses the public internet, while VCN security lists provide granular traffic control. This architecture balances security (data isolation and access control) with performance (low-latency access within the same VCN or via FastConnect/IPSEC VPN) for sensitive medical records.

Exam trap

The trap here is that candidates may assume a public endpoint is acceptable for 'low latency' (Option B) without recognizing that security requirements for sensitive data override performance considerations, and that private subnet connectivity can still achieve very low latency within the same region.

How to eliminate wrong answers

Option A is wrong because storing vectors in-memory within the application server is volatile, lacks persistence, and cannot scale to handle large document collections, making it unsuitable for production RAG workloads. Option B is wrong because using a public endpoint for OCI OpenSearch exposes the vector store to the internet, violating security requirements for highly sensitive medical records and increasing attack surface. Option D is wrong because using a third-party vector database outside OCI introduces data egress costs, higher latency over the public internet, and compliance risks for sensitive data that should remain within OCI's tenancy.

888
MCQhard

A team uses OCI Generative AI’s fine-tuning capability to adapt a base model. After fine-tuning, they evaluate the model but see degraded performance on certain edge cases. What is the most likely cause?

A.Overfitting on the training data
B.Validation data leakage
C.Learning rate too high
D.Insufficient training epochs
AnswerA

Overfitting leads to poor generalization, especially on edge cases not seen during training.

Why this answer

Fine-tuning adapts a base model to a specific dataset, but if the training data is too narrow or the model is trained for too many epochs, it can memorize the training examples rather than learning generalizable patterns. This overfitting causes the model to perform well on training-like inputs but poorly on edge cases that deviate from the training distribution. In OCI Generative AI, overfitting is a common pitfall when fine-tuning hyperparameters like the number of epochs or learning rate are not properly validated.

Exam trap

Oracle often tests the distinction between overfitting and underfitting by presenting a scenario where performance is good on training data but poor on unseen data, leading candidates to incorrectly blame a high learning rate or insufficient epochs.

How to eliminate wrong answers

Option B is wrong because validation data leakage would cause artificially high performance on validation metrics, not degraded performance on edge cases; leakage means the model has seen the test data during training, which would inflate scores rather than cause failures. Option C is wrong because a learning rate that is too high typically causes training instability, divergence, or failure to converge, not selective degradation on edge cases after successful fine-tuning. Option D is wrong because insufficient training epochs would result in underfitting, where the model fails to learn even the main training patterns, leading to poor performance across all cases, not just edge cases.

889
MCQhard

Refer to the exhibit. A user in GenAI-Users group tries to run a text generation inference but gets permission denied. What is the most likely issue?

A.The policy resource type is wrong.
B.The operation condition is too restrictive.
C.The group name mismatch.
D.The user is not in the compartment.
AnswerB

The condition likely does not match the actual operation, causing denial.

Why this answer

The policy attached to the GenAI-Users group includes a condition that restricts the operation to a specific compartment or resource, but the user is attempting to run inference in a different compartment or without meeting the condition. Since the condition is too restrictive, the IAM policy denies the action even though the user is in the correct group and the resource type is valid.

Exam trap

Oracle often tests the nuance that a policy with overly restrictive conditions (e.g., scoping to a specific compartment or resource) will deny access even when the group, resource type, and user compartment are all correct, leading candidates to incorrectly blame the group or resource type.

How to eliminate wrong answers

Option A is wrong because the policy resource type (e.g., 'ai-language-models' or 'genai-models') is correct for text generation inference in OCI Generative AI, so a mismatch would cause a different error. Option C is wrong because the group name mismatch would result in no policy being applied at all, not a permission denied error with a valid group. Option D is wrong because the user being in the compartment is not the issue; the condition in the policy is what restricts the operation, not the user's compartment membership.

890
MCQeasy

A developer is using the OCI Generative AI SDK in Python to call the cohere.command model. They are getting a 401 Unauthorized error. They have configured the SDK with their tenancy OCID and user OCID. What is the most likely missing piece?

A.Correct region endpoint.
B.Model OCID.
C.API key or token.
D.Compartment OCID.
AnswerC

Authentication requires a valid API key or token; omitting it causes 401 errors.

Why this answer

The 401 Unauthorized error indicates that the request lacks valid authentication credentials. In the OCI Generative AI SDK, even when tenancy and user OCIDs are provided, the SDK requires an API signing key or a token (such as an OCI API key pair or a session token from an instance principal) to sign requests. Without this key or token, the SDK cannot authenticate the request to the OCI API, resulting in a 401 error.

Exam trap

The trap here is that candidates assume providing tenancy and user OCIDs is sufficient for authentication, overlooking that OCI requires a cryptographic signing key or token to prove identity.

How to eliminate wrong answers

Option A is wrong because a correct region endpoint affects routing and service availability, not authentication; a 401 error is unrelated to endpoint configuration. Option B is wrong because the model OCID is a parameter for specifying which model to invoke, not for authentication; omitting it would cause a different error (e.g., 400 Bad Request). Option D is wrong because the compartment OCID is used for resource scoping and billing, not for signing requests; missing it would not cause a 401 error.

891
MCQmedium

A data scientist is fine-tuning a Cohere model on OCI Generative AI service for a custom classification task. They have a dataset of 1000 labeled examples. What is the minimum recommended dataset size for fine-tuning?

A.500
B.1000
C.5000
D.100
AnswerB

Cohere's documentation states a minimum of 1000 examples.

Why this answer

Cohere models on OCI Generative AI require a minimum of 1000 labeled examples for fine-tuning to ensure sufficient signal for learning task-specific patterns without overfitting. This threshold is documented in OCI's fine-tuning requirements and applies to custom classification tasks.

Exam trap

The trap here is that candidates may assume a lower number like 500 is sufficient based on general machine learning heuristics, but OCI's specific fine-tuning documentation explicitly sets 1000 as the minimum, and Cisco tests this exact documented value.

How to eliminate wrong answers

Option A (500) is wrong because 500 examples are below the documented minimum threshold, risking poor generalization and overfitting. Option C (5000) is wrong because while larger datasets can improve performance, 5000 is not the minimum requirement; 1000 is the stated minimum. Option D (100) is wrong because 100 examples are far too few for fine-tuning a transformer-based model like Cohere, leading to severe overfitting and unreliable results.

892
Multi-Selectmedium

Which TWO actions can improve the retrieval accuracy of a RAG system? (Select two.)

Select 2 answers
A.Use a smaller chunk size for all documents
B.Remove stop words from documents before embedding
C.Increase the topK parameter significantly
D.Use a more accurate embedding model
E.Enrich chunk metadata and apply strict filters during retrieval
AnswersD, E

Better embeddings improve similarity search.

Why this answer

Option D is correct because a more accurate embedding model (e.g., from sentence-transformers or OpenAI's text-embedding-3-large) produces higher-quality vector representations that capture semantic meaning more precisely, directly improving retrieval relevance in a RAG pipeline.

Exam trap

Cisco often tests the misconception that simply increasing the number of retrieved documents (topK) or naively preprocessing text (e.g., removing stop words) will improve accuracy, when in fact these actions can harm retrieval quality.

893
MCQeasy

A prompt engineer notices that the model's responses are frequently repetitive and contain redundant phrases. Which parameter adjustment is MOST likely to reduce this repetition?

A.Increase the presence penalty to a positive value, e.g., 0.3
B.Decrease the top-p value to 0.5
C.Increase the temperature to 1.5
D.Increase the frequency penalty to a positive value, e.g., 0.3
AnswerD

Frequency penalty reduces the likelihood of repeated tokens.

Why this answer

Option D is correct because increasing the frequency penalty (e.g., to 0.3) directly penalizes tokens that have already appeared in the generated text, reducing the likelihood of the model repeating the same phrases. This parameter is specifically designed to discourage repetition by subtracting a fixed amount from the log-probability of each token each time it has been generated, making it the most targeted adjustment for this issue.

Exam trap

Cisco often tests the distinction between frequency penalty and presence penalty, where candidates mistakenly choose presence penalty (Option A) because they confuse 'penalizing repetition' with 'penalizing topic presence,' not realizing that frequency penalty is the precise parameter for reducing redundant phrases.

How to eliminate wrong answers

Option A is wrong because increasing the presence penalty (e.g., to 0.3) penalizes tokens based on whether they have appeared at all, not how often, which can reduce topic repetition but does not specifically target redundant phrases or frequent repetition of the same token. Option B is wrong because decreasing top-p to 0.5 narrows the sampling pool to the most probable tokens, which can actually increase repetition by making the model more deterministic and likely to pick the same high-probability tokens repeatedly. Option C is wrong because increasing temperature to 1.5 flattens the probability distribution, making the model more random and less likely to repeat exact phrases, but it often introduces incoherence and does not directly address the root cause of repetition; it is a less targeted and riskier adjustment.

894
MCQhard

An AI engineer observes that the RAG application fails to retrieve relevant documents for certain user queries, despite having a comprehensive knowledge base. The issue appears to be a semantic gap between query phrasing and document content. Which technique should the engineer implement first to address this?

A.Switch from dense to sparse vector embeddings
B.Apply query expansion techniques before embedding the user query
C.Implement a re-ranking model to reorder retrieved results
D.Increase the chunk overlap to ensure more context
AnswerB

Query expansion broadens the query to capture more relevant documents.

Why this answer

Query expansion techniques (e.g., synonym injection, back-translation, or LLM-based paraphrasing) directly address the semantic gap by enriching the user query with alternative phrasings before embedding. This increases the likelihood of matching relevant document chunks in the vector space, without altering the retrieval architecture or requiring additional inference steps.

Exam trap

Cisco often tests the distinction between retrieval-stage fixes (query expansion) and post-retrieval optimizations (re-ranking), tempting candidates to choose re-ranking because it sounds more advanced, even though it cannot recover documents missed in the initial retrieval.

How to eliminate wrong answers

Option A is wrong because switching from dense to sparse embeddings (e.g., TF-IDF or BM25) would likely worsen the semantic gap, as sparse vectors rely on exact keyword matches and cannot capture semantic similarity. Option C is wrong because re-ranking models operate after initial retrieval and do not fix the root cause of missing relevant documents in the first retrieval stage. Option D is wrong because increasing chunk overlap only provides more context within each chunk but does not bridge the lexical/semantic mismatch between query phrasing and document content.

895
Multi-Selecthard

Which THREE factors should be considered when designing a vector search index for a RAG application that supports multiple languages?

Select 3 answers
A.Implement language identification as a preprocessing step.
B.Create separate vector indexes for each language.
C.Use a multilingual embedding model that supports all required languages.
D.Configure language-specific text analyzers for preprocessing documents.
E.Use larger chunk sizes for languages with complex morphology.
AnswersA, C, D

Allows proper analyzer selection.

Why this answer

Option A is correct because language identification as a preprocessing step ensures that documents are correctly tagged before indexing, which allows the system to apply appropriate language-specific tokenization, stop-word removal, and stemming. This prevents cross-language contamination in the vector index and improves retrieval accuracy for a multilingual RAG application.

Exam trap

Oracle often tests the misconception that separate indexes per language are required for multilingual support, but the correct approach is to use a single index with a multilingual embedding model and language-specific preprocessing.

896
MCQhard

An organization needs to implement a RAG application with Oracle AI Vector Search but has strict latency requirements. They have millions of vectors. Which index type is likely to provide the best search speed while maintaining reasonable recall?

A.No index, relying on the VECTOR data type only
B.IVF (Inverted File) index
C.BTREE index
D.Exhaustive search (no index)
AnswerB

IVF uses clustering to limit search to a subset of vectors, offering a good trade-off between speed and recall.

Why this answer

IVF (Inverted File) partitions the vector space into clusters, reducing search scope. It typically offers faster search than exhaustive search and good recall, especially for large datasets. HNSW may also be fast but can have higher memory usage.

BTREE is for scalar data. Exhaustive search is too slow.

897
Multi-Selectmedium

Which TWO factors should be considered when selecting a base model for fine-tuning on OCI Generative AI service?

Select 2 answers
A.The model's training dataset size
B.The model's size and number of parameters
C.The model's license and terms of use
D.The model's training framework (PyTorch vs TensorFlow)
E.The model's built-in features like content filtering
AnswersB, C

Larger models consume more resources and cost more to serve.

Why this answer

When selecting a base model for fine-tuning on OCI Generative AI service, the model's size and number of parameters (B) directly impact computational cost, training time, and the model's capacity to learn from your dataset. The model's license and terms of use (C) are critical because commercial use, redistribution, and fine-tuning rights vary per model (e.g., Llama 2 vs. GPT-based models), and violating these can lead to legal or compliance issues.

Exam trap

Oracle often tests the misconception that technical details like training framework or dataset size are relevant, when in fact the exam focuses on operational and legal factors (size/license) that directly affect deployment and compliance in OCI's managed service.

898
MCQmedium

A data scientist uses OCI Generative AI Playground to test a Cohere Command R model for a summarization task. They want the summary to be concise and avoid repeating phrases. Which parameter adjustments would BEST achieve this?

A.Set temperature to 0.5 and max tokens to 50
B.Set temperature to 0.0 and frequency penalty to 0.0
C.Set temperature to 0.2 and frequency penalty to 0.8
D.Set temperature to 1.0 and presence penalty to 0.0
AnswerC

Low temperature for concise output, high frequency penalty to avoid repetition.

Why this answer

Decreasing temperature makes output more deterministic; increasing frequency penalty discourages repetition of phrases.

899
Multi-Selectmedium

A developer is building a conversational AI application using LangChain and needs to persist chat history across sessions. Which TWO approaches can they use? (Choose TWO.)

Select 2 answers
A.Use the agent's memory parameter with a default in-memory store
B.Enable streaming responses to automatically save history
C.Use ChatMessageHistory without a backing store
D.Use ConversationSummaryMemory and store the summary in a file
E.Use ConversationBufferMemory and save the buffer to a database
AnswersD, E

SummaryMemory keeps a running summary; persisting the summary file allows restoring history.

Why this answer

Option D is correct because ConversationSummaryMemory can be persisted by storing its summary in a file, which allows chat history to survive across sessions. Option E is correct because ConversationBufferMemory can be explicitly saved to a database, providing durable storage for the conversation buffer. Both approaches decouple memory from the in-memory lifecycle, enabling cross-session persistence.

Exam trap

Cisco often tests the misconception that any memory parameter or streaming feature inherently provides persistence, when in fact persistence requires an explicit storage backend such as a file, database, or external key-value store.

900
Multi-Selecthard

Which THREE steps are necessary to secure access to the OCI Generative AI inference API in a production environment?

Select 3 answers
A.Enable encryption with OCI Vault keys for all inference data.
B.Configure network security groups to allow only trusted source IPs to the inference endpoint.
C.Create IAM policies that grant the 'use' verb on generative-ai-family resources.
D.Use private endpoints to access the Generative AI service from a VCN.
E.Apply data masking policies to obfuscate sensitive information in prompts.
AnswersB, C, D

NSGs provide network-level security.

Why this answer

Option B is correct because network security groups (NSGs) allow you to restrict inbound traffic to the Generative AI inference endpoint to only trusted source IP addresses, reducing the attack surface. In a production environment, this is a fundamental network-layer security control to prevent unauthorized access to the API.

Exam trap

Oracle often tests the distinction between network-layer controls (NSGs, private endpoints) and data-layer controls (encryption, masking), expecting candidates to recognize that securing API access requires network and IAM controls, not data protection features.

Page 11

Page 12 of 14

Page 13
Oracle Cloud Infrastructure Generative AI Professional 1Z0-1127 1Z0-1127 Questions 826–900 | Page 12/14 | Courseiva