Knowledge + Practice

Oracle Cloud Infrastructure Generative AI Professional 1Z0-1127 (1Z0-1127) — Questions 226–300

500 questions total · 7pages · All types, answers revealed

Take a mock exam Exam hub

Page 4 of 7

226

MCQhard

A data science team is using OCI Data Science to fine-tune a model. They notice that training jobs are failing due to out-of-memory errors on the notebook session. What should they do to resolve this?

A.Enable autoscaling on the notebook session.

B.Use OCI Data Flow instead.

C.Switch to a larger notebook session shape.

D.Reduce the batch size in the training script.

AnswerC

A larger shape provides more memory, resolving OOM issues.

Why this answer

Out-of-memory errors during training on a notebook session indicate that the current shape's memory capacity is insufficient for the model or data being processed. Switching to a larger notebook session shape directly increases available RAM and compute resources, resolving the memory constraint without altering the training logic or infrastructure type.

Exam trap

Oracle often tests the misconception that autoscaling or reducing batch size can fix memory issues in a single-node notebook session, but the correct approach is to match the compute shape to the workload's memory requirements.

How to eliminate wrong answers

Option A is wrong because autoscaling adjusts the number of compute instances horizontally, not the memory of a single notebook session; it does not prevent OOM errors caused by insufficient per-instance memory. Option B is wrong because OCI Data Flow is a serverless Spark-based service for big data processing, not designed for fine-tuning deep learning models, and migrating would require rewriting the training pipeline. Option D is wrong because reducing batch size can mitigate memory usage but does not address the root cause of an undersized notebook session shape; it may also degrade training convergence or performance, and the question asks for a resolution to the failing jobs, not a workaround.

Full explanation →

227

MCQmedium

An LLM-based application must comply with data privacy regulations by not memorizing personally identifiable information (PII). Which technique best reduces memorization of PII?

A.Use a larger model with more parameters

B.Decrease the temperature during inference

C.Train with differential privacy

D.Increase the number of training epochs

AnswerC

Differential privacy bounds the influence of any single data point, reducing memorization.

Why this answer

Differential privacy (DP) is the correct technique because it directly limits the model's ability to memorize training data, including PII, by adding calibrated noise to the gradient updates during training. This ensures that the model's parameters do not encode specific individual records, providing a formal mathematical guarantee against memorization. Other options like model size, temperature, or training epochs do not address the root cause of memorization in the training process.

Exam trap

Oracle often tests the misconception that inference-time parameters like temperature or model size affect training data memorization, when in fact memorization is a training-phase phenomenon that must be addressed during training itself.

How to eliminate wrong answers

Option A is wrong because increasing model parameters generally increases the model's capacity to memorize training data, making PII leakage more likely, not less. Option B is wrong because temperature controls the randomness of output token sampling during inference, not the memorization of training data; it has no effect on whether PII is stored in the model weights. Option D is wrong because increasing training epochs typically leads to overfitting and greater memorization of training examples, including PII, as the model sees the data more times.

Full explanation →

228

MCQhard

A financial services company has deployed a custom fine-tuned model using OCI Generative AI service on a dedicated AI cluster for automated report generation. They use a Python application that sends prompts via the OCI SDK. Recently, they started seeing 429 Too Many Requests errors intermittently. The dedicated cluster has 2 replicas and the application is making about 100 requests per second. The cluster's documented throughput is 50 requests per second per replica. The company has not set up any throttling limits. What is the most likely cause of the 429 errors?

A.The application is exceeding the cluster's replica capacity.

B.The OCI SDK version is outdated.

C.The API calls are not authenticated.

D.The model's context window is too small for the prompts.

AnswerA

The cluster is at maximum throughput; bursts push over the limit.

Why this answer

Option B is correct because the application's request rate (100 requests/second) matches the cluster's total capacity (2 replicas x 50 = 100). Any burst can exceed capacity, causing 429 errors. Other options would produce different error codes.

Full explanation →

229

MCQhard

A financial services company is concerned about data privacy when using OCI Generative AI service for processing sensitive customer data. They want to ensure that their data is not used to improve the model and is encrypted at rest and in transit. Which combination of OCI features should they implement?

A.Use OCI Object Storage buckets with encryption to store prompts and responses

B.Provision a dedicated endpoint, configure data privacy opt-out, and use OCI Vault for encryption keys

C.Deploy the model in an OCI Data Science project with a private endpoint

D.Use the on-demand API with default encryption

AnswerB

Dedicated endpoints allow data isolation; Vault provides customer-managed keys; privacy opt-out prevents use for training.

Why this answer

Option B is correct because it addresses all three requirements: a dedicated endpoint ensures network isolation and encryption in transit, data privacy opt-out prevents OCI from using customer data for model improvement, and OCI Vault integration allows customers to manage their own encryption keys for data at rest, meeting the financial services company's strict data privacy and encryption needs.

Exam trap

Oracle often tests the misconception that encryption alone (at rest or in transit) is sufficient for data privacy, when in fact the critical requirement for preventing model improvement is the explicit data privacy opt-out mechanism, which is a separate control from encryption.

How to eliminate wrong answers

Option A is wrong because OCI Object Storage encryption only protects data at rest, not data in transit, and it does not prevent OCI from using prompts and responses for model training or improvement. Option C is wrong because deploying a model in an OCI Data Science project with a private endpoint provides network isolation but does not include a data privacy opt-out mechanism to prevent data from being used for model improvement, nor does it inherently enforce customer-managed encryption keys for the Generative AI service. Option D is wrong because the on-demand API with default encryption uses OCI-managed keys, which does not give the customer control over encryption keys, and it does not include a data privacy opt-out to prevent data usage for model improvement.

Full explanation →

230

MCQmedium

A company is deploying a RAG system for internal document search using OCI OpenSearch as the vector store. Users report that queries about recent policy changes return no results, even though the new policies were ingested. Which configuration is most likely missing?

A.The query should use a hybrid search combining keyword and vector.

B.The embeddings must be normalized before indexing.

C.The vector search index must have a refresh interval set to immediate.

D.The ingestion pipeline should use a text-splitting chunker.

AnswerC

Without immediate refresh, new documents may not be visible in search results.

Why this answer

Option A is correct because if the vector search index's refresh interval is not set to immediate, new documents may not be immediately searchable. Option B is wrong because chunking is for ingestion, not search availability. Option C is about hybrid search, which improves relevance but not availability.

Option D is not required for basic search functionality.

Full explanation →

231

Multi-Selecthard

A troubleshooting scenario: A RAG system returns no results for certain queries. The index exists and has documents. Which TWO are likely causes?

Select 2 answers

A.The search algorithm is set to exact (brute-force) and index is small

B.The query embedding dimension does not match the index dimension

C.The query is too long for the embedding model

D.The index has not been refreshed after adding new documents

E.The embedding model used for indexing is different from the one used for query

AnswersB, E

Dimension mismatch causes search errors or zero results.

Why this answer

Mismatched embedding models or dimension differences prevent correct searches. Query length is not an issue (model truncates). Exact search still returns results.

Index refresh may be delayed but unlikely for no results.

Full explanation →

232

MCQeasy

Which OCI Generative AI parameter controls the diversity of generated text by increasing the probability of less likely tokens?

A.frequency_penalty

B.top_k

C.temperature

D.top_p

AnswerC

Higher temperature increases randomness by scaling logits before softmax.

Why this answer

Option C is correct: Temperature above 1 increases randomness. Option A (top_p) is nucleus sampling, option B (top_k) limits to top K tokens, option D (frequency_penalty) reduces repetition.

Full explanation →

233

MCQmedium

During a RAG implementation, the response quality degrades because the LLM receives too many irrelevant document chunks. Which technique can best filter out irrelevant chunks before sending them to the LLM?

A.Use a larger LLM for generation, hoping it ignores irrelevant chunks.

B.Reduce the top-k retrieval count.

C.Implement a reranking step using a cross-encoder model.

D.Increase the chunk size to provide more context.

AnswerC

Reranking with a cross-encoder (e.g., Cohere rerank) reorders chunks by relevance to the query, filtering out irrelevant ones.

Why this answer

Option D is correct because reranking with a cross-encoder is a common post-retrieval step to improve relevance. Option A is wrong because increasing chunk size may include more noise. Option B is wrong because using a larger LLM does not filter irrelevant chunks.

Option C is wrong because reducing top-k lowers chance of including relevant ones too.

Full explanation →

234

MCQmedium

A company wants to deploy a private instance of a large language model on OCI for sensitive data processing. What is the recommended approach?

A.Use OCI Data Science with a publicly accessible model.

B.Use the OCI Generative AI public endpoint with data encryption.

C.Use OCI Dedicated AI Clusters for a private endpoint.

D.Use third-party model hosting outside OCI.

AnswerC

Dedicated AI Clusters offer isolated compute with private networking, meeting security and compliance needs.

Why this answer

Option C is correct because OCI Dedicated AI Clusters provide a fully isolated, private endpoint for deploying large language models, ensuring that sensitive data never traverses the public internet. This approach meets the requirement for private inference with no data leaving the customer's tenancy, unlike public endpoints or third-party hosting.

Exam trap

Oracle often tests the misconception that encryption alone (Option B) is sufficient for private deployment, but the trap is that encryption does not eliminate the need for network isolation when processing sensitive data in a shared infrastructure environment.

How to eliminate wrong answers

Option A is wrong because OCI Data Science with a publicly accessible model exposes the model endpoint to the internet, violating the requirement for private, sensitive data processing. Option B is wrong because the OCI Generative AI public endpoint, even with data encryption, still routes traffic through OCI's shared infrastructure and public IP space, which does not guarantee the isolation needed for sensitive data. Option D is wrong because third-party model hosting outside OCI would require data to leave the OCI tenancy, breaking the requirement for a private deployment within OCI.

Full explanation →

235

Multi-Selecthard

Which THREE are known challenges when deploying large language models in production?

Select 3 answers

A.Bias in training data perpetuating stereotypes

B.High computational cost for inference

C.Hallucination of plausible but incorrect information

D.Fast inference speed due to parallelization

E.Low memory footprint

AnswersA, B, C

Models can reflect and amplify biases from training data.

Why this answer

Option A is correct because large language models (LLMs) are trained on vast, unfiltered internet text corpora that inherently contain societal biases. These biases are learned and can be amplified during inference, leading to outputs that perpetuate harmful stereotypes, which is a well-documented production challenge.

Exam trap

Oracle often tests the distinction between known challenges (bias, cost, hallucination) and desirable properties (fast inference, low memory) that are actually false for LLMs, trapping candidates who confuse optimization goals with current limitations.

Full explanation →

236

MCQmedium

You are a data scientist at a legal firm. The firm uses OCR to digitize court documents and then indexes them in OCI OpenSearch for a RAG application. The application uses OCI Generative AI Service (Cohere Command) to answer questions about case law. Recently, the team noticed that the answers are often factually incorrect or include information not present in the retrieved documents. After reviewing the pipeline, you find that the chunking strategy splits documents into 512-token chunks with 128-token overlap. The embedding model is Cohere Embed v3 (English), and the retrieval returns the top 5 chunks. The LLM has a context window of 4096 tokens. The team suspects that the chunking strategy is causing loss of context. What is the best course of action to improve answer accuracy?

A.Increase the chunk size to 1024 tokens and overlap to 256 tokens.

B.Reduce the chunk overlap to 64 tokens to avoid redundancy.

C.Switch to a smaller LLM with a larger context window.

D.Increase the number of retrieved chunks from 5 to 10.

AnswerA

Larger chunks with more overlap preserve context better.

Why this answer

Increasing the chunk size to 1024 tokens and overlap to 256 tokens directly addresses the loss of context by ensuring each chunk contains more complete semantic units (e.g., entire paragraphs or legal arguments) while the larger overlap preserves continuity across chunk boundaries. This improves the quality of the embeddings and the relevance of retrieved chunks, leading to more factually accurate answers from the LLM.

Exam trap

The trap here is that candidates may assume increasing retrieval count (Option D) always improves accuracy, but in RAG systems, more chunks often introduce noise and dilute relevant context, whereas fixing the chunking strategy directly addresses the root cause of context loss.

How to eliminate wrong answers

Option B is wrong because reducing the overlap to 64 tokens would further fragment context, increasing the risk of missing critical information at chunk boundaries and worsening the factual inaccuracies. Option C is wrong because switching to a smaller LLM with a larger context window does not fix the root cause—poor chunking—and a smaller model may have lower reasoning capability, potentially degrading answer quality. Option D is wrong because increasing the number of retrieved chunks from 5 to 10 would introduce more noise and irrelevant content into the LLM's context, likely amplifying hallucinations rather than improving accuracy.

Full explanation →

237

MCQeasy

Your team has deployed a fine-tuned GPT-2 model on OCI Model Deployment for a simple text generation API. The model performs text completion for short prompts (e.g., 50 tokens). The endpoint is working but response times are over 10 seconds for these short prompts. The model size is approximately 500MB and you used a VM.Standard.E3.Flex shape (2 OCPU, 16GB RAM). The deployment is in a single replica with no autoscaling. You have verified that the network latency is minimal (<5ms). The model was trained in OCI Data Science using a GPU shape, but during deployment you selected a CPU shape to reduce cost. The model is a transformer-based neural network. You've also confirmed that the deployment is healthy and there are no errors in the logs. The memory usage is within limits. What is the most likely cause of the high latency?

A.High network latency between the client and the model endpoint

B.Model is too large for the VM.Standard.E3.Flex shape

C.Insufficient CPU resources for the model size

D.Missing GPU acceleration for inference

AnswerD

GPU acceleration is essential for fast inference on neural network models like GPT-2.

Why this answer

Option D is correct because GPT-2 is a transformer-based neural network that relies heavily on matrix multiplications, which are far more efficiently executed on GPUs due to their parallel architecture. Even though the model is only 500MB, CPU inference for transformer models is notoriously slow because CPUs process sequential operations, while GPUs can parallelize the attention mechanism and feed-forward layers. The 10-second latency for a 50-token prompt is a classic symptom of missing GPU acceleration, as the CPU shape (2 OCPU) lacks the specialized tensor cores needed for fast transformer inference.

Exam trap

The trap here is that candidates might assume a 500MB model is 'small enough' for CPU inference, overlooking that transformer architecture—not model size—is the primary driver of latency, and that GPU acceleration is essential even for moderately sized transformer models.

How to eliminate wrong answers

Option A is wrong because the problem states that network latency is minimal (<5ms), so high network latency is not the cause. Option B is wrong because the model size of 500MB fits comfortably within the 16GB RAM of the VM.Standard.E3.Flex shape, and memory usage is confirmed to be within limits. Option C is wrong because while CPU resources are limited (2 OCPU), the core issue is not insufficient CPU resources per se, but rather that CPUs are architecturally unsuited for the parallel computations required by transformer models; even with more CPU cores, inference would still be significantly slower than with a GPU.

Full explanation →

238

MCQmedium

An OCI CLI command above returns embeddings for the phrase 'Hello world'. The developer notices that the embedding vector length is 384 dimensions. However, they expected 768 dimensions. What is the most likely cause?

A.The input text 'Hello world' is too short, causing dimension reduction.

B.The CLI result is truncated in the display.

C.The model 'cohere.embed-multilingual-light-v3.0' outputs 384-dimensional vectors.

D.The --truncate END flag reduces the dimension.

AnswerC

This specific model produces 384 dimensions; the 'light' version is smaller.

Why this answer

Option B is correct because cohere.embed-multilingual-light-v3.0 outputs 384-dimensional embeddings by default, while the 'v3' version outputs 1024. Option A is wrong because the flag does not affect dimension. Option C is wrong because truncate mode does not change dimension.

Option D is wrong because input length is irrelevant.

Full explanation →

239

MCQeasy

A user sends an inference request with the JSON parameters shown. They notice the model is returning very short responses. What is the most likely cause?

A.maxTokens is set too high

B.topP is set too high

C.The modelId is incorrect

D.temperature is set too low

AnswerD

Low temperature reduces randomness, often leading to shorter, safer outputs.

Why this answer

Option D is correct because a low temperature value (close to 0) makes the model highly deterministic, reducing randomness and often leading to shorter, more conservative responses. In generative AI, temperature controls the probability distribution over tokens; lower values cause the model to favor the most likely tokens, which can result in repetitive or truncated outputs. The user's inference request likely includes a temperature setting that is too low, causing the model to produce very short responses.

Exam trap

Oracle often tests the misconception that maxTokens or topP control response length directly, when in fact temperature has a more subtle effect on output length by influencing token diversity and repetition.

How to eliminate wrong answers

Option A is wrong because maxTokens sets the maximum number of tokens the model can generate; setting it too high would allow longer responses, not shorter ones. Option B is wrong because topP (nucleus sampling) controls the cumulative probability threshold for token selection; setting it too high would include more diverse tokens, potentially leading to longer or more varied responses, not shorter ones. Option C is wrong because an incorrect modelId would typically cause an error or unexpected behavior (e.g., model not found), not consistently produce very short responses.

Full explanation →

240

MCQeasy

Refer to the exhibit. A user runs this command and gets an error: 'InvalidParameter: unit-shape 'GPU.1' is not supported in this region. Supported shapes: GPU.2, GPU.4'. What should the user do?

A.Use unit-shape GPU.2

B.Increase unit count

C.Change compartment

D.Use a different region

AnswerA

GPU.2 is explicitly supported in the error message.

Why this answer

Option B is correct because the error indicates GPU.1 is unsupported, and GPU.2 is listed as supported. Changing region (A) may work but is unnecessary; increasing unit count (C) or changing compartment (D) will not resolve the shape issue.

Full explanation →

241

MCQmedium

A company is fine-tuning a Llama model on OCI with dedicated AI cluster. They want to use their own training data stored in Oracle Object Storage. What must they do to ensure the fine-tuning job can access the data?

A.Upload the data to the dedicated cluster's local storage.

B.Use OCI Data Flow to transfer data.

C.Configure a service principal for the cluster to read the bucket.

D.Create a pre-authenticated request for the bucket.

AnswerC

A service principal grants the cluster permissions to access resources like Object Storage.

Why this answer

Option D is correct because the dedicated AI cluster needs a service principal (resource principal) with permissions to read the Object Storage bucket. Other options are not valid for controlled access.

Full explanation →

242

MCQeasy

A developer is using OCI Generative AI service's CLI to generate text but gets a 'Rate limit exceeded' error. What is the most likely cause?

A.The number of requests per minute exceeds the allowed quota.

B.The API key is invalid.

C.The model is not available in that region.

D.The input text is too long.

AnswerA

Rate limiting is enforced to control throughput.

Why this answer

Rate limit exceeded indicates the request quota per minute has been reached.

Full explanation →

243

MCQhard

A development team wants to generate code snippets from natural language. Which model strategy should they adopt?

A.Use a code-specific model like Code Llama.

B.Use a general-purpose LLM like Llama 2.

C.Use a multimodal model.

D.Use an embedding model for text.

AnswerA

Correct: Code-specific models are fine-tuned for code generation.

Why this answer

Code Llama is a specialized variant of Llama 2 that has been fine-tuned on code datasets, enabling it to generate syntactically and semantically correct code from natural language prompts. This makes it the optimal choice for code generation tasks, as general-purpose LLMs lack the targeted training on code structures and programming languages.

Exam trap

Oracle often tests the distinction between general-purpose and domain-specific models, and the trap here is that candidates assume any large language model can handle code generation equally well, overlooking the critical fine-tuning on code corpora that makes Code Llama superior for this task.

How to eliminate wrong answers

Option B is wrong because a general-purpose LLM like Llama 2 is trained on diverse text but not specifically optimized for code, leading to higher rates of syntax errors and logical inconsistencies in generated code. Option C is wrong because multimodal models process images, audio, and text, but code generation from natural language does not require multiple modalities and would add unnecessary complexity without improving code quality. Option D is wrong because embedding models are designed to convert text into vector representations for similarity search or clustering, not for generating new text or code snippets.

Full explanation →

244

Multi-Selectmedium

Which TWO configuration steps are required to enable OCI Generative AI service in a tenancy? (Select two.)

Select 2 answers

A.Enable the generative AI service in the console.

B.Set up a VCN with service gateway.

C.Create a dynamic group.

D.Subscribe to the service in the tenancy.

E.Create a policy to allow access.

AnswersD, E

Subscription enables the service for the tenancy.

Why this answer

You must subscribe to the service in the tenancy and then create a policy to allow access.

Full explanation →

245

Multi-Selecthard

Which THREE factors directly influence the quality of responses in a RAG system? (Choose three.)

Select 3 answers

A.The prompt template used to ask the LLM

B.The chunk size used during document processing

C.The temperature parameter of the LLM

D.The number of GPUs allocated to the LLM

E.The choice of embedding model

AnswersA, B, E

A well-structured prompt helps the LLM use the context properly.

Why this answer

The choice of embedding model affects how well semantics are captured, chunk size determines granularity of retrieval, and prompt engineering guides the LLM to use context effectively.

Full explanation →

246

MCQmedium

Refer to the exhibit. A developer ran the OCI CLI command shown and received the JSON output. What does the output indicate about the model's confidence and why?

A.The model is uncertain because all scores are roughly equal.

B.The model is neutral because the neutral score is lowest.

C.The model is unsure because the scores are probabilities that sum to 1.

D.The model is highly confident the text is positive, as indicated by the 0.98 score.

AnswerD

0.98 is very close to 1, indicating high confidence.

Why this answer

Option D is correct because the JSON output shows a sentiment score of 0.98 for 'positive', which is very close to 1.0, indicating the model is highly confident that the text is positive. In sentiment analysis models, scores represent probabilities for each class, and a value near 1.0 for one class with much lower scores for others reflects strong confidence.

Exam trap

Oracle often tests the distinction between the sum of probabilities equaling 1 (a mathematical property) and the actual confidence level indicated by the distribution of those probabilities, leading candidates to mistakenly choose option C.

How to eliminate wrong answers

Option A is wrong because the scores are not roughly equal; the positive score (0.98) is significantly higher than the negative (0.01) and neutral (0.01) scores, indicating high confidence, not uncertainty. Option B is wrong because the neutral score being lowest does not imply neutrality; the model is highly confident the text is positive, not neutral. Option C is wrong because while the scores do sum to 1 (as probabilities should), this fact alone does not indicate uncertainty; the distribution of probabilities matters, and here the high positive score shows confidence.

Full explanation →

247

MCQeasy

A developer is using OCI Data Science to create a RAG pipeline. They have ingested documents into a vector store using OCI Generative AI's text-embedding model. During testing, they notice that queries return very few results (often 0 or 1) even when the knowledge base contains relevant documents. They have set the top-k parameter to 10. What is the most likely cause?

A.The similarity threshold is set too high, filtering out most results.

B.The documents were chunked with too small a chunk size, losing key information.

C.The embedding model's dimensionality is too low to capture semantic differences.

D.The vector search index is not configured with the correct distance metric.

AnswerA

A threshold that is too strict reduces the number of retrieved chunks.

Why this answer

Option B is correct because a high similarity threshold (e.g., >0.9) can exclude many relevant results. Option A: dimensionality is fixed by the model. Option C: distance metric affects ranking but not count.

Option D: chunk size may affect quality but not count.

Full explanation →

248

MCQhard

An application mixes RAG with other data sources. The vector search returns too many irrelevant chunks. What is the best approach to filter them?

A.Use a reranker model

B.Use exact search instead of ANN

C.Reduce the number of retrieved chunks

D.Increase chunk size

AnswerA

A reranker scores retrieved chunks by relevance, filtering out irrelevant ones.

Why this answer

A reranker model (Option A) is the best approach because it takes the initial set of retrieved chunks and re-orders them based on semantic relevance to the query, effectively filtering out irrelevant chunks. Unlike simple vector similarity, a reranker uses cross-encoding to evaluate the query-chunk pair as a whole, which significantly improves precision when mixing RAG with other data sources.

Exam trap

Oracle often tests the misconception that reducing the number of retrieved chunks (Option C) is a valid filter, but the trap is that this only limits output size without improving relevance—reranking is the correct technique to reorder and discard irrelevant results.

How to eliminate wrong answers

Option B is wrong because exact search (e.g., brute-force k-NN) retrieves the same chunks as ANN but without approximation; it does not filter irrelevant chunks—it only guarantees the true nearest neighbors, which may still be irrelevant if the vector representation is poor. Option C is wrong because reducing the number of retrieved chunks (e.g., lowering top_k) risks missing relevant chunks and does not address the core problem of irrelevant chunks being ranked too high. Option D is wrong because increasing chunk size makes each chunk more likely to contain irrelevant content, potentially worsening the problem by diluting relevant information with noise.

Full explanation →

249

MCQmedium

A healthcare startup uses OCI Generative AI to automatically generate patient summary reports from clinical notes. They use the Cohere command model (command-r-plus) with default parameters. Over the past week, the team has noticed two issues: (1) the summaries occasionally contain medical inaccuracies, such as incorrect drug dosages or misinterpreted lab results, and (2) the response time has increased from an average of 2 seconds to over 10 seconds. The application has a high volume of concurrent requests, and the startup has already increased the max tokens to 4096 and set temperature to 0.1. The model appears to perform well on general language tasks but struggles with specialized medical terminology. The team is looking for a long-term solution that balances accuracy, latency, and cost. What should they do?

A.Switch to a larger base model, such as Llama 2 70B, to improve knowledge of medical terms

B.Fine-tune the Cohere command model using a curated dataset of medical notes and correct summaries

C.Implement a caching layer to store and reuse responses for identical queries

D.Reduce the max tokens parameter from 4096 to 1024 to decrease response time

AnswerB

Fine-tuning adapts the model to the medical domain, improving accuracy for the specific use case. This also reduces the need for high token counts, indirectly improving latency.

Why this answer

Option B is the best course of action. Fine-tuning the model on a curated medical dataset will improve accuracy for domain-specific terminology and context. This addresses the root cause of inaccuracies.

While fine-tuning itself is time-consuming, it reduces latency in the long run because the model becomes more efficient for the specific task. Option A (reducing max tokens) might improve latency but won't fix inaccuracies. Option C (switching to a larger model) may increase latency and cost without guaranteeing improvement on medical terms.

Option D (caching) does not address inaccurate responses for new queries.

Full explanation →

250

MCQeasy

A developer wants to implement a simple RAG pipeline using OCI Language's text generation and embedding models. Which OCI SDK method is used to generate embeddings for a text chunk?

A.embed_text

B.generate_embeddings

C.encode_text

D.create_embedding

AnswerA

`embed_text` is the correct method to call for generating embeddings from text.

Why this answer

The OCI Python SDK for AI Language provides the `embed_text` method to generate embeddings. Other names like `create_embedding` or `generate_embeddings` are not standard in OCI SDK.

Full explanation →

251

Multi-Selectmedium

Which TWO of the following are best practices when indexing documents for a RAG application using OCI OpenSearch?

Select 2 answers

A.Use the same chunk size for all documents regardless of content type.

B.Apply stemming to reduce vocabulary size.

C.Enable chunk overlap to avoid splitting important information across chunks.

D.Remove all stop words from the text before embedding.

E.Store metadata (e.g., source URL, page number) alongside the vector.

AnswersC, E

Overlap ensures that boundaries don't cut off context, improving retrieval.

Why this answer

Options A and D are correct. Using overlapping chunks prevents information loss at boundaries (A). Storing metadata like source document helps with traceability (D).

Option B is wrong because stop words removal may harm retrieval for queries containing common words. Option C is wrong because stemming can lose precision in semantic search when using embeddings.

Full explanation →

252

MCQhard

An IAM policy is shown in the exhibit. A user reports that they cannot call the OCI GenAI embedding API, but they can use OCI AI Language. Which policy statement is missing to allow embedding API access?

A.The policy needs 'use' action on 'oci-generative-ai-family'

B.The policy needs 'inspect' on 'oci-generative-ai-endpoint'

C.The policy needs 'manage' action on 'oci-generative-ai-family'

D.The policy needs 'inspect' on 'oci-ai-language-family'

AnswerA

Missing 'use' permission for GenAI.

Why this answer

The embedding API requires the 'use' permission on the 'oci-generative-ai-family'. The first statement only grants 'inspect', which allows listing but not using. The second grants 'use' on AI Language, not GenAI.

Adding 'use' action to the first statement would fix the issue.

Full explanation →

253

MCQmedium

A data scientist is using OCI Data Science with the Generative AI service to fine-tune a Cohere Command model on a custom dataset of customer support tickets. After training, the model produces poor, irrelevant responses. What is the most likely cause?

A.Incorrect tokenizer configuration

B.Insufficient training data quality or quantity

C.Too many epochs causing overfitting

D.Model architecture mismatch between fine-tuned and base model

AnswerB

Cohere models need clean, diverse, and task-relevant data; poor data leads to poor fine-tuning.

Why this answer

Insufficient training data quality or quantity is the most likely cause because fine-tuning a Cohere Command model on a custom dataset of customer support tickets requires a sufficiently large and representative dataset to teach the model domain-specific patterns. If the dataset is too small, noisy, or lacks diversity, the model will fail to generalize and produce irrelevant responses, even with correct tokenization and training hyperparameters.

Exam trap

Oracle often tests the misconception that overfitting (Option C) is the primary cause of poor model output after fine-tuning, but in this scenario the irrelevance points to data insufficiency rather than memorization of training examples.

How to eliminate wrong answers

Option A is wrong because incorrect tokenizer configuration would typically cause tokenization errors or mismatched vocabulary, not poor semantic relevance; the Cohere Command model uses a fixed tokenizer that is automatically applied during fine-tuning in OCI Data Science. Option C is wrong because too many epochs causing overfitting would result in the model memorizing training examples and producing overly specific or repetitive responses, not generally irrelevant ones; overfitting typically degrades performance on unseen data but does not cause broad irrelevance. Option D is wrong because model architecture mismatch between fine-tuned and base model is not possible in OCI Data Science's Generative AI service, as the fine-tuning process uses the same architecture as the base model; the service enforces compatibility.

Full explanation →

254

Multi-Selectmedium

Which TWO of the following are best practices when implementing a RAG application using OCI OpenSearch as a vector store?

Select 2 answers

A.Use a large embedding dimension (e.g., 1536) to improve accuracy.

B.Set index.number_of_replicas to 0 to speed up indexing.

C.Enable approximate nearest neighbor (ANN) search for large datasets.

D.Store the embedding vectors in the _source field to simplify retrieval.

E.Use cosine similarity as the distance metric for vector comparison.

AnswersC, E

ANN search significantly reduces query latency for large vector collections.

Why this answer

Cosine similarity (A) is the recommended distance metric for text embeddings. ANN search (E) is essential for scaling to large datasets. Storing embeddings in _source (B) is unnecessary and increases index size.

Larger dimensions (C) can degrade performance without guaranteed accuracy improvement. Setting replicas to 0 (D) risks data loss and is not production-ready.

Full explanation →

255

MCQeasy

A developer is building a RAG chatbot for an internal knowledge base. To ensure the system retrieves the most relevant chunks, what is the best practice for chunking?

A.Use very small chunks

B.Use semantic chunking with overlap

C.Use fixed-size chunks without overlap

D.Use random-sized chunks

AnswerB

Semantic chunking preserves natural boundaries, and overlap provides context continuity.

Why this answer

Semantic chunking with overlap ensures that context is preserved and retrieval is more accurate by avoiding splits in meaningful content.

Full explanation →

256

MCQeasy

A developer wants to use the OCI Generative AI service to generate text using a Cohere model. Which SDK class should be used for inference calls?

A.GenerativeAiInferenceClient

B.CustomModelClient

C.AiInferenceClient

D.GenerativeAiClient

AnswerA

This client provides methods like generate_text and embed_text for inference.

Why this answer

Option B is correct because the GenerativeAiInferenceClient is the dedicated client for inference operations in the OCI Python SDK. Option A is incorrect because GenerativeAiClient is for managing models and endpoints, not inference. Option C is incorrect because it's a generic term.

Option D is incorrect because it's for managing custom models.

Full explanation →

257

MCQeasy

The architecture shown in the exhibit is missing a critical component for a RAG pipeline. What step is missing between receiving the user query and searching the vector store?

A.A document chunking step

B.A query embedding step using an embedding model

C.A data masking step for privacy

D.A reranker step after retrieval

AnswerB

The query must be embedded for vector search.

Why this answer

In a RAG pipeline, the user query must be converted into a vector embedding before searching the vector store. The architecture directly passes the query to OpenSearch without embedding. OCI Functions likely performs orchestration but does not automatically embed the query.

Adding a call to an embedding model (e.g., Cohere Embed) is necessary.

Full explanation →

258

Multi-Selecthard

Which three characteristics of LLMs can lead to hallucinations? (Select THREE)

Select 3 answers

A.Overconfidence in predictions

B.Ability to generate plausible-sounding text

C.Lack of real-world grounding

D.Gaps in training data coverage

E.Large vocabulary size

AnswersB, C, D

Correct: Fluency can mask inaccuracies.

Why this answer

Option B is correct because LLMs are trained to generate text that is statistically plausible and coherent, but they lack mechanisms to verify factual accuracy. This means they can produce sentences that sound convincing and grammatically correct while being entirely false, which is a direct cause of hallucinations.

Exam trap

Oracle often tests the distinction between symptoms and root causes, so the trap here is that candidates might confuse 'overconfidence in predictions' (a symptom) with a direct cause of hallucinations, or mistakenly think 'large vocabulary size' contributes to hallucinations when it is merely an enabler of the model's generative capability.

Full explanation →

259

MCQeasy

A data scientist is using a large language model to summarize customer support tickets. The model occasionally generates summaries that include hallucinated details not present in the original ticket. Which technique would best reduce hallucinations while maintaining summary quality?

A.Implement retrieval-augmented generation (RAG) to ground the model in relevant documents.

B.Use a longer system prompt instructing the model to be factual.

C.Fine-tune the model on a large corpus of general text to improve its knowledge.

D.Increase the temperature parameter to 0.9 to encourage more deterministic outputs.

AnswerA

RAG provides factual context, reducing hallucinations.

Why this answer

Retrieval-Augmented Generation (RAG) reduces hallucinations by grounding the model's output in external, verifiable documents retrieved from a knowledge base. Instead of relying solely on the model's parametric memory, RAG fetches relevant context (e.g., the original ticket) at inference time, ensuring the summary is factually aligned with the source. This maintains summary quality because the model can still generate fluent text while being constrained to the retrieved evidence.

Exam trap

Oracle often tests the misconception that simply instructing the model to be factual (Option B) or fine-tuning (Option C) can eliminate hallucinations, when in reality grounding via retrieval (RAG) is the only technique that directly supplies external evidence to constrain generation.

How to eliminate wrong answers

Option B is wrong because a longer system prompt instructing the model to be factual does not provide new factual data; it only changes the model's behavior via instruction tuning, which cannot correct hallucinations stemming from missing or incorrect parametric knowledge. Option C is wrong because fine-tuning on a large corpus of general text would not specifically address hallucinations in customer support tickets; it might even dilute domain-specific accuracy and does not provide a retrieval mechanism to verify facts. Option D is wrong because increasing the temperature parameter to 0.9 actually increases randomness and creativity, making outputs less deterministic and more prone to hallucination, not less.

Full explanation →

260

MCQhard

A developer uses OCI Generative AI with a custom OCI OpenSearch vector store. The text generation model sometimes hallucinates facts not in the retrieved documents. What is the most effective mitigation?

A.Use a larger retrieval chunk size

B.Increase the temperature

C.Use prompt engineering to instruct the model to stick to the provided context

D.Decrease the maximum token length

AnswerC

Explicitly instructing the model to base answers only on the given context reduces hallucination.

Why this answer

Prompt engineering with clear instructions to use only the provided context is a direct and effective way to reduce hallucination.

Full explanation →

261

Multi-Selectmedium

Which THREE of the following are supported capabilities of OCI Generative AI Service?

Select 3 answers

A.Text summarization

B.Sentiment analysis

C.Image generation

D.Question answering

E.Code generation

AnswersA, D, E

Summarization is a core capability.

Why this answer

Option A is correct because OCI Generative AI Service includes a dedicated text summarization capability that uses large language models (LLMs) to generate concise summaries from longer documents. This feature is part of the service's core generative AI offerings, supporting use cases like meeting notes summarization and document abstraction.

Exam trap

Oracle often tests the distinction between OCI Generative AI Service (text generation only) and other OCI AI services (e.g., AI Language for sentiment analysis, Vision for image tasks), causing candidates to mistakenly attribute all AI capabilities to the generative service.

Full explanation →

262

MCQhard

A machine learning engineer evaluates OCI Generative AI for a real-time content generation application. They need to meet a SLAs of 99.9% availability. Which deployment architecture satisfies the requirement with the lowest cost?

A.Two dedicated AI clusters in different regions.

B.Two dedicated AI clusters in different availability domains.

C.Two dedicated AI clusters in the same availability domain.

D.Single dedicated AI cluster with a single replica.

AnswerB

Clusters in different ADs provide resilience against AD failures at moderate cost.

Why this answer

Option B is correct because deploying two dedicated AI clusters in different availability domains within a single region provides high availability (HA) to meet the 99.9% SLA while minimizing cost. OCI's dedicated AI clusters are regional resources, and placing replicas across availability domains protects against domain-level failures without the cross-region data transfer and egress costs incurred by multi-region deployments.

Exam trap

The trap here is that candidates often assume multi-region deployment is required for high availability, but OCI's 99.9% SLA can be achieved within a single region using availability domains, making the cross-region option unnecessarily expensive.

How to eliminate wrong answers

Option A is wrong because two dedicated AI clusters in different regions introduces unnecessary cross-region data transfer costs and higher latency, making it more expensive than a single-region HA solution. Option C is wrong because two dedicated AI clusters in the same availability domain does not protect against availability domain failures, thus failing to meet the 99.9% SLA requirement. Option D is wrong because a single dedicated AI cluster with a single replica provides no redundancy; any failure of that cluster or its underlying infrastructure would cause downtime, violating the 99.9% availability SLA.

Full explanation →

263

MCQmedium

Users report that inference requests to the OCI Generative AI service are taking longer than expected. The application uses the on-demand endpoint. What is the most likely cause of the increased latency?

A.The inference model is not fine-tuned for the use case.

B.The on-demand endpoint experiences shared resource contention.

C.The selected model is too large for the use case.

D.The API request timeout is set too low.

AnswerB

On-demand endpoints are multi-tenant; high concurrent usage can cause latency spikes.

Why this answer

On-demand endpoints share resources; during peak usage, resource contention increases latency. Dedicated AI clusters provide predictable performance.

Full explanation →

264

MCQeasy

A developer wants to deploy a custom generative AI model that was trained using OCI Data Science. Which service should they use to expose the model as an API endpoint?

A.OCI API Gateway

B.OCI Data Science Model Deployment

C.OCI Functions

D.OCI Generative AI service

AnswerB

This is purpose-built for deploying models as APIs.

Why this answer

B is correct because OCI Data Science Model Deployment is specifically designed to host and serve machine learning models as REST API endpoints. It directly deploys models trained in OCI Data Science, managing the underlying infrastructure, scaling, and providing a secure HTTPS endpoint for inference requests.

Exam trap

The trap here is that candidates confuse OCI Generative AI service (a managed service for pre-built models) with the ability to deploy custom models, or they assume API Gateway alone can serve a model without a backend compute service.

How to eliminate wrong answers

Option A is wrong because OCI API Gateway is a service for creating, managing, and securing API endpoints for backend services, but it does not host or run models; it would need a separate compute target like a model deployment behind it. Option C is wrong because OCI Functions is a serverless compute service for running stateless code snippets (functions) in response to events, not for hosting large generative AI models with persistent state and GPU requirements. Option D is wrong because OCI Generative AI service is a managed service that provides pre-built foundation models (like LLMs) from providers such as Cohere and Meta, not a platform to deploy custom models trained by the user.

Full explanation →

265

MCQeasy

A developer integrates OCI GenAI into a mobile app to provide product descriptions. The responses sometimes include explanations or questions instead of the requested format. The developer is using a simple prompt: 'Describe product X.' The app expects a single paragraph. Which corrective action should the developer take?

A.Add a structured prompt with format instructions and an example.

B.Lower the temperature to 0 to make responses deterministic.

C.Increase the max tokens to allow longer responses.

D.Switch to a different model with better language understanding.

AnswerA

Correct: Structured prompts effectively enforce output format.

Why this answer

Option B is correct because adding a structured prompt with format instructions and an example guides the model to output exactly as needed. Option A may increase irrelevant content, Option C may not fix the format issue, and Option D could make responses repetitive but still not enforce the format.

Full explanation →

266

MCQhard

A company uses OCI Data Science to fine-tune an embedding model for a specialized domain. After fine-tuning, the model produces embeddings that are not aligned with the vector index used in OCI OpenSearch. What is the most likely cause?

A.The fine-tuning process modified the model architecture

B.The embedding dimension changed after fine-tuning

C.The fine-tuning dataset was too small

D.The vector index was built using a different distance metric than used during fine-tuning

AnswerB

If fine-tuning added/removed layers or changed the output size, the embedding dimension differs, causing index incompatibility.

Why this answer

Fine-tuning may change the output dimension (e.g., if the model head is modified), causing dimension mismatch which makes existing vector indices unusable. Distance metric mismatch is less common as it's usually fixed during indexing. Architecture changes are unlikely for minor fine-tuning.

Small dataset affects quality, not dimension.

Full explanation →

267

MCQmedium

An enterprise is deploying a RAG application for compliance document analysis using OCI. They use OCI OpenSearch as the vector store and have millions of documents. Retrieval latency is critical. Currently, a single query takes over 2 seconds. The index uses a flat (brute-force) distance computation. They have considered using approximate nearest neighbor (ANN) algorithms but are unsure about the impact on recall. They need to reduce latency to under 500ms while maintaining high recall. What should they do?

A.Use a smaller embedding dimension by truncating the existing embeddings.

B.Reduce the number of shards in the OpenSearch index to improve parallelism.

C.Switch to an HNSW algorithm with an appropriate M and ef_search parameters.

D.Increase the top-k parameter to retrieve more candidates then filter.

AnswerC

HNSW provides sub-linear search time with good recall.

Why this answer

Option C is correct because switching to HNSW with appropriate parameters provides fast approximate search with configurable recall. Option A (reducing shards) may not achieve the required latency reduction. Option B (reducing dimensions) can degrade embedding quality.

Option D (increasing top-k) would increase latency.

Full explanation →

268

MCQhard

A healthcare company is using OCI Generative AI to analyze patient records and generate clinical summaries. The company must comply with HIPAA regulations, which require that all protected health information (PHI) be encrypted at rest and in transit, and that access be logged and audited. The current architecture uses an OCI Data Science model deployment with a public endpoint. The model is stored in an OCI Object Storage bucket that is publicly accessible for testing. The company is now moving to production. The compliance officer has flagged the following issues: (1) The model endpoint is publicly accessible. (2) The bucket containing the model is public. (3) No audit logs are enabled. The company wants to remediate these issues while maintaining the ability to invoke the model from on-premises applications via a secure connection. Which set of actions should the architect take?

A.Switch the model endpoint to a private subnet with a service gateway, change the bucket to be accessible only via pre-authenticated requests, and enable OCI Logging for the model deployment.

B.Keep the public endpoint but restrict access using IAM policies and source IP addresses, make the bucket private, and enable OCI Audit.

C.Switch the model endpoint to a private subnet with a service gateway, update the bucket policy to block all public access, enable OCI Audit service, and set up a VPN or FastConnect for on-premises access.

D.Use a public load balancer with SSL termination, restrict bucket access to the load balancer's OCID, and enable OCI Audit.

AnswerC

This ensures private endpoint, private bucket, audit logging, and secure on-premises connectivity.

Why this answer

Option C is correct because it addresses all three compliance issues: moving the model endpoint to a private subnet with a service gateway removes public exposure, making the bucket private with a policy that blocks all public access secures the model artifacts, and enabling OCI Audit provides the required logging. Additionally, setting up a VPN or FastConnect allows secure on-premises access without exposing the endpoint to the public internet, fully satisfying HIPAA encryption and audit requirements.

Exam trap

The trap here is that candidates often think IP restrictions or pre-authenticated requests are sufficient for HIPAA compliance, but HIPAA requires that PHI be encrypted at rest and in transit and that access be logged and audited—public endpoints and shared URLs violate the 'encryption in transit' and 'audit' requirements because they rely on internet-exposed paths and lack proper access controls.

How to eliminate wrong answers

Option A is wrong because pre-authenticated requests (PARs) still expose the bucket via a URL that can be shared, which does not meet HIPAA's requirement for access logging and audit; PARs are not a substitute for private bucket policies and audit logging. Option B is wrong because keeping the public endpoint even with IP restrictions is not sufficient for HIPAA compliance—public endpoints are inherently exposed to network-level attacks and do not satisfy the requirement for encryption at rest and in transit in a fully private manner; also, OCI Audit alone does not cover logging for the model deployment itself. Option D is wrong because a public load balancer with SSL termination still leaves the endpoint publicly accessible, and restricting bucket access to the load balancer's OCID does not prevent the bucket from being publicly listed or accessed via other paths; OCI Audit alone does not address the public endpoint issue.

Full explanation →

269

MCQhard

A company is using OCI GenAI with a Dedicated AI Cluster to serve a large language model for real-time chat applications. They notice high inference latency (average 2 seconds per response) and want to reduce it to under 500 milliseconds without significantly degrading the quality of responses. The cluster is configured with NVIDIA A100 GPUs. The model is the base Cohere Command model (52B parameters). They have explored increasing batch size, but that increases latency for interactive use cases. Which action should they take?

A.Deploy the model with inference optimization frameworks like vLLM, TensorRT, or ONNX Runtime.

B.Increase batch size to process multiple queries at once.

C.Swap the model to a smaller variant, such as Cohere Command Light (6B).

D.Enable model quantization (e.g., int8) to reduce memory and computation.

AnswerA

These frameworks optimize GPU utilization and reduce latency without changing the model.

Why this answer

Option D is correct because deploying the model with optimization techniques like vLLM or TensorRT leverages GPU acceleration specifically for inference, reducing latency significantly. Option A is wrong because increasing batch size is not suitable for real-time, single-query scenarios. Option B is wrong because using a smaller model (e.g., 6B) would reduce latency but also degrade quality, which they want to avoid.

Option C is wrong because model quantization can reduce model size and latency but may degrade output quality, especially at lower precision.

Full explanation →

270

MCQmedium

A data scientist is deploying a custom generative AI model using OCI Data Science. After deploying the model to an endpoint, they notice that inference requests are failing with a timeout error when the payload size exceeds 1 MB. What is the most likely cause and solution?

A.The load balancer is misconfigured; reconfigure the load balancer timeout settings.

B.The model server lacks sufficient memory; scale out to more instances.

C.The model is not optimized for large payloads; use AutoML to optimize the model.

D.The model deployment has a default payload size limit of ~1 MB; increase the payload limit in the deployment configuration.

AnswerD

OCI Data Science model deployments have a default request payload limit that can be increased.

Why this answer

The correct answer is D because OCI Data Science model deployments have a default payload size limit of approximately 1 MB. When inference requests exceed this limit, the load balancer or gateway times out the request. The solution is to increase the payload limit in the deployment configuration, which can be adjusted via the OCI console or API by modifying the `maximumRequestPayloadSize` setting.

Exam trap

The trap here is that candidates often confuse a payload size limit with a generic timeout or resource issue, leading them to choose load balancer reconfiguration (A) or scaling (B) instead of recognizing the explicit payload limit enforced by the deployment configuration.

How to eliminate wrong answers

Option A is wrong because the load balancer timeout settings are not the root cause; the timeout is a symptom of hitting the payload size limit, not a misconfiguration of the load balancer itself. Option B is wrong because insufficient memory would cause out-of-memory errors or slow inference, not a timeout specifically triggered by payload size exceeding 1 MB. Option C is wrong because AutoML optimizes model training and hyperparameters, not the runtime payload handling; the issue is a deployment configuration limit, not model optimization.

Full explanation →

271

MCQmedium

Your team is deploying a generative AI model for a clinical decision support system. The model must meet HIPAA compliance requirements. You have trained a model using OCI Data Science and now need to deploy it so that patient data is protected. The application requires real-time inference. Which set of actions should you take to ensure compliance while maintaining low latency?

A.Use OCI Functions with API Gateway and allow anonymous access

B.Deploy in a public subnet with HTTPS and enable OCI Audit

C.Use OCI Data Flow for batch inference and store results in Object Storage with SSE

D.Deploy in a private VCN subnet, use a service gateway, store keys in OCI Vault, and enable OCI Logging and OCI Audit

AnswerD

These actions address HIPAA requirements for access control, encryption, and auditing.

Why this answer

Option D is correct because deploying the model in a private VCN subnet ensures the inference endpoint is not exposed to the internet, meeting HIPAA's requirement for network isolation. Using a service gateway allows private connectivity to OCI services without traversing the internet, while storing encryption keys in OCI Vault enables customer-managed key control for data at rest. Enabling OCI Logging and OCI Audit provides the necessary audit trail for compliance, and the private subnet with service gateway keeps latency low by avoiding internet hops.

Exam trap

Oracle often tests the misconception that HTTPS encryption alone is sufficient for HIPAA compliance, but the trap here is that network isolation (private subnet) is mandatory for PHI, and public subnet exposure violates the HIPAA Security Rule even with encryption in transit.

How to eliminate wrong answers

Option A is wrong because OCI Functions with API Gateway and anonymous access bypasses authentication and authorization, violating HIPAA's access control requirements, and anonymous access exposes patient data to unauthorized users. Option B is wrong because deploying in a public subnet, even with HTTPS, exposes the inference endpoint to the internet, which is not permitted for protected health information (PHI) under HIPAA's security rule, and OCI Audit alone does not enforce network isolation. Option C is wrong because OCI Data Flow is a batch processing service, not suitable for real-time inference, and storing results in Object Storage with SSE does not address the need for low-latency, synchronous inference required by the clinical decision support system.

Full explanation →

272

MCQmedium

A company wants to create a chatbot that answers questions based on a large internal document set that is updated weekly. They have limited ML expertise. Which approach is recommended?

A.Fine-tune a model on the entire document set.

B.Train a custom model from scratch.

C.Include all documents in the system prompt.

D.Use retrieval-augmented generation (RAG) with a vector database.

AnswerD

Correct: RAG handles dynamic data without retraining.

Why this answer

Option C is correct because retrieval-augmented generation (RAG) allows dynamic updates without retraining and is easier to implement. Option A requires frequent retraining, Option B may exceed context limits, and Option D is too resource-intensive.

Full explanation →

273

MCQmedium

A company uses OCI GenAI to build a content moderation system that filters toxic language in user-generated comments. They have a small labeled dataset of 1,000 comments (500 toxic, 500 non-toxic) and need an efficient solution that balances accuracy, cost, and latency. They are considering different model options: fine-tuning a large LLM (e.g., Cohere Command), using a pre-trained LLM with prompting, fine-tuning a smaller BERT-based classifier, or building a rule-based system. The team has moderate ML experience and wants to deploy using OCI Data Science. Which approach is most efficient for this binary classification task?

A.Fine-tune a BERT-based classifier (e.g., 'bert-base-uncased') on the dataset.

B.Develop a rule-based system using regular expressions and keyword lists.

C.Use a pre-trained LLM with a toxic/non-toxic prompt.

D.Fine-tune the Cohere Command model on the labeled dataset.

AnswerA

BERT is efficient for classification, fine-tunes quickly on small data, and has low inference cost.

Why this answer

Fine-tuning a BERT-based classifier (e.g., 'bert-base-uncased') is the most efficient approach because BERT is specifically designed for text classification tasks, requiring far fewer computational resources and lower latency than large LLMs. With only 1,000 labeled samples, BERT can achieve high accuracy through transfer learning, while keeping inference costs minimal—ideal for a production content moderation system on OCI Data Science.

Exam trap

Oracle often tests the misconception that larger LLMs (like Cohere Command) are always superior for classification tasks, ignoring the practical constraints of small datasets, cost, and latency that make fine-tuned BERT models the optimal choice for binary classification.

How to eliminate wrong answers

Option B is wrong because rule-based systems using regex and keyword lists cannot generalize to nuanced toxic language (e.g., sarcasm, misspellings, or context-dependent toxicity) and require constant manual maintenance, leading to poor accuracy and high operational overhead. Option C is wrong because using a pre-trained LLM with prompting (e.g., Cohere Command) incurs high per-token inference costs and latency, and with only 1,000 examples, few-shot prompting may not reliably capture the specific toxicity patterns in the dataset. Option D is wrong because fine-tuning a large LLM like Cohere Command on a tiny dataset of 1,000 samples risks catastrophic forgetting and overfitting, while also being computationally expensive and slower for real-time moderation compared to a smaller BERT model.

Full explanation →

274

MCQmedium

A company wants to use OCI Generative AI service to automatically generate product descriptions for an e-commerce catalog. They have 10,000 products. What is the best approach to ensure high-quality, consistent descriptions?

A.Use a pre-trained summarization model.

B.Use a template-based generation with keyword insertion.

C.Use the built-in chat model with few-shot examples in the prompt.

D.Fine-tune a base model on a dataset of existing product descriptions.

AnswerD

Fine-tuning adapts the model to the specific domain and produces consistent outputs across many products.

Why this answer

Fine-tuning a base model on existing product descriptions ensures the model learns the specific style and terminology, leading to consistent and high-quality outputs for a large number of products.

Full explanation →

275

Multi-Selecteasy

Which THREE are essential steps in the prompt engineering process for an LLM?

Select 3 answers

A.Test the prompt with a variety of input examples

B.Fine-tune the model on a domain corpus

C.Define the desired output format and constraints

D.Quantize the model to INT8

E.Iteratively refine the prompt based on model responses

AnswersA, C, E

Testing ensures robustness across different inputs.

Why this answer

Option A is correct because testing the prompt with a variety of input examples is essential to evaluate the LLM's generalization, robustness, and sensitivity to different phrasing or contexts. This step helps identify edge cases, biases, or inconsistencies in the model's responses before deployment.

Exam trap

Oracle often tests the distinction between prompt engineering (input-side optimization) and model modification (fine-tuning, quantization) to trap candidates who confuse these fundamentally different processes.

Full explanation →

276

MCQeasy

A company needs to ensure that only authorized users can invoke an endpoint for a generative AI model. Which OCI feature should be used to control access?

A.Network security groups (NSGs)

B.VCN flow logs

C.OCI Web Application Firewall (WAF)

D.OCI Identity and Access Management (IAM) policies

AnswerD

Correct: IAM policies grant or deny access to specific resources like models and endpoints.

Why this answer

OCI Identity and Access Management (IAM) policies are the correct choice because they define who (users, groups, or service principals) can invoke which OCI resources, including generative AI model endpoints. IAM policies use resource-type and verb-based statements (e.g., 'allow group A to manage ai-service-family in compartment X') to enforce authorization at the API level, ensuring only authorized principals can call the model's inference endpoint.

Exam trap

The trap here is that candidates confuse network-level controls (NSGs, WAF) with identity-based access control, mistakenly thinking that restricting network traffic to the endpoint is sufficient for authorization, whereas OCI requires IAM policies to authenticate and authorize the caller's identity at the API layer.

How to eliminate wrong answers

Option A is wrong because Network Security Groups (NSGs) control network traffic at the subnet or VNIC level using stateful firewall rules (e.g., allow/deny TCP port 443), not user identity or API-level authorization. Option B is wrong because VCN flow logs capture metadata about network traffic (source IP, destination port, etc.) for auditing or troubleshooting, but they do not enforce access control. Option C is wrong because OCI Web Application Firewall (WAF) protects against HTTP-based attacks (e.g., SQL injection, XSS) and can filter by IP or request patterns, but it cannot authenticate or authorize individual users or service principals invoking the model endpoint.

Full explanation →

277

MCQhard

You are a cloud architect at a healthcare company that uses OCI Generative AI Service to analyze patient records and generate clinical summaries. The service is deployed in the Frankfurt region with a dedicated AI cluster. Recently, the compliance team flagged that some generated summaries contain hallucinated diagnoses not present in the source records. They demand immediate mitigation. The current setup uses the default model (cohere.command-r-08-2024) with temperature=0.7, top_p=0.9, and max_tokens=2048. The application sends the entire patient record as a single prompt. You have access to OCI Logging, monitoring metrics (latency, request count, token count, safety filter rejections), and the AI service's model fine-tuning capability. You must reduce hallucinations while minimizing latency increase. What is the most effective course of action?

A.Switch to cohere.command-light model for faster inference and add a post-processing step using a BERT-based NER model to validate entities.

B.Increase max_tokens to 4096 and use chunked processing with overlapping context windows to provide more context.

C.Enable the safety filter with strict content moderation and set up OCI Logging to audit all generations.

D.Reduce temperature to 0.2, top_p to 0.5, and fine-tune the model on a curated dataset of 5,000 clinical summaries with a learning rate of 0.00005 and batch size of 8.

AnswerD

Lower temperature/top_p yields more deterministic outputs; fine-tuning on domain-specific data directly reduces hallucinations.

Why this answer

Option D is correct because reducing temperature and top_p makes the model more deterministic, reducing randomness and thus hallucinations. Fine-tuning on curated clinical data with a lower learning rate and smaller batch size aligns the model to the domain without excessive training. Option A might reduce hallucinations but increases latency and token cost.

Option B only adds a safety filter, which does not address factual accuracy. Option C may change style but not reduce hallucinations.

Full explanation →

278

MCQmedium

A data scientist is fine-tuning a model on OCI Generative AI to generate code comments. They use a dataset of 10,000 examples. After fine-tuning, the model generates comments that are too similar to the training data and lack generalization. What is the most likely cause?

A.Incorrect tokenizer.

B.Insufficient training data.

C.Too many training epochs.

D.Too high learning rate.

AnswerC

Excessive epochs cause the model to memorize training data, reducing generalization.

Why this answer

Option C is correct. Too many training epochs lead to overfitting, where the model memorizes the training data rather than learning general patterns. Option A is wrong because 10,000 examples is not insufficient for fine-tuning; overfitting is more likely.

Option B is wrong because a high learning rate causes divergence, not memorization. Option D is wrong because an incorrect tokenizer would cause garbled output, not just over-similarity.

Full explanation →

279

MCQmedium

A healthcare company is deploying an OCI Generative AI service to summarize patient notes. They have recently moved from a managed serving endpoint to a dedicated AI cluster to ensure data privacy. The fine-tuned model is deployed on a dedicated cluster in the US West region. Users report that the summarization responses are now slower and occasionally timeout. The IT team checks the metrics: the cluster has 1 replica and CPU utilization is at 90%. The Object Storage bucket containing the model artifacts is in the same region. They have increased the timeout in their client configuration to 120 seconds, but still get timeouts. What should they do first to address the issue?

A.Move the Object Storage bucket to a local NVMe cache in the cluster.

B.Move the model back to a managed serving endpoint in a different region.

C.Increase the number of replicas in the dedicated cluster.

D.Increase the max tokens parameter in the API call.

AnswerC

Adding replicas provides more compute capacity to handle the load.

Why this answer

Option B is correct because the cluster is overloaded with a single replica, causing high CPU and timeouts. Increasing replicas distributes the load and reduces latency. Other options are either counterproductive or not feasible.

Full explanation →

280

MCQeasy

What is the primary purpose of chunking documents in a RAG pipeline?

A.To improve embedding quality

B.To speed up training

C.To reduce storage costs

D.To ensure each chunk fits within the model's context window

AnswerD

Models have token limits; chunking prevents truncation.

Why this answer

Chunking ensures that each text segment fits within the input token limit of the embedding model and the LLM context window. While it may also help retrieval granularity, the primary reason is to meet model constraints.

Full explanation →

281

MCQmedium

A developer uses OCI Generative AI's chat endpoint with a system message placed after user messages. The model ignores the system message. What is the most likely reason?

A.The system message is too long

B.Temperature is set too high

C.The model has not been fine-tuned for instruction following

D.The system message is placed after user messages

AnswerD

The standard order is system first, then user; otherwise the model may misinterpret.

Why this answer

In OCI Generative AI's chat endpoint, the system message must be placed before user messages to establish the model's behavior and context. When placed after user messages, the model treats it as part of the conversation history rather than a directive, causing it to be ignored. This ordering is a fundamental requirement for the chat API's message structure.

Exam trap

Oracle often tests the specific API message ordering requirement, where candidates mistakenly attribute the failure to model limitations or hyperparameters rather than the structural placement of the system message.

How to eliminate wrong answers

Option A is wrong because the system message being too long would cause a token limit error or truncation, not silent ignoring. Option B is wrong because temperature controls randomness in output, not whether instructions are followed; a high temperature might produce varied responses but does not cause the model to ignore the system message. Option C is wrong because OCI Generative AI models are pre-trained for instruction following without requiring fine-tuning; the issue is purely about message ordering, not model capability.

Full explanation →

282

Multi-Selectmedium

Which TWO are required components to implement a basic RAG system using OCI services? (Choose two.)

Select 2 answers

A.OCI Object Storage

B.OCI Functions

C.OCI Data Flow

D.OCI Search with OpenSearch

E.OCI Document Understanding

AnswersD, E

Required as the vector database for similarity search.

Why this answer

A RAG system needs a way to parse documents into chunks (OCI Document Understanding) and a vector store to index and search embeddings (OCI Search with OpenSearch).

Full explanation →

283

MCQeasy

A developer wants to call the OCI Generative AI service from a Python application running on an OCI Compute instance. Which method is the most secure for authenticating the API calls?

A.Use a resource principal

B.Use the OCI CLI with a config file containing credentials

C.Use instance principals with a dynamic group and policy

D.Use an API signing key stored on the instance

AnswerC

Instance principals allow secure authentication without storing secrets.

Why this answer

Option C is correct because instance principals allow the Compute instance to authenticate to OCI services without storing any credentials on the instance. By assigning a dynamic group and policy, the instance obtains a temporary security token from the OCI metadata service, which is the most secure method for programmatic access from within OCI.

Exam trap

The trap here is that candidates confuse resource principals (used for serverless functions) with instance principals (used for Compute instances), or they assume that storing credentials in a config file is acceptable because it is a common practice in non-OCI environments.

How to eliminate wrong answers

Option A is wrong because resource principals are used for OCI Functions or other OCI resources that need to make API calls, not for Compute instances. Option B is wrong because using the OCI CLI with a config file containing credentials stores long-lived user credentials on the instance, which is less secure and violates the principle of least privilege. Option D is wrong because storing an API signing key on the instance creates a persistent secret that could be compromised if the instance is breached, and it requires manual key rotation.

Full explanation →

284

MCQeasy

An organization wants to deploy a generative AI chatbot using OCI Generative AI service. The chatbot must comply with data residency requirements by ensuring that all data processing occurs within a specific geographic region. What is the best practice to achieve this?

A.Use a dedicated AI cluster in the required region

B.Enable cross-region replication for disaster recovery

C.Configure a tenancy-wide policy to restrict region usage

D.Use IAM policies to block access from other regions

AnswerA

Dedicated AI clusters are region-specific and ensure data stays in that region.

Why this answer

Option A is correct because OCI Generative AI service allows you to provision a dedicated AI cluster within a specific region, ensuring all model inference and data processing remain within that geographic boundary. This dedicated cluster is isolated from other regions and complies with data residency requirements by design, as no data leaves the chosen region during processing.

Exam trap

The trap here is that candidates confuse data residency with access control or disaster recovery, thinking that IAM policies or replication settings can enforce geographic data boundaries, when in fact only the physical placement of the compute cluster guarantees data stays within a region.

How to eliminate wrong answers

Option B is wrong because cross-region replication is a disaster recovery feature that copies data to another region, which would violate data residency by moving data outside the required geographic region. Option C is wrong because tenancy-wide policies restrict where resources can be created, but they do not control where data processing occurs for an existing AI cluster; data could still be processed in a different region if the cluster is not explicitly placed. Option D is wrong because IAM policies block user access from other regions but do not prevent the AI service from processing data in a region other than the required one; data residency is about data location, not access control.

Full explanation →

285

MCQeasy

A team is using OCI Generative AI Agents to build a customer support bot. The bot sometimes generates answers that contradict the knowledge base. What is the most likely cause?

A.The chunking strategy for the knowledge base does not capture enough context overlap.

B.The max tokens value is too low, truncating the response.

C.The temperature parameter is set too high, causing the model to hallucinate.

D.The model's repetition penalty is too high.

AnswerA

If chunks are too small or lack overlap, the model may not retrieve all relevant information, leading to inconsistencies.

Why this answer

Option A is correct because when the chunking strategy lacks sufficient context overlap, the retrieved chunks may omit critical surrounding information, causing the generative AI model to infer missing details incorrectly and produce answers that contradict the knowledge base. In OCI Generative AI Agents, the chunking strategy determines how documents are split into smaller pieces for retrieval; without adequate overlap, the model loses the semantic continuity needed to stay faithful to the source material.

Exam trap

Oracle often tests the misconception that hallucinations are always caused by temperature settings, when in fact retrieval quality issues like poor chunking are a more common root cause in RAG-based systems.

How to eliminate wrong answers

Option B is wrong because a low max tokens value truncates the response length but does not cause the model to generate contradictory content; it simply cuts off the output prematurely. Option C is wrong because while a high temperature parameter increases randomness and can lead to hallucinations, the question specifically states the bot contradicts the knowledge base, which is more directly tied to retrieval failures (chunking) than to generation randomness. Option D is wrong because a high repetition penalty discourages the model from repeating phrases, which might reduce fluency but does not cause contradictions with the knowledge base.

Full explanation →

286

MCQmedium

A company uses OCI Generative AI to power a chatbot for customer support. They notice that the model's responses sometimes contain factual inaccuracies. Which strategy would best reduce hallucination?

A.Implementing Retrieval-Augmented Generation (RAG).

B.Increasing the temperature parameter.

C.Reducing the max token limit.

D.Fine-tuning the model on a larger general corpus.

AnswerA

RAG retrieves relevant facts from a knowledge base, grounding the output and reducing hallucination.

Why this answer

Retrieval-Augmented Generation (RAG) grounds the model's responses in retrieved factual information, directly reducing hallucination. Increasing temperature increases randomness, fine-tuning on a larger corpus may not fix factual accuracy, and reducing max tokens does not affect correctness.

Full explanation →

287

MCQhard

Refer to the exhibit. The dashboard shows latency grouped by modelId, but some points are missing for certain modelIds. Which of the following is the most likely reason?

A.The metric name is misspelled

B.The aggregation interval is too short

C.The modelIds with missing data may have been deleted or are inactive

D.The compartmentId is incorrect

AnswerC

Inactive or deleted models stop emitting metrics, leading to gaps in the time series.

Why this answer

Option C is correct because in OCI's Generative AI service, model deployments are associated with specific modelIds. If a modelId is deleted or its deployment is deactivated, the corresponding telemetry data (e.g., latency metrics) will no longer be reported, causing gaps in the dashboard. The dashboard aggregates metrics only for active modelIds, so missing points indicate that those modelIds are no longer in service.

Exam trap

The trap here is that candidates may confuse missing data due to inactive resources with configuration errors (e.g., metric name typos or compartment mismatches), but Cisco tests the understanding that metric gaps are often caused by resource lifecycle events rather than misconfiguration.

How to eliminate wrong answers

Option A is wrong because a misspelled metric name would cause all data points to be missing for all modelIds, not just selective gaps. Option B is wrong because a too-short aggregation interval would result in sparse or noisy data across all modelIds, not missing points for specific ones. Option D is wrong because an incorrect compartmentId would prevent any metrics from being displayed for the entire dashboard, not just for certain modelIds.

Full explanation →

288

MCQeasy

A company deploys a fine-tuned Llama 2 model using OCI Generative AI service. They want to ensure low-latency inference for a real-time chat application. Which deployment option should they use?

A.Batch inference job

B.OCI Functions

C.Dedicated AI cluster

D.Serverless endpoint (standard)

AnswerC

Dedicated AI clusters offer reserved capacity and low latency for real-time inference.

Why this answer

A dedicated AI cluster provides reserved compute resources (GPUs) for low-latency, real-time inference by eliminating resource contention. This is essential for a fine-tuned Llama 2 model in a chat application where consistent sub-second response times are required, unlike shared or serverless options that introduce cold starts or queuing delays.

Exam trap

The trap here is that candidates confuse 'serverless endpoint (standard)' with a low-latency option, not realizing that its shared infrastructure and potential cold starts make it unsuitable for real-time inference, while a dedicated cluster guarantees consistent performance.

How to eliminate wrong answers

Option A is wrong because batch inference jobs are designed for asynchronous, high-throughput processing of large datasets, not for real-time, low-latency chat interactions. Option B is wrong because OCI Functions is a serverless compute service with cold-start latency and limited GPU support, making it unsuitable for sustained, low-latency model inference. Option D is wrong because a serverless endpoint (standard) uses shared infrastructure that can experience variable latency due to multi-tenancy and scaling delays, which is not acceptable for real-time chat.

Full explanation →

289

Multi-Selectmedium

Which TWO techniques are commonly used to reduce the memory footprint of LLM inference?

Select 2 answers

A.Quantization

B.Increasing batch size

C.KV cache optimization

D.Gradient checkpointing

E.Using full precision (FP32)

AnswersA, C

Reduces memory by using lower precision weights.

Why this answer

Quantization reduces the memory footprint by lowering the precision of model weights and activations from FP32 to lower bit-widths like INT8 or FP16, which directly decreases the memory required to store and compute with the model. KV cache optimization reduces memory usage by efficiently managing the key-value cache during autoregressive decoding, often through techniques like shared memory, pruning, or compression, which is critical for long-context inference.

Exam trap

Oracle often tests the distinction between training and inference techniques, so candidates mistakenly apply gradient checkpointing (a training memory saver) to inference, or confuse batch size scaling with memory reduction.

Full explanation →

290

Multi-Selectmedium

Which TWO factors are most likely to cause hallucinations in LLMs?

Select 2 answers

A.High temperature

B.Short context window

C.Excessive fine-tuning

D.Low top-p

E.Inadequate training data

AnswersA, E

High temperature increases randomness, leading to less factual outputs.

Why this answer

A high temperature setting increases the randomness of token sampling, making the model more likely to generate plausible-sounding but factually incorrect or nonsensical outputs. This directly contributes to hallucinations by encouraging the model to deviate from the most probable, grounded responses.

Exam trap

Oracle often tests the misconception that low top-p or short context windows are primary causes of hallucinations, when in fact high temperature and insufficient training data are the two most direct factors that increase the likelihood of generating false or fabricated content.

Full explanation →

291

Multi-Selecteasy

Which TWO techniques can help reduce bias in LLM outputs?

Select 2 answers

A.Setting temperature to 0

B.Using only English data

C.Using diverse training data

D.Increasing model size

E.Applying adversarial debiasing

AnswersC, E

Diverse data reduces representation bias.

Why this answer

Option C is correct because using diverse training data helps the model learn from a wide range of perspectives, reducing the risk of over-representing any single group or viewpoint. This directly mitigates bias by ensuring the training distribution is more representative of the real world, rather than skewed toward a dominant demographic or cultural norm.

Exam trap

Oracle often tests the misconception that lowering temperature or increasing model size can fix bias, when in reality these parameters affect randomness and capacity, not the underlying distributional fairness of the training data.

Full explanation →

292

MCQeasy

Refer to the exhibit. Why did the embedding creation fail?

A.The input text is too short

B.The API call was not properly authenticated

C.The model ID is not available in the us-ashburn-1 region

D.The region is not enabled for the Generative AI service

AnswerB

The MissingAuthenticationError indicates no credentials were provided.

Why this answer

The error 'MissingAuthenticationError' clearly indicates that the API call lacks authentication credentials, which is required for OCI API calls.

Full explanation →

293

Multi-Selecthard

A developer is troubleshooting low recall in a vector search. Which THREE factors should be checked? (Choose three.)

Select 3 answers

A.Embedding model quality and relevance to domain

B.Chunk size and overlap strategy

C.Quality of the query embedding generation

D.The number of results returned (k) in the search

E.The LLM's temperature setting

AnswersA, B, C

A model not trained on similar data may produce poor embeddings.

Why this answer

Embedding model quality (A) directly affects relevance; chunk size and overlap (B) impact granularity; and query embedding quality (D) ensures the search input is properly represented.

Full explanation →

294

MCQhard

A team is building a Retrieval-Augmented Generation (RAG) pipeline using OCI Generative AI. They need to store and retrieve document embeddings for semantic search. Which OCI service is most appropriate as the vector store?

A.OCI Search with OpenSearch

B.OCI Streaming

C.OCI Object Storage

D.OCI Autonomous Database with AI Vector Search

AnswerA

OpenSearch supports vector storage and k-NN search, making it ideal for RAG pipelines.

Why this answer

OCI Search with OpenSearch provides vector database capabilities that integrate natively with OCI GenAI for RAG workflows.

Full explanation →

295

MCQmedium

A team is fine-tuning a foundation model on a large dataset stored in OCI Object Storage. They want to minimize data transfer costs. What is the best practice for locating the storage?

A.Place the bucket in the same region and availability domain as the fine-tuning job

B.Use OCI File Storage instead of Object Storage

C.Use a cross-region bucket to leverage geographically distributed data

D.Place the bucket in the same region as the fine-tuning job

AnswerD

Correct: Same-region transfer is free of charge.

Why this answer

Option D is correct because placing the Object Storage bucket in the same OCI region as the fine-tuning job eliminates cross-region data transfer charges. OCI charges egress fees when data moves between regions, but intra-region data transfer between services in the same region is free. This minimizes costs while keeping the data accessible for the fine-tuning workload.

Exam trap

Oracle often tests the misconception that specifying an availability domain (Option A) is necessary for cost optimization, when in fact Object Storage buckets are regional and availability domain selection is irrelevant for data transfer costs.

How to eliminate wrong answers

Option A is wrong because OCI Object Storage buckets are regional resources, not tied to a specific availability domain; specifying an availability domain is irrelevant and does not affect data transfer costs. Option B is wrong because OCI File Storage is a network-attached file system that incurs additional egress costs when accessed from compute instances in a different region or availability domain, and it does not inherently reduce data transfer costs compared to Object Storage. Option C is wrong because a cross-region bucket replicates data across regions, which incurs replication and egress costs, and accessing data from a different region than the fine-tuning job would still result in cross-region data transfer charges.

Full explanation →

296

MCQhard

A company is deploying a multi-language chatbot using OCI Generative AI Service. The chatbot must support English, Spanish, and French. The team finds that responses in Spanish are less accurate than in English. They have a small bilingual dataset. What is the best approach?

A.Use a multilingual base model (e.g., mT5) and fine-tune on the bilingual dataset (English and Spanish) using cross-lingual transfer learning.

B.Use prompt engineering with language-specific instructions in the system prompt.

C.Translate all user queries to English, process them, then translate responses back.

D.Train separate fine-tuned models for each language.

AnswerA

Cross-lingual transfer leverages English data to improve Spanish performance, and fine-tuning on bilingual data further boosts accuracy.

Why this answer

Option A is correct because fine-tuning a multilingual base model like mT5 on a small bilingual dataset leverages cross-lingual transfer learning, where knowledge from high-resource languages (English) improves performance on low-resource languages (Spanish). This approach is specifically designed for scenarios with limited data and directly addresses the accuracy gap without requiring separate models or translation pipelines.

Exam trap

The trap here is that candidates often overestimate the power of prompt engineering (Option B) for language-specific accuracy, underestimating that systematic linguistic errors require model adaptation through fine-tuning or transfer learning, not just instruction tuning.

How to eliminate wrong answers

Option B is wrong because prompt engineering with language-specific instructions does not adapt the model's internal representations; it merely provides contextual cues, which is insufficient to correct systematic inaccuracies in a specific language. Option C is wrong because translating queries to English and back introduces translation errors, latency, and loss of nuance, and does not improve the model's native understanding of Spanish. Option D is wrong because training separate fine-tuned models for each language is inefficient with a small bilingual dataset and fails to exploit cross-lingual transfer, leading to poor performance on the low-resource language.

Full explanation →

297

MCQhard

A company is using OCI Generative AI for a RAG-based code assistant. They index source code repositories into a vector store. Developers report that the assistant often suggests deprecated APIs or outdated code snippets, even though the latest code is in the repository. The index was built a week ago and has not been updated. They plan to set up incremental updates. However, they notice that even after re-indexing the latest commits, the issue persists. What is the most likely oversight?

A.The vector store is not configured to overwrite existing vectors for updated documents.

B.The retrieval top-k is set too low, missing some relevant snippets.

C.The chunking strategy splits code at function boundaries, losing import statements.

D.The embedding model is not fine-tuned on code; it was trained on natural language.

AnswerA

Without overwrite, old vectors persist even after re-indexing, causing retrieval of outdated code.

Why this answer

Option A is correct because if the vector store does not overwrite or update vectors for changed documents, old vectors remain, causing retrieval of outdated code. Option B (chunking at function boundaries) may cause missing imports but not specifically deprecation. Option C (embedding model not fine-tuned on code) might affect quality but not freshness.

Option D (low top-k) would affect recall, not freshness.

Full explanation →

298

MCQhard

Refer to the exhibit. A data scientist received this output after submitting a fine-tuning job. What is the most effective change to resolve the out-of-memory error?

A.Increase the sequence length.

B.Reduce the learning rate.

C.Decrease the number of fine-tuning epochs.

D.Increase the number of nodes in the cluster.

AnswerD

Correct: More nodes mean more total memory, alleviating OOM.

Why this answer

The out-of-memory error during fine-tuning indicates that the model's memory requirements exceed the available resources on the current node. Increasing the number of nodes in the cluster distributes the model parameters, gradients, and optimizer states across multiple GPUs or nodes, effectively increasing the total memory capacity and resolving the OOM error. This is a standard approach in distributed training frameworks like PyTorch DDP or FSDP, which OCI Data Science supports.

Exam trap

Oracle often tests the misconception that reducing epochs or learning rate can fix memory errors, when in fact memory errors are resource constraints that require scaling hardware (more nodes or GPUs) or reducing memory-intensive parameters like batch size or sequence length.

How to eliminate wrong answers

Option A is wrong because increasing the sequence length would increase the memory footprint per sample (due to larger attention matrices), making the OOM error worse, not better. Option B is wrong because reducing the learning rate affects training dynamics and convergence, not memory usage; it does not address the root cause of insufficient memory. Option C is wrong because decreasing the number of fine-tuning epochs reduces total training time but does not change the peak memory consumption per step, so the OOM error would still occur.

Full explanation →

299

Multi-Selecthard

A developer is evaluating OCI GenAI model families. Which three are correct characteristics of the available models? (Choose three.)

Select 3 answers

A.Llama models are open-source and available for fine-tuning

B.All models support real-time streaming of tokens

C.Cohere embedding models produce vector representations

D.OCI GenAI provides both hosted and dedicated deployment options

E.Cohere Command models are optimized for multilingual tasks

AnswersA, C, D

Meta's Llama models are open-source and supported by OCI GenAI for fine-tuning.

Why this answer

Llama models, such as Llama 2 and Llama 3, are open-source large language models originally developed by Meta. OCI GenAI provides them as pre-built models that developers can fine-tune using their own datasets, enabling customization for domain-specific tasks without training from scratch.

Exam trap

Oracle often tests the misconception that all models in a platform share the same capabilities, such as streaming or multilingual optimization, when in reality each model family (e.g., Llama, Cohere Command, Cohere Embed) has distinct design goals and feature sets.

Full explanation →

300

MCQhard

A company fine-tunes an LLM on internal support tickets. After deployment, the model hallucinates company-specific product names. What is the most effective mitigation?

A.Switch to a smaller model to reduce hallucination risk

B.Use prompt engineering to remind the model to be accurate

C.Implement RAG with a verified product database

D.Fine-tune further with more ticket data

AnswerC

RAG provides factual grounding, reducing hallucinations.

Why this answer

RAG (Retrieval-Augmented Generation) grounds the LLM's output in a verified product database, providing factual context that prevents hallucination of company-specific product names. Unlike fine-tuning, which only adjusts model weights and can still produce plausible but incorrect names, RAG retrieves exact records at inference time, ensuring accuracy for proprietary terminology.

Exam trap

Oracle often tests the misconception that fine-tuning alone can fix factual accuracy for domain-specific entities, when in reality RAG is required to ground outputs in a verifiable external knowledge source.

How to eliminate wrong answers

Option A is wrong because switching to a smaller model reduces capacity and often increases hallucination risk due to lower parameter count and less memorization ability. Option B is wrong because prompt engineering is a fragile, surface-level fix that cannot enforce factual accuracy for specific product names; the model may still generate plausible but incorrect names. Option D is wrong because further fine-tuning with more ticket data risks overfitting and does not guarantee elimination of hallucinated product names, as the model can still invent names not present in the training distribution.

Full explanation →

Page 4 of 7

All pages

Practice 1Z0-1127 by domain

Target a specific domain to shore up weak areas.

Fundamentals of Large Language Models Using OCI Generative AI Service Building LLM Applications with RAG and Vector Search Deploying and Managing Generative AI on OCI

See all domains with question counts →