Oracle Cloud Infrastructure Generative AI Professional 1Z0-1127 1Z0-1127 Questions 301–375 | Page 5/7

301

MCQeasy

What is a recommended practice to prevent the LLM from generating information not present in the retrieved context when building a RAG application?

A.Setting the temperature to 0.

B.Using a system message that says 'Use only the provided context to answer.'

C.Including few-shot examples in the prompt.

D.Increasing the topK value.

AnswerB

This instruction directly constrains the model to the context.

Why this answer

Including a system message that explicitly instructs the model to rely only on the provided context reduces the likelihood of hallucination.

Full explanation →

302

MCQeasy

Refer to the exhibit. In this RAG pipeline, what is the role of the 'embedding_model' variable?

A.It converts text into vector representations for similarity search.

B.It applies guardrails to filter content.

C.It fine-tunes the model on the provided texts.

D.It generates text completions based on prompts.

AnswerA

Embeddings are used to index and retrieve relevant documents via vector similarity.

Why this answer

The embedding model converts text into numerical vector representations that can be stored and searched for similarity.

Full explanation →

303

MCQhard

A healthcare startup is building a chatbot to answer patient inquiries using a large language model (LLM) deployed on OCI Data Science AI Quick Actions. The chatbot must comply with HIPAA regulations, so all patient data must remain within the OCI tenancy and never be sent to third-party APIs. The team has fine-tuned a Llama 2 7B model on de-identified medical records using OCI Data Science notebooks. The model is deployed as a managed endpoint via AI Quick Actions. Early testing shows that the chatbot sometimes generates responses containing specific patient names or dates of birth that were present in the fine-tuning dataset. Moreover, the model occasionally hallucinates medication dosages that are not medically accurate. Which course of action should the team take to address both issues while maintaining HIPAA compliance?

A.Deploy a rule-based post-processing script that checks each response against a list of known patient names and medication dosages, and rejects any response containing them.

B.Switch to a larger model (e.g., Llama 2 70B) to improve accuracy and reduce hallucinations, and apply output filtering to remove any detected PII from responses.

C.Increase the fine-tuning dataset size with more varied de-identified records to reduce overfitting, and apply a temperature setting of 0 to make outputs deterministic.

D.Re-fine-tune the model using differential privacy to limit memorization of training data, and implement retrieval-augmented generation (RAG) with a curated medical knowledge base to ground medication-related responses.

AnswerD

Differential privacy during training reduces the risk of memorizing private data, and RAG grounds responses in a trusted knowledge base, reducing hallucinations. This combination addresses both issues effectively.

Why this answer

Option D is correct because it addresses both memorization of PII and hallucination of medication dosages while maintaining HIPAA compliance. Differential privacy during fine-tuning limits the model's ability to memorize specific patient data, and retrieval-augmented generation (RAG) grounds responses in a curated medical knowledge base, reducing hallucinations without sending data outside the OCI tenancy.

Exam trap

Oracle often tests the misconception that simply filtering outputs or increasing model size can solve memorization and hallucination issues, when in fact only training-time techniques like differential privacy and inference-time grounding like RAG address the root causes.

How to eliminate wrong answers

Option A is wrong because a rule-based post-processing script cannot catch all variations of patient names or hallucinated dosages (e.g., misspellings, new names), and rejecting responses containing known names does not prevent the model from generating them in the first place. Option B is wrong because switching to a larger model (Llama 2 70B) does not inherently reduce memorization of training data or hallucinations; it may even increase both, and output filtering alone cannot guarantee removal of all PII without risking false positives or missing subtle leaks. Option C is wrong because increasing dataset size does not guarantee reduced overfitting or memorization, and setting temperature to 0 makes outputs deterministic but does not prevent the model from reproducing memorized PII or hallucinating dosages; it only removes randomness.

Full explanation →

304

MCQmedium

You deployed a generative AI model on OCI Model Deployment with autoscaling configured based on average CPU utilization. The model is a large language model that heavily utilizes the GPU. During peak hours, the scaling is too slow to keep up with demand, resulting in high latency for users. You want to improve the responsiveness of autoscaling. Which change should you make?

A.Decrease the target CPU utilization threshold for scale-out

B.Increase the maximum number of replicas in the autoscaling configuration

C.Use GPU utilization as the scaling metric instead of CPU utilization

D.Increase the cooldown period between scale-out events

AnswerC

GPU utilization directly correlates with inference load, enabling more responsive scaling.

Why this answer

Option C is correct because the model heavily utilizes GPU, not CPU. Autoscaling based on CPU utilization is irrelevant for GPU-bound workloads, leading to delayed scale-out. Using GPU utilization as the scaling metric directly reflects the actual resource bottleneck, enabling faster and more accurate scaling decisions.

Exam trap

The trap here is that candidates assume CPU utilization is always the correct scaling metric for any workload, overlooking that GPU-bound models require a metric that reflects the actual bottleneck.

How to eliminate wrong answers

Option A is wrong because decreasing the target CPU utilization threshold would cause scale-out to trigger at even lower CPU usage, but since the model is GPU-bound, CPU utilization remains low and irrelevant, so this change does not address the root cause. Option B is wrong because increasing the maximum number of replicas only sets an upper limit on scaling; it does not speed up the scaling decision or make it more responsive to demand. Option D is wrong because increasing the cooldown period between scale-out events would actually slow down scaling further, worsening latency during peak hours.

Full explanation →

305

MCQeasy

When building a RAG application for document retrieval, which chunking strategy is recommended to maximize retrieval accuracy?

A.Use fixed-size token chunks with no overlap

B.Use overlapping chunks with a sliding window

C.Use random splitting points

D.Use entire documents as single chunks

AnswerB

Overlap ensures contextual continuity between chunks.

Why this answer

Overlapping chunks with a sliding window preserve context at chunk boundaries, improving the chance that relevant text is captured in at least one chunk.

Full explanation →

306

MCQeasy

An OCI GenAI practitioner wants to deploy a model that can generate code from natural language descriptions. Which type of model is most suitable?

A.T5

B.ResNet

C.BERT

D.GPT

AnswerD

GPT (decoder-only) excels at autoregressive text generation, ideal for code generation.

Why this answer

GPT models (decoder-only) are designed for text generation, including code generation. BERT is encoder-only, T5 is encoder-decoder but not as optimized for code, and ResNet is for images.

Full explanation →

307

MCQhard

A developer is deploying a fine-tuned model using OCI Generative AI service. They want to use a custom container image for inference. Which statement is true?

A.Custom containers are only supported with OCI Data Science, not Generative AI.

B.You can upload a container image to OCI Container Registry and reference it when creating a dedicated AI cluster.

C.Custom containers are supported only for fine-tuning jobs, not inference.

D.Custom containers are not supported; only built-in models are available.

AnswerB

Correct: This is the documented approach for using custom inference containers.

Why this answer

Option B is correct because OCI Generative AI service allows you to bring your own custom container image for inference by uploading it to OCI Container Registry (OCIR) and referencing it when creating a dedicated AI cluster. This enables you to deploy fine-tuned models with custom inference logic, dependencies, or frameworks that are not available in the built-in serving containers.

Exam trap

The trap here is that candidates may confuse the scope of custom container support, assuming it is limited to OCI Data Science or only for training, when in fact OCI Generative AI explicitly supports custom containers for inference via dedicated AI clusters.

How to eliminate wrong answers

Option A is wrong because custom containers are supported with OCI Generative AI for inference, not only with OCI Data Science. Option C is wrong because custom containers are supported for inference, not just for fine-tuning jobs; fine-tuning uses built-in containers or custom training containers, but inference also supports custom containers. Option D is wrong because custom containers are indeed supported; you are not limited to only built-in models.

Full explanation →

308

MCQhard

A team is optimizing a RAG pipeline for OCI Generative AI. They observe that the model's responses are verbose and often include irrelevant details from the retrieved chunks, reducing user satisfaction. They have already tuned the prompt template. What is the most effective next step?

A.Apply instruction tuning on the generation model.

B.Implement a re-ranking step using a cross-encoder model.

C.Reduce the number of retrieved chunks from 5 to 3.

D.Increase the similarity threshold for retrieval from 0.7 to 0.85.

AnswerB

Re-ranking scores each chunk for relevance to the query, filtering out noise.

Why this answer

Option C is correct because re-ranking with a cross-encoder can filter out irrelevant chunks before generation, improving response quality. Option A reduces quantity but does not guarantee relevance. Option B may miss relevant chunks.

Option D is costly and time-consuming, and may not address the issue of irrelevant details.

Full explanation →

309

Multi-Selecteasy

Which TWO of the following are valid similarity metrics used in vector search?

Select 2 answers

A.Levenshtein distance

B.Cosine similarity

C.Euclidean distance

D.Hamming distance

E.Jaccard index

AnswersB, C

Commonly used for normalized vectors.

Why this answer

Cosine similarity and Euclidean distance are standard metrics for vector search. Jaccard is for sets, Levenshtein for strings, Hamming for bit vectors.

Full explanation →

310

MCQmedium

A company uses OCI Generative AI Service to generate personalized email content. They need to ensure that personally identifiable information (PII) is not included in the model's training data. What should they do?

A.Encrypt the training data with OCI Vault

B.Use the moderation API to scan outputs

C.Use a dedicated model endpoint

D.Enable data redaction in the service

AnswerD

Data redaction removes PII before processing.

Why this answer

Option D is correct because OCI Generative AI Service provides a built-in data redaction feature that automatically detects and removes personally identifiable information (PII) from training data before it is used for model training. This ensures compliance with data privacy regulations without requiring manual preprocessing or external tools.

Exam trap

The trap here is confusing data redaction (pre-training data sanitization) with output moderation (post-generation filtering), leading candidates to incorrectly select the moderation API option.

How to eliminate wrong answers

Option A is wrong because encrypting training data with OCI Vault protects data at rest and in transit but does not remove or redact PII from the content itself; encryption does not prevent PII from being included in model training. Option B is wrong because the moderation API is designed to scan and filter model outputs (inference results) for inappropriate content, not to sanitize training data before training occurs. Option C is wrong because using a dedicated model endpoint isolates the model instance but does not alter or filter the training data; it addresses data residency or performance concerns, not PII removal.

Full explanation →

311

MCQmedium

Refer to the exhibit. An administrator receives the error shown when attempting to deploy a custom model. What is the most likely cause?

A.The user or service does not have permission to read the model artifact from Object Storage

B.The compartment ID is incorrect

C.The model artifact file is corrupted

D.The dedicated AI cluster ID is invalid

AnswerA

The 403 error indicates lack of IAM permissions to access the bucket.

Why this answer

The error indicates that the deployment process cannot access the model artifact stored in Object Storage. In OCI Generative AI, the service must have read permission on the bucket and object to download the artifact. If the user or service principal lacks the necessary IAM policy (e.g., `allow service generative-ai to read objects in compartment X where target.bucket.name='Y'`), the deployment fails with this access-denied error.

Exam trap

Oracle often tests the distinction between 'permission denied' and 'resource not found' errors; the trap here is that candidates may confuse a missing IAM policy with an incorrect compartment ID or a corrupted artifact, but the error message's reference to 'access' or 'permission' points directly to Object Storage read rights.

How to eliminate wrong answers

Option B is wrong because an incorrect compartment ID would produce a different error (e.g., 'compartment not found' or 'not authorized for compartment'), not a permission error on the artifact. Option C is wrong because a corrupted artifact would cause a validation or extraction failure during model loading, not an access-denied error at the storage retrieval stage. Option D is wrong because an invalid dedicated AI cluster ID would result in a cluster-not-found or capacity error, not a permission error on Object Storage.

Full explanation →

312

MCQeasy

An administrator needs to ensure that only specific users in the finance department can invoke a generative AI model deployed on OCI. Which IAM policy should be used?

A.allow group admins to use generative-ai-model in compartment finance

B.allow group finance_group to manage generative-ai-model in compartment finance

C.allow group finance_group to use generative-ai-model in compartment finance

D.allow any-user to use generative-ai-model in compartment finance

AnswerC

This correctly restricts to the finance group.

Why this answer

Option C is correct because the 'use' verb in an OCI IAM policy grants the minimum required permissions to invoke a generative AI model without allowing management actions like creating or deleting models. The policy scopes access to the 'finance_group' group and the 'finance' compartment, ensuring only specific users in the finance department can invoke the model.

Exam trap

Oracle often tests the distinction between 'use' and 'manage' verbs, where candidates mistakenly choose 'manage' thinking it includes 'use', but 'manage' grants excessive permissions that violate least privilege requirements.

How to eliminate wrong answers

Option A is wrong because it grants access to the 'admins' group instead of the finance department group, and 'use' on the resource type 'generative-ai-model' is correct but the group is wrong. Option B is wrong because 'manage' provides excessive permissions (e.g., create, update, delete models) beyond the required invoke action, violating the principle of least privilege. Option D is wrong because 'any-user' allows all authenticated users in the tenancy to invoke the model, which does not restrict access to only finance department users.

Full explanation →

313

MCQhard

A healthcare company is building a RAG-based chatbot to answer patient queries using medical documents stored in OCI Object Storage. They use OCI Generative AI service with Cohere Command R+ model and OCI OpenSearch as the vector database. The chatbot is deployed on OCI Compute with a Flask application. After deployment, the latency for each query is 15-20 seconds, which is unacceptable. Logs show that the embedding generation step (using OCI Generative AI embedding API) takes 8-10 seconds, and the vector search in OpenSearch takes 5-7 seconds. The team has already enabled connection pooling and increased the compute instance shape to the maximum allowed. Which action would MOST effectively reduce the overall latency?

A.Pre-generate embeddings for all documents during ingestion and store them in the vector database, so at query time only the query embedding is generated and compared.

B.Implement a caching layer with Redis to store previous query results and serve cached responses for identical queries.

C.Reindex the OpenSearch vector index with optimal settings (e.g., HNSW algorithm, ef_search param) to speed up vector search.

D.Switch to a faster embedding model like Cohere Embed v3 (English) which has lower latency.

AnswerA

This eliminates the need to generate embeddings for each document during the query path, drastically reducing latency.

Why this answer

The primary bottleneck is the embedding generation step (8-10 seconds). By pre-generating embeddings for all documents during ingestion and storing them in the vector database, the query-time embedding generation is eliminated, reducing the per-query latency to only the time needed to generate the query embedding and perform the vector search. This directly addresses the largest contributor to the 15-20 second latency.

Exam trap

The trap here is that candidates may focus on optimizing the vector search or caching responses, but the real bottleneck is the embedding generation step, which must be eliminated at query time through pre-generation during ingestion.

How to eliminate wrong answers

Option B is wrong because caching previous query results only helps for repeated identical queries, not for the vast majority of unique patient queries, and does not address the embedding generation bottleneck. Option C is wrong because while tuning HNSW parameters (like ef_search) can improve vector search speed, it only targets the 5-7 second search step, not the 8-10 second embedding generation step, so the overall latency reduction would be insufficient. Option D is wrong because switching to a faster embedding model may reduce embedding latency slightly, but the core issue is that embedding generation is still performed at query time for every query; pre-generation is a more fundamental optimization that eliminates the per-query embedding cost entirely.

Full explanation →

314

MCQeasy

A company is building a RAG application on OCI and needs a managed vector database with native support for AI Vector Search, which offers high performance and integration with OCI GenAI. Which OCI service should they use?

A.Autonomous Database with AI Vector Search

B.OCI OpenSearch

C.MySQL HeatWave

D.OCI Object Storage

AnswerA

Autonomous Database offers AI Vector Search, a key capability for RAG.

Why this answer

Oracle Autonomous Database provides AI Vector Search, enabling efficient similarity search on vector embeddings. OCI OpenSearch can also serve as a vector store but lacks the same level of AI Vector Search integration. MySQL HeatWave and Object Storage are not optimized for vector search.

Full explanation →

315

Multi-Selecteasy

Which two factors are essential for calculating the cost of using OCI Generative AI for text generation? (Choose two.)

Select 2 answers

A.Model architecture (encoder-only vs decoder-only).

B.Number of API calls per minute.

C.Temperature setting.

D.Number of input tokens.

E.Number of output tokens.

AnswersD, E

Input tokens are a direct factor in cost calculation.

Why this answer

The cost of using OCI Generative AI for text generation is primarily determined by the number of input tokens (the prompt you send) and the number of output tokens (the generated response). OCI charges per token processed, making these two factors essential for cost calculation.

Exam trap

Oracle often tests the misconception that API call frequency or model architecture parameters directly influence cost, when in reality only token counts (input and output) are the billing units.

Full explanation →

316

MCQhard

A financial institution needs to deploy a fine-tuned model on OCI with strict data residency requirements. They must ensure that data used for inference never leaves a specific OCI region. The model is stored in Object Storage in the same region. What additional configuration is needed?

A.Configure the dedicated AI cluster to use a private endpoint and restrict access to the region

B.Use OCI Data Transfer service to move data

C.Set up a VPN connection to on-premises

D.Enable cross-region replication on the bucket

AnswerA

Private endpoints keep all traffic within the OCI network and the same region.

Why this answer

Option A is correct because configuring the dedicated AI cluster to use a private endpoint ensures that inference traffic stays within the OCI region and never traverses the public internet. This satisfies the strict data residency requirement by keeping all data and model inference within the designated region, while the model stored in Object Storage in the same region is accessed via the private endpoint without leaving the region.

Exam trap

The trap here is that candidates confuse data residency with data security, incorrectly assuming that a VPN (Option C) or cross-region replication (Option D) can enforce regional confinement, when in fact they either route data outside the region or actively replicate it across regions.

How to eliminate wrong answers

Option B is wrong because OCI Data Transfer Service is designed for offline bulk data migration (e.g., shipping physical drives) and does not address real-time inference data residency or network-level regional confinement. Option C is wrong because setting up a VPN connection to on-premises would route inference traffic outside the OCI region to an on-premises network, violating the data residency requirement that data never leave the specific region. Option D is wrong because enabling cross-region replication on the bucket would actively copy data to another region, directly contradicting the requirement that data never leave the original region.

Full explanation →

317

MCQhard

A data scientist fine-tuned a model on OCI Gen AI using a dedicated AI cluster. After deployment, the model gives inaccurate results. Which troubleshooting step should they take first?

A.Switch to a different base model

B.Increase the cluster size

C.Use a serverless endpoint

D.Check the training data for bias or quality issues

AnswerD

Training data quality directly impacts model accuracy.

Why this answer

Option B is correct because inaccurate results often stem from training data issues such as bias, quality, or insufficient diversity. Other options may be considered later but data quality is the primary suspect.

Full explanation →

318

MCQeasy

A developer needs to generate embeddings for text data using the OCI Generative AI service. Which API should they call to get vector representations of text?

A.cohere.generate

B.cohere.embed

C.cohere.summarize

D.cohere.classify

AnswerB

Correct endpoint for generating embeddings.

Why this answer

The 'cohere.embed' endpoint is used for embedding generation in OCI GenAI. The 'cohere.generate' and 'cohere.summarize' are for text generation and summarization, respectively. 'cohere.classify' is for classification tasks.

Full explanation →

319

MCQhard

An architect is designing a GenAI solution for document summarization that must meet GDPR compliance. The data should not leave the EU. OCI GenAI models are available in Frankfurt, London, and Paris. Which is the best approach?

A.Deploy a dedicated AI cluster in Frankfurt and upload data to Object Storage in Frankfurt.

B.Use the managed serving endpoint in Frankfurt.

C.Use the playground in any EU region.

D.Use a pre-trained model from OCI's catalog.

AnswerA

Dedicated cluster processes data within the cluster, ensuring GDPR compliance.

Why this answer

Option D is correct because deploying a dedicated AI cluster in Frankfurt ensures data stays within the EU region and does not traverse to other regions for inference.

Full explanation →

320

Multi-Selecthard

Which THREE factors should be considered when choosing between a fine-tuning and a prompt engineering approach?

Select 3 answers

A.Latency requirements

B.Need for model personalization

C.Availability of foundation model in OCI

D.Amount of labeled data available

E.Budget for GPU compute

AnswersB, D, E

Fine-tuning is necessary for deep personalization.

Why this answer

Option B is correct because model personalization is a key driver for choosing fine-tuning over prompt engineering. Fine-tuning modifies the model's weights to adapt it to a specific domain or task, enabling deeper customization that prompt engineering alone cannot achieve, especially when the desired behavior requires learning new patterns or knowledge not present in the base model.

Exam trap

Oracle often tests the misconception that latency or model availability are primary differentiators, when in fact the core trade-off is between the need for deep personalization (fine-tuning) versus the ease and speed of prompt engineering, with labeled data and compute budget being practical constraints.

Full explanation →

321

MCQhard

Refer to the exhibit. The group DataScientists can run inference but cannot fine-tune a model on a dedicated AI cluster. Which additional policy statement is required to allow fine-tuning?

A.Allow group DataScientists to inspect dedicated-ai-clusters in compartment ABC

B.Allow group DataScientists to manage dedicated-ai-clusters in compartment ABC

C.Allow group DataScientists to use generative-ai-fine-tune in compartment ABC

D.Allow group DataScientists to use dedicated-ai-clusters in compartment ABC

AnswerB

'manage' includes permissions to create and manage fine-tuning jobs.

Why this answer

Fine-tuning requires manage permission on dedicated-ai-clusters to create and manage fine-tuning jobs.

Full explanation →

322

MCQmedium

A company has deployed a model on a Dedicated AI Cluster and needs to monitor inference performance metrics such as request latency, throughput, and error rates. Which OCI service provides built-in monitoring dashboards for these metrics?

A.OCI Logging

B.OCI Notifications

C.OCI Monitoring

D.OCI Events

AnswerC

Monitoring provides dashboards for metrics like latency and throughput.

Why this answer

OCI Monitoring is the correct service because it provides built-in dashboards and metrics for inference performance, including request latency, throughput, and error rates, specifically for Dedicated AI Cluster deployments. These metrics are automatically collected and visualized in the OCI Monitoring console, allowing real-time tracking of model inference health without additional configuration.

Exam trap

Oracle often tests the distinction between monitoring (real-time metrics and dashboards) and logging (text-based event records), leading candidates to mistakenly choose OCI Logging for performance metrics when it is actually designed for troubleshooting and compliance, not live dashboarding.

How to eliminate wrong answers

Option A is wrong because OCI Logging is designed for collecting and storing log data (e.g., audit logs, custom logs) and does not offer built-in dashboards for real-time inference performance metrics like latency or throughput. Option B is wrong because OCI Notifications is a pub/sub messaging service for alerting and event distribution, not a monitoring dashboard for metrics. Option D is wrong because OCI Events triggers automated actions based on changes in OCI resources (e.g., state changes) but does not provide dashboards for continuous performance metrics.

Full explanation →

323

MCQeasy

When using OCI Generative AI with a fine-tuned model, what is the primary benefit of creating a dedicated AI cluster?

A.Automatic scaling based on demand.

B.Reduced cost per inference token compared to on-demand.

C.Consistent low latency and high throughput for production workloads.

D.Enhanced security through network isolation from other tenants.

AnswerC

Dedicated clusters ensure resources are reserved for your model.

Why this answer

Option A is correct because a dedicated cluster provides guaranteed throughput and low latency. Option B is incorrect because it does not affect cost per token directly. Option C is incorrect because capacity is managed by the service.

Option D is incorrect because security is not inherently better with a dedicated cluster.

Full explanation →

324

MCQeasy

A company uses OCI Generative AI to create embeddings for a vector search. They notice high latency in search queries. What is one possible optimization?

A.Decrease batch size for embedding creation

B.Use approximate nearest neighbor (ANN) search

C.Use exact search for better accuracy

D.Increase the embedding dimension

AnswerC

Exact search is slower; the optimization would be to use ANN.

Why this answer

Approximate nearest neighbor (ANN) search trades a slight reduction in accuracy for a significant speedup, addressing high latency.

Full explanation →

325

MCQhard

Refer to the exhibit. A developer encounters this error. Which action should they take to resolve the issue?

A.Wait and retry after some time.

B.Change the model to cohere.command-light.

C.Increase the max-tokens value.

D.Decrease the temperature to 0.0.

AnswerA

Rate limit errors require waiting for the quota to reset, typically after a short period. Automatic retries with backoff are recommended.

Why this answer

The error indicates a rate limit or throttling issue, typically returned by the OCI Generative AI service when the API request quota is exceeded. Waiting and retrying after the cooldown period allows the rate limit to reset, which is the correct resolution for transient throttling errors.

Exam trap

Oracle often tests the misconception that model parameters (like temperature or max-tokens) can resolve API-level errors, when in fact throttling errors require waiting or implementing retry logic with backoff.

How to eliminate wrong answers

Option B is wrong because changing the model to cohere.command-light does not address rate limiting; it only changes the underlying LLM, which may have different quotas but does not resolve the current throttling error. Option C is wrong because increasing max-tokens affects the length of generated responses, not the request rate or quota limits. Option D is wrong because decreasing temperature to 0.0 controls output randomness and determinism, not API request throttling or rate limits.

Full explanation →

326

MCQeasy

A developer wants to invoke an OCI Generative AI model from an application running on a compute instance in OCI. The instance is in a private subnet. What is the most secure method to access the model endpoint?

A.Use a Service Gateway to access the endpoint privately.

B.Use an Internet Gateway and public endpoint.

C.Use a VPN Connect to connect to the model's public IP.

D.Use a NAT Gateway to access the endpoint.

AnswerA

A Service Gateway enables private access to OCI services without traversing the internet.

Why this answer

A Service Gateway allows resources in a private subnet to access OCI services, including the Generative AI model endpoint, over the OCI private network without traversing the internet. This is the most secure method because traffic stays within the OCI backbone, avoiding exposure to public IPs and reducing the attack surface.

Exam trap

The trap here is that candidates may confuse a NAT Gateway with a Service Gateway, assuming that any gateway providing outbound access is sufficient, but only a Service Gateway offers private, secure access to OCI services without internet exposure.

How to eliminate wrong answers

Option B is wrong because using an Internet Gateway and public endpoint exposes the model endpoint to the public internet, increasing security risks and violating the requirement for a private subnet. Option C is wrong because VPN Connect is used to extend an on-premises network to OCI, not to access OCI service endpoints from within OCI; it would add unnecessary complexity and does not provide private access to the model endpoint. Option D is wrong because a NAT Gateway enables outbound internet access from a private subnet but does not provide private connectivity to OCI services; traffic would still leave the OCI network and return, which is less secure and not the intended use for accessing OCI service endpoints.

Full explanation →

327

Multi-Selecteasy

A developer needs to authenticate API calls to OCI Generative AI from a compute instance. Which TWO methods can be used?

Select 2 answers

A.Configure an API key in OCI IAM for the user

B.Configure a customer-managed key (CMK) for encryption

C.Set up a service connector to forward requests

D.Use resource principal with instance principals

E.Use an auth token from OCI Identity

AnswersA, D

API keys are a standard way to authenticate SDK/CLI requests to OCI services, including Generative AI.

Why this answer

Options B and D are correct. API keys (B) allow programmatic access, and resource principals via instance principals (D) enable secure access without storing credentials on the instance. Option A (auth token) is used for REST API calls to Object Storage, not Generative AI.

Option C (service connector) is for data movement, not authentication. Option E (customer-managed key) is for encryption, not authentication.

Full explanation →

328

MCQmedium

A company is deploying a chatbot powered by OCI Generative AI. They want to inject the conversation history into the model prompt to maintain context. However, they notice that after a long conversation, the model starts to ignore earlier messages. What is the most likely cause?

A.The model's max_tokens limit is too low, truncating the prompt.

B.The model has a limited context window size.

C.The top_p parameter is set to 1, causing deterministic output.

D.The temperature setting is too high, causing randomness.

AnswerB

The context window determines how many input tokens the model can consider; exceeding it causes truncation.

Why this answer

The model's context window size limits the total number of tokens (input + output) it can process at once. When the conversation history grows beyond this limit, older messages are truncated or dropped, causing the model to lose context from earlier parts of the conversation. This is a fundamental constraint of transformer-based models like those used in OCI Generative AI.

Exam trap

Oracle often tests the distinction between input-side limits (context window) and output-side limits (max_tokens), so candidates mistakenly attribute context loss to max_tokens when the real issue is the fixed context window size.

How to eliminate wrong answers

Option A is wrong because max_tokens controls the maximum number of tokens in the generated response, not the input prompt; truncation of the prompt is caused by the context window limit, not max_tokens. Option C is wrong because top_p=1 means nucleus sampling considers all tokens with cumulative probability up to 1, which is the default and does not cause deterministic output; it does not affect context retention. Option D is wrong because temperature controls randomness in token selection, not the ability to retain conversation history; a high temperature increases diversity but does not cause earlier messages to be ignored.

Full explanation →

329

MCQhard

Given the CLI output from `oci generative-ai model list`, what can be determined about the model 'my-fine-tuned-model'?

A.It was created by fine-tuning an existing base model

B.It is a pre-built model provided by OCI

C.It has been deployed to an endpoint

D.It is currently being trained

AnswerA

The base-model-id indicates it was fine-tuned from another model.

Why this answer

The CLI output from `oci generative-ai model list` includes a model named 'my-fine-tuned-model'. In OCI Generative AI, models listed with custom names that are not part of the base model catalog (e.g., cohere.command, meta.llama) indicate they were created by fine-tuning a base model using your own dataset. The presence of a custom name without a base model prefix confirms it is a fine-tuned model, not a pre-built one.

Exam trap

Oracle often tests the distinction between listing models and checking their lifecycle or deployment state, so candidates mistakenly assume a listed model is either deployed or still training, when in fact the `model list` command only confirms the model exists and is registered.

How to eliminate wrong answers

Option B is wrong because pre-built models in OCI Generative AI have names like 'cohere.command' or 'meta.llama-2-70b-chat', not custom names like 'my-fine-tuned-model'. Option C is wrong because the `model list` command only shows model metadata; deployment status requires a separate `oci generative-ai model get` or `oci generative-ai deployment list` command. Option D is wrong because the model list output does not indicate training status; training status is shown via `oci generative-ai model get` with a 'lifecycle-state' field (e.g., 'ACTIVE', 'CREATING'), and a listed model is typically already in an active state.

Full explanation →

330

MCQhard

A company has multiple teams sharing an OCI Generative AI Dedicated AI Cluster. They need to ensure that each team can only access their own fine-tuned models and cannot see or invoke models from other teams. What is the best approach?

A.Use OCI compartments and IAM policies with resource-level permissions for models

B.Train separate models for each team

C.Encrypt model artifacts with different keys for each team

D.Use network security lists to isolate traffic

AnswerA

Compartments and IAM policies can restrict access to specific models.

Why this answer

OCI compartments and IAM policies with resource-level permissions allow you to grant granular access to specific models within a Dedicated AI Cluster. By placing each team's fine-tuned models in separate compartments and writing policies that restrict access to those compartments, you ensure teams can only see and invoke their own models. This approach leverages OCI's native identity and access management without requiring separate clusters or network-level isolation.

Exam trap

The trap here is that candidates often assume network-level isolation (security lists) or encryption keys are sufficient for multi-tenant model access control, but OCI requires IAM resource-level policies to enforce which principals can invoke specific models.

How to eliminate wrong answers

Option B is wrong because training separate models for each team does not address access control; it only creates more models without any mechanism to prevent cross-team visibility or invocation. Option C is wrong because encrypting model artifacts with different keys protects data at rest but does not control access at the API or invocation layer; teams could still see and invoke models if IAM permissions allow it. Option D is wrong because network security lists operate at the network layer and cannot distinguish between different models within the same Dedicated AI Cluster; they are designed for traffic filtering between subnets, not for model-level authorization.

Full explanation →

331

MCQeasy

A data scientist is using OCI Data Science to fine-tune a Cohere command model on domain-specific documents. They observe that the fine-tuned model generates repetitive text. What is the most likely cause?

A.The number of epochs was insufficient.

B.The training dataset lacked diversity.

C.The learning rate was too high.

D.The batch size was too small.

AnswerB

Lack of diversity in training data leads to overfitting and repetitive outputs.

Why this answer

Repetitive text in fine-tuned models is a classic symptom of overfitting to a narrow or homogeneous training dataset. When the domain-specific documents lack diversity in phrasing, topics, or contexts, the model learns to latch onto the most common patterns and repeats them, rather than generalizing. This is not a hyperparameter tuning issue but a data quality issue.

Exam trap

The trap here is that candidates often blame hyperparameters (epochs, learning rate, batch size) for overfitting symptoms, but Cisco specifically tests the understanding that data diversity is the root cause of repetitive generation in fine-tuned LLMs.

How to eliminate wrong answers

Option A is wrong because insufficient epochs typically cause underfitting, not repetitive text; the model would fail to learn patterns at all. Option C is wrong because a learning rate that is too high usually leads to training instability or divergence, not repetitive outputs. Option D is wrong because a batch size that is too small increases gradient noise and can slow convergence, but it does not directly cause repetitive text generation.

Full explanation →

332

MCQeasy

A developer is building a RAG application using Oracle Cloud Infrastructure (OCI) Document Understanding and OCI Generative AI. After chunking documents and generating embeddings, the developer observes that the retrieval step often returns chunks that are semantically unrelated to the query. Which action is MOST likely to improve retrieval relevance?

A.Switch from a dense embedding model to a sparse embedding model.

B.Adjust the chunk size and chunk overlap to better capture coherent passages.

C.Increase the chunk size to capture more context.

D.Reduce the number of retrieved chunks (k) in the vector search.

AnswerB

Proper chunking helps preserve meaning and improves retrieval accuracy.

Why this answer

Option C is correct because adjusting the chunk size and overlap can significantly impact the quality of retrieved passages. Option A is wrong because increasing the chunk size may introduce more noise. Option B is wrong because reducing the number of retrieved chunks could miss relevant information.

Option D is wrong because the embedding model is already chosen; changing it may not fix the chunking issue.

Full explanation →

333

MCQeasy

Which of the following best describes the role of attention in transformer models?

A.It assigns equal weight to all words in the input.

B.It is used only during training, not inference.

C.It allows the model to focus on relevant parts of the input sequence when generating output.

D.It replaces the need for positional encoding.

AnswerC

This is the core function of attention: it enables the model to selectively attend to important input parts.

Why this answer

Option C is correct because the attention mechanism in transformer models dynamically computes a weighted sum of all input tokens, allowing the model to focus on the most relevant parts of the input sequence when generating each output token. This is achieved through scaled dot-product attention, which assigns higher weights to tokens that are more contextually important, enabling the model to capture long-range dependencies effectively.

Exam trap

Oracle often tests the misconception that attention is only for training or that it replaces positional encoding, so candidates must remember that attention is inherently order-agnostic and requires positional encoding to capture sequence order, and that it is used in both training and inference phases.

How to eliminate wrong answers

Option A is wrong because attention does not assign equal weight to all words; instead, it computes a distribution of weights (attention scores) that vary based on the relevance of each token to the current query, with some tokens receiving much higher weights than others. Option B is wrong because attention is used during both training and inference; during inference, the model still computes attention over the input sequence to generate each output token, though the key-value cache may be used for efficiency. Option D is wrong because attention does not replace the need for positional encoding; the self-attention operation is permutation-invariant (it treats the input as a set), so positional encodings are required to inject information about the order of tokens in the sequence.

Full explanation →

334

MCQmedium

An enterprise deployed a custom fine-tuned model for generating financial reports. After the first month, the model's outputs began to include outdated information and occasional factual errors. The team suspects data drift. What is the best course of action?

A.Switch to a newer base model like Llama 3.1 without retraining.

B.Decrease the temperature parameter to 0.1 to reduce model creativity.

C.Retrain the model on the latest financial data and monitor for drift.

D.Increase the max tokens value to allow longer responses.

AnswerC

Retraining with current data mitigates data drift and improves output accuracy.

Why this answer

Option D is correct because retraining with up-to-date data addresses the root cause of data drift. Option A is wrong because adjusting temperature may reduce creativity but not fix factual accuracy. Option B is wrong because increasing max tokens does not improve accuracy.

Option C is wrong because switching to a different base model without retraining does not address drift.

Full explanation →

335

MCQeasy

A company has deployed a fine-tuned GPT model on OCI Generative AI using a dedicated AI cluster with 2 nodes. The endpoint is used by an internal application that generates product descriptions. Recently, the application started receiving timeouts and slow responses. The monitoring dashboard shows that the cluster's CPU utilization is consistently above 90%, and the request queue is growing. The team has verified that the model and code have not changed. The application traffic has increased by 20% over the past month. What should the team do to resolve the issue?

A.Switch to a serverless endpoint to handle variable traffic.

B.Reduce the batch size in the inference requests to lower CPU usage.

C.Implement a caching layer for frequently requested descriptions.

D.Increase the number of nodes in the dedicated AI cluster from 2 to 4.

AnswerD

This directly adds compute capacity to handle the increased traffic.

Why this answer

Option D is correct because the dedicated AI cluster with 2 nodes is experiencing sustained CPU utilization above 90% and a growing request queue due to a 20% increase in traffic. Scaling out the cluster by adding more nodes (from 2 to 4) increases the available compute capacity, allowing the cluster to handle the higher inference load without timeouts. This directly addresses the resource bottleneck without requiring code or model changes.

Exam trap

The trap here is that candidates may confuse reducing batch size (which actually increases CPU overhead per request) with reducing load, or assume caching is a universal performance fix, when the real solution is to scale the dedicated cluster horizontally to match increased traffic.

How to eliminate wrong answers

Option A is wrong because switching to a serverless endpoint would not resolve the issue; serverless endpoints on OCI Generative AI still rely on underlying compute resources and may introduce cold-start latency, and the problem is a sustained increase in traffic that requires dedicated capacity, not variable traffic handling. Option B is wrong because reducing the batch size in inference requests would decrease throughput per request and increase the number of requests, potentially worsening CPU utilization and queue growth, not lowering it. Option C is wrong because implementing a caching layer for frequently requested descriptions would only help if identical requests are repeated, but the problem is a general increase in traffic volume and CPU saturation, not redundant requests; caching does not reduce the compute load for unique or varied product descriptions.

Full explanation →

336

MCQmedium

An organization stores its knowledge base in Oracle Autonomous Database and wants to build a RAG chatbot using OCI Generative AI. The chatbot must retrieve the most relevant documents based on user queries. Which indexing approach is BEST suited for efficient similarity search on text embeddings?

A.Create an ANN index on the embedding vector column.

B.Create a bitmap index on the embedding vector column.

C.Create an inverted index on the document text column.

D.Create a B-tree index on the document text column.

AnswerA

ANN indexes enable fast approximate nearest neighbor search in vector databases.

Why this answer

Option A is correct because Approximate Nearest Neighbor (ANN) indexes are specifically designed for high-dimensional vector spaces, enabling efficient similarity search on embedding vectors. In Oracle Autonomous Database, ANN indexes (e.g., using IVF or HNSW algorithms) drastically reduce search latency compared to brute-force scans, which is critical for real-time RAG chatbot responses.

Exam trap

Oracle often tests the misconception that any index type can be applied to vector columns, but the trap here is that candidates confuse traditional database indexes (B-tree, bitmap, inverted) with specialized vector indexes, failing to recognize that only ANN indexes support distance-based similarity search on embeddings.

How to eliminate wrong answers

Option B is wrong because bitmap indexes are optimized for low-cardinality columns (e.g., gender or status flags), not for high-dimensional floating-point vectors, and they cannot perform similarity comparisons like cosine or Euclidean distance. Option C is wrong because inverted indexes are designed for full-text search on tokenized text, not for vector embeddings, and they cannot compute distances between vectors. Option D is wrong because B-tree indexes are for exact match or range queries on scalar data (e.g., numbers or short strings), and they do not support the distance-based ordering required for vector similarity search.

Full explanation →

337

MCQmedium

An enterprise RAG system must ensure that retrieved data comes only from authorized sources. Which OCI feature should be used to enforce this?

A.Data encryption at rest

B.OCI IAM policies for the vector database

C.Network security groups

D.Resource quotas

AnswerB

IAM policies control who can access the vector database and its data.

Why this answer

OCI IAM policies allow fine-grained access control to resources, ensuring only authorized users or services can retrieve data from the vector database.

Full explanation →

338

MCQeasy

A developer sends the above request to the OCI Generative AI API. The response returns an error: 'InvalidParameter: The parameter 'topP' is not supported for this model.' What is the most likely reason?

A.The 'cohere.command-r-plus-v1:0' model does not accept the topP parameter.

B.The model ID is deprecated.

C.The topP parameter value is out of range.

D.The JSON request format is incorrect.

AnswerA

Cohere's command-r-plus model only supports temperature for randomness control, not topP.

Why this answer

Option B is correct because the command-r-plus model does not support the topP parameter; it only supports temperature. Option A is wrong because topP is a valid parameter in some models. Option C is wrong because JSON is valid.

Option D is wrong because modelId is correct.

Full explanation →

339

MCQmedium

A large enterprise is deploying a generative AI model for internal document summarization. The model is deployed on OCI Data Science using a custom container. The inference endpoint is behind a public load balancer. The security team requires that all traffic between the client and the endpoint be encrypted in transit and that the endpoint not be accessible from the public internet. The current setup uses a public load balancer with an SSL certificate. The VCN has a public subnet for the load balancer and a private subnet for the model deployment. The security team is concerned that the load balancer is publicly accessible. The enterprise wants to maintain high availability and low latency. What should the architect do to meet the security requirements?

A.Use a site-to-site VPN to connect clients to the VCN and access the endpoint via private IP.

B.Remove the load balancer and use a service gateway to access the model deployment directly from the VCN.

C.Keep the public load balancer but add a Web Application Firewall (WAF) to block unauthorized IPs.

D.Replace the public load balancer with a private load balancer in a private subnet, and attach an SSL certificate for encryption.

AnswerD

A private load balancer is not internet-facing, ensures encryption via SSL, and provides high availability.

Why this answer

Option D is correct because replacing the public load balancer with a private load balancer in a private subnet ensures the endpoint is not accessible from the public internet, while attaching an SSL certificate maintains encryption in transit. This satisfies both security requirements without sacrificing high availability or low latency, as the private load balancer still provides load balancing and TLS termination within the VCN.

Exam trap

The trap here is that candidates may think a WAF or VPN alone can satisfy both encryption and private access, but they overlook that the public load balancer itself remains a publicly routable endpoint, which directly violates the 'not accessible from the public internet' requirement.

How to eliminate wrong answers

Option A is wrong because a site-to-site VPN only encrypts traffic between the client site and the VCN, but the public load balancer remains publicly accessible, violating the requirement that the endpoint not be accessible from the public internet. Option B is wrong because removing the load balancer and using a service gateway would bypass load balancing, breaking high availability and low latency, and service gateways are used for outbound traffic to OCI services, not for inbound client access. Option C is wrong because keeping the public load balancer with a WAF does not remove public internet accessibility; WAF only filters traffic but does not make the endpoint private, so the security team's concern remains unaddressed.

Full explanation →

340

MCQmedium

A document processing pipeline uses OCI Document Understanding to extract text from PDFs, then creates embeddings with OCI Generative AI. Some documents exceed the embedding model's token limit. What is the best approach?

A.Truncate the document to the token limit

B.Use a different embedding model with a higher token limit

C.Skip documents that exceed the limit

D.Split the document into chunks that fit the limit and embed each chunk separately

AnswerD

Chunking preserves full content and allows granular retrieval.

Why this answer

Splitting the document into chunks that fit the token limit and embedding each chunk separately is standard practice, ensuring all content is represented in the vector store.

Full explanation →

341

MCQhard

A healthcare company is deploying OCI Generative AI Service for clinical decision support. They must ensure that model outputs are auditable, explainable, and free from patient data exposure. Which combination of OCI features should they use?

A.Fine-tune a model on de-identified patient notes and use default inference settings.

B.Use Retrieval-Augmented Generation with an internet search index for up-to-date medical knowledge.

C.Use OCI Data Masking to de-identify inputs, and enable model monitoring with explainability outputs via OCI Monitoring and OCI Logging.

D.Deploy the model in a private endpoint and disable all logging to prevent data leaks.

AnswerC

Data masking ensures compliance, and monitoring with logging provides auditability and explainability.

Why this answer

Option C is correct because OCI Data Masking can de-identify patient data in inputs before they reach the generative AI model, ensuring no protected health information (PHI) is exposed. Enabling model monitoring with explainability outputs via OCI Monitoring and OCI Logging provides an auditable trail of model decisions and explanations, meeting the requirements for auditability and explainability in clinical decision support.

Exam trap

The trap here is that candidates often assume that simply de-identifying data (Option A) or using a private endpoint (Option D) is sufficient for auditability and explainability, overlooking the need for explicit monitoring and logging mechanisms to capture and review model behavior.

How to eliminate wrong answers

Option A is wrong because fine-tuning on de-identified patient notes does not guarantee that model outputs will be free from patient data exposure—fine-tuned models can memorize and regurgitate training data, and default inference settings lack the monitoring and explainability needed for auditability. Option B is wrong because using Retrieval-Augmented Generation with an internet search index introduces uncontrolled, non-auditable external data sources, which cannot ensure explainability or prevent patient data exposure, and internet search results may not comply with healthcare data privacy regulations. Option D is wrong because disabling all logging to prevent data leaks eliminates the ability to audit model outputs or provide explainability, which directly contradicts the requirements for auditability and explainability.

Full explanation →

342

MCQeasy

Which OCI Generative AI service model family supports fine-tuning with custom datasets?

A.Cohere Command

B.Cohere Embed

C.Cohere Summarize

D.GPT-3

AnswerA

Cohere Command models are designed for text generation and support fine-tuning.

Why this answer

Cohere Command is the model family within OCI Generative AI that supports fine-tuning with custom datasets, allowing users to adapt the model for domain-specific tasks like summarization or classification. In contrast, Cohere Embed is designed for generating text embeddings, Cohere Summarize is a specialized endpoint for summarization without fine-tuning support, and GPT-3 is not natively available in OCI Generative AI for fine-tuning.

Exam trap

Oracle often tests the misconception that all Cohere model families (Embed, Summarize, Command) support fine-tuning, but only Command is designed for customization with custom datasets.

How to eliminate wrong answers

Option B (Cohere Embed) is wrong because it is optimized for creating vector embeddings of text, not for generative tasks, and does not support fine-tuning with custom datasets. Option C (Cohere Summarize) is wrong because it is a pre-configured summarization endpoint that does not allow model customization or fine-tuning. Option D (GPT-3) is wrong because it is an OpenAI model not offered within the OCI Generative AI service; OCI uses Cohere and Meta Llama models, and GPT-3 cannot be fine-tuned through OCI.

Full explanation →

343

MCQeasy

A developer wants to use OCI Generative AI Service to summarize long documents. Which endpoint should they use to send the document content?

A./generate

B./classify

C./embed

D./chat

AnswerD

The /chat endpoint accepts a conversation history, suitable for summarization tasks.

Why this answer

Option D is correct because the /chat endpoint in OCI Generative AI Service is designed for conversational interactions and can handle long document summarization by accepting the document content as part of the chat context. This endpoint supports multi-turn dialogues and large input payloads, making it suitable for processing and summarizing lengthy documents.

Exam trap

Oracle often tests the misconception that /generate is the correct endpoint for all text generation tasks, including summarization, but the /chat endpoint is specifically optimized for interactive and context-aware tasks like document summarization.

How to eliminate wrong answers

Option A is wrong because /generate is used for text generation tasks like content creation or completion, not specifically for summarization of long documents. Option B is wrong because /classify is intended for text classification tasks such as sentiment analysis or topic labeling, not summarization. Option C is wrong because /embed is used to generate vector embeddings for text, which are useful for semantic search or similarity comparisons, not for producing summaries.

Full explanation →

344

Multi-Selecthard

Which TWO are common causes of poor answer quality in a RAG system built on OCI Generative AI? (Choose two.)

Select 2 answers

A.Mismatch between the embedding model's training data and the domain of the documents.

B.Using a generation model that is too large for the task.

C.Setting the temperature parameter too low, causing overly deterministic outputs.

D.Insufficient number of relevant chunks in the document corpus for the given query.

E.Using only vector search without keyword-based fallback.

AnswersA, D

Domain mismatch leads to poor semantic alignment and irrelevant retrieval.

Why this answer

Option A is correct because the embedding model's training data determines the semantic space in which documents and queries are represented. If the model was trained on general text (e.g., Wikipedia) but the documents are from a specialized domain (e.g., medical or legal), the embeddings will fail to capture domain-specific nuances, leading to poor retrieval relevance and thus poor answer quality in the RAG system.

Exam trap

Oracle often tests the distinction between retrieval-side failures (like embedding mismatch or insufficient chunks) and generation-side parameters (like temperature or model size), so candidates mistakenly attribute poor answer quality to generation settings rather than the retrieval pipeline.

Full explanation →

345

MCQeasy

A startup is using OCI Generative AI serverless inference for a text generation application. They notice that the latency is high during peak hours. They have a budget to increase costs moderately. Which action would most effectively reduce latency?

A.Switch to dedicated AI cluster.

B.Enable content filtering.

C.Increase the number of concurrent requests.

D.Use a smaller model.

AnswerA

Dedicated clusters offer predictable, low-latency inference.

Why this answer

Option A is correct. Switching to a dedicated AI cluster provides consistent low latency compared to serverless inference. Option B is wrong because using a smaller model might reduce latency but could degrade quality.

Option C is wrong because enabling content filtering does not affect latency. Option D is wrong because increasing concurrent requests may increase load without improving latency.

Full explanation →

346

MCQhard

A multinational corporation is deploying a generative AI chatbot for customer support using Oracle Cloud Infrastructure's Generative AI service. The chatbot is powered by a large language model (LLM) accessed via the on-demand serving mode. During initial testing, the chatbot provides accurate answers for well-known products but frequently hallucinates or gives incorrect specifications for niche products. The company maintains a comprehensive internal database of product specifications, updated daily. The support team prefers not to fine-tune the LLM due to cost and maintenance overhead. Additionally, the chatbot must respond within 2 seconds to maintain a good customer experience. The team considers several approaches: A. Increasing the 'temperature' parameter to make the model more creative, hoping it will generate more accurate responses when unsure. B. Using few-shot prompting with three manually curated examples of correct product specifications included in every prompt. C. Implementing a Retrieval Augmented Generation (RAG) pipeline that retrieves relevant product documents from the internal database and prepends them to the prompt before inference. D. Reducing the 'topP' parameter to 0.1 to force the model to sample only from the highest probability tokens, thereby reducing randomness. Which approach best meets the requirements of improving factual accuracy while maintaining low latency?

A.Reduce the 'topP' parameter to 0.1 to force the model to sample only from the highest probability tokens, thereby reducing randomness.

B.Implement a Retrieval Augmented Generation (RAG) pipeline that retrieves relevant product documents from the internal database and prepends them to the prompt before inference.

C.Use few-shot prompting with three manually curated examples of correct product specifications included in every prompt.

D.Increase the 'temperature' parameter to make the model more creative, hoping it will generate more accurate responses when unsure.

AnswerB

RAG injects accurate, domain-specific context, improving factual accuracy without fine-tuning, and can be implemented with efficient retrieval for low latency.

Why this answer

Option A is correct because Retrieval Augmented Generation (RAG) provides relevant, up-to-date context from the internal database, improving factual accuracy without fine-tuning, and can be optimized for low latency. Option B (few-shot) is limited by context window size and increases token usage, potentially increasing latency. Option C (increasing temperature) is counterproductive as it increases randomness.

Option D (reducing topP) does not add factual knowledge and may reduce output quality.

Full explanation →

347

MCQhard

A company uses RAG (Retrieval-Augmented Generation) with OCI OpenSearch and OCI Generative AI. The system retrieves irrelevant documents. What is the first step to debug?

A.Use a different LLM

B.Increase the number of retrieved documents

C.Check the embeddings quality

D.Lower the temperature

AnswerC

Embeddings directly impact retrieval relevance; low-quality embeddings cause irrelevant results.

Why this answer

Option A is correct because poor quality embeddings often cause retrieval of irrelevant documents. Checking and improving embeddings (e.g., using a better model or fine-tuning) should be the first step. Option B (increasing retrieved documents) may include more noise.

Option C (different LLM) does not address retrieval. Option D (lower temperature) affects generation but not retrieval.

Full explanation →

348

MCQmedium

A legal firm needs an AI assistant that can answer questions based on a large corpus of internal regulations that change quarterly. The firm also requires high accuracy and the ability to cite sources. Which approach should the firm choose?

A.Build a RAG application with vector search and citation generation

B.Use a pre-trained model without customization

C.Implement a rule-based search engine

D.Fine-tune a pre-trained model on the current regulations

AnswerA

RAG retrieves relevant documents and can cite sources, and updating the knowledge base is straightforward.

Why this answer

A RAG application with vector search retrieves relevant regulations and allows citation generation, keeping answers up-to-date without retraining.

Full explanation →

349

Multi-Selecthard

Which THREE components are essential for a production-grade generative AI deployment on OCI? (Select THREE)

Select 3 answers

A.OCI Logging for audit

B.OCI Vault for secrets

C.OCI Data Flow for data processing

D.Dedicated AI cluster

E.OCI IAM policies for access control

AnswersA, D, E

Logging is critical for monitoring and compliance.

Why this answer

A is correct because OCI Logging provides centralized audit logging for all API calls and resource changes in the generative AI deployment. This is essential for compliance, security monitoring, and troubleshooting in a production environment, as it captures detailed logs of model invocations, data access, and configuration changes.

Exam trap

Oracle often tests the distinction between 'essential' components for deployment versus 'useful but optional' services, leading candidates to select OCI Vault or OCI Data Flow because they are commonly used in AI pipelines, but they are not mandatory for a production-grade deployment.

Full explanation →

350

Multi-Selecthard

An OCI administrator is configuring access control for OCI Generative AI. Which three IAM components are required to allow a group of data scientists to call the GenerateText API? (Choose three.)

Select 3 answers

A.An IAM group for the data scientists

B.A local peering gateway

C.A policy granting ai-services-generative-ai-family in a compartment

D.A dynamic group

E.A compartment for the AI resources

AnswersA, C, E

The group is the subject of the policy.

Why this answer

An IAM group is required to organize the data scientists into a logical set of principals. IAM policies are then attached to this group to grant permissions, ensuring only members of the group can call the GenerateText API. Without a group, you cannot apply a policy to a collection of users.

Exam trap

The trap here is that candidates confuse dynamic groups (for resources) with IAM groups (for users), or mistakenly think a networking component like a local peering gateway is required for API access control.

Full explanation →

351

MCQhard

Refer to the exhibit. A developer sends this JSON payload to the /chat endpoint. The response includes an error that 'maxTokens' must be an integer. What is the issue?

A.The compartmentId is missing

B.The temperature value is too low

C.The parameter should be 'max_tokens' instead of 'maxTokens'

D.The model name 'cohere.command-light' is incorrect

AnswerC

The API expects snake_case parameters.

Why this answer

The OCI Generative AI service expects the parameter name 'max_tokens' (snake_case) for specifying the maximum number of tokens in the response, not 'maxTokens' (camelCase). The error message indicates that the value is not being recognized as an integer because the JSON key itself is incorrect, causing the service to fail validation.

Exam trap

Oracle often tests the difference between snake_case and camelCase parameter names in OCI services, and the trap here is that candidates familiar with OpenAI's API conventions might assume 'maxTokens' is correct, overlooking OCI's strict snake_case requirement.

How to eliminate wrong answers

Option A is wrong because the compartmentId is not required for the /chat endpoint when using a model that is accessible via the service's default compartment or when the request is authenticated via API keys that have the necessary permissions. Option B is wrong because a temperature value of 0.5 is within the valid range (typically 0.0 to 1.0) and does not cause an error about 'maxTokens' needing to be an integer. Option D is wrong because 'cohere.command-light' is a valid model name in OCI Generative AI, and the error message specifically points to the 'maxTokens' parameter, not the model name.

Full explanation →

352

Multi-Selecteasy

Which TWO operations are supported by the OCI Generative AI inference API?

Select 2 answers

A.EmbedText

B.SummarizeText

C.TranslateText

D.GenerateText

E.ChatCompletion

AnswersA, D

The embed_text endpoint creates vector embeddings for input text.

Why this answer

Options A and B are correct. The inference API includes generate_text for text generation and embed_text for creating embeddings. Option C is not a separate API; summarization can be done via generate_text.

Option D is incorrect because translation is not a built-in API. Option E is incorrect because chat is a model capability but not a distinct API endpoint; the API uses generate_text for chat.

Full explanation →

353

MCQmedium

An organization wants to use OCI Generative AI to build a summarization tool but must ensure that all inference requests are logged for audit purposes. Which approach should they take?

A.Implement a custom proxy with logging

B.Enable OCI Audit service

C.Enable OCI Logging on the generative AI endpoint

D.Use OCI Vault to store logs

AnswerC

OCI Logging can capture detailed request and response data for audit.

Why this answer

Option C is correct because OCI Logging can be enabled directly on the Generative AI endpoint to capture all inference requests and responses as logs, which can then be used for audit purposes. This is the native, recommended approach for logging API calls without introducing additional infrastructure or complexity.

Exam trap

Oracle often tests the distinction between management-plane logging (OCI Audit) and data-plane logging (OCI Logging on the service endpoint), leading candidates to mistakenly choose OCI Audit for inference request auditing.

How to eliminate wrong answers

Option A is wrong because implementing a custom proxy with logging introduces unnecessary complexity, latency, and potential security gaps, and is not a native OCI solution for logging inference requests. Option B is wrong because the OCI Audit service captures only management-plane events (e.g., create, update, delete operations on resources), not data-plane events like individual inference API calls. Option D is wrong because OCI Vault is designed for storing secrets (e.g., API keys, passwords), not for storing logs; logs should be stored in OCI Logging or Object Storage.

Full explanation →

354

Multi-Selectmedium

Which TWO actions are required to use a custom fine-tuned model via OCI Generative AI? (Choose two.)

Select 2 answers

A.Deploy the model to an endpoint

B.Provision a private endpoint for the model

C.Enable cross-region replication

D.Grant access to other tenancies

E.Complete the fine-tuning job successfully

AnswersA, E

A deployed endpoint is needed to invoke the model.

Why this answer

Options B and D are required. B: Fine-tuning must be complete. D: Model endpoint must be deployed.

A is optional (private endpoint). C is not needed if within same region. E is not required unless cross-tenant.

Full explanation →

355

MCQhard

A data scientist is fine-tuning a generative AI model on OCI Data Science using a custom container with GPU resources. The training job fails with an out-of-memory error despite the GPU instance having sufficient memory. The job works fine on a smaller dataset. What is the most likely cause?

A.The training script has a memory leak

B.The GPU instance is not supported by OCI Data Science

C.The model is not compatible with the PyTorch version

D.The batch size is too large for the GPU memory

AnswerD

Large batch size can cause OOM errors; reducing batch size resolves it.

Why this answer

The most likely cause is that the batch size is too large for the GPU memory. Even though the GPU instance has sufficient total memory, a batch size that exceeds the available GPU memory (after accounting for model parameters, gradients, and optimizer states) will trigger an out-of-memory (OOM) error. Reducing the batch size allows the model to fit within the GPU's memory limits, which explains why the job works on a smaller dataset but fails on a larger one.

Exam trap

Oracle often tests the misconception that 'sufficient instance memory' guarantees no OOM errors, ignoring that GPU memory is a separate, finite resource that must accommodate both the model and the batch data simultaneously.

How to eliminate wrong answers

Option A is wrong because a memory leak would cause gradual memory consumption over time, not a consistent OOM error that correlates with dataset size; the error occurs immediately with a larger dataset, not after prolonged execution. Option B is wrong because OCI Data Science supports a wide range of GPU instances (e.g., VM.GPU.A10.1, VM.GPU.A100.1), and if the instance were unsupported, the job would fail with a different error (e.g., 'unsupported instance shape') rather than an OOM error. Option C is wrong because model compatibility with PyTorch version would typically cause import or runtime errors (e.g., 'module not found' or 'operator not implemented'), not an OOM error; PyTorch version mismatches do not directly affect memory allocation.

Full explanation →

356

MCQhard

During deployment of a generative AI model, the inference endpoint returns high latency and timeouts. The model is deployed on a dedicated AI cluster with multiple nodes. What is the most likely cause?

A.The inference request batch size is too small

B.The model is too large for the cluster memory

C.The cluster nodes are configured with insufficient parallelism or the model is not properly parallelized across nodes

D.The client-side network is slow

AnswerC

Correct: Without proper model parallelism, nodes may be underutilized leading to high per-request latency.

Why this answer

High latency and timeouts in a distributed AI inference deployment typically indicate that the model workload is not efficiently distributed across the cluster nodes. Option C is correct because insufficient parallelism—either due to misconfigured node resources (e.g., insufficient vCPUs, GPU cores, or memory bandwidth) or improper model sharding/parallelization—causes some nodes to become bottlenecks while others remain underutilized, leading to queuing delays and eventual timeouts.

Exam trap

Oracle often tests the misconception that high latency is always due to insufficient resources (e.g., memory or batch size), but the real trap here is that candidates overlook the critical role of parallelization configuration in distributed inference—assuming that simply adding more nodes automatically distributes the workload.

How to eliminate wrong answers

Option A is wrong because a batch size that is too small would actually reduce latency per request (though it might lower throughput), not cause high latency or timeouts; the issue here is overload, not underutilization. Option B is wrong because if the model were too large for the cluster memory, the deployment would fail to load or would crash immediately, not return high latency and timeouts during inference. Option D is wrong because client-side network slowness would manifest as high network round-trip time or packet loss, not as server-side timeouts from the inference endpoint; the problem is explicitly on the deployment side.

Full explanation →

357

MCQeasy

Refer to the exhibit. A user receives this error when using the OCI CLI to chat with a model. What is the most likely cause?

A.The model is not deployed.

B.The model ID is incorrect.

C.The OCI CLI is not configured with the correct region.

D.The user does not have the required IAM policy to invoke the model.

AnswerD

Correct: The 'AuthorizationFailure' error indicates insufficient permissions.

Why this answer

The error occurs because the user lacks the necessary IAM policy to invoke the model. In OCI, even if the model is deployed and the CLI is correctly configured, the IAM policy must grant the user or group the 'inference' permission on the specific model or model family. Without this policy, the OCI CLI returns an authorization error when attempting to chat with the model.

Exam trap

The trap here is that candidates often assume the error is due to a misconfiguration (region or model ID) rather than a missing IAM policy, because the CLI error message may not explicitly say 'authorization' and instead show a generic 'service error'.

How to eliminate wrong answers

Option A is wrong because if the model were not deployed, the error would typically indicate that the model endpoint is unavailable or not found, not an authorization failure. Option B is wrong because an incorrect model ID would result in a 'model not found' or 'invalid parameter' error, not an authorization error. Option C is wrong because an incorrect region configuration would cause connectivity or endpoint resolution errors, such as 'region not found' or 'endpoint unreachable', not an IAM permission error.

Full explanation →

358

MCQmedium

A healthcare company wants to use OCI Generative AI to summarize patient medical records while ensuring PHI compliance. Which OCI service feature should they enable?

A.Configure a Virtual Cloud Network (VCN) with private subnets

B.Deploy a Web Application Firewall (WAF) in front of the API

C.Set up Identity and Access Management (IAM) policies to restrict access

D.Use the data masking capability within OCI Generative AI

AnswerD

OCI Generative AI supports data masking to redact sensitive information like PHI.

Why this answer

Option B is correct: Data masking helps redact PHI. Option A (VCN) is network, not data masking. Option C (IAM) is access control.

Option D (WAF) is web security.

Full explanation →

359

MCQhard

Your company uses OCI Data Science for model development and deployment. You have a generative AI model that requires dynamic batching for efficient inference. You deployed the model using the OCI Model Deployment service with a custom inference script in a Docker container. However, you notice that the batch size is fixed at 1, leading to low throughput. The model can process multiple requests together efficiently. You want to implement dynamic batching to increase throughput without significantly increasing latency for individual requests. What is the best approach?

A.Modify the model deployment to use a larger GPU shape to handle larger batches

B.Enable the model deployment's built-in request batching feature

C.Use OCI Streaming service to buffer requests and then invoke the model in batches from a consumer

D.Implement a queuing mechanism in the inference script that collects incoming requests and processes them in batches

AnswerD

This is a common pattern for dynamic batching and can be done within the custom container.

Why this answer

Option D is correct because dynamic batching must be implemented at the application level within the custom inference script when using OCI Model Deployment. The service does not provide built-in request batching; instead, you need to collect incoming requests in a queue and process them together in a single forward pass, which maximizes GPU utilization while controlling latency via a timeout or max batch size.

Exam trap

The trap here is that candidates assume OCI Model Deployment has a built-in batching feature similar to some cloud ML services, but OCI requires you to implement batching logic yourself in the custom inference script.

How to eliminate wrong answers

Option A is wrong because simply using a larger GPU shape does not change the fact that the inference script processes one request at a time; throughput gains require batching logic, not just more compute. Option B is wrong because OCI Model Deployment does not have a built-in request batching feature; this is a common misconception—the service routes each request individually to the container. Option C is wrong because OCI Streaming is designed for asynchronous, durable message buffering and would introduce significant latency and complexity; it is not suitable for real-time inference where low latency is critical.

Full explanation →

360

MCQhard

An AI assistant needs to solve complex math word problems step by step. Which prompting technique is most suitable?

A.Chain-of-thought prompting with few-shot examples.

B.Zero-shot prompting with the problem only.

C.Prompting with a high temperature setting.

D.Using a model with a larger context window.

AnswerA

Correct: CoT with examples guides reasoning.

Why this answer

Chain-of-thought prompting with few-shot examples is most suitable because it guides the LLM to break down complex math word problems into intermediate reasoning steps, mimicking human problem-solving. Few-shot examples provide a template for the desired reasoning structure, which significantly improves accuracy on multi-step arithmetic tasks compared to direct answer generation.

Exam trap

Oracle often tests the misconception that simply increasing model capacity (context window) or randomness (temperature) can substitute for structured reasoning, when in fact the prompting strategy itself is the critical factor for multi-step tasks.

How to eliminate wrong answers

Option B is wrong because zero-shot prompting lacks the explicit reasoning structure needed for multi-step math problems, often leading to incorrect or incomplete answers. Option C is wrong because a high temperature setting increases randomness in token selection, which is counterproductive for deterministic math tasks requiring precise calculations. Option D is wrong because a larger context window does not inherently improve reasoning quality; it only allows more input tokens, but without structured prompting the model may still fail to perform step-by-step logic.

Full explanation →

361

Multi-Selectmedium

A data scientist is designing a RAG system using OCI Data Science and OCI Generative AI. Which two considerations are critical for optimal retrieval quality? (Choose 2.)

Select 2 answers

A.Use the same embedding model for both indexing and querying.

B.Increase the chunk size to the maximum allowed to capture more context.

C.Fine-tune the generation model on domain-specific data.

D.Apply metadata filtering to restrict search domain before vector search.

E.Use a hierarchical index structure for faster search.

AnswersA, D

Mismatched embeddings reduce similarity accuracy.

Why this answer

Options A and C are correct. Using the same embedding model for indexing and querying ensures consistent vector representation, and metadata filtering helps narrow down the search domain, improving relevance. Option B is wrong because too-large chunks can dilute relevance.

Option D is about generation, not retrieval. Option E is about performance, not quality.

Full explanation →

362

MCQhard

A company is building a customer support chatbot that uses Retrieval-Augmented Generation (RAG) with OCI Generative AI. They need low-latency responses and the ability to update the knowledge base daily. Which architecture best meets these requirements?

A.Store embeddings in OCI Object Storage and use OCI Functions to perform similarity search.

B.Use OCI Data Science Notebook Sessions to run the RAG pipeline with a managed Cohere model.

C.Use OCI Streaming to ingest documents and OCI Data Flow to update a knowledge base in OCI Object Storage.

D.Use OCI Search with OpenSearch for the vector database, OCI Generative AI for inference, and Oracle Database for metadata.

AnswerD

OpenSearch provides low-latency vector search and supports daily indexing updates.

Why this answer

Option A is correct because it integrates OCI Search with OpenSearch for low-latency vector search and updates, OCI Generative AI for inference, and Oracle Database for metadata. Option B is incorrect because Functions without a database may not scale well for indexing. Option C is incorrect because using object storage directly for retrieval would be slow.

Option D is incorrect because Data Science notebooks are not suitable for production inference.

Full explanation →

363

MCQeasy

A developer is building a RAG application using OCI Generative AI. They notice that the generated responses often contain outdated information even though the knowledge base is updated daily. What is the most likely cause?

A.The embedding model is not fine-tuned on the latest data.

B.The vector database index is not rebuilt after data updates.

C.The retrieval top-k is set too high.

D.The chunk size is too small, causing loss of context.

AnswerB

If the index is not refreshed, new data is not searchable, leading to outdated results.

Why this answer

Option C is correct because if the vector index is not rebuilt after data updates, retrieval will still return old chunks. Option A is wrong because fine-tuning the embedding model is not required for updating knowledge. Option B is wrong because chunk size affects context but not freshness.

Option D is wrong because a high top-k would include more results, but still old if not updated.

Full explanation →

364

MCQmedium

A developer wants to deploy a RAG application using OCI Generative AI for both embedding and text generation while minimizing costs. Which strategy is most effective?

A.Use a larger generation model

B.Cache frequent queries and their embeddings

C.Reduce chunk size to decrease embedding calls

D.Use a larger embedding model for better accuracy

AnswerB

Caching reduces redundant embedding API calls, lowering costs.

Why this answer

Caching embeddings for frequent queries eliminates repeated embedding API calls, directly reducing cost.

Full explanation →

365

MCQmedium

A developer sends this request but receives an error: "modelId not found". Which is the most likely cause?

A.The temperature parameter is out of range.

B.The compartment ID is incorrect.

C.The modelId "cohere.command" is deprecated.

D.The presencePenalty parameter is misspelled.

AnswerC

Deprecated model IDs return 'not found' errors; the correct ID should be used.

Why this answer

The error indicates the model ID is not recognized. "cohere.command" is a legacy model ID; the correct current ID might be "cohere.command-r" or similar. Compartment issues, temperature range, or parameter spelling would produce different errors.

Full explanation →

366

MCQhard

A large organization is deploying a multi-tenant RAG application on OCI, where each tenant has its own set of documents. They use a shared OCI OpenSearch cluster with tenant_id metadata to filter documents. They observe that occasionally, queries from one tenant return results from another tenant's documents. The security team requires strict isolation. They have verified that the metadata filter is correctly applied in the search request. What is the most likely root cause?

A.The OpenSearch index has not been refreshed after ingestion of new documents.

B.The tenant_id field is not indexed as a keyword, causing incorrect filtering.

C.The embedding model has been trained on data from multiple tenants, causing cross-tenant leakage.

D.The metadata filter is being applied after the vector search instead of before.

AnswerB

If the field is not indexed properly, the filter may not match correctly, returning results from other tenants.

Why this answer

Option D is correct because if the tenant_id field is not indexed as a keyword, the filter may not be applied correctly, leading to cross-tenant results. Option A (index not refreshed) affects availability, not isolation. Option B (order of filter) does not matter.

Option C (embedding model training) does not cause retrieval leakage.

Full explanation →

367

MCQmedium

A company has deployed a generative AI model endpoint on OCI. They want to monitor token usage and latency for cost optimization. Which OCI service should they use to collect these metrics?

A.OCI Monitoring

B.OCI Events

C.OCI Notifications

D.OCI Logging

AnswerA

OCI Monitoring collects and visualizes metrics such as token count and latency.

Why this answer

A is correct because OCI Monitoring is the native telemetry service that collects and stores metrics such as token usage (e.g., input/output token counts) and latency (e.g., model inference latency) from OCI Generative AI endpoints. These metrics are automatically emitted by the OCI Generative AI service and can be queried via the Monitoring API or visualized in the Console, enabling cost optimization by tracking consumption patterns.

Exam trap

The trap here is that candidates confuse OCI Logging (which collects unstructured logs) with OCI Monitoring (which collects structured metrics), leading them to select Logging for numeric performance data like token counts and latency.

How to eliminate wrong answers

Option B (OCI Events) is wrong because OCI Events is a notification service that triggers actions based on changes in OCI resources (e.g., state transitions), not a service for collecting time-series metrics like token usage or latency. Option C (OCI Notifications) is wrong because OCI Notifications is a pub/sub messaging service for distributing alerts and messages, not a metric collection or storage service. Option D (OCI Logging) is wrong because OCI Logging captures log data (e.g., text-based audit logs, error logs) from resources, not structured numeric metrics; metrics require OCI Monitoring's custom or predefined metric streams.

Full explanation →

368

MCQhard

A multinational corporation uses OCI Generative AI to power a customer support chatbot. The chatbot uses a fine-tuned model deployed on a dedicated AI cluster in the us-ashburn-1 region. The application is used globally, and users in Europe are experiencing high latency (over 2 seconds) compared to users in North America (under 500 ms). The company has a requirement to keep all data within the US due to compliance, so they cannot deploy in Europe. The latency is not due to network bandwidth but due to the inference time. The monitoring shows that the cluster is at 80% utilization during peak hours. The team wants to reduce the latency for European users without violating data residency. What is the best course of action?

A.Optimize the model using techniques like quantization or pruning to reduce inference time.

B.Implement an edge caching layer in Europe to serve common queries.

C.Increase the number of nodes in the cluster to distribute the load.

D.Deploy an additional endpoint in a European region and use a global load balancer.

AnswerA

Model optimization directly reduces per-request latency without moving data.

Why this answer

Option A is correct because the latency issue is explicitly due to inference time, not network bandwidth or cluster utilization. Model optimization techniques like quantization (reducing precision of weights from FP32 to INT8) and pruning (removing redundant neurons) directly reduce the computational cost per inference, thereby lowering the response time without moving data or changing the deployment region. This approach satisfies the data residency constraint while addressing the root cause of high latency for European users.

Exam trap

The trap here is that candidates may confuse latency caused by inference time with latency caused by network distance or cluster load, leading them to choose scaling or caching solutions that do not address the fundamental computational bottleneck.

How to eliminate wrong answers

Option B is wrong because an edge caching layer in Europe would only serve cached responses for common queries; it does not reduce inference time for unique or dynamic queries, and caching introduces stale data risks for a customer support chatbot that may require real-time accuracy. Option C is wrong because increasing the number of nodes in the cluster addresses throughput (handling more concurrent requests) but does not reduce the per-request inference time; with 80% utilization, the cluster is not saturated, so adding nodes would not lower latency for individual inference calls. Option D is wrong because deploying an additional endpoint in a European region would violate the compliance requirement to keep all data within the US; even with a global load balancer, inference would still require data processing in Europe, which is not permitted.

Full explanation →

369

MCQhard

A data scientist is fine-tuning a model on OCI Generative AI with a custom dataset. They receive a "QuotaExceeded" error during training. What is the most likely cause?

A.Exceeded the training compute unit quota

B.Exceeded the API call rate limit

C.Exceeded the model storage limit

D.Exceeded the data transfer out limit

AnswerA

Fine-tuning uses training compute units; quota may be exceeded.

Why this answer

Option B is correct: Fine-tuning consumes training compute units, which have quota limits. Option A (API call rate) is for API requests. Option C (model storage) is for storing models.

Option D (data transfer) is for egress.

Full explanation →

370

MCQmedium

Refer to the exhibit. A developer runs this command and sees that the 'cohere.embed-english-v3.0' model is INACTIVE. What is the most likely cause?

A.The model is not supported in the current region.

B.The API call lacks the required OCI policy for the model.

C.The model has been deprecated and is no longer available.

D.The compartment does not have access to the model.

AnswerC

An INACTIVE state indicates the model has been deprecated or retired, making it unavailable for new inference requests.

Why this answer

The 'cohere.embed-english-v3.0' model is listed as INACTIVE because Oracle Cloud Infrastructure (OCI) has deprecated it, meaning it is no longer available for inference. When a model is deprecated, its status changes to INACTIVE, and any attempt to invoke it will fail, even if the region, policies, and compartment permissions are correctly configured.

Exam trap

Oracle often tests the distinction between model lifecycle states (INACTIVE vs. ACTIVE) and common operational errors (policy, region, compartment), leading candidates to confuse a deprecation event with a configuration or permission issue.

How to eliminate wrong answers

Option A is wrong because if the model were unsupported in the current region, the command would typically return a 'not found' or 'unsupported' error, not an INACTIVE status. Option B is wrong because a missing OCI policy would result in a 403 Forbidden or authorization error, not an INACTIVE model status. Option D is wrong because compartment access issues would produce a permissions error, not an INACTIVE status; the model's availability is independent of compartment-level access.

Full explanation →

371

Multi-Selectmedium

Which TWO of the following are benefits of using OCI Generative AI service compared to self-hosting an LLM?

Select 2 answers

A.Lower latency always

B.No data egress costs

C.Built-in content safety filters

D.Automatic scaling

E.Full control over model weights

AnswersC, D

OCI Generative AI includes safety filters.

Why this answer

Options A and C are correct. OCI Generative AI provides automatic scaling (A) and built-in content safety filters (C). Self-hosting gives full control over weights (B) and may have lower latency if optimized (D is not always true).

Data egress costs (E) may still apply.

Full explanation →

372

MCQhard

A security team requires that all OCI GenAI API calls be logged and audited. Despite enabling Audit logs in OCI, they do not see GenAI API calls. What is the most likely reason?

A.The audit log retention policy is too short and logs were overwritten.

B.The user is not a tenancy administrator.

C.OCI Audit currently only records control-plane operations; data-plane operations like inference are not logged.

D.The API calls are made by an OCI function, which is not logged.

AnswerC

Data-plane calls (e.g., model inference) are not captured by Audit; use Service Connector Hub for logging.

Why this answer

C is correct because OCI Audit service is designed to log control-plane operations (e.g., creating, updating, or deleting resources) but does not log data-plane operations such as inference API calls to the Generative AI service. The GenAI inference calls (e.g., generating text) are data-plane operations that occur on the service endpoint, not on the OCI control-plane API, so they are not captured by Audit logs. To log data-plane operations, you would need to use a different mechanism, such as OCI Vault for key usage or custom logging via API Gateway.

Exam trap

The trap here is that candidates assume enabling Audit logs captures all API activity, but OCI Audit explicitly excludes data-plane operations, which is a common misconception tested in the 1Z0-1127 exam.

How to eliminate wrong answers

Option A is wrong because audit log retention policies affect how long logs are kept, not whether specific API calls are recorded in the first place; if the calls were never logged, retention is irrelevant. Option B is wrong because tenancy administrator privileges are not required to view Audit logs; any user with the appropriate IAM policies (e.g., Audit Log Readers) can access them, and the issue is about logging scope, not permissions. Option D is wrong because OCI Functions calls are logged if they are control-plane operations; the fact that an API call originates from a function does not exclude it from Audit logging—the exclusion is based on whether the call is control-plane or data-plane.

Full explanation →

373

Multi-Selectmedium

Which TWO of the following are required to fine-tune a model using OCI Generative AI Service?

Select 2 answers

A.A training dataset in the required format

B.The base model identifier

C.A compartment with sufficient quota

D.An OCI API key

E.A dedicated AI cluster

AnswersA, B

Training data is essential for fine-tuning.

Why this answer

A is correct because fine-tuning a model in OCI Generative AI Service requires a training dataset in the required format (JSONL with prompt-completion pairs) to provide the task-specific examples that adjust the model's weights. B is correct because you must specify the base model identifier (e.g., 'cohere.command-light-14-07-2024') to indicate which pre-trained model to fine-tune, as the service uses this to load the correct architecture and initial parameters.

Exam trap

Oracle often tests the misconception that you need a dedicated AI cluster or an API key for every operation, but OCI Generative AI Service abstracts infrastructure management and supports multiple authentication methods, making those options distractors.

Full explanation →

374

MCQhard

A data scientist is using the OCI Generative AI SDK to create embeddings for a large corpus of legal documents. They want to perform semantic search. Which endpoint should they use?

A./v1/classify

B./v1/embed

C./v1/generate

D./v1/chat

AnswerB

The /v1/embed endpoint returns embeddings that can be stored in a vector database and used for semantic search.

Why this answer

The /v1/embed endpoint is specifically designed to generate vector embeddings from input text, which are numerical representations that capture semantic meaning. For semantic search over a large corpus of legal documents, embeddings must be created to enable similarity comparisons, making this the correct choice.

Exam trap

Oracle often tests the distinction between embedding endpoints and generation/classification endpoints, trapping candidates who confuse the purpose of semantic search (which requires embeddings) with text generation or classification tasks.

How to eliminate wrong answers

Option A is wrong because /v1/classify is used for text classification tasks (e.g., sentiment analysis or topic labeling), not for generating embeddings. Option C is wrong because /v1/generate is for text generation (e.g., completing a prompt or producing new content), not for creating vector representations. Option D is wrong because /v1/chat is designed for conversational interactions with a chat model, not for producing embeddings for semantic search.

Full explanation →

375

MCQhard

A company is using Oracle Database 23ai AI Vector Search for their RAG pipeline. They notice that similarity search often returns chunks that are semantically unrelated but syntactically similar due to token overlap. Which vector index type should they consider to improve semantic relevance?

A.IVF_SQ8 index

B.IVF_FLAT index

C.HNSW index

D.Use the default index type, which is IVF_FLAT

AnswerC

HNSW builds a hierarchical graph that captures semantic neighborhood better, reducing token overlap effects.

Why this answer

Option C is correct because the Hierarchical Navigable Small World (HNSW) index is more effective for semantic search than IVF indices because it preserves global graph structure. Option A is wrong because IVF_FLAT uses inverted files and may suffer from token overlap bias. Option B is wrong because IVF_SQ8 is a quantized version of IVF, not better for semantics.

Option D is wrong because the default index is often IVF_FLAT.

Full explanation →

Oracle Cloud Infrastructure Generative AI Professional 1Z0-1127 (1Z0-1127) — Questions 301–375