Knowledge + Practice

Oracle Cloud Infrastructure Generative AI Professional 1Z0-1127 (1Z0-1127) — Questions 901–975

991 questions total · 14pages · All types, answers revealed

Take a mock exam Exam hub

Page 13 of 14

901

MCQmedium

Refer to the exhibit. A team created this dedicated AI cluster. However, when they try to create a model deployment, the deployment fails with an error indicating insufficient public IPs. What change to the cluster configuration should they make?

A.Change assignPublicIp to true.

B.Increase the nodeCount to 8.

C.Attach a different subnet that has more available public IPs.

D.Change the AI cluster shape to VM.GPU.A10.2.

AnswerA

Correct: Enabling public IPs allows nodes to have public endpoints.

Why this answer

The error indicates insufficient public IPs because the cluster's subnet does not have enough available public IP addresses. Setting `assignPublicIp` to `true` in the cluster configuration allows the cluster to automatically allocate public IPs from the subnet's pool, resolving the shortage. This is required for model deployments that need public endpoints.

Exam trap

The trap here is that candidates might think the issue is a subnet IP shortage (Option C) or a scaling problem (Option B), when the real cause is a misconfigured public IP assignment flag that prevents the cluster from using available IPs.

How to eliminate wrong answers

Option B is wrong because increasing the nodeCount to 8 would require even more public IPs, exacerbating the shortage rather than fixing it. Option C is wrong because attaching a different subnet with more public IPs is a workaround, but the root cause is that the cluster is not configured to assign public IPs; changing the subnet does not enable the assignment. Option D is wrong because changing the AI cluster shape to VM.GPU.A10.2 does not affect public IP allocation; it only changes the GPU type and compute capacity.

Full explanation →

902

MCQmedium

An e-commerce company uses OCI Generative AI to generate product descriptions. They have fine-tuned the model on their product catalog. They notice that the descriptions are accurate but lack creativity and are repetitive. They want to maintain accuracy while adding variety. What change should they make?

A.Increase the top_p sampling from 0.9 to 1.0.

B.Increase the temperature from 0.2 to 0.5.

C.Use a different base model.

D.Add more training examples with diverse descriptions.

AnswerB

A moderate temperature increase adds variety while preserving factual accuracy.

Why this answer

Option B is correct. Slightly increasing temperature (e.g., from 0.2 to 0.5) introduces controlled variability without significantly compromising accuracy. Option A is wrong because top_p=1.0 samples from the full distribution, which can add noise.

Option C is wrong because adding more training data requires effort and time, and may not immediately add variety. Option D is wrong because changing the base model could hurt accuracy and requires retraining.

Full explanation →

903

MCQmedium

An AI specialist is troubleshooting why a fine-tuned model produces inconsistent results across different inference calls. What is the most likely cause?

A.The base model is not suitable

B.The temperature is set too high

C.The model is overfitted

D.The fine-tuning dataset is too small

AnswerB

High temperature increases randomness, causing variable outputs.

Why this answer

Temperature controls the randomness of token sampling during inference. A high temperature (e.g., >1.0) increases the probability of selecting less likely tokens, causing the model to produce varied outputs for the same input across different calls. This is the most direct cause of inconsistent results when the base model and fine-tuning are otherwise sound.

Exam trap

Oracle often tests the misconception that overfitting (Option C) causes inconsistency, but overfitting actually reduces variance by memorizing patterns; the trap is confusing output variability with poor generalization.

How to eliminate wrong answers

Option A is wrong because an unsuitable base model would cause consistently poor or biased outputs, not inconsistency across calls; the base model's suitability affects overall quality, not per-call variance. Option C is wrong because overfitting leads to memorization of training data, producing deterministic or near-identical outputs for similar inputs, not random inconsistency. Option D is wrong because a small fine-tuning dataset typically causes underfitting or poor generalization, not random variation across inference calls; inconsistency from small data would manifest as high variance across different inputs, not across repeated calls with the same input.

Full explanation →

904

Multi-Selectmedium

A company is deploying a LangChain application on OCI and needs to implement error handling and rate limit management. Which THREE strategies should they consider? (Choose THREE.)

Select 3 answers

A.Implement retry logic with exponential backoff when receiving 429 (Too Many Requests) responses

B.Increase the chunk_size parameter in the text splitter

C.Use a caching layer to avoid repeating identical API calls

D.Monitor token usage and set up alerts to stay within service limits

E.Resubscribe to the model endpoint if errors occur

AnswersA, C, D

Exponential backoff is a standard approach to handle rate limits by retrying after increasing delays.

Why this answer

A is correct because HTTP 429 (Too Many Requests) responses indicate rate limiting by the API provider. Implementing retry logic with exponential backoff is a standard resilience pattern that progressively increases wait times between retries, preventing further rate limit violations and allowing the system to recover gracefully without overwhelming the endpoint.

Exam trap

Cisco often tests the distinction between strategies that directly address API rate limiting (retry logic, caching, monitoring) versus unrelated configuration parameters like chunk_size, which candidates may mistakenly associate with performance tuning.

Full explanation →

905

MCQmedium

A team is building a multilingual semantic search application. They need to index documents in English, Spanish, and French, and later search using queries in any of these languages. Which embedding model should they use?

A.Meta Llama 3 70B

B.Cohere Command R

C.Cohere embed-multilingual-v3.0

D.Cohere embed-english-v3.0

AnswerC

This model is designed for multilingual text, supporting English, Spanish, French, and many other languages.

Why this answer

Cohere embed-multilingual-v3.0 supports multiple languages in a single model, enabling cross-lingual semantic search. embed-english-v3.0 is English-only. Command R and Llama 3 are not embedding models.

Full explanation →

906

MCQhard

An organization is fine-tuning a large language model on OCI Data Science. They must ensure that the training data remains within a specific geographic region and is encrypted at rest. Which combination of resources should they use?

A.OCI Object Storage bucket with a bucket policy and default encryption, created in the required region.

B.OCI Database with Transparent Data Encryption, storing the training data in tables.

C.OCI File Storage with export options and encryption, mounted to the Data Science session.

D.OCI Block Volume with encryption, attached to the Data Science notebook session.

AnswerA

Bucket policy controls access, encryption secures data at rest, and region selection ensures data residency.

Why this answer

Option A is correct because OCI Object Storage with default encryption ensures data is encrypted at rest using AES-256, and a bucket policy can enforce that data remains within a specific geographic region by restricting cross-region replication or access. This combination directly meets the requirements of regional data residency and encryption at rest for training data used in OCI Data Science.

Exam trap

The trap here is that candidates may confuse encryption at rest with data residency enforcement, assuming any encrypted storage (like Block Volume or File Storage) automatically guarantees geographic containment, but only Object Storage provides bucket-level policies to explicitly restrict data movement across regions.

How to eliminate wrong answers

Option B is wrong because OCI Database with Transparent Data Encryption is designed for transactional workloads, not for storing large-scale training data for LLM fine-tuning, and it does not inherently enforce geographic region constraints on the data. Option C is wrong because OCI File Storage with export options and encryption can be mounted to a Data Science session, but it does not provide native mechanisms to enforce regional data residency; the data could be replicated or accessed across regions. Option D is wrong because OCI Block Volume with encryption attached to a notebook session encrypts data at rest, but it does not offer policy controls to ensure the data remains within a specific geographic region, as block volumes are tied to the compute instance's availability domain, not the broader region.

Full explanation →

907

MCQeasy

Which statement accurately describes the T-Few fine-tuning technique used in OCI Generative AI?

A.It automatically adjusts hyperparameters during inference.

B.It does not require any training data and works by prompting only.

C.It updates all model parameters, requiring substantial compute resources.

D.It is a parameter-efficient fine-tuning method that updates only a fraction of the model parameters.

AnswerD

T-Few uses low-rank adaptations to efficiently fine-tune models.

Why this answer

The T-Few fine-tuning technique is a parameter-efficient fine-tuning (PEFT) method that updates only a small fraction of the model's parameters, typically by introducing and training adapter layers or using low-rank updates. This approach significantly reduces computational and memory requirements compared to full fine-tuning, making it suitable for adapting large language models with limited resources. In OCI Generative AI, T-Few enables efficient customization without retraining the entire model.

Exam trap

The trap here is that candidates often confuse parameter-efficient fine-tuning (PEFT) with full fine-tuning or prompting, leading them to select options that describe full parameter updates or no training at all, rather than recognizing T-Few as a lightweight adaptation method.

How to eliminate wrong answers

Option A is wrong because T-Few does not automatically adjust hyperparameters during inference; hyperparameters are set before training and remain fixed during inference. Option B is wrong because T-Few requires training data for fine-tuning, unlike zero-shot prompting which works by prompting only without any training data. Option C is wrong because T-Few does not update all model parameters; it is specifically designed to update only a fraction of parameters, avoiding the substantial compute resources required for full fine-tuning.

Full explanation →

908

MCQhard

A team is building a conversational chatbot using LangChain and OCI Generative AI. They want to maintain a summary of the conversation rather than storing the entire history, to keep within token limits. Which memory class should they use, and what additional step is required when initializing the memory?

A.ConversationBufferWindowMemory; set a window size

B.ConversationTokenBufferMemory; set a token limit

C.ConversationSummaryMemory; provide an LLM to generate summaries

D.ConversationBufferMemory; no additional step

AnswerC

SummaryMemory needs an LLM to compress the conversation into a summary.

Why this answer

ConversationSummaryMemory is designed to maintain a running summary of the conversation instead of storing the full history, which directly addresses the requirement to stay within token limits. The additional step required is providing an LLM (e.g., via `llm=ChatOpenAI(...)`) because the memory class uses the LLM to generate and update the summary dynamically.

Exam trap

Cisco often tests the distinction between memory classes that truncate versus those that summarize, and the trap here is that candidates may confuse ConversationTokenBufferMemory (which drops messages) with summary-based memory, missing the critical requirement to provide an LLM for summary generation.

How to eliminate wrong answers

Option A is wrong because ConversationBufferWindowMemory keeps a fixed window of recent messages, not a summary, so it still stores raw history and does not reduce token usage beyond the window size. Option B is wrong because ConversationTokenBufferMemory drops messages when a token limit is exceeded, but it does not summarize; it simply truncates the history, losing context. Option D is wrong because ConversationBufferMemory stores the entire conversation history verbatim, which would exceed token limits and requires no additional step, making it unsuitable for the stated goal.

Full explanation →

909

Multi-Selecthard

An organization is planning to use OCI Generative AI for sensitive customer data. Which three OCI services or features should they consider for data governance and security?

Select 3 answers

A.OCI Vault for managing API keys

B.OCI Data Safe for data masking and encryption

C.OCI IAM for access control

D.OCI Data Labeling for annotating data

E.OCI Audit for logging API calls

AnswersA, C, E

Secure storage of API keys and secrets is crucial for authentication to Generative AI endpoints.

Why this answer

Option A is correct because OCI Vault is a dedicated service for securely storing and managing secrets, including API keys used to authenticate to OCI Generative AI. By centralizing API key management in Vault, organizations can enforce rotation policies, access controls, and audit trails, which is critical for protecting sensitive customer data when invoking generative AI models.

Exam trap

Oracle often tests the misconception that database security services like Data Safe apply to all data in OCI, but candidates must recognize that Generative AI operates through API calls and does not use a relational database, making Data Safe irrelevant here.

Full explanation →

910

MCQhard

An engineer sets beam search width to 1 during inference on OCI Generative AI. What is the most likely effect on output?

A.More memory usage

B.More diverse outputs

C.Better quality

D.Faster inference

AnswerD

Greedy decoding is the fastest decoding method as it considers only one candidate path.

Why this answer

Beam search width of 1 is equivalent to greedy decoding, where only the single most probable token is selected at each step. This eliminates the need to maintain and compare multiple candidate sequences, significantly reducing computational overhead and memory access, which directly speeds up inference.

Exam trap

Cisco often tests the misconception that a smaller beam width always degrades quality, but the trap here is that beam width 1 (greedy decoding) is actually the fastest inference method, not necessarily the worst for all tasks—though it does sacrifice diversity and sometimes quality.

How to eliminate wrong answers

Option A is wrong because beam width 1 uses less memory (only one candidate sequence is tracked), not more. Option B is wrong because greedy decoding reduces output diversity by always picking the highest-probability token, whereas larger beam widths explore more alternatives. Option C is wrong because greedy decoding often produces repetitive or locally optimal outputs, whereas moderate beam widths (e.g., 4–8) typically yield higher quality by considering global coherence.

Full explanation →

911

MCQeasy

An administrator needs to grant a data science team access to create and manage generative AI model endpoints in a specific compartment. Which policy should they create?

A.Allow group DataScientists to manage all-resources in compartment Production

B.Allow group DataScientists to use generative-ai-model-family in compartment Production

C.Allow group DataScientists to read generative-ai-model-family in compartment Production

D.Allow group DataScientists to manage generative-ai-model-family in compartment Production

AnswerD

This policy grants the required permissions.

Why this answer

Option D is correct because the verb 'manage' grants full CRUD (Create, Read, Update, Delete) permissions on the 'generative-ai-model-family' resource type, which is the specific resource family for generative AI model endpoints in OCI. This allows the DataScientists group to create and manage endpoints within the specified compartment without granting broader access to all resources.

Exam trap

Oracle often tests the distinction between 'use' and 'manage' verbs, where candidates mistakenly choose 'use' thinking it covers creation, but 'use' only allows invocation and access, not resource lifecycle management.

How to eliminate wrong answers

Option A is wrong because 'manage all-resources' grants excessive permissions beyond what is needed, including access to unrelated services like compute or storage, violating the principle of least privilege. Option B is wrong because 'use' only allows actions like invoking or accessing the resource, but does not permit creating, updating, or deleting model endpoints. Option C is wrong because 'read' only allows viewing or listing resources, with no ability to create or manage endpoints.

Full explanation →

912

Multi-Selecthard

Which THREE of the following are best practices when deploying a generative AI model on OCI?

Select 3 answers

A.Store API keys in the model endpoint configuration.

B.Set up autoscaling for the endpoint.

C.Disable logging to save costs.

D.Use a dedicated AI cluster for production endpoints.

E.Enable content filtering on the endpoint.

AnswersB, D, E

Autoscaling handles variable load efficiently.

Why this answer

Option B is correct because autoscaling ensures that the generative AI endpoint can dynamically adjust compute resources based on real-time inference traffic, maintaining low latency and high availability while optimizing cost. On OCI, autoscaling policies can be configured for dedicated AI clusters to scale the number of model serving replicas in response to metrics like CPU utilization or request queue depth.

Exam trap

Oracle often tests the misconception that disabling logging is a valid cost-saving measure, but in reality, logging is essential for operational visibility and compliance, and costs can be managed through sampling or retention policies rather than outright disabling.

Full explanation →

913

MCQeasy

A developer needs to integrate OCI Generative AI into a Python application. Which SDK should they use?

A.Boto3

B.OCI Python SDK

C.Google Cloud client

D.OpenAI library

AnswerB

Correct: OCI Python SDK is the standard integration method.

Why this answer

The OCI Python SDK (Option B) is the correct choice because it provides the official set of libraries and tools for interacting with Oracle Cloud Infrastructure services, including the Generative AI service. This SDK handles authentication, request signing, and API calls specific to OCI, enabling seamless integration of OCI Generative AI into Python applications.

Exam trap

Cisco often tests the misconception that any popular AI SDK (like OpenAI's library) can be used interchangeably with any cloud provider's AI service, but each cloud provider requires its own SDK for authentication and API compatibility.

How to eliminate wrong answers

Option A is wrong because Boto3 is the Amazon Web Services (AWS) SDK for Python, designed to interact with AWS services such as Amazon Bedrock or SageMaker, not with OCI Generative AI. Option C is wrong because the Google Cloud client library is for accessing Google Cloud Platform services like Vertex AI, not OCI. Option D is wrong because the OpenAI library is specifically for calling OpenAI's own API endpoints (e.g., GPT models) and does not support OCI's authentication or API structure.

Full explanation →

914

MCQmedium

During fine-tuning of a Cohere model on OCI Data Science, the loss curve shows a sharp spike after epoch 3. What is the most appropriate action?

A.Gradient clipping.

B.Reduce learning rate.

C.Add more training data.

D.Increase batch size.

AnswerA

Gradient clipping limits gradient values, preventing explosion and stabilizing training.

Why this answer

A sharp spike in the loss curve after epoch 3 during fine-tuning indicates a gradient explosion, where the gradients become excessively large and destabilize the model's weights. Gradient clipping is the most appropriate action because it directly caps the gradient norm (e.g., using `max_grad_norm=1.0` in Cohere's fine-tuning API) to prevent these spikes, ensuring stable training without altering the learning dynamics.

Exam trap

Oracle often tests the distinction between gradient explosion (sharp spikes) and learning rate divergence (gradual increase), leading candidates to incorrectly choose reducing the learning rate instead of gradient clipping.

How to eliminate wrong answers

Option B is wrong because reducing the learning rate addresses gradual divergence or oscillation, not sudden spikes; a sharp spike is a sign of gradient explosion, not a learning rate that is too high. Option C is wrong because adding more training data improves generalization and reduces overfitting but does not mitigate gradient instability during training. Option D is wrong because increasing batch size can stabilize gradient estimates but may also increase memory usage and does not directly prevent individual gradient values from becoming too large; it can even exacerbate gradient explosion by averaging over more samples.

Full explanation →

915

MCQhard

You are a machine learning engineer at a large e-commerce company. You have been tasked with deploying a large language model to power a customer service chatbot that handles product returns and refunds. The model will answer customer queries based on a knowledge base of return policies and FAQs. The company has strict requirements: (1) responses must be factually accurate and grounded in the knowledge base, (2) the system must be cost-effective, and (3) latency should be under 2 seconds per response. You decide to use a pre-trained LLM from OCI Data Science and implement retrieval-augmented generation (RAG). You have two options for the retriever: a dense embedding-based retriever (e.g., using OCI AI Language embeddings) or a sparse keyword-based retriever (e.g., BM25). You also need to decide on the generation model size: a 7B parameter model or a 70B parameter model. You run a pilot test: with the dense retriever + 7B model, average latency is 1.8 seconds and accuracy is 85%. With the sparse retriever + 7B model, latency is 1.2 seconds but accuracy drops to 75%. With the 70B model (any retriever), latency exceeds 5 seconds. Which combination should you choose to meet all requirements?

A.Sparse retriever + 70B model.

B.Dense retriever + 70B model.

C.Sparse retriever + 7B model.

D.Dense retriever + 7B model.

AnswerD

Meets both latency and accuracy requirements.

Why this answer

Option D (dense retriever + 7B model) is correct because it meets all three requirements: factual accuracy (85% accuracy from dense retrieval grounding), latency under 2 seconds (1.8 seconds), and cost-effectiveness (7B model is cheaper to run than 70B). The dense retriever provides better semantic matching for nuanced return policy queries, while the 7B model keeps inference fast and affordable.

Exam trap

Oracle often tests the trade-off between retrieval accuracy and model size, where candidates mistakenly prioritize a larger model (70B) for better generation quality, ignoring that the latency constraint makes it infeasible, or choose a sparse retriever thinking it's faster, but overlook the critical accuracy requirement for grounded responses.

How to eliminate wrong answers

Option A is wrong because the 70B model with any retriever exceeds 5 seconds latency, violating the 2-second requirement. Option B is wrong because the 70B model also exceeds 5 seconds latency, failing the latency requirement. Option C is wrong because the sparse retriever (BM25) with the 7B model yields only 75% accuracy, which is below the acceptable factual accuracy threshold given the strict requirement for grounded responses.

Full explanation →

916

MCQhard

A developer makes an API call to generate text with top_p=1.5. What is the correct way to fix this error?

A.Remove the top_p parameter from the request

B.Increase the temperature parameter to compensate

C.Set top_p to a value between 0 and 1, e.g., 0.9

D.Use the top_k parameter instead

AnswerC

Correcting the value to within the allowed range fixes the error.

Why this answer

The `top_p` parameter, also known as nucleus sampling, must be a probability value between 0 and 1. Setting it to 1.5 is invalid because it exceeds the allowed range, which would cause the API to reject the request. The correct fix is to set `top_p` to a valid value such as 0.9, which restricts token selection to the smallest set whose cumulative probability exceeds that threshold.

Exam trap

Cisco often tests the misconception that `top_p` can be any positive number, similar to `temperature`, when in fact it is a probability threshold strictly bounded between 0 and 1.

How to eliminate wrong answers

Option A is wrong because removing `top_p` entirely changes the sampling behavior to default settings, which may not achieve the desired output diversity and does not fix the invalid parameter error—the correct approach is to provide a valid value. Option B is wrong because increasing the `temperature` parameter does not compensate for an invalid `top_p` value; `temperature` controls randomness in token probability distribution, while `top_p` is a separate sampling constraint, and both must be within their respective valid ranges. Option D is wrong because `top_k` is a different sampling method that selects the top K tokens by probability; while it can be used instead of `top_p`, the question asks for the correct way to fix the error with `top_p`, not to replace it with another parameter.

Full explanation →

917

MCQmedium

An administrator creates this IAM policy to allow a group to use a specific generative AI model. However, users report a 403 Forbidden error. What is the most likely issue?

A.The policy syntax is invalid for OCI IAM; missing required fields

B.The resource OCID is incorrect

C.The policy does not specify the compartment where the model resides

D.The action name is wrong; it should be 'ai:generate-text' (correct already)

AnswerA

OCI IAM policies use a different JSON structure with 'subject', 'action', 'resource' arrays.

Why this answer

The most likely issue is that the policy syntax is invalid because OCI IAM policies require a 'verb' (e.g., 'allow', 'deny') at the beginning of each statement. Without this required field, the policy is syntactically incorrect and will be rejected by the IAM service, resulting in a 403 Forbidden error for all users in the group.

Exam trap

The trap here is that candidates focus on the action name or resource OCID being wrong, but the real issue is the missing mandatory 'verb' field in the policy syntax, which is a common oversight in OCI IAM policy creation.

How to eliminate wrong answers

Option B is wrong because an incorrect resource OCID would cause a 'not found' or 'permission denied' error for that specific resource, not a generic 403 Forbidden for all users; the error is more likely due to a syntax issue. Option C is wrong because OCI IAM policies do not require specifying a compartment for the model; the resource OCID inherently identifies the compartment, and the policy can target resources across compartments if the OCID is correct. Option D is wrong because the action name 'ai:generate-text' is correct for the OCI Generative AI service; the problem is not with the action name but with the missing verb in the policy statement.

Full explanation →

918

MCQeasy

A data scientist fine-tunes a model using OCI Data Science and wants to deploy it as a managed endpoint in OCI Generative AI. What must they do first?

A.Upload model artifacts to Object Storage and register in Model Catalog

B.Write a custom container

C.Create a dedicated AI cluster

D.Use OCI CLI to create an endpoint

AnswerA

This is the required first step to deploy a custom model.

Why this answer

To deploy a fine-tuned model as a managed endpoint in OCI Generative AI, the model artifacts must first be uploaded to Object Storage and registered in the Model Catalog. This is a prerequisite because OCI Generative AI endpoints pull model artifacts from the Model Catalog, which references the storage location. Without registration, the service cannot locate or serve the model.

Exam trap

The trap here is that candidates assume they can directly create an endpoint using CLI or SDK without first registering the model in the Model Catalog, overlooking the mandatory registration step that links the artifacts to the serving infrastructure.

How to eliminate wrong answers

Option B is wrong because custom containers are not required for managed endpoints in OCI Generative AI; the service provides built-in serving infrastructure for supported model formats. Option C is wrong because a dedicated AI cluster is used for training or batch inference, not for deploying a managed endpoint, which uses OCI's shared serving infrastructure. Option D is wrong because using OCI CLI to create an endpoint is a valid method, but it cannot succeed until the model is registered in the Model Catalog; the CLI command requires a model OCID from the catalog.

Full explanation →

919

MCQhard

A company has fine-tuned a Cohere Command R model using T-Few and wants to deploy it for real-time inference with the lowest possible latency. They have provisioned a dedicated AI cluster with 2 model units. However, latency is still higher than expected. Which action is MOST likely to reduce latency?

A.Reduce the temperature parameter to 0

B.Increase the number of model units on the dedicated AI cluster

C.Switch from dedicated AI cluster to shared infrastructure

D.Use a larger base model like Llama 3 70B

AnswerB

More model units provide additional compute capacity, reducing latency for concurrent requests.

Why this answer

Increasing model units on the dedicated cluster provides more compute capacity, reducing inference latency by parallelizing requests. Switching to shared infrastructure would likely increase latency due to multi-tenancy. Using a larger model would increase latency.

Reducing temperature does not affect latency.

Full explanation →

920

MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Use a larger foundation model with a longer context window and paste all documents into each prompt

B.Train a custom model from scratch on the policy documents each month

C.Fine-tune a base LLM on the policy documents monthly

D.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

AnswerD

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

RAG (Retrieval-Augmented Generation) allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining. The other options either require expensive retraining for each update or lack document grounding.

Full explanation →

921

Multi-Selecteasy

Which THREE components are essential in a typical RAG architecture built on OCI? (Select three.)

Select 3 answers

A.Vector database (e.g., OCI OpenSearch, Autonomous Database)

B.Data ingestion pipeline with Apache Spark

C.Embedding model (e.g., Cohere Embed)

D.Large language model (e.g., Cohere Command)

E.Prompt template for system instructions

AnswersA, C, D

Required for storing and retrieving embeddings.

Why this answer

Option A is correct because a vector database is a core component in RAG architecture on OCI, enabling efficient storage and retrieval of vector embeddings. OCI OpenSearch and Autonomous Database both support vector search capabilities, which are essential for finding relevant context to augment LLM prompts.

Exam trap

Cisco often tests the distinction between 'essential' and 'optional' components in RAG, where candidates mistakenly include data ingestion pipelines or prompt templates as mandatory, when the core triad is vector database, embedding model, and LLM.

Full explanation →

922

Multi-Selecthard

A team is building a RAG pipeline on OCI. Which THREE steps are essential components of a standard RAG pipeline? (Select THREE)

Select 3 answers

A.Embedding each chunk into a dense vector using an embedding model

B.Retrieving relevant chunks based on cosine similarity to the query embedding

C.Training a custom LLM from scratch on the document corpus

D.Fine-tuning the generation model on the retrieved chunks

E.Chunking documents into passages

AnswersA, B, E

Chunks are converted to vectors for similarity search.

Why this answer

Option A is correct because embedding each chunk into a dense vector using an embedding model is a fundamental step in a RAG pipeline. The embedding model converts text chunks into high-dimensional vector representations that capture semantic meaning, enabling efficient similarity search during retrieval. Without this step, the system cannot compare query intent with document content in a vector space.

Exam trap

Cisco often tests the distinction between RAG's retrieval-augmented generation (which uses a frozen LLM with retrieved context) and fine-tuning or training a model, leading candidates to mistakenly select fine-tuning or training steps as essential components.

Full explanation →

923

MCQmedium

An organization wants to deploy an LLM for legal document analysis where accuracy is critical, and the model must not reference any external data outside the provided legal corpus. Which approach BEST satisfies these requirements?

A.Use a decoder-only model with zero-shot prompting

B.Use a fine-tuned encoder-only model for classification only

C.Use a large foundation model with a high temperature setting

D.Use RAG with a vector store containing only the legal documents, and set the retriever to return a fixed number of chunks with high similarity threshold

AnswerD

RAG ensures answers are grounded in the provided legal corpus; similarity threshold can prevent retrieval of irrelevant chunks.

Why this answer

RAG can ground generation in a curated corpus, and with strict retrieval settings (e.g., only retrieving from the legal corpus), the model will not use any outside knowledge, reducing hallucinations.

Full explanation →

924

MCQeasy

A developer wants the LLM to solve a math problem by reasoning step by step. Which prompting technique should they use?

A.Chain-of-thought prompting

B.Zero-shot prompting

C.Tree-of-thought prompting

D.Few-shot prompting

AnswerA

Chain-of-thought prompts the model to reason step by step, which is ideal for math problems.

Why this answer

Chain-of-thought prompting explicitly instructs the model to show its reasoning steps, improving accuracy on multi-step problems.

Full explanation →

925

MCQmedium

A data scientist is building a RAG application that processes PDF invoices. The extraction step uses OCI Document Understanding to convert PDFs to text. The scientist then splits the text into chunks and generates embeddings using OCI Generative AI. However, the retrieval often misses critical fields like invoice numbers and dates. Which preprocessing step would MOST likely improve retrieval of these specific fields?

A.Increase the chunk size to include entire invoices.

B.Apply stemming and lemmatization to the text before chunking.

C.Tag each chunk with metadata such as invoice number, date, and vendor, and use metadata filtering during retrieval.

D.Switch from dense embeddings to sparse embeddings for better exact match.

AnswerC

Metadata filtering enables precise retrieval based on structured fields.

Why this answer

Option C is correct because metadata tagging and filtering directly address the retrieval of specific fields like invoice numbers and dates. By attaching metadata (e.g., invoice number, date, vendor) to each chunk and filtering on these metadata fields during retrieval, the RAG system can precisely locate the relevant chunks without relying solely on semantic similarity. This approach leverages OCI Document Understanding's ability to extract structured data and OCI Generative AI's vector search capabilities to combine dense embeddings with exact metadata matching.

Exam trap

Oracle often tests the misconception that increasing chunk size or changing embedding type alone can solve retrieval failures for structured fields, when in reality metadata filtering is the correct technique for precise field-level retrieval in RAG applications.

How to eliminate wrong answers

Option A is wrong because increasing chunk size to include entire invoices reduces granularity, making it harder to retrieve specific fields like invoice numbers and dates, and may exceed the context window of the embedding model, degrading retrieval quality. Option B is wrong because stemming and lemmatization reduce words to root forms, which can obscure exact matches for critical fields like invoice numbers (e.g., 'INV-12345' becomes 'inv-12345') and dates (e.g., '2023-01-15' might be altered), harming retrieval precision. Option D is wrong because sparse embeddings (e.g., TF-IDF) improve exact keyword matching but still rely on the text content of chunks; without metadata tagging, the system cannot filter chunks by field type, so critical fields may still be missed if they appear in chunks with low keyword overlap.

Full explanation →

926

Multi-Selectmedium

Which TWO actions are recommended best practices for managing costs when using OCI Generative AI dedicated AI clusters?

Select 2 answers

A.Provision a fixed number of nodes to handle peak load

B.Use preemptible instances for non-critical inference workloads

C.Use autoscaling to adjust nodes based on demand

D.Stop the dedicated AI cluster when not in use

E.Use pay-as-you-go billing instead of preemptible instances

AnswersB, C

Preemptible instances are cheaper and suitable for fault-tolerant tasks.

Why this answer

Option B is correct because preemptible instances in OCI are significantly cheaper than standard instances and are ideal for non-critical inference workloads that can tolerate interruptions. This aligns with cost optimization best practices by allowing you to use spare compute capacity at a reduced rate for tasks that do not require continuous availability.

Exam trap

The trap here is that candidates may think stopping a dedicated AI cluster is a valid cost-saving action, but OCI dedicated AI clusters do not support a 'stop' state—you must terminate the cluster, which loses all configuration and data, making it impractical for intermittent use.

Full explanation →

927

MCQmedium

A data science team at a healthcare company has fine-tuned a Llama 2 model using OCI Data Science and registered it in the Model Catalog. They want to deploy it as a managed endpoint using OCI Generative AI. The model requires 64 GB of GPU memory. The team has created a dedicated AI cluster with a single node shape that has 48 GB GPU memory. When they attempt to deploy the model, the deployment fails with an error indicating insufficient resources. The team has verified that the model artifact is correct and that the compartment policies allow deployment. What should the team do to successfully deploy the model?

A.Increase the number of nodes in the cluster to 2.

B.Enable model parallelism to split the model across nodes.

C.Select a node shape with higher GPU memory, such as 80 GB.

D.Reduce the model's precision from FP16 to INT8 to lower memory usage.

AnswerC

Using a node shape with sufficient memory allows the model to be loaded.

Why this answer

Option C is correct because the model requires 64 GB of GPU memory, but the dedicated AI cluster uses a node shape with only 48 GB. The only way to satisfy the memory requirement is to select a node shape with higher GPU memory, such as 80 GB, as OCI Generative AI managed endpoints require a single node to host the entire model. Increasing nodes or enabling model parallelism does not help because OCI Generative AI does not support distributed inference across nodes for managed endpoints, and reducing precision may not guarantee the model fits or may degrade accuracy.

Exam trap

The trap here is that candidates may think adding more nodes or enabling model parallelism can aggregate GPU memory, but OCI Generative AI managed endpoints do not support distributed inference across nodes, so the only valid solution is to use a node shape with sufficient single-GPU memory.

How to eliminate wrong answers

Option A is wrong because increasing the number of nodes to 2 does not solve the memory issue; OCI Generative AI managed endpoints deploy the model on a single node, and additional nodes are not used to aggregate GPU memory for inference. Option B is wrong because model parallelism is not supported for managed endpoints in OCI Generative AI; the service expects the entire model to fit on one node's GPU memory. Option D is wrong because reducing precision from FP16 to INT8 may lower memory usage, but it is not a guaranteed fix and could introduce accuracy loss; moreover, the question states the model requires 64 GB of GPU memory, and the team should first ensure the hardware meets the requirement rather than altering the model.

Full explanation →

928

MCQmedium

You manage a generative AI model deployed on OCI Model Deployment that serves a chatbot application. The model is a 13B parameter LLM on a VM.GPU.A100.1 shape. Recently, you rolled out a new version of the model that is supposed to improve response quality. However, after the update, the application starts returning HTTP 500 errors and memory usage spikes. You need to update to the new version without causing downtime. The current deployment has 2 replicas with autoscaling enabled. Which strategy should you use to safely deploy the new model version?

A.Directly update the existing model deployment with the new model artifact

B.Create a second deployment with the new model, test it, then shift traffic using a load balancer

C.Stop the existing deployment, update the model artifact, then start the deployment

D.Increase the number of replicas to 4, then update the model

AnswerB

Blue-green deployment ensures no downtime and safe rollout.

Why this answer

Option B is correct because it implements a blue/green deployment strategy: you create a second deployment with the new model, test it in isolation, and then shift traffic using a load balancer. This avoids downtime and allows you to validate the new model before exposing it to production traffic, which is critical given the observed HTTP 500 errors and memory spikes.

Exam trap

The trap here is that candidates may assume increasing replicas provides safety through redundancy, but it does not prevent the new model from causing errors on all replicas; the key is isolation via a separate deployment and traffic shifting.

How to eliminate wrong answers

Option A is wrong because directly updating the existing model deployment with the new artifact would cause in-place changes, potentially triggering the memory spike and HTTP 500 errors on the live replicas, leading to downtime. Option C is wrong because stopping the existing deployment before updating causes complete downtime, violating the requirement to update without downtime. Option D is wrong because increasing replicas to 4 and then updating still performs an in-place update on all replicas, which does not isolate the faulty model and can still cause errors and memory spikes across the entire fleet.

Full explanation →

929

MCQhard

Refer to the exhibit. A user runs 'oci generative-ai model list' and sees this output. They then try to use 'cohere.command-light' but get an error. What is the most likely reason?

A.The model is in INACTIVE state

B.The API key does not have access

C.The model is not listed

D.The region is wrong

AnswerA

INACTIVE models cannot be used for inference.

Why this answer

Option B is correct because the model 'cohere.command-light' has lifecycle-state 'INACTIVE', meaning it cannot be used. Option A is false because it is listed; C and D would produce different errors.

Full explanation →

930

Multi-Selectmedium

Which three techniques are commonly used to reduce the risk of prompt injection in LLM applications? (Choose three.)

Select 3 answers

A.Enabling prompt validation against regex patterns.

B.Output filtering.

C.Increasing temperature.

D.Input sanitization.

E.Using role-based system prompts.

AnswersB, D, E

Filtering outputs can block dangerous responses.

Why this answer

Output filtering (B) is correct because it acts as a post-processing defense that scans the LLM's generated output for malicious content, such as leaked system prompts or injected commands, before it reaches the user. This technique helps mitigate the impact of successful prompt injections by catching and neutralizing harmful outputs that bypass input controls.

Exam trap

Oracle often tests the distinction between security controls and model parameters, so the trap here is that candidates mistakenly think adjusting model settings like temperature can reduce injection risk, when in fact only input/output controls and system prompt design are effective.

Full explanation →

931

MCQmedium

When tuning the temperature parameter for a text generation task, which effect does setting temperature to 0.1 have compared to 0.9?

A.It increases the maximum number of tokens generated

B.It reduces the vocabulary considered at each step

C.It makes outputs more focused and deterministic

D.It increases randomness, producing more diverse outputs

AnswerC

Low temperature reduces randomness, making outputs more deterministic.

Why this answer

Low temperature makes output more deterministic and repetitive; high temperature increases randomness and creativity.

Full explanation →

932

MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Use a larger foundation model with a longer context window and paste all documents into each prompt

B.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

C.Fine-tune a base LLM on the policy documents monthly

D.Train a custom model from scratch on the policy documents each month

AnswerB

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

Retrieval-Augmented Generation (RAG) is the most appropriate approach because it allows the chatbot to answer questions from the policy documents without retraining the model. By indexing the documents in a vector store and retrieving relevant chunks at query time, RAG handles monthly updates by simply re-indexing the new documents, avoiding the cost and complexity of fine-tuning or retraining.

Exam trap

Cisco often tests the misconception that fine-tuning or retraining is required for domain-specific knowledge, when in fact RAG provides a cost-effective, update-friendly alternative that leverages the LLM's existing reasoning capabilities.

How to eliminate wrong answers

Option A is wrong because pasting all documents into each prompt would exceed the context window limits of even the largest foundation models, leading to truncation, high token costs, and degraded performance due to information overload. Option C is wrong because fine-tuning a base LLM monthly on the policy documents is expensive, time-consuming, and risks catastrophic forgetting of previous content, making it impractical for frequent updates. Option D is wrong because training a custom model from scratch each month is prohibitively expensive, requires massive computational resources and data, and is unnecessary when RAG can achieve the same goal with far less overhead.

Full explanation →

933

MCQmedium

A developer is building a summarization pipeline using OCI Generative AI. They want to ensure the summary includes key points from the entire document without truncation. Which parameter should they primarily adjust?

A.Max_tokens

B.Frequency_penalty

C.Temperature

D.Top_p

AnswerA

Max_tokens sets the maximum number of tokens in the generated summary, directly addressing truncation.

Why this answer

The max_tokens parameter controls the maximum length of the generated output. Increasing it allows longer summaries, preventing truncation.

Full explanation →

934

MCQmedium

An organization wants to ensure that prompts submitted to an LLM do not contain sensitive customer data. Which practice is most effective?

A.Use a low temperature to avoid generating sensitive data

B.Increase the max tokens to allow the model to ignore sensitive data

C.Implement a prompt injection detection system that blocks malicious prompts

D.Sanitize user inputs by removing sensitive information before including them in the prompt

AnswerD

Correct: input sanitization is a direct mitigation.

Why this answer

Sanitizing prompts before submission (e.g., removing PII, using placeholders) prevents sensitive data from being sent to the model. Other options either do not prevent data leakage or are less direct.

Full explanation →

935

MCQeasy

Which OCI Generative AI model family is specifically designed for reranking search results to improve relevance?

A.Cohere Command R

B.Cohere Command R+

C.Meta Llama 3

D.Cohere Rerank

AnswerD

Cohere Rerank is specifically designed for reranking tasks.

Why this answer

Cohere Rerank is the OCI Generative AI model family specifically designed for reranking search results to improve relevance. Unlike generation-focused models, Rerank takes a query and a list of candidate documents, scoring each for relevance to the query, thereby enhancing the quality of retrieved results in RAG pipelines.

Exam trap

Cisco often tests the distinction between generative models (like Command R, Command R+, Llama 3) and specialized utility models (like Rerank), leading candidates to mistakenly select a generative model for a reranking task.

How to eliminate wrong answers

Option A is wrong because Cohere Command R is a generative model optimized for RAG and tool use, not for reranking search results. Option B is wrong because Cohere Command R+ is a larger, more capable generative model in the Command family, still focused on generation and instruction following, not reranking. Option C is wrong because Meta Llama 3 is a general-purpose large language model for text generation and understanding, not a specialized reranking model.

Full explanation →

936

Multi-Selectmedium

Which TWO are best practices for prompt management in production environments?

Select 2 answers

A.Avoid using system prompts to keep prompts simple

B.Maintain a prompt library with reusable templates

C.Store prompts in a version-controlled repository

D.Keep all prompts as hard-coded strings in the application code

E.Use the same prompt for all use cases to reduce complexity

AnswersB, C

A library encourages consistency and saves time.

Why this answer

Versioning and maintaining a library of templates are essential for tracking changes and reusability.

Full explanation →

937

MCQeasy

An organization wants to use an LLM to summarize legal documents. Which consideration is most important for ensuring accurate summaries?

A.Fine-tune the model on a curated legal corpus

B.Use the largest available general-purpose model

C.Rely on zero-shot summarization with careful prompting

D.Pre-train a new model from scratch on legal texts

AnswerA

Domain-specific fine-tuning teaches the model legal terminology and reasoning.

Why this answer

Legal documents require precise understanding, so fine-tuning on legal data is critical. Option B is wrong because larger models don't guarantee domain accuracy. Option C is wrong because pre-training from scratch is expensive and unnecessary.

Option D is wrong because zero-shot may miss legal nuances.

Full explanation →

938

MCQeasy

A data scientist wants to fine-tune a generative AI model on proprietary customer data. What is a best practice for preparing the training dataset?

A.Randomly sample 1000 records from production logs.

B.Use the same dataset as the base model's pre-training data.

C.Curate a dataset of domain-specific examples with clear input-output pairs.

D.Use the largest available public dataset from the internet.

AnswerC

Domain-specific curated data ensures the model learns the desired behavior for the target use case.

Why this answer

Option C is correct because fine-tuning a generative AI model on proprietary data requires a curated, domain-specific dataset with clear input-output pairs. This ensures the model learns the desired task (e.g., summarization, classification) without introducing noise or irrelevant patterns, which is critical for OCI Generative AI Service fine-tuning where data quality directly impacts model performance.

Exam trap

Oracle often tests the misconception that more data (random or public) is always better for fine-tuning, when in fact curated, domain-specific data with clear input-output pairs is essential for effective adaptation without degrading base model capabilities.

How to eliminate wrong answers

Option A is wrong because randomly sampling 1000 records from production logs introduces noise, missing labels, and imbalanced distributions, which degrade fine-tuning quality and may cause catastrophic forgetting. Option B is wrong because using the same dataset as the base model's pre-training data provides no new information, leading to zero improvement and potential overfitting to already learned patterns. Option D is wrong because using the largest available public dataset from the internet introduces irrelevant or conflicting data, diluting domain-specific learning and violating data privacy requirements for proprietary customer data.

Full explanation →

939

Multi-Selectmedium

A prompt engineer wants to use the self-consistency technique to improve answer reliability. Which THREE steps are part of implementing self-consistency?

Select 3 answers

A.Use a single response with high temperature

B.Generate multiple independent responses using chain-of-thought prompting

C.Set temperature to a non-zero value to introduce variation

D.Aggregate the final answers across the responses, e.g., by majority vote

E.Use a stop sequence after each reasoning step

AnswersB, C, D

Multiple paths are needed for consistency.

Why this answer

Option B is correct because self-consistency relies on generating multiple diverse reasoning paths via chain-of-thought (CoT) prompting. This technique samples several independent responses, each following a step-by-step reasoning process, to capture different valid approaches to the same problem.

Exam trap

Cisco often tests the misconception that self-consistency can be achieved with a single high-temperature response, but the technique explicitly requires multiple independent samples to enable aggregation.

Full explanation →

940

MCQmedium

An administrator needs to grant a group of data scientists access to use OCI Generative AI resources in a specific compartment. Which IAM policy statement should they use?

A.Allow group DataScientists to manage generative-ai-family in compartment ABC

B.Allow group DataScientists to inspect generative-ai-family in compartment ABC

C.Allow group DataScientists to use generative-ai-family in compartment ABC

D.Allow group DataScientists to read generative-ai-models in tenancy

AnswerC

Correct verb and resource for using GenAI.

Why this answer

The 'use' verb allows access to GenAI resources. The policy should target the specific compartment.

Full explanation →

941

MCQhard

A company needs to integrate OCI Generative AI Service with an existing application that uses OCI IAM for authentication. They want to use resource principal to allow the application to call the service without storing API keys. Which step is REQUIRED?

A.Create an OCI API key for the application

B.Enable the Generative AI Service for resource principal in the tenancy

C.Assign the application to a group with admin privileges

D.Create a dynamic group and a policy granting access to the Generative AI Service

AnswerD

Dynamic group with matching rules and a policy are required for resource principal.

Why this answer

Resource principal authentication in OCI requires the application to be represented by a dynamic group, which matches instances or resources based on defined rules. A policy must then grant that dynamic group access to the Generative AI Service. This avoids storing API keys by using OCI IAM's built-in resource principal token exchange.

Exam trap

Oracle often tests the misconception that resource principal requires a tenancy-wide setting or an API key, when in fact the correct mechanism is a dynamic group combined with a targeted IAM policy.

How to eliminate wrong answers

Option A is wrong because creating an OCI API key would reintroduce the need to store and manage secrets, which resource principal is designed to eliminate. Option B is wrong because there is no tenancy-level toggle to 'enable' the Generative AI Service for resource principal; the service is always available for resource principal, but access is controlled via dynamic groups and policies. Option C is wrong because assigning the application to a group with admin privileges violates the principle of least privilege and is unnecessary; a custom policy granting only the required permissions to the dynamic group is sufficient and more secure.

Full explanation →

942

MCQmedium

A company wants to use OCI Generative AI Agents to build a RAG application over documents stored in OCI Object Storage. What must they create first?

A.A knowledge base linked to the Object Storage bucket

B.A Dedicated AI Cluster

C.An embedding endpoint

D.A fine-tuning job for the base model

AnswerA

The agent uses a knowledge base to index and retrieve data from Object Storage.

Why this answer

OCI Generative AI Agents require a knowledge base to index data sources before creating an agent.

Full explanation →

943

MCQhard

During iterative prompt refinement, a team evaluates two prompt variants on 100 test queries. Variant A scores 85% accuracy but occasionally generates offensive content. Variant B scores 80% accuracy with no safety issues. Which evaluation criterion should take priority for a customer-facing application?

A.Accuracy — because it is highest and the offensive content can be filtered post-hoc

B.Cost — the variant with higher accuracy uses fewer tokens

C.Safety — offensive content is unacceptable in a customer-facing system

D.Latency — because the variant with higher accuracy also has lower latency

AnswerC

Safety is a hard requirement; accuracy can be improved through further refinement.

Why this answer

For customer-facing applications, safety is paramount. Even if accuracy is slightly lower, ensuring no offensive content is critical to avoid reputational and legal risks. The team should prioritize safety and then work to improve accuracy.

Full explanation →

944

MCQhard

A multinational corporation plans to deploy OCI Generative AI in multiple OCI regions for disaster recovery. They have fine-tuned a custom model in the primary region. What is the recommended approach to make the fine-tuned model available in the secondary region with minimal manual effort?

A.Create an IAM policy to allow cross-region access to the model from the secondary region.

B.Use OCI Cross-Region Replication for the model's underlying object storage bucket and the dedicated AI cluster.

C.Redeploy the fine-tuning job in the secondary region using the same training data.

D.Copy the model artifact to the secondary region's object storage bucket and create a new dedicated endpoint there.

AnswerD

This leverages existing model artifacts and can be automated with OCI CLI or SDK.

Why this answer

Option D is correct because the recommended approach to make a fine-tuned custom model available in a secondary OCI region with minimal manual effort is to copy the model artifact (the trained weights and configuration files) to the secondary region's object storage bucket and then create a new dedicated endpoint there. This avoids re-running the expensive fine-tuning job and leverages OCI's object storage cross-region copy capabilities, while the dedicated AI cluster in the secondary region can serve the model directly from the copied artifact.

Exam trap

Cisco often tests the misconception that cross-region replication of storage automatically makes the compute service (like a dedicated AI cluster) available in the secondary region, but in reality, the cluster is a separate resource that must be explicitly created and configured to use the replicated artifact.

How to eliminate wrong answers

Option A is wrong because IAM policies control access permissions but cannot make a model artifact physically available in another region; cross-region access to a model endpoint would still require the model to be deployed in the secondary region. Option B is wrong because OCI Cross-Region Replication for object storage buckets can replicate the model artifact, but it does not replicate the dedicated AI cluster (which is a compute resource, not a storage resource), and the cluster must be created separately in the secondary region. Option C is wrong because redeploying the fine-tuning job in the secondary region using the same training data is unnecessary and inefficient; it would consume significant time and compute resources when the model artifact can simply be copied.

Full explanation →

945

MCQmedium

A data scientist wants to fine-tune a Cohere Command R model using the T-Few technique. They have prepared a dataset in JSONL format with prompt/completion pairs. Which step is REQUIRED before creating the fine-tuning job?

A.Register the dataset in OCI Data Labeling

B.Upload the dataset to an OCI Object Storage bucket

C.Deploy a dedicated AI cluster to host the base model

D.Create an OCI Functions endpoint for dataset preprocessing

AnswerB

Fine-tuning jobs in OCI GenAI read training data from Object Storage.

Why this answer

The dataset must be uploaded to an OCI Object Storage bucket so the fine-tuning job can access it. The other options are either optional or not required.

Full explanation →

946

MCQmedium

A company has deployed a RAG application using OCI Generative AI service with a vector store in OCI OpenSearch. Users report that answers are often incomplete or irrelevant. The application uses a single prompt template with a fixed chunk size of 1000 tokens. Which action is most likely to improve answer quality?

A.Disable vector search and rely solely on the LLM's pre-trained knowledge

B.Use a smaller embedding model to reduce noise

C.Implement a re-ranking step after vector search

D.Increase the chunk size to 2000 tokens

AnswerC

Re-ranking improves precision by ordering chunks based on relevance to the query.

Why this answer

Implementing a re-ranking step after vector search improves answer quality by reordering the top-K retrieved chunks based on their relevance to the query, using a cross-encoder or similar model. This addresses the issue of incomplete or irrelevant answers caused by the fixed chunk size and single prompt template, as re-ranking ensures that only the most contextually appropriate chunks are passed to the LLM for generation.

Exam trap

Cisco often tests the misconception that increasing chunk size or simplifying the retrieval pipeline (e.g., disabling vector search) will improve RAG quality, when in fact the bottleneck is often the lack of a relevance refinement step like re-ranking.

How to eliminate wrong answers

Option A is wrong because disabling vector search removes the retrieval-augmented generation (RAG) component entirely, forcing the LLM to rely solely on its pre-trained knowledge, which is static and may lack domain-specific or up-to-date information, leading to even more irrelevant answers. Option B is wrong because using a smaller embedding model reduces the dimensionality and quality of vector representations, increasing noise and decreasing retrieval precision, which would worsen answer quality rather than improve it. Option D is wrong because increasing the chunk size to 2000 tokens may include more irrelevant context within each chunk, diluting the signal-to-noise ratio and potentially exceeding the LLM's context window limits, which does not address the core issue of poor retrieval relevance.

Full explanation →

947

MCQmedium

A company wants to use OCI Generative AI but must comply with GDPR. Which feature ensures data residency?

A.Encryption at rest

B.Access control policies

C.Data localization with dedicated AI clusters

D.Audit logging

AnswerC

Dedicated clusters in chosen regions enforce data residency.

Why this answer

Option C is correct because OCI Generative AI's data localization feature with dedicated AI clusters ensures that customer data remains within a specific geographic region, directly addressing GDPR's data residency requirements. By provisioning dedicated clusters in a chosen OCI region, the company can guarantee that data processing and storage do not cross borders without explicit consent, meeting the GDPR principle of restricting data transfer outside the EU/EEA.

Exam trap

Cisco often tests the misconception that encryption or access controls alone satisfy data residency requirements, but GDPR specifically mandates geographic localization of data, which only dedicated infrastructure can guarantee.

How to eliminate wrong answers

Option A is wrong because encryption at rest protects data confidentiality but does not control where the data is physically stored; GDPR data residency requires geographic restriction, not just encryption. Option B is wrong because access control policies govern who can access data, not where the data resides; they address authorization, not data location compliance. Option D is wrong because audit logging provides a record of data access and changes but does not enforce or guarantee that data stays within a specific jurisdiction; it is a monitoring tool, not a residency mechanism.

Full explanation →

948

MCQhard

In a LangChain RAG pipeline using Oracle AI Vector Search, the developer wants to retrieve chunks that are both relevant and diverse to cover multiple aspects of a query. Which retrieval method should they configure on the retriever?

A.Threshold-based search

B.Maximal Marginal Relevance (MMR)

C.Random sampling of the top-k results

D.Similarity search with a high k value

AnswerB

MMR balances relevance and diversity by iteratively selecting documents that are dissimilar to already chosen ones.

Why this answer

Maximal Marginal Relevance (MMR) is the correct retrieval method because it explicitly balances relevance to the query with diversity among the retrieved chunks. In a LangChain RAG pipeline using Oracle AI Vector Search, MMR re-ranks the initial similarity results to minimize redundancy, ensuring the final set covers multiple aspects of the query rather than returning near-duplicate chunks.

Exam trap

Cisco often tests the misconception that simply increasing k or using a threshold will naturally yield diverse results, but candidates fail to recognize that without an explicit diversity mechanism like MMR, similarity-based retrievers inherently favor redundancy over coverage.

How to eliminate wrong answers

Option A is wrong because threshold-based search returns all chunks above a similarity score cutoff, which can still produce redundant results and does not enforce diversity. Option C is wrong because random sampling of the top-k results ignores relevance entirely, potentially returning irrelevant chunks and defeating the purpose of a RAG pipeline. Option D is wrong because similarity search with a high k value simply retrieves more chunks based on similarity, but without any diversity mechanism, it often returns clusters of near-identical content, failing to cover multiple query aspects.

Full explanation →

949

MCQeasy

A developer is building a RAG pipeline using OCI Data Science and wants to store vector embeddings. Which OCI service is optimized for vector search and can be used as a vector store?

A.OCI Autonomous Database

B.OCI OpenSearch

C.OCI Object Storage

D.OCI Streaming

AnswerB

OCI OpenSearch includes a vector database plugin for k-NN similarity search, making it a suitable vector store.

Why this answer

B is correct because OCI OpenSearch is a fully managed, search and analytics engine that natively supports k-nearest neighbor (k-NN) search on dense vector embeddings. It provides optimized indexing and querying for high-dimensional vectors, making it the ideal vector store for a RAG pipeline in OCI Data Science.

Exam trap

The trap here is that candidates may confuse OCI Autonomous Database's ability to store vectors with being optimized for vector search, overlooking that OpenSearch is purpose-built for high-performance vector similarity search with native k-NN support.

How to eliminate wrong answers

Option A is wrong because OCI Autonomous Database, while capable of storing vectors, is not optimized for vector search; it lacks native k-NN indexing and relies on SQL-based similarity searches that are less performant for large-scale vector retrieval. Option C is wrong because OCI Object Storage is a blob storage service for unstructured data and does not support vector search operations or indexing. Option D is wrong because OCI Streaming is a real-time data ingestion service for event streams and has no vector storage or search capabilities.

Full explanation →

950

MCQhard

An enterprise wants to use OCI Generative AI to generate personalized email campaigns. They have a large customer database with preferences and past purchase history. Which design is best for high relevance and scalability?

A.Use a single prompt with all customer details as context.

B.Fine-tune a model on each customer's history separately.

C.Use a rule-based engine with AI-generated templates.

D.Use a pipeline: retrieve relevant customer data and inject into prompt.

AnswerD

This RAG approach is scalable and maintains relevance.

Why this answer

Option D is correct because it implements a retrieval-augmented generation (RAG) pattern, which is the recommended approach for personalizing outputs at scale in OCI Generative AI. By retrieving only the relevant customer data (e.g., recent purchases, preferences) and injecting it into the prompt, you keep the context window efficient, reduce token costs, and avoid exceeding model limits. This design also allows the model to generate unique, context-aware emails without the overhead of fine-tuning per customer or the rigidity of rule-based systems.

Exam trap

The trap here is that candidates confuse fine-tuning (option B) as a shortcut for personalization, not realizing that fine-tuning is for adapting a model to a domain or task, not for per-user customization, and that RAG (option D) is the scalable, cost-effective pattern for dynamic data injection.

How to eliminate wrong answers

Option A is wrong because a single prompt with all customer details as context would quickly exceed the model's context window (typically 4K–8K tokens in OCI Generative AI), leading to truncated input, loss of relevance, and high latency. Option B is wrong because fine-tuning a model on each customer's history separately is computationally infeasible and cost-prohibitive at scale; fine-tuning is designed for domain adaptation, not per-user personalization. Option C is wrong because a rule-based engine with AI-generated templates cannot achieve the dynamic, data-driven personalization needed for high relevance; it would rely on static rules and fail to adapt to nuanced customer behavior.

Full explanation →

951

Multi-Selecthard

Which THREE techniques effectively reduce query latency in a RAG system?

Select 3 answers

A.Pre-compute embeddings for all documents

B.Use approximate nearest neighbor search

C.Use a larger generation model

D.Increase the number of shards

E.Use a smaller embedding model

AnswersA, B, E

Pre-computed embeddings avoid real-time embedding calls during query.

Why this answer

Pre-computing embeddings for all documents eliminates the need to generate embeddings at query time, which is a computationally expensive step. By storing pre-computed vector representations, the system can directly perform similarity searches against the index, significantly reducing latency.

Exam trap

Oracle often tests the misconception that increasing model size or shard count always improves performance, but in RAG systems, these changes can introduce latency penalties due to higher computational overhead or distributed coordination costs.

Full explanation →

952

MCQhard

A prompt engineer notices that the model sometimes generates outputs that include parts of the system prompt or user message verbatim. This is likely a symptom of which common prompt failure?

A.Ambiguous instructions

B.Conflicting requirements

C.Insufficient context

D.Prompt injection vulnerabilities

AnswerD

Prompt injection can cause the model to treat parts of the prompt as instructions and output them, leading to leakage.

Why this answer

Prompt injection vulnerabilities can cause the model to leak or repeat the prompt itself. This is a known failure mode where the model confuses the input with output.

Full explanation →

953

MCQmedium

A financial services company is deploying a RAG system for regulatory compliance queries. The system uses OCI Data Science to run a custom embedding model fine-tuned on regulatory documents. The index in OpenSearch uses cosine similarity and HNSW algorithm. Users report that queries containing synonyms to regulatory terms (e.g., "AML" vs "Anti-Money Laundering") often fail to retrieve relevant documents. Which combination of improvements would be MOST effective? (Assume budget and latency constraints)

A.Increase the `m` parameter in HNSW to improve recall

B.Fine-tune the embedding model further on a dataset of synonyms

C.Implement a hybrid search combining keyword and vector search

D.Use query expansion with a thesaurus before embedding

AnswerC

Hybrid search (BM25 + vector) directly captures exact term matches, bridging the synonym gap effectively.

Why this answer

Hybrid search (combining keyword (BM25) and vector search) catches exact synonym matches from text. Query expansion helps but may not be as reliable. Fine-tuning on synonyms is possible but time-consuming.

Increasing HNSW m slightly improves recall but does not address synonym gap.

Full explanation →

954

MCQmedium

A developer notices that the RAG application returns irrelevant chunks for user queries. The embedding model used is `cohere.embed-english-light-v3.0`. Which action is MOST likely to improve relevance?

A.Reduce the number of retrieved chunks (k)

B.Increase the chunk size

C.Switch to a larger embedding model (e.g., cohere.embed-english-v3.0)

D.Use a different similarity metric (e.g., Euclidean instead of cosine)

AnswerC

Larger models produce higher-quality embeddings, improving retrieval relevance.

Why this answer

The `cohere.embed-english-light-v3.0` model is a smaller, faster embedding model that may lack the semantic richness needed to capture nuanced query-document relationships. Switching to the larger `cohere.embed-english-v3.0` model provides higher-dimensional embeddings with better representational capacity, which directly improves the relevance of retrieved chunks in a RAG pipeline.

Exam trap

Cisco often tests the misconception that tuning retrieval parameters (k, chunk size, similarity metric) can compensate for a weak embedding model, when in fact the embedding quality is the foundational factor for relevance in RAG systems.

How to eliminate wrong answers

Option A is wrong because reducing the number of retrieved chunks (k) does not improve the relevance of each chunk; it merely returns fewer results, potentially missing relevant ones. Option B is wrong because increasing chunk size can dilute semantic focus, making chunks less specific to the query and often reducing relevance. Option D is wrong because cosine similarity is the standard metric for comparing dense embeddings; Euclidean distance is less effective for high-dimensional vectors and would not address the core issue of embedding quality.

Full explanation →

955

MCQhard

During fine-tuning of a large language model on OCI, you notice that the model's performance on the validation set is not improving after several epochs, but the training loss continues to decrease. What is the most likely cause?

A.The learning rate is too high.

B.The validation set is not representative.

C.The model is overfitting to the training data.

D.The training data is too small.

AnswerC

Overfitting occurs when the model memorizes training examples, causing training loss to drop while validation performance plateaus or declines. This is the most likely cause.

Why this answer

When training loss decreases but validation performance stagnates or worsens, the model is overfitting to the training data. It memorizes the training examples but fails to generalize. A high learning rate might cause divergence, not this pattern.

Too small training data can contribute to overfitting but is not the direct symptom. An unrepresentative validation set could cause mismatch, but the described pattern is classic overfitting.

Full explanation →

956

MCQeasy

Which tokenization algorithm is used by models like BERT and GPT-2?

A.SentencePiece

B.Byte-Pair Encoding (BPE)

C.WordPiece

D.Unigram Language Model

AnswerC

BERT uses WordPiece, and GPT-2 uses BPE; however, among the options, WordPiece is correct for BERT, and the question likely expects the most common answer.

Why this answer

BERT uses WordPiece and GPT-2 uses BPE; both are subword tokenization methods. SentencePiece is used by models like T5 and Llama.

Full explanation →

957

MCQmedium

Which of the following sampling strategies selects tokens based on a cumulative probability threshold from the highest probability tokens?

A.Top-p (nucleus) sampling

B.Top-k sampling

C.Greedy decoding

D.Temperature sampling

AnswerA

Top-p selects the smallest set of tokens whose cumulative probability exceeds p.

Why this answer

Top-p (nucleus) sampling cuts off the tail of the probability distribution where cumulative probability exceeds p, allowing dynamic vocabulary size.

Full explanation →

958

Multi-Selecthard

A machine learning engineer is evaluating the performance of a translation model using BLEU score. Which THREE statements about BLEU are correct? (Choose three.)

Select 3 answers

A.BLEU includes a brevity penalty to penalize outputs that are too short

B.BLEU computes n-gram precision up to a maximum n (usually 4)

C.BLEU correlates well with human judgment at the corpus level

D.BLEU measures recall of n-grams by comparing the output to the reference

E.BLEU is a recall-oriented metric

AnswersA, B, C

The brevity penalty prevents short outputs from achieving artificially high scores.

Why this answer

BLEU is a precision-based metric (not recall). It uses modified n-gram precision with a brevity penalty. It correlates reasonably well with human judgment at the corpus level but has known limitations such as not capturing semantic equivalence.

Full explanation →

959

MCQhard

A developer uses RecursiveCharacterTextSplitter with chunk_size=500 and chunk_overlap=100. After splitting, a particular chunk ends with an incomplete sentence. What is the likely cause?

A.The splitter fell back to splitting on characters because no separator was found within the chunk_size

B.The chunk_overlap is too low; increase overlap to preserve sentence boundary

C.The chunk_size is too small; increase it to avoid incomplete sentences

D.TokenTextSplitter should be used instead because it respects token boundaries

AnswerA

The recursive splitter tries separators in order; if none are found, it splits by characters, which can cut sentences.

Why this answer

RecursiveCharacterTextSplitter splits on separators (like paragraphs, sentences, etc.) recursively. If no suitable separator is found within the chunk size, it will fall back to splitting at the character limit, which can cut sentences. The overlap only provides context to the next chunk, not continuity within the chunk.

TokenTextSplitter splits on tokens, not characters, and would not cause this issue.

Full explanation →

960

MCQeasy

A healthcare company is using OCI GenAI to generate patient summaries from clinical notes. The model output sometimes includes hallucinated medical facts, such as incorrect dosages or diagnoses, which could be dangerous. The team needs to improve factual accuracy while maintaining data privacy. They have a large collection of internal medical knowledge bases (clinical guidelines, drug databases) that are stored in OCI Object Storage. The current implementation uses a zero-shot prompt with the base Cohere Command model. The data science team has limited GPU resources and wants to avoid building a complex pipeline. Which course of action best addresses the hallucination problem?

A.Increase the temperature parameter to 0.9 to encourage more deterministic outputs.

B.Use prompt engineering to add 'Only provide facts that are absolutely certain.'

C.Implement a RAG pipeline that retrieves relevant documents from the internal knowledge bases and includes them in the prompt.

D.Fine-tune the Cohere model on a publicly available medical dataset like PubMed.

AnswerC

RAG grounds generation in retrieved facts, significantly reducing hallucinations.

Why this answer

Option C is correct because a Retrieval-Augmented Generation (RAG) pipeline directly addresses hallucination by grounding the model's output in verified, internal medical knowledge bases stored in OCI Object Storage. This approach retrieves relevant clinical guidelines or drug database entries and includes them in the prompt, providing factual context without requiring fine-tuning or complex GPU-intensive pipelines. It also preserves data privacy by keeping sensitive medical data within OCI and avoids exposing it to external model training.

Exam trap

Oracle often tests the misconception that prompt engineering alone can reliably eliminate hallucinations, but the trap here is that without external knowledge injection (RAG), the model cannot overcome its inherent tendency to fabricate facts, especially in high-stakes domains like healthcare.

How to eliminate wrong answers

Option A is wrong because increasing the temperature to 0.9 actually increases randomness and creativity, making outputs less deterministic and more prone to hallucinations, not less. Option B is wrong because prompt engineering with a vague instruction like 'Only provide facts that are absolutely certain' does not supply the model with actual factual data; the model still relies on its internal parametric knowledge, which is the source of hallucinations. Option D is wrong because fine-tuning on a publicly available dataset like PubMed introduces public, non-confidential data that may not align with the company's internal medical knowledge, and it requires significant GPU resources and complex pipeline management, which the team explicitly wants to avoid.

Full explanation →

961

MCQeasy

A developer is testing a RAG application using OCI Generative AI. They receive an error: 'The model cohere.command-r-plus-v1:0 is not supported in this region.' What is the most likely cause?

A.The endpoint URL is incorrectly formatted.

B.The model is not available in the selected OCI region.

C.The tenancy is in a different availability domain.

D.The model name has a typo.

AnswerB

Cohere models are deployed in specific regions; the developer may be in a region where the model isn't provisioned.

Why this answer

The error message explicitly states that the model 'cohere.command-r-plus-v1:0' is not supported in the region. OCI Generative AI models are region-specific; each model is deployed only in certain OCI regions (e.g., us-ashburn-1, eu-frankfurt-1). If the selected region does not host that model, the API returns this error regardless of endpoint formatting, tenancy configuration, or model name spelling.

Exam trap

Oracle often tests the misconception that model availability is global across all OCI regions, leading candidates to overlook region-specific model deployment restrictions.

How to eliminate wrong answers

Option A is wrong because an incorrectly formatted endpoint URL would typically produce a 404 Not Found or a connection error, not a model-not-supported error. Option C is wrong because availability domains are a concept for compute instances, not for Generative AI model availability; the error is about regional model support, not AD-level placement. Option D is wrong because a typo in the model name would result in a 'model not found' error (e.g., 400 Bad Request), not a region-specific unsupported error.

Full explanation →

962

MCQmedium

A developer runs an OCI GenAI chat request with system prompt "You are a sarcastic assistant." The output is offensive. How can the developer enforce safety policies?

A.Use the OCI GenAI content moderation filter.

B.Change model to LLAMA.

C.Increase maxTokens.

D.Set temperature to 0.

AnswerA

Content moderation filters explicitly block harmful or offensive content in outputs.

Why this answer

Option A is correct because the OCI GenAI content moderation filter is specifically designed to enforce safety policies by detecting and blocking offensive, harmful, or policy-violating content in both input prompts and model outputs. By enabling this filter, the developer can prevent the model from generating offensive responses even when a system prompt like 'You are a sarcastic assistant' encourages undesirable behavior.

Exam trap

Oracle often tests the misconception that adjusting model parameters (like temperature or maxTokens) or switching model families can substitute for explicit content moderation, when in fact safety enforcement requires dedicated filtering mechanisms that operate independently of model behavior.

How to eliminate wrong answers

Option B is wrong because changing the model to LLAMA does not inherently enforce safety policies; LLAMA models have their own safety risks and require separate content moderation or fine-tuning to block offensive outputs. Option C is wrong because increasing maxTokens only extends the maximum length of the generated response, which does nothing to prevent offensive content—it may even allow the model to produce more harmful text. Option D is wrong because setting temperature to 0 makes the model deterministic (greedy decoding) but does not filter or moderate content; it can still generate offensive responses if the training data or system prompt encourages such behavior.

Full explanation →

963

MCQmedium

A data scientist is fine-tuning a Llama 2 7B model on a custom dataset using OCI Data Science. After training, the model generates fluent but factually incorrect statements about the new domain. Which post-training technique would BEST address this issue without retraining?

A.Decrease the temperature to 0.1

B.Switch to a larger model like Llama 2 70B

C.Apply top-p sampling with p=0.9

D.Use a retrieval-augmented generation (RAG) pipeline

AnswerD

RAG retrieves relevant documents and feeds them as context, reducing hallucinations by grounding responses in verified sources.

Why this answer

RAG retrieves factual information from an external knowledge base to ground the generation, reducing hallucinations. The other options do not address factual accuracy.

Full explanation →

964

MCQmedium

A developer notices that the ConversationalRetrievalChain in their LangChain application is not retaining context from previous turns in the conversation. Which component is most likely missing or misconfigured?

A.A document splitter to chunk the history

B.A retriever with appropriate search parameters

C.An embedding model to vectorize the history

D.A memory component like ConversationBufferMemory

AnswerD

Memory stores the conversation history and injects it into the prompt, enabling context retention.

Why this answer

ConversationalRetrievalChain requires a Memory component to store and retrieve chat history. Without Memory, the chain treats each query independently. The retriever, document splitter, and embeddings are responsible for retrieval and storage, not conversation history.

Full explanation →

965

MCQhard

A data scientist in group DataScientists uses the OCI Generative AI SDK to start a fine-tuning job in compartment AIResources. They receive the error shown. What is the most likely cause?

A.The compartment AIResources does not exist.

B.The fine-tuning API is not yet available in that region.

C.The fine-tuning job requires additional IAM policies for accessing the training data in Object Storage.

D.The data scientist is not in the DataScientists group.

AnswerC

The policy must also grant permissions on Object Storage buckets containing the training data.

Why this answer

Option C is correct because the error message indicates a permissions issue related to accessing training data in Object Storage. When using the OCI Generative AI SDK to start a fine-tuning job, the data scientist's IAM policies must explicitly grant read access to the bucket and objects containing the training data. Without these policies, the API call fails even if the user is in the correct group and the compartment exists.

Exam trap

The trap here is that candidates assume the error is about group membership or compartment existence, when in fact the fine-tuning job's dependency on Object Storage permissions is a classic oversight in OCI IAM policy configuration.

How to eliminate wrong answers

Option A is wrong because if the compartment AIResources did not exist, the error would be a '404 Not Found' or 'CompartmentNotFound' error, not a permissions-related error. Option B is wrong because the fine-tuning API is available in all OCI regions where Generative AI is supported; region unavailability would produce a 'ServiceNotSupported' or 'RegionNotSupported' error. Option D is wrong because the user is explicitly stated to be in the DataScientists group, and group membership alone does not grant access to Object Storage; IAM policies must be attached to the group or compartment to allow read access to training data.

Full explanation →

966

MCQeasy

In few-shot prompting, what is the primary purpose of including examples in the prompt?

A.To reduce the need for a system prompt

B.To provide a template for the desired output format and reasoning pattern

C.To increase the model's vocabulary

D.To decrease the computational cost of inference

AnswerB

Examples demonstrate the expected mapping from input to output, reducing ambiguity.

Why this answer

Examples guide the model on the desired input-output pattern, improving task performance without fine-tuning.

Full explanation →

967

Multi-Selecteasy

Which TWO of the following are common prompt failures?

Select 2 answers

A.Providing too many examples

B.Ambiguous instructions

C.Prompt injection vulnerabilities

D.Using system prompt to set persona

E.Specifying output format as JSON

AnswersB, C

Ambiguity leads to inconsistent or wrong outputs.

Why this answer

Option B is correct because ambiguous instructions are a common prompt failure in LLM interactions. When a prompt lacks clarity or specificity, the model cannot reliably determine the user's intent, leading to off-target or inconsistent outputs. This is a fundamental failure mode in prompt engineering, as the model relies entirely on the given text to infer the desired response.

Exam trap

Cisco often tests the distinction between prompt engineering best practices (like using system prompts or JSON output) and actual failure modes, tricking candidates into selecting effective techniques as if they were failures.

Full explanation →

968

MCQmedium

A prompt engineer is testing two versions of a prompt for a content generation task. They want to measure which version produces more factual and concise outputs. Which evaluation approach is BEST?

A.Use only the first output from each prompt and manually compare

B.Run A/B tests on a diverse set of inputs and score outputs based on predefined criteria

C.Ask the model to self-evaluate its outputs

D.Increase the temperature to see which prompt handles randomness better

AnswerB

A/B testing with multiple inputs and scoring criteria provides objective comparison.

Why this answer

A/B testing with clear metrics (factuality, conciseness) is the standard method for comparing prompt variants. Manual inspection on a few cases is not statistically robust; other options are not comparative.

Full explanation →

969

MCQmedium

An administrator created the above IAM policies. A member of the GenerativeAIAdmins group reports they cannot invoke the model endpoint. Which permission is missing?

A.Permission to access the compartment

B.Permission to manage generative-ai-model

C.Permission to use or manage generative-ai-endpoint

D.Permission to read the model's training data

AnswerC

Only inspect is granted; need use or manage to invoke.

Why this answer

The error occurs because the IAM policy grants permissions for 'generative-ai-model' but not for 'generative-ai-endpoint'. Invoking a model endpoint requires the 'use' or 'manage' permission on the 'generative-ai-endpoint' resource type, as the endpoint is the runtime interface that handles inference requests. Without this permission, the API call to the endpoint is denied, even if the user has access to the underlying model.

Exam trap

The trap here is that candidates confuse the 'generative-ai-model' resource type (used for model lifecycle management) with the 'generative-ai-endpoint' resource type (required for runtime inference), leading them to select Option B instead of C.

How to eliminate wrong answers

Option A is wrong because compartment access is typically granted via a separate policy statement (e.g., 'Allow group to read compartments') and is not the specific missing permission for invoking an endpoint; the error is about resource-type permissions, not compartment-level access. Option B is wrong because 'manage generative-ai-model' allows management of the model resource (e.g., creating, updating, deleting models) but does not grant the runtime permission needed to invoke the endpoint for inference. Option D is wrong because reading the model's training data is a data-plane permission unrelated to endpoint invocation; model training data access is governed by object storage or data catalog policies, not by generative-ai-endpoint permissions.

Full explanation →

970

MCQeasy

Which component of the Transformer architecture allows the model to weigh the importance of different tokens in the input sequence when generating each output token?

A.Feed-forward neural network

B.Multi-head attention

C.Self-attention mechanism

D.Positional encoding

AnswerC

The self-attention mechanism computes attention scores between each token and every other token, allowing the model to focus on relevant parts of the input.

Why this answer

The self-attention mechanism computes attention scores between all pairs of tokens, enabling the model to dynamically focus on relevant parts of the input. Positional encoding adds order information, multi-head attention runs multiple attention heads in parallel, and the feed-forward network processes each position independently.

Full explanation →

971

MCQhard

An engineer configured the above index mapping for vector search. When performing a k-NN search, the results are unexpected. What is the most likely issue?

A.The space type 'cosinesimil' is not supported; it should be 'cosine'.

B.The dimension 768 does not match the embedding model's output dimension.

C.The mapping uses 'knn_vector' type with 'faiss' engine, which is incompatible.

D.The space type at the index level and mapping level are mismatched.

AnswerD

Mismatch causes incorrect distance calculations.

Why this answer

Option D is correct because OpenSearch requires the space type to be consistently defined at both the index-level settings (method.parameters.space_type) and the field-level mapping (space_type). A mismatch between these two causes the k-NN search to behave unexpectedly, as the engine uses the index-level setting for distance computation while the mapping-level setting may be used for validation or other purposes.

Exam trap

Oracle often tests the nuance that OpenSearch requires consistency between index-level and mapping-level space_type settings, a detail that candidates overlook because they assume only the mapping-level setting matters.

How to eliminate wrong answers

Option A is wrong because 'cosinesimil' is a valid space type in OpenSearch (an abbreviation for cosine similarity), not an unsupported value. Option B is wrong because while a dimension mismatch can cause issues, the question states the mapping is configured for vector search and the results are unexpected; the dimension 768 is a common embedding size and is not inherently incorrect without evidence of mismatch. Option C is wrong because 'knn_vector' type with 'faiss' engine is fully compatible and supported in OpenSearch for vector search workloads.

Full explanation →

972

MCQhard

A team has fine-tuned a Cohere Command R model using T-Few on a dataset of 5,000 prompt/completion pairs. After deployment, they notice the model sometimes generates off-topic responses. Which action is most likely to improve response relevance without requiring new training data?

A.Decrease the max_tokens

B.Increase the temperature to 1.5

C.Increase the frequency_penalty

D.Set a preamble override with instructions to stay on topic

AnswerD

Preamble override provides a system-level instruction that can steer the model's behavior toward relevance.

Why this answer

Preamble override allows setting a system message that guides the model's behavior, helping to keep responses on topic. Adjusting it is a low-cost intervention.

Full explanation →

973

MCQmedium

Which sampling strategy selects the token with the highest probability at each step, resulting in deterministic and often repetitive outputs?

A.Beam search

B.Temperature sampling

C.Top-k sampling

D.Greedy decoding

AnswerD

Greedy decoding selects the token with the highest probability at each step, producing deterministic outputs.

Why this answer

Greedy decoding selects the token with the highest probability at each step, making it deterministic and often leading to repetitive outputs because it always chooses the most likely next token without considering future alternatives. This contrasts with stochastic methods that introduce randomness or explore multiple paths.

Exam trap

Cisco often tests the distinction between deterministic (greedy) and stochastic (sampling-based) strategies, and the trap here is confusing 'greedy decoding' with 'beam search' because both involve selecting high-probability tokens, but beam search maintains multiple paths while greedy does not.

How to eliminate wrong answers

Option A is wrong because beam search maintains multiple candidate sequences (beams) at each step, not just the single highest-probability token, and can produce more diverse outputs. Option B is wrong because temperature sampling scales the logits before applying softmax to control randomness, not deterministically picking the top token. Option C is wrong because top-k sampling restricts the next token selection to the k most probable tokens but still samples randomly from that set, not deterministically choosing the single highest-probability token.

Full explanation →

974

MCQmedium

A developer wants to use LangChain to create an agent that can perform calculations and look up information from a database. Which tools should be provided to the agent?

A.Custom tool for database queries and a vector store tool

B.Calculator tool and a custom tool for database queries

C.Calculator tool and a retriever tool

D.Web search tool and an LLM tool

AnswerB

The agent needs a calculator for math and a custom tool to run SQL queries.

Why this answer

A calculator tool handles arithmetic, and a custom tool can be created to query the database. Web search is not needed, and an LLM is not a tool but the underlying model.

Full explanation →

975

MCQeasy

A developer wants to generate text using the OCI Generative AI service via the API. Which endpoint should they use to send a text generation request?

A./v1/chat/completions

B./v1/embeddings

C./v1/completions

D./v1/models

AnswerC

This is the correct endpoint for text generation requests.

Why this answer

Option C is correct because the OCI Generative AI service uses the /v1/completions endpoint for text generation requests, as documented in the OCI Generative AI API reference. This endpoint accepts a prompt and generates a continuation of the text, making it the appropriate choice for general text generation tasks.

Exam trap

Oracle often tests the distinction between OCI-specific endpoints and those from other AI services like OpenAI, so candidates may mistakenly choose /v1/chat/completions if they confuse OCI Generative AI with ChatGPT's API.

How to eliminate wrong answers

Option A is wrong because /v1/chat/completions is an endpoint used by OpenAI's ChatGPT API, not by OCI Generative AI, which does not have a dedicated chat completions endpoint. Option B is wrong because /v1/embeddings is used for generating vector embeddings of text, not for generating new text completions. Option D is wrong because /v1/models is used to list available models or retrieve model metadata, not to send a text generation request.

Full explanation →

Page 13 of 14

All pages

1 2 3 4 5 6 7 8 9 10 11 12 13 14

Practice 1Z0-1127 by domain

Target a specific domain to shore up weak areas.

Prompt Engineering OCI Generative AI Service LLM Fundamentals LangChain and AI Application Development Fundamentals of Large Language Models Using OCI Generative AI Service Building LLM Applications with RAG and Vector Search Deploying and Managing Generative AI on OCI

See all domains with question counts →