OCI GenAI Service Questions

76
MCQmedium

A data scientist needs to fine-tune a large language model on a custom dataset of 10,000 prompt-completion pairs. They want to minimize cost while still updating the model effectively. Which fine-tuning technique is used by OCI Generative AI service?

A.Prefix tuning
B.T-Few fine-tuning
C.Adapter fine-tuning
D.LoRA fine-tuning
AnswerB

T-Few is the parameter-efficient fine-tuning method provided by OCI GenAI service.

Why this answer

OCI Generative AI uses T-Few, which updates only a small number of parameters via learned transformations, reducing computational cost while maintaining performance. Adapter, LoRA, and prefix tuning are general PEFT methods but not the specific technique offered by OCI.

77
MCQmedium

A developer is using the OCI Generative AI Chat API to build a multi-turn conversational agent. They want the model to remember previous exchanges within the same session. How should they manage conversation history?

A.Store history client-side and concatenate all previous outputs into the user prompt each time
B.Use the Generate API with a system prompt that includes the entire history
C.Use the Chat API's message list parameter, appending each user and assistant message
D.Rely on the model's built-in memory mechanism
AnswerC

The Chat API supports multi-turn by passing an array of messages representing the conversation history.

Why this answer

The Chat API accepts a list of messages (including user and assistant turns) to maintain context. The developer should append each exchange to the message list and send it with each request.
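
The append-and-resend pattern described above can be sketched in plain Python. This is a minimal illustration, not the OCI SDK: the `send` callable, the `fake_send` stub, and the role/content dictionary shape are all hypothetical stand-ins for a real Chat API call.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(history: List[Message], user_text: str,
              send: Callable[[List[Message]], str]) -> str:
    """Append the user turn, send the full history, then record the reply."""
    history.append({"role": "USER", "content": user_text})
    reply = send(history)  # the whole message list goes with every request
    history.append({"role": "ASSISTANT", "content": reply})
    return reply


def fake_send(messages: List[Message]) -> str:
    # Hypothetical stand-in for the model: reports how many messages it saw.
    return f"received {len(messages)} message(s)"


history: List[Message] = []
first = chat_turn(history, "What is a dedicated AI cluster?", fake_send)
second = chat_turn(history, "How is it billed?", fake_send)
```

On the second turn the stub sees three messages (two from the first exchange plus the new user message), which is the context a real model would need to resolve "it" in the follow-up question.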

78
MCQmedium

A team wants to use the Embedding API to convert product descriptions into vectors for a semantic search application. They have descriptions in English and Spanish. Which embedding model should they use?

A.embed-english-v3.0
B.embed-multilingual-v3.0
C.Cohere Command R
D.Cohere Rerank
AnswerB

Supports multiple languages including English and Spanish.

Why this answer

embed-multilingual-v3.0 supports multiple languages including English and Spanish.

79
MCQhard

During a fine-tuning job for a text generation model, the loss curve shows that the training loss decreases steadily, but the evaluation loss increases after a few epochs. Which action is most likely to improve the model's generalization?

A.Implement early stopping based on evaluation loss
B.Increase the number of training epochs
C.Reduce the size of the training dataset
D.Increase the temperature parameter during generation
AnswerA

Early stopping halts training when evaluation loss starts increasing, preventing overfitting.

Why this answer

Increasing evaluation loss while training loss decreases indicates overfitting. Early stopping prevents further training once evaluation metrics plateau or degrade, improving generalization. Reducing dataset size or increasing epochs would worsen overfitting; increasing temperature does not affect training.
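
Early stopping can be sketched as a patience counter over the evaluation loss. The loss values and the `patience` setting below are made up for illustration; OCI's fine-tuning job exposes its own hyperparameters, which this sketch does not model.

```python
from typing import List, Optional


def early_stop_epoch(eval_losses: List[float], patience: int = 2) -> Optional[int]:
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(eval_losses):
        if loss < best:
            best = loss
            bad_epochs = 0  # improvement resets the counter
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch  # no improvement for `patience` epochs in a row
    return None


# Illustrative curve: evaluation loss bottoms out at epoch 2, then rises.
eval_losses = [0.90, 0.70, 0.62, 0.65, 0.71, 0.80]
stop_at = early_stop_epoch(eval_losses, patience=2)
best_epoch = eval_losses.index(min(eval_losses))
```

With this curve the loop stops at epoch 4, and the checkpoint worth keeping is the one from epoch 2, where the evaluation loss was lowest.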

80
MCQhard

An organization is deploying a custom fine-tuned model for a real-time fraud detection application. The model must respond within 200ms and cannot share infrastructure with other customers. Which OCI GenAI infrastructure option should they choose?

A.On-demand token-based inference
B.A dedicated AI cluster with provisioned model units
C.Shared infrastructure with reserved capacity
D.OCI Functions using the GenAI SDK
AnswerB

A dedicated cluster provides isolated compute for low-latency, dedicated inference of custom fine-tuned models.

Why this answer

Option B is correct because a dedicated AI cluster with provisioned model units ensures exclusive infrastructure, meeting the requirement to not share resources with other customers. This option also provides predictable, low-latency inference (under 200ms) by allocating dedicated compute capacity for the fine-tuned model, which is critical for real-time fraud detection.

Exam trap

The exam often tests the distinction between 'reserved capacity' (which still shares infrastructure) and 'dedicated infrastructure' (which provides full isolation), leading candidates to mistakenly choose shared infrastructure with reserved capacity.

How to eliminate wrong answers

Option A is wrong because on-demand token-based inference runs on shared infrastructure, which violates the requirement to not share infrastructure with other customers and may introduce latency variability. Option C is wrong because shared infrastructure with reserved capacity still shares underlying hardware with other customers, failing the isolation requirement. Option D is wrong because OCI Functions using the GenAI SDK is a serverless compute service that does not provide dedicated GPU resources for model inference, and it introduces additional latency from function cold starts, making it unsuitable for sub-200ms real-time responses.

81
Multi-Selectmedium

A data scientist needs to generate embeddings for a collection of documents to be used for both clustering and semantic search. They want to use appropriate input types for each task. Which TWO input types should they use from the Cohere Embed API? (Choose two.)

Select 2 answers
A.search_query
B.embedding
C.search_document
D.classification
E.clustering
AnswersC, E

Used for documents in a search corpus.

Why this answer

For clustering, the 'clustering' input type is appropriate. For semantic search, 'search_document' (for documents to be searched) and 'search_query' (for queries) are used. The question asks for two options that cover both tasks; 'clustering' and 'search_document' are correct. 'search_query' is for queries, not documents, and 'classification' is for classification tasks.
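
The mapping from task to input type can be written down as a small lookup. The four input type strings come from the options above; the task names used as keys are hypothetical labels chosen for this sketch.

```python
# Task names (keys) are hypothetical; input types (values) are from the question.
INPUT_TYPE_BY_TASK = {
    "index_documents_for_search": "search_document",
    "embed_search_query": "search_query",
    "clustering": "clustering",
    "classification": "classification",
}


def input_type_for(task: str) -> str:
    """Look up the embedding input type for a task, rejecting unknown tasks."""
    try:
        return INPUT_TYPE_BY_TASK[task]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None


corpus_type = input_type_for("index_documents_for_search")
cluster_type = input_type_for("clustering")
```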

82
Multi-Selecthard

A company wants to deploy a fine-tuned model for real-time inference with consistent low latency. They are evaluating dedicated AI clusters. Which THREE factors should they consider when provisioning the cluster?

Select 3 answers
A.Fine-tuning job timeout settings
B.Number of clusters (for high availability)
C.Region where the cluster is provisioned
D.Number of model units per cluster
E.The base model used for fine-tuning
AnswersB, C, D

Multiple clusters provide redundancy and fault tolerance.

Why this answer

Model units determine compute capacity and cost, the number of clusters affects availability and fault tolerance, and the region impacts data residency and latency. The base model is already chosen, and fine-tuning timeout is irrelevant for inference.

83
Multi-Selecthard

A financial services company must deploy a fine-tuned model for transaction categorization. The model must be isolated from other tenants and provide predictable low-latency inference. The compliance team also requires that training data never leaves the OCI tenancy. Which THREE steps should the team take? (Choose three.)

Select 3 answers
A.Fine-tune the model using T-Few technique within OCI
B.Use OCI GenAI on-demand inference for the fine-tuned model
C.Ensure the model is deployed on the dedicated cluster after fine-tuning
D.Provision a dedicated AI cluster with model units
E.Host the model on a shared AI cluster to reduce cost
AnswersA, C, D

T-Few fine-tuning runs inside OCI, ensuring data does not leave the tenancy.

Why this answer

A dedicated AI cluster provides isolation and predictable low latency. T-Few fine-tuning runs entirely within OCI, keeping data in tenancy. Model units are required for dedicated cluster provisioning.

Shared infrastructure would compromise isolation. On-demand inference does not guarantee low latency.

84
MCQeasy

Which OCI Generative AI model would you use to reorder search results to improve relevance ranking?

A.Cohere Command R+
B.Cohere Rerank
C.Cohere embed-english-v3.0
D.Meta Llama 3 70B
AnswerB

Rerank is designed to reorder documents by relevance to a query.

Why this answer

Cohere Rerank is specifically designed to reorder documents based on relevance to a query. The other models are for generation or embedding.

85
MCQhard

An organization needs to deploy a custom fine-tuned model for real-time inference with consistent low latency, and they must keep the model isolated from other tenants. Which deployment option should they choose?

A.Use the shared infrastructure endpoint with on-demand serving
B.Provision a dedicated AI cluster with model units
C.Deploy the model on OCI Data Science using a custom container
D.Use the OCI Generative AI Agents service
AnswerB

Dedicated cluster ensures isolation and low-latency dedicated inference.

Why this answer

A dedicated AI cluster provides exclusive, low-latency inference for custom models.

86
Multi-Selecthard

A machine learning engineer is fine-tuning a Cohere Command R model using T-Few. They need to prepare the training dataset in the correct format. Which TWO statements about the dataset format are true? (Choose two.)

Select 2 answers
A.The 'completion' field should contain the expected model response
B.The dataset must include a 'context' field for RAG fine-tuning
C.The dataset can be in CSV format with 'input' and 'output' columns
D.Each line must include a 'system' key for the system prompt
E.The dataset should be in JSONL format with each line containing a JSON object with 'prompt' and 'completion' keys
AnswersA, E

The completion field holds the target output for the given prompt.

Why this answer

Option A is correct because the T-Few fine-tuning method for Cohere Command R models requires the dataset to include a 'completion' field that contains the expected model response. This field is used as the target output during supervised fine-tuning, where the model learns to generate the desired completion given the input prompt. Option E is correct because the training file must be in JSONL format, with each line holding one JSON object that has 'prompt' and 'completion' keys.

Exam trap

The exam often tests the misconception that CSV format is acceptable for fine-tuning datasets, but the OCI Generative AI service strictly requires JSONL format to handle structured fields like 'prompt' and 'completion'.
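
If the source data starts out as a CSV with 'input' and 'output' columns, it can be converted before upload. The sketch below uses only the Python standard library; the column names and the sample row are assumptions for illustration.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text: str, prompt_col: str = "input",
                 completion_col: str = "output") -> str:
    """Convert CSV rows into JSONL lines with 'prompt' and 'completion' keys."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = [
        json.dumps({"prompt": row[prompt_col], "completion": row[completion_col]})
        for row in reader
    ]
    return "\n".join(lines) + "\n"


# Hypothetical one-row CSV used only to show the shape of the output.
sample_csv = "input,output\nWhat is OCI?,Oracle Cloud Infrastructure\n"
jsonl_text = csv_to_jsonl(sample_csv)
```

Each output line is a standalone JSON object, which is what distinguishes JSONL from a single JSON array.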

87
MCQhard

An organization wants to use OCI Generative AI for a high-volume summarization workload. They estimate 10 million tokens per month and need consistent low latency. Which pricing model is most cost-effective?

A.Use the free tier
B.Use OCI Data Science notebook sessions
C.On-demand token-based pricing
D.Provision a Dedicated AI Cluster with model units
AnswerD

Dedicated clusters provide consistent low latency and predictable cost, better for high-volume workloads.

Why this answer

On-demand pricing can be expensive at high volumes. Dedicated clusters offer predictable cost per model unit and low-latency dedicated inference, making them more cost-effective for high-volume, latency-sensitive workloads. Pay-as-you-go (on-demand) is suitable for low or variable usage.

88
MCQeasy

Which OCI Generative AI API should be used to convert a user's query into a vector representation for semantic search?

A.Embedding API
B.Chat API
C.Generate API
D.Rerank API
AnswerA

The Embedding API provides models like embed-english-v3.0 to convert text into vectors.

Why this answer

The OCI Generative AI Embedding API is specifically designed to convert text inputs, such as user queries, into dense vector representations (embeddings). These vectors capture semantic meaning and are essential for similarity search in vector databases or retrieval-augmented generation (RAG) pipelines. The other APIs serve different purposes: Chat handles multi-turn conversations, Generate produces text completions, and Rerank reorders documents based on relevance.
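
Once text is embedded, similarity search reduces to comparing vectors. The sketch below ranks documents by cosine similarity to a query; the three-dimensional vectors are invented toy values, whereas real embedding models return much longer vectors.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Toy vectors standing in for Embedding API output (values are made up).
doc_vectors = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "privacy notice": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]

ranked = sorted(
    doc_vectors,
    key=lambda name: cosine(query_vector, doc_vectors[name]),
    reverse=True,
)
```

The query vector points mostly along the first axis, so "refund policy" ranks first.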

Exam trap

The exam often tests the distinction between APIs that return text (Chat, Generate) versus those that return vector data (Embedding), and candidates may confuse the Rerank API's relevance scoring with embedding generation.

How to eliminate wrong answers

Option B (Chat API) is wrong because it is designed for conversational interactions, returning text responses rather than vector embeddings. Option C (Generate API) is wrong because it generates natural language completions from a prompt, not vector representations. Option D (Rerank API) is wrong because it reorders a list of documents by relevance scores, but does not produce embeddings from a query.

89
MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Fine-tune a base LLM on the policy documents monthly
B.Train a custom model from scratch on the policy documents each month
C.Use a larger foundation model with a longer context window and paste all documents into each prompt
D.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store
AnswerD

RAG retrieves relevant chunks at query time, avoiding retraining.

Why this answer

RAG allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining.

90
MCQmedium

A company wants to use OCI Generative AI Agents for a question-answering system over their internal knowledge base stored in OCI Object Storage. The data consists of PDF and Word documents. What is the first step to make this data usable by the agent?

A.Use the Embedding API to generate embeddings for all documents
B.Create a dedicated AI cluster for inference
C.Create an OCI Generative AI Agent
D.Create a knowledge base and associate it with the Object Storage bucket
AnswerD

The knowledge base indexes the documents from Object Storage, enabling the agent to retrieve relevant content for answering questions.

Why this answer

A knowledge base must be created and linked to the data source (Object Storage bucket) so the agent can index and retrieve the content. Creating an agent first without a knowledge base would not work. The Embedding API is lower-level; the agent service abstracts this.

91
MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store
B.Fine-tune a base LLM on the policy documents monthly
C.Train a custom model from scratch on the policy documents each month
D.Use a larger foundation model with a longer context window and paste all documents into each prompt
AnswerA

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

Retrieval-Augmented Generation (RAG) is the most appropriate approach because it allows the chatbot to answer questions by retrieving relevant chunks from the policy documents stored in a vector store at inference time, without requiring any model retraining. When documents are updated monthly, you only need to re-index the new content into the vector store, and the LLM can use the retrieved context to generate accurate answers. This decouples knowledge updates from model training, making it cost-effective and agile for frequently changing internal documents.
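
The decoupling of knowledge updates from model training can be shown with a toy retriever. This is a sketch under simplifying assumptions: word overlap stands in for embedding similarity, a dictionary stands in for the vector store, and the policy text is invented.

```python
from typing import Dict, List


def retrieve(index: Dict[str, str], question: str, k: int = 1) -> List[str]:
    """Rank chunks by words shared with the question (toy stand-in for
    embedding similarity in a vector store)."""
    q_words = set(question.lower().split())
    ranked = sorted(
        index.values(),
        key=lambda text: len(q_words & set(text.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(index: Dict[str, str], question: str) -> str:
    """Assemble the prompt an LLM would receive: retrieved context + question."""
    context = "\n".join(retrieve(index, question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


question = "How many days of annual leave per year"
index = {"leave-policy": "Annual leave is 20 days per year"}
before = build_prompt(index, question)

# Monthly update: replace the chunk in the index; no model is retrained.
index["leave-policy"] = "Annual leave is 25 days per year"
after = build_prompt(index, question)
```

The same unchanged model would answer from the new text on the next request, because only the indexed content changed.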

Exam trap

The exam often tests the misconception that fine-tuning or training from scratch is necessary for domain-specific knowledge, when in fact RAG provides a more efficient and maintainable solution for dynamic document sets.

How to eliminate wrong answers

Option B is wrong because fine-tuning a base LLM monthly on the policy documents would require significant compute resources, time, and expertise for each update, and it risks catastrophic forgetting of previous policy versions. Option C is wrong because training a custom model from scratch each month is prohibitively expensive and impractical for a task that only requires answering questions from a relatively small set of documents. Option D is wrong because pasting all documents into each prompt would exceed typical context window limits (even with large models), incur high token costs, and degrade performance due to the model struggling to attend to the most relevant information among thousands of tokens.

92
MCQeasy

Which OCI Generative AI API is used to send a message and receive a model-generated response while maintaining a conversation history?

A.InferenceClient
B.Chat API
C.Embeddings API
D.Generate API
AnswerB

Chat API handles multi-turn conversations, system prompts, and history.

Why this answer

The Chat API is specifically designed for multi-turn conversations, supporting system prompts and history. The Generate API is for single-turn text generation. InferenceClient is a client class, not an API endpoint.

The Embeddings API is for vector generation.

93
Multi-Selectmedium

A developer is building a multilingual search application and needs to generate embeddings for user queries in multiple languages. Which two options are correct? (Select TWO)

Select 2 answers
A.Use the embed-english-v3.0 model
B.Set the input type to 'search_document'
C.Use the embed-multilingual-v3.0 model
D.Set the input type to 'search_query'
E.Set the input type to 'classification'
AnswersC, D

This model supports multiple languages for embedding.

Why this answer

OCI Generative AI provides embed-multilingual-v3.0 for multilingual support. The input type 'search_query' is designed for query embeddings. Embed-english-v3.0 is English-only. 'search_document' is for documents, not queries. 'classification' is for classification tasks.

94
MCQeasy

Which OCI Generative AI model is specifically designed to generate embeddings for English text?

A.Meta Llama 3 8B
B.Cohere embed-multilingual-v3.0
C.Cohere embed-english-v3.0
D.Cohere Command R
AnswerC

This is the dedicated English embedding model.

Why this answer

Cohere embed-english-v3.0 is the embedding model for English text. The other options are either generative models or multilingual embedding models.

95
MCQhard

A team is fine-tuning a Llama 3 model using OCI Generative AI. The training dataset contains 10,000 prompt-completion pairs in JSONL format. After submitting the fine-tuning job, it fails with a 'Data validation error'. What is the most likely cause?

A.The base model selected does not support fine-tuning
B.The dataset is in the wrong compartment
C.The fine-tuning job does not have enough model units allocated
D.The JSONL file uses 'input' and 'output' as key names instead of 'prompt' and 'completion'
AnswerD

OCI Generative AI expects exact key names 'prompt' and 'completion' in the JSONL training file; using alternate keys causes a data validation error.

Why this answer

OCI fine-tuning requires a specific JSONL format with 'prompt' and 'completion' keys. If the keys are named differently (e.g., 'input'/'output'), validation fails. Model units, compartment, and base model are not typically validation issues.
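
A local pre-flight check can catch this kind of key mismatch before a job is submitted. The validator below is a sketch that checks only for the 'prompt' and 'completion' keys named above; the service's actual validation rules may cover more, and the error message wording here is invented.

```python
import json
from typing import List

REQUIRED_KEYS = {"prompt", "completion"}


def validate_jsonl(text: str) -> List[str]:
    """Return a list of problems found; an empty list means the file passed."""
    errors = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            errors.append(f"line {number}: not valid JSON")
            continue
        if not isinstance(record, dict):
            errors.append(f"line {number}: not a JSON object")
            continue
        missing = REQUIRED_KEYS - set(record)
        if missing:
            errors.append(f"line {number}: missing keys {sorted(missing)}")
    return errors


good = '{"prompt": "Hi", "completion": "Hello"}\n'
bad = '{"input": "Hi", "output": "Hello"}\n'
good_errors = validate_jsonl(good)
bad_errors = validate_jsonl(bad)
```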

96
MCQeasy

Which of the following is NOT an available model in OCI Generative AI service?

A.Cohere Embed English v3.0
B.OpenAI GPT-4
C.Meta Llama 3
D.Cohere Command R+
AnswerB

GPT-4 is not part of OCI Generative AI service.

Why this answer

OCI GenAI offers Cohere Command R, Command R+, Meta Llama 3, Cohere Embed models, and Cohere Rerank. GPT-4 is an OpenAI model and is not available in OCI GenAI.

97
MCQhard

During fine-tuning, a user notices the loss does not decrease after several epochs. The dataset is a JSONL file with 500 prompt/completion pairs. What is the MOST likely cause?

A.The JSONL format is incorrect because it lacks system prompts
B.The base model is not compatible with the T-Few technique
C.The dataset is too small; T-Few fine-tuning generally needs at least 1000 examples
D.The learning rate is too high, causing the model to diverge
AnswerC

T-Few is efficient but still requires a minimum dataset size to learn effectively.

Why this answer

Fine-tuning with T-Few typically requires at least 1000 examples for meaningful learning. The dataset size is likely insufficient.

98
MCQmedium

A security administrator needs to grant a group of data scientists access to use OCI Generative AI resources (models, endpoints) in compartment 'GenAI-Prod', but not allow them to create or manage infrastructure. Which IAM policy statement should be used?

A.Allow group DataScientists to inspect generative-ai-family in compartment GenAI-Prod
B.Allow group DataScientists to use generative-ai-family in compartment GenAI-Prod
C.Allow group DataScientists to read generative-ai-family in compartment GenAI-Prod
D.Allow group DataScientists to manage generative-ai-family in compartment GenAI-Prod
AnswerB

Use grants permission to invoke models and use endpoints without management rights.

Why this answer

The 'use' verb on generative-ai-family resources allows inference and use of models/endpoints without permitting management (create/update/delete). This matches the requirement.

99
MCQeasy

Which of the following is the correct format for a training dataset used in OCI Generative AI fine-tuning?

A.JSONL file with 'prompt' and 'completion' fields
B.CSV file with columns 'input' and 'output'
C.TXT file with one prompt-completion pair per line separated by a tab
D.Parquet file with 'text' and 'label' columns
AnswerA

This is the required format.

Why this answer

OCI GenAI fine-tuning expects a JSONL file with prompt/completion pairs.

100
MCQmedium

A data engineer is building a RAG application using OCI Generative AI Agents. They have documents stored in OCI Object Storage. Which resource must they create to make these documents searchable by the agent?

A.Create a Dedicated AI Cluster
B.Create a data source pointing to the Object Storage bucket
C.Create an endpoint
D.Create a fine-tuning job
AnswerB

Data sources connect to storage and index the content for retrieval.

Why this answer

In OCI Generative AI Agents, a data source points to Object Storage buckets and indexes the documents. Knowledge bases combine data sources, but the first step is creating the data source.

101
MCQmedium

A data scientist has fine-tuned a Cohere Command R model using the T-Few technique. They now need to deploy this custom model for real-time inference with low latency. What is the recommended deployment option in OCI Generative AI?

A.Use the OCI Generative AI Playground to test the model
B.Provision a dedicated AI cluster and host the fine-tuned model on it
C.Use the shared infrastructure endpoint with an API call
D.Create an InferenceClient pointing to the fine-tuned model directly without a cluster
AnswerB

Dedicated AI clusters allow you to deploy your own fine-tuned models with low latency and dedicated compute resources, suitable for production real-time inference.

Why this answer

Dedicated AI clusters provide isolated, low-latency inference for custom fine-tuned models. Shared infrastructure is multi-tenant and may have variable latency; on-demand inference does not support custom models directly.

102
Multi-Selectmedium

A developer is building a RAG application using OCI Generative AI Agents. They want to ensure the agent only retrieves information from approved documents in a specific compartment. Which THREE steps are required?

Select 3 answers
A.Provision a dedicated AI cluster in the same compartment
B.Create a knowledge base that references the Object Storage bucket
C.Use the Embedding API to manually generate embeddings for each document
D.Create an IAM policy that allows the agent to read objects in the specific compartment
E.Store the approved documents in an Object Storage bucket located in that compartment
AnswersB, D, E

The knowledge base is the bridge between the agent and the data; it must be configured to use that bucket.

Why this answer

Three things are required: storing the approved documents in an Object Storage bucket in that compartment, creating an IAM policy that lets the agent read objects there, and creating a knowledge base that references the bucket. Provisioning a dedicated cluster and calling the Embedding API directly are not necessary steps for the agent.

103
MCQmedium

A developer is using the OCI Generative AI Chat API to build a conversational assistant. They want the assistant to adopt a formal tone regardless of user input. Which parameter should they set in the API request?

A.Set a high temperature (e.g., 0.9)
B.Set the system prompt to 'You are a formal assistant that responds in a professional tone.'
C.Set the frequency penalty to a high value
D.Set max tokens to a low value
AnswerB

The system prompt defines the assistant's behavior and tone across the conversation.

Why this answer

The system prompt (or preamble) sets the assistant's behavior consistently. Temperature affects randomness but not tone directly.

104
Multi-Selectmedium

A company is using OCI Generative AI Agents to implement a RAG system for employee onboarding. They want to ensure the agent only answers from the uploaded documents and avoids making up information. Which THREE configuration steps should they take?

Select 3 answers
A.Configure the agent to use only the knowledge base and disable internet search
B.Set the preamble to instruct the agent to only answer based on provided context
C.Increase the temperature to 2.0 for more creative responses
D.Create a knowledge base that indexes the onboarding documents
E.Fine-tune the underlying model on the onboarding documents
AnswersA, B, D

This ensures the agent retrieves only from provided documents.

Why this answer

To ground the agent in provided documents, they should use a knowledge base, set a preamble to restrict knowledge, and disable internet search. Fine-tuning is not needed.

105
MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Train a custom model from scratch on the policy documents each month
B.Use a larger foundation model with a longer context window and paste all documents into each prompt
C.Fine-tune a base LLM on the policy documents monthly
D.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store
AnswerD

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

RAG (Retrieval-Augmented Generation) allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining. The other options either require expensive retraining for each update or lack document grounding.

106
MCQmedium

A developer is using the OCI Generative AI Chat API to build a multi-turn conversational assistant. They want the assistant to adopt a formal tone throughout the conversation. Which parameter should they set in the API request to achieve this?

A.preamble_override
B.temperature
C.frequency_penalty
D.max_tokens
AnswerA

Preamble override sets the system prompt that defines the assistant's behavior and tone for the entire conversation.

Why this answer

The 'preamble_override' parameter sets the system-level instruction (e.g., 'You are a formal assistant...') that persists across turns. The other options control generation statistics, not system behavior.
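
The difference between the system-level instruction and the generation settings can be seen in a request payload. The sketch below builds a plain dictionary: `preamble_override`, `temperature`, and `max_tokens` are the parameter names from the options above, while `message`, `chat_history`, the default values, and the overall dictionary shape are assumptions for illustration rather than the SDK's request class.

```python
from typing import Dict, List, Optional


def build_chat_request(message: str,
                       preamble: Optional[str] = None,
                       chat_history: Optional[List[Dict[str, str]]] = None,
                       temperature: float = 0.3,
                       max_tokens: int = 400) -> dict:
    """Build a hypothetical chat payload; the preamble is sent on every turn."""
    request = {
        "message": message,
        "chat_history": list(chat_history or []),
        "temperature": temperature,  # sampling randomness, not tone
        "max_tokens": max_tokens,    # output length cap, not tone
    }
    if preamble is not None:
        request["preamble_override"] = preamble  # system-level instruction
    return request


formal = "You are a formal assistant that responds in a professional tone."
turn_one = build_chat_request("hey, what's up with my order?", preamble=formal)
turn_two = build_chat_request("ok thx", preamble=formal)
```

Both turns carry the same preamble, which is why the tone holds however casually the user writes.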

107
MCQmedium

An organization wants to allow its data science group to use OCI Generative AI services but restrict access to a specific compartment. Which IAM policy statement correctly achieves this?

A.allow group data-scientists to use generative-ai-family in compartment GenAI-Prod
B.allow group data-scientists to manage generative-ai-family in tenancy
C.allow group data-scientists to read generative-ai-family in compartment GenAI-Prod
D.allow group data-scientists to use generative-ai-family where target.compartment.id = 'GenAI-Prod'
AnswerA

This grants the necessary permissions within the specified compartment only.

Why this answer

The 'allow group <group_name> to use generative-ai-family in compartment <compartment_name>' policy grants access to all GenAI resources within that compartment. The other options either miss the compartment scope or use incorrect verbs.

108
MCQmedium

A developer is using the OCI GenAI Playground to test a summarisation model. They want the summary to be concise and less creative. Which combination of parameter adjustments would best achieve this?

A.Decrease temperature to 0.1 and increase frequency penalty
B.Set presence penalty to high and temperature to 0.5
C.Increase temperature to 0.9 and decrease frequency penalty to 0
D.Increase max tokens and set stop sequences
AnswerA

Low temperature makes output more focused and deterministic; higher frequency penalty reduces repetitive phrases, yielding concise summaries.

Why this answer

Decreasing temperature to 0.1 makes the model more deterministic and less creative, while increasing the frequency penalty discourages repetition, both of which help produce a concise, less creative summary. This combination directly aligns with the goal of reducing randomness and trimming repeated phrasing.

Exam trap

The exam often tests the misconception that increasing penalties or adjusting max tokens alone can control creativity, when in fact temperature is the primary parameter for randomness, and penalties serve different roles (frequency for repetition, presence for novelty).

How to eliminate wrong answers

Option B is wrong because setting presence penalty to high encourages the model to introduce new topics, which can increase creativity and verbosity, and a temperature of 0.5 still allows moderate randomness, neither of which supports a concise, less creative summary. Option C is wrong because increasing temperature to 0.9 increases randomness and creativity, and decreasing frequency penalty to 0 removes the penalty for repetition, both of which lead to more varied and potentially longer outputs. Option D is wrong because increasing max tokens allows longer outputs, and setting stop sequences only controls where generation ends, neither of which directly reduces creativity or ensures conciseness.
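
The effect of temperature on randomness follows from how it rescales logits before the softmax. The sketch below uses three made-up logits to compare the two temperatures mentioned in the options; it illustrates the general sampling technique and is not a description of any specific model's internals.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # illustrative scores for three candidate tokens
focused = softmax_with_temperature(logits, 0.1)
varied = softmax_with_temperature(logits, 0.9)
```

At temperature 0.1 almost all probability mass lands on the top token, so sampling is close to deterministic. At 0.9 the lower-ranked tokens keep a substantial share, so output varies more between runs.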

109
MCQmedium

A team wants to use OCI Generative AI Agents to build a question-answering system over documents stored in OCI Object Storage. They have created a knowledge base and are ready to test. Which API should they use to interact with the agent for multi-turn conversations?

A.Chat API
B.Embedding API
C.Sessions API
D.Generate API
AnswerC

The Sessions API provides multi-turn conversation management for OCI GenAI Agents.

Why this answer

The Sessions API is used to manage conversational sessions with OCI GenAI Agents, allowing multi-turn interactions. The Chat API is for direct LLM chat without agent capabilities. The Generate API is for single-turn text generation, and the Embedding API is for vector creation.

110
Multi-Selectmedium

A data scientist is using the OCI Generative AI Embeddings API to generate vectors for a classification task. Which TWO input types are appropriate for this use case?

Select 2 answers
A.search_query
B.clustering
C.text
D.classification
E.search_document
AnswersB, D

Optimizes embeddings for clustering tasks, which can be used for classification.

Why this answer

The Embeddings API supports input types like 'classification' and 'clustering' which optimize embeddings for those tasks. 'search_document' and 'search_query' are for search. 'text' is not a valid input type.

111
MCQhard

An administrator wants to grant a group of data scientists permission to use OCI Generative AI resources in a specific compartment, but prevent them from creating Dedicated AI Clusters. Which IAM policy statement achieves this?

A.Allow group data-scientists to read generative-ai-family in compartment genai-dev
B.Allow group data-scientists to manage generative-ai-family in compartment genai-dev
C.Allow group data-scientists to use generative-ai-family in compartment genai-dev where request.operation != 'CreateDedicatedAiCluster'
D.Allow group data-scientists to use generative-ai-models in compartment genai-dev
AnswerC

This grants use of most GenAI resources but excludes creating dedicated clusters via a condition.

Why this answer

Option C is correct because it uses the 'use' verb to grant the data scientists access to OCI Generative AI resources and adds the condition request.operation != 'CreateDedicatedAiCluster', so the grant does not cover creating Dedicated AI Clusters. In OCI IAM, the 'use' verb includes read and update capabilities but generally not create or delete, and the condition further excludes the specific create operation, aligning with the requirement to prevent cluster creation.

Exam trap

The trap here is that candidates often confuse the 'use' verb with 'manage' or 'read', or overlook the condition that excludes a specific operation. Option B's 'manage' carries no condition, so it grants every permission, including create.

How to eliminate wrong answers

Option A is wrong because the 'read' verb only allows viewing resources, not using them (e.g., invoking models), so data scientists cannot perform inference or other actions. Option B is wrong because the 'manage' verb grants full control, including creating Dedicated AI Clusters, which violates the requirement to prevent that action. Option D is wrong because 'generative-ai-models' is a subset of 'generative-ai-family' and does not cover all necessary resources like endpoints or deployments, and it lacks the condition to block Dedicated AI Cluster creation.
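The shape of the four options can be sketched as a small string builder. This is only an illustration of the statement layout used in the options above; the `policy_statement` helper is hypothetical, and the verb ordering (inspect, read, use, manage) is the standard OCI IAM progression from least to most permissive.

```python
# OCI IAM verbs, ordered from least to most permissive.
VERBS = ("inspect", "read", "use", "manage")


def policy_statement(group, verb, resource, compartment, condition=None):
    """Build a policy statement string in the layout used by the options above."""
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb!r}")
    statement = f"Allow group {group} to {verb} {resource} in compartment {compartment}"
    if condition:
        # An optional 'where' clause narrows what the statement grants.
        statement += f" where {condition}"
    return statement


option_c = policy_statement(
    "data-scientists",
    "use",
    "generative-ai-family",
    "genai-dev",
    condition="request.operation != 'CreateDedicatedAiCluster'",
)
```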

112
MCQeasy

What is the primary benefit of using a Dedicated AI Cluster for inference in OCI Generative AI?

A.Ability to use any LLM model for free
B.Higher throughput and lower latency due to dedicated compute resources
C.Automatic model fine-tuning on the cluster
D.No need to create endpoints
AnswerB

Dedicated clusters offer consistent, low-latency performance without competition for resources.

Why this answer

A Dedicated AI Cluster provides exclusive, low-latency inference with reserved capacity, unlike shared infrastructure where resources are contended.

113
MCQmedium

A security administrator needs to grant a data science team access to use OCI Generative AI resources (e.g., run inference, create fine-tuning jobs) but only within a specific compartment. What is the correct IAM policy statement?

A.Allow group DataScientists to manage generative-ai-family in compartment Production
B.Allow group DataScientists to manage llm-models in compartment Production
C.Allow group DataScientists to use all-resources in tenancy
D.Allow group DataScientists to read ai-services in compartment Production
AnswerA

This policy grants the group full access (inspect, read, use, and manage) to all GenAI resources in the specified compartment.

Why this answer

The statement 'Allow group ... to manage generative-ai-family in compartment ...' covers all GenAI resources in a compartment, including creating fine-tuning jobs. The other options have an incorrect resource type, verb, or scope.

114
Multi-Selectmedium

A company wants to use OCI Generative AI to build a multilingual customer support chatbot. They need to understand customer queries in multiple languages and generate responses in the same language. Which TWO actions should they take? (Choose two.)

Select 2 answers
A.Use the embed-english-v3.0 model for embedding queries
B.Fine-tune Meta Llama 3 on multilingual data
C.Select Cohere Command R as the base model for chat
D.Use the embed-multilingual-v3.0 model for embedding queries
E.Use the Summarisation API for each language
AnswersC, D

Command R supports multilingual conversations without additional fine-tuning.

Why this answer

Cohere Command R (or R+) supports multilingual input and output natively, so fine-tuning is unnecessary. Using the embed-multilingual-v3.0 model for embedding customer queries enables multilingual semantic search if RAG is used. Embed-english-v3.0 only supports English, and Llama 3 is primarily English-focused.

115
MCQhard

A developer is using the OCI Generative AI Generate API (not Chat API) to create a single-turn text completion. They need to include a system-level instruction that guides the model's behavior for that request. Which parameter should they use?

A.'temperature' parameter
B.'preamble_override' parameter
C.'system' parameter
D.'max_tokens' parameter
AnswerB

In the Generate API, 'preamble_override' allows you to set a preamble that acts as a system instruction for the completion.

Why this answer

The Generate API uses 'preamble_override' to set a system instruction for the completion. The 'system' parameter is for the Chat API. 'max_tokens' and 'temperature' are not system instructions.

116
MCQmedium

A developer is using the OCI Generative AI Playground to test a Cohere Command R model. They want to reduce repetitiveness in the generated responses. Which parameter should they increase?

A.Max tokens
B.Top P
C.Frequency penalty
D.Temperature
AnswerC

A higher frequency penalty discourages the model from repeating the same tokens.

Why this answer

Frequency penalty penalizes tokens that have already appeared in the text, reducing repetition. Temperature increases randomness, top_p changes nucleus sampling, and max tokens controls length.
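A common way to formulate a frequency penalty is to subtract the penalty times the token's prior count from its logit before sampling. The sketch below shows that formulation only; it is an assumption for illustration, not a description of how the Cohere models on OCI apply the parameter internally.

```python
from collections import Counter


def apply_frequency_penalty(logits, generated_tokens, penalty):
    """Lower each token's logit in proportion to how often it has already appeared."""
    counts = Counter(generated_tokens)
    return {token: logit - penalty * counts[token] for token, logit in logits.items()}


# "the" has appeared twice, so its logit drops by 2 * 0.5; "cat" is untouched.
adjusted = apply_frequency_penalty({"the": 2.0, "cat": 1.0}, ["the", "the"], 0.5)
```

Under this formulation, a higher penalty makes tokens that have already been generated progressively less likely to be chosen again.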

117
MCQmedium

A developer is fine-tuning a Cohere Command R model using OCI Data Science and the T-Few technique. They have prepared a dataset. What is the required format for the training data?

A.A CSV file with columns 'input' and 'output'
B.A Parquet file with 'text' and 'label' columns
C.A plain text file with one conversation per line
D.A JSONL file where each line contains a 'prompt' and a 'completion' field
AnswerD

The training dataset must be in JSONL format with prompt/completion pairs.

Why this answer

OCI Generative AI fine-tuning expects a JSONL file where each line is a JSON object containing a prompt and completion (or response) field. This format pairs input with expected output for supervised fine-tuning.
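A minimal sketch of producing and checking such a file with the standard library follows. The field names 'prompt' and 'completion' come from the answer above; the helper names and sample pairs are made up for illustration.

```python
import json

REQUIRED_FIELDS = {"prompt", "completion"}


def to_jsonl(pairs):
    """Serialize (prompt, completion) pairs as one JSON object per line."""
    return "\n".join(json.dumps({"prompt": p, "completion": c}) for p, c in pairs)


def parse_jsonl(text):
    """Parse JSONL text and check that every record has both required fields."""
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            raise ValueError(f"line {line_number} is missing {sorted(missing)}")
        records.append(record)
    return records


sample = to_jsonl([("Summarize: ...", "A short summary."), ("Translate: hola", "hello")])
```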

118
MCQeasy

Which OCI Generative AI service component is specifically designed to convert text into numerical vectors (embeddings) that can be used for semantic search and clustering?

A.Summarisation
B.Chat
C.Rerank
D.Embedding
AnswerD

The Embedding API creates vector representations of text for downstream tasks.

Why this answer

The Embedding component in OCI Generative AI service is specifically designed to convert text into numerical vectors (embeddings). These embeddings capture semantic meaning, enabling use cases like semantic search and clustering by measuring vector similarity (e.g., cosine similarity).
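Cosine similarity, mentioned above, can be computed with a few lines of standard-library Python. The vectors here are toy values chosen for illustration, not real embedding output.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction score 1.0 and orthogonal vectors score 0.0, which is why texts with similar meaning end up with higher scores.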

Exam trap

The exam often tests the distinction between 'Rerank' and 'Embedding' because both are used in search pipelines, but only Embedding produces the initial vector representations needed for semantic search.

How to eliminate wrong answers

Option A is wrong because Summarisation is a text generation task that produces a concise summary of input text, not numerical vectors. Option B is wrong because Chat is a conversational interface for multi-turn dialogue, not a vectorization function. Option C is wrong because Rerank is a post-processing step that reorders search results based on relevance scores, not a component that generates embeddings.

119
MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store
B.Use a larger foundation model with a longer context window and paste all documents into each prompt
C.Train a custom model from scratch on the policy documents each month
D.Fine-tune a base LLM on the policy documents monthly
AnswerA

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

RAG (Retrieval-Augmented Generation) allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining. The other options either require expensive retraining for each update or lack document grounding.
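The retrieve-then-prompt flow can be sketched with an in-memory index. Everything here is an assumption for illustration: the three-dimensional vectors stand in for real embeddings, the list stands in for a vector store, and the prompt template is hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query_vector, index, k=2):
    """Return the text of the k chunks most similar to the query vector."""
    ranked = sorted(index, key=lambda item: cosine(query_vector, item["vector"]), reverse=True)
    return [item["text"] for item in ranked[:k]]


def build_prompt(question, chunks):
    """Ground the question in the retrieved chunks."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Toy index; updating a policy means replacing an entry here, not retraining a model.
index = [
    {"text": "Remote work requires manager approval.", "vector": [0.9, 0.1, 0.0]},
    {"text": "Expense reports are due monthly.", "vector": [0.1, 0.9, 0.0]},
    {"text": "Laptops are replaced every three years.", "vector": [0.0, 0.2, 0.9]},
]
```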

120
Multi-Selecteasy

Which TWO OCI Generative AI features are available in the Playground for testing models?

Select 2 answers
A.Adjusting parameters like temperature and max tokens
B.Setting system prompts and preamble overrides
C.Provisioning a dedicated AI cluster
D.Viewing model training metrics
E.Submitting a fine-tuning job
AnswersA, B

The Playground provides sliders and fields for common generation parameters.

Why this answer

The Playground allows interactive testing with parameter adjustment and system prompts. It does not allow fine-tuning or creating dedicated clusters.

121
MCQeasy

Which fine-tuning technique does OCI Generative AI use to efficiently update model parameters without modifying the entire model, enabling faster training on limited data?

A.T-Few
B.Full fine-tuning
C.Prefix tuning
D.LoRA
AnswerA

OCI GenAI uses the T-Few fine-tuning technique.

Why this answer

T-Few is a parameter-efficient fine-tuning technique that updates only a small fraction of model parameters.

122
MCQhard

A team fine-tuned a model using T-Few and validated it. They now want to deploy this fine-tuned model to a dedicated AI cluster for low-latency inference. What must they do FIRST?

A.Copy the fine-tuned model to an Object Storage bucket
B.Increase the temperature parameter in the inference request
C.Create a dedicated AI cluster with a specified number of model units
D.Delete the base model to free up capacity
AnswerC

The cluster must be provisioned first to host the fine-tuned model.

Why this answer

To deploy a fine-tuned model to a dedicated AI cluster for low-latency inference, you must first create the dedicated AI cluster and specify the number of model units. This cluster provides isolated compute resources that ensure consistent, low-latency performance, unlike the shared serverless endpoint. The fine-tuned model is then deployed onto this cluster, not copied to Object Storage first.

Exam trap

The exam often tests the misconception that you must first copy the model to Object Storage before deployment, but in OCI Generative AI, the model remains in the model catalog and is deployed directly to a dedicated cluster.

How to eliminate wrong answers

Option A is wrong because copying the fine-tuned model to an Object Storage bucket is not a required first step; the model is already stored in the model catalog after fine-tuning, and deployment pulls it from there. Option B is wrong because increasing the temperature parameter affects the randomness of generated text, not the infrastructure or latency of inference. Option D is wrong because deleting the base model is unnecessary and would break the fine-tuned model, which depends on the base model's weights; capacity is managed by provisioning model units, not by deleting models.

123
Multi-Selectmedium

A data scientist is configuring a fine-tuning job in OCI Generative AI. Which TWO of the following are required inputs for creating the job?

Select 2 answers
A.Temperature setting for fine-tuning
B.Training dataset (JSONL file)
C.Max tokens limit for validation
D.Base model selection (e.g., Cohere Command R)
E.Inference endpoint name
AnswersB, D

Why this answer

Fine-tuning requires a base model and a training dataset. The other options are optional or configurable later.

124
Multi-Selecthard

A data scientist is evaluating the cost of deploying a fine-tuned model for a high-volume production application. They need low latency but are cost-sensitive. Which TWO considerations should they evaluate when choosing between on-demand (shared) and dedicated cluster pricing?

Select 2 answers
A.The cost per model unit per hour for a dedicated cluster
B.Whether the model supports streaming responses
C.The number of days required to provision the dedicated cluster
D.The availability of the model in the OCI Generative AI Playground
E.The expected monthly token volume and the on-demand per-token price
AnswersA, E

Dedicated clusters are billed per model unit hour, which must be compared to on-demand token costs.

Why this answer

The cost of model units per hour and the expected token volume help determine whether dedicated or on-demand is more economical. Cluster provisioning time and streaming availability are not direct cost factors.
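The comparison can be worked through as simple arithmetic. All prices below are placeholders rather than OCI list prices, and the 730 hours per month figure is an assumed average; the sketch also ignores any minimum commitment a dedicated cluster may carry.

```python
def on_demand_cost(tokens_per_month, price_per_1k_tokens):
    """Monthly cost when paying per token on shared infrastructure."""
    return tokens_per_month / 1000 * price_per_1k_tokens


def dedicated_cost(model_units, price_per_unit_hour, hours_per_month=730):
    """Monthly cost when paying per model unit per hour on a dedicated cluster."""
    return model_units * price_per_unit_hour * hours_per_month


def break_even_tokens(model_units, price_per_unit_hour, price_per_1k_tokens, hours_per_month=730):
    """Monthly token volume above which the dedicated cluster becomes cheaper."""
    monthly = dedicated_cost(model_units, price_per_unit_hour, hours_per_month)
    return monthly / price_per_1k_tokens * 1000
```

With these placeholder inputs, two model units at 10.0 per unit hour cost 14,600 per month, so at 0.5 per thousand tokens the dedicated cluster only pays off above 29.2 million tokens per month.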

125
MCQmedium

A data scientist uses OCI Generative AI Playground to test a Cohere Command R model for a summarization task. They want the summary to be concise and avoid repeating phrases. Which parameter adjustments would BEST achieve this?

A.Set temperature to 0.5 and max tokens to 50
B.Set temperature to 0.0 and frequency penalty to 0.0
C.Set temperature to 0.2 and frequency penalty to 0.8
D.Set temperature to 1.0 and presence penalty to 0.0
AnswerC

Low temperature for focused, more deterministic output; high frequency penalty to avoid repetition.

Why this answer

Decreasing temperature makes output more deterministic; increasing frequency penalty discourages repetition of phrases.
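The effect of temperature can be seen in a temperature-scaled softmax, the usual way the parameter is described. The logits below are toy values, and the function is a sketch of the general technique rather than the model's actual sampling code.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; 0 is usually treated as greedy decoding")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


focused = softmax_with_temperature([2.0, 1.0, 0.0], 0.2)
varied = softmax_with_temperature([2.0, 1.0, 0.0], 1.0)
```

At the lower temperature nearly all probability mass sits on the top token, which is why the output becomes more predictable.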

126
MCQmedium

A team is building a multilingual semantic search application. They need to index documents in English, Spanish, and French, and later search using queries in any of these languages. Which embedding model should they use?

A.Meta Llama 3 70B
B.Cohere Command R
C.Cohere embed-multilingual-v3.0
D.Cohere embed-english-v3.0
AnswerC

This model is designed for multilingual text, supporting English, Spanish, French, and many other languages.

Why this answer

Cohere embed-multilingual-v3.0 supports multiple languages in a single model, enabling cross-lingual semantic search. embed-english-v3.0 is English-only. Command R and Llama 3 are not embedding models.

127
MCQeasy

Which statement accurately describes the T-Few fine-tuning technique used in OCI Generative AI?

A.It automatically adjusts hyperparameters during inference.
B.It does not require any training data and works by prompting only.
C.It updates all model parameters, requiring substantial compute resources.
D.It is a parameter-efficient fine-tuning method that updates only a fraction of the model parameters.
AnswerD

T-Few trains only a small set of added weights while the base model's weights stay frozen.

Why this answer

The T-Few fine-tuning technique is a parameter-efficient fine-tuning (PEFT) method that updates only a small fraction of the model's parameters instead of the full set of weights. Adapter layers and low-rank updates describe adapter tuning and LoRA, which are separate PEFT techniques. This approach significantly reduces computational and memory requirements compared to full fine-tuning, making it suitable for adapting large language models with limited resources. In OCI Generative AI, T-Few enables efficient customization without retraining the entire model.
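To make "only a fraction of the parameters" concrete, the sketch below pairs a frozen weight matrix with a small trainable scaling vector and counts how few values would be trained. This is a generic illustration of parameter-efficient tuning under that assumption, not OCI's implementation of T-Few.

```python
def frozen_linear(weights, x):
    """Apply a frozen weight matrix (list of rows) to an input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def rescale(activations, scale):
    """Multiply each activation by its trainable scale factor."""
    return [s * a for s, a in zip(scale, activations)]


def trainable_fraction(n_out, n_in):
    """Share of parameters that are trainable when only the scale vector is updated."""
    frozen = n_out * n_in  # weight matrix, left untouched
    trainable = n_out      # one scale factor per output
    return trainable / (frozen + trainable)
```

For a layer with 4,096 inputs and 4,096 outputs, `trainable_fraction(4096, 4096)` is about 0.02%, which shows why this style of tuning is cheap compared with updating every weight.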

Exam trap

The trap here is that candidates often confuse parameter-efficient fine-tuning (PEFT) with full fine-tuning or prompting, leading them to select options that describe full parameter updates or no training at all, rather than recognizing T-Few as a lightweight adaptation method.

How to eliminate wrong answers

Option A is wrong because T-Few does not automatically adjust hyperparameters during inference; hyperparameters are set before training and remain fixed during inference. Option B is wrong because T-Few requires training data for fine-tuning, unlike zero-shot prompting which works by prompting only without any training data. Option C is wrong because T-Few does not update all model parameters; it is specifically designed to update only a fraction of parameters, avoiding the substantial compute resources required for full fine-tuning.

128
MCQhard

A company has fine-tuned a Cohere Command R model using T-Few and wants to deploy it for real-time inference with the lowest possible latency. They have provisioned a dedicated AI cluster with 2 model units. However, latency is still higher than expected. Which action is MOST likely to reduce latency?

A.Reduce the temperature parameter to 0
B.Increase the number of model units on the dedicated AI cluster
C.Switch from dedicated AI cluster to shared infrastructure
D.Use a larger base model like Llama 3 70B
AnswerB

More model units provide additional compute capacity, reducing latency for concurrent requests.

Why this answer

Increasing model units on the dedicated cluster provides more compute capacity, reducing inference latency by parallelizing requests. Switching to shared infrastructure would likely increase latency due to multi-tenancy. Using a larger model would increase latency. Reducing temperature does not affect latency.

129
MCQmedium

A developer is building a summarization pipeline using OCI Generative AI. They want to ensure the summary includes key points from the entire document without truncation. Which parameter should they primarily adjust?

A.Max_tokens
B.Frequency_penalty
C.Temperature
D.Top_p
AnswerA

Max_tokens sets the maximum number of tokens in the generated summary, directly addressing truncation.

Why this answer

The max_tokens parameter controls the maximum length of the generated output. Increasing it allows longer summaries, preventing truncation.

130
MCQeasy

Which OCI Generative AI model family is specifically designed for reranking search results to improve relevance?

A.Cohere Command R
B.Cohere Command R+
C.Meta Llama 3
D.Cohere Rerank
AnswerD

Cohere Rerank is specifically designed for reranking tasks.

Why this answer

Cohere Rerank is the OCI Generative AI model family specifically designed for reranking search results to improve relevance. Unlike generation-focused models, Rerank takes a query and a list of candidate documents, scoring each for relevance to the query, thereby enhancing the quality of retrieved results in RAG pipelines.
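The score-then-sort step described above can be sketched as follows. The word-overlap scorer is a deliberately crude stand-in for illustration; a real rerank model scores semantic relevance, and none of these function names come from an OCI or Cohere SDK.

```python
def overlap_score(query, document):
    """Stand-in relevance score: share of query words that appear in the document."""
    query_words = set(query.lower().split())
    document_words = set(document.lower().split())
    return len(query_words & document_words) / len(query_words) if query_words else 0.0


def rerank(query, documents, top_n=None, score=overlap_score):
    """Order candidate documents by descending relevance to the query."""
    ranked = sorted(documents, key=lambda doc: score(query, doc), reverse=True)
    return ranked if top_n is None else ranked[:top_n]


candidates = ["billing faq", "how to reset your password", "password policy"]
best = rerank("reset password", candidates, top_n=1)
```

In a RAG pipeline this step would sit between the initial vector search and prompt construction, so only the most relevant candidates reach the generative model.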

Exam trap

The exam often tests the distinction between generative models (like Command R, Command R+, Llama 3) and specialized utility models (like Rerank), leading candidates to mistakenly select a generative model for a reranking task.

How to eliminate wrong answers

Option A is wrong because Cohere Command R is a generative model optimized for RAG and tool use, not for reranking search results. Option B is wrong because Cohere Command R+ is a larger, more capable generative model in the Command family, still focused on generation and instruction following, not reranking. Option C is wrong because Meta Llama 3 is a general-purpose large language model for text generation and understanding, not a specialized reranking model.

131
MCQmedium

An administrator needs to grant a group of data scientists access to use OCI Generative AI resources in a specific compartment. Which IAM policy statement should they use?

A.Allow group DataScientists to manage generative-ai-family in compartment ABC
B.Allow group DataScientists to inspect generative-ai-family in compartment ABC
C.Allow group DataScientists to use generative-ai-family in compartment ABC
D.Allow group DataScientists to read generative-ai-models in tenancy
AnswerC

Correct verb and resource for using GenAI.

Why this answer

The 'use' verb allows access to GenAI resources. The policy should target the specific compartment.

132
MCQmedium

A company wants to use OCI Generative AI Agents to build a RAG application over documents stored in OCI Object Storage. What must they create first?

A.A knowledge base linked to the Object Storage bucket
B.A Dedicated AI Cluster
C.An embedding endpoint
D.A fine-tuning job for the base model
AnswerA

The agent uses a knowledge base to index and retrieve data from Object Storage.

Why this answer

OCI Generative AI Agents require a knowledge base to index data sources before creating an agent.

133
MCQmedium

A data scientist wants to fine-tune a Cohere Command R model using the T-Few technique. They have prepared a dataset in JSONL format with prompt/completion pairs. Which step is REQUIRED before creating the fine-tuning job?

A.Register the dataset in OCI Data Labeling
B.Upload the dataset to an OCI Object Storage bucket
C.Deploy a dedicated AI cluster to host the base model
D.Create an OCI Functions endpoint for dataset preprocessing
AnswerB

Fine-tuning jobs in OCI GenAI read training data from Object Storage.

Why this answer

The dataset must be uploaded to an OCI Object Storage bucket so the fine-tuning job can access it. The other options are either optional or not required.

134
MCQhard

A team has fine-tuned a Cohere Command R model using T-Few on a dataset of 5,000 prompt/completion pairs. After deployment, they notice the model sometimes generates off-topic responses. Which action is most likely to improve response relevance without requiring new training data?

A.Decrease the max_tokens
B.Increase the temperature to 1.5
C.Increase the frequency_penalty
D.Set a preamble override with instructions to stay on topic
AnswerD

Preamble override provides a system-level instruction that can steer the model's behavior toward relevance.

Why this answer

Preamble override allows setting a system message that guides the model's behavior, helping to keep responses on topic. Adjusting it is a low-cost intervention.

135
MCQhard

A team fine-tuned a Cohere Command R model in OCI GenAI and validated it. They now need to deploy it for production inference with a dedicated endpoint. What is the correct sequence of steps?

A.Create an endpoint, deploy the model, then provision a Dedicated AI Cluster
B.Deploy the model to a shared endpoint, then provision a Dedicated AI Cluster for scaling
C.Create an endpoint, then provision a Dedicated AI Cluster, then deploy the model
D.Provision a Dedicated AI Cluster, deploy the model to the cluster, then create an endpoint
AnswerD

The correct order: allocate cluster, deploy model, then expose via endpoint.

Why this answer

Option D is correct because in OCI Generative AI, the correct sequence for deploying a fine-tuned model to a dedicated endpoint is: first provision a Dedicated AI Cluster (which provides the isolated compute infrastructure), then deploy the model to that cluster, and finally create an endpoint that exposes the deployed model for inference. This ensures the model is hosted on dedicated resources before the endpoint is created.

Exam trap

The trap here is that candidates often confuse the order of provisioning infrastructure versus deploying the model, mistakenly thinking the endpoint can be created first and then attached to a cluster later, but OCI GenAI requires the cluster to exist and the model to be deployed before the endpoint can be created.

How to eliminate wrong answers

Option A is wrong because it attempts to create an endpoint before the Dedicated AI Cluster is provisioned, which would fail as the endpoint requires an existing cluster to attach to. Option B is wrong because it incorrectly suggests deploying to a shared endpoint first, which is not the intended path for dedicated production inference; dedicated endpoints require a Dedicated AI Cluster, not a shared one. Option C is wrong because it creates an endpoint before the Dedicated AI Cluster is provisioned and before the model is deployed, which is invalid since the endpoint must reference an already deployed model on an existing cluster.

136
MCQmedium

An organization wants to use OCI Generative AI for summarizing long legal documents. Which OCI Generative AI service component is specifically designed for this task?

A.Chat API
B.Embedding API
C.Generate API
D.Summarisation
AnswerD

Summarisation is a dedicated component for summarizing documents.

Why this answer

Option D is correct because the Summarisation API in OCI Generative AI is a dedicated endpoint optimized for condensing long texts into concise summaries. It uses specialized model configurations and prompt engineering to handle the context window and extraction requirements of legal documents, unlike general-purpose generation endpoints.

Exam trap

The exam often tests the distinction between a general-purpose generation API (Generate API) and a task-specific API (Summarisation), leading candidates to incorrectly choose the Generate API because they assume any text generation endpoint can handle summarization equally well.

How to eliminate wrong answers

Option A is wrong because the Chat API is designed for multi-turn conversational interactions, not for single-document summarization tasks, and lacks the specific prompt templates and length controls needed for legal document summarization. Option B is wrong because the Embedding API converts text into vector representations for semantic search or clustering, not for generating summaries. Option C is wrong because the Generate API is a general-purpose text generation endpoint that can produce summaries but is not specifically optimized or designed for summarization tasks, lacking the built-in summarization-specific parameters and model tuning that the Summarisation API provides.
