CCNA OCI Generative AI Service Questions (Page 1 of 2)

MCQ (medium)

A developer is using the OCI Generative AI service API and receives a '400 Bad Request' with error 'Model not found'. What is the most likely cause?

A. The model ID is misspelled or does not exist.

B. The API request lacks authentication.

C. The input exceeds the maximum token limit.

D. The endpoint region is incorrect.

Answer: A

The error directly states 'Model not found'.

Why this answer

The error indicates the model ID specified in the request does not exist or is misspelled.
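
A misspelled model ID is the most common trigger for this error, so you can check the ID on the client before sending the request. The sketch below is a minimal illustration and is not part of any OCI SDK. `KNOWN_MODEL_IDS` is a hypothetical, hard-coded catalog whose IDs are only examples; in practice, list the models actually available to your tenancy and region. `check_model_id` is an illustrative helper that uses Python's standard `difflib` to suggest the closest valid IDs.

```python
import difflib

# Hypothetical catalog: example IDs only. Replace with the models actually
# available in your tenancy and region.
KNOWN_MODEL_IDS = [
    "cohere.command-r-plus",
    "cohere.embed-english-v3.0",
    "meta.llama-3-70b-instruct",
]

def check_model_id(model_id, known=KNOWN_MODEL_IDS):
    """Return (ok, suggestions); suggestions are close matches that hint at a typo."""
    if model_id in known:
        return True, []
    return False, difflib.get_close_matches(model_id, known, n=3, cutoff=0.6)

ok, hints = check_model_id("cohere.comand-r-plus")  # note the missing 'm'
print(ok, hints)
```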

MCQ (hard)

A company wants to deploy a custom fine-tuned model for retrieval-augmented generation (RAG) using a dedicated AI cluster. They need to ensure the model can handle concurrent requests from multiple applications with consistent latency. What should they configure?

A. Set a high temperature to keep responses concise.

B. Increase the number of replicas in the dedicated cluster.

C. Enable auto-scaling on the cluster.

D. Use the managed serving endpoint instead.

Answer: B

More replicas allow handling more requests concurrently without degradation.

Why this answer

Option B is correct because increasing the number of replicas in the dedicated cluster distributes the load across multiple model copies, improving concurrency and latency stability.
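
One rough way to size the replica count is to divide the expected peak concurrency by what one replica can serve at the target latency. The sketch assumes a fixed per-replica concurrency, which you would get from your own load testing; it is not an OCI-published figure. `replicas_needed` is a hypothetical helper.

```python
import math

def replicas_needed(peak_concurrent_requests: int, per_replica_concurrency: int) -> int:
    """Smallest replica count whose combined capacity covers the peak concurrent load."""
    if per_replica_concurrency <= 0:
        raise ValueError("per_replica_concurrency must be positive")
    return max(1, math.ceil(peak_concurrent_requests / per_replica_concurrency))

# Hypothetical load-test result: one replica holds latency steady up to 10 concurrent requests.
print(replicas_needed(peak_concurrent_requests=45, per_replica_concurrency=10))  # 5
```

Rounding up keeps headroom. With 45 concurrent requests at 10 per replica, 4 replicas would leave 5 requests queued, so 5 are needed.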

Multi-Select (hard)

A company is using a dedicated AI cluster for fine-tuning. Which TWO best practices help optimize cost?

Select 2 answers

A. Use the largest replica count.

B. Manually scale down the cluster when not in use.

C. Use the managed serving endpoint instead.

D. Leave the cluster running continuously.

E. Use the smallest possible model for the task.

Answers: B, E

Reduces active compute hours.

Why this answer

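Option B is correct because manually scaling down the dedicated AI cluster when it is not in use directly reduces compute costs by releasing idle GPU/CPU resources. In OCI Generative AI, dedicated AI clusters incur charges for provisioned capacity, so scaling down during inactivity avoids paying for unused infrastructure. Option E is correct because the smallest model that meets the task's quality bar needs less compute for every training step, which lowers the cost of the fine-tuning job.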
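
Dedicated capacity is billed while it is provisioned, so the cost effect of scaling down comes down to simple arithmetic. The sketch below uses a made-up hourly rate and unit count purely for illustration. It does not reflect actual OCI pricing, and `cluster_cost` is a hypothetical helper.

```python
def cluster_cost(units: int, hours: float, rate_per_unit_hour: float) -> float:
    """Cost of keeping `units` cluster units provisioned for `hours` hours."""
    return units * hours * rate_per_unit_hour

RATE = 10.0  # hypothetical price per unit-hour, not a real OCI price

always_on = cluster_cost(units=2, hours=24 * 30, rate_per_unit_hour=RATE)     # left running all month
working_hours = cluster_cost(units=2, hours=8 * 22, rate_per_unit_hour=RATE)  # 8 h/day on 22 days only
print(always_on, working_hours)  # 14400.0 3520.0
```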

Exam trap

Oracle often tests the misconception that larger replica counts or continuous running improve performance, when in fact they only increase cost without accelerating fine-tuning convergence.

MCQ (easy)

A company wants to use OCI Generative AI service to generate email summaries for customer support. They need to ensure low latency and data residency in Frankfurt. What should they use?

A. Use the playground in a US region.

B. Use OCI AI Quick Actions.

C. Use the managed serving endpoint in Frankfurt region.

D. Create a dedicated AI cluster in Frankfurt.

Answer: C

The managed serving endpoint is available in Frankfurt, providing regional low-latency inference.

Why this answer

Option C is correct because the managed serving endpoint is available in the Frankfurt region, providing low latency and in-region processing without the need for a dedicated cluster. A dedicated AI cluster (Option D) is not necessary for latency and data residency if the managed endpoint is in the same region. The playground (Option A) is for testing, and running it in a US region would also break the residency requirement. AI Quick Actions (Option B) are for pre-built models.
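
A simple safeguard for the residency requirement is to build the regional endpoint URL in one place and reject any region outside the allowed set. The sketch assumes the `https://inference.generativeai.<region>.oci.oraclecloud.com` URL pattern and the region identifier `eu-frankfurt-1`; verify both against the OCI documentation for your tenancy. `inference_endpoint` and `ALLOWED_REGIONS` are hypothetical names.

```python
ALLOWED_REGIONS = {"eu-frankfurt-1"}  # data-residency requirement from the scenario

def inference_endpoint(region: str) -> str:
    """Build a regional inference URL, refusing regions outside the residency set."""
    if region not in ALLOWED_REGIONS:
        raise ValueError(f"region {region!r} violates the data-residency policy")
    # Assumed URL pattern for the Generative AI inference service; verify for your region.
    return f"https://inference.generativeai.{region}.oci.oraclecloud.com"

print(inference_endpoint("eu-frankfurt-1"))
```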

MCQ (easy)

A company deployed OCI Generative AI for a customer service chatbot. They are using the Cohere Command model. The chatbot is generating responses that are too brief and often cut off mid-sentence. They have a limited budget. What should they do?

A. Increase max tokens to 1024.

B. Decrease the temperature to 0.2.

C. Increase the temperature to 0.9.

D. Use a different base model like Llama.

Answer: A

Increasing max tokens gives the model more room to complete its response.

Why this answer

Option A is correct. Increasing max tokens allows the model to generate longer, complete responses. Option C is wrong because increasing temperature makes responses more random, not longer.

Option B is wrong because decreasing temperature reduces variability but does not increase length. Option D is wrong because switching models is costly and may not address the length issue directly.
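
A cut-off response is easy to recognize if you track why generation stopped. The toy decoder below is not a real model. It replays a fixed token list and stops either at an end-of-sequence marker (`"stop"`) or when the `max_tokens` budget runs out (`"length"`). The reason labels are illustrative; real APIs report their own finish-reason values.

```python
def generate(planned_tokens, max_tokens):
    """Toy decoder: replay `planned_tokens` until '<eos>' or until max_tokens are emitted."""
    output = []
    for token in planned_tokens:
        if token == "<eos>":
            return output, "stop"    # the model finished its answer
        if len(output) == max_tokens:
            return output, "length"  # cut off by the token budget, mid-sentence
        output.append(token)
    return output, "stop"

reply = "Your refund was approved and will arrive in five days .".split() + ["<eos>"]
print(generate(reply, max_tokens=4))     # (['Your', 'refund', 'was', 'approved'], 'length')
print(generate(reply, max_tokens=1024))  # the full reply, finishing with 'stop'
```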

MCQ (medium)

Refer to the exhibit. A user receives this error when calling the OCI Gen AI inference endpoint. What is the most likely cause?

A. The region name is misspelled

B. The model is not deployed in the region

C. The API key is expired

D. The model name is incorrect

Answer: B

The error indicates the model is not supported in that region.

Why this answer

The error indicates that the model is not available in the specified region. OCI Gen AI models are deployed regionally, and each region supports only a specific subset of models. If the user calls an endpoint in a region where the requested model has not been deployed, the service returns an error because the model's inference endpoint does not exist in that region's routing table.

Exam trap

Oracle often tests the distinction between 'model not found' (invalid model name) and 'model not available in region' (valid model but not deployed there), leading candidates to incorrectly select the model name option when the error message explicitly mentions regional unavailability.

How to eliminate wrong answers

Option A is wrong because a misspelled region name would typically result in a DNS resolution failure or a 404 error, not a model-not-found error. Option C is wrong because an expired API key would cause an authentication failure (HTTP 401 Unauthorized), not a model availability error. Option D is wrong because an incorrect model name would produce a 'model not found' or 'invalid model' error, but the error message in the exhibit specifically states the model is not available in the region, not that the model name is invalid.

MCQ (hard)

An organization deploys a fine-tuned model for legal document analysis using OCI Generative AI Service. They need to ensure that only authorized users in the 'LegalTeam' group can access the model endpoint. Which policy statement should be used?

A. Allow group LegalTeam to use generative-ai-model in compartment ABC

B. Allow group LegalTeam to manage generative-ai-family in compartment ABC

C. Allow group LegalTeam to read generative-ai-model in compartment ABC

D. Allow group LegalTeam to inspect generative-ai-model in compartment ABC

Answer: A

The 'use' verb allows invoking the model for inference.

Why this answer

Option A is correct because the 'use' verb on the 'generative-ai-model' resource type grants the LegalTeam group permission to invoke the model endpoint for inference, which is the minimum privilege required for accessing a deployed fine-tuned model in OCI Generative AI Service. The 'use' permission specifically allows calling the model for text generation or analysis without granting broader management or read capabilities.

Exam trap

Oracle often tests the distinction between 'use' and 'read' on resource types that support inference endpoints, where candidates mistakenly assume 'read' is sufficient for accessing the model's functionality, but only 'use' grants the actual invocation permission required for inference operations.

How to eliminate wrong answers

Option B is wrong because 'manage' on 'generative-ai-family' grants full administrative control over all Generative AI resources (including creating, updating, deleting models and endpoints), which exceeds the requirement of only accessing the model endpoint and violates the principle of least privilege. Option C is wrong because 'read' on 'generative-ai-model' allows viewing model metadata and configuration but does not include the permission to invoke the model endpoint for inference, which requires the 'use' verb. Option D is wrong because 'inspect' on 'generative-ai-model' only permits listing and viewing basic resource information (like tags and identifiers) and provides no ability to call the model endpoint for legal document analysis.
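
OCI IAM verbs are cumulative: inspect, then read, then use, then manage. That ordering is why 'use' is the lowest verb that covers invocation here. The sketch below models only the ordering and ignores resource types, compartments, and conditions. `verb_allows` is a hypothetical helper, not an OCI API.

```python
# OCI IAM verbs are cumulative: each level includes the permissions of the ones below it.
VERB_LEVELS = {"inspect": 0, "read": 1, "use": 2, "manage": 3}

def verb_allows(granted: str, required: str) -> bool:
    """True if a policy granting `granted` also covers an operation that needs `required`."""
    return VERB_LEVELS[granted] >= VERB_LEVELS[required]

# Per this question, invoking the model for inference needs 'use'.
REQUIRED_FOR_INFERENCE = "use"
for verb in ("inspect", "read", "use", "manage"):
    print(verb, verb_allows(verb, REQUIRED_FOR_INFERENCE))
```

Least privilege means picking the lowest verb that passes. Here that is 'use', because 'manage' also passes but grants far more than invocation.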

MCQ (easy)

A data scientist wants to quickly test a prompt with different parameters like temperature and max tokens without writing code. Which OCI GenAI feature should they use?

A. OCI CLI.

B. OCI Generative AI Playground.

C. OCI SDK.

D. OCI Data Science Notebooks.

Answer: B

Playground allows visual prompt testing with adjustable parameters.

Why this answer

The OCI Generative AI Playground is a web-based, no-code interface that allows data scientists to interactively test prompts and adjust parameters like temperature and max tokens without writing any code. This directly matches the user's requirement for quick, code-free experimentation.

Exam trap

The trap here is that candidates may confuse the OCI CLI or SDK as 'quick' tools, but the question explicitly requires 'without writing code,' which only the Playground satisfies.

How to eliminate wrong answers

Option A is wrong because the OCI CLI is a command-line tool that requires writing and executing commands, not a no-code interface for interactive prompt testing. Option C is wrong because the OCI SDK is a software development kit used for programmatic access via code in languages like Python or Java, which contradicts the 'without writing code' requirement. Option D is wrong because OCI Data Science Notebooks are Jupyter-based environments that require writing Python code to invoke the Generative AI service, not a no-code playground.

MCQ (medium)

A data scientist is using OCI Generative AI Service to generate product descriptions. They notice that the output often repeats phrases. Which parameter adjustment would MOST directly address this issue?

A. Increase the temperature

B. Increase the max tokens

C. Increase the frequency penalty

D. Decrease the top-p value

Answer: C

Frequency penalty penalizes tokens that have already appeared, reducing repetition.

Why this answer

Option C is correct because the frequency penalty directly reduces the likelihood of the model repeating the same phrases by penalizing tokens that have already appeared in the generated text. The penalty grows with the number of times a token has already been generated. Each prior occurrence lowers that token's score before sampling, so repeated tokens become progressively less likely to be chosen again. This is the most direct mechanism to address repetitive output.
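
The effect can be sketched with a toy next-token distribution. The code assumes a common formulation of the frequency penalty, which subtracts `penalty * count` from a token's logit. OCI's exact internal formula may differ, and the tokens and scores are invented for illustration.

```python
import math
from collections import Counter

def apply_frequency_penalty(logits, generated_tokens, penalty):
    """Subtract penalty * (times the token already appeared) from each token's logit."""
    counts = Counter(generated_tokens)
    return {token: score - penalty * counts[token] for token, score in logits.items()}

def softmax(logits):
    """Convert a {token: logit} dict into a {token: probability} dict."""
    peak = max(logits.values())
    exps = {token: math.exp(score - peak) for token, score in logits.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}

# Hypothetical next-token scores while writing a product description.
logits = {"durable": 2.0, "stylish": 1.5, "lightweight": 1.4}
history = ["durable", "durable", "durable"]  # the word has already been used three times

before = softmax(logits)
after = softmax(apply_frequency_penalty(logits, history, penalty=0.5))
print(max(before, key=before.get), max(after, key=after.get))  # durable stylish
```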

Exam trap

Oracle often tests the distinction between frequency penalty and temperature, where candidates mistakenly think increasing randomness (temperature) will reduce repetition, but temperature actually increases variability without targeting repetition directly.

How to eliminate wrong answers

Option A is wrong because increasing temperature adds randomness to the token selection process by scaling the logits before applying softmax, which can lead to more diverse but also more chaotic output, not specifically reducing repetition. Option B is wrong because increasing max tokens only extends the maximum length of the generated text, which may actually allow more repetition to occur rather than preventing it. Option D is wrong because decreasing top-p (nucleus sampling) restricts the sampling pool to the smallest set of tokens whose cumulative probability exceeds the threshold, which can reduce diversity and potentially increase repetition by focusing on high-probability tokens.
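
The two sampling controls described above can be sketched in a few lines. This is a generic illustration of temperature scaling and nucleus (top-p) filtering over made-up logits, not OCI's implementation.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by temperature, then softmax: T > 1 flattens, T < 1 sharpens."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, p):
    """Nucleus pool: the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}  # renormalized over the kept tokens

logits = [3.0, 2.0, 1.0, 0.5]  # hypothetical next-token scores
sharp = softmax_with_temperature(logits, 0.5)
flat = softmax_with_temperature(logits, 1.5)
print(max(sharp) > max(flat))                                       # True
print(len(top_p_filter(flat, 0.5)), len(top_p_filter(flat, 0.95)))  # 1 4
```

Lowering p shrinks the pool to the few most likely tokens. That is why decreasing top-p tends to make output more repetitive rather than less.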

Multi-Select (medium)

Which TWO of the following are valid ways to reduce latency when using OCI Generative AI Service?

Select 2 answers

A. Use a dedicated AI cluster

B. Reduce the max tokens parameter

C. Deploy the model in a different region

D. Use a larger model

E. Batch multiple requests

Answers: A, B

Dedicated cluster provides consistent performance and lower latency.

Why this answer

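A dedicated AI cluster (A) provides isolated compute resources (GPU nodes) for inference, eliminating resource contention from other tenants or workloads. This keeps response latency consistently low because the model is always warm and available without queueing delays, which is critical for real-time applications. Reducing the max tokens parameter (B) also lowers latency. Output is generated one token at a time, so capping the response length caps the generation time. Batching requests (E) improves throughput, not the latency of an individual request.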
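
The link between max tokens and latency follows from how generation works. The prompt is processed once, and then the output is produced one token at a time. The sketch uses invented timings and treats per-token time as constant, which is a simplification. `max_generation_latency_ms` is a hypothetical helper.

```python
def max_generation_latency_ms(prefill_ms: float, per_token_ms: float, max_tokens: int) -> float:
    """Upper bound on response time: prompt processing plus one decoding step per output token.

    It is an upper bound because the model may stop before reaching max_tokens.
    """
    return prefill_ms + per_token_ms * max_tokens

# Hypothetical timings: 200 ms to process the prompt, 20 ms per generated token.
print(max_generation_latency_ms(200, 20, 1000))  # 20200
print(max_generation_latency_ms(200, 20, 200))   # 4200
```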

Exam trap

Oracle often tests the misconception that deploying in a different region or using a larger model improves performance, when in fact these actions increase latency due to network distance and computational overhead.

MCQ (medium)

An organization wants to use OCI Generative AI for real-time document translation. They need high availability across regions. Which deployment option meets this requirement?

A. Single dedicated AI cluster in one region

B. Multiple dedicated AI clusters in different regions with a load balancer

C. Single serverless endpoint

D. Multiple serverless endpoints in different regions

Answer: B

Multi-region with load balancing ensures continuity even if one region is down.

Why this answer

Option B is correct because deploying multiple dedicated AI clusters across different regions with a load balancer ensures high availability by distributing traffic and providing failover if one region becomes unavailable. OCI Generative AI dedicated AI clusters are provisioned per region, and a load balancer can route requests to healthy clusters, meeting the requirement for real-time document translation with cross-region redundancy.

Exam trap

The trap here is that candidates often assume serverless endpoints inherently provide multi-region high availability, but in OCI Generative AI, serverless endpoints are region-scoped and do not include built-in cross-region failover or load balancing.

How to eliminate wrong answers

Option A is wrong because a single dedicated AI cluster in one region creates a single point of failure, failing the high-availability requirement. Option C is wrong because a single serverless endpoint is also region-specific and lacks cross-region redundancy, so it cannot provide high availability across regions. Option D is wrong because multiple serverless endpoints in different regions without a load balancer cannot automatically distribute traffic or handle failover; they require external routing logic to achieve high availability, which is not inherent in the serverless endpoint model.
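
The load balancer's role can be illustrated with a toy health-check router. It tries the primary region first and falls back to the next healthy one. The endpoint names and the health check are hypothetical. In practice a managed load balancer or DNS-based traffic steering with health checks does this, not application code.

```python
def pick_endpoint(endpoints, is_healthy):
    """Return the first healthy endpoint in priority order (primary region first)."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    raise RuntimeError("no healthy regional endpoint available")

# Hypothetical per-region dedicated clusters behind the load balancer.
ENDPOINTS = ["primary-region-cluster", "secondary-region-cluster"]
down = {"primary-region-cluster"}  # simulate a regional outage

print(pick_endpoint(ENDPOINTS, lambda e: e not in down))  # secondary-region-cluster
```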

Multi-Select (hard)

Which THREE models are available as part of the OCI Generative AI service?

Select 3 answers

A. Llama 3

B. GPT-4

C. Cohere Command

D. Stable Diffusion

E. Cohere Embed

Answers: A, C, E

Meta's Llama 3 is available in OCI GenAI.

Why this answer

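Option A is correct because Llama 3 is one of the openly available large language models (LLMs) offered through the OCI Generative AI service, alongside Cohere models. OCI Generative AI provides managed access to Llama 3 for text generation tasks, allowing users to deploy and fine-tune it within Oracle Cloud Infrastructure. Options C and E are correct because Cohere Command (text generation) and Cohere Embed (text embeddings) are both offered as pretrained models in the service.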

Exam trap

Oracle often tests the distinction between models available natively in OCI Generative AI versus those accessible only through external integrations, leading candidates to mistakenly include popular models like GPT-4 that are not part of the managed service.

MCQ (medium)

A healthcare company must use OCI Generative AI for medical report generation. They need to ensure PHI is not sent to third-party models. Which approach best ensures data stays within OCI?

A. Use OCI Gen AI service with fine-tuned model on a dedicated AI cluster

B. Use OCI Gen AI service with base model in a multi-tenant environment

C. Use OCI Gen AI service with base model in a dedicated AI cluster

D. Use a third-party LLM via API Gateway

Answer: A

A fine-tuned model on a dedicated cluster keeps data in OCI and improves accuracy.

Why this answer

Option A is correct because fine-tuning on a dedicated AI cluster keeps data within OCI and improves accuracy for the medical domain. Option C (base model on a dedicated cluster) also keeps data within OCI but is less specialized. Option B runs in a multi-tenant environment with potential isolation concerns. Option D sends data to a third-party LLM, which violates the requirement.

Multi-Select (medium)

Which TWO are benefits of using dedicated AI clusters for OCI Generative AI?

Select 2 answers

A. Automatic model updates

B. Guaranteed throughput

C. Lower cost than on-demand for all workloads

D. No need to manage scaling

E. Predictable inference latency

Answers: B, E

Throughput is reserved and not affected by other tenants.

Why this answer

Dedicated AI clusters provide predictable latency and guaranteed throughput because they are single-tenant.

MCQ (hard)

A company is using OCI Generative AI to generate code snippets and notices that the model sometimes produces code with security vulnerabilities. They have a small dataset of secure code examples. Which approach would be most effective to reduce vulnerabilities?

A. Use a different base model.

B. Fine-tune the model on the small secure code dataset.

C. Use prompt engineering with security constraints in the instruction.

D. Deploy a custom model hosted elsewhere.

Answer: C

Prompt engineering can enforce security rules without needing large datasets.

Why this answer

Option C is correct because prompt engineering allows the company to inject security constraints directly into the instruction without requiring additional training data or infrastructure. By crafting a prompt that explicitly requests secure code (e.g., 'Generate code that follows OWASP Top 10 best practices and avoids SQL injection, XSS, and buffer overflows'), the model can leverage its existing knowledge to produce safer outputs. This approach is immediate, cost-effective, and does not depend on the size or quality of the small secure code dataset.

Exam trap

The trap here is that candidates often assume fine-tuning (Option B) is always the best solution for domain-specific improvements, but they overlook the practical limitations of small datasets and the immediate effectiveness of prompt engineering for security constraints.

How to eliminate wrong answers

Option A is wrong because switching to a different base model does not guarantee reduced vulnerabilities; all general-purpose models can produce insecure code without explicit guidance, and the issue lies in the lack of security-focused constraints, not the model architecture. Option B is wrong because fine-tuning on a small dataset of secure code examples is unlikely to generalize well; the model may overfit to the limited examples and fail to address the wide variety of vulnerabilities that can appear in different contexts, and fine-tuning requires significant computational resources and expertise. Option D is wrong because deploying a custom model hosted elsewhere introduces additional complexity, cost, and latency without addressing the root cause; the problem is not about hosting location but about how the model is instructed to prioritize security.

MCQ (easy)

A company wants to use OCI Generative AI to summarize customer feedback. They need low latency and high throughput. Which configuration should they choose?

A. Serverless endpoint with fine-tuned model

B. Dedicated AI cluster with base model

C. Dedicated AI cluster with fine-tuned model

D. Serverless endpoint with base model

Answer: B

Correct: Dedicated resources ensure low latency and high throughput.

Why this answer

Dedicated AI clusters provide guaranteed compute resources (GPUs) with no multi-tenant contention, ensuring low latency and high throughput for inference workloads. Using a base model avoids the additional overhead of fine-tuning inference, which can introduce latency due to custom weight loading and optimization steps. This combination is optimal for real-time summarization of customer feedback where response time and volume are critical.

Exam trap

Oracle often tests the misconception that fine-tuned models always outperform base models for latency, when in fact fine-tuning adds inference overhead that can degrade performance for high-throughput, low-latency use cases.

How to eliminate wrong answers

Option A is wrong because serverless endpoints share resources across tenants, leading to variable latency and potential throttling under high throughput demands, which contradicts the low-latency requirement. Option C is wrong because a fine-tuned model on a dedicated cluster adds inference overhead from custom weights and may require additional pre/post-processing, increasing latency compared to a base model. Option D is wrong because serverless endpoints with a base model still suffer from multi-tenant resource contention, making them unsuitable for guaranteed low latency and high throughput.

MCQ (easy)

An organization wants to use OCI Generative AI to summarize long legal documents. They need to ensure the summary is concise and retains key information. Which model parameter should they set to control the length of the summary?

A. frequency_penalty

B. max_tokens

C. top_p

D. temperature

Answer: B

The max_tokens parameter sets the maximum number of tokens in the output.

Why this answer

The max_tokens parameter limits the number of tokens in the generated output, directly controlling summary length.

MCQ (hard)

A company is deploying OCI Generative AI for a chatbot that must answer customer queries within 500ms. They choose a dedicated AI cluster but observe 2-second latency. What is the most likely cause?

A. The endpoint is not cached

B. The cluster is configured for batch inference

C. The request includes too many tokens

D. The model is too large for the cluster

Answer: B

Batch inference mode processes requests in batches, increasing latency significantly.

Why this answer

A dedicated AI cluster in OCI Generative AI is designed for real-time inference with low latency. When the cluster is configured for batch inference, it processes requests in batches rather than individually, which introduces queuing and processing delays that can easily exceed the 500ms target. This explains the observed 2-second latency, as batch mode prioritizes throughput over per-request response time.

Exam trap

The trap here is that candidates may assume any latency issue is due to model size or token limits, but Oracle often tests the distinction between real-time and batch inference configurations in dedicated clusters.

How to eliminate wrong answers

Option A is wrong because caching is not a feature of OCI Generative AI endpoints; the latency issue stems from inference processing, not from cache misses. Option C is wrong because while excessive tokens can increase latency, the 2-second delay is more consistent with batch processing overhead than with token count alone, and the cluster should handle typical token limits within the 500ms target. Option D is wrong because the model size is fixed when the dedicated cluster is provisioned; if the model were too large, the cluster would fail to deploy or would show errors, not simply exhibit high latency.
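
A simple static-batching model shows how batch mode alone can push latency from hundreds of milliseconds into seconds. All of its assumptions are hypothetical. Requests arrive at a fixed interval, a batch runs only once it is full, and the batch run time does not depend on batch size. The sketch does not model how OCI actually schedules batch inference.

```python
def first_request_latency_ms(batch_size: int, arrival_interval_ms: float, batch_run_ms: float) -> float:
    """Latency for the first request in a batch that must fill up before it runs.

    Simplification: the batch run time is treated as constant regardless of batch size.
    """
    waiting_for_batch = (batch_size - 1) * arrival_interval_ms
    return waiting_for_batch + batch_run_ms

# Hypothetical: a request arrives every 50 ms and a batch takes 300 ms to run.
print(first_request_latency_ms(1, 50, 300))   # 300
print(first_request_latency_ms(32, 50, 300))  # 1850
```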

MCQ (hard)

A healthcare organization is using OCI Generative AI to analyze medical records. They must comply with HIPAA. They have set up a dedicated AI cluster with private endpoints. However, they are concerned about model hallucinations that could lead to incorrect medical advice. They want to minimize hallucinations while maintaining usefulness. Which approach is most effective?

A. Use a smaller model to reduce complexity.

B. Implement a retrieval-augmented generation (RAG) pipeline with a verified medical knowledge base.

C. Increase the temperature to encourage diverse outputs.

D. Reduce max tokens to force shorter responses.

Answer: B

RAG grounds generation in factual sources, reducing hallucinations.

Why this answer

Option B is correct. Implementing a retrieval-augmented generation (RAG) pipeline grounds the model in a verified medical knowledge base, significantly reducing hallucinations. Option A is wrong because smaller models may still hallucinate and may be less capable.

Option C is wrong because increasing temperature increases randomness, making hallucinations worse. Option D is wrong because reducing max tokens truncates output but does not address factual accuracy.
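
The retrieval step of a RAG pipeline can be sketched without any cloud services. This toy version uses a hypothetical three-sentence knowledge base. It substitutes word-count vectors and cosine similarity for a real embedding model and vector store. The point is the shape of the pipeline: retrieve the most relevant verified passage, then instruct the model to answer only from it. `build_grounded_prompt` is an illustrative helper.

```python
import math
from collections import Counter

# Hypothetical verified knowledge base; a real pipeline would embed curated clinical sources.
KNOWLEDGE_BASE = [
    "Metformin is a first-line medication for type 2 diabetes.",
    "Amoxicillin is a penicillin-class antibiotic.",
    "Hypertension is defined by persistently elevated blood pressure.",
]

def vectorize(text):
    """Bag-of-words vector: lowercase word counts with simple punctuation stripped."""
    return Counter(w.strip(".,?").lower() for w in text.split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_grounded_prompt(question, k=1):
    """Retrieve the k most similar passages and wrap them in a grounding instruction."""
    q = vectorize(question)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda doc: cosine(q, vectorize(doc)), reverse=True)
    context = "\n".join(ranked[:k])
    return (
        "Answer using ONLY the context below. If the context does not contain the answer, "
        "say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

print(build_grounded_prompt("What medication is first-line for type 2 diabetes?"))
```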

MCQ (hard)

Refer to the exhibit. A developer receives this error when trying to get details of a model they know exists. What is the most likely cause?

A. The region is incorrect; the model is in a different region

B. The model ID is misspelled

C. The model is in a different compartment that the developer cannot access

D. The developer does not have the 'inspect' permission on the model

Answer: C

The error explicitly mentions the compartment ID. If the model resides in a different compartment from the one targeted in the request, the user gets this error even if they have permissions in the model's actual compartment.

Why this answer

Option C is correct. The error message indicates either that the model does not exist in the specified compartment or that the user lacks permission. The model exists, but the compartment in the command ('ocid1.compartment.oc1..example') may be incorrect. The most likely cause is therefore that the model is in a different compartment that the developer cannot access.

Option B is possible but less likely if the ID was copied correctly. Option A is incorrect because the region is specified and the error is not about a region mismatch. Option D is plausible, but the error specifically ties the failure to the compartment rather than to a missing 'inspect' permission.

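MCQ (medium)

Which authentication method should be used to securely call the OCI Generative AI API from a microservice running on OCI Compute?

A. OAuth 2.0 client credentials

B. SAML 2.0 assertion

C. Instance principal

D. OCI API signing key

Answer: C

An instance principal lets the compute instance authenticate as itself, so no long-lived credentials have to be stored on the host.

Why this answer

Instance principals are the recommended way for workloads running on OCI Compute to call OCI APIs. The instance is added to a dynamic group, a policy grants that dynamic group access, and the SDK obtains short-lived credentials that are rotated automatically. An OCI API signing key (D) also works, but it requires storing a private key on the instance and rotating it manually, which is less secure. OAuth 2.0 client credentials (A) and SAML 2.0 assertions (B) are identity-federation mechanisms, not the standard way to sign OCI API requests from a compute instance.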

MCQ (medium)

Refer to the exhibit. A developer runs the command and receives the error. What is the issue?

A. The max-tokens value exceeds the allowed range.

B. The message is too short.

C. The chat-id is invalid.

D. The endpoint is incorrect.

Answer: A

The error explicitly states the valid range.

Why this answer

The max-tokens parameter is set to 600, which exceeds the allowed range of 1 to 500.
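
Client-side validation catches this error before the request is sent. The sketch hard-codes the 1 to 500 range quoted in the exhibit, but the real limit depends on the model. `validate_max_tokens` is a hypothetical helper.

```python
MAX_TOKENS_RANGE = (1, 500)  # range quoted in the exhibit's error; other models may differ

def validate_max_tokens(value: int, allowed=MAX_TOKENS_RANGE) -> int:
    """Fail fast on the client instead of waiting for the service to reject the request."""
    low, high = allowed
    if not low <= value <= high:
        raise ValueError(f"max_tokens={value} is outside the allowed range {low}-{high}")
    return value

print(validate_max_tokens(500))  # 500
```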

MCQ (easy)

Refer to the exhibit. A user in group GenAIUsers tries to use the `oci generative-ai model chat` command but gets 'not authorized'. Why?

A. The policy is not active yet.

B. The command requires managing the model.

C. The statement should be 'use' instead of 'read'.

D. The group is not in the correct compartment.

Answer: C

The verb 'use' is needed to invoke the model.

Why this answer

The policy grants 'read' permission, but the chat command requires 'use' permission.

MCQ (hard)

A developer is using the OCI Generative AI SDK to call a custom fine-tuned model. They get an HTTP 404 error. What is the most likely issue?

A. The API key is invalid

B. The compartment OCID is missing in the request

C. The region is not specified

D. The model endpoint ID is incorrect

Answer: D

An incorrect endpoint ID returns 404 Not Found.

Why this answer

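Option D is correct: an endpoint ID that does not resolve to an existing endpoint returns 404 Not Found. Option A (invalid API key) would cause a 401 Unauthorized. Option B (missing compartment OCID) would more likely be rejected as an invalid or unauthorized request than reported as a 404.

Option C (region) could also lead to a 404 if the wrong regional endpoint were called, but an incorrect endpoint ID is the more specific explanation.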
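
You can keep the status-code reasoning above as a small lookup for troubleshooting. The mapping is a simplification written for this question, not an official OCI error table. OCI also returns 404 with the code NotAuthorizedOrNotFound when a policy is missing, so a 404 can still be a permissions problem.

```python
# First suspects by HTTP status, following the reasoning in this question (simplified).
LIKELY_CAUSES = {
    400: "invalid request: malformed body or a missing or invalid parameter",
    401: "authentication failed: invalid API key or bad request signature",
    403: "request understood but not permitted",
    404: ("not found: incorrect endpoint or model ID, or the wrong regional endpoint "
          "(OCI also uses 404 NotAuthorizedOrNotFound when a policy is missing)"),
}

def diagnose(status: int) -> str:
    """Return the usual first thing to check for an HTTP status code."""
    return LIKELY_CAUSES.get(status, "unexpected status: read the service error code and message")

print(diagnose(404))
```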

Multi-Select (medium)

Which TWO factors are most important when deciding between on-demand and dedicated AI clusters for OCI GenAI?

Select 2 answers

A. Fine-tuning capability

B. Model size

C. Data residency

D. Number of concurrent requests

E. Latency requirements

Answers: D, E

Dedicated clusters are better for high concurrency due to reserved capacity.

Why this answer

The number of concurrent requests (D) is critical because dedicated AI clusters provide guaranteed throughput and predictable performance for high-volume workloads, while on-demand (shared) serving may throttle or queue requests under heavy load. Latency requirements (E) are equally important. Dedicated clusters offer consistent low-latency inference by avoiding resource contention, whereas on-demand serving can introduce variable latency due to shared infrastructure. Together, these factors determine whether a workload needs the isolation and reserved capacity of a dedicated cluster or can tolerate the elasticity and potential variability of on-demand serving.

Exam trap

Oracle often tests the misconception that fine-tuning capability or model size is the primary differentiator between on-demand serving and dedicated AI clusters. For a serving decision like this one, the choice hinges on concurrency and latency guarantees.

MCQ (easy)

A company requires a generative AI service to automatically summarize customer support transcripts. Which OCI Generative AI model is most suitable for this task?

A. Llama 3 70B

B. Cohere Embed

C. Cohere Command

D. Fine-tuned Llama 2

Answer: C

Cohere Command is designed for text generation, including summarization, and is a direct choice for this scenario.

Why this answer

Cohere Command is a large language model specifically designed for text generation tasks such as summarization, making it the most suitable choice for automatically summarizing customer support transcripts. Unlike embedding models or base Llama variants, Command is optimized for instruction-following and generating coherent, concise summaries from conversational data.

Exam trap

Oracle often tests the distinction between embedding models (Cohere Embed) and generative models (Cohere Command), leading candidates to mistakenly choose an embedding model for a text generation task like summarization.

How to eliminate wrong answers

Option A is wrong because Llama 3 70B is a general-purpose generative model that, while capable of summarization, is not specifically optimized for the summarization task in OCI Generative AI service; Cohere Command is the designated model for text generation and summarization within OCI. Option B is wrong because Cohere Embed is a text embedding model designed for semantic search and similarity tasks, not for generating summaries or any text output. Option D is wrong because Fine-tuned Llama 2, though customizable, is not a pre-built model offered by OCI Generative AI for summarization; OCI provides Cohere Command as the primary ready-to-use model for such generative tasks.

Multi-Select (hard)

Which THREE factors should be considered when choosing a model for a summarization task using OCI Generative AI?

Select 3 answers

A. Inference endpoint location.

B. Number of parameters.

C. Model training data cut-off date.

D. Context window size.

E. Supported languages.

Answers: B, D, E

More parameters generally mean better performance but higher cost.

Why this answer

Options B, D, and E are correct. Context window size (D) determines how much text the model can process at once. Number of parameters (B) affects capability and cost.

Supported languages (E) ensure the model can handle the input language. Option C is wrong because the training data cut-off date matters less for summarization, where the facts come from the input text. Option A is wrong because inference endpoint location is a deployment concern, not a model-selection criterion.
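
The context window is measured in tokens, so a long transcript may need to be split before summarizing, leaving room in each request for the summary itself. The sketch uses a rough 4-characters-per-token heuristic instead of the model's real tokenizer. The helper names and the deliberately tiny window are hypothetical.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate (about 4 characters per token for English); use the real tokenizer for exact limits."""
    return max(1, round(len(text) / chars_per_token))

def chunk_for_context(text: str, context_window: int, reserved_for_summary: int) -> list:
    """Split text into word-aligned chunks that fit the context window minus the output budget."""
    input_budget = context_window - reserved_for_summary
    if input_budget <= 0:
        raise ValueError("reserved_for_summary must be smaller than the context window")
    chunks, current = [], []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and estimate_tokens(candidate) > input_budget:
            chunks.append(" ".join(current))  # close the chunk before it overflows
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks

# Deliberately tiny hypothetical window so the example produces several chunks.
transcript = "customer reported a billing error on the latest invoice " * 40
chunks = chunk_for_context(transcript, context_window=128, reserved_for_summary=64)
print(len(chunks), max(estimate_tokens(c) for c in chunks))
```

Each chunk can then be summarized separately and the partial summaries combined in a final pass.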

MCQ (hard)

Refer to the exhibit. The API Gateway fails to invoke the Generative AI service. What is the most likely missing configuration?

A. The API Gateway does not have an internet gateway.

B. The JWT token is expired.

C. The Generative AI model is not deployed.

D. The API Gateway is not in the same VCN as the service's private endpoint.

Answer: D

Private endpoints require the caller to be in the same VCN or have a service gateway.

Why this answer

Since the service endpoint is private, the API Gateway must be in the same VCN to have network connectivity.

MCQ (medium)

A company uses OCI Generative AI service for customer support summarization. They notice the model frequently misses key details and generates hallucinations. What should they do first?

A. Adjust the prompt to be more specific and include few-shot examples.

B. Increase the temperature parameter.

C. Use a different base model.

D. Increase the max tokens.

Answer: A

Clear prompts with examples guide the model to produce accurate, relevant summaries.

Why this answer

Option A is correct because improving the prompt with specific instructions and few-shot examples reduces hallucinations and improves accuracy. Option B is wrong because increasing temperature increases randomness, making hallucinations worse. Option C is wrong because switching models is a more drastic step that may not address the root cause.

Option D is wrong because increasing max tokens does not improve accuracy.

MCQeasy

A user gets a 'Model not found' error when calling an OCI Gen AI endpoint. What is the most likely cause?

A.The model is not available in the region

B.The request format is wrong

C.The API key is invalid

D.The endpoint is not deployed

AnswerD

Correct: Model not found usually means the endpoint hasn't been created or deployed.

Why this answer

The 'Model not found' error in OCI Generative AI typically occurs when the model endpoint has not been deployed or activated in the user's tenancy. Even if the model is available in the region and the request format is correct, the endpoint must be explicitly deployed (e.g., via the OCI Console, CLI, or SDK) before inference calls can succeed. This is a common prerequisite for using dedicated AI endpoints in OCI.

Exam trap

The trap here is that candidates confuse 'model not found' with model unavailability in the region, but Oracle tests the specific distinction between a model being listed in the catalog and having a deployed endpoint ready for inference.

How to eliminate wrong answers

Option A is wrong because the model being unavailable in the region would typically result in a 'Model not supported in this region' or 'Service error' rather than a 'Model not found' error; the error message specifically indicates the endpoint is missing, not the model's regional availability. Option B is wrong because an incorrect request format (e.g., malformed JSON, missing required fields) would produce a 400 Bad Request or validation error, not a 'Model not found' error. Option C is wrong because an invalid API key would result in a 401 Unauthorized or 403 Forbidden error, not a 'Model not found' error, as authentication is checked before model resolution.

MCQmedium

A company is using OCI Generative AI for customer support chatbots. They notice that responses sometimes include offensive content. Which built-in safety feature should they configure?

A.Content moderation filters

B.Configure stop sequences

C.Set a maximum token limit

D.Adjust the temperature parameter

AnswerA

These filters detect and block offensive or harmful content in inputs and outputs.

Why this answer

Option A is correct: content moderation filters block harmful content. Option B (stop sequences) only ends generation at specific strings. Option C (max tokens) limits length.

Option D (temperature) controls randomness, not safety.

MCQeasy

A startup wants to quickly prototype a chatbot using OCI Generative AI service. They have no prior experience with OCI. They want to test different models and parameters without writing any code and within a few minutes. They also want to save prompts and compare results. Which approach should they use?

A.Create a dedicated AI cluster and use the OCI SDK.

B.Use the OCI Generative AI Playground.

C.Use OCI Data Science Notebooks with the GenAI SDK.

D.Use OCI Functions to invoke the GenAI API.

AnswerB

Playground offers immediate testing with no code and built-in history.

Why this answer

Option B is correct because the OCI Generative AI Playground provides a no-code interface for testing models, adjusting parameters, and saving prompt history. The other options require setup or code.

MCQeasy

A developer wants to integrate OCI GenAI into a Java application. Which SDK should they use?

A.OCI JavaScript SDK.

B.OCI Python SDK.

C.OCI Java SDK.

D.OCI CLI.

AnswerC

The Java SDK is designed for Java applications.

Why this answer

Option C is correct because the OCI Java SDK provides native Java support for calling OCI services, including Generative AI.

Multi-Selectmedium

Which THREE are valid ways to interact with OCI Generative AI?

Select 3 answers

A.OCI Mobile App.

B.OCI Data Science Notebooks.

C.OCI REST API.

D.OCI CLI.

E.OCI Console Playground.

AnswersC, D, E

REST API is the underlying interface for all interactions.

Why this answer

Options C, D, and E are correct: the REST API, the OCI CLI, and the OCI Console Playground are all direct interfaces to the service. The OCI Mobile App (A) and Data Science Notebooks (B) are not standard ways to invoke it.

MCQmedium

A financial firm wants to use OCI Generative AI for contract analysis. They need to reduce costs by using a smaller, specialized model. Which approach should they take?

A.Use a large base model (e.g., Cohere Command) on a serverless endpoint

B.Use a large base model on a dedicated AI cluster

C.Use a third-party LLM

D.Fine-tune a smaller base model on a dedicated AI cluster

AnswerD

Smaller fine-tuned model reduces cost while meeting specialization needs.

Why this answer

Option D is correct because fine-tuning a smaller base model on a dedicated AI cluster allows the financial firm to tailor the model specifically for contract analysis tasks, reducing computational overhead and cost compared to using a large general-purpose model. OCI Generative AI supports fine-tuning of smaller models like Cohere Command Light on dedicated AI clusters, enabling domain-specific optimization without the expense of running a large model for every inference.

Exam trap

Oracle often tests the misconception that larger models are always better for specialized tasks, but the trap here is that fine-tuning a smaller model on a dedicated AI cluster provides both cost efficiency and domain accuracy, which candidates overlook in favor of familiar large-model options.

How to eliminate wrong answers

Option A is wrong because using a large base model (e.g., Cohere Command) on a serverless endpoint incurs higher per-token costs and lacks the specialization needed for contract analysis, contradicting the requirement to reduce costs with a smaller model. Option B is wrong because deploying a large base model on a dedicated AI cluster increases infrastructure costs and still does not provide the targeted performance of a fine-tuned smaller model for contract-specific tasks. Option C is wrong because using a third-party LLM introduces data sovereignty, latency, and integration concerns, and does not leverage OCI's native fine-tuning capabilities for cost-effective specialization.

MCQmedium

A retail company uses OCI Generative AI Agents to power a product recommendation chatbot on their e-commerce website. The chatbot is integrated with a knowledge base containing product descriptions, customer reviews, and inventory data. Recently, the chatbot has started recommending out-of-stock products frequently, leading to customer frustration. The development team verified that the knowledge base is updated in real-time with inventory data. The chatbot's configuration uses a chunking strategy with a chunk size of 500 tokens and an overlap of 50 tokens. The team suspects the issue is related to how the agent retrieves information. They have access to OCI Logging and Monitoring. Which course of action should the team take first?

A.Decrease the chunk size to 250 tokens to make chunks more specific.

B.Reduce the temperature parameter of the model to 0.2 to reduce hallucinations.

C.Enable auto-scaling on the AI cluster to improve response speed.

D.Increase the chunk overlap from 50 to 150 tokens to ensure inventory status is captured in multiple chunks.

AnswerD

Greater overlap makes it more likely that inventory status near a chunk boundary is retrieved together with its product, improving the relevance of retrieved context.

Why this answer

The core issue is that the chatbot retrieves chunks that contain product descriptions but may miss the inventory status because the chunking strategy does not reliably include both pieces of information together. Increasing the chunk overlap from 50 to 150 tokens ensures that inventory data, which may be at the boundary of a chunk, is captured in multiple overlapping chunks, thereby increasing the likelihood that the retrieval step returns a chunk containing both the product and its current stock level. This directly addresses the retrieval gap without altering model behavior or infrastructure.

Exam trap

Oracle often tests the misconception that retrieval issues are always solved by adjusting model parameters (like temperature) or infrastructure scaling, when the real fix lies in tuning the chunking strategy to ensure critical metadata is not lost at chunk boundaries.

How to eliminate wrong answers

Option A is wrong because decreasing chunk size to 250 tokens would make chunks more specific but would also increase the number of chunks and the risk that inventory status is split across even more chunks, potentially worsening the problem. Option B is wrong because reducing the temperature parameter reduces randomness in generation but does not affect how the agent retrieves information from the knowledge base; the issue is retrieval, not hallucination. Option C is wrong because enabling auto-scaling improves response speed and throughput but does not change the content or structure of the chunks being retrieved, so it cannot fix the missing inventory data.
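
A minimal sketch of fixed-size chunking with overlap, to make the boundary effect concrete. It treats tokens as list items (an assumption; the chunker inside OCI Generative AI Agents tokenizes and splits text its own way), and the function name chunk_tokens is hypothetical.

```python
def chunk_tokens(tokens, chunk_size=500, overlap=50):
    """Split a token sequence into fixed-size chunks that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not tokens:
        return []
    step = chunk_size - overlap  # how far each new chunk starts after the previous one
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last chunk reaches the end of the sequence
        start += step
    return chunks
```

With 1,000 tokens, a chunk size of 500, and an overlap of 50, token 440 lands in only one chunk; raising the overlap to 150 places it in two, which is the effect option D relies on. In general, larger overlap produces more chunks and duplicates more text in the index.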

Multi-Selectmedium

Which THREE factors should be considered when choosing a base model for OCI Gen AI?

Select 3 answers

A.Availability in the desired region

B.Cost per token

C.Supported languages

D.Model size

E.Training data format

AnswersA, B, D

Model must be supported in the region where you deploy.

Why this answer

Option A is correct because OCI Gen AI models are deployed in specific regions, and availability varies by region due to data residency requirements and infrastructure placement. Before selecting a base model, you must verify that the model is available in your desired OCI region to avoid deployment failures or latency issues.

Exam trap

Oracle often tests whether candidates weigh deployment constraints as well as capability. Model size (D) matters because it drives both quality and cost, but candidates who focus only on size or capability tend to overlook regional availability (A) and cost per token (B), which determine whether the model can be used in the target region at all and what each call will cost.

MCQmedium

A company uses OCI Generative AI Service to build a chatbot for customer support. They notice that the model sometimes generates inappropriate responses. What is the MOST effective way to mitigate this without retraining the model?

A.Fine-tune the model with curated safe examples

B.Configure system instructions to define acceptable behavior

C.Reduce the temperature parameter to 0

D.Use the moderation API to filter responses

AnswerB

System instructions constrain the model's output at inference time without retraining.

Why this answer

Configuring system instructions is the most effective approach because it allows you to define the model's behavior and constraints at inference time without modifying the underlying model weights. In OCI Generative AI Service, system instructions act as a persistent prompt that guides the model's responses, enabling you to explicitly prohibit inappropriate content and enforce safety guidelines. This is a non-invasive, immediate mitigation that does not require the time, cost, or data preparation associated with retraining or fine-tuning.

Exam trap

Oracle often tests the distinction between inference-time controls (like system instructions) and training-time modifications (like fine-tuning), trapping candidates who assume that only retraining can fix behavioral issues, when in fact prompt-level constraints are the fastest and most practical solution for immediate mitigation.

How to eliminate wrong answers

Option A is wrong because fine-tuning requires retraining the model with curated datasets, which is time-consuming, resource-intensive, and contradicts the question's constraint of 'without retraining the model.' Option C is wrong because reducing the temperature to 0 makes the model deterministic and less creative, but it does not prevent inappropriate responses; it only reduces randomness, not the likelihood of generating harmful content based on learned patterns. Option D is wrong because filtering responses after they are generated catches some inappropriate output but does not change how the model behaves in the first place; the question asks for the most effective way to steer the model itself at inference time, which is what system instructions do.

Multi-Selecthard

Which TWO steps are necessary to deploy a fine-tuned model on a dedicated AI cluster?

Select 2 answers

A.Create the dedicated AI cluster

B.Set up a VCN with public subnet

C.Upload the model artifacts to OCI Object Storage

D.Obtain a third-party license

E.Create a model deployment endpoint

AnswersA, E

The cluster must exist to host the model.

Why this answer

Options A and E are necessary: you must create the dedicated AI cluster and then create a model deployment endpoint on that cluster. Uploading artifacts (C) is typically handled automatically; VCN setup (B) may be needed in some designs but is not always required; a third-party license (D) is not required.

MCQmedium

A data scientist receives an error when calling the embed_text API: "InvalidRequest: input too long". What is the most likely cause and solution?

A.The model specified is not supported for embeddings; use a different model.

B.The input text exceeds the maximum token limit for the model; truncate the input.

C.The API request rate exceeds the tenancy limit; reduce the request rate.

D.The API key is invalid or expired; regenerate the key.

AnswerB

Embedding models have a fixed maximum input length.

Why this answer

Option B is correct because embedding models accept a maximum number of input tokens (for example, 512); truncating or splitting the input resolves the error. Option C is incorrect because exceeding the rate limit returns a 429 status. Option D is incorrect because an invalid API key causes an authentication error and has nothing to do with input length.

Option A is incorrect because an unsupported or unavailable model returns a model-not-found error.
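
A minimal client-side sketch of the truncation fix. It approximates tokens as whitespace-separated words (an assumption; real tokenizers split text differently, so leave headroom), uses the example limit of 512 from the explanation rather than a documented OCI value, and the function name is hypothetical. The service may also offer a server-side truncation setting; check the API reference for your model.

```python
def truncate_to_token_limit(text, max_tokens=512, keep="start"):
    """Trim text so its approximate token count fits max_tokens.

    keep="start" keeps the beginning of the text; keep="end" keeps the end.
    """
    words = text.split()  # crude token estimate: one word ~ one token
    if len(words) <= max_tokens:
        return text  # already within the limit; return unchanged
    if keep == "start":
        kept = words[:max_tokens]
    elif keep == "end":
        kept = words[-max_tokens:]
    else:
        raise ValueError("keep must be 'start' or 'end'")
    return " ".join(kept)
```

Truncation discards content; when the whole document matters for search, splitting it into several chunks and embedding each one is the usual alternative.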

Multi-Selecthard

Which THREE factors should be considered when choosing between fine-tuning a model and using a pre-trained model with prompt engineering? (Select three.)

Select 3 answers

A.Required response time

B.Size of available dataset

C.Internet connectivity

D.Available budget for compute resources

E.Need for domain-specific terminology

AnswersB, D, E

Fine-tuning requires a sufficiently large dataset; prompt engineering can work with few examples.

Why this answer

Option B is correct because the size of the available dataset is a critical factor: fine-tuning requires a sufficiently large, labeled dataset (typically thousands of examples) to adjust model weights effectively, while prompt engineering can work with zero or few examples. If the dataset is too small, fine-tuning risks overfitting and poor generalization, making prompt engineering the safer choice.

Exam trap

Oracle often tests the misconception that response time or internet connectivity are decisive factors, when in reality the core trade-off is between data availability and the need for deep domain adaptation versus lightweight, zero-shot customization.

MCQmedium

Refer to the exhibit. The user requested a long story but the response is cut short. What is the most likely cause?

A.The model content filter blocked part of the output.

B.There was a network error during inference.

C.The max_tokens parameter is too low for the requested length.

D.The model is not capable of generating long stories.

AnswerC

max_tokens=100 restricts output length; finish_reason 'length' confirms this.

Why this answer

A finish_reason of 'length' means generation stopped because the output reached the max_tokens limit, not because the story was complete. Raising max_tokens allows a longer response.
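
A minimal sketch of reacting to a 'length' finish reason by retrying with a larger max_tokens. Here generate is a hypothetical callable returning (text, finish_reason), and 'length' is the label shown in the exhibit; real SDK responses may use a different field name or value.

```python
def generate_until_complete(generate, prompt, max_tokens=100, cap=1600):
    """Double max_tokens while the output is cut off by the limit, up to `cap`.

    Returns (text, finish_reason, max_tokens_used).
    """
    while True:
        text, finish_reason = generate(prompt, max_tokens)
        if finish_reason != "length" or max_tokens >= cap:
            return text, finish_reason, max_tokens
        max_tokens = min(cap, max_tokens * 2)  # grow the budget, never past the cap
```

Regenerating from scratch spends the earlier tokens again; the cap keeps a single request from growing without bound.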

MCQeasy

A developer wants to generate text embeddings using OCI Generative AI. Which model endpoint should they call?

A.POST /v1/generate

B.POST /v1/summarize

C.POST /v1/embed

D.POST /v1/chat

AnswerC

The embed endpoint returns vector embeddings for input text.

Why this answer

Option C is correct: The embed endpoint is for generating embeddings. Option A (generate) is for text generation. Option B (summarize) is for summarization.

Option D (chat) is for conversation.
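
Once the embed endpoint returns vectors, a common next step is ranking documents by cosine similarity to a query vector. A minimal sketch, using toy two-dimensional vectors in place of real embeddings (which have hundreds or thousands of dimensions); the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
```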

Multi-Selecteasy

Which THREE OCI Generative AI service features help in controlling the cost of API calls? (Select three.)

Select 3 answers

A.Using a smaller model

B.Increasing temperature

C.Using stop sequences

D.Setting max_tokens limit

E.Enabling response streaming

AnswersA, C, D

Smaller models have lower per-token pricing.

Why this answer

Setting max_tokens limits output length, using a smaller model reduces cost per token, and using stop sequences can end generation early to save tokens.
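
Because output can never exceed max_tokens, max_tokens puts a hard ceiling on the output side of each call's cost. A minimal sketch of that bound; the per-1,000-token prices are placeholders, not OCI list prices, and the function name is hypothetical.

```python
def worst_case_cost(input_tokens, max_tokens, input_price_per_1k, output_price_per_1k):
    """Upper bound on one call's cost: output can never exceed max_tokens.

    Prices are placeholders, not OCI list prices.
    """
    input_cost = input_tokens / 1000 * input_price_per_1k
    output_cost = max_tokens / 1000 * output_price_per_1k
    return input_cost + output_cost
```

Halving max_tokens halves the output term of the bound, and a model with a lower per-token price lowers both terms; stop sequences do not change the bound but can make actual output shorter than it.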

Multi-Selectmedium

Which TWO configurations are required to use a custom fine-tuned model on OCI Gen AI?

Select 2 answers

A.A security list

B.A dedicated AI cluster

C.Training data in Object Storage

D.An API key

E.A serverless endpoint

AnswersB, C

Required for fine-tuning and hosting custom models.

Why this answer

Options B and C are required: a dedicated AI cluster is needed for fine-tuning and hosting, and the training data must be stored in Object Storage. A serverless endpoint (E) is optional, an API key (D) is needed for any access and is not specific to custom models, and a security list (A) is general network configuration.

MCQhard

An enterprise is fine-tuning a Cohere model using OCI Generative AI for a domain-specific task. After training, the model shows high accuracy on validation data but poor performance on unseen test data. What is the most likely cause?

A.The training dataset was too small

B.The number of training epochs was too low

C.The model overfitted to training data

D.The learning rate was set too high

AnswerC

High validation accuracy but poor test accuracy is classic overfitting.

Why this answer

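Option C is correct: overfitting occurs when the model learns patterns specific to the data it was tuned against instead of patterns that generalize, so it scores well on familiar data but poorly on unseen data. Option D (learning rate too high) can destabilize training, but it usually hurts performance across the board rather than producing this gap. Option A (small dataset) makes overfitting more likely, but it is a contributing factor rather than the diagnosis the symptom describes.

Option B (too few epochs) leads to underfitting, which would show up as weak validation accuracy as well; it is too many epochs that drives overfitting.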

Multi-Selecthard

Which THREE factors are important when designing a multi-turn conversational agent using OCI Generative AI Agents?

Select 3 answers

A.Always generate the longest possible response to be thorough.

B.Manage the context window size to avoid truncating important earlier messages.

C.Implement guardrails to detect and filter sensitive topics or harmful intents.

D.Disable logging to reduce latency and cost.

E.Enable session management to maintain conversation history across turns.

AnswersB, C, E

If the context window is too small, the agent may lose track of earlier parts of the conversation.

Why this answer

Managing the context window size is critical because OCI Generative AI Agents have a fixed token limit for the conversation history. If the context window is exceeded, the agent truncates the oldest messages, which can remove essential context from earlier turns, leading to incoherent or incorrect responses. Proper management ensures that the most relevant history is retained without exceeding the model's maximum input length.

Exam trap

Oracle often tests the misconception that longer responses are better for thoroughness, when in fact they degrade performance and user experience, and that disabling logging is a harmless optimization, whereas it removes critical observability and debugging capabilities.
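
A minimal sketch of the context-window management described in this answer: keep the system message and drop the oldest turns until the history fits a token budget. The message-dict shape, the word-count token estimate, and the function name are assumptions for illustration, not the Agents service's internal behavior.

```python
def trim_history(messages, max_tokens, count_tokens=lambda m: len(m["content"].split())):
    """Keep the system message(s) plus as many of the most recent turns as fit.

    `messages` is a list of {"role": ..., "content": ...} dicts (hypothetical shape).
    """
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(count_tokens(m) for m in system)
    kept = []
    for message in reversed(turns):  # walk from the newest turn backwards
        cost = count_tokens(message)
        if cost > budget:
            break  # this turn and everything older no longer fits
        kept.append(message)
        budget -= cost
    return system + list(reversed(kept))  # restore chronological order
```

Dropping whole turns is the simplest policy; summarizing older turns into a short note is a common alternative when early context must survive.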

MCQmedium

A user has attached an IAM policy granting access to the generative-ai-family resource type, but API calls to the Generative AI service return a 403 Forbidden error. What is the most likely cause?

A.The service is not enabled in the tenancy.

B.The policy does not include the 'use' verb for the resource type.

C.The user has not created a Dedicated AI Cluster.

D.The policy does not specify the correct compartment or resource.

AnswerD

Policies must include a target compartment; otherwise, access defaults to denial.

Why this answer

Option D is correct because OCI Generative AI resources are compartment-scoped: if the policy does not target the compartment that holds the resources, access is denied. Option B is less likely because the policy already grants access to the resource type, so the scenario points to a scoping problem rather than a missing verb. Option C is incorrect because a Dedicated AI Cluster is not a prerequisite for managed inference.

Option A is incorrect because the policy already covers the service's resource family, which points to an authorization problem rather than a disabled service.

MCQeasy

A startup is building a chatbot for customer support using OCI Generative AI Service. The chatbot needs to answer queries about product features based on a knowledge base of product documentation. Which configuration is most appropriate for this use case?

A.Use the Summarization task type to generate concise answers from the documentation.

B.Use a Cohere Command model with the knowledge base as context in a prompt, and enable retrieval-augmented generation (RAG) via OCI Generative AI Agents.

C.Fine-tune a Llama 2 70B model on the product documentation to create a custom model.

D.Use the Code Generation model to produce SQL queries that retrieve answers from a database.

AnswerB

This approach uses a foundation model with RAG to ground responses in the knowledge base, which is ideal for question answering.

Why this answer

Option B is correct because OCI Generative AI Agents with retrieval-augmented generation (RAG) allows the chatbot to dynamically retrieve relevant chunks from the product documentation knowledge base and inject them as context into a Cohere Command model prompt. This approach ensures answers are grounded in the latest documentation without requiring fine-tuning, and it scales efficiently as the knowledge base grows.

Exam trap

Oracle often tests the distinction between task-specific models (summarization, code generation) and the RAG architecture, leading candidates to mistakenly choose a simpler task type like summarization instead of recognizing the need for retrieval-augmented generation.

How to eliminate wrong answers

Option A is wrong because the Summarization task type is designed to condense a given text into a shorter summary, not to answer specific queries by retrieving and reasoning over a knowledge base; it lacks the retrieval component needed for question answering. Option C is wrong because fine-tuning a Llama 2 70B model on product documentation would be computationally expensive, requires significant labeled data, and does not easily accommodate updates to the documentation without retraining, making it impractical for a dynamic knowledge base. Option D is wrong because Code Generation models are specialized for generating code (e.g., SQL, Python), not for answering natural language questions from a knowledge base; using SQL queries would require a structured database schema, which is not the case for unstructured product documentation.
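
A minimal sketch of the retrieve-then-prompt pattern behind RAG. Word-overlap scoring stands in for vector search, the prompt wording is only an example, and the function names are hypothetical; this is not how OCI Generative AI Agents implement retrieval internally.

```python
def retrieve(query, chunks, k=2):
    """Rank chunks by word overlap with the query (a stand-in for vector search)."""
    query_words = set(query.lower().split())
    scored = sorted(chunks, key=lambda c: len(query_words & set(c.lower().split())), reverse=True)
    return scored[:k]


def build_grounded_prompt(query, chunks, k=2):
    """Inject the top-k retrieved chunks as context ahead of the user's question."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return ("Answer the question using only the context below. "
            "If the answer is not in the context, say you don't know.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")
```

Because the context is rebuilt on every request, updated documentation is picked up without retraining, which is the advantage option B has over fine-tuning.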

MCQeasy

A retail company wants to generate product descriptions from attribute data. They have no prior AI experience. Which approach is most appropriate?

A.Use the Cohere Command model with carefully crafted prompts.

B.Train a custom model from scratch.

C.Fine-tune a model on a synthetic dataset.

D.Use the Cohere Embed model to generate embeddings and then decode.

AnswerA

Cohere Command can generate descriptions directly with simple prompts, requiring no additional training.

Why this answer

Using a pre-trained model with prompt engineering is the fastest and most cost-effective way to start.

MCQmedium

A developer is using OCI Generative AI Service to generate product descriptions. The outputs are often too generic and lack brand-specific tone. The developer has a small set of 20 high-quality example descriptions. What is the most efficient approach to improve output quality?

A.Fine-tune a base model on the 20 examples.

B.Use few-shot prompting by including the 20 examples in the prompt.

C.Use a more detailed system prompt describing the brand tone.

D.Use chain-of-thought prompting to guide the model step by step.

AnswerB

Few-shot prompting leverages examples without retraining, ideal for small datasets.

Why this answer

Option B is correct because few-shot prompting is the most efficient approach when you have a small set of high-quality examples (20 in this case). It allows the model to infer the desired tone and style directly from the provided examples without requiring any training or fine-tuning, which would be inefficient and potentially ineffective with such a small dataset. In OCI Generative AI Service, few-shot prompting leverages the model's in-context learning capability to adapt its output to the brand-specific tone.

Exam trap

Oracle often tests the misconception that fine-tuning is always the best approach for customization, but candidates overlook the fact that with very small datasets (like 20 examples), few-shot prompting is more practical and efficient than fine-tuning.

How to eliminate wrong answers

Option A is wrong because fine-tuning a base model on only 20 examples is inefficient and unlikely to produce reliable results; fine-tuning typically requires hundreds to thousands of high-quality examples to avoid overfitting and to meaningfully adjust model weights. Option C is wrong because a more detailed system prompt describing the brand tone, while helpful, is less effective than providing concrete examples; the model may still produce generic outputs without specific stylistic references. Option D is wrong because chain-of-thought prompting is designed to improve reasoning and step-by-step logic, not to adapt tone or style; it does not address the core issue of generating brand-specific product descriptions.
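
A minimal sketch of assembling a few-shot prompt from example pairs. The Input:/Output: layout and the function name are illustrative choices, not an OCI format.

```python
def build_few_shot_prompt(instruction, examples, new_input):
    """Assemble an instruction, worked input/output pairs, and the new input."""
    parts = [instruction.strip(), ""]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}")
        parts.append(f"Output: {example_output}")
        parts.append("")  # blank line between examples
    parts.append(f"Input: {new_input}")
    parts.append("Output:")  # the model continues from here
    return "\n".join(parts)
```

All 20 examples count against the model's context window, so long examples may need trimming.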

MCQhard

A financial services firm needs to extract named entities from legal contracts using OCI Generative AI. They require high accuracy and must handle domain-specific terminology. Which approach is most effective?

A.Fine-tune a Cohere Command model using a dataset of annotated legal contracts.

B.Use the base Cohere Command model with zero-shot prompting for entity extraction.

C.Use the Cohere Chat model with system prompts describing the entities.

D.Use the Cohere Embed model to generate embeddings and then train a separate classifier.

AnswerA

Fine-tuning adapts the model to the domain and entity types.

Why this answer

Option A is correct because fine-tuning a Cohere Command model on an annotated dataset of legal contracts yields the best accuracy for domain-specific entities. Option B is incorrect because zero-shot extraction is less accurate for specialized terms. Option C is incorrect because system prompts are still prompt engineering, which alone struggles to reach high accuracy on complex, domain-specific entity extraction.

Option D is incorrect because a generic embedding model would require training a separate classifier and may underperform.

MCQeasy

Which OCI Generative AI capability allows you to provide example input-output pairs to guide the model's behavior without fine-tuning?

A.Few-shot learning

B.Reinforcement Learning from Human Feedback (RLHF)

C.Prompt engineering

D.Fine-tuning

AnswerA

In-context learning with a few examples steers the model without retraining.

Why this answer

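Option A is correct: few-shot learning supplies example input-output pairs in the prompt. Option B (RLHF) trains the model on human preference feedback. Option C (prompt engineering) is the broader discipline that few-shot learning belongs to, so it is less precise.

Option D (fine-tuning) retrains the model's weights, which the question rules out.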

Multi-Selecthard

Which THREE parameters can be adjusted to reduce repetition in generated text? (Choose three.)

Select 3 answers

A.presence_penalty

B.top_k

C.max_tokens

D.frequency_penalty

E.temperature

AnswersA, B, D

Penalizes tokens that have appeared at all.

Why this answer

Options A, B, and D are the keyed answers. presence_penalty (A) lowers the score of any token that has already appeared, and frequency_penalty (D) lowers it in proportion to how often the token has appeared; both directly discourage repetition. top_k (B) limits sampling to the k most likely tokens, so tuning it changes how diverse the candidate pool is.

Option C is wrong because max_tokens only caps output length. Option E is wrong because temperature controls overall randomness rather than targeting tokens that have already been used.
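
A minimal sketch of how presence and frequency penalties adjust scores. It follows the widely used formulation in which each penalty is subtracted from a token's logit before sampling (the form OpenAI documents); it is an assumption that OCI's models apply the penalties in exactly this way. Logits are a plain dict for readability, and the function name is hypothetical.

```python
from collections import Counter


def apply_penalties(logits, generated_tokens, presence_penalty=0.0, frequency_penalty=0.0):
    """Return a copy of `logits` with repetition penalties applied.

    frequency_penalty scales with how many times a token was generated;
    presence_penalty is a flat deduction for any token generated at least once.
    """
    counts = Counter(generated_tokens)
    adjusted = dict(logits)  # leave the caller's logits untouched
    for token, count in counts.items():
        if token in adjusted:
            adjusted[token] -= count * frequency_penalty + presence_penalty
    return adjusted
```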

Multi-Selectmedium

Which TWO measures can help reduce the risk of generating toxic or unsafe content when using OCI Generative AI Service?

Select 2 answers

A.Use few-shot prompting with examples that demonstrate safe and appropriate responses.

B.Disable model monitoring and logging to reduce overhead.

C.Increase the temperature parameter to make output more deterministic.

D.Fine-tune the model on a large dataset without any safety filtering.

E.Enable the built-in content filtering features provided by OCI Generative AI Service.

AnswersA, E

Safe examples help steer the model toward desired behavior.

Why this answer

Few-shot prompting provides the model with explicit examples of safe, appropriate responses, which helps steer the model's behavior toward desired outputs and reduces the likelihood of generating toxic or unsafe content. This technique leverages in-context learning to align the model's responses with the provided examples, making it a practical measure for content safety.

Exam trap

Oracle often tests the misconception that increasing temperature or disabling monitoring improves safety, when in fact these actions increase randomness and reduce oversight, respectively.

MCQmedium

A company notices that their OCI GenAI managed serving endpoint returns incomplete responses for long prompts. What is the most likely cause?

A.The model's context window is exceeded.

B.The max tokens parameter is set too high.

C.The top_p value is too low.

D.The temperature is set too high.

AnswerA

Models have a maximum input token limit; exceeding it truncates the input or output.

Why this answer

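Option A is correct because the context window is shared by the prompt and the generated output: a long prompt leaves little room for the response, or exceeds the limit outright, so the output is cut short. Option B is wrong because a high max_tokens allows longer output rather than truncating it. Options C and D (top_p and temperature) affect sampling, not response length.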
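
A minimal sketch of the budget arithmetic, assuming (as is common for decoder-only LLMs) that prompt and output share one context window. The function name is hypothetical, and the 4,096-token window in the usage note is illustrative, not an OCI limit.

```python
def output_budget(prompt_tokens, context_window, requested_max_tokens):
    """Tokens actually available for the response.

    The response is limited both by max_tokens and by whatever room the
    prompt leaves in the shared context window.
    """
    if prompt_tokens >= context_window:
        raise ValueError("prompt alone fills the context window; shorten or chunk it")
    return min(requested_max_tokens, context_window - prompt_tokens)
```

With a 4,096-token window, a 3,900-token prompt leaves only 196 tokens for output even if max_tokens is 600.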

MCQhard

A financial services company has deployed a custom fine-tuned model using OCI Generative AI service on a dedicated AI cluster for automated report generation. They use a Python application that sends prompts via the OCI SDK. Recently, they started seeing 429 Too Many Requests errors intermittently. The dedicated cluster has 2 replicas and the application is making about 100 requests per second. The cluster's documented throughput is 50 requests per second per replica. The company has not set up any throttling limits. What is the most likely cause of the 429 errors?

A.The application is exceeding the cluster's replica capacity.

B.The OCI SDK version is outdated.

C.The API calls are not authenticated.

D.The model's context window is too small for the prompts.

AnswerA

The cluster is at maximum throughput; bursts push over the limit.

Why this answer

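Option A is correct because the application's request rate (about 100 requests per second) equals the cluster's total documented capacity (2 replicas × 50 requests per second = 100). With no headroom, any burst exceeds capacity and triggers intermittent 429 errors. The other options would produce different errors, such as 401 for unauthenticated calls or a 400-level validation error for oversized prompts.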
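
When a cluster runs at capacity, clients usually add headroom (more replicas) and also retry 429 responses with exponential backoff. A minimal sketch with full jitter; RateLimitError is a hypothetical stand-in for however your SDK surfaces a 429, and the delay values are illustrative.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical exception standing in for an HTTP 429 response."""


def call_with_backoff(fn, max_retries=5, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff and full jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))  # full jitter spreads out retry bursts
```

Backoff smooths bursts but does not add capacity; at a sustained 100 requests per second, the cluster still needs more replicas.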

MCQeasy

Which OCI Generative AI parameter controls the diversity of generated text by increasing the probability of less likely tokens?

A.frequency_penalty

B.top_k

C.temperature

D.top_p

AnswerC

Higher temperature increases randomness by scaling logits before softmax.

Why this answer

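Option C is correct: temperature divides the logits before the softmax, so raising it flattens the probability distribution and gives less likely tokens a larger share. Option A (frequency_penalty) reduces repetition, option B (top_k) limits sampling to the K most likely tokens, and option D (top_p) is nucleus sampling, which keeps the smallest set of tokens whose cumulative probability reaches p.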
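
A minimal Python sketch of temperature-scaled softmax, the mechanism described above. The logits are hypothetical. A temperature of 0 is usually treated as greedy decoding and handled separately, so this function rejects it.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


cold = softmax_with_temperature([2.0, 1.0, 0.1], 0.5)
hot = softmax_with_temperature([2.0, 1.0, 0.1], 1.5)
# The least likely token gets more probability mass at the higher temperature.
print([round(p, 3) for p in cold])
print([round(p, 3) for p in hot])
```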

MCQeasy

Refer to the exhibit. A user runs this command and gets an error: 'InvalidParameter: unit-shape 'GPU.1' is not supported in this region. Supported shapes: GPU.2, GPU.4'. What should the user do?

A.Use unit-shape GPU.2

B.Increase unit count

C.Change compartment

D.Use a different region

AnswerA

GPU.2 is explicitly supported in the error message.

Why this answer

Option A is correct because the error states that GPU.1 is unsupported and lists GPU.2 as supported. Using a different region (D) might work but is unnecessary; increasing the unit count (B) or changing the compartment (C) will not resolve the shape issue.

MCQmedium

A company is fine-tuning a Llama model on OCI with dedicated AI cluster. They want to use their own training data stored in Oracle Object Storage. What must they do to ensure the fine-tuning job can access the data?

A.Upload the data to the dedicated cluster's local storage.

B.Use OCI Data Flow to transfer data.

C.Configure a service principal for the cluster to read the bucket.

D.Create a pre-authenticated request for the bucket.

AnswerC

A service principal grants the cluster permissions to access resources like Object Storage.

Why this answer

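Option C is correct because the dedicated AI cluster needs a service principal (resource principal) with permissions to read the Object Storage bucket. The other options do not provide controlled access: OCI Data Flow is a data processing service, not an access-control mechanism; a pre-authenticated request grants access to anyone holding the URL rather than through IAM policy; and copying data onto the cluster's local storage is not how fine-tuning jobs are pointed at training data.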

MCQeasy

A developer is using OCI Generative AI service's CLI to generate text but gets a 'Rate limit exceeded' error. What is the most likely cause?

A.The number of requests per minute exceeds the allowed quota.

B.The API key is invalid.

C.The model is not available in that region.

D.The input text is too long.

AnswerA

Rate limiting is enforced to control throughput.

Why this answer

Rate limit exceeded indicates the request quota per minute has been reached.
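
One way to stay under a per-minute quota is to throttle on the client. Below is a minimal token-bucket sketch in Python; the class is a generic pattern, not part of the OCI CLI or SDK, and the quota numbers are hypothetical (check the limits that apply to your tenancy and region). A quota of 60 requests per minute corresponds to `rate_per_sec=1.0`.

```python
import time
from typing import Callable


class TokenBucket:
    """Client-side limiter: allows bursts up to `capacity`, refills at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock  # injectable so the limiter can be tested with a fake clock
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; return False instead of sending a request."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

When `try_acquire()` returns False, the caller waits, queues, or sheds the request instead of sending it and receiving a rate-limit error.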

Multi-Selectmedium

Which TWO configuration steps are required to enable OCI Generative AI service in a tenancy? (Select two.)

Select 2 answers

A.Enable the generative AI service in the console.

B.Set up a VCN with service gateway.

C.Create a dynamic group.

D.Subscribe to the service in the tenancy.

E.Create a policy to allow access.

AnswersD, E

Subscription enables the service for the tenancy.

Why this answer

You must subscribe to the service in the tenancy and then create a policy to allow access.

MCQmedium

A healthcare startup uses OCI Generative AI to automatically generate patient summary reports from clinical notes. They use the Cohere command model (command-r-plus) with default parameters. Over the past week, the team has noticed two issues: (1) the summaries occasionally contain medical inaccuracies, such as incorrect drug dosages or misinterpreted lab results, and (2) the response time has increased from an average of 2 seconds to over 10 seconds. The application has a high volume of concurrent requests, and the startup has already increased the max tokens to 4096 and set temperature to 0.1. The model appears to perform well on general language tasks but struggles with specialized medical terminology. The team is looking for a long-term solution that balances accuracy, latency, and cost. What should they do?

A.Switch to a larger base model, such as Llama 2 70B, to improve knowledge of medical terms

B.Fine-tune the Cohere command model using a curated dataset of medical notes and correct summaries

C.Implement a caching layer to store and reuse responses for identical queries

D.Reduce the max tokens parameter from 4096 to 1024 to decrease response time

AnswerB

Fine-tuning adapts the model to the medical domain, improving accuracy for the specific use case. This also reduces the need for high token counts, indirectly improving latency.

Why this answer

Option B is the best course of action. Fine-tuning the model on a curated medical dataset will improve accuracy for domain-specific terminology and context. This addresses the root cause of inaccuracies.

Fine-tuning takes time up front, but it can also help latency in the long run, mainly because a model adapted to the task needs shorter prompts (fewer instructions and in-context examples) to produce the desired summary. Option D (reducing max tokens) might improve latency but won't fix inaccuracies. Option A (switching to a larger model) may increase latency and cost without guaranteeing better handling of medical terms.

Option C (caching) does not address inaccurate responses for new queries.
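
Curating the dataset is most of the work. The sketch below serializes reviewed prompt/completion pairs as JSONL, assuming the one-JSON-object-per-line `prompt`/`completion` layout used for OCI Generative AI fine-tuning datasets; confirm the exact schema for your base model before uploading. The function name and sample record are hypothetical.

```python
import json
from typing import Iterable, Tuple


def to_finetune_jsonl(pairs: Iterable[Tuple[str, str]]) -> str:
    """Serialize (prompt, completion) pairs as one JSON object per line (hypothetical helper)."""
    lines = []
    for prompt, completion in pairs:
        # Reject empty records early; they add nothing to training.
        if not prompt.strip() or not completion.strip():
            raise ValueError("every record needs a non-empty prompt and completion")
        lines.append(json.dumps({"prompt": prompt, "completion": completion}, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


# Hypothetical, de-identified example record.
sample = to_finetune_jsonl([
    ("Summarize the clinical note: BP 128/82, metformin 500 mg twice daily.",
     "Blood pressure 128/82. Continues metformin 500 mg twice daily."),
])
print(sample, end="")
```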

MCQeasy

A developer wants to use the OCI Generative AI service to generate text using a Cohere model. Which SDK class should be used for inference calls?

A.GenerativeAiInferenceClient

B.CustomModelClient

C.AiInferenceClient

D.GenerativeAiClient

AnswerA

This client provides methods like generate_text and embed_text for inference.

Why this answer

Option A is correct because GenerativeAiInferenceClient is the dedicated client for inference operations in the OCI Python SDK. Option D is incorrect because GenerativeAiClient manages resources such as models, endpoints, and dedicated AI clusters, not inference. Option C is incorrect because it is a generic name, not an OCI Generative AI SDK class.

Option B is incorrect because it is for managing custom models, not for inference.

Multi-Selectmedium

Which THREE of the following are supported capabilities of OCI Generative AI Service?

Select 3 answers

A.Text summarization

B.Sentiment analysis

C.Image generation

D.Question answering

E.Code generation

AnswersA, D, E

Summarization is a core capability.

Why this answer

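Option A is correct because OCI Generative AI Service includes a dedicated text summarization capability that uses large language models (LLMs) to generate concise summaries from longer documents. This feature is part of the service's core generative AI offerings, supporting use cases like meeting notes summarization and document abstraction. Options D and E are correct for the same reason: question answering and code generation are text-generation tasks handled by the service's chat and generation models. Sentiment analysis (B) and image generation (C) are not capabilities of this service.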

Exam trap

Oracle often tests the distinction between OCI Generative AI Service (text generation only) and other OCI AI services (e.g., AI Language for sentiment analysis, Vision for image tasks), causing candidates to mistakenly attribute all AI capabilities to the generative service.

MCQmedium

Users report that inference requests to the OCI Generative AI service are taking longer than expected. The application uses the on-demand endpoint. What is the most likely cause of the increased latency?

A.The inference model is not fine-tuned for the use case.

B.The on-demand endpoint experiences shared resource contention.

C.The selected model is too large for the use case.

D.The API request timeout is set too low.

AnswerB

On-demand endpoints are multi-tenant; high concurrent usage can cause latency spikes.

Why this answer

On-demand endpoints share resources; during peak usage, resource contention increases latency. Dedicated AI clusters provide predictable performance.

MCQmedium

A company wants to use OCI Generative AI service to automatically generate product descriptions for an e-commerce catalog. They have 10,000 products. What is the best approach to ensure high-quality, consistent descriptions?

A.Use a pre-trained summarization model.

B.Use a template-based generation with keyword insertion.

C.Use the built-in chat model with few-shot examples in the prompt.

D.Fine-tune a base model on a dataset of existing product descriptions.

AnswerD

Fine-tuning adapts the model to the specific domain and produces consistent outputs across many products.

Why this answer

Fine-tuning a base model on existing product descriptions ensures the model learns the specific style and terminology, leading to consistent and high-quality outputs for a large number of products.

MCQhard

You are a cloud architect at a healthcare company that uses OCI Generative AI Service to analyze patient records and generate clinical summaries. The service is deployed in the Frankfurt region with a dedicated AI cluster. Recently, the compliance team flagged that some generated summaries contain hallucinated diagnoses not present in the source records. They demand immediate mitigation. The current setup uses the default model (cohere.command-r-08-2024) with temperature=0.7, top_p=0.9, and max_tokens=2048. The application sends the entire patient record as a single prompt. You have access to OCI Logging, monitoring metrics (latency, request count, token count, safety filter rejections), and the AI service's model fine-tuning capability. You must reduce hallucinations while minimizing latency increase. What is the most effective course of action?

A.Switch to cohere.command-light model for faster inference and add a post-processing step using a BERT-based NER model to validate entities.

B.Increase max_tokens to 4096 and use chunked processing with overlapping context windows to provide more context.

C.Enable the safety filter with strict content moderation and set up OCI Logging to audit all generations.

D.Reduce temperature to 0.2, top_p to 0.5, and fine-tune the model on a curated dataset of 5,000 clinical summaries with a learning rate of 0.00005 and batch size of 8.

AnswerD

Lower temperature/top_p yields more deterministic outputs; fine-tuning on domain-specific data directly reduces hallucinations.

Why this answer

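Option D is correct because lowering temperature and top_p makes sampling more deterministic, which curbs the model's tendency to drift away from the source text (though it does not by itself guarantee factual accuracy), while fine-tuning on curated clinical data aligns the model to the domain. The modest learning rate and small batch size are reasonable settings for adapting the model without excessive training. Option B might supply more context, but it increases latency and token cost without constraining the model to the source record.

Option C only adds a safety filter and logging, which screen content and create an audit trail but do not address factual accuracy. Option A swaps in a smaller model and adds a separate post-processing model; this adds pipeline latency, and the NER step validates only named entities, not whether each diagnosis actually appears in the source record.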
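
To see why lowering `top_p` makes output more conservative, here is a minimal Python sketch of nucleus (top-p) filtering over a hypothetical next-token distribution. Dropping `top_p` from 0.9 to 0.5 shrinks the candidate set, which reduces sampling variety; it does not make the model aware of what is in the source record, which is why the answer pairs it with fine-tuning.

```python
from typing import Dict


def nucleus_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the smallest set of most-likely tokens whose cumulative probability reaches top_p, renormalized."""
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


probs = {"stable": 0.5, "improving": 0.3, "worsening": 0.15, "critical": 0.05}
print(sorted(nucleus_filter(probs, 0.9)))  # ['improving', 'stable', 'worsening']
print(sorted(nucleus_filter(probs, 0.5)))  # ['stable']
```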

MCQmedium

A data scientist is fine-tuning a model on OCI Generative AI to generate code comments. They use a dataset of 10,000 examples. After fine-tuning, the model generates comments that are too similar to the training data and lack generalization. What is the most likely cause?

A.Incorrect tokenizer.

B.Insufficient training data.

C.Too many training epochs.

D.Too high learning rate.

AnswerC

Excessive epochs cause the model to memorize training data, reducing generalization.

Why this answer

Option C is correct. Too many training epochs lead to overfitting, where the model memorizes the training data rather than learning general patterns. Option B is wrong because 10,000 examples is not insufficient for fine-tuning; overfitting is more likely.

Option D is wrong because a learning rate that is too high typically causes unstable training or divergence rather than memorization. Option A is wrong because an incorrect tokenizer would cause garbled output, not just over-similarity.
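
Overfitting from too many epochs is usually caught by watching validation loss. The Python sketch below shows generic patience-based early stopping: it returns the epoch with the best validation loss and stops once `patience` epochs pass without improvement. The loss values are hypothetical, and the function is a generic illustration rather than OCI's fine-tuning implementation.

```python
from typing import Sequence


def early_stopping_epoch(val_losses: Sequence[float], patience: int = 2, min_delta: float = 0.0) -> int:
    """Return the 1-based epoch with the best validation loss (0 if there are no epochs)."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # Validation loss improved: remember this epoch and reset the patience counter.
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # further epochs are likely memorizing rather than generalizing
    return best_epoch


# Training keeps improving, but validation loss turns upward after epoch 3.
print(early_stopping_epoch([1.2, 0.9, 0.8, 0.85, 0.95, 1.1], patience=2))  # 3
```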

MCQmedium

A healthcare company is deploying an OCI Generative AI service to summarize patient notes. They have recently moved from a managed serving endpoint to a dedicated AI cluster to ensure data privacy. The fine-tuned model is deployed on a dedicated cluster in the US West region. Users report that the summarization responses are now slower and occasionally timeout. The IT team checks the metrics: the cluster has 1 replica and CPU utilization is at 90%. The Object Storage bucket containing the model artifacts is in the same region. They have increased the timeout in their client configuration to 120 seconds, but still get timeouts. What should they do first to address the issue?

A.Move the Object Storage bucket to a local NVMe cache in the cluster.

B.Move the model back to a managed serving endpoint in a different region.

C.Increase the number of replicas in the dedicated cluster.

D.Increase the max tokens parameter in the API call.

AnswerC

Adding replicas provides more compute capacity to handle the load.

Why this answer

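Option C is correct because the cluster's single replica is overloaded (90% utilization), which causes slow responses and timeouts even with a 120-second client timeout. Adding replicas distributes the load and reduces latency. Option A would not help because the bucket is already in the same region and model artifacts are loaded when the model is deployed, not on every request; option B gives up the dedicated cluster the company adopted for data privacy; and option D allows longer responses, which would increase latency further.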
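
A rough sizing rule, sketched in Python under the simplifying assumptions that requests spread evenly across replicas and utilization scales linearly with load. The 60% target is a hypothetical headroom choice, not an OCI recommendation. Integer percentages keep the ceiling division exact.

```python
def replicas_for_target(current_replicas: int, current_util_pct: int, target_util_pct: int) -> int:
    """Replicas needed to bring average utilization down to the target, assuming load splits evenly."""
    if not 0 < target_util_pct <= 100:
        raise ValueError("target_util_pct must be in (0, 100]")
    total_load = current_replicas * current_util_pct  # load expressed in "percent of one replica"
    return max(1, -(-total_load // target_util_pct))  # integer ceiling division


# The question's cluster: 1 replica at 90% utilization, aiming for 60%.
print(replicas_for_target(1, 90, 60))  # 2
```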

MCQeasy

A team is using OCI Generative AI Agents to build a customer support bot. The bot sometimes generates answers that contradict the knowledge base. What is the most likely cause?

A.The chunking strategy for the knowledge base does not capture enough context overlap.

B.The max tokens value is too low, truncating the response.

C.The temperature parameter is set too high, causing the model to hallucinate.

D.The model's repetition penalty is too high.

AnswerA

If chunks are too small or lack overlap, the model may not retrieve all relevant information, leading to inconsistencies.

Why this answer

Option A is correct because when the chunking strategy lacks sufficient context overlap, the retrieved chunks may omit critical surrounding information, causing the generative AI model to infer missing details incorrectly and produce answers that contradict the knowledge base. In OCI Generative AI Agents, the chunking strategy determines how documents are split into smaller pieces for retrieval; without adequate overlap, the model loses the semantic continuity needed to stay faithful to the source material.

Exam trap

Oracle often tests the misconception that hallucinations are always caused by temperature settings, when in fact retrieval quality issues like poor chunking are a more common root cause in RAG-based systems.

How to eliminate wrong answers

Option B is wrong because a low max tokens value truncates the response length but does not cause the model to generate contradictory content; it simply cuts off the output prematurely. Option C is wrong because while a high temperature parameter increases randomness and can lead to hallucinations, the question specifically states the bot contradicts the knowledge base, which is more directly tied to retrieval failures (chunking) than to generation randomness. Option D is wrong because a high repetition penalty discourages the model from repeating phrases, which might reduce fluency but does not cause contradictions with the knowledge base.
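
Overlap is easy to see in code. This Python sketch splits on whitespace words as a stand-in for tokens (an assumption; OCI Generative AI Agents chunk documents with their own logic, which this does not reproduce). Each chunk repeats the last `overlap` words of the previous one, so text near a boundary appears in both neighbouring chunks.

```python
from typing import List


def chunk_words(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into word windows of `chunk_size`, each sharing `overlap` words with the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


print(chunk_words("a b c d e f g", chunk_size=4, overlap=2))  # ['a b c d', 'c d e f', 'e f g']
```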

MCQhard

A team is building a Retrieval-Augmented Generation (RAG) pipeline using OCI Generative AI. They need to store and retrieve document embeddings for semantic search. Which OCI service is most appropriate as the vector store?

A.OCI Search with OpenSearch

B.OCI Streaming

C.OCI Object Storage

D.OCI Autonomous Database with AI Vector Search

AnswerA

OpenSearch supports vector storage and k-NN search, making it ideal for RAG pipelines.

Why this answer

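OCI Search with OpenSearch provides vector database capabilities (k-NN search over stored embeddings) that integrate with OCI Generative AI for RAG workflows. Option D is also a capable vector store, since Autonomous Database supports AI Vector Search, so read such questions carefully; this one expects OpenSearch.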
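
To show what a vector store does at query time, here is an exact (brute-force) cosine-similarity top-k search in Python. The three-dimensional vectors and document IDs are toy assumptions; real embeddings come from an embedding model and have hundreds or thousands of dimensions, and engines such as OpenSearch use approximate indexes (for example HNSW) to avoid scanning every vector.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]], k: int) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and return the k best matches."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


index = {
    "refund-policy": [1.0, 0.0, 0.0],
    "shipping-times": [0.0, 1.0, 0.0],
    "returns-faq": [0.9, 0.1, 0.0],
}
print([doc for doc, _ in top_k([1.0, 0.0, 0.0], index, k=2)])  # ['refund-policy', 'returns-faq']
```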

MCQhard

A company is deploying a multi-language chatbot using OCI Generative AI Service. The chatbot must support English, Spanish, and French. The team finds that responses in Spanish are less accurate than in English. They have a small bilingual dataset. What is the best approach?

A.Use a multilingual base model (e.g., mT5) and fine-tune on the bilingual dataset (English and Spanish) using cross-lingual transfer learning.

B.Use prompt engineering with language-specific instructions in the system prompt.

C.Translate all user queries to English, process them, then translate responses back.

D.Train separate fine-tuned models for each language.

AnswerA

Cross-lingual transfer leverages English data to improve Spanish performance, and fine-tuning on bilingual data further boosts accuracy.

Why this answer

Option A is correct because fine-tuning a multilingual base model like mT5 on a small bilingual dataset leverages cross-lingual transfer learning, where knowledge from high-resource languages (English) improves performance on low-resource languages (Spanish). This approach is specifically designed for scenarios with limited data and directly addresses the accuracy gap without requiring separate models or translation pipelines.

Exam trap

The trap here is that candidates often overestimate prompt engineering (Option B) as a fix for language-specific accuracy. Systematic errors in one language usually require adapting the model through fine-tuning or transfer learning, not just adding instructions to the prompt.

How to eliminate wrong answers

Option B is wrong because prompt engineering with language-specific instructions does not adapt the model's internal representations; it merely provides contextual cues, which is insufficient to correct systematic inaccuracies in a specific language. Option C is wrong because translating queries to English and back introduces translation errors, latency, and loss of nuance, and does not improve the model's native understanding of Spanish. Option D is wrong because training separate fine-tuned models for each language is inefficient with a small bilingual dataset and fails to exploit cross-lingual transfer, leading to poor performance on the low-resource language.

MCQeasy

Refer to the exhibit. In this RAG pipeline, what is the role of the 'embedding_model' variable?

A.It converts text into vector representations for similarity search.

B.It applies guardrails to filter content.

C.It fine-tunes the model on the provided texts.

D.It generates text completions based on prompts.

AnswerA

Embeddings are used to index and retrieve relevant documents via vector similarity.

Why this answer

The embedding model converts text into numerical vector representations that can be stored and searched for similarity.

MCQmedium

A company uses OCI Generative AI Service to generate personalized email content. They need to ensure that personally identifiable information (PII) is not included in the model's training data. What should they do?

A.Encrypt the training data with OCI Vault

B.Use the moderation API to scan outputs

C.Use a dedicated model endpoint

D.Enable data redaction in the service

AnswerD

Data redaction removes PII before processing.

Why this answer

Option D is correct because OCI Generative AI Service provides a built-in data redaction feature that automatically detects and removes personally identifiable information (PII) from training data before it is used for model training. This ensures compliance with data privacy regulations without requiring manual preprocessing or external tools.

Exam trap

The trap here is confusing data redaction (pre-training data sanitization) with output moderation (post-generation filtering), leading candidates to incorrectly select the moderation API option.

How to eliminate wrong answers

Option A is wrong because encrypting training data with OCI Vault protects data at rest and in transit but does not remove or redact PII from the content itself; encryption does not prevent PII from being included in model training. Option B is wrong because the moderation API is designed to scan and filter model outputs (inference results) for inappropriate content, not to sanitize training data before training occurs. Option C is wrong because using a dedicated model endpoint isolates the model instance but does not alter or filter the training data; it addresses data residency or performance concerns, not PII removal.
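
For intuition about what redaction does, here is a deliberately simple regex-based sketch in Python that replaces e-mail addresses and phone-number-like strings with placeholders before data is used. It is an illustrative assumption, not the service's redaction feature: regexes miss names, addresses, record numbers, and many other PII types, so production pipelines rely on a dedicated PII detection service or library.

```python
import re
from typing import Dict, Pattern

# Illustrative patterns only; they are neither complete nor locale-aware.
PII_PATTERNS: Dict[str, Pattern[str]] = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace each pattern match with a bracketed label, e-mail addresses first."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact("Contact jane.doe@example.com or +1 (555) 010-9999."))  # Contact [EMAIL] or [PHONE].
```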
