Oracle Cloud Infrastructure Generative AI Professional (1Z0-1127): Questions 76-150

500 questions total · 7 pages · All types, answers revealed

Page 2 of 7

76
MCQhard

Your organization has deployed a generative AI model for a multilingual translation service on OCI Model Deployment. The model is a 13B parameter transformer hosted on a single VM.GPU.A100.1 shape with 2 replicas. Recently, the service experiences intermittent timeouts when a burst of requests arrives. You have enabled autoscaling based on CPU utilization, but the scaling is too slow. After investigation, you find that the model inference time is highly variable due to different sequence lengths. You need to ensure the service can handle sudden spikes without timeouts. Which solution should you implement?

A.Implement a request queue (e.g., OCI Queue) to buffer requests and process them asynchronously
B.Increase the maximum number of replicas and prewarm additional replicas before expected traffic
C.Reduce the model size to a 7B parameter model to decrease inference time
D.Use autoscaling based on the number of messages in the request queue
AnswerA

Queuing decouples traffic spikes from the model, preventing timeouts.

Why this answer

Option A is correct because implementing a request queue (e.g., OCI Queue) decouples request ingestion from processing, allowing the service to buffer bursts of requests and process them asynchronously. This prevents timeouts by smoothing out the variable inference times caused by differing sequence lengths, as the queue absorbs spikes and the model processes at its own pace. Autoscaling based on CPU utilization is too slow for sudden spikes, but a queue provides immediate relief by not dropping requests.

Exam trap

The trap here is that candidates often assume autoscaling (option B or D) is sufficient for burst handling, but they overlook that autoscaling has inherent latency (minutes to provision new replicas), whereas a request queue provides immediate buffering to absorb spikes without dropping requests.

How to eliminate wrong answers

Option B is wrong because increasing the maximum number of replicas and prewarming them only helps if the scaling mechanism is fast enough to react; it does not address the root cause of variable inference times and still relies on autoscaling, which is too slow for sudden bursts. Option C is wrong because reducing the model size to a 7B parameter model would degrade translation quality and does not solve the intermittent timeout issue caused by variable sequence lengths; it might reduce average inference time but not eliminate spikes. Option D is wrong because autoscaling based on the number of messages in the request queue would still be reactive and subject to latency in provisioning new replicas, and it does not prevent timeouts during the scaling delay; the queue itself is the primary solution to buffer requests.

77
MCQeasy

When chunking a large Python code repository for a RAG application, which chunking strategy is best suited to preserve code semantics and functionality?

A.Semantic chunking based on function and class definitions
B.Fixed-size chunking with 512 tokens
C.Sentence-level chunking
D.Character-level chunking
AnswerA

Keeps logical code blocks intact.

Why this answer

Semantic chunking (e.g., splitting by function definitions, classes) keeps related code together, preserving context. Fixed-size or sentence splitting can break syntactical units. Character splitting destroys meaning.
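
As an illustration of splitting by definitions, here is a minimal sketch that uses Python's standard `ast` module to emit one chunk per top-level function or class. The helper name `chunk_by_definitions` and the sample source are assumptions made for this example; a real pipeline would also need to handle module-level code, decorators, and very large classes.

```python
import ast


def chunk_by_definitions(source: str) -> list[str]:
    """Return one chunk per top-level function or class in `source`."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment returns the exact source text of the node,
            # so each chunk is a complete, syntactically valid unit.
            chunks.append(ast.get_source_segment(source, node))
    return chunks


# Illustrative input: one function and one class.
SAMPLE = '''
def add(a, b):
    return a + b

class Greeter:
    def hello(self):
        return "hi"
'''

chunks = chunk_by_definitions(SAMPLE)
```

Each chunk can then be embedded on its own, so a retrieved result always contains a whole function or class rather than a fragment cut at an arbitrary token boundary.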

78
MCQeasy

Refer to the exhibit. A user in group GenAIUsers reports that they cannot call the OCI Generative AI API. What is the most likely issue?

A.The policy statement is missing the 'inspect' verb.
B.The policy is in INACTIVE state.
C.The compartment ID in the policy does not match the user's compartment.
D.The user is not in the group GenAIUsers.
AnswerC

The policy applies to 'ExampleCompartment' by name, but the user may be in a different compartment. The compartment OCID in the policy header does not match the compartment name in the statement, indicating a mismatch.

Why this answer

The policy is scoped to a specific compartment, but the request targets a different one. An OCI IAM policy grants access only within the compartment named in its statement (and that compartment's children), so it must be written for the compartment where the Generative AI resources are used. Because the user is working in a different compartment, the policy does not apply and the API call fails.

Exam trap

Oracle often tests the misconception that a user's group membership is the sole factor for policy applicability, ignoring that the compartment scope in the policy statement must match the user's compartment or resource compartment for the policy to take effect.

How to eliminate wrong answers

Option A is wrong because the 'inspect' verb is not required for calling the Generative AI API; the policy uses 'allow group GenAIUsers to manage generative-ai-family in compartment ...', which includes all verbs (inspect, read, use, manage) and is sufficient. Option B is wrong because the exhibit shows the policy is in ACTIVE state, not INACTIVE; an INACTIVE policy would be explicitly marked and would not enforce any rules. Option D is wrong because the user reports being in group GenAIUsers, and the policy targets that group; if the user were not in the group, the error would be an authorization failure, but the issue here is compartment mismatch.

79
Multi-Selectmedium

Which THREE factors should be considered when designing a chunking strategy for a RAG application?

Select 3 answers
A.Desired granularity of retrieval
B.Number of GPUs available
C.Database indexing method
D.Document structure
E.Embedding model's maximum input tokens
AnswersA, D, E

Smaller chunks allow more precise retrieval; larger chunks provide more context.

Why this answer

Document structure (e.g., paragraphs), embedding model token limit, and desired retrieval granularity are key. GPU availability is unrelated; indexing method is post-chunking.

80
MCQhard

A healthcare organization is using OCI Generative AI to analyze medical records. They must comply with HIPAA. They have set up a dedicated AI cluster with private endpoints. However, they are concerned about model hallucinations that could lead to incorrect medical advice. They want to minimize hallucinations while maintaining usefulness. Which approach is most effective?

A.Use a smaller model to reduce complexity.
B.Implement a retrieval-augmented generation (RAG) pipeline with a verified medical knowledge base.
C.Increase the temperature to encourage diverse outputs.
D.Reduce max tokens to force shorter responses.
AnswerB

RAG grounds generation in factual sources, reducing hallucinations.

Why this answer

Option B is correct. Implementing a retrieval-augmented generation (RAG) pipeline grounds the model in a verified medical knowledge base, significantly reducing hallucinations. Option A is wrong because smaller models may still hallucinate and may be less capable.

Option C is wrong because increasing temperature increases randomness, making hallucinations worse. Option D is wrong because reducing max tokens truncates output but does not address factual accuracy.

81
Multi-Selectmedium

Which TWO actions should be taken to monitor model drift in a deployed generative AI model? (Select TWO)

Select 2 answers
A.Compare inference statistics over time
B.Retrain the model weekly
C.Use OCI Data Labeling for new data
D.Set up alerts on accuracy metrics
E.Deploy multiple model versions
AnswersA, D

Tracking statistics like output length or sentiment can indicate drift.

Why this answer

Comparing inference statistics over time (Option A) is correct because model drift in generative AI is detected by monitoring changes in output distributions, token probabilities, or response patterns relative to baseline metrics. This allows you to identify when the model's behavior deviates from expected performance due to shifts in input data or underlying patterns.

Exam trap

Oracle often tests the distinction between monitoring actions (detecting drift) and remediation actions (retraining, labeling, deploying versions), so candidates mistakenly select retraining or labeling as monitoring steps.

82
Multi-Selecteasy

Which TWO of the following are valid approaches to serve a RAG application in OCI with low latency?

Select 2 answers
A.Pre-compute embeddings and answers for all possible questions.
B.Deploy the vector store on multiple regions to reduce network latency.
C.Increase the chunk size to reduce the number of retrievals.
D.Implement a caching layer for frequently asked questions.
E.Use an LLM that supports streaming response for faster user feedback.
AnswersD, E

Caching avoids redundant retrieval and generation, reducing latency for common queries.

Why this answer

Options D and E are correct. Caching common queries (D) avoids repeated retrieval and generation. Using an LLM with streaming response (E) reduces perceived latency because users see the first tokens sooner.

Option A is wrong because pre-computing all possible answers is impractical. Option C is wrong because increasing chunk size can increase latency due to more tokens to process.
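
A caching layer for frequent questions can be sketched with the standard library alone. In this hedged example, `answer_with_rag` is a hypothetical stand-in for the expensive retrieve-then-generate call, and the normalization rule (lowercase, collapsed whitespace) is an assumption; production systems often use an external cache with an expiry time instead.

```python
from functools import lru_cache

# Counts how often the expensive path runs, for demonstration only.
CALLS = {"count": 0}


def answer_with_rag(question: str) -> str:
    """Hypothetical stand-in for retrieval plus LLM generation."""
    CALLS["count"] += 1
    return f"answer to: {question}"


def normalize(question: str) -> str:
    """Map trivially different phrasings to the same cache key."""
    return " ".join(question.lower().split())


@lru_cache(maxsize=1024)
def _cached(normalized_question: str) -> str:
    return answer_with_rag(normalized_question)


def cached_answer(question: str) -> str:
    """Serve repeated questions from memory instead of re-running RAG."""
    return _cached(normalize(question))
```

Calling `cached_answer` twice with the same question (even with different casing or spacing) runs the expensive path only once.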

83
MCQhard

A generative AI model deployed on OCI Model Deployment is experiencing high tail latency. The model is a large language model that processes variable-length input sequences. Profiling shows that inference time varies significantly: short inputs (100 tokens) take 100ms, while long inputs (2000 tokens) take 2 seconds. The application requires consistent low latency (<500ms) for most requests. You want to reduce the variance in inference time without major changes to the model architecture. Which technique should you apply?

A.Implement dynamic batching that groups requests of similar lengths together before inference
B.Increase the number of replicas to distribute the load evenly
C.Reduce the model size by removing layers or using a smaller version
D.Deploy multiple model endpoints for different length ranges and route requests accordingly
AnswerA

Grouping by length reduces the overhead from padding and stabilizes inference time.

Why this answer

Dynamic batching groups requests of similar input lengths together, which reduces the variance in inference time by ensuring that each batch processes tokens of comparable size. This minimizes the padding overhead and keeps the per-request latency more predictable, directly addressing the high tail latency caused by variable-length sequences without altering the model architecture.

Exam trap

The trap here is that candidates often confuse horizontal scaling (Option B) with latency variance reduction, but scaling replicas does not address the root cause of variable inference time due to sequence length differences.

How to eliminate wrong answers

Option B is wrong because increasing the number of replicas distributes load but does not reduce the variance in inference time for individual requests; it may even increase tail latency due to additional network hops and synchronization overhead. Option C is wrong because reducing model size (e.g., removing layers or using a smaller version) constitutes a major architectural change, which the question explicitly prohibits, and it would degrade model quality. Option D is wrong because deploying multiple endpoints for different length ranges adds operational complexity and does not inherently reduce variance; it merely separates traffic, but each endpoint still processes variable-length inputs with high tail latency unless combined with dynamic batching.
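
The effect of grouping by length can be shown with a small sketch. The function names, the bucket width of 256 tokens, and the batch limit are illustrative assumptions, not settings of any OCI service; the point is only that batches built from similar lengths need far less padding.

```python
from collections import defaultdict


def bucket_by_length(requests, bucket_width=256, max_batch=8):
    """Group (request_id, token_count) pairs into batches of similar length."""
    buckets = defaultdict(list)
    for request_id, token_count in requests:
        # Requests whose lengths fall in the same range share a bucket.
        buckets[token_count // bucket_width].append((request_id, token_count))
    batches = []
    for key in sorted(buckets):
        items = buckets[key]
        for i in range(0, len(items), max_batch):
            batches.append(items[i:i + max_batch])
    return batches


def padding_waste(batch):
    """Padding tokens needed to bring every request up to the longest one."""
    longest = max(count for _, count in batch)
    return sum(longest - count for _, count in batch)
```

With two short and two long requests, a single mixed batch pads the short inputs up to the longest one, while length-bucketed batches keep short requests from waiting on long ones.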

84
MCQeasy

A developer receives the above error when trying to send a request to a model endpoint. What is the most likely reason?

A.The endpoint was deleted by an administrator
B.The network connection to OCI is down
C.The API key is invalid
D.The model is still being deployed
AnswerA

The specific error indicates the endpoint is deleted.

Why this answer

The error message indicates that the model endpoint is not found. In OCI Generative AI, when an administrator deletes an endpoint, subsequent requests to that endpoint's URL return a 404 Not Found error. This is the most likely reason because the endpoint resource no longer exists in the tenancy, and the request cannot be routed to any model.

Exam trap

The trap here is that candidates often confuse a 404 Not Found with network or authentication issues, but the specific error code directly points to the resource (endpoint) not existing, which is most commonly caused by deletion.

How to eliminate wrong answers

Option B is wrong because a network connection issue to OCI would typically result in a timeout or connection refused error, not a 404 Not Found. Option C is wrong because an invalid API key would cause a 401 Unauthorized or 403 Forbidden error, not a 404. Option D is wrong because if the model is still being deployed, the endpoint would return a 503 Service Unavailable or a provisioning status error, not a 404.

85
MCQhard

The job fails with "InvalidParameter: trainingDatasetUri". What should the administrator check first?

A.Whether the compartment has sufficient budget.
B.Whether the model ID supports fine-tuning.
C.Whether the bucket exists and the file is accessible.
D.Whether the parameters JSON is correctly formatted.
AnswerC

Invalid URI errors often stem from missing buckets or incorrect paths.

Why this answer

The error indicates the training dataset URI is invalid. The most common cause is that the bucket does not exist or the file path is incorrect. Model compatibility, budget, and parameter format would yield different errors.

86
MCQeasy

A startup wants to minimize costs when using OCI Generative AI service for a chatbot application that experiences sporadic usage. Which deployment strategy is most cost-effective?

A.Use a pre-built model with a dedicated endpoint
B.Use the serverless on-demand API without dedicated endpoints
C.Provision a dedicated endpoint for low latency
D.Deploy the model on OCI Compute with autoscaling
AnswerB

Pay per request, no idle costs.

Why this answer

Option B is correct because serverless on-demand pricing charges only for usage, ideal for sporadic workloads. Option A is wrong because a dedicated endpoint incurs hourly costs regardless of usage, even when it serves a pre-built model. Option C is wrong because a dedicated endpoint provisioned for low latency carries the same fixed hourly cost, which is not cost-effective for sporadic traffic.

Option D is wrong because running models on OCI Compute adds management overhead and costs.

87
MCQeasy

A company has fine-tuned a custom Llama 3 model using OCI Data Science for a chatbot. They now need a production-grade inference endpoint with auto-scaling. Which OCI service should they use?

A.OCI Functions
B.OCI Data Science Model Deployment
C.OCI Generative AI Service
D.OCI Kubernetes Engine (OKE)
AnswerC

Correct: OCI Generative AI Service offers managed endpoints for fine-tuned models with scaling.

Why this answer

Option C is correct because OCI Generative AI Service provides a fully managed, production-grade inference endpoint with built-in auto-scaling for custom models like fine-tuned Llama 3. It abstracts infrastructure management, offers serverless deployment, and integrates with OCI Data Science for model import, making it the ideal choice for a chatbot requiring scalable inference.

Exam trap

Oracle often tests the misconception that OCI Data Science Model Deployment is the correct choice for any custom model deployment, but the trap here is that for production-grade, auto-scaling inference of a fine-tuned LLM, OCI Generative AI Service is the managed, purpose-built service that eliminates the operational complexity of manual scaling and infrastructure management.

How to eliminate wrong answers

Option A is wrong because OCI Functions is a serverless compute service for event-driven, stateless code snippets (functions) with a maximum timeout of 5 minutes, not suitable for hosting large language models like Llama 3 that require persistent GPU resources and long-running inference. Option B is wrong because OCI Data Science Model Deployment is designed for deploying custom models but requires manual configuration of auto-scaling policies and does not natively support the optimized inference infrastructure (e.g., dedicated GPU clusters) that OCI Generative AI Service provides for fine-tuned models. Option D is wrong because OCI Kubernetes Engine (OKE) is a container orchestration service that demands significant operational overhead for managing GPU nodes, scaling, and model serving infrastructure, whereas the question specifies a need for a production-grade inference endpoint with auto-scaling, which OCI Generative AI Service delivers as a managed service.

88
MCQhard

A company has deployed a generative AI endpoint using a custom fine-tuned model. They observe that the endpoint is returning 429 (Too Many Requests) errors during business hours. They need to handle this without losing requests. What should they implement?

A.Increase the endpoint's max tokens limit.
B.Implement client-side retry with exponential backoff.
C.Reduce the number of concurrent requests from the application.
D.Use a dedicated AI cluster with higher capacity.
AnswerB

Retry with backoff is the standard approach to handle 429 errors.

Why this answer

Option B is correct because implementing client-side retry with exponential backoff is the standard approach to handle HTTP 429 (Too Many Requests) errors without losing requests. When the OCI Generative AI endpoint returns a 429 status, the client can automatically retry the request after a delay that increases exponentially, reducing the load on the endpoint while ensuring all requests are eventually processed. This pattern is recommended by OCI and follows best practices for rate-limited APIs, as it allows the system to recover from transient capacity issues without manual intervention.

Exam trap

Oracle often tests the misconception that increasing capacity (Option D) or reducing concurrency (Option C) is the primary solution for rate limiting, when in fact the correct answer is a client-side retry mechanism that preserves request integrity.

How to eliminate wrong answers

Option A is wrong because increasing the max tokens limit does not address rate limiting; it only affects the maximum length of generated text per request, not the number of requests allowed per time window. Option C is wrong because reducing concurrent requests from the application would prevent some requests from being sent, effectively losing them rather than handling them gracefully; the goal is to avoid losing requests, not to drop them. Option D is wrong because using a dedicated AI cluster with higher capacity is a costly and potentially over-provisioned solution that does not address the immediate need to handle existing 429 errors without losing requests; it also does not provide a mechanism to retry failed requests.
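
A minimal sketch of client-side retry with exponential backoff follows. The `TooManyRequests` exception and the `send` callable are hypothetical stand-ins for whatever the client library raises on HTTP 429; the delays and attempt count are assumptions to tune per workload.

```python
import random
import time


class TooManyRequests(Exception):
    """Hypothetical exception standing in for an HTTP 429 response."""


def call_with_backoff(send, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call send() and retry on TooManyRequests with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return send()
        except TooManyRequests:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay doubles on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Jitter spreads retries out so clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

Passing `sleep` as a parameter keeps the function easy to test; in normal use the default `time.sleep` applies.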

89
MCQhard

During inference with OCI Generative AI, you notice that the model is generating repetitive phrases. Which combination of parameters can help reduce repetition?

A.Top_p = 0.1, frequency_penalty = 0.5
B.Top_p = 0.9, frequency_penalty = 0.5
C.Top_p = 0.9, frequency_penalty = 0.0
D.Top_p = 1.0, frequency_penalty = 0.0
AnswerB

This combination applies a gentle penalty on repeated tokens while keeping token selection diverse, effectively reducing repetition.

Why this answer

Option B is correct because a high Top_p value (0.9) allows the model to consider a diverse set of tokens, reducing the chance of getting stuck in repetitive loops, while a positive frequency_penalty (0.5) actively penalizes tokens that have already been generated, discouraging the model from repeating the same phrases. Together, these parameters balance creativity and repetition suppression.

Exam trap

Oracle often tests the misconception that lowering Top_p (making it more restrictive) reduces repetition, when in fact it can worsen repetition by limiting the model to only the most probable tokens, which are often the same ones already used.

How to eliminate wrong answers

Option A is wrong because Top_p = 0.1 is too restrictive: it limits sampling to the smallest set of tokens whose cumulative probability reaches 0.1 (often a single token), which actually increases the likelihood of repetitive patterns by narrowing the token pool. Option C is wrong because frequency_penalty = 0.0 means no penalty is applied for repeated tokens, so even with a high Top_p, the model has no disincentive to repeat phrases. Option D is wrong because Top_p = 1.0 (no nucleus sampling) combined with frequency_penalty = 0.0 provides no mechanism to reduce repetition, effectively using raw probability sampling without any diversity-enhancing constraints.
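
To make the two parameters concrete, the sketch below computes a nucleus (top-p) candidate set and applies a frequency penalty to logits. The penalty formula used here (subtract penalty times the number of prior occurrences) is a common formulation and is an assumption; the exact formula used by a given OCI model may differ.

```python
def nucleus(probs: dict[str, float], top_p: float) -> list[str]:
    """Smallest set of tokens whose cumulative probability reaches top_p."""
    kept, total = [], 0.0
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    for token, p in ranked:
        kept.append(token)
        total += p
        if total >= top_p:
            break
    return kept


def apply_frequency_penalty(logits: dict[str, float],
                            counts: dict[str, int],
                            penalty: float) -> dict[str, float]:
    """Lower each token's logit in proportion to how often it already appeared."""
    return {tok: logit - penalty * counts.get(tok, 0)
            for tok, logit in logits.items()}


# Illustrative next-token distribution.
PROBS = {"the": 0.5, "a": 0.3, "cat": 0.15, "dog": 0.05}
```

With this distribution, a top_p of 0.1 keeps only the single most likely token, while 0.9 keeps three candidates, and a positive penalty pushes down tokens that have already been generated.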

90
MCQhard

An architect is optimizing an LLM application that processes long documents. The model has a 4096 token limit, but the documents are often 8000 tokens. They are using a chunking strategy. However, model responses sometimes miss key information that spans across chunks. Which technique most directly addresses this issue?

A.Randomly select parts of the document to include.
B.Increase the max_tokens parameter for longer outputs.
C.Use overlapping chunks to maintain context continuity.
D.Use a model with a larger context window.
AnswerC

Overlapping ensures that information at chunk boundaries is not lost.

Why this answer

Option C is correct because overlapping chunks ensure that tokens at the boundaries of one chunk are also present at the start of the next, preserving context continuity. This prevents the model from losing information that spans across chunk boundaries, which is a common issue when processing documents longer than the model's 4096-token context window.

Exam trap

Oracle often tests the misconception that increasing output length (max_tokens) can compensate for input context limitations, but the trap here is that max_tokens only affects the response length, not the model's ability to see the full document.

How to eliminate wrong answers

Option A is wrong because randomly selecting parts of the document discards structured information and introduces unpredictability, making it impossible to reliably capture cross-chunk dependencies. Option B is wrong because increasing the max_tokens parameter only controls the length of the generated output, not the input context size; the model still cannot process the full 8000-token document at once. Option D is wrong because while using a model with a larger context window would solve the problem, it is not a chunking strategy and may not be feasible due to cost, latency, or model availability; the question specifically asks for a technique that addresses the issue within the current chunking approach.
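
Overlapping chunks can be sketched as a sliding window whose step is smaller than the chunk size. The function name and the sizes used below are illustrative assumptions; the token list would come from whatever tokenizer the model uses.

```python
def chunk_with_overlap(tokens, chunk_size, overlap):
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks
```

For an 8000-token document with 4000-token chunks and a 200-token overlap, the second chunk starts at token 3800, so content near the first boundary appears in both chunks.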

91
Multi-Selecthard

Which TWO factors most significantly influence the computational cost of fine-tuning a large language model?

Select 2 answers
A.Batch size
B.Number of model parameters
C.Maximum sequence length
D.Quantization bits
E.Dataset size
AnswersB, C

More parameters increase compute and memory requirements.

Why this answer

Model size (parameters) directly determines FLOPs and memory. Training sequence length affects memory and compute per step. Option A is wrong because batch size affects throughput but not the fundamental cost per token.

Option D is wrong because quantization usually reduces cost. Option E is wrong because dataset size affects total steps but per-step cost is dominated by model size and sequence length.

92
MCQmedium

A data scientist is using the OCI Generative AI service to generate text completions. The API calls are returning HTTP 400 errors with the message 'Invalid model parameters'. What is the most likely cause?

A.The API key is expired
B.The request exceeds the rate limit
C.The endpoint URL is incorrect
D.One or more model parameters (e.g., temperature, top_p) are outside the accepted range
AnswerD

Invalid parameters lead to client error 400.

Why this answer

The HTTP 400 error with 'Invalid model parameters' directly indicates that one or more of the parameters sent in the API request (such as temperature, top_p, max_tokens, or stop sequences) are outside the acceptable range defined by the OCI Generative AI service. For example, if a model accepts temperature and top_p values only between 0 and 1, sending a temperature of 2.0 would trigger this error; the accepted ranges vary by model, so check the limits for the model in use. The other options (expired key, rate limit, incorrect endpoint) would produce different HTTP status codes or error messages.

Exam trap

Oracle often tests the distinction between HTTP 4xx error codes and their specific meanings, so the trap here is that candidates may confuse a 400 Bad Request (parameter validation failure) with authentication (401) or rate-limiting (429) errors, especially when the error message is generic.

How to eliminate wrong answers

Option A is wrong because an expired API key would result in an HTTP 401 Unauthorized error, not a 400 Bad Request with 'Invalid model parameters'. Option B is wrong because exceeding the rate limit would return an HTTP 429 Too Many Requests error, not a 400 error. Option C is wrong because an incorrect endpoint URL would typically result in an HTTP 404 Not Found error or a connection failure, not a 400 error with a message about model parameters.

93
MCQhard

Refer to the exhibit. A developer receives this error when trying to get details of a model they know exists. What is the most likely cause?

A.The region is incorrect; the model is in a different region
B.The model ID is misspelled
C.The model is in a different compartment that the developer cannot access
D.The developer does not have the 'inspect' permission on the model
AnswerC

The error explicitly mentions the compartment ID, and if the model resides in another compartment, the user would get this error even if they have permissions on that other compartment but were not targeting it.

Why this answer

Option C is correct. The error message indicates either that the model doesn't exist in the specified compartment or that the user lacks permission. Since the model exists but the compartment in the command ('ocid1.compartment.oc1..example') may be incorrect, the most likely cause is that the model is in a different compartment the developer cannot access.

Option B is possible but less likely if the ID is copied correctly. Option A is incorrect because the region is specified and the error is not about a region mismatch. Option D is plausible, but the error specifically mentions the compartment as well as permission.

94
MCQmedium

Which authentication method should be used to securely call the OCI Generative AI API from a microservice running on OCI Compute?

A.OAuth 2.0 client credentials
B.SAML 2.0 assertion
C.Instance principal
D.OCI API signing key
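AnswerC

An instance principal lets code running on an OCI Compute instance authenticate without stored credentials.

Why this answer

Option C is correct because an instance principal authenticates the Compute instance itself: the instance is added to a dynamic group, and a policy grants that group access to the Generative AI service. The microservice then signs its API calls with short-lived credentials that OCI rotates automatically, so no private key has to be stored on the instance. An API signing key (Option D) is the standard method for users and for scripts running outside OCI, but on a Compute instance it means distributing and rotating a long-lived private key.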

95
MCQmedium

A multi-turn chatbot needs to maintain context across user queries. The context window is limited. What design should be used?

A.Use a summary of previous turns and add new input.
B.Store context in a separate database and retrieve each time.
C.Reset context after each turn.
D.Keep the entire conversation history in each request.
AnswerA

Correct: Summarization preserves context within limits.

Why this answer

Option A is correct because summarizing previous turns and appending the new input efficiently manages the limited context window of large language models (LLMs). This approach preserves essential conversational context without exceeding token limits, ensuring coherent multi-turn interactions.

Exam trap

Oracle often tests the misconception that storing context externally (Option B) bypasses the context window limit, but the retrieved data must still be injected into the model's input, which is constrained by the same token budget.

How to eliminate wrong answers

Option B is wrong because storing context in a separate database and retrieving it each time introduces latency and does not inherently solve the context window limitation; the retrieved context still needs to fit into the model's input. Option C is wrong because resetting context after each turn breaks conversational continuity, making the chatbot unable to reference prior exchanges. Option D is wrong because keeping the entire conversation history in each request quickly exceeds the context window's token limit, causing truncation or errors.
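
The summarize-and-append design can be sketched as a rolling context: the most recent turns are kept verbatim and older turns are folded into a running summary. The function names are assumptions, and `summarize` is a hypothetical callable that in practice would be an LLM call.

```python
def update_context(summary, recent_turns, new_turn, keep_last, summarize):
    """Add a turn; fold turns beyond the last `keep_last` into the summary."""
    recent = recent_turns + [new_turn]
    if len(recent) > keep_last:
        overflow, recent = recent[:-keep_last], recent[-keep_last:]
        summary = summarize(summary, overflow)
    return summary, recent


def build_prompt(summary, recent_turns, user_input):
    """Assemble a bounded prompt: summary, recent turns, then the new input."""
    parts = []
    if summary:
        parts.append(f"Summary of earlier conversation: {summary}")
    parts.extend(recent_turns)
    parts.append(f"user: {user_input}")
    return "\n".join(parts)


def naive_summarize(summary, turns):
    """Placeholder summarizer for the demo; a real one would call a model."""
    return (summary + " " + " | ".join(turn[:20] for turn in turns)).strip()
```

Because only `keep_last` turns are carried verbatim, the prompt size stays roughly constant no matter how long the conversation runs.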

96
MCQeasy

A company is building a RAG application using OCI Generative AI and wants to store embeddings for document retrieval. Which OCI service is most appropriate for storing and querying vector embeddings?

A.OCI MySQL Database
B.OCI Data Flow
C.OCI Search with OpenSearch
D.OCI Object Storage
AnswerC

OpenSearch supports k-nearest neighbor (k-NN) search and is the recommended vector store in OCI.

Why this answer

OCI Search with OpenSearch provides native vector search capabilities (k-NN) suitable for storing and querying embeddings. Object Storage is for blob data, MySQL is relational, and Data Flow is for big data processing.
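
What a k-NN query computes can be shown with a brute-force sketch over an in-memory dictionary. This is not the OpenSearch API; the function names and the tiny two-dimensional vectors are assumptions for illustration, and a real vector store uses an approximate index to avoid scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn(query, index, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(index.items(),
                    key=lambda kv: cosine_similarity(query, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


# Illustrative index: document id -> embedding.
INDEX = {"doc_a": [1.0, 0.0], "doc_b": [0.9, 0.1], "doc_c": [0.0, 1.0]}
```

A query embedding close to `doc_a` ranks `doc_a` first and `doc_b` second, which is the behavior a vector store provides at scale.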

97
MCQmedium

Refer to the exhibit. A developer runs the command and receives the error. What is the issue?

A.The max-tokens value exceeds the allowed range.
B.The message is too short.
C.The chat-id is invalid.
D.The endpoint is incorrect.
AnswerA

The error explicitly states the valid range.

Why this answer

The max-tokens parameter is set to 600, which exceeds the allowed range of 1 to 500.

98
MCQeasy

A company wants to use OCI Generative AI to summarize customer reviews. Which model parameter should be adjusted to control the creativity of the summary?

A.Temperature
B.Frequency penalty
C.Top-k
D.Presence penalty
AnswerA

Temperature directly controls randomness and creativity.

Why this answer

Temperature controls the randomness of token selection in the model's output distribution. A higher temperature (e.g., 0.9) makes the summary more creative and diverse, while a lower temperature (e.g., 0.1) makes it more deterministic and focused. For summarizing customer reviews, adjusting temperature directly influences how novel or conservative the generated text will be.

Exam trap

Oracle often tests the distinction between parameters that control randomness (temperature) versus those that control repetition (frequency/presence penalties) or sampling pool size (top-k), leading candidates to confuse diversity with creativity.

How to eliminate wrong answers

Option B (Frequency penalty) is wrong because it reduces the likelihood of repeating the same tokens or phrases, which controls redundancy rather than creativity. Option C (Top-k) is wrong because it limits the sampling pool to the k most likely next tokens, which affects diversity but not the overall creativity or randomness of the output. Option D (Presence penalty) is wrong because it penalizes tokens that have already appeared in the text, encouraging the model to introduce new topics, but this does not directly control the creativity of the summary.
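
The effect of temperature can be seen by dividing the logits by the temperature before applying softmax. This is the standard formulation, shown here as a sketch; the function name and the example logits are assumptions.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative logits for three candidate tokens.
LOGITS = [2.0, 1.0, 0.0]
low = softmax_with_temperature(LOGITS, 0.1)
high = softmax_with_temperature(LOGITS, 0.9)
```

At the lower temperature almost all probability mass sits on the top token, so output is close to deterministic; at the higher temperature the mass is spread across more tokens, which is what produces more varied text.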

99
MCQmedium

A manufacturing company uses OCI OpenSearch to build a RAG application that retrieves procedural documents. After deployment, queries often return outdated procedures even though the vector index was refreshed. What is the most likely cause?

A.The embedding model was fine-tuned on outdated data.
B.The full-text search index is not synchronized with the vector index after updates.
C.The BM25 scoring algorithm prioritizes older documents due to term frequency.
D.The chunk overlap percentage is too high, causing duplicate context.
AnswerB

Outdated procedures remain in the text index if not reindexed.

Why this answer

Option B is correct because in a RAG application using OCI OpenSearch, the vector index and the full-text search index are maintained separately. The scenario states that the vector index was refreshed, so the embeddings reflect the updated procedures. If the full-text index was not reindexed at the same time, it still holds the old document text, and any retrieval path that reads from it (keyword or hybrid search, or fetching the chunk text to pass to the model) continues to return outdated procedures.

Exam trap

The trap here is that candidates may assume refreshing the vector index automatically updates every index that serves the application, but in practice each index must be updated when documents change, a step that is often overlooked in RAG architectures.

How to eliminate wrong answers

Option A is wrong because fine-tuning the embedding model on outdated data would affect all embeddings, not just those from refreshed documents, and the scenario specifies the vector index was refreshed, implying embeddings were regenerated. Option C is wrong because BM25 scoring is used for full-text search, not vector search; it prioritizes documents based on term frequency and inverse document frequency, not age, and would not cause outdated procedures to be returned if the vector index is correctly synchronized. Option D is wrong because chunk overlap percentage affects context continuity and duplication, not the freshness of retrieved data; high overlap might cause duplicate chunks but not outdated procedures.

100
MCQeasy

Refer to the exhibit. A user in group GenAIUsers tries to use the `oci generative-ai model chat` command but gets 'not authorized'. Why?

A.The policy is not active yet.
B.The command requires managing the model.
C.The statement should be 'use' instead of 'read'.
D.The group is not in the correct compartment.
AnswerC

The verb 'use' is needed to invoke the model.

Why this answer

The policy grants 'read' permission, but the chat command requires 'use' permission.

101
MCQmedium

A company is using OCI Generative AI service with a dedicated AI cluster for text generation. They notice that the latency is higher than expected. The cluster is in the Ashburn region, and users are distributed globally. What is the most effective way to reduce latency?

A.Enable the OCI Generative AI inference optimizer
B.Deploy dedicated AI clusters in regions closer to the users
C.Increase the number of nodes in the dedicated AI cluster
D.Use a content delivery network (CDN) to cache responses
AnswerB

Geographic proximity reduces network round-trip time.

Why this answer

Latency for globally distributed users is primarily driven by network distance and the speed of light. Deploying dedicated AI clusters in regions closer to the users reduces the physical distance data must travel, directly minimizing network round-trip time (RTT). This is the most effective architectural change because OCI's Generative AI service processes each request on the dedicated cluster and cannot bypass geographic latency through software optimizations alone.

Exam trap

The trap here is that candidates often confuse throughput improvements (scaling nodes or using an optimizer) with latency reduction, failing to recognize that geographic proximity is the only way to address network round-trip time for globally distributed users.

How to eliminate wrong answers

Option A is wrong because the OCI Generative AI inference optimizer is a software-level tuning feature that improves throughput and model efficiency, but it does not reduce network latency caused by geographic distance. Option C is wrong because increasing the number of nodes in the dedicated AI cluster improves parallel processing capacity and throughput, but does not reduce the per-request network latency for users far from the Ashburn region. Option D is wrong because a CDN caches static content (e.g., images, HTML), but Generative AI text responses are dynamic, unique per request, and cannot be cached to serve different users.

102
Multi-Selecthard

A company is deploying a RAG application for legal document analysis using OCI. Which three best practices should be followed to mitigate hallucinations? (Choose 3.)

Select 3 answers
A.Implement a fallback to abstain from answering if confidence is low.
B.Use a low temperature setting for the generation model.
C.Provide the source document citations in the prompt.
D.Include a verification step via a secondary model.
E.Increase the number of retrieved chunks to 10.
AnswersA, B, C

Avoids generating incorrect answers when retrieval is uncertain.

Why this answer

Options A, B, and C are correct. Providing source citations (C) keeps the model grounded, low temperature (B) reduces randomness, and a fallback to abstain (A) avoids false answers. Option D (a verification step via a secondary model) is possible but adds complexity and latency.

Option E (retrieving 10 chunks) increases noise and may increase hallucinations.

103
MCQhard

A developer is using the OCI Generative AI SDK to call a custom fine-tuned model. They get an HTTP 404 error. What is the most likely issue?

A.The API key is invalid
B.The compartment OCID is missing in the request
C.The region is not specified
D.The model endpoint ID is incorrect
AnswerD

An incorrect endpoint ID returns 404 Not Found.

Why this answer

Option D is correct: a request that references an endpoint ID that does not exist returns 404 Not Found. Option A (invalid API key) is an authentication failure and would cause 401. Option B (compartment) would cause 403.

Option C (region) would cause 404 if wrong, but endpoint ID is more specific.

104
MCQeasy

An OCI administrator wants to limit which users can invoke a specific LLM endpoint. Which resource type should be used?

A.OCI Audit
B.OCI Vault
C.Network security groups
D.IAM policies
AnswerD

IAM policies define who can perform actions on resources.

Why this answer

Option D is correct because IAM policies control access to OCI resources, including Generative AI endpoints. Option C (Network security groups) controls network traffic, not user access. Option B (Vault) manages secrets.

Option A (Audit) logs events but does not enforce access.

105
Multi-Selecteasy

A company is deploying a generative AI model for a real-time inference API. To ensure high availability and cost efficiency under variable load, which two configurations should they implement? (Choose two.)

Select 2 answers
A.Use a single replica with a larger GPU to handle all traffic
B.Deploy the model in a single availability domain to simplify management
C.Disable connection draining on the load balancer
D.Set the number of model deployment replicas to at least 2
E.Enable autoscaling based on average CPU utilization
AnswersD, E

Multiple replicas provide redundancy and high availability.

Why this answer

Option D is correct because deploying at least two replicas ensures high availability by eliminating a single point of failure; if one replica fails, the other can still serve inference requests. This is a standard best practice for production workloads on OCI, where model deployment replicas are distributed across fault domains to maintain service continuity.

Exam trap

Oracle often tests the misconception that a single, powerful GPU instance is sufficient for high availability, but the exam expects you to recognize that redundancy through multiple replicas and autoscaling are required for both availability and cost efficiency under variable load.

106
Multi-Selectmedium

Which TWO factors are most important when deciding between on-demand and dedicated AI clusters for OCI GenAI?

Select 2 answers
A.Fine-tuning capability
B.Model size
C.Data residency
D.Number of concurrent requests
E.Latency requirements
AnswersD, E

Dedicated clusters are better for high concurrency due to reserved capacity.

Why this answer

The number of concurrent requests (D) is critical because dedicated AI clusters provide guaranteed throughput and predictable performance for high-volume workloads, while on-demand clusters may throttle or queue requests under heavy load. Latency requirements (E) are equally important because dedicated clusters offer consistent low-latency inference by avoiding resource contention, whereas on-demand clusters can introduce variable latency due to shared infrastructure. Together, these factors directly determine whether a workload needs the isolation and guaranteed resources of a dedicated cluster or can tolerate the elasticity and potential variability of on-demand provisioning.

Exam trap

Oracle often tests the misconception that fine-tuning capability or model size are primary differentiators between on-demand and dedicated clusters, when in fact both cluster types support these features, and the real decision hinges on concurrency and latency guarantees.

107
MCQeasy

A small business is building an internal Q&A bot using OCI Generative AI with RAG. They have indexed their product manuals into OCI OpenSearch using a precomputed embedding model. When they test queries, the bot often returns answers that are only partially relevant, and sometimes it cannot find answers for questions that are clearly present in the manuals. The developers suspect the chunking strategy is suboptimal. Currently, they use a fixed chunk size of 512 tokens with no overlap. What should they do to improve retrieval relevance?

A.Increase the chunk size to 1024 tokens to include more context.
B.Add a 20% token overlap between consecutive chunks.
C.Reduce the chunk size to 256 tokens to increase precision.
D.Switch to a sentence-based chunking strategy with no overlap.
AnswerB

Overlap ensures that context spanning chunk boundaries is preserved.

Why this answer

Option B is correct because adding overlap helps capture context across chunk boundaries, which is a common cause of missing information. Option A (larger 1024-token chunks) may reduce precision and add noise. Option D (sentence-based chunking without overlap) still misses context that spans chunk boundaries.

Option C (256-token chunks) may increase missed context due to smaller chunks.
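
To make the overlap idea concrete, here is a minimal sketch of fixed-size chunking with a configurable overlap. The function name `chunk_tokens` and its parameters are hypothetical, and the "tokens" are simply list items; a real pipeline would use the tokenizer of its embedding model.

```python
def chunk_tokens(tokens, chunk_size=512, overlap_ratio=0.2):
    """Split a token list into fixed-size chunks that share a leading overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")
    overlap = int(chunk_size * overlap_ratio)
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks


# Toy example: 25 "tokens", chunks of 10 with 20% (2-token) overlap.
chunks = chunk_tokens(list(range(25)), chunk_size=10, overlap_ratio=0.2)
```

In the toy example, the last two tokens of each chunk reappear at the start of the next one, so a sentence that straddles a boundary is still fully present in at least one chunk as long as it is shorter than the overlap.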

108
MCQeasy

A retail company uses OCI Generative AI to generate product descriptions. They observe the model occasionally produces biased content. Which technique should be applied to reduce bias in model outputs?

A.Increase the max_tokens parameter.
B.Apply prompt engineering with explicit instructions to avoid bias.
C.Reduce the model's inference temperature to 0.
D.Use a different random seed for each request.
AnswerB

Prompt engineering is the recommended approach to guide the model towards desired behavior and reduce bias.

Why this answer

Option B is correct because prompt engineering allows you to explicitly instruct the model to avoid biased content, such as by including directives like 'Ensure the description is neutral and unbiased' in the system or user prompt. This technique directly influences the model's output generation without altering its underlying parameters, making it a targeted and effective approach for reducing bias in OCI Generative AI models.

Exam trap

The trap here is that candidates often confuse parameter tuning (like temperature or max_tokens) with content-level controls, assuming that reducing randomness or increasing output length can mitigate bias, when in fact bias is a training data issue that requires explicit instruction via prompt engineering to override.

How to eliminate wrong answers

Option A is wrong because increasing max_tokens only extends the maximum length of the generated text, which does not address the content's bias—it may even allow more biased statements to be produced. Option C is wrong because reducing inference temperature to 0 makes the model deterministic and less creative, but it does not inherently remove bias; biased patterns in the training data can still be reproduced with high confidence. Option D is wrong because using a different random seed for each request only affects the randomness of sampling (when temperature > 0), not the underlying bias in the model's learned associations or outputs.

109
Multi-Selecthard

Which three statements about transformer architecture are correct? (Choose three.)

Select 3 answers
A.The softmax function is used in the attention mechanism to normalize attention scores.
B.The feed-forward network applies a different set of weights for each token position.
C.Positional encodings are necessary because the model is not recurrent.
D.The self-attention layer allows the model to weigh the importance of different tokens.
E.The encoder-decoder structure is used in GPT models.
AnswersA, C, D

Softmax converts attention scores into probabilities.

Why this answer

Option A is correct because the softmax function is applied to the raw attention scores (the dot products between queries and keys) to convert them into a probability distribution that sums to 1. This normalization allows the model to assign a relative weight to each token in the sequence, ensuring that the weighted sum of values is stable and interpretable.
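
The role of softmax in attention can be illustrated with a small pure-Python sketch for a single query. The helper names are made up for this example, and the division by the square root of the key dimension follows the standard scaled dot-product formulation rather than anything stated in the question.

```python
import math


def softmax(scores):
    """Normalize raw scores into a probability distribution."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d_k = len(query)
    # Raw scores: dot product of the query with each key, scaled by sqrt(d_k).
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
        for key in keys
    ]
    weights = softmax(scores)
    # Output: weighted sum of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, output


# Toy vectors: the query matches the first key more closely than the second.
weights, output = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[10.0], [20.0]])
```

Because the weights sum to 1, the output is always a convex combination of the value vectors, which is what keeps the weighted sum stable.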

Exam trap

Oracle often tests the distinction between encoder-decoder and decoder-only architectures, trapping candidates who assume all transformer-based models follow the original encoder-decoder design, when in fact GPT and other autoregressive models use only the decoder stack.

110
MCQeasy

A company requires a generative AI service to automatically summarize customer support transcripts. Which OCI Generative AI model is most suitable for this task?

A.Llama 3 70B
B.Cohere Embed
C.Cohere Command
D.Fine-tuned Llama 2
AnswerC

Cohere Command is designed for text generation, including summarization, and is a direct choice for this scenario.

Why this answer

Cohere Command is a large language model specifically designed for text generation tasks such as summarization, making it the most suitable choice for automatically summarizing customer support transcripts. Unlike embedding models or base Llama variants, Command is optimized for instruction-following and generating coherent, concise summaries from conversational data.

Exam trap

Oracle often tests the distinction between embedding models (Cohere Embed) and generative models (Cohere Command), leading candidates to mistakenly choose an embedding model for a text generation task like summarization.

How to eliminate wrong answers

Option A is wrong because Llama 3 70B is a general-purpose generative model that, while capable of summarization, is not specifically optimized for the summarization task in OCI Generative AI service; Cohere Command is the designated model for text generation and summarization within OCI. Option B is wrong because Cohere Embed is a text embedding model designed for semantic search and similarity tasks, not for generating summaries or any text output. Option D is wrong because Fine-tuned Llama 2, though customizable, is not a pre-built model offered by OCI Generative AI for summarization; OCI provides Cohere Command as the primary ready-to-use model for such generative tasks.

111
MCQeasy

Which OCI service provides pre-trained models for custom text classification without requiring fine-tuning?

A.OCI Generative AI
B.OCI AI Language
C.OCI Data Science
D.OCI Vision
AnswerB

OCI AI Language provides pre-trained models for text classification without fine-tuning.

Why this answer

B is correct because OCI AI Language provides pre-trained models that can perform custom text classification out-of-the-box without requiring fine-tuning. It offers built-in models for common NLP tasks like sentiment analysis, entity extraction, and text classification, allowing users to classify text into custom categories defined by their own labels without additional training.

Exam trap

Oracle often tests the distinction between pre-trained models that require no fine-tuning versus platforms that require custom model training, leading candidates to mistakenly choose OCI Data Science or OCI Generative AI when the question specifically asks for a service that provides pre-trained models for custom text classification without fine-tuning.

How to eliminate wrong answers

Option A is wrong because OCI Generative AI focuses on generating text, images, and code using large language models, not on pre-trained models for custom text classification without fine-tuning. Option C is wrong because OCI Data Science is a platform for building, training, and deploying custom machine learning models, requiring users to fine-tune or train models from scratch rather than providing pre-trained classification models. Option D is wrong because OCI Vision is designed for image analysis tasks such as object detection and image classification, not for text classification.

112
MCQhard

An organization is deploying multiple generative AI models on a shared dedicated AI cluster. They need to isolate resource usage for each model to avoid interference. Which strategy is recommended?

A.Use separate fine-tuning jobs for each model
B.Configure multiple virtual clusters within the dedicated AI cluster using compartment quotas
C.Use OCI Resource Manager to allocate resources
D.Deploy each model on its own dedicated AI cluster
AnswerD

Correct: Each cluster has dedicated hardware, ensuring no resource contention.

Why this answer

Option D is correct because deploying each model on its own dedicated AI cluster provides complete hardware-level isolation, ensuring that resource usage (e.g., GPU memory, compute cycles) for one model does not interfere with another. In OCI Generative AI, dedicated AI clusters are single-tenant instances, so each model gets exclusive access to its allocated infrastructure, eliminating contention. This is the recommended strategy for strict isolation in shared environments.

Exam trap

Oracle often tests the misconception that logical isolation (e.g., compartment quotas or virtual clusters) is sufficient for performance isolation, when in fact hardware-level separation is required to prevent interference in shared AI clusters.

How to eliminate wrong answers

Option A is wrong because separate fine-tuning jobs do not isolate runtime inference resources; they only isolate training workloads, and models still share the same cluster during inference. Option B is wrong because virtual clusters with compartment quotas provide logical isolation via resource limits but do not prevent resource contention at the hardware level (e.g., GPU memory oversubscription). Option C is wrong because OCI Resource Manager is an infrastructure-as-code tool for provisioning resources, not a mechanism for runtime resource isolation between models.

113
Multi-Selecthard

Which THREE factors should be considered when choosing a model for a summarization task using OCI Generative AI?

Select 3 answers
A.Inference endpoint location.
B.Number of parameters.
C.Model training data cut-off date.
D.Context window size.
E.Supported languages.
AnswersB, D, E

More parameters generally mean better performance but higher cost.

Why this answer

Options B, D, and E are correct. Context window size (D) determines how much text the model can process at once. Number of parameters (B) affects capability and cost.

Supported languages (E) ensures the model can handle the input language. Option C is wrong because model training data cut-off date is less critical for summarization. Option A is wrong because inference endpoint location is about deployment, not model selection.

114
MCQmedium

A financial firm deploys a RAG application using OCI OpenSearch. They observe that the LLM sometimes generates incorrect answers that are not supported by the retrieved documents. Which technique directly addresses this issue?

A.Use a more detailed system prompt instructing the model to not make up information.
B.Increase the temperature parameter of the LLM to reduce creativity.
C.Implement a post-generation verification step that checks if the answer is grounded in the retrieved chunks.
D.Increase the number of retrieved documents to provide more context.
AnswerC

Directly verifies faithfulness.

Why this answer

Option C is correct because it directly addresses the problem of hallucination by verifying that the LLM's output is factually supported by the retrieved documents. In a RAG pipeline, the LLM may still generate unsupported content even with good retrieval; a post-generation grounding check explicitly validates each claim against the source chunks, ensuring answer fidelity.

Exam trap

Oracle often tests the misconception that prompt engineering or parameter tuning alone can solve hallucination in RAG, when in fact a dedicated verification step is required to enforce factual grounding.

How to eliminate wrong answers

Option A is wrong because a more detailed system prompt instructing the model not to make up information is a soft constraint that LLMs can easily ignore, especially when the model is confident in its fabricated answer; it does not provide a deterministic mechanism to prevent hallucination. Option B is wrong because increasing the temperature parameter actually increases randomness and creativity, making hallucinations more likely; reducing temperature (closer to 0) would make outputs more deterministic and less creative, but it still does not guarantee grounding in retrieved documents. Option D is wrong because increasing the number of retrieved documents can introduce irrelevant or conflicting context, potentially confusing the LLM and increasing the chance of unsupported answers; it does not enforce that the final answer is actually supported by any specific chunk.
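
A post-generation grounding check can take many forms. The sketch below uses a deliberately naive word-overlap heuristic to flag answer sentences that no retrieved chunk supports. The function names, the 0.6 threshold, and the sample text are all assumptions for illustration; production systems typically rely on an entailment model or a second LLM rather than lexical overlap.

```python
import re


def _words(text):
    """Lowercased set of alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_grounded(sentence, chunks, threshold=0.6):
    """True if enough of the sentence's words appear in a single chunk."""
    words = _words(sentence)
    if not words:
        return True
    best = max((len(words & _words(c)) / len(words) for c in chunks), default=0.0)
    return best >= threshold


def ungrounded_sentences(answer, chunks, threshold=0.6):
    """Return the answer sentences that no retrieved chunk supports."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    return [s for s in sentences if not is_grounded(s, chunks, threshold)]


# Hypothetical retrieved chunk and generated answer.
retrieved = ["The wire transfer limit is 10000 dollars per day."]
answer = (
    "The wire transfer limit is 10000 dollars per day. "
    "Fees are waived for premium customers."
)
flagged = ungrounded_sentences(answer, retrieved)
```

An application could withhold or regenerate the answer whenever `flagged` is non-empty, which is the deterministic enforcement that a system prompt alone cannot provide.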

115
MCQhard

Refer to the exhibit. The API Gateway fails to invoke the Generative AI service. What is the most likely missing configuration?

A.The API Gateway does not have an internet gateway.
B.The JWT token is expired.
C.The Generative AI model is not deployed.
D.The API Gateway is not in the same VCN as the service's private endpoint.
AnswerD

Private endpoints require the caller to be in the same VCN or have a service gateway.

Why this answer

Since the service endpoint is private, the API Gateway must be in the same VCN to have network connectivity.

116
MCQhard

A real-time customer support chatbot uses RAG with OCI Generative AI. The average response time is 5 seconds, which is too slow. The team identifies the vector search as the bottleneck. Which optimization would most reduce latency?

A.Use approximate nearest neighbor (ANN) search with a lower recall setting
B.Increase the number of retrieved documents to 10
C.Move the vector store to a different region
D.Switch to a larger embedding model for better accuracy
AnswerA

ANN speeds up search by sacrificing some recall, which can be mitigated by re-ranking.

Why this answer

Using approximate nearest neighbor (ANN) search with a lower recall setting reduces search time by trading off some accuracy for speed, which is often acceptable in real-time applications.

117
MCQeasy

A team wants to evaluate an LLM's performance on a text classification task. Which metric is most appropriate for a balanced dataset?

A.BLEU score
B.Perplexity
C.Accuracy
D.ROUGE score
AnswerC

Accuracy directly measures correct predictions, appropriate for balanced data.

Why this answer

Accuracy is the most appropriate metric for evaluating an LLM on a text classification task with a balanced dataset because it directly measures the proportion of correctly predicted labels out of total predictions. For balanced classes, accuracy provides a reliable and intuitive performance indicator without the distortion caused by class imbalance.

Exam trap

Oracle often tests the distinction between metrics for generation tasks (BLEU, ROUGE, perplexity) versus classification tasks (accuracy, F1-score), and the trap here is assuming a language model metric like perplexity applies to any NLP task, when it is specific to probabilistic language modeling.

How to eliminate wrong answers

Option A is wrong because BLEU score is designed for evaluating machine translation quality by comparing n-gram overlap between generated and reference text, not for classification tasks. Option B is wrong because perplexity measures how well a language model predicts a sequence of tokens, typically used for language modeling or generation, not for discrete label classification. Option D is wrong because ROUGE score is used for summarization evaluation by measuring recall-oriented overlap of n-grams, not for classification accuracy.
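
As a small worked example of why the balanced-dataset qualifier matters, the sketch below computes accuracy on a balanced and an imbalanced toy label set. The labels and predictions are invented for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Balanced toy set: accuracy reflects real performance on both classes.
balanced = accuracy(["pos", "neg", "pos", "neg"], ["pos", "neg", "neg", "neg"])

# Imbalanced toy set: always predicting "neg" still looks good on paper.
imbalanced = accuracy(["neg"] * 9 + ["pos"], ["neg"] * 10)
```

On the imbalanced set the classifier never identifies the positive class yet still scores 0.9, which is the distortion that makes accuracy reliable only when classes are roughly balanced.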

118
MCQeasy

A developer notices that an LLM's responses are too verbose. Which parameter adjustment would most effectively reduce verbosity?

A.Increase frequency_penalty
B.Increase top_p
C.Decrease max_tokens
D.Decrease temperature
AnswerC

Max_tokens directly controls the maximum output length, reducing verbosity.

Why this answer

Decreasing max_tokens directly limits the maximum length of the LLM's response, which is the most straightforward way to reduce verbosity. This parameter caps the number of tokens the model can generate, forcing it to produce shorter completions. Other parameters like frequency_penalty, top_p, and temperature influence the style, diversity, or randomness of the output but do not directly control response length.

Exam trap

The trap here is that candidates confuse parameters that affect output style (temperature, top_p, frequency_penalty) with the one that directly controls output length (max_tokens), leading them to choose a parameter that changes how the model says something rather than how much it says.

How to eliminate wrong answers

Option A is wrong because increasing frequency_penalty reduces repetition by penalizing tokens that have already appeared, which can actually make responses more varied and potentially longer as the model avoids reusing words. Option B is wrong because increasing top_p (nucleus sampling) considers a larger set of probable tokens, which can increase diversity and often leads to longer, more exploratory responses. Option D is wrong because decreasing temperature makes the model more deterministic and focused on high-probability tokens, but it does not cap the length of the response; the model can still generate verbose text if it deems it likely.

119
MCQmedium

A company uses OCI Generative AI service for customer support summarization. They notice the model frequently misses key details and generates hallucinations. What should they do first?

A.Adjust the prompt to be more specific and include few-shot examples.
B.Increase the temperature parameter.
C.Use a different base model.
D.Increase the max tokens.
AnswerA

Clear prompts with examples guide the model to produce accurate, relevant summaries.

Why this answer

Option A is correct because improving prompt engineering with specific instructions and few-shot examples reduces hallucinations and improves accuracy. Option B is wrong because increasing temperature increases randomness, making hallucinations worse. Option C is wrong because switching models is a more drastic step that may not address the root cause.

Option D is wrong because increasing max tokens does not improve accuracy.

120
MCQeasy

A user gets a 'Model not found' error when calling an OCI Gen AI endpoint. What is the most likely cause?

A.The model is not available in the region
B.The request format is wrong
C.The API key is invalid
D.The endpoint is not deployed
AnswerD

Correct: Model not found usually means the endpoint hasn't been created or deployed.

Why this answer

The 'Model not found' error in OCI Generative AI typically occurs when the model endpoint has not been deployed or activated in the user's tenancy. Even if the model is available in the region and the request format is correct, the endpoint must be explicitly deployed (e.g., via the OCI Console, CLI, or SDK) before inference calls can succeed. This is a common prerequisite for using dedicated AI endpoints in OCI.

Exam trap

The trap here is that candidates confuse 'model not found' with model unavailability in the region, but Oracle tests the specific distinction between a model being listed in the catalog versus having a deployed endpoint ready for inference.

How to eliminate wrong answers

Option A is wrong because the model being unavailable in the region would typically result in a 'Model not supported in this region' or 'Service error' rather than a 'Model not found' error; the error message specifically indicates the endpoint is missing, not the model's regional availability. Option B is wrong because an incorrect request format (e.g., malformed JSON, missing required fields) would produce a 400 Bad Request or validation error, not a 'Model not found' error. Option C is wrong because an invalid API key would result in a 401 Unauthorized or 403 Forbidden error, not a 'Model not found' error, as authentication is checked before model resolution.

121
MCQmedium

An enterprise is deploying a chat application using a large language model. Users report that the model sometimes generates toxic or biased responses. Which best practice should be applied to mitigate this issue?

A.Use few-shot prompting with examples of toxic responses so the model learns to avoid them.
B.Increase the max_tokens parameter to allow the model more context to correct itself.
C.Disable the temperature parameter to make outputs deterministic.
D.Implement a content filtering layer using a safety classifier to detect and block toxic outputs.
AnswerD

Safety classifiers directly filter toxic content.

Why this answer

Option D is correct because implementing a content filtering layer using a safety classifier is a proven best practice to detect and block toxic or biased outputs in real-time. This approach acts as a guardrail, intercepting harmful responses before they reach users, and is independent of the model's internal parameters or training data.

Exam trap

Oracle often tests the misconception that adjusting model parameters (like temperature or max_tokens) can fix safety issues, when in reality, safety requires external guardrails like content filters.

How to eliminate wrong answers

Option A is wrong because few-shot prompting with examples of toxic responses would not teach the model to avoid them; instead, it could inadvertently reinforce undesirable patterns, as the model may learn to mimic the toxic examples rather than suppress them. Option B is wrong because increasing the max_tokens parameter does not help the model correct its own toxicity; it simply allows longer outputs, which could include more harmful content. Option C is wrong because disabling the temperature parameter (setting it to 0) makes outputs deterministic but does not address the underlying issue of toxic or biased generation; the model can still produce harmful responses consistently.

122
MCQmedium

A company is using OCI Generative AI for customer support chatbots. They notice that responses sometimes include offensive content. Which built-in safety feature should they configure?

A.Content moderation filters
B.Configure stop sequences
C.Set a maximum token limit
D.Adjust the temperature parameter
AnswerA

These filters detect and block offensive or harmful content in inputs and outputs.

Why this answer

Option A is correct: Content moderation filters block harmful content. Option D (temperature) controls randomness, not safety. Option C (max tokens) limits length.

Option B (stop sequences) stops generation at specific tokens.

123
MCQeasy

A startup wants to quickly prototype a chatbot using OCI Generative AI service. They have no prior experience with OCI. They want to test different models and parameters without writing any code and within a few minutes. They also want to save prompts and compare results. Which approach should they use?

A.Create a dedicated AI cluster and use the OCI SDK.
B.Use the OCI Generative AI Playground.
C.Use OCI Data Science Notebooks with the GenAI SDK.
D.Use OCI Functions to invoke the GenAI API.
AnswerB

Playground offers immediate testing with no code and built-in history.

Why this answer

Option B is correct because the OCI Generative AI Playground provides a no-code interface for testing models, adjusting parameters, and saving prompt history. The other options require setup or code.

124
Multi-Selectmedium

An organization is implementing a RAG system using OCI GenAI. Which two are best practices for optimizing retrieval and generation? (Choose two.)

Select 2 answers
A.Use the same embedding model for both retrieval and generation
B.Store all documents in a single large index
C.Use semantic search (embeddings) for document retrieval
D.Implement caching for frequently asked questions
E.Disable summarization to save inference costs
AnswersC, D

Semantic search captures meaning beyond keywords, improving relevance.

Why this answer

Option C is correct because semantic search using embeddings retrieves documents based on meaning rather than keyword matching, which significantly improves the relevance of context provided to the LLM in a RAG system. This aligns with best practices for OCI GenAI, where embedding models convert text into vector representations for similarity search in a vector database.
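
As a rough illustration of the similarity search described above, the following sketch ranks documents by cosine similarity between a query vector and stored document vectors. The 3-dimensional vectors and document names are made up for the example; a real system would obtain vectors from an embedding model and query a vector database rather than a Python dict.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy embeddings (not produced by any real model).
docs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "return-window": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]
results = top_k(query, docs, k=2)
print(results)
```

The two documents whose vectors point in nearly the same direction as the query rank first, even though no keywords are compared.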

Exam trap

Oracle often tests the misconception that retrieval and generation should share the same model, but in practice they are optimized separately, and candidates may confuse 'embedding model' with 'generation model' in a RAG context.

125
MCQmedium

Refer to the exhibit. What is a potential issue with this OCI OpenSearch index template configuration?

A.The ef_construction parameter is set too low
B.The space_type in settings (l2) differs from the method's space_type (cosinesimil)
C.Number of replicas is 0, which provides no redundancy
D.The dimension 1024 is too large for the knn_vector type
AnswerB

This mismatch can cause inconsistency in how distances are computed during indexing and search.

Why this answer

The space_type defined at the index settings (l2) conflicts with the space_type defined in the method mapping (cosinesimil). This mismatch can lead to incorrect distance calculations and poor retrieval results.
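
The exhibit is not reproduced here, so the sketch below uses a hypothetical index template with the same kind of conflict. It assumes the index-level value sits under a `knn.space_type` key in the settings and the field-level value under `method.space_type`; the helper `find_space_type_conflicts` is an illustrative name, not part of any OpenSearch client.

```python
def find_space_type_conflicts(index_body):
    """List knn_vector fields whose method space_type differs from the index-level one."""
    index_settings = index_body.get("settings", {}).get("index", {})
    index_level = index_settings.get("knn.space_type")
    conflicts = []
    properties = index_body.get("mappings", {}).get("properties", {})
    for field, spec in properties.items():
        if spec.get("type") != "knn_vector":
            continue
        method_level = spec.get("method", {}).get("space_type")
        if index_level and method_level and index_level != method_level:
            conflicts.append((field, index_level, method_level))
    return conflicts

# Hypothetical template mirroring the mismatch described in the question.
template = {
    "settings": {"index": {"knn": True, "knn.space_type": "l2", "number_of_replicas": 0}},
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 1024,
                "method": {
                    "name": "hnsw",
                    "engine": "nmslib",
                    "space_type": "cosinesimil",
                    "parameters": {"ef_construction": 128, "m": 16},
                },
            }
        }
    },
}
conflicts = find_space_type_conflicts(template)
print(conflicts)
```

A check like this could run before a template is applied, so the two distance settings are reconciled ahead of indexing.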

126
Multi-Selecthard

Which TWO best practices should be followed when designing a RAG application using OCI GenAI? (Select two.)

Select 2 answers
A.Batch all user queries to minimize costs
B.Store raw documents in the vector database for easy updates
C.Use OCI Dedicated AI Cluster for inference to ensure data privacy
D.Store embedding API keys in OCI Vault and rotate frequently
E.Use dedicated AI endpoints for sensitive workloads
AnswersC, E

Keeps data within tenancy.

Why this answer

Running inference on a dedicated AI cluster (C) keeps data within the tenancy, and using dedicated AI endpoints (E) ensures isolation and performance for sensitive workloads. OCI Vault (D) is good for managing secrets, but it is not a RAG design practice.

Batching queries (A) is fine but not a top priority. Storing raw documents in the vector database (B) is unnecessary.

127
MCQhard

A team is deploying a RAG system that uses OCI Generative AI to answer questions about internal HR policies. The system must comply with data residency requirements: all data processing must stay within a specific OCI region. The team uses OCI Data Science for orchestration. Which architecture BEST meets the data residency requirement?

A.Deploy the generative AI model endpoints within the same OCI region as the data and compute.
B.Use OCI Generative AI endpoints in a different region but store data in the required region.
C.Use an external third-party LLM endpoint that guarantees data residency.
D.Store embeddings in a different region but run inference in the required region.
AnswerA

All components remain in the specified region, ensuring compliance.

Why this answer

Option A is correct because deploying the generative AI model endpoints within the same OCI region as the data and compute ensures that all data processing—including inference, embedding generation, and vector search—occurs entirely within the required region, satisfying data residency requirements. OCI Generative AI endpoints are region-specific and do not automatically route requests to other regions, so co-locating all components avoids any cross-region data transfer.

Exam trap

Oracle often tests the misconception that data residency only applies to storage, not to processing—candidates may think storing data in the required region is sufficient, but the trap is that inference and embedding generation also count as data processing and must occur in the same region.

How to eliminate wrong answers

Option B is wrong because using OCI Generative AI endpoints in a different region while storing data in the required region would cause inference requests and model processing to occur outside the required region, violating data residency requirements. Option C is wrong because an external third-party LLM endpoint that guarantees data residency still requires data to leave the OCI region to reach the external service, which breaks the requirement that all data processing must stay within a specific OCI region. Option D is wrong because storing embeddings in a different region while running inference in the required region means the embedding data (derived from HR policies) resides outside the required region, failing the data residency constraint.

128
MCQmedium

A developer is reviewing the model card for an LLM on OCI Generative AI and notices it was trained on a dataset that is predominantly English. The application will serve users in multiple languages. What is the most likely limitation of using this model without additional steps?

A.The embedding vectors will be less accurate for any language.
B.The model may produce lower quality responses in non-English languages.
C.The model will hallucinate facts more frequently.
D.The context window size will be effectively reduced.
AnswerB

Training data imbalance leads to weaker performance on underrepresented languages.

Why this answer

Option B is correct because a model trained mainly on English may perform poorly on non-English inputs due to biased language representations. Option C (more frequent hallucination) is not specific to language. Option D (reduced context window) is unrelated.

Option A (less accurate embeddings) is a possibility but the primary limitation is language coverage.

129
MCQeasy

A developer wants to integrate OCI GenAI into a Java application. Which SDK should they use?

A.OCI JavaScript SDK.
B.OCI Python SDK.
C.OCI Java SDK.
D.OCI CLI.
AnswerC

The Java SDK is designed for Java applications.

Why this answer

Option C is correct because the OCI Java SDK provides native Java support for calling OCI services including GenAI.

130
Multi-Selecthard

A company is designing a generative AI solution on OCI that must comply with data privacy regulations. Which three best practices should they follow? (Choose three.)

Select 3 answers
A.Enable audit logging for all inference requests
B.Allocate a dedicated compartment for generative AI resources to apply specific IAM policies
C.Use dedicated AI clusters with private endpoints to keep data within the OCI network
D.Store all inference inputs and outputs in a public bucket for transparency
E.Use customer-managed keys (CMK) for encrypting model artifacts and inference data
AnswersA, C, E

Audit logs help demonstrate compliance with data privacy regulations.

Why this answer

Option A is correct because enabling audit logging for all inference requests is a fundamental data privacy best practice. It provides an immutable record of who accessed the generative AI service, what data was sent, and when, which is essential for compliance audits and detecting unauthorized access. OCI Audit service captures these events automatically when configured, ensuring traceability without storing the actual inference payloads.

Exam trap

The trap here is that candidates may confuse general resource management best practices (like compartments) with specific data privacy compliance requirements, or mistakenly think public buckets are acceptable for transparency when they actually create a severe data exposure risk.

131
Multi-Selectmedium

Which THREE are valid ways to interact with OCI Generative AI?

Select 3 answers
A.OCI Mobile App.
B.OCI Data Science Notebooks.
C.OCI REST API.
D.OCI CLI.
E.OCI Console Playground.
AnswersC, D, E

REST API is the underlying interface for all interactions.

Why this answer

Options C, D, and E are correct. The REST API, CLI, and OCI Console Playground are all direct interfaces. The others are not standard ways to invoke the service.

132
MCQmedium

During load testing, the RAG application's response time increases significantly. The vector search is performed on millions of vectors. Which optimization would MOST reduce latency?

A.Increase the number of replicas in OpenSearch
B.Shard the index by document type
C.Use approximate nearest neighbor (ANN) search instead of exact
D.Use a smaller embedding model
AnswerC

ANN search is orders of magnitude faster than exact search for large datasets.

Why this answer

Approximate Nearest Neighbor (ANN) search uses indexes like HNSW to trade a small amount of accuracy for large speed gains, drastically reducing query time. Increasing replicas helps throughput but not per-query latency. Sharding organizes data but does not inherently reduce latency.

A smaller model may reduce computation but also harms quality.

133
Multi-Selectmedium

A data scientist is evaluating different models for a summarization task. Which two metrics are commonly used to evaluate the quality of generated summaries?

Select 2 answers
A.F1 score
B.Mean Average Precision
C.ROUGE
D.Perplexity
E.BLEU
AnswersC, E

ROUGE measures overlap of n-grams between generated and reference summaries, commonly used for summarization.

Why this answer

ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a standard metric for summarization that measures the overlap of n-grams, word sequences, or word pairs between the generated summary and reference summaries. It focuses on recall, making it well-suited for evaluating how well the generated summary captures the key content from the reference.
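
To make the recall focus concrete, here is a minimal sketch of ROUGE-1 recall (unigram overlap divided by the number of unigrams in the reference). It assumes whitespace tokenization and lowercase matching, which is simpler than what standard ROUGE packages do; the sentences are invented for the example.

```python
from collections import Counter

def rouge1_recall(reference, candidate):
    """Fraction of reference unigrams that also appear in the candidate (clipped counts)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    overlap = sum(min(count, cand_counts[token]) for token, count in ref_counts.items())
    total = sum(ref_counts.values())
    return overlap / total if total else 0.0

reference = "the contract ends in june and renews yearly"
candidate = "the contract renews yearly"
score = rouge1_recall(reference, candidate)
print(score)  # 4 of the 8 reference words are covered -> 0.5
```

A short candidate can be fully precise yet score low on recall, which is why the metric rewards summaries that cover more of the reference content.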

Exam trap

Oracle often tests the distinction between n-gram overlap metrics (ROUGE, BLEU) and language-modeling metrics (Perplexity). BLEU was designed primarily for translation, so candidates may rule it out, but it is also commonly applied to generated summaries, which is why it is accepted here alongside ROUGE.

134
MCQhard

A team has deployed a generative AI model using OCI Data Science model deployment. The endpoint is behind a load balancer. Users report that after 5 minutes of inactivity, the first request takes over 30 seconds to respond, while subsequent requests are fast. What is the most likely cause and solution?

A.The model deployment has an idle timeout that scales down to zero; configure a minimum number of instances or use a warm-up request
B.The load balancer is scaling based on CPU utilization; increase the CPU threshold
C.The VCN has a network latency issue; use a different availability domain
D.The inference code has a lazy initialization; pre-load the model in the deployment script
AnswerA

Idle timeout causes cold start; setting min replicas or health check warm-up solves it.

Why this answer

The described behavior—first request after 5 minutes of inactivity taking over 30 seconds, with subsequent requests fast—is a classic symptom of an idle timeout that scales the model deployment to zero instances. OCI Data Science model deployments support auto-scaling with an idle timeout (default 5 minutes) that can reduce the number of instances to zero when no requests are received. When a new request arrives, it must wait for a new instance to spin up, causing the delay.

The solution is to configure a minimum number of instances (e.g., 1) to keep the model warm, or use a warm-up request to prevent the idle timeout from triggering.

Exam trap

Oracle often tests the distinction between infrastructure-level idle timeouts (which cause cold starts after inactivity) and application-level lazy initialization (which causes a one-time delay after deployment), and candidates may confuse the 5-minute inactivity pattern with a code initialization issue rather than a scaling policy.

How to eliminate wrong answers

Option B is wrong because the load balancer scaling based on CPU utilization would cause performance degradation under high load, not a cold-start delay after inactivity; increasing the CPU threshold would not address the idle timeout issue. Option C is wrong because a VCN network latency issue would cause consistently slow responses, not a pattern where only the first request after inactivity is slow. Option D is wrong because lazy initialization in the inference code would cause a delay on the first request after deployment or code reload, not specifically after 5 minutes of inactivity; the 5-minute window matches the default idle timeout of the model deployment, not a code-level initialization.

135
MCQhard

Refer to the exhibit. The output is very short and cuts off mid-sentence. Which parameter is most likely the cause?

A.Max-tokens is too low
B.Temperature is too high
C.Model ID incorrect
D.Top-p is too high
AnswerA

If the output exceeds max-tokens, it gets truncated, causing the cut-off.

Why this answer

The 'max-tokens' parameter limits the number of tokens in the generated response. Setting it to 500, while typically sufficient, might still cause truncation if the model's context window is nearly full or if the prompt is long. However, among options, 'max-tokens' is the direct control for output length.

Option A is correct.

136
MCQeasy

A developer wants to integrate generative AI capabilities into an application using REST API calls. Which OCI Generative AI service endpoint should they use for text generation?

A./completions
B./models
C./inference
D./chat
AnswerA

/completions is the endpoint for generating text completions.

Why this answer

Option A is correct because the OCI Generative AI service exposes a REST API endpoint at `/completions` specifically for text generation tasks. This endpoint accepts a prompt and returns a generated text completion, aligning directly with the developer's requirement to integrate generative AI capabilities via REST API calls.

Exam trap

The trap here is that candidates confuse the `/chat` endpoint (designed for conversational AI) with the `/completions` endpoint (designed for single-turn text generation), or assume `/inference` is a generic catch-all endpoint for all AI tasks.

How to eliminate wrong answers

Option B is wrong because `/models` is used to list or retrieve metadata about available generative AI models, not to perform text generation. Option C is wrong because `/inference` is not a valid endpoint in the OCI Generative AI REST API; the correct endpoint for inference-based text generation is `/completions`. Option D is wrong because `/chat` is an endpoint designed for conversational AI interactions (multi-turn chat), not for single-turn text generation tasks.

137
MCQmedium

A team deploys a generative AI model endpoint and notices intermittent 429 Too Many Requests errors. The endpoint is configured with auto-scaling using a dedicated AI cluster. What is the most likely cause?

A.The model's context window exceeded
B.Insufficient storage on the cluster
C.The auto-scaling policy is not aggressive enough
D.Rate limiting at the OCI API Gateway
AnswerC

Auto-scaling may not be scaling up quickly enough to handle traffic spikes, leading to throttling.

Why this answer

The 429 Too Many Requests error indicates that the endpoint is receiving more requests than it can handle. With auto-scaling enabled on a dedicated AI cluster, the most likely cause is that the auto-scaling policy is not aggressive enough to keep up with the request rate, meaning it scales up too slowly or has insufficient maximum instance limits to handle the traffic spike.

Exam trap

The trap here is that candidates often confuse client-side rate limiting (API Gateway) with server-side capacity issues (auto-scaling lag), but the dedicated AI cluster configuration points directly to insufficient scaling policy aggressiveness.

How to eliminate wrong answers

Option A is wrong because exceeding the model's context window would result in a 400 Bad Request or an input length error, not a 429 rate-limiting error. Option B is wrong because insufficient storage on the cluster would manifest as disk-full errors or model loading failures, not HTTP 429 responses which are specifically about request throttling. Option D is wrong because the question states the endpoint is configured with auto-scaling using a dedicated AI cluster, implying the endpoint is directly exposed without an OCI API Gateway in front; rate limiting at the API Gateway would be a separate layer and is not mentioned in the configuration.

138
MCQmedium

A company uses OCI Generative AI service to power a chatbot. After deployment, the chatbot starts generating inappropriate responses. Which action should be taken first?

A.Increase the temperature parameter.
B.Fine-tune the model on customer-specific data.
C.Switch to a larger model.
D.Adjust the prompt template to include safety instructions.
AnswerD

Adding safety instructions in the prompt is a quick and effective safeguard.

Why this answer

Option D is correct because adjusting the prompt template to include safety instructions is the fastest and most direct way to mitigate inappropriate responses without retraining or changing model parameters. In OCI Generative AI, prompt engineering—including explicit safety guidelines—can immediately constrain the model's output behavior by providing clear guardrails in the context window.

Exam trap

Oracle often tests the misconception that safety issues require model retraining or parameter tuning, when in fact prompt engineering is the first-line, low-cost intervention recommended in OCI documentation.

How to eliminate wrong answers

Option A is wrong because increasing the temperature parameter would make the model's output more random and creative, likely worsening inappropriate responses rather than fixing them. Option B is wrong because fine-tuning on customer-specific data requires significant time, cost, and labeled data, and does not directly address safety issues—it is an over-engineered solution for a problem that can be solved with prompt adjustments. Option C is wrong because switching to a larger model does not inherently improve safety; larger models may even generate more complex or unexpected inappropriate content without proper guardrails.

139
MCQhard

A team uses OCI Generative AI's summarization feature to condense legal documents. The summaries sometimes omit critical clauses. Which parameter adjustment is most likely to improve completeness?

A.Adjust frequencyPenalty.
B.Increase temperature.
C.Decrease topP.
D.Increase maxTokens.
AnswerD

A larger token limit enables longer summaries, helping to include critical clauses.

Why this answer

Increasing maxTokens (option D) is the most direct way to improve completeness because it extends the maximum length of the generated summary, allowing the model to include more content from the source legal document. Critical clauses are often omitted when the token limit truncates the output before the model can cover all essential sections. This parameter controls the output length, not the style or randomness of the generation.

Exam trap

Oracle often tests the misconception that randomness parameters (temperature, topP) or repetition penalties control output length, when in fact only maxTokens directly determines how much text the model can produce.

How to eliminate wrong answers

Option A is wrong because frequencyPenalty reduces repetition by penalizing tokens that have already appeared, which does not address the omission of critical clauses—it only discourages the model from repeating itself. Option B is wrong because increasing temperature adds randomness to token selection, which can make the summary less coherent and more likely to skip important details, not improve completeness. Option C is wrong because decreasing topP narrows the set of candidate tokens to only the most probable ones, which can make the output more conservative and even more likely to omit less common but critical clauses.

140
MCQmedium

A financial firm wants to use OCI Generative AI for contract analysis. They need to reduce costs by using a smaller, specialized model. Which approach should they take?

A.Use a large base model (e.g., Cohere Command) on a serverless endpoint
B.Use a large base model on a dedicated AI cluster
C.Use a third-party LLM
D.Fine-tune a smaller base model on a dedicated AI cluster
AnswerD

Smaller fine-tuned model reduces cost while meeting specialization needs.

Why this answer

Option D is correct because fine-tuning a smaller base model on a dedicated AI cluster allows the financial firm to tailor the model specifically for contract analysis tasks, reducing computational overhead and cost compared to using a large general-purpose model. OCI Generative AI supports fine-tuning of smaller models like Cohere Command Light on dedicated AI clusters, enabling domain-specific optimization without the expense of running a large model for every inference.

Exam trap

Oracle often tests the misconception that larger models are always better for specialized tasks, but the trap here is that fine-tuning a smaller model on a dedicated AI cluster provides both cost efficiency and domain accuracy, which candidates overlook in favor of familiar large-model options.

How to eliminate wrong answers

Option A is wrong because using a large base model (e.g., Cohere Command) on a serverless endpoint incurs higher per-token costs and lacks the specialization needed for contract analysis, contradicting the requirement to reduce costs with a smaller model. Option B is wrong because deploying a large base model on a dedicated AI cluster increases infrastructure costs and still does not provide the targeted performance of a fine-tuned smaller model for contract-specific tasks. Option C is wrong because using a third-party LLM introduces data sovereignty, latency, and integration concerns, and does not leverage OCI's native fine-tuning capabilities for cost-effective specialization.

141
MCQeasy

An application uses RAG to answer customer queries, but answers are often incomplete because the retrieved chunks do not contain full context. Which adjustment should the developer make?

A.Use a different embedding model
B.Increase chunk overlap
C.Increase the number of retrieved chunks
D.Decrease chunk size
AnswerC

Retrieving more chunks provides more context to the generation model.

Why this answer

Increasing the number of retrieved chunks gives the model more contextual information, leading to more complete answers.

142
MCQhard

Refer to the exhibit. A developer creates an index mapping for a vector search application. When performing a k-NN search query, the query fails with a parsing error. What is the most likely cause?

A.The query is missing the 'knn' query clause.
B.The dimension value 768 does not match the embedding model's output dimension.
C.The knn setting is not enabled at the index level.
D.The space_type should be 'l2' instead of 'cosinesimil'.
E.The engine should be 'nmslib' instead of 'faiss'.
AnswerB

A mismatch between mapping dimension and model dimension causes a parsing error during search.

Why this answer

The dimension in the mapping (768) must exactly match the output dimension of the embedding model used to generate the vectors. A mismatch causes parsing errors. The knn setting is correctly enabled (C is wrong). cosinesimil is a valid space_type (D is wrong). faiss is a valid engine (E is wrong).

A missing query clause would not cause a parsing error on the index (A is wrong).

143
Multi-Selectmedium

A company is designing a generative AI application using OCI Generative AI. Which two factors should be considered when selecting the appropriate model? (Choose two.)

Select 2 answers
A.Model's training data cutoff date
B.Availability in all OCI regions
C.Supported languages
D.Maximum token output limit
E.Built-in safety filters
AnswersA, D

The cutoff date indicates how recent the model's knowledge is.

Why this answer

The model's training data cutoff date determines the temporal scope of the model's knowledge. For generative AI applications requiring up-to-date information or compliance with data recency requirements, selecting a model with a cutoff date that aligns with the use case is critical. OCI Generative AI models have specific cutoff dates (e.g., June 2023 for certain models), and using a model with an older cutoff may produce outdated or factually incorrect responses.

Exam trap

The trap here is that candidates often confuse service-level features (like safety filters or regional availability) with model-specific selection criteria, leading them to pick options that are technically true but irrelevant to the core decision of choosing the right model for a generative AI application.

144
MCQmedium

A retail company uses OCI Generative AI Agents to power a product recommendation chatbot on their e-commerce website. The chatbot is integrated with a knowledge base containing product descriptions, customer reviews, and inventory data. Recently, the chatbot has started recommending out-of-stock products frequently, leading to customer frustration. The development team verified that the knowledge base is updated in real-time with inventory data. The chatbot's configuration uses a chunking strategy with a chunk size of 500 tokens and an overlap of 50 tokens. The team suspects the issue is related to how the agent retrieves information. They have access to OCI Logging and Monitoring. Which course of action should the team take first?

A.Decrease the chunk size to 250 tokens to make chunks more specific.
B.Reduce the temperature parameter of the model to 0.2 to reduce hallucinations.
C.Enable auto-scaling on the AI cluster to improve response speed.
D.Increase the chunk overlap from 50 to 150 tokens to ensure inventory status is captured in multiple chunks.
AnswerD

Greater overlap ensures that inventory updates are not missed, improving the relevance of retrieved context.

Why this answer

The core issue is that the chatbot retrieves chunks that contain product descriptions but may miss the inventory status because the chunking strategy does not reliably include both pieces of information together. Increasing the chunk overlap from 50 to 150 tokens ensures that inventory data, which may be at the boundary of a chunk, is captured in multiple overlapping chunks, thereby increasing the likelihood that the retrieval step returns a chunk containing both the product and its current stock level. This directly addresses the retrieval gap without altering model behavior or infrastructure.
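
The effect of overlap can be seen in a small sketch. This is a generic sliding-window chunker over a token list, not the chunking code used by OCI Generative AI Agents; the function name and the tiny sizes are assumptions chosen so the output is easy to read.

```python
def chunk_tokens(tokens, chunk_size, overlap):
    """Split tokens into windows of chunk_size that share `overlap` tokens with the previous window."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks

tokens = [f"t{i}" for i in range(10)]
small_overlap = chunk_tokens(tokens, chunk_size=4, overlap=1)
large_overlap = chunk_tokens(tokens, chunk_size=4, overlap=2)
print(small_overlap)
print(large_overlap)
```

With the larger overlap, each token near a boundary appears in two chunks, at the cost of producing more chunks to embed and store.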

Exam trap

Oracle often tests the misconception that retrieval issues are always solved by adjusting model parameters (like temperature) or infrastructure scaling, when the real fix lies in tuning the chunking strategy to ensure critical metadata is not lost at chunk boundaries.

How to eliminate wrong answers

Option A is wrong because decreasing chunk size to 250 tokens would make chunks more specific but would also increase the number of chunks and the risk that inventory status is split across even more chunks, potentially worsening the problem. Option B is wrong because reducing the temperature parameter reduces randomness in generation but does not affect how the agent retrieves information from the knowledge base; the issue is retrieval, not hallucination. Option C is wrong because enabling auto-scaling improves response speed and throughput but does not change the content or structure of the chunks being retrieved, so it cannot fix the missing inventory data.

145
MCQmedium

A company wants to use OCI Generative AI to summarize customer support tickets. They need to ensure that the model does not output any sensitive information. Which technique should they implement?

A.Prompt engineering to instruct the model to exclude sensitive information.
B.Use a smaller model that is less likely to memorize data.
C.Enable content filtering on the endpoint.
D.Disable the use of training data in the endpoint configuration.
AnswerA

Carefully crafted prompts can guide the model to avoid leaking sensitive data.

Why this answer

Prompt engineering is the correct technique because it allows the company to explicitly instruct the generative AI model to exclude sensitive information from its outputs. By crafting a system prompt or user prompt with specific directives (e.g., 'Do not include any personally identifiable information, account numbers, or confidential data in your summary'), the model's behavior is directly controlled at inference time. This is a lightweight, flexible approach that does not require changing the model architecture or endpoint configuration, and it is the most direct way to enforce output constraints in OCI Generative AI.

Exam trap

Oracle often tests the misconception that disabling training data or using a smaller model can prevent sensitive output, when in fact prompt engineering is the primary technique for controlling model behavior at inference time in OCI Generative AI.

How to eliminate wrong answers

Option B is wrong because using a smaller model does not guarantee the exclusion of sensitive information; smaller models can still memorize and output sensitive data from their training set, and model size is unrelated to output filtering. Option C is wrong because content filtering on the endpoint typically blocks predefined categories (e.g., hate speech, violence) but is not designed to dynamically detect and remove sensitive business data like customer support ticket details. Option D is wrong because disabling the use of training data in the endpoint configuration (e.g., setting 'trainingDataConsent' to false) only prevents the model from being fine-tuned or retrained on the input data; it does not affect the model's output behavior during inference, so sensitive information can still appear in summaries.

146
MCQmedium

A developer is getting a 401 Unauthorized error when calling the OCI Generative AI inference API. What is the most likely cause?

A.The API endpoint has reached its rate limit
B.The request is missing or has an invalid authentication signature
C.The model does not support the requested parameters
D.The model is not deployed
AnswerB

401 Unauthorized specifically indicates authentication failure.

Why this answer

A 401 Unauthorized error specifically indicates a failure in authentication, not authorization or resource availability. The OCI Generative AI inference API requires every request to include a valid signature based on OCI Signature Version 1, which signs selected request headers with the caller's RSA private key (RSA-SHA256). If the request is missing the Authorization header or the signature is malformed (e.g., incorrect key ID, mismatched signing string, or a date header outside the allowed clock skew), the API rejects it with a 401 response.

Exam trap

Oracle often tests the distinction between HTTP status codes (401 vs 403 vs 429 vs 404) to see if candidates confuse authentication failures with authorization, rate limiting, or resource availability issues.

How to eliminate wrong answers

Option A is wrong because a rate limit exceeded (HTTP 429) would return a 'Too Many Requests' error, not a 401 Unauthorized. Option C is wrong because unsupported parameters typically result in a 400 Bad Request error, not a 401. Option D is wrong because a model that is not deployed would return a 404 Not Found or a 400 error, as the endpoint itself would be unreachable or the model ID invalid, not an authentication failure.
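
The status-code distinctions above can be summarized in a small lookup. This is only a study aid that restates the mapping given in this explanation; the `diagnose` helper is hypothetical and is not part of the OCI SDK.

```python
# Likely causes per HTTP status, as described in the explanation above.
LIKELY_CAUSE = {
    400: "bad request, e.g. unsupported parameters or an invalid model ID",
    401: "authentication failure: missing or invalid request signature",
    403: "authorization failure: authenticated but not permitted",
    404: "resource not found, e.g. the model or endpoint does not exist",
    429: "rate limit exceeded: too many requests",
}

def diagnose(status_code):
    """Return the most likely cause for a status code, or a generic fallback."""
    return LIKELY_CAUSE.get(status_code, "unclassified: check the response body")

print(diagnose(401))
```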

147
Multi-Selecteasy

A company wants to ensure their RAG application complies with data residency requirements. Data must not leave a specific OCI region. Which TWO actions are necessary? (Choose two.)

Select 2 answers
A.Enable cross-region access for the vector store
B.Use a global search interface
C.Deploy the OCI Generative AI service endpoint in the required region
D.Configure the vector store to replicate across regions for high availability
E.Use an embedding model hosted in the required region
AnswersC, E

Keeps LLM inference and any data sent to the endpoint within the region.

Why this answer

Using an embedding model hosted in the required region (E) and deploying the OCI Generative AI service endpoint in that region (C) ensure that all data processing stays within the region.

148
Multi-Select · medium

Which THREE factors should be considered when choosing a base model for OCI Gen AI?

Select 3 answers
A. Availability in the desired region
B. Cost per token
C. Supported languages
D. Model size
E. Training data format
Answers: A, B, D

Model must be supported in the region where you deploy.

Why this answer

Option A is correct because OCI Gen AI models are deployed in specific regions, and availability varies by region due to data residency requirements and infrastructure placement. Before selecting a base model, you must verify that the model is available in your desired OCI region to avoid deployment failures or latency issues.

Exam trap

Oracle often tests the misconception that model size (option D) is the only factor that matters. Model size affects performance and cost and is one of the three listed answers, but it must be weighed alongside regional availability (option A) and cost per token (option B); the trap is that candidates overvalue model size and overlook the other two.

149
MCQ · hard

A developer in the GenAIDevelopers group tries to call the OCI Generative AI inference API but receives an unauthorized error. Which statement best explains the issue?

A. The developer lacks the required permission to 'use generative-ai-inference' to invoke the model.
B. The developer does not have permission to use the generative-ai-model-family.
C. The policy does not allow any operations in the compartment.
D. The compartment name in the policy is incorrect.
Answer: A

They only have 'create', but inference invocation requires 'use'.

Why this answer

Option A is correct. The policy only allows creating inference requests; it does not grant 'inspect' or 'use' on the inference resource. The developer needs the 'use' permission on the inference resource to actually call it.

Option B is wrong because the policy already allows the model family. Option C is wrong because the policy does allow operations in the compartment. Option D is wrong because the compartment name in the policy is correct.

150
MCQ · easy

A data scientist needs to fine-tune a model on OCI Generative AI. Which of the following is a required parameter in the fine-tuning request?

A. hyperparameters
B. model_name
C. dataset_type
D. All of the above
Answer: D

All three (model_name, dataset_type, hyperparameters) are required for a fine-tuning request.

Why this answer

In OCI Generative AI, the fine-tuning request requires all three parameters: hyperparameters (to define training behavior like learning rate and epochs), model_name (to specify the base model being fine-tuned), and dataset_type (to indicate the format of the training data, such as 'TEXT' or 'MULTI_TURN'). Therefore, 'All of the above' is correct because each listed option is a mandatory field in the fine-tuning API call.

Exam trap

Oracle often tests the 'All of the above' pattern when each individual option is factually correct but candidates incorrectly assume only one is required, missing the comprehensive nature of the fine-tuning request.

How to eliminate wrong answers

Options A, B, and C each name a parameter that is required, so none of them is false on its own. Each is incomplete, though: selecting only hyperparameters, only model_name, or only dataset_type leaves out the other two mandatory fields. Option D is the only choice that covers all three.
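
The "all three are mandatory" point can be sketched as a simple completeness check. The field names are taken from the question as written and are not verified here against the actual OCI API; the `missing_fields` helper and the sample values are hypothetical.

```python
# Field names as given in the question (not checked against the real API).
REQUIRED_FIELDS = ("model_name", "dataset_type", "hyperparameters")


def missing_fields(request: dict) -> list:
    """Return the required fields absent from the request, in declared order."""
    return [field for field in REQUIRED_FIELDS if field not in request]


# A request that supplies two of the three fields is still incomplete.
request = {
    "model_name": "example-base-model",  # placeholder value
    "hyperparameters": {"epochs": 3},    # placeholder value
}
print(missing_fields(request))
```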
