Oracle Cloud Infrastructure Generative AI Professional 1Z0-1127 (1Z0-1127) — Questions 175

500 questions total · 7 pages · All types, answers revealed

1
Multi-Selecteasy

Which two metrics would you monitor to ensure a generative AI deployment on OCI is operating efficiently? (Choose two.)

Select 2 answers
A.Number of active users
B.Object Storage bucket size
C.Request throughput (requests per second)
D.Average inference latency
E.Model accuracy on validation set
AnswersC, D

Throughput indicates how many requests the system can handle.

Why this answer

Request throughput (requests per second) is a critical metric for monitoring the operational efficiency of a generative AI deployment on OCI because it directly measures the system's capacity to handle incoming inference requests. If throughput drops below expected levels, it indicates a bottleneck in the compute resources (e.g., GPU utilization) or the serving infrastructure, which can lead to degraded user experience and potential timeouts.

Exam trap

Oracle often tests the distinction between model quality metrics (like accuracy) and operational efficiency metrics (like latency and throughput), so the trap here is that candidates mistakenly select 'Model accuracy on validation set' because they confuse model performance with deployment performance.

2
MCQhard

You are deploying a generative AI solution on OCI for a healthcare client that requires strict data residency (data must remain in the EU) and low-latency inference. The solution uses a fine-tuned LLM model (7B parameters) stored in Object Storage in the Frankfurt region. You have set up an OCI Data Science model deployment endpoint with GPU shape VM.GPU.A10.1, using a single replica. During load testing with 50 concurrent users, you observe high latency (average 8 seconds per request) and occasional 504 gateway timeouts. The model deployment logs show no errors, and the model loads successfully. You have confirmed that the Object Storage bucket is in the same region and that the network latency between the client and the endpoint is minimal (under 5 ms). Which action should you take to reduce latency and eliminate timeouts?

A.Increase the model deployment endpoint timeout setting from 60 seconds to 300 seconds in the OCI console.
B.Upgrade the model deployment shape to VM.GPU.A100.4 and keep a single replica.
C.Increase the number of replicas to 3 and enable autoscaling based on CPU utilization.
D.Move the model deployment to the US East (Ashburn) region to leverage lower-cost GPU capacity and reduce latency.
AnswerC

Option C is correct because increasing the number of replicas to handle concurrent requests reduces queuing and improves throughput, while also enabling load balancing to avoid timeouts.

Why this answer

Option C is correct because the high latency and 504 timeouts with 50 concurrent users indicate that a single GPU replica is overwhelmed by the request queue. Increasing replicas to 3 distributes the load across multiple model instances behind the endpoint, while enabling autoscaling based on CPU utilization ensures dynamic scaling to handle traffic spikes. This directly reduces per-request latency and eliminates timeouts without violating data residency requirements.

Exam trap

The trap here is that candidates often confuse increasing timeout (Option A) or upgrading GPU size (Option B) as solutions to concurrency issues, when in fact horizontal scaling via replicas is required to handle multiple simultaneous requests without violating data residency constraints.

How to eliminate wrong answers

Option A is wrong because increasing the timeout from 60 to 300 seconds only masks the symptom of slow inference; it does not address the root cause of insufficient compute capacity, and 504 timeouts will still occur if requests queue up beyond the timeout. Option B is wrong because upgrading to a larger GPU shape (A100.4) with a single replica speeds up each individual request but does not resolve the concurrency bottleneck; with 50 concurrent users, a single replica still serializes requests, leading to high latency and potential timeouts. Option D is wrong because moving the deployment to US East violates the strict data residency requirement that data must remain in the EU, and it does not solve the concurrency issue; latency from cross-region data transfer would also increase.
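The queuing argument can be made concrete with a rough back-of-the-envelope sketch. It assumes that all requests arrive at once, that each replica processes one request at a time, and that a request takes a hypothetical 0.3 seconds to serve; none of these numbers come from the question, and real serving stacks may batch requests differently.

```python
def average_latency(concurrent_requests: int, replicas: int, service_time_s: float) -> float:
    """Average completion time when requests are spread round-robin over
    replicas that each serve one request at a time (simplifying assumption)."""
    total = 0.0
    for i in range(concurrent_requests):
        position_in_queue = i // replicas + 1  # 1 means served immediately
        total += position_in_queue * service_time_s
    return total / concurrent_requests


# Hypothetical service time of 0.3 s per request, 50 simultaneous requests.
single = average_latency(50, replicas=1, service_time_s=0.3)
triple = average_latency(50, replicas=3, service_time_s=0.3)
print(f"1 replica: {single:.2f}s average, 3 replicas: {triple:.2f}s average")
```

Under these assumptions the average wait shrinks roughly in proportion to the replica count, which is the effect the explanation relies on.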

3
MCQeasy

Refer to the exhibit. A user deployed a custom model via OCI Data Science and registered it in the Model Catalog. They use the correct OCID but get this error. What is the most likely issue?

A.The model is not fine-tuned
B.The model is not deployed to an endpoint
C.The compartment OCID is missing
D.The model is in a different region
AnswerB

Deployment is required to serve inference requests; registration is not sufficient.

Why this answer

A model must be deployed to an endpoint to be accessible via the inference API. Registration in the Model Catalog alone does not create an endpoint. Option B is correct.

4
MCQmedium

A developer is using the OCI Generative AI service API and receives a '400 Bad Request' with error 'Model not found'. What is the most likely cause?

A.The model ID is misspelled or does not exist.
B.The API request lacks authentication.
C.The input exceeds the maximum token limit.
D.The endpoint region is incorrect.
AnswerA

The error directly states 'Model not found'.

Why this answer

The error indicates the model ID specified in the request does not exist or is misspelled.

5
MCQhard

A company wants to deploy a custom fine-tuned model for retrieval-augmented generation (RAG) using dedicated AI cluster. They need to ensure the model can handle concurrent requests from multiple applications with consistent latency. What should they configure?

A.Set a high temperature to keep responses concise.
B.Increase the number of replicas in the dedicated cluster.
C.Enable auto-scaling on the cluster.
D.Use the managed serving endpoint instead.
AnswerB

More replicas allow handling more requests concurrently without degradation.

Why this answer

Option B is correct because increasing the number of replicas in the dedicated cluster distributes the load across multiple model copies, improving concurrency and latency stability.

6
Multi-Selecteasy

Which TWO are valid methods to monitor the performance of a generative AI model deployed on OCI Data Science?

Select 2 answers
A.Use OCI Notifications to receive alerts on model drift
B.Use OCI Monitoring service to track custom metrics like latency and throughput
C.Use OCI Logging service to collect inference logs
D.Use OCI Events service to trigger retraining on low accuracy
E.Use OCI Audit service to review API call logs
AnswersB, C

Allows pushing custom metrics from the inference script.

Why this answer

Option B is correct because OCI Monitoring service allows you to define and track custom metrics such as inference latency (e.g., p50/p99 response times) and throughput (requests per second) for your generative AI model deployed on OCI Data Science. This enables real-time performance monitoring and alerting based on thresholds you set, which is essential for production AI workloads.

Exam trap

Oracle often tests the distinction between monitoring (OCI Monitoring), logging (OCI Logging), and notification/event services, so candidates mistakenly select OCI Notifications or OCI Events as monitoring tools when they are actually reactive or alerting services.
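As an illustration of the latency and throughput figures mentioned above, the following minimal sketch computes p50, p99, and requests per second from a list of observed latencies. The sample values and the function names are hypothetical, and the step that would publish the results as custom metrics is omitted because it depends on the SDK and tenancy setup.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(latencies_ms: list[float], window_s: float) -> dict[str, float]:
    """Summarize one observation window of inference requests."""
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p99_ms": percentile(latencies_ms, 99),
        "throughput_rps": len(latencies_ms) / window_s,
    }


# Hypothetical latencies (ms) collected during a 2-second window.
stats = summarize([120, 95, 110, 480, 105, 98, 130, 101], window_s=2.0)
print(stats)
```

The single slow request barely moves p50 but dominates p99, which is why both percentiles are usually tracked together.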

7
Multi-Selectmedium

Which TWO are best practices for chunking documents in a RAG pipeline? (Choose two.)

Select 2 answers
A.Use fixed-size chunks regardless of content boundaries.
B.Always use small chunks (e.g., 100 characters).
C.Chunk entire documents as a single chunk.
D.Use semantic chunking to preserve meaning.
E.Overlap chunks to avoid missing context at boundaries.
AnswersD, E

Semantic chunking maintains context within chunks.

Why this answer

Semantic chunking ensures each chunk contains a coherent idea, and chunk overlap prevents loss of context at boundaries, improving retrieval quality.
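A minimal sketch of both practices follows. It approximates semantic chunking by keeping whole sentences together (an assumption; embedding-based or structure-aware splitters are also common) and carries the last sentence of each chunk into the next one as overlap. The function name and the sample text are illustrative only.

```python
import re


def chunk_sentences(text: str, max_chars: int = 200, overlap_sentences: int = 1) -> list[str]:
    """Group whole sentences into chunks of roughly max_chars (a soft limit),
    repeating the last overlap_sentences sentences at the start of the next chunk."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            # Carry the tail of the finished chunk forward as overlap.
            current = current[-overlap_sentences:] if overlap_sentences else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


faq = (
    "Reset your password from the login page. A reset link is emailed to you. "
    "The link expires after one hour. Contact support if the email does not arrive."
)
chunks = chunk_sentences(faq, max_chars=80)
for chunk in chunks:
    print(repr(chunk))
```

Because no sentence is split and each boundary sentence appears in two chunks, a query about the reset link can match either neighbouring chunk.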

8
Multi-Selecthard

A company is using dedicated AI cluster for fine-tuning. Which TWO best practices help optimize cost?

Select 2 answers
A.Use the largest replica count.
B.Manually scale down the cluster when not in use.
C.Use the managed serving endpoint instead.
D.Leave the cluster running continuously.
E.Use the smallest possible model for the task.
AnswersB, E

Reduces active compute hours.

Why this answer

Option B is correct because manually scaling down the dedicated AI cluster when not in use directly reduces compute costs by stopping idle GPU/CPU resources. In OCI Generative AI, dedicated AI clusters incur charges for provisioned capacity, so scaling down during inactivity avoids paying for unused infrastructure.

Exam trap

Oracle often tests the misconception that larger replica counts or continuous running improve performance, when in fact they only increase cost without accelerating fine-tuning convergence.

9
MCQeasy

A company wants to use OCI Generative AI service to generate email summaries for customer support. They need to ensure low latency and data residency in Frankfurt. What should they use?

A.Use the playground in US region.
B.Use OCI AI Quick Actions.
C.Use the managed serving endpoint in Frankfurt region.
D.Create a dedicated AI cluster in Frankfurt.
AnswerC

The managed serving endpoint is available in Frankfurt, providing regional low-latency inference.

Why this answer

Option C is correct because the managed serving endpoint is available in the Frankfurt region, providing low latency without the need for a dedicated cluster. A dedicated AI cluster is not necessary for latency and data residency if the managed endpoint is in the same region. The playground is for testing, and Quick Actions are for pre-built models.

10
Multi-Selectmedium

Which three factors most significantly affect the quality of an LLM's output? (Select THREE)

Select 3 answers
A.Model's context window size
B.Clarity of the prompt
C.Number of GPUs used during inference
D.Temperature setting
E.Quality of training data
AnswersB, D, E

Correct: Clear prompts yield more accurate responses.

Why this answer

Option B is correct because the clarity of the prompt directly determines how well the LLM interprets the user's intent. A well-structured, unambiguous prompt reduces ambiguity and guides the model toward generating relevant and coherent responses, while a vague or poorly worded prompt often leads to off-target or nonsensical output.

Exam trap

Oracle often tests the misconception that hardware resources like GPU count directly improve output quality, whereas in reality they only affect performance metrics like latency and throughput, not the semantic quality of the generated text.

11
Multi-Selecteasy

Which TWO are advantages of using retrieval-augmented generation (RAG) over fine-tuning for incorporating new knowledge?

Select 2 answers
A.Better at capturing domain-specific writing style
B.Enables the model to access up-to-date information without retraining
C.Eliminates the need for a vector database
D.Reduces token usage and latency compared to fine-tuning
E.Cost-effective for large corpora that change frequently
AnswersB, E

RAG retrieves fresh data from external sources.

Why this answer

Option B is correct because RAG retrieves relevant, up-to-date information from an external knowledge base at inference time, allowing the model to answer questions about recent events or proprietary data without requiring any retraining. This is a key advantage over fine-tuning, which would need a new training cycle to incorporate the same new knowledge.

Exam trap

Oracle often tests the misconception that RAG is always faster or cheaper than fine-tuning, when in reality RAG introduces retrieval latency and higher token usage, making it less suitable for low-latency or high-throughput scenarios.

12
MCQeasy

A company deployed OCI Generative AI for a customer service chatbot. They are using the Cohere command model. The chatbot is generating responses that are too brief and often cut off mid-sentence. They have limited budget. What should they do?

A.Increase max tokens to 1024.
B.Decrease the temperature to 0.2.
C.Increase the temperature to 0.9.
D.Use a different base model like Llama.
AnswerA

Increasing max tokens gives the model more room to complete its response.

Why this answer

Option A is correct. Increasing the max tokens allows the model to generate longer, complete responses. Option C is wrong because increasing temperature would make responses more random, not longer.

Option B is wrong because decreasing temperature reduces variability but does not increase length. Option D is wrong because switching models is costly and may not address the length issue directly.

13
Multi-Selectmedium

A company is deploying a large generative AI model on OCI using GPU compute instances. They want to optimize inference cost while maintaining acceptable latency. Which TWO strategies should they implement?

Select 2 answers
A.Enable provisioned concurrency on all models.
B.Select the smallest GPU instance type that meets latency requirements.
C.Increase the max-tokens parameter to generate longer responses.
D.Deploy the model on multiple large GPU instances to handle peak load.
E.Use an inference endpoint with auto-scaling to match demand.
AnswersB, E

Choosing appropriate instance size avoids paying for unused capacity.

Why this answer

Option B is correct because selecting the smallest GPU instance type that meets latency requirements directly reduces compute cost per inference without sacrificing user experience. This aligns with OCI's pay-as-you-go GPU pricing, where larger instances incur higher hourly costs. The key is to right-size the GPU based on model memory footprint and inference throughput, not to over-provision.

Exam trap

Oracle often tests the misconception that 'bigger GPU instances always improve performance' or that 'provisioned concurrency applies to all OCI services,' when in fact it is specific to serverless compute and irrelevant to GPU inference endpoints.

14
MCQeasy

A developer is using OCI Generative AI to build a question-answering system over a large corpus of technical manuals. The developer uses the Cohere Embed model to generate embeddings and stores them in an OCI OpenSearch cluster. Queries are slow and the team needs to reduce latency. Which approach is BEST for improving search speed while maintaining acceptable accuracy?

A.Increase the embedding dimension for better representation.
B.Reduce the k value in the nearest neighbor search.
C.Use exact nearest neighbor search instead of approximate.
D.Increase the index refresh interval to reduce write overhead.
AnswerB

Fewer neighbors means less distance computation and faster retrieval.

Why this answer

Reducing the k value in the nearest neighbor search directly decreases the number of vectors that must be compared during query time, which lowers latency. In approximate nearest neighbor (ANN) search, a smaller k means fewer candidates are evaluated, speeding up retrieval while still maintaining acceptable accuracy if the original k was unnecessarily high. This is the most effective tuning knob for latency in vector search systems like OCI OpenSearch with Cohere embeddings.

Exam trap

The trap here is that candidates often confuse reducing k with reducing accuracy, but in practice, many RAG systems use a k value larger than necessary, and reducing it to a reasonable minimum (e.g., from 20 to 5) can dramatically improve speed without noticeable quality loss.

How to eliminate wrong answers

Option A is wrong because increasing the embedding dimension increases the computational cost of distance calculations and memory usage, which would worsen latency, not improve it. Option C is wrong because exact nearest neighbor search (k-NN) requires scanning all vectors, which is O(n) and significantly slower than approximate methods, especially on large corpora. Option D is wrong because increasing the index refresh interval reduces write overhead but does not affect query latency; it only delays the visibility of new documents.
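To see why exact search scales with corpus size, here is a small brute-force sketch in plain Python. It is not how OpenSearch implements k-NN; it only shows that an exact search scores the query against every stored vector and that k controls how many results are kept. The vectors and document IDs are made up for illustration.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_top_k(query: list[float], vectors: dict[str, list[float]], k: int) -> list[str]:
    """Exact nearest neighbors: every vector is scored, so cost grows with corpus size."""
    scored = ((cosine(query, vec), doc_id) for doc_id, vec in vectors.items())
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


# Hypothetical 2-dimensional embeddings.
index = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
top_two = exact_top_k([1.0, 0.0], index, k=2)
print(top_two)
```

Approximate indexes avoid the full scan by visiting only a subset of candidates, trading a small amount of recall for speed.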

15
MCQmedium

A team uses OCI OpenSearch as a vector database for RAG. Some queries return no results despite relevant documents being indexed. What is a likely cause?

A.The vector index is not refreshed after adding documents
B.The number of candidates (k) is set too low
C.The query text is too long for the embedding model
D.The embedding model is incompatible with the document language
AnswerB

A very low k value may result in no matches being returned for queries with distant nearest neighbors.

Why this answer

If k (the number of nearest neighbors to return) is set too low, the search may not find any documents within the top k, returning no results.

16
MCQmedium

Refer to the exhibit. A user receives this error when calling the OCI Gen AI inference endpoint. What is the most likely cause?

A.The region name is misspelled
B.The model is not deployed in the region
C.The API key is expired
D.The model name is incorrect
AnswerB

The error indicates the model is not supported in that region.

Why this answer

The error indicates that the model is not available in the specified region. OCI Gen AI models are deployed regionally, and each region supports only a specific subset of models. If the user calls an endpoint in a region where the requested model has not been deployed, the service returns an error because the model's inference endpoint does not exist in that region's routing table.

Exam trap

Oracle often tests the distinction between 'model not found' (invalid model name) and 'model not available in region' (valid model but not deployed there), leading candidates to incorrectly select the model name option when the error message explicitly mentions regional unavailability.

How to eliminate wrong answers

Option A is wrong because a misspelled region name would typically result in a DNS resolution failure or a 404 error, not a model-not-found error. Option C is wrong because an expired API key would cause an authentication failure (HTTP 401 Unauthorized), not a model availability error. Option D is wrong because an incorrect model name would produce a 'model not found' or 'invalid model' error, but the error message in the exhibit specifically states the model is not available in the region, not that the model name is invalid.

17
MCQmedium

A team wants to use OCI Generative AI to generate synthetic data for training a model. They are concerned about the cost of API calls. Which pricing model would be most cost-effective for high-volume batch processing?

A.OCI Universal Credits with per-request charges
B.Monthly subscription with limited requests
C.Pay-as-you-go per request
D.Reserved capacity with a fixed monthly fee
AnswerD

Reserved capacity offers predictable pricing and lower per-request cost for high volume.

Why this answer

Option D is correct because reserved capacity with a fixed monthly fee provides the lowest per-request cost for high-volume batch processing. OCI Generative AI offers dedicated capacity pricing, which is ideal for predictable, large-scale workloads where you commit to a certain throughput, avoiding per-request charges that would accumulate significantly with high volume.

Exam trap

Oracle often tests the misconception that pay-as-you-go is always the cheapest for any workload, but the trap here is that high-volume batch processing benefits from reserved capacity's flat fee, which lowers per-request costs significantly compared to per-request pricing models.

How to eliminate wrong answers

Option A is wrong because OCI Universal Credits with per-request charges would be expensive for high-volume batch processing, as each API call incurs a separate cost, leading to unpredictable and high expenses. Option B is wrong because a monthly subscription with limited requests would cap the number of requests, making it unsuitable for high-volume batch processing where you need to generate large amounts of synthetic data without hitting a limit. Option C is wrong because pay-as-you-go per request is the most expensive model for high-volume workloads, as costs scale linearly with each API call, whereas reserved capacity offers a flat fee for better cost predictability.
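The trade-off between a flat fee and per-request charges comes down to a break-even volume. The sketch below works through that arithmetic with purely hypothetical prices (a 5,000 monthly fee and 0.002 per request); these are not OCI list prices, and the function names are illustrative.

```python
import math


def break_even_requests(fixed_monthly_fee: float, price_per_request: float) -> float:
    """Monthly request volume at which both pricing models cost the same."""
    if price_per_request <= 0:
        raise ValueError("price_per_request must be positive")
    return fixed_monthly_fee / price_per_request


def cheaper_option(monthly_requests: int, fixed_monthly_fee: float, price_per_request: float) -> str:
    """Return which pricing model costs less at the given volume."""
    per_request_total = monthly_requests * price_per_request
    return "reserved" if fixed_monthly_fee < per_request_total else "pay-per-request"


# Hypothetical prices, chosen only to illustrate the calculation.
threshold = break_even_requests(5000.0, 0.002)
print(f"break-even at about {threshold:,.0f} requests per month")
print(cheaper_option(10_000_000, 5000.0, 0.002))
print(cheaper_option(100_000, 5000.0, 0.002))
```

Above the break-even volume the flat fee wins; below it, per-request pricing is cheaper, which is why the question stresses high-volume batch processing.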

18
MCQmedium

Which OCI service provides a managed vector database capability that can be used as a knowledge base in a RAG architecture?

A.OCI MySQL HeatWave
B.OCI Database (Autonomous Database)
C.OCI Search with OpenSearch
D.OCI Object Storage
AnswerC

OpenSearch includes the k-NN plugin for vector search, managed by OCI.

Why this answer

OCI Search with OpenSearch supports vector indexing and search natively, making it a suitable managed solution for RAG vector storage.

19
MCQhard

An organization deploys a fine-tuned model for legal document analysis using OCI Generative AI Service. They need to ensure that only authorized users in the 'LegalTeam' group can access the model endpoint. Which policy statement should be used?

A.Allow group LegalTeam to use generative-ai-model in compartment ABC
B.Allow group LegalTeam to manage generative-ai-family in compartment ABC
C.Allow group LegalTeam to read generative-ai-model in compartment ABC
D.Allow group LegalTeam to inspect generative-ai-model in compartment ABC
AnswerA

Use permission allows invoking the model for inference.

Why this answer

Option A is correct because the 'use' verb on the 'generative-ai-model' resource type grants the LegalTeam group permission to invoke the model endpoint for inference, which is the minimum privilege required for accessing a deployed fine-tuned model in OCI Generative AI Service. The 'use' permission specifically allows calling the model for text generation or analysis without granting broader management or read capabilities.

Exam trap

Oracle often tests the distinction between 'use' and 'read' on resource types that support inference endpoints, where candidates mistakenly assume 'read' is sufficient for accessing the model's functionality, but only 'use' grants the actual invocation permission required for inference operations.

How to eliminate wrong answers

Option B is wrong because 'manage' on 'generative-ai-family' grants full administrative control over all Generative AI resources (including creating, updating, deleting models and endpoints), which exceeds the requirement of only accessing the model endpoint and violates the principle of least privilege. Option C is wrong because 'read' on 'generative-ai-model' allows viewing model metadata and configuration but does not include the permission to invoke the model endpoint for inference, which requires the 'use' verb. Option D is wrong because 'inspect' on 'generative-ai-model' only permits listing and viewing basic resource information (like tags and identifiers) and provides no ability to call the model endpoint for legal document analysis.

20
MCQeasy

A startup needs to deploy an LLM for a simple FAQ chatbot on OCI with low latency. Which model choice is most appropriate?

A.Use a medium-sized model with high precision.
B.Use an ensemble of models.
C.Use the largest available model for best quality.
D.Use a smaller, task-specific fine-tuned model.
AnswerD

Correct: Small models are fast and adequate for simple tasks.

Why this answer

Option D is correct because a smaller fine-tuned model offers faster inference and sufficient accuracy for simple FAQs. Option C (the largest model) is overkill and slow, Option A (a medium-sized model) may still be larger than needed, and Option B (an ensemble) adds unnecessary complexity.

21
MCQeasy

A data scientist wants to quickly test a prompt with different parameters like temperature and max tokens without writing code. Which OCI GenAI feature should they use?

A.OCI CLI.
B.OCI Generative AI Playground.
C.OCI SDK.
D.OCI Data Science Notebooks.
AnswerB

Playground allows visual prompt testing with adjustable parameters.

Why this answer

The OCI Generative AI Playground is a web-based, no-code interface that allows data scientists to interactively test prompts and adjust parameters like temperature and max tokens without writing any code. This directly matches the user's requirement for quick, code-free experimentation.

Exam trap

The trap here is that candidates may confuse the OCI CLI or SDK as 'quick' tools, but the question explicitly requires 'without writing code,' which only the Playground satisfies.

How to eliminate wrong answers

Option A is wrong because the OCI CLI is a command-line tool that requires writing and executing commands, not a no-code interface for interactive prompt testing. Option C is wrong because the OCI SDK is a software development kit used for programmatic access via code in languages like Python or Java, which contradicts the 'without writing code' requirement. Option D is wrong because OCI Data Science Notebooks are Jupyter-based environments that require writing Python code to invoke the Generative AI service, not a no-code playground.

22
MCQmedium

An organization is deploying a large language model on OCI using a dedicated AI cluster. They need to minimize inference latency. Which configuration step is most critical?

A.Set up a load balancer across multiple regions
B.Configure the cluster to use high-bandwidth RDMA networking
C.Use a single VM shape to reduce network hops
D.Disable model parallelism to simplify setup
AnswerB

Correct: RDMA enables ultra-low-latency communication between nodes, essential for performance.

Why this answer

RDMA (Remote Direct Memory Access) bypasses the CPU and kernel to transfer data directly between GPU memories, drastically reducing latency and CPU overhead. In a dedicated AI cluster on OCI, high-bandwidth RDMA networking (e.g., using RoCE v2 or InfiniBand) is the most critical step to minimize inference latency because model parallelism and tensor parallelism across nodes depend on fast, low-latency interconnects. Without RDMA, even with optimized model parallelism, the network becomes the bottleneck, increasing per-token latency.

Exam trap

Oracle often tests the misconception that load balancing or simplifying the model architecture (e.g., disabling parallelism) reduces latency, when in fact the critical bottleneck for distributed inference is inter-node communication, which RDMA directly addresses.

How to eliminate wrong answers

Option A is wrong because a multi-region load balancer adds cross-region network latency and is designed for high availability and geographic distribution, not for minimizing inference latency within a single cluster. Option C is wrong because using a single VM shape does not reduce network hops; inference on large models requires multiple GPUs across nodes, and a single VM cannot host the full model, so network hops are inevitable. Option D is wrong because disabling model parallelism would force the entire model onto a single GPU, which is impossible for large models and would actually increase latency due to memory swapping or inability to load the model, not simplify setup.

23
MCQmedium

A developer notices that the RAG system returns irrelevant chunks when the user query contains typos or abbreviations. Which technique would BEST improve retrieval robustness for such queries?

A.Decrease the chunk size to focus on smaller units.
B.Increase the number of retrieved chunks to cover more variations.
C.Use a spell-checker on the retrieved chunks.
D.Implement query rewriting or expansion using a language model before embedding.
AnswerD

Rewriting corrects typos and expands abbreviations, improving embedding quality.

Why this answer

Option D is correct because query rewriting or expansion using a language model (LLM) directly addresses typos and abbreviations by generating a corrected or enriched query before embedding. This improves the semantic alignment between the user's intent and the vector search, ensuring that even noisy input retrieves relevant chunks. Techniques like spelling correction or synonym expansion at query time are far more effective than post-retrieval fixes or parameter tuning.

Exam trap

Oracle often tests the misconception that retrieval robustness can be improved by tuning chunk size or retrieval count, when the real bottleneck is the quality of the query embedding itself.

How to eliminate wrong answers

Option A is wrong because decreasing chunk size does not fix typos or abbreviations; it only changes the granularity of retrieval units, potentially missing context or increasing noise. Option B is wrong because increasing the number of retrieved chunks may include more irrelevant results without correcting the query's semantic mismatch caused by typos or abbreviations. Option C is wrong because applying a spell-checker on retrieved chunks is a post-retrieval fix that cannot recover relevance lost during embedding of a malformed query; the damage is already done at the retrieval stage.
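The idea of fixing the query before it is embedded can be sketched without a language model. The example below swaps in a much simpler stand-in: a hand-written abbreviation table plus fuzzy matching against a small vocabulary using the standard library's difflib. The table, vocabulary, and function name are hypothetical; a production system would call an LLM at this step, as the explanation describes.

```python
import difflib

# Hypothetical domain data; in practice an LLM would do the rewriting.
ABBREVIATIONS = {"pwd": "password", "acct": "account"}
VOCABULARY = ["reset", "password", "account", "billing", "invoice", "login"]


def rewrite_query(query: str) -> str:
    """Expand known abbreviations and snap near-miss tokens to vocabulary words
    before the query is sent to the embedding model."""
    rewritten = []
    for token in query.lower().split():
        if token in ABBREVIATIONS:
            rewritten.append(ABBREVIATIONS[token])
            continue
        # Only accept very close matches so ordinary words are left alone.
        match = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.8)
        rewritten.append(match[0] if match else token)
    return " ".join(rewritten)


cleaned = rewrite_query("resett pwd for my acct")
print(cleaned)
```

The cleaned query, rather than the raw user input, is what would be embedded and sent to the vector search.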

24
MCQmedium

A data scientist is using OCI Generative AI Service to generate product descriptions. They notice that the output often repeats phrases. Which parameter adjustment would MOST directly address this issue?

A.Increase the temperature
B.Increase the max tokens
C.Increase the frequency penalty
D.Decrease the top-p value
AnswerC

Frequency penalty penalizes tokens that have already appeared, reducing repetition.

Why this answer

Option C is correct because the frequency penalty directly reduces the likelihood of the model repeating the same phrases by penalizing tokens that have already appeared in the generated text. In OCI Generative AI Service, this parameter subtracts a fixed value from the log-probability of each token each time it is generated, making repeated tokens less likely to be chosen again. This is the most direct mechanism to address repetitive output.
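The mechanism described above can be sketched in a few lines. This is a simplified illustration that assumes the penalty is subtracted once per prior occurrence of a token; the exact formula used by the service is not given here, and the token scores are made up.

```python
from collections import Counter


def apply_frequency_penalty(
    logits: dict[str, float], generated: list[str], penalty: float
) -> dict[str, float]:
    """Lower each token's score by penalty times the number of times
    it has already been generated."""
    counts = Counter(generated)
    return {token: score - penalty * counts[token] for token, score in logits.items()}


# Hypothetical next-token scores after the model has already written
# "great" twice and "durable" once.
scores = {"great": 2.0, "durable": 1.5, "light": 1.0}
adjusted = apply_frequency_penalty(scores, ["great", "great", "durable"], penalty=0.6)
print(adjusted)
print(max(adjusted, key=adjusted.get))
```

Without the penalty "great" would be chosen again; with it, the unused token "light" becomes the most likely next word.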

Exam trap

Oracle often tests the distinction between frequency penalty and temperature, where candidates mistakenly think increasing randomness (temperature) will reduce repetition, but temperature actually increases variability without targeting repetition directly.

How to eliminate wrong answers

Option A is wrong because increasing temperature adds randomness to the token selection process by scaling the logits before applying softmax, which can lead to more diverse but also more chaotic output, not specifically reducing repetition. Option B is wrong because increasing max tokens only extends the maximum length of the generated text, which may actually allow more repetition to occur rather than preventing it. Option D is wrong because decreasing top-p (nucleus sampling) restricts the sampling pool to the smallest set of tokens whose cumulative probability exceeds the threshold, which can reduce diversity and potentially increase repetition by focusing on high-probability tokens.
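For comparison, temperature scaling and top-p filtering can be sketched as follows. This is a generic illustration of the two sampling controls, not the service's implementation, and the logits are hypothetical.

```python
import math


def softmax_with_temperature(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Divide logits by the temperature, then normalize into probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {token: score / temperature for token, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {token: math.exp(score - peak) for token, score in scaled.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}


def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the most likely tokens until their cumulative probability
    reaches top_p, then renormalize the survivors."""
    kept: dict[str, float] = {}
    cumulative = 0.0
    for token, prob in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[token] = prob
        cumulative += prob
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {token: prob / total for token, prob in kept.items()}


logits = {"a": 2.0, "b": 1.0, "c": 0.0}
default_probs = softmax_with_temperature(logits, temperature=1.0)
flatter_probs = softmax_with_temperature(logits, temperature=2.0)
print(default_probs, flatter_probs)
print(top_p_filter(default_probs, top_p=0.9))
```

A higher temperature flattens the distribution and a lower top-p narrows the candidate pool, but neither one looks at which tokens have already been generated.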

25
Multi-Selecthard

A team is designing a RAG system for a multilingual knowledge base. Which TWO strategies are appropriate? (Choose two.)

Select 2 answers
A.Store separate vector indices per language
B.Disable vector search for non-English queries
C.Translate all documents to English before indexing
D.Use a different embedding model per language
E.Use a single embedding model trained for multilingual text
AnswersA, E

Separate indices allow language-specific preprocessing and retrieval optimizations.

Why this answer

Using a multilingual embedding model (E) handles multiple languages in a single pipeline, and storing separate vector indices per language (A) allows optimized retrieval for each language.

26
Multi-Selectmedium

Which TWO of the following are valid ways to reduce latency when using OCI Generative AI Service?

Select 2 answers
A.Use a dedicated AI cluster
B.Reduce the max tokens parameter
C.Deploy the model in a different region
D.Use a larger model
E.Batch multiple requests
AnswersA, B

Dedicated cluster provides consistent performance and lower latency.

Why this answer

A dedicated AI cluster provides isolated compute resources (GPU nodes) for inference, eliminating resource contention from other tenants or workloads. This ensures consistent low-latency responses because the model is always warm and available without queueing delays, which is critical for real-time applications.

Exam trap

Oracle often tests the misconception that deploying in a different region or using a larger model improves performance, when in fact these actions increase latency due to network distance and computational overhead.

27
MCQeasy

A startup is building a customer support chatbot using RAG with OCI Generative AI. They have a large corpus of FAQ documents stored as PDFs in OCI Object Storage. The developer uses OCI Language to embed the text and stores vectors in OCI OpenSearch. During testing, the chatbot often fails to answer questions because relevant FAQ entries are not retrieved. The team suspects the chunking size is too large, causing loss of specific details. After reducing chunk size, retrieval improves slightly but still misses many answers. What should the team do NEXT?

A.Use a sliding window chunking strategy with overlap
B.Increase the number of retrieved chunks (k)
C.Switch to a different embedding model
D.Manually rephrase the queries
AnswerA

Overlap preserves context across chunk boundaries, improving recall.

Why this answer

The problem is likely chunk boundaries cutting off context. Using a sliding window with overlap ensures continuity, so relevant information is not lost at chunk edges. Increasing k adds noise; switching model is expensive; manual rephrasing is not scalable.
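The sliding-window idea can be sketched in a few lines. This is a minimal illustration, assuming character-based chunks; the function name `sliding_window_chunks` and the default sizes are hypothetical and not part of any OCI SDK.

```python
def sliding_window_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks where each chunk repeats the
    last `overlap` characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Because consecutive chunks share `overlap` characters, a sentence that straddles a boundary appears whole in at least one chunk, provided it is no longer than the overlap.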

28
MCQeasy

Refer to the exhibit. Users in the group cannot create a new custom model deployment on a Dedicated AI Cluster. What is the most likely missing permission?

A.Manage ai-document-understanding
B.Manage ai-agents
C.Use of virtual-network-family for cluster networking
D.Manage instance-configurations
AnswerC

Dedicated AI Clusters require VCN networking permissions to provision networking resources.

Why this answer

Creating a custom model deployment on a Dedicated AI Cluster requires the user to have the 'Use of virtual-network-family for cluster networking' permission. This permission allows the user to specify and manage the virtual network (VCN) and subnet that the cluster uses for networking. Without it, the deployment fails because the cluster cannot be attached to the required network resources.

Exam trap

The trap here is that candidates often focus on AI-specific permissions (like ai-document-understanding or ai-agents) and overlook the underlying networking permission required for cluster-based deployments, assuming that cluster creation is purely an AI service operation.

How to eliminate wrong answers

Option A is wrong because 'Manage ai-document-understanding' is a permission for managing document understanding AI services, not for deploying custom models on a Dedicated AI Cluster. Option B is wrong because 'Manage ai-agents' controls permissions for AI agent resources, which are separate from model deployment on clusters. Option D is wrong because 'Manage instance-configurations' is related to compute instance configurations, not to the networking setup required for a Dedicated AI Cluster deployment.

29
Multi-Selecthard

Which THREE factors should be considered when choosing a vector store for a RAG application in OCI?

Select 3 answers
A.The maximum vector dimension supported.
B.Capability of hybrid search (vector + keyword).
C.The CPU architecture of the vector store nodes.
D.Support for multi-tenancy and isolation.
E.Indexing latency for adding new vectors.
AnswersB, D, E

Hybrid search improves retrieval by combining semantic and exact matches, especially for rare terms.

Why this answer

Options B, D, and E are correct. Hybrid search support (B) combines vector and keyword search for better recall. Multi-tenancy (D) is important for enterprise use.

Indexing latency (E) affects how fast new documents can be ingested. Option A is wrong because vector dimension is determined by the embedding model, not the store. Option C is wrong because CPU architecture is irrelevant.

30
MCQhard

A healthcare company is deploying a RAG application using OCI Generative AI and wants to ensure patient data privacy. They cannot send sensitive data to a public embedding endpoint. Which approach should they take to embed documents while maintaining data residency and security?

A.Use the standard OCI Generative AI public endpoint with data encryption in transit.
B.Use an external embedding service that complies with HIPAA in a different cloud region.
C.Hash the document text before sending to the public embedding endpoint.
D.Provision a dedicated AI cluster in their OCI tenancy to host the embedding model.
AnswerD

A dedicated cluster keeps all data within the customer's tenancy, meeting data privacy and residency requirements.

Why this answer

Option D is correct because OCI allows hosting Cohere models on dedicated AI clusters, ensuring data does not leave the customer's tenancy. Option A is wrong because the public endpoint processes data in Oracle's shared infrastructure. Option B is wrong because using a third-party API in a different cloud region violates data residency.

Option C is wrong because hashing before embedding destroys semantic meaning.

31
MCQmedium

A data scientist is using OCI Data Science to build a RAG system for medical literature. They have a large corpus of PDFs. They used the default OCI Generative AI embedding model and chunked each PDF into 512-character segments with 10% overlap. However, queries about specific drug doses often return incorrect information, even though the correct dose is present in the corpus. Upon inspection, they find that the retrieved chunks often contain partial dose information or miss the context units (e.g., mg vs. mcg). What improvement should they prioritize?

A.Implement a secondary verification step using a rule-based pattern matcher.
B.Use a semantic chunking strategy that respects document structure (e.g., paragraphs, sections).
C.Increase the chunk overlap to 50% to ensure more context.
D.Fine-tune the embedding model on medical text.
AnswerB

Preserving natural boundaries ensures that related information stays in one chunk.

Why this answer

Option B is correct because semantic chunking that respects document structure (e.g., paragraphs, sections) will keep dose information together. Option C (more overlap) may help but does not address structural breaks. Option D (fine-tuning) is heavy and may not fix chunk boundaries.

Option A (a rule-based verification step) is a workaround, not a core fix.
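As a rough illustration of structure-aware chunking, the sketch below packs whole paragraphs into chunks instead of cutting at a fixed character offset. It assumes paragraphs are separated by blank lines; the helper name `paragraph_chunks` and the 512-character budget are illustrative choices, not an OCI API.

```python
def paragraph_chunks(text: str, max_chars: int = 512) -> list[str]:
    """Group whole paragraphs into chunks of at most max_chars characters.
    A paragraph is never split, so a dose and its unit stay together."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        # Accept the paragraph if it fits, or if the chunk is still empty
        # (an oversized paragraph is kept whole rather than cut).
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Real documents usually need a proper PDF parser to recover paragraph and section boundaries first; this sketch only shows the packing step.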

32
MCQeasy

A user wants to use OCI Generative AI to generate marketing copy. They want the output to be more creative and varied. Which parameter should they adjust?

A.Set temperature to 0.
B.Increase the temperature parameter.
C.Decrease the temperature parameter.
D.Increase the max_tokens parameter.
AnswerB

Higher temperature increases randomness, leading to more creative and varied text generation.

Why this answer

Increasing the temperature parameter makes the model's output more random and diverse, which is ideal for creative tasks like generating marketing copy. A higher temperature (e.g., 0.7–1.0) increases the probability of sampling less likely tokens, leading to more varied and imaginative text. Setting temperature to 0 would make the output deterministic and repetitive, which is the opposite of what the user wants.

Exam trap

Oracle often tests the misconception that increasing max_tokens or adjusting other parameters like top_p can substitute for temperature when the goal is to increase creativity, but only temperature directly controls randomness and diversity in token selection.

How to eliminate wrong answers

Option A is wrong because setting temperature to 0 forces the model to always choose the most likely token, resulting in deterministic, repetitive, and less creative output. Option C is wrong because decreasing the temperature reduces randomness, making the output more conservative and less varied, which contradicts the goal of creativity. Option D is wrong because increasing max_tokens only extends the maximum length of the generated text; it does not affect the creativity or variability of the output.
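The effect of temperature on token probabilities can be shown with a small, self-contained sketch. It assumes the common formulation in which logits are divided by the temperature before softmax; the function name and the example logits are made up for illustration.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after scaling by 1/temperature."""
    if temperature <= 0:
        # Temperature 0 is usually handled as greedy decoding, not by division.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]
low = softmax_with_temperature(logits, 0.2)   # sharply peaked on the top token
high = softmax_with_temperature(logits, 1.0)  # flatter, less likely tokens gain mass
```

With the lower temperature the top token takes almost all of the probability mass, while the higher temperature leaves more room for the alternatives, which is where the extra variety comes from.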

33
MCQmedium

An organization wants to use OCI Generative AI for real-time document translation. They need high availability across regions. Which deployment option meets this requirement?

A.Single dedicated AI cluster in one region
B.Multiple dedicated AI clusters in different regions with a load balancer
C.Single serverless endpoint
D.Multiple serverless endpoints in different regions
AnswerB

Multi-region with load balancing ensures continuity even if one region is down.

Why this answer

Option B is correct because deploying multiple dedicated AI clusters across different regions with a load balancer ensures high availability by distributing traffic and providing failover if one region becomes unavailable. OCI Generative AI dedicated AI clusters are provisioned per region, and a load balancer can route requests to healthy clusters, meeting the requirement for real-time document translation with cross-region redundancy.

Exam trap

The trap here is that candidates often assume serverless endpoints inherently provide multi-region high availability, but in OCI Generative AI, serverless endpoints are region-scoped and do not include built-in cross-region failover or load balancing.

How to eliminate wrong answers

Option A is wrong because a single dedicated AI cluster in one region creates a single point of failure, failing the high-availability requirement. Option C is wrong because a single serverless endpoint is also region-specific and lacks cross-region redundancy, so it cannot provide high availability across regions. Option D is wrong because multiple serverless endpoints in different regions without a load balancer cannot automatically distribute traffic or handle failover; they require external routing logic to achieve high availability, which is not inherent in the serverless endpoint model.

34
Multi-Selecthard

Which THREE models are available as part of the OCI Generative AI service?

Select 3 answers
A.Llama 3
B.GPT-4
C.Cohere Command
D.Stable Diffusion
E.Cohere Embed
AnswersA, C, E

Meta's Llama 3 is available in OCI GenAI.

Why this answer

Option A is correct because Llama 3 is one of the open-source large language models (LLMs) available through the OCI Generative AI service, alongside Cohere models. OCI Generative AI provides managed access to Llama 3 for text generation tasks, allowing users to deploy and fine-tune it within Oracle Cloud Infrastructure.

Exam trap

Oracle often tests the distinction between models available natively in OCI Generative AI versus those accessible only through external integrations, leading candidates to mistakenly include popular models like GPT-4 that are not part of the managed service.

35
MCQmedium

A company has fine-tuned a large language model using OCI Generative AI service. When attempting to deploy the model to a dedicated endpoint, the deployment fails with an error indicating insufficient capacity. Which action should be taken to resolve this issue?

A.Delete existing endpoints to free capacity
B.Deploy the model to a different OCI region
C.Use a pre-built model instead of the fine-tuned model
D.Request a service limit increase for dedicated endpoints
AnswerD

OCI allows customers to request higher limits for resources like dedicated endpoints.

Why this answer

Option D is correct because capacity quotas can be increased by requesting a service limit increase. Option B is wrong because deploying to a different region may not address capacity issues in the current region. Option C is wrong because using a pre-built model would not leverage the fine-tuned model.

Option A is wrong because deleting other endpoints may not be necessary if a quota increase is possible.

36
MCQeasy

A company uses a RAG pipeline with OCI Data Science and Cohere embeddings. They notice that retrieval recall is low for domain-specific acronyms. What is the best practice to improve this?

A.Reduce the cosine similarity threshold in the vector search.
B.Expand acronyms to their full forms during document preprocessing and indexing.
C.Fine-tune the embedding model with domain-specific acronyms.
D.Increase the chunk size to include more context around acronyms.
AnswerB

Full forms improve semantic matching.

Why this answer

Expanding acronyms to their full forms during document preprocessing and indexing ensures that the embedding model can map the acronym to its semantic meaning, improving retrieval recall for domain-specific terms. Cohere embeddings are trained on general text, so without expansion, acronyms like 'NLP' may not match queries for 'Natural Language Processing' in vector space. This preprocessing step directly addresses the root cause of low recall for acronyms.

Exam trap

Oracle often tests the misconception that fine-tuning the embedding model is the default fix for retrieval issues, when in practice simpler preprocessing techniques like acronym expansion are more efficient and recommended for domain-specific vocabulary gaps.

How to eliminate wrong answers

Option A is wrong because reducing the cosine similarity threshold would increase the number of retrieved chunks but also introduce more irrelevant results, degrading precision without fixing the underlying embedding mismatch for acronyms. Option C is wrong because fine-tuning the embedding model is resource-intensive and typically unnecessary for this issue; preprocessing acronyms is a simpler, more effective solution that avoids retraining. Option D is wrong because increasing chunk size may add more context but does not resolve the core problem that the acronym itself is not semantically represented in the embedding space, so the retrieval still fails to match the intended concept.
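A preprocessing step of this kind might look like the sketch below, which appends the full form after each known acronym before the text is embedded. The glossary contents and the function name `expand_acronyms` are hypothetical; a real pipeline would load a domain glossary.

```python
import re

# Hypothetical glossary for illustration only.
GLOSSARY = {
    "NLP": "Natural Language Processing",
    "RAG": "Retrieval-Augmented Generation",
}


def expand_acronyms(text: str, glossary: dict[str, str]) -> str:
    """Rewrite 'NLP' as 'NLP (Natural Language Processing)' for every
    glossary entry that appears as a whole word."""
    if not glossary:
        return text
    # Longest keys first so a longer acronym wins over a shorter prefix.
    keys = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: f"{m.group(0)} ({glossary[m.group(0)]})", text)
```

Keeping the acronym alongside its expansion means both the short and long forms are represented in the indexed text.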

37
MCQmedium

A development team notices that their RAG application returns responses slowly when processing large PDF documents (100+ pages). They need to improve response time without significantly reducing retrieval quality. Which action is most effective?

A.Add a reranking step after initial retrieval
B.Use a smaller chunk size during document ingestion
C.Increase the topK parameter to retrieve more context
D.Switch to a larger embedding model for better accuracy
AnswerB

Smaller chunks mean faster embedding and retrieval.

Why this answer

Using smaller chunk sizes reduces the amount of text per embedding, speeding up retrieval and subsequent processing. Increasing topK would retrieve more contexts and slow down response. Switching to a more expensive model or adding a reranker would increase latency.

38
Multi-Selecthard

Which THREE components are required to deploy a custom generative AI model on OCI Data Science model deployment?

Select 3 answers
A.A load balancer to distribute traffic
B.An inference script (e.g., score.py) to handle prediction requests
C.A model artifact containing the model files
D.An API signing key for authentication
E.A deployment configuration specifying resources and environment
AnswersB, C, E

Required to define how the model is called.

Why this answer

Option B is correct because OCI Data Science model deployment requires an inference script (typically score.py) to define how the model processes incoming prediction requests. This script is the entry point that loads the model artifact and executes inference logic, making it an essential component for serving predictions.

Exam trap

The trap here is that candidates confuse optional infrastructure components like load balancers or API keys with the mandatory deployment components, leading them to select A or D instead of recognizing that the inference script, model artifact, and deployment configuration are the three required elements.

39
Multi-Selecteasy

Which TWO methods can be used to invoke a generative AI model deployed on OCI?

Select 2 answers
A.Using OCI Notifications service
B.Using the OCI Console web interface
C.Sending HTTP requests to the model endpoint URL
D.Using OCI Events service
E.Using the OCI SDK (e.g., Python, Java)
AnswersC, E

Direct REST calls are standard.

Why this answer

Option C is correct because generative AI models deployed on OCI expose a RESTful endpoint that accepts HTTP requests (typically POST with JSON payloads) for inference. This is the standard method for programmatic access to the model, allowing integration with any HTTP client.

Exam trap

Oracle often tests the distinction between management-plane actions (like using the Console or Events to trigger deployments) and data-plane actions (like invoking the model via SDK or HTTP), leading candidates to confuse OCI services that manage resources with those that perform inference.

40
MCQhard

An enterprise RAG application experiences high latency during peak hours. The architecture uses OCI OpenSearch with a single node cluster storing 5 million vectors (768 dimensions). The search uses exact k-NN (EF_SEARCH=500). The average query takes 1.5 seconds, but the SLA requires <500ms. The team considers several options: A) Switch to ANN with lower recall (HNSW with ef_search=50), B) Scale OpenSearch cluster to 3 nodes, C) Reduce embedding dimension to 256 using PCA, D) Increase the number of shards from 1 to 10. Which option provides the best balance of latency reduction and minimal impact on retrieval quality? (Assume all options are feasible)

A.Scale OpenSearch cluster to 3 nodes
B.Increase the number of shards from 1 to 10
C.Switch to ANN (HNSW with ef_search=50)
D.Reduce embedding dimension to 256 using PCA
AnswerB

More shards divide the vector set, allowing parallel exact searches on smaller partitions, reducing latency without quality loss.

Why this answer

Increasing shards on the same node partitions the index, so each shard contains fewer vectors, making exact search faster. This reduces latency without sacrificing accuracy. ANN reduces recall, scaling adds cost and complexity, and dimension reduction can degrade embedding quality.

41
MCQhard

A team has set up a RAG pipeline using OCI Data Science with OCI OpenSearch as the vector store. The embedding model is from the OCI Generative AI service. Users note that the vector search returns irrelevant documents for many queries. Which of the following is the most likely cause?

A.The chunk size is too large, causing overlapping context
B.The OpenSearch cluster is too small to handle the load
C.The query is not being converted to an embedding before search
D.The embedding model dimension does not match OpenSearch index dimension
AnswerC

Without embedding the query, the vector store cannot perform semantic similarity search.

Why this answer

If the query is not converted to an embedding before search, the vector store may interpret the raw text as an invalid query or fall back to lexical search, yielding irrelevant results.

42
MCQeasy

When invoking the OCI Generative AI service from a RAG application, the developer receives a 401 Unauthorized error. The application uses resource principal authentication from an OCI Data Science notebook session. What is the most likely fix?

A.Add the Generative AI service to the subnet's security list
B.Use an API key instead of resource principal
C.Ensure the dynamic group includes the data science notebook session and has the correct policy
D.Restart the notebook session
AnswerC

The dynamic group must match the session, and the policy must grant access to the Generative AI service.

Why this answer

Resource principal requires that the dynamic group containing the session has a policy granting the 'use generative-ai-embeddings' permission. If the session is not in the dynamic group, the policy does not apply.

43
MCQmedium

A healthcare company must use OCI Generative AI for medical report generation. They need to ensure PHI is not sent to third-party models. Which approach best ensures data stays within OCI?

A.Use OCI Gen AI service with fine-tuned model on a dedicated AI cluster
B.Use OCI Gen AI service with base model in a multi-tenant environment
C.Use OCI Gen AI service with base model in a dedicated AI cluster
D.Use a third-party LLM via API Gateway
AnswerA

Best: fine-tuned model on dedicated cluster ensures data stays in OCI and improves accuracy.

Why this answer

Option A is correct because fine-tuning on a dedicated AI cluster keeps data within OCI and improves accuracy for the medical domain. Option C also stays within OCI but is less specialized; Option B is multi-tenant, with potential isolation concerns; Option D sends data to a third party.

44
Multi-Selecteasy

Which TWO statements about tokens in large language models are correct?

Select 2 answers
A.Common tokenization methods include word-based and subword-based.
B.All tokens have the same embedding size.
C.Tokens are only used during training.
D.Tokens are always whole words.
E.The maximum number of tokens a model can process is called the context window.
AnswersA, E

Word-based and subword-based are standard tokenization approaches.

Why this answer

Option A is correct because common tokenization methods in large language models include word-based tokenization (splitting text into whole words) and subword-based tokenization (like Byte-Pair Encoding or WordPiece), which handle out-of-vocabulary words and morphological variations more effectively. Subword tokenization is widely used in models like GPT and BERT to balance vocabulary size and coverage.

Exam trap

Oracle often tests the distinction between tokenization methods and the fixed embedding dimension, leading candidates to incorrectly assume that tokens vary in embedding size or that tokens must be whole words.

45
MCQhard

An enterprise is using OCI Generative AI with a RAG architecture. They observe that the LLM sometimes produces hallucinated answers that are not supported by the retrieved documents. Which strategy is most effective in reducing these hallucinations?

A.Increase the temperature parameter to make outputs more focused.
B.Provide clear instructions in the system prompt to answer only based on the provided context.
C.Use a smaller LLM to reduce model capacity.
D.Retrieve more chunks (increase top-k) to provide more context.
AnswerB

Explicit grounding instructions guide the model to stick to retrieved documents, reducing unsupported claims.

Why this answer

Option B is correct because instructing the LLM to answer only from the provided context reduces hallucinations. Option A is wrong because increasing temperature increases randomness, worsening hallucinations. Option D is wrong because adding more retrieved chunks may introduce conflicting information.

Option C is wrong because using a smaller model may increase hallucination.
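One way to express a grounding instruction is a prompt template that places the retrieved chunks ahead of the question and tells the model what to do when the context is insufficient. The sketch below is only an example wording; the function name `build_grounded_prompt` and the exact instruction text are assumptions, not an OCI-provided template.

```python
def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a prompt that restricts the answer to the retrieved chunks."""
    # Number the chunks so the model (and a reviewer) can refer back to them.
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, reply \"I don't know.\"\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_grounded_prompt(
    "What is the refund window?",
    ["Refunds are accepted within 30 days.", "Shipping takes 5 business days."],
)
```

The same instruction could instead be sent as a system message where the chat API supports one; the point is that the constraint is stated explicitly rather than left implicit.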

46
MCQhard

A company wants to deploy a custom generative AI model for generating synthetic data for training other models. The model requires approximately 20GB of memory and must be accessible via a REST API with authentication. Additionally, the team needs to monitor for data drift over time. Which combination of OCI services best meets these requirements with minimal operational overhead?

A.OCI Compute with custom Docker container and Prometheus monitoring
B.OCI Data Science Model Deployment with OCI Monitoring and OCI Logging
C.OCI Functions with API Gateway for authentication
D.OCI Data Flow with OCI Data Catalog for model registry
AnswerB

Model Deployment supports large models, authentication, and integrates with Monitoring and Logging for drift detection.

Why this answer

Option B is correct because OCI Data Science Model Deployment provides a managed environment for hosting custom generative AI models with REST API endpoints and built-in authentication via OCI IAM. It integrates natively with OCI Monitoring and OCI Logging to track data drift and operational metrics without requiring additional infrastructure setup, minimizing operational overhead.

Exam trap

The trap here is that candidates may confuse OCI Functions (serverless) as suitable for long-running model inference, but its memory and timeout limits make it impractical for a 20GB model, while OCI Data Science Model Deployment is purpose-built for this scenario.

How to eliminate wrong answers

Option A is wrong because OCI Compute with a custom Docker container requires manual management of the host, scaling, and authentication, and Prometheus monitoring adds operational overhead for setup and maintenance, which contradicts the 'minimal operational overhead' requirement. Option C is wrong because OCI Functions is a serverless compute service designed for short-lived, stateless functions (max 5-minute execution and limited memory, typically up to 10GB), not for hosting a persistent 20GB generative AI model with a REST API. Option D is wrong because OCI Data Flow is a managed Apache Spark service for batch data processing, not for hosting real-time model inference endpoints, and OCI Data Catalog is for metadata management, not model registry or monitoring data drift.

47
Multi-Selectmedium

Which TWO are benefits of using dedicated AI clusters for OCI Generative AI?

Select 2 answers
A.Automatic model updates
B.Guaranteed throughput
C.Lower cost than on-demand for all workloads
D.No need to manage scaling
E.Predictable inference latency
AnswersB, E

Throughput is reserved and not affected by other tenants.

Why this answer

Dedicated AI clusters provide predictable latency and guaranteed throughput because they are single-tenant.

48
MCQhard

An application using OCI Generative AI returns a 403 Forbidden error when attempting to invoke a model. The user's API key is valid and the endpoint is correct. What is the most likely cause?

A.The model is out of capacity.
B.The region is not supported for the selected model.
C.The API request body is malformed.
D.The IAM policy does not grant the necessary permission to use the model.
AnswerD

403 errors indicate permission denial; the policy likely lacks an 'allow' statement.

Why this answer

OCI requires specific IAM policies to allow users to use Generative AI services. A missing or incorrect policy is the typical cause of 403 errors.

49
MCQhard

A research team is experimenting with few-shot prompting to improve a model's performance on a complex reasoning task. They find that the model's performance degrades when the few-shot examples are too similar to each other. What is the likely cause and best remedy?

A.The model has not seen enough examples. Increase the number of few-shot examples.
B.The examples are presented in a confusing order. Reorder them by difficulty.
C.The examples lack diversity, causing the model to overfit to a narrow pattern. Use more diverse examples.
D.The temperature is too low, making the model too deterministic. Increase temperature slightly.
AnswerC

Diverse examples reduce bias and improve generalization.

Why this answer

When few-shot examples are too similar, the model overfits to a narrow pattern, reducing its ability to generalize to the diverse reasoning paths required by the task. This is a known limitation of in-context learning: the model treats the examples as a template rather than as diverse demonstrations. Using more diverse examples exposes the model to a wider range of reasoning patterns, improving robustness.

Exam trap

Oracle often tests the misconception that more examples always improve performance, when in fact diversity is critical to prevent overfitting in few-shot prompting.

How to eliminate wrong answers

Option A is wrong because the issue is not the quantity of examples but their lack of diversity; adding more similar examples would worsen overfitting. Option B is wrong because the order of examples (by difficulty) does not address the core problem of pattern overfitting; confusing order may affect performance but is not the likely cause here. Option D is wrong because temperature controls randomness in token sampling, not the model's sensitivity to example diversity; a low temperature would make outputs more deterministic but does not cause or fix overfitting to narrow patterns.

50
MCQhard

A DBA has created the above vector index. After running queries, they observe that recall is lower than expected for approximate searches. Which change would most likely improve recall while maintaining query performance?

A.Change the index type from IVF to HNSW.
B.Increase the TARGET ACCURACY value to 99.
C.Increase the number of neighbor partitions (NEIGHBOR PARTITIONS) to 8.
D.Reduce the number of neighbor partitions to 2.
AnswerB

A higher TARGET ACCURACY forces the approximate search to consider more vectors, increasing recall at the cost of some latency.

Why this answer

Option B is correct because increasing the TARGET ACCURACY parameter forces the index to consider more candidates, improving recall. Option C is wrong because increasing neighbor partitions may improve performance but not necessarily recall. Option A is wrong because changing to HNSW would alter the index type and may require rebuilding the index.

Option D is wrong because reducing neighbor partitions reduces recall.

51
MCQhard

A company is using OCI Generative AI to generate code snippets and notices that the model sometimes produces code with security vulnerabilities. They have a small dataset of secure code examples. Which approach would be most effective to reduce vulnerabilities?

A.Use a different base model.
B.Fine-tune the model on the small secure code dataset.
C.Use prompt engineering with security constraints in the instruction.
D.Deploy a custom model hosted elsewhere.
AnswerC

Prompt engineering can enforce security rules without needing large datasets.

Why this answer

Option C is correct because prompt engineering allows the company to inject security constraints directly into the instruction without requiring additional training data or infrastructure. By crafting a prompt that explicitly requests secure code (e.g., 'Generate code that follows OWASP Top 10 best practices and avoids SQL injection, XSS, and buffer overflows'), the model can leverage its existing knowledge to produce safer outputs. This approach is immediate, cost-effective, and does not depend on the size or quality of the small secure code dataset.

Exam trap

The trap here is that candidates often assume fine-tuning (Option B) is always the best solution for domain-specific improvements, but they overlook the practical limitations of small datasets and the immediate effectiveness of prompt engineering for security constraints.

How to eliminate wrong answers

Option A is wrong because switching to a different base model does not guarantee reduced vulnerabilities; all general-purpose models can produce insecure code without explicit guidance, and the issue lies in the lack of security-focused constraints, not the model architecture. Option B is wrong because fine-tuning on a small dataset of secure code examples is unlikely to generalize well; the model may overfit to the limited examples and fail to address the wide variety of vulnerabilities that can appear in different contexts, and fine-tuning requires significant computational resources and expertise. Option D is wrong because deploying a custom model hosted elsewhere introduces additional complexity, cost, and latency without addressing the root cause; the problem is not about hosting location but about how the model is instructed to prioritize security.

52
MCQmedium

An application using OCI GenAI experiences high response times. Which change will most directly reduce latency?

A.Reduce the number of output tokens requested.
B.Increase the max tokens parameter.
C.Switch to a fine-tuned version of the same model.
D.Enable batched inference.
AnswerA

Correct: Fewer tokens means faster generation.

Why this answer

Option A is correct because reducing the number of output tokens directly speeds up generation. Option B increases latency, Option C may not affect latency, and Option D helps throughput but not individual request latency.

53
MCQmedium

A company's RAG application ingests news articles that are updated frequently. The vector store in OCI OpenSearch contains embeddings of the articles. The team notices that outdated information is still retrieved even after updating the source documents. What is the most effective way to ensure the vector store reflects the latest content?

A.Increase the TTL for vector indices
B.Rely on the LLM to ignore outdated information
C.Re-index the entire vector store daily
D.Use the OCI OpenSearch document update API to replace embeddings for changed documents
AnswerD

Targeted updates minimize cost and ensure real-time accuracy.

Why this answer

Using the OCI OpenSearch document update API to replace embeddings for changed documents is efficient and targeted, ensuring immediate consistency.
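The idea of a targeted update can be sketched as follows: keep a content hash next to each stored embedding and re-embed only the documents whose hash has changed. This is a minimal in-memory illustration, not OpenSearch client code; `fake_embed`, `sync_changed_documents`, and the dictionary `store` are hypothetical stand-ins for a real embedding model and a real document update call.

```python
import hashlib

def fake_embed(text):
    """Placeholder embedding: NOT a real model, just a deterministic stand-in."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in digest[:4]]

def sync_changed_documents(store, documents):
    """Re-embed only documents whose content hash differs from the stored one."""
    updated = []
    for doc_id, text in documents.items():
        content_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
        entry = store.get(doc_id)
        if entry is None or entry["hash"] != content_hash:
            # In a real system this write would be a document update against the index.
            store[doc_id] = {"hash": content_hash, "embedding": fake_embed(text)}
            updated.append(doc_id)
    return updated

store = {}
articles = {"a1": "Markets rise", "a2": "Storm warning issued"}
first_sync = sync_changed_documents(store, articles)   # both documents are new
articles["a2"] = "Storm warning lifted"
second_sync = sync_changed_documents(store, articles)  # only a2 changed
print(first_sync, second_sync)
```

Compared with re-indexing everything daily, only the edited article is re-embedded, and the stale vector is replaced as soon as the sync runs.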

54
Multi-Selectmedium

Which TWO factors most directly impact the consistency of text generated by an LLM when the same prompt is used multiple times?

Select 2 answers
A.Top_p
B.Batch size
C.Max_tokens
D.Temperature
E.Seed
AnswersA, D

Top_p controls the nucleus of tokens considered; lower values make output more focused.

Why this answer

Top_p (nucleus sampling) directly impacts consistency by controlling the cumulative probability threshold for token selection. A lower Top_p (e.g., 0.1) restricts the model to only the most probable tokens, reducing randomness and making outputs more deterministic across repeated prompts. This parameter, along with Temperature, is a primary lever for managing output variability in LLMs.

Exam trap

Oracle often tests the distinction between inference-time parameters (Temperature, Top_p) and training/hardware parameters (Batch size), or between parameters that control randomness (Temperature, Top_p) versus those that control output length (Max_tokens), leading candidates to mistakenly select Seed as a primary consistency factor.
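To make the effect of the two parameters concrete, the sketch below computes which tokens remain eligible after temperature scaling and nucleus (Top_p) filtering. It is a simplified illustration with made-up logits; `nucleus_distribution` is a hypothetical helper, not part of any OCI SDK.

```python
import math

def nucleus_distribution(logits, temperature, top_p):
    """Return renormalized token probabilities left after temperature scaling and top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    peak = max(logits.values())
    # Softmax with temperature; subtracting the peak keeps exp() numerically stable.
    weights = {tok: math.exp((v - peak) / temperature) for tok, v in logits.items()}
    total = sum(weights.values())
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = {}, 0.0
    for tok, weight in ranked:
        prob = weight / total
        kept[tok] = prob
        cumulative += prob
        if cumulative >= top_p:  # stop once the nucleus covers top_p of the mass
            break
    kept_total = sum(kept.values())
    return {tok: prob / kept_total for tok, prob in kept.items()}

logits = {"great": 2.0, "good": 1.5, "fine": 0.5, "odd": -1.0}  # made-up values
focused = nucleus_distribution(logits, temperature=0.2, top_p=0.1)
varied = nucleus_distribution(logits, temperature=1.5, top_p=0.95)
print(focused)
print(varied)
```

With the low settings only the single most likely token survives, so repeated runs produce the same choice; with the high settings all four tokens stay in play and outputs can differ from run to run.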

55
MCQmedium

A user runs the CLI command shown but receives only one model in the list, even though they know there are more models available in the compartment. What is the most likely reason?

A.The compartment does not contain the other models.
B.The other models are in a different region.
C.The CLI command requires the --all flag to list all models.
D.The user lacks permissions for other compartments.
AnswerC

Without --all, the CLI returns only the first page of results.

Why this answer

Option C is correct because the OCI CLI paginates results by default; using the --all flag retrieves all models. Options A and D are possible but the most common cause is pagination; B is less likely since the user is querying a specific compartment.
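The pagination behavior can be illustrated with a generic loop that keeps requesting pages until no next-page token is returned. This is a sketch against a made-up in-memory API; `list_models_page` and `list_all_models` are hypothetical names, not OCI SDK or CLI calls.

```python
MODELS = [f"model-{i}" for i in range(1, 8)]  # made-up data for the demo

def list_models_page(page_token=None, limit=3):
    """Hypothetical paged API: returns (items, next_page_token)."""
    start = int(page_token or 0)
    end = start + limit
    next_token = str(end) if end < len(MODELS) else None
    return MODELS[start:end], next_token

def list_all_models():
    """Keep requesting pages until the service stops returning a next-page token."""
    items, token = [], None
    while True:
        page, token = list_models_page(token)
        items.extend(page)
        if token is None:
            return items

first_page, _ = list_models_page()  # what a single, unpaginated call sees
all_models = list_all_models()      # what following every page returns
print(len(first_page), len(all_models))
```

A caller who stops after the first response sees only part of the list, which matches the symptom described in the question.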

56
MCQeasy

A company wants to use OCI Generative AI to summarize customer feedback. They need low latency and high throughput. Which configuration should they choose?

A.Serverless endpoint with fine-tuned model
B.Dedicated AI cluster with base model
C.Dedicated AI cluster with fine-tuned model
D.Serverless endpoint with base model
AnswerB

Correct: Dedicated resources ensure low latency and high throughput.

Why this answer

Dedicated AI clusters provide guaranteed compute resources (GPUs) with no multi-tenant contention, ensuring low latency and high throughput for inference workloads. Using a base model avoids the additional overhead of fine-tuning inference, which can introduce latency due to custom weight loading and optimization steps. This combination is optimal for real-time summarization of customer feedback where response time and volume are critical.

Exam trap

Oracle often tests the misconception that fine-tuned models always outperform base models for latency, when in fact fine-tuning adds inference overhead that can degrade performance for high-throughput, low-latency use cases.

How to eliminate wrong answers

Option A is wrong because serverless endpoints share resources across tenants, leading to variable latency and potential throttling under high throughput demands, which contradicts the low-latency requirement. Option C is wrong because a fine-tuned model on a dedicated cluster adds inference overhead from custom weights and may require additional pre/post-processing, increasing latency compared to a base model. Option D is wrong because serverless endpoints with a base model still suffer from multi-tenant resource contention, making them unsuitable for guaranteed low latency and high throughput.

57
MCQeasy

An organization wants to use OCI Generative AI to summarize long legal documents. They need to ensure the summary is concise and retains key information. Which model parameter should they set to control the length of the summary?

A.frequency_penalty
B.max_tokens
C.top_p
D.temperature
AnswerB

Max_tokens sets the maximum number of tokens in the output.

Why this answer

The max_tokens parameter limits the number of tokens in the generated output, directly controlling summary length.

58
MCQeasy

A startup needs to deploy a large language model for a customer support chatbot that requires low latency and cost efficiency. They are evaluating OCI Generative AI models. Which model type is most appropriate?

A.Embedding model (e.g., cohere.embed)
B.Instruct model (e.g., cohere.command)
C.Image generation model
D.Base model (e.g., cohere.base)
AnswerB

Instruct models are fine-tuned to follow instructions, making them ideal for chatbots.

Why this answer

The startup requires low latency and cost efficiency for a customer support chatbot. Instruct models like cohere.command are specifically fine-tuned to follow conversational instructions and generate concise, task-oriented responses, making them ideal for interactive chatbot applications. They balance performance and cost better than base models, which lack instruction-following capability, and embedding models, which are designed for semantic search rather than text generation.

Exam trap

Oracle often tests the distinction between base models and instruct models, trapping candidates who assume a base model can be used directly for task-specific applications without fine-tuning or instruction alignment.

How to eliminate wrong answers

Option A is wrong because embedding models (e.g., cohere.embed) are designed to convert text into vector representations for tasks like semantic search or clustering, not for generating conversational responses. Option C is wrong because image generation models are used for creating or editing images, not for text-based customer support interactions. Option D is wrong because base models (e.g., cohere.base) are general-purpose language models that have not been fine-tuned for instruction following, leading to less relevant and less controllable outputs for a chatbot use case.

59
MCQeasy

A team wants to deploy an LLM for real-time inference with low latency. Which OCI deployment option is best?

A.OCI Data Science Model Deployment with GPU shapes
B.OCI Functions with CPU
C.OCI Events
D.OCI Streaming
AnswerA

GPU shapes provide the compute power needed for low-latency LLM inference.

Why this answer

OCI Data Science Model Deployment with GPU shapes is the best option because it provides managed, scalable, low-latency inference endpoints for LLMs. GPU shapes (e.g., VM.GPU.A10.1) are essential for the parallel matrix computations required by transformer-based models, and the deployment service supports auto-scaling and load balancing to maintain real-time response times.

Exam trap

The trap here is that candidates may confuse OCI Functions (a serverless compute service) with a viable inference platform, overlooking the GPU requirement for LLM workloads, or mistakenly think OCI Streaming can process inference requests because of its 'real-time' label.

How to eliminate wrong answers

Option B (OCI Functions with CPU) is wrong because OCI Functions is a serverless compute service designed for short-lived, stateless functions, and CPU-only execution cannot meet the low-latency requirements of LLM inference due to the lack of GPU acceleration for large matrix operations. Option C (OCI Events) is wrong because OCI Events is a notification and orchestration service for reacting to infrastructure changes, not a compute platform for running inference workloads. Option D (OCI Streaming) is wrong because OCI Streaming is a real-time data ingestion and messaging service (Apache Kafka-compatible) for handling event streams, not for executing LLM inference.

60
MCQhard

A company wants to build a multi-modal RAG system that can retrieve both text and images based on a user query. Which approach is most aligned with OCI GenAI capabilities?

A.Use OCI Document Understanding to convert images to text, then index text
B.Use separate vector stores for text and image embeddings
C.Use image captioning to generate text descriptions and index those
D.Utilize a multi-modal embedding model from OCI GenAI to embed both text and images into a common vector space
AnswerD

Multi-modal models enable direct retrieval of both types.

Why this answer

OCI GenAI supports multi-modal models such as Cohere's multimodal embedding model, which can embed text and images into a shared vector space, enabling retrieval across modalities. Separate text and image models would not produce aligned vectors, and keeping multiple vector stores complicates retrieval. An OCR-based, text-only approach loses image semantics.
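A shared vector space means one index and one similarity search can return both text and image items. The sketch below uses hand-written toy vectors in place of real embeddings; the item names and the `search` helper are illustrative assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy vectors standing in for the output of ONE multi-modal embedding model.
index = [
    {"id": "manual.txt", "kind": "text", "vector": [0.9, 0.1, 0.0]},
    {"id": "pump.png", "kind": "image", "vector": [0.8, 0.2, 0.1]},
    {"id": "invoice.txt", "kind": "text", "vector": [0.0, 0.1, 0.9]},
]

def search(query_vector, k=2):
    """Rank every item, regardless of modality, by similarity to the query."""
    ranked = sorted(index, key=lambda item: cosine(query_vector, item["vector"]), reverse=True)
    return [(item["id"], item["kind"]) for item in ranked[:k]]

results = search([1.0, 0.1, 0.0])
print(results)
```

Because both modalities live in the same space, a single query ranks the text manual and the image together without a second index or a captioning step.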

61
MCQmedium

A developer is building a code generation assistant. The model occasionally produces syntactically correct but semantically wrong code. Which technique directly addresses semantic correctness?

A.Expand the token vocabulary
B.Lower the temperature to 0
C.Apply RLHF using human-validated code examples
D.Increase beam search width
AnswerC

RLHF directly optimizes for desired outcomes like semantic correctness.

Why this answer

Reinforcement Learning from Human Feedback (RLHF) directly addresses semantic correctness by fine-tuning the model using human-validated code examples. This process teaches the model to prefer outputs that are not only syntactically valid but also logically correct and aligned with developer intent, reducing semantically wrong code generation.

Exam trap

Oracle often tests the misconception that adjusting decoding parameters (temperature, beam search) or tokenization can fix semantic errors, when in fact only training techniques like RLHF that incorporate human feedback can directly improve semantic correctness.

How to eliminate wrong answers

Option A is wrong because expanding the token vocabulary increases the range of tokens the model can generate but does not improve the model's ability to reason about code semantics or correct logical errors. Option B is wrong because lowering the temperature to 0 makes the model deterministic, reducing randomness but not fixing underlying semantic misunderstandings; it may still produce the same incorrect logic repeatedly. Option D is wrong because increasing beam search width explores more candidate sequences during decoding, which can improve syntactic fluency but does not directly address semantic correctness or logical accuracy.

62
MCQmedium

A company is using OCI Generative AI service to generate product descriptions. They notice that the model sometimes generates biased content. Which approach should they take to mitigate bias while maintaining performance?

A.Fine-tune the model with a balanced, curated dataset that reduces bias
B.Use a larger model without fine-tuning
C.Post-process outputs to remove biased phrases
D.Switch to a different pre-built model from OCI
AnswerA

Fine-tuning allows adjusting model behavior.

Why this answer

Fine-tuning with a balanced, curated dataset directly addresses the root cause of bias by adjusting the model's internal weights to reduce reliance on biased patterns in the original training data. This approach preserves the model's generative performance for product descriptions because it retrains only on domain-specific, unbiased examples, unlike post-processing which merely filters outputs without correcting the underlying model behavior.

Exam trap

Oracle often tests the misconception that post-processing or model swapping is a sufficient fix for bias, when in fact only fine-tuning or retraining can address the root cause without sacrificing performance.

How to eliminate wrong answers

Option B is wrong because using a larger model without fine-tuning does not inherently reduce bias; larger models can amplify existing biases from their training data and may even introduce new biases due to increased complexity. Option C is wrong because post-processing outputs to remove biased phrases is a superficial fix that can degrade performance by altering the natural language flow and may miss subtle or context-dependent biases, while also adding latency. Option D is wrong because switching to a different pre-built model from OCI merely changes the source of bias without guaranteeing a reduction; all pre-built models inherit biases from their training corpora, and the new model may perform worse on the specific product description task.

63
MCQhard

An organization is deploying a generative AI model that requires GPU acceleration for inference. They are using OCI Data Science Model Deployment. The model is expected to handle variable traffic, with occasional spikes. Which scaling option should they configure to ensure cost-efficiency and responsiveness?

A.Use OCI Generative AI on-demand API with a serverless endpoint
B.Use CPU-only instances and rely on batching
C.Configure autoscaling with a minimum of 1 and maximum of 10 GPU instances
D.Deploy with a fixed number of 1 GPU instance
AnswerC

Autoscaling matches capacity to load.

Why this answer

Option C is correct because autoscaling with a minimum of 1 and maximum of 10 GPU instances allows the deployment to dynamically adjust capacity in response to variable traffic and spikes, ensuring cost-efficiency by scaling down during low demand and responsiveness by scaling up during peaks. OCI Data Science Model Deployment supports autoscaling policies that can be configured with GPU shapes, making it the optimal choice for a generative AI model requiring GPU acceleration.

Exam trap

Oracle often tests the misconception that serverless endpoints (Option A) are always the best for variable traffic, but candidates must recognize that OCI Generative AI on-demand API is a pre-built model service, not a custom model deployment, and thus does not support custom GPU scaling policies.

How to eliminate wrong answers

Option A is wrong because OCI Generative AI on-demand API with a serverless endpoint is a managed service for accessing pre-built models, not a custom model deployment, and it does not provide the granular control over GPU instances needed for the organization's own model. Option B is wrong because CPU-only instances lack the GPU acceleration required for inference of generative AI models, leading to unacceptable latency and throughput, even with batching. Option D is wrong because a fixed number of 1 GPU instance cannot handle variable traffic with occasional spikes, resulting in either over-provisioning during low traffic (wasting cost) or under-provisioning during spikes (causing performance degradation or failures).
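The effect of a minimum of 1 and a maximum of 10 instances can be sketched as a simple clamp on the capacity the current load calls for. This is a conceptual illustration only: `desired_replicas` and the per-replica capacity figure are assumptions, and a real autoscaling policy is typically driven by metric thresholds rather than this formula.

```python
import math

def desired_replicas(requests_per_second, capacity_per_replica, min_replicas=1, max_replicas=10):
    """Replicas needed for the load, clamped to the configured autoscaling bounds."""
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    needed = math.ceil(requests_per_second / capacity_per_replica)
    return max(min_replicas, min(max_replicas, needed))

# Assumed capacity of 5 requests/second per GPU instance, for illustration only.
for load in (0, 23, 500):
    print(load, "req/s ->", desired_replicas(load, capacity_per_replica=5), "replicas")
```

The lower bound keeps one instance warm for responsiveness, while the upper bound caps cost during extreme spikes.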

64
MCQmedium

A developer receives the above error when querying a RAG application. What is the most likely cause and recommended action?

A.The API rate limit has been exceeded; wait for the retry period and implement exponential backoff.
B.The model is deprecated; update to the latest model.
C.The endpoint URL is incorrect; verify the OCI region endpoint.
D.The request payload is malformed; check the input format.
AnswerA

429 means rate limit.

Why this answer

The 429 (Too Many Requests) error indicates the API rate limit has been exceeded. In OCI Generative AI services, rate limits are enforced per tenancy and per region; the recommended action is to wait for the retry-after period and implement exponential backoff to avoid overwhelming the service.

Exam trap

Oracle often tests the distinction between HTTP status codes (429 vs 400 vs 404 vs 410) to see if candidates can map the exact error to the correct cause rather than guessing based on general troubleshooting.

How to eliminate wrong answers

Option B is wrong because a model deprecation would return a 404 or 410 error, not a 429. Option C is wrong because an incorrect endpoint URL would result in a 404 or connection timeout, not a rate-limit error. Option D is wrong because a malformed payload would produce a 400 Bad Request error, not a 429.
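Exponential backoff, as recommended for the 429 case, can be sketched as a retry loop whose wait time doubles after each rate-limited attempt. `RateLimitError`, `call_with_backoff`, and `flaky_request` are hypothetical names for illustration; the sketch does not read a server-provided retry-after value, which a production client should prefer when one is present.

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for an HTTP 429 response."""

def call_with_backoff(send_request, max_attempts=5, base_delay_s=0.5, max_delay_s=8.0, sleep=time.sleep):
    """Retry on rate limiting, doubling the delay each time and adding a little jitter."""
    for attempt in range(max_attempts):
        try:
            return send_request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))

# Demo: a fake request that is rate limited twice, then succeeds.
attempts = {"count": 0}

def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"

waits = []  # record the delays instead of actually sleeping
result = call_with_backoff(flaky_request, sleep=waits.append)
print(result, waits)
```

The jitter spreads out retries from many clients so they do not all hit the service again at the same instant.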

65
Multi-Selecthard

Which THREE steps are required to deploy a custom generative AI model using OCI Data Science Model Deployment?

Select 3 answers
A.Fine-tune the model using OCI Generative AI service
B.Create a model artifact (e.g., pickle, ONNX) with inference code
C.Register the model in OCI Generative AI service
D.Upload the model artifact to an OCI Object Storage bucket
E.Create a model deployment using the OCI Data Science Model Deployment service
AnswersB, D, E

Model must be packaged with dependencies for serving.

Why this answer

Option B is correct because deploying a custom generative AI model via OCI Data Science Model Deployment requires packaging the model and its inference code into a standardized artifact format (e.g., pickle, ONNX). This artifact is the core input that the deployment runtime loads to serve predictions, making it an essential step in the workflow.

Exam trap

The trap here is confusing the OCI Generative AI service's managed model lifecycle (fine-tuning and registration) with the custom model deployment workflow in OCI Data Science, leading candidates to incorrectly select steps that belong to the managed service rather than the custom deployment pipeline.

66
Multi-Selecteasy

Which TWO of the following are sources of training data for fine-tuning a model in OCI Generative AI?

Select 2 answers
A.OCI Object Storage bucket
B.OCI File Storage
C.OCI Database
D.Local file uploaded through the OCI Console
E.OCI Streaming
AnswersA, D

Object Storage is a common source for large datasets.

Why this answer

Option A is correct because OCI Object Storage is a supported source for training data when fine-tuning models in OCI Generative AI. The service can directly access data stored in Object Storage buckets via service-level integrations, allowing you to reference large datasets without local uploads.

Exam trap

Oracle often tests the distinction between storage services that are directly integrated with AI fine-tuning (Object Storage) versus general-purpose storage or data services (File Storage, Database, Streaming) that require additional middleware or are not supported at all.

67
MCQmedium

A company is building a RAG application using OCI Generative AI and OCI Search with OpenSearch. Users report that the responses from the LLM are not relevant to the queries, even though the document chunks seem appropriate. What is the most likely cause?

A.The embedding model is not suited for the domain.
B.Reranking is not enabled in the OpenSearch query.
C.The top K value is set too high.
D.The chunk size is too small, causing loss of context.
AnswerB

Reranking reorders search results for better relevance, significantly impacting quality.

Why this answer

Enabling reranking improves the relevance of retrieved documents by reordering them based on semantic match with the query. Without reranking, the initial vector search results may not be optimally ordered.
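Reranking is a second pass over the candidates that a first-pass search returned. The sketch below shows only the two-stage shape; `overlap_score` is a deliberately crude word-overlap score standing in for a real reranking model, and the candidate list is made up.

```python
def overlap_score(query, text):
    """Toy relevance score (share of query words found in the text); a real reranker would use a model."""
    query_terms = set(query.lower().split())
    text_terms = set(text.lower().split())
    return len(query_terms & text_terms) / len(query_terms)

def rerank(query, candidates, top_n=2):
    """Reorder first-pass candidates by the second-pass score and keep the best ones."""
    return sorted(candidates, key=lambda text: overlap_score(query, text), reverse=True)[:top_n]

candidates = [  # order as returned by a first-pass vector search (made up)
    "shipping rates for international orders",
    "how to reset your account password",
    "password rules for new accounts",
]
top = rerank("reset account password", candidates)
print(top)
```

The chunk that the first pass placed first drops out, and the most relevant chunk moves to the top before anything is sent to the LLM.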

68
MCQhard

A company is deploying OCI Generative AI for a chatbot that must answer customer queries within 500ms. They choose a dedicated AI cluster but observe 2-second latency. What is the most likely cause?

A.The endpoint is not cached
B.The cluster is configured for batch inference
C.The request includes too many tokens
D.The model is too large for the cluster
AnswerB

Batch inference mode processes requests in batches, increasing latency significantly.

Why this answer

A dedicated AI cluster in OCI Generative AI is designed for real-time inference with low latency. When the cluster is configured for batch inference, it processes requests in batches rather than individually, which introduces queuing and processing delays that can easily exceed the 500ms target. This explains the observed 2-second latency, as batch mode prioritizes throughput over per-request response time.

Exam trap

The trap here is that candidates may assume any latency issue is due to model size or token limits, but Oracle often tests the distinction between real-time and batch inference configurations in dedicated clusters.

How to eliminate wrong answers

Option A is wrong because caching is not a feature of OCI Generative AI endpoints; the latency issue stems from inference processing, not from cache misses. Option C is wrong because while excessive tokens can increase latency, the 2-second delay is more consistent with batch processing overhead than with token count alone, and the cluster should handle typical token limits within the 500ms target. Option D is wrong because the model size is fixed when the dedicated cluster is provisioned; if the model were too large, the cluster would fail to deploy or would show errors, not simply exhibit high latency.

69
Multi-Selecteasy

Which TWO are best practices for building a RAG application on OCI? (Choose two.)

Select 2 answers
A.Use a vector database such as OCI OpenSearch with ANN indexes for storing embeddings.
B.Generate embeddings for documents at query time to ensure freshness.
C.Pre-index the documents and update the index periodically to reflect new content.
D.Store the source documents only in OCI Object Storage and retrieve them at query time using full-text search.
E.Use a different embedding model for documents and queries to capture distinct semantics.
AnswersA, C

ANN indexes enable fast similarity search.

Why this answer

Option A is correct because OCI OpenSearch with Approximate Nearest Neighbor (ANN) indexes is a best practice for vector storage and retrieval in RAG applications. ANN indexes enable efficient similarity search over high-dimensional embeddings, which is essential for retrieving relevant context from large document collections at low latency.

Exam trap

Oracle often tests the misconception that real-time embedding generation or full-text search can substitute for precomputed vector indexes in RAG, when in practice latency and semantic alignment requirements make pre-indexing and ANN search mandatory.

70
Multi-Selectmedium

Which TWO of the following are valid ways to consume OCI Generative AI models?

Select 2 answers
A.Using the OCI Console chat interface
B.Using the OCI CLI
C.Using the OCI Generative AI REST API
D.Using OCI SDK for Python
E.Using OCI Data Flow
AnswersA, C

The console provides a chat UI to interact with models.

Why this answer

A is correct because the OCI Console provides a built-in chat interface that allows users to interact directly with Generative AI models without writing any code. This interface is part of the OCI Generative AI service's web-based console, enabling prompt testing and model evaluation through a graphical user interface.

Exam trap

Oracle often tests the distinction between direct consumption methods (like the console and REST API) versus indirect tools (like SDKs and CLI) that require additional layers of abstraction or are not designed for model inference.

71
MCQeasy

Which technique allows an LLM to be adapted to a new task with only a few examples?

A.Few-shot learning
B.Fine-tuning
C.Pre-training
D.Prompt engineering
AnswerA

Few-shot learning provides examples in the prompt to adapt the model to a new task.

Why this answer

Few-shot learning (option A) uses a handful of examples in the prompt to guide the model's behavior without fine-tuning. This is a form of prompt engineering specifically designed for few-shot scenarios.
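A few-shot prompt is just an instruction followed by a handful of worked examples and the new input. The sketch below assembles one as plain text; `build_few_shot_prompt` and the `Input:`/`Output:` layout are illustrative choices, not a format required by any particular model.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, labeled examples, and the new query into one prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Input: {text}", f"Output: {label}", ""]
    lines += [f"Input: {query}", "Output:"]  # the model completes this last line
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Classify the sentiment as positive or negative.",
    [("I love this product", "positive"), ("It broke after a day", "negative")],
    "Works exactly as described",
)
print(prompt)
```

No model weights change here: the examples travel with every request, which is what separates this technique from fine-tuning.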

72
Multi-Selecthard

A team is fine-tuning a generative AI model on OCI using a custom dataset. The training job fails with an out-of-memory error. Which THREE actions should they take to resolve this issue?

Select 3 answers
A.Use gradient accumulation to simulate larger batch sizes.
B.Increase the learning rate to speed up training.
C.Use a larger GPU shape with more memory.
D.Reduce the batch size.
E.Increase the number of training epochs.
AnswersA, C, D

Gradient accumulation allows effective large batches with less memory.

Why this answer

Option A is correct because gradient accumulation allows the model to simulate the effect of a larger batch size without increasing memory usage. Instead of computing gradients over a single large batch, the optimizer accumulates gradients over several smaller batches before performing a weight update. This technique effectively decouples the batch size from memory consumption, enabling training on large models or high-resolution inputs that would otherwise cause an out-of-memory error.

Exam trap

Oracle often tests the misconception that increasing the learning rate or epochs can resolve memory errors, when in fact only actions that directly reduce per-step memory footprint (like reducing batch size, using gradient accumulation, or upgrading to a larger GPU) are effective.
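Gradient accumulation can be checked on a tiny example: averaging the gradients of equally sized micro-batches gives the same gradient as one large batch, while only one micro-batch has to be held in memory at a time. The sketch uses a one-parameter model `y = w * x` with made-up data; the function names are illustrative.

```python
def mse_gradient(w, batch):
    """Gradient of mean squared error for the model y = w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_gradient(w, data, micro_batch_size):
    """Average the gradients of equally sized micro-batches before one optimizer step."""
    # Assumes len(data) is a multiple of micro_batch_size so all chunks weigh the same.
    chunks = [data[i:i + micro_batch_size] for i in range(0, len(data), micro_batch_size)]
    total = 0.0
    for chunk in chunks:  # only one micro-batch is processed at a time
        total += mse_gradient(w, chunk) / len(chunks)
    return total

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
full = mse_gradient(0.5, data)                                      # one batch of 4
accumulated = accumulated_gradient(0.5, data, micro_batch_size=2)   # two batches of 2
print(full, accumulated)
```

Both values match, so the weight update is the same while the peak memory per step is set by the micro-batch size rather than the effective batch size.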

73
MCQeasy

A developer wants to integrate OCI Generative AI into a web application. Which API authentication method is recommended for programmatic access?

A.Pre-authenticated request
B.API key-based signing
C.OAuth 2.0 client credentials
D.Username and password in the header
AnswerB

OCI uses API signing (based on RSA keys) for all REST API calls.

Why this answer

Option B is correct because OCI APIs require request signing using an API signing key (an RSA key pair) for programmatic access. The developer must generate a key pair, upload the public key to the OCI console, and then use the private key to sign each HTTP request using the OCI Signature Version 1 scheme (an RSA-SHA256 signature based on the draft HTTP Signatures specification). This ensures authentication without transmitting secrets over the wire.

Exam trap

The trap here is that candidates may confuse OAuth 2.0 (common in other cloud providers such as Azure or Google Cloud) with OCI's requirement for API key-based signing, leading them to select OAuth 2.0 client credentials.

How to eliminate wrong answers

Option A is wrong because pre-authenticated requests (PARs) are used for temporary access to specific OCI Object Storage buckets or objects, not for authenticating API calls to OCI Generative AI services. Option C is wrong because OCI does not support OAuth 2.0 client credentials for direct API authentication; OCI uses IAM-based API signing keys or instance principals for programmatic access. Option D is wrong because sending username and password in the header is a basic authentication scheme that is insecure and not supported by OCI APIs; OCI requires cryptographic request signing.

74
MCQmedium

A developer is using the OCI Generative AI Python SDK. They receive a 400 error 'InvalidParameter'. What is the most likely reason?

A.Exceeded token limit
B.Network timeout
C.Invalid model name
D.Missing API key
AnswerC

An invalid model name is a parameter error, resulting in 400 InvalidParameter.

Why this answer

Option C is correct because an invalid model name is a common cause of InvalidParameter errors. Option D (missing API key) would cause an authentication error (401). Option A (exceeded token limit) may cause a different error. Option B (network timeout) would cause a timeout error.

75
Multi-Selecteasy

A developer is troubleshooting an OCI Generative AI inference request that returns a 400 Bad Request error. Which three common causes could result in this error? (Choose three.)

Select 3 answers
A.Incorrect endpoint URL
B.Invalid API key in the request header
C.Missing required parameters in the request body
D.Exceeding the model's maximum token limit
E.Network connectivity issues
AnswersA, C, D

A wrong URL may cause the request to be malformed or routed incorrectly, resulting in a 400.

Why this answer

A 400 Bad Request error indicates the server cannot process the request due to client-side issues. An incorrect endpoint URL (A) is a common cause because the request is sent to the wrong OCI Generative AI service endpoint (e.g., using a chat endpoint for a text generation model), leading to a malformed request that the server rejects. Missing required parameters (C) in the request body, such as 'compartmentId' or 'modelId', also trigger a 400 error because the API cannot validate or process the inference without them. Exceeding the model's maximum token limit (D) results in a 400 error because the input or output exceeds the model's configured token capacity, which the API validates before processing.

Exam trap

Oracle often tests the distinction between HTTP 4xx status codes, where candidates confuse 400 Bad Request (client-side malformed request) with 401 Unauthorized (invalid credentials) or 403 Forbidden (insufficient permissions), leading them to incorrectly select invalid API key as a cause for a 400 error.
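The status-code distinctions described above can be kept straight with a small lookup table. This is a study aid built only from the mappings stated in these explanations; `LIKELY_CAUSE` and `triage` are hypothetical names, and a real client should also inspect the response body.

```python
# Mappings taken from the explanations above; not an exhaustive list of service errors.
LIKELY_CAUSE = {
    400: "malformed request: check required fields and token limits",
    401: "invalid or missing credentials",
    403: "insufficient permissions",
    429: "rate limit exceeded: back off and retry",
}

def triage(status_code):
    """Return the first thing to check for a given HTTP status code."""
    return LIKELY_CAUSE.get(status_code, "unmapped status: inspect the response body")

for code in (400, 401, 429, 500):
    print(code, "->", triage(code))
```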
