Oracle Cloud Infrastructure Generative AI Professional 1Z0-1127 (1Z0-1127) — Questions 451500

500 questions total · 7pages · All types, answers revealed

Page 6

Page 7 of 7

451
Multi-Selectmedium

Which TWO factors should be considered when selecting a base model for fine-tuning on OCI Generative AI service?

Select 2 answers
A.The model's training dataset size
B.The model's size and number of parameters
C.The model's license and terms of use
D.The model's training framework (PyTorch vs TensorFlow)
E.The model's built-in features like content filtering
AnswersB, C

Larger models consume more resources and cost more to serve.

Why this answer

When selecting a base model for fine-tuning on OCI Generative AI service, the model's size and number of parameters (B) directly impact computational cost, training time, and the model's capacity to learn from your dataset. The model's license and terms of use (C) are critical because commercial use, redistribution, and fine-tuning rights vary per model (e.g., Llama 2 vs. GPT-based models), and violating these can lead to legal or compliance issues.

Exam trap

Oracle often tests the misconception that technical details like training framework or dataset size are relevant, when in fact the exam focuses on operational and legal factors (size/license) that directly affect deployment and compliance in OCI's managed service.

452
Multi-Selecthard

Which THREE steps are necessary to secure access to the OCI Generative AI inference API in a production environment?

Select 3 answers
A.Enable encryption with OCI Vault keys for all inference data.
B.Configure network security groups to allow only trusted source IPs to the inference endpoint.
C.Create IAM policies that grant the 'use' verb on generative-ai-family resources.
D.Use private endpoints to access the Generative AI service from a VCN.
E.Apply data masking policies to obfuscate sensitive information in prompts.
AnswersB, C, D

NSGs provide network-level security.

Why this answer

Option B is correct because network security groups (NSGs) allow you to restrict inbound traffic to the Generative AI inference endpoint to only trusted source IP addresses, reducing the attack surface. In a production environment, this is a fundamental network-layer security control to prevent unauthorized access to the API.

Exam trap

Oracle often tests the distinction between network-layer controls (NSGs, private endpoints) and data-layer controls (encryption, masking), expecting candidates to recognize that securing API access requires network and IAM controls, not data protection features.

453
MCQmedium

Refer to the exhibit. A team created this dedicated AI cluster. However, when they try to create a model deployment, the deployment fails with an error indicating insufficient public IPs. What change to the cluster configuration should they make?

A.Change assignPublicIp to true.
B.Increase the nodeCount to 8.
C.Attach a different subnet that has more available public IPs.
D.Change the AI cluster shape to VM.GPU.A10.2.
AnswerA

Correct: Enabling public IPs allows nodes to have public endpoints.

Why this answer

The error indicates insufficient public IPs because the cluster's subnet does not have enough available public IP addresses. Setting `assignPublicIp` to `true` in the cluster configuration allows the cluster to automatically allocate public IPs from the subnet's pool, resolving the shortage. This is required for model deployments that need public endpoints.

Exam trap

The trap here is that candidates might think the issue is a subnet IP shortage (Option C) or a scaling problem (Option B), when the real cause is a misconfigured public IP assignment flag that prevents the cluster from using available IPs.

How to eliminate wrong answers

Option B is wrong because increasing the nodeCount to 8 would require even more public IPs, exacerbating the shortage rather than fixing it. Option C is wrong because attaching a different subnet with more public IPs is a workaround, but the root cause is that the cluster is not configured to assign public IPs; changing the subnet does not enable the assignment. Option D is wrong because changing the AI cluster shape to VM.GPU.A10.2 does not affect public IP allocation; it only changes the GPU type and compute capacity.

454
MCQmedium

An e-commerce company uses OCI Generative AI to generate product descriptions. They have fine-tuned the model on their product catalog. They notice that the descriptions are accurate but lack creativity and are repetitive. They want to maintain accuracy while adding variety. What change should they make?

A.Increase the top_p sampling from 0.9 to 1.0.
B.Increase the temperature from 0.2 to 0.5.
C.Use a different base model.
D.Add more training examples with diverse descriptions.
AnswerB

A moderate temperature increase adds variety while preserving factual accuracy.

Why this answer

Option B is correct. Slightly increasing temperature (e.g., from 0.2 to 0.5) introduces controlled variability without significantly compromising accuracy. Option A is wrong because top_p=1.0 samples from the full distribution, which can add noise.

Option C is wrong because adding more training data requires effort and time, and may not immediately add variety. Option D is wrong because changing the base model could hurt accuracy and requires retraining.

455
MCQmedium

An AI specialist is troubleshooting why a fine-tuned model produces inconsistent results across different inference calls. What is the most likely cause?

A.The base model is not suitable
B.The temperature is set too high
C.The model is overfitted
D.The fine-tuning dataset is too small
AnswerB

High temperature increases randomness, causing variable outputs.

Why this answer

Temperature controls the randomness of token sampling during inference. A high temperature (e.g., >1.0) increases the probability of selecting less likely tokens, causing the model to produce varied outputs for the same input across different calls. This is the most direct cause of inconsistent results when the base model and fine-tuning are otherwise sound.

Exam trap

Oracle often tests the misconception that overfitting (Option C) causes inconsistency, but overfitting actually reduces variance by memorizing patterns; the trap is confusing output variability with poor generalization.

How to eliminate wrong answers

Option A is wrong because an unsuitable base model would cause consistently poor or biased outputs, not inconsistency across calls; the base model's suitability affects overall quality, not per-call variance. Option C is wrong because overfitting leads to memorization of training data, producing deterministic or near-identical outputs for similar inputs, not random inconsistency. Option D is wrong because a small fine-tuning dataset typically causes underfitting or poor generalization, not random variation across inference calls; inconsistency from small data would manifest as high variance across different inputs, not across repeated calls with the same input.

456
MCQhard

An organization is fine-tuning a large language model on OCI Data Science. They must ensure that the training data remains within a specific geographic region and is encrypted at rest. Which combination of resources should they use?

A.OCI Object Storage bucket with a bucket policy and default encryption, created in the required region.
B.OCI Database with Transparent Data Encryption, storing the training data in tables.
C.OCI File Storage with export options and encryption, mounted to the Data Science session.
D.OCI Block Volume with encryption, attached to the Data Science notebook session.
AnswerA

Bucket policy controls access, encryption secures data at rest, and region selection ensures data residency.

Why this answer

Option A is correct because OCI Object Storage with default encryption ensures data is encrypted at rest using AES-256, and a bucket policy can enforce that data remains within a specific geographic region by restricting cross-region replication or access. This combination directly meets the requirements of regional data residency and encryption at rest for training data used in OCI Data Science.

Exam trap

The trap here is that candidates may confuse encryption at rest with data residency enforcement, assuming any encrypted storage (like Block Volume or File Storage) automatically guarantees geographic containment, but only Object Storage provides bucket-level policies to explicitly restrict data movement across regions.

How to eliminate wrong answers

Option B is wrong because OCI Database with Transparent Data Encryption is designed for transactional workloads, not for storing large-scale training data for LLM fine-tuning, and it does not inherently enforce geographic region constraints on the data. Option C is wrong because OCI File Storage with export options and encryption can be mounted to a Data Science session, but it does not provide native mechanisms to enforce regional data residency; the data could be replicated or accessed across regions. Option D is wrong because OCI Block Volume with encryption attached to a notebook session encrypts data at rest, but it does not offer policy controls to ensure the data remains within a specific geographic region, as block volumes are tied to the compute instance's availability domain, not the broader region.

457
Multi-Selecthard

An organization is planning to use OCI Generative AI for sensitive customer data. Which three OCI services or features should they consider for data governance and security?

Select 3 answers
A.OCI Vault for managing API keys
B.OCI Data Safe for data masking and encryption
C.OCI IAM for access control
D.OCI Data Labeling for annotating data
E.OCI Audit for logging API calls
AnswersA, C, E

Secure storage of API keys and secrets is crucial for authentication to Generative AI endpoints.

Why this answer

Option A is correct because OCI Vault is a dedicated service for securely storing and managing secrets, including API keys used to authenticate to OCI Generative AI. By centralizing API key management in Vault, organizations can enforce rotation policies, access controls, and audit trails, which is critical for protecting sensitive customer data when invoking generative AI models.

Exam trap

Oracle often tests the misconception that database security services like Data Safe apply to all data in OCI, but candidates must recognize that Generative AI operates through API calls and does not use a relational database, making Data Safe irrelevant here.

458
MCQhard

An engineer sets beam search width to 1 during inference on OCI Generative AI. What is the most likely effect on output?

A.More memory usage
B.More diverse outputs
C.Better quality
D.Faster inference
AnswerD

Greedy decoding is the fastest decoding method as it considers only one candidate path.

Why this answer

Beam search width of 1 corresponds to greedy decoding, which selects the highest probability token at each step. This results in deterministic but often less diverse and potentially lower quality outputs compared to wider beam search. Option B is correct.

459
MCQeasy

An administrator needs to grant a data science team access to create and manage generative AI model endpoints in a specific compartment. Which policy should they create?

A.Allow group DataScientists to manage all-resources in compartment Production
B.Allow group DataScientists to use generative-ai-model-family in compartment Production
C.Allow group DataScientists to read generative-ai-model-family in compartment Production
D.Allow group DataScientists to manage generative-ai-model-family in compartment Production
AnswerD

This policy grants the required permissions.

Why this answer

Option D is correct because the verb 'manage' grants full CRUD (Create, Read, Update, Delete) permissions on the 'generative-ai-model-family' resource type, which is the specific resource family for generative AI model endpoints in OCI. This allows the DataScientists group to create and manage endpoints within the specified compartment without granting broader access to all resources.

Exam trap

Oracle often tests the distinction between 'use' and 'manage' verbs, where candidates mistakenly choose 'use' thinking it covers creation, but 'use' only allows invocation and access, not resource lifecycle management.

How to eliminate wrong answers

Option A is wrong because 'manage all-resources' grants excessive permissions beyond what is needed, including access to unrelated services like compute or storage, violating the principle of least privilege. Option B is wrong because 'use' only allows actions like invoking or accessing the resource, but does not permit creating, updating, or deleting model endpoints. Option C is wrong because 'read' only allows viewing or listing resources, with no ability to create or manage endpoints.

460
Multi-Selecthard

Which THREE of the following are best practices when deploying a generative AI model on OCI?

Select 3 answers
A.Store API keys in the model endpoint configuration.
B.Set up autoscaling for the endpoint.
C.Disable logging to save costs.
D.Use a dedicated AI cluster for production endpoints.
E.Enable content filtering on the endpoint.
AnswersB, D, E

Autoscaling handles variable load efficiently.

Why this answer

Option B is correct because autoscaling ensures that the generative AI endpoint can dynamically adjust compute resources based on real-time inference traffic, maintaining low latency and high availability while optimizing cost. On OCI, autoscaling policies can be configured for dedicated AI clusters to scale the number of model serving replicas in response to metrics like CPU utilization or request queue depth.

Exam trap

Oracle often tests the misconception that disabling logging is a valid cost-saving measure, but in reality, logging is essential for operational visibility and compliance, and costs can be managed through sampling or retention policies rather than outright disabling.

461
MCQeasy

A developer needs to integrate OCI Generative AI into a Python application. Which SDK should they use?

A.Boto3
B.OCI Python SDK
C.Google Cloud client
D.OpenAI library
AnswerB

Correct: OCI Python SDK is the standard integration method.

Why this answer

Option A is correct because OCI provides an official Python SDK for interacting with all services including Generative AI. The other options are for different cloud providers.

462
MCQmedium

During fine-tuning of a Cohere model on OCI Data Science, the loss curve shows a sharp spike after epoch 3. What is the most appropriate action?

A.Gradient clipping.
B.Reduce learning rate.
C.Add more training data.
D.Increase batch size.
AnswerA

Gradient clipping limits gradient values, preventing explosion and stabilizing training.

Why this answer

A sharp spike in the loss curve after epoch 3 during fine-tuning indicates a gradient explosion, where the gradients become excessively large and destabilize the model's weights. Gradient clipping is the most appropriate action because it directly caps the gradient norm (e.g., using `max_grad_norm=1.0` in Cohere's fine-tuning API) to prevent these spikes, ensuring stable training without altering the learning dynamics.

Exam trap

Oracle often tests the distinction between gradient explosion (sharp spikes) and learning rate divergence (gradual increase), leading candidates to incorrectly choose reducing the learning rate instead of gradient clipping.

How to eliminate wrong answers

Option B is wrong because reducing the learning rate addresses gradual divergence or oscillation, not sudden spikes; a sharp spike is a sign of gradient explosion, not a learning rate that is too high. Option C is wrong because adding more training data improves generalization and reduces overfitting but does not mitigate gradient instability during training. Option D is wrong because increasing batch size can stabilize gradient estimates but may also increase memory usage and does not directly prevent individual gradient values from becoming too large; it can even exacerbate gradient explosion by averaging over more samples.

463
MCQhard

You are a machine learning engineer at a large e-commerce company. You have been tasked with deploying a large language model to power a customer service chatbot that handles product returns and refunds. The model will answer customer queries based on a knowledge base of return policies and FAQs. The company has strict requirements: (1) responses must be factually accurate and grounded in the knowledge base, (2) the system must be cost-effective, and (3) latency should be under 2 seconds per response. You decide to use a pre-trained LLM from OCI Data Science and implement retrieval-augmented generation (RAG). You have two options for the retriever: a dense embedding-based retriever (e.g., using OCI AI Language embeddings) or a sparse keyword-based retriever (e.g., BM25). You also need to decide on the generation model size: a 7B parameter model or a 70B parameter model. You run a pilot test: with the dense retriever + 7B model, average latency is 1.8 seconds and accuracy is 85%. With the sparse retriever + 7B model, latency is 1.2 seconds but accuracy drops to 75%. With the 70B model (any retriever), latency exceeds 5 seconds. Which combination should you choose to meet all requirements?

A.Sparse retriever + 70B model.
B.Dense retriever + 70B model.
C.Sparse retriever + 7B model.
D.Dense retriever + 7B model.
AnswerD

Meets both latency and accuracy requirements.

Why this answer

Option D (dense retriever + 7B model) is correct because it meets all three requirements: factual accuracy (85% accuracy from dense retrieval grounding), latency under 2 seconds (1.8 seconds), and cost-effectiveness (7B model is cheaper to run than 70B). The dense retriever provides better semantic matching for nuanced return policy queries, while the 7B model keeps inference fast and affordable.

Exam trap

Oracle often tests the trade-off between retrieval accuracy and model size, where candidates mistakenly prioritize a larger model (70B) for better generation quality, ignoring that the latency constraint makes it infeasible, or choose a sparse retriever thinking it's faster, but overlook the critical accuracy requirement for grounded responses.

How to eliminate wrong answers

Option A is wrong because the 70B model with any retriever exceeds 5 seconds latency, violating the 2-second requirement. Option B is wrong because the 70B model also exceeds 5 seconds latency, failing the latency requirement. Option C is wrong because the sparse retriever (BM25) with the 7B model yields only 75% accuracy, which is below the acceptable factual accuracy threshold given the strict requirement for grounded responses.

464
MCQhard

A developer makes an API call to generate text with top_p=1.5. What is the correct way to fix this error?

A.Remove the top_p parameter from the request
B.Increase the temperature parameter to compensate
C.Set top_p to a value between 0 and 1, e.g., 0.9
D.Use the top_k parameter instead
AnswerC

Correcting the value to within the allowed range fixes the error.

Why this answer

Option C is correct: top_p must be between 0 and 1. Setting to 0.9 is within range. Option A (using top_k) changes parameter, but the fix is to correct top_p.

Option B (remove top_p) removes it, but default may be 1? Actually default is 1, but the error says between 0 and 1, so 1 is valid. But the developer wants top_p, so adjust value. Option D (increase temperature) unrelated.

465
MCQmedium

An administrator creates this IAM policy to allow a group to use a specific generative AI model. However, users report a 403 Forbidden error. What is the most likely issue?

A.The policy syntax is invalid for OCI IAM; missing required fields
B.The resource OCID is incorrect
C.The policy does not specify the compartment where the model resides
D.The action name is wrong; it should be 'ai:generate-text' (correct already)
AnswerA

OCI IAM policies use a different JSON structure with 'subject', 'action', 'resource' arrays.

Why this answer

Option D is correct: The policy is missing the 'subject' and 'target' statements; it uses incorrect syntax. IAM policies require 'subject' and 'target' blocks. The provided JSON is not a valid OCI policy format.

Option A (compartment) is wrong because policy format is wrong. Option B (resource OCID) is fine. Option C (action) is fine.

The syntax is the issue.

466
MCQeasy

A data scientist fine-tunes a model using OCI Data Science and wants to deploy it as a managed endpoint in OCI Generative AI. What must they do first?

A.Upload model artifacts to Object Storage and register in Model Catalog
B.Write a custom container
C.Create a dedicated AI cluster
D.Use OCI CLI to create an endpoint
AnswerA

This is the required first step to deploy a custom model.

Why this answer

To deploy a fine-tuned model as a managed endpoint in OCI Generative AI, the model artifacts must first be uploaded to Object Storage and registered in the Model Catalog. This is a prerequisite because OCI Generative AI endpoints pull model artifacts from the Model Catalog, which references the storage location. Without registration, the service cannot locate or serve the model.

Exam trap

The trap here is that candidates assume they can directly create an endpoint using CLI or SDK without first registering the model in the Model Catalog, overlooking the mandatory registration step that links the artifacts to the serving infrastructure.

How to eliminate wrong answers

Option B is wrong because custom containers are not required for managed endpoints in OCI Generative AI; the service provides built-in serving infrastructure for supported model formats. Option C is wrong because a dedicated AI cluster is used for training or batch inference, not for deploying a managed endpoint, which uses OCI's shared serving infrastructure. Option D is wrong because using OCI CLI to create an endpoint is a valid method, but it cannot succeed until the model is registered in the Model Catalog; the CLI command requires a model OCID from the catalog.

467
Multi-Selecteasy

Which THREE components are essential in a typical RAG architecture built on OCI? (Select three.)

Select 3 answers
A.Vector database (e.g., OCI OpenSearch, Autonomous Database)
B.Data ingestion pipeline with Apache Spark
C.Embedding model (e.g., Cohere Embed)
D.Large language model (e.g., Cohere Command)
E.Prompt template for system instructions
AnswersA, C, D

Required for storing and retrieving embeddings.

Why this answer

A vector store (A) for similarity search, an LLM (B) for generating answers, and a prompt template (E) to combine context and query. Embedding model (D) is also essential (but not listed as a separate option? Actually D is embedding model, so that is also essential. But we need exactly three.

The correct ones are A, B, D. A vector store, an LLM, and an embedding model are core. Prompt template is also core, but we have to select three.

Let's adjust: Options: A: vector store, B: LLM, C: data pipeline, D: embedding model, E: prompt template. Essential: A, B, D. Prompt template is important but not strictly essential if prompt is hardcoded.

Data pipeline is important but not part of runtime RAG. So A, B, D.

468
MCQmedium

A data scientist is building a RAG application that processes PDF invoices. The extraction step uses OCI Document Understanding to convert PDFs to text. The scientist then splits the text into chunks and generates embeddings using OCI Generative AI. However, the retrieval often misses critical fields like invoice numbers and dates. Which preprocessing step would MOST likely improve retrieval of these specific fields?

A.Increase the chunk size to include entire invoices.
B.Apply stemming and lemmatization to the text before chunking.
C.Tag each chunk with metadata such as invoice number, date, and vendor, and use metadata filtering during retrieval.
D.Switch from dense embeddings to sparse embeddings for better exact match.
AnswerC

Metadata filtering enables precise retrieval based on structured fields.

Why this answer

Option C is correct because metadata tagging and filtering directly address the retrieval of specific fields like invoice numbers and dates. By attaching metadata (e.g., invoice number, date, vendor) to each chunk and filtering on these metadata fields during retrieval, the RAG system can precisely locate the relevant chunks without relying solely on semantic similarity. This approach leverages OCI Document Understanding's ability to extract structured data and OCI Generative AI's vector search capabilities to combine dense embeddings with exact metadata matching.

Exam trap

Oracle often tests the misconception that increasing chunk size or changing embedding type alone can solve retrieval failures for structured fields, when in reality metadata filtering is the correct technique for precise field-level retrieval in RAG applications.

How to eliminate wrong answers

Option A is wrong because increasing chunk size to include entire invoices reduces granularity, making it harder to retrieve specific fields like invoice numbers and dates, and may exceed the context window of the embedding model, degrading retrieval quality. Option B is wrong because stemming and lemmatization reduce words to root forms, which can obscure exact matches for critical fields like invoice numbers (e.g., 'INV-12345' becomes 'inv-12345') and dates (e.g., '2023-01-15' might be altered), harming retrieval precision. Option D is wrong because sparse embeddings (e.g., TF-IDF) improve exact keyword matching but still rely on the text content of chunks; without metadata tagging, the system cannot filter chunks by field type, so critical fields may still be missed if they appear in chunks with low keyword overlap.

469
Multi-Selectmedium

Which TWO actions are recommended best practices for managing costs when using OCI Generative AI dedicated AI clusters?

Select 2 answers
A.Provision a fixed number of nodes to handle peak load
B.Use preemptible instances for non-critical inference workloads
C.Use autoscaling to adjust nodes based on demand
D.Stop the dedicated AI cluster when not in use
E.Use pay-as-you-go billing instead of preemptible instances
AnswersB, C

Preemptible instances are cheaper and suitable for fault-tolerant tasks.

Why this answer

Option B is correct because preemptible instances in OCI are significantly cheaper than standard instances and are ideal for non-critical inference workloads that can tolerate interruptions. This aligns with cost optimization best practices by allowing you to use spare compute capacity at a reduced rate for tasks that do not require continuous availability.

Exam trap

The trap here is that candidates may think stopping a dedicated AI cluster is a valid cost-saving action, but OCI dedicated AI clusters do not support a 'stop' state—you must terminate the cluster, which loses all configuration and data, making it impractical for intermittent use.

470
MCQmedium

A data science team at a healthcare company has fine-tuned a Llama 2 model using OCI Data Science and registered it in the Model Catalog. They want to deploy it as a managed endpoint using OCI Generative AI. The model requires 64 GB of GPU memory. The team has created a dedicated AI cluster with a single node shape that has 48 GB GPU memory. When they attempt to deploy the model, the deployment fails with an error indicating insufficient resources. The team has verified that the model artifact is correct and that the compartment policies allow deployment. What should the team do to successfully deploy the model?

A.Increase the number of nodes in the cluster to 2.
B.Enable model parallelism to split the model across nodes.
C.Select a node shape with higher GPU memory, such as 80 GB.
D.Reduce the model's precision from FP16 to INT8 to lower memory usage.
AnswerC

Using a node shape with sufficient memory allows the model to be loaded.

Why this answer

Option C is correct because the model requires 64 GB of GPU memory, but the dedicated AI cluster uses a node shape with only 48 GB. The only way to satisfy the memory requirement is to select a node shape with higher GPU memory, such as 80 GB, as OCI Generative AI managed endpoints require a single node to host the entire model. Increasing nodes or enabling model parallelism does not help because OCI Generative AI does not support distributed inference across nodes for managed endpoints, and reducing precision may not guarantee the model fits or may degrade accuracy.

Exam trap

The trap here is that candidates may think adding more nodes or enabling model parallelism can aggregate GPU memory, but OCI Generative AI managed endpoints do not support distributed inference across nodes, so the only valid solution is to use a node shape with sufficient single-GPU memory.

How to eliminate wrong answers

Option A is wrong because increasing the number of nodes to 2 does not solve the memory issue; OCI Generative AI managed endpoints deploy the model on a single node, and additional nodes are not used to aggregate GPU memory for inference. Option B is wrong because model parallelism is not supported for managed endpoints in OCI Generative AI; the service expects the entire model to fit on one node's GPU memory. Option D is wrong because reducing precision from FP16 to INT8 may lower memory usage, but it is not a guaranteed fix and could introduce accuracy loss; moreover, the question states the model requires 64 GB of GPU memory, and the team should first ensure the hardware meets the requirement rather than altering the model.

471
MCQmedium

You manage a generative AI model deployed on OCI Model Deployment that serves a chatbot application. The model is a 13B parameter LLM on a VM.GPU.A100.1 shape. Recently, you rolled out a new version of the model that is supposed to improve response quality. However, after the update, the application starts returning HTTP 500 errors and memory usage spikes. You need to update to the new version without causing downtime. The current deployment has 2 replicas with autoscaling enabled. Which strategy should you use to safely deploy the new model version?

A.Directly update the existing model deployment with the new model artifact
B.Create a second deployment with the new model, test it, then shift traffic using a load balancer
C.Stop the existing deployment, update the model artifact, then start the deployment
D.Increase the number of replicas to 4, then update the model
AnswerB

Blue-green deployment ensures no downtime and safe rollout.

Why this answer

Option B is correct because it implements a blue/green deployment strategy: you create a second deployment with the new model, test it in isolation, and then shift traffic using a load balancer. This avoids downtime and allows you to validate the new model before exposing it to production traffic, which is critical given the observed HTTP 500 errors and memory spikes.

Exam trap

The trap here is that candidates may assume increasing replicas provides safety through redundancy, but it does not prevent the new model from causing errors on all replicas; the key is isolation via a separate deployment and traffic shifting.

How to eliminate wrong answers

Option A is wrong because directly updating the existing model deployment with the new artifact would cause in-place changes, potentially triggering the memory spike and HTTP 500 errors on the live replicas, leading to downtime. Option C is wrong because stopping the existing deployment before updating causes complete downtime, violating the requirement to update without downtime. Option D is wrong because increasing replicas to 4 and then updating still performs an in-place update on all replicas, which does not isolate the faulty model and can still cause errors and memory spikes across the entire fleet.

472
MCQhard

Refer to the exhibit. A user runs 'oci generative-ai model list' and sees this output. They then try to use 'cohere.command-light' but get an error. What is the most likely reason?

A.The model is in INACTIVE state
B.The API key does not have access
C.The model is not listed
D.The region is wrong
AnswerA

INACTIVE models cannot be used for inference.

Why this answer

Option B is correct because the model 'cohere.command-light' has lifecycle-state 'INACTIVE', meaning it cannot be used. Option A is false because it is listed; C and D would produce different errors.

473
Multi-Selectmedium

Which three techniques are commonly used to reduce the risk of prompt injection in LLM applications? (Choose three.)

Select 3 answers
A.Enabling prompt validation against regex patterns.
B.Output filtering.
C.Increasing temperature.
D.Input sanitization.
E.Using role-based system prompts.
AnswersB, D, E

Filtering outputs can block dangerous responses.

Why this answer

Output filtering (B) is correct because it acts as a post-processing defense that scans the LLM's generated output for malicious content, such as leaked system prompts or injected commands, before it reaches the user. This technique helps mitigate the impact of successful prompt injections by catching and neutralizing harmful outputs that bypass input controls.

Exam trap

Oracle often tests the distinction between security controls and model parameters, so the trap here is that candidates mistakenly think adjusting model settings like temperature can reduce injection risk, when in fact only input/output controls and system prompt design are effective.

474
MCQeasy

An organization wants to use an LLM to summarize legal documents. Which consideration is most important for ensuring accurate summaries?

A.Fine-tune the model on a curated legal corpus
B.Use the largest available general-purpose model
C.Rely on zero-shot summarization with careful prompting
D.Pre-train a new model from scratch on legal texts
AnswerA

Domain-specific fine-tuning teaches the model legal terminology and reasoning.

Why this answer

Legal documents require precise understanding, so fine-tuning on legal data is critical. Option B is wrong because larger models don't guarantee domain accuracy. Option C is wrong because pre-training from scratch is expensive and unnecessary.

Option D is wrong because zero-shot may miss legal nuances.

475
MCQeasy

A data scientist wants to fine-tune a generative AI model on proprietary customer data. What is a best practice for preparing the training dataset?

A.Randomly sample 1000 records from production logs.
B.Use the same dataset as the base model's pre-training data.
C.Curate a dataset of domain-specific examples with clear input-output pairs.
D.Use the largest available public dataset from the internet.
AnswerC

Domain-specific curated data ensures the model learns the desired behavior for the target use case.

Why this answer

Option C is correct because fine-tuning a generative AI model on proprietary data requires a curated, domain-specific dataset with clear input-output pairs. This ensures the model learns the desired task (e.g., summarization, classification) without introducing noise or irrelevant patterns, which is critical for OCI Generative AI Service fine-tuning where data quality directly impacts model performance.

Exam trap

Oracle often tests the misconception that more data (random or public) is always better for fine-tuning, when in fact curated, domain-specific data with clear input-output pairs is essential for effective adaptation without degrading base model capabilities.

How to eliminate wrong answers

Option A is wrong because randomly sampling 1000 records from production logs introduces noise, missing labels, and imbalanced distributions, which degrade fine-tuning quality and may cause catastrophic forgetting. Option B is wrong because using the same dataset as the base model's pre-training data provides no new information, leading to zero improvement and potential overfitting to already learned patterns. Option D is wrong because using the largest available public dataset from the internet introduces irrelevant or conflicting data, diluting domain-specific learning and violating data privacy requirements for proprietary customer data.

476
MCQhard

A company needs to integrate OCI Generative AI Service with an existing application that uses OCI IAM for authentication. They want to use resource principal to allow the application to call the service without storing API keys. Which step is REQUIRED?

A.Create an OCI API key for the application
B.Enable the Generative AI Service for resource principal in the tenancy
C.Assign the application to a group with admin privileges
D.Create a dynamic group and a policy granting access to the Generative AI Service
AnswerD

Dynamic group with matching rules and a policy are required for resource principal.

Why this answer

Resource principal authentication in OCI requires the application to be represented by a dynamic group, which matches instances or resources based on defined rules. A policy must then grant that dynamic group access to the Generative AI Service. This avoids storing API keys by using OCI IAM's built-in resource principal token exchange.

Exam trap

Oracle often tests the misconception that resource principal requires a tenancy-wide setting or an API key, when in fact the correct mechanism is a dynamic group combined with a targeted IAM policy.

How to eliminate wrong answers

Option A is wrong because creating an OCI API key would reintroduce the need to store and manage secrets, which resource principal is designed to eliminate. Option B is wrong because there is no tenancy-level toggle to 'enable' the Generative AI Service for resource principal; the service is always available for resource principal, but access is controlled via dynamic groups and policies. Option C is wrong because assigning the application to a group with admin privileges violates the principle of least privilege and is unnecessary; a custom policy granting only the required permissions to the dynamic group is sufficient and more secure.

477
MCQhard

A multinational corporation plans to deploy OCI Generative AI in multiple OCI regions for disaster recovery. They have fine-tuned a custom model in the primary region. What is the recommended approach to make the fine-tuned model available in the secondary region with minimal manual effort?

A.Create an IAM policy to allow cross-region access to the model from the secondary region.
B.Use OCI Cross-Region Replication for the model's underlying object storage bucket and the dedicated AI cluster.
C.Redeploy the fine-tuning job in the secondary region using the same training data.
D.Copy the model artifact to the secondary region's object storage bucket and create a new dedicated endpoint there.
AnswerD

This leverages existing model artifacts and can be automated with OCI CLI or SDK.

Why this answer

Option D is correct because fine-tuned models are stored in object storage; replicating the bucket and using custom automation to recreate the dedicated endpoint ensures availability. Option A is incorrect because there is no automatic cross-region replication for models. Option B is incorrect because redeploying from scratch is time-consuming.

Option C is incorrect because manual policy changes are not the main issue.

478
MCQmedium

A company has deployed a RAG application using OCI Generative AI service with a vector store in OCI OpenSearch. Users report that answers are often incomplete or irrelevant. The application uses a single prompt template with a fixed chunk size of 1000 tokens. Which action is most likely to improve answer quality?

A.Disable vector search and rely solely on the LLM's pre-trained knowledge
B.Use a smaller embedding model to reduce noise
C.Implement a re-ranking step after vector search
D.Increase the chunk size to 2000 tokens
AnswerC

Re-ranking improves precision by ordering chunks based on relevance to the query.

Why this answer

Implementing a re-ranking step after vector search helps filter and prioritize the most relevant chunks, improving answer quality. Larger chunks may dilute context, smaller models reduce accuracy, and disabling vector search defeats RAG purpose.

479
MCQmedium

A company wants to use OCI Generative AI but must comply with GDPR. Which feature ensures data residency?

A.Encryption at rest
B.Access control policies
C.Data localization with dedicated AI clusters
D.Audit logging
AnswerC

Dedicated clusters in chosen regions enforce data residency.

Why this answer

Option B is correct because dedicated AI clusters can be deployed in specific regions to ensure data does not leave that region. Other options address security or logging but not residency.

480
MCQeasy

A developer is building a RAG pipeline using OCI Data Science and wants to store vector embeddings. Which OCI service is optimized for vector search and can be used as a vector store?

A.OCI Autonomous Database
B.OCI OpenSearch
C.OCI Object Storage
D.OCI Streaming
AnswerB

OCI OpenSearch includes a vector database plugin for k-NN similarity search, making it a suitable vector store.

Why this answer

Option B is correct because OCI OpenSearch provides a vector engine that supports k-NN search. Option A is wrong because OCI Object Storage is not a searchable vector store. Option C is wrong because OCI Autonomous Database has vector capabilities but is not primarily optimized for vector search at scale.

Option D is wrong because OCI Streams is for streaming data.

481
MCQhard

An enterprise wants to use OCI Generative AI to generate personalized email campaigns. They have a large customer database with preferences and past purchase history. Which design is best for high relevance and scalability?

A.Use a single prompt with all customer details as context.
B.Fine-tune a model on each customer's history separately.
C.Use a rule-based engine with AI-generated templates.
D.Use a pipeline: retrieve relevant customer data and inject into prompt.
AnswerD

This RAG approach is scalable and maintains relevance.

Why this answer

A retrieval-augmented generation (RAG) pipeline retrieves relevant customer data and injects it into the prompt, balancing personalization and scalability.

482
Multi-Selecthard

Which THREE techniques effectively reduce query latency in a RAG system?

Select 3 answers
A.Pre-compute embeddings for all documents
B.Use approximate nearest neighbor search
C.Use a larger generation model
D.Increase the number of shards
E.Use a smaller embedding model
AnswersA, B, E

Pre-computed embeddings avoid real-time embedding calls during query.

Why this answer

Pre-computing embeddings for all documents eliminates the need to generate embeddings at query time, which is a computationally expensive step. By storing pre-computed vector representations, the system can directly perform similarity searches against the index, significantly reducing latency.

Exam trap

Oracle often tests the misconception that increasing model size or shard count always improves performance, but in RAG systems, these changes can introduce latency penalties due to higher computational overhead or distributed coordination costs.

483
MCQmedium

A financial services company is deploying a RAG system for regulatory compliance queries. The system uses OCI Data Science to run a custom embedding model fine-tuned on regulatory documents. The index in OpenSearch uses cosine similarity and HNSW algorithm. Users report that queries containing synonyms to regulatory terms (e.g., "AML" vs "Anti-Money Laundering") often fail to retrieve relevant documents. Which combination of improvements would be MOST effective? (Assume budget and latency constraints)

A.Increase the `m` parameter in HNSW to improve recall
B.Fine-tune the embedding model further on a dataset of synonyms
C.Implement a hybrid search combining keyword and vector search
D.Use query expansion with a thesaurus before embedding
AnswerC

Hybrid search (BM25 + vector) directly captures exact term matches, bridging the synonym gap effectively.

Why this answer

Hybrid search (combining keyword (BM25) and vector search) catches exact synonym matches from text. Query expansion helps but may not be as reliable. Fine-tuning on synonyms is possible but time-consuming.

Increasing HNSW m slightly improves recall but does not address synonym gap.

484
MCQmedium

A developer notices that the RAG application returns irrelevant chunks for user queries. The embedding model used is `cohere.embed-english-light-v3.0`. Which action is MOST likely to improve relevance?

A.Reduce the number of retrieved chunks (k)
B.Increase the chunk size
C.Switch to a larger embedding model (e.g., cohere.embed-english-v3.0)
D.Use a different similarity metric (e.g., Euclidean instead of cosine)
AnswerC

Larger models produce higher-quality embeddings, improving retrieval relevance.

Why this answer

Switching to the larger `cohere.embed-english-v3.0` model provides more powerful embeddings, capturing more semantic information. Increasing chunk size may include irrelevant content; changing similarity metric has marginal effect; reducing retrieved chunks may miss relevant ones.

485
MCQhard

During fine-tuning of a large language model on OCI, you notice that the model's performance on the validation set is not improving after several epochs, but the training loss continues to decrease. What is the most likely cause?

A.The learning rate is too high.
B.The validation set is not representative.
C.The model is overfitting to the training data.
D.The training data is too small.
AnswerC

Overfitting occurs when the model memorizes training examples, causing training loss to drop while validation performance plateaus or declines. This is the most likely cause.

Why this answer

When training loss decreases but validation performance stagnates or worsens, the model is overfitting to the training data. It memorizes the training examples but fails to generalize. A high learning rate might cause divergence, not this pattern.

Too small training data can contribute to overfitting but is not the direct symptom. An unrepresentative validation set could cause mismatch, but the described pattern is classic overfitting.

486
MCQeasy

A healthcare company is using OCI GenAI to generate patient summaries from clinical notes. The model output sometimes includes hallucinated medical facts, such as incorrect dosages or diagnoses, which could be dangerous. The team needs to improve factual accuracy while maintaining data privacy. They have a large collection of internal medical knowledge bases (clinical guidelines, drug databases) that are stored in OCI Object Storage. The current implementation uses a zero-shot prompt with the base Cohere Command model. The data science team has limited GPU resources and wants to avoid building a complex pipeline. Which course of action best addresses the hallucination problem?

A.Increase the temperature parameter to 0.9 to encourage more deterministic outputs.
B.Use prompt engineering to add 'Only provide facts that are absolutely certain.'
C.Implement a RAG pipeline that retrieves relevant documents from the internal knowledge bases and includes them in the prompt.
D.Fine-tune the Cohere model on a publicly available medical dataset like PubMed.
AnswerC

RAG grounds generation in retrieved facts, significantly reducing hallucinations.

Why this answer

Option C is correct because a Retrieval-Augmented Generation (RAG) pipeline directly addresses hallucination by grounding the model's output in verified, internal medical knowledge bases stored in OCI Object Storage. This approach retrieves relevant clinical guidelines or drug database entries and includes them in the prompt, providing factual context without requiring fine-tuning or complex GPU-intensive pipelines. It also preserves data privacy by keeping sensitive medical data within OCI and avoids exposing it to external model training.

Exam trap

Oracle often tests the misconception that prompt engineering alone can reliably eliminate hallucinations, but the trap here is that without external knowledge injection (RAG), the model cannot overcome its inherent tendency to fabricate facts, especially in high-stakes domains like healthcare.

How to eliminate wrong answers

Option A is wrong because increasing the temperature to 0.9 actually increases randomness and creativity, making outputs less deterministic and more prone to hallucinations, not less. Option B is wrong because prompt engineering with a vague instruction like 'Only provide facts that are absolutely certain' does not supply the model with actual factual data; the model still relies on its internal parametric knowledge, which is the source of hallucinations. Option D is wrong because fine-tuning on a publicly available dataset like PubMed introduces public, non-confidential data that may not align with the company's internal medical knowledge, and it requires significant GPU resources and complex pipeline management, which the team explicitly wants to avoid.

487
MCQeasy

A developer is testing a RAG application using OCI Generative AI. They receive an error: 'The model cohere.command-r-plus-v1:0 is not supported in this region.' What is the most likely cause?

A.The endpoint URL is incorrectly formatted.
B.The model is not available in the selected OCI region.
C.The tenancy is in a different availability domain.
D.The model name has a typo.
AnswerB

Cohere models are deployed in specific regions; the developer may be in a region where the model isn't provisioned.

Why this answer

The error message explicitly states that the model 'cohere.command-r-plus-v1:0' is not supported in the region. OCI Generative AI models are region-specific; each model is deployed only in certain OCI regions (e.g., us-ashburn-1, eu-frankfurt-1). If the selected region does not host that model, the API returns this error regardless of endpoint formatting, tenancy configuration, or model name spelling.

Exam trap

Oracle often tests the misconception that model availability is global across all OCI regions, leading candidates to overlook region-specific model deployment restrictions.

How to eliminate wrong answers

Option A is wrong because an incorrectly formatted endpoint URL would typically produce a 404 Not Found or a connection error, not a model-not-supported error. Option C is wrong because availability domains are a concept for compute instances, not for Generative AI model availability; the error is about regional model support, not AD-level placement. Option D is wrong because a typo in the model name would result in a 'model not found' error (e.g., 400 Bad Request), not a region-specific unsupported error.

488
MCQmedium

A developer runs an OCI GenAI chat request with system prompt "You are a sarcastic assistant." The output is offensive. How can the developer enforce safety policies?

A.Use the OCI GenAI content moderation filter.
B.Change model to LLAMA.
C.Increase maxTokens.
D.Set temperature to 0.
AnswerA

Content moderation filters explicitly block harmful or offensive content in outputs.

Why this answer

Option A is correct because the OCI GenAI content moderation filter is specifically designed to enforce safety policies by detecting and blocking offensive, harmful, or policy-violating content in both input prompts and model outputs. By enabling this filter, the developer can prevent the model from generating offensive responses even when a system prompt like 'You are a sarcastic assistant' encourages undesirable behavior.

Exam trap

Oracle often tests the misconception that adjusting model parameters (like temperature or maxTokens) or switching model families can substitute for explicit content moderation, when in fact safety enforcement requires dedicated filtering mechanisms that operate independently of model behavior.

How to eliminate wrong answers

Option B is wrong because changing the model to LLAMA does not inherently enforce safety policies; LLAMA models have their own safety risks and require separate content moderation or fine-tuning to block offensive outputs. Option C is wrong because increasing maxTokens only extends the maximum length of the generated response, which does nothing to prevent offensive content—it may even allow the model to produce more harmful text. Option D is wrong because setting temperature to 0 makes the model deterministic (greedy decoding) but does not filter or moderate content; it can still generate offensive responses if the training data or system prompt encourages such behavior.

489
MCQhard

A data scientist in group DataScientists uses the OCI Generative AI SDK to start a fine-tuning job in compartment AIResources. They receive the error shown. What is the most likely cause?

A.The compartment AIResources does not exist.
B.The fine-tuning API is not yet available in that region.
C.The fine-tuning job requires additional IAM policies for accessing the training data in Object Storage.
D.The data scientist is not in the DataScientists group.
AnswerC

The policy must also grant permissions on Object Storage buckets containing the training data.

Why this answer

Option C is correct because the error message indicates a permissions issue related to accessing training data in Object Storage. When using the OCI Generative AI SDK to start a fine-tuning job, the data scientist's IAM policies must explicitly grant read access to the bucket and objects containing the training data. Without these policies, the API call fails even if the user is in the correct group and the compartment exists.

Exam trap

The trap here is that candidates assume the error is about group membership or compartment existence, when in fact the fine-tuning job's dependency on Object Storage permissions is a classic oversight in OCI IAM policy configuration.

How to eliminate wrong answers

Option A is wrong because if the compartment AIResources did not exist, the error would be a '404 Not Found' or 'CompartmentNotFound' error, not a permissions-related error. Option B is wrong because the fine-tuning API is available in all OCI regions where Generative AI is supported; region unavailability would produce a 'ServiceNotSupported' or 'RegionNotSupported' error. Option D is wrong because the user is explicitly stated to be in the DataScientists group, and group membership alone does not grant access to Object Storage; IAM policies must be attached to the group or compartment to allow read access to training data.

490
MCQmedium

An administrator created the above IAM policies. A member of the GenerativeAIAdmins group reports they cannot invoke the model endpoint. Which permission is missing?

A.Permission to access the compartment
B.Permission to manage generative-ai-model
C.Permission to use or manage generative-ai-endpoint
D.Permission to read the model's training data
AnswerC

Only inspect is granted; need use or manage to invoke.

Why this answer

The error occurs because the IAM policy grants permissions for 'generative-ai-model' but not for 'generative-ai-endpoint'. Invoking a model endpoint requires the 'use' or 'manage' permission on the 'generative-ai-endpoint' resource type, as the endpoint is the runtime interface that handles inference requests. Without this permission, the API call to the endpoint is denied, even if the user has access to the underlying model.

Exam trap

The trap here is that candidates confuse the 'generative-ai-model' resource type (used for model lifecycle management) with the 'generative-ai-endpoint' resource type (required for runtime inference), leading them to select Option B instead of C.

How to eliminate wrong answers

Option A is wrong because compartment access is typically granted via a separate policy statement (e.g., 'Allow group to read compartments') and is not the specific missing permission for invoking an endpoint; the error is about resource-type permissions, not compartment-level access. Option B is wrong because 'manage generative-ai-model' allows management of the model resource (e.g., creating, updating, deleting models) but does not grant the runtime permission needed to invoke the endpoint for inference. Option D is wrong because reading the model's training data is a data-plane permission unrelated to endpoint invocation; model training data access is governed by object storage or data catalog policies, not by generative-ai-endpoint permissions.

491
MCQhard

An engineer configured the above index mapping for vector search. When performing a k-NN search, the results are unexpected. What is the most likely issue?

A.The space type 'cosinesimil' is not supported; it should be 'cosine'.
B.The dimension 768 does not match the embedding model's output dimension.
C.The mapping uses 'knn_vector' type with 'faiss' engine, which is incompatible.
D.The space type at the index level and mapping level are mismatched.
AnswerD

Mismatch causes incorrect distance calculations.

Why this answer

Option D is correct because OpenSearch requires the space type to be consistently defined at both the index-level settings (method.parameters.space_type) and the field-level mapping (space_type). A mismatch between these two causes the k-NN search to behave unexpectedly, as the engine uses the index-level setting for distance computation while the mapping-level setting may be used for validation or other purposes.

Exam trap

Oracle often tests the nuance that OpenSearch requires consistency between index-level and mapping-level space_type settings, a detail that candidates overlook because they assume only the mapping-level setting matters.

How to eliminate wrong answers

Option A is wrong because 'cosinesimil' is a valid space type in OpenSearch (an abbreviation for cosine similarity), not an unsupported value. Option B is wrong because while a dimension mismatch can cause issues, the question states the mapping is configured for vector search and the results are unexpected; the dimension 768 is a common embedding size and is not inherently incorrect without evidence of mismatch. Option C is wrong because 'knn_vector' type with 'faiss' engine is fully compatible and supported in OpenSearch for vector search workloads.

492
MCQeasy

A developer wants to generate text using the OCI Generative AI service via the API. Which endpoint should they use to send a text generation request?

A./v1/chat/completions
B./v1/embeddings
C./v1/completions
D./v1/models
AnswerC

This is the correct endpoint for text generation requests.

Why this answer

Option C is correct because the OCI Generative AI service uses the /v1/completions endpoint for text generation requests, as documented in the OCI Generative AI API reference. This endpoint accepts a prompt and generates a continuation of the text, making it the appropriate choice for general text generation tasks.

Exam trap

Oracle often tests the distinction between OCI-specific endpoints and those from other AI services like OpenAI, so candidates may mistakenly choose /v1/chat/completions if they confuse OCI Generative AI with ChatGPT's API.

How to eliminate wrong answers

Option A is wrong because /v1/chat/completions is an endpoint used by OpenAI's ChatGPT API, not by OCI Generative AI, which does not have a dedicated chat completions endpoint. Option B is wrong because /v1/embeddings is used for generating vector embeddings of text, not for generating new text completions. Option D is wrong because /v1/models is used to list available models or retrieve model metadata, not to send a text generation request.

493
MCQeasy

What is the primary purpose of an embedding model in a RAG pipeline?

A.To convert text into numerical vectors.
B.To generate human-like responses.
C.To rank search results.
D.To summarize long documents.
AnswerA

Embedding models encode text semantically into vectors.

Why this answer

Embedding models convert text into dense vector representations that can be used for similarity search in a vector database.

494
MCQhard

An AI team is fine-tuning a large language model using OCI Data Science and plans to deploy the fine-tuned model using the Generative AI service's custom model deployment. What is the required format for the model artifacts?

A.A Git repository URL
B.A single .pth file
C.A Docker image with the model and inference code
D.A .zip archive containing model weights and configuration files
AnswerD

The custom model deployment requires a zip archive with all necessary files.

Why this answer

The OCI Generative AI service requires custom model artifacts to be packaged as a .zip archive containing the model weights, configuration files (e.g., config.json, tokenizer files), and any necessary inference code. This format ensures the service can extract and load the model correctly into its managed inference infrastructure, aligning with the standard Hugging Face model repository structure.

Exam trap

The trap here is that candidates may confuse OCI Generative AI's custom model deployment with OCI Data Science model deployment, which does support Docker images, leading them to incorrectly select Option C.

How to eliminate wrong answers

Option A is wrong because a Git repository URL is not a supported artifact format for OCI Generative AI custom model deployment; the service expects a static artifact file, not a live repository reference. Option B is wrong because a single .pth file contains only PyTorch model weights without the required configuration files (e.g., config.json, tokenizer.json) and inference code, making it incomplete for deployment. Option C is wrong because OCI Generative AI custom model deployment does not accept Docker images; it uses a serverless, managed inference environment that expects a .zip archive of model artifacts, not a containerized application.

495
MCQhard

A developer implements a RAG chatbot using OCI Generative AI with streaming enabled. The chatbot fails to remember earlier conversation turns during a session. What is the most likely cause?

A.The max_tokens parameter is set too low.
B.The streaming endpoint does not support conversation history.
C.The application does not include previous messages in the request.
D.The temperature parameter is too high.
AnswerC

Session memory requires the client to send the conversation history in the messages list.

Why this answer

To maintain conversation history, the application must explicitly pass previous messages in each request. Without it, the model treats each query independently.

496
MCQeasy

A retail company uses OCI Generative AI Service to build a RAG chatbot for product recommendations. The chatbot should consider both the user's query and the retrieved product descriptions. Which component of the RAG pipeline is responsible for combining these inputs before sending to the LLM?

A.Reranker
B.Document retriever
C.Embedding model
D.Prompt template
AnswerD

Merges user query and context into a single prompt.

Why this answer

The prompt template is the component in a RAG pipeline that structures the final input to the LLM by combining the user's query with the retrieved product descriptions. It defines the format and instructions (e.g., 'Based on these product descriptions, recommend...') that the LLM uses to generate a coherent response. Without a prompt template, the raw query and documents would be sent without context, leading to poor or irrelevant outputs.

Exam trap

Oracle often tests the misconception that the embedding model or retriever handles input combination, when in fact those components only deal with vector representation and retrieval, not prompt assembly.

How to eliminate wrong answers

Option A is wrong because a reranker reorders retrieved documents based on relevance scores after initial retrieval, but it does not combine inputs with the user query for the LLM. Option B is wrong because the document retriever fetches relevant documents from the vector store using similarity search, but it does not merge them with the query into a single prompt. Option C is wrong because the embedding model converts text into vector representations for search, but it plays no role in assembling the final input to the LLM.

497
MCQeasy

A company is building a chatbot using OCI Generative AI service. They want to ensure that the model responses are grounded in their internal knowledge base. Which approach should they use?

A.Prompt engineering with few-shot examples
B.Fine-tuning the model on the internal knowledge base
C.Model distillation to compress the knowledge base
D.Retrieval-Augmented Generation (RAG)
AnswerD

RAG retrieves relevant documents from a knowledge base and uses them to generate grounded responses.

Why this answer

Retrieval-Augmented Generation (RAG) is the correct approach because it retrieves relevant documents from the company's internal knowledge base at inference time and provides them as context to the LLM, ensuring the model's responses are grounded in verifiable, up-to-date information without modifying the model itself. This directly addresses the requirement to ground responses in an internal knowledge base while avoiding the cost and complexity of retraining.

Exam trap

The trap here is that candidates often confuse fine-tuning (Option B) as the only way to incorporate proprietary data, overlooking that RAG provides a more flexible, cost-effective, and updatable method for grounding responses in a dynamic knowledge base without altering model weights.

How to eliminate wrong answers

Option A is wrong because prompt engineering with few-shot examples only provides a handful of static examples in the prompt, which cannot dynamically retrieve or incorporate the full breadth of an internal knowledge base, leading to hallucinations on unseen or specific internal data. Option B is wrong because fine-tuning the model on the internal knowledge base would embed that data into the model's weights, making it expensive to update, prone to catastrophic forgetting, and unable to guarantee factual grounding for new or changing documents without retraining. Option C is wrong because model distillation compresses a larger model into a smaller one for efficiency, but it does not introduce external knowledge retrieval; it merely replicates the behavior of the teacher model, which still lacks access to the internal knowledge base.

498
MCQhard

A machine learning team is fine-tuning a 7B parameter Llama 2 model on a custom dataset of 10,000 documents using OCI Data Science and GPU instances. They encounter out-of-memory (OOM) errors during the fine-tuning process. They are using a batch size of 8 and a sequence length of 2048. They cannot increase the GPU memory. Which change should they prioritize to resolve the OOM?

A.Enable gradient accumulation with steps of 4 or more.
B.Use mixed precision training (FP16).
C.Reduce the model size by using a 3B parameter version.
D.Decrease the number of training epochs.
AnswerA

Correct: Gradient accumulation reduces memory per step without changing effective batch size.

Why this answer

Option B is correct because enabling gradient accumulation allows the effective batch size to be maintained while reducing per-step memory usage. Option A changes the model entirely, Option C may not fix the memory issue, and Option D helps but may still OOM if the batch size is too high; gradient accumulation is more directly targeted.

499
MCQeasy

A developer is using OCI GenAI to generate structured data. They often get responses that include additional commentary or markdown. Which prompt engineering technique should they use to ensure only JSON output?

A.Set top_p to 0.1.
B.Use a model with a larger context window.
C.Add 'Return only JSON' at the end of the prompt.
D.Increase the temperature to 1.5.
AnswerC

Correct: Direct instruction enforces format.

Why this answer

Option C is correct because explicitly instructing the model to 'Return only JSON' directly constrains the output format, reducing the likelihood of extraneous commentary or markdown. This technique leverages prompt engineering to guide the model's behavior without altering inference parameters like temperature or top_p, which control randomness rather than output structure.

Exam trap

Oracle often tests the misconception that adjusting sampling parameters (like temperature or top_p) can enforce output format, when in fact these parameters control randomness and diversity, not structural constraints—leading candidates to overlook the direct prompt engineering solution.

How to eliminate wrong answers

Option A is wrong because setting top_p to 0.1 reduces the nucleus sampling threshold, making the model more deterministic but not preventing it from generating additional text or markdown; it controls token selection diversity, not output format. Option B is wrong because a larger context window allows the model to process more input tokens but does not enforce a specific output structure; it addresses memory limitations, not format constraints. Option D is wrong because increasing temperature to 1.5 raises randomness, which can actually increase the likelihood of unpredictable or verbose responses, including unwanted commentary, rather than ensuring strict JSON output.

500
MCQmedium

A team has deployed a generative AI model and needs to monitor inference performance and set up alerts for increased error rates. Which OCI service should they integrate with?

A.OCI Monitoring
B.OCI Cloud Guard
C.OCI Events
D.OCI Logging
AnswerA

Correct: Monitoring provides metrics and alerting for inference endpoints.

Why this answer

OCI Monitoring is the correct service because it provides metrics and alarms for tracking inference performance (e.g., latency, throughput) and error rates from deployed generative AI models. It allows you to set up threshold-based alerts on custom or predefined metrics, enabling proactive incident response. This directly addresses the requirement to monitor inference performance and alert on increased error rates.

Exam trap

Oracle often tests the distinction between monitoring (metrics/alarms) and logging (raw events) — candidates mistakenly choose OCI Logging because they think 'error rates' require log analysis, but OCI Monitoring is designed for metric-based alerting with thresholds.

How to eliminate wrong answers

Option B is wrong because OCI Cloud Guard is a security posture management service that detects misconfigurations and security threats, not a real-time performance monitoring or alerting tool for inference metrics. Option C is wrong because OCI Events is a notification service that reacts to state changes in OCI resources (e.g., object creation, instance termination) but does not natively track or alert on time-series performance metrics like error rates. Option D is wrong because OCI Logging collects and stores log data for audit and troubleshooting, but it lacks built-in metric-based alerting capabilities for monitoring inference performance trends or setting threshold alarms.

Page 6

Page 7 of 7

All pages