Oracle Cloud Infrastructure Generative AI Professional 1Z0-1127 practice test

Using OCI Generative AI Service

Building LLM Applications with RAG and Vector Search

Deploying and Managing Generative AI on OCI

Study 1Z0-1127 by topic

Topic pages go deep on individual concepts — each one covers a specific exam topic with questions, explanations, and study notes.

Fundamentals of Large Language Models practice questions

Practise 1Z0-1127 questions linked to Fundamentals of Large Language Models.

Using OCI Generative AI Service practice questions

Practise 1Z0-1127 questions linked to Using OCI Generative AI Service.

Building LLM Applications with RAG and Vector Search practice questions

Practise 1Z0-1127 questions linked to Building LLM Applications with RAG and Vector Search.

Deploying and Managing Generative AI on OCI practice questions

Practise 1Z0-1127 questions linked to Deploying and Managing Generative AI on OCI.

1Z0-1127 fundamentals practice questions

Practise 1Z0-1127 questions linked to 1Z0-1127 fundamentals.

1Z0-1127 scenario practice questions

Practise 1Z0-1127 questions linked to 1Z0-1127 scenario.

1Z0-1127 troubleshooting practice questions

Practise 1Z0-1127 questions linked to 1Z0-1127 troubleshooting.

Oracle Cloud Infrastructure Generative AI Professional 1Z0-1127 practice questions

Question 1 · medium · multiple choice · Building LLM Applications with RAG and Vector Search

A developer wants to deploy a RAG application using OCI Generative AI for both embedding and text generation while minimizing costs. Which strategy is most effective?

A
Use a larger generation model
Why wrong: Larger generation models increase cost per generation.
B
Cache frequent queries and their embeddings
Caching reduces redundant embedding API calls, lowering costs.
C
Reduce chunk size to decrease embedding calls
Why wrong: Smaller chunks may increase the number of chunks and thus embedding calls.
D
Use a larger embedding model for better accuracy
Why wrong: Larger models cost more per API call.
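The caching idea in option B can be sketched as a small least-recently-used cache placed in front of the embedding call. This is a minimal Python sketch, not the OCI SDK API: `embed_fn` is a hypothetical stand-in for whatever function your application uses to call the embedding model, and normalizing case and whitespace before lookup is an assumption that lets trivially different spellings of a query share one entry.

```python
from collections import OrderedDict
from typing import Callable, List


class EmbeddingCache:
    """LRU cache in front of an embedding call (embed_fn is a hypothetical stand-in)."""

    def __init__(self, embed_fn: Callable[[str], List[float]], max_entries: int = 10_000):
        self._embed_fn = embed_fn
        self._max_entries = max_entries
        self._store: "OrderedDict[str, List[float]]" = OrderedDict()
        self.api_calls = 0  # how many times the underlying (billable) call ran

    def get(self, text: str) -> List[float]:
        key = " ".join(text.lower().split())  # normalize case and whitespace
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            return self._store[key]
        self.api_calls += 1
        vector = self._embed_fn(key)  # embed the normalized text so key and vector match
        self._store[key] = vector
        if len(self._store) > self._max_entries:
            self._store.popitem(last=False)  # evict the least recently used entry
        return vector


if __name__ == "__main__":
    fake_embed = lambda s: [float(len(s))]  # placeholder for a real embedding call
    cache = EmbeddingCache(fake_embed)
    for q in ["What is OCI?", "what is  OCI?", "What is RAG?"]:
        cache.get(q)
    print(cache.api_calls)  # 2
```

Only the first and third queries reach the embedding model; the second is served from the cache. Document chunks can be cached the same way, keyed by a content hash, so re-indexing unchanged documents costs nothing.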

Question 2 · hard · multiple choice · Using OCI Generative AI Service

A data scientist fine-tuned a model on OCI Gen AI using a dedicated AI cluster. After deployment, the model gives inaccurate results. Which troubleshooting step should they take first?

A
Switch to a different base model
Why wrong: Base model may not be the root cause if fine-tuning data is flawed.
B
Increase the cluster size
Why wrong: Cluster size affects performance, not accuracy.
C
Use a serverless endpoint
Why wrong: Endpoint type does not fix accuracy issues.
D
Check the training data for bias or quality issues
Training data quality directly impacts model accuracy.

Question 3 · medium · multiple choice · Using OCI Generative AI Service

Users report that inference requests to the OCI Generative AI service are taking longer than expected. The application uses the on-demand endpoint. What is the most likely cause of the increased latency?

A
The inference model is not fine-tuned for the use case.
Why wrong: Fine-tuning affects accuracy, not latency. The issue is performance, not model suitability.
B
The on-demand endpoint experiences shared resource contention.
On-demand endpoints are multi-tenant; high concurrent usage can cause latency spikes.
C
The selected model is too large for the use case.
Why wrong: Model size affects inference speed, but the on-demand endpoint automatically scales; the more common cause is shared resource contention.
D
The API request timeout is set too low.
Why wrong: Timeout settings affect client-side waits, not server-side inference latency.

Question 4 · medium · multiple choice · Using OCI Generative AI Service

Refer to the exhibit. A developer runs the command and receives the error. What is the issue?

A
The max-tokens value exceeds the allowed range.
The error explicitly states the valid range.
B
The message is too short.
Why wrong: Message length is not the issue.
C
The chat-id is invalid.
Why wrong: The error does not mention chat-id.
D
The endpoint is incorrect.
Why wrong: The endpoint is valid; the error is about max-tokens.

Question 5 · easy · multiple choice · Using OCI Generative AI Service

A developer wants to integrate OCI GenAI into a Java application. Which SDK should they use?

A
OCI JavaScript SDK.
Why wrong: JavaScript SDK is for Node.js or browser applications.
B
OCI Python SDK.
Why wrong: Python SDK is for Python applications, not Java.
C
OCI Java SDK.
The Java SDK is designed for Java applications.
D
OCI CLI.
Why wrong: CLI is a command-line tool, not an SDK for integration.

Question 6 · hard · multi-select · Fundamentals of Large Language Models

Which TWO factors most significantly influence the computational cost of fine-tuning a large language model?

A
Batch size
Why wrong: Batch size affects memory but not per-token compute cost.
B
Number of model parameters
More parameters increase compute and memory requirements.
C
Maximum sequence length
Longer sequences increase attention computation and memory usage.
D
Quantization bits
Why wrong: Lower-bit quantization reduces compute and memory cost; it does not increase it.
E
Dataset size
Why wrong: Dataset size affects total training time but not per-step cost.
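Why parameter count and sequence length dominate can be made concrete with a widely used back-of-the-envelope estimate for decoder-only transformers: a forward pass costs roughly 2N FLOPs per token for the weight multiplications plus about 2 × n_layers × seq_len × d_model for attention, and a training step costs about three times the forward pass. The Python sketch below applies only that approximation; it ignores embedding layers, activation recomputation, and parameter-efficient methods such as LoRA, and the 7B configuration in the example is hypothetical.

```python
def train_flops_per_token(n_params: int, n_layers: int, seq_len: int, d_model: int) -> int:
    """Approximate training FLOPs per token for a decoder-only transformer."""
    forward = 2 * n_params + 2 * n_layers * seq_len * d_model  # weights + attention scores
    return 3 * forward  # backward pass costs roughly twice the forward pass


def total_train_flops(n_params: int, n_layers: int, seq_len: int, d_model: int,
                      tokens: int) -> int:
    """Total cost: per-token cost times tokens processed (dataset size x epochs).

    Batch size does not appear: it only changes how the same tokens are grouped
    into steps, which matters for memory and throughput rather than total FLOPs.
    """
    return train_flops_per_token(n_params, n_layers, seq_len, d_model) * tokens


if __name__ == "__main__":
    # Hypothetical 7B-parameter model with 32 layers and d_model = 4096
    for seq_len in (2048, 8192):
        flops = train_flops_per_token(7_000_000_000, 32, seq_len, 4096)
        print(f"seq_len={seq_len}: {flops:.3e} training FLOPs per token")
```

Doubling the parameter count roughly doubles the per-token cost, while the attention term grows linearly with sequence length per token (quadratically per sequence), which is why both appear as the correct answers.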

Question 7 · easy · multiple choice · Fundamentals of Large Language Models

An organization wants to use an LLM to summarize legal documents. Which consideration is most important for ensuring accurate summaries?

A
Fine-tune the model on a curated legal corpus
Domain-specific fine-tuning teaches the model legal terminology and reasoning.
B
Use the largest available general-purpose model
Why wrong: Size alone doesn't ensure domain expertise.
C
Rely on zero-shot summarization with careful prompting
Why wrong: Zero-shot may miss critical legal details.
D
Pre-train a new model from scratch on legal texts
Why wrong: Pre-training from scratch is resource-intensive and seldom needed.

Question 8 · medium · multiple choice · Fundamentals of Large Language Models

A healthcare startup is building an AI assistant to help doctors draft clinical notes from patient-physician conversations. They have a large language model that is fine-tuned on medical data. During testing, they notice the model occasionally generates plausible-sounding but incorrect medical recommendations. The startup wants the assistant to support doctors, not replace them. They are considering four options: deploying the model as-is and relying on doctors to catch errors; adding a disclaimer that the model may make mistakes; implementing a fact-checking pipeline that cross-references outputs with a trusted medical knowledge base before presenting them to doctors; or reducing the model's temperature to 0 to ensure deterministic outputs. Which option best balances safety and utility?

A
Implement a fact-checking pipeline that cross-references outputs with a trusted medical knowledge base.
Cross-referencing outputs against a trusted knowledge base catches unsupported recommendations before doctors see them, while keeping the assistant useful.
B
Add a disclaimer that the model may make mistakes.
Why wrong: A disclaimer does not reduce the risk of incorrect advice.
C
Deploy the model as-is and rely on doctors to catch errors.
Why wrong: Doctors may miss errors; this is unsafe.
D
Reduce the model's temperature to 0 to ensure deterministic outputs.
Why wrong: Deterministic outputs can still be incorrect.

Question 9 · hard · multiple choice · Fundamentals of Large Language Models

A team is fine-tuning a large language model for a domain-specific Q&A application. After fine-tuning, they observe that the model performs well on the training distribution but struggles with out-of-distribution (OOD) questions. Which approach would best improve OOD robustness?

A
Include a diverse set of examples from related domains in the fine-tuning dataset.
Diverse data improves generalization and OOD performance.
B
Use early stopping based on training loss to avoid overfitting.
Why wrong: Early stopping is normally based on validation loss, and a stopping criterion measured on in-distribution data does not address OOD failures.
C
Reduce the model size to prevent overfitting to the training data.
Why wrong: A smaller model has less capacity to learn generalizable features.
D
Increase the learning rate during fine-tuning to adapt faster to new patterns.
Why wrong: Higher learning rate can cause instability and catastrophic forgetting.
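Option A amounts to changing the composition of the fine-tuning set. Below is a minimal Python sketch of weighted data mixing, assuming the core-domain and related-domain examples are already loaded as lists of records; the source names, the 70/30 weights, and the placeholder records are hypothetical and would need tuning against a held-out OOD evaluation set.

```python
import random
from typing import Dict, List, Sequence


def mix_datasets(sources: Dict[str, Sequence[dict]], weights: Dict[str, float],
                 total: int, seed: int = 0) -> List[dict]:
    """Sample `total` examples, drawing from each source in proportion to its weight."""
    rng = random.Random(seed)  # fixed seed keeps the mixture reproducible
    names = list(sources)
    probs = [weights[n] for n in names]
    mixed = []
    for _ in range(total):
        name = rng.choices(names, weights=probs, k=1)[0]
        example = dict(rng.choice(sources[name]))  # copy so the source data is not mutated
        example["source"] = name  # keep provenance for per-domain evaluation
        mixed.append(example)
    return mixed


if __name__ == "__main__":
    sources = {
        "core_domain": [{"prompt": "core question", "completion": "core answer"}],
        "related_domain": [{"prompt": "related question", "completion": "related answer"}],
    }
    data = mix_datasets(sources, {"core_domain": 0.7, "related_domain": 0.3}, total=1000)
    # Count of related-domain examples (expected value 300)
    print(sum(ex["source"] == "related_domain" for ex in data))
```

Tagging each example with its source makes it easy to check that OOD gains do not come at the cost of in-domain accuracy.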

Question 10 · medium · multi-select · Using OCI Generative AI Service

Which TWO measures can help reduce the risk of generating toxic or unsafe content when using OCI Generative AI Service?

A
Use few-shot prompting with examples that demonstrate safe and appropriate responses.
Safe examples help steer the model toward desired behavior.
B
Disable model monitoring and logging to reduce overhead.
Why wrong: Disabling monitoring reduces the ability to detect and respond to toxic outputs.
C
Increase the temperature parameter to make output more deterministic.
Why wrong: Raising the temperature makes output less deterministic, not more; the added randomness can increase the chance of toxic outputs.
D
Fine-tune the model on a large dataset without any safety filtering.
Why wrong: Fine-tuning without safety filtering can embed toxic patterns into the model.
E
Enable the built-in content filtering features provided by OCI Generative AI Service.
Content filters block harmful outputs based on predefined categories.
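Option A can be illustrated by assembling a few-shot prompt whose demonstrations model the desired tone and refusal behavior. The Python sketch below only builds the prompt text; the instruction and example pairs are hypothetical, and how the string is sent to the model (a single prompt or a list of chat messages) depends on the model and SDK you use. It complements, rather than replaces, the built-in content filtering in option E.

```python
from typing import List, Tuple

# Hypothetical demonstrations of the behavior the model should imitate.
SAFE_EXAMPLES: List[Tuple[str, str]] = [
    ("Write an insulting message about my coworker.",
     "I'd rather not write something meant to hurt someone. If there's a conflict, "
     "I can help you draft a calm message that explains the problem."),
    ("Explain how to reset my account password.",
     "Sure. Open the sign-in page, choose 'Forgot password', and follow the emailed link."),
]

SYSTEM_INSTRUCTION = ("You are a helpful assistant. Politely decline harmful or abusive "
                      "requests and offer a safe alternative.")


def build_few_shot_prompt(user_input: str,
                          examples: List[Tuple[str, str]] = SAFE_EXAMPLES) -> str:
    """Assemble a plain-text few-shot prompt: instruction, demonstrations, new turn."""
    parts = [SYSTEM_INSTRUCTION]
    for question, answer in examples:
        parts.append(f"User: {question}\nAssistant: {answer}")
    parts.append(f"User: {user_input}\nAssistant:")  # the model continues from here
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_few_shot_prompt("Tell me a joke about my boss."))
```

Mixing a refusal with an ordinary helpful answer shows the model that it should stay useful on benign requests rather than refuse everything.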

Question 11 · hard · multiple choice · Using OCI Generative AI Service

Refer to the exhibit. A user runs the command shown and receives the error: 'ServiceError: NotAuthorizedOrNotFound'. What is the MOST likely cause?

A
The CLI is not configured with OCI credentials
Why wrong: Missing credentials would give an authentication error, not NotAuthorizedOrNotFound.
B
The user does not have the 'inspect' permission on the model
NotAuthorizedOrNotFound is common when permissions are insufficient.
C
The model ID is incorrectly formatted
Why wrong: An invalid format would likely cause a validation error, not a 404/403.
D
The model is in a different region than iad
Why wrong: Region mismatch would give a different error.

Question 12 · hard · multiple choice · Using OCI Generative AI Service

You are a cloud architect at a healthcare company that uses OCI Generative AI Service to analyze patient records and generate clinical summaries. The service is deployed in the Frankfurt region with a dedicated AI cluster. Recently, the compliance team flagged that some generated summaries contain hallucinated diagnoses not present in the source records. They demand immediate mitigation. The current setup uses the default model (cohere.command-r-08-2024) with temperature=0.7, top_p=0.9, and max_tokens=2048. The application sends the entire patient record as a single prompt. You have access to OCI Logging, monitoring metrics (latency, request count, token count, safety filter rejections), and the AI service's model fine-tuning capability. You must reduce hallucinations while minimizing latency increase. What is the most effective course of action?

A
Switch to cohere.command-light model for faster inference and add a post-processing step using a BERT-based NER model to validate entities.
Why wrong: A lighter model may be faster but likely less accurate; post-processing NER helps but does not prevent hallucinations at generation time.
B
Increase max_tokens to 4096 and use chunked processing with overlapping context windows to provide more context.
Why wrong: Chunking with overlap may reduce hallucinations by providing more context, but increasing max_tokens increases latency and cost; the improvement might be marginal.
C
Enable the safety filter with strict content moderation and set up OCI Logging to audit all generations.
Why wrong: Safety filters block harmful content but do not reduce hallucinations about medical facts; auditing only detects issues after the fact.
D
Reduce temperature to 0.2, top_p to 0.5, and fine-tune the model on a curated dataset of 5,000 clinical summaries with a learning rate of 0.00005 and batch size of 8.
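Lowering temperature and top_p makes decoding more conservative, and fine-tuning on curated, domain-specific summaries can further reduce hallucinated content; neither change adds a post-processing stage that would increase latency.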
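Option D lowers temperature from 0.7 to 0.2 and top_p from 0.9 to 0.5. The Python sketch below shows the generic sampling technique those two parameters control, temperature scaling followed by nucleus (top-p) filtering, rather than the internals of any OCI-hosted model; the token names and logits are made up.

```python
import math
from typing import Dict


def sampling_distribution(logits: Dict[str, float], temperature: float,
                          top_p: float) -> Dict[str, float]:
    """Apply temperature scaling, then nucleus (top-p) filtering; return the
    renormalized distribution that tokens would actually be sampled from."""
    # Temperature: divide logits by T before the softmax (T < 1 sharpens, T > 1 flattens).
    scaled = {tok: v / temperature for tok, v in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    norm = sum(weights.values())
    probs = {tok: w / norm for tok, w in weights.items()}

    # Top-p: keep the smallest set of most likely tokens whose mass reaches top_p.
    kept: Dict[str, float] = {}
    mass = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        mass += p
        if mass >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


if __name__ == "__main__":
    toy_logits = {"tok_a": 3.0, "tok_b": 2.0, "tok_c": 1.0, "tok_d": 0.5}  # made-up values
    print(len(sampling_distribution(toy_logits, temperature=0.7, top_p=0.9)))  # 2
    print(len(sampling_distribution(toy_logits, temperature=0.2, top_p=0.5)))  # 1
```

With the original settings two candidate tokens remain eligible at this step; with the lowered settings only the most likely token does, which is what makes the output more conservative.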

Question 13 · medium · multiple choice · Using OCI Generative AI Service

An enterprise deployed a custom fine-tuned model for generating financial reports. After the first month, the model's outputs began to include outdated information and occasional factual errors. The team suspects data drift. What is the best course of action?

A
Switch to a newer base model like Llama 3.1 without retraining.
Why wrong: A newer base model may still require fine-tuning on the specific domain data to be accurate.
B
Decrease the temperature parameter to 0.1 to reduce model creativity.
Why wrong: Temperature controls randomness, not factual accuracy; it won't fix outdated knowledge.
C
Retrain the model on the latest financial data and monitor for drift.
Retraining with current data mitigates data drift and improves output accuracy.
D
Increase the max tokens value to allow longer responses.
Why wrong: Max tokens only affects response length, not quality or timeliness.

Question 14 · easy · multiple choice · Building LLM Applications with RAG and Vector Search

A developer is building a RAG application using Oracle Cloud Infrastructure (OCI) Document Understanding and OCI Generative AI. After chunking documents and generating embeddings, the developer observes that the retrieval step often returns chunks that are semantically unrelated to the query. Which action is MOST likely to improve retrieval relevance?

A
Switch from a dense embedding model to a sparse embedding model.
Why wrong: The embedding model choice is secondary; chunking is the primary issue.
B
Adjust the chunk size and chunk overlap to better capture coherent passages.
Proper chunking helps preserve meaning and improves retrieval accuracy.
C
Increase the chunk size to capture more context.
Why wrong: Larger chunks may include irrelevant content, reducing precision.
D
Reduce the number of retrieved chunks (k) in the vector search.
Why wrong: Reducing k may cause relevant passages to be missed.
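Chunk size and overlap are the two knobs most chunkers expose. Below is a minimal Python sketch of fixed-size chunking with overlap over whitespace-separated words; real pipelines often split on sentence or section boundaries and count model tokens rather than words, so the word-based splitting and the default sizes here are assumptions.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into windows of `chunk_size` words that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    print(chunk_words(" ".join(str(i) for i in range(10)), chunk_size=4, overlap=1))
    # ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

The overlap keeps a sentence that straddles a boundary fully present in at least one chunk, which is what preserves coherent passages for retrieval.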

Question 15 · hard · multiple choice · Building LLM Applications with RAG and Vector Search

A company is deploying a RAG pipeline using OCI Data Science and OCI Generative AI. The pipeline uses a Cohere command model for generation and a Cohere embed model for retrieval. The team notices that the model occasionally produces hallucinated answers that are not supported by the retrieved context. Which strategy is MOST effective at reducing hallucinations?

A
Implement a faithfulness verification step that re-ranks retrieved passages based on alignment with the generated answer.
A verification step can detect and mitigate unsupported claims.
B
Increase the temperature parameter of the generation model.
Why wrong: Higher temperature increases randomness, potentially worsening hallucinations.
C
Increase the number of retrieved chunks (k) to provide more context.
Why wrong: More context can include irrelevant or contradictory information.
D
Use a larger generative model with more parameters.
Why wrong: Larger models may still hallucinate; size alone does not guarantee faithful output.
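A faithfulness check like the one in option A can be prototyped cheaply before adopting a model-based verifier. The Python sketch below flags answer sentences whose content words are poorly covered by every retrieved passage; lexical overlap is a deliberately crude stand-in for the entailment or re-ranking model a production system would use, and the 0.5 threshold, the stop-word list, and the example text are assumptions.

```python
import re
from typing import List, Set

STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "on", "for", "it", "with"}


def content_words(text: str) -> Set[str]:
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}


def unsupported_sentences(answer: str, passages: List[str], threshold: float = 0.5) -> List[str]:
    """Return answer sentences whose best passage coverage falls below `threshold`."""
    passage_words = [content_words(p) for p in passages]
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = content_words(sentence)
        if not words:
            continue
        best = max((len(words & pw) / len(words) for pw in passage_words), default=0.0)
        if best < threshold:
            flagged.append(sentence)  # candidate hallucination: regenerate, drop, or cite
    return flagged


if __name__ == "__main__":
    context = ["The warranty covers parts and labor for two years."]
    answer = "The warranty covers parts and labor for two years. It also includes free shipping worldwide."
    print(unsupported_sentences(answer, context))
    # ['It also includes free shipping worldwide.']
```

Flagged sentences can be removed, regenerated with a stricter prompt, or sent back through retrieval before the answer is returned.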

Question 16 · easy · multiple choice · Building LLM Applications with RAG and Vector Search

A developer is using OCI Generative AI to build a question-answering system over a large corpus of technical manuals. The developer uses the Cohere Embed model to generate embeddings and stores them in an OCI OpenSearch cluster. Queries are slow and the team needs to reduce latency. Which approach is BEST for improving search speed while maintaining acceptable accuracy?

A
Increase the embedding dimension for better representation.
Why wrong: Higher dimensionality increases computation and slows search.
B
Reduce the k value in the nearest neighbor search.
Fewer neighbors means less distance computation and faster retrieval.
C
Use exact nearest neighbor search instead of approximate.
Why wrong: Exact search is slower than approximate methods.
D
Increase the index refresh interval to reduce write overhead.
Why wrong: Index refresh affects write performance, not search latency.

Question 17 · hard · multiple choice · Building LLM Applications with RAG and Vector Search

An engineer configured the index mapping shown in the exhibit for vector search. When performing a k-NN search, the results are unexpected. What is the most likely issue?

Exhibit

Refer to the exhibit.

Document index mapping:

```json
{
  "settings": {
    "index": {
      "knn": true,
      "knn.space_type": "cosinesimil"
    }
  },
  "mappings": {
    "properties": {
      "content_embedding": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "space_type": "l2"
        }
      },
      "metadata": {
        "type": "object"
      }
    }
  }
}
```

A
The space type 'cosinesimil' is not supported; it should be 'cosine'.
Why wrong: 'cosinesimil' is the valid OpenSearch k-NN identifier for cosine similarity.
B
The dimension 768 does not match the embedding model's output dimension.
Why wrong: 768 is a common embedding dimension, and nothing in the exhibit indicates a mismatch with the model's output.
C
The mapping uses 'knn_vector' type with 'faiss' engine, which is incompatible.
Why wrong: faiss is a supported engine.
D
The space type at the index level and mapping level are mismatched.
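The index-level setting specifies cosinesimil while the field's HNSW method specifies l2, so the search may not use the distance metric the engineer intended.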
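The mismatch in option D is easy to catch automatically before the index is created. The Python sketch below walks a mapping like the one in the exhibit and reports every `knn_vector` field whose `method.space_type` differs from the index-level `knn.space_type`; it works on a plain dictionary and does not call OpenSearch. A common fix is to state the metric once, in the field's `method` block, and make sure it is the metric the embedding model is meant to be compared with.

```python
import json
from typing import List


def space_type_mismatches(index_body: dict) -> List[str]:
    """List knn_vector fields whose method space_type disagrees with the index-level setting."""
    index_level = index_body.get("settings", {}).get("index", {}).get("knn.space_type")
    problems = []
    for name, spec in index_body.get("mappings", {}).get("properties", {}).items():
        if spec.get("type") != "knn_vector":
            continue  # only vector fields carry a distance metric
        field_level = spec.get("method", {}).get("space_type")
        if index_level and field_level and index_level != field_level:
            problems.append(f"{name}: index-level '{index_level}' vs field-level '{field_level}'")
    return problems


# The exhibit's mapping, reproduced as JSON
EXHIBIT = json.loads("""
{
  "settings": {"index": {"knn": true, "knn.space_type": "cosinesimil"}},
  "mappings": {"properties": {
    "content_embedding": {"type": "knn_vector", "dimension": 768,
      "method": {"name": "hnsw", "engine": "faiss", "space_type": "l2"}},
    "metadata": {"type": "object"}
  }}
}
""")

if __name__ == "__main__":
    print(space_type_mismatches(EXHIBIT))
    # ["content_embedding: index-level 'cosinesimil' vs field-level 'l2'"]
```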

Question 18 · medium · multiple choice · Deploying and Managing Generative AI on OCI

An administrator runs the CLI command shown in the exhibit to check the status of a dedicated AI cluster. The cluster is ACTIVE with capacity 10. However, a user reports that inference requests to this cluster are failing with a '429 Too Many Requests' error. What is the most likely cause?

Exhibit

Refer to the exhibit.

```
$ oci generative-ai dedicated-ai-cluster get --dedicated-ai-cluster-id ocid1.dedicatedaicluster.oc1.iad.xxxxx
{
  "data": {
    "capacity": 10,
    "id": "ocid1.dedicatedaicluster.oc1.iad.xxxxx",
    "lifecycle-state": "ACTIVE",
    "time-created": "2024-01-15T10:00:00Z",
    "time-updated": "2024-01-15T10:00:00Z"
  }
}
```

A
The cluster is hitting the maximum inference requests per minute limit
A 429 status indicates rate limiting; the cluster has a requests-per-minute limit that is separate from its node count.
B
The cluster does not have enough nodes to handle the load
Why wrong: A capacity of 10 may be sufficient; a 429 points to rate limiting rather than a shortage of nodes.
C
The user is not in the same compartment as the cluster
Why wrong: Compartment mismatch would cause 404 or 401, not 429.
D
The cluster is not in ACTIVE state
Why wrong: The output shows ACTIVE.
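Whatever the exact limit behind the 429, well-behaved clients respond by backing off and retrying instead of hammering the endpoint. Below is a minimal Python sketch of exponential backoff with full jitter; `RateLimitedError` and the wrapped `call` are hypothetical placeholders for however your client surfaces an HTTP 429. The OCI SDKs also ship built-in retry strategies (for example, `oci.retry.DEFAULT_RETRY_STRATEGY` in the Python SDK), which are usually preferable in practice.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical: raised by your client wrapper when the service returns HTTP 429."""


def call_with_backoff(call: Callable[[], T], max_attempts: int = 5,
                      base_delay: float = 0.5, max_delay: float = 8.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on RateLimitedError using exponential backoff with full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the 429
            cap = min(max_delay, base_delay * 2 ** attempt)  # 0.5, 1, 2, 4, 8, ...
            sleep(random.uniform(0, cap))  # full jitter spreads out competing clients
    raise AssertionError("unreachable")
```

Backoff only smooths bursts; if the 429s persist at steady load, the request rate itself has to come down or the limit has to be raised.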

Question 19 · hard · multiple choice · Deploying and Managing Generative AI on OCI

You are deploying a generative AI solution on OCI for a healthcare client that requires strict data residency (data must remain in the EU) and low-latency inference. The solution uses a fine-tuned LLM model (7B parameters) stored in Object Storage in the Frankfurt region. You have set up an OCI Data Science model deployment endpoint with GPU shape VM.GPU.A10.1, using a single replica. During load testing with 50 concurrent users, you observe high latency (average 8 seconds per request) and occasional 504 gateway timeouts. The model deployment logs show no errors, and the model loads successfully. You have confirmed that the Object Storage bucket is in the same region and that the network latency between the client and the endpoint is minimal (under 5 ms). Which action should you take to reduce latency and eliminate timeouts?

A
Increase the model deployment endpoint timeout setting from 60 seconds to 300 seconds in the OCI console.
Why wrong: Increasing the timeout only masks the symptom; it does not address the root cause (insufficient serving capacity).
B
Upgrade the model deployment shape to VM.GPU.A100.4 and keep a single replica.
Why wrong: A larger GPU (A100) increases compute per request, but with a single replica, concurrency remains the bottleneck; scaling out is more effective for high concurrency.
C
Increase the number of replicas to 3 and enable autoscaling based on CPU utilization.
Adding replicas lets concurrent requests be served in parallel and load-balanced across instances, which reduces queuing, improves throughput, and avoids timeouts.
D
Move the model deployment to the US East (Ashburn) region to leverage lower-cost GPU capacity and reduce latency.
Why wrong: Moving to the US East region violates the EU data-residency requirement and may add latency for EU clients.
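The effect of adding replicas can be illustrated with a deliberately simple queue model. This Python sketch assumes that all requests arrive at once and that each replica serves one request at a time with a fixed service time; real LLM servers batch requests and autoscaling has its own delay, so the 50-request burst and 1-second service time are hypothetical and the numbers are illustrative only. It shows why, with a single replica, latency under concurrency is dominated by queuing rather than by per-request compute.

```python
import heapq
from typing import List


def mean_latency(n_requests: int, replicas: int, service_time: float) -> float:
    """Mean completion time when all requests arrive at t=0 and each replica
    processes one request at a time (a simplifying assumption)."""
    free_at: List[float] = [0.0] * replicas  # min-heap of when each replica is next free
    heapq.heapify(free_at)
    total = 0.0
    for _ in range(n_requests):
        start = heapq.heappop(free_at)  # earliest available replica takes the next request
        finish = start + service_time
        total += finish
        heapq.heappush(free_at, finish)
    return total / n_requests


if __name__ == "__main__":
    for r in (1, 3):
        print(r, mean_latency(50, r, service_time=1.0))
    # 1 25.5
    # 3 8.84
```

Tripling the replicas cuts mean latency roughly threefold in this model, whereas a faster single GPU only shrinks the service time while every request still waits in one queue.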

Question 20 · medium · multiple choice · Deploying and Managing Generative AI on OCI

A team is deploying a generative AI model using OCI Functions for serverless inference. They are experiencing cold start latency of over 10 seconds for the first invocation after idle periods. What is the best strategy to reduce cold start latency?

A
Migrate the inference to OCI Data Flow for better performance.
Why wrong: Data Flow is for big data processing, not real-time inference.
B
Use provisioned concurrency to keep a set number of function instances warm.
Provisioned concurrency keeps a configured number of function instances warm, so invocations within that level avoid cold starts.
C
Reduce the function timeout to force faster execution.
Why wrong: Reducing timeout does not affect cold start and may cause errors.
D
Increase the memory allocation for the function.
Why wrong: More memory can speed up cold start but provisioned concurrency is more direct.

Question 21 · medium · multiple choice · Deploying and Managing Generative AI on OCI

A company has fine-tuned a large language model using OCI Generative AI service. When attempting to deploy the model to a dedicated endpoint, the deployment fails with an error indicating insufficient capacity. Which action should be taken to resolve this issue?

A
Delete existing endpoints to free capacity
Why wrong: Unnecessary if a limit increase is possible; also disrupts existing workloads.
B
Deploy the model to a different OCI region
Why wrong: This may avoid capacity issues but is not a direct resolution and could introduce latency.
C
Use a pre-built model instead of the fine-tuned model
Why wrong: This disregards the value of the fine-tuned model.
D
Request a service limit increase for dedicated endpoints
OCI allows customers to request higher limits for resources like dedicated endpoints.

Question 22 · easy · multiple choice · Deploying and Managing Generative AI on OCI

A startup wants to minimize costs when using OCI Generative AI service for a chatbot application that experiences sporadic usage. Which deployment strategy is most cost-effective?

A
Use a pre-built model with a dedicated endpoint
Why wrong: Pre-built models are already available on demand, so a dedicated endpoint adds idle cost that sporadic traffic does not justify.
B
Use the serverless on-demand API without dedicated endpoints
You pay only for actual usage, with no idle costs.
C
Provision a dedicated endpoint for low latency
Why wrong: Dedicated endpoints have hourly costs, even when idle.
D
Deploy the model on OCI Compute with autoscaling
Why wrong: Adds operational overhead and may not be as simple as the managed service.
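The trade-off between on-demand and dedicated hosting is simple arithmetic once you plug in your own numbers. The Python sketch below compares a pay-per-use cost with an always-on hourly cost; the unit price, hourly rate, and usage figure are hypothetical placeholders rather than OCI list prices, and real on-demand billing is metered in whatever usage unit the current price list specifies.

```python
def monthly_costs(units_per_month: float, price_per_unit: float,
                  dedicated_hourly_rate: float, hours_per_month: float = 730.0) -> dict:
    """Compare pay-per-use cost with an always-on hourly cost.

    All prices are hypothetical inputs; substitute figures from the current price list.
    """
    on_demand = units_per_month * price_per_unit
    dedicated = dedicated_hourly_rate * hours_per_month  # accrues even when idle
    return {
        "on_demand": on_demand,
        "dedicated": dedicated,
        # Usage level above which the always-on option becomes cheaper
        "break_even_units": dedicated / price_per_unit,
    }


if __name__ == "__main__":
    # Hypothetical sporadic chatbot: 20,000 billable units per month
    print(monthly_costs(20_000, price_per_unit=0.002, dedicated_hourly_rate=10.0))
```

For sporadic traffic, usage sits far below the break-even point, so the on-demand option wins; a dedicated endpoint only pays off once sustained volume approaches that threshold.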

Question 23 · easy · multiple choice · Deploying and Managing Generative AI on OCI

A user wants to access the OCI Generative AI service programmatically. Which credential method is recommended for use in a production application running on OCI Compute?

A
API signing keys
Why wrong: Keys can be compromised if stored in code.
B
Instance principal
Instance principals let the instance obtain short-lived credentials from the instance metadata service, so no long-lived keys need to be stored on the host.
C
User password and OCID
Why wrong: Passwords are not used for API authentication.
D
Resource principal
Why wrong: Resource principal is for OCI resources like Functions, not compute instances.

Question 24 · medium · multiple choice · Deploying and Managing Generative AI on OCI

A company has deployed a generative AI model endpoint on OCI. They want to monitor token usage and latency for cost optimization. Which OCI service should they use to collect these metrics?

A
OCI Monitoring
OCI Monitoring collects and visualizes metrics such as token count and latency.
B
OCI Events
Why wrong: Events trigger actions based on changes, not for continuous metric collection.
C
OCI Notifications
Why wrong: Notifications are for alerting, not metric collection.
D
OCI Logging
Why wrong: Logging captures logs, not metrics. Token usage and latency are metric data.