Question 2 of 500

Deploying and Managing Generative AI on OCI · Hard · Multiple Choice · Objective-mapped

Quick Answer

The answer is to increase the number of replicas to 3 and enable autoscaling based on CPU utilization. This resolves the high latency and timeouts because a single GPU replica cannot keep up with 50 concurrent inference requests: requests queue, average response time climbs to 8 seconds, and the slowest requests end in 504 errors. Horizontal scaling distributes the load across multiple replicas behind the same endpoint, reducing per-request wait time, while CPU-based autoscaling adjusts capacity during traffic spikes. On the Oracle Cloud Infrastructure Generative AI Professional 1Z0-1127 exam, this scenario tests your understanding that scaling out replicas, not scaling up GPU shapes, is the correct fix for throughput bottlenecks in model deployment endpoints. A common trap is assuming you need a larger GPU or faster storage when the logs show no errors. Remember the memory tip: "One replica, many requests? Scale out, not up; CPU autoscaling keeps timeouts in check."

1Z0-1127 Deploying and Managing Generative AI on OCI Practice Question

This 1Z0-1127 practice question tests your understanding of deploying and managing generative AI on OCI. Read the scenario carefully: the correct answer depends on the specific constraints and symptoms it describes, not on general recall alone. After answering, compare your reasoning against the explanation and wrong-answer breakdown below to see why each distractor is designed to mislead on exam day.

You are deploying a generative AI solution on OCI for a healthcare client that requires strict data residency (data must remain in the EU) and low-latency inference. The solution uses a fine-tuned LLM (7B parameters) stored in Object Storage in the Frankfurt region. You have set up an OCI Data Science model deployment endpoint with GPU shape VM.GPU.A10.1, using a single replica. During load testing with 50 concurrent users, you observe high latency (average 8 seconds per request) and occasional 504 gateway timeouts. The model deployment logs show no errors, and the model loads successfully. You have confirmed that the Object Storage bucket is in the same region and that the network latency between the client and the endpoint is minimal (under 5 ms). Which action should you take to reduce latency and eliminate timeouts?

A
Increase the model deployment endpoint timeout setting from 60 seconds to 300 seconds in the OCI console.
Why wrong: Option A is wrong because increasing the timeout only masks the symptom without addressing the root cause (insufficient capacity).
B
Upgrade the model deployment shape to VM.GPU.A100.4 and keep a single replica.
Why wrong: Option B is wrong because upgrading to a larger GPU (A100) increases compute power per request, but with only one replica, concurrency remains a bottleneck; scaling out is more effective for high concurrency.
C
Increase the number of replicas to 3 and enable autoscaling based on CPU utilization.
Why correct: Option C is correct because adding replicas to handle concurrent requests reduces queuing and improves throughput, and the endpoint load-balances requests across them, which avoids timeouts.
D
Move the model deployment to the US East (Ashburn) region to leverage lower-cost GPU capacity and reduce latency.
Why wrong: Option D is wrong because moving to a region outside the EU violates the data residency requirement and is likely to add latency.

Correct answer & explanation

✓

Increase the number of replicas to 3 and enable autoscaling based on CPU utilization.

Option C is correct because the high latency and 504 timeouts with 50 concurrent users indicate that a single GPU replica is overwhelmed by the request queue. Increasing replicas to 3 distributes the load across multiple instances behind the same endpoint, while autoscaling based on CPU utilization adds capacity during traffic spikes. This directly reduces per-request latency and addresses the timeouts without violating data residency requirements, because the deployment stays in the Frankfurt region.
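A rough back-of-the-envelope model illustrates why adding replicas shortens the queue. The sketch below is an assumption-laden simplification, not a description of how OCI schedules requests: it assumes each replica serves one request at a time, that the 50 load-test users send their next request as soon as the previous one returns, and that users are spread evenly across replicas. The function name and the 0.16-second service time are hypothetical; the service time is simply the value that reproduces the observed 8-second average under these assumptions.

```python
import math


def steady_state_latency(concurrent_users: int, service_time_s: float, replicas: int) -> float:
    """Approximate per-request latency for a saturated closed-loop load test.

    Assumptions (not from OCI documentation): one request at a time per
    replica, zero think time between requests, users spread evenly.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The busiest replica serves ceil(users / replicas) users in turn, so each
    # request waits for the others ahead of it plus its own service time.
    users_on_busiest_replica = math.ceil(concurrent_users / replicas)
    return users_on_busiest_replica * service_time_s


# Hypothetical service time chosen so that one replica matches the scenario.
ASSUMED_SERVICE_TIME_S = 0.16

one_replica = steady_state_latency(50, ASSUMED_SERVICE_TIME_S, replicas=1)
three_replicas = steady_state_latency(50, ASSUMED_SERVICE_TIME_S, replicas=3)
print(f"1 replica:  {one_replica:.2f} s per request")
print(f"3 replicas: {three_replicas:.2f} s per request")
```

Under these assumptions the estimate falls from about 8 seconds with one replica to about 2.7 seconds with three. The model ignores request-to-request variance, so it says nothing about how many requests still hit the timeout; it only shows the direction and rough size of the improvement.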

Key principle: Answer the scenario, not the keyword: identify the specific constraint before choosing the most familiar-sounding option.

Answer analysis

Option-by-option breakdown

For each option: why learners choose it and why it is or isn't the right answer here.

✗
Increase the model deployment endpoint timeout setting from 60 seconds to 300 seconds in the OCI console.
Why it's wrong here
Option A is wrong because increasing the timeout only masks the symptom without addressing the root cause (insufficient capacity).
✗
Upgrade the model deployment shape to VM.GPU.A100.4 and keep a single replica.
Why it's wrong here
Option B is wrong because upgrading to a larger GPU (A100) increases compute power per request, but with only one replica, concurrency remains a bottleneck; scaling out is more effective for high concurrency.
✓
Increase the number of replicas to 3 and enable autoscaling based on CPU utilization.
Why this is correct
Option C is correct because adding replicas to handle concurrent requests reduces queuing and improves throughput, and the endpoint load-balances requests across them, which avoids timeouts.
Related concept
Read the scenario before looking for a memorised answer.
✗
Move the model deployment to the US East (Ashburn) region to leverage lower-cost GPU capacity and reduce latency.
Why it's wrong here
Option D is wrong because moving to a region outside the EU violates the data residency requirement and is likely to add latency.

Common exam traps

Common exam trap: answer the scenario, not the keyword

The trap here is that candidates often mistake increasing the timeout (Option A) or upgrading the GPU size (Option B) for a fix to a concurrency problem, when in fact horizontal scaling via replicas is required to handle multiple simultaneous requests without violating data residency constraints.

Detailed technical explanation

How to think about this question

OCI Data Science model deployment endpoints use a load balancer to distribute requests across replicas, but with a single replica every request queues for the same GPU. With 50 concurrent users the queue builds up: average latency reaches 8 seconds, and the slowest requests exceed the 60-second timeout and return 504. Autoscaling based on CPU utilization triggers when the replicas' CPU usage exceeds a threshold (e.g., 70%), adding replicas to handle the load. The VM.GPU.A10.1 has limited GPU memory (24 GB) and compute, so multiple replicas are essential for concurrent inference at this load.
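The threshold behaviour described above can be sketched as a small decision function. This is an illustrative model only: `ThresholdPolicy` and `next_replica_count` are hypothetical names, not OCI SDK types, and every value other than the 70% scale-out example (the scale-in threshold, the maximum of 6 replicas, and the one-replica step) is an assumption made for the sake of the example.

```python
from dataclasses import dataclass


@dataclass
class ThresholdPolicy:
    """Hypothetical threshold-based autoscaling rule (not an OCI SDK type)."""
    scale_out_cpu_pct: float = 70.0  # example threshold mentioned in the text
    scale_in_cpu_pct: float = 30.0   # assumed value
    min_replicas: int = 3            # baseline replica count from the answer
    max_replicas: int = 6            # assumed ceiling


def next_replica_count(current: int, cpu_pct: float, policy: ThresholdPolicy) -> int:
    """Return the replica count after one evaluation of the rule."""
    if cpu_pct > policy.scale_out_cpu_pct:
        desired = current + 1  # add a replica when CPU is above the threshold
    elif cpu_pct < policy.scale_in_cpu_pct:
        desired = current - 1  # remove a replica when CPU is well below it
    else:
        desired = current      # inside the band: leave capacity unchanged
    # Never go below the baseline or above the configured ceiling.
    return max(policy.min_replicas, min(policy.max_replicas, desired))


policy = ThresholdPolicy()
print(next_replica_count(3, 85.0, policy))  # traffic spike
print(next_replica_count(4, 50.0, policy))  # steady load
print(next_replica_count(3, 10.0, policy))  # quiet period, held at the minimum
```

Keeping the minimum at 3 means the deployment never falls back to the single-replica configuration that caused the timeouts, while the ceiling caps cost during spikes.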

Key Concepts to Remember

Read the scenario before looking for a memorised answer.
Find the constraint that changes the correct option.
Eliminate answers that are true in general but not in this case.

Exam Day Tips

Watch for words such as best, first, most likely and least administrative effort.
Review why wrong options are wrong, not only why the correct option is correct.

Key takeaway

Answer the scenario, not the keyword: identify the specific constraint before choosing the most familiar-sounding option.

What to study next

Got this wrong? Here's your next step.

Identify which exam domain this question belongs to, review the core concept, then practise similar questions from the same domain.

FAQ

Questions learners often ask

What does this 1Z0-1127 question test?

This question tests Deploying and Managing Generative AI on OCI, in particular reading the scenario's constraints before reaching for a memorised answer.

What is the correct answer to this question?

The correct answer is Option C: increase the number of replicas to 3 and enable autoscaling based on CPU utilization. The high latency and 504 timeouts with 50 concurrent users indicate that a single GPU replica is overwhelmed by the request queue. Increasing replicas to 3 distributes the load across multiple instances behind the same endpoint, while autoscaling based on CPU utilization adds capacity during traffic spikes. This directly reduces per-request latency and addresses the timeouts without violating data residency requirements.

What should I do if I get this 1Z0-1127 question wrong?

Identify which exam domain this question belongs to, review the core concept, then practise similar questions from the same domain.

What is the key concept behind this question?

Read the scenario before looking for a memorised answer.

About these practice questions

Courseiva creates original exam-style practice questions with explanations and wrong-answer analysis. It does not publish real exam questions, exam dumps, or protected exam content.

This 1Z0-1127 practice question is part of Courseiva's free Oracle certification practice question bank. Courseiva provides original exam-style practice questions with explanations, topic-based practice, mock exams, readiness tracking, and study analytics to help learners prepare for the 1Z0-1127 exam.