1Z0-1127 Practice Test 2 — 25 Questions

Question 1

You are deploying a generative AI solution on OCI for a healthcare client that requires strict data residency (data must remain in the EU) and low-latency inference. The solution uses a fine-tuned LLM model (7B parameters) stored in Object Storage in the Frankfurt region. You have set up an OCI Data Science model deployment endpoint with GPU shape VM.GPU.A10.1, using a single replica. During load testing with 50 concurrent users, you observe high latency (average 8 seconds per request) and occasional 504 gateway timeouts. The model deployment logs show no errors, and the model loads successfully. You have confirmed that the Object Storage bucket is in the same region and that the network latency between the client and the endpoint is minimal (under 5 ms). Which action should you take to reduce latency and eliminate timeouts?

Accepted Answer

Increase the number of replicas to 3 and enable autoscaling based on CPU utilization. Option C is correct because high latency and 504 timeouts at 50 concurrent users, with clean logs and a successful model load, indicate that requests are queuing behind a single GPU replica. Increasing the replica count to 3 spreads the load across replicas behind the same endpoint, and autoscaling on CPU utilization adds capacity during traffic spikes. This reduces per-request latency and eliminates the timeouts without violating the data residency requirement, because the deployment stays in the Frankfurt region.
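
The queuing argument can be made concrete with a rough capacity estimate. The sketch below is a simplification: it assumes each replica serves one request at a time with no batching, and the 1.5-second service time is a hypothetical figure, not a measurement from the scenario.

```python
import math

def worst_case_wait_seconds(concurrent_users: int, replicas: int, service_time_s: float) -> float:
    """Time until the last request of a simultaneous burst finishes.

    Assumes one request at a time per replica and an even spread of
    requests across replicas.
    """
    requests_per_replica = math.ceil(concurrent_users / replicas)
    return requests_per_replica * service_time_s

# Hypothetical service time of 1.5 s per request on one GPU replica.
single = worst_case_wait_seconds(50, replicas=1, service_time_s=1.5)
triple = worst_case_wait_seconds(50, replicas=3, service_time_s=1.5)
print(single, triple)
```

Under these assumptions the last request in a 50-request burst finishes after 75 seconds on one replica and after 25.5 seconds on three, which illustrates why adding replicas addresses the timeouts while a longer timeout only hides the queue.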

Answer

Increase the model deployment endpoint timeout setting from 60 seconds to 300 seconds in the OCI console.

Answer

Upgrade the model deployment shape to VM.GPU.A100.4 and keep a single replica.

Answer

Move the model deployment to the US East (Ashburn) region to leverage lower-cost GPU capacity and reduce latency.

Question 2

A team is deploying a generative AI model using OCI Functions for serverless inference. They are experiencing cold start latency of over 10 seconds for the first invocation after idle periods. What is the best strategy to reduce cold start latency?

Accepted Answer

Use provisioned concurrency to keep a set number of function instances warm. Option B is correct because OCI Functions supports provisioned concurrency, which keeps a specified number of instances initialized so that the first request after an idle period skips the cold start. Option A (increasing memory) can shorten cold starts but does not eliminate them. Option C (reducing the timeout) does not speed up initialization and might cause invocations to fail. Option D (using OCI Data Flow) is wrong because Data Flow is for data processing, not inference.

Answer

Migrate the inference to OCI Data Flow for better performance.

Answer

Reduce the function timeout to force faster execution.

Answer

Increase the memory allocation for the function.

Question 3

A startup wants to minimize costs when using OCI Generative AI service for a chatbot application that experiences sporadic usage. Which deployment strategy is most cost-effective?

Accepted Answer

Use the serverless on-demand API without dedicated endpoints. Option B is correct because serverless on-demand pricing charges only for usage, which is ideal for sporadic workloads. Option A is wrong because dedicated endpoints incur hourly costs regardless of usage. Option C is wrong because choosing a pre-built model does not avoid that problem: the dedicated endpoint is still billed hourly and is not cost-effective for sporadic traffic. Option D is wrong because running models on OCI Compute adds management overhead and costs.
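
The trade-off between usage-based and hourly billing can be sketched as a break-even calculation. The prices below are hypothetical placeholders chosen for illustration, not OCI list prices, and the per-request billing unit is an assumption made to keep the arithmetic simple.

```python
def monthly_cost_on_demand(requests_per_month: int, price_per_request: float) -> float:
    """Usage-based cost: pay only for the requests actually made."""
    return requests_per_month * price_per_request

def monthly_cost_dedicated(hourly_rate: float, hours_per_month: int = 730) -> float:
    """Dedicated cost: billed for every hour, whether or not traffic arrives."""
    return hourly_rate * hours_per_month

def break_even_requests(hourly_rate: float, price_per_request: float, hours_per_month: int = 730) -> float:
    """Monthly request volume at which both options cost the same."""
    return monthly_cost_dedicated(hourly_rate, hours_per_month) / price_per_request

# Hypothetical prices: 10.00 per hour dedicated, 0.01 per on-demand request.
print(monthly_cost_dedicated(10.0))
print(monthly_cost_on_demand(20_000, 0.01))
print(break_even_requests(10.0, 0.01))
```

With these placeholder prices, a chatbot would need several hundred thousand requests per month before the dedicated option became cheaper, which is the reasoning behind preferring on-demand for sporadic usage.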

Answer

Use a pre-built model with a dedicated endpoint

Answer

Provision a dedicated endpoint for low latency

Answer

Deploy the model on OCI Compute with autoscaling

Question 4

A user wants to access the OCI Generative AI service programmatically. Which credential method is recommended for use in a production application running on OCI Compute?

Accepted Answer

Instance principal. Instance principal authentication is the recommended method for production applications running on OCI Compute because it allows the application to authenticate with OCI services without managing or embedding any credentials. The OCI Compute instance assumes a dynamic group and IAM policy that grants it permissions, and the SDK automatically handles token exchange via the instance metadata service, eliminating the need for long-lived secrets.
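
The dynamic group and policy described above are plain text statements. The following sketch only assembles those strings; the OCID, dynamic group name, and compartment name are placeholders, and the `use` verb with the `generative-ai-family` resource type is an assumption about the level of access the application needs.

```python
def matching_rule(compartment_ocid: str) -> str:
    """Dynamic group rule that matches every instance in one compartment."""
    return f"ALL {{instance.compartment.id = '{compartment_ocid}'}}"

def policy_statement(dynamic_group: str, compartment_name: str) -> str:
    """Policy that lets members of the dynamic group call Generative AI."""
    return (
        f"Allow dynamic-group {dynamic_group} "
        f"to use generative-ai-family in compartment {compartment_name}"
    )

# Placeholder identifiers for illustration only.
print(matching_rule("ocid1.compartment.oc1..example"))
print(policy_statement("genai-app-instances", "genai-prod"))
```

No key or password appears anywhere in this setup, which is the point of instance principals: the instance's identity is established by its membership in the dynamic group.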

Answer

API signing keys

Answer

User password and OCID

Answer

Resource principal

Question 5

A company has deployed a generative AI model endpoint on OCI. They want to monitor token usage and latency for cost optimization. Which OCI service should they use to collect these metrics?

Accepted Answer

OCI Monitoring. A is correct because OCI Monitoring is the native telemetry service that collects and stores metrics such as token usage (e.g., input/output token counts) and latency (e.g., model inference latency) from OCI Generative AI endpoints. These metrics are automatically emitted by the OCI Generative AI service and can be queried via the Monitoring API or visualized in the Console, enabling cost optimization by tracking consumption patterns.
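
Metrics in OCI Monitoring are queried with Monitoring Query Language (MQL) expressions. The sketch below only builds such an expression as a string; the metric name `InferenceLatency` and the dimension value are hypothetical placeholders, so check the service's metric reference for the real names before using it.

```python
def build_query(metric: str, interval: str, statistic: str, dimension: str, value: str) -> str:
    """Assemble an MQL expression of the form metric[interval]{dim = "value"}.stat()."""
    return f'{metric}[{interval}]{{{dimension} = "{value}"}}.{statistic}()'

# Hypothetical metric and dimension value, for illustration only.
print(build_query("InferenceLatency", "1m", "mean", "modelId", "example-model"))
```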

Answer

OCI Events

Answer

OCI Notifications

Answer

OCI Logging

Question 6

A company has multiple teams sharing an OCI Generative AI Dedicated AI Cluster. They need to ensure that each team can only access their own fine-tuned models and cannot see or invoke models from other teams. What is the best approach?

Accepted Answer

Use OCI compartments and IAM policies with resource-level permissions for models. OCI compartments and IAM policies with resource-level permissions allow you to grant granular access to specific models within a Dedicated AI Cluster. By placing each team's fine-tuned models in separate compartments and writing policies that restrict access to those compartments, you ensure teams can only see and invoke their own models. This approach leverages OCI's native identity and access management without requiring separate clusters or network-level isolation.
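
The compartment-per-team pattern can be sketched as one policy statement per team plus a check that mirrors the intended isolation. The group and compartment names are hypothetical, and the `use generative-ai-family` wording is an assumption about the access level each team needs.

```python
from typing import Dict, List

def team_policies(teams: Dict[str, str]) -> List[str]:
    """Build one policy per team; `teams` maps IAM group name to compartment name."""
    return [
        f"Allow group {group} to use generative-ai-family in compartment {compartment}"
        for group, compartment in sorted(teams.items())
    ]

def can_access(group: str, model_compartment: str, teams: Dict[str, str]) -> bool:
    """A group reaches a model only if the model lives in that group's compartment."""
    return teams.get(group) == model_compartment

# Hypothetical teams and compartments.
teams = {"team-a": "genai-team-a", "team-b": "genai-team-b"}
for statement in team_policies(teams):
    print(statement)
print(can_access("team-a", "genai-team-b", teams))
```

Because no statement grants team-a anything in team-b's compartment, the isolation follows from the absence of a grant rather than from an explicit deny.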

Answer

Train separate models for each team

Answer

Encrypt model artifacts with different keys for each team

Answer

Use network security lists to isolate traffic

Question 7

A developer wants to invoke an OCI Generative AI model from an application running on a compute instance in OCI. The instance is in a private subnet. What is the most secure method to access the model endpoint?

Accepted Answer

Use a Service Gateway to access the endpoint privately. A Service Gateway allows resources in a private subnet to access OCI services, including the Generative AI model endpoint, over the OCI private network without traversing the internet. This is the most secure method because traffic stays within the OCI backbone, avoiding exposure to public IPs and reducing the attack surface.

Answer

Use an Internet Gateway and public endpoint.

Answer

Use a VPN Connect to connect to the model's public IP.

Answer

Use a NAT Gateway to access the endpoint.

Question 8

A company is designing a generative AI solution on OCI that must comply with data privacy regulations. Which three best practices should they follow? (Choose three.)

Accepted Answer

Enable audit logging for all inference requests. Option A is correct because enabling audit logging for all inference requests is a fundamental data privacy best practice. It provides an immutable record of who accessed the generative AI service, which operation was called, and when, which is essential for compliance audits and for detecting unauthorized access. The OCI Audit service captures these events, giving traceability without storing the actual inference payloads.

Answer

Allocate a dedicated compartment for generative AI resources to apply specific IAM policies

Answer

Store all inference inputs and outputs in a public bucket for transparency

Question 9

Which OCI Generative AI service model family supports fine-tuning with custom datasets?

Accepted Answer

Cohere Command. Cohere Command is the model family within OCI Generative AI that supports fine-tuning with custom datasets, allowing users to adapt the model for domain-specific tasks like summarization or classification. In contrast, Cohere Embed is designed for generating text embeddings, Cohere Summarize is a specialized endpoint for summarization without fine-tuning support, and GPT-3 is not natively available in OCI Generative AI for fine-tuning.

Answer

Cohere Embed

Answer

Cohere Summarize

Answer

GPT-3

Question 10

A developer wants to integrate generative AI capabilities into an application using REST API calls. Which OCI Generative AI service endpoint should they use for text generation?

Accepted Answer

/completions. Option A is correct because the OCI Generative AI service exposes a REST API endpoint at `/completions` specifically for text generation tasks. This endpoint accepts a prompt and returns a generated text completion, aligning directly with the developer's requirement to integrate generative AI capabilities via REST API calls.

Answer

/models

Answer

/inference

Answer

/chat

Question 11

Refer to the exhibit. The dashboard shows latency grouped by modelId, but some points are missing for certain modelIds. Which of the following is the most likely reason?

Accepted Answer

The modelIds with missing data may have been deleted or are inactive. Option C is correct because in OCI's Generative AI service, model deployments are associated with specific modelIds. If a modelId is deleted or its deployment is deactivated, the corresponding telemetry data (e.g., latency metrics) will no longer be reported, causing gaps in the dashboard. The dashboard aggregates metrics only for active modelIds, so missing points indicate that those modelIds are no longer in service.

Answer

The metric name is misspelled

Answer

The aggregation interval is too short

Answer

The compartmentId is incorrect

Question 12

Which THREE factors should be considered when choosing between fine-tuning a model and using a pre-trained model with prompt engineering? (Select three.)

Accepted Answer

Size of available dataset. Option B is correct because the size of the available dataset is a critical factor: fine-tuning requires a sufficiently large, labeled dataset (typically thousands of examples) to adjust model weights effectively, while prompt engineering can work with zero or few examples. If the dataset is too small, fine-tuning risks overfitting and poor generalization, making prompt engineering the safer choice.

Answer

Required response time

Answer

Internet connectivity

Question 13

A company wants to deploy a custom generative AI model for generating synthetic data for training other models. The model requires approximately 20GB of memory and must be accessible via a REST API with authentication. Additionally, the team needs to monitor for data drift over time. Which combination of OCI services best meets these requirements with minimal operational overhead?

Accepted Answer

OCI Data Science Model Deployment with OCI Monitoring and OCI Logging. Option B is correct because OCI Data Science Model Deployment provides a managed environment for hosting custom generative AI models with REST API endpoints and built-in authentication via OCI IAM. It integrates natively with OCI Monitoring and OCI Logging to track data drift and operational metrics without requiring additional infrastructure setup, minimizing operational overhead.

Answer

OCI Compute with custom Docker container and Prometheus monitoring

Answer

OCI Functions with API Gateway for authentication

Answer

OCI Data Flow with OCI Data Catalog for model registry

Question 14

A company is deploying a generative AI model for a real-time inference API. To ensure high availability and cost efficiency under variable load, which two configurations should they implement? (Choose two.)

Accepted Answer

Set the number of model deployment replicas to at least 2. Option D is correct because deploying at least two replicas ensures high availability by eliminating a single point of failure; if one replica fails, the other can still serve inference requests. This is a standard best practice for production workloads on OCI, where model deployment replicas are distributed across fault domains to maintain service continuity.
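
The availability benefit of a second replica can be estimated with a short calculation. This sketch assumes replica failures are independent, which is an idealization, and the 99% per-replica availability is a hypothetical figure.

```python
def combined_availability(replica_availability: float, replicas: int) -> float:
    """Probability that at least one replica is up, assuming independent failures."""
    return 1.0 - (1.0 - replica_availability) ** replicas

# Hypothetical per-replica availability of 99%.
for n in (1, 2, 3):
    print(n, combined_availability(0.99, n))
```

Under these assumptions, going from one replica to two cuts the expected unavailability from about 1% to about 0.01%.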

Answer

Use a single replica with a larger GPU to handle all traffic

Answer

Deploy the model in a single availability domain to simplify management

Answer

Disable connection draining on the load balancer

Question 15

An OCI administrator is configuring access control for OCI Generative AI. Which three IAM components are required to allow a group of data scientists to call the GenerateText API? (Choose three.)

Accepted Answer

An IAM group for the data scientists. An IAM group is required to organize the data scientists into a logical set of principals. IAM policies are then attached to this group to grant permissions, ensuring only members of the group can call the GenerateText API. Without a group, you cannot apply a policy to a collection of users.

Answer

A local peering gateway

Answer

A dynamic group

Question 16

Your team has deployed a fine-tuned GPT-2 model on OCI Model Deployment for a simple text generation API. The model performs text completion for short prompts (e.g., 50 tokens). The endpoint is working but response times are over 10 seconds for these short prompts. The model size is approximately 500MB and you used a VM.Standard.E3.Flex shape (2 OCPU, 16GB RAM). The deployment is in a single replica with no autoscaling. You have verified that the network latency is minimal (<5ms). The model was trained in OCI Data Science using a GPU shape, but during deployment you selected a CPU shape to reduce cost. The model is a transformer-based neural network. You've also confirmed that the deployment is healthy and there are no errors in the logs. The memory usage is within limits. What is the most likely cause of the high latency?

Accepted Answer

Missing GPU acceleration for inference. Option D is correct because GPT-2 is a transformer-based neural network whose inference is dominated by matrix multiplications, which GPUs execute far more efficiently thanks to their parallel architecture. Even though the model is only about 500MB, a 2-OCPU CPU shape offers little parallelism for the attention and feed-forward layers and lacks the tensor cores that GPUs provide. With network latency, memory usage, and deployment health already ruled out, a response time of more than 10 seconds for a 50-token prompt points to missing GPU acceleration.

Answer

High network latency between the client and the model endpoint

Answer

Model is too large for the VM.Standard.E3.Flex shape

Answer

Insufficient CPU resources for the model size

Question 17

You manage a generative AI model deployed on OCI Model Deployment that serves a chatbot application. The model is a 13B parameter LLM on a VM.GPU.A100.1 shape. Recently, you rolled out a new version of the model that is supposed to improve response quality. However, after the update, the application starts returning HTTP 500 errors and memory usage spikes. You need to update to the new version without causing downtime. The current deployment has 2 replicas with autoscaling enabled. Which strategy should you use to safely deploy the new model version?

Accepted Answer

Create a second deployment with the new model, test it, then shift traffic using a load balancer. Option B is correct because it implements a blue/green deployment strategy: you create a second deployment with the new model, test it in isolation, and then shift traffic using a load balancer. This avoids downtime and allows you to validate the new model before exposing it to production traffic, which is critical given the observed HTTP 500 errors and memory spikes.
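
The traffic shift in a blue/green rollout is usually gradual, with a rollback path. The sketch below models only the decision logic; `error_rate_at` is a hypothetical callback standing in for whatever monitoring check is used, and how the percentages are actually applied (for example, load balancer backend weights) is outside its scope.

```python
from typing import Callable, List, Tuple

def shift_traffic(
    steps: List[int],
    error_rate_at: Callable[[int], float],
    max_error_rate: float = 0.01,
) -> Tuple[int, bool]:
    """Move traffic to the new (green) deployment in increasing steps.

    Returns (final_green_percent, completed). If the observed error rate
    exceeds the threshold at any step, all traffic returns to the old
    (blue) deployment and the rollout stops.
    """
    green = 0
    for percent in steps:
        green = percent
        if error_rate_at(percent) > max_error_rate:
            return 0, False  # roll back: blue serves 100% again
    return green, True

# Healthy rollout versus one that starts failing at 50% of traffic.
print(shift_traffic([10, 50, 100], lambda p: 0.0))
print(shift_traffic([10, 50, 100], lambda p: 0.2 if p >= 50 else 0.0))
```

Because the old deployment keeps running until the final step succeeds, a failed rollout costs only the requests served during the failing step.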

Answer

Directly update the existing model deployment with the new model artifact

Answer

Stop the existing deployment, update the model artifact, then start the deployment

Answer

Increase the number of replicas to 4, then update the model

Question 18

A company is deploying a generative AI model on OCI for an internal application that must comply with strict security policies. The model will be accessed by a limited group of users. Which three actions should the administrator take to ensure security? (Choose three.)

Accepted Answer

Deploy the model in a private VCN subnet. Deploying the model in a private VCN subnet ensures that the model endpoint is not exposed to the internet, which is a fundamental security requirement for compliance with strict security policies. By placing the model in a private subnet, all traffic must traverse through a bastion host, VPN, or FastConnect, providing network isolation and reducing the attack surface. This aligns with OCI's shared responsibility model where the customer controls network security.

Answer

Expose the model endpoint to the internet for ease of access

Answer

Disable audit logging to minimize storage costs

Question 19

A data scientist wants to quickly test a prompt with different parameters like temperature and max tokens without writing code. Which OCI GenAI feature should they use?

Accepted Answer

OCI Generative AI Playground. The OCI Generative AI Playground is a web-based, no-code interface that allows data scientists to interactively test prompts and adjust parameters like temperature and max tokens without writing any code. This directly matches the data scientist's requirement for quick, code-free experimentation.

Answer

OCI CLI.

Answer

OCI SDK.

Answer

OCI Data Science Notebooks.

Question 20

During fine-tuning of a large language model on OCI, you notice that the model's performance on the validation set is not improving after several epochs, but the training loss continues to decrease. What is the most likely cause?

Accepted Answer

The model is overfitting to the training data. When training loss decreases but validation performance stagnates or worsens, the model is overfitting: it memorizes the training examples but fails to generalize. A high learning rate would more likely cause divergence, not this pattern. A training set that is too small can contribute to overfitting but is not the direct symptom. An unrepresentative validation set could cause a mismatch, but the described pattern is classic overfitting.
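
This pattern can be detected mechanically from the loss curves, which is the idea behind early stopping. The sketch below is a minimal illustration; the `patience` parameter and the sample loss values are hypothetical and not tied to any OCI fine-tuning setting.

```python
from typing import List, Optional

def overfitting_epoch(train_loss: List[float], val_loss: List[float], patience: int = 2) -> Optional[int]:
    """Return the epoch index with the best validation loss once validation loss
    has failed to improve for `patience` consecutive epochs while training loss
    kept falling; return None if that never happens."""
    best_val = float("inf")
    best_epoch = 0
    stale = 0
    for epoch, (t, v) in enumerate(zip(train_loss, val_loss)):
        if v < best_val:
            best_val, best_epoch, stale = v, epoch, 0
        else:
            train_still_falling = epoch > 0 and t < train_loss[epoch - 1]
            stale = stale + 1 if train_still_falling else 0
        if stale >= patience:
            return best_epoch
    return None

# Hypothetical curves: training loss keeps falling, validation loss turns upward.
train = [1.0, 0.8, 0.6, 0.4, 0.3]
val = [1.1, 0.9, 0.85, 0.9, 0.95]
print(overfitting_epoch(train, val))
```

The returned index marks the checkpoint worth keeping, since later epochs only fit the training data more closely.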

Answer

The learning rate is too high.

Answer

The validation set is not representative.

Answer

The training data is too small.