CCNA Deploying and Managing Generative AI on OCI Questions

47 of 122 questions · Page 2/2 · Deploying and Managing Generative AI on OCI · Answers revealed

76
MCQmedium

Refer to the exhibit. An administrator receives the error shown when attempting to deploy a custom model. What is the most likely cause?

A.The user or service does not have permission to read the model artifact from Object Storage
B.The compartment ID is incorrect
C.The model artifact file is corrupted
D.The dedicated AI cluster ID is invalid
AnswerA

The 403 error indicates lack of IAM permissions to access the bucket.

Why this answer

The error indicates that the deployment process cannot access the model artifact stored in Object Storage. In OCI Generative AI, the service must have read permission on the bucket and object to download the artifact. If the user or service principal lacks the necessary IAM policy (e.g., `allow service generative-ai to read objects in compartment X where target.bucket.name='Y'`), the deployment fails with this access-denied error.

Exam trap

Oracle often tests the distinction between 'permission denied' and 'resource not found' errors; the trap here is that candidates may confuse a missing IAM policy with an incorrect compartment ID or a corrupted artifact, but the error message's reference to 'access' or 'permission' points directly to Object Storage read rights.

How to eliminate wrong answers

Option B is wrong because an incorrect compartment ID would produce a different error (e.g., 'compartment not found' or 'not authorized for compartment'), not a permission error on the artifact. Option C is wrong because a corrupted artifact would cause a validation or extraction failure during model loading, not an access-denied error at the storage retrieval stage. Option D is wrong because an invalid dedicated AI cluster ID would result in a cluster-not-found or capacity error, not a permission error on Object Storage.

77
MCQeasy

An administrator needs to ensure that only specific users in the finance department can invoke a generative AI model deployed on OCI. Which IAM policy should be used?

A.allow group admins to use generative-ai-model in compartment finance
B.allow group finance_group to manage generative-ai-model in compartment finance
C.allow group finance_group to use generative-ai-model in compartment finance
D.allow any-user to use generative-ai-model in compartment finance
AnswerC

This correctly restricts to the finance group.

Why this answer

Option C is correct because the 'use' verb in an OCI IAM policy grants the minimum required permissions to invoke a generative AI model without allowing management actions like creating or deleting models. The policy scopes access to the 'finance_group' group and the 'finance' compartment, ensuring only specific users in the finance department can invoke the model.

Exam trap

Oracle often tests the distinction between 'use' and 'manage' verbs, where candidates mistakenly choose 'manage' thinking it includes 'use', but 'manage' grants excessive permissions that violate least privilege requirements.

How to eliminate wrong answers

Option A is wrong because it grants access to the 'admins' group instead of the finance department group, and 'use' on the resource type 'generative-ai-model' is correct but the group is wrong. Option B is wrong because 'manage' provides excessive permissions (e.g., create, update, delete models) beyond the required invoke action, violating the principle of least privilege. Option D is wrong because 'any-user' allows all authenticated users in the tenancy to invoke the model, which does not restrict access to only finance department users.

78
MCQhard

A financial institution needs to deploy a fine-tuned model on OCI with strict data residency requirements. They must ensure that data used for inference never leaves a specific OCI region. The model is stored in Object Storage in the same region. What additional configuration is needed?

A.Configure the dedicated AI cluster to use a private endpoint and restrict access to the region
B.Use OCI Data Transfer service to move data
C.Set up a VPN connection to on-premises
D.Enable cross-region replication on the bucket
AnswerA

Private endpoints keep all traffic within the OCI network and the same region.

Why this answer

Option A is correct because configuring the dedicated AI cluster to use a private endpoint ensures that inference traffic stays within the OCI region and never traverses the public internet. This satisfies the strict data residency requirement by keeping all data and model inference within the designated region, while the model stored in Object Storage in the same region is accessed via the private endpoint without leaving the region.

Exam trap

The trap here is that candidates confuse data residency with data security, incorrectly assuming that a VPN (Option C) or cross-region replication (Option D) can enforce regional confinement, when in fact they either route data outside the region or actively replicate it across regions.

How to eliminate wrong answers

Option B is wrong because OCI Data Transfer Service is designed for offline bulk data migration (e.g., shipping physical drives) and does not address real-time inference data residency or network-level regional confinement. Option C is wrong because setting up a VPN connection to on-premises would route inference traffic outside the OCI region to an on-premises network, violating the data residency requirement that data never leave the specific region. Option D is wrong because enabling cross-region replication on the bucket would actively copy data to another region, directly contradicting the requirement that data never leave the original region.

79
MCQmedium

A company has deployed a model on a Dedicated AI Cluster and needs to monitor inference performance metrics such as request latency, throughput, and error rates. Which OCI service provides built-in monitoring dashboards for these metrics?

A.OCI Logging
B.OCI Notifications
C.OCI Monitoring
D.OCI Events
AnswerC

Monitoring provides dashboards for metrics like latency and throughput.

Why this answer

OCI Monitoring is the correct service because it provides built-in dashboards and metrics for inference performance, including request latency, throughput, and error rates, specifically for Dedicated AI Cluster deployments. These metrics are automatically collected and visualized in the OCI Monitoring console, allowing real-time tracking of model inference health without additional configuration.

Exam trap

Oracle often tests the distinction between monitoring (real-time metrics and dashboards) and logging (text-based event records), leading candidates to mistakenly choose OCI Logging for performance metrics when it is actually designed for troubleshooting and compliance, not live dashboarding.

How to eliminate wrong answers

Option A is wrong because OCI Logging is designed for collecting and storing log data (e.g., audit logs, custom logs) and does not offer built-in dashboards for real-time inference performance metrics like latency or throughput. Option B is wrong because OCI Notifications is a pub/sub messaging service for alerting and event distribution, not a monitoring dashboard for metrics. Option D is wrong because OCI Events triggers automated actions based on changes in OCI resources (e.g., state changes) but does not provide dashboards for continuous performance metrics.

80
MCQeasy

A developer wants to invoke an OCI Generative AI model from an application running on a compute instance in OCI. The instance is in a private subnet. What is the most secure method to access the model endpoint?

A.Use a Service Gateway to access the endpoint privately.
B.Use an Internet Gateway and public endpoint.
C.Use a VPN Connect to connect to the model's public IP.
D.Use a NAT Gateway to access the endpoint.
AnswerA

A Service Gateway enables private access to OCI services without traversing the internet.

Why this answer

A Service Gateway allows resources in a private subnet to access OCI services, including the Generative AI model endpoint, over the OCI private network without traversing the internet. This is the most secure method because traffic stays within the OCI backbone, avoiding exposure to public IPs and reducing the attack surface.

Exam trap

The trap here is that candidates may confuse a NAT Gateway with a Service Gateway, assuming that any gateway providing outbound access is sufficient, but only a Service Gateway offers private, secure access to OCI services without internet exposure.

How to eliminate wrong answers

Option B is wrong because using an Internet Gateway and public endpoint exposes the model endpoint to the public internet, increasing security risks and violating the requirement for a private subnet. Option C is wrong because VPN Connect is used to extend an on-premises network to OCI, not to access OCI service endpoints from within OCI; it would add unnecessary complexity and does not provide private access to the model endpoint. Option D is wrong because a NAT Gateway enables outbound internet access from a private subnet but does not provide private connectivity to OCI services; traffic would still leave the OCI network and return, which is less secure and not the intended use for accessing OCI service endpoints.

81
MCQmedium

A company is deploying a chatbot powered by OCI Generative AI. They want to inject the conversation history into the model prompt to maintain context. However, they notice that after a long conversation, the model starts to ignore earlier messages. What is the most likely cause?

A.The model's max_tokens limit is too low, truncating the prompt.
B.The model has a limited context window size.
C.The top_p parameter is set to 1, causing deterministic output.
D.The temperature setting is too high, causing randomness.
AnswerB

The context window determines how many input tokens the model can consider; exceeding it causes truncation.

Why this answer

The model's context window size limits the total number of tokens (input + output) it can process at once. When the conversation history grows beyond this limit, older messages are truncated or dropped, causing the model to lose context from earlier parts of the conversation. This is a fundamental constraint of transformer-based models like those used in OCI Generative AI.

Exam trap

Oracle often tests the distinction between input-side limits (context window) and output-side limits (max_tokens), so candidates mistakenly attribute context loss to max_tokens when the real issue is the fixed context window size.

How to eliminate wrong answers

Option A is wrong because max_tokens controls the maximum number of tokens in the generated response, not the input prompt; truncation of the prompt is caused by the context window limit, not max_tokens. Option C is wrong because top_p=1 means nucleus sampling considers all tokens with cumulative probability up to 1, which is the default and does not cause deterministic output; it does not affect context retention. Option D is wrong because temperature controls randomness in token selection, not the ability to retain conversation history; a high temperature increases diversity but does not cause earlier messages to be ignored.

82
MCQhard

Given the CLI output from `oci generative-ai model list`, what can be determined about the model 'my-fine-tuned-model'?

A.It was created by fine-tuning an existing base model
B.It is a pre-built model provided by OCI
C.It has been deployed to an endpoint
D.It is currently being trained
AnswerA

The base-model-id indicates it was fine-tuned from another model.

Why this answer

The CLI output from `oci generative-ai model list` includes a model named 'my-fine-tuned-model'. In OCI Generative AI, models listed with custom names that are not part of the base model catalog (e.g., cohere.command, meta.llama) indicate they were created by fine-tuning a base model using your own dataset. The presence of a custom name without a base model prefix confirms it is a fine-tuned model, not a pre-built one.

Exam trap

Oracle often tests the distinction between listing models and checking their lifecycle or deployment state, so candidates mistakenly assume a listed model is either deployed or still training, when in fact the `model list` command only confirms the model exists and is registered.

How to eliminate wrong answers

Option B is wrong because pre-built models in OCI Generative AI have names like 'cohere.command' or 'meta.llama-2-70b-chat', not custom names like 'my-fine-tuned-model'. Option C is wrong because the `model list` command only shows model metadata; deployment status requires a separate `oci generative-ai model get` or `oci generative-ai deployment list` command. Option D is wrong because the model list output does not indicate training status; training status is shown via `oci generative-ai model get` with a 'lifecycle-state' field (e.g., 'ACTIVE', 'CREATING'), and a listed model is typically already in an active state.

83
MCQhard

A company has multiple teams sharing an OCI Generative AI Dedicated AI Cluster. They need to ensure that each team can only access their own fine-tuned models and cannot see or invoke models from other teams. What is the best approach?

A.Use OCI compartments and IAM policies with resource-level permissions for models
B.Train separate models for each team
C.Encrypt model artifacts with different keys for each team
D.Use network security lists to isolate traffic
AnswerA

Compartments and IAM policies can restrict access to specific models.

Why this answer

OCI compartments and IAM policies with resource-level permissions allow you to grant granular access to specific models within a Dedicated AI Cluster. By placing each team's fine-tuned models in separate compartments and writing policies that restrict access to those compartments, you ensure teams can only see and invoke their own models. This approach leverages OCI's native identity and access management without requiring separate clusters or network-level isolation.

Exam trap

The trap here is that candidates often assume network-level isolation (security lists) or encryption keys are sufficient for multi-tenant model access control, but OCI requires IAM resource-level policies to enforce which principals can invoke specific models.

How to eliminate wrong answers

Option B is wrong because training separate models for each team does not address access control; it only creates more models without any mechanism to prevent cross-team visibility or invocation. Option C is wrong because encrypting model artifacts with different keys protects data at rest but does not control access at the API or invocation layer; teams could still see and invoke models if IAM permissions allow it. Option D is wrong because network security lists operate at the network layer and cannot distinguish between different models within the same Dedicated AI Cluster; they are designed for traffic filtering between subnets, not for model-level authorization.

84
MCQeasy

A company has deployed a fine-tuned GPT model on OCI Generative AI using a dedicated AI cluster with 2 nodes. The endpoint is used by an internal application that generates product descriptions. Recently, the application started receiving timeouts and slow responses. The monitoring dashboard shows that the cluster's CPU utilization is consistently above 90%, and the request queue is growing. The team has verified that the model and code have not changed. The application traffic has increased by 20% over the past month. What should the team do to resolve the issue?

A.Switch to a serverless endpoint to handle variable traffic.
B.Reduce the batch size in the inference requests to lower CPU usage.
C.Implement a caching layer for frequently requested descriptions.
D.Increase the number of nodes in the dedicated AI cluster from 2 to 4.
AnswerD

This directly adds compute capacity to handle the increased traffic.

Why this answer

Option D is correct because the dedicated AI cluster with 2 nodes is experiencing sustained CPU utilization above 90% and a growing request queue due to a 20% increase in traffic. Scaling out the cluster by adding more nodes (from 2 to 4) increases the available compute capacity, allowing the cluster to handle the higher inference load without timeouts. This directly addresses the resource bottleneck without requiring code or model changes.

Exam trap

The trap here is that candidates may confuse reducing batch size (which actually increases CPU overhead per request) with reducing load, or assume caching is a universal performance fix, when the real solution is to scale the dedicated cluster horizontally to match increased traffic.

How to eliminate wrong answers

Option A is wrong because switching to a serverless endpoint would not resolve the issue; serverless endpoints on OCI Generative AI still rely on underlying compute resources and may introduce cold-start latency, and the problem is a sustained increase in traffic that requires dedicated capacity, not variable traffic handling. Option B is wrong because reducing the batch size in inference requests would decrease throughput per request and increase the number of requests, potentially worsening CPU utilization and queue growth, not lowering it. Option C is wrong because implementing a caching layer for frequently requested descriptions would only help if identical requests are repeated, but the problem is a general increase in traffic volume and CPU saturation, not redundant requests; caching does not reduce the compute load for unique or varied product descriptions.

85
MCQmedium

A large enterprise is deploying a generative AI model for internal document summarization. The model is deployed on OCI Data Science using a custom container. The inference endpoint is behind a public load balancer. The security team requires that all traffic between the client and the endpoint be encrypted in transit and that the endpoint not be accessible from the public internet. The current setup uses a public load balancer with an SSL certificate. The VCN has a public subnet for the load balancer and a private subnet for the model deployment. The security team is concerned that the load balancer is publicly accessible. The enterprise wants to maintain high availability and low latency. What should the architect do to meet the security requirements?

A.Use a site-to-site VPN to connect clients to the VCN and access the endpoint via private IP.
B.Remove the load balancer and use a service gateway to access the model deployment directly from the VCN.
C.Keep the public load balancer but add a Web Application Firewall (WAF) to block unauthorized IPs.
D.Replace the public load balancer with a private load balancer in a private subnet, and attach an SSL certificate for encryption.
AnswerD

A private load balancer is not internet-facing, ensures encryption via SSL, and provides high availability.

Why this answer

Option D is correct because replacing the public load balancer with a private load balancer in a private subnet ensures the endpoint is not accessible from the public internet, while attaching an SSL certificate maintains encryption in transit. This satisfies both security requirements without sacrificing high availability or low latency, as the private load balancer still provides load balancing and TLS termination within the VCN.

Exam trap

The trap here is that candidates may think a WAF or VPN alone can satisfy both encryption and private access, but they overlook that the public load balancer itself remains a publicly routable endpoint, which directly violates the 'not accessible from the public internet' requirement.

How to eliminate wrong answers

Option A is wrong because a site-to-site VPN only encrypts traffic between the client site and the VCN, but the public load balancer remains publicly accessible, violating the requirement that the endpoint not be accessible from the public internet. Option B is wrong because removing the load balancer and using a service gateway would bypass load balancing, breaking high availability and low latency, and service gateways are used for outbound traffic to OCI services, not for inbound client access. Option C is wrong because keeping the public load balancer with a WAF does not remove public internet accessibility; WAF only filters traffic but does not make the endpoint private, so the security team's concern remains unaddressed.

86
Multi-Selecthard

Which THREE components are essential for a production-grade generative AI deployment on OCI? (Select THREE)

Select 3 answers
A.OCI Logging for audit
B.OCI Vault for secrets
C.OCI Data Flow for data processing
D.Dedicated AI cluster
E.OCI IAM policies for access control
AnswersA, D, E

Logging is critical for monitoring and compliance.

Why this answer

A is correct because OCI Logging provides centralized audit logging for all API calls and resource changes in the generative AI deployment. This is essential for compliance, security monitoring, and troubleshooting in a production environment, as it captures detailed logs of model invocations, data access, and configuration changes.

Exam trap

Oracle often tests the distinction between 'essential' components for deployment versus 'useful but optional' services, leading candidates to select OCI Vault or OCI Data Flow because they are commonly used in AI pipelines, but they are not mandatory for a production-grade deployment.

87
Multi-Selecthard

An OCI administrator is configuring access control for OCI Generative AI. Which three IAM components are required to allow a group of data scientists to call the GenerateText API? (Choose three.)

Select 3 answers
A.An IAM group for the data scientists
B.A local peering gateway
C.A policy granting ai-services-generative-ai-family in a compartment
D.A dynamic group
E.A compartment for the AI resources
AnswersA, C, E

The group is the subject of the policy.

Why this answer

An IAM group is required to organize the data scientists into a logical set of principals. IAM policies are then attached to this group to grant permissions, ensuring only members of the group can call the GenerateText API. Without a group, you cannot apply a policy to a collection of users.

Exam trap

The trap here is that candidates confuse dynamic groups (for resources) with IAM groups (for users), or mistakenly think a networking component like a local peering gateway is required for API access control.

88
MCQmedium

An organization wants to use OCI Generative AI to build a summarization tool but must ensure that all inference requests are logged for audit purposes. Which approach should they take?

A.Implement a custom proxy with logging
B.Enable OCI Audit service
C.Enable OCI Logging on the generative AI endpoint
D.Use OCI Vault to store logs
AnswerC

OCI Logging can capture detailed request and response data for audit.

Why this answer

Option C is correct because OCI Logging can be enabled directly on the Generative AI endpoint to capture all inference requests and responses as logs, which can then be used for audit purposes. This is the native, recommended approach for logging API calls without introducing additional infrastructure or complexity.

Exam trap

Oracle often tests the distinction between management-plane logging (OCI Audit) and data-plane logging (OCI Logging on the service endpoint), leading candidates to mistakenly choose OCI Audit for inference request auditing.

How to eliminate wrong answers

Option A is wrong because implementing a custom proxy with logging introduces unnecessary complexity, latency, and potential security gaps, and is not a native OCI solution for logging inference requests. Option B is wrong because the OCI Audit service captures only management-plane events (e.g., create, update, delete operations on resources), not data-plane events like individual inference API calls. Option D is wrong because OCI Vault is designed for storing secrets (e.g., API keys, passwords), not for storing logs; logs should be stored in OCI Logging or Object Storage.

89
MCQhard

A data scientist is fine-tuning a generative AI model on OCI Data Science using a custom container with GPU resources. The training job fails with an out-of-memory error despite the GPU instance having sufficient memory. The job works fine on a smaller dataset. What is the most likely cause?

A.The training script has a memory leak
B.The GPU instance is not supported by OCI Data Science
C.The model is not compatible with the PyTorch version
D.The batch size is too large for the GPU memory
AnswerD

Large batch size can cause OOM errors; reducing batch size resolves it.

Why this answer

The most likely cause is that the batch size is too large for the GPU memory. Even though the GPU instance has sufficient total memory, a batch size that exceeds the available GPU memory (after accounting for model parameters, gradients, and optimizer states) will trigger an out-of-memory (OOM) error. Reducing the batch size allows the model to fit within the GPU's memory limits, which explains why the job works on a smaller dataset but fails on a larger one.

Exam trap

Oracle often tests the misconception that 'sufficient instance memory' guarantees no OOM errors, ignoring that GPU memory is a separate, finite resource that must accommodate both the model and the batch data simultaneously.

How to eliminate wrong answers

Option A is wrong because a memory leak would cause gradual memory consumption over time, not a consistent OOM error that correlates with dataset size; the error occurs immediately with a larger dataset, not after prolonged execution. Option B is wrong because OCI Data Science supports a wide range of GPU instances (e.g., VM.GPU.A10.1, VM.GPU.A100.1), and if the instance were unsupported, the job would fail with a different error (e.g., 'unsupported instance shape') rather than an OOM error. Option C is wrong because model compatibility with PyTorch version would typically cause import or runtime errors (e.g., 'module not found' or 'operator not implemented'), not an OOM error; PyTorch version mismatches do not directly affect memory allocation.

90
MCQhard

During deployment of a generative AI model, the inference endpoint returns high latency and timeouts. The model is deployed on a dedicated AI cluster with multiple nodes. What is the most likely cause?

A.The inference request batch size is too small
B.The model is too large for the cluster memory
C.The cluster nodes are configured with insufficient parallelism or the model is not properly parallelized across nodes
D.The client-side network is slow
AnswerC

Correct: Without proper model parallelism, nodes may be underutilized leading to high per-request latency.

Why this answer

High latency and timeouts in a distributed AI inference deployment typically indicate that the model workload is not efficiently distributed across the cluster nodes. Option C is correct because insufficient parallelism—either due to misconfigured node resources (e.g., insufficient vCPUs, GPU cores, or memory bandwidth) or improper model sharding/parallelization—causes some nodes to become bottlenecks while others remain underutilized, leading to queuing delays and eventual timeouts.

Exam trap

Oracle often tests the misconception that high latency is always due to insufficient resources (e.g., memory or batch size), but the real trap here is that candidates overlook the critical role of parallelization configuration in distributed inference—assuming that simply adding more nodes automatically distributes the workload.

How to eliminate wrong answers

Option A is wrong because a batch size that is too small would actually reduce latency per request (though it might lower throughput), not cause high latency or timeouts; the issue here is overload, not underutilization. Option B is wrong because if the model were too large for the cluster memory, the deployment would fail to load or would crash immediately, not return high latency and timeouts during inference. Option D is wrong because client-side network slowness would manifest as high network round-trip time or packet loss, not as server-side timeouts from the inference endpoint; the problem is explicitly on the deployment side.

91
MCQeasy

Refer to the exhibit. A user receives this error when using the OCI CLI to chat with a model. What is the most likely cause?

A.The model is not deployed.
B.The model ID is incorrect.
C.The OCI CLI is not configured with the correct region.
D.The user does not have the required IAM policy to invoke the model.
AnswerD

Correct: The 'AuthorizationFailure' error indicates insufficient permissions.

Why this answer

The error occurs because the user lacks the necessary IAM policy to invoke the model. In OCI, even if the model is deployed and the CLI is correctly configured, the IAM policy must grant the user or group the 'inference' permission on the specific model or model family. Without this policy, the OCI CLI returns an authorization error when attempting to chat with the model.

Exam trap

The trap here is that candidates often assume the error is due to a misconfiguration (region or model ID) rather than a missing IAM policy, because the CLI error message may not explicitly say 'authorization' and instead show a generic 'service error'.

How to eliminate wrong answers

Option A is wrong because if the model were not deployed, the error would typically indicate that the model endpoint is unavailable or not found, not an authorization failure. Option B is wrong because an incorrect model ID would result in a 'model not found' or 'invalid parameter' error, not an authorization error. Option C is wrong because an incorrect region configuration would cause connectivity or endpoint resolution errors, such as 'region not found' or 'endpoint unreachable', not an IAM permission error.

92
MCQhard

Your company uses OCI Data Science for model development and deployment. You have a generative AI model that requires dynamic batching for efficient inference. You deployed the model using the OCI Model Deployment service with a custom inference script in a Docker container. However, you notice that the batch size is fixed at 1, leading to low throughput. The model can process multiple requests together efficiently. You want to implement dynamic batching to increase throughput without significantly increasing latency for individual requests. What is the best approach?

A.Modify the model deployment to use a larger GPU shape to handle larger batches
B.Enable the model deployment's built-in request batching feature
C.Use OCI Streaming service to buffer requests and then invoke the model in batches from a consumer
D.Implement a queuing mechanism in the inference script that collects incoming requests and processes them in batches
AnswerD

This is a common pattern for dynamic batching and can be done within the custom container.

Why this answer

Option D is correct because dynamic batching must be implemented at the application level within the custom inference script when using OCI Model Deployment. The service does not provide built-in request batching; instead, you need to collect incoming requests in a queue and process them together in a single forward pass, which maximizes GPU utilization while controlling latency via a timeout or max batch size.

Exam trap

The trap here is that candidates assume OCI Model Deployment has a built-in batching feature similar to some cloud ML services, but OCI requires you to implement batching logic yourself in the custom inference script.

How to eliminate wrong answers

Option A is wrong because simply using a larger GPU shape does not change the fact that the inference script processes one request at a time; throughput gains require batching logic, not just more compute. Option B is wrong because OCI Model Deployment does not have a built-in request batching feature; this is a common misconception—the service routes each request individually to the container. Option C is wrong because OCI Streaming is designed for asynchronous, durable message buffering and would introduce significant latency and complexity; it is not suitable for real-time inference where low latency is critical.

93
MCQmedium

A company has deployed a generative AI model endpoint on OCI. They want to monitor token usage and latency for cost optimization. Which OCI service should they use to collect these metrics?

A.OCI Monitoring
B.OCI Events
C.OCI Notifications
D.OCI Logging
AnswerA

OCI Monitoring collects and visualizes metrics such as token count and latency.

Why this answer

A is correct because OCI Monitoring is the native telemetry service that collects and stores metrics such as token usage (e.g., input/output token counts) and latency (e.g., model inference latency) from OCI Generative AI endpoints. These metrics are automatically emitted by the OCI Generative AI service and can be queried via the Monitoring API or visualized in the Console, enabling cost optimization by tracking consumption patterns.

Exam trap

The trap here is that candidates confuse OCI Logging (which collects unstructured logs) with OCI Monitoring (which collects structured metrics), leading them to select Logging for numeric performance data like token counts and latency.

How to eliminate wrong answers

Option B (OCI Events) is wrong because OCI Events is a notification service that triggers actions based on changes in OCI resources (e.g., state transitions), not a service for collecting time-series metrics like token usage or latency. Option C (OCI Notifications) is wrong because OCI Notifications is a pub/sub messaging service for distributing alerts and messages, not a metric collection or storage service. Option D (OCI Logging) is wrong because OCI Logging captures log data (e.g., text-based audit logs, error logs) from resources, not structured numeric metrics; metrics require OCI Monitoring's custom or predefined metric streams.

94
MCQhard

A multinational corporation uses OCI Generative AI to power a customer support chatbot. The chatbot uses a fine-tuned model deployed on a dedicated AI cluster in the us-ashburn-1 region. The application is used globally, and users in Europe are experiencing high latency (over 2 seconds) compared to users in North America (under 500 ms). The company has a requirement to keep all data within the US due to compliance, so they cannot deploy in Europe. The latency is not due to network bandwidth but due to the inference time. The monitoring shows that the cluster is at 80% utilization during peak hours. The team wants to reduce the latency for European users without violating data residency. What is the best course of action?

A.Optimize the model using techniques like quantization or pruning to reduce inference time.
B.Implement an edge caching layer in Europe to serve common queries.
C.Increase the number of nodes in the cluster to distribute the load.
D.Deploy an additional endpoint in a European region and use a global load balancer.
AnswerA

Model optimization directly reduces per-request latency without moving data.

Why this answer

Option A is correct because the latency issue is explicitly due to inference time, not network bandwidth or cluster utilization. Model optimization techniques like quantization (reducing precision of weights from FP32 to INT8) and pruning (removing redundant neurons) directly reduce the computational cost per inference, thereby lowering the response time without moving data or changing the deployment region. This approach satisfies the data residency constraint while addressing the root cause of high latency for European users.

Exam trap

The trap here is that candidates may confuse latency caused by inference time with latency caused by network distance or cluster load, leading them to choose scaling or caching solutions that do not address the fundamental computational bottleneck.

How to eliminate wrong answers

Option B is wrong because an edge caching layer in Europe would only serve cached responses for common queries; it does not reduce inference time for unique or dynamic queries, and caching introduces stale data risks for a customer support chatbot that may require real-time accuracy. Option C is wrong because increasing the number of nodes in the cluster addresses throughput (handling more concurrent requests) but does not reduce the per-request inference time; with 80% utilization, the cluster is not saturated, so adding nodes would not lower latency for individual inference calls. Option D is wrong because deploying an additional endpoint in a European region would violate the compliance requirement to keep all data within the US; even with a global load balancer, inference would still require data processing in Europe, which is not permitted.

95
MCQhard

A financial company deploys a generative AI model for document analysis. They need to ensure that the model does not expose sensitive information in its responses. Which OCI service should they use to implement content filtering?

A.OCI Data Safe
B.OCI Vault
C.OCI WAF
D.OCI AI Content Moderation
AnswerD

This service can filter sensitive content in model inputs and outputs.

Why this answer

OCI AI Content Moderation is the correct service because it provides pre-trained models and APIs specifically designed to detect and filter sensitive content such as personally identifiable information (PII), profanity, and other unsafe text in generative AI outputs. This allows the financial company to enforce content safety policies on document analysis responses, preventing exposure of sensitive information.

Exam trap

The trap here is that candidates often confuse security services like Data Safe or Vault with content moderation, assuming any 'security' service can filter AI outputs, but OCI AI Content Moderation is the only service purpose-built for analyzing and filtering the semantic content of text generated by AI models.

How to eliminate wrong answers

Option A is wrong because OCI Data Safe is a database security service focused on data masking, auditing, and user risk assessment for Oracle databases, not for filtering content generated by AI models. Option B is wrong because OCI Vault is a key management service for storing and managing encryption keys and secrets, not for content moderation or filtering of AI responses. Option C is wrong because OCI WAF (Web Application Firewall) protects web applications from common attacks like SQL injection and cross-site scripting at the HTTP/HTTPS layer, but it does not inspect or filter the semantic content of generative AI outputs.

96
MCQhard

A company is using OCI Generative AI service to power a customer support chatbot. They observe that the chatbot sometimes provides outdated information because the model was trained on data up to 2022. They want to incorporate real-time knowledge without retraining the model. Which approach should they use?

A.Increase the max-tokens parameter to allow longer responses.
B.Use prompt engineering to instruct the model to ignore old information.
C.Implement a Retrieval-Augmented Generation (RAG) pattern using OCI OpenSearch.
D.Fine-tune the model with recent data from 2023 onwards.
AnswerC

RAG retrieves relevant up-to-date documents and feeds them to the model, enabling current responses without retraining.

Why this answer

Option C is correct because Retrieval-Augmented Generation (RAG) allows the model to access real-time information from an external knowledge base, such as OCI OpenSearch, without retraining. This pattern retrieves relevant documents or data at inference time and injects them into the prompt, enabling the model to answer with up-to-date context. It directly addresses the need for real-time knowledge while keeping the base model static.

Exam trap

The trap here is that candidates often confuse prompt engineering (Option B) as a way to 'override' training data, but in reality, prompt instructions cannot erase the model's learned parameters, making RAG the only viable solution for real-time knowledge without retraining.

How to eliminate wrong answers

Option A is wrong because increasing max-tokens only extends the length of the response, not the recency or accuracy of the information; it does not provide any mechanism to incorporate new data. Option B is wrong because prompt engineering cannot force the model to 'ignore' outdated training data; the model's parametric knowledge is fixed and cannot be selectively suppressed by instructions alone, leading to hallucinations or contradictions. Option D is wrong because fine-tuning requires retraining the model on new data, which contradicts the requirement to avoid retraining and is also resource-intensive and time-consuming.

97
MCQeasy

An organization wants to fine-tune a large language model on OCI using their proprietary data. They are concerned about data privacy and want to ensure that fine-tuning data does not leave the OCI region. Which OCI service should they use to securely store and manage their training data?

A.OCI Block Volume
B.OCI File Storage
C.OCI Object Storage
D.Oracle Autonomous Database
AnswerC

Object Storage provides secure, regional storage ideal for large datasets.

Why this answer

C is correct because OCI Object Storage is a regional service that stores data within a specific OCI region, ensuring that fine-tuning data does not leave that region. It provides secure, durable, and scalable storage for large datasets, such as training data for LLMs, with encryption at rest and in transit, and supports direct integration with OCI Data Science and Generative AI services for fine-tuning workflows.

Exam trap

Oracle often tests the misconception that any storage service can be used for data residency, but the trap here is that Block Volume and File Storage are compute-attached services that do not inherently enforce regional data boundaries for data at rest across multiple services, while Object Storage is the only regional service designed for secure, scalable, and region-bound storage of unstructured data like LLM training datasets.

How to eliminate wrong answers

Option A is wrong because OCI Block Volume is a block-level storage service attached to compute instances, designed for low-latency, persistent storage for databases or applications, but it is not a regional service for storing and managing large training datasets; it is tied to a specific compute instance and does not inherently enforce regional data residency for data at rest across multiple services. Option B is wrong because OCI File Storage is a network file system (NFS) service for shared file access across compute instances, but it is not optimized for large-scale object storage of training data and does not provide the same regional data residency guarantees as Object Storage; it is typically used for shared file systems, not as a primary store for fine-tuning datasets. Option D is wrong because Oracle Autonomous Database is a managed database service for transactional and analytical workloads, not designed for storing large unstructured datasets like LLM training data; it is optimized for structured data and SQL queries, and using it for fine-tuning data would be inefficient and misaligned with the data storage requirements for generative AI training.

98
MCQhard

An enterprise with strict data residency requirements wants to use OCI Generative AI. They must ensure that no training data or inference data leaves a specific OCI region. Which configuration option should they choose?

A.Use a dedicated AI cluster in the desired region and disable cross-region access.
B.Configure a service gateway with a private endpoint.
C.Implement a policy restricting data transfer via OCI Identity and Access Management.
D.Use OCI Data Transfer Service to keep data within the region.
AnswerA

Dedicated clusters are region-specific and can be restricted to prevent cross-region data flow.

Why this answer

A dedicated AI cluster in the desired region, with cross-region access disabled, ensures that all compute, training data, and inference data remain physically within that OCI region. This satisfies strict data residency requirements because the cluster is isolated from other regions at the network and infrastructure level, preventing any data egress.

Exam trap

The trap here is that candidates confuse network-level controls (like service gateways or private endpoints) with data residency enforcement, but only a dedicated, region-locked compute cluster guarantees that no data leaves the specified region.

How to eliminate wrong answers

Option B is wrong because a service gateway with a private endpoint only provides private connectivity within a VCN and does not prevent data from being processed or stored in other regions; it does not enforce regional data residency. Option C is wrong because OCI IAM policies control user permissions and resource access, not the physical location or movement of data between regions. Option D is wrong because OCI Data Transfer Service is designed for offline bulk data migration and does not provide ongoing control over where inference or training data resides during active AI workloads.

99
Multi-Selectmedium

A DevOps engineer is setting up monitoring and logging for a generative AI inference endpoint. Which three resources should they enable? (Select THREE.)

Select 3 answers
A.OCI VCN flow logs for network traffic
B.OCI Logging for inference requests and responses
C.OCI Monitoring metrics for endpoint latency and error rates
D.OCI Application Performance Monitoring (APM) for tracing inference requests
E.OCI Audit logs for all API calls
AnswersB, C, D

Correct: Logging allows auditing and debugging of inference calls.

Why this answer

Option B is correct because OCI Logging captures detailed logs of inference requests and responses, which is essential for auditing, debugging, and analyzing the behavior of a generative AI endpoint. This service provides a centralized repository for log data, enabling DevOps engineers to track input prompts and model outputs for compliance and troubleshooting purposes.

Exam trap

The trap here is that candidates may confuse OCI Audit logs (which track administrative API calls) with OCI Logging (which captures data-plane request/response details), leading them to select Audit logs instead of Logging for monitoring inference payloads.

100
MCQeasy

Your organization uses OCI Data Science to train a generative AI model for code generation. After training, you want to deploy it as a REST API. You create a model deployment using the OCI console, but after 30 minutes the deployment status is still 'Creating'. You check the logs and see the message: 'Insufficient capacity for shape VM.GPU.A10.1 in availability domain AD-1'. The deployment is configured with a single replica. You have verified your tenancy has sufficient service limits for GPU instances. What should you do to resolve this issue quickly?

A.Change the deployment to use a different GPU shape, such as VM.GPU.A10.2
B.Delete the deployment and create it in a different region with more GPU capacity
C.Request a service limit increase for GPU shapes
D.Wait for 1 hour and check again; capacity may become available
AnswerA

A different GPU shape may have available capacity in the same availability domain.

Why this answer

Option A is correct because the error indicates that the specific GPU shape VM.GPU.A10.1 lacks capacity in the current availability domain. Switching to a different GPU shape, such as VM.GPU.A10.2, which uses a different instance configuration, can bypass the capacity constraint without requiring a region change or service limit increase. This is the fastest resolution because it directly addresses the availability domain capacity issue while keeping the deployment in the same region and AD.

Exam trap

The trap here is that candidates confuse service limits with capacity availability, assuming a limit increase will fix the issue, when in fact the error explicitly states 'Insufficient capacity' for the shape, not a limit breach.

How to eliminate wrong answers

Option B is wrong because deleting and recreating in a different region is an overreaction; the capacity issue is specific to the shape and AD, not the region, and moving regions introduces latency and complexity. Option C is wrong because the error is about capacity, not service limits; the user already verified sufficient service limits, so a limit increase would not resolve the immediate capacity shortage. Option D is wrong because waiting does not guarantee capacity will become available; the error indicates a persistent lack of capacity for that specific shape in that AD, and waiting could waste time without resolution.

101
MCQmedium

An administrator notices that a dedicated AI cluster is not scaling down after a period of low traffic. What could be the cause?

A.The cluster has a minimum size set to the current number of nodes
B.There are pending inference requests
C.The cluster is in a compartment without permissions
D.The autoscaling policy uses a cooldown period that is too short
AnswerA

A minimum size setting prevents scaling down below that threshold.

Why this answer

A dedicated AI cluster in OCI has a minimum size configuration that prevents the autoscaler from reducing the node count below that threshold. If the current number of nodes equals the configured minimum, the cluster will not scale down even during low traffic, as the autoscaler respects this lower bound. This ensures baseline capacity is always available for inference workloads.

Exam trap

Oracle often tests the misconception that autoscaling always scales down when traffic is low, without considering the minimum size constraint that overrides scaling policies.

How to eliminate wrong answers

Option B is wrong because pending inference requests would actually prevent scaling down, but the question states the cluster is not scaling down after a period of low traffic, implying no pending requests are present. Option C is wrong because compartment permissions affect resource access and management operations, not the autoscaling behavior of a cluster. Option D is wrong because a cooldown period that is too short would cause the cluster to scale down too aggressively or oscillate, not prevent scaling down entirely.

102
MCQeasy

A data scientist needs to generate vector embeddings for a large corpus of text documents to use in a semantic search application. Which OCI service is best suited for this task?

A.OCI Vision
B.OCI Speech
C.OCI Generative AI
D.OCI Language
AnswerC

OCI Generative AI offers embedding models (e.g., Cohere embed) specifically for text.

Why this answer

OCI Generative AI is the correct choice because it provides a managed service for generating vector embeddings from text using large language models (LLMs) like Cohere. This service is specifically designed for tasks such as semantic search, where embeddings capture the meaning of text to enable similarity comparisons. OCI Vision, Speech, and Language focus on other modalities (images, audio, and NLP tasks like sentiment analysis) and do not offer embedding generation for semantic search.

Exam trap

Oracle often tests the misconception that OCI Language can generate embeddings because it handles text, but OCI Language lacks an embedding API, while OCI Generative AI is the only service that provides this capability for semantic search.

How to eliminate wrong answers

Option A is wrong because OCI Vision is designed for image and video analysis (e.g., object detection, OCR), not for generating text embeddings. Option B is wrong because OCI Speech handles audio-to-text transcription and speaker diarization, not text embedding generation. Option D is wrong because OCI Language provides NLP features like sentiment analysis, entity extraction, and text classification, but it does not offer a dedicated embedding API for semantic search; that capability is exclusive to OCI Generative AI.

103
MCQeasy

A user wants to invoke an OCI Generative AI endpoint from a cloud function. What is the required authentication method?

A.API signing key
B.User name and password
C.Session token
D.OCI certificate
AnswerA

API signing key is required for OCI API authentication.

Why this answer

OCI Generative AI endpoints require API signing keys for authentication because they are REST APIs that use the Signature Version 1 algorithm (based on HMAC-SHA256) to sign requests. Cloud Functions must include a signed HTTP header using a user's or service principal's OCI API signing key pair (private key for signing, public key uploaded to OCI) to prove identity and authorization. This is the standard method for programmatic access to OCI services, including Generative AI, and is enforced by the OCI Identity and Access Management (IAM) policy layer.

Exam trap

Oracle often tests the misconception that OCI always uses session tokens or OAuth2 for service-to-service calls, but for Generative AI and most OCI REST APIs, the required method is API signing key authentication, not token-based or certificate-based methods.

How to eliminate wrong answers

Option B is wrong because username and password are used for interactive console login (OCI IAM user password authentication) and are not supported for programmatic API calls from cloud functions; they would expose credentials in code and violate OCI security best practices. Option C is wrong because a session token is a temporary credential obtained via federation or token exchange (e.g., from an identity provider) and is typically used for CLI or SDK sessions, not for direct REST API signing from a cloud function without a token exchange flow. Option D is wrong because OCI certificate authentication (mTLS) is used for specific services like API Gateway or load balancer mutual TLS, not for standard OCI REST API endpoints like Generative AI, which rely on API signing keys.

104
MCQmedium

An administrator runs the above CLI command to check the status of a dedicated AI cluster. The cluster is ACTIVE with capacity 10. However, a user reports that inference requests to this cluster are failing with a '429 Too Many Requests' error. What is the most likely cause?

A.The cluster is hitting the maximum inference requests per minute limit
B.The cluster does not have enough nodes to handle the load
C.The user is not in the same compartment as the cluster
D.The cluster is not in ACTIVE state
AnswerA

429 indicates rate limit; the cluster has a requests-per-minute limit separate from node count.

Why this answer

The '429 Too Many Requests' error is an HTTP status code indicating rate limiting has been exceeded. In OCI Generative AI, dedicated AI clusters have a configurable 'maximum inference requests per minute' limit. Even if the cluster is ACTIVE and has capacity (e.g., 10 nodes), hitting this per-minute request cap will cause the API gateway to reject further requests with a 429 error.

The administrator must increase the rate limit or implement client-side throttling to resolve this.

Exam trap

The trap here is that candidates confuse capacity (number of nodes) with rate limits, assuming a cluster with available compute resources cannot produce a 429 error, when in fact the 429 is tied to a separate API-level throttling mechanism.

How to eliminate wrong answers

Option B is wrong because a cluster with insufficient nodes would typically result in higher latency, timeouts, or '503 Service Unavailable' errors, not a '429 Too Many Requests' which is specifically a rate-limiting response. Option C is wrong because compartment mismatches cause '404 Not Found' or '403 Forbidden' errors, not a 429 status code. Option D is wrong because the cluster is explicitly stated as ACTIVE; an inactive cluster would return a '503 Service Unavailable' or '400 Bad Request' error, not a 429.

105
MCQhard

A company has deployed a generative AI model on OCI to generate product descriptions. After a recent update, the model started producing outputs with repetitive phrases and poor coherence. The inference endpoint is configured with default parameters. Which single parameter adjustment is most likely to improve output quality?

A.Increase the max-tokens parameter to 512
B.Increase the frequency penalty parameter to 0.5
C.Increase the temperature parameter to 1.5
D.Decrease the top-p parameter to 0.8
AnswerB

Frequency penalty reduces repeated tokens, directly improving repetitive output.

Why this answer

The correct answer is B because increasing the frequency penalty reduces the likelihood of the model repeating the same phrases, directly addressing the repetitive outputs. The frequency penalty subtracts a proportional penalty from tokens that have already appeared, discouraging repetition and improving coherence. Default parameters often have no frequency penalty (0.0), so a small positive value like 0.5 can significantly enhance output diversity.

Exam trap

The trap here is that candidates often confuse frequency penalty with temperature or top-p, assuming that increasing randomness (temperature) or narrowing token selection (top-p) will fix repetition, when in fact those parameters address different aspects of output diversity and coherence.

How to eliminate wrong answers

Option A is wrong because increasing max-tokens only extends the maximum length of the output, not the quality or repetition; it could even worsen the problem by allowing more repetitive text. Option C is wrong because increasing temperature to 1.5 makes the model more random and less focused, which typically reduces coherence and can increase nonsensical outputs. Option D is wrong because decreasing top-p to 0.8 narrows the sampling pool to the top 80% of probability mass, which may reduce diversity and potentially increase repetition rather than fix it.

106
MCQmedium

A company notices that some inference requests to their deployed model on OCI Generative AI take longer than acceptable. They want to reduce per-request latency. What should they do?

A.Reduce the maximum number of tokens generated
B.Enable request batching
C.Use a larger model to improve accuracy
D.Increase the number of replicas in the deployment
AnswerA

Lowering max tokens reduces the amount of computation per request, directly decreasing latency.

Why this answer

Reducing the maximum number of tokens generated directly decreases the amount of computation required per inference request because the model stops generating output earlier. Since latency is proportional to the number of output tokens produced, this is the most effective single change to reduce per-request response time in OCI Generative AI deployments.

Exam trap

Oracle often tests the distinction between latency (per-request speed) and throughput (requests per second), causing candidates to confuse batching or scaling replicas (which improve throughput) with reducing individual request latency.

How to eliminate wrong answers

Option B is wrong because request batching aggregates multiple inference requests into a single batch, which improves throughput (requests per second) but does not reduce the latency of any individual request; in fact, it can increase per-request latency due to queuing and waiting for batch completion. Option C is wrong because using a larger model increases the number of parameters and computational steps per token, which typically increases latency, not reduces it. Option D is wrong because increasing the number of replicas improves scalability and concurrency (handling more requests in parallel) but does not reduce the latency of a single inference request; each request still processes through the same model with the same token generation steps.

107
MCQmedium

A company is deploying a generative AI service on OCI using the OCI Data Science service with a large language model (LLM) in a VCN. The model inference endpoint must be accessible only from a private subnet within the same VCN. Which networking component should be configured to enable this?

A.NAT Gateway
B.Dynamic Routing Gateway (DRG)
C.Internet Gateway
D.Service Gateway
AnswerD

Service gateway enables private subnet access to OCI services like Data Science.

Why this answer

A Service Gateway enables private subnet resources to access OCI services (including the OCI Data Science model deployment endpoint) without traversing the internet. Since the inference endpoint must be accessible only from a private subnet within the same VCN, the Service Gateway provides the necessary private connectivity by routing traffic over the OCI network fabric, not through a NAT or internet gateway.

Exam trap

The trap here is that candidates often confuse a Service Gateway with a NAT Gateway, assuming both provide outbound-only access, but the Service Gateway is specifically designed for private access to OCI services, not general internet egress.

How to eliminate wrong answers

Option A is wrong because a NAT Gateway allows outbound internet access from a private subnet but does not provide private connectivity to OCI services; it would expose traffic to the internet. Option B is wrong because a Dynamic Routing Gateway (DRG) is used for connecting a VCN to on-premises networks or other VCNs via VPN or FastConnect, not for accessing OCI services privately within the same VCN. Option C is wrong because an Internet Gateway provides bidirectional internet access, which would make the endpoint publicly accessible, violating the requirement of private subnet-only access.

108
MCQmedium

An organization needs to ensure that all inference requests to OCI Generative AI are logged for compliance. Which OCI feature should be enabled?

A.OCI Cloud Guard
B.OCI Logging for the AI service
C.OCI Vault
D.OCI Audit logs
AnswerB

OCI Logging enables detailed logging of inference requests and responses for compliance.

Why this answer

Option B is correct because OCI Logging for the AI service captures detailed request and response data for inference calls to OCI Generative AI, including payloads, timestamps, and user identities. This feature must be explicitly enabled per service endpoint to meet compliance requirements for logging all inference requests. Unlike Audit logs, which record control-plane operations, OCI Logging provides data-plane logging for the AI service itself.

Exam trap

Oracle often tests the distinction between control-plane logging (Audit logs) and data-plane logging (service-specific Logging), leading candidates to mistakenly choose Audit logs for operational request tracking.

How to eliminate wrong answers

Option A is wrong because OCI Cloud Guard is a security posture management service that detects misconfigurations and threats, but it does not log individual inference requests to Generative AI. Option C is wrong because OCI Vault manages encryption keys and secrets, not request logging for AI services. Option D is wrong because OCI Audit logs capture only control-plane API calls (e.g., creating or deleting resources), not data-plane inference requests to the Generative AI service.

109
MCQhard

A team uses OCI Generative AI’s fine-tuning capability to adapt a base model. After fine-tuning, they evaluate the model but see degraded performance on certain edge cases. What is the most likely cause?

A.Overfitting on the training data
B.Validation data leakage
C.Learning rate too high
D.Insufficient training epochs
AnswerA

Overfitting leads to poor generalization, especially on edge cases not seen during training.

Why this answer

Fine-tuning adapts a base model to a specific dataset, but if the training data is too narrow or the model is trained for too many epochs, it can memorize the training examples rather than learning generalizable patterns. This overfitting causes the model to perform well on training-like inputs but poorly on edge cases that deviate from the training distribution. In OCI Generative AI, overfitting is a common pitfall when fine-tuning hyperparameters like the number of epochs or learning rate are not properly validated.

Exam trap

Oracle often tests the distinction between overfitting and underfitting by presenting a scenario where performance is good on training data but poor on unseen data, leading candidates to incorrectly blame a high learning rate or insufficient epochs.

How to eliminate wrong answers

Option B is wrong because validation data leakage would cause artificially high performance on validation metrics, not degraded performance on edge cases; leakage means the model has seen the test data during training, which would inflate scores rather than cause failures. Option C is wrong because a learning rate that is too high typically causes training instability, divergence, or failure to converge, not selective degradation on edge cases after successful fine-tuning. Option D is wrong because insufficient training epochs would result in underfitting, where the model fails to learn even the main training patterns, leading to poor performance across all cases, not just edge cases.

110
Multi-Selectmedium

Which TWO factors should be considered when selecting a base model for fine-tuning on OCI Generative AI service?

Select 2 answers
A.The model's training dataset size
B.The model's size and number of parameters
C.The model's license and terms of use
D.The model's training framework (PyTorch vs TensorFlow)
E.The model's built-in features like content filtering
AnswersB, C

Larger models consume more resources and cost more to serve.

Why this answer

When selecting a base model for fine-tuning on OCI Generative AI service, the model's size and number of parameters (B) directly impact computational cost, training time, and the model's capacity to learn from your dataset. The model's license and terms of use (C) are critical because commercial use, redistribution, and fine-tuning rights vary per model (e.g., Llama 2 vs. GPT-based models), and violating these can lead to legal or compliance issues.

Exam trap

Oracle often tests the misconception that technical details like training framework or dataset size are relevant, when in fact the exam focuses on operational and legal factors (size/license) that directly affect deployment and compliance in OCI's managed service.

111
MCQmedium

Refer to the exhibit. A team created this dedicated AI cluster. However, when they try to create a model deployment, the deployment fails with an error indicating insufficient public IPs. What change to the cluster configuration should they make?

A.Change assignPublicIp to true.
B.Increase the nodeCount to 8.
C.Attach a different subnet that has more available public IPs.
D.Change the AI cluster shape to VM.GPU.A10.2.
AnswerA

Correct: Enabling public IPs allows nodes to have public endpoints.

Why this answer

The error indicates insufficient public IPs because the cluster's subnet does not have enough available public IP addresses. Setting `assignPublicIp` to `true` in the cluster configuration allows the cluster to automatically allocate public IPs from the subnet's pool, resolving the shortage. This is required for model deployments that need public endpoints.

Exam trap

The trap here is that candidates might think the issue is a subnet IP shortage (Option C) or a scaling problem (Option B), when the real cause is a misconfigured public IP assignment flag that prevents the cluster from using available IPs.

How to eliminate wrong answers

Option B is wrong because increasing the nodeCount to 8 would require even more public IPs, exacerbating the shortage rather than fixing it. Option C is wrong because attaching a different subnet with more public IPs is a workaround, but the root cause is that the cluster is not configured to assign public IPs; changing the subnet does not enable the assignment. Option D is wrong because changing the AI cluster shape to VM.GPU.A10.2 does not affect public IP allocation; it only changes the GPU type and compute capacity.

112
MCQhard

An organization is fine-tuning a large language model on OCI Data Science. They must ensure that the training data remains within a specific geographic region and is encrypted at rest. Which combination of resources should they use?

A.OCI Object Storage bucket with a bucket policy and default encryption, created in the required region.
B.OCI Database with Transparent Data Encryption, storing the training data in tables.
C.OCI File Storage with export options and encryption, mounted to the Data Science session.
D.OCI Block Volume with encryption, attached to the Data Science notebook session.
AnswerA

Bucket policy controls access, encryption secures data at rest, and region selection ensures data residency.

Why this answer

Option A is correct because OCI Object Storage with default encryption ensures data is encrypted at rest using AES-256, and a bucket policy can enforce that data remains within a specific geographic region by restricting cross-region replication or access. This combination directly meets the requirements of regional data residency and encryption at rest for training data used in OCI Data Science.

Exam trap

The trap here is that candidates may confuse encryption at rest with data residency enforcement, assuming any encrypted storage (like Block Volume or File Storage) automatically guarantees geographic containment, but only Object Storage provides bucket-level policies to explicitly restrict data movement across regions.

How to eliminate wrong answers

Option B is wrong because OCI Database with Transparent Data Encryption is designed for transactional workloads, not for storing large-scale training data for LLM fine-tuning, and it does not inherently enforce geographic region constraints on the data. Option C is wrong because OCI File Storage with export options and encryption can be mounted to a Data Science session, but it does not provide native mechanisms to enforce regional data residency; the data could be replicated or accessed across regions. Option D is wrong because OCI Block Volume with encryption attached to a notebook session encrypts data at rest, but it does not offer policy controls to ensure the data remains within a specific geographic region, as block volumes are tied to the compute instance's availability domain, not the broader region.

113
MCQeasy

An administrator needs to grant a data science team access to create and manage generative AI model endpoints in a specific compartment. Which policy should they create?

A.Allow group DataScientists to manage all-resources in compartment Production
B.Allow group DataScientists to use generative-ai-model-family in compartment Production
C.Allow group DataScientists to read generative-ai-model-family in compartment Production
D.Allow group DataScientists to manage generative-ai-model-family in compartment Production
AnswerD

This policy grants the required permissions.

Why this answer

Option D is correct because the verb 'manage' grants full CRUD (Create, Read, Update, Delete) permissions on the 'generative-ai-model-family' resource type, which is the specific resource family for generative AI model endpoints in OCI. This allows the DataScientists group to create and manage endpoints within the specified compartment without granting broader access to all resources.

Exam trap

Oracle often tests the distinction between 'use' and 'manage' verbs, where candidates mistakenly choose 'use' thinking it covers creation, but 'use' only allows invocation and access, not resource lifecycle management.

How to eliminate wrong answers

Option A is wrong because 'manage all-resources' grants excessive permissions beyond what is needed, including access to unrelated services like compute or storage, violating the principle of least privilege. Option B is wrong because 'use' only allows actions like invoking or accessing the resource, but does not permit creating, updating, or deleting model endpoints. Option C is wrong because 'read' only allows viewing or listing resources, with no ability to create or manage endpoints.

114
Multi-Selecthard

Which THREE of the following are best practices when deploying a generative AI model on OCI?

Select 3 answers
A.Store API keys in the model endpoint configuration.
B.Set up autoscaling for the endpoint.
C.Disable logging to save costs.
D.Use a dedicated AI cluster for production endpoints.
E.Enable content filtering on the endpoint.
AnswersB, D, E

Autoscaling handles variable load efficiently.

Why this answer

Option B is correct because autoscaling ensures that the generative AI endpoint can dynamically adjust compute resources based on real-time inference traffic, maintaining low latency and high availability while optimizing cost. On OCI, autoscaling policies can be configured for dedicated AI clusters to scale the number of model serving replicas in response to metrics like CPU utilization or request queue depth.

Exam trap

Oracle often tests the misconception that disabling logging is a valid cost-saving measure, but in reality, logging is essential for operational visibility and compliance, and costs can be managed through sampling or retention policies rather than outright disabling.

115
MCQeasy

A data scientist fine-tunes a model using OCI Data Science and wants to deploy it as a managed endpoint in OCI Generative AI. What must they do first?

A.Upload model artifacts to Object Storage and register in Model Catalog
B.Write a custom container
C.Create a dedicated AI cluster
D.Use OCI CLI to create an endpoint
AnswerA

This is the required first step to deploy a custom model.

Why this answer

To deploy a fine-tuned model as a managed endpoint in OCI Generative AI, the model artifacts must first be uploaded to Object Storage and registered in the Model Catalog. This is a prerequisite because OCI Generative AI endpoints pull model artifacts from the Model Catalog, which references the storage location. Without registration, the service cannot locate or serve the model.

Exam trap

The trap here is that candidates assume they can directly create an endpoint using CLI or SDK without first registering the model in the Model Catalog, overlooking the mandatory registration step that links the artifacts to the serving infrastructure.

How to eliminate wrong answers

Option B is wrong because custom containers are not required for managed endpoints in OCI Generative AI; the service provides built-in serving infrastructure for supported model formats. Option C is wrong because a dedicated AI cluster is used for training or batch inference, not for deploying a managed endpoint, which uses OCI's shared serving infrastructure. Option D is wrong because using OCI CLI to create an endpoint is a valid method, but it cannot succeed until the model is registered in the Model Catalog; the CLI command requires a model OCID from the catalog.

116
Multi-Selectmedium

Which TWO actions are recommended best practices for managing costs when using OCI Generative AI dedicated AI clusters?

Select 2 answers
A.Provision a fixed number of nodes to handle peak load
B.Use preemptible instances for non-critical inference workloads
C.Use autoscaling to adjust nodes based on demand
D.Stop the dedicated AI cluster when not in use
E.Use pay-as-you-go billing instead of preemptible instances
AnswersB, C

Preemptible instances are cheaper and suitable for fault-tolerant tasks.

Why this answer

Option B is correct because preemptible instances in OCI are significantly cheaper than standard instances and are ideal for non-critical inference workloads that can tolerate interruptions. This aligns with cost optimization best practices by allowing you to use spare compute capacity at a reduced rate for tasks that do not require continuous availability.

Exam trap

The trap here is that candidates may think stopping a dedicated AI cluster is a valid cost-saving action, but OCI dedicated AI clusters do not support a 'stop' state—you must terminate the cluster, which loses all configuration and data, making it impractical for intermittent use.

117
MCQmedium

A data science team at a healthcare company has fine-tuned a Llama 2 model using OCI Data Science and registered it in the Model Catalog. They want to deploy it as a managed endpoint using OCI Generative AI. The model requires 64 GB of GPU memory. The team has created a dedicated AI cluster with a single node shape that has 48 GB GPU memory. When they attempt to deploy the model, the deployment fails with an error indicating insufficient resources. The team has verified that the model artifact is correct and that the compartment policies allow deployment. What should the team do to successfully deploy the model?

A.Increase the number of nodes in the cluster to 2.
B.Enable model parallelism to split the model across nodes.
C.Select a node shape with higher GPU memory, such as 80 GB.
D.Reduce the model's precision from FP16 to INT8 to lower memory usage.
AnswerC

Using a node shape with sufficient memory allows the model to be loaded.

Why this answer

Option C is correct because the model requires 64 GB of GPU memory, but the dedicated AI cluster uses a node shape with only 48 GB. The only way to satisfy the memory requirement is to select a node shape with higher GPU memory, such as 80 GB, as OCI Generative AI managed endpoints require a single node to host the entire model. Increasing nodes or enabling model parallelism does not help because OCI Generative AI does not support distributed inference across nodes for managed endpoints, and reducing precision may not guarantee the model fits or may degrade accuracy.

Exam trap

The trap here is that candidates may think adding more nodes or enabling model parallelism can aggregate GPU memory, but OCI Generative AI managed endpoints do not support distributed inference across nodes, so the only valid solution is to use a node shape with sufficient single-GPU memory.

How to eliminate wrong answers

Option A is wrong because increasing the number of nodes to 2 does not solve the memory issue; OCI Generative AI managed endpoints deploy the model on a single node, and additional nodes are not used to aggregate GPU memory for inference. Option B is wrong because model parallelism is not supported for managed endpoints in OCI Generative AI; the service expects the entire model to fit on one node's GPU memory. Option D is wrong because reducing precision from FP16 to INT8 may lower memory usage, but it is not a guaranteed fix and could introduce accuracy loss; moreover, the question states the model requires 64 GB of GPU memory, and the team should first ensure the hardware meets the requirement rather than altering the model.

118
MCQmedium

You manage a generative AI model deployed on OCI Model Deployment that serves a chatbot application. The model is a 13B parameter LLM on a VM.GPU.A100.1 shape. Recently, you rolled out a new version of the model that is supposed to improve response quality. However, after the update, the application starts returning HTTP 500 errors and memory usage spikes. You need to update to the new version without causing downtime. The current deployment has 2 replicas with autoscaling enabled. Which strategy should you use to safely deploy the new model version?

A.Directly update the existing model deployment with the new model artifact
B.Create a second deployment with the new model, test it, then shift traffic using a load balancer
C.Stop the existing deployment, update the model artifact, then start the deployment
D.Increase the number of replicas to 4, then update the model
AnswerB

Blue-green deployment ensures no downtime and safe rollout.

Why this answer

Option B is correct because it implements a blue/green deployment strategy: you create a second deployment with the new model, test it in isolation, and then shift traffic using a load balancer. This avoids downtime and allows you to validate the new model before exposing it to production traffic, which is critical given the observed HTTP 500 errors and memory spikes.

Exam trap

The trap here is that candidates may assume increasing replicas provides safety through redundancy, but it does not prevent the new model from causing errors on all replicas; the key is isolation via a separate deployment and traffic shifting.

How to eliminate wrong answers

Option A is wrong because directly updating the existing model deployment with the new artifact would cause in-place changes, potentially triggering the memory spike and HTTP 500 errors on the live replicas, leading to downtime. Option C is wrong because stopping the existing deployment before updating causes complete downtime, violating the requirement to update without downtime. Option D is wrong because increasing replicas to 4 and then updating still performs an in-place update on all replicas, which does not isolate the faulty model and can still cause errors and memory spikes across the entire fleet.

119
MCQhard

A data scientist in group DataScientists uses the OCI Generative AI SDK to start a fine-tuning job in compartment AIResources. They receive the error shown. What is the most likely cause?

A.The compartment AIResources does not exist.
B.The fine-tuning API is not yet available in that region.
C.The fine-tuning job requires additional IAM policies for accessing the training data in Object Storage.
D.The data scientist is not in the DataScientists group.
AnswerC

The policy must also grant permissions on Object Storage buckets containing the training data.

Why this answer

Option C is correct because the error message indicates a permissions issue related to accessing training data in Object Storage. When using the OCI Generative AI SDK to start a fine-tuning job, the data scientist's IAM policies must explicitly grant read access to the bucket and objects containing the training data. Without these policies, the API call fails even if the user is in the correct group and the compartment exists.

Exam trap

The trap here is that candidates assume the error is about group membership or compartment existence, when in fact the fine-tuning job's dependency on Object Storage permissions is a classic oversight in OCI IAM policy configuration.

How to eliminate wrong answers

Option A is wrong because if the compartment AIResources did not exist, the error would be a '404 Not Found' or 'CompartmentNotFound' error, not a permissions-related error. Option B is wrong because the fine-tuning API is available in all OCI regions where Generative AI is supported; region unavailability would produce a 'ServiceNotSupported' or 'RegionNotSupported' error. Option D is wrong because the user is explicitly stated to be in the DataScientists group, and group membership alone does not grant access to Object Storage; IAM policies must be attached to the group or compartment to allow read access to training data.

120
MCQmedium

An administrator created the above IAM policies. A member of the GenerativeAIAdmins group reports they cannot invoke the model endpoint. Which permission is missing?

A.Permission to access the compartment
B.Permission to manage generative-ai-model
C.Permission to use or manage generative-ai-endpoint
D.Permission to read the model's training data
AnswerC

Only inspect is granted; need use or manage to invoke.

Why this answer

The error occurs because the IAM policy grants permissions for 'generative-ai-model' but not for 'generative-ai-endpoint'. Invoking a model endpoint requires the 'use' or 'manage' permission on the 'generative-ai-endpoint' resource type, as the endpoint is the runtime interface that handles inference requests. Without this permission, the API call to the endpoint is denied, even if the user has access to the underlying model.

Exam trap

The trap here is that candidates confuse the 'generative-ai-model' resource type (used for model lifecycle management) with the 'generative-ai-endpoint' resource type (required for runtime inference), leading them to select Option B instead of C.

How to eliminate wrong answers

Option A is wrong because compartment access is typically granted via a separate policy statement (e.g., 'Allow group to read compartments') and is not the specific missing permission for invoking an endpoint; the error is about resource-type permissions, not compartment-level access. Option B is wrong because 'manage generative-ai-model' allows management of the model resource (e.g., creating, updating, deleting models) but does not grant the runtime permission needed to invoke the endpoint for inference. Option D is wrong because reading the model's training data is a data-plane permission unrelated to endpoint invocation; model training data access is governed by object storage or data catalog policies, not by generative-ai-endpoint permissions.

121
MCQhard

An AI team is fine-tuning a large language model using OCI Data Science and plans to deploy the fine-tuned model using the Generative AI service's custom model deployment. What is the required format for the model artifacts?

A.A Git repository URL
B.A single .pth file
C.A Docker image with the model and inference code
D.A .zip archive containing model weights and configuration files
AnswerD

The custom model deployment requires a zip archive with all necessary files.

Why this answer

The OCI Generative AI service requires custom model artifacts to be packaged as a .zip archive containing the model weights, configuration files (e.g., config.json, tokenizer files), and any necessary inference code. This format ensures the service can extract and load the model correctly into its managed inference infrastructure, aligning with the standard Hugging Face model repository structure.

Exam trap

The trap here is that candidates may confuse OCI Generative AI's custom model deployment with OCI Data Science model deployment, which does support Docker images, leading them to incorrectly select Option C.

How to eliminate wrong answers

Option A is wrong because a Git repository URL is not a supported artifact format for OCI Generative AI custom model deployment; the service expects a static artifact file, not a live repository reference. Option B is wrong because a single .pth file contains only PyTorch model weights without the required configuration files (e.g., config.json, tokenizer.json) and inference code, making it incomplete for deployment. Option C is wrong because OCI Generative AI custom model deployment does not accept Docker images; it uses a serverless, managed inference environment that expects a .zip archive of model artifacts, not a containerized application.

122
MCQmedium

A team has deployed a generative AI model and needs to monitor inference performance and set up alerts for increased error rates. Which OCI service should they integrate with?

A.OCI Monitoring
B.OCI Cloud Guard
C.OCI Events
D.OCI Logging
AnswerA

Correct: Monitoring provides metrics and alerting for inference endpoints.

Why this answer

OCI Monitoring is the correct service because it provides metrics and alarms for tracking inference performance (e.g., latency, throughput) and error rates from deployed generative AI models. It allows you to set up threshold-based alerts on custom or predefined metrics, enabling proactive incident response. This directly addresses the requirement to monitor inference performance and alert on increased error rates.

Exam trap

Oracle often tests the distinction between monitoring (metrics/alarms) and logging (raw events) — candidates mistakenly choose OCI Logging because they think 'error rates' require log analysis, but OCI Monitoring is designed for metric-based alerting with thresholds.

How to eliminate wrong answers

Option B is wrong because OCI Cloud Guard is a security posture management service that detects misconfigurations and security threats, not a real-time performance monitoring or alerting tool for inference metrics. Option C is wrong because OCI Events is a notification service that reacts to state changes in OCI resources (e.g., object creation, instance termination) but does not natively track or alert on time-series performance metrics like error rates. Option D is wrong because OCI Logging collects and stores log data for audit and troubleshooting, but it lacks built-in metric-based alerting capabilities for monitoring inference performance trends or setting threshold alarms.

← PreviousPage 2 of 2 · 122 questions total

Ready to test yourself?

Try a timed practice session using only Deploying and Managing Generative AI on OCI questions.