Which two metrics would you monitor to ensure a generative AI deployment on OCI is operating efficiently? (Choose two.)
Throughput indicates how many requests the system can handle per unit of time.
Why this answer
Request throughput (requests per second) is a critical metric for monitoring the operational efficiency of a generative AI deployment on OCI because it directly measures how many inference requests the system actually serves in a given time. If throughput drops below expected levels, that usually points to a bottleneck in the compute resources (e.g., saturated GPUs) or in the serving infrastructure, which can degrade the user experience and cause timeouts.
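As a rough illustration of the throughput check described above, the sketch below computes requests per second over a monitoring window and flags a drop below an expected baseline. The function names, the 20% tolerance, and the sample numbers are hypothetical choices for this example. They are not part of any OCI API, and in practice the request counts would come from whatever monitoring data the deployment exposes.

```python
def throughput_rps(completed_requests: int, window_seconds: float) -> float:
    """Return requests per second served over a monitoring window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return completed_requests / window_seconds


def is_below_expected(observed_rps: float, expected_rps: float,
                      tolerance: float = 0.2) -> bool:
    """Flag throughput that falls more than `tolerance` below the baseline.

    The 20% default tolerance is an assumed value for illustration only.
    """
    return observed_rps < expected_rps * (1 - tolerance)


# Hypothetical sample: 120 completed inference requests in a 60-second window,
# compared against an assumed baseline of 5 requests per second.
observed = throughput_rps(completed_requests=120, window_seconds=60)
needs_investigation = is_below_expected(observed, expected_rps=5.0)
```

A flag like `needs_investigation` only says that throughput has dropped. Finding the cause still means looking at the compute resources and the serving infrastructure.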
Exam trap
Oracle often tests the distinction between model quality metrics (like accuracy) and operational efficiency metrics (like latency and throughput). The trap here is 'Model accuracy on validation set': candidates select it because they confuse model performance with deployment performance.