CCNA Rag Vector Search Questions — Page 2 of 2

Multi-Selecthard

A developer is troubleshooting low recall in a vector search. Which THREE factors should be checked? (Choose three.)

Select 3 answers

A.Embedding model quality and relevance to domain

B.Chunk size and overlap strategy

C.Quality of the query embedding generation

D.The number of results returned (k) in the search

E.The LLM's temperature setting

AnswersA, B, C

A model not trained on similar data may produce poor embeddings.

Why this answer

Embedding model quality (A) directly affects relevance; chunk size and overlap (B) impact granularity; and query embedding quality (D) ensures the search input is properly represented.

Practice this question →

MCQhard

A company is using OCI Generative AI for a RAG-based code assistant. They index source code repositories into a vector store. Developers report that the assistant often suggests deprecated APIs or outdated code snippets, even though the latest code is in the repository. The index was built a week ago and has not been updated. They plan to set up incremental updates. However, they notice that even after re-indexing the latest commits, the issue persists. What is the most likely oversight?

A.The vector store is not configured to overwrite existing vectors for updated documents.

B.The retrieval top-k is set too low, missing some relevant snippets.

C.The chunking strategy splits code at function boundaries, losing import statements.

D.The embedding model is not fine-tuned on code; it was trained on natural language.

AnswerA

Without overwrite, old vectors persist even after re-indexing, causing retrieval of outdated code.

Why this answer

Option A is correct because if the vector store does not overwrite or update vectors for changed documents, old vectors remain, causing retrieval of outdated code. Option B (chunking at function boundaries) may cause missing imports but not specifically deprecation. Option C (embedding model not fine-tuned on code) might affect quality but not freshness.

Option D (low top-k) would affect recall, not freshness.

Practice this question →

MCQeasy

What is a recommended practice to prevent the LLM from generating information not present in the retrieved context when building a RAG application?

A.Setting the temperature to 0.

B.Using a system message that says 'Use only the provided context to answer.'

C.Including few-shot examples in the prompt.

D.Increasing the topK value.

AnswerB

This instruction directly constrains the model to the context.

Why this answer

Including a system message that explicitly instructs the model to rely only on the provided context reduces the likelihood of hallucination.

Practice this question →

MCQeasy

When building a RAG application for document retrieval, which chunking strategy is recommended to maximize retrieval accuracy?

A.Use fixed-size token chunks with no overlap

B.Use overlapping chunks with a sliding window

C.Use random splitting points

D.Use entire documents as single chunks

AnswerB

Overlap ensures contextual continuity between chunks.

Why this answer

Overlapping chunks with a sliding window preserve context at chunk boundaries, improving the chance that relevant text is captured in at least one chunk.

Practice this question →

MCQhard

A team is optimizing a RAG pipeline for OCI Generative AI. They observe that the model's responses are verbose and often include irrelevant details from the retrieved chunks, reducing user satisfaction. They have already tuned the prompt template. What is the most effective next step?

A.Apply instruction tuning on the generation model.

B.Implement a re-ranking step using a cross-encoder model.

C.Reduce the number of retrieved chunks from 5 to 3.

D.Increase the similarity threshold for retrieval from 0.7 to 0.85.

AnswerB

Re-ranking scores each chunk for relevance to the query, filtering out noise.

Why this answer

Option C is correct because re-ranking with a cross-encoder can filter out irrelevant chunks before generation, improving response quality. Option A reduces quantity but does not guarantee relevance. Option B may miss relevant chunks.

Option D is costly and time-consuming, and may not address the issue of irrelevant details.

Practice this question →

Multi-Selecteasy

Which TWO of the following are valid similarity metrics used in vector search?

Select 2 answers

A.Levenshtein distance

B.Cosine similarity

C.Euclidean distance

D.Hamming distance

E.Jaccard index

AnswersB, C

Commonly used for normalized vectors.

Why this answer

Cosine similarity and Euclidean distance are standard metrics for vector search. Jaccard is for sets, Levenshtein for strings, Hamming for bit vectors.

Practice this question →

MCQhard

A healthcare company is building a RAG-based chatbot to answer patient queries using medical documents stored in OCI Object Storage. They use OCI Generative AI service with Cohere Command R+ model and OCI OpenSearch as the vector database. The chatbot is deployed on OCI Compute with a Flask application. After deployment, the latency for each query is 15-20 seconds, which is unacceptable. Logs show that the embedding generation step (using OCI Generative AI embedding API) takes 8-10 seconds, and the vector search in OpenSearch takes 5-7 seconds. The team has already enabled connection pooling and increased the compute instance shape to the maximum allowed. Which action would MOST effectively reduce the overall latency?

A.Pre-generate embeddings for all documents during ingestion and store them in the vector database, so at query time only the query embedding is generated and compared.

B.Implement a caching layer with Redis to store previous query results and serve cached responses for identical queries.

C.Reindex the OpenSearch vector index with optimal settings (e.g., HNSW algorithm, ef_search param) to speed up vector search.

D.Switch to a faster embedding model like Cohere Embed v3 (English) which has lower latency.

AnswerA

This eliminates the need to generate embeddings for each document during the query path, drastically reducing latency.

Why this answer

The primary bottleneck is the embedding generation step (8-10 seconds). By pre-generating embeddings for all documents during ingestion and storing them in the vector database, the query-time embedding generation is eliminated, reducing the per-query latency to only the time needed to generate the query embedding and perform the vector search. This directly addresses the largest contributor to the 15-20 second latency.

Exam trap

The trap here is that candidates may focus on optimizing the vector search or caching responses, but the real bottleneck is the embedding generation step, which must be eliminated at query time through pre-generation during ingestion.

How to eliminate wrong answers

Option B is wrong because caching previous query results only helps for repeated identical queries, not for the vast majority of unique patient queries, and does not address the embedding generation bottleneck. Option C is wrong because while tuning HNSW parameters (like ef_search) can improve vector search speed, it only targets the 5-7 second search step, not the 8-10 second embedding generation step, so the overall latency reduction would be insufficient. Option D is wrong because switching to a faster embedding model may reduce embedding latency slightly, but the core issue is that embedding generation is still performed at query time for every query; pre-generation is a more fundamental optimization that eliminates the per-query embedding cost entirely.

Practice this question →

MCQeasy

A company is building a RAG application on OCI and needs a managed vector database with native support for AI Vector Search, which offers high performance and integration with OCI GenAI. Which OCI service should they use?

A.Autonomous Database with AI Vector Search

B.OCI OpenSearch

C.MySQL HeatWave

D.OCI Object Storage

AnswerA

Autonomous Database offers AI Vector Search, a key capability for RAG.

Why this answer

Oracle Autonomous Database provides AI Vector Search, enabling efficient similarity search on vector embeddings. OCI OpenSearch can also serve as a vector store but lacks the same level of AI Vector Search integration. MySQL HeatWave and Object Storage are not optimized for vector search.

Practice this question →

MCQeasy

A developer needs to generate embeddings for text data using the OCI Generative AI service. Which API should they call to get vector representations of text?

A.cohere.generate

B.cohere.embed

C.cohere.summarize

D.cohere.classify

AnswerB

Correct endpoint for generating embeddings.

Why this answer

The 'cohere.embed' endpoint is used for embedding generation in OCI GenAI. The 'cohere.generate' and 'cohere.summarize' are for text generation and summarization, respectively. 'cohere.classify' is for classification tasks.

Practice this question →

MCQeasy

A company uses OCI Generative AI to create embeddings for a vector search. They notice high latency in search queries. What is one possible optimization?

A.Decrease batch size for embedding creation

B.Use approximate nearest neighbor (ANN) search

C.Use exact search for better accuracy

D.Increase the embedding dimension

AnswerC

Exact search is slower; the optimization would be to use ANN.

Why this answer

Approximate nearest neighbor (ANN) search trades a slight reduction in accuracy for a significant speedup, addressing high latency.

Practice this question →

MCQeasy

A developer is building a RAG application using Oracle Cloud Infrastructure (OCI) Document Understanding and OCI Generative AI. After chunking documents and generating embeddings, the developer observes that the retrieval step often returns chunks that are semantically unrelated to the query. Which action is MOST likely to improve retrieval relevance?

A.Switch from a dense embedding model to a sparse embedding model.

B.Adjust the chunk size and chunk overlap to better capture coherent passages.

C.Increase the chunk size to capture more context.

D.Reduce the number of retrieved chunks (k) in the vector search.

AnswerB

Proper chunking helps preserve meaning and improves retrieval accuracy.

Why this answer

Option C is correct because adjusting the chunk size and overlap can significantly impact the quality of retrieved passages. Option A is wrong because increasing the chunk size may introduce more noise. Option B is wrong because reducing the number of retrieved chunks could miss relevant information.

Option D is wrong because the embedding model is already chosen; changing it may not fix the chunking issue.

Practice this question →

MCQmedium

An organization stores its knowledge base in Oracle Autonomous Database and wants to build a RAG chatbot using OCI Generative AI. The chatbot must retrieve the most relevant documents based on user queries. Which indexing approach is BEST suited for efficient similarity search on text embeddings?

A.Create an ANN index on the embedding vector column.

B.Create a bitmap index on the embedding vector column.

C.Create an inverted index on the document text column.

D.Create a B-tree index on the document text column.

AnswerA

ANN indexes enable fast approximate nearest neighbor search in vector databases.

Why this answer

Option A is correct because Approximate Nearest Neighbor (ANN) indexes are specifically designed for high-dimensional vector spaces, enabling efficient similarity search on embedding vectors. In Oracle Autonomous Database, ANN indexes (e.g., using IVF or HNSW algorithms) drastically reduce search latency compared to brute-force scans, which is critical for real-time RAG chatbot responses.

Exam trap

Oracle often tests the misconception that any index type can be applied to vector columns, but the trap here is that candidates confuse traditional database indexes (B-tree, bitmap, inverted) with specialized vector indexes, failing to recognize that only ANN indexes support distance-based similarity search on embeddings.

How to eliminate wrong answers

Option B is wrong because bitmap indexes are optimized for low-cardinality columns (e.g., gender or status flags), not for high-dimensional floating-point vectors, and they cannot perform similarity comparisons like cosine or Euclidean distance. Option C is wrong because inverted indexes are designed for full-text search on tokenized text, not for vector embeddings, and they cannot compute distances between vectors. Option D is wrong because B-tree indexes are for exact match or range queries on scalar data (e.g., numbers or short strings), and they do not support the distance-based ordering required for vector similarity search.

Practice this question →

MCQmedium

An enterprise RAG system must ensure that retrieved data comes only from authorized sources. Which OCI feature should be used to enforce this?

A.Data encryption at rest

B.OCI IAM policies for the vector database

C.Network security groups

D.Resource quotas

AnswerB

IAM policies control who can access the vector database and its data.

Why this answer

OCI IAM policies allow fine-grained access control to resources, ensuring only authorized users or services can retrieve data from the vector database.

Practice this question →

MCQeasy

A developer sends the above request to the OCI Generative AI API. The response returns an error: 'InvalidParameter: The parameter 'topP' is not supported for this model.' What is the most likely reason?

A.The 'cohere.command-r-plus-v1:0' model does not accept the topP parameter.

B.The model ID is deprecated.

C.The topP parameter value is out of range.

D.The JSON request format is incorrect.

AnswerA

Cohere's command-r-plus model only supports temperature for randomness control, not topP.

Why this answer

Option B is correct because the command-r-plus model does not support the topP parameter; it only supports temperature. Option A is wrong because topP is a valid parameter in some models. Option C is wrong because JSON is valid.

Option D is wrong because modelId is correct.

Practice this question →

MCQmedium

A document processing pipeline uses OCI Document Understanding to extract text from PDFs, then creates embeddings with OCI Generative AI. Some documents exceed the embedding model's token limit. What is the best approach?

A.Truncate the document to the token limit

B.Use a different embedding model with a higher token limit

C.Skip documents that exceed the limit

D.Split the document into chunks that fit the limit and embed each chunk separately

AnswerD

Chunking preserves full content and allows granular retrieval.

Why this answer

Splitting the document into chunks that fit the token limit and embedding each chunk separately is standard practice, ensuring all content is represented in the vector store.

Practice this question →

Multi-Selecthard

Which TWO are common causes of poor answer quality in a RAG system built on OCI Generative AI? (Choose two.)

Select 2 answers

A.Mismatch between the embedding model's training data and the domain of the documents.

B.Using a generation model that is too large for the task.

C.Setting the temperature parameter too low, causing overly deterministic outputs.

D.Insufficient number of relevant chunks in the document corpus for the given query.

E.Using only vector search without keyword-based fallback.

AnswersA, D

Domain mismatch leads to poor semantic alignment and irrelevant retrieval.

Why this answer

Option A is correct because the embedding model's training data determines the semantic space in which documents and queries are represented. If the model was trained on general text (e.g., Wikipedia) but the documents are from a specialized domain (e.g., medical or legal), the embeddings will fail to capture domain-specific nuances, leading to poor retrieval relevance and thus poor answer quality in the RAG system.

Exam trap

Oracle often tests the distinction between retrieval-side failures (like embedding mismatch or insufficient chunks) and generation-side parameters (like temperature or model size), so candidates mistakenly attribute poor answer quality to generation settings rather than the retrieval pipeline.

Practice this question →

MCQmedium

A legal firm needs an AI assistant that can answer questions based on a large corpus of internal regulations that change quarterly. The firm also requires high accuracy and the ability to cite sources. Which approach should the firm choose?

A.Build a RAG application with vector search and citation generation

B.Use a pre-trained model without customization

C.Implement a rule-based search engine

D.Fine-tune a pre-trained model on the current regulations

AnswerA

RAG retrieves relevant documents and can cite sources, and updating the knowledge base is straightforward.

Why this answer

A RAG application with vector search retrieves relevant regulations and allows citation generation, keeping answers up-to-date without retraining.

Practice this question →

Multi-Selectmedium

A data scientist is designing a RAG system using OCI Data Science and OCI Generative AI. Which two considerations are critical for optimal retrieval quality? (Choose 2.)

Select 2 answers

A.Use the same embedding model for both indexing and querying.

B.Increase the chunk size to the maximum allowed to capture more context.

C.Fine-tune the generation model on domain-specific data.

D.Apply metadata filtering to restrict search domain before vector search.

E.Use a hierarchical index structure for faster search.

AnswersA, D

Mismatched embeddings reduce similarity accuracy.

Why this answer

Options A and C are correct. Using the same embedding model for indexing and querying ensures consistent vector representation, and metadata filtering helps narrow down the search domain, improving relevance. Option B is wrong because too-large chunks can dilute relevance.

Option D is about generation, not retrieval. Option E is about performance, not quality.

Practice this question →

MCQeasy

A developer is building a RAG application using OCI Generative AI. They notice that the generated responses often contain outdated information even though the knowledge base is updated daily. What is the most likely cause?

A.The embedding model is not fine-tuned on the latest data.

B.The vector database index is not rebuilt after data updates.

C.The retrieval top-k is set too high.

D.The chunk size is too small, causing loss of context.

AnswerB

If the index is not refreshed, new data is not searchable, leading to outdated results.

Why this answer

Option C is correct because if the vector index is not rebuilt after data updates, retrieval will still return old chunks. Option A is wrong because fine-tuning the embedding model is not required for updating knowledge. Option B is wrong because chunk size affects context but not freshness.

Option D is wrong because a high top-k would include more results, but still old if not updated.

Practice this question →

MCQmedium

A developer wants to deploy a RAG application using OCI Generative AI for both embedding and text generation while minimizing costs. Which strategy is most effective?

A.Use a larger generation model

B.Cache frequent queries and their embeddings

C.Reduce chunk size to decrease embedding calls

D.Use a larger embedding model for better accuracy

AnswerB

Caching reduces redundant embedding API calls, lowering costs.

Why this answer

Caching embeddings for frequent queries eliminates repeated embedding API calls, directly reducing cost.

Practice this question →

MCQhard

A large organization is deploying a multi-tenant RAG application on OCI, where each tenant has its own set of documents. They use a shared OCI OpenSearch cluster with tenant_id metadata to filter documents. They observe that occasionally, queries from one tenant return results from another tenant's documents. The security team requires strict isolation. They have verified that the metadata filter is correctly applied in the search request. What is the most likely root cause?

A.The OpenSearch index has not been refreshed after ingestion of new documents.

B.The tenant_id field is not indexed as a keyword, causing incorrect filtering.

C.The embedding model has been trained on data from multiple tenants, causing cross-tenant leakage.

D.The metadata filter is being applied after the vector search instead of before.

AnswerB

If the field is not indexed properly, the filter may not match correctly, returning results from other tenants.

Why this answer

Option D is correct because if the tenant_id field is not indexed as a keyword, the filter may not be applied correctly, leading to cross-tenant results. Option A (index not refreshed) affects availability, not isolation. Option B (order of filter) does not matter.

Option C (embedding model training) does not cause retrieval leakage.

Practice this question →

MCQhard

A company is using Oracle Database 23ai AI Vector Search for their RAG pipeline. They notice that similarity search often returns chunks that are semantically unrelated but syntactically similar due to token overlap. Which vector index type should they consider to improve semantic relevance?

A.IVF_SQ8 index

B.IVF_FLAT index

C.HNSW index

D.Use the default index type, which is IVF_FLAT

AnswerC

HNSW builds a hierarchical graph that captures semantic neighborhood better, reducing token overlap effects.

Why this answer

Option C is correct because the Hierarchical Navigable Small World (HNSW) index is more effective for semantic search than IVF indices because it preserves global graph structure. Option A is wrong because IVF_FLAT uses inverted files and may suffer from token overlap bias. Option B is wrong because IVF_SQ8 is a quantized version of IVF, not better for semantics.

Option D is wrong because the default index is often IVF_FLAT.

Practice this question →

Multi-Selectmedium

Which THREE are valid considerations when designing a RAG pipeline that uses OCI Generative AI and OCI OpenSearch? (Choose three.)

Select 3 answers

A.OCI OpenSearch only supports Euclidean distance for vector similarity.

B.Each document must be converted to a single vector for efficient retrieval.

C.The quality of the text extraction from OCI Document Understanding directly impacts retrieval accuracy.

D.The generation model's context window size limits the number of chunks that can be included in the prompt.

E.The chunk size and overlap must be tuned based on the document type and query patterns.

AnswersC, D, E

Poor extraction leads to noisy embeddings and irrelevant results.

Why this answer

Option C is correct because OCI Document Understanding performs text extraction from documents (e.g., PDFs, images). If the extraction is poor (e.g., missing text, OCR errors), the resulting chunks will be inaccurate, directly degrading the quality of vector embeddings and thus retrieval accuracy in the RAG pipeline.

Exam trap

Oracle often tests the misconception that vector databases only support one similarity metric (like Euclidean) or that documents must be stored as single vectors, when in practice they support multiple metrics and chunking is essential for effective retrieval.

Practice this question →

MCQhard

A RAG system returns irrelevant chunks even though the embedding model and vector index are correctly configured. After reviewing, the chunks are too large and contain extraneous information. Which combination of adjustments should be made to improve relevance?

A.Increase chunk overlap only.

B.Decrease chunk size and increase chunk overlap.

C.Use semantic chunking and adjust topK.

D.Reduce chunk size, increase overlap, and adjust topK.

AnswerD

All three adjustments can help refine the retrieved context.

Why this answer

Adjusting chunk size, chunk overlap, and topK all influence retrieval quality. A holistic tuning is often needed to address irrelevant chunks.

Practice this question →

100

Multi-Selecteasy

Which TWO of the following are best practices for building a RAG pipeline in OCI?

Select 2 answers

A.Use overlapping chunks

B.Always use exact vector search for accuracy

C.Use a pre-trained embedding model from OCI Generative AI

D.Avoid storing metadata alongside vectors

E.Use a single large chunk for each document

AnswersA, C

Overlapping chunks preserve context across boundaries, improving retrieval.

Why this answer

Overlapping chunks improve context preservation, and using a pre-trained embedding model provides a strong baseline for retrieval.

Practice this question →

101

MCQeasy

A company is building a RAG application for customer support. The knowledge base includes documents in English, Spanish, and French. Which embedding model should they use from OCI Generative AI to ensure accurate retrieval across all languages?

A.cohere.embed-multilingual-light-v3.0

B.cohere.generate-english-v2:0

C.cohere.embed-english-light-v3.0

D.cohere.command-r-plus-v1:0

AnswerA

This multilingual embedding model is designed to handle multiple languages, providing accurate retrieval for the company's needs.

Why this answer

Option B is correct because the cohere.embed-multilingual-light-v3.0 model supports multiple languages, making it suitable for multilingual retrieval. Option A is wrong because it is English-only. Option C is wrong because it is Cohere's command model, not an embedding model.

Option D is wrong because it is a text generation model, not an embedding model.

Practice this question →

102

MCQhard

A research institution uses OCI Data Flow to process large-scale document corpora for a RAG system. They want to minimize latency for end-user queries. Which architecture decision would most effectively reduce query latency?

A.Embed documents on-the-fly during query time to ensure freshness.

B.Use a larger, more accurate embedding model.

C.Increase the number of Spark workers for parallel processing of queries.

D.Precompute embeddings offline using OCI Data Flow and store them in an OCI OpenSearch index.

AnswerD

Precomputation removes runtime embedding cost.

Why this answer

Precomputing embeddings offline with OCI Data Flow and storing them in an OCI OpenSearch index eliminates the need to generate embeddings at query time, which is the primary source of latency. This approach shifts the computationally expensive embedding generation to a batch process, allowing queries to perform only a fast vector similarity search against the precomputed index, drastically reducing end-user response time.

Exam trap

The trap here is that candidates often confuse batch processing with real-time processing, assuming that more parallelism (Option C) or a better model (Option B) can solve latency issues, when in fact the fundamental latency reduction comes from moving the expensive embedding computation out of the query path entirely.

How to eliminate wrong answers

Option A is wrong because embedding documents on-the-fly during query time introduces significant latency, as the embedding model must process each document in real time, which is impractical for large-scale corpora and defeats the purpose of minimizing query latency. Option B is wrong because using a larger, more accurate embedding model increases the computational cost and time for each embedding generation, which would actually increase latency rather than reduce it, especially if done at query time. Option C is wrong because increasing the number of Spark workers for parallel processing of queries does not address the bottleneck of embedding generation; Spark workers are used for batch processing in OCI Data Flow, not for real-time query serving, and adding more workers would not reduce the latency of the embedding step itself.

Practice this question →

103

MCQhard

A RAG application is hallucinating because the LLM receives irrelevant context from the retrieval step, even when topK is set to 3. Which strategy would best reduce hallucination by improving the relevance of retrieved documents?

A.Reduce the chunk size to one sentence per chunk

B.Add a reranking step after retrieval to select the most relevant chunks

C.Implement a query rewriting mechanism

D.Increase topK to 10 to provide more context

AnswerB

Reranking improves the relevance of the final context set.

Why this answer

Adding a reranking step after initial retrieval can filter out irrelevant documents, improving the quality of context fed to the LLM. Increasing topK would add more noise. Using a smaller chunk size might help but not as targeted.

Changing the query rewriting may not address the core issue of ranking.

Practice this question →

104

MCQeasy

An organization needs to extract text from PDF documents and convert them into embeddings for a RAG pipeline using OCI. Which OCI service is best suited for extracting text from PDFs?

A.OCI Language

B.OCI Speech

C.OCI Vision

D.OCI Document Understanding

AnswerD

This service provides OCR and text extraction from documents.

Why this answer

OCI Document Understanding is specifically designed to extract text and structured data from documents like PDFs, making it the ideal choice for preprocessing.

Practice this question →

105

MCQhard

A company is deploying a RAG pipeline using OCI Data Science and OCI Generative AI. The pipeline uses a Cohere command model for generation and a Cohere embed model for retrieval. The team notices that the model occasionally produces hallucinated answers that are not supported by the retrieved context. Which strategy is MOST effective at reducing hallucinations?

A.Implement a faithfulness verification step that re-ranks retrieved passages based on alignment with the generated answer.

B.Increase the temperature parameter of the generation model.

C.Increase the number of retrieved chunks (k) to provide more context.

D.Use a larger generative model with more parameters.

AnswerA

A verification step can detect and mitigate unsupported claims.

Why this answer

Option D is correct because incorporating a faithfulness check that re-ranks retrieval results can directly filter out unsupported claims. Option A is wrong because increasing temperature may increase randomness and hallucinations. Option B is wrong because more retrieved chunks can introduce conflicting information.

Option C is wrong because a larger model does not guarantee faithfulness and increases cost.

Practice this question →

106

MCQmedium

An organization wants to combine keyword search and vector search to improve retrieval accuracy in their RAG pipeline. Which OCI service provides built-in hybrid search capabilities?

A.OCI Search with AI

B.OCI OpenSearch

C.Autonomous Database with AI Vector Search

D.OCI Logging

AnswerB

OpenSearch integrates BM25 and vector search.

Why this answer

OCI OpenSearch supports both BM25 keyword search and k-NN vector search in a single query, enabling hybrid search. Autonomous Database with AI Vector Search focuses on vector search but lacks native keyword search. OCI Search is a different service.

OCI Logging is for logs.

Practice this question →

107

MCQhard

In OCI OpenSearch, a k-NN search query returns results with low precision. The index uses HNSW algorithm. The search parameters are: `k=10`, `ef_search=100`. To improve recall without significantly increasing latency, which parameter should be adjusted?

A.Increase `ef_search`

B.Decrease `ef_search`

C.Decrease `k`

D.Increase `k`

AnswerA

Larger ef_search explores more candidates, increasing recall at a small latency cost.

Why this answer

Increasing `ef_search` expands the search dynamic list, improving recall but also increasing latency. The question asks for improving recall without significantly increasing latency, but among options, increasing `ef_search` is the only direct control for recall. Note: the correct answer is the one that improves recall; latency increase is expected but minimal if adjusted moderately.

Practice this question →

108

MCQeasy

Refer to the exhibit. A RAG application logs this error when trying to search. What is the most likely cause?

A.The embedding model is incompatible

B.The OpenSearch cluster is not accessible

C.The index name is misspelled in the application configuration

D.The query syntax is incorrect

AnswerC

A mismatch between the configured index name and the actual index causes this exception.

Why this answer

The error clearly states that the index 'rag-index' does not exist. This typically occurs when the index name in the application configuration is misspelled or doesn't match the actual index.

Practice this question →

109

MCQmedium

A developer calls the OCI GenAI embedding API as shown in the exhibit. What is the most likely cause of the error?

A.The API key does not have permission to call the endpoint

B.The endpoint ID is incorrect

C.The input string is too long for the embedding model

D.The model ID is not supported for embedding

AnswerC

The error confirms the input exceeds the token limit.

Why this answer

The error message explicitly states that the input text length exceeds the maximum allowed length of 8192 tokens. The model ID and endpoint are correctly specified. The API key issue would show a different error.

The endpoint name is valid.

Practice this question →

110

MCQeasy

A developer wants to build a RAG application that processes highly sensitive medical records. The documents are already stored in OCI Object Storage. Which vector storage strategy best balances security and performance?

A.Store vectors in-memory within the application server

B.Use OCI OpenSearch with a public endpoint for low latency

C.Use OCI OpenSearch with a private subnet and VCN security lists

D.Use a third-party vector database outside OCI

AnswerC

Private subnet ensures network isolation, and security lists control access.

Why this answer

Using OCI OpenSearch with a private subnet and VCN security lists keeps data within a secure network while providing scalable search performance.

Practice this question →

111

Multi-Selectmedium

Which TWO actions can improve the retrieval accuracy of a RAG system? (Select two.)

Select 2 answers

A.Use a smaller chunk size for all documents

B.Remove stop words from documents before embedding

C.Increase the topK parameter significantly

D.Use a more accurate embedding model

E.Enrich chunk metadata and apply strict filters during retrieval

AnswersD, E

Better embeddings improve similarity search.

Why this answer

Using a more accurate embedding model (A) improves semantic matching. Enriching chunk metadata and applying filters (D) helps narrow down relevant documents. Increasing topK (B) may add noise.

Removing stop words (C) is standard but minor. Using a smaller chunk size (E) can help but may also lose context; not as direct as A and D.

Practice this question →

112

MCQhard

An AI engineer observes that the RAG application fails to retrieve relevant documents for certain user queries, despite having a comprehensive knowledge base. The issue appears to be a semantic gap between query phrasing and document content. Which technique should the engineer implement first to address this?

A.Switch from dense to sparse vector embeddings

B.Apply query expansion techniques before embedding the user query

C.Implement a re-ranking model to reorder retrieved results

D.Increase the chunk overlap to ensure more context

AnswerB

Query expansion broadens the query to capture more relevant documents.

Why this answer

Query expansion generates multiple paraphrases or related terms for the original query, increasing the chance of matching relevant documents. Re-ranking helps after retrieval, not for missing documents. Changing chunk size may help but is less targeted.

Using a different vector store doesn't directly address semantic mismatch.

Practice this question →

113

Multi-Selecthard

Which THREE factors should be considered when designing a vector search index for a RAG application that supports multiple languages?

Select 3 answers

A.Implement language identification as a preprocessing step.

B.Create separate vector indexes for each language.

C.Use a multilingual embedding model that supports all required languages.

D.Configure language-specific text analyzers for preprocessing documents.

E.Use larger chunk sizes for languages with complex morphology.

AnswersA, C, D

Allows proper analyzer selection.

Why this answer

Option A is correct because language identification as a preprocessing step ensures that documents are correctly tagged before indexing, which allows the system to apply appropriate language-specific tokenization, stop-word removal, and stemming. This prevents cross-language contamination in the vector index and improves retrieval accuracy for a multilingual RAG application.

Exam trap

Oracle often tests the misconception that separate indexes per language are required for multilingual support, but the correct approach is to use a single index with a multilingual embedding model and language-specific preprocessing.

Practice this question →

114

Multi-Selecteasy

Which THREE components are essential in a typical RAG architecture built on OCI? (Select three.)

Select 3 answers

A.Vector database (e.g., OCI OpenSearch, Autonomous Database)

B.Data ingestion pipeline with Apache Spark

C.Embedding model (e.g., Cohere Embed)

D.Large language model (e.g., Cohere Command)

E.Prompt template for system instructions

AnswersA, C, D

Required for storing and retrieving embeddings.

Why this answer

A vector store (A) for similarity search, an LLM (B) for generating answers, and a prompt template (E) to combine context and query. Embedding model (D) is also essential (but not listed as a separate option? Actually D is embedding model, so that is also essential. But we need exactly three.

The correct ones are A, B, D. A vector store, an LLM, and an embedding model are core. Prompt template is also core, but we have to select three.

Let's adjust: Options: A: vector store, B: LLM, C: data pipeline, D: embedding model, E: prompt template. Essential: A, B, D. Prompt template is important but not strictly essential if prompt is hardcoded.

Data pipeline is important but not part of runtime RAG. So A, B, D.

Practice this question →

115

MCQmedium

A data scientist is building a RAG application that processes PDF invoices. The extraction step uses OCI Document Understanding to convert PDFs to text. The scientist then splits the text into chunks and generates embeddings using OCI Generative AI. However, the retrieval often misses critical fields like invoice numbers and dates. Which preprocessing step would MOST likely improve retrieval of these specific fields?

A.Increase the chunk size to include entire invoices.

B.Apply stemming and lemmatization to the text before chunking.

C.Tag each chunk with metadata such as invoice number, date, and vendor, and use metadata filtering during retrieval.

D.Switch from dense embeddings to sparse embeddings for better exact match.

AnswerC

Metadata filtering enables precise retrieval based on structured fields.

Why this answer

Option C is correct because metadata tagging and filtering directly address the retrieval of specific fields like invoice numbers and dates. By attaching metadata (e.g., invoice number, date, vendor) to each chunk and filtering on these metadata fields during retrieval, the RAG system can precisely locate the relevant chunks without relying solely on semantic similarity. This approach leverages OCI Document Understanding's ability to extract structured data and OCI Generative AI's vector search capabilities to combine dense embeddings with exact metadata matching.

Exam trap

Oracle often tests the misconception that increasing chunk size or changing embedding type alone can solve retrieval failures for structured fields, when in reality metadata filtering is the correct technique for precise field-level retrieval in RAG applications.

How to eliminate wrong answers

Option A is wrong because increasing chunk size to include entire invoices reduces granularity, making it harder to retrieve specific fields like invoice numbers and dates, and may exceed the context window of the embedding model, degrading retrieval quality. Option B is wrong because stemming and lemmatization reduce words to root forms, which can obscure exact matches for critical fields like invoice numbers (e.g., 'INV-12345' becomes 'inv-12345') and dates (e.g., '2023-01-15' might be altered), harming retrieval precision. Option D is wrong because sparse embeddings (e.g., TF-IDF) improve exact keyword matching but still rely on the text content of chunks; without metadata tagging, the system cannot filter chunks by field type, so critical fields may still be missed if they appear in chunks with low keyword overlap.

Practice this question →

116

MCQmedium

A company has deployed a RAG application using OCI Generative AI service with a vector store in OCI OpenSearch. Users report that answers are often incomplete or irrelevant. The application uses a single prompt template with a fixed chunk size of 1000 tokens. Which action is most likely to improve answer quality?

A.Disable vector search and rely solely on the LLM's pre-trained knowledge

B.Use a smaller embedding model to reduce noise

C.Implement a re-ranking step after vector search

D.Increase the chunk size to 2000 tokens

AnswerC

Re-ranking improves precision by ordering chunks based on relevance to the query.

Why this answer

Implementing a re-ranking step after vector search helps filter and prioritize the most relevant chunks, improving answer quality. Larger chunks may dilute context, smaller models reduce accuracy, and disabling vector search defeats RAG purpose.

Practice this question →

117

MCQeasy

A developer is building a RAG pipeline using OCI Data Science and wants to store vector embeddings. Which OCI service is optimized for vector search and can be used as a vector store?

A.OCI Autonomous Database

B.OCI OpenSearch

C.OCI Object Storage

D.OCI Streaming

AnswerB

OCI OpenSearch includes a vector database plugin for k-NN similarity search, making it a suitable vector store.

Why this answer

Option B is correct because OCI OpenSearch provides a vector engine that supports k-NN search. Option A is wrong because OCI Object Storage is not a searchable vector store. Option C is wrong because OCI Autonomous Database has vector capabilities but is not primarily optimized for vector search at scale.

Option D is wrong because OCI Streams is for streaming data.

Practice this question →

118

Multi-Selecthard

Which THREE techniques effectively reduce query latency in a RAG system?

Select 3 answers

A.Pre-compute embeddings for all documents

B.Use approximate nearest neighbor search

C.Use a larger generation model

D.Increase the number of shards

E.Use a smaller embedding model

AnswersA, B, E

Pre-computed embeddings avoid real-time embedding calls during query.

Why this answer

Pre-computing embeddings for all documents eliminates the need to generate embeddings at query time, which is a computationally expensive step. By storing pre-computed vector representations, the system can directly perform similarity searches against the index, significantly reducing latency.

Exam trap

Oracle often tests the misconception that increasing model size or shard count always improves performance, but in RAG systems, these changes can introduce latency penalties due to higher computational overhead or distributed coordination costs.

Practice this question →

119

MCQmedium

A financial services company is deploying a RAG system for regulatory compliance queries. The system uses OCI Data Science to run a custom embedding model fine-tuned on regulatory documents. The index in OpenSearch uses cosine similarity and HNSW algorithm. Users report that queries containing synonyms to regulatory terms (e.g., "AML" vs "Anti-Money Laundering") often fail to retrieve relevant documents. Which combination of improvements would be MOST effective? (Assume budget and latency constraints)

A.Increase the `m` parameter in HNSW to improve recall

B.Fine-tune the embedding model further on a dataset of synonyms

C.Implement a hybrid search combining keyword and vector search

D.Use query expansion with a thesaurus before embedding

AnswerC

Hybrid search (BM25 + vector) directly captures exact term matches, bridging the synonym gap effectively.

Why this answer

Hybrid search (combining keyword (BM25) and vector search) catches exact synonym matches from text. Query expansion helps but may not be as reliable. Fine-tuning on synonyms is possible but time-consuming.

Increasing HNSW m slightly improves recall but does not address synonym gap.

Practice this question →

120

MCQmedium

A developer notices that the RAG application returns irrelevant chunks for user queries. The embedding model used is `cohere.embed-english-light-v3.0`. Which action is MOST likely to improve relevance?

A.Reduce the number of retrieved chunks (k)

B.Increase the chunk size

C.Switch to a larger embedding model (e.g., cohere.embed-english-v3.0)

D.Use a different similarity metric (e.g., Euclidean instead of cosine)

AnswerC

Larger models produce higher-quality embeddings, improving retrieval relevance.

Why this answer

Switching to the larger `cohere.embed-english-v3.0` model provides more powerful embeddings, capturing more semantic information. Increasing chunk size may include irrelevant content; changing similarity metric has marginal effect; reducing retrieved chunks may miss relevant ones.

Practice this question →

121

MCQeasy

A developer is testing a RAG application using OCI Generative AI. They receive an error: 'The model cohere.command-r-plus-v1:0 is not supported in this region.' What is the most likely cause?

A.The endpoint URL is incorrectly formatted.

B.The model is not available in the selected OCI region.

C.The tenancy is in a different availability domain.

D.The model name has a typo.

AnswerB

Cohere models are deployed in specific regions; the developer may be in a region where the model isn't provisioned.

Why this answer

The error message explicitly states that the model 'cohere.command-r-plus-v1:0' is not supported in the region. OCI Generative AI models are region-specific; each model is deployed only in certain OCI regions (e.g., us-ashburn-1, eu-frankfurt-1). If the selected region does not host that model, the API returns this error regardless of endpoint formatting, tenancy configuration, or model name spelling.

Exam trap

Oracle often tests the misconception that model availability is global across all OCI regions, leading candidates to overlook region-specific model deployment restrictions.

How to eliminate wrong answers

Option A is wrong because an incorrectly formatted endpoint URL would typically produce a 404 Not Found or a connection error, not a model-not-supported error. Option C is wrong because availability domains are a concept for compute instances, not for Generative AI model availability; the error is about regional model support, not AD-level placement. Option D is wrong because a typo in the model name would result in a 'model not found' error (e.g., 400 Bad Request), not a region-specific unsupported error.

Practice this question →

122

MCQhard

An engineer configured the above index mapping for vector search. When performing a k-NN search, the results are unexpected. What is the most likely issue?

A.The space type 'cosinesimil' is not supported; it should be 'cosine'.

B.The dimension 768 does not match the embedding model's output dimension.

C.The mapping uses 'knn_vector' type with 'faiss' engine, which is incompatible.

D.The space type at the index level and mapping level are mismatched.

AnswerD

Mismatch causes incorrect distance calculations.

Why this answer

Option D is correct because OpenSearch requires the space type to be consistently defined at both the index-level settings (method.parameters.space_type) and the field-level mapping (space_type). A mismatch between these two causes the k-NN search to behave unexpectedly, as the engine uses the index-level setting for distance computation while the mapping-level setting may be used for validation or other purposes.

Exam trap

Oracle often tests the nuance that OpenSearch requires consistency between index-level and mapping-level space_type settings, a detail that candidates overlook because they assume only the mapping-level setting matters.

How to eliminate wrong answers

Option A is wrong because 'cosinesimil' is a valid space type in OpenSearch (an abbreviation for cosine similarity), not an unsupported value. Option B is wrong because while a dimension mismatch can cause issues, the question states the mapping is configured for vector search and the results are unexpected; the dimension 768 is a common embedding size and is not inherently incorrect without evidence of mismatch. Option C is wrong because 'knn_vector' type with 'faiss' engine is fully compatible and supported in OpenSearch for vector search workloads.

Practice this question →

123

MCQeasy

What is the primary purpose of an embedding model in a RAG pipeline?

A.To convert text into numerical vectors.

B.To generate human-like responses.

C.To rank search results.

D.To summarize long documents.

AnswerA

Embedding models encode text semantically into vectors.

Why this answer

Embedding models convert text into dense vector representations that can be used for similarity search in a vector database.

Practice this question →

124

MCQhard

A developer implements a RAG chatbot using OCI Generative AI with streaming enabled. The chatbot fails to remember earlier conversation turns during a session. What is the most likely cause?

A.The max_tokens parameter is set too low.

B.The streaming endpoint does not support conversation history.

C.The application does not include previous messages in the request.

D.The temperature parameter is too high.

AnswerC

Session memory requires the client to send the conversation history in the messages list.

Why this answer

To maintain conversation history, the application must explicitly pass previous messages in each request. Without it, the model treats each query independently.

Practice this question →

125

MCQeasy

A retail company uses OCI Generative AI Service to build a RAG chatbot for product recommendations. The chatbot should consider both the user's query and the retrieved product descriptions. Which component of the RAG pipeline is responsible for combining these inputs before sending to the LLM?

A.Reranker

B.Document retriever

C.Embedding model

D.Prompt template

AnswerD

Merges user query and context into a single prompt.

Why this answer

The prompt template is the component in a RAG pipeline that structures the final input to the LLM by combining the user's query with the retrieved product descriptions. It defines the format and instructions (e.g., 'Based on these product descriptions, recommend...') that the LLM uses to generate a coherent response. Without a prompt template, the raw query and documents would be sent without context, leading to poor or irrelevant outputs.

Exam trap

Oracle often tests the misconception that the embedding model or retriever handles input combination, when in fact those components only deal with vector representation and retrieval, not prompt assembly.

How to eliminate wrong answers

Option A is wrong because a reranker reorders retrieved documents based on relevance scores after initial retrieval, but it does not combine inputs with the user query for the LLM. Option B is wrong because the document retriever fetches relevant documents from the vector store using similarity search, but it does not merge them with the query into a single prompt. Option C is wrong because the embedding model converts text into vector representations for search, but it plays no role in assembling the final input to the LLM.

Practice this question →