CCNA RAG Vector Search Questions - Page 1 of 2

Multi-Select (medium)

Which TWO are best practices for chunking documents in a RAG pipeline? (Choose two.)

Select 2 answers

A.Use fixed-size chunks regardless of content boundaries.

B.Always use small chunks (e.g., 100 characters).

C.Chunk entire documents as a single chunk.

D.Use semantic chunking to preserve meaning.

E.Overlap chunks to avoid missing context at boundaries.

Answers: D, E

Semantic chunking maintains context within chunks.

Why this answer

Semantic chunking ensures each chunk contains a coherent idea, and chunk overlap prevents loss of context at boundaries, improving retrieval quality.
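
A minimal sketch of how the two practices can be combined, assuming paragraphs (blank-line separated) are an acceptable proxy for "coherent ideas". The function name `chunk_paragraphs` and its parameters are illustrative, not part of any OCI API.

```python
from typing import List


def chunk_paragraphs(text: str, max_chars: int = 500,
                     overlap_paragraphs: int = 1) -> List[str]:
    """Pack whole paragraphs into chunks and repeat the trailing
    paragraph(s) of each chunk at the start of the next one."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current: List[str] = []
    for para in paragraphs:
        candidate = "\n\n".join(current + [para])
        if current and len(candidate) > max_chars:
            # Close the chunk at a paragraph boundary, never mid-paragraph.
            chunks.append("\n\n".join(current))
            # Carry the tail of the closed chunk forward as overlap.
            current = current[-overlap_paragraphs:] if overlap_paragraphs > 0 else []
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

A paragraph longer than `max_chars` is kept whole here; a production splitter would likely fall back to sentence boundaries in that case.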

MCQ (easy)

A developer is using OCI Generative AI to build a question-answering system over a large corpus of technical manuals. The developer uses the Cohere Embed model to generate embeddings and stores them in an OCI OpenSearch cluster. Queries are slow and the team needs to reduce latency. Which approach is BEST for improving search speed while maintaining acceptable accuracy?

A.Increase the embedding dimension for better representation.

B.Reduce the k value in the nearest neighbor search.

C.Use exact nearest neighbor search instead of approximate.

D.Increase the index refresh interval to reduce write overhead.

Answer: B

Fewer neighbors means less distance computation and faster retrieval.

Why this answer

Reducing the k value in the nearest neighbor search directly decreases the number of vectors that must be compared during query time, which lowers latency. In approximate nearest neighbor (ANN) search, a smaller k means fewer candidates are evaluated, speeding up retrieval while still maintaining acceptable accuracy if the original k was unnecessarily high. This is the most effective tuning knob for latency in vector search systems like OCI OpenSearch with Cohere embeddings.

Exam trap

The trap here is that candidates often confuse reducing k with reducing accuracy, but in practice, many RAG systems use a k value larger than necessary, and reducing it to a reasonable minimum (e.g., from 20 to 5) can dramatically improve speed without noticeable quality loss.

How to eliminate wrong answers

Option A is wrong because increasing the embedding dimension increases the computational cost of distance calculations and memory usage, which would worsen latency, not improve it. Option C is wrong because exact nearest neighbor search (k-NN) requires scanning all vectors, which is O(n) and significantly slower than approximate methods, especially on large corpora. Option D is wrong because increasing the index refresh interval reduces write overhead but does not affect query latency; it only delays the visibility of new documents.

MCQ (medium)

A team uses OCI OpenSearch as a vector database for RAG. Some queries return no results despite relevant documents being indexed. What is a likely cause?

A.The vector index is not refreshed after adding documents

B.The number of candidates (k) is set too low

C.The query text is too long for the embedding model

D.The embedding model is incompatible with the document language

Answer: B

A very low k value may result in no matches being returned for queries with distant nearest neighbors.

Why this answer

If k (the number of nearest neighbors to return) is set too low, the search may not find any documents within the top k, returning no results.

MCQ (medium)

Which OCI service provides a managed vector database capability that can be used as a knowledge base in a RAG architecture?

A.OCI MySQL HeatWave

B.OCI Database (Autonomous Database)

C.OCI Search with OpenSearch

D.OCI Object Storage

Answer: C

OpenSearch includes the k-NN plugin for vector search, managed by OCI.

Why this answer

OCI Search with OpenSearch supports vector indexing and search natively, making it a suitable managed solution for RAG vector storage.

MCQ (medium)

A developer notices that the RAG system returns irrelevant chunks when the user query contains typos or abbreviations. Which technique would BEST improve retrieval robustness for such queries?

A.Decrease the chunk size to focus on smaller units.

B.Increase the number of retrieved chunks to cover more variations.

C.Use a spell-checker on the retrieved chunks.

D.Implement query rewriting or expansion using a language model before embedding.

Answer: D

Rewriting corrects typos and expands abbreviations, improving embedding quality.

Why this answer

Option D is correct because query rewriting or expansion using a language model (LLM) directly addresses typos and abbreviations by generating a corrected or enriched query before embedding. This improves the semantic alignment between the user's intent and the vector search, ensuring that even noisy input retrieves relevant chunks. Techniques like spelling correction or synonym expansion at query time are far more effective than post-retrieval fixes or parameter tuning.

Exam trap

Oracle often tests the misconception that retrieval robustness can be improved by tuning chunk size or retrieval count, when the real bottleneck is the quality of the query embedding itself.

How to eliminate wrong answers

Option A is wrong because decreasing chunk size does not fix typos or abbreviations; it only changes the granularity of retrieval units, potentially missing context or increasing noise. Option B is wrong because increasing the number of retrieved chunks may include more irrelevant results without correcting the query's semantic mismatch caused by typos or abbreviations. Option C is wrong because applying a spell-checker on retrieved chunks is a post-retrieval fix that cannot recover relevance lost during embedding of a malformed query; the damage is already done at the retrieval stage.
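
The idea of cleaning a query before it is embedded can be sketched without an LLM. The example below is a lightweight stand-in for LLM-based rewriting, not the technique the question names: it uses the standard library's `difflib` for typo correction and a lookup table for abbreviations. `ABBREVIATIONS` and `VOCABULARY` are hypothetical and would come from the indexed corpus in practice.

```python
import difflib

# Hypothetical domain tables; in practice these would be built from the corpus.
ABBREVIATIONS = {"db": "database", "vm": "virtual machine"}
VOCABULARY = ["database", "backup", "restore", "virtual", "machine", "configure"]


def rewrite_query(query: str) -> str:
    """Expand known abbreviations and snap near-miss tokens to known terms."""
    words = []
    for token in query.lower().split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif token in VOCABULARY:
            words.append(token)
        else:
            # Accept only a close match; otherwise keep the user's token.
            match = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.8)
            words.append(match[0] if match else token)
    return " ".join(words)
```

The rewritten string, not the raw input, is what would then be passed to the embedding model.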

Multi-Select (hard)

A team is designing a RAG system for a multilingual knowledge base. Which TWO strategies are appropriate? (Choose two.)

Select 2 answers

A.Store separate vector indices per language

B.Disable vector search for non-English queries

C.Translate all documents to English before indexing

D.Use a different embedding model per language

E.Use a single embedding model trained for multilingual text

Answers: A, E

Separate indices allow language-specific preprocessing and retrieval optimizations.

Why this answer

Using a single multilingual embedding model (E) handles multiple languages in one pipeline, and storing separate vector indices per language (A) allows retrieval to be tuned for each language.

MCQ (easy)

A startup is building a customer support chatbot using RAG with OCI Generative AI. They have a large corpus of FAQ documents stored as PDFs in OCI Object Storage. The developer uses OCI Language to embed the text and stores vectors in OCI OpenSearch. During testing, the chatbot often fails to answer questions because relevant FAQ entries are not retrieved. The team suspects the chunking size is too large, causing loss of specific details. After reducing chunk size, retrieval improves slightly but still misses many answers. What should the team do NEXT?

A.Use a sliding window chunking strategy with overlap

B.Increase the number of retrieved chunks (k)

C.Switch to a different embedding model

D.Manually rephrase the queries

Answer: A

Overlap preserves context across chunk boundaries, improving recall.

Why this answer

The problem is likely chunk boundaries cutting off context. Using a sliding window with overlap ensures continuity, so relevant information is not lost at chunk edges. Increasing k adds noise; switching model is expensive; manual rephrasing is not scalable.
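
A sliding window with overlap can be sketched as follows. It assumes the text has already been tokenized into a list; the function name and parameters are illustrative.

```python
from typing import List, Sequence


def sliding_window(tokens: Sequence, size: int, overlap: int) -> List[list]:
    """Split tokens into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not tokens:
        return []
    step = size - overlap  # how far the window advances each time
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(tokens) - overlap, 1)
    return [list(tokens[i:i + size]) for i in range(0, last_start, step)]
```

With `size=4` and `overlap=1`, token 3 appears at the end of the first window and the start of the second, so a fact that straddles the boundary is still retrievable from one chunk.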

Multi-Select (hard)

Which THREE factors should be considered when choosing a vector store for a RAG application in OCI?

Select 3 answers

A.The maximum vector dimension supported.

B.Capability of hybrid search (vector + keyword).

C.The CPU architecture of the vector store nodes.

D.Support for multi-tenancy and isolation.

E.Indexing latency for adding new vectors.

Answers: B, D, E

Hybrid search improves retrieval by combining semantic and exact matches, especially for rare terms.

Why this answer

Options B, D, and E are correct. Hybrid search support (B) combines vector and keyword search for better recall. Multi-tenancy (D) is important for enterprise use. Indexing latency (E) affects how fast new documents can be ingested. Option A is wrong because vector dimension is determined by the embedding model, not the store. Option C is wrong because CPU architecture is irrelevant.
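
One common way to merge a vector ranking with a keyword ranking is reciprocal rank fusion (RRF). The sketch below is an illustration of hybrid search in general, not a description of how OCI OpenSearch combines scores; the constant `k=60` is a conventional default and the document IDs are made up.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of any list contribute the most.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["a", "b", "c"]   # hypothetical semantic results
keyword_hits = ["c", "a", "d"]  # hypothetical BM25 results
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Documents found by both retrievers ("a" and "c") rise above those found by only one.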

MCQ (hard)

A healthcare company is deploying a RAG application using OCI Generative AI and wants to ensure patient data privacy. They cannot send sensitive data to a public embedding endpoint. Which approach should they take to embed documents while maintaining data residency and security?

A.Use the standard OCI Generative AI public endpoint with data encryption in transit.

B.Use an external embedding service that complies with HIPAA in a different cloud region.

C.Hash the document text before sending to the public embedding endpoint.

D.Provision a dedicated AI cluster in their OCI tenancy to host the embedding model.

Answer: D

A dedicated cluster keeps all data within the customer's tenancy, meeting data privacy and residency requirements.

Why this answer

Option D is correct because OCI allows deploying Cohere models on dedicated AI clusters, ensuring data does not leave the customer's tenancy. Option A is wrong because the public endpoint processes data in Oracle's shared infrastructure. Option B is wrong because using a third-party API violates data residency. Option C is wrong because hashing before embedding destroys semantic meaning.

MCQ (medium)

A data scientist is using OCI Data Science to build a RAG system for medical literature. They have a large corpus of PDFs. They used the default OCI Generative AI embedding model and chunked each PDF into 512-character segments with 10% overlap. However, queries about specific drug doses often return incorrect information, even though the correct dose is present in the corpus. Upon inspection, they find that the retrieved chunks often contain partial dose information or miss the context units (e.g., mg vs. mcg). What improvement should they prioritize?

A.Implement a secondary verification step using a rule-based pattern matcher.

B.Use a semantic chunking strategy that respects document structure (e.g., paragraphs, sections).

C.Increase the chunk overlap to 50% to ensure more context.

D.Fine-tune the embedding model on medical text.

Answer: B

Preserving natural boundaries ensures that related information stays in one chunk.

Why this answer

Option B is correct because semantic chunking that respects document structure (e.g., paragraphs, sections) will keep dose information together. Option C (more overlap) may help but does not address structural breaks. Option D (fine-tuning) is heavy and may not fix chunk boundaries. Option A (rule-based verification) is a workaround, not a core fix.

MCQ (easy)

A company uses a RAG pipeline with OCI Data Science and Cohere embeddings. They notice that retrieval recall is low for domain-specific acronyms. What is the best practice to improve this?

A.Reduce the cosine similarity threshold in the vector search.

B.Expand acronyms to their full forms during document preprocessing and indexing.

C.Fine-tune the embedding model with domain-specific acronyms.

D.Increase the chunk size to include more context around acronyms.

Answer: B

Full forms improve semantic matching.

Why this answer

Expanding acronyms to their full forms during document preprocessing and indexing ensures that the embedding model can map the acronym to its semantic meaning, improving retrieval recall for domain-specific terms. Cohere embeddings are trained on general text, so without expansion, acronyms like 'NLP' may not match queries for 'Natural Language Processing' in vector space. This preprocessing step directly addresses the root cause of low recall for acronyms.

Exam trap

Oracle often tests the misconception that fine-tuning the embedding model is the default fix for retrieval issues, when in practice simpler preprocessing techniques like acronym expansion are more efficient and recommended for domain-specific vocabulary gaps.

How to eliminate wrong answers

Option A is wrong because reducing the cosine similarity threshold would increase the number of retrieved chunks but also introduce more irrelevant results, degrading precision without fixing the underlying embedding mismatch for acronyms. Option C is wrong because fine-tuning the embedding model is resource-intensive and typically unnecessary for this issue; preprocessing acronyms is a simpler, more effective solution that avoids retraining. Option D is wrong because increasing chunk size may add more context but does not resolve the core problem that the acronym itself is not semantically represented in the embedding space, so the retrieval still fails to match the intended concept.
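
Acronym expansion at indexing time can be sketched as a small preprocessing pass. The `ACRONYMS` table is hypothetical; a real one would be curated per domain. The sketch keeps the acronym and appends the full form so that both spellings end up in the embedded text.

```python
import re
from typing import Dict

# Hypothetical glossary; curate per domain.
ACRONYMS: Dict[str, str] = {
    "NLP": "Natural Language Processing",
    "RAG": "Retrieval-Augmented Generation",
}


def expand_acronyms(text: str, acronyms: Dict[str, str] = ACRONYMS) -> str:
    """Append the full form after each known acronym, matched as a whole word."""
    if not acronyms:
        return text
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, acronyms)) + r")\b")
    return pattern.sub(lambda m: f"{m.group(1)} ({acronyms[m.group(1)]})", text)
```

Running the same function over incoming queries keeps the document side and the query side consistent.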

MCQ (medium)

A development team notices that their RAG application returns responses slowly when processing large PDF documents (100+ pages). They need to improve response time without significantly reducing retrieval quality. Which action is most effective?

A.Add a reranking step after initial retrieval

B.Use a smaller chunk size during document ingestion

C.Increase the topK parameter to retrieve more context

D.Switch to a larger embedding model for better accuracy

Answer: B

Smaller chunks mean faster embedding and retrieval.

Why this answer

Using smaller chunk sizes reduces the amount of text per embedding, speeding up retrieval and subsequent processing. Increasing topK would retrieve more context and slow down the response. Switching to a larger embedding model or adding a reranker would increase latency.

MCQ (hard)

An enterprise RAG application experiences high latency during peak hours. The architecture uses OCI OpenSearch with a single node cluster storing 5 million vectors (768 dimensions). The search uses exact k-NN (EF_SEARCH=500). The average query takes 1.5 seconds, but the SLA requires <500ms. The team considers several options: A) Switch to ANN with lower recall (HNSW with ef_search=50), B) Scale OpenSearch cluster to 3 nodes, C) Reduce embedding dimension to 256 using PCA, D) Increase the number of shards from 1 to 10. Which option provides the best balance of latency reduction and minimal impact on retrieval quality? (Assume all options are feasible)

A.Scale OpenSearch cluster to 3 nodes

B.Increase the number of shards from 1 to 10

C.Switch to ANN (HNSW with ef_search=50)

D.Reduce embedding dimension to 256 using PCA

Answer: B

More shards divide the vector set, allowing parallel exact searches on smaller partitions, reducing latency without quality loss.

Why this answer

Increasing shards on the same node partitions the index, so each shard contains fewer vectors, making exact search faster. This reduces latency without sacrificing accuracy. ANN reduces recall, scaling adds cost and complexity, and dimension reduction can degrade embedding quality.
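
The scatter-gather pattern behind sharded exact search can be sketched in plain Python. This illustrates only why the merged result equals the unsharded exact result; it is not a model of OpenSearch internals, and Python threads will not show a real speedup for this CPU-bound loop. All names are illustrative.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Hit = Tuple[float, int]  # (similarity score, vector id)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def search_shard(shard: List[Tuple[int, Vector]], query: Vector, k: int) -> List[Hit]:
    """Exact top-k on one shard: every vector in the shard is scored."""
    return heapq.nlargest(k, ((dot(vec, query), vid) for vid, vec in shard))


def sharded_search(shards: List[List[Tuple[int, Vector]]],
                   query: Vector, k: int) -> List[Hit]:
    """Search each shard independently, then merge the per-shard top-k lists."""
    with ThreadPoolExecutor(max_workers=max(len(shards), 1)) as pool:
        partials = pool.map(lambda s: search_shard(s, query, k), shards)
        # The global top-k is always contained in the union of local top-k lists.
        return heapq.nlargest(k, (hit for part in partials for hit in part))
```

Because each shard returns its own exact top-k, no candidate is lost in the merge, which is why this approach does not trade away recall.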

MCQ (hard)

A team has set up a RAG pipeline using OCI Data Science with OCI OpenSearch as the vector store. The embedding model is from the OCI Generative AI service. Users note that the vector search returns irrelevant documents for many queries. Which of the following is the most likely cause?

A.The chunk size is too large, causing overlapping context

B.The OpenSearch cluster is too small to handle the load

C.The query is not being converted to an embedding before search

D.The embedding model dimension does not match OpenSearch index dimension

Answer: C

Without embedding the query, the vector store cannot perform semantic similarity search.

Why this answer

If the query is not converted to an embedding before search, the vector store may interpret the raw text as an invalid query or fall back to lexical search, yielding irrelevant results.

MCQ (easy)

When invoking the OCI Generative AI service from a RAG application, the developer receives a 401 Unauthorized error. The application uses resource principal authentication from an OCI Data Science notebook session. What is the most likely fix?

A.Add the Generative AI service to the subnet's security list

B.Use an API key instead of resource principal

C.Ensure the dynamic group includes the data science notebook session and has the correct policy

D.Restart the notebook session

Answer: C

The dynamic group must match the session, and the policy must grant access to the Generative AI service.

Why this answer

Resource principal authentication requires that the dynamic group containing the notebook session has a policy granting access to Generative AI resources, for example 'use generative-ai-family' in the relevant compartment. If the session is not in the dynamic group, the policy does not apply.

MCQ (hard)

An enterprise is using OCI Generative AI with a RAG architecture. They observe that the LLM sometimes produces hallucinated answers that are not supported by the retrieved documents. Which strategy is most effective in reducing these hallucinations?

A.Increase the temperature parameter to make outputs more focused.

B.Provide clear instructions in the system prompt to answer only based on the provided context.

C.Use a smaller LLM to reduce model capacity.

D.Retrieve more chunks (increase top-k) to provide more context.

Answer: B

Explicit grounding instructions guide the model to stick to retrieved documents, reducing unsupported claims.

Why this answer

Option B is correct because instructing the LLM to answer only from the provided context reduces hallucinations. Option A is wrong because increasing temperature increases randomness, worsening hallucinations. Option C is wrong because using a smaller model may increase hallucination. Option D is wrong because adding more retrieved chunks may introduce conflicting information.
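
A grounding instruction of this kind might be assembled as shown below. The wording of the instruction and the prompt layout are assumptions for illustration; they are not an OCI-prescribed template.

```python
from typing import List


def build_grounded_prompt(question: str, chunks: List[str]) -> str:
    """Build a prompt that restricts the model to the retrieved chunks."""
    # Number the chunks so the model can cite them as [1], [2], ...
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    instructions = (
        "Answer using only the numbered context below. "
        "If the context does not contain the answer, reply: I don't know."
    )
    return f"{instructions}\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"
```

Giving the model an explicit way to decline ("I don't know") matters as much as the restriction itself, since otherwise it has no sanctioned output when retrieval comes back empty.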

MCQ (hard)

An application using OCI Generative AI returns a 403 Forbidden error when attempting to invoke a model. The user's API key is valid and the endpoint is correct. What is the most likely cause?

A.The model is out of capacity.

B.The region is not supported for the selected model.

C.The API request body is malformed.

D.The IAM policy does not grant the necessary permission to use the model.

Answer: D

403 errors indicate permission denial; the policy likely lacks an 'allow' statement.

Why this answer

OCI requires specific IAM policies to allow users to use Generative AI services. A missing or incorrect policy is the typical cause of 403 errors.

MCQ (hard)

A DBA has created the above vector index. After running queries, they observe that recall is lower than expected for approximate searches. Which change would most likely improve recall while maintaining query performance?

A.Change the index type from IVF to HNSW.

B.Increase the TARGET ACCURACY value to 99.

C.Increase the number of neighbor partitions (NEIGHBOR PARTITIONS) to 8.

D.Reduce the number of neighbor partitions to 2.

Answer: B

A higher TARGET ACCURACY forces the approximate search to consider more vectors, increasing recall at the cost of some latency.

Why this answer

Option B is correct because increasing the TARGET ACCURACY parameter forces the index to consider more candidates, improving recall. Option A is wrong because changing to HNSW alters the index type and requires rebuilding the index. Option C is wrong because increasing neighbor partitions may improve performance but not necessarily recall. Option D is wrong because reducing neighbor partitions reduces recall.

MCQ (medium)

A company's RAG application ingests news articles that are updated frequently. The vector store in OCI OpenSearch contains embeddings of the articles. The team notices that outdated information is still retrieved even after updating the source documents. What is the most effective way to ensure the vector store reflects the latest content?

A.Increase the TTL for vector indices

B.Rely on the LLM to ignore outdated information

C.Re-index the entire vector store daily

D.Use the OCI OpenSearch document update API to replace embeddings for changed documents

Answer: D

Targeted updates minimize cost and ensure real-time accuracy.

Why this answer

Using the OCI OpenSearch document update API to replace embeddings for changed documents is efficient and targeted, ensuring immediate consistency.
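
Targeted updates require knowing which documents actually changed. One way to decide, sketched below, is to store a content hash next to each embedding and compare it on every ingestion run. The function names are illustrative, and the actual re-embedding and OpenSearch update calls are deliberately left out.

```python
import hashlib
from typing import Dict, List, Tuple


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_updates(stored_hashes: Dict[str, str],
                 current_docs: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (ids to re-embed and upsert, ids to delete from the index)."""
    to_upsert = [doc_id for doc_id, text in current_docs.items()
                 if stored_hashes.get(doc_id) != content_hash(text)]
    to_delete = [doc_id for doc_id in stored_hashes if doc_id not in current_docs]
    return to_upsert, to_delete
```

Only the IDs in `to_upsert` need new embeddings, which keeps embedding cost proportional to the amount of change rather than to corpus size.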

MCQ (hard)

A company wants to build a multi-modal RAG system that can retrieve both text and images based on a user query. Which approach is most aligned with OCI GenAI capabilities?

A.Use OCI Document Understanding to convert images to text, then index text

B.Use separate vector stores for text and image embeddings

C.Use image captioning to generate text descriptions and index those

D.Utilize a multi-modal embedding model from OCI GenAI to embed both text and images into a common vector space

Answer: D

Multi-modal models enable direct retrieval of both types.

Why this answer

OCI GenAI supports multi-modal models like Cohere's multimodal embedding model, which can embed text and images into a shared vector space, enabling retrieval across modalities. Separate text and image models would not align the vectors. An OCR-based, text-only approach loses image semantics. Using multiple vector stores complicates retrieval.

MCQ (medium)

A developer receives the above error when querying a RAG application. What is the most likely cause and recommended action?

A.The API rate limit has been exceeded; wait for the retry period and implement exponential backoff.

B.The model is deprecated; update to the latest model.

C.The endpoint URL is incorrect; verify the OCI region endpoint.

D.The request payload is malformed; check the input format.

Answer: A

429 means rate limit.

Why this answer

The 429 (Too Many Requests) error indicates the API rate limit has been exceeded. In OCI Generative AI services, rate limits are enforced per tenancy and per region; the recommended action is to wait for the retry-after period and implement exponential backoff to avoid overwhelming the service.

Exam trap

Oracle often tests the distinction between HTTP status codes (429 vs 400 vs 404 vs 410) to see if candidates can map the exact error to the correct cause rather than guessing based on general troubleshooting.

How to eliminate wrong answers

Option B is wrong because a model deprecation would return a 404 or 410 error, not a 429. Option C is wrong because an incorrect endpoint URL would result in a 404 or connection timeout, not a rate-limit error. Option D is wrong because a malformed payload would produce a 400 Bad Request error, not a 429.
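
Exponential backoff for a 429 response can be sketched as follows. `RateLimitError` is a hypothetical exception standing in for whatever the client library raises on HTTP 429, and the delay values are illustrative. If the service returns a retry-after value, honoring it should take precedence over the computed delay.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the client's HTTP 429 exception."""


def call_with_backoff(fn: Callable[[], T], max_retries: int = 5,
                      base_delay: float = 1.0, max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry fn on rate-limit errors, doubling the delay cap each attempt."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: a random wait up to the cap avoids synchronized retries.
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the retry logic can be tested without actually waiting.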

MCQ (medium)

A company is building a RAG application using OCI Generative AI and OCI Search with OpenSearch. Users report that the responses from the LLM are not relevant to the queries, even though the document chunks seem appropriate. What is the most likely cause?

A.The embedding model is not suited for the domain.

B.Reranking is not enabled in the OpenSearch query.

C.The top K value is set too high.

D.The chunk size is too small, causing loss of context.

Answer: B

Reranking reorders search results for better relevance, significantly impacting quality.

Why this answer

Enabling reranking improves the relevance of retrieved documents by reordering them based on semantic match with the query. Without reranking, the initial vector search results may not be optimally ordered.

Multi-Select (easy)

Which TWO are best practices for building a RAG application on OCI? (Choose two.)

Select 2 answers

A.Use a vector database such as OCI OpenSearch with ANN indexes for storing embeddings.

B.Generate embeddings for documents at query time to ensure freshness.

C.Pre-index the documents and update the index periodically to reflect new content.

D.Store the source documents only in OCI Object Storage and retrieve them at query time using full-text search.

E.Use a different embedding model for documents and queries to capture distinct semantics.

Answers: A, C

ANN indexes enable fast similarity search.

Why this answer

Option A is correct because OCI OpenSearch with Approximate Nearest Neighbor (ANN) indexes is a best practice for vector storage and retrieval in RAG applications. ANN indexes enable efficient similarity search over high-dimensional embeddings, which is essential for retrieving relevant context from large document collections at low latency.

Exam trap

Oracle often tests the misconception that real-time embedding generation or full-text search can substitute for precomputed vector indexes in RAG, when in practice latency and semantic alignment requirements make pre-indexing and ANN search mandatory.

MCQ (easy)

When chunking a large Python code repository for a RAG application, which chunking strategy is best suited to preserve code semantics and functionality?

A.Semantic chunking based on function and class definitions

B.Fixed-size chunking with 512 tokens

C.Sentence-level chunking

D.Character-level chunking

Answer: A

Keeps logical code blocks intact.

Why this answer

Semantic chunking (e.g., splitting by function definitions, classes) keeps related code together, preserving context. Fixed-size or sentence splitting can break syntactical units. Character splitting destroys meaning.
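
For Python sources, splitting at function and class definitions can be sketched with the standard library's `ast` module. The function name `chunk_python_source` is illustrative, and only top-level definitions are extracted.

```python
import ast
from typing import List, Tuple


def chunk_python_source(source: str) -> List[Tuple[str, str]]:
    """Return (name, source text) for each top-level function or class."""
    tree = ast.parse(source)
    chunks: List[Tuple[str, str]] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment returns the exact text spanned by the node
            # (decorator lines above the def/class are not included).
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                chunks.append((node.name, segment))
    return chunks
```

Each chunk is a syntactically complete unit, so a retrieved function is never cut off halfway through its body. Module-level statements such as imports are skipped here and would need their own chunk in a fuller implementation.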

Multi-Select (medium)

Which THREE factors should be considered when designing a chunking strategy for a RAG application?

Select 3 answers

A.Desired granularity of retrieval

B.Number of GPUs available

C.Database indexing method

D.Document structure

E.Embedding model's maximum input tokens

Answers: A, D, E

Smaller chunks allow more precise retrieval; larger chunks provide more context.

Why this answer

Document structure (e.g., paragraphs), embedding model token limit, and desired retrieval granularity are key. GPU availability is unrelated; indexing method is post-chunking.

Multi-Select (easy)

Which TWO of the following are valid approaches to serve a RAG application in OCI with low latency?

Select 2 answers

A.Pre-compute embeddings and answers for all possible questions.

B.Deploy the vector store on multiple regions to reduce network latency.

C.Increase the chunk size to reduce the number of retrievals.

D.Implement a caching layer for frequently asked questions.

E.Use an LLM that supports streaming response for faster user feedback.

Answers: D, E

Caching avoids redundant retrieval and generation, reducing latency for common queries.

Why this answer

Options D and E are correct. Caching common queries (D) avoids repeated retrieval and generation. Using an LLM with streaming response (E) reduces perceived latency. Option A is wrong because pre-computing all possible answers is impractical. Option C is wrong because increasing chunk size can increase latency due to more tokens to process.
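
A caching layer for frequent questions can be sketched as a thin wrapper around the RAG call. `AnswerCache` and `answer_fn` are hypothetical names; the normalization shown (lowercasing and collapsing whitespace) is only one possible choice, and a production cache would also need expiry so that answers do not outlive the documents they were based on.

```python
from typing import Callable, Dict


class AnswerCache:
    """Memoize answers keyed by a normalized form of the question."""

    def __init__(self, answer_fn: Callable[[str], str]) -> None:
        self._answer_fn = answer_fn  # the full retrieve-and-generate call
        self._cache: Dict[str, str] = {}

    @staticmethod
    def normalize(question: str) -> str:
        # Treat questions that differ only in case or spacing as the same.
        return " ".join(question.lower().split())

    def get(self, question: str) -> str:
        key = self.normalize(question)
        if key not in self._cache:
            self._cache[key] = self._answer_fn(question)
        return self._cache[key]
```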

MCQ (easy)

A company is building a RAG application using OCI Generative AI and wants to store embeddings for document retrieval. Which OCI service is most appropriate for storing and querying vector embeddings?

A.OCI MySQL Database

B.OCI Data Flow

C.OCI Search with OpenSearch

D.OCI Object Storage

Answer: C

OpenSearch supports k-nearest neighbor (k-NN) search and is the recommended vector store in OCI.

Why this answer

OCI Search with OpenSearch provides native vector search capabilities (k-NN) suitable for storing and querying embeddings. Object Storage is for blob data, MySQL is relational, and Data Flow is for big data processing.

MCQ (medium)

A manufacturing company uses OCI OpenSearch to build a RAG application that retrieves procedural documents. After deployment, queries often return outdated procedures even though the vector index was refreshed. What is the most likely cause?

A.The embedding model was fine-tuned on outdated data.

B.The full-text search index is not synchronized with the vector index after updates.

C.The BM25 scoring algorithm prioritizes older documents due to term frequency.

D.The chunk overlap percentage is too high, causing duplicate context.

Answer: B

Outdated procedures remain in the text index if not reindexed.

Why this answer

Option B is correct because in this architecture the vector index and the full-text search index are maintained separately, and an update to one does not automatically propagate to the other. Here the vector index was refreshed, so the embeddings reflect the new procedures, but the full-text index still holds the old text. Any keyword or hybrid stage of retrieval can therefore match and return the outdated procedures until both indexes are updated together.

Exam trap

The trap here is that candidates may assume that refreshing the vector index also brings the full-text index up to date, but in practice each index must be updated when the source documents change, a step that is often overlooked in RAG architectures.

How to eliminate wrong answers

Option A is wrong because fine-tuning the embedding model on outdated data would affect all embeddings, not just those from refreshed documents, and the scenario specifies the vector index was refreshed, implying embeddings were regenerated. Option C is wrong because BM25 scoring is used for full-text search, not vector search; it prioritizes documents based on term frequency and inverse document frequency, not age, and would not cause outdated procedures to be returned if the vector index is correctly synchronized. Option D is wrong because chunk overlap percentage affects context continuity and duplication, not the freshness of retrieved data; high overlap might cause duplicate chunks but not outdated procedures.

Multi-Select (hard)

A company is deploying a RAG application for legal document analysis using OCI. Which three best practices should be followed to mitigate hallucinations? (Choose 3.)

Select 3 answers

A.Implement a fallback to abstain from answering if confidence is low.

B.Use a low temperature setting for the generation model.

C.Provide the source document citations in the prompt.

D.Include a verification step via a secondary model.

E.Increase the number of retrieved chunks to 10.

Answers: A, B, C

Avoids generating incorrect answers when retrieval is uncertain.

Why this answer

Options A, B, and C are correct. A fallback to abstain (A) avoids false answers, a low temperature (B) reduces randomness, and providing source citations (C) keeps the model grounded. Option D is possible but adds complexity and latency. Option E increases noise and may increase hallucinations.

MCQ (easy)

A small business is building an internal Q&A bot using OCI Generative AI with RAG. They have indexed their product manuals into OCI OpenSearch using a precomputed embedding model. When they test queries, the bot often returns answers that are only partially relevant, and sometimes it cannot find answers for questions that are clearly present in the manuals. The developers suspect the chunking strategy is suboptimal. Currently, they use a fixed chunk size of 512 tokens with no overlap. What should they do to improve retrieval relevance?

A.Increase the chunk size to 1024 tokens to include more context.

B.Add a 20% token overlap between consecutive chunks.

C.Reduce the chunk size to 256 tokens to increase precision.

D.Switch to a sentence-based chunking strategy with no overlap.

Answer: B

Overlap ensures that context spanning chunk boundaries is preserved.

Why this answer

Option B is correct because adding overlap helps capture context that spans chunk boundaries, a common cause of missing information. Option A may reduce precision and add noise. Option D, with no overlap, still misses context at boundaries.

Option C may increase missed context because the chunks are smaller.
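
A sliding window with overlap can be sketched as follows. The function name and the use of a pre-tokenized list are assumptions for illustration; a real pipeline would use the embedding model's own tokenizer.

```python
from typing import List


def chunk_with_overlap(
    tokens: List[str], size: int = 512, overlap_ratio: float = 0.2
) -> List[List[str]]:
    """Split tokens into windows of `size`; consecutive windows share a fraction of tokens."""
    if size <= 0 or not 0 <= overlap_ratio < 1:
        raise ValueError("size must be positive and overlap_ratio in [0, 1)")
    step = max(1, size - int(size * overlap_ratio))
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks
```

With `size=512` and `overlap_ratio=0.2`, each chunk repeats the last 102 tokens of the previous one, so a sentence that straddles a boundary appears whole in at least one chunk as long as it is shorter than the overlap.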

Practice this question →

MCQmedium

A financial firm deploys a RAG application using OCI OpenSearch. They observe that the LLM sometimes generates incorrect answers that are not supported by the retrieved documents. Which technique directly addresses this issue?

A.Use a more detailed system prompt instructing the model to not make up information.

B.Increase the temperature parameter of the LLM to reduce creativity.

C.Implement a post-generation verification step that checks if the answer is grounded in the retrieved chunks.

D.Increase the number of retrieved documents to provide more context.

AnswerC

Directly verifies faithfulness.

Why this answer

Option C is correct because it directly addresses the problem of hallucination by verifying that the LLM's output is factually supported by the retrieved documents. In a RAG pipeline, the LLM may still generate unsupported content even with good retrieval; a post-generation grounding check explicitly validates each claim against the source chunks, ensuring answer fidelity.

Exam trap

Oracle often tests the misconception that prompt engineering or parameter tuning alone can solve hallucination in RAG, when in fact a dedicated verification step is required to enforce factual grounding.

How to eliminate wrong answers

Option A is wrong because a more detailed system prompt instructing the model not to make up information is a soft constraint that LLMs can easily ignore, especially when the model is confident in its fabricated answer; it does not provide a deterministic mechanism to prevent hallucination. Option B is wrong because increasing the temperature parameter actually increases randomness and creativity, making hallucinations more likely; reducing temperature (closer to 0) would make outputs more deterministic and less creative, but it still does not guarantee grounding in retrieved documents. Option D is wrong because increasing the number of retrieved documents can introduce irrelevant or conflicting context, potentially confusing the LLM and increasing the chance of unsupported answers; it does not enforce that the final answer is actually supported by any specific chunk.
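
The post-generation check in option C can be illustrated with a deliberately crude heuristic. This sketch flags answer sentences whose words mostly do not appear in any retrieved chunk; it is an assumption-laden stand-in, since production systems typically use an entailment model or a second LLM as the judge. The names `unsupported_sentences` and `min_overlap` are hypothetical.

```python
import re
from typing import List, Set


def _words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def unsupported_sentences(
    answer: str, chunks: List[str], min_overlap: float = 0.6
) -> List[str]:
    """Return answer sentences whose words are mostly absent from every chunk."""
    chunk_words = [_words(c) for c in chunks]
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = _words(sentence)
        if not words:
            continue
        # Best word-overlap ratio against any single retrieved chunk.
        best = max((len(words & cw) / len(words) for cw in chunk_words), default=0.0)
        if best < min_overlap:
            flagged.append(sentence)
    return flagged
```

If the returned list is non-empty, the application can regenerate, strip the flagged sentences, or abstain.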

Practice this question →

MCQhard

A real-time customer support chatbot uses RAG with OCI Generative AI. The average response time is 5 seconds, which is too slow. The team identifies the vector search as the bottleneck. Which optimization would most reduce latency?

A.Use approximate nearest neighbor (ANN) search with a lower recall setting

B.Increase the number of retrieved documents to 10

C.Move the vector store to a different region

D.Switch to a larger embedding model for better accuracy

AnswerA

ANN speeds up search by sacrificing some recall, which can be mitigated by re-ranking.

Why this answer

Using approximate nearest neighbor (ANN) search with a lower recall setting reduces search time by trading off some accuracy for speed, which is often acceptable in real-time applications.

Practice this question →

MCQmedium

Refer to the exhibit. What is a potential issue with this OCI OpenSearch index template configuration?

A.The ef_construction parameter is set too low

B.The space_type in settings (l2) differs from the method's space_type (cosinesimil)

C.Number of replicas is 0, which provides no redundancy

D.The dimension 1024 is too large for the knn_vector type

AnswerB

This mismatch can cause inconsistency in how distances are computed during indexing and search.

Why this answer

The space_type defined at the index settings (l2) conflicts with the space_type defined in the method mapping (cosinesimil). This mismatch can lead to incorrect distance calculations and poor retrieval results.

Practice this question →

Multi-Selecthard

Which TWO best practices should be followed when designing a RAG application using OCI GenAI? (Select two.)

Select 2 answers

A.Batch all user queries to minimize costs

B.Store raw documents in the vector database for easy updates

C.Use OCI Dedicated AI Cluster for inference to ensure data privacy

D.Store embedding API keys in OCI Vault and rotate frequently

E.Use dedicated AI endpoints for sensitive workloads

AnswersC, E

Keeps data within tenancy.

Why this answer

Using dedicated AI endpoints (E) ensures isolation and performance. Storing keys in OCI Vault (D) is good practice for secrets but is not among the keyed answers. Running inference on dedicated AI clusters (C) is a best practice.

Batching queries (A) is fine but not a top priority. Storing raw documents in the vector database (B) is unnecessary.

Practice this question →

MCQhard

A team is deploying a RAG system that uses OCI Generative AI to answer questions about internal HR policies. The system must comply with data residency requirements: all data processing must stay within a specific OCI region. The team uses OCI Data Science for orchestration. Which architecture BEST meets the data residency requirement?

A.Deploy the generative AI model endpoints within the same OCI region as the data and compute.

B.Use OCI Generative AI endpoints in a different region but store data in the required region.

C.Use an external third-party LLM endpoint that guarantees data residency.

D.Store embeddings in a different region but run inference in the required region.

AnswerA

All components remain in the specified region, ensuring compliance.

Why this answer

Option A is correct because deploying the generative AI model endpoints within the same OCI region as the data and compute ensures that all data processing—including inference, embedding generation, and vector search—occurs entirely within the required region, satisfying data residency requirements. OCI Generative AI endpoints are region-specific and do not automatically route requests to other regions, so co-locating all components avoids any cross-region data transfer.

Exam trap

Oracle often tests the misconception that data residency only applies to storage, not to processing—candidates may think storing data in the required region is sufficient, but the trap is that inference and embedding generation also count as data processing and must occur in the same region.

How to eliminate wrong answers

Option B is wrong because using OCI Generative AI endpoints in a different region while storing data in the required region would cause inference requests and model processing to occur outside the required region, violating data residency requirements. Option C is wrong because an external third-party LLM endpoint that guarantees data residency still requires data to leave the OCI region to reach the external service, which breaks the requirement that all data processing must stay within a specific OCI region. Option D is wrong because storing embeddings in a different region while running inference in the required region means the embedding data (derived from HR policies) resides outside the required region, failing the data residency constraint.

Practice this question →

MCQmedium

During load testing, the RAG application's response time increases significantly. The vector search is performed on millions of vectors. Which optimization would MOST reduce latency?

A.Increase the number of replicas in OpenSearch

B.Shard the index by document type

C.Use approximate nearest neighbor (ANN) search instead of exact

D.Use a smaller embedding model

AnswerC

ANN search is orders of magnitude faster than exact search for large datasets.

Why this answer

Approximate Nearest Neighbor (ANN) search uses indexes like HNSW to trade a small amount of accuracy for large speed gains, drastically reducing query time. Increasing replicas helps throughput but not per-query latency. Sharding organizes data but does not inherently reduce latency.

A smaller model may reduce computation but also harms quality.
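
For reference, an HNSW-backed vector field in the OpenSearch k-NN plugin is declared in the index mapping. The sketch below only builds the request body as a Python dict; the field name, dimension, and parameter values are illustrative assumptions, and engine and space-type support varies by OpenSearch version.

```python
def hnsw_index_body(field: str = "embedding", dimension: int = 768) -> dict:
    """Build an index-creation body with one HNSW knn_vector field."""
    return {
        "settings": {"index": {"knn": True}},  # enable k-NN on the index
        "mappings": {
            "properties": {
                field: {
                    "type": "knn_vector",
                    "dimension": dimension,  # must equal the embedding model's output size
                    "method": {
                        "name": "hnsw",
                        "space_type": "cosinesimil",
                        "engine": "lucene",
                        # Illustrative values: larger numbers raise recall and build cost.
                        "parameters": {"ef_construction": 128, "m": 16},
                    },
                }
            }
        },
    }
```

Declaring `space_type` once, inside `method`, avoids two conflicting distance definitions on the same index.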

Practice this question →

MCQeasy

An application uses RAG to answer customer queries, but answers are often incomplete because the retrieved chunks do not contain full context. Which adjustment should the developer make?

A.Use a different embedding model

B.Increase chunk overlap

C.Increase the number of retrieved chunks

D.Decrease chunk size

AnswerC

Retrieving more chunks provides more context to the generation model.

Why this answer

Increasing the number of retrieved chunks gives the model more contextual information, leading to more complete answers.

Practice this question →

MCQhard

Refer to the exhibit. A developer creates an index mapping for a vector search application. When performing a k-NN search query, the query fails with a parsing error. What is the most likely cause?

A.The query is missing the 'knn' query clause.

B.The dimension value 768 does not match the embedding model's output dimension.

C.The knn setting is not enabled at the index level.

D.The space_type should be 'l2' instead of 'cosinesimil'.

E.The engine should be 'nmslib' instead of 'faiss'.

AnswerB

A mismatch between mapping dimension and model dimension causes a parsing error during search.

Why this answer

The dimension in the mapping (768) must exactly match the output dimension of the embedding model used to generate the vectors. A mismatch causes parsing errors. The knn setting is correctly enabled (C is wrong). cosinesimil is a valid space_type (D is wrong). faiss is a valid engine (E is wrong).

A missing query clause would not cause a parsing error on the index (A is wrong).

Practice this question →

Multi-Selecteasy

A company wants to ensure their RAG application complies with data residency requirements. Data must not leave a specific OCI region. Which TWO actions are necessary? (Choose two.)

Select 2 answers

A.Enable cross-region access for the vector store

B.Use a global search interface

C.Deploy the OCI Generative AI service endpoint in the required region

D.Configure the vector store to replicate across regions for high availability

E.Use an embedding model hosted in the required region

AnswersC, E

Keeps LLM inference and any data sent to the endpoint within the region.

Why this answer

Using an embedding model hosted in the required region (E) and deploying the OCI Generative AI service endpoint in that region (C) ensure that all data processing stays within the region.

Practice this question →

MCQmedium

A healthcare organization plans to deploy a RAG application on OCI that handles sensitive patient data. They require that all LLM inference and embedding processing happen within a controlled environment to avoid data leakage to public endpoints. Which OCI feature should they use?

A.OCI Data Labeling

B.OCI Vault

C.OCI Data Masking

D.OCI Dedicated AI Cluster

AnswerD

Dedicated AI Cluster provides isolated compute for AI workloads.

Why this answer

OCI Dedicated AI Cluster provides a private, isolated environment for AI workloads, ensuring data stays within the customer's tenancy. OCI Data Labeling is for labeling. OCI Data Masking is for masking but not for inference isolation.

OCI Vault manages keys, but doesn't isolate inference.

Practice this question →

MCQhard

A security audit reveals that the RAG application exposes internal documents through the chatbot. The vector search index contains sensitive data. Which action should be taken FIRST to mitigate?

A.Reduce the number of retrieved chunks

B.Implement access control at the OpenSearch index level

C.Redact sensitive terms from documents before embedding

D.Use a different embedding model

AnswerB

Index-level security restricts which documents can be searched by user roles.

Why this answer

Implementing access control at the OpenSearch index level prevents unauthorized users from retrieving sensitive documents. Redaction reduces risk but is less comprehensive. Changing model or reducing chunks does not address the exposure.

Practice this question →

Multi-Selectmedium

Which TWO actions are best practices when deploying a RAG application using OCI OpenSearch and OCI Generative AI?

Select 2 answers

A.Embed every document chunk in real-time during query processing.

B.Implement a reranker to improve the relevance of retrieved documents.

C.Use very small chunk sizes (e.g., 50 tokens) to maximize granularity.

D.Monitor query latency and adjust the number of retrieved documents accordingly.

E.Set the LLM temperature to 1.5 to encourage diverse outputs.

AnswersB, D

Improves precision.

Why this answer

Option B is correct because implementing a reranker improves retrieval precision by re-scoring the top-k documents from the initial vector search using a cross-encoder model, which captures deeper semantic relevance than cosine similarity alone. In OCI OpenSearch, this is typically done via a post-processing step with OCI Generative AI or a dedicated reranking model, ensuring only the most contextually relevant chunks are passed to the LLM for generation.
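
The rerank step can be sketched independently of any particular model. Here `score` stands in for a cross-encoder or a hosted rerank endpoint, and `word_overlap` is a toy scorer included only so the example runs; both names are hypothetical.

```python
from typing import Callable, List


def rerank(
    query: str,
    chunks: List[str],
    score: Callable[[str, str], float],
    top_n: int = 3,
) -> List[str]:
    """Re-score first-stage hits with score(query, chunk) and keep the best top_n."""
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    return ranked[:top_n]


def word_overlap(query: str, chunk: str) -> float:
    """Toy relevance score: fraction of query words present in the chunk."""
    q = set(query.lower().split())
    return len(q & set(chunk.lower().split())) / max(1, len(q))
```

A common pattern is to retrieve a generous top-k from the vector index, rerank, and pass only the first few chunks to the LLM.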

Exam trap

Oracle often tests the misconception that real-time embedding (Option A) is efficient for RAG, when in fact pre-computed embeddings are standard, and that very small chunks (Option C) improve granularity, whereas they actually harm context coherence and retrieval quality.

Practice this question →

MCQeasy

A company uses OCI Generative AI's chat endpoint with RAG for customer support. They have observed that the model sometimes generates answers that contradict the retrieved context. The retrieved chunks are correct and relevant, but the model ignores them. What configuration change should they implement first?

A.Fine-tune the generation model on customer support dialogues.

B.Increase the number of retrieved chunks from 3 to 5.

C.Add explicit instructions in the system prompt to base answers solely on provided context.

D.Use a different embedding model for retrieval.

AnswerC

A strong prompt can enforce grounding in the retrieved chunks.

Why this answer

Option C is correct because strengthening the system prompt to enforce grounding on the provided context is the most immediate fix. Option B may not help if the model ignores context. Option A is fine-tuning, which is resource-intensive.

Option D addresses retrieval, not generation behavior.

Practice this question →

MCQmedium

A data scientist is designing a RAG system with a large vector database (hundreds of millions of documents) and requires high recall accuracy. Which vector search index type should be used in OCI Search with OpenSearch?

A.LSH (Locality Sensitive Hashing)

B.Flat (brute-force)

C.HNSW (Hierarchical Navigable Small World)

D.IVF (Inverted File Index)

AnswerC

HNSW offers a good balance of high recall and reasonable latency, suitable for large-scale vector search.

Why this answer

HNSW (Hierarchical Navigable Small World) provides excellent recall and speed for large datasets, making it ideal for high-accuracy requirements.

Practice this question →

MCQhard

A healthcare startup is building a chatbot that retrieves patient treatment guidelines using OCI Generative AI Service and OCI OpenSearch. They require that all retrieved documents are from approved sources only and that the system can explain which source was used for each response. Which combination of features should they implement?

A.Add a metadata filter for source_type='approved' in the retrieval step and include document IDs in the context for the model.

B.Rely on the vector search's cosine similarity to rank approved sources higher.

C.Use prompt engineering to ask the model to ignore non-approved sources.

D.Reduce the top-K value to limit the number of retrieved documents.

AnswerA

Metadata filtering enforces source restriction; document IDs provide provenance.

Why this answer

Option A is correct because it directly addresses both requirements: a metadata filter on `source_type='approved'` ensures only approved documents are retrieved from OpenSearch, and including document IDs in the context allows the model to cite the specific source for each response. This approach enforces access control at the retrieval layer while providing traceability, which is essential for compliance in healthcare applications.

Exam trap

The trap here is that candidates may assume semantic similarity or prompt engineering alone can enforce access control, but in RAG systems, retrieval-layer filtering is the only reliable way to restrict document access before the model sees the content.

How to eliminate wrong answers

Option B is wrong because cosine similarity measures semantic relevance, not source approval status; approved and non-approved documents can be equally similar to a query, so ranking by similarity alone cannot guarantee that only approved sources are used. Option C is wrong because prompt engineering cannot reliably filter out non-approved sources; the model may still see and inadvertently use non-approved content in its context, and it has no inherent mechanism to verify source approval. Option D is wrong because reducing the top-K value limits the number of retrieved documents but does not enforce any approval criterion; non-approved documents can still appear in the top-K results if they are semantically similar.
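
A filtered k-NN request and an ID-tagged context block might look like the sketch below. It assumes an index with a `knn_vector` field named `embedding`, a keyword field `source_type`, and a `text` field; those field names are illustrative, and the in-query `filter` clause is only available with certain k-NN engines and OpenSearch versions.

```python
from typing import List


def approved_knn_query(vector: List[float], k: int = 5, field: str = "embedding") -> dict:
    """Build a k-NN query restricted to documents tagged as approved."""
    return {
        "size": k,
        "_source": ["source_type", "text"],
        "query": {
            "knn": {
                field: {
                    "vector": vector,
                    "k": k,
                    # Restriction is applied by the search engine, not by the LLM.
                    "filter": {"term": {"source_type": "approved"}},
                }
            }
        },
    }


def context_with_ids(hits: List[dict]) -> str:
    """Prefix each chunk with its document ID so the model can cite it."""
    return "\n".join(f"[{h['_id']}] {h['_source']['text']}" for h in hits)
```

The model only ever sees approved documents, and the bracketed IDs give it something concrete to cite in the answer.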

Practice this question →

MCQmedium

An organization is experiencing low recall in their RAG system. They are using OCI OpenSearch as the vector store with cosine similarity. After reviewing the retrieved chunks, they notice that relevant documents are not being returned. Which configuration change is most likely to improve recall?

A.Use a deterministic ID generator for consistent chunk IDs.

B.Increase the chunk size to provide more context per chunk.

C.Reduce the chunk size to capture more granular information.

D.Switch similarity metric from cosine to Euclidean distance.

AnswerC

Smaller chunks increase the number of vectors and can help retrieve relevant passages that might be buried in larger chunks.

Why this answer

Option C is correct because reducing the chunk size increases the number of chunks and can capture more fine-grained information, improving recall at the cost of precision. Option B is wrong because increasing chunk size may reduce recall by missing details. Option D is wrong because switching to Euclidean distance does not inherently improve recall.

Option A is wrong because using a deterministic ID generator does not affect retrieval quality.

Practice this question →

MCQhard

Refer to the exhibit. What is the best action to resolve this error?

A.Decrease the temperature of the generation model

B.Increase the max_tokens parameter for generation

C.Reduce the number of retrieved documents

D.Use a smaller chunk size for documents

AnswerC

Reducing retrieved documents directly decreases the token count from that segment, bringing total under the limit.

Why this answer

The input exceeds the model's context length due to a high number of retrieved document tokens. Reducing the number of documents retrieved (or their size) is the most direct fix.
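
One way to keep the prompt under the limit is to trim the retrieved chunks to a token budget before calling the model. The sketch below is a rough illustration: it counts whitespace-separated words instead of using a real tokenizer, and all parameter names are hypothetical.

```python
from typing import List


def fit_to_budget(
    chunks: List[str],
    max_context: int,
    prompt_tokens: int,
    reserve_for_answer: int,
) -> List[str]:
    """Keep the best-ranked chunks that fit in the remaining context budget."""
    budget = max_context - prompt_tokens - reserve_for_answer
    kept, used = [], 0
    for chunk in chunks:  # assumed to be ordered best-first
        cost = len(chunk.split())  # crude word count, not a model tokenizer
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept
```

Reserving room for the answer matters because the context limit covers both input and generated tokens.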

Practice this question →

MCQhard

You are a cloud architect at a global e-commerce company. The company is building a RAG-based product support chatbot using OCI Generative AI Service and OCI OpenSearch. The chatbot must answer customer questions in real-time by retrieving from a product knowledge base containing over 10 million documents. The current architecture uses a single vector index with all documents, and the LLM (Cohere Command R+) returns answers in English only. The team observes that queries from non-English customers often return irrelevant results, and the chatbot sometimes fails to generate answers within the 5-second SLA. The leadership wants to support 10 languages and reduce the average response time to under 3 seconds. You need to propose a solution that improves both relevance and latency. Which course of action should you take?

A.Increase the number of OCI OpenSearch nodes and upgrade the LLM to a faster variant.

B.Replace the embedding model with a multilingual model and partition the vector index by language to reduce search space.

C.Translate all non-English queries to English before retrieval and use an English-only embedding model.

D.Implement a caching layer for frequent queries and use a larger LLM for better accuracy.

AnswerB

Multilingual model improves relevance; partitioning improves latency.

Why this answer

Option B is correct because partitioning the vector index by language reduces the search space for each query, directly improving retrieval latency, while using a multilingual embedding model ensures that non-English queries are semantically matched to documents in their original language, improving relevance. This combination addresses both the 3-second SLA and the 10-language requirement without relying on translation, which introduces latency and potential loss of meaning.

Exam trap

The trap here is that candidates often assume translation is the simplest path to multilingual support, overlooking the latency and semantic drift it introduces, and fail to recognize that partitioning the index is a standard optimization for both relevance and speed in large-scale RAG systems.

How to eliminate wrong answers

Option A is wrong because simply scaling nodes and upgrading the LLM does not fix the root cause of irrelevant results for non-English queries—the embedding model remains English-only, so multilingual queries will still map poorly in vector space. Option C is wrong because translating all non-English queries to English before retrieval adds significant latency (often 200-500ms per translation) and can lose cultural or contextual nuances, making it unsuitable for a 3-second SLA and 10-language support. Option D is wrong because a caching layer only helps with repeated queries, not novel ones, and using a larger LLM increases inference latency, making it harder to meet the 3-second target; it also does not address the embedding mismatch for non-English content.

Practice this question →

MCQhard

A team fine-tunes an embedding model for a legal document RAG system but observes low retrieval recall. Which technique is most likely to improve recall?

A.Use a smaller batch size

B.Use hard negative mining during training

C.Reduce the learning rate

D.Increase the number of fine-tuning epochs

AnswerB

Hard negatives force the model to differentiate between similar but irrelevant documents, improving retrieval discrimination.

Why this answer

Hard negative mining exposes the model to challenging negatives during training, which sharpens the embedding space and improves recall.
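
Selecting hard negatives can be sketched as picking the non-relevant documents that the current model scores closest to the query. This is one common mining strategy shown under simplifying assumptions (in-memory vectors, cosine similarity, hypothetical function names), not a description of any specific training framework.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mine_hard_negatives(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    positives: Set[str],
    n: int = 2,
) -> List[str]:
    """Return the n most query-similar documents that are NOT labeled relevant."""
    candidates = [
        (cosine(query_vec, vec), doc_id)
        for doc_id, vec in doc_vecs.items()
        if doc_id not in positives
    ]
    candidates.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in candidates[:n]]
```

The mined IDs are then paired with the query as negatives in the next round of contrastive fine-tuning.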

Practice this question →

MCQmedium

Refer to the exhibit. A developer runs the command and immediately tries to use the endpoint. The application fails with an error indicating the endpoint is not active. What is the most likely reason?

A.The model ID is not available in us-ashburn-1

B.The purpose parameter is misspelled

C.The endpoint is in provisioning state and not yet ready

D.The compartment ID is incorrect

AnswerC

Endpoints take time to provision; using them immediately fails.

Why this answer

The service endpoint creation is asynchronous; the endpoint is initially in a 'provisioning' state and will become active after a few minutes.
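
Because creation is asynchronous, client code usually polls until the endpoint is ready. The sketch below is generic: `get_state` is a hypothetical callable that returns the endpoint's lifecycle state as a string, and the `"ACTIVE"` state name and timing defaults are assumptions.

```python
import time
from typing import Callable


def wait_until_active(
    get_state: Callable[[], str],
    timeout_s: float = 600.0,
    poll_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_state() until it reports ACTIVE or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if get_state().upper() == "ACTIVE":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)
```

The application should send inference requests only after this returns `True`.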

Practice this question →

MCQmedium

A team uses Cohere's `rerank` endpoint after initial retrieval to improve result quality. What is the main benefit of reranking?

A.It generates new embeddings for chunks

B.It combines multiple queries

C.It reorders chunks by relevance to the query

D.It reduces the number of retrieved chunks

AnswerC

Reranking improves the ordering so the most relevant appear first.

Why this answer

Reranking reorders the initially retrieved chunks by more accurately assessing relevance to the query, improving the quality of the top-k results presented to the LLM. It does not reduce the number of chunks, generate new embeddings, or combine queries.

Practice this question →

Multi-Selectmedium

Which THREE of the following are likely causes if retrieval returns no results despite documents being indexed in an OCI OpenSearch vector store?

Select 3 answers

A.The embedding model dimension mismatch

B.The k-NN algorithm is misconfigured (e.g., k=0)

C.The query embedding is out of distribution

D.The database connection string is incorrect

E.The index is not fully built or refreshed

AnswersB, D, E

Misconfiguration like k=0 causes no candidates to be returned.

Why this answer

An index that is not fully built or refreshed, a misconfigured k-NN algorithm, and an incorrect connection string are common causes of empty retrieval results.

Practice this question →

MCQmedium

A team is designing a RAG system for legal document review. They want to ensure that the retrieved chunks are contextually coherent and not truncated mid-sentence. Which chunking strategy should they use?

A.Recursive chunking based on sentence boundaries.

B.Token-level chunking.

C.Semantic chunking using document section headers.

D.Fixed-size character chunking with overlap.

AnswerA

Sentence boundary chunking ensures each chunk contains complete sentences, improving coherence.

Why this answer

Option A is correct because sentence-based chunking preserves sentence boundaries, avoiding mid-sentence truncation. Option D is wrong because fixed-size chunks often cut sentences. Option C is wrong because section-level chunks may be too large.

Option B is wrong because token-level chunking is too fine-grained and loses context.
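
Sentence-boundary chunking can be sketched by splitting on sentence-ending punctuation and packing whole sentences into each chunk. The regex splitter here is a naive assumption; legal text with abbreviations such as "v." or "U.S." would need a proper sentence tokenizer.

```python
import re
from typing import List


def sentence_chunks(text: str, max_words: int = 200) -> List[str]:
    """Pack whole sentences into chunks of at most max_words words where possible."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))  # close the chunk at a sentence boundary
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

A single sentence longer than `max_words` is kept intact as its own chunk, so no chunk ever ends mid-sentence.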

Practice this question →

MCQhard

Refer to the exhibit. A developer has set this policy to allow an OCI Data Science session to generate embeddings. However, the API call returns a 403 Forbidden. Which of the following is likely missing?

A.The policy needs a 'where request.region != ...' condition

B.The policy should include 'in tenancy' instead of compartment

C.The service requires 'manage' permission instead of 'use'

D.The dynamic group does not include the Data Science session

AnswerD

The session must be matched by a rule in the dynamic group for the policy to apply.

Why this answer

The policy is correctly written but it applies to the dynamic group 'RAGGroup'. If the Data Science session is not a member of that dynamic group, the policy has no effect.

Practice this question →

MCQeasy

A company has implemented a RAG-based chatbot using OCI Generative AI and OCI OpenSearch as the vector store. The chatbot answers questions about internal policies. The team uses a dense vector embedding model with 768 dimensions and the HNSW algorithm. The corpus contains 5 million documents. Users report that the chatbot takes 8-12 seconds to respond, and the answers are often not relevant, missing key policy details. Upon investigation, the team finds that the k-NN search returns results based solely on vector similarity, ignoring exact keyword matches that are critical for policy documents. Which course of action will most effectively improve both response time and relevance?

A.Implement hybrid search using a combination of match (keyword) and k-NN (vector) queries with boosting.

B.Increase the number of OpenSearch data nodes to 5 and use higher-memory instances.

C.Reduce the ef_search parameter to 100 and retrain the embedding model on domain-specific data.

D.Switch to OCI Generative AI's built-in vector store instead of OpenSearch.

AnswerA

Hybrid search enhances relevance by integrating keyword and semantic matching, and pre-filtering can reduce latency.

Why this answer

Hybrid search combines keyword and vector queries, improving relevance by including exact matches. It can also reduce the search space by filtering on keywords, thereby reducing latency. Increasing nodes (B) only addresses speed.

Reducing ef_search (C) may speed up but can reduce recall and does not fix relevance. Using OCI GenAI's built-in vector store (D) is not guaranteed to improve either.
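
One possible shape for a combined keyword and vector request is a `bool` query with both clauses under `should`. The field names `text` and `embedding` and the boost value are assumptions for illustration. Because BM25 and k-NN scores are on different scales, the boost needs tuning, and newer OpenSearch versions also offer a dedicated hybrid query with score normalization.

```python
from typing import List


def hybrid_query(
    text: str, vector: List[float], k: int = 10, keyword_boost: float = 2.0
) -> dict:
    """Build a request that scores documents by keyword match and vector similarity."""
    return {
        "size": k,
        "query": {
            "bool": {
                "should": [
                    # Exact-term signal, important for policy wording.
                    {"match": {"text": {"query": text, "boost": keyword_boost}}},
                    # Semantic signal from the embedding.
                    {"knn": {"embedding": {"vector": vector, "k": k}}},
                ]
            }
        },
    }
```

Documents matching either clause are returned, and those matching both accumulate score from each.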

Practice this question →

MCQmedium

A company is deploying a RAG system for internal document search using OCI OpenSearch as the vector store. Users report that queries about recent policy changes return no results, even though the new policies were ingested. Which configuration is most likely missing?

A.The query should use a hybrid search combining keyword and vector.

B.The embeddings must be normalized before indexing.

C.The vector search index must have a refresh interval set to immediate.

D.The ingestion pipeline should use a text-splitting chunker.

AnswerC

Without immediate refresh, new documents may not be visible in search results.

Why this answer

Option C is correct because if the vector search index's refresh interval is not set to immediate, new documents may not be immediately searchable. Option D is wrong because chunking is for ingestion, not search availability. Option A is about hybrid search, which improves relevance but not availability.

Option B is not required for basic search functionality.

Practice this question →

Multi-Selecthard

A troubleshooting scenario: A RAG system returns no results for certain queries. The index exists and has documents. Which TWO are likely causes?

Select 2 answers

A.The search algorithm is set to exact (brute-force) and index is small

B.The query embedding dimension does not match the index dimension

C.The query is too long for the embedding model

D.The index has not been refreshed after adding new documents

E.The embedding model used for indexing is different from the one used for query

AnswersB, E

Dimension mismatch causes search errors or zero results.

Why this answer

Mismatched embedding models or dimension differences prevent correct searches. Query length is not an issue (model truncates). Exact search still returns results.

Index refresh may be delayed but unlikely for no results.
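
Both causes can be caught with a cheap guard before the search request is sent. The function and parameter names below are hypothetical; the idea is simply to record which model and dimension the index was built with and compare at query time.

```python
from typing import Sequence


def check_query_vector(
    vector: Sequence[float],
    index_dimension: int,
    query_model: str,
    index_model: str,
) -> None:
    """Raise ValueError if the query embedding cannot match the index."""
    if query_model != index_model:
        raise ValueError(
            f"query embedded with {query_model!r} but index built with {index_model!r}"
        )
    if len(vector) != index_dimension:
        raise ValueError(
            f"query vector has {len(vector)} dimensions, index expects {index_dimension}"
        )
```

Failing fast here gives a clearer error than an empty result set or a rejected search request.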

Practice this question →

MCQmedium

During a RAG implementation, the response quality degrades because the LLM receives too many irrelevant document chunks. Which technique can best filter out irrelevant chunks before sending them to the LLM?

A.Use a larger LLM for generation, hoping it ignores irrelevant chunks.

B.Reduce the top-k retrieval count.

C.Implement a reranking step using a cross-encoder model.

D.Increase the chunk size to provide more context.

AnswerC

Reranking with a cross-encoder (e.g., Cohere rerank) reorders chunks by relevance to the query, filtering out irrelevant ones.

Why this answer

Option C is correct because reranking with a cross-encoder is a common post-retrieval step to improve relevance. Option D is wrong because increasing chunk size may include more noise. Option A is wrong because using a larger LLM does not filter irrelevant chunks.

Option B is wrong because reducing top-k also lowers the chance of including relevant chunks.

Practice this question →

MCQmedium

You are a data scientist at a legal firm. The firm uses OCR to digitize court documents and then indexes them in OCI OpenSearch for a RAG application. The application uses OCI Generative AI Service (Cohere Command) to answer questions about case law. Recently, the team noticed that the answers are often factually incorrect or include information not present in the retrieved documents. After reviewing the pipeline, you find that the chunking strategy splits documents into 512-token chunks with 128-token overlap. The embedding model is Cohere Embed v3 (English), and the retrieval returns the top 5 chunks. The LLM has a context window of 4096 tokens. The team suspects that the chunking strategy is causing loss of context. What is the best course of action to improve answer accuracy?

A.Increase the chunk size to 1024 tokens and overlap to 256 tokens.

B.Reduce the chunk overlap to 64 tokens to avoid redundancy.

C.Switch to a smaller LLM with a larger context window.

D.Increase the number of retrieved chunks from 5 to 10.

AnswerA

Larger chunks with more overlap preserve context better.

Why this answer

Increasing the chunk size to 1024 tokens and overlap to 256 tokens directly addresses the loss of context by ensuring each chunk contains more complete semantic units (e.g., entire paragraphs or legal arguments) while the larger overlap preserves continuity across chunk boundaries. This improves the quality of the embeddings and the relevance of retrieved chunks, leading to more factually accurate answers from the LLM.

Exam trap

The trap here is that candidates may assume increasing retrieval count (Option D) always improves accuracy, but in RAG systems, more chunks often introduce noise and dilute relevant context, whereas fixing the chunking strategy directly addresses the root cause of context loss.

How to eliminate wrong answers

Option B is wrong because reducing the overlap to 64 tokens would further fragment context, increasing the risk of missing critical information at chunk boundaries and worsening the factual inaccuracies. Option C is wrong because switching to a smaller LLM with a larger context window does not fix the root cause—poor chunking—and a smaller model may have lower reasoning capability, potentially degrading answer quality. Option D is wrong because increasing the number of retrieved chunks from 5 to 10 would introduce more noise and irrelevant content into the LLM's context, likely amplifying hallucinations rather than improving accuracy.
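To make the chunk-size and overlap trade-off concrete, here is a minimal sliding-window sketch. It assumes the document is already tokenized into a list (placeholder strings stand in for real model tokens here), and `chunk_tokens` is a hypothetical helper, not part of any OCI or Cohere SDK.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int, overlap: int) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens with the previous window."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


# Placeholder tokens for a 2,500-token document.
tokens = [f"t{i}" for i in range(2500)]
old_chunks = chunk_tokens(tokens, chunk_size=512, overlap=128)
new_chunks = chunk_tokens(tokens, chunk_size=1024, overlap=256)
```

With these settings, consecutive 1024-token chunks share their boundary 256 tokens, so a sentence that straddles a boundary appears whole in at least one chunk as long as it is shorter than the overlap.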


MCQmedium

An OCI CLI command above returns embeddings for the phrase 'Hello world'. The developer notices that the embedding vector length is 384 dimensions. However, they expected 768 dimensions. What is the most likely cause?

A.The input text 'Hello world' is too short, causing dimension reduction.

B.The CLI result is truncated in the display.

C.The model 'cohere.embed-multilingual-light-v3.0' outputs 384-dimensional vectors.

D.The --truncate END flag reduces the dimension.

AnswerC

This specific model produces 384 dimensions; the 'light' version is smaller.

Why this answer

Option C is correct because cohere.embed-multilingual-light-v3.0 outputs 384-dimensional embeddings by default, while the full (non-light) v3 model outputs 1024. Option A is wrong because input length does not change the embedding dimension. Option B is wrong because truncated display output would not change the actual length of the returned vector.

Option D is wrong because the --truncate END flag controls how over-long input is handled and does not affect dimension.


Multi-Selecthard

Which THREE factors directly influence the quality of responses in a RAG system? (Choose three.)

Select 3 answers

A.The prompt template used to ask the LLM

B.The chunk size used during document processing

C.The temperature parameter of the LLM

D.The number of GPUs allocated to the LLM

E.The choice of embedding model

AnswersA, B, E

A well-structured prompt helps the LLM use the context properly.

Why this answer

The choice of embedding model affects how well semantics are captured, chunk size determines granularity of retrieval, and prompt engineering guides the LLM to use context effectively.


MCQeasy

A developer is using OCI Data Science to create a RAG pipeline. They have ingested documents into a vector store using OCI Generative AI's text-embedding model. During testing, they notice that queries return very few results (often 0 or 1) even when the knowledge base contains relevant documents. They have set the top-k parameter to 10. What is the most likely cause?

A.The similarity threshold is set too high, filtering out most results.

B.The documents were chunked with too small a chunk size, losing key information.

C.The embedding model's dimensionality is too low to capture semantic differences.

D.The vector search index is not configured with the correct distance metric.

AnswerA

A threshold that is too strict reduces the number of retrieved chunks.

Why this answer

Option A is correct because a high similarity threshold (e.g., >0.9) can exclude many relevant results, even with top-k set to 10. Option B is wrong because chunk size may affect quality but not the number of results. Option C is wrong because dimensionality is fixed by the model.

Option D is wrong because the distance metric affects ranking but not count.
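The interaction between top-k and a similarity threshold can be sketched as below. The function name `retrieve`, the parameter `min_similarity`, and the scores are all made up for illustration; real vector stores expose this filter under different names.

```python
from typing import List, Tuple


def retrieve(scored_hits: List[Tuple[str, float]], top_k: int,
             min_similarity: float) -> List[Tuple[str, float]]:
    """Take the top_k highest-scoring hits, then drop any below min_similarity."""
    top = sorted(scored_hits, key=lambda hit: hit[1], reverse=True)[:top_k]
    return [hit for hit in top if hit[1] >= min_similarity]


# Hypothetical similarity scores for four stored chunks.
hits = [("chunk-a", 0.91), ("chunk-b", 0.84), ("chunk-c", 0.79), ("chunk-d", 0.62)]

strict = retrieve(hits, top_k=10, min_similarity=0.9)   # threshold removes almost everything
relaxed = retrieve(hits, top_k=10, min_similarity=0.7)  # same top_k, more results survive
```

Top-k only caps the result count; the threshold is what shrinks it to one hit in the strict case.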


MCQhard

An application mixes RAG with other data sources. The vector search returns too many irrelevant chunks. What is the best approach to filter them?

A.Use a reranker model

B.Use exact search instead of ANN

C.Reduce the number of retrieved chunks

D.Increase chunk size

AnswerA

A reranker scores retrieved chunks by relevance, filtering out irrelevant ones.

Why this answer

A reranker model (Option A) is the best approach because it takes the initial set of retrieved chunks and re-orders them based on semantic relevance to the query, effectively filtering out irrelevant chunks. Unlike simple vector similarity, a reranker uses cross-encoding to evaluate the query-chunk pair as a whole, which significantly improves precision when mixing RAG with other data sources.

Exam trap

Oracle often tests the misconception that reducing the number of retrieved chunks (Option C) is a valid filter, but the trap is that this only limits output size without improving relevance—reranking is the correct technique to reorder and discard irrelevant results.

How to eliminate wrong answers

Option B is wrong because exact search (e.g., brute-force k-NN) retrieves the same chunks as ANN but without approximation; it does not filter irrelevant chunks—it only guarantees the true nearest neighbors, which may still be irrelevant if the vector representation is poor. Option C is wrong because reducing the number of retrieved chunks (e.g., lowering top_k) risks missing relevant chunks and does not address the core problem of irrelevant chunks being ranked too high. Option D is wrong because increasing chunk size makes each chunk more likely to contain irrelevant content, potentially worsening the problem by diluting relevant information with noise.


MCQeasy

A developer wants to implement a simple RAG pipeline using OCI Generative AI's text generation and embedding models. Which OCI SDK method is used to generate embeddings for a text chunk?

A.embed_text

B.generate_embeddings

C.encode_text

D.create_embedding

AnswerA

`embed_text` is the correct method to call for generating embeddings from text.

Why this answer

The OCI Python SDK's Generative AI inference client (`GenerativeAiInferenceClient`) provides the `embed_text` method to generate embeddings. Other names like `create_embedding` or `generate_embeddings` are not standard in the OCI SDK.


Multi-Selectmedium

Which TWO of the following are best practices when indexing documents for a RAG application using OCI OpenSearch?

Select 2 answers

A.Use the same chunk size for all documents regardless of content type.

B.Apply stemming to reduce vocabulary size.

C.Enable chunk overlap to avoid splitting important information across chunks.

D.Remove all stop words from the text before embedding.

E.Store metadata (e.g., source URL, page number) alongside the vector.

AnswersC, E

Overlap ensures that boundaries don't cut off context, improving retrieval.

Why this answer

Options C and E are correct. Using overlapping chunks prevents information loss at boundaries (C). Storing metadata such as the source document helps with traceability (E).

Option D is wrong because removing stop words may harm retrieval for queries containing common words. Option B is wrong because stemming can lose precision in semantic search when using embeddings. Option A is wrong because a single chunk size ignores structural differences between content types.
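As a rough illustration of storing metadata alongside the vector, the sketch below builds the document body that would be sent to the index. The field names (`text`, `embedding`, `metadata`, `source_url`, `page_number`) and the URL are assumptions chosen for this example, not a required OCI OpenSearch schema.

```python
from typing import Any, Dict, List


def build_index_document(text: str, embedding: List[float],
                         source_url: str, page_number: int) -> Dict[str, Any]:
    """Bundle a chunk, its vector, and traceability metadata into one indexable document."""
    return {
        "text": text,            # original chunk text, returned to the LLM as context
        "embedding": embedding,  # vector used for similarity search
        "metadata": {
            "source_url": source_url,    # lets the application cite the source
            "page_number": page_number,  # lets a user locate the passage
        },
    }


doc = build_index_document(
    text="Section 4.2: Termination clauses ...",
    embedding=[0.12, -0.03, 0.44],  # shortened placeholder vector
    source_url="https://example.com/contracts/handbook.pdf",
    page_number=17,
)
```

Keeping the metadata in the same document means every retrieved chunk arrives with its citation, without a second lookup.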


MCQhard

An IAM policy is shown in the exhibit. A user reports that they cannot call the OCI GenAI embedding API, but they can use OCI AI Language. Which policy statement is missing to allow embedding API access?

A.The policy needs 'use' action on 'oci-generative-ai-family'

B.The policy needs 'inspect' on 'oci-generative-ai-endpoint'

C.The policy needs 'manage' action on 'oci-generative-ai-family'

D.The policy needs 'inspect' on 'oci-ai-language-family'

AnswerA

Missing 'use' permission for GenAI.

Why this answer

The embedding API requires the 'use' permission on the 'oci-generative-ai-family'. The first statement only grants 'inspect', which allows listing but not using. The second grants 'use' on AI Language, not GenAI.

Adding 'use' action to the first statement would fix the issue.


Multi-Selectmedium

Which TWO of the following are best practices when implementing a RAG application using OCI OpenSearch as a vector store?

Select 2 answers

A.Use a large embedding dimension (e.g., 1536) to improve accuracy.

B.Set index.number_of_replicas to 0 to speed up indexing.

C.Enable approximate nearest neighbor (ANN) search for large datasets.

D.Store the embedding vectors in the _source field to simplify retrieval.

E.Use cosine similarity as the distance metric for vector comparison.

AnswersC, E

ANN search significantly reduces query latency for large vector collections.

Why this answer

Cosine similarity (E) is the recommended distance metric for text embeddings. ANN search (C) is essential for scaling to large datasets. Storing embeddings in _source (D) is unnecessary and increases index size.

Larger dimensions (A) can degrade performance without guaranteed accuracy improvement. Setting replicas to 0 (B) risks data loss and is not production-ready.
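For reference, cosine similarity compares the direction of two vectors and ignores their magnitude. A minimal pure-Python sketch follows; `cosine_similarity` is an illustrative helper, and a vector store would compute this internally.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between a and b (1.0 = same direction, 0.0 = orthogonal)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])  # differ only in length
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])      # no shared direction
```

Because vector length drops out, two texts about the same topic can score highly even if one embedding has a larger norm.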


MCQeasy

A developer is building a RAG chatbot for an internal knowledge base. To ensure the system retrieves the most relevant chunks, what is the best practice for chunking?

A.Use very small chunks

B.Use semantic chunking with overlap

C.Use fixed-size chunks without overlap

D.Use random-sized chunks

AnswerB

Semantic chunking preserves natural boundaries, and overlap provides context continuity.

Why this answer

Semantic chunking with overlap ensures that context is preserved and retrieval is more accurate by avoiding splits in meaningful content.
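One simple approximation of semantic chunking with overlap is to split on paragraph boundaries and carry the last paragraph of each chunk into the next. The sketch below assumes paragraphs are separated by blank lines and treats `max_chars` as a soft limit; `chunk_paragraphs` is a hypothetical helper, and production systems often use sentence- or embedding-based boundary detection instead.

```python
from typing import List


def chunk_paragraphs(text: str, max_chars: int, overlap_paragraphs: int = 1) -> List[str]:
    """Group whole paragraphs into chunks, repeating the trailing paragraph(s) in the next chunk."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current: List[str] = []
    for para in paragraphs:
        candidate = current + [para]
        if current and len("\n\n".join(candidate)) > max_chars:
            chunks.append("\n\n".join(current))
            # Start the next chunk with the overlap paragraphs, then the new one.
            carried = current[-overlap_paragraphs:] if overlap_paragraphs > 0 else []
            current = carried + [para]
        else:
            current = candidate
    if current:
        chunks.append("\n\n".join(current))
    return chunks


sample = "Alpha one.\n\nBeta two.\n\nGamma three."
chunks = chunk_paragraphs(sample, max_chars=22)
```

No paragraph is ever cut in the middle, and the shared paragraph gives each chunk some of its neighbor's context.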


MCQeasy

The architecture shown in the exhibit is missing a critical component for a RAG pipeline. What step is missing between receiving the user query and searching the vector store?

A.A document chunking step

B.A query embedding step using an embedding model

C.A data masking step for privacy

D.A reranker step after retrieval

AnswerB

The query must be embedded for vector search.

Why this answer

In a RAG pipeline, the user query must be converted into a vector embedding before searching the vector store. The architecture directly passes the query to OpenSearch without embedding. OCI Functions likely performs orchestration but does not automatically embed the query.

Adding a call to an embedding model (e.g., Cohere Embed) is necessary.
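The missing step can be sketched as a small orchestration function that embeds the query before searching. Everything here is a toy stand-in: `toy_embed` is a bag-of-words counter rather than a real embedding model such as Cohere Embed, and `STORE` and `toy_search` mimic a vector store in memory.

```python
from typing import Callable, Dict, List

Vector = List[float]


def answer_query(query: str, embed: Callable[[str], Vector],
                 search: Callable[[Vector, int], List[str]], top_k: int = 3) -> List[str]:
    """Embed the query first, then search the vector store with the resulting vector."""
    query_vector = embed(query)          # the step missing from the architecture
    return search(query_vector, top_k)   # vector store only accepts vectors, not raw text


VOCAB = ["invoice", "refund", "login"]


def toy_embed(text: str) -> Vector:
    """Stand-in embedding: count occurrences of each vocabulary word."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


STORE: Dict[str, Vector] = {
    "doc-invoice": toy_embed("invoice invoice due"),
    "doc-refund": toy_embed("refund policy refund"),
    "doc-login": toy_embed("login help"),
}


def toy_search(query_vector: Vector, top_k: int) -> List[str]:
    """Rank stored documents by dot product with the query vector."""
    scores = {doc_id: sum(q * d for q, d in zip(query_vector, vec))
              for doc_id, vec in STORE.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


results = answer_query("how do I get a refund", toy_embed, toy_search, top_k=1)
```

The query and the documents must be embedded with the same model, which is why the same `toy_embed` is used for both here.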


MCQhard

A developer uses OCI Generative AI with a custom OCI OpenSearch vector store. The text generation model sometimes hallucinates facts not in the retrieved documents. What is the most effective mitigation?

A.Use a larger retrieval chunk size

B.Increase the temperature

C.Use prompt engineering to instruct the model to stick to the provided context

D.Decrease the maximum token length

AnswerC

Explicitly instructing the model to base answers only on the given context reduces hallucination.

Why this answer

Prompt engineering with clear instructions to use only the provided context is a direct and effective way to reduce hallucination.
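A grounding instruction of this kind might look like the template below. The wording and the `build_grounded_prompt` helper are illustrative assumptions; the exact phrasing that works best depends on the model.

```python
from typing import List


def build_grounded_prompt(question: str, chunks: List[str]) -> str:
    """Assemble a prompt that tells the model to answer only from the retrieved chunks."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, reply \"I don't know.\"\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_grounded_prompt(
    "What is the notice period?",
    ["The notice period is 30 days.", "Notices must be sent in writing."],
)
```

Numbering the chunks also makes it easy to ask the model to cite which chunk supports its answer.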


MCQhard

A company uses OCI Data Science to fine-tune an embedding model for a specialized domain. After fine-tuning, the model produces embeddings that are not aligned with the vector index used in OCI OpenSearch. What is the most likely cause?

A.The fine-tuning process modified the model architecture

B.The embedding dimension changed after fine-tuning

C.The fine-tuning dataset was too small

D.The vector index was built using a different distance metric than used during fine-tuning

AnswerB

If fine-tuning added/removed layers or changed the output size, the embedding dimension differs, causing index incompatibility.

Why this answer

Fine-tuning may change the output dimension (e.g., if the model head is modified), causing dimension mismatch which makes existing vector indices unusable. Distance metric mismatch is less common as it's usually fixed during indexing. Architecture changes are unlikely for minor fine-tuning.

Small dataset affects quality, not dimension.
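A cheap guard against this failure is to check the embedding length against the index's configured dimension before writing or querying. The helper below is a hypothetical sketch, and the 1024 and 768 values are example numbers only.

```python
from typing import Sequence


def validate_embedding_dimension(embedding: Sequence[float], index_dimension: int) -> int:
    """Raise if the embedding length does not match the dimension the index was built with."""
    actual = len(embedding)
    if actual != index_dimension:
        raise ValueError(
            f"embedding has {actual} dimensions but the index expects {index_dimension}"
        )
    return actual


# An index built for 1024-dimensional vectors accepts matching embeddings...
validate_embedding_dimension([0.0] * 1024, index_dimension=1024)

# ...and a fine-tuned model that now emits 768 dimensions is caught early.
try:
    validate_embedding_dimension([0.0] * 768, index_dimension=1024)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

If the check fails, the usual remedy is to rebuild the index with the new dimension and re-embed the corpus with the fine-tuned model.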


MCQmedium

An enterprise is deploying a RAG application for compliance document analysis using OCI. They use OCI OpenSearch as the vector store and have millions of documents. Retrieval latency is critical. Currently, a single query takes over 2 seconds. The index uses a flat (brute-force) distance computation. They have considered using approximate nearest neighbor (ANN) algorithms but are unsure about the impact on recall. They need to reduce latency to under 500ms while maintaining high recall. What should they do?

A.Use a smaller embedding dimension by truncating the existing embeddings.

B.Reduce the number of shards in the OpenSearch index to improve parallelism.

C.Switch to an HNSW algorithm with appropriate M and ef_search parameters.

D.Increase the top-k parameter to retrieve more candidates then filter.

AnswerC

HNSW provides sub-linear search time with good recall.

Why this answer

Option C is correct because switching to HNSW with appropriate parameters provides fast approximate search with configurable recall. Option A (truncating embeddings to a smaller dimension) can degrade embedding quality. Option B (reducing shards) may not achieve the required latency reduction.

Option D (increasing top-k) would increase latency.
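As a rough sketch, an HNSW-backed index definition for the OpenSearch k-NN plugin could look like the request body below, written as a Python dictionary. It assumes a plugin version that supports the `nmslib` engine; the field name `embedding`, the dimension of 1024, and the `m`, `ef_construction`, and `ef_search` values are placeholders to tune against measured recall and latency, not recommendations.

```python
import json

# Request body for creating a k-NN index (all values are illustrative).
index_body = {
    "settings": {
        "index": {
            "knn": True,                          # enable k-NN search on this index
            "knn.algo_param.ef_search": 100,      # larger = higher recall, slower queries
        }
    },
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 1024,                # must match the embedding model output
                "method": {
                    "name": "hnsw",
                    "space_type": "cosinesimil",  # cosine similarity
                    "engine": "nmslib",
                    "parameters": {
                        "m": 16,                  # graph links per node: recall vs memory
                        "ef_construction": 128,   # build-time effort: recall vs indexing speed
                    },
                },
            }
        }
    },
}

# The body is plain JSON and can be sent with any HTTP or OpenSearch client.
payload = json.dumps(index_body)
```

Raising `ef_search` or `m` generally trades latency and memory for recall, so the values are best chosen by benchmarking against the 500 ms target.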


MCQeasy

What is the primary purpose of chunking documents in a RAG pipeline?

A.To improve embedding quality

B.To speed up training

C.To reduce storage costs

D.To ensure each chunk fits within the model's context window

AnswerD

Models have token limits; chunking prevents truncation.

Why this answer

Chunking ensures that each text segment fits within the input token limit of the embedding model and the LLM context window. While it may also help retrieval granularity, the primary reason is to meet model constraints.


Multi-Selectmedium

Which TWO are required components to implement a basic RAG system using OCI services? (Choose two.)

Select 2 answers

A.OCI Object Storage

B.OCI Functions

C.OCI Data Flow

D.OCI Search with OpenSearch

E.OCI Document Understanding

AnswersD, E

Required as the vector database for similarity search.

Why this answer

A RAG system needs a way to parse documents into chunks (OCI Document Understanding) and a vector store to index and search embeddings (OCI Search with OpenSearch).


MCQeasy

Refer to the exhibit. Why did the embedding creation fail?

A.The input text is too short

B.The API call was not properly authenticated

C.The model ID is not available in the us-ashburn-1 region

D.The region is not enabled for the Generative AI service

AnswerB

The MissingAuthenticationError indicates no credentials were provided.

Why this answer

The error 'MissingAuthenticationError' clearly indicates that the API call lacks authentication credentials, which are required for OCI API calls.
