An application uses ConversationalRetrievalChain with a vector store retriever. Users report that the chatbot sometimes provides answers that are not grounded in the retrieved documents. Which step in the RAG pipeline is most likely the cause?
The prompt should explicitly constrain the LLM to answer only from the retrieved documents; otherwise, the LLM may use its internal knowledge, leading to ungrounded answers.
Why this answer
Option C is correct. In LangChain's ConversationalRetrievalChain, the retrieved documents are inserted into the prompt for the answer-generation step. That prompt is the only control that tells the model to base its answer on those documents. If the prompt template does not instruct the model to answer solely from the provided context, and to say it does not know when the context is insufficient, the LLM may answer from its pre-trained knowledge instead of the retrieved documents. The result is an ungrounded response. This is a common oversight in RAG pipeline design, especially when the prompt template is customized without carrying over a context-only constraint.
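Below is a minimal sketch of such a constraint in plain Python, with no LangChain dependency. The template wording and the helper `build_grounded_prompt` are hypothetical. The placeholders `{context}` and `{question}` are assumed to match the input variables that the chain's answer-generation prompt expects. In LangChain, a template like this would typically be wrapped in a `PromptTemplate` and passed via `combine_docs_chain_kwargs={"prompt": ...}` when calling `ConversationalRetrievalChain.from_llm`. Check the documentation for your LangChain version before relying on that wiring.

```python
# Hypothetical grounding template for the answer-generation step of a RAG chain.
# The wording is illustrative, not LangChain's built-in default prompt.
GROUNDED_QA_TEMPLATE = """Answer the question using ONLY the context below.
If the context does not contain the answer, reply exactly: "I don't know based on the provided documents."
Do not use outside knowledge.

Context:
{context}

Question: {question}
Answer:"""


def build_grounded_prompt(chunks: list[str], question: str) -> str:
    """Join the retrieved chunks into one context block and fill the template."""
    context = "\n\n".join(chunk.strip() for chunk in chunks)
    return GROUNDED_QA_TEMPLATE.format(context=context, question=question)


if __name__ == "__main__":
    # Example: two retrieved chunks and a follow-up question.
    prompt = build_grounded_prompt(
        ["Router R1 uses OSPF area 0.", "Router R2 uses OSPF area 1."],
        "Which OSPF area does R1 use?",
    )
    print(prompt)
```

The template includes an explicit fallback answer. Without a permitted "I don't know" path, the model can only satisfy the request by filling gaps from its own knowledge.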
Exam trap
Cisco exams often test the misconception that retrieval quality (chunk size, embeddings, or document relevance) is the primary cause of ungrounded answers. In fact, the prompt instruction to the LLM is the critical control point for grounding in the RAG pipeline.
How to eliminate wrong answers
Option A is wrong because chunk_size controls how documents are split, which affects retrieval granularity and relevance. It does not make the LLM ignore the retrieved context. An oversized chunk may reduce retrieval precision, but it still supplies context to the model.
Option B is wrong because compatibility between the embedding model and the retriever affects retrieval quality, not whether the LLM adheres to the context it receives. Incompatible embeddings cause poor retrieval, not an LLM that answers beyond the documents it was given.
Option D is wrong because irrelevant documents from the retriever would lead to answers based on the wrong context. The failure described here, an LLM not grounding its answer in the context it was given, is a prompt-level failure, not a retrieval failure.
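To tell these failure modes apart in practice, you can test two things separately. First, do the retrieved chunks cover the question at all? A "no" points to a retrieval problem (Option D). Second, does the answer stay within those chunks? A "no" here, when retrieval looked fine, points to a generation or prompt problem (Option C). The sketch below uses a deliberately crude word-overlap heuristic. `support_ratio`, `diagnose`, the stopword list, and the 0.5 threshold are all illustrative assumptions. Real evaluations typically use an LLM judge or an entailment model instead.

```python
import re

# Hypothetical stopword list for the toy heuristic; not taken from any library.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "in", "on",
             "and", "or", "it", "does", "do", "which", "what", "who"}


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric words, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def support_ratio(text: str, chunks: list[str]) -> float:
    """Fraction of the text's content words that appear somewhere in the chunks."""
    text_tokens = tokens(text)
    if not text_tokens:
        return 1.0  # nothing to check
    context_tokens = set().union(*(tokens(c) for c in chunks))
    return len(text_tokens & context_tokens) / len(text_tokens)


def diagnose(question: str, answer: str, chunks: list[str], threshold: float = 0.5) -> str:
    """Crudely classify a bad answer as a retrieval or a generation (prompt) failure."""
    if support_ratio(question, chunks) < threshold:
        return "retrieval"   # chunks barely mention the question's terms
    if support_ratio(answer, chunks) < threshold:
        return "generation"  # context looked relevant, answer drifted away from it
    return "ok"


if __name__ == "__main__":
    chunks = ["Router R1 runs OSPF in area 0."]
    question = "Which OSPF area does R1 run?"
    print(diagnose(question, "R1 runs OSPF in area 0.", chunks))
    print(diagnose(question, "R1 uses EIGRP with autonomous system 100.", chunks))
    print(diagnose(question, "Anything.", ["The cafeteria opens at 8 AM."]))
```

A "generation" result corresponds to the scenario in this question. The retriever supplied relevant text, but the model answered beyond it, and that is the gap a context-only prompt instruction is meant to close.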