A company is deploying a large language model for a customer service chatbot. The model needs to understand industry-specific jargon and maintain low latency. Which approach best balances these requirements?
- A
Employ retrieval-augmented generation (RAG) with a general model
Why wrong: RAG supplies facts at query time but does not teach the model the jargon itself, and the retrieval step adds latency to every request.
- B
Rely solely on prompt engineering with a general model
Why wrong: Prompt engineering may not suffice for consistent understanding of specialized terms.
- C
Use a large general-purpose LLM with zero-shot prompting
Why wrong: Large models have higher latency and may still miss niche jargon.
- D
Fine-tune a small open-source LLM on domain-specific data
Why correct: Fine-tuning adapts the model to the domain's jargon, and a smaller model keeps inference latency low.
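
The latency half of this trade-off can be sketched with a back-of-the-envelope estimate. The snippet below assumes single-request (batch size 1) decoding that is limited by memory bandwidth, so each generated token requires reading all model weights once. The function name `decode_ms_per_token`, the model sizes, and the bandwidth figure are illustrative assumptions, not measurements of any real system.

```python
def decode_ms_per_token(params_billion: float,
                        bytes_per_param: float,
                        bandwidth_gb_s: float) -> float:
    """Rough lower bound on per-token decode time, in milliseconds.

    Assumes batch size 1 and memory-bandwidth-bound decoding, where every
    weight is read once per generated token. Ignores the KV cache, compute
    time, and network overhead.
    """
    weight_gb = params_billion * bytes_per_param  # 1e9 params * bytes = GB
    return weight_gb / bandwidth_gb_s * 1000.0


# Illustrative numbers only (assumptions, not benchmarks).
SMALL_B, LARGE_B = 7.0, 70.0  # hypothetical model sizes, billions of parameters
FP16_BYTES = 2.0              # bytes per parameter at 16-bit precision
BANDWIDTH = 1000.0            # hypothetical accelerator memory bandwidth, GB/s

small_ms = decode_ms_per_token(SMALL_B, FP16_BYTES, BANDWIDTH)
large_ms = decode_ms_per_token(LARGE_B, FP16_BYTES, BANDWIDTH)
print(f"small: {small_ms:.1f} ms/token, large: {large_ms:.1f} ms/token")
```

Under these assumptions the per-token time scales linearly with parameter count, so the model with ten times the parameters takes roughly ten times as long per token. Real deployments will differ because of batching, quantization, and hardware details.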