A team is building a generative AI model for customer support. They notice the model often produces overly polite but unhelpful responses. Which technique would best improve helpfulness while keeping an appropriate tone?
Trap 1: Increase the amount of training data
More data drawn from the same distribution tends to reinforce the existing behavior; it does not specifically target the polite-but-unhelpful pattern.
Trap 2: Lower the top_k sampling value
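top_k limits sampling to the k most likely tokens. Lowering it reduces diversity but does not change which kinds of responses the model favors.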
Trap 3: Increase the temperature parameter
Higher temperature flattens the token distribution and makes output more random; it does not make responses more helpful.
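To illustrate Traps 2 and 3, here is a minimal sketch of how temperature and top_k act on a next-token distribution. The token names and logit values are made up for illustration, and the helper functions are hypothetical, not any library's API. The point it shows: both parameters reshape probabilities the model already assigns, so the model's preferred (overly polite) continuation stays on top.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(logits, k):
    """Keep the k largest logits; mask the rest so they get zero probability.
    If logits tie at the cutoff, more than k entries may survive."""
    cutoff = sorted(logits, reverse=True)[k - 1]
    return [x if x >= cutoff else float("-inf") for x in logits]

def argmax(probs):
    """Index of the most likely token."""
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical next-token candidates and logits (illustrative values only).
tokens = ["Certainly", "Apologies", "Restart", "Click"]
logits = [3.0, 2.5, 1.0, 0.5]

base = softmax(logits)                       # default sampling distribution
hot = softmax(logits, temperature=2.0)       # flatter: more randomness
narrow = softmax(top_k_filter(logits, k=2))  # only the top 2 tokens remain

for name, probs in [("base", base), ("hot", hot), ("narrow", narrow)]:
    print(name, tokens[argmax(probs)], [round(p, 3) for p in probs])
```

In all three cases the most likely token is the same one; only the spread of probability mass around it changes. Changing what the model prefers requires changing its weights, which is what the correct option does.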
- A
Apply reinforcement learning from human feedback (RLHF)
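RLHF trains a reward model on human preference judgments and then optimizes the model toward responses people rate as better, so it can directly reward helpfulness rather than politeness alone.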
- B
Increase the amount of training data
Why wrong: More data drawn from the same distribution tends to reinforce the existing behavior; it does not specifically target the polite-but-unhelpful pattern.
- C
Lower the top_k sampling value
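Why wrong: top_k limits sampling to the k most likely tokens. Lowering it reduces diversity but does not change which kinds of responses the model favors.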
- D
Increase the temperature parameter
Why wrong: Higher temperature flattens the token distribution and makes output more random; it does not make responses more helpful.
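For option A, the preference signal that RLHF relies on can be sketched as follows. This is a toy illustration of the reward-modeling step only, using a pairwise (Bradley-Terry style) loss on a linear reward function; the policy-optimization step that follows in full RLHF is omitted. The two features, the preference pairs, and all function names are hypothetical and invented for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reward(weights, features):
    """Linear reward: weighted sum of response features."""
    return sum(w * f for w, f in zip(weights, features))

def pairwise_loss(weights, chosen, rejected):
    """Pairwise preference loss: -log sigmoid(r(chosen) - r(rejected))."""
    margin = reward(weights, chosen) - reward(weights, rejected)
    return -math.log(sigmoid(margin))

def train(pairs, lr=0.1, epochs=200):
    """Fit reward weights by gradient descent on the pairwise loss."""
    weights = [0.0, 0.0]
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # d(loss)/d(margin) = -(1 - sigmoid(margin))
            grad_scale = -(1.0 - sigmoid(margin))
            for i in range(len(weights)):
                weights[i] -= lr * grad_scale * (chosen[i] - rejected[i])
    return weights

# Hypothetical features per response: [politeness phrases, actionable steps].
# Each pair is (response the annotator preferred, response they rejected).
pairs = [
    ([1, 2], [3, 0]),
    ([0, 1], [2, 0]),
    ([1, 3], [1, 0]),
]

weights = train(pairs)
helpful = [1, 2]       # brief courtesy plus concrete steps
polite_only = [3, 0]   # many courtesies, no concrete steps
print("weights:", [round(w, 2) for w in weights])
print("helpful scores higher:",
      reward(weights, helpful) > reward(weights, polite_only))
```

Because the annotators in this toy data prefer responses with actionable steps, the learned reward assigns positive weight to that feature. In full RLHF, a reward of this kind is then used to update the generative model itself, which is why it addresses the behavior that sampling parameters and extra data leave untouched.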