A chatbot developer uses a transformer-based model for customer service. Users complain that the chatbot sometimes gives offensive responses. Which technique should be applied first to mitigate this issue?
Cleaning training data addresses the root cause.
Why this answer
Option D is correct because offensive responses from a transformer-based model usually originate in biased or toxic language in its training data. Reviewing and filtering the dataset to remove that content, then fine-tuning the model on the cleaned data, addresses the problem at its source. This follows the principle of data-centric AI: improve data quality first, before changing the model architecture or inference parameters.
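A minimal sketch of the dataset review step follows, assuming the training data is a list of prompt/response dictionaries. The blocklist and the `is_toxic` and `clean_dataset` functions are hypothetical names for illustration only. Production pipelines typically rely on a trained toxicity classifier plus human review rather than keyword matching.

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical blocklist for illustration only; keyword matching is a crude
# stand-in for a real toxicity classifier and human review.
BLOCKLIST = {"idiot", "stupid", "shut up"}


def is_toxic(text: str, blocklist: set[str] = BLOCKLIST) -> bool:
    """Flag text that contains any blocklisted term (case-insensitive substring match)."""
    lowered = text.lower()
    return any(term in lowered for term in blocklist)


def clean_dataset(samples: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Split prompt/response samples into a kept set and a removed set."""
    kept: list[dict] = []
    removed: list[dict] = []
    for sample in samples:
        # Drop the pair if either side contains flagged language.
        if is_toxic(sample["prompt"]) or is_toxic(sample["response"]):
            removed.append(sample)
        else:
            kept.append(sample)
    return kept, removed


# Hypothetical customer-service samples.
data = [
    {"prompt": "Where is my order?", "response": "It ships tomorrow."},
    {"prompt": "Refund please", "response": "Don't be stupid, no refunds."},
]
kept, removed = clean_dataset(data)
print(len(kept), len(removed))
```

The `kept` samples would then be used for fine-tuning. The `removed` samples are worth reviewing by hand, since substring matching both over-flags (for example, "idiotic" in a quoted complaint) and misses toxicity phrased without listed terms.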
Exam trap
CompTIA often tests the misconception that changing inference parameters (such as temperature) or adding a post-processing classifier can fix a data quality problem. In fact, the first and most effective mitigation is to fix the training data itself.
How to eliminate wrong answers
Option A is wrong because increasing model size does not by itself fix biased or offensive outputs. A larger model has more capacity to memorize patterns in its training data, so it can amplify the biases already present there.

Option B is wrong because lowering the temperature makes outputs more deterministic (less random) but does not stop the model from producing offensive content it learned from the data. It reduces variation in the output, not toxicity.

Option C is wrong because a separate real-time classifier for offensive outputs is a reactive measure that adds latency and complexity. It can only block offensive text after the model generates it; it does not change what the model has learned. The proactive first step is to clean the training data.
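The argument against Option B can be checked with a short temperature-scaled softmax. This is a sketch with made-up numbers: it assumes that, because of toxic training data, the model scores an offensive reply highest in some context. The candidate names and logit values are hypothetical.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing each logit by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical next-reply candidates and logits learned from toxic data.
candidates = ["offensive_reply", "polite_reply", "neutral_reply"]
logits = [2.0, 1.5, 0.5]

for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, {name: round(p, 3) for name, p in zip(candidates, probs)})
```

Running it shows the offensive reply's probability rising as temperature falls (roughly 0.55 at 1.0 to about 0.99 at 0.1), because a lower temperature sharpens the distribution toward whatever the model already ranks highest. Temperature changes how the model samples, not what it has learned.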