arXiv preprint – Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?

In this episode, we discuss Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? by Zorik Gekhman, Gal Yona, Roee Aharoni, Matan Eyal, Amir Feder, Roi Reichart, and Jonathan Herzig. The paper examines what happens when new factual information is introduced to large language models (LLMs) during fine-tuning, with a focus on how this affects their ability to retain and use pre-existing knowledge. The authors find that LLMs learn genuinely new facts much more slowly during fine-tuning than facts already consistent with their pre-training data. Moreover, as the models eventually do fit these new facts, they become more prone to generating factually incorrect, or “hallucinated”, responses, suggesting a trade-off between integrating new knowledge and maintaining factual accuracy.
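The analysis hinges on separating fine-tuning examples the base model already "knows" from ones that introduce new facts. The sketch below is a minimal illustration of that idea, not the paper's exact categorization pipeline (which is more fine-grained): it assumes a Hugging Face causal LM, and the model name, sampling parameters, and simple string-match check are all illustrative placeholders.

```python
# Minimal sketch: label a QA fine-tuning example as "known" to the base model
# if sampled few-shot answers already contain the gold answer; otherwise it
# counts as new knowledge. Model name and hyperparameters are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # hypothetical choice; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, device_map="auto"
)

def is_known(question: str, gold_answer: str, few_shot_prompt: str,
             n_samples: int = 8) -> bool:
    """Return True if the base model already produces the gold answer
    under few-shot prompting, i.e. the example is not 'new knowledge'."""
    prompt = f"{few_shot_prompt}\nQ: {question}\nA:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        do_sample=True,               # draw several candidate answers
        temperature=0.5,
        num_return_sequences=n_samples,
        max_new_tokens=16,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated tokens and check for the gold answer.
    completions = tokenizer.batch_decode(
        outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    return any(gold_answer.lower() in c.lower() for c in completions)
```

Under this kind of split, the paper's findings correspond to "unknown" examples being fitted more slowly during fine-tuning, and the model's hallucination rate rising as those examples are eventually learned.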

