arXiv Preprint – Enable Language Models to Implicitly Learn Self-Improvement From Data


In this episode we discuss Enable Language Models to Implicitly Learn Self-Improvement From Data
by Ziqi Wang, Le Hou, Tianjian Lu, Yuexin Wu, Yunxuan Li, Hongkun Yu, and Heng Ji. The paper introduces ImPlicit Self-ImprovemenT (PIT), a framework that lets large language models (LLMs) learn to improve their own responses. Rather than relying on explicit rubrics or hand-crafted prompts, PIT learns the improvement goal implicitly from human preference data: it reformulates the RLHF training objective so that, instead of scoring a response in isolation, the model is rewarded for the quality gap between its response and a reference response to the same prompt. Experimental results show that PIT outperforms prompting-based self-improvement methods in enhancing LLM response quality.
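
To make the "implicit improvement goal" concrete, here is a minimal sketch of the gap-reward idea described above: the reward measures how much a candidate response improves on a reference response, rather than how good the candidate is on its own. The function names and the toy reward model below are hypothetical stand-ins for illustration, not the authors' actual code.

```python
# Sketch of a gap-style improvement reward (hypothetical names, not the
# paper's implementation). A real setup would use a reward model trained
# on human preference data; a trivial heuristic stands in for it here.

def toy_reward_model(prompt: str, response: str) -> float:
    """Stand-in for a learned reward model: crudely favors responses
    that share vocabulary with the prompt and give more detail."""
    overlap = len(set(prompt.lower().split()) & set(response.lower().split()))
    return overlap + 0.1 * len(response.split())

def improvement_reward(prompt: str, candidate: str, reference: str) -> float:
    """Gap reward: positive when `candidate` scores higher than the
    `reference` it is supposed to improve on, negative otherwise."""
    return (toy_reward_model(prompt, candidate)
            - toy_reward_model(prompt, reference))

if __name__ == "__main__":
    prompt = "Explain why the sky is blue."
    reference = "Because of the atmosphere."
    candidate = ("The sky looks blue because air molecules scatter short "
                 "blue wavelengths of sunlight more than longer red ones.")
    # A positive value indicates the candidate improves on the reference.
    print(improvement_reward(prompt, candidate, reference))
```

Training against a gap like this, rather than an absolute score, is what lets the improvement goal stay implicit: the preference data defines what counts as "better" without anyone writing a rubric.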

