arXiv Preprint – Zephyr: Direct Distillation of LM Alignment


In this episode we discuss "Zephyr: Direct Distillation of LM Alignment" by Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush, and Thomas Wolf. The paper introduces ZEPHYR, a smaller language model aligned to user intent. The authors start from distilled supervised fine-tuning (dSFT), in which a student model is trained on outputs from larger teacher models, but observe that dSFT alone leaves the model poorly aligned to natural prompts. To address this, they collect preference data via AI Feedback (AIF) and apply distilled direct preference optimization (dDPO) to improve intent alignment. Their approach requires only a few hours of training and no human annotation, and it achieves state-of-the-art performance on chat benchmarks, surpassing the best RLHF-based model, LLAMA2-CHAT-70B, on MT-Bench.
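
At the heart of dDPO is the standard DPO objective, here applied to AI-generated preference pairs instead of human annotations. Below is a minimal PyTorch sketch of that loss for illustration only; the function name, argument names, and the beta value are assumptions rather than the paper's implementation, and it presumes per-sequence log-probabilities have already been summed over tokens.

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps: torch.Tensor,
                 policy_rejected_logps: torch.Tensor,
                 ref_chosen_logps: torch.Tensor,
                 ref_rejected_logps: torch.Tensor,
                 beta: float = 0.1) -> torch.Tensor:
        """DPO loss over a batch of preference pairs.

        Each argument holds the total log-probability that the trained
        policy (or the frozen reference model) assigns to the preferred
        ("chosen") or dispreferred ("rejected") completion of each prompt.
        """
        # How much more likely the policy makes each completion vs. the reference
        chosen_logratios = policy_chosen_logps - ref_chosen_logps
        rejected_logratios = policy_rejected_logps - ref_rejected_logps
        # Implicit reward margin between preferred and dispreferred completions
        logits = beta * (chosen_logratios - rejected_logratios)
        # Maximize the log-sigmoid of the margin (minimize its negation)
        return -F.logsigmoid(logits).mean()

The "distilled" part is where the preference pairs come from: in Zephyr they are produced by AI feedback rather than human annotators, which is what lets the whole pipeline run without human labeling.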

