arXiv Preprint - PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training

In this episode, we discuss "PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training" by Dawei Zhu, Nan Yang, Liang Wang, Yifan Song, Wenhao Wu, Furu Wei, and Sujian Li. The paper presents PoSE, a training method for adapting large language models to longer context windows. It addresses the challenge of extending a pre-trained model's context window without degrading its performance. The method simulates long inputs within a fixed context window by manipulating position indices, which reduces memory and time overhead while maintaining performance.
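
For listeners who want a concrete picture of "manipulated position indices," here is a minimal Python sketch of the general idea, not the authors' implementation. The function name `skipwise_position_ids`, its parameters, and the way the skips are sampled (sorted uniform offsets) are all assumptions made for illustration. It splits a short training window into chunks and shifts each chunk's position indices by a random skip, so a model fine-tuned this way could see positions spread across the longer target window while only processing a short sequence.

```python
import random


def skipwise_position_ids(train_len, target_len, num_chunks=2, rng=None):
    """Return train_len position indices spread over [0, target_len).

    Hypothetical helper for illustration only; the sampling scheme
    below is an assumption, not taken from the paper.
    """
    if not (0 < num_chunks <= train_len <= target_len):
        raise ValueError("need 0 < num_chunks <= train_len <= target_len")
    rng = rng or random.Random()

    # Split the training window into contiguous chunks of near-equal length.
    base, extra = divmod(train_len, num_chunks)
    lengths = [base + (1 if i < extra else 0) for i in range(num_chunks)]

    # Largest skip that keeps every index inside the target window.
    slack = target_len - train_len

    # Assumption: draw one skip per chunk and sort them, so later chunks
    # never land before earlier ones and indices stay strictly increasing.
    offsets = sorted(rng.randint(0, slack) for _ in range(num_chunks))

    position_ids = []
    start = 0
    for length, offset in zip(lengths, offsets):
        # Indices are consecutive inside a chunk and jump between chunks.
        position_ids.extend(range(start + offset, start + offset + length))
        start += length
    return position_ids


if __name__ == "__main__":
    # Eight tokens of real input, labeled with positions from a 32-token window.
    print(skipwise_position_ids(train_len=8, target_len=32, num_chunks=2))
```

In this sketch the token sequence itself stays at the short training length; only the position labels change, which is where the memory and time savings described above would come from.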
