arXiv preprint – LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

In this episode, we discuss LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning by Hongye Jin, Xiaotian Han, Jingfeng Yang, Zhimeng Jiang, Zirui Liu, Chia-Yuan Chang, Huiyuan Chen, Xia Hu. The paper presents SelfExtend, a novel method for extending the context window of Large Language Models (LLMs) so they can handle long input sequences without any fine-tuning. SelfExtend constructs bi-level attention, using grouped attention for distant tokens and neighbor attention for adjacent ones, allowing LLMs to attend over sequences longer than their original training length. Comprehensive experiments demonstrate the method's effectiveness, and the code is publicly available. The approach addresses a key limitation of LLMs: the fixed sequence length they can handle at inference time.
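
As a rough illustration of the bi-level idea, the sketch below (not the authors' released implementation; `group_size` and `window` are placeholder parameter names) builds the matrix of relative positions that would be fed to a positional encoding such as RoPE: tokens within the neighbor window keep their exact offsets, while distant tokens receive floored, grouped offsets shifted to stay contiguous with the neighbor range.

```python
# Hedged sketch of SelfExtend-style bi-level relative positions (illustrative only,
# not the authors' released code; parameter names are assumptions).
import torch

def self_extend_relative_positions(seq_len: int, group_size: int, window: int) -> torch.Tensor:
    """Build a (seq_len, seq_len) matrix of relative positions:
    exact offsets for neighbor tokens, grouped (floored) offsets for distant ones."""
    q_pos = torch.arange(seq_len).unsqueeze(1)  # query positions as a column
    k_pos = torch.arange(seq_len).unsqueeze(0)  # key positions as a row
    rel = q_pos - k_pos                         # exact relative distance, shape (L, L)

    # Grouped attention: floor absolute positions into groups, then shift the
    # result so it lines up with the neighbor window at the boundary.
    grouped = q_pos // group_size - k_pos // group_size
    grouped = grouped + (window - window // group_size)

    # Neighbor attention: tokens within the window keep their exact offsets;
    # everything farther away falls back to the grouped positions.
    return torch.where(rel <= window, rel, grouped)


# Example: relative positions for an 8K-token sequence; group_size=4 and
# window=1024 are illustrative choices, not values from the paper.
rel_pos = self_extend_relative_positions(seq_len=8192, group_size=4, window=1024)
```

Because the grouped offsets never exceed the range of positions seen during pretraining, attention over much longer inputs reuses familiar positions, which is why no fine-tuning is required.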

