arXiv Preprint – LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models


In this episode we discuss "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" by Chi Han, Qifan Wang, Wenhan Xiong, Yu Chen, Heng Ji, and Sinong Wang. The paper introduces LM-Infinite as a solution to the challenge of length generalization in Large Language Models (LLMs). Existing methods for handling longer sequences are resource-intensive and time-consuming, which motivates a simpler approach. LM-Infinite combines a Λ-shaped attention mask with a distance limit and requires no parameter updates or learning. The technique consistently generates fluent text and performs well on downstream tasks, even when input sequences exceed the lengths seen in training.
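
To make the two ingredients more concrete, here is a minimal sketch in plain Python. It is based only on the summary above, not on the authors' code. It assumes that "Λ-shaped" means each token attends to a few tokens at the start of the sequence plus a window of the most recent tokens, and that the "distance limit" caps the relative distance used when computing attention. The names lambda_mask, clipped_distance, n_global, n_local, and max_distance are hypothetical, chosen for illustration.

```python
def lambda_mask(seq_len, n_global, n_local):
    """Build a boolean mask where mask[i][j] is True if query i may attend to key j.

    Assumption: the Lambda shape is the union of two causal branches,
    the first n_global tokens and the n_local most recent tokens.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i                # never attend to future tokens
            is_global = j < n_global       # branch 1: start of the sequence
            is_local = (i - j) < n_local   # branch 2: recent tokens
            row.append(causal and (is_global or is_local))
        mask.append(row)
    return mask


def clipped_distance(i, j, max_distance):
    """Relative distance between query i and key j, capped at max_distance.

    Assumption: the cap keeps distances within a range the model
    could have encountered during training.
    """
    return min(i - j, max_distance)


if __name__ == "__main__":
    # Print a small mask so the Lambda shape is visible ("x" = attended).
    for row in lambda_mask(seq_len=8, n_global=2, n_local=3):
        print("".join("x" if allowed else "." for allowed in row))
```

Neither function touches model weights, which matches the summary's point that the approach needs no parameter updates. How the mask and the capped distance are wired into a specific model's attention layer is left out here, since the summary does not describe it.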

