arXiv Preprint - LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models

In this episode we discuss LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models by Chi Han, Qifan Wang, Wenhan Xiong, Yu Chen, Heng Ji, Sinong Wang. The paper introduces LM-Infinite as a solution to the challenge of length generalization in large language models (LLMs), that is, handling inputs longer than those seen during training. Existing methods for handling longer sequences are resource-intensive and time-consuming, which motivates a simpler approach. LM-Infinite applies a Λ-shaped attention mask and a distance limit, and requires no parameter updates or learning. The technique consistently generates fluent text and performs well on downstream tasks even when input sequences exceed the training length.
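
For listeners who want a concrete picture of the two ingredients, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names and the parameters `n_global`, `n_local`, and `d_max` are hypothetical, and the sketch assumes that the Λ shape means each token attends to a few tokens at the start of the sequence plus a window of its most recent tokens, and that the distance limit caps the relative distance fed to attention.

```python
def lambda_mask(seq_len, n_global, n_local):
    """Build a Λ-shaped causal mask (illustrative sketch, names are assumptions).

    mask[i][j] is True when query position i may attend to key position j.
    A key is visible if it is not in the future AND it is either one of the
    first n_global tokens or within the n_local most recent tokens.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i                 # never look at future tokens
            is_global = j < n_global        # left arm of the Λ: starting tokens
            is_local = (i - j) < n_local    # right arm of the Λ: recent tokens
            row.append(causal and (is_global or is_local))
        mask.append(row)
    return mask


def clamped_distance(i, j, d_max):
    """Relative distance between query i and key j, capped at d_max (assumed limit)."""
    return min(i - j, d_max)


def render(mask):
    """Draw the mask as text: '#' for visible keys, '.' for masked ones."""
    return "\n".join("".join("#" if v else "." for v in row) for row in mask)


if __name__ == "__main__":
    print(render(lambda_mask(seq_len=8, n_global=2, n_local=3)))
```

Running the script prints the mask as a grid, where the Λ shape shows up as a vertical band on the left and a diagonal band on the right. Because both pieces only change which keys are visible and what distance is reported, nothing here touches model weights, which matches the "no parameter updates" claim in the description.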
