arXiv preprint - Retentive Network: A Successor to Transformer for Large Language Models

In this episode, we discuss "Retentive Network: A Successor to Transformer for Large Language Models" by Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, and Furu Wei. The paper introduces RetNet as a successor to the Transformer architecture for language models. RetNet uses a retention mechanism that supports three computation paradigms: parallel, recurrent, and chunkwise recurrent. The parallel form enables efficient training, while the recurrent forms enable efficient inference. Experimental results show that RetNet achieves favorable scaling, parallel training, low-cost deployment, and efficient inference, making it a promising candidate for large language models.
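
To make the "parallel versus recurrent" idea concrete, here is a minimal NumPy sketch of a single retention head. It is a simplified illustration, not the authors' implementation: it assumes one fixed decay value `gamma` and omits the position-dependent rotation, multi-scale heads, gating, and normalization described in the paper. The function names and the toy shapes are hypothetical. The sketch shows that the parallel form (a decay-masked matrix product over the whole sequence) and the recurrent form (a running state updated one token at a time) produce the same outputs.

```python
import numpy as np

def retention_parallel(q, k, v, gamma):
    """Parallel form: (Q K^T * D) V, with D[n, m] = gamma**(n - m) for n >= m, else 0."""
    n = q.shape[0]
    idx = np.arange(n)
    diff = idx[:, None] - idx[None, :]
    # Causal decay mask: zero above the diagonal, exponential decay below it.
    decay = np.where(diff >= 0, gamma ** np.maximum(diff, 0), 0.0)
    return ((q @ k.T) * decay) @ v

def retention_recurrent(q, k, v, gamma):
    """Recurrent form: S_n = gamma * S_{n-1} + k_n^T v_n, output o_n = q_n S_n."""
    state = np.zeros((k.shape[1], v.shape[1]))
    outputs = []
    for q_n, k_n, v_n in zip(q, k, v):
        state = gamma * state + np.outer(k_n, v_n)  # fixed-size state per step
        outputs.append(q_n @ state)
    return np.stack(outputs)

# Toy inputs (assumed shapes: 6 tokens, head dimension 4).
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((6, 4)) for _ in range(3))

out_parallel = retention_parallel(q, k, v, gamma=0.9)
out_recurrent = retention_recurrent(q, k, v, gamma=0.9)
same = np.allclose(out_parallel, out_recurrent)
```

In this sketch the recurrent loop keeps only a fixed-size state matrix per step, which is the property behind the low-cost inference claim, while the parallel form processes all tokens at once, as is convenient for training.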
