In this episode, we discuss Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA by Sangmin Bae, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Seungyeon Kim, and Tal Schuster. The paper revisits “layer tying” as a form of parameter sharing and uses it to convert large language models into smaller “Recursive Transformers”, which reuse a shared set of layers across depth and so reduce model size and cost with minimal performance loss. These Recursive Transformers are initialized from standard pre-trained models. A “Relaxed” variant adds layer-wise LoRA modules that loosen the strict tying, giving each layer some flexibility so the model recovers most of the original performance while remaining compact. The paper also introduces a new inference paradigm, Continuous Depth-wise Batching, which combined with early exiting aims to significantly increase inference throughput.
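To make the idea of layer tying with layer-wise LoRA more concrete, here is a minimal NumPy sketch. It is a toy illustration under several assumptions, not the paper's implementation: each "layer" is a single residual tanh projection rather than a full Transformer block, the shared layers are reused in a simple cyclic order, and the names `make_relaxed_recursive`, `forward`, and `count_params` are hypothetical.

```python
import numpy as np


def make_relaxed_recursive(d_model, num_shared, num_loops, rank, seed=0):
    """Build toy weights: `num_shared` tied layers reused `num_loops` times,
    plus one low-rank (LoRA-style) pair per depth position (assumed layout)."""
    rng = np.random.default_rng(seed)
    shared = [rng.normal(0, 0.02, (d_model, d_model)) for _ in range(num_shared)]
    depth = num_shared * num_loops
    lora_a = [rng.normal(0, 0.02, (rank, d_model)) for _ in range(depth)]
    # B starts at zero, so every delta B @ A is zero and the model starts fully tied.
    lora_b = [np.zeros((d_model, rank)) for _ in range(depth)]
    return shared, lora_a, lora_b


def forward(x, shared, lora_a, lora_b):
    """Run all depth positions, reusing the shared weights cyclically."""
    for layer in range(len(lora_a)):
        w = shared[layer % len(shared)]        # tied weight, reused across loops
        delta = lora_b[layer] @ lora_a[layer]  # per-layer low-rank relaxation
        x = x + np.tanh(x @ (w + delta).T)     # toy residual layer, not attention
    return x


def count_params(shared, lora_a, lora_b):
    """Total number of stored weights: shared matrices plus all LoRA factors."""
    return sum(m.size for m in shared + lora_a + lora_b)


if __name__ == "__main__":
    shared, a, b = make_relaxed_recursive(d_model=64, num_shared=2, num_loops=3, rank=4)
    untied = 2 * 3 * 64 * 64  # six independent 64x64 layers
    print(count_params(shared, a, b), "vs", untied)
```

In this toy setting, six layers of width 64 store 11,264 weights (two shared matrices plus six rank-4 LoRA pairs) instead of 24,576 for six independent matrices. The low-rank terms are the only per-layer parameters, which is what lets each depth position deviate slightly from the tied weights.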
Arxiv Paper – Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA