arXiv Preprint – DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models


In this episode we discuss DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models by Sam Ade Jacobs, Masahiro Tanaka, Chengming Zhang, Minjia Zhang, Leon Song, Samyam Rajbhandari, and Yuxiong He. DeepSpeed-Ulysses is a method for efficient and scalable training of large language models with very long sequence lengths. It addresses the limitations of existing sequence parallelism approaches by partitioning the input data along the sequence dimension and using an efficient all-to-all collective communication for attention computation. Experimental evaluations show that DeepSpeed-Ulysses trains 2.5 times faster with sequence lengths four times longer than existing methods, highlighting its importance for generative AI and AI for science.
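
To make the mechanism summarized above concrete: each GPU starts with a shard of the sequence for all attention heads, and an all-to-all exchange re-partitions the tensors so each GPU holds the full sequence for a subset of heads, which is what a standard attention kernel needs locally. The sketch below illustrates that re-partitioning step with PyTorch's torch.distributed.all_to_all_single. It is not the authors' implementation; the tensor shapes, the function name, and the assumption of an already-initialized process group (launched e.g. via torchrun, with an all-to-all-capable backend such as NCCL) are illustrative assumptions.

```python
# Illustrative sketch of the sequence-to-head re-partitioning described above
# (not the DeepSpeed-Ulysses source). Assumes torch.distributed is already
# initialized with a backend that supports all_to_all (e.g. NCCL), and that
# the number of heads and the sequence length divide evenly by the rank count.
import torch
import torch.distributed as dist


def seq_shard_to_head_shard(x: torch.Tensor, group=None) -> torch.Tensor:
    """Turn a [seq/P, heads, dim] shard into a [seq, heads/P, dim] shard.

    Before: this rank holds a slice of the sequence for every head.
    After:  this rank holds the whole sequence for a slice of the heads.
    """
    world = dist.get_world_size(group)
    if world == 1:
        return x  # nothing to exchange on a single rank
    s_local, heads, dim = x.shape
    # Group the heads into one chunk per rank and move that chunk index to
    # dim 0, so a single all_to_all_single sends chunk i to rank i.
    x = x.reshape(s_local, world, heads // world, dim).permute(1, 0, 2, 3).contiguous()
    out = torch.empty_like(x)
    dist.all_to_all_single(out, x, group=group)
    # out[j] is rank j's sequence slice for this rank's heads; concatenating
    # along dim 0 restores the full sequence for the local head slice.
    return out.reshape(world * s_local, heads // world, dim)
```

In the paper, this kind of exchange is applied to the query, key, and value projections before attention and reversed on the attention output, so downstream layers continue to operate on sequence-partitioned activations.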

