arXiv preprint – Phantom of Latent for Large Language and Vision Models

In this episode, we discuss Phantom of Latent for Large Language and Vision Models by Byung-Kwan Lee, Sangyun Chung, Chae Won Kim, Beomchan Park, Yong Man Ro. The paper introduces Phantom, an efficient family of large language and vision models (LLVMs) spanning 0.5B to 7B parameters, designed to match the performance of much larger models. By temporarily increasing the latent hidden dimension during multi-head self-attention, Phantom improves its learning capability without a substantial increase in model size. Training uses Phantom Optimization (PO), which combines autoregressive supervised fine-tuning with a direct preference optimization-like concept, and the resulting models achieve state-of-the-art performance against larger LLVMs.
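To make the "temporarily enlarged latent dimension" idea concrete, here is a minimal PyTorch sketch of a self-attention block that projects into a wider latent space, attends there, and then projects back to the original width. This is not the authors' implementation; the class name `PhantomSelfAttention` and the parameters `d_model`, `d_phantom`, and `n_heads` are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): self-attention that temporarily
# widens the latent hidden dimension, then collapses back to the model width.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PhantomSelfAttention(nn.Module):
    def __init__(self, d_model: int, d_phantom: int, n_heads: int):
        super().__init__()
        assert d_phantom % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_phantom // n_heads
        # Project into the temporarily enlarged ("phantom") dimension.
        self.q_proj = nn.Linear(d_model, d_phantom)
        self.k_proj = nn.Linear(d_model, d_phantom)
        self.v_proj = nn.Linear(d_model, d_phantom)
        # Project back so all downstream layers keep the original width.
        self.out_proj = nn.Linear(d_phantom, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        # Attention is computed in the enlarged latent space...
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        attn = attn.transpose(1, 2).reshape(b, t, self.n_heads * self.d_head)
        # ...then mapped back to d_model, so overall model size barely grows.
        return self.out_proj(attn)


# Example: widen 2048 -> 4096 only inside the attention block.
x = torch.randn(1, 16, 2048)
y = PhantomSelfAttention(d_model=2048, d_phantom=4096, n_heads=16)(x)
print(y.shape)  # torch.Size([1, 16, 2048])
```

In this sketch only the attention projections grow; the MLPs, embeddings, and every other component stay at `d_model`, which is one plausible reading of how the enlarged latent space avoids a substantial increase in total parameters.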

