In this episode, we discuss JPEG-LM: LLMs as Image Generators with Canonical Codec Representations by Xiaochuang Han, Marjan Ghazvininejad, Pang Wei Koh, Yulia Tsvetkov. The paper introduces a novel approach for image and video generation by modeling them as compressed files using standard codecs like JPEG and AVC/H.264. Instead of pixel-based or vector quantization methods, the authors employ the Llama architecture to directly output the compressed bytes, showing improved performance and simplicity. This method achieves a significant reduction in FID and excels in generating long-tail visual elements, highlighting its potential for seamless integration into multimodal systems.
arxiv preprint – JPEG-LM: LLMs as Image Generators with Canonical Codec Representations
by
Tags: