arXiv paper - VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models

In this episode, we discuss VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models by Hila Chefer, Uriel Singer, Amit Zohar, Yuval Kirstain, Adam Polyak, Yaniv Taigman, Lior Wolf, and Shelly Sheynin. Generative video models typically prioritize appearance accuracy over motion coherence, which limits their ability to capture realistic dynamics. The paper presents VideoJAM, a framework that integrates a joint appearance-motion representation and uses an Inner-Guidance mechanism to enhance motion consistency during generation. VideoJAM achieves state-of-the-art motion realism and visual quality while being easily adaptable to existing video models without major changes.
