In this episode, we discuss Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think by Gonzalo Martin Garcia, Karim Abou Zeid, Christian Schmidt, Daan de Geus, Alexander Hermans, and Bastian Leibe. The paper identifies and corrects a flaw in the inference pipeline of large diffusion models used for monocular depth estimation, achieving a more than 200× speedup without any loss in accuracy. Through end-to-end fine-tuning with task-specific losses, the researchers obtain a deterministic model that surpasses all other diffusion-based depth and normal estimation models on zero-shot benchmarks. Moreover, applying the same fine-tuning protocol to Stable Diffusion models yields performance comparable to the state of the art, challenging prior conclusions in the field.
arXiv preprint – Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
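To make the two ideas mentioned above more concrete, here is a minimal sketch of single-step deterministic inference and a task-specific fine-tuning loss. This is an illustration under assumptions, not the authors' exact implementation: `unet`, `vae`, `image_latent`, and `t_final` are hypothetical placeholders for a Marigold-style latent diffusion depth estimator, and the zero-noise single-step input and the scale-and-shift-invariant loss are illustrative choices.

```python
import torch
import torch.nn.functional as F

def single_step_depth(unet, vae, image_latent, t_final):
    # Deterministic single-step prediction: instead of running a multi-step
    # DDIM loop, feed a fixed all-zero "noise" latent at the final timestep
    # and decode the predicted depth latent in one forward pass.
    noise = torch.zeros_like(image_latent)
    latent_in = torch.cat([image_latent, noise], dim=1)
    depth_latent = unet(latent_in, t_final)
    return vae.decode(depth_latent)

def affine_invariant_loss(pred, target):
    # Task-specific loss (assumed here): fit a per-sample scale and shift that
    # best aligns the prediction to the ground truth, then penalize the residual.
    b = pred.shape[0]
    p = pred.reshape(b, -1)
    t = target.reshape(b, -1)
    A = torch.stack([p, torch.ones_like(p)], dim=-1)       # (b, n, 2)
    sol = torch.linalg.lstsq(A, t.unsqueeze(-1)).solution  # per-sample scale, shift
    aligned = (A @ sol).squeeze(-1)
    return F.l1_loss(aligned, t)

# End-to-end fine-tuning step (sketch):
#   loss = affine_invariant_loss(single_step_depth(unet, vae, latent, T), gt_depth)
#   loss.backward(); optimizer.step()
```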