arXiv preprint - Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models

In this episode we discuss "Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models" by Xuhui Jia, Yang Zhao, Kelvin C. K. Chan, Yandong Li, Han Zhang, Boqing Gong, Tingbo Hou, Huisheng Wang, and Yu-Chuan Su. The paper presents a method for generating images of customized objects specified by users. Instead of optimizing the model for each new object, the approach uses an encoder to capture the object's high-level semantics as an embedding, which makes generation faster. That object embedding is then passed to a text-to-image synthesis model, and the authors explore different network designs and training strategies for blending the object-aware embedding space with the text-to-image model. The paper reports compelling output quality and appearance diversity: once trained, the model produces varied content and styles conditioned on both text and objects, with no test-time optimization.
