ICCV 2023 - Adding Conditional Control to Text-to-Image Diffusion Models

In this episode we discuss "Adding Conditional Control to Text-to-Image Diffusion Models" by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. The paper introduces ControlNet, a neural network architecture that adds spatial conditioning controls to large pretrained text-to-image diffusion models. It takes additional images, such as edge maps and human pose skeletons, as conditioning inputs that specify the desired image composition. ControlNet keeps the pretrained model locked and reuses its encoding layers as the backbone of a trainable copy, connected through zero-initialized convolution layers whose parameters grow gradually from zero during training. Experiments with different conditioning controls and datasets demonstrate its effectiveness, suggesting it could broaden the applications of image diffusion models.
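
For listeners who want a concrete picture of the "gradually adjusts parameters" idea, here is a toy sketch in plain Python. In the paper, a trainable copy of the pretrained layers is attached through zero-initialized layers, so at the start of training the combined model behaves exactly like the original. Everything below is an illustrative assumption rather than the authors' code: the names `frozen_block`, `ControlBranch`, and `controlled_block` are hypothetical, the weights are made up, and small linear maps stand in for real convolutions.

```python
# Toy sketch: a frozen block plus a trainable copy, joined through
# zero-initialized linear maps (stand-ins for the paper's zero convolutions).

def linear(weights, bias, x):
    """Apply y = W x + b to a list of floats."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def zeros(n_out, n_in):
    """Return an all-zero (weights, bias) pair."""
    return [[0.0] * n_in for _ in range(n_out)], [0.0] * n_out

# Hypothetical "pretrained" weights of one block; these stay frozen.
FROZEN_W = [[0.5, -1.0], [2.0, 0.25]]
FROZEN_B = [0.1, -0.2]

def frozen_block(x):
    return linear(FROZEN_W, FROZEN_B, x)

class ControlBranch:
    def __init__(self):
        # The trainable copy starts from the pretrained weights.
        self.copy_w = [row[:] for row in FROZEN_W]
        self.copy_b = FROZEN_B[:]
        # Both connecting layers start at exactly zero.
        self.zero_in = zeros(2, 2)
        self.zero_out = zeros(2, 2)

    def __call__(self, x, condition):
        c = linear(*self.zero_in, condition)          # inject the condition
        h = linear(self.copy_w, self.copy_b,
                   [xi + ci for xi, ci in zip(x, c)])  # trainable copy
        return linear(*self.zero_out, h)               # gate the output

def controlled_block(x, condition, branch):
    """Frozen output plus the control branch's contribution."""
    base = frozen_block(x)
    delta = branch(x, condition)
    return [b + d for b, d in zip(base, delta)]
```

Before any training step, `controlled_block` returns the same values as `frozen_block` for any condition, because the zero-initialized output layer contributes nothing. Only as those weights move away from zero does the conditioning image start to influence the result.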
