arXiv Preprint – De-Diffusion Makes Text a Strong Cross-Modal Interface


In this episode we discuss "De-Diffusion Makes Text a Strong Cross-Modal Interface" by Chen Wei, Chenxi Liu, Siyuan Qiao, Zhishuai Zhang, Alan Yuille, and Jiahui Yu. The paper introduces De-Diffusion, an approach that uses text as the representation of images. An autoencoder encodes an image into text, and a pre-trained text-to-image diffusion model serves as the decoder, reconstructing the original image from that text. The resulting De-Diffusion text is shown to be a precise and comprehensive description of the image, making it compatible with a variety of multi-modal tasks and achieving state-of-the-art performance on vision-language benchmarks.
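
To make the mechanism concrete, here is a minimal PyTorch sketch of the setup as described: only the image-to-text encoder is trained, while a frozen text-to-image decoder supplies the reconstruction signal. All class names, shapes, and the toy linear layers are illustrative stand-ins, not the paper's actual architecture or training code; the Gumbel-softmax relaxation is one standard way to backpropagate through a discrete text bottleneck.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy dimensions; the real model uses a large vision backbone, a full
# text vocabulary, and a pre-trained diffusion model as the decoder.
VOCAB, SEQ, COND = 1000, 16, 64
IMG = 3 * 32 * 32

class ImageToTextEncoder(nn.Module):
    """Maps an image to a sequence of (relaxed) discrete text tokens."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(IMG, SEQ * VOCAB)  # stand-in for a real encoder

    def forward(self, image):
        logits = self.backbone(image.flatten(1)).view(-1, SEQ, VOCAB)
        # Gumbel-softmax: a differentiable relaxation of token sampling,
        # so gradients can flow back through the "text" bottleneck.
        return F.gumbel_softmax(logits, hard=True, dim=-1)

class FrozenTextToImageDecoder(nn.Module):
    """Stand-in for a pre-trained text-to-image diffusion model: it
    predicts the noise added to an image, conditioned on text tokens."""
    def __init__(self):
        super().__init__()
        self.text_proj = nn.Linear(SEQ * VOCAB, COND)
        self.denoise = nn.Linear(IMG + COND, IMG)

    def forward(self, noisy_image, tokens):
        cond = self.text_proj(tokens.flatten(1))
        x = torch.cat([noisy_image.flatten(1), cond], dim=-1)
        return self.denoise(x).view_as(noisy_image)

encoder = ImageToTextEncoder()
decoder = FrozenTextToImageDecoder()
for p in decoder.parameters():      # the decoder stays frozen; only the
    p.requires_grad_(False)         # image-to-text encoder receives updates

opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)
image = torch.rand(8, 3, 32, 32)             # dummy batch
noise = torch.randn_like(image)
noisy = image + noise                        # real diffusion uses a noise schedule

tokens = encoder(image)                      # image -> text-token bottleneck
loss = F.mse_loss(decoder(noisy, tokens), noise)  # denoising objective
loss.backward()                              # gradients reach the encoder
opt.step()
```

Because the decoder is a frozen pre-trained model, the only way to lower the reconstruction loss is for the encoder to emit text that carries enough information to rebuild the image, which is what drives the De-Diffusion text to be both precise and comprehensive.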

