In this episode, we discuss "InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists" by Yulu Gan, Sungwoo Park, Alexander Schubert, Anthony Philippakis, and Ahmed M. Alaa. The paper proposes a unified language interface for computer vision tasks, in which the task to perform is specified through a natural-language instruction. The approach trains a text-to-image diffusion model on a multi-modal, multi-task dataset whose instructions are produced by paraphrasing prompt templates. In the reported experiments, the resulting model, InstructCV, performs competitively with other vision models and shows strong generalization.
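To make the dataset idea concrete, here is a minimal sketch of turning per-task records into instruction-tuning samples by picking a paraphrased prompt template for each one. It is not the authors' pipeline. The template strings, the `Sample` fields, the file paths, and `build_samples` are all hypothetical. The hand-written paraphrases stand in for whatever paraphrasing step the paper actually uses, and the sketch assumes each task's output is stored as a target image.

```python
import random
from dataclasses import dataclass

# Hypothetical paraphrased templates per task (stand-ins for the paper's paraphrases).
TEMPLATES = {
    "segmentation": [
        "Segment the {category}.",
        "Highlight every {category} in the picture.",
        "Produce a mask showing where the {category} is.",
    ],
    "detection": [
        "Detect the {category}.",
        "Draw a box around each {category}.",
    ],
    "depth": [
        "Estimate the depth of this image.",
        "Show how far away each pixel is.",
    ],
}


@dataclass
class Sample:
    input_image: str   # path to the source image (hypothetical)
    instruction: str   # natural-language task prompt
    target_image: str  # path to the task output rendered as an image (assumption)


def build_samples(records, rng):
    """Pair each (image, task, category, target) record with a random paraphrase."""
    samples = []
    for image_path, task, category, target_path in records:
        template = rng.choice(TEMPLATES[task])
        # str.format ignores unused keyword arguments, so category-free templates work too.
        instruction = template.format(category=category)
        samples.append(Sample(image_path, instruction, target_path))
    return samples


# Usage with made-up records: one segmentation example and one depth example.
records = [
    ("img1.jpg", "segmentation", "dog", "img1_mask.png"),
    ("img2.jpg", "depth", None, "img2_depth.png"),
]
samples = build_samples(records, random.Random(0))
for s in samples:
    print(s.instruction)
```

Sampling a different paraphrase per record is what exposes the model to varied wordings of the same task, which is the intuition behind building the dataset from paraphrased templates.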
arXiv preprint – InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists