arXiv – Visual In-Context Prompting

In this episode, we discuss Visual In-Context Prompting by Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao. The paper introduces a framework, called universal visual in-context prompting, for improving zero-shot capabilities in vision tasks. Built on an encoder-decoder architecture, it accepts several kinds of prompts, such as strokes, boxes, and points, and can also use reference image segments as context. Whereas existing methods are limited to referring segmentation, this framework extends to a broader range of tasks, including open-set segmentation and detection. The authors report competitive results on closed-set, in-domain datasets such as COCO and promising results on open-set datasets such as ADE20K, and they plan to release the code on GitHub.

