ICLR 2023 – Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching


In this episode we discuss Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching by Donggyun Kim, Jinwoo Kim, Seongwoong Cho, Chong Luo, and Seunghoon Hong. The paper proposes Visual Token Matching (VTM), a universal few-shot learner for arbitrary dense prediction tasks in computer vision. VTM handles different tasks through non-parametric matching on patch-level embedded tokens of images and labels: each query image token is matched against the support image tokens, and the corresponding support label tokens are combined to predict the query's label. The model wraps this matching in a hierarchical encoder-decoder architecture with ViT backbones, performing token matching at multiple feature hierarchies. Experiments show that VTM successfully learns a range of unseen dense prediction tasks, surpassing fully supervised baselines with only ten labeled examples.
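To make the matching idea concrete, here is a minimal sketch of non-parametric token matching in PyTorch. This is not the authors' implementation: the function name, the flattened token shapes, and the cosine-similarity-with-temperature weighting are illustrative assumptions; in the actual paper the matching runs at several ViT feature hierarchies with learned projection heads and a decoder on top.

```python
import torch
import torch.nn.functional as F

def token_matching(query_img_tokens: torch.Tensor,
                   support_img_tokens: torch.Tensor,
                   support_lbl_tokens: torch.Tensor,
                   temperature: float = 0.1) -> torch.Tensor:
    """Illustrative sketch of patch-level token matching (not the official VTM code).

    query_img_tokens:   (Nq, D) patch tokens of the query image
    support_img_tokens: (Ns, D) patch tokens of all support images, flattened
    support_lbl_tokens: (Ns, D) patch tokens of the corresponding support labels
    Returns predicted label tokens for the query, shape (Nq, D).
    """
    # Cosine similarity between every query token and every support image token.
    q = F.normalize(query_img_tokens, dim=-1)
    k = F.normalize(support_img_tokens, dim=-1)
    sim = (q @ k.t()) / temperature        # (Nq, Ns)

    # Softmax over the support tokens turns similarities into matching weights.
    attn = sim.softmax(dim=-1)             # (Nq, Ns)

    # Each predicted query label token is a weighted sum of support label tokens.
    return attn @ support_lbl_tokens       # (Nq, D)

# Shape check with dummy tokens: a 10-shot support set of 196-patch images.
query = torch.randn(196, 384)
support_imgs = torch.randn(10 * 196, 384)
support_lbls = torch.randn(10 * 196, 384)
print(token_matching(query, support_imgs, support_lbls).shape)  # torch.Size([196, 384])
```

Because the matching itself has no task-specific weights, the same mechanism can in principle serve any dense prediction task, which is what lets the method adapt from a handful of labeled examples.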

