CVPR 2023 – Improving Image Recognition by Retrieving from Web-Scale Image-Text Data


In this episode we discuss "Improving Image Recognition by Retrieving from Web-Scale Image-Text Data" by Ahmet Iscen, Alireza Fathi, and Cordelia Schmid. The paper proposes a new attention-based memory module for retrieval-augmented models. For each visual input, the model retrieves similar examples from an external memory set, and the memory module learns to down-weight irrelevant retrieved examples while retaining the useful ones. The study demonstrates the benefits of a massive-scale memory dataset of 1B image-text pairs and achieves state-of-the-art accuracies on three classification tasks. The paper also discusses the challenges of scaling large transformer models and suggests an alternative: store world knowledge in a massive-scale index (memory) and pair it with a small model for the given inference task.
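To make the retrieve-then-attend idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the paper's memory module learns its attention, whereas this toy version uses fixed cosine similarity followed by a softmax. The function names (`retrieve_top_k`, `attend`), the `temperature` parameter, and the tiny two-dimensional memory are all assumptions made for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)

def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def retrieve_top_k(query, memory_keys, k):
    """Return indices and cosine similarities of the k nearest memory keys."""
    q = normalize(query)
    sims = [dot(q, normalize(key)) for key in memory_keys]
    order = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    return order, [sims[i] for i in order]

def attend(query, memory_keys, memory_values, k=2, temperature=0.1):
    """Fuse the values of the k retrieved items, weighted by softmax attention.

    Items that are less similar to the query receive smaller weights,
    a hand-written stand-in for learned filtering of irrelevant retrievals.
    """
    idx, sims = retrieve_top_k(query, memory_keys, k)
    weights = softmax([s / temperature for s in sims])
    dim = len(memory_values[0])
    fused = [
        sum(w * memory_values[i][d] for w, i in zip(weights, idx))
        for d in range(dim)
    ]
    return idx, weights, fused

# Hypothetical toy memory: keys stand in for image embeddings,
# values stand in for the paired text (label) embeddings.
memory_keys = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
memory_values = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

idx, weights, fused = attend([1.0, 0.05], memory_keys, memory_values, k=2)
print(idx, weights, fused)
```

In this sketch the expensive knowledge lives in `memory_keys` and `memory_values`, which could grow without retraining the function that consumes them. That mirrors the blurb's point about pairing a large index with a small model, though a real system would use approximate nearest-neighbor search rather than a linear scan.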

