arxiv - AVIS: Autonomous Visual Information Seeking

In this episode we discuss AVIS: Autonomous Visual Information Seeking. The paper introduces AVIS, an autonomous visual question-answering framework in which a Large Language Model strategically calls external tools to answer visual questions that require external knowledge. The framework has three components, a planner, a reasoner, and a working memory, which together analyze tool outputs and extract the key information from them. Collected user behavior guides the system and improves its decision-making. AVIS achieves state-of-the-art results on knowledge-intensive visual question answering benchmarks.
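To make the planner, reasoner, and working memory split more concrete, here is a minimal Python sketch of how such a loop could be wired together. It is not the paper's implementation: the function names (`plan`, `reason`, `answer_question`), the toy tools, and the fixed selection rule are all assumptions made for illustration, and in AVIS itself a Large Language Model makes these decisions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class WorkingMemory:
    """Accumulates what the tools have returned so far."""
    facts: List[str] = field(default_factory=list)
    used_tools: List[str] = field(default_factory=list)


def plan(memory: WorkingMemory, tools: Dict[str, Callable[[str], str]]) -> Optional[str]:
    """Hypothetical planner: pick the first tool that has not been tried yet.
    This fixed rule is a stand-in for an LLM-driven choice."""
    for name in tools:
        if name not in memory.used_tools:
            return name
    return None


def reason(question: str, tool_output: str) -> Optional[str]:
    """Hypothetical reasoner: keep a tool output only if it is non-empty.
    A real system would ask an LLM to extract the key information."""
    return tool_output if tool_output else None


def answer_question(
    question: str,
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> WorkingMemory:
    """Alternate between planning and reasoning until no tools remain."""
    memory = WorkingMemory()
    for _ in range(max_steps):
        tool_name = plan(memory, tools)
        if tool_name is None:
            break
        memory.used_tools.append(tool_name)
        fact = reason(question, tools[tool_name](question))
        if fact is not None:
            memory.facts.append(fact)
    return memory


# Toy tools (assumed for illustration) that return canned strings.
toy_tools = {
    "object_detection": lambda q: "detected: a suspension bridge",
    "image_search": lambda q: "",
    "web_search": lambda q: "the bridge opened in 1937",
}
result = answer_question("When was this bridge opened?", toy_tools)
```

Here the uninformative `image_search` output is discarded by the reasoner, so only the two useful facts end up in working memory.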
