In this episode, we discuss Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions by Yu-Guan Hsieh, Cheng-Yu Hsieh, Shih-Ying Yeh, Louis Béthune, Hadi Pour Ansari, Pavan Kumar Anasosalu Vasu, Chun-Liang Li, Ranjay Krishna, Oncel Tuzel, Marco Cuturi. The paper introduces a new annotation strategy termed graph-based captioning (GBC) that uses labelled graph structures to describe images more richly than plain text. GBC combines object detection and dense captioning to create a hierarchical graph of nodes and edges detailing entities and their relationships. The authors demonstrate the effectiveness of GBC by creating a large dataset, GBC10M, which significantly improves performance in vision-language models and propose a novel attention mechanism to utilize the graph’s structure for further benefits.
arxiv preprint – Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
by
Tags: