arXiv Preprint - LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent

In this episode we discuss LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent by Jianing Yang, Xuweiyi Chen, Shengyi Qian, Nikhil Madaan, Madhavan Iyengar, David F. Fouhey, and Joyce Chai. The paper introduces LLM-Grounder, a new approach to 3D visual grounding: locating the object in a 3D scene that a natural language query describes. It uses a Large Language Model (LLM) to break down complex queries and a visual grounding tool to identify objects in the scene. The method does not require labeled training data and achieves state-of-the-art zero-shot grounding accuracy on the ScanRefer benchmark.
