arXiv paper – HD-EPIC: A Highly-Detailed Egocentric Video Dataset


In this episode, we discuss HD-EPIC: A Highly-Detailed Egocentric Video Dataset by Toby Perrett, Ahmad Darkhalil, Saptarshi Sinha, Omar Emara, Sam Pollard, Kranti Parida, Kaiting Liu, Prajwal Gatti, Siddhant Bansal, Kevin Flanagan, Jacob Chalk, Zhifan Zhu, Rhodri Guerrier, Fahd Abdelazim, Bin Zhu, Davide Moltisanti, Michael Wray, Hazel Doughty, Dima Damen.

The paper introduces HD-EPIC, a 41-hour dataset of egocentric kitchen videos collected from diverse home environments and meticulously annotated with detailed 3D-grounded labels, including recipe steps, actions, ingredients, and audio events. It features a challenging visual question answering benchmark with 26,000 questions, on which Gemini Pro achieves only 38.5% accuracy, underscoring the dataset’s complexity and the limitations of existing vision-language models. HD-EPIC also supports tasks such as action recognition and video-object segmentation, making it a valuable resource for improving the understanding of real-world kitchen scenarios.

