In this episode, we discuss “Long Story Short: a Summarize-then-Search Method for Long Video Question Answering” by Jiwan Chung and Youngjae Yu. The paper presents “Long Story Short,” a framework for video question answering (QA) that first summarizes a long multimodal narrative, such as a movie or drama, into a brief plot, then uses that summary to find the video segments relevant to a given question. The paper also introduces CLIPCheck, an enhancement for improved visual matching. The authors report that their zero-shot model significantly outperforms existing supervised models, demonstrating the effectiveness of zero-shot QA on long video content.
arXiv preprint – Long Story Short: a Summarize-then-Search Method for Long Video Question Answering