CVPR 2023 – Detecting and Grounding Multi-Modal Media Manipulation


In this episode we discuss Detecting and Grounding Multi-Modal Media Manipulation
by Rui Shao, Tianxing Wu, and Ziwei Liu. The paper introduces a new research problem, detecting and grounding multi-modal media manipulation, which requires deeper reasoning across different modalities. The authors propose a new dataset and a novel model, the HierArchical Multi-modal Manipulation rEasoning tRansformer (HAMMER), to fully capture the fine-grained interaction between modalities. Dedicated manipulation detection and grounding heads are integrated from shallow to deep levels based on the interacted multi-modal information. Through comprehensive experiments and rigorous evaluation metrics, the authors demonstrate the superiority of their model and surface observations that can guide future research on multi-modal media manipulation.
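To make the hierarchical-heads idea concrete, here is a rough NumPy sketch, not the authors' implementation: a binary manipulation-detection head reads a fused feature at a shallow level, while a per-token grounding head reads deeper token-level features. All dimensions, variable names, and the use of plain linear heads are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch (not the authors' code): detection and grounding
# heads attached at different depths of a multi-modal encoder.
rng = np.random.default_rng(0)

def linear(x, w, b):
    # A plain affine projection standing in for a classification head.
    return x @ w + b

d = 16        # assumed feature dimension
n_tokens = 8  # assumed number of text tokens to ground

# Shallow-level fused multi-modal feature -> binary detection (real vs. manipulated).
shallow_feat = rng.standard_normal(d)
w_det, b_det = rng.standard_normal((d, 2)), np.zeros(2)
detection_logits = linear(shallow_feat, w_det, b_det)    # shape (2,)

# Deep-level per-token features -> grounding of manipulated text tokens.
deep_feats = rng.standard_normal((n_tokens, d))
w_gnd, b_gnd = rng.standard_normal((d, 2)), np.zeros(2)
grounding_logits = linear(deep_feats, w_gnd, b_gnd)      # shape (8, 2)

print(detection_logits.shape, grounding_logits.shape)
```

The point of the sketch is only the structure: one coarse head for the image-text pair as a whole, and one fine-grained head that labels individual tokens, mirroring the detection-versus-grounding split the paper describes.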

