CVPR 2023 - PivoTAL: Prior-Driven Supervision for Weakly-Supervised Temporal Action Localization

In this episode, we discuss “PivoTAL: Prior-Driven Supervision for Weakly-Supervised Temporal Action Localization” by Mamshad Nayeem Rizve, Gaurav Mittal, Ye Yu, Matthew Hall, Sandra Sajeev, Mubarak Shah, and Mei Chen from Microsoft and the University of Central Florida. The paper addresses weakly-supervised temporal action localization: localizing actions in untrimmed videos with only video-level supervision. Existing methods classify individual frames and then post-process the per-frame predictions into action segments, which often leads to incomplete localization. PivoTAL instead learns to localize action snippets directly, leveraging spatio-temporal regularities in videos through three priors: an action-specific scene prior, an action snippet generation prior, and a learnable Gaussian prior. Evaluated on benchmark datasets, the method improves on existing approaches by at least 3% average mAP. The results highlight the effectiveness of prior-driven supervision for weakly-supervised temporal action localization.
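
To make the contrast concrete, here is a minimal sketch of the kind of post-processing the description attributes to existing methods: per-snippet class scores are thresholded and consecutive above-threshold snippets are merged into segments. This is not code from the paper; the function name `scores_to_segments`, the threshold of 0.5, and the example scores are all assumptions chosen for illustration.

```python
def scores_to_segments(scores, threshold=0.5):
    """Group consecutive snippets scoring >= threshold into (start, end) pairs.

    Indices are snippet positions; `end` is exclusive.
    Hypothetical helper for illustration, not the paper's method.
    """
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # a new segment begins here
        elif score < threshold and start is not None:
            segments.append((start, i))  # the segment ended at the previous snippet
            start = None
    if start is not None:
        segments.append((start, len(scores)))  # segment runs to the end of the video
    return segments


# Made-up scores for one action class; snippet 3 is a weak prediction
# in the middle of what may be a single continuous action.
scores = [0.1, 0.7, 0.8, 0.4, 0.9, 0.6, 0.2]
segments = scores_to_segments(scores)
print(segments)
```

With these made-up scores, a single low-scoring snippet splits what may be one action into two shorter segments, which is one way the incomplete localization mentioned above can arise.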
