
  • CVPR 2023 – Improving GAN Training via Feature Space Shrinkage

    In this episode we discuss Improving GAN Training via Feature Space Shrinkage by Haozhe Liu, Wentian Zhang, Bing Li, Haoqian Wu, Nanjun He, Yawen Huang, Yuexiang Li, Bernard Ghanem, Yefeng Zheng. The paper proposes a new method, called AdaptiveMix, for training Generative Adversarial Networks (GANs) from a robust image classification perspective. The proposed method shrinks…
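
    The mechanics behind this kind of feature-space shrinkage can be illustrated with a short, generic PyTorch sketch: mix pairs of training images and penalize how far the mixed images' features drift from the features of the originals. The encoder below is a placeholder, and the loss is only a rough stand-in for the paper's AdaptiveMix objective, not its exact formulation.

        import torch
        import torch.nn.functional as F

        def feature_shrinkage_loss(encoder, x, alpha=0.75):
            """Hedged sketch: mix pairs of images and pull the mixed samples'
            features toward a combination of the originals' features, shrinking
            the region they occupy in feature space."""
            perm = torch.randperm(x.size(0), device=x.device)   # random pairing
            lam = torch.distributions.Beta(alpha, alpha).sample().item()
            x_mix = lam * x + (1.0 - lam) * x[perm]             # mixed (harder) images

            f_clean = encoder(x)        # features of the original images
            f_mix = encoder(x_mix)      # features of the mixed images

            # Shrink: mixed features should stay close to the convex
            # combination of the original features.
            target = lam * f_clean + (1.0 - lam) * f_clean[perm]
            return F.mse_loss(f_mix, target.detach())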

  • CVPR 2023 – CRAFT: Concept Recursive Activation FacTorization for Explainability

    In this episode we discuss CRAFT: Concept Recursive Activation FacTorization for Explainability by Thomas Fel, Agustin Picard, Louis Bethune, Thibaut Boissin, David Vigouroux, Julien Colin, Rémi Cadène, Thomas Serre. The paper introduces a new approach called CRAFT to identify “what” and “where” a model looks at in an image. The approach generates concept-based explanations and…

  • CVPR 2023 – gSDF: Geometry-Driven Signed Distance Functions for 3D Hand-Object Reconstruction

    In this episode we discuss gSDF: Geometry-Driven Signed Distance Functions for 3D Hand-Object Reconstruction by Zerui Chen, Shizhe Chen, Cordelia Schmid, Ivan Laptev. The paper presents a method for reconstructing the 3D shapes of hands and manipulated objects from monocular RGB images, using signed distance functions (SDFs) as the shape representation. The authors exploit the hand structure…
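
    For readers unfamiliar with SDF-based reconstruction, the generic building block is a coordinate network queried at 3D points that returns signed distances whose zero level set defines the surface. The tiny MLP and the separate hand/object fields below are illustrative assumptions, not the architecture used in gSDF.

        import torch
        import torch.nn as nn

        class TinySDF(nn.Module):
            """Generic coordinate MLP: maps 3D points to a signed distance
            (negative inside the surface, positive outside)."""
            def __init__(self, hidden=256):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(3, hidden), nn.ReLU(),
                    nn.Linear(hidden, hidden), nn.ReLU(),
                    nn.Linear(hidden, 1),
                )

            def forward(self, points):              # points: (N, 3)
                return self.net(points).squeeze(-1)

        # Separate distance fields for the hand and the manipulated object;
        # each surface is recovered where its predicted distance crosses zero.
        hand_sdf, object_sdf = TinySDF(), TinySDF()
        query = torch.rand(1024, 3) * 2 - 1         # sample points in [-1, 1]^3
        d_hand, d_obj = hand_sdf(query), object_sdf(query)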

  • CVPR 2023 – The Resource Problem of Using Linear Layer Leakage Attack in Federated Learning

    In this episode we discuss The Resource Problem of Using Linear Layer Leakage Attack in Federated Learning by Joshua C. Zhao, Ahmed Roushdy Elkordy, Atul Sharma, Yahya H. Ezzeldin, Salman Avestimehr, Saurabh Bagchi. The paper discusses the usage of secure aggregation in federated learning, which promises to maintain privacy by only allowing the server access…

  • CVPR 2023 – Spatiotemporal Self-supervised Learning for Point Clouds in the Wild

    In this episode we discuss Spatiotemporal Self-supervised Learning for Point Clouds in the Wild by Yanhao Wu, Tong Zhang, Wei Ke, Sabine Süsstrunk, Mathieu Salzmann. The paper introduces a new self-supervised learning strategy for semantic segmentation of point clouds that leverages positive pairs in both the spatial and temporal domains. The authors design a point-to-cluster…

  • CVPR 2023 – Masked Image Modeling with Local Multi-Scale Reconstruction

    In this episode we discuss Masked Image Modeling with Local Multi-Scale Reconstruction by Haoqing Wang, Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhi-Hong Deng, Kai Han. The paper builds on Masked Image Modeling (MIM), a self-supervised representation learning approach that has achieved outstanding success but comes with a huge computational burden and a slow learning process. To address…
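
    As background for the episode, vanilla masked image modeling can be sketched in a few lines: hide a random subset of patch tokens and reconstruct them from the rest. The encoder and decoder below are placeholders, and the paper's local multi-scale reconstruction (adding targets at intermediate layers) is deliberately not reproduced here.

        import torch
        import torch.nn.functional as F

        def mim_step(encoder, decoder, patches, mask_ratio=0.75):
            """Generic masked-image-modeling sketch.
            patches: (B, N, D) flattened image patches."""
            B, N, D = patches.shape
            n_mask = int(N * mask_ratio)

            # Assign each patch a random rank; the lowest-ranked patches are masked.
            ranks = torch.rand(B, N, device=patches.device).argsort(1).argsort(1)
            mask = ranks < n_mask                              # (B, N) bool

            visible = patches.masked_fill(mask.unsqueeze(-1), 0.0)
            latent = encoder(visible)     # encode with masked patches zeroed out
            recon = decoder(latent)       # predict all patches, shape (B, N, D)

            # Reconstruction loss is computed only on the masked positions.
            return F.mse_loss(recon[mask], patches[mask])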

  • CVPR 2023 – CCuantuMM: Cycle-Consistent Quantum-Hybrid Matching of Multiple Shapes

    In this episode we discuss CCuantuMM: Cycle-Consistent Quantum-Hybrid Matching of Multiple Shapes by Harshil Bhatia, Edith Tretschk, Zorah Lähner, Marcel Seelbach Benkner, Michael Moeller, Christian Theobalt, Vladislav Golyanik. This paper proposes a quantum-hybrid approach for the challenging problem of jointly matching multiple, non-rigidly deformed 3D shapes, which is NP-hard. The approach is cycle-consistent and iterative,…

  • CVPR 2023 – Contrastive Mean Teacher for Domain Adaptive Object Detectors

    In this episode we discuss Contrastive Mean Teacher for Domain Adaptive Object Detectors by Shengcao Cao, Dhiraj Joshi, Liang-Yan Gui, Yu-Xiong Wang. The paper proposes a unified framework called Contrastive Mean Teacher (CMT) that integrates mean-teacher self-training and contrastive learning to overcome the domain gap in object detection. CMT extracts object-level features using low-quality pseudo-labels…
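
    The two ingredients named here, mean-teacher self-training and object-level contrastive learning, have simple generic forms, sketched below under the assumption of student/teacher models that expose matched object-level feature vectors; CMT's pseudo-label filtering and object matching are omitted.

        import torch
        import torch.nn.functional as F

        @torch.no_grad()
        def ema_update(teacher, student, momentum=0.999):
            """Mean-teacher update: teacher weights are an exponential
            moving average of the student weights."""
            for pt, ps in zip(teacher.parameters(), student.parameters()):
                pt.mul_(momentum).add_(ps, alpha=1.0 - momentum)

        def object_contrastive_loss(student_feats, teacher_feats, temperature=0.07):
            """InfoNCE over object-level features: the i-th student object should
            match the i-th teacher object and repel all the others."""
            s = F.normalize(student_feats, dim=1)
            t = F.normalize(teacher_feats, dim=1)
            logits = s @ t.t() / temperature
            targets = torch.arange(s.size(0), device=s.device)
            return F.cross_entropy(logits, targets)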

  • ICLR 2023 – F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models

    In this episode we discuss F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models by Weicheng Kuo, Yin Cui, Xiuye Gu, AJ Piergiovanni, Anelia Angelova. The paper presents F-VLM, an approach that builds open-vocabulary object detection on top of frozen vision and language models.

  • CVPR 2023 – ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding

    In this episode we discuss ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding by Le Xue, Mingfei Gao, Chen Xing, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese. The paper introduces ULIP, a framework that learns a unified representation of images, texts, and 3D…

  • CVPR 2023 – Masked Autoencoding Does Not Help Natural Language Supervision at Scale

    In this episode we discuss Masked Autoencoding Does Not Help Natural Language Supervision at Scale by Floris Weers, Vaishaal Shankar, Angelos Katharopoulos, Yinfei Yang, Tom Gunter. The paper explores the effectiveness of combining self-supervision and natural language supervision for training general purpose image encoders. While recent works have shown promising results with small pre-training datasets,…

  • CVPR 2023 – Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos

    In this episode we discuss Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos by Sixun Dong, Huazhang Hu, Dongze Lian, Weixin Luo, Yicheng Qian, Shenghua Gao. The paper proposes a weakly supervised approach for sequential video understanding, where time-stamp level text-video alignment is not provided. The proposed method uses a transformer to…

  • CVPR 2023 – Attribute-preserving Face Dataset Anonymization via Latent Code Optimization

    In this episode we discuss Attribute-preserving Face Dataset Anonymization via Latent Code Optimization by Simone Barattin, Christos Tzelepis, Ioannis Patras, Nicu Sebe. The paper presents a task-agnostic approach for anonymizing the identities of faces in a dataset of images while retaining the facial attributes necessary for downstream tasks. The proposed method optimizes the latent representation…

  • CVPR 2023 – ANetQA: A Large-scale Benchmark for Fine-grained Compositional Reasoning over Untrimmed Videos

    In this episode we discuss ANetQA: A Large-scale Benchmark for Fine-grained Compositional Reasoning over Untrimmed Videos by Zhou Yu, Lixiang Zheng, Zhou Zhao, Fei Wu, Jianping Fan, Kui Ren, Jun Yu. The paper discusses the challenge of building benchmarks for video question answering (VideoQA) models that can systematically analyze their capabilities. Existing benchmarks have limitations…

  • CVPR 2023 – Neuralizer: General Neuroimage Analysis without Re-Training

    In this episode we discuss Neuralizer: General Neuroimage Analysis without Re-Training by Steffen Czolbe, Adrian V. Dalca. The paper discusses the challenges in using deep learning for neuroimage processing tasks such as segmentation and registration. The authors introduce a new model called Neuralizer that can generalize to previously unseen tasks and modalities without the need…

  • CVPR 2023 – Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification

    In this episode we discuss Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification by Jiawei Feng, Ancong Wu, Wei-Shi Zheng. The paper proposes a new approach to address the challenging problem of visible-infrared person re-identification (VI-ReID) by learning diverse modality-shared semantic concepts. The proposed method aims to force the ReID model to extract more and different…

  • CVPR 2023 – STMixer: A One-Stage Sparse Action Detector

    In this episode we discuss STMixer: A One-Stage Sparse Action Detector by Tao Wu, Mengqi Cao, Ziteng Gao, Gangshan Wu, Limin Wang. The paper proposes a new one-stage sparse action detector called STMixer which is based on two core designs. The first design is a query-based adaptive feature sampling module that allows STMixer to mine…

  • CVPR 2023 – Balanced Spherical Grid for Egocentric View Synthesis

    In this episode we discuss Balanced Spherical Grid for Egocentric View Synthesis by Changwoon Choi, Sang Min Kim, Young Min Kim. The paper presents EgoNeRF, an efficient solution for reconstructing large-scale environments as virtual reality (VR) assets from a few seconds of 360° video. The authors adopt a spherical coordinate parameterization instead of Cartesian coordinate…
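
    The spherical parameterization mentioned here boils down to a coordinate change, shown in the sketch below; the balanced grid construction in EgoNeRF itself is more involved than this.

        import numpy as np

        def cartesian_to_spherical(xyz):
            """Map (x, y, z) points, with the camera at the origin, to
            (r, theta, phi): radius, polar angle, azimuth. A grid indexed by
            these values spreads resolution more evenly for egocentric
            360-degree captures than a Cartesian voxel grid."""
            x, y, z = xyz[..., 0], xyz[..., 1], xyz[..., 2]
            r = np.sqrt(x**2 + y**2 + z**2)
            theta = np.arccos(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))
            phi = np.arctan2(y, x)
            return np.stack([r, theta, phi], axis=-1)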

  • CVPR 2023 – Train-Once-for-All Personalization

    In this episode we discuss Train-Once-for-All Personalization by Hong-You Chen, Yandong Li, Yin Cui, Mingda Zhang, Wei-Lun Chao, Li Zhang. Hong-You Chen and Wei-Lun Chao are affiliated with The Ohio State University; Yandong Li, Yin Cui, Mingda Zhang, and Li Zhang are affiliated with Google…

  • CVPR 2023 – MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

    In this episode we discuss MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation by Ludan Ruan, Yiyang Ma, Huan Yang, Huiguo He, Bei Liu, Jianlong Fu, Nicholas Jing Yuan, Qin Jin, Baining Guo. The paper proposes a joint audio-video generation framework called Multi-Modal Diffusion (MM-Diffusion) that generates high-quality realistic videos with aligned…

  • CVPR 2023 – Robust Test-Time Adaptation in Dynamic Scenarios

    In this episode we discuss Robust Test-Time Adaptation in Dynamic Scenarios by Longhui Yuan, Binhui Xie, Shuang Li. The paper discusses the limitations of test-time adaptation (TTA) methods in dynamic scenarios where the test data is sampled gradually over time, and proposes a new method called Robust Test-Time Adaptation (RoTTA) to address these limitations. RoTTA…
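
    For context, the simplest form of test-time adaptation (TENT-style entropy minimization restricted to normalization-layer parameters) looks like the sketch below; RoTTA's robustness mechanisms for gradually shifting test streams are not represented here.

        import torch
        import torch.nn as nn

        def collect_norm_params(model):
            """Restrict adaptation to the affine parameters of normalization layers."""
            params = []
            for m in model.modules():
                if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.LayerNorm)):
                    params += [p for p in (m.weight, m.bias) if p is not None]
            return params

        def entropy_minimization_step(model, optimizer, x):
            """One generic adaptation step on an unlabeled test batch x:
            minimize the entropy of the model's predictions."""
            logits = model(x)
            probs = logits.softmax(dim=1)
            entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
            optimizer.zero_grad()
            entropy.backward()
            optimizer.step()
            return entropy.item()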

  • CVPR 2023 – Multi-view Inverse Rendering for Large-scale Real-world Indoor Scenes

    In this episode we discuss Multi-view Inverse Rendering for Large-scale Real-world Indoor Scenes by Zhen Li, Lingli Wang, Mofang Cheng, Cihui Pan, Jiaqi Yang. The paper proposes an efficient method for inverse rendering of large-scale real-world indoor scenes, which reconstructs global illumination and physically-plausible SVBRDFs. They introduce a new compact representation called Texture-based Lighting (TBL),…

  • CVPR 2023 – Jedi: Entropy-based Localization and Removal of Adversarial Patches

    In this episode we discuss Jedi: Entropy-based Localization and Removal of Adversarial Patches by Bilel Tarchoun, Anouar Ben Khalifa, Mohamed Ali Mahjoub, Nael Abu-Ghazaleh, Ihsen Alouani. The paper proposes a new defense against adversarial patches that is resilient to realistic patch attacks, called Jedi. Jedi tackles the patch localization problem from an information theory perspective…
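
    A toy version of entropy-based patch localization is sketched below: compute the Shannon entropy of pixel intensities in sliding windows and flag unusually high-entropy regions. Jedi's actual localization and patch-removal pipeline is more sophisticated, and the window size, stride, and threshold here are arbitrary assumptions.

        import numpy as np

        def local_entropy_map(gray, win=16, stride=8, bins=32):
            """gray: 2-D array of intensities in [0, 1]. Returns a coarse map of
            Shannon entropy per window; adversarial patches tend to produce
            unusually high local entropy."""
            H, W = gray.shape
            rows = range(0, H - win + 1, stride)
            cols = range(0, W - win + 1, stride)
            ent = np.zeros((len(rows), len(cols)))
            for i, r in enumerate(rows):
                for j, c in enumerate(cols):
                    patch = gray[r:r + win, c:c + win]
                    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
                    p = hist / max(hist.sum(), 1)
                    nz = p[p > 0]
                    ent[i, j] = -(nz * np.log2(nz)).sum()
            return ent

        # Windows whose entropy exceeds, say, mean + 2 * std of the map are
        # candidate patch locations for removal.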

  • CVPR 2023 – Improving Generalization with Domain Convex Game

    In this episode we discuss Improving Generalization with Domain Convex Game by Fangrui Lv, Jian Liang, Shuang Li, Jinming Zhang, Di Liu. The paper explores the effectiveness of domain augmentation in domain generalization. The authors propose a new perspective on DG as a convex game between domains and design a regularization term based on supermodularity…

  • CVPR 2023 – Masked Motion Encoding for Self-Supervised Video Representation Learning

    In this episode we discuss Masked Motion Encoding for Self-Supervised Video Representation Learning by Xinyu Sun, Peihao Chen, Liangwei Chen, Changhao Li, Thomas H. Li, Mingkui Tan, Chuang Gan. The paper proposes a new pre-training paradigm called Masked Motion Encoding (MME) for learning discriminative video representation from unlabeled videos. The authors address the limitations of…