-
CVPR 2023 – FJMP: Factorized Joint Multi-Agent Motion Prediction over Learned Directed Acyclic Interaction Graphs
In this episode we discuss FJMP: Factorized Joint Multi-Agent Motion Prediction over Learned Directed Acyclic Interaction Graphs by Luke Rowe, Martin Ethier, Eli-Henry Dykhne, Krzysztof Czarnecki. The paper proposes a framework called FJMP for generating a set of joint future trajectory predictions in multi-agent driving scenarios. FJMP models the future scene interaction dynamics using a…
-
CVPR 2023 – Unsupervised Continual Semantic Adaptation through Neural Rendering
In this episode we discuss Unsupervised Continual Semantic Adaptation through Neural Rendering by Zhizheng Liu, Francesco Milano, Jonas Frey, Roland Siegwart, Hermann Blum, Cesar Cadena. The paper proposes a method for continual multi-scene adaptation for semantic segmentation tasks, in which no ground-truth labels are available during deployment and performance on previous scenes must be maintained.…
-
CVPR 2023 – Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars
In this episode we discuss Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars by Jingxiang Sun, Xuan Wang, Lizhen Wang, Xiaoyu Li, Yong Zhang, Hongwen Zhang, Yebin Liu. The paper proposes a novel 3D GAN framework for unsupervised learning of generative, high-quality, and 3D-consistent facial avatars from unstructured 2D images. The proposed framework combines…
-
CVPR 2023 – Catch Missing Details: Image Reconstruction with Frequency Augmented Variational Autoencoder
In this episode we discuss Catch Missing Details: Image Reconstruction with Frequency Augmented Variational Autoencoder by Xinmiao Lin, Yikang Li, Jenhao Hsiao, Chiuman Ho, Yu Kong. The paper proposes a new architecture called Frequency Augmented VAE (FA-VAE) to address the rapid degradation of reconstruction quality in popular VQ-VAE models as the compression rate increases.…
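As a rough illustration of the frequency-domain idea behind FA-VAE (the paper's actual modules are not detailed in this summary), here is a minimal sketch of a generic FFT-magnitude reconstruction loss; the function name and loss form are assumptions, not the paper's method:

```python
import numpy as np

def frequency_loss(recon, target):
    """L1 distance between the FFT magnitude spectra of two images.

    A generic frequency-domain loss: it penalizes missing high-frequency
    detail that plain pixel losses tend to under-weight. FA-VAE's actual
    frequency-augmented modules differ; this only sketches the motivation.
    """
    f_recon = np.fft.fft2(recon)
    f_target = np.fft.fft2(target)
    return np.mean(np.abs(np.abs(f_recon) - np.abs(f_target)))

# Blurring an image changes its spectrum, so the loss becomes positive.
rng = np.random.default_rng(0)
img = rng.random((32, 32))
blurred = (img + np.roll(img, 1, axis=0) + np.roll(img, 1, axis=1)) / 3
print(frequency_loss(img, img))       # 0.0 for a perfect reconstruction
print(frequency_loss(blurred, img) > 0)  # True
```

A pixel-space L1 loss would also be positive here, but the frequency view concentrates the penalty on the lost detail bands rather than spreading it uniformly over pixels.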
-
CVPR 2023 – Better “CMOS” Produces Clearer Images: Learning Space-Variant Blur Estimation for Blind Image Super-Resolution
In this episode we discuss Better “CMOS” Produces Clearer Images: Learning Space-Variant Blur Estimation for Blind Image Super-Resolution by Xuhai Chen, Jiangning Zhang, Chao Xu, Yabiao Wang, Chengjie Wang, Yong Liu. The paper addresses space-variant blur in blind image super-resolution, which severely degrades the performance of existing methods. To tackle this issue, the authors introduce two new datasets and design a Cross-MOdal…
-
CVPR 2023 – Detecting and Grounding Multi-Modal Media Manipulation
In this episode we discuss Detecting and Grounding Multi-Modal Media Manipulation by Rui Shao, Tianxing Wu, Ziwei Liu. This paper discusses a new research problem for detecting and grounding multi-modal media manipulation, which requires deeper reasoning across different modalities. The authors propose a new dataset and a novel model called HierArchical Multi-modal Manipulation rEasoning tRansformer…
-
CVPR 2023 – Improving GAN Training via Feature Space Shrinkage
In this episode we discuss Improving GAN Training via Feature Space Shrinkage by Haozhe Liu, Wentian Zhang, Bing Li, Haoqian Wu, Nanjun He, Yawen Huang, Yuexiang Li, Bernard Ghanem, Yefeng Zheng. The paper proposes a new method, called AdaptiveMix, for training Generative Adversarial Networks (GANs) from a robust image classification perspective. The proposed method shrinks…
-
CVPR 2023 – CRAFT: Concept Recursive Activation FacTorization for Explainability
In this episode we discuss CRAFT: Concept Recursive Activation FacTorization for Explainability by Thomas Fel, Agustin Picard, Louis Bethune, Thibaut Boissin, David Vigouroux, Julien Colin, Rémi Cadène, Thomas Serre. The paper introduces a new approach called CRAFT to identify “what” and “where” a model looks at in an image. The approach generates concept-based explanations and…
-
CVPR 2023 – gSDF: Geometry-Driven Signed Distance Functions for 3D Hand-Object Reconstruction
In this episode we discuss gSDF: Geometry-Driven Signed Distance Functions for 3D Hand-Object Reconstruction by Zerui Chen, Shizhe Chen, Cordelia Schmid, Ivan Laptev. The paper presents a method for reconstructing 3D shapes of hands and manipulated objects from monocular RGB images using signed distance functions (SDFs) as a framework. The authors exploit the hand structure…
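The summary says gSDF represents hands and objects with signed distance functions. As background for listeners, here is a minimal sketch of what an SDF is, using an analytic sphere rather than the learned, geometry-driven networks the paper actually proposes:

```python
import numpy as np

def sphere_sdf(points, center, radius):
    """Signed distance from query points to a sphere:
    negative inside the surface, zero on it, positive outside.
    gSDF learns such functions with neural networks conditioned on
    hand structure; the analytic sphere here only shows the convention."""
    return np.linalg.norm(points - center, axis=-1) - radius

center = np.zeros(3)
pts = np.array([[0.0, 0.0, 0.0],   # at the center  -> -radius
                [1.0, 0.0, 0.0],   # on the surface ->  0
                [2.0, 0.0, 0.0]])  # outside        -> +1
print(sphere_sdf(pts, center, 1.0))  # [-1.  0.  1.]
```

The reconstructed surface is the zero level set of this function, which is why SDFs are a convenient shared representation for both the hand and the manipulated object.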
-
CVPR 2023 – The Resource Problem of Using Linear Layer Leakage Attack in Federated Learning
In this episode we discuss The Resource Problem of Using Linear Layer Leakage Attack in Federated Learning by Joshua C. Zhao, Ahmed Roushdy Elkordy, Atul Sharma, Yahya H. Ezzeldin, Salman Avestimehr, Saurabh Bagchi. The paper discusses the usage of secure aggregation in federated learning, which promises to maintain privacy by only allowing the server access…
-
CVPR 2023 – Spatiotemporal Self-supervised Learning for Point Clouds in the Wild
In this episode we discuss Spatiotemporal Self-supervised Learning for Point Clouds in the Wild by Yanhao Wu, Tong Zhang, Wei Ke, Sabine Süsstrunk, Mathieu Salzmann. The paper discusses a new self-supervised learning strategy for semantic segmentation of point clouds that leverages positive pairs in both the spatial and temporal domain. The authors designed a point-to-cluster…
-
CVPR 2023 – Masked Image Modeling with Local Multi-Scale Reconstruction
In this episode we discuss Masked Image Modeling with Local Multi-Scale Reconstruction by Haoqing Wang, Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhi-Hong Deng, Kai Han. Masked Image Modeling (MIM) is a self-supervised representation learning approach that has achieved outstanding success, but it comes with a huge computational burden and a slow learning process. To address…
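For context, the masking step shared by MIM methods can be sketched as below; the patch size, ratio, and function name are illustrative assumptions, and the paper's contribution (local multi-scale reconstruction targets) happens on the decoding side, which is not shown:

```python
import numpy as np

def random_mask_patches(image, patch=4, mask_ratio=0.75, seed=0):
    """Split a 2-D image into non-overlapping patches and zero out a
    random subset. The model is then trained to reconstruct the
    masked regions from the visible ones."""
    h, w = image.shape
    gh, gw = h // patch, w // patch
    n = gh * gw
    rng = np.random.default_rng(seed)
    masked_idx = rng.choice(n, size=int(n * mask_ratio), replace=False)
    out = image.copy()
    for idx in masked_idx:
        r, c = divmod(int(idx), gw)
        out[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0
    return out, masked_idx

img = np.ones((16, 16))
masked, idx = random_mask_patches(img)
print(len(idx))      # 12 of the 16 patches are masked
print(masked.sum())  # 4 visible patches * 16 pixels = 64.0
```

Reconstructing every masked patch at full resolution with a deep decoder is what makes standard MIM expensive, which is the cost the paper's local multi-scale targets aim to reduce.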
-
CVPR 2023 – CCuantuMM: Cycle-Consistent Quantum-Hybrid Matching of Multiple Shapes
In this episode we discuss CCuantuMM: Cycle-Consistent Quantum-Hybrid Matching of Multiple Shapes by Harshil Bhatia, Edith Tretschk, Zorah Lähner, Marcel Seelbach Benkner, Michael Moeller, Christian Theobalt, Vladislav Golyanik. This paper proposes a quantum-hybrid approach for the challenging problem of jointly matching multiple, non-rigidly deformed 3D shapes, which is NP-hard. The approach is cycle-consistent and iterative,…
-
CVPR 2023 – Contrastive Mean Teacher for Domain Adaptive Object Detectors
In this episode we discuss Contrastive Mean Teacher for Domain Adaptive Object Detectors by Shengcao Cao, Dhiraj Joshi, Liang-Yan Gui, Yu-Xiong Wang. The paper proposes a unified framework called Contrastive Mean Teacher (CMT) that integrates mean-teacher self-training and contrastive learning to overcome the domain gap in object detection. CMT extracts object-level features using low-quality pseudo-labels…
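The mean-teacher half of CMT uses the standard exponential-moving-average (EMA) teacher update; a minimal sketch of that generic component (plain floats stand in for network weights, and the contrastive objective CMT adds on top is not shown):

```python
def ema_update(teacher_params, student_params, momentum=0.999):
    """Mean-teacher update: the teacher is an exponential moving
    average of the student, so it evolves slowly and yields more
    stable pseudo-labels on the target domain."""
    return [momentum * t + (1 - momentum) * s
            for t, s in zip(teacher_params, student_params)]

teacher = [0.0, 0.0]
student = [1.0, 2.0]
teacher = ema_update(teacher, student)
print(teacher)  # roughly [0.001, 0.002] after one step
```

With momentum near 1, the teacher lags the student by many steps, which is what makes its pseudo-labels smoother than the student's own predictions.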
-
ICLR 2023 – F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models
In this episode we discuss F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models by Weicheng Kuo, Yin Cui, Xiuye Gu, AJ Piergiovanni, Anelia Angelova. The paper presents F-VLM, a simple open-vocabulary object detection method built upon frozen vision and language models. By training only the detector head on top of frozen backbones, F-VLM eliminates the need for knowledge distillation or detection-tailored pretraining while retaining the open-vocabulary knowledge of the pretrained model.
-
CVPR 2023 – ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding
In this episode we discuss ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding by Le Xue, Mingfei Gao, Chen Xing, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese. The paper introduces ULIP, a framework that learns a unified representation of images, texts, and 3D…
-
CVPR 2023 – Masked Autoencoding Does Not Help Natural Language Supervision at Scale
In this episode we discuss Masked Autoencoding Does Not Help Natural Language Supervision at Scale by Floris Weers, Vaishaal Shankar, Angelos Katharopoulos, Yinfei Yang, Tom Gunter. The paper explores the effectiveness of combining self-supervision and natural language supervision for training general purpose image encoders. While recent works have shown promising results with small pre-training datasets,…
-
CVPR 2023 – Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos
In this episode we discuss Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos by Sixun Dong, Huazhang Hu, Dongze Lian, Weixin Luo, Yicheng Qian, Shenghua Gao. The paper proposes a weakly supervised approach for sequential video understanding, where time-stamp level text-video alignment is not provided. The proposed method uses a transformer to…
-
CVPR 2023 – Attribute-preserving Face Dataset Anonymization via Latent Code Optimization
In this episode we discuss Attribute-preserving Face Dataset Anonymization via Latent Code Optimization by Simone Barattin, Christos Tzelepis, Ioannis Patras, Nicu Sebe. The paper presents a task-agnostic approach for anonymizing the identities of faces in a dataset of images while retaining the facial attributes necessary for downstream tasks. The proposed method optimizes the latent representation…
-
CVPR 2023 – ANetQA: A Large-scale Benchmark for Fine-grained Compositional Reasoning over Untrimmed Videos
In this episode we discuss ANetQA: A Large-scale Benchmark for Fine-grained Compositional Reasoning over Untrimmed Videos by Zhou Yu, Lixiang Zheng, Zhou Zhao, Fei Wu, Jianping Fan, Kui Ren, Jun Yu. The paper discusses the challenge of building benchmarks for video question answering (VideoQA) models that can systematically analyze their capabilities. Existing benchmarks have limitations…
-
CVPR 2023 – Neuralizer: General Neuroimage Analysis without Re-Training
In this episode we discuss Neuralizer: General Neuroimage Analysis without Re-Training by Steffen Czolbe, Adrian V. Dalca. The paper discusses the challenges in using deep learning for neuroimage processing tasks such as segmentation and registration. The authors introduce a new model called Neuralizer that can generalize to previously unseen tasks and modalities without the need…
-
CVPR 2023 – Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification
In this episode we discuss Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification by Jiawei Feng, Ancong Wu, Wei-Shi Zheng. The paper proposes a new approach to address the challenging problem of visible-infrared person re-identification (VI-ReID) by learning diverse modality-shared semantic concepts. The proposed method aims to force the ReID model to extract more and different…
-
CVPR 2023 – STMixer: A One-Stage Sparse Action Detector
In this episode we discuss STMixer: A One-Stage Sparse Action Detector by Tao Wu, Mengqi Cao, Ziteng Gao, Gangshan Wu, Limin Wang. The paper proposes a new one-stage sparse action detector called STMixer which is based on two core designs. The first design is a query-based adaptive feature sampling module that allows STMixer to mine…
-
CVPR 2023 – Balanced Spherical Grid for Egocentric View Synthesis
In this episode we discuss Balanced Spherical Grid for Egocentric View Synthesis by Changwoon Choi, Sang Min Kim, Young Min Kim. The paper presents EgoNeRF, an efficient solution for reconstructing large-scale environments from a few seconds of 360 videos for virtual reality (VR) assets. The authors adopted a spherical coordinate parameterization instead of Cartesian coordinate…
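The summary mentions that EgoNeRF swaps Cartesian for spherical coordinates; the basic change of parameterization looks like this (the actual balanced-grid design is the paper's contribution and is not reproduced here):

```python
import numpy as np

def cartesian_to_spherical(xyz):
    """Map Cartesian points to (r, theta, phi). For a camera near the
    center of a large scene, sampling uniformly in these coordinates
    concentrates grid resolution where egocentric 360 video sees it."""
    x, y, z = xyz[..., 0], xyz[..., 1], xyz[..., 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    theta = np.arccos(np.clip(z / np.maximum(r, 1e-9), -1.0, 1.0))  # polar
    phi = np.arctan2(y, x)                                          # azimuth
    return np.stack([r, theta, phi], axis=-1)

pt = np.array([0.0, 0.0, 2.0])     # a point straight "up"
print(cartesian_to_spherical(pt))  # [2.0, 0.0, 0.0]
```

A uniform Cartesian grid wastes cells on far-away corners of a large environment, while a spherical grid centered on the camera allocates angular resolution evenly in every viewing direction.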
-
CVPR 2023 – Train-Once-for-All Personalization
In this episode we discuss Train-Once-for-All Personalization by Hong-You Chen, Yandong Li, Yin Cui, Mingda Zhang, Wei-Lun Chao, and Li Zhang. Hong-You Chen and Wei-Lun Chao are affiliated with The Ohio State University; Yandong Li, Yin Cui, Mingda Zhang, and Li Zhang are affiliated with Google…