-
CVPR 2023 – MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
In this episode we discuss MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation by Ludan Ruan, Yiyang Ma, Huan Yang, Huiguo He, Bei Liu, Jianlong Fu, Nicholas Jing Yuan, Qin Jin, Baining Guo. The paper proposes a joint audio-video generation framework called Multi-Modal Diffusion (MM-Diffusion) that generates high-quality realistic videos with aligned…
-
CVPR 2023 – Robust Test-Time Adaptation in Dynamic Scenarios
In this episode we discuss Robust Test-Time Adaptation in Dynamic Scenarios by Longhui Yuan, Binhui Xie, Shuang Li. The paper discusses the limitations of test-time adaptation (TTA) methods in dynamic scenarios where the test data is sampled gradually over time, and proposes a new method called Robust Test-Time Adaptation (RoTTA) to address these limitations. RoTTA…
-
CVPR 2023 – Multi-view Inverse Rendering for Large-scale Real-world Indoor Scenes
In this episode we discuss Multi-view Inverse Rendering for Large-scale Real-world Indoor Scenes by Zhen Li, Lingli Wang, Mofang Cheng, Cihui Pan, Jiaqi Yang. The paper proposes an efficient method for inverse rendering of large-scale real-world indoor scenes, which reconstructs global illumination and physically-plausible SVBRDFs. The authors introduce a new compact representation called Texture-based Lighting (TBL),…
-
CVPR 2023 – Jedi: Entropy-based Localization and Removal of Adversarial Patches
In this episode we discuss Jedi: Entropy-based Localization and Removal of Adversarial Patches by Bilel Tarchoun, Anouar Ben Khalifa, Mohamed Ali Mahjoub, Nael Abu-Ghazaleh, Ihsen Alouani. The paper proposes a new defense against adversarial patches that is resilient to realistic patch attacks, called Jedi. Jedi tackles the patch localization problem from an information theory perspective…
-
CVPR 2023 – Improving Generalization with Domain Convex Game
In this episode we discuss Improving Generalization with Domain Convex Game by Fangrui Lv, Jian Liang, Shuang Li, Jinming Zhang, Di Liu. The paper explores the effectiveness of domain augmentation in domain generalization (DG). The authors propose a new perspective on DG as a convex game between domains and design a regularization term based on supermodularity…
-
CVPR 2023 – Masked Motion Encoding for Self-Supervised Video Representation Learning
In this episode we discuss Masked Motion Encoding for Self-Supervised Video Representation Learning by Xinyu Sun, Peihao Chen, Liangwei Chen, Changhao Li, Thomas H. Li, Mingkui Tan, Chuang Gan. The paper proposes a new pre-training paradigm called Masked Motion Encoding (MME) for learning discriminative video representation from unlabeled videos. The authors address the limitations of…
-
CVPR 2023 – Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis
In this episode we discuss Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis by Thuan Hoang Nguyen, Thanh Van Le, Anh Tran. The paper proposes a new generative model called Column-Row Entangled Pixel Synthesis (CREPS) that can efficiently and scalably synthesize photo-realistic images at arbitrary resolution. Existing GAN-based solutions suffer from inconsistency and texture…
-
CVPR 2023 – IMP: Iterative Matching and Pose Estimation with Adaptive Pooling
In this episode we discuss IMP: Iterative Matching and Pose Estimation with Adaptive Pooling by Fei Xue, Ignas Budvytis, Roberto Cipolla. The paper proposes an iterative matching and pose estimation framework (IMP) that leverages the geometric connections between the two tasks. The authors introduce a geometry-aware recurrent attention-based module which jointly outputs sparse matches and camera…
-
CVPR 2023 – Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection
In this episode we discuss Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection by Long Li, Junwei Han, Ni Zhang, Nian Liu, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan. The paper proposes a Discriminative co-saliency and background Mining Transformer (DMT) framework for co-salient object detection. The framework includes several economical…
-
CVPR 2023 – ABLE-NeRF: Attention-Based Rendering with Learnable Embeddings for Neural Radiance Field
In this episode we discuss ABLE-NeRF: Attention-Based Rendering with Learnable Embeddings for Neural Radiance Field by Zhe Jun Tang, Tat-Jen Cham, Haiyu Zhao. The paper presents an alternative approach to the Neural Radiance Field (NeRF) method for representing 3D scenes that addresses NeRF's difficulty with view-dependent effects, which shows up as murky glossy and translucent surfaces. The proposed method, called…
-
CVPR 2023 – A Dynamic Multi-Scale Voxel Flow Network for Video Prediction
In this episode we discuss A Dynamic Multi-Scale Voxel Flow Network for Video Prediction by Xiaotao Hu, Zhewei Huang, Ailin Huang, Jun Xu, Shuchang Zhou. The paper proposes a Dynamic Multi-scale Voxel Flow Network (DMVFN) for video prediction using only RGB images. The proposed network is efficient and achieves better performance than previous methods that…
-
CVPR 2023 – Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel
In this episode we discuss Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel by Xudong Lin, Simran Tiwari, Shiyuan Huang, Manling Li, Mike Zheng Shou, Heng Ji, Shih-Fu Chang. The paper discusses the challenge of multi-channel video-language retrieval, which requires models to understand information from different sources such as video and text. The authors…
-
CVPR 2023 – Interventional Bag Multi-Instance Learning On Whole-Slide Pathological Images
In this episode we discuss Interventional Bag Multi-Instance Learning On Whole-Slide Pathological Images by Tiancheng Lin, Zhimiao Yu, Hongyu Hu, Yi Xu, Chang Wen Chen. The paper proposes a new scheme called Interventional Bag Multi-Instance Learning (IBMIL) to improve the classification of whole slide pathological images. Existing methods focus on improving feature extraction and aggregation…
-
CVPR 2023 – Devil is in the Queries: Advancing Mask Transformers for Real-world Medical
In this episode we discuss Devil is in the Queries: Advancing Mask Transformers for Real-world Medical by Mingze Yuan, Yingda Xia, Hexin Dong, Zifan Chen, Jiawen Yao, Mingyan Qiu, Ke Yan, Xiaoli Yin, Yu Shi, Xin Chen, Zaiyi Liu, Bin Dong, Jingren Zhou, Le Lu, Ling Zhang, Li Zhang. The paper proposes a method for…
-
CVPR 2023 – Inverting the Imaging Process by Learning an Implicit Camera Model
In this episode we discuss Inverting the Imaging Process by Learning an Implicit Camera Model by Xin Huang, Qi Zhang, Ying Feng, Hongdong Li, Qing Wang. The paper introduces a new approach for modeling the physical imaging process of a camera as an implicit neural network, which is able to learn and control camera parameters.…
-
CVPR 2023 – Label-Free Liver Tumor Segmentation
In this episode we discuss Label-Free Liver Tumor Segmentation by Qixin Hu, Yixiong Chen, Junfei Xiao, Shuwen Sun, Jieneng Chen, Alan Yuille, Zongwei Zhou. The paper discusses the use of synthetic tumors in CT scans to train AI models to accurately segment liver tumors without the need for manual annotation. These synthetic tumors are realistic…
-
CVPR 2023 – Regularized Vector Quantization for Tokenized Image Synthesis
In this episode we discuss Regularized Vector Quantization for Tokenized Image Synthesis by Jiahui Zhang, Fangneng Zhan, Christian Theobalt, Shijian Lu. The paper proposes a regularized vector quantization framework for quantizing images into discrete representations, which has been a fundamental problem in generative modeling. Existing approaches either learn the discrete representation deterministically or stochastically, but…
-
CVPR 2023 – Reliability in Semantic Segmentation: Are We on the Right Track?
In this episode we discuss Reliability in Semantic Segmentation: Are We on the Right Track? by Pau de Jorge, Riccardo Volpi, Philip Torr, Gregory Rogez. The paper discusses a study on the reliability of modern semantic segmentation models in terms of robustness and uncertainty estimation. The authors analyze a variety of models and compare their…
-
CVPR 2023 – ViPLO: Vision Transformer based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection
In this episode we discuss ViPLO: Vision Transformer based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection by Jeeseung Park, Jin-Woo Park, Jong-Seok Lee. The paper proposes a new method for improving the performance of human-object interaction (HOI) detectors, which are used in scene understanding. The proposed method, called Vision Transformer based Pose-Conditioned Self-Loop Graph (ViPLO),…
-
CVPR 2023 – FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding
In this episode we discuss FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding by Thanh-Dat Truong, Ngan Le, Bhiksha Raj, Jackson Cothren, Khoa Luu. The paper proposes a new approach called Fairness Domain Adaptation (FREDOM) for semantic scene segmentation that addresses fairness concerns in domain adaptation. The proposed adaptation framework is based on the…
-
CVPR 2023 – Self-Supervised Learning for Multimodal Non-Rigid 3D Shape Matching
In this episode we discuss Self-Supervised Learning for Multimodal Non-Rigid 3D Shape Matching by Dongliang Cao, Florian Bernard. The paper proposes a self-supervised multimodal learning strategy to bridge the gap between mesh-based and point cloud-based shape matching methods. Meshes provide rich topological information but require curation, while point clouds are commonly used for real-world data…
-
CVPR 2023 – Post-Processing Temporal Action Detection
In this episode we discuss Post-Processing Temporal Action Detection by Sauradip Nag, Xiatian Zhu, Yi-Zhe Song, Tao Xiang. The paper proposes a novel model-agnostic post-processing method, called Gaussian Approximated Post-processing (GAP), to improve the performance of Temporal Action Detection (TAD) methods without requiring model redesign and retraining. Existing TAD methods usually have a pre-processing…
-
CVPR 2023 – Temporal Interpolation Is All You Need for Dynamic Neural Radiance Fields
In this episode we discuss Temporal Interpolation Is All You Need for Dynamic Neural Radiance Fields by Sungheon Park, Minjung Son, Seokhwan Jang, Young Chun Ahn, Ji-Yeon Kim, Nahyup Kang. The paper presents a novel technique for training spatiotemporal neural radiance fields for dynamic scenes based on temporal interpolation of feature vectors. The proposed method…
-
CVPR 2023 – DATE: Domain Adaptive Product Seeker for E-commerce
In this episode we discuss DATE: Domain Adaptive Product Seeker for E-commerce by Haoyuan Li, Hao Jiang, Tao Jin, Mengyan Li, Yan Chen, Zhijie Lin, Yang Zhao, Zhou Zhao. The paper presents a framework for Product Retrieval (PR) and Grounding (PG) that can seek image and object-level products respectively according to a textual query to…
-
CVPR 2023 – Visibility Aware Human-Object Interaction Tracking from Single RGB Camera
In this episode we discuss Visibility Aware Human-Object Interaction Tracking from Single RGB Camera by Xianghui Xie, Bharat Lal Bhatnagar, Gerard Pons-Moll. The paper proposes a method to track the 3D human and object, their contacts, and their relative translation across frames from a single RGB camera while being robust to heavy occlusions. The authors…