AI Breakdown

CVPR 2023 – MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

In this episode we discuss MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation by Ludan Ruan, Yiyang Ma, Huan Yang, Huiguo He, Bei Liu, Jianlong Fu, Nicholas Jing Yuan, Qin Jin, Baining Guo. The paper proposes a joint audio-video generation framework called Multi-Modal Diffusion (MM-Diffusion) that generates high-quality realistic videos with aligned…

May 18, 2023
CVPR 2023 – Robust Test-Time Adaptation in Dynamic Scenarios

In this episode we discuss Robust Test-Time Adaptation in Dynamic Scenarios by Longhui Yuan, Binhui Xie, Shuang Li. The paper discusses the limitations of test-time adaptation (TTA) methods in dynamic scenarios where the test data is sampled gradually over time, and proposes a new method called Robust Test-Time Adaptation (RoTTA) to address these limitations. RoTTA…

May 18, 2023
CVPR 2023 – Multi-view Inverse Rendering for Large-scale Real-world Indoor Scenes

In this episode we discuss Multi-view Inverse Rendering for Large-scale Real-world Indoor Scenes by Zhen Li, Lingli Wang, Mofang Cheng, Cihui Pan, Jiaqi Yang. The paper proposes an efficient method for inverse rendering of large-scale real-world indoor scenes, which reconstructs global illumination and physically-plausible SVBRDFs. They introduce a new compact representation called Texture-based Lighting (TBL),…

May 18, 2023
CVPR 2023 – Jedi: Entropy-based Localization and Removal of Adversarial Patches

In this episode we discuss Jedi: Entropy-based Localization and Removal of Adversarial Patches by Bilel Tarchoun, Anouar Ben Khalifa, Mohamed Ali Mahjoub, Nael Abu-Ghazaleh, Ihsen Alouani. The paper proposes a new defense against adversarial patches that is resilient to realistic patch attacks, called Jedi. Jedi tackles the patch localization problem from an information theory perspective…

May 18, 2023
CVPR 2023 – Improving Generalization with Domain Convex Game

In this episode we discuss Improving Generalization with Domain Convex Game by Fangrui Lv, Jian Liang, Shuang Li, Jinming Zhang, Di Liu. The paper explores the effectiveness of domain augmentation in domain generalization (DG). The authors propose a new perspective on DG as a convex game between domains and design a regularization term based on supermodularity…

May 18, 2023
CVPR 2023 – Masked Motion Encoding for Self-Supervised Video Representation Learning

In this episode we discuss Masked Motion Encoding for Self-Supervised Video Representation Learning by Xinyu Sun, Peihao Chen, Liangwei Chen, Changhao Li, Thomas H. Li, Mingkui Tan, Chuang Gan. The paper proposes a new pre-training paradigm called Masked Motion Encoding (MME) for learning discriminative video representation from unlabeled videos. The authors address the limitations of…

May 18, 2023
CVPR 2023 – Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis

In this episode we discuss Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis by Thuan Hoang Nguyen, Thanh Van Le, Anh Tran. The paper proposes a new generative model called Column-Row Entangled Pixel Synthesis (CREPS) that can efficiently and scalably synthesize photo-realistic images at arbitrary resolution. Existing GAN-based solutions suffer from inconsistency and texture…

May 17, 2023
CVPR 2023 – IMP: Iterative Matching and Pose Estimation with Adaptive Pooling

In this episode we discuss IMP: Iterative Matching and Pose Estimation with Adaptive Pooling by Fei Xue, Ignas Budvytis, Roberto Cipolla. The paper proposes an iterative matching and pose estimation framework (IMP) that leverages the geometric connections between the two tasks. They introduce a geometry-aware recurrent attention-based module which jointly outputs sparse matches and camera…

May 17, 2023
CVPR 2023 – Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection

In this episode we discuss Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection by Long Li, Junwei Han, Ni Zhang, Nian Liu, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan. The paper proposes a Discriminative co-saliency and background Mining Transformer (DMT) framework for co-salient object detection. The framework includes several economical…

May 17, 2023
CVPR 2023 – ABLE-NeRF: Attention-Based Rendering with Learnable Embeddings for Neural Radiance Field

In this episode we discuss ABLE-NeRF: Attention-Based Rendering with Learnable Embeddings for Neural Radiance Field by Zhe Jun Tang, Tat-Jen Cham, Haiyu Zhao. The paper presents an alternative approach to the Neural Radiance Field (NeRF) method for representing 3D scenes that addresses view-dependent effects, such as murky renderings of glossy and translucent surfaces. The proposed method, called…

May 17, 2023
CVPR 2023 – A Dynamic Multi-Scale Voxel Flow Network for Video Prediction

In this episode we discuss A Dynamic Multi-Scale Voxel Flow Network for Video Prediction by Xiaotao Hu, Zhewei Huang, Ailin Huang, Jun Xu, Shuchang Zhou. The paper proposes a Dynamic Multi-scale Voxel Flow Network (DMVFN) for video prediction using only RGB images. The proposed network is efficient and achieves better performance than previous methods that…

May 17, 2023
CVPR 2023 – Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel

In this episode we discuss Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel by Xudong Lin, Simran Tiwari, Shiyuan Huang, Manling Li, Mike Zheng Shou, Heng Ji, Shih-Fu Chang. The paper discusses the challenge of multi-channel video-language retrieval, which requires models to understand information from different sources such as video and text. The authors…

May 17, 2023
CVPR 2023 – Interventional Bag Multi-Instance Learning On Whole-Slide Pathological Images

In this episode we discuss Interventional Bag Multi-Instance Learning On Whole-Slide Pathological Images by Tiancheng Lin, Zhimiao Yu, Hongyu Hu, Yi Xu, Chang Wen Chen. The paper proposes a new scheme called Interventional Bag Multi-Instance Learning (IBMIL) to improve the classification of whole slide pathological images. Existing methods focus on improving feature extraction and aggregation…

May 17, 2023
CVPR 2023 – Devil is in the Queries: Advancing Mask Transformers for Real-world Medical

In this episode we discuss Devil is in the Queries: Advancing Mask Transformers for Real-world Medical by Mingze Yuan, Yingda Xia, Hexin Dong, Zifan Chen, Jiawen Yao, Mingyan Qiu, Ke Yan, Xiaoli Yin, Yu Shi, Xin Chen, Zaiyi Liu, Bin Dong, Jingren Zhou, Le Lu, Ling Zhang, Li Zhang. The paper proposes a method for…

May 17, 2023
CVPR 2023 – Inverting the Imaging Process by Learning an Implicit Camera Model

In this episode we discuss Inverting the Imaging Process by Learning an Implicit Camera Model by Xin Huang, Qi Zhang, Ying Feng, Hongdong Li, Qing Wang. The paper introduces a new approach for modeling the physical imaging process of a camera as an implicit neural network, which is able to learn and control camera parameters.…

May 16, 2023
CVPR 2023 – Label-Free Liver Tumor Segmentation

In this episode we discuss Label-Free Liver Tumor Segmentation by Qixin Hu, Yixiong Chen, Junfei Xiao, Shuwen Sun, Jieneng Chen, Alan Yuille, Zongwei Zhou. The paper discusses the use of synthetic tumors in CT scans to train AI models to accurately segment liver tumors without the need for manual annotation. These synthetic tumors are realistic…

May 16, 2023
CVPR 2023 – Regularized Vector Quantization for Tokenized Image Synthesis

In this episode we discuss Regularized Vector Quantization for Tokenized Image Synthesis by Jiahui Zhang, Fangneng Zhan, Christian Theobalt, Shijian Lu. The paper proposes a regularized vector quantization framework for quantizing images into discrete representations, which has been a fundamental problem in generative modeling. Existing approaches either learn the discrete representation deterministically or stochastically, but…

May 16, 2023
CVPR 2023 – Reliability in Semantic Segmentation: Are We on the Right Track?

In this episode we discuss Reliability in Semantic Segmentation: Are We on the Right Track? by Pau de Jorge, Riccardo Volpi, Philip Torr, Gregory Rogez. The paper presents a study of the reliability of modern semantic segmentation models in terms of robustness and uncertainty estimation. The authors analyze a variety of models and compare their…

May 16, 2023
CVPR 2023 – ViPLO: Vision Transformer based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection

In this episode we discuss ViPLO: Vision Transformer based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection by Jeeseung Park, Jin-Woo Park, Jong-Seok Lee. The paper proposes a new method for improving the performance of human-object interaction (HOI) detectors, which are used in scene understanding. The proposed method, called Vision Transformer based Pose-Conditioned Self-Loop Graph (ViPLO),…

May 16, 2023
CVPR 2023 – FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding

In this episode we discuss FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding by Thanh-Dat Truong, Ngan Le, Bhiksha Raj, Jackson Cothren, Khoa Luu. The paper proposes a new approach called Fairness Domain Adaptation (FREDOM) for semantic scene segmentation that addresses fairness concerns in domain adaptation. The proposed adaptation framework is based on the…

May 16, 2023
CVPR 2023 – Self-Supervised Learning for Multimodal Non-Rigid 3D Shape Matching

In this episode we discuss Self-Supervised Learning for Multimodal Non-Rigid 3D Shape Matching by Dongliang Cao, Florian Bernard. The paper proposes a self-supervised multimodal learning strategy to bridge the gap between mesh-based and point cloud-based shape matching methods. Meshes provide rich topological information but require curation, while point clouds are commonly used for real-world data…

May 16, 2023
CVPR 2023 – Post-Processing Temporal Action Detection

In this episode we discuss Post-Processing Temporal Action Detection by Sauradip Nag, Xiatian Zhu, Yi-Zhe Song, Tao Xiang. The paper proposes a novel model-agnostic post-processing method, called Gaussian Approximated Post-processing (GAP), to improve the performance of Temporal Action Detection (TAD) methods without requiring model redesign and retraining. The existing TAD methods usually have a pre-processing…

May 16, 2023
CVPR 2023 – Temporal Interpolation Is All You Need for Dynamic Neural Radiance Fields

In this episode we discuss Temporal Interpolation Is All You Need for Dynamic Neural Radiance Fields by Sungheon Park, Minjung Son, Seokhwan Jang, Young Chun Ahn, Ji-Yeon Kim, Nahyup Kang. The paper presents a novel technique for training spatiotemporal neural radiance fields for dynamic scenes based on temporal interpolation of feature vectors. The proposed method…

May 16, 2023
CVPR 2023 – DATE: Domain Adaptive Product Seeker for E-commerce

In this episode we discuss DATE: Domain Adaptive Product Seeker for E-commerce by Haoyuan Li, Hao Jiang, Tao Jin, Mengyan Li, Yan Chen, Zhijie Lin, Yang Zhao, Zhou Zhao. The paper presents a framework for Product Retrieval (PR) and Grounding (PG) that can seek image and object-level products respectively according to a textual query to…

May 16, 2023
CVPR 2023 – Visibility Aware Human-Object Interaction Tracking from Single RGB Camera

In this episode we discuss Visibility Aware Human-Object Interaction Tracking from Single RGB Camera by Xianghui Xie, Bharat Lal Bhatnagar, Gerard Pons-Moll. The paper proposes a method to track the 3D human and object, their contacts, and their relative translation across frames from a single RGB camera while being robust to heavy occlusions. The authors…

May 16, 2023
