  • CVPR 2023 – SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text

    In this episode we discuss SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text by Pinaki Nath Chowdhury, Ayan Kumar Bhunia, Aneeshan Sain, Subhadeep Koley, Tao Xiang, Yi-Zhe Song. The paper discusses an extension of scene understanding that includes human sketch as a modality, resulting in a complete trilogy of scene representation from…

  • CVPR 2023 – Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos

    In this episode we discuss Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos by Liao Wang, Qiang Hu, Qihan He, Ziyu Wang, Jingyi Yu, Tinne Tuytelaars, Lan Xu, Minye Wu. The paper introduces a new technique called Residual Radiance Field (ReRF), a compact neural representation for achieving real-time free-viewpoint rendering of long-duration dynamic scenes. ReRF…

  • CVPR 2023 – Planning-oriented Autonomous Driving

    In this episode we discuss Planning-oriented Autonomous Driving by Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, Lewei Lu, Xiaosong Jia, Qiang Liu, Jifeng Dai, Yu Qiao, Hongyang Li. The paper proposes a new framework for autonomous driving systems called Unified Autonomous Driving…

  • CVPR 2023 – DANI-Net: Uncalibrated Photometric Stereo by Differentiable Shadow Handling, Anisotropic Reflectance Modeling, and Neural Inverse Rendering

    In this episode we discuss DANI-Net: Uncalibrated Photometric Stereo by Differentiable Shadow Handling, Anisotropic Reflectance Modeling, and Neural Inverse Rendering by Zongrui Li, Qian Zheng, Boxin Shi, Gang Pan, Xudong Jiang. The paper proposes a deep learning approach, called DANI-Net, to solve the challenging problem of uncalibrated photometric stereo (UPS) which is complicated by unknown…

  • CVPR 2023 – Hubs and Hyperspheres: Reducing Hubness and Improving Transductive Few-shot Learning with Hyperspherical Embeddings

    In this episode we discuss Hubs and Hyperspheres: Reducing Hubness and Improving Transductive Few-shot Learning with Hyperspherical Embeddings by Daniel J. Trosten, Rwiddhi Chakraborty, Sigurd Løkse, Kristoffer Knutsen Wickstrøm, Robert Jenssen, Michael C. Kampffmeyer. This paper proposes two approaches to address the hubness problem in distance-based classification in transductive few-shot learning. The authors prove that…

  • CVPR 2023 – TTA-COPE: Test-Time Adaptation for Category-Level Object Pose Estimation

    In this episode we discuss TTA-COPE: Test-Time Adaptation for Category-Level Object Pose Estimation by Taeyeop Lee, Jonathan Tremblay, Valts Blukis, Bowen Wen, Byeong-Uk Lee, Inkyu Shin, Stan Birchfield, In So Kweon, Kuk-Jin Yoon. In this paper, the authors propose Test-Time Adaptation for Category-Level Object Pose Estimation (TTA-COPE), a method that addresses source-to-target domain…

  • CVPR 2023 – Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures

    In this episode we discuss Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures by Eugenia Iofinova, Alexandra Peste, Dan Alistarh. The paper investigates the relationship between neural network pruning and induced bias in Convolutional Neural Networks (CNNs) for computer vision. The authors show that highly sparse models (with less than 10% of weights remaining) can maintain…

  • CVPR 2023 – Practical Network Acceleration with Tiny Sets

    In this episode we discuss Practical Network Acceleration with Tiny Sets by Guo-Hua Wang, Jianxin Wu. The paper proposes a new method called PRACTISE for accelerating networks using only small training sets. It argues that dropping entire blocks, rather than filter-level pruning, achieves a higher acceleration ratio and a better latency-accuracy trade-off in few-shot settings.…

  • CVPR 2023 – CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP

    In this episode we discuss CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP by Runnan Chen, Youquan Liu, Lingdong Kong, Xinge Zhu, Yuexin Ma, Yikang Li, Yuenan Hou, Yu Qiao, Wenping Wang. The paper investigates how Contrastive Language-Image Pre-training (CLIP) knowledge can benefit 3D scene understanding, a direction that had yet to be explored. The authors propose…

  • CVPR 2023 – Zero-Shot Noise2Noise: Efficient Image Denoising without any Data

    In this episode we discuss Zero-Shot Noise2Noise: Efficient Image Denoising without any Data by Youssef Mansour, Reinhard Heckel. The paper proposes a new method for image denoising that does not rely on any training data or knowledge of the noise distribution and is computationally efficient. The proposed method utilizes a simple 2-layer network that can…

  • CVPR 2023 – Single Image Backdoor Inversion via Robust Smoothed Classifiers

    In this episode we discuss Single Image Backdoor Inversion via Robust Smoothed Classifiers by Mingjie Sun, Zico Kolter. The paper proposes a new method called SmoothInv for identifying backdoor triggers in machine learning models. Previous methods used an optimization process to flip a support set of clean images into the target class. However, the paper…

  • CVPR 2023 – Fake it till you make it: Learning transferable representations from synthetic ImageNet clones

    In this episode we discuss Fake it till you make it: Learning transferable representations from synthetic ImageNet clones by Mert Bulent Sariyildiz, Karteek Alahari, Diane Larlus, Yannis Kalantidis. The paper investigates the ability of synthetic images, generated using Stable Diffusion, to replace real images for training models for ImageNet classification. Using only class names to…

  • CVPR 2023 – Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations

    In this episode we discuss Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations by Yiwu Zhong, Licheng Yu, Yang Bai, Shangwen Li, Xueting Yan, Yin Li. The paper proposes a method to learn a video representation that encodes both action steps and their temporal ordering from a large-scale dataset of web instructional videos…

  • CVPR 2023 – Focused and Collaborative Feedback Integration

    In this episode we discuss Focused and Collaborative Feedback Integration by Qiaoqiao Wei, Hui Zhang, Jun-Hai Yong. The paper proposes Focused and Collaborative Feedback Integration (FCFI), an approach for click-based interactive image segmentation. FCFI fully exploits feedback by focusing on a local area around the new click and correcting the feedback based on high-level feature…

  • CVPR 2023 – Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert

    In this episode we discuss Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert by Jiadong Wang, Xinyuan Qian, Malu Zhang, Robby T. Tan, Haizhou Li. The paper addresses talking face generation, also known as speech-to-lip generation, which reconstructs lip motions from speech input. The authors propose using a lip-reading expert…

  • CVPR 2023 – Instance-Aware Domain Generalization for Face Anti-Spoofing

    In this episode we discuss Instance-Aware Domain Generalization for Face Anti-Spoofing by Qianyu Zhou, Ke-Yue Zhang, Taiping Yao, Xuequan Lu, Ran Yi, Shouhong Ding, Lizhuang Ma. The paper discusses the development of a Face Anti-Spoofing (FAS) system based on Domain Generalization (DG) which aligns features on the instance level without relying on domain labels. This…

  • CVPR 2023 – SpaText: Spatio-Textual Representation for Controllable Image Generation

    In this episode we discuss SpaText: Spatio-Textual Representation for Controllable Image Generation by Omri Avrahami, Thomas Hayes, Oran Gafni, Sonal Gupta, Yaniv Taigman, Devi Parikh, Dani Lischinski, Ohad Fried, Xi Yin. The paper presents SpaText, a new method for text-to-image generation that allows for open-vocabulary scene control. By providing a global text prompt and annotated…

  • CVPR 2023 – Neural Part Priors: Learning to Optimize Part-Based Object Completion in RGB-D Scans

    In this episode we discuss Neural Part Priors: Learning to Optimize Part-Based Object Completion in RGB-D Scans by Alexey Bokhovkin, Angela Dai. The paper proposes learning Neural Part Priors (NPPs) to improve 3D scene understanding. NPPs are parametric spaces of objects and their parts that allow for optimization to fit new input 3D scans while maintaining global…

  • Introducing AI Breakdown

    Welcome to the AI Breakdown podcast, where we leverage the power of artificial intelligence to break down recent AI papers and provide simplified explanations of intricate AI topics for educational purposes. We’re delighted to have you join us on this exciting journey into the world of artificial intelligence. Our goal is to make complex AI…