  • CVPR 2023 – Consistent View Synthesis with Pose-Guided Diffusion Models

    In this episode we discuss Consistent View Synthesis with Pose-Guided Diffusion Models by Hung-Yu Tseng, Qinbo Li, Changil Kim, Suhib Alsisan, Jia-Bin Huang, Johannes Kopf. The paper proposes a new technique for synthesizing novel views from a single image for virtual reality applications. The proposed method, called pose-guided diffusion, generates consistent and high-quality views from…
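
    At a high level, the pose conditioning can be pictured as one extra input to the diffusion denoiser. Below is a minimal, hypothetical sketch of a single pose-conditioned DDIM denoising step; eps_model is an assumed noise predictor that takes a relative-pose embedding, and the paper's actual architecture is considerably more involved.

      import torch

      def ddim_step(eps_model, x_t, t, pose_emb, alphas_cumprod):
          # eps_model is a hypothetical noise predictor conditioned on the
          # relative camera pose between the source and target views.
          eps = eps_model(x_t, torch.tensor([t]), pose_emb)
          a_t = alphas_cumprod[t]
          a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
          # Estimate the clean target view, then re-noise it to step t-1.
          x0 = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
          return a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps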

  • arxiv preprint – BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion

    In this episode we discuss BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion by Michael J. Black, Priyanka Patel, Joachim Tesch, Jinlong Yang. This paper presents BEDLAM, a large-scale synthetic dataset for 3D human pose and shape estimation. Unlike previous datasets, BEDLAM is realistic and diverse, featuring monocular RGB videos with ground-truth…

  • ICASSP 2023 – Aura: Privacy-preserving Augmentation to Improve Test Set Diversity in Speech Enhancement

    In this episode we discuss Aura: Privacy-preserving Augmentation to Improve Test Set Diversity in Speech Enhancement by Xavier Gitiaux, Aditya Khant, Ebrahim Beyrami, Chandan Reddy, Jayant Gupchup, Ross Cutler. The paper presents Aura, a privacy-preserving method to enhance test set diversity in speech enhancement models. Usually, these models are trained on public data, which leads…

  • arxiv preprint – Segment Anything Meets Point Tracking

    In this episode we discuss Segment Anything Meets Point Tracking by Frano Rajič, Lei Ke, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan, Fisher Yu. The paper introduces a method called SAM-PT for tracking and segmenting objects in dynamic videos. SAM-PT uses point selection and propagation techniques to create masks for video object segmentation. The method demonstrates…
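
    The core loop is easy to picture: track a handful of query points through the video, then prompt a promptable segmenter with the tracked points in every frame. A minimal sketch, where point_tracker and sam_predict are hypothetical stand-ins for the tracker and segmenter the paper combines:

      def segment_video(frames, query_points, point_tracker, sam_predict):
          # Propagate user-selected query points through all frames.
          tracks, visible = point_tracker(frames, query_points)  # (T, N, 2), (T, N)
          masks = []
          for t, frame in enumerate(frames):
              pts = tracks[t][visible[t]]      # keep only points still visible
              labels = [1] * len(pts)          # treat all as foreground prompts
              masks.append(sam_predict(frame, pts, labels))
          return masks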

  • ICASSP 2023 – A Speech Representation Anonymization Framework via Selective Noise Perturbation

    In this episode we discuss A Speech Representation Anonymization Framework via Selective Noise Perturbation by Minh Tran, Mohammad Soleymani. The paper presents a framework for anonymizing speech data by adding selective noise perturbation to speech representations. It includes a Privacy-risk Saliency Estimator (PSE) that predicts the importance of different representation positions. The approach achieves competitive…
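
    As a rough illustration (not the paper's exact procedure), selective perturbation can be sketched as adding noise only at the positions a saliency estimator flags as most privacy-revealing; the tensor shapes and the thresholding fraction here are assumptions:

      import torch

      def selective_perturb(reps, saliency, noise_std=0.1, frac=0.2):
          # reps: (T, D) speech representations; saliency: (T,) privacy-risk
          # scores from a PSE-like estimator.
          k = max(1, int(frac * len(saliency)))
          idx = saliency.topk(k).indices       # most privacy-revealing positions
          out = reps.clone()
          out[idx] += noise_std * torch.randn_like(out[idx])
          return out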

  • arxiv preprint – One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning

    In this episode we discuss One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning by Arnav Chavan, Zhuang Liu, Deepak Gupta, Eric Xing, Zhiqiang Shen. The paper introduces GLoRA, a parameter-efficient approach to fine-tuning. GLoRA uses a generalized prompt module to optimize pre-trained model weights and adjust intermediate activations, resulting in improved flexibility and capability…
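
    For intuition, here is a minimal LoRA-style adapter; GLoRA generalizes this template with additional per-layer terms for weights, activations, and biases, so treat this only as the baseline idea it builds on:

      import torch
      import torch.nn as nn

      class LoRALinear(nn.Module):
          # Minimal low-rank adapter: y = W0 x + (alpha/r) * B A x, W0 frozen.
          def __init__(self, base: nn.Linear, r=8, alpha=16):
              super().__init__()
              self.base = base
              for p in self.base.parameters():
                  p.requires_grad = False      # freeze the pre-trained weights
              self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
              self.B = nn.Parameter(torch.zeros(base.out_features, r))
              self.scale = alpha / r           # B starts at zero, so the
                                               # adapter is a no-op at init
          def forward(self, x):
              return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T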

  • arxiv preprint – Tracking Everything Everywhere All at Once

    In this episode we discuss Tracking Everything Everywhere All at Once by Qianqian Wang, Yen-Yu Chang, Ruojin Cai, Zhengqi Li, Bharath Hariharan, Aleksander Holynski, Noah Snavely. The paper introduces a new method called OmniMotion for estimating dense and long-range motion from a video sequence. Traditional approaches like sparse feature tracking and dense optical flow are…
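
    The key structural idea is that every frame gets an invertible map into one shared canonical volume, so correspondence between frames i and j becomes "map in with i, map out with j". A toy sketch with random affine bijections standing in for the paper's learned invertible networks:

      import torch

      class AffineBijection:
          # Invertible by construction: an orthogonal matrix plus a shift.
          def __init__(self, dim=3):
              q, _ = torch.linalg.qr(torch.randn(dim, dim))
              self.M, self.b = q, torch.randn(dim)
          def forward(self, x):
              return x @ self.M.T + self.b
          def inverse(self, y):
              return (y - self.b) @ torch.linalg.inv(self.M).T

      def correspond(point_i, map_i, map_j):
          # frame i -> shared canonical volume -> frame j
          return map_j.inverse(map_i.forward(point_i))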

  • arxiv preprint – ViNT: A Foundation Model for Visual Navigation

    In this episode we discuss ViNT: A Foundation Model for Visual Navigation by Dhruv Shah, Ajay Sridhar, Nitish Dashora, Kyle Stachowicz, Kevin Black, Noriaki Hirose, Sergey Levine. The paper presents ViNT, a pre-trained foundation model for visual navigation in robotics. It utilizes a Transformer-based architecture and is trained with a goal-reaching objective. ViNT demonstrates positive…
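
    Architecturally, the recipe is a goal-conditioned Transformer policy: tokenize recent observations and the goal image, fuse them, and decode future waypoints. A toy sketch in which the dimensions, encoders, and output head are all stand-ins rather than the paper's configuration:

      import torch
      import torch.nn as nn

      class TinyViNT(nn.Module):
          def __init__(self, d=256, horizon=8):
              super().__init__()
              # Flattened-pixel linear encoders stand in for the CNN backbone.
              self.obs_enc = nn.Linear(3 * 64 * 64, d)
              self.goal_enc = nn.Linear(3 * 64 * 64, d)
              layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
              self.fuse = nn.TransformerEncoder(layer, num_layers=2)
              self.waypoints = nn.Linear(d, horizon * 2)  # (x, y) per step

          def forward(self, obs_seq, goal):
              # obs_seq: (B, T, 3*64*64) past observations; goal: (B, 3*64*64)
              tokens = torch.cat(
                  [self.obs_enc(obs_seq), self.goal_enc(goal)[:, None]], dim=1)
              fused = self.fuse(tokens)
              return self.waypoints(fused[:, -1])  # decode from the goal token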

  • arxiv preprint – Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation

    In this episode we discuss Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation by Shuai Yang, Yifan Zhou, Ziwei Liu, Chen Change Loy. The paper introduces a novel framework for adapting image models to videos through zero-shot text-guided video-to-video translation. The framework consists of two parts: key frame translation and full video translation. The key frame…

  • arxiv preprint – RePaint-NeRF: NeRF Editting via Semantic Masks and Diffusion Models

    In this episode we discuss RePaint-NeRF: NeRF Editting via Semantic Masks and Diffusion Models by Xingchen Zhou, Ying He, F. Richard Yu, Jianqiang Li, You Li. The paper proposes a framework called RePaint-NeRF for editing the content in Neural Radiance Fields (NeRF). Traditional NeRF methods struggle with content editing, so the framework leverages diffusion models…

  • arxiv preprint – ZipIt! Merging Models from Different Tasks without Training

    In this episode we discuss ZipIt! Merging Models from Different Tasks without Training by George Stoica, Daniel Bolya, Jakob Bjorner, Taylor Hearn, Judy Hoffman. The paper introduces a method called “ZipIt!” that can merge two deep visual recognition models trained on separate tasks without additional training. The method incorporates a “zip” operation to handle non-shared…
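
    A heavily simplified picture of the zip idea: pair up units across the two models by feature similarity, then average the paired weights. The real method also merges redundant units within each model and propagates the merge through the network; this sketch only shows matching-then-averaging for one layer:

      import torch
      import torch.nn.functional as F

      def zip_layers(W1, W2, feats1, feats2):
          # W1: (d1, in), W2: (d2, in) weight matrices of the same layer type;
          # feats1: (N, d1), feats2: (N, d2) activations on shared inputs.
          f1 = F.normalize(feats1, dim=0)
          f2 = F.normalize(feats2, dim=0)
          sim = f1.T @ f2                      # (d1, d2) unit-to-unit similarity
          match = sim.argmax(dim=1)            # greedy matching, not optimal
          return 0.5 * (W1 + W2[match])        # merged weight matrix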

  • CVPR 2023 – Integral Neural Networks

    In this episode we discuss Integral Neural Networks, a CVPR 2023 award candidate, by Kirill Solodskikh, Azim Kurbanov, Ruslan Aydarkhanov, Irina Zhelavskaya, Yury Parfenov, Dehua Song, and Stamatios Lefkimmiatis. The paper introduces a novel class of deep neural network called Integral Neural Networks (INNs), which deviate from the traditional representation of network layers as N-dimensional weight…
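
    The central trick is to treat a weight matrix as samples of a continuous function w(u, v), so a layer can be re-discretized at any width after training. A toy sketch, with a small MLP standing in for the paper's continuous weight representation:

      import torch
      import torch.nn as nn

      class IntegralLinear(nn.Module):
          def __init__(self, hidden=32):
              super().__init__()
              # w(u, v): a continuous function over normalized coordinates.
              self.w = nn.Sequential(
                  nn.Linear(2, hidden), nn.Tanh(), nn.Linear(hidden, 1))

          def weight(self, out_dim, in_dim):
              # Sample the continuous weight on an (out_dim x in_dim) grid.
              u = torch.linspace(0, 1, out_dim)
              v = torch.linspace(0, 1, in_dim)
              grid = torch.stack(torch.meshgrid(u, v, indexing="ij"), dim=-1)
              return self.w(grid).squeeze(-1)

          def forward(self, x, out_dim):
              W = self.weight(out_dim, x.shape[-1])
              return x @ W.T / x.shape[-1]     # 1/N sum approximates the integral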

  • arxiv preprint – Faith and Fate: Limits of Transformers on Compositionality

    In this episode we discuss Faith and Fate: Limits of Transformers on Compositionality by Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, Bill Yuchen Lin, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena D. Hwang, Soumya Sanyal, Sean Welleck, Xiang Ren, Allyson Ettinger, Zaid Harchaoui, Yejin Choi. The paper examines the limitations…

  • arxiv preprint – LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

    In this episode we discuss LayoutGPT: Compositional Visual Planning and Generation with Large Language Models by Weixi Feng, Wanrong Zhu, Tsu-jui Fu, Varun Jampani, Arjun Akula, Xuehai He, Sugato Basu, Xin Eric Wang, William Yang Wang. The paper introduces LayoutGPT, a method that uses Large Language Models (LLMs) to generate layouts from text instructions. LayoutGPT…
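
    The paper prompts the LLM to emit layouts in a CSS-like format that is easy to parse into bounding boxes. A minimal sketch, where llm is any text-completion callable and the exact prompt wording is an assumption:

      import re

      PROMPT = ("You generate 2D layouts in CSS style.\n"
                "Task: {caption}\n"
                "Answer with lines like: object {{left: 10px; top: 20px; "
                "width: 30px; height: 40px}}\n")

      def text_to_layout(caption, llm):
          reply = llm(PROMPT.format(caption=caption))
          boxes = []
          # Parse "name {left: Lpx; top: Tpx; width: Wpx; height: Hpx}" lines.
          for name, l, t, w, h in re.findall(
                  r"(\w+)\s*\{left:\s*(\d+)px;\s*top:\s*(\d+)px;\s*"
                  r"width:\s*(\d+)px;\s*height:\s*(\d+)px\}", reply):
              boxes.append((name, int(l), int(t), int(w), int(h)))
          return boxes  # hand these to a layout-conditioned image generator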

  • CVPR 2023 – 3D Human Pose Estimation via Intuitive Physics

    In this episode we discuss 3D Human Pose Estimation via Intuitive Physics by Shashank Tripathi, Lea Müller, Chun-Hao P. Huang, Omid Taheri, Michael J. Black, Dimitrios Tzionas. This paper introduces a method called IPMAN (Intuitive Physics-based Human Pose Estimation) that aims to estimate 3D human pose from images while producing physically plausible body configurations. The…
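
    One way to picture an intuitive-physics term is a stability penalty that keeps the body's center of mass over its support region. The sketch below is a toy version of that idea (z-up ground plane, crude support estimate), not the paper's actual loss:

      import torch

      def stability_loss(joints, masses, contact_verts):
          # joints: (J, 3) body points; masses: (J,) per-part mass weights;
          # contact_verts: (C, 3) vertices estimated to touch the ground.
          com = (joints * masses[:, None]).sum(0) / masses.sum()
          support = contact_verts.mean(0)      # crude center of the support region
          # Penalize horizontal (xy) offset between center of mass and support.
          return ((com[:2] - support[:2]) ** 2).sum()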

  • arxiv preprint – Textbooks Are All You Need

    In this episode, we discuss Textbooks Are All You Need by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee, Yuanzhi…

  • CVPR 2023, Honorable mention award winner – DynIBaR: Neural Dynamic Image-Based Rendering

    In this episode we discuss DynIBaR: Neural Dynamic Image-Based Rendering by Zhengqi Li, Qianqian Wang, Forrester Cole, Richard Tucker, Noah Snavely. The paper presents a new approach called “DynIBaR” that can generate novel views from a monocular video of a dynamic scene. Existing methods struggle with complex object motions and uncontrolled camera paths, resulting in…

  • arxiv preprint – Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

    In this episode we discuss Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale by Matthew Le, Apoorv Vyas, Bowen Shi, Brian Karrer, Leda Sari, Rashel Moritz, Mary Williamson, Vimal Manohar, Yossi Adi, Jay Mahadeokar, Wei-Ning Hsu from Meta AI. The paper presents a breakthrough in generative modeling for…

  • arxiv preprint – Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks

    In this episode we discuss Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks by Veniamin Veselovsky, Manoel Horta Ribeiro, Robert West. The paper explores how large language models (LLMs) affect the reliability of human-generated data collected through crowdsourcing. The authors conducted a case study on Amazon Mechanical Turk…

  • CVPR 2023 – TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition

    In this episode we discuss TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition by Ishan Rajendrakumar Dave, Mamshad Nayeem Rizve, Chen Chen, Mubarak Shah. The paper proposes TimeBalance, a semi-supervised learning framework for action recognition built on self-supervised video representations. It combines temporally-invariant and temporally-distinctive representations that complement each other for different…
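
    Conceptually, the student distills from two teachers whose pseudo-labels are blended with a per-video weight. A minimal sketch, where t_inv and t_dis are the two teacher networks and weight is assumed to come from some relevance score:

      import torch

      def distill_targets(video, t_inv, t_dis, weight):
          # Blend pseudo-labels from a temporally-invariant teacher and a
          # temporally-distinctive teacher; weight in [0, 1] decides how much
          # each teacher contributes for this video.
          with torch.no_grad():
              p_inv = t_inv(video).softmax(-1)
              p_dis = t_dis(video).softmax(-1)
          return weight * p_inv + (1 - weight) * p_dis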

  • arxiv – AVIS: Autonomous Visual Information Seeking

    In this episode we discuss AVIS: Autonomous Visual Information Seeking. The paper introduces AVIS, an autonomous visual question-answering framework that uses a Large Language Model to strategically invoke external tools and provide answers to visual…

  • CVPR 2023, award candidate – Data-driven Feature Tracking for Event Cameras

    In this episode we discuss Data-driven Feature Tracking for Event Cameras by Nico Messikommer, Carter Fang, Mathias Gehrig, Davide Scaramuzza. The paper details a data-driven feature tracking method for event cameras that improves upon existing techniques that require parameter tuning and struggle with noise and generalization. The proposed method utilizes a frame attention module to…

  • CVPR 2023 – SIEDOB: Semantic Image Editing by Disentangling Object and Background

    In this episode we discuss SIEDOB: Semantic Image Editing by Disentangling Object and Background by Wuyang Luo, Su Yang, Xinjian Zhang, Weishan Zhang. The paper presents a new method for semantic image editing called Semantic Image Editing by Disentangling Object and Background (SIEDOB). The method routes objects and the background through dedicated subnetworks for more efficient…

  • CVPR 2023 – GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts

    In this episode we discuss GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts by Haoran Geng, Helin Xu, Chengyang Zhao, Chao Xu, Li Yi, Siyuan Huang, He Wang. The paper proposes a method called Generalizable and Actionable Parts (GAParts) for learning cross-category domain-generalizable object perception and manipulation. This involves defining 9…