ArXiv Preprint – Birth of a Transformer: A Memory Viewpoint
In this episode we discuss Birth of a Transformer: A Memory Viewpoint by Alberto Bietti, Vivien Cabannes, Diane Bouchacourt, Hervé Jégou, and Léon Bottou. The paper delves into the internal workings of transformer-based large language models. The authors introduce…
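The "memory viewpoint" in the title treats transformer weight matrices as associative memories that store (key, value) pairs as sums of outer products. Below is a minimal NumPy sketch of that storage-and-retrieval mechanism, purely illustrative and not the paper's own experiments; the dimensions and scalings are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 256, 20                                    # embedding dim, number of stored pairs
keys = rng.standard_normal((n, d)) / np.sqrt(d)   # nearly orthogonal random keys
values = rng.standard_normal((n, d)) / np.sqrt(d)

# Associative memory: W = sum_i v_i k_i^T (outer-product storage).
W = values.T @ keys                               # (d, d)

# Retrieval: W k_j ~ v_j because k_i . k_j ~ 0 for i != j in high dimension.
recovered = keys @ W.T                            # (n, d), row j is W k_j
cos = np.sum(recovered * values, axis=1) / (
    np.linalg.norm(recovered, axis=1) * np.linalg.norm(values, axis=1))
print(cos.mean())                                 # close to 1: values are recovered
```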
-
CVPR 2023 – PivoTAL: Prior-Driven Supervision for Weakly-Supervised Temporal Action Localization
In this episode we discuss “PivoTAL: Prior-Driven Supervision for Weakly-Supervised Temporal Action Localization” by Mamshad Nayeem Rizve, Gaurav Mittal, Ye Yu, Matthew Hall, Sandra Sajeev, Mubarak Shah, Mei Chen from Microsoft and the University of Central Florida. The paper introduces a novel approach to address the problem of localizing actions in untrimmed videos with only video-level supervision.…
-
CVPR 2023 – Polynomial Implicit Neural Representations For Large Diverse Datasets
In this episode we discuss Polynomial Implicit Neural Representations For Large Diverse Datasets by Rajhans Singh, Ankita Shukla, Pavan Turaga. The paper proposes a new approach to implicit neural representations (INR), which are widely used for signal and image representation in various tasks. Current INR architectures rely on sinusoidal positional encoding, limiting their representational…
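For reference, the sinusoidal positional encoding that the excerpt says current INRs rely on maps each coordinate to sine/cosine features at geometrically spaced frequencies. A minimal sketch is below; the function name and frequency schedule are illustrative assumptions, and the paper's contribution is to replace this encoding with polynomial features.

```python
import numpy as np

def sinusoidal_encoding(coords, num_freqs=10):
    """Fourier-feature encoding of coordinates in [-1, 1].

    coords: (N, 2) array of (x, y) positions.
    Returns (N, 2 * 2 * num_freqs): sin and cos at frequencies 2^k * pi.
    """
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi
    angles = coords[:, :, None] * freqs                       # (N, 2, num_freqs)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(coords.shape[0], -1)

xy = np.stack(np.meshgrid(np.linspace(-1, 1, 4), np.linspace(-1, 1, 4)), -1).reshape(-1, 2)
print(sinusoidal_encoding(xy).shape)                           # (16, 40)
```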
-
CVPR 2023 – Iterative Next Boundary Detection for Instance Segmentation of Tree Rings in Microscopy Images of Shrub Cross Sections
In this episode we discuss Iterative Next Boundary Detection for Instance Segmentation of Tree Rings in Microscopy Images of Shrub Cross Sections by Alexander Gillert, Giulia Resente, Alba Anadon-Rosell, Martin Wilmking, Uwe Freiherr von Lukas. The paper proposes a new iterative method called Iterative Next Boundary Detection (INBD) for detecting tree rings in microscopy images…
-
CVPR 2023 – OReX: Object Reconstruction from Planar Cross-sections Using Neural Fields
In this episode we discuss OReX: Object Reconstruction from Planar Cross-sections Using Neural Fields by Haim Sawdayee, Amir Vaxman, Amit H. Bermano. The paper presents OReX, a method for reconstructing 3D shapes from planar cross-sections using a Neural Field as the interpolation prior. The trained neural network estimates the inside/outside function of a given 3D…
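The inside/outside function mentioned here is commonly implemented as an occupancy field: an MLP mapping a 3D point to the probability of lying inside the shape, supervised by points with known labels. The PyTorch sketch below is a generic occupancy field with toy labels, not the OReX architecture; the layer sizes and the sphere supervision are assumptions.

```python
import torch
import torch.nn as nn

class OccupancyField(nn.Module):
    """Maps a 3D point to the probability that it lies inside the object."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xyz):                  # xyz: (N, 3)
        return torch.sigmoid(self.net(xyz))  # (N, 1), ~1 inside, ~0 outside

field = OccupancyField()
pts = torch.rand(1024, 3) * 2 - 1
labels = (pts.norm(dim=1, keepdim=True) < 0.5).float()  # toy labels: a sphere
loss = nn.functional.binary_cross_entropy(field(pts), labels)
loss.backward()
```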
-
CVPR 2023 – Towards Unified Scene Text Spotting based on Sequence Generation
In this episode we discuss Towards Unified Scene Text Spotting based on Sequence Generation by Taeho Kil, Seonghyeon Kim, Sukmin Seo, Yoonsik Kim, Daehee Kim. The paper presents a UNIfied scene Text Spotter, called UNITS, to overcome the limitations of auto-regressive models used for end-to-end text spotting. UNITS unifies various detection formats, allowing it…
-
CVPR 2023 – Learning Neural Duplex Radiance Fields for Real-Time View Synthesis
In this episode we discuss Learning Neural Duplex Radiance Fields for Real-Time View Synthesis by Ziyu Wan, Christian Richardt, Aljaž Božič, Chao Li, Vijay Rengarajan, Seonghyeon Nam, Xiaoyu Xiang, Tuotuo Li, Bo Zhu, Rakesh Ranjan, Jing Liao. The paper proposes a novel approach to rendering photorealistic images using Neural Radiance Fields (NeRFs) in a more…
-
CVPR 2023 – Context-Based Trit-Plane Coding for Progressive Image Compression
In this episode we discuss Context-Based Trit-Plane Coding for Progressive Image Compression by Seungmin Jeon, Kwang Pyo Choi, Youngo Park, Chang-Su Kim. The paper proposes the context-based trit-plane coding (CTC) algorithm for progressive image compression. CTC enables compact encoding of trit-planes by developing a context-based rate reduction module to estimate trit probabilities accurately. The context-based…
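As background, trit-plane coding represents each quantized latent as base-3 digits (trits) and transmits the planes from most to least significant, which is what makes the bitstream progressive. The sketch below shows only that decomposition and reconstruction; the paper's contribution, the context model that estimates trit probabilities for entropy coding, is not reproduced here.

```python
import numpy as np

def to_trit_planes(latents, num_planes):
    """Decompose non-negative integer latents into base-3 digit planes,
    most-significant plane first."""
    return [(latents // 3 ** p) % 3 for p in reversed(range(num_planes))]

def from_trit_planes(planes):
    """Progressively reconstruct: each additional plane refines the value."""
    value = np.zeros_like(planes[0])
    for plane in planes:
        value = value * 3 + plane
    return value

q = np.array([0, 7, 13, 26])            # toy quantized latents in [0, 26]
planes = to_trit_planes(q, num_planes=3)
print(planes)                            # three planes of trits in {0, 1, 2}
print(from_trit_planes(planes))          # [ 0  7 13 26]
```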
-
CVPR 2023 – Interactive Cartoonization with Controllable Perceptual Factors
In this episode we discuss Interactive Cartoonization with Controllable Perceptual Factors by Namhyuk Ahn, Patrick Kwon, Jihye Back, Kibeom Hong, Seungkwon Kim. The paper proposes a new method for cartoonization, which involves rendering natural photos into cartoon styles with editing features of texture and color. The proposed method uses a model architecture with separate decoders…
-
CVPR 2023 – Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning
In this episode we discuss Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning by Qian Jiang, Changyou Chen, Han Zhao, Liqun Chen, Qing Ping, Son Dinh Tran, Yi Xu, Belinda Zeng, Trishul Chilimbi. The paper discusses the use of contrastive loss in learning representations from multiple modalities. It argues that perfect modality alignment…
-
CVPR 2023 – MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving
In this episode we discuss MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving by Jiale Li, Hang Dai, Hao Han, Yong Ding. This paper proposes a multi-modal 3D semantic segmentation model (MSeg3D) for autonomous driving, combining LiDAR and camera data. The authors address several challenges with multi-modal solutions, including modality heterogeneity, limited sensor field of…
-
ArXiv Preprint – PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°
In this episode we discuss PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360° by Sizhe An, Hongyi Xu, Yichun Shi, Guoxian Song, Umit Ogras, Linjie Luo. The paper introduces PanoHead, a 3D-aware generative model that can synthesize high-quality, view-consistent images of full heads in 360 degrees. Existing 3D generative adversarial networks (GANs) struggle to preserve 3D…
-
CVPR 2023 – OmniMAE: Single Model Masked Pretraining on Images and Videos
In this episode we discuss OmniMAE: Single Model Masked Pretraining on Images and Videos by Rohit Girdhar, Alaaeldin El-Nouby, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, and Ishan Misra from FAIR, Meta AI. The paper discusses how a common architecture can be used to train a single unified…
-
CVPR 2023 – NeFII: Inverse Rendering for Reflectance Decomposition with Near-Field Indirect Illumination
In this episode we discuss NeFII: Inverse Rendering for Reflectance Decomposition with Near-Field Indirect Illumination by Haoqian Wu, Zhipeng Hu, Lincheng Li, Yongqiang Zhang, Changjie Fan, Xin Yu. The paper proposes an end-to-end inverse rendering pipeline that decomposes materials and illumination from multi-view images, while considering near-field indirect illumination. They introduce Monte Carlo sampling based…
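The Monte Carlo sampling mentioned here estimates the rendering integral by averaging radiance over sampled incoming directions. Below is a minimal, generic estimator for a diffuse surface with cosine-weighted hemisphere sampling; it is a textbook sketch under a toy constant environment, not NeFII's pipeline, which additionally handles near-field indirect illumination.

```python
import numpy as np

def sample_cosine_hemisphere(n, rng):
    """Cosine-weighted directions around +z (pdf = cos(theta) / pi)."""
    u1, u2 = rng.random(n), rng.random(n)
    r, phi = np.sqrt(u1), 2 * np.pi * u2
    return np.stack([r * np.cos(phi), r * np.sin(phi), np.sqrt(1 - u1)], axis=1)

def diffuse_outgoing_radiance(albedo, incoming_radiance, n_samples=1024, seed=0):
    """Monte Carlo estimate of L_o = (albedo / pi) * integral of L_i(w) cos(theta) dw.
    With cosine-weighted sampling the cos/pdf terms cancel, leaving a plain mean."""
    rng = np.random.default_rng(seed)
    dirs = sample_cosine_hemisphere(n_samples, rng)
    return albedo * np.mean(incoming_radiance(dirs), axis=0)

# Toy environment: unit radiance from every direction, so L_o equals the albedo.
L_i = lambda dirs: np.ones((dirs.shape[0], 1))
print(diffuse_outgoing_radiance(np.array([0.8, 0.5, 0.2]), L_i))
```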
-
CVPR 2023 – PointCMP: Contrastive Mask Prediction for Self-supervised Learning on Point Cloud Videos
In this episode we discuss PointCMP: Contrastive Mask Prediction for Self-supervised Learning on Point Cloud Videos by Zhiqiang Shen, Xiaoxiao Sheng, Longguang Wang, Yulan Guo, Qiong Liu, Xi Zhou. The paper proposes a self-supervised learning framework, called PointCMP, for point cloud videos, a domain where high labeling costs make unsupervised methods appealing. PointCMP uses a two-branch…
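The contrastive part of frameworks like this is typically an InfoNCE objective between embeddings from the two branches. The sketch below is the generic form of that loss, not PointCMP's exact objective; the batch size and embedding dimension are arbitrary.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.07):
    """Generic InfoNCE: matching rows of z1 and z2 are positives,
    every other row in the batch serves as a negative."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                # (B, B) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

# Two "views" of the same batch of clip-level embeddings (random here).
z_a, z_b = torch.randn(32, 256), torch.randn(32, 256)
print(info_nce(z_a, z_b).item())
```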
-
CVPR 2023 – A Strong Baseline for Generalized Few-Shot Semantic Segmentation
In this episode we discuss A Strong Baseline for Generalized Few-Shot Semantic Segmentation by Sina Hajimiri, Malik Boudiaf, Ismail Ben Ayed, Jose Dolz. The paper introduces a generalized few-shot segmentation framework with a simple, easy-to-optimize inference phase and training process. The authors propose a model based on the InfoMax principle, where the Mutual…
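As background on the InfoMax principle, such losses maximize the mutual information between query-pixel features and their predicted labels, which decomposes into a marginal-entropy term (keep the predicted class distribution diverse) minus a conditional-entropy term (make per-pixel predictions confident). The decomposition below is the standard one, stated as background rather than as the paper's exact objective:

```latex
\[
  I(X; Z) \;=\; \mathcal{H}(Z) - \mathcal{H}(Z \mid X)
  \;=\; -\sum_{k} \bar{p}_k \log \bar{p}_k
  \;+\; \frac{1}{|\Omega|} \sum_{x \in \Omega} \sum_{k} p_k(x) \log p_k(x),
\]
where $p_k(x)$ is the predicted probability of class $k$ at pixel $x$ and
$\bar{p}_k$ is its average over the image region $\Omega$.
```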
-
CVPR 2023 – MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision
In this episode we discuss MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision by Antoine Guédon, Tom Monnier, Pascal Monasse, Vincent Lepetit. The paper introduces a method that can learn to explore and reconstruct large environments in 3D from color images only, without relying on depth sensors or 3D supervision. The method learns to…
-
CVPR 2023 – Stare at What You See: Masked Image Modeling without Reconstruction
In this episode we discuss Stare at What You See: Masked Image Modeling without Reconstruction by Hongwei Xue, Peng Gao, Hongyang Li, Yu Qiao, Hao Sun, Houqiang Li, Jiebo Luo. The paper proposes a new approach to Masked Image Modeling (MIM) called MaskAlign. The authors argue that the features extracted by powerful teacher models already…
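The core mechanism described here is feature alignment: the student sees only visible patches and its features are matched to those of a frozen teacher that sees the full image, instead of reconstructing masked pixels. The sketch below is a generic cosine-alignment loss under assumed shapes, not the paper's exact alignment module.

```python
import torch
import torch.nn.functional as F

def feature_alignment_loss(student_feats, teacher_feats):
    """Mean cosine distance between student features (masked view) and
    frozen-teacher features (full view). Shapes: (B, N, D)."""
    s = F.normalize(student_feats, dim=-1)
    t = F.normalize(teacher_feats.detach(), dim=-1)    # no gradient to the teacher
    return (1 - (s * t).sum(dim=-1)).mean()

student = torch.randn(8, 49, 768, requires_grad=True)  # e.g. ViT patch tokens
teacher = torch.randn(8, 49, 768)
print(feature_alignment_loss(student, teacher).item())
```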
-
CVPR 2023 – SimpleNet: A Simple Network for Image Anomaly Detection and Localization
In this episode we discuss SimpleNet: A Simple Network for Image Anomaly Detection and Localization by Zhikang Liu, Yiming Zhou, Yuansheng Xu, Zilei Wang. The paper introduces a new deep learning network called SimpleNet for detecting and localizing anomalies. SimpleNet has four main components: a pre-trained Feature Extractor, a shallow Feature Adapter, a…
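A common pattern in discriminator-based detectors of this kind is to adapt frozen backbone features, synthesize anomalous features by perturbing normal ones with Gaussian noise, and train a shallow discriminator to separate the two. The sketch below follows that pattern; the layer sizes, noise scale, and the binary cross-entropy objective are illustrative assumptions, not SimpleNet's exact configuration.

```python
import torch
import torch.nn as nn

adapter = nn.Linear(768, 768)                          # shallow feature adapter
discriminator = nn.Sequential(nn.Linear(768, 256), nn.LeakyReLU(), nn.Linear(256, 1))

feats = torch.randn(512, 768)                          # features from a frozen backbone
normal = adapter(feats)
anomalous = normal + 0.015 * torch.randn_like(normal)  # noise-perturbed "defects"

scores = torch.cat([discriminator(normal), discriminator(anomalous)]).squeeze(1)
targets = torch.cat([torch.ones(512), torch.zeros(512)])  # normal = 1, anomalous = 0
loss = nn.functional.binary_cross_entropy_with_logits(scores, targets)
loss.backward()
```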
-
CVPR 2023 – Self-Supervised Video Forensics by Audio-Visual Anomaly Detection
In this episode we discuss Self-Supervised Video Forensics by Audio-Visual Anomaly Detection by Chao Feng, Ziyang Chen, Andrew Owens. The paper proposes a method for detecting inconsistencies between the visual and audio signals in manipulated videos using anomaly detection. The method trains an autoregressive model on real, unlabeled data to generate audio-visual feature sequences capturing…
-
CVPR 2023 – Foundation Model Drives Weakly Incremental Learning for Semantic Segmentation
In this episode we discuss Foundation Model Drives Weakly Incremental Learning for Semantic Segmentation by Chaohui Yu, Qiang Zhou, Jingliang Li, Jianlong Yuan, Zhibin Wang, Fan Wang. The paper proposes a novel and data-efficient framework for weakly incremental learning for semantic segmentation (WILSS) called FMWISS. WILSS aims to learn to segment new classes from cheap…
-
CVPR 2023 – Delving into Discrete Normalizing Flows on SO(3) Manifold for Probabilistic Rotation Modeling
In this episode we discuss Delving into Discrete Normalizing Flows on SO(3) Manifold for Probabilistic Rotation Modeling by Yulin Liu, Haoran Liu, Yingda Yin, Yang Wang, Baoquan Chen, He Wang. The paper proposes a new normalizing flow method for the SO(3) manifold, which is an important quantity in computer vision, graphics, and robotics but has…
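For background, a normalizing flow makes the model density tractable through the change-of-variables formula, which requires an invertible map with a cheap Jacobian determinant; extending this machinery from Euclidean space to the curved SO(3) manifold is what the paper addresses. The Euclidean version is stated below only as context, not as the paper's construction:

```latex
\[
  \log p_X(x) \;=\; \log p_Z\!\bigl(f^{-1}(x)\bigr)
  \;+\; \log \left| \det \frac{\partial f^{-1}(x)}{\partial x} \right|,
  \qquad x = f(z),\; z \sim p_Z .
\]
```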
-
CVPR 2023 – StepFormer: Self-supervised Step Discovery and Localization in Instructional Videos
In this episode we discuss StepFormer: Self-supervised Step Discovery and Localization in Instructional Videos by Nikita Dvornik, Isma Hadji, Ran Zhang, Konstantinos G. Derpanis, Animesh Garg, Richard P. Wildes, Allan D. Jepson. The paper introduces StepFormer, a self-supervised model that locates key steps in instructional videos with no human supervision. Traditional methods require video-level human annotations,…
-
CVPR 2023 – SketchXAI: A First Look at Explainability for Human Sketches
In this episode we discuss SketchXAI: A First Look at Explainability for Human Sketches by Zhiyu Qu, Yulia Gryaditskaya, Ke Li, Kaiyue Pang, Tao Xiang, Yi-Zhe Song. The paper introduces human sketches to the landscape of Explainable Artificial Intelligence (XAI). Sketch is argued to be a “human-centered” data form that represents a natural interface to…
-
CVPR 2023 – Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
In this episode we discuss Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training by Filip Radenovic, Abhimanyu Dubey, Abhishek Kadian, Todor Mihaylov, Simon Vandenhende, Yash Patel, Yi Wen, Vignesh Ramanathan, Dhruv Mahajan. The paper discusses improvements to the contrastive pre-training pipeline for vision-language models used in zero-shot recognition problems. The authors propose a filtering strategy…