
AI Breakdown


CVPR 2023 – DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

In this episode we discuss DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation by Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, Kfir Aberman. The paper discusses a new approach to personalize text-to-image diffusion models by fine-tuning the pre-trained model with a few images of a particular subject, allowing the model to…

May 15, 2023
CVPR 2023 – PolyFormer: Referring Image Segmentation as Sequential Polygon Generation

In this episode we discuss PolyFormer: Referring Image Segmentation as Sequential Polygon Generation by Jiang Liu, Hui Ding, Zhaowei Cai, Yuting Zhang, Ravi Kumar Satzoda, Vijay Mahadevan, R. Manmatha. The paper presents a new approach to referring image segmentation that uses sequential polygon generation instead of directly predicting pixel-level masks. The method, called Polygon Transformer…

May 15, 2023
CVPR 2023 – Noisy Correspondence Learning with Meta Similarity Correction

In this episode we discuss Noisy Correspondence Learning with Meta Similarity Correction by Haochen Han, Kaiyao Miao, Qinghua Zheng, Minnan Luo. The paper proposes a Meta Similarity Correction Network (MSCN) to address the problem of noisy correspondence datasets, which causes performance degradation in cross-modal retrieval methods. MSCN provides reliable similarity scores by viewing a binary…

May 15, 2023
CVPR 2023 – Query-Dependent Video Representation for Moment Retrieval and Highlight Detection

In this episode we discuss Query-Dependent Video Representation for Moment Retrieval and Highlight Detection by WonJun Moon, Sangeek Hyun, SangUk Park, Dongchan Park, Jae-Pil Heo. The paper introduces Query-Dependent DETR (QD-DETR), a detection transformer model designed for video moment retrieval and highlight detection (MR/HD). The previous transformer-based models did not exploit the information of a…

May 15, 2023
CVPR 2023 – Exploring Data Geometry for Continual Learning

In this episode we discuss Exploring Data Geometry for Continual Learning by Zhi Gao, Chen Xu, Feng Li, Yunde Jia, Mehrtash Harandi, Yuwei Wu. The paper explores the concept of continual learning, which involves effectively learning from a constantly changing stream of data without forgetting the knowledge gained from the old data. The study analyzes…

May 15, 2023
CVPR 2023 – The Differentiable Lens: Compound Lens Search over Glass Surfaces and Materials for Object Detection

In this episode we discuss The Differentiable Lens: Compound Lens Search over Glass Surfaces and Materials for Object Detection by Geoffroi Côté, Fahim Mannan, Simon Thibault, Jean-François Lalonde, Felix Heide. This paper proposes a novel approach to joint optimization of camera lens systems with other image processing components, particularly neural networks. The authors introduce a…

May 15, 2023
CVPR 2023 – CrOC: Cross-View Online Clustering for Dense Visual Representation Learning

In this episode we discuss CrOC: Cross-View Online Clustering for Dense Visual Representation Learning by Thomas Stegmüller, Tim Lebailly, Behzad Bozorgtabar, Tinne Tuytelaars, Jean-Philippe Thiran. The paper proposes a cross-view consistency objective with an online clustering mechanism (CrOC) to learn dense visual representations without labels in scene-centric data. The method uses an online clustering…

May 14, 2023
CVPR 2023 – Efficient Map Sparsification Based on 2D and 3D Discretized Grids

In this episode we discuss Efficient Map Sparsification Based on 2D and 3D Discretized Grids by Xiaoyu Zhang, Yun-Hui Liu. The paper proposes an efficient linear approach for map sparsification, which involves selecting a subset of landmarks from a larger map for robot navigation. Existing methods require heavy computation and memory capacity, especially for large-scale…

May 14, 2023
CVPR 2023 – Learning Generative Structure Prior for Blind Text Image Super-resolution

In this episode we discuss Learning Generative Structure Prior for Blind Text Image Super-resolution by Xiaoming Li, Wangmeng Zuo, Chen Change Loy. This paper proposes a novel prior for blind text image super-resolution (SR), focusing on character structure, which can deal with diverse font styles and unknown degradation. The authors store discrete features for each…

May 14, 2023
CVPR 2023 – Re-thinking Federated Active Learning based on Inter-class Diversity

In this episode we discuss Re-thinking Federated Active Learning based on Inter-class Diversity by SangMook Kim, Sangmin Bae, Hwanjun Song, Se-Young Yun. The paper discusses the use of federated active learning (FAL) frameworks in situations where a significant amount of unlabeled data is present. The authors demonstrate that the effectiveness of available query selector models…

May 14, 2023
CVPR 2023 – Super-Resolution Neural Operator

In this episode we discuss Super-Resolution Neural Operator by Min Wei, Xuesong Zhang. The paper proposes a deep learning framework called Super-resolution Neural Operator (SRNO) that can generate high-resolution images from their low-resolution counterparts. It works by learning the mapping between the function spaces of the LR and HR image pairs, embedding the LR input…

May 14, 2023
CVPR 2023 – Unsupervised Contour Tracking of Live Cells by Mechanical and Cycle Consistency Losses

In this episode we discuss Unsupervised Contour Tracking of Live Cells by Mechanical and Cycle Consistency Losses by Junbong Jang, Kwonmoo Lee, Tae-Kyun Kim. The paper proposes a deep learning-based method for tracking the dynamic changes of cellular morphology in live cell videos. The proposed method includes point correspondence and considering local shapes and textures…

May 14, 2023
CVPR 2023 – Probabilistic Prompt Learning for Dense Prediction

In this episode we discuss Probabilistic Prompt Learning for Dense Prediction by Hyeongjun Kwon, Taeyong Song, Somi Jeong, Jin Kim, Jinhyun Jang, Kwanghoon Sohn. This paper proposes a new approach called “probabilistic prompt learning” to improve the performance of dense prediction tasks. The authors introduce learnable class-agnostic attribute prompts to describe universal attributes across object…

May 14, 2023
CVPR 2023 – SunStage: Portrait Reconstruction and Relighting using the Sun as a Light Stage

In this episode we discuss SunStage: Portrait Reconstruction and Relighting using the Sun as a Light Stage by Yifan Wang, Aleksander Holynski, Xiuming Zhang, Xuaner Zhang. The paper presents SunStage, a lightweight alternative to a light stage that captures facial appearance and relighting data using only a smartphone camera and the sun. The method requires…

May 14, 2023
CVPR 2023 – Feature Separation and Recalibration for Adversarial Robustness

In this episode we discuss Feature Separation and Recalibration for Adversarial Robustness by Woo Jae Kim, Yoonki Cho, Junsik Jung, Sung-Eui Yoon. The paper proposes a novel approach called Feature Separation and Recalibration (FSR) to improve the robustness of deep neural networks against adversarial attacks. The FSR method recalibrates the non-robust feature activations, which are…

May 13, 2023
CVPR 2023 – DETR with Additional Global Aggregation for Cross-domain Weakly Supervised Object Detection

In this episode we discuss DETR with Additional Global Aggregation for Cross-domain Weakly Supervised Object Detection by Zongheng Tang, Yifan Sun, Si Liu, Yi Yang. The paper proposes a method for cross-domain weakly supervised object detection (CDWSOD) by adapting the detector from source to target domain through weak supervision using DETR (a transformer-based object detection model).…

May 13, 2023
CVPR 2023 – Diffusion-SDF: Text-to-Shape via Voxelized Diffusion

In this episode we discuss Diffusion-SDF: Text-to-Shape via Voxelized Diffusion by Muheng Li, Yueqi Duan, Jie Zhou, Jiwen Lu. The paper presents a new generative 3D modeling framework called Diffusion-SDF for synthesizing 3D shapes from text. The proposed framework uses an SDF autoencoder and Voxelized Diffusion model to generate representations for voxelized signed distance fields…

May 13, 2023
CVPR 2023 – Aligning Step-by-Step Instructional Diagrams to Video Demonstrations

In this episode we discuss Aligning Step-by-Step Instructional Diagrams to Video Demonstrations by Jiahao Zhang, Anoop Cherian, Yanbin Liu, Yizhak Ben-Shabat, Cristian Rodriguez, Stephen Gould. The paper presents a novel approach to align instruction steps depicted as assembly diagrams with segments from in-the-wild videos that depict the actions. The authors propose a supervised contrastive learning…

May 13, 2023
CVPR 2023 – AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR

In this episode we discuss AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR by Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid. The paper proposes a method called AVFormer for augmenting audio-only models with visual information for audiovisual automatic speech recognition (AV-ASR). The method involves injecting visual embeddings into a frozen ASR model using…

May 13, 2023
CVPR 2023 – Hard Patches Mining for Masked Image Modeling

In this episode we discuss Hard Patches Mining for Masked Image Modeling by Haochen Wang, Kaiyou Song, Junsong Fan, Yuxi Wang, Jin Xie, Zhaoxiang Zhang. The paper proposes a new framework called Hard Patches Mining (HPM) for pre-training in masked image modeling (MIM). The authors argue that MIM models should not only focus on predicting…

May 13, 2023
CVPR 2023 – Revisiting Rolling Shutter Bundle Adjustment: Toward Accurate and Fast Solution

In this episode we discuss Revisiting Rolling Shutter Bundle Adjustment: Toward Accurate and Fast Solution by Bangyan Liao, Delin Qu, Yifei Xue, Huiqing Zhang, Yizhen Lao. The paper proposes a solution for accurate and fast bundle adjustment (BA) to estimate the 6-DoF pose using a rolling shutter camera. The proposed method addresses the challenges in…

May 13, 2023
CVPR 2023, highlight paper – Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning

In this episode we discuss Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning by Zhuowan Li, Xingrui Wang, Elias Stengel-Eskin, Adam Kortylewski, Wufei Ma, Benjamin Van Durme, Alan Yuille. The paper introduces a virtual benchmark called Super-CLEVR to isolate different factors of variation that affect the performance of Visual Question Answering (VQA)…

May 12, 2023
CVPR 2023, highlight paper – Quantum Multi-Model Fitting

In this episode we discuss Quantum Multi-Model Fitting by Matteo Farina, Luca Magri, Willi Menapace, Elisa Ricci, Vladislav Golyanik, Federica Arrigoni. This paper introduces the first quantum approach to multi-model fitting (MMF), a fundamental computer vision problem. The authors propose a formulation that can be efficiently sampled on modern adiabatic quantum computers, without the relaxation…

May 12, 2023
CVPR 2023, highlight paper – DiffRF: Rendering-Guided 3D Radiance Field Diffusion

In this episode we discuss DiffRF: Rendering-Guided 3D Radiance Field Diffusion by Norman Müller, Yawar Siddiqui, Lorenzo Porzi, Samuel Rota Bulò, Peter Kontschieder, Matthias Nießner. The paper introduces a novel approach for 3D radiance field synthesis called DiffRF, which is based on denoising diffusion probabilistic models. Unlike existing diffusion-based methods that operate on images, latent…

May 12, 2023
CVPR 2023, highlight paper – SPARF: Neural Radiance Fields from Sparse and Noisy Poses

In this episode we discuss SPARF: Neural Radiance Fields from Sparse and Noisy Poses by Prune Truong, Marie-Julie Rakotosaona, Fabian Manhardt, Federico Tombari. This paper introduces Sparse Pose Adjusting Radiance Field (SPARF), a method for synthesizing photorealistic novel views with only a few input images and noisy camera poses. SPARF uses multi-view geometry constraints to…

May 12, 2023
