Category: Uncategorized
-
ICCV 2023 – Adding Conditional Control to Text-to-Image Diffusion Models
In this episode we discuss Adding Conditional Control to Text-to-Image Diffusion Models by Lvmin Zhang, Anyi Rao, Maneesh Agrawala. The paper introduces ControlNet, a neural network architecture that enhances spatial control in text-to-image diffusion models. It incorporates additional images, such as edge maps and human pose skeletons, as conditioning inputs to specify the desired image composition.…
-
arXiv Preprint – Neurons in Large Language Models: Dead, N-gram, Positional
In this episode we discuss Neurons in Large Language Models: Dead, N-gram, Positional by Elena Voita, Javier Ferrando, Christoforos Nalmpantis. In this paper, the authors analyze a family of language models called OPT models and focus on the activation of neurons in the feedforward blocks. They find that there are many inactive “dead” neurons in…
-
arXiv Preprint – eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models
In this episode we discuss eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models by Minsik Cho, Keivan A. Vahid, Qichen Fu, Saurabh Adya, Carlo C Del Mundo, Mohammad Rastegari, Devang Naik, Peter Zatloukal. The paper proposes eDKM, a memory-efficient implementation of weight clustering for large language models (LLMs). LLMs have high…
-
arXiv Preprint – Link-Context Learning for Multimodal LLMs
In this episode we discuss Link-Context Learning for Multimodal LLMs by Yan Tai, Weichen Fan, Zhao Zhang, Feng Zhu, Rui Zhao, Ziwei Liu. The paper presents a method called link-context learning (LCL) that enhances the learning abilities of Multimodal Large Language Models (MLLMs). LCL aims to enable MLLMs to recognize new images and understand unfamiliar…
-
arXiv Preprint – ProPainter: Improving Propagation and Transformer for Video Inpainting
In this episode we discuss ProPainter: Improving Propagation and Transformer for Video Inpainting by Shangchen Zhou, Chongyi Li, Kelvin C. K. Chan, Chen Change Loy. The paper discusses the limitations of existing approaches to video inpainting, specifically flow-based propagation and spatiotemporal Transformer methods, due to spatial misalignment and limited temporal range. To address these challenges,…
-
arXiv Preprint – Large Language Models as Optimizers
In this episode we discuss Large Language Models as Optimizers by Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V. Le, Denny Zhou, Xinyun Chen. The paper introduces Optimization by PROmpting (OPRO), a method that uses large language models as optimizers in the absence of gradients. OPRO utilizes natural language descriptions of the optimization…
-
arXiv Preprint – Active Retrieval Augmented Generation
In this episode we discuss Active Retrieval Augmented Generation by Zhengbao Jiang, Frank F. Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, Graham Neubig. The paper presents FLARE, a method that improves the performance of language models by incorporating retrieval of information from external knowledge resources. Unlike existing retrieval-augmented models,…
-
arXiv Preprint – Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation
In this episode we discuss Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation by Yingqing He, Menghan Xia, Haoxin Chen, Xiaodong Cun, Yuan Gong, Jinbo Xing, Yong Zhang, Xintao Wang, Chao Weng, Ying Shan, Qifeng Chen. This paper presents Animate-A-Story, a framework for generating storytelling videos by customizing existing video clips. The framework includes two modules: Motion…
-
arXiv Preprint – FACET: Fairness in Computer Vision Evaluation Benchmark
In this episode we discuss FACET: Fairness in Computer Vision Evaluation Benchmark by Laura Gustafson, Chloe Rolland, Nikhila Ravi, Quentin Duval, Aaron Adcock, Cheng-Yang Fu, Melissa Hall, Candace Ross. The paper introduces a new benchmark called FACET, which measures performance disparities in computer vision models across different attributes such as gender and skin tone. It…
-
arXiv Preprint – Baseline Defenses for Adversarial Attacks Against Aligned Language Models
In this episode we discuss Baseline Defenses for Adversarial Attacks Against Aligned Language Models by Neel Jain, Avi Schwarzschild, Yuxin Wen, Gowthami Somepalli, John Kirchenbauer, Ping-yeh Chiang, Micah Goldblum, Aniruddha Saha, Jonas Geiping, Tom Goldstein. The paper discusses the security vulnerabilities of Large Language Models (LLMs) and explores defense strategies against adversarial attacks. Three types…
-
ICCV 2023 – Verbs in Action: Improving verb understanding in video-language models
In this episode we discuss Verbs in Action: Improving verb understanding in video-language models by Liliane Momeni, Mathilde Caron, Arsha Nagrani, Andrew Zisserman, Cordelia Schmid. The paper proposes a Verb-Focused Contrastive (VFC) framework to address the limited understanding of verbs in video-language models. The framework utilizes pre-trained large language models (LLMs) to generate hard negative…
-
arXiv Preprint – RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
In this episode we discuss RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback by Harrison Lee, Samrat Phatale, Hassan Mansoor, Kellie Lu, Thomas Mesnard, Colton Bishop, Victor Carbune, Abhinav Rastogi. The paper introduces a new technique called RL from AI Feedback (RLAIF) as a solution to the scalability limitations of reinforcement learning from…
-
arXiv Preprint – LLM-Rec: Personalized Recommendation via Prompting Large Language Models
In this episode we discuss LLM-Rec: Personalized Recommendation via Prompting Large Language Models by Hanjia Lyu, Song Jiang, Hanqing Zeng, Qifan Wang, Si Zhang, Ren Chen, Chris Leung, Jiajie Tang, Yinglong Xia, Jiebo Luo. The paper examines different prompting strategies for improving personalized recommendation performance using large language models (LLMs) through input augmentation. The proposed…
-
ICCV 2023 – Robust Monocular Depth Estimation under Challenging Conditions
In this episode we discuss Robust Monocular Depth Estimation under Challenging Conditions by Stefano Gasperini, Nils Morbitzer, HyunJun Jung, Nassir Navab, Federico Tombari. The paper addresses the limitations of existing monocular depth estimation methods in challenging lighting and weather conditions. The authors propose md4all, a simple and reliable solution that can handle diverse conditions without…
-
arXiv Preprint – LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
In this episode we discuss LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models by Chi Han, Qifan Wang, Wenhan Xiong, Yu Chen, Heng Ji, Sinong Wang. The paper introduces LM-Infinite as a solution to the challenge of length generalization in Large Language Models (LLMs). Existing methods for handling longer sequences are resource-intensive and time-consuming,…
-
arXiv Preprint – Llama 2: Open Foundation and Fine-Tuned Chat Models
In this episode we discuss Llama 2: Open Foundation and Fine-Tuned Chat Models by Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller,…
-
arXiv Preprint – Nougat: Neural Optical Understanding for Academic Documents
In this episode we discuss Nougat: Neural Optical Understanding for Academic Documents by Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic. The paper introduces Nougat, a neural optical understanding model for academic documents. Nougat uses a Visual Transformer model to perform Optical Character Recognition (OCR), converting scientific documents into a markup language and bridging the gap…
-
arXiv Preprint – Graph of Thoughts: Solving Elaborate Problems with Large Language Models
In this episode we discuss Graph of Thoughts: Solving Elaborate Problems with Large Language Models by Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Michal Podstawski, Hubert Niewiadomski, Piotr Nyczyk, Torsten Hoefler. The paper introduces a framework called Graph of Thoughts (GoT) that enhances the prompting capabilities of large…
-
arXiv Preprint – Large Language Models as Zero-Shot Conversational Recommenders
In this episode we discuss Large Language Models as Zero-Shot Conversational Recommenders by Zhankui He, Zhouhang Xie, Rahul Jha, Harald Steck, Dawen Liang, Yesu Feng, Bodhisattwa Prasad Majumder, Nathan Kallus, Julian McAuley. This paper presents empirical studies on conversational recommendation tasks using large language models (LLMs) in a zero-shot setting, without fine-tuning. The authors introduce…
-
arXiv Preprint – A Survey on Large Language Model based Autonomous Agents
In this episode we discuss A Survey on Large Language Model based Autonomous Agents by Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, Wayne Xin Zhao, Zhewei Wei, Ji-Rong Wen. The authors of this paper conducted a comprehensive survey on the topic of…
-
arXiv Preprint – EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding
In this episode we discuss EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding by Karttikeya Mangalam, Raiymbek Akshulakov, Jitendra Malik. The paper presents EgoSchema, a benchmark dataset and evaluation metric for assessing the long-form video language understanding capabilities of vision and language systems. The dataset consists of over 5000 multiple-choice question-answer pairs…
-
ICCV 2023 – UnLoc: A Unified Framework for Video Localization Tasks
In this episode we discuss UnLoc: A Unified Framework for Video Localization Tasks by Shen Yan, Xuehan Xiong, Arsha Nagrani, Anurag Arnab, Zhonghao Wang, Weina Ge, David Ross, Cordelia Schmid. The paper introduces UnLoc, a unified framework for video localization using large-scale image-text pretrained models. UnLoc eliminates the need for action proposals, motion-based features, and…