arXiv Preprint – Active Retrieval Augmented Generation
In this episode we discuss Active Retrieval Augmented Generation by Zhengbao Jiang, Frank F. Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, Graham Neubig. The paper presents FLARE, a method that improves the performance of language models by incorporating retrieval of information from external knowledge resources. Unlike existing retrieval-augmented models,…
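To make the active-retrieval idea concrete, here is a minimal sketch of the kind of loop FLARE describes: generate a tentative next sentence, and if the model is unsure about it, retrieve with that sentence as the query and regenerate. The `generate_sentence` and `retrieve` helpers and the confidence threshold are illustrative placeholders, not the paper's exact implementation.

```python
# Sketch of an active-retrieval generation loop in the spirit of FLARE.
# The helpers below are hypothetical stand-ins for a real LM and retriever.

def generate_sentence(prompt: str) -> tuple[str, list[float]]:
    """Hypothetical LM call: returns the next sentence and its per-token probabilities."""
    raise NotImplementedError

def retrieve(query: str, k: int = 3) -> list[str]:
    """Hypothetical retriever over an external knowledge corpus."""
    raise NotImplementedError

def flare_generate(question: str, max_sentences: int = 10, threshold: float = 0.4) -> str:
    answer = ""
    for _ in range(max_sentences):
        draft, probs = generate_sentence(question + answer)
        if not draft:
            break
        if min(probs) < threshold:
            # Low confidence: use the tentative sentence as a retrieval query,
            # then regenerate it with the retrieved passages in context.
            context = "\n".join(retrieve(draft))
            draft, _ = generate_sentence(f"{context}\n{question}{answer}")
        answer += " " + draft
    return answer.strip()
```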
-
arXiv Preprint – Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation
In this episode we discuss Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation by Yingqing He, Menghan Xia, Haoxin Chen, Xiaodong Cun, Yuan Gong, Jinbo Xing, Yong Zhang, Xintao Wang, Chao Weng, Ying Shan, Qifeng Chen. This paper presents Animate-A-Story, a framework for generating storytelling videos by customizing existing video clips. The framework includes two modules: Motion…
-
arXiv Preprint – FACET: Fairness in Computer Vision Evaluation Benchmark
In this episode we discuss FACET: Fairness in Computer Vision Evaluation Benchmark by Laura Gustafson, Chloe Rolland, Nikhila Ravi, Quentin Duval, Aaron Adcock, Cheng-Yang Fu, Melissa Hall, Candace Ross. The paper introduces a new benchmark called FACET, which measures performance disparities in computer vision models across different attributes such as gender and skin tone. It…
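As a rough illustration of the disparity FACET measures, the helper below computes per-group accuracy and the gap between the best- and worst-performing groups; the group labels and numbers are invented for the example, not taken from the benchmark.

```python
# Toy computation of a per-group performance disparity (illustrative data only).
from collections import defaultdict

def group_disparity(records):
    """records: iterable of (group, correct) pairs.
    Returns per-group accuracy and the max-min gap across groups."""
    totals, hits = defaultdict(int), defaultdict(int)
    for group, correct in records:
        totals[group] += 1
        hits[group] += int(correct)
    accuracy = {g: hits[g] / totals[g] for g in totals}
    gap = max(accuracy.values()) - min(accuracy.values())
    return accuracy, gap

records = [
    ("lighter_skin", True), ("lighter_skin", True),
    ("darker_skin", True), ("darker_skin", False),
]
print(group_disparity(records))  # ({'lighter_skin': 1.0, 'darker_skin': 0.5}, 0.5)
```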
-
arXiv Preprint – Baseline Defenses for Adversarial Attacks Against Aligned Language Models
In this episode we discuss Baseline Defenses for Adversarial Attacks Against Aligned Language Models by Neel Jain, Avi Schwarzschild, Yuxin Wen, Gowthami Somepalli, John Kirchenbauer, Ping-yeh Chiang, Micah Goldblum, Aniruddha Saha, Jonas Geiping, Tom Goldstein. The paper discusses the security vulnerabilities of Large Language Models (LLMs) and explores defense strategies against adversarial attacks. Three types…
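One detection-style defense this line of work examines is perplexity filtering: optimized adversarial suffixes tend to look like high-perplexity gibberish to a language model, so unusually high perplexity is a useful red flag. Here is a minimal sketch using GPT-2 as the scoring model; the model choice and threshold are illustrative assumptions rather than the paper's exact setup.

```python
# Minimal perplexity filter: flag prompts whose perplexity under a small LM
# is suspiciously high (illustrative model and threshold).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token negative log-likelihood
    return float(torch.exp(loss))

def is_suspicious(prompt: str, threshold: float = 1000.0) -> bool:
    return perplexity(prompt) > threshold
```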
-
ICCV 2023 – Verbs in Action: Improving verb understanding in video-language models
In this episode we discuss Verbs in Action: Improving verb understanding in video-language models by Liliane Momeni, Mathilde Caron, Arsha Nagrani, Andrew Zisserman, Cordelia Schmid. The paper proposes a Verb-Focused Contrastive (VFC) framework to address the limited understanding of verbs in video-language models. The framework utilizes pre-trained large language models (LLMs) to generate hard negative…
-
arXiv Preprint – RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
In this episode we discuss RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback by Harrison Lee, Samrat Phatale, Hassan Mansoor, Kellie Lu, Thomas Mesnard, Colton Bishop, Victor Carbune, Abhinav Rastogi. The paper introduces a new technique called RL from AI Feedback (RLAIF) as a solution to the scalability limitations of reinforcement learning from…
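The key step in RLAIF is replacing the human rater with an off-the-shelf LLM that labels which of two candidate responses is better; those AI-labeled preferences then train a reward model as in RLHF. The sketch below shows that labeling step for a summarization-style task; the `call_llm` helper and prompt wording are illustrative assumptions.

```python
# Sketch of AI preference labeling: an off-the-shelf LLM picks the preferred
# response, producing (chosen, rejected) pairs for reward-model training.

def call_llm(prompt: str) -> str:
    """Hypothetical call to an off-the-shelf labeler LLM."""
    raise NotImplementedError

LABEL_TEMPLATE = (
    "A good summary is concise and faithful to the text.\n"
    "Text: {text}\nSummary 1: {a}\nSummary 2: {b}\n"
    "Which summary is better? Answer with 1 or 2."
)

def ai_preference(text: str, a: str, b: str) -> tuple[str, str]:
    verdict = call_llm(LABEL_TEMPLATE.format(text=text, a=a, b=b)).strip()
    return (a, b) if verdict.startswith("1") else (b, a)
```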
-
arXiv Preprint – LLM-Rec: Personalized Recommendation via Prompting Large Language Models
In this episode we discuss LLM-Rec: Personalized Recommendation via Prompting Large Language Models by Hanjia Lyu, Song Jiang, Hanqing Zeng, Qifan Wang, Si Zhang, Ren Chen, Chris Leung, Jiajie Tang, Yinglong Xia, Jiebo Luo. The paper examines different prompting strategies for improving personalized recommendation performance using large language models (LLMs) through input augmentation. The proposed…
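The input-augmentation idea can be sketched as prompting an LLM to enrich an item description before it reaches the recommender; the strategy names and prompt wording below are illustrative placeholders, not the paper's exact prompts.

```python
# Sketch of prompt-based input augmentation for recommendation (illustrative prompts).

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call."""
    raise NotImplementedError

PROMPTS = {
    "basic": "Paraphrase the following item description:\n{desc}",
    "recommendation_driven": (
        "Describe the following item in a way that would help recommend it to users:\n{desc}"
    ),
}

def augment(desc: str, strategy: str = "recommendation_driven") -> str:
    enriched = call_llm(PROMPTS[strategy].format(desc=desc))
    # Concatenate the original and the LLM-enriched text as the recommender input.
    return desc + "\n" + enriched
```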
-
ICCV 2023 – Robust Monocular Depth Estimation under Challenging Conditions
In this episode we discuss Robust Monocular Depth Estimation under Challenging Conditions by Stefano Gasperini, Nils Morbitzer, HyunJun Jung, Nassir Navab, Federico Tombari. The paper addresses the limitations of existing monocular depth estimation methods in challenging lighting and weather conditions. The authors propose md4all, a simple and reliable solution that can handle diverse conditions without…
-
arXiv Preprint – LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
In this episode we discuss LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models by Chi Han, Qifan Wang, Wenhan Xiong, Yu Chen, Heng Ji, Sinong Wang. The paper introduces LM-Infinite as a solution to the challenge of length generalization in Large Language Models (LLMs). Existing methods for handling longer sequences are resource-intensive and time-consuming,…
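A central ingredient reported for LM-Infinite is a Lambda-shaped attention mask: every token attends to a handful of initial "global" tokens plus a sliding local window, which keeps attention bounded at any sequence length. The sketch below builds such a mask; the window sizes are illustrative.

```python
# Build a Lambda-shaped attention mask: global prefix + sliding local window.
import torch

def lambda_mask(seq_len: int, n_global: int = 10, n_local: int = 2048) -> torch.Tensor:
    """Boolean mask where mask[i, j] is True if query i may attend to key j."""
    i = torch.arange(seq_len).unsqueeze(1)  # query positions
    j = torch.arange(seq_len).unsqueeze(0)  # key positions
    causal = j <= i
    global_branch = j < n_global            # always see the first few tokens
    local_branch = (i - j) < n_local        # and a recent local window
    return causal & (global_branch | local_branch)

print(lambda_mask(6, n_global=2, n_local=3).int())
```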
-
arXiv Preprint – Llama 2: Open Foundation and Fine-Tuned Chat Models
In this episode we discuss Llama 2: Open Foundation and Fine-Tuned Chat Models by Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller,…
-
arXiv Preprint – Nougat: Neural Optical Understanding for Academic Documents
In this episode we discuss Nougat: Neural Optical Understanding for Academic Documents by Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic. The paper introduces Nougat, a neural optical understanding model for academic documents. Nougat is a Visual Transformer model that performs Optical Character Recognition (OCR) to convert scientific documents into a markup language, bridging the gap…
-
arXiv Preprint – Graph of Thoughts: Solving Elaborate Problems with Large Language Models
In this episode we discuss Graph of Thoughts: Solving Elaborate Problems with Large Language Models by Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Michal Podstawski, Hubert Niewiadomski, Piotr Nyczyk, Torsten Hoefler. The paper introduces a framework called Graph of Thoughts (GoT) that enhances the prompting capabilities of large…
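The core abstraction is easy to picture: intermediate "thoughts" are vertices of a graph, and operations such as generating new thoughts or aggregating several of them add edges. Here is a toy version of that data structure, using the sorting example popular in this line of work; the class and its methods are illustrative, not the paper's API.

```python
# Toy graph-of-thoughts structure: thoughts are vertices, operations add edges.

class ThoughtGraph:
    def __init__(self):
        self.thoughts = {}   # id -> thought text
        self.edges = []      # (parent_id, child_id)
        self._next_id = 0

    def add(self, text, parents=()):
        tid = self._next_id
        self._next_id += 1
        self.thoughts[tid] = text
        self.edges.extend((p, tid) for p in parents)
        return tid

g = ThoughtGraph()
root = g.add("Sort [3, 1, 2, 4] by splitting it")
left = g.add("Sorted half: [1, 3]", parents=[root])            # generate
right = g.add("Sorted half: [2, 4]", parents=[root])           # generate
merged = g.add("Merged: [1, 2, 3, 4]", parents=[left, right])  # aggregate
print(g.edges)  # [(0, 1), (0, 2), (1, 3), (2, 3)]
```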
-
arXiv Preprint – Large Language Models as Zero-Shot Conversational Recommenders
In this episode we discuss Large Language Models as Zero-Shot Conversational Recommenders by Zhankui He, Zhouhang Xie, Rahul Jha, Harald Steck, Dawen Liang, Yesu Feng, Bodhisattwa Prasad Majumder, Nathan Kallus, Julian McAuley. This paper presents empirical studies on conversational recommendation tasks using large language models (LLMs) in a zero-shot setting, without fine-tuning. The authors introduce…
-
arXiv Preprint – A Survey on Large Language Model based Autonomous Agents
In this episode we discuss A Survey on Large Language Model based Autonomous Agents by Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, Wayne Xin Zhao, Zhewei Wei, Ji-Rong Wen. The authors of this paper conducted a comprehensive survey on the topic of…
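Surveys in this area typically decompose an agent into profile, memory, planning, and action components; the sketch below is a generic loop in that spirit, with a hypothetical `call_llm` backend, and is not a specific system from the survey.

```python
# Generic LLM-agent loop: profile + memory feed a planning call that yields an action.

def call_llm(prompt: str) -> str:
    """Hypothetical LLM backend."""
    raise NotImplementedError

class Agent:
    def __init__(self, profile: str):
        self.profile = profile        # role / persona description
        self.memory: list[str] = []   # past observations and actions

    def step(self, observation: str) -> str:
        self.memory.append(f"Observation: {observation}")
        action = call_llm(
            f"{self.profile}\nRecent memory:\n"
            + "\n".join(self.memory[-10:])
            + "\nDecide the next action."
        )
        self.memory.append(f"Action: {action}")
        return action
```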
-
arXiv Preprint – EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding
In this episode we discuss EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding by Karttikeya Mangalam, Raiymbek Akshulakov, Jitendra Malik. The paper presents EgoSchema, a benchmark dataset and evaluation metric for assessing the long-form video language understanding capabilities of vision and language systems. The dataset consists of over 5000 multiple-choice question-answer pairs…
-
ICCV 2023 – UnLoc: A Unified Framework for Video Localization Tasks
In this episode we discuss UnLoc: A Unified Framework for Video Localization Tasks by Shen Yan, Xuehan Xiong, Arsha Nagrani, Anurag Arnab, Zhonghao Wang, Weina Ge, David Ross, Cordelia Schmid. The paper introduces UnLoc, a unified framework for video localization using large-scale image-text pretrained models. UnLoc eliminates the need for action proposals, motion-based features, and…
-
arXiv Preprint – Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies
In this episode we discuss Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies by Liangming Pan, Michael Saxon, Wenda Xu, Deepak Nathani, Xinyi Wang, William Yang Wang. The paper provides a comprehensive review of self-correction strategies for large language models (LLMs). It examines recent work on self-correction techniques, categorizing them into…
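A common post-hoc pattern covered by this kind of survey is the draft, critique, and revise loop: the model answers, critiques its own answer, and revises until the critique finds nothing to fix. The prompts and stopping rule below are illustrative.

```python
# Sketch of a draft-critique-revise self-correction loop (illustrative prompts).

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call."""
    raise NotImplementedError

def self_correct(question: str, max_rounds: int = 3) -> str:
    answer = call_llm(f"Question: {question}\nAnswer:")
    for _ in range(max_rounds):
        critique = call_llm(
            f"Question: {question}\nAnswer: {answer}\n"
            "List any factual or logical errors, or reply 'no errors'."
        )
        if "no errors" in critique.lower():
            break
        answer = call_llm(
            f"Question: {question}\nDraft answer: {answer}\n"
            f"Feedback: {critique}\nWrite a corrected answer:"
        )
    return answer
```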
-
ICLR 2023 – Rethinking the Expressive Power of GNNs via Graph Biconnectivity
In this episode we discuss Rethinking the Expressive Power of GNNs via Graph Biconnectivity by Bohang Zhang, Shengjie Luo, Liwei Wang, Di He. This paper introduces a new approach called Generalized Distance Weisfeiler-Lehman (GD-WL) to study the expressive power of Graph Neural Networks (GNNs). The authors show that most existing GNN architectures are not expressive…
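The biconnectivity properties in question, cut vertices and cut edges, are cheap to compute classically, which is what makes them a natural expressiveness test; the snippet below shows them on a small graph with networkx (this illustrates the graph property, not the GD-WL algorithm itself).

```python
# Cut vertices (articulation points) and cut edges (bridges) on a small graph.
import networkx as nx

G = nx.Graph([(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)])  # a triangle with a tail
bridges = sorted(tuple(sorted(e)) for e in nx.bridges(G))
print(sorted(nx.articulation_points(G)))  # [2, 3] -> cut vertices
print(bridges)                            # [(2, 3), (3, 4)] -> cut edges
```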
-
ICLR 2023 – Conditional Antibody Design as 3D Equivariant Graph Translation
In this episode we discuss Conditional Antibody Design as 3D Equivariant Graph Translation by Xiangzhe Kong, Wenbing Huang, Yang Liu. The paper introduces a method called Multi-channel Equivariant Attention Network (MEAN) for antibody design. MEAN addresses challenges faced by existing deep-learning-based methods by formulating antibody design as a conditional graph translation problem and incorporating additional…
-
arXiv Preprint – ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation
In this episode we discuss ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation by Xuefeng Hu, Ke Zhang, Lu Xia, Albert Chen, Jiajia Luo, Yuyin Sun, Ken Wang, Nan Qiao, Xiao Zeng, Min Sun, Cheng-Hao Kuo, Ram Nevatia. The paper discusses ReCLIP, a source-free domain adaptation method for large-scale pre-trained vision-language models…
-
arXiv Preprint – LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
In this episode we discuss LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition by Chengsong Huang, Qian Liu, Bill Yuchen Lin, Tianyu Pang, Chao Du, Min Lin. The paper presents LoraHub, a framework for combining low-rank adaptation (LoRA) modules to improve cross-task generalization in fine-tuning large language models (LLMs). LoraHub allows the assembly of LoRA modules…
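At its core the composition mixes several upstream LoRA updates into one using scalar weights found by a gradient-free search on a few task examples. The sketch below shows one simple way to mix the low-rank updates; the paper's exact composition formula and its weight search are not reproduced here, and the shapes and weights are illustrative.

```python
# Mix several LoRA low-rank updates with scalar weights (illustrative composition).
import torch

def compose_lora(modules: list[tuple[torch.Tensor, torch.Tensor]],
                 weights: list[float]) -> torch.Tensor:
    """modules: list of (A, B) factors with A of shape (r, d_in) and B of shape (d_out, r).
    Returns a single combined weight update of shape (d_out, d_in)."""
    return sum(w * (B @ A) for w, (A, B) in zip(weights, modules))

d_in, d_out, r = 16, 16, 4
modules = [(torch.randn(r, d_in), torch.randn(d_out, r)) for _ in range(3)]
delta_W = compose_lora(modules, weights=[0.5, 0.3, 0.2])
print(delta_W.shape)  # torch.Size([16, 16])
```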
-
ICLR 2023 – Emergence of Maps in the Memories of Blind Navigation Agents
In this episode we discuss Emergence of Maps in the Memories of Blind Navigation Agents by Erik Wijmans, Manolis Savva, Irfan Essa, Stefan Lee, Ari S. Morcos, Dhruv Batra. The paper explores whether blind artificial intelligence agents can develop implicit maps of their environment. The study involves training these agents in navigation tasks and finding…