Category: Uncategorized

arxiv preprint – Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

In this episode, we discuss Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention by Tsendsuren Munkhdalai, Manaal Faruqui, Siddharth Gopal. The paper presents a novel method for enabling Transformer-based Large Language Models to process extremely long inputs while keeping memory and computational requirements fixed. The technique introduced, called Infini-attention, blends a new form…

April 12, 2024
arxiv preprint – Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

In this episode, we discuss Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs by Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeffrey Nichols, Yinfei Yang, Zhe Gan. The paper presents Ferret-UI, a new multimodal large language model tailored for interpreting and interacting with mobile user interface screens, which overcomes common challenges through…

April 11, 2024
arxiv preprint – Evaluating Text-to-Visual Generation with Image-to-Text Generation

In this episode, we discuss Evaluating Text-to-Visual Generation with Image-to-Text Generation by Zhiqiu Lin, Deepak Pathak, Baiqi Li, Jiayao Li, Xide Xia, Graham Neubig, Pengchuan Zhang, Deva Ramanan. The paper introduces VQAScore, a novel metric for evaluating the alignment of generated images to text prompts, utilizing a visual-question-answering model to score the relevance of images…

April 10, 2024
arxiv preprint – Future Lens: Anticipating Subsequent Tokens from a Single Hidden State

In this episode, we discuss Future Lens: Anticipating Subsequent Tokens from a Single Hidden State by Koyena Pal, Jiuding Sun, Andrew Yuan, Byron C. Wallace, David Bau. The paper investigates if single hidden state vectors from an input token in a model like GPT-J-6B can predict multiple future tokens in a sequence. Using linear approximation…

April 9, 2024
arxiv preprint – Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity

In this episode, we discuss Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity by Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, Jong C. Park. The paper introduces an adaptive QA model that optimizes the balance between efficiency and accuracy by choosing the appropriate response strategy for questions of varying complexity.…

April 8, 2024
arxiv preprint – Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

In this episode, we discuss Mixture-of-Depths: Dynamically allocating compute in transformer-based language models by David Raposo, Sam Ritter, Blake Richards, Timothy Lillicrap, Peter Conway Humphreys, Adam Santoro. The study presents a method for transformers that allows for the dynamic allocation of computational resources within sequences by limiting the number of tokens processed at each layer…

April 5, 2024
arxiv preprint – WavLLM: Towards Robust and Adaptive Speech Large Language Model

In this episode, we discuss WavLLM: Towards Robust and Adaptive Speech Large Language Model by Shujie Hu, Long Zhou, Shujie Liu, Sanyuan Chen, Hongkun Hao, Jing Pan, Xunying Liu, Jinyu Li, Sunit Sivasankaran, Linquan Liu, Furu Wei. The paper introduces WavLLM, a robust speech large language model with a unique dual-encoder system—one for semantic content…

April 4, 2024
arxiv preprint – Gecko: Versatile Text Embeddings Distilled from Large Language Models

In this episode, we discuss Gecko: Versatile Text Embeddings Distilled from Large Language Models by Jinhyuk Lee, Zhuyun Dai, Xiaoqi Ren, Blair Chen, Daniel Cer, Jeremy R. Cole, Kai Hui, Michael Boratko, Rajvi Kapadia, Wen Ding, Yi Luan, Sai Meher Karthik Duddu, Gustavo Hernandez Abrego, Weiqiang Shi, Nithi Gupta, Aditya Kusupati, Prateek Jain, Siddhartha Reddy…

April 3, 2024
arxiv preprint – ReALM: Reference Resolution As Language Modeling

In this episode, we discuss ReALM: Reference Resolution As Language Modeling by Joel Ruben Antony Moniz, Soundarya Krishnan, Melis Ozyildirim, Prathamesh Saraf, Halim Cagri Ates, Yuan Zhang, Hong Yu, Nidhi Rajshree. This paper presents a method for using Large Language Models (LLMs) to resolve references, including complex ones such as entities on a user’s screen…

April 2, 2024
arxiv preprint – sDPO: Don’t Use Your Data All at Once

In this episode, we discuss sDPO: Don’t Use Your Data All at Once by Dahyun Kim, Yungi Kim, Wonho Song, Hyeonwoo Kim, Yunsu Kim, Sanghoon Kim, Chanjun Park. The paper introduces stepwise DPO (sDPO), a novel technique for better aligning large language models (LLM) with human preferences by utilizing preference datasets in stages rather than…

April 1, 2024
arxiv preprint – LITA: Language Instructed Temporal-Localization Assistant

In this episode, we discuss LITA: Language Instructed Temporal-Localization Assistant by De-An Huang, Shijia Liao, Subhashree Radhakrishnan, Hongxu Yin, Pavlo Molchanov, Zhiding Yu, Jan Kautz. The paper introduces the Language Instructed Temporal-Localization Assistant (LITA), which tackles the issue of temporal localization in Large Language Models (LLMs) processing video content, where they struggle to identify “when”…

March 29, 2024
arxiv preprint – AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks

In this episode, we discuss AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks by Max Ku, Cong Wei, Weiming Ren, Harry Yang, Wenhu Chen. This paper presents AnyV2V, a framework that simplifies video-to-video editing by breaking it down into two main steps. It leverages existing image editing models to edit individual frames and then…

March 28, 2024
arxiv preprint – InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

In this episode, we discuss InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding by Yi Wang, Kunchang Li, Xinhao Li, Jiashuo Yu, Yinan He, Guo Chen, Baoqi Pei, Rongkun Zheng, Jilan Xu, Zun Wang, Yansong Shi, Tianxiang Jiang, Songze Li, Hongjie Zhang, Yifei Huang, Yu Qiao, Yali Wang, Limin Wang. InternVideo2 is a cutting-edge…

March 27, 2024
arxiv preprint – Giraffe: Adventures in Expanding Context Lengths in LLMs

In this episode, we discuss Giraffe: Adventures in Expanding Context Lengths in LLMs by Arka Pal, Deep Karkhanis, Manley Roberts, Samuel Dooley, Arvind Sundararajan, Siddartha Naidu. The paper reviews techniques for overcoming the fixed context length limitation in large language models like LLaMA or LLaMA 2 by modifying positional encodings and introduces a new truncation…

March 26, 2024
arxiv preprint – Explorative Inbetweening of Time and Space

In this episode, we discuss Explorative Inbetweening of Time and Space by Haiwen Feng, Zheng Ding, Zhihao Xia, Simon Niklaus, Victoria Abrevaya, Michael J. Black, Xuaner Zhang. The paper presents a method for generating video sequences from just a starting and ending frame, called bounded generation, by utilizing a new sampling strategy named Time Reversal…

March 25, 2024
arxiv preprint – Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

In this episode, we discuss Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking by Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, Noah D. Goodman. The paper presents Quiet-STaR, an advancement of Self-Taught Reasoner (STaR), which teaches Language Models to generate internal rationales to enhance text predictions. By introducing a tokenwise…

March 22, 2024
arxiv preprint – Evaluating Large Language Models at Evaluating Instruction Following

In this episode, we discuss Evaluating Large Language Models at Evaluating Instruction Following by Zhiyuan Zeng, Jiatong Yu, Tianyu Gao, Yu Meng, Tanya Goyal, Danqi Chen. This paper examines the effectiveness of using large language models (LLMs) to evaluate the performance of other models in following instructions, and introduces a new meta-evaluation benchmark called LLM-BAR.…

March 21, 2024
arxiv preprint – LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

In this episode, we discuss LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning by Hongye Jin, Xiaotian Han, Jingfeng Yang, Zhimeng Jiang, Zirui Liu, Chia-Yuan Chang, Huiyuan Chen, Xia Hu. The paper presents SelfExtend, a novel method for extending the context window of Large Language Models (LLMs) to better handle long input sequences without…

February 28, 2024
arxiv preprint – Branch-Solve-Merge Improves Large Language Model Evaluation and Generation

In this episode, we discuss Branch-Solve-Merge Improves Large Language Model Evaluation and Generation by Swarnadeep Saha, Omer Levy, Asli Celikyilmaz, Mohit Bansal, Jason Weston, Xian Li. The paper introduces a program called BRANCH-SOLVE-MERGE (BSM) designed to enhance the performance of Large Language Models (LLMs) on complex natural language tasks. BSM uses a three-module approach that…

February 27, 2024
arxiv preprint – SciMON: Scientific Inspiration Machines Optimized for Novelty

In this episode, we discuss SciMON: Scientific Inspiration Machines Optimized for Novelty by Qingyun Wang, Doug Downey, Heng Ji, Tom Hope. The paper presents SCIMON, a new framework designed to push neural language models towards generating innovative scientific ideas that are informed by existing literature, going beyond simple binary link prediction. SCIMON generates natural language…

February 26, 2024
arxiv preprint – Speculative Streaming: Fast LLM Inference without Auxiliary Models

In this episode, we discuss Speculative Streaming: Fast LLM Inference without Auxiliary Models by Nikhil Bhendawade, Irina Belousova, Qichen Fu, Henry Mason, Mohammad Rastegari, Mahyar Najibi. The paper introduces Speculative Streaming, a method designed to quickly infer outputs from large language models without needing auxiliary models, unlike the current speculative decoding technique. This new approach…

February 23, 2024
arxiv preprint – LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models

In this episode, we discuss LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models by Yanwei Li, Chengyao Wang, Jiaya Jia. The paper introduces a new approach named LLaMA-VID for improving the processing of lengthy videos in Vision Language Models (VLMs) by using a dual token system: a context token and a content…

February 22, 2024
arxiv preprint – UPAR: A Kantian-Inspired Prompting Framework for Enhancing Large Language Model Capabilities

In this episode, we discuss UPAR: A Kantian-Inspired Prompting Framework for Enhancing Large Language Model Capabilities by Hejia Geng, Boxun Xu, Peng Li. The paper introduces the UPAR framework for Large Language Models (LLMs) to enhance their inferential abilities by structuring their processes similar to human cognition. UPAR includes four stages: Understand, Plan, Act, and…

February 21, 2024
arxiv preprint – Guiding Instruction-based Image Editing via Multimodal Large Language Models

In this episode, we discuss Guiding Instruction-based Image Editing via Multimodal Large Language Models by Tsu-Jui Fu, Wenze Hu, Xianzhi Du, William Yang Wang, Yinfei Yang, Zhe Gan. The paper introduces MLLM-Guided Image Editing (MGIE), a system that uses multimodal large language models (MLLMs) to enhance the quality of instruction-based image editing. MGIE generates more…

February 20, 2024
arxiv preprint – Spectral State Space Models

In this episode, we discuss Spectral State Space Models by Naman Agarwal, Daniel Suo, Xinyi Chen, Elad Hazan. The paper introduces a new type of state space model (SSM) for sequence prediction that utilizes spectral filtering to handle long-range dependencies in data. These spectral state space models (SSMs) are shown to be robust, as their…

February 16, 2024