arxiv preprint – Gecko: Versatile Text Embeddings Distilled from Large Language Models
In this episode, we discuss Gecko: Versatile Text Embeddings Distilled from Large Language Models by Jinhyuk Lee, Zhuyun Dai, Xiaoqi Ren, Blair Chen, Daniel Cer, Jeremy R. Cole, Kai Hui, Michael Boratko, Rajvi Kapadia, Wen Ding, Yi Luan, Sai Meher Karthik Duddu, Gustavo Hernandez Abrego, Weiqiang Shi, Nithi Gupta, Aditya Kusupati, Prateek Jain, Siddhartha Reddy…
-
arxiv preprint – ReALM: Reference Resolution As Language Modeling
In this episode, we discuss ReALM: Reference Resolution As Language Modeling by Joel Ruben Antony Moniz, Soundarya Krishnan, Melis Ozyildirim, Prathamesh Saraf, Halim Cagri Ates, Yuan Zhang, Hong Yu, Nidhi Rajshree. This paper presents a method for using Large Language Models (LLMs) to resolve references, including complex ones such as entities on a user’s screen…
-
arxiv preprint – sDPO: Don’t Use Your Data All at Once
In this episode, we discuss sDPO: Don’t Use Your Data All at Once by Dahyun Kim, Yungi Kim, Wonho Song, Hyeonwoo Kim, Yunsu Kim, Sanghoon Kim, Chanjun Park. The paper introduces stepwise DPO (sDPO), a novel technique for better aligning large language models (LLMs) with human preferences by utilizing preference datasets in stages rather than…
-
arxiv preprint – LITA: Language Instructed Temporal-Localization Assistant
In this episode, we discuss LITA: Language Instructed Temporal-Localization Assistant by De-An Huang, Shijia Liao, Subhashree Radhakrishnan, Hongxu Yin, Pavlo Molchanov, Zhiding Yu, Jan Kautz. The paper introduces the Language Instructed Temporal-Localization Assistant (LITA), which tackles temporal localization for Large Language Models (LLMs) that process video content, where they struggle to identify “when”…
-
arxiv preprint – AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks
In this episode, we discuss AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks by Max Ku, Cong Wei, Weiming Ren, Harry Yang, Wenhu Chen. This paper presents AnyV2V, a framework that simplifies video-to-video editing by breaking it down into two main steps. It leverages existing image editing models to edit individual frames and then…
-
arxiv preprint – InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
In this episode, we discuss InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding by Yi Wang, Kunchang Li, Xinhao Li, Jiashuo Yu, Yinan He, Guo Chen, Baoqi Pei, Rongkun Zheng, Jilan Xu, Zun Wang, Yansong Shi, Tianxiang Jiang, Songze Li, Hongjie Zhang, Yifei Huang, Yu Qiao, Yali Wang, Limin Wang. InternVideo2 is a cutting-edge…
-
arxiv preprint – Giraffe: Adventures in Expanding Context Lengths in LLMs
In this episode, we discuss Giraffe: Adventures in Expanding Context Lengths in LLMs by Arka Pal, Deep Karkhanis, Manley Roberts, Samuel Dooley, Arvind Sundararajan, Siddartha Naidu. The paper reviews techniques for overcoming the fixed context length limitation in large language models like LLaMA or LLaMA 2 by modifying positional encodings and introduces a new truncation…
-
arxiv preprint – Explorative Inbetweening of Time and Space
In this episode, we discuss Explorative Inbetweening of Time and Space by Haiwen Feng, Zheng Ding, Zhihao Xia, Simon Niklaus, Victoria Abrevaya, Michael J. Black, Xuaner Zhang. The paper presents a method for generating video sequences from just a starting and ending frame, called bounded generation, by utilizing a new sampling strategy named Time Reversal…
-
arxiv preprint – Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
In this episode, we discuss Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking by Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, Noah D. Goodman. The paper presents Quiet-STaR, an advancement of Self-Taught Reasoner (STaR), which teaches Language Models to generate internal rationales to enhance text predictions. By introducing a tokenwise…
-
arxiv preprint – Evaluating Large Language Models at Evaluating Instruction Following
In this episode, we discuss Evaluating Large Language Models at Evaluating Instruction Following by Zhiyuan Zeng, Jiatong Yu, Tianyu Gao, Yu Meng, Tanya Goyal, Danqi Chen. This paper examines the effectiveness of using large language models (LLMs) to evaluate how well other models follow instructions, and introduces a new meta-evaluation benchmark called LLMBar.…
-
arxiv preprint – LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
In this episode, we discuss LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning by Hongye Jin, Xiaotian Han, Jingfeng Yang, Zhimeng Jiang, Zirui Liu, Chia-Yuan Chang, Huiyuan Chen, Xia Hu. The paper presents SelfExtend, a novel method for extending the context window of Large Language Models (LLMs) to better handle long input sequences without…
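The core of SelfExtend can be sketched as a remapping of relative positions: tokens inside a neighbor window keep their exact relative positions, while more distant tokens are mapped to coarser, floor-divided group positions so they stay within the range seen during training. A minimal sketch of that mapping; the shift term that keeps the two regimes continuous is my reading of the method's description, not code from the paper:

```python
def self_extend_rel_pos(q_pos: int, k_pos: int, group_size: int, window: int) -> int:
    """Map a (query, key) position pair to the relative position SelfExtend-style
    attention would use. `window` is the neighbor window; `group_size` is the
    floor-division factor for distant tokens."""
    rel = q_pos - k_pos
    if rel < window:
        # Nearby tokens: ordinary relative position, unchanged.
        return rel
    # Distant tokens: grouped positions via floor division, shifted so the
    # grouped range continues where the neighbor window ends (assumed shift).
    return q_pos // group_size - k_pos // group_size + window - window // group_size
```

With `group_size=4` and `window=4`, a key 20 tokens away maps to relative position 8 rather than 20, keeping distant context within the positions the model was trained on.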
-
arxiv preprint – Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
In this episode, we discuss Branch-Solve-Merge Improves Large Language Model Evaluation and Generation by Swarnadeep Saha, Omer Levy, Asli Celikyilmaz, Mohit Bansal, Jason Weston, Xian Li. The paper introduces a program called BRANCH-SOLVE-MERGE (BSM) designed to enhance the performance of Large Language Models (LLMs) on complex natural language tasks. BSM uses a three-module approach that…
-
arxiv preprint – SciMON: Scientific Inspiration Machines Optimized for Novelty
In this episode, we discuss SciMON: Scientific Inspiration Machines Optimized for Novelty by Qingyun Wang, Doug Downey, Heng Ji, Tom Hope. The paper presents SCIMON, a new framework designed to push neural language models towards generating innovative scientific ideas that are informed by existing literature, going beyond simple binary link prediction. SCIMON generates natural language…
-
arxiv preprint – Speculative Streaming: Fast LLM Inference without Auxiliary Models
In this episode, we discuss Speculative Streaming: Fast LLM Inference without Auxiliary Models by Nikhil Bhendawade, Irina Belousova, Qichen Fu, Henry Mason, Mohammad Rastegari, Mahyar Najibi. The paper introduces Speculative Streaming, a method designed to quickly infer outputs from large language models without needing auxiliary models, unlike the current speculative decoding technique. This new approach…
-
arxiv preprint – LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
In this episode, we discuss LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models by Yanwei Li, Chengyao Wang, Jiaya Jia. The paper introduces a new approach named LLaMA-VID for improving the processing of lengthy videos in Vision Language Models (VLMs) by using a dual token system: a context token and a content…
-
arxiv preprint – UPAR: A Kantian-Inspired Prompting Framework for Enhancing Large Language Model Capabilities
In this episode, we discuss UPAR: A Kantian-Inspired Prompting Framework for Enhancing Large Language Model Capabilities by Hejia Geng, Boxun Xu, Peng Li. The paper introduces the UPAR framework for Large Language Models (LLMs) to enhance their inferential abilities by structuring their processes similar to human cognition. UPAR includes four stages: Understand, Plan, Act, and…
-
arxiv preprint – Guiding Instruction-based Image Editing via Multimodal Large Language Models
In this episode, we discuss Guiding Instruction-based Image Editing via Multimodal Large Language Models by Tsu-Jui Fu, Wenze Hu, Xianzhi Du, William Yang Wang, Yinfei Yang, Zhe Gan. The paper introduces MLLM-Guided Image Editing (MGIE), a system that uses multimodal large language models (MLLMs) to enhance the quality of instruction-based image editing. MGIE generates more…
-
arxiv preprint – Spectral State Space Models
In this episode, we discuss Spectral State Space Models by Naman Agarwal, Daniel Suo, Xinyi Chen, Elad Hazan. The paper introduces a new type of state space model (SSM) for sequence prediction that utilizes spectral filtering to handle long-range dependencies in data. These spectral SSMs are shown to be robust, as their…
-
arxiv preprint – More Agents Is All You Need
In this episode, we discuss More Agents Is All You Need by Junyou Li, Qin Zhang, Yangbin Yu, Qiang Fu, Deheng Ye. The study demonstrates that the effectiveness of large language models (LLMs) improves when more instances of the model (agents) are used in a simple sampling-and-voting technique. This technique can be combined with other…
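The sampling-and-voting procedure described here is simple enough to sketch: draw several independent samples from the same model and return the majority answer. A minimal sketch, assuming `generate` is any callable mapping a prompt to one sampled answer (a hypothetical stand-in, not an API from the paper):

```python
from collections import Counter

def majority_vote(generate, prompt: str, n: int = 10) -> str:
    """Sample n answers from the model and return the most frequent one."""
    answers = [generate(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```

In practice `generate` would call an LLM with temperature above zero so the samples differ; the study's finding is that accuracy tends to improve as `n` grows.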
-
arxiv preprint – World Model on Million-Length Video And Language With RingAttention
In this episode, we discuss World Model on Million-Length Video And Language With RingAttention by Hao Liu, Wilson Yan, Matei Zaharia, Pieter Abbeel. The paper discusses the creation of large-scale transformers trained on extended video and language sequences, introducing methods such as RingAttention to manage the training of models with context sizes up to 1M…
-
arxiv preprint – Learning Video Representations from Large Language Models
In this episode, we discuss Learning Video Representations from Large Language Models by Yue Zhao, Ishan Misra, Philipp Krähenbühl, Rohit Girdhar. The LAVILA method introduces a novel technique to enhance video-language representations by utilizing pre-trained Large Language Models (LLMs) to generate automatic video narrations. By using these auto-generated narrations, LAVILA achieves more detailed coverage, better…
-
arxiv preprint – Can Large Language Models Understand Context?
In this episode, we discuss Can Large Language Models Understand Context? by Yilun Zhu, Joel Ruben Antony Moniz, Shruti Bhargava, Jiarui Lu, Dhivya Piraviperumal, Site Li, Yuan Zhang, Hong Yu, Bo-Hsiang Tseng. The paper introduces a novel benchmark consisting of four tasks and nine datasets aimed at rigorously evaluating Large Language Models’ (LLMs) ability to…
-
arxiv preprint – Long Story Short: a Summarize-then-Search Method for Long Video Question Answering
In this episode, we discuss Long Story Short: a Summarize-then-Search Method for Long Video Question Answering by Jiwan Chung, Youngjae Yu. The paper presents “Long Story Short,” a new framework for video question-answering (QA) tasks that involves summarizing long multimodal narratives (like movies or dramas) into brief plots. This summary is then used to find…
-
arxiv preprint – System 2 Attention (is something you might need too)
In this episode, we discuss System 2 Attention (is something you might need too) by Jason Weston, Sainbayar Sukhbaatar. The paper introduces System 2 Attention (S2A), an approach that improves Transformer-based Large Language Models by regenerating input contexts to focus on relevant information before processing, thereby enhancing the generation of the next token. S2A was…
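The two-stage process described here can be sketched as two calls to the same model: first regenerate the context to keep only question-relevant material, then answer from that regenerated context. A minimal sketch, assuming `llm` is a hypothetical prompt-to-text callable; the prompt wording is illustrative, not the paper's:

```python
def system2_attention(llm, context: str, question: str) -> str:
    """Answer a question via a System-2-Attention-style two-stage prompt."""
    # Stage 1: regenerate the context, keeping only material relevant
    # to the question (drops distractors and leading opinions).
    rewrite_prompt = (
        "Extract only the parts of the following text that are relevant "
        "to the question, removing irrelevant or leading content.\n"
        f"Text: {context}\nQuestion: {question}"
    )
    cleaned = llm(rewrite_prompt)
    # Stage 2: answer using only the regenerated context.
    return llm(f"Context: {cleaned}\n\nQuestion: {question}")
```

The point of the intermediate rewrite is that the final answer is conditioned only on the regenerated context, so spurious content in the original input cannot bias next-token generation.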
-
arxiv preprint – DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
In this episode, we discuss DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models by Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Mingchuan Zhang, Y. K. Li, Y. Wu, Daya Guo. The paper presents DeepSeekMath 7B, an advanced language model trained on 120 billion math-related tokens to improve mathematical reasoning.…