arxiv preprint – Weight subcloning: direct initialization of transformers using larger pretrained ones
In this episode we discuss Weight subcloning: direct initialization of transformers using larger pretrained ones by Mohammad Samragh, Mehrdad Farajtabar, Sachin Mehta, Raviteja Vemulapalli, Fartash Faghri, Devang Naik, Oncel Tuzel, Mohammad Rastegari. The paper introduces a new method called weight subcloning to expedite the training of small transformer models by initializing them with weights from…
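For a rough intuition of the idea, here is a minimal sketch of initializing a smaller layer from a larger pretrained one by slicing its weight matrix. The dimensions and the block-selection rule are placeholders, not the paper's actual neuron-ranking procedure.

```python
import numpy as np

# Illustrative sketch only: a pretrained layer with hidden size 1024 and a
# target "subcloned" layer with hidden size 512 (both hypothetical).
d_large, d_small = 1024, 512
W_large = np.random.randn(d_large, d_large).astype(np.float32)  # stand-in for pretrained weights

# Initialize the smaller layer from a sub-block of the larger matrix instead of
# random noise. (The paper ranks and selects neurons; taking the leading block
# is just a placeholder for that selection step.)
W_small_init = W_large[:d_small, :d_small].copy()
```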
-
arxiv preprint – Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task
In this episode we discuss Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task by Maya Okawa, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka. The paper investigates how conditional diffusion models generalize compositionally by studying their ability to generate novel data combinations within a controlled synthetic environment. Key discoveries include that compositional…
-
arxiv preprint – LLM in a flash: Efficient Large Language Model Inference with Limited Memory
In this episode, we discuss LLM in a flash: Efficient Large Language Model Inference with Limited Memory by Keivan Alizadeh, Iman Mirzadeh, Dmitry Belenko, Karen Khatamifard, Minsik Cho, Carlo C Del Mundo, Mohammad Rastegari, Mehrdad Farajtabar. The paper introduces an approach to operate large language models (LLMs) efficiently on devices with limited DRAM by using…
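As a toy illustration of the general direction (not the paper's windowing or row-column bundling techniques), one can keep large weight tensors on disk and page in only the rows that are needed, for example via a memory map. The file name, shape, and "active rows" below are made up.

```python
import numpy as np

# Keep a large FFN weight matrix on disk and memory-map it so DRAM only holds
# the rows we actually touch. File name and shape are hypothetical.
W = np.memmap("ffn_weights.bin", dtype=np.float32, mode="w+", shape=(50_000, 4096))

active_rows = [3, 17, 42]              # e.g. neurons predicted to be active for this token
W_active = np.asarray(W[active_rows])  # only these rows are read from flash into DRAM
```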
-
arxiv preprint – The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
In this episode, we discuss The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction by Pratyusha Sharma, Jordan T. Ash, Dipendra Misra. The paper presents Layer-Selective Rank Reduction (LASER), an innovative method that enhances Transformer-based Large Language Models (LLMs) by reducing higher-order features in their weight matrices post-training, without adding…
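The core operation LASER applies, replacing a weight matrix with a low-rank approximation via truncated SVD, is easy to sketch. Which layer and rank to target is the paper's contribution and is not reproduced here; the matrix below is a random stand-in.

```python
import numpy as np

def low_rank_approx(W: np.ndarray, rank: int) -> np.ndarray:
    """Best rank-`rank` approximation of W in the least-squares sense (truncated SVD)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * S[:rank]) @ Vt[:rank, :]

# Toy example on a random stand-in for an MLP weight matrix.
W = np.random.randn(768, 3072).astype(np.float32)
W_reduced = low_rank_approx(W, rank=64)
```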
-
arxiv preprint – DreaMoving: A Human Video Generation Framework based on Diffusion Models
In this episode we discuss DreaMoving: A Human Video Generation Framework based on Diffusion Models by Mengyang Feng, Jinlin Liu, Kai Yu, Yuan Yao, Zheng Hui, Xiefan Guo, Xianhui Lin, Haolan Xue, Chen Shi, Xiaowen Li, Aojie Li, Xiaoyang Kang, Biwen Lei, Miaomiao Cui, Peiran Ren, Xuansong Xie. DreaMoving is a framework that uses diffusion…
-
arxiv preprint – Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
In this episode we discuss Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution by Mostafa Dehghani, Basil Mustafa, Josip Djolonga, Jonathan Heek, Matthias Minderer, Mathilde Caron, Andreas Steiner, Joan Puigcerver, Robert Geirhos, Ibrahim Alabdulmohsin, Avital Oliver, Piotr Padlewski, Alexey Gritsenko, Mario Lučić, Neil Houlsby. The paper introduces NaViT (Native Resolution…
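A minimal sketch of the packing idea: images of different resolutions yield different numbers of patches, which are concatenated into one sequence along with per-image IDs that would drive a block-diagonal attention mask. This illustrates sequence packing in general, not NaViT's full pipeline (no token dropping, no factorized position embeddings).

```python
import torch

def patchify(img: torch.Tensor, p: int = 16) -> torch.Tensor:
    """Split a (C, H, W) image into non-overlapping p x p patches -> (num_patches, C*p*p)."""
    C, H, W = img.shape
    img = img[:, : H // p * p, : W // p * p]          # drop any remainder
    patches = img.unfold(1, p, p).unfold(2, p, p)     # (C, H/p, W/p, p, p)
    return patches.permute(1, 2, 0, 3, 4).reshape(-1, C * p * p)

# Two images with different aspect ratios and resolutions, packed into one sequence.
imgs = [torch.randn(3, 224, 224), torch.randn(3, 160, 320)]
patch_seqs = [patchify(im) for im in imgs]
packed = torch.cat(patch_seqs)                        # (total_patches, patch_dim)
image_ids = torch.cat([torch.full((len(s),), i) for i, s in enumerate(patch_seqs)])
# image_ids would mask attention so patches only attend within their own image.
```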
-
arxiv preprint – UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
In this episode, we discuss UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces by Jiannan Wu, Yi Jiang, Bin Yan, Huchuan Lu, Zehuan Yuan, Ping Luo. The paper introduces UniRef++, a unified architecture designed to address four reference-based object segmentation tasks: referring image segmentation (RIS), few-shot image segmentation (FSS), referring video object segmentation…
-
arxiv preprint – LongNet: Scaling Transformers to 1,000,000,000 Tokens
In this episode we discuss LongNet: Scaling Transformers to 1,000,000,000 Tokens by Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, Nanning Zheng, Furu Wei. LONGNET is a new Transformer variant that allows for efficient processing of sequences over 1 billion tokens long using a novel dilated attention mechanism. This mechanism provides…
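A heavily simplified, single-configuration sketch of the dilated-attention idea: attention is restricted to every `dilation`-th position within fixed-length segments, so cost grows roughly linearly in sequence length. The real mechanism mixes multiple segment-length/dilation pairs across heads and is causal; none of that is shown here.

```python
import torch

def dilated_attention(q, k, v, segment_len: int = 8, dilation: int = 2):
    """Toy single-head sketch: attend only among every `dilation`-th token of each segment."""
    B, T, D = q.shape
    out = torch.zeros_like(q)
    for start in range(0, T, segment_len):
        idx = torch.arange(start, min(start + segment_len, T), dilation)
        qs, ks, vs = q[:, idx], k[:, idx], v[:, idx]
        attn = torch.softmax(qs @ ks.transpose(-1, -2) / D ** 0.5, dim=-1)
        out[:, idx] = attn @ vs
    return out

q = k = v = torch.randn(1, 64, 32)
y = dilated_attention(q, k, v)
```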
-
arxiv preprint – MotionCtrl: A Unified and Flexible Motion Controller for Video Generation
In this episode, we discuss MotionCtrl: A Unified and Flexible Motion Controller for Video Generation by Zhouxia Wang, Ziyang Yuan, Xintao Wang, Tianshui Chen, Menghan Xia, Ping Luo, Ying Shan. The study introduces MotionCtrl, a novel approach for video generation that can separately regulate camera and object motions, addressing limitations in previous methodologies that lacked…
-
arxiv preprint – Model-tuning Via Prompts Makes NLP Models Adversarially Robust
In this episode we discuss Model-tuning Via Prompts Makes NLP Models Adversarially Robust by Mrigank Raman, Pratyush Maini, J. Zico Kolter, Zachary C. Lipton, Danish Pruthi. The discussed paper presents a new method called Model-tuning Via Prompts (MVP) that significantly improves the adversarial robustness of pretrained language models over the standard multilayer perceptron fine-tuning (MLP-FT)…
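The general flavor of prompt-based classification that MVP builds on can be sketched as follows: instead of attaching a new MLP head, the pretrained masked-language-model head scores a small set of verbalizer tokens in a cloze template. The template, verbalizer, and model choice below are illustrative placeholders, not the paper's setup or training procedure.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

text = "The movie was surprisingly good."
prompt = f"{text} It was {tokenizer.mask_token}."             # hypothetical cloze template
verbalizer = {"positive": " great", "negative": " terrible"}  # hypothetical label words

inputs = tokenizer(prompt, return_tensors="pt")
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]

scores = {label: logits[tokenizer.encode(word, add_special_tokens=False)[0]].item()
          for label, word in verbalizer.items()}
print(max(scores, key=scores.get))  # predicted label via the pretrained LM head
```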
-
arxiv preprint – Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
In this episode we discuss Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation by Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Metzger, Rodrigo Caye Daudt, Konrad Schindler. The paper presents Marigold, a new method for monocular depth estimation that utilizes the learned priors from generative diffusion models, specifically derived from Stable Diffusion. Marigold is affine-invariant…
-
arxiv preprint – Instruction-tuning Aligns LLMs to the Human Brain
In this episode we discuss Instruction-tuning Aligns LLMs to the Human Brain by Khai Loong Aw, Syrielle Montariol, Badr AlKhamissi, Martin Schrimpf, Antoine Bosselut. The paper examines whether instruction-tuning, a method for fine-tuning large language models (LLMs), makes their processing more human-like through two metrics: brain alignment and behavioral alignment. Results indicate instruction-tuning increases brain…
-
arxiv preprint – WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia
In this episode we discuss WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia by Sina J. Semnani, Violet Z. Yao, Heidi C. Zhang, Monica S. Lam. The paper introduces WikiChat, a chatbot that uses a few-shot Large Language Model (LLM) grounded in Wikipedia to provide accurate, engaging responses with…
-
arxiv preprint – DemoFusion: Democratising High-Resolution Image Generation With No $$$
In this episode we discuss DemoFusion: Democratising High-Resolution Image Generation With No $$$ by Ruoyi Du, Dongliang Chang, Timothy Hospedales, Yi-Zhe Song, Zhanyu Ma. The paper introduces DemoFusion, a framework designed to enhance open-source Latent Diffusion Models (LDMs) for higher-resolution image generation. It incorporates Progressive Upscaling, Skip Residual, and Dilated Sampling to improve image quality…
-
arxiv preprint – Recommender Systems with Generative Retrieval
In this episode we discuss Recommender Systems with Generative Retrieval by Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan H. Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Q. Tran, Jonah Samost, Maciej Kula, Ed H. Chi, Maheswaran Sathiamoorthy. The paper presents a novel generative approach for large-scale retrieval in recommender systems, where a…
-
arxiv preprint – Mamba: Linear-Time Sequence Modeling with Selective State Spaces
In this episode we discuss Mamba: Linear-Time Sequence Modeling with Selective State Spaces by Albert Gu, Tri Dao. The paper presents Mamba, an innovative neural network architecture that outperforms traditional Transformer models, especially in handling very long sequences. Mamba’s design incorporates selective structured state space models (SSMs) whose parameters depend on input tokens, enabling content-based…
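A drastically simplified, one-state-per-channel sketch of the "selective" recurrence, in which the projections applied to the state depend on the current input token. Discretization, the larger state dimension, and the hardware-aware parallel scan that make Mamba fast are all omitted; parameter names here are placeholders.

```python
import numpy as np

def selective_scan(x: np.ndarray, W_B: np.ndarray, W_C: np.ndarray, decay: float = 0.9):
    """Toy selective state-space recurrence over a (T, d) sequence."""
    T, d = x.shape
    h = np.zeros(d, dtype=x.dtype)
    outputs = []
    for t in range(T):
        B_t = x[t] @ W_B              # input-dependent input projection ("selection")
        C_t = x[t] @ W_C              # input-dependent output projection
        h = decay * h + B_t * x[t]    # per-channel state update (decay stands in for the discretized A)
        outputs.append(C_t * h)
    return np.stack(outputs)

d = 16
x = np.random.randn(32, d).astype(np.float32)
y = selective_scan(x, np.random.randn(d, d).astype(np.float32),
                   np.random.randn(d, d).astype(np.float32))
```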
-
arxiv preprint – Block-State Transformers
In this episode we discuss Block-State Transformers by Mahan Fathi, Jonathan Pilault, Orhan Firat, Christopher Pal, Pierre-Luc Bacon, Ross Goroshin. The paper introduces the Block-State Transformer (BST) architecture that merges state space models and block-wise attention to effectively capture long-range dependencies and improve performance on language modeling tasks. The BST incorporates an SSM sublayer for…
-
arxiv preprint – Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns
In this episode we discuss Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns by Brian DuSell, David Chiang. The paper introduces stack attention, a novel attention mechanism that incorporates the concept of stacks to help recognize hierarchical and nested syntactic structures, which traditional scaled dot-product attention fails to handle effectively. Two versions…
-
arxiv preprint – LooseControl: Lifting ControlNet for Generalized Depth Conditioning
In this episode we discuss LooseControl: Lifting ControlNet for Generalized Depth Conditioning by Shariq Farooq Bhat, Niloy J. Mitra, Peter Wonka. LOOSECONTROL is introduced as a novel method for depth-conditioned image generation that is less reliant on detailed depth maps, unlike the state-of-the-art ControlNet. It allows for content creation by specifying scene boundaries or 3D…
-
Announcement: AI Breakdown Youtube Channel
Welcome back to AI Breakdown! In this special announcement, your hosts Megan and Ray share exciting news – we’re expanding to YouTube! This new platform will add a visual dimension to our discussions, bringing AI papers to life with figures, tables, and results. While the podcast will continue as usual, the YouTube channel will offer…
-
arxiv preprint – OneLLM: One Framework to Align All Modalities with Language
In this episode we discuss OneLLM: One Framework to Align All Modalities with Language by Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue. The paper introduces OneLLM, a multimodal large language model that unifies the encoding of eight different modalities to language via a single…
-
arxiv preprint – The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
In this episode we discuss The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning by Bill Yuchen Lin, Abhilasha Ravichander, Ximing Lu, Nouha Dziri, Melanie Sclar, Khyathi Chandu, Chandra Bhagavatula, Yejin Choi. The paper discusses the effectiveness of traditional alignment tuning methods for large language models (LLMs) and introduces a new, simple tuning-free…
-
arxiv – MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
In this episode, we discuss MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI by Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, Renliang Sun, Ming Yin, Boyuan Zheng, Zhenzhu Yang, Yibo Liu, Wenhao…
-
arxiv preprint – MLP-Mixer: An all-MLP Architecture for Vision
In this episode we discuss MLP-Mixer: An all-MLP Architecture for Vision by Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy. The paper presents MLP-Mixer, an architecture that relies solely on multi-layer perceptrons (MLPs) for image classification tasks, demonstrating that…
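The block structure described, alternating a token-mixing MLP across patches with a channel-mixing MLP across features, is compact enough to sketch directly; the hidden sizes below are arbitrary placeholders rather than the paper's configurations.

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """One Mixer block: token-mixing MLP over the patch axis, then channel-mixing MLP over features."""
    def __init__(self, num_patches: int, dim: int, token_hidden: int = 256, channel_hidden: int = 1024):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(
            nn.Linear(num_patches, token_hidden), nn.GELU(), nn.Linear(token_hidden, num_patches))
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, channel_hidden), nn.GELU(), nn.Linear(channel_hidden, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, patches, dim)
        y = self.norm1(x).transpose(1, 2)                  # (batch, dim, patches): mix across patches
        x = x + self.token_mlp(y).transpose(1, 2)
        return x + self.channel_mlp(self.norm2(x))

block = MixerBlock(num_patches=196, dim=512)
out = block(torch.randn(2, 196, 512))
```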
-
arxiv preprint – Training Chain-of-Thought via Latent-Variable Inference
In this episode we discuss Training Chain-of-Thought via Latent-Variable Inference by Du Phan, Matthew D. Hoffman, David Dohan, Sholto Douglas, Tuan Anh Le, Aaron Parisi, Pavel Sountsov, Charles Sutton, Sharad Vikram, Rif A. Saurous. The paper introduces a fine-tuning strategy for large language models that improves their problem-solving accuracy by focusing on maximizing the probability…