-
arxiv preprint – DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
In this episode, we discuss DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models by Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Mingchuan Zhang, Y. K. Li, Y. Wu, Daya Guo. The paper presents DeepSeekMath 7B, an advanced language model trained on 120 billion math-related tokens to improve mathematical reasoning.…
-
arxiv preprint – KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
In this episode, we discuss KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization by Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Michael W. Mahoney, Yakun Sophia Shao, Kurt Keutzer, Amir Gholami. The paper introduces KVQuant, a novel method for reducing memory usage in Large Language Models (LLMs) by efficiently quantizing key-value (KV)…
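The core idea of quantizing the KV cache to very few bits can be sketched in a few lines. This is a generic min-max quantizer, not KVQuant's actual scheme (the paper uses more refined techniques such as per-channel key quantization and handling of outliers); the axis choice below only illustrates the per-channel-vs-per-token distinction.

```python
# Toy sketch of low-bit KV-cache quantization (illustrative, not the
# paper's method): uniform 4-bit min-max quantization along a chosen axis.
import numpy as np

def quantize_minmax(x, axis, bits=4):
    """Quantize `x` to `bits`-bit unsigned ints along `axis`."""
    lo = x.min(axis=axis, keepdims=True)
    hi = x.max(axis=axis, keepdims=True)
    scale = (hi - lo) / (2**bits - 1)
    scale = np.where(scale == 0, 1.0, scale)   # avoid divide-by-zero
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    return q.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
keys = rng.normal(size=(16, 64)).astype(np.float32)  # (tokens, channels)

# Keys quantized per channel (axis=0); values would use per-token (axis=1).
q, scale, lo = quantize_minmax(keys, axis=0)
keys_hat = dequantize(q, scale, lo)
err = float(np.abs(keys - keys_hat).max())           # bounded by scale / 2
```

Storing `q` instead of `keys` cuts the cache from 32 bits to 4 bits per element, which is what makes multi-million-token contexts feasible in memory.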
-
arxiv preprint – Language Model Inversion
In this episode, we discuss Language Model Inversion by John X. Morris, Wenting Zhao, Justin T. Chiu, Vitaly Shmatikov, Alexander M. Rush. The paper explores language model inversion, showing that the probabilities a language model assigns to the next token can expose significant details about the preceding text. The authors introduce a technique to reconstruct…
-
arxiv preprint – Tree Prompting: Efficient Task Adaptation without Fine-Tuning
In this episode, we discuss Tree Prompting: Efficient Task Adaptation without Fine-Tuning by John X. Morris, Chandan Singh, Alexander M. Rush, Jianfeng Gao, Yuntian Deng. Tree Prompting is a novel method for interacting with smaller language models (LMs) that creates a decision tree of prompts to guide the model’s responses. This technique significantly enhances accuracy…
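The "decision tree of prompts" idea can be sketched with a stub in place of the language model. In the paper the tree is learned from data; here the tree structure, the prompts, and the `lm` routing function are all illustrative stand-ins.

```python
# Toy sketch of a decision tree of prompts (illustrative only): each
# internal node asks the LM a yes/no question, each leaf is a label.

def lm(prompt, text):
    """Stand-in for a small LM answering a yes/no prompt about `text`."""
    keyword = prompt.split("'")[1]           # each prompt embeds one keyword
    return "yes" if keyword in text.lower() else "no"

tree = {
    "prompt": "Does the text mention 'refund'?",
    "yes": {"label": "billing"},
    "no": {
        "prompt": "Does the text mention 'crash'?",
        "yes": {"label": "bug-report"},
        "no": {"label": "other"},
    },
}

def classify(node, text):
    # Walk the tree, letting the LM's answer pick the branch at each node.
    while "label" not in node:
        node = node[lm(node["prompt"], text)]
    return node["label"]

label = classify(tree, "The app keeps showing a crash screen on launch")
```

Because each query is a short prompt to a small model, the tree adapts the model to a task without any gradient updates.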
-
arxiv preprint – Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
In this episode, we discuss Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens by Jiacheng Liu, Sewon Min, Luke Zettlemoyer, Yejin Choi, Hannaneh Hajishirzi. The paper introduces an improved n-gram language model named “Infini-gram,” which scales to 1.4 trillion tokens and has the capacity to use n-grams of arbitrary length. The authors develop…
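The "arbitrary length" rule can be illustrated with a toy backoff model: instead of fixing n, predict from the longest suffix of the context that occurs in the corpus. The real system answers these queries with a suffix array over 1.4 trillion tokens; the dictionary version below is only a sketch of the prediction rule.

```python
# Toy sketch of the infini-gram backoff rule (illustrative only).
from collections import Counter, defaultdict

def build_counts(tokens, max_n=8):
    """Map each context tuple (up to max_n - 1 tokens) to next-token counts."""
    cont = defaultdict(Counter)
    for i in range(len(tokens)):
        for n in range(0, max_n):
            if i - n < 0:
                break
            cont[tuple(tokens[i - n:i])][tokens[i]] += 1
    return cont

def infini_gram_next(cont, context):
    # Back off to the longest suffix of `context` seen in the corpus.
    for start in range(len(context) + 1):
        suffix = tuple(context[start:])
        if suffix in cont:
            return cont[suffix].most_common(1)[0][0]
    return None

corpus = "the cat sat on the mat the cat ran on the grass".split()
cont = build_counts(corpus)
pred = infini_gram_next(cont, "sat on the".split())   # longest match wins
```

An unseen context simply falls back to a shorter suffix, all the way down to unigram counts, so every query gets an answer.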
-
arxiv preprint – Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
In this episode, we discuss Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning by Fuxiao Liu, Kevin Lin, Linjie Li, Jianfeng Wang, Yaser Yacoob, Lijuan Wang. This paper introduces LRV-Instruction, a diverse dataset designed for visual instruction tuning with a focus on mitigating hallucination in large multi-modal models (LMMs). The dataset contains 400k…
-
arxiv preprint – RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture
In this episode, we discuss RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture by Angels Balaguer, Vinamra Benara, Renato Luiz de Freitas Cunha, Roberto de M. Estevão Filho, Todd Hendry, Daniel Holstein, Jennifer Marsman, Nick Mecklenburg, Sara Malvar, Leonardo O. Nunes, Rafael Padilha, Morris Sharp, Bruno Silva, Swati Sharma, Vijay Aski, Ranveer…
-
arxiv preprint – SliceGPT: Compress Large Language Models by Deleting Rows and Columns
In this episode, we discuss SliceGPT: Compress Large Language Models by Deleting Rows and Columns by Saleh Ashkboos, Maximilian L. Croci, Marcelo Gennari do Nascimento, Torsten Hoefler, James Hensman. The paper introduces SliceGPT, a new method for post-training sparsification of large language models that reduces their size and computational requirements by replacing weight matrices with…
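The row/column-deletion step can be sketched on a single linear layer: project activations onto their top principal components and slice the following weight matrix to match. The actual method relies on a computational-invariance property of transformers to apply this across the whole network; the example below (with synthetic low-rank activations) shows only the core slicing operation.

```python
# Toy sketch of slicing a linear layer along the principal components of
# its input activations (illustrative, not SliceGPT's full procedure).
import numpy as np

rng = np.random.default_rng(0)
d, k = 64, 32                       # original and sliced hidden size
Z = rng.normal(size=(1000, k))
B = rng.normal(size=(k, d))
X = Z @ B                           # calibration activations near a k-dim subspace
W = rng.normal(size=(d, d))         # weight of the next linear layer

# Top-k eigenvectors of the activation covariance = principal directions.
cov = X.T @ X
eigvals, eigvecs = np.linalg.eigh(cov)
Q = eigvecs[:, -k:]                 # (d, k): keeping Q deletes d - k columns

X_small = X @ Q                     # sliced activations, shape (1000, k)
W_small = Q.T @ W                   # sliced weight, shape (k, d)

# Relative error of the sliced forward pass vs the original one.
err = np.linalg.norm(X @ W - X_small @ W_small) / np.linalg.norm(X @ W)
```

Because the activations here lie in a k-dimensional subspace, deleting half the rows and columns loses essentially nothing; in real models the error grows gracefully with the slicing ratio.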
-
arxiv preprint – Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video
In this episode, we discuss Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video by Shashanka Venkataramanan, Mamshad Nayeem Rizve, João Carreira, Yuki M. Asano, Yannis Avrithis. The paper presents two innovations in self-supervised learning: a new dataset called “Walking Tours,” which features high-resolution, long-duration, first-person videos ideal for…
-
arxiv preprint – MambaByte: Token-free Selective State Space Model
In this episode, we discuss MambaByte: Token-free Selective State Space Model by Junxiong Wang, Tushaar Gangavarapu, Jing Nathan Yan, Alexander M. Rush. MambaByte, a token-free language model, removes the bias associated with subword tokenization by learning from raw bytes. It capitalizes on the Mamba state space model’s adaptability to byte sequences, offering computational efficiency and…
-
arxiv preprint – Lumiere: A Space-Time Diffusion Model for Video Generation
In this episode, we discuss Lumiere: A Space-Time Diffusion Model for Video Generation by Omer Bar-Tal, Hila Chefer, Omer Tov, Charles Herrmann, Roni Paiss, Shiran Zada, Ariel Ephrat, Junhwa Hur, Yuanzhen Li, Tomer Michaeli, Oliver Wang, Deqing Sun, Tali Dekel, Inbar Mosseri. The paper presents Lumiere, a novel text-to-video diffusion model capable of generating realistic…
-
arxiv preprint – Self-Rewarding Language Models
In this episode, we discuss Self-Rewarding Language Models by Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho, Sainbayar Sukhbaatar, Jing Xu, Jason Weston. The paper introduces self-rewarding language models (SR-LMs), which generate their own rewards for self-improvement beyond human performance levels. Using a method called Iterative Direct Preference Optimization, SR-LMs can enhance their ability to follow…
-
arxiv preprint – Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
In this episode, we discuss Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data by Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao. “Depth Anything” is an approach to improve monocular depth estimation by exploiting a massive collection of about 62 million unlabeled images, aiming to extend dataset size and lessen…
-
arxiv preprint – MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding
In this episode, we discuss MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding by Hongjie Zhang, Yi Liu, Lu Dong, Yifei Huang, Zhen-Hua Ling, Yali Wang, Limin Wang, Yu Qiao. The newly introduced dataset MoVQA aims to enhance the evaluation of AI systems’ understanding of long-form video content, such as movies, addressing the…
-
arxiv preprint – Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
In this episode, we discuss Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model by Lianghui Zhu, Bencheng Liao, Qian Zhang, Xinlong Wang, Wenyu Liu, Xinggang Wang. The paper introduces a new vision backbone called Vim, which leverages bidirectional Mamba blocks for efficient and effective visual representation learning, sidestepping the need for self-attention…
-
arxiv preprint – Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
In this episode, we discuss Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models by Asma Ghandeharioun, Avi Caciularu, Adam Pearce, Lucas Dixon, Mor Geva. The paper presents a novel framework named Patchscopes designed to improve understanding of the hidden representations in large language models (LLMs) by using the models themselves to articulate…
-
arxiv preprint – Time Travel in LLMs: Tracing Data Contamination in Large Language Models
In this episode, we discuss Time Travel in LLMs: Tracing Data Contamination in Large Language Models by Shahriar Golchin, Mihai Surdeanu. The paper presents a method to detect test data contamination in large language models by checking if the model’s output closely matches specific segments of reference data. This process involves guided instructions using dataset…
-
arxiv preprint – InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes
In this episode, we discuss InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes by Mohamad Shahbazi, Liesbeth Claessens, Michael Niemeyer, Edo Collins, Alessio Tonioni, Luc Van Gool, Federico Tombari. InseRF is a new approach for inserting generated objects into 3D scene reconstructions using NeRF, based on textual descriptions and 2D reference images. This method…
-
arxiv preprint – A Simple LLM Framework for Long-Range Video Question-Answering
In this episode, we discuss A Simple LLM Framework for Long-Range Video Question-Answering by Ce Zhang, Taixi Lu, Md Mohaiminul Islam, Ziyang Wang, Shoubin Yu, Mohit Bansal, Gedas Bertasius. The LLoVi framework innovates in long-range video question-answering (LVQA) by combining visual captioners with Large Language Models (LLMs) such as GPT-3.5 or GPT-4, forgoing complex long-range…
-
arxiv preprint – Mixtral of Experts
In this episode, we discuss Mixtral of Experts by Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak,…
-
arxiv preprint – Weight subcloning: direct initialization of transformers using larger pretrained ones
In this episode, we discuss Weight subcloning: direct initialization of transformers using larger pretrained ones by Mohammad Samragh, Mehrdad Farajtabar, Sachin Mehta, Raviteja Vemulapalli, Fartash Faghri, Devang Naik, Oncel Tuzel, Mohammad Rastegari. The paper introduces a new method called weight subcloning to expedite the training of small transformer models by initializing them with weights from…
-
arxiv preprint – Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task
In this episode, we discuss Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task by Maya Okawa, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka. The paper investigates how conditional diffusion models generalize compositionally by studying their ability to generate novel data combinations within a controlled synthetic environment. Key discoveries include that compositional…
-
arxiv preprint – LLM in a flash: Efficient Large Language Model Inference with Limited Memory
In this episode, we discuss LLM in a flash: Efficient Large Language Model Inference with Limited Memory by Keivan Alizadeh, Iman Mirzadeh, Dmitry Belenko, Karen Khatamifard, Minsik Cho, Carlo C Del Mundo, Mohammad Rastegari, Mehrdad Farajtabar. The paper introduces an approach to operate large language models (LLMs) efficiently on devices with limited DRAM by using…
-
arxiv preprint – The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
In this episode, we discuss The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction by Pratyusha Sharma, Jordan T. Ash, Dipendra Misra. The paper presents Layer-Selective Rank Reduction (LASER), an innovative method that enhances Transformer-based Large Language Models (LLMs) by reducing higher-order features in their weight matrices post-training, without adding…
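The rank-reduction step at the heart of LASER is a truncated SVD of a single weight matrix. The paper additionally searches over which layer and how much rank to remove; the sketch below shows only the truncation itself on a stand-in matrix.

```python
# Minimal sketch of layer-selective rank reduction: replace one weight
# matrix with its rank-r SVD approximation (illustrative stand-in matrix).
import numpy as np

def low_rank(W, r):
    """Best rank-r approximation of W in the Frobenius norm."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :r] * S[:r]) @ Vt[:r, :]

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 32))       # stand-in for one layer's weight matrix
W_r = low_rank(W, r=4)              # keep only the top 4 singular directions

rank = int(np.linalg.matrix_rank(W_r))
err = float(np.linalg.norm(W - W_r) / np.linalg.norm(W))
```

The counterintuitive finding is that discarding these "higher-order" components of selected layers can improve reasoning accuracy rather than degrade it, with no additional parameters or training.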
-
arxiv preprint – DreaMoving: A Human Video Generation Framework based on Diffusion Models
In this episode, we discuss DreaMoving: A Human Video Generation Framework based on Diffusion Models by Mengyang Feng, Jinlin Liu, Kai Yu, Yuan Yao, Zheng Hui, Xiefan Guo, Xianhui Lin, Haolan Xue, Chen Shi, Xiaowen Li, Aojie Li, Xiaoyang Kang, Biwen Lei, Miaomiao Cui, Peiran Ren, Xuansong Xie. DreaMoving is a framework that uses diffusion…