Category: Uncategorized
-
arxiv preprint – Detection and Measurement of Syntactic Templates in Generated Text
In this episode, we discuss Detection and Measurement of Syntactic Templates in Generated Text by Chantal Shaib, Yanai Elazar, Junyi Jessy Li, Byron C. Wallace. The paper investigates syntactic features in text generated by large language models (LLMs), revealing higher rates of templated text in these models compared to human-generated text. It finds that a…
-
arxiv preprint – From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data
In this episode, we discuss From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data by Zheyang Xiong, Vasilis Papageorgiou, Kangwook Lee, Dimitris Papailiopoulos. This paper addresses the challenge Large Language Models (LLMs) face with long-context information retrieval and reasoning. The authors propose finetuning LLMs using a synthetic dataset…
-
arxiv preprint – MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning
In this episode, we discuss MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning by Xiangyu Zhao, Xiangtai Li, Haodong Duan, Haian Huang, Yining Li, Kai Chen, Hua Yang. The study presents MG-LLaVA, a multi-modal large language model designed to process both low-resolution and high-resolution images along with object-centric features for improved perception tasks. It includes a high-resolution…
-
arxiv preprint – 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
In this episode, we discuss 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities by Roman Bachmann, Oğuzhan Fatih Kar, David Mizrahi, Ali Garjani, Mingfei Gao, David Griffiths, Jiaming Hu, Afshin Dehghan, Amir Zamir. The paper presents a novel any-to-any model that significantly extends the capabilities of existing multimodal and multitask foundation models…
-
arxiv preprint – VideoLLM-online: Online Video Large Language Model for Streaming Video
In this episode, we discuss VideoLLM-online: Online Video Large Language Model for Streaming Video by Joya Chen, Zhaoyang Lv, Shiwei Wu, Kevin Qinghong Lin, Chenan Song, Difei Gao, Jia-Wei Liu, Ziteng Gao, Dongxing Mao, Mike Zheng Shou. The paper discusses the development of the Learning-In-Video-Stream (LIVE) framework, which improves large multimodal models’ ability to handle…
-
arxiv preprint – EvTexture: Event-driven Texture Enhancement for Video Super-Resolution
In this episode, we discuss EvTexture: Event-driven Texture Enhancement for Video Super-Resolution by Dachun Kai, Jiayao Lu, Yueyi Zhang, Xiaoyan Sun. The paper introduces EvTexture, the first video super-resolution (VSR) method using event signals specifically for enhancing texture details. The proposed method employs a new texture enhancement branch and an iterative module to progressively refine…
-
arxiv preprint – MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
In this episode, we discuss MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model by Muyao Niu, Xiaodong Cun, Xintao Wang, Yong Zhang, Ying Shan, Yinqiang Zheng. MOFA-Video is a novel image animation technique that produces videos from a single image using various control signals like human landmarks, manual trajectories,…
-
arxiv preprint – An Image is Worth More Than 16×16 Patches: Exploring Transformers on Individual Pixels
In this episode, we discuss An Image is Worth More Than 16×16 Patches: Exploring Transformers on Individual Pixels by Duy-Kien Nguyen, Mahmoud Assran, Unnat Jain, Martin R. Oswald, Cees G. M. Snoek, Xinlei Chen. This paper questions the necessity of locality inductive bias in modern computer vision architectures by showing that vanilla Transformers can treat…
-
arxiv preprint – Graphic Design with Large Multimodal Model
In this episode, we discuss Graphic Design with Large Multimodal Model by Yutao Cheng, Zhao Zhang, Maoke Yang, Hui Nie, Chunyuan Li, Xinglong Wu, Jie Shao. The paper introduces Hierarchical Layout Generation (HLG) for graphic design, which creates compositions from unordered sets of design elements, addressing limitations of the existing Graphic Layout Generation (GLG). The…
-
arxiv preprint – LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning
In this episode, we discuss LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning by Dantong Niu, Yuvan Sharma, Giscard Biamby, Jerome Quenum, Yutong Bai, Baifeng Shi, Trevor Darrell, Roei Herzig. The paper introduces LLARVA, a model improved with a novel instruction-tuning method to unify various robotic tasks using structured prompts. The model utilizes 2-D visual traces…
-
arxiv preprint – Transformers need glasses! Information over-squashing in language tasks
In this episode, we discuss Transformers need glasses! Information over-squashing in language tasks by Federico Barbero, Andrea Banino, Steven Kapturowski, Dharshan Kumaran, João G. M. Araújo, Alex Vitvitskyi, Razvan Pascanu, Petar Veličković. The paper explores how information propagates in decoder-only Transformers, revealing a phenomenon where different input sequences can result in nearly identical final token…
-
arxiv preprint – Show, Don’t Tell: Aligning Language Models with Demonstrated Feedback
In this episode, we discuss Show, Don’t Tell: Aligning Language Models with Demonstrated Feedback by Omar Shaikh, Michelle Lam, Joey Hejna, Yijia Shao, Michael Bernstein, Diyi Yang. The paper introduces Demonstration ITerated Task Optimization (DITTO), a method for customizing language model outputs using fewer than ten demonstrations as feedback. DITTO, based on online imitation learning,…
-
arxiv preprint – TextGrad: Automatic ”Differentiation” via Text
In this episode, we discuss TextGrad: Automatic “Differentiation” via Text by Mert Yuksekgonul, Federico Bianchi, Joseph Boen, Sheng Liu, Zhi Huang, Carlos Guestrin, James Zou. The paper introduces TEXTGRAD, a novel framework that automates the optimization of compound AI systems by utilizing textual feedback from large language models (LLMs). TEXTGRAD treats text feedback as a…
-
arxiv preprint – SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales
In this episode, we discuss SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales by Tianyang Xu, Shujin Wu, Shizhe Diao, Xiaoze Liu, Xingyao Wang, Yangyi Chen, Jing Gao. The paper introduces SaySelf, a framework for training large language models (LLMs) to produce accurate, fine-grained confidence estimates and self-reflective rationales explaining their uncertainties. This is…
-
arxiv preprint – Open-Endedness is Essential for Artificial Superhuman Intelligence
In this episode, we discuss Open-Endedness is Essential for Artificial Superhuman Intelligence by Edward Hughes, Michael Dennis, Jack Parker-Holder, Feryal Behbahani, Aditi Mavalankar, Yuge Shi, Tom Schaul, Tim Rocktaschel. The paper argues that the development of open-ended, self-improving AI systems is achievable using current foundation models trained on extensive internet data. It provides a formal…
-
arxiv preprint – To Believe or Not to Believe Your LLM
In this episode, we discuss To Believe or Not to Believe Your LLM by Yasin Abbasi Yadkori, Ilja Kuzborskij, András György, Csaba Szepesvári. The study investigates uncertainty quantification in large language models (LLMs), focusing on distinguishing large epistemic uncertainty to identify unreliable outputs and potential hallucinations. By employing an information-theoretic metric and a method of…
-
arxiv preprint – Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
In this episode, we discuss Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts by Chunjing Gan, Dan Yang, Binbin Hu, Hanxiao Zhang, Siyuan Li, Ziqi Liu, Yue Shen, Lin Ju, Zhiqiang Zhang, Jinjie Gu, Lei Liang, Jun Zhou. The paper introduces METRAG, a novel Multi-layered Thought enhanced Retrieval-Augmented Generation…
-
arxiv preprint – Contextual Position Encoding: Learning to Count What’s Important
In this episode, we discuss Contextual Position Encoding: Learning to Count What’s Important by Olga Golovneva, Tianlu Wang, Jason Weston, Sainbayar Sukhbaatar. The paper introduces Contextual Position Encoding (CoPE), a new position encoding method for Large Language Models (LLMs) that incrementally alters position based on context rather than just token count. This approach enables more…
-
arxiv preprint – Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
In this episode, we discuss Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis by Chaoyou Fu, Yuhan Dai, Yondong Luo, Lei Li, Shuhuai Ren, Renrui Zhang, Zihan Wang, Chenyu Zhou, Yunhang Shen, Mengdan Zhang, Peixian Chen, Yanwei Li, Shaohui Lin, Sirui Zhao, Ke Li, Tong Xu, Xiawu Zheng, Enhong Chen, Rongrong…
-
arxiv preprint – VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
In this episode, we discuss VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos by Ziyang Wang, Shoubin Yu, Elias Stengel-Eskin, Jaehong Yoon, Feng Cheng, Gedas Bertasius, Mohit Bansal. The paper introduces VideoTree, a novel framework that enhances the efficiency and accuracy of long-video question answering by selectively extracting and hierarchically organizing frames…
-
arxiv preprint – CinePile: A Long Video Question Answering Dataset and Benchmark
In this episode, we discuss CinePile: A Long Video Question Answering Dataset and Benchmark by Ruchit Rawal, Khalid Saifullah, Ronen Basri, David Jacobs, Gowthami Somepalli, Tom Goldstein. CinePile is a new dataset and benchmark designed for authentic long-form video understanding, addressing the limitations of current datasets. It comprises 305,000 multiple-choice questions (MCQs) spanning various visual…
-
arxiv preprint – Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum
In this episode, we discuss Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum by Hadi Pouransari, Chun-Liang Li, Jen-Hao Rick Chang, Pavan Kumar Anasosalu Vasu, Cem Koc, Vaishaal Shankar, Oncel Tuzel. The paper introduces a novel variable sequence length training technique called dataset decomposition to address inefficiencies in training large language models (LLMs)…
-
arxiv preprint – SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering
In this episode, we discuss SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering by John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, Ofir Press. The paper introduces SWE-agent, an autonomous system leveraging a language model to tackle software engineering tasks through a specialized agent-computer interface (ACI). SWE-agent significantly improves task completion…
-
arxiv preprint – Octo: An Open-Source Generalist Robot Policy
In this episode, we discuss Octo: An Open-Source Generalist Robot Policy by Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, Jianlan Luo, You Liang Tan, Pannag Sanketi, Quan Vuong, Ted Xiao, Dorsa Sadigh, Chelsea Finn, Sergey Levine. The paper introduces Octo, a…
-
arxiv preprint – Layer-Condensed KV Cache for Efficient Inference of Large Language Models
In this episode, we discuss Layer-Condensed KV Cache for Efficient Inference of Large Language Models by Haoyi Wu, Kewei Tu. The paper addresses the significant memory consumption issue in deploying large language models by proposing a novel method that computes and caches key-value pairs for only a small number of layers, thereby saving memory and…