-
arxiv Preprint – Enable Language Models to Implicitly Learn Self-Improvement From Data
In this episode we discuss Enable Language Models to Implicitly Learn Self-Improvement From Data by Ziqi Wang, Le Hou, Tianjian Lu, Yuexin Wu, Yunxuan Li, Hongkun Yu, Heng Ji. The paper introduces a framework called ImPlicit Self-ImprovemenT (PIT) that allows large language models (LLMs) to learn self-improvement from data. PIT learns the improvement goal from…
-
arxiv Preprint – Efficient Streaming Language Models with Attention Sinks
In this episode we discuss Efficient Streaming Language Models with Attention Sinks by Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, Mike Lewis. The paper proposes StreamingLLM, a framework that allows Large Language Models (LLMs) to generalize to infinite sequence length without fine-tuning. By observing the phenomenon of attention sink, where initial tokens have a…
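To make the attention-sink idea concrete, here is a minimal sketch (not the authors' implementation) of the cache policy StreamingLLM describes: always keep a few initial "sink" tokens plus a sliding window of the most recent tokens, and evict everything in between. The `n_sinks` and `window` values are illustrative.

```python
# Minimal sketch of a StreamingLLM-style KV-cache eviction policy:
# retain the first `n_sinks` tokens (the "attention sinks") plus the
# most recent `window` tokens, evicting the middle of the sequence.
def evict_kv_cache(cache, n_sinks=4, window=1020):
    """cache: list of per-token KV entries, oldest first."""
    if len(cache) <= n_sinks + window:
        return cache                            # nothing to evict yet
    return cache[:n_sinks] + cache[-window:]    # sinks + recent window


# Usage: stream tokens and keep the cache bounded.
cache = []
for step in range(5000):
    cache.append({"token_id": step})            # stand-in for a real KV entry
    cache = evict_kv_cache(cache)

print(len(cache))  # 1024 = 4 sink entries + 1020 recent entries
```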
-
Neurips 2023 – PuzzleFusion: Unleashing the Power of Diffusion Models for Spatial Puzzle Solving
In this episode we discuss PuzzleFusion: Unleashing the Power of Diffusion Models for Spatial Puzzle Solving by Sepidehsadat Hosseini, Mohammad Amin Shabani, Saghar Irandoust, Yasutaka Furukawa. The paper introduces PuzzleFusion, a neural architecture based on Diffusion Models for spatial puzzle solving. It focuses on jigsaw puzzle solving and room arrangement tasks, using new datasets including…
-
arxiv Preprint – Vision Transformers Need Registers
In this episode we discuss Vision Transformers Need Registers by Timothée Darcet, Maxime Oquab, Julien Mairal, Piotr Bojanowski. The paper discusses a solution to artifacts found in the feature maps of Vision Transformers (ViT) in low-informative background areas of images. By adding additional tokens called “registers” to the input sequence, the feature maps and attention…
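As a rough illustration of the idea (a simplified sketch, not the authors' code), register tokens are simply extra learnable embeddings concatenated to the patch-token sequence and discarded after the transformer blocks:

```python
import torch
import torch.nn as nn

class ViTWithRegisters(nn.Module):
    """Simplified sketch: prepend learnable register tokens to the patch
    sequence; they absorb global computation and are dropped before the
    output. Dimensions are illustrative."""
    def __init__(self, dim=768, num_registers=4, depth=2, heads=12):
        super().__init__()
        self.registers = nn.Parameter(torch.zeros(1, num_registers, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
        self.num_registers = num_registers

    def forward(self, patch_tokens):             # (B, N, dim)
        b = patch_tokens.size(0)
        regs = self.registers.expand(b, -1, -1)
        x = torch.cat([regs, patch_tokens], dim=1)
        x = self.blocks(x)
        return x[:, self.num_registers:]         # drop registers at the output

tokens = torch.randn(2, 196, 768)                # e.g. 14x14 patches
out = ViTWithRegisters()(tokens)
print(out.shape)                                 # torch.Size([2, 196, 768])
```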
-
arxiv Preprint – VPA: Fully Test-Time Visual Prompt Adaptation
In this episode we discuss VPA: Fully Test-Time Visual Prompt Adaptation by Jiachen Sun, Mark Ibrahim, Melissa Hall, Ivan Evtimov, Z. Morley Mao, Cristian Canton Ferrer, Caner Hazirbas. The paper presents Visual Prompt Adaptation (VPA), a framework that extends prompt tuning to visual recognition tasks. VPA allows for test-time adaptation without source-domain information and improves…
-
arxiv Preprint – Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition
In this episode we discuss Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition by Yu Yu, Chao-Han Huck Yang, Jari Kolehmainen, Prashanth G. Shivakumar, Yile Gu, Sungho Ryu, Roger Ren, Qi Luo, Aditya Gourav, I-Fan Chen, Yi-Chieh Liu, Tuan Dinh, Ankur Gandhe, Denis Filimonov, Shalini Ghosh, Andreas Stolcke, Ariya Rastrow, Ivan Bulyko.…
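As a quick illustration of the low-rank adaptation named in the title (a minimal PyTorch sketch with illustrative dimensions, not the paper's rescoring setup): the pretrained weight matrix is frozen and only a low-rank update BA is trained.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained weight W plus a trainable low-rank update:
    y = x W^T + (alpha/r) * x A^T B^T.  Illustrative sketch only."""
    def __init__(self, in_dim=768, out_dim=768, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim, bias=False)
        self.base.weight.requires_grad_(False)          # freeze pretrained W
        self.A = nn.Parameter(torch.randn(r, in_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_dim, r))  # zero-init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

x = torch.randn(4, 768)
print(LoRALinear()(x).shape)   # torch.Size([4, 768])
```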
-
arxiv Preprint – DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
In this episode we discuss DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models by Sam Ade Jacobs, Masahiro Tanaka, Chengming Zhang, Minjia Zhang, Leon Song, Samyam Rajbhandari, Yuxiong He. DeepSpeed-Ulysses is a methodology for efficient and scalable training of large language models with long sequence lengths. It addresses the limitations…
-
arxiv Preprint – VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning
In this episode we discuss VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning by Han Lin, Abhay Zala, Jaemin Cho, Mohit Bansal. The paper presents VIDEODIRECTORGPT, a framework for generating multi-scene videos with consistency using large language models. It consists of a video planner LLM (GPT-4) that expands a text prompt into a “video plan”…
-
arxiv Preprint – PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
In this episode we discuss PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training by Dawei Zhu, Nan Yang, Liang Wang, Yifan Song, Wenhao Wu, Furu Wei, Sujian Li. The paper presents a training method called PoSE for adapting large language models to longer context windows. It addresses the challenge of extending the…
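A rough sketch of the positional skip-wise idea (an illustration under assumptions, not the released code): within the model's original context window, the training chunk is split into segments, and later segments receive a random positional offset so that relative distances from the much longer target window are still observed during training.

```python
import random

def pose_position_ids(chunk_len=2048, target_len=8192):
    """Simplified sketch of PoSE-style position ids for one training chunk:
    two segments of a short chunk get a random positional gap between them,
    so the model sees relative distances up to the longer target window."""
    seg_len = chunk_len // 2
    skip = random.randint(0, target_len - chunk_len)    # random positional gap
    first = list(range(0, seg_len))
    second = list(range(seg_len + skip, seg_len + skip + seg_len))
    return first + second

ids = pose_position_ids()
print(len(ids), ids[-1])   # 2048 position ids; the last index can reach 8191
```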
-
arxiv Preprint – Summarization is (Almost) Dead
In this episode we discuss Summarization is (Almost) Dead by Xiao Pu, Mingqi Gao, Xiaojun Wan. The paper investigates the capabilities of large language models (LLMs) in summary generation. Through new datasets and human evaluation experiments, the authors find that evaluators prefer LLM-generated summaries over both human-written summaries and summaries from fine-tuned models. LLM-generated…
-
arxiv Preprint – LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent
In this episode we discuss LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent by Jianing Yang, Xuweiyi Chen, Shengyi Qian, Nikhil Madaan, Madhavan Iyengar, David F. Fouhey, Joyce Chai. The paper introduces a new approach called LLM-Grounder for grounding 3D visual scenes using natural language queries. It utilizes a Large Language…
-
Neurips 2023 spotlight – Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems
In this episode we discuss Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems by Benjamin Coleman, Wang-Cheng Kang, Matthew Fahrbach, Ruoxi Wang, Lichan Hong, Ed H. Chi, Derek Zhiyuan Cheng. The paper introduces a framework called Feature Multiplexing, which allows for the use of a single representation space across multiple categorical features in web-scale…
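A toy sketch of the feature-multiplexing idea described above (illustrative only, not the paper's production system): several categorical features share one hashed embedding table instead of each owning its own.

```python
import torch
import torch.nn as nn

class UnifiedEmbedding(nn.Module):
    """Toy sketch of feature multiplexing: all categorical features hash
    into one shared embedding table; per-feature lookups are concatenated."""
    def __init__(self, num_buckets=100_000, dim=16, num_features=3):
        super().__init__()
        self.table = nn.Embedding(num_buckets, dim)     # single shared table
        self.num_buckets = num_buckets
        self.num_features = num_features

    def forward(self, feature_values):                  # list of (B,) int tensors
        outs = []
        for f_idx, values in enumerate(feature_values):
            # simple per-feature salt so identical raw ids from different
            # features land in different buckets (illustrative hashing)
            buckets = (values * 31 + f_idx * 101) % self.num_buckets
            outs.append(self.table(buckets))
        return torch.cat(outs, dim=-1)                  # (B, num_features * dim)

feats = [torch.randint(0, 10_000, (4,)) for _ in range(3)]
print(UnifiedEmbedding()(feats).shape)                  # torch.Size([4, 48])
```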
-
arxiv Preprint – Chain-of-Verification Reduces Hallucination in Large Language Models
In this episode we discuss Chain-of-Verification Reduces Hallucination in Large Language Models by Shehzaad Dhuliawala, Mojtaba Komeili, Jing Xu, Roberta Raileanu, Xian Li, Asli Celikyilmaz, Jason Weston. The paper proposes the Chain-of-Verification (COVE) method to address the issue of factual hallucination in large language models. COVE involves generating an initial response, planning independent fact-checking questions,…
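A schematic sketch of the verification loop described above, with a hypothetical `llm()` completion function standing in for whatever model is used (this outlines the steps; the paper's actual prompts differ):

```python
def llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat-completion call."""
    raise NotImplementedError

def chain_of_verification(question: str) -> str:
    # 1. Draft an initial (possibly hallucinated) answer.
    baseline = llm(f"Answer the question:\n{question}")

    # 2. Plan independent verification questions about the draft's claims.
    plan = llm(
        "List fact-checking questions, one per line, that would verify "
        f"the claims in this answer:\n{baseline}"
    )
    verification_qs = [q for q in plan.splitlines() if q.strip()]

    # 3. Answer each verification question independently of the draft.
    checks = [(q, llm(q)) for q in verification_qs]

    # 4. Revise the draft in light of the verification answers.
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in checks)
    return llm(
        f"Original question: {question}\n"
        f"Draft answer: {baseline}\n"
        f"Verification results:\n{evidence}\n"
        "Write a final, corrected answer."
    )
```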
-
arxiv Preprint – Language Modeling Is Compression
In this episode we discuss Language Modeling Is Compression by Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau, Marcus Hutter, Joel Veness. The authors argue that large language models can be seen as powerful compressors due to their predictive capabilities. They demonstrate…
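The compression view has a simple arithmetic core: a predictive model that assigns probability p to the next symbol can encode that symbol in about -log2 p bits (for example with arithmetic coding), so a model's average log-loss is its compression rate. A toy sketch with a bigram "model" (not the paper's setup):

```python
import math
from collections import Counter, defaultdict

def bigram_model(text):
    """Toy predictive model: P(next char | current char) from counts."""
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    def prob(prev, nxt):
        total = sum(counts[prev].values())
        return counts[prev][nxt] / total if total else 1.0 / 256
    return prob

def code_length_bits(text, prob):
    """Ideal code length under the model: sum of -log2 P(next | prev).
    An entropy coder (e.g. arithmetic coding) approaches this length."""
    return sum(-math.log2(max(prob(a, b), 1e-12))
               for a, b in zip(text, text[1:]))

data = "the quick brown fox jumps over the lazy dog " * 32
prob = bigram_model(data)
bits = code_length_bits(data, prob)
print(f"{bits:.0f} bits vs {8 * len(data)} raw bits")  # far below 8 bits per char
```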
-
arxiv Preprint – From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
In this episode we discuss From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting by Griffin Adams, Alexander Fabbri, Faisal Ladhak, Eric Lehman, Noémie Elhadad. The paper introduces a method called “Chain of Density” (CoD) for generating summaries with varying levels of information density. Using GPT-4, the authors generate entity-sparse summaries and then…
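A schematic sketch of the CoD prompting loop (with a hypothetical `llm()` helper; the actual prompt wording in the paper differs):

```python
def llm(prompt: str) -> str:
    """Hypothetical stand-in for a GPT-4-style completion call."""
    raise NotImplementedError

def chain_of_density(article: str, steps: int = 5) -> list[str]:
    """Sketch of Chain of Density: start from an entity-sparse summary,
    then repeatedly fold in missing salient entities without growing the
    summary's length. Returns the sequence of summaries."""
    summary = llm(f"Write a short, entity-sparse summary of:\n{article}")
    summaries = [summary]
    for _ in range(steps - 1):
        summary = llm(
            f"Article:\n{article}\n\n"
            f"Current summary:\n{summary}\n\n"
            "Identify 1-3 informative entities from the article that are "
            "missing from the summary, then rewrite the summary to include "
            "them without increasing its overall length."
        )
        summaries.append(summary)
    return summaries
```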
-
ICCV 2023 – Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing
In this episode we discuss Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing by Alberto Baldrati, Davide Morelli, Giuseppe Cartella, Marcella Cornia, Marco Bertini, Rita Cucchiara. This paper introduces a framework for generating fashion images using multimodal prompts such as text, body poses, and garment sketches. The proposed architecture utilizes latent diffusion…
-
arxiv Preprint – GPT Can Solve Mathematical Problems Without a Calculator
In this episode we discuss GPT Can Solve Mathematical Problems Without a Calculator by Zhen Yang, Ming Ding, Qingsong Lv, Zhihuan Jiang, Zehai He, Yuyi Guo, Jinfeng Bai, Jie Tang. The paper challenges the belief that large language models cannot perform arithmetic operations accurately without calculator tools. The researchers present MathGLM, a 2 billion-parameter language…
-
ICCV 2023 – Adding Conditional Control to Text-to-Image Diffusion Models
In this episode we discuss Adding Conditional Control to Text-to-Image Diffusion Models by Lvmin Zhang, Anyi Rao, Maneesh Agrawala. The paper introduces ControlNet, a neural network architecture that enhances spatial control in text-to-image diffusion models. It incorporates additional images, such as edge maps and human pose skeletons, as conditioning factors to specify desired image composition.…
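A minimal sketch of the ControlNet wiring described above (simplified, not the released implementation): a frozen block keeps the pretrained behavior, a trainable copy processes the conditioning features, and zero-initialized convolutions make the added branch start as a no-op.

```python
import copy
import torch
import torch.nn as nn

def zero_conv(channels):
    """1x1 convolution initialized to zero, so the control branch
    contributes nothing at the start of training."""
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

class ControlledBlock(nn.Module):
    """Simplified ControlNet-style block: frozen original + trainable copy."""
    def __init__(self, block: nn.Module, channels: int):
        super().__init__()
        self.frozen = block
        for p in self.frozen.parameters():
            p.requires_grad_(False)
        self.copy = copy.deepcopy(block)           # trainable copy
        self.zero_in = zero_conv(channels)
        self.zero_out = zero_conv(channels)

    def forward(self, x, control):                  # control: e.g. edge-map features
        out = self.frozen(x)
        return out + self.zero_out(self.copy(x + self.zero_in(control)))

block = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.SiLU())
layer = ControlledBlock(block, channels=64)
x, ctrl = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
print(layer(x, ctrl).shape)                         # torch.Size([1, 64, 32, 32])
```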
-
arxiv Preprint – Neurons in Large Language Models: Dead, N-gram, Positional
In this episode we discuss Neurons in Large Language Models: Dead, N-gram, Positional by Elena Voita, Javier Ferrando, Christoforos Nalmpantis. In this paper, the authors analyze the OPT family of language models, focusing on the activation of neurons in the feedforward blocks. They find that there are many inactive “dead” neurons in…
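One way to make the "dead neuron" notion concrete (a simplified sketch, not the authors' analysis code): for a ReLU feed-forward layer, check which hidden units are never active over a batch of inputs.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def dead_neuron_fraction(ffn_in: nn.Linear, inputs: torch.Tensor) -> float:
    """Sketch: fraction of FFN hidden units that are never active
    (ReLU output == 0) across a batch of hidden states.
    inputs: (num_tokens, d_model)."""
    hidden = torch.relu(ffn_in(inputs))         # (num_tokens, d_ff)
    ever_active = (hidden > 0).any(dim=0)       # (d_ff,) bool per neuron
    return 1.0 - ever_active.float().mean().item()

ffn_in = nn.Linear(512, 2048)
tokens = torch.randn(10_000, 512)
print(f"dead fraction: {dead_neuron_fraction(ffn_in, tokens):.3f}")
```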
-
arxiv Preprint – eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models
In this episode we discuss eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models by Minsik Cho, Keivan A. Vahid, Qichen Fu, Saurabh Adya, Carlo C Del Mundo, Mohammad Rastegari, Devang Naik, Peter Zatloukal. The paper proposes eDKM, a memory-efficient implementation of weight clustering for large language models (LLMs). LLMs have high…
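To illustrate what train-time weight clustering means in general (a toy k-means sketch; eDKM itself is a differentiable, memory-efficient variant), the weights of a layer are snapped to a small set of shared centroids, so only the centroid values and per-weight indices need to be stored.

```python
import numpy as np

def cluster_weights(weights, k=16, iters=20, seed=0):
    """Toy weight clustering: plain k-means over a flattened weight tensor.
    Returns (centroids, indices); the layer is reconstructed as
    centroids[indices].  eDKM replaces this with a differentiable,
    memory-efficient clustering used during training."""
    rng = np.random.default_rng(seed)
    w = weights.ravel()
    centroids = rng.choice(w, size=k, replace=False)
    for _ in range(iters):
        idx = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
        for c in range(k):
            if np.any(idx == c):
                centroids[c] = w[idx == c].mean()
    return centroids, idx.reshape(weights.shape)

w = np.random.randn(256, 256).astype(np.float32)
centroids, idx = cluster_weights(w, k=16)
w_hat = centroids[idx]                              # reconstructed layer
print(f"reconstruction MSE: {np.mean((w - w_hat) ** 2):.4f}")
```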
-
arxiv Preprint – Link-Context Learning for Multimodal LLMs
In this episode we discuss Link-Context Learning for Multimodal LLMs by Yan Tai, Weichen Fan, Zhao Zhang, Feng Zhu, Rui Zhao, Ziwei Liu. The paper presents a method called link-context learning (LCL) that enhances the learning abilities of Multimodal Large Language Models (MLLMs). LCL aims to enable MLLMs to recognize new images and understand unfamiliar…
-
arxiv Preprint – ProPainter: Improving Propagation and Transformer for Video Inpainting
In this episode we discuss ProPainter: Improving Propagation and Transformer for Video Inpainting by Shangchen Zhou, Chongyi Li, Kelvin C. K. Chan, Chen Change Loy. The paper discusses the limitations of existing approaches to video inpainting, specifically flow-based propagation and spatiotemporal Transformer methods, due to spatial misalignment and limited temporal range. To address these challenges,…
-
arxiv Preprint – Large Language Models as Optimizers
In this episode we discuss Large Language Models as Optimizers by Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V. Le, Denny Zhou, Xinyun Chen. The paper introduces Optimization by PROmpting (OPRO), a method that uses large language models as optimizers in the absence of gradients. OPRO utilizes natural language descriptions of the optimization…
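A schematic sketch of the OPRO loop (hypothetical `llm()` and `score()` helpers; the prompt wording is illustrative): the trajectory of past solutions and their scores is written into a meta-prompt, and the LLM is asked to propose the next candidate.

```python
def llm(prompt: str) -> str:
    """Hypothetical stand-in for a large-language-model call."""
    raise NotImplementedError

def score(solution: str) -> float:
    """Hypothetical task-specific evaluator (e.g. accuracy of a prompt)."""
    raise NotImplementedError

def opro(task_description: str, steps: int = 20) -> str:
    """Sketch of Optimization by PROmpting: feed the trajectory of
    (solution, score) pairs back to the LLM and ask for a better one."""
    trajectory = []                                   # list of (solution, score)
    for _ in range(steps):
        history = "\n".join(f"solution: {s}\nscore: {v:.3f}"
                            for s, v in sorted(trajectory, key=lambda t: t[1]))
        candidate = llm(
            f"{task_description}\n\n"
            f"Previously evaluated solutions (lowest score first):\n{history}\n\n"
            "Propose a new solution that scores higher than all of the above."
        )
        trajectory.append((candidate, score(candidate)))
    return max(trajectory, key=lambda t: t[1])[0]
```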