-
arxiv preprint – Meta-Transformer: A Unified Framework for Multimodal Learning
In this episode we discuss Meta-Transformer: A Unified Framework for Multimodal Learning by Yiyuan Zhang, Kaixiong Gong, Kaipeng Zhang, Hongsheng Li, Yu Qiao, Wanli Ouyang, Xiangyu Yue. The paper presents Meta-Transformer, a unified framework for processing multiple data modalities. It uses a frozen encoder for feature extraction across different modalities, including natural language,…
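For readers who want the gist in code, here is a minimal sketch of the shared frozen-encoder pattern discussed in the episode: a modality-specific tokenizer feeds a frozen, modality-agnostic transformer, followed by a lightweight task head. All module names and sizes below are illustrative stand-ins, not the released Meta-Transformer code.

```python
# Minimal sketch of the shared frozen-encoder idea (not the released Meta-Transformer code).
import torch
import torch.nn as nn

class SharedBackbonePipeline(nn.Module):
    def __init__(self, tokenizer: nn.Module, encoder: nn.Module, head: nn.Module):
        super().__init__()
        self.tokenizer = tokenizer           # modality-specific: maps raw input to token embeddings
        self.encoder = encoder               # modality-agnostic transformer, kept frozen
        self.head = head                     # lightweight task-specific head, trained per task
        for p in self.encoder.parameters():  # freeze the shared encoder
            p.requires_grad = False

    def forward(self, x):
        tokens = self.tokenizer(x)               # (batch, seq, dim)
        feats = self.encoder(tokens)             # frozen feature extraction
        return self.head(feats.mean(dim=1))      # pool tokens, then predict

# Toy usage: a linear "tokenizer", a small transformer encoder, and a classifier head.
dim, n_classes = 64, 10
tokenizer = nn.Linear(16, dim)
encoder = nn.TransformerEncoder(nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
head = nn.Linear(dim, n_classes)
model = SharedBackbonePipeline(tokenizer, encoder, head)
logits = model(torch.randn(2, 32, 16))  # (2, n_classes)
```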
-
ICCV 2023 – Hidden Biases of End-to-End Driving Models
In this episode we discuss Hidden Biases of End-to-End Driving Models by Bernhard Jaeger, Kashyap Chitta, Andreas Geiger. The paper discusses biases commonly found in state-of-the-art end-to-end driving systems, particularly in the context of CARLA. The first bias is a reliance on target point following for lateral recovery, while the second involves averaging multimodal…
-
arxiv preprint – Retentive Network: A Successor to Transformer for Large Language Models
In this episode we discuss Retentive Network: A Successor to Transformer for Large Language Models by Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei. The paper introduces RETNET as a successor to the Transformer architecture for language models. RETNET utilizes a retention mechanism that supports parallel, recurrent,…
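As a rough illustration of the retention mechanism mentioned above, the sketch below implements a single retention head with decay factor gamma in both its parallel and recurrent forms and checks that they agree. Normalization, gating, and the chunkwise form are omitted; this is our simplification, not the paper's full architecture.

```python
# Sketch of single-head retention (decay gamma), omitting normalization and gating details.
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 4
gamma = 0.9
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))

# Parallel form: (Q K^T) masked by a causal decay matrix D, with D[n, m] = gamma**(n - m) for n >= m.
D = np.tril(gamma ** (np.arange(T)[:, None] - np.arange(T)[None, :]))
out_parallel = (Q @ K.T * D) @ V

# Recurrent form: S_n = gamma * S_{n-1} + K_n^T V_n, then o_n = Q_n S_n.
S = np.zeros((d, d))
out_recurrent = np.zeros((T, d))
for n in range(T):
    S = gamma * S + np.outer(K[n], V[n])
    out_recurrent[n] = Q[n] @ S

assert np.allclose(out_parallel, out_recurrent)  # both forms give the same outputs
```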
-
arxiv preprint – Challenges and Applications of Large Language Models
In this episode we discuss Challenges and Applications of Large Language Models by Jean Kaddour, Joshua Harris, Maximilian Mozes, Herbie Bradley, Roberta Raileanu, Robert McHardy. The paper provides a systematic analysis of the challenges and applications of Large Language Models (LLMs). In the Challenges section, it discusses obstacles such as dataset complexity, high training costs,…
-
ICML 2023 – Self-Repellent Random Walks on General Graphs — Achieving Minimal Sampling Variance via Nonlinear Markov Chains
In this episode we discuss Self-Repellent Random Walks on General Graphs — Achieving Minimal Sampling Variance via Nonlinear Markov Chains by Vishwaraj Doshi, Jie Hu, Do Young Eun. This paper introduces self-repellent random walks (SRRWs) as a way to improve sampling efficiency in Markov chain Monte Carlo (MCMC) procedures. It proves that the SRRWs converge…
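To give a flavor of the mechanism, the toy sketch below reweights a base random walk's transition probabilities by the empirical visit frequencies raised to a negative power, so the walk avoids nodes it has already visited often. It only illustrates the self-repellent idea on a small ring graph, not the paper's exact construction or its variance analysis.

```python
# Toy illustration of a self-repellent walk: reweight base transition probabilities by
# (empirical visit frequency / target weight) ** (-alpha). Not the paper's exact construction.
import numpy as np

rng = np.random.default_rng(1)
n, alpha, steps = 5, 2.0, 20000
P = np.zeros((n, n))                 # base Markov chain on a ring graph
for i in range(n):
    P[i, (i - 1) % n] = P[i, (i + 1) % n] = 0.5
mu = np.full(n, 1.0 / n)             # target (uniform) sampling distribution
visits = np.ones(n)                  # visit counts (start at 1 to avoid division by zero)

state = 0
for _ in range(steps):
    x = visits / visits.sum()                  # empirical visit distribution
    w = P[state] * (x / mu) ** (-alpha)        # self-repellent reweighting of the base chain
    w /= w.sum()
    state = rng.choice(n, p=w)
    visits[state] += 1

print(visits / visits.sum())  # empirical distribution should be close to uniform
```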
-
CVPR 2023 – MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering
In this episode we discuss MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering by Difei Gao, Luowei Zhou, Lei Ji, Linchao Zhu, Yi Yang, Mike Zheng Shou. The paper introduces a model called MIST for long-form VideoQA, which addresses challenges like multi-event reasoning, interactions among visual concepts, and causality reasoning. MIST decomposes spatial-temporal…
-
arxiv preprint – Deliberate then Generate: Enhanced Prompting Framework for Text Generation
In this episode we discuss Deliberate then Generate: Enhanced Prompting Framework for Text Generation by Bei Li, Rui Wang, Junliang Guo, Kaitao Song, Xu Tan, Hany Hassan, Arul Menezes, Tong Xiao, Jiang Bian, JingBo Zhu. The paper presents a new prompting framework called Deliberate then Generate (DTG) for text generation tasks using large language models.…
-
arxiv preprint – Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts
In this episode we discuss Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts by Ziyue Jiang, Jinglin Liu, Yi Ren, Jinzheng He, Chen Zhang, Zhenhui Ye, Pengfei Wei, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao. The paper discusses Mega-TTS 2, a text-to-speech model that can synthesize speech for unseen speakers using arbitrary-length prompts.…
-
ICLR 2023 – Copy Is All You Need
In this episode we discuss Copy Is All You Need by Tian Lan, Deng Cai, Yan Wang, Heyan Huang, Xian-Ling Mao. The paper presents a novel approach to text generation by using copy-and-paste operations from an existing text collection instead of selecting from a fixed vocabulary. Contextualized representations of text segments are computed and indexed…
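The sketch below conveys the copy-and-paste generation loop in its simplest form: embed the current prefix, retrieve the indexed text segment with the highest inner-product score, append it, and repeat. The embedding functions here are toy stand-ins (random vectors), not the paper's encoders.

```python
# Hedged sketch of generation by copying indexed text segments: embed the prefix, retrieve the
# best-scoring segment, append it, repeat. Toy embeddings only.
import numpy as np

rng = np.random.default_rng(2)
segments = ["the quick brown fox", "jumps over", "the lazy dog", "in the park"]
dim = 8
segment_embs = rng.standard_normal((len(segments), dim))   # stand-in for contextualized segment vectors
segment_embs /= np.linalg.norm(segment_embs, axis=1, keepdims=True)

def embed_prefix(prefix: str) -> np.ndarray:
    # Toy prefix encoder: hash-seeded random vector (a real system would use a language model).
    v = np.random.default_rng(abs(hash(prefix)) % (2**32)).standard_normal(dim)
    return v / np.linalg.norm(v)

prefix = "a sentence about a fox:"
for _ in range(3):
    scores = segment_embs @ embed_prefix(prefix)   # inner-product search over the phrase index
    prefix += " " + segments[int(scores.argmax())]
print(prefix)
```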
-
arxiv preprint – NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis
In this episode we discuss NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis by Nilesh Kulkarni, Davis Rempe, Kyle Genova, Abhijit Kundu, Justin Johnson, David Fouhey, Leonidas Guibas. The paper presents a method called NIFTY, which utilizes a neural interaction field to generate 3D human motions interacting with objects in a scene. The…
-
ICCV 2023 – DreamTeacher: Pretraining Image Backbones with Deep Generative Models
In this episode we discuss DreamTeacher: Pretraining Image Backbones with Deep Generative Models by Daiqing Li, Huan Ling, Amlan Kar, David Acuna, Seung Wook Kim, Karsten Kreis, Antonio Torralba, Sanja Fidler. This paper presents DreamTeacher, a self-supervised feature representation learning framework that utilizes generative networks to pre-train image backbones. The authors propose two methods of…
-
arxiv preprint – Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models
In this episode we discuss Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models by Xuhui Jia, Yang Zhao, Kelvin C. K. Chan, Yandong Li, Han Zhang, Boqing Gong, Tingbo Hou, Huisheng Wang, Yu-Chuan Su. This paper presents a method for generating customized images based on user specifications. The approach uses an encoder…
-
arxiv preprint – LightGlue: Local Feature Matching at Light Speed
In this episode we discuss LightGlue: Local Feature Matching at Light Speed by Philipp Lindenberger, Paul-Edouard Sarlin, Marc Pollefeys. The paper presents LightGlue, a deep neural network that matches local features across images. LightGlue is more memory- and compute-efficient, more accurate, and easier to train than the previous state-of-the-art model. It…
-
arxiv preprint – VanillaNet: the Power of Minimalism in Deep Learning
In this episode we discuss VanillaNet: the Power of Minimalism in Deep Learning by Hanting Chen, Yunhe Wang, Jianyuan Guo, Dacheng Tao. The paper introduces VanillaNet, a neural network architecture that prioritizes simplicity and minimalism. It avoids complex operations like self-attention and uses compact and straightforward layers. Experimental results demonstrate that VanillaNet performs comparably to…
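To make the minimalism concrete, here is a small stack of plain convolution, activation, and pooling layers with no self-attention and no shortcut connections. It only illustrates the spirit of the design; the actual VanillaNet architecture and its training techniques differ.

```python
# Minimal "plain" convolutional stack: conv + activation + pooling only, no attention, no shortcuts.
# Illustrative of the minimalist spirit, not the paper's exact architecture.
import torch
import torch.nn as nn

plain_net = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=4, stride=4),   # stem: aggressive downsampling
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 128, kernel_size=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(128, 1000),                         # classification head
)
logits = plain_net(torch.randn(1, 3, 224, 224))   # (1, 1000)
```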
-
arxiv preprint – Secrets of RLHF in Large Language Models Part I: PPO
In this episode we discuss Secrets of RLHF in Large Language Models Part I: PPO by Rui Zheng, Shihan Dou, Songyang Gao, Wei Shen, Binghai Wang, Yan Liu, Senjie Jin, Qin Liu, Limao Xiong, Lu Chen, Zhiheng Xi, Yuhao Zhou, Nuo Xu, Wenbin Lai, Minghao Zhu, Rongxiang Weng, Wensen Cheng, Cheng Chang, Zhangyue Yin, Yuan…
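Since the episode centers on PPO, here is the standard PPO clipped surrogate loss that RLHF policy optimization builds on, written as a small function. This is a generic sketch of the objective, not the specific training recipe or tricks analyzed in the paper.

```python
# Standard PPO clipped surrogate loss (generic sketch, not the paper's full training recipe).
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    # Probability ratio between the current policy and the behavior policy for the sampled actions.
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Maximize the pessimistic (minimum) objective, i.e. minimize its negation.
    return -torch.min(unclipped, clipped).mean()

loss = ppo_clip_loss(torch.tensor([-1.0, -0.5]), torch.tensor([-1.2, -0.4]), torch.tensor([0.8, -0.3]))
```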
-
arxiv preprint – NILUT: Conditional Neural Implicit 3D Lookup Tables for Image Enhancement
In this episode we discuss NILUT: Conditional Neural Implicit 3D Lookup Tables for Image Enhancement by Marcos V. Conde, Javier Vazquez-Corral, Michael S. Brown, Radu Timofte. The paper introduces NILUT, a method that represents 3D lookup tables (3D LUTs) for image enhancement implicitly with a neural network. Traditional 3D LUTs are memory-intensive, so NILUT offers an alternative…
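The core idea can be sketched in a few lines: a small MLP maps input RGB coordinates to enhanced RGB, taking the place of an explicit 3D lookup grid. The layer sizes below are illustrative, and the conditioning on multiple enhancement styles from the paper's title is omitted.

```python
# Sketch of an implicit 3D LUT: an MLP maps input RGB to enhanced RGB, replacing an explicit grid.
import torch
import torch.nn as nn

implicit_lut = nn.Sequential(
    nn.Linear(3, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 3),            # enhanced RGB
)

image = torch.rand(256, 256, 3)                                       # H x W x RGB in [0, 1]
enhanced = implicit_lut(image.reshape(-1, 3)).reshape(256, 256, 3)    # apply per pixel
```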
-
arxiv preprint – Large Language Models as General Pattern Machines
In this episode we discuss Large Language Models as General Pattern Machines by Suvir Mirchandani, Fei Xia, Pete Florence, Brian Ichter, Danny Driess, Montserrat Gonzalez Arenas, Kanishka Rao, Dorsa Sadigh, Andy Zeng. The paper discusses the capabilities of pre-trained large language models (LLMs) in completing complex token sequences. The study shows that LLMs can effectively…
-
arxiv preprint – Lost in the Middle: How Language Models Use Long Contexts
In this episode we discuss Lost in the Middle: How Language Models Use Long Contexts by Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, Percy Liang. This paper examines how the position of relevant information within long input contexts affects the performance of language models on tasks such as multi-document question answering and key-value retrieval.…
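A hedged sketch of the kind of probe involved: place the gold document at each position among distractor documents and record whether the model answers correctly from that ordering. The function `answer_with_llm` is a hypothetical stand-in for an LLM call, and this is our simplification of the setup rather than the paper's exact protocol.

```python
# Sketch of a position-sensitivity probe: move the gold document through the context and record
# accuracy per position. `answer_with_llm` is a hypothetical stand-in for the model call.
def accuracy_by_gold_position(question, gold_doc, distractors, answer, answer_with_llm):
    results = {}
    for pos in range(len(distractors) + 1):
        docs = distractors[:pos] + [gold_doc] + distractors[pos:]   # gold document at index `pos`
        prediction = answer_with_llm(question, docs)                # query the model with this ordering
        results[pos] = answer.lower() in prediction.lower()         # crude correctness check
    return results
```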
-
arxiv preprint – LongNet: Scaling Transformers to 1,000,000,000 Tokens
In this episode we discuss LongNet: Scaling Transformers to 1,000,000,000 Tokens by Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, Furu Wei. The paper introduces LONGNET, a variant of the Transformer model that addresses the challenge of scaling sequence length in large language models. LONGNET utilizes dilated attention to exponentially expand…
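As a rough sketch of the sparsification behind dilated attention, the snippet below splits a sequence into segments of length w and keeps every r-th token within each segment, which is the pattern attention is then computed over. Mixing several (w, r) configurations and the distributed training algorithm are omitted, and the helper name is ours.

```python
# Sketch of the sparsification step in dilated attention: segment the sequence, then keep every
# r-th token within each segment before computing attention on the sparsified segments.
import numpy as np

def dilated_indices(seq_len, segment_len, dilation):
    idx = []
    for start in range(0, seq_len, segment_len):
        idx.extend(range(start, min(start + segment_len, seq_len), dilation))
    return np.array(idx)

keep = dilated_indices(seq_len=16, segment_len=8, dilation=2)
print(keep)  # [ 0  2  4  6  8 10 12 14]: tokens retained within each segment
```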
-
arxiv preprint – DisCo: Disentangled Control for Referring Human Dance Generation in Real World
In this episode we discuss DisCo: Disentangled Control for Referring Human Dance Generation in Real World by Tan Wang, Linjie Li, Kevin Lin, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang. The paper introduces Referring Human Dance Generation, a new problem setting for generating realistic dance sequences. The authors emphasize three important…
-
arxiv preprint – Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting
In this episode we discuss Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting by Zhen Qin, Rolf Jagerman, Kai Hui, Honglei Zhuang, Junru Wu, Jiaming Shen, Tianqi Liu, Jialu Liu, Donald Metzler, Xuanhui Wang, Michael Bendersky. The paper introduces Pairwise Ranking Prompting (PRP) as a technique to improve document ranking using Large…
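To show the shape of the technique, here is a sketch that ranks documents with pairwise LLM comparisons: each comparison asks which of two passages is more relevant to the query, and the results drive a comparison-based sort. The judge function `llm_prefers_first` is a hypothetical stand-in for the actual prompt, and the paper studies aggregation variants beyond this simple sort.

```python
# Sketch of pairwise ranking with an LLM judge. `llm_prefers_first(query, a, b)` is a hypothetical
# stand-in that returns True if the model judges passage `a` more relevant than `b`.
from functools import cmp_to_key

def rank_documents(query, docs, llm_prefers_first):
    def compare(a, b):
        # Ask in both orders to reduce position bias; keep the original order on ties.
        first = llm_prefers_first(query, a, b)
        second = llm_prefers_first(query, b, a)
        if first and not second:
            return -1
        if second and not first:
            return 1
        return 0
    return sorted(docs, key=cmp_to_key(compare))
```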
-
arxiv preprint – LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
In this episode we discuss LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding by Yanzhe Zhang, Ruiyi Zhang, Jiuxiang Gu, Yufan Zhou, Nedim Lipka, Diyi Yang, Tong Sun. The paper introduces LLaVAR, an enhanced visual instruction tuning method for text-rich image understanding. The method addresses the limitation of existing pipelines in comprehending textual details…
-
arxiv preprint – Generate Anything Anywhere in Any Scene
In this episode we discuss Generate Anything Anywhere in Any Scene by Yuheng Li, Haotian Liu, Yangming Wen, Yong Jae Lee. The paper proposes a data augmentation training strategy for personalized object generation in text-to-image diffusion models. They also introduce plug-and-play adapter layers to control the location and size of the generated personalized…
-
CVPR 2023 – Consistent View Synthesis with Pose-Guided Diffusion Models
In this episode we discuss Consistent View Synthesis with Pose-Guided Diffusion Models by Hung-Yu Tseng, Qinbo Li, Changil Kim, Suhib Alsisan, Jia-Bin Huang, Johannes Kopf. The paper proposes a new technique for synthesizing novel views from a single image for virtual reality applications. The proposed method, called pose-guided diffusion, generates consistent and high-quality views from…
-
arxiv preprint – BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion
In this episode we discuss BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion by Michael J. Black, Priyanka Patel, Joachim Tesch, Jinlong Yang. This paper presents BEDLAM, a large-scale synthetic dataset for 3D human pose and shape estimation. Unlike previous datasets, BEDLAM is realistic and diverse, featuring monocular RGB videos with ground-truth…