arxiv Preprint – Large Language Models Cannot Self-Correct Reasoning Yet
In this episode we discuss Large Language Models Cannot Self-Correct Reasoning Yet by Jie Huang, Xinyun Chen, Swaroop Mishra, Huaixiu Steven Zheng, Adams Wei Yu, Xinying Song, Denny Zhou. The paper explores the effectiveness of self-correction in Large Language Models (LLMs) for improving the accuracy and appropriateness of generated content. It specifically focuses on the…
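The intrinsic self-correction setting examined in the paper amounts to a prompt-only critique-and-revise loop with no external feedback. A minimal sketch of that loop, where `ask_llm` is a hypothetical placeholder for any chat-completion call rather than the authors' code:

```python
# Minimal sketch of prompt-only (intrinsic) self-correction, the setting the paper
# evaluates. `ask_llm` is a hypothetical placeholder for a chat-completion API call.

def ask_llm(prompt: str) -> str:
    # Stub so the sketch runs; swap in a real model call to experiment.
    return f"[model output for: {prompt[:40]}...]"

def self_correct(question: str, rounds: int = 2) -> str:
    answer = ask_llm(f"Answer the question step by step.\nQ: {question}")
    for _ in range(rounds):
        critique = ask_llm(
            "Review your previous answer and find problems with it.\n"
            f"Q: {question}\nPrevious answer: {answer}"
        )
        answer = ask_llm(
            "Based on the problems you found, improve your answer.\n"
            f"Q: {question}\nPrevious answer: {answer}\nCritique: {critique}"
        )
    return answer

if __name__ == "__main__":
    print(self_correct("If a train travels 60 km in 45 minutes, what is its speed in km/h?"))
```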
-
arxiv Preprint – Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
In this episode we discuss Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution by Chrisantha Fernando, Dylan Banarse, Henryk Michalewski, Simon Osindero, Tim Rocktäschel. The paper presents PROMPTBREEDER, a method for evolving and adapting prompts for Large Language Models (LLMs) in order to enhance their reasoning abilities. It uses an LLM to mutate a population of task-prompts…
-
arxiv Preprint – Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation
In this episode we discuss Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation by Eric Zelikman, Eliana Lorch, Lester Mackey, Adam Tauman Kalai. The paper presents a method called Self-Taught Optimizer (STOP) that utilizes a language model to enhance a scaffolding program for solving optimization problems. The language model suggests self-improvement strategies like beam search, genetic…
-
arxiv Preprint – Tree of Thoughts: Deliberate Problem Solving with Large Language Models
In this episode we discuss Tree of Thoughts: Deliberate Problem Solving with Large Language Models by Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan. The authors of this paper introduce a framework called “Tree of Thoughts” (ToT) to enhance language model inference. The ToT framework allows language models…
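The core of ToT is searching over intermediate "thoughts" instead of committing to a single chain. A toy breadth-first sketch, with placeholder `expand` and `score` functions standing in for the LLM-based proposal and evaluation prompts used in the paper:

```python
# Toy sketch of Tree-of-Thoughts style breadth-first search: expand several candidate
# "thoughts" per state, score them, and keep the best few. The expand/score functions
# here are hypothetical placeholders, not the paper's prompts.
import heapq

def expand(state: str) -> list[str]:
    # Placeholder: a real system would ask an LLM to propose next reasoning steps.
    return [state + f" -> step{i}" for i in range(3)]

def score(state: str) -> float:
    # Placeholder value function; the paper uses an LLM to rate partial solutions.
    return -len(state)  # prefer shorter chains in this toy example

def tree_of_thoughts(root: str, depth: int = 3, beam: int = 2) -> str:
    frontier = [root]
    for _ in range(depth):
        candidates = [child for s in frontier for child in expand(s)]
        frontier = heapq.nlargest(beam, candidates, key=score)
    return max(frontier, key=score)

print(tree_of_thoughts("problem"))
```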
-
Neurips 2023 – Evaluating Cognitive Maps and Planning in Large Language Models with CogEval
In this episode we discuss Evaluating Cognitive Maps and Planning in Large Language Models with CogEval by Ida Momennejad, Hosein Hasanbeig, Felipe Vieira, Hiteshi Sharma, Robert Osazuwa Ness, Nebojsa Jojic, Hamid Palangi, Jonathan Larson. The paper presents CogEval, a protocol designed to evaluate the cognitive abilities of Large Language Models (LLMs). The authors note the…
-
ICCV 2023 – Diffusion Models as Masked Autoencoders
In this episode we discuss Diffusion Models as Masked Autoencoders by Chen Wei, Karttikeya Mangalam, Po-Yao Huang, Yanghao Li, Haoqi Fan, Hu Xu, Huiyu Wang, Cihang Xie, Alan Yuille, Christoph Feichtenhofer. The authors present a method called Diffusion Models as Masked Autoencoders (DiffMAE) that combines generative pre-training with diffusion models for visual data. They show…
-
arxiv Preprint – Conditional Diffusion Distillation
In this episode we discuss Conditional Diffusion Distillation by Kangfu Mei, Mauricio Delbracio, Hossein Talebi, Zhengzhong Tu, Vishal M. Patel, Peyman Milanfar. The authors of this paper propose a new method called conditional distillation to speed up the sampling time of diffusion models in text-to-image generation. The method incorporates image conditions to enhance the diffusion…
-
arxiv Preprint – Enable Language Models to Implicitly Learn Self-Improvement From Data
In this episode we discuss Enable Language Models to Implicitly Learn Self-Improvement From Data by Ziqi Wang, Le Hou, Tianjian Lu, Yuexin Wu, Yunxuan Li, Hongkun Yu, Heng Ji. The paper introduces a framework called ImPlicit Self-ImprovemenT (PIT) that allows large language models (LLMs) to learn self-improvement from data. PIT learns the improvement goal from…
-
arxiv Preprint – Efficient Streaming Language Models with Attention Sinks
In this episode we discuss Efficient Streaming Language Models with Attention Sinks by Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, Mike Lewis. The paper proposes StreamingLLM, a framework that allows Large Language Models (LLMs) to generalize to infinite sequence length without fine-tuning. By observing the phenomenon of attention sink, where initial tokens have a…
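The cache policy behind StreamingLLM can be summarized as: always keep a few initial "sink" tokens plus a sliding window of recent tokens. A minimal sketch of that eviction rule (illustrative only; real implementations operate on per-layer key/value tensors):

```python
# Sketch of the StreamingLLM cache policy: keep the first few "attention sink"
# tokens plus a sliding window of recent tokens, evicting everything in between.

def evict(cache: list[int], n_sinks: int = 4, window: int = 8) -> list[int]:
    if len(cache) <= n_sinks + window:
        return cache
    return cache[:n_sinks] + cache[-window:]

cache: list[int] = []
for token_id in range(20):      # pretend token ids arrive one by one
    cache.append(token_id)
    cache = evict(cache)

print(cache)  # the 4 sink positions plus the 8 most recent tokens
```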
-
Neurips 2023 – PuzzleFusion: Unleashing the Power of Diffusion Models for Spatial Puzzle Solving
In this episode we discuss PuzzleFusion: Unleashing the Power of Diffusion Models for Spatial Puzzle Solving by Sepidehsadat Hosseini, Mohammad Amin Shabani, Saghar Irandoust, Yasutaka Furukawa. The paper introduces PuzzleFusion, a neural architecture based on Diffusion Models for spatial puzzle solving. It focuses on jigsaw puzzle solving and room arrangement tasks, using new datasets including…
-
arxiv Preprint – Vision Transformers Need Registers
In this episode we discuss Vision Transformers Need Registers by Timothée Darcet, Maxime Oquab, Julien Mairal, Piotr Bojanowski. The paper proposes a fix for artifacts that appear in the feature maps of Vision Transformers (ViT), concentrated in low-informative background areas of images. By adding extra tokens called “registers” to the input sequence, the feature maps and attention…
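The mechanism itself is small: a few learnable tokens are appended to the patch sequence, attended to like any other token, and dropped before the output head. A minimal PyTorch sketch, with shapes and names chosen for illustration rather than taken from the paper's implementation:

```python
# Minimal sketch of adding learnable "register" tokens to a ViT token sequence.
import torch
import torch.nn as nn

class WithRegisters(nn.Module):
    def __init__(self, dim: int = 768, num_registers: int = 4):
        super().__init__()
        self.registers = nn.Parameter(torch.zeros(1, num_registers, dim))

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (batch, num_patches, dim); registers are appended,
        # attended to like normal tokens, and discarded before the output head.
        b = patch_tokens.shape[0]
        return torch.cat([patch_tokens, self.registers.expand(b, -1, -1)], dim=1)

x = torch.randn(2, 196, 768)       # e.g. 14x14 patches
print(WithRegisters()(x).shape)    # torch.Size([2, 200, 768])
```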
-
arxiv Preprint – VPA: Fully Test-Time Visual Prompt Adaptation
In this episode we discuss VPA: Fully Test-Time Visual Prompt Adaptation by Jiachen Sun, Mark Ibrahim, Melissa Hall, Ivan Evtimov, Z. Morley Mao, Cristian Canton Ferrer, Caner Hazirbas. The paper presents Visual Prompt Adaptation (VPA), a framework that extends prompt tuning to visual recognition tasks. VPA allows for test-time adaptation without source-domain information and improves…
-
arxiv Preprint – Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition
In this episode we discuss Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition by Yu Yu, Chao-Han Huck Yang, Jari Kolehmainen, Prashanth G. Shivakumar, Yile Gu, Sungho Ryu, Roger Ren, Qi Luo, Aditya Gourav, I-Fan Chen, Yi-Chieh Liu, Tuan Dinh, Ankur Gandhe, Denis Filimonov, Shalini Ghosh, Andreas Stolcke, Ariya Rastrow, Ivan Bulyko.…
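The low-rank adaptation trick keeps the pretrained weight frozen and learns only a small update of the form (alpha/r) * B A. A minimal PyTorch sketch of a LoRA-adapted linear layer (the paper applies the idea to a rescoring language model, not to this toy layer):

```python
# Minimal sketch of a LoRA-adapted linear layer: the frozen weight W is augmented
# with a low-rank update (alpha/r) * B @ A, so only A and B are trained.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)
        self.base.weight.requires_grad_(False)   # freeze the pretrained weight
        self.A = nn.Parameter(torch.randn(r, in_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_dim, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(768, 768)
print(layer(torch.randn(4, 768)).shape)   # torch.Size([4, 768])
```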
-
arxiv Preprint – DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
In this episode we discuss DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models by Sam Ade Jacobs, Masahiro Tanaka, Chengming Zhang, Minjia Zhang, Leon Song, Samyam Rajbhandari, Yuxiong He. DeepSpeed-Ulysses is a methodology for efficient and scalable training of large language models with long sequence lengths. It addresses the limitations…
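The key primitive is an all-to-all exchange that turns a sequence-parallel layout (each worker holds a sequence shard for all heads) into a head-parallel layout (each worker holds the full sequence for a subset of heads), so attention can run locally. A single-process NumPy toy that simulates the exchange, not DeepSpeed's actual communication code:

```python
# Toy simulation of a Ulysses-style all-to-all: start with each worker holding a
# sequence shard over all heads, exchange so each worker holds the full sequence
# for a subset of heads. Pure NumPy, single process, illustrative only.
import numpy as np

P, N, H, D = 4, 16, 8, 2          # workers, sequence length, heads, head dim
x = np.arange(N * H * D).reshape(N, H, D)

# Before: worker p holds sequence shard x[p*N//P:(p+1)*N//P, :, :] (all heads).
seq_shards = np.split(x, P, axis=0)

# "All-to-all": every worker sends each head block of its sequence shard to the
# worker that owns those heads, so worker p ends up with the full sequence for
# its head block and can compute attention for those heads locally.
head_shards = [
    np.concatenate(
        [shard[:, p * H // P:(p + 1) * H // P, :] for shard in seq_shards], axis=0
    )
    for p in range(P)
]

assert head_shards[0].shape == (N, H // P, D)          # full sequence, H/P heads
assert np.array_equal(np.concatenate(head_shards, axis=1), x)
```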
-
arxiv Preprint – VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning
In this episode we discuss VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning by Han Lin, Abhay Zala, Jaemin Cho, Mohit Bansal. The paper presents VIDEODIRECTORGPT, a framework for generating multi-scene videos with consistency using large language models. It consists of a video planner LLM (GPT-4) that expands a text prompt into a “video plan”…
-
arxiv Preprint – PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
In this episode we discuss PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training by Dawei Zhu, Nan Yang, Liang Wang, Yifan Song, Wenhao Wu, Furu Wei, Sujian Li. The paper presents a training method called PoSE for adapting large language models to longer context windows. It addresses the challenge of extending the…
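The trick in PoSE is to train on short chunks of text while skewing their position indices with random skips, so the model sees position ids that span the longer target window. A minimal sketch of one way to build such position ids (the exact chunking and sampling scheme here is an assumption for illustration):

```python
# Sketch of PoSE-style position ids: train on a short window but shift part of it
# by a random skip so the ids cover a much longer target context window.
import random

def pose_position_ids(train_len: int = 8, target_len: int = 32) -> list[int]:
    # Split the short training window into two chunks; the second chunk's positions
    # are shifted by a random skip so ids reach into the longer target window.
    half = train_len // 2
    skip = random.randint(0, target_len - train_len)
    first = list(range(half))
    second = [half + skip + i for i in range(train_len - half)]
    return first + second

print(pose_position_ids())   # e.g. [0, 1, 2, 3, 21, 22, 23, 24]
```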
-
arxiv Preprint – Summarization is (Almost) Dead
In this episode we discuss Summarization is (Almost) Dead by Xiao Pu, Mingqi Gao, Xiaojun Wan. The paper investigates the capabilities of large language models (LLMs) in summary generation. Through new datasets and human evaluation experiments, the authors find that evaluators prefer LLM-generated summaries over human-written summaries and fine-tuned model summaries. LLM-generated…
-
arxiv Preprint – LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent
In this episode we discuss LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent by Jianing Yang, Xuweiyi Chen, Shengyi Qian, Nikhil Madaan, Madhavan Iyengar, David F. Fouhey, Joyce Chai. The paper introduces a new approach called LLM-Grounder for grounding 3D visual scenes using natural language queries. It utilizes a Large Language…
-
Neurips 2023 spotlight – Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems
In this episode we discuss Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems by Benjamin Coleman, Wang-Cheng Kang, Matthew Fahrbach, Ruoxi Wang, Lichan Hong, Ed H. Chi, Derek Zhiyuan Cheng. The paper introduces a framework called Feature Multiplexing, which allows for the use of a single representation space across multiple categorical features in web-scale…
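The idea can be pictured as one shared embedding table into which every (feature, value) pair is hashed, rather than one table per feature. A toy sketch of such a shared index space (table size and hashing scheme are illustrative assumptions, not the paper's exact design):

```python
# Toy sketch of feature multiplexing: many categorical features share one embedding
# table, with each (feature, value) pair hashed into the shared index space.
import hashlib

TABLE_SIZE = 1_000_003   # one shared table for all features

def shared_index(feature: str, value: str) -> int:
    key = f"{feature}={value}".encode()
    return int(hashlib.md5(key).hexdigest(), 16) % TABLE_SIZE

print(shared_index("country", "US"))
print(shared_index("device", "ios"))   # different feature, same index space
```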
-
arxiv Preprint – Chain-of-Verification Reduces Hallucination in Large Language Models
In this episode we discuss Chain-of-Verification Reduces Hallucination in Large Language Models by Shehzaad Dhuliawala, Mojtaba Komeili, Jing Xu, Roberta Raileanu, Xian Li, Asli Celikyilmaz, Jason Weston. The paper proposes the Chain-of-Verification (COVE) method to address the issue of factual hallucination in large language models. COVE involves generating an initial response, planning independent fact-checking questions,…
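The CoVe pipeline has four stages: draft an answer, plan verification questions, answer each question independently of the draft, and then write a verified final response. A minimal sketch of that flow, where `ask_llm` is a hypothetical stand-in for a chat-completion call:

```python
# Sketch of the Chain-of-Verification stages: draft -> plan verification questions
# -> answer each question independently -> produce a verified final answer.

def ask_llm(prompt: str) -> str:
    return f"[model output for: {prompt[:40]}...]"   # stub so the sketch runs

def chain_of_verification(query: str) -> str:
    draft = ask_llm(f"Answer the question: {query}")
    plan = ask_llm(f"List fact-checking questions for this answer:\n{draft}")
    questions = [q for q in plan.splitlines() if q.strip()]
    # Answer each verification question without showing the draft, to avoid
    # repeating its hallucinations.
    checks = [(q, ask_llm(q)) for q in questions]
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in checks)
    return ask_llm(
        f"Question: {query}\nDraft: {draft}\nVerification:\n{evidence}\n"
        "Write a final answer consistent with the verification."
    )

print(chain_of_verification("Name three politicians born in New York."))
```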
-
arxiv Preprint – Language Modeling Is Compression
In this episode we discuss Language Modeling Is Compression by Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau, Marcus Hutter, Joel Veness. The authors argue that large language models can be seen as powerful compressors due to their predictive capabilities. They demonstrate…
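The connection is that a good predictive model yields short codes: the ideal code length of a sequence under a model is its negative log-likelihood in bits. A tiny self-contained illustration with a toy unigram model standing in for a language model:

```python
# Tiny illustration of the "prediction = compression" view: the ideal code length of
# a sequence under a probabilistic model is its negative log-likelihood in bits.
# The unigram "model" here is a toy stand-in for a language model.
import math
from collections import Counter

text = "abracadabra"
counts = Counter(text)
prob = {c: n / len(text) for c, n in counts.items()}

bits = -sum(math.log2(prob[c]) for c in text)   # Shannon code length under the model
print(f"{bits:.1f} bits vs {8 * len(text)} bits raw")   # about 22.4 bits vs 88 bits
```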
-
arxiv Preprint – From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
In this episode we discuss From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting by Griffin Adams, Alexander Fabbri, Faisal Ladhak, Eric Lehman, Noémie Elhadad. The paper introduces a method called “Chain of Density” (CoD) for generating summaries with varying levels of information density. Using GPT-4, the authors generate entity-sparse summaries and then…
-
ICCV 2023 – Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing
In this episode we discuss Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing by Alberto Baldrati, Davide Morelli, Giuseppe Cartella, Marcella Cornia, Marco Bertini, Rita Cucchiara. This paper introduces a framework for generating fashion images using multimodal prompts such as text, body poses, and garment sketches. The proposed architecture utilizes latent diffusion…
-
arxiv Preprint – GPT Can Solve Mathematical Problems Without a Calculator
In this episode we discuss GPT Can Solve Mathematical Problems Without a Calculator by Zhen Yang, Ming Ding, Qingsong Lv, Zhihuan Jiang, Zehai He, Yuyi Guo, Jinfeng Bai, Jie Tang. The paper challenges the belief that large language models cannot perform arithmetic operations accurately without calculator tools. The researchers present MathGLM, a 2 billion-parameter language…