-
ArXiv Preprint – Simplifying Transformer Blocks
In this episode we discuss Simplifying Transformer Blocks by Bobby He, Thomas Hofmann. The paper investigates whether standard transformer blocks can be simplified, by removing components such as skip connections and normalization layers, without sacrificing training speed. Combining signal propagation theory with empirical study, the authors justify modifications that…
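As a rough illustration of the kind of block the paper experiments with, here is a minimal PyTorch-style sketch with the skip connections and normalization layers removed. This is our own sketch rather than the authors' code, and the module names and sizes are arbitrary.

import torch
import torch.nn as nn

class SimplifiedBlock(nn.Module):
    """Transformer block with no residual additions and no LayerNorm (illustrative only)."""
    def __init__(self, d_model=256, n_heads=4, d_ff=1024):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))

    def forward(self, x):
        # Attention output feeds the MLP directly: no skip connections, no normalization.
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        return self.mlp(attn_out)

x = torch.randn(2, 16, 256)        # (batch, sequence length, features)
print(SimplifiedBlock()(x).shape)  # torch.Size([2, 16, 256])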
-
ArXiv – Visual In-Context Prompting
In this episode, we discuss Visual In-Context Prompting by Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao. This paper introduces universal visual in-context prompting, a new framework for improving zero-shot learning capabilities in vision tasks, which works by…
-
ArXiv Preprint – GAIA: a benchmark for General AI Assistants
In this episode we discuss GAIA: a benchmark for General AI Assistants by Grégoire Mialon, Clémentine Fourrier, Craig Swift, Thomas Wolf, Yann LeCun, Thomas Scialom. The paper introduces GAIA, a benchmark designed to assess the capabilities of General AI Assistants in performing tasks that are simple for humans yet difficult for AIs, such as reasoning,…
-
ArXiv Preprint – DisCo: Disentangled Control for Realistic Human Dance Generation
In this episode we discuss DisCo: Disentangled Control for Realistic Human Dance Generation by Tan Wang, Linjie Li, Kevin Lin, Yuanhao Zhai, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang. The paper discusses the challenges of generative AI in creating realistic human-centric dance content for social media, highlighting the need for models to…
-
ArXiv Preprint – Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation
In this episode we discuss Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation by Eric Zelikman, Eliana Lorch, Lester Mackey, Adam Tauman Kalai. The paper shows how a seed “improver” program, a scaffold built around a language model, recursively improves itself by querying the language model multiple times and selecting the best candidate according to a utility function. The improved…
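To make the loop concrete, here is a minimal sketch of a seed improver; it is our illustration rather than the paper's code, and query_language_model is a hypothetical stand-in for an actual LM call.

def query_language_model(prompt: str) -> str:
    """Hypothetical stand-in for a real language model API call."""
    raise NotImplementedError("plug in an actual LM call here")

def improve(program: str, utility, n_candidates: int = 4) -> str:
    """Ask the LM for improved versions of `program`; keep the one with the highest utility."""
    best_program, best_score = program, utility(program)
    for _ in range(n_candidates):
        candidate = query_language_model(
            "Improve the following program so that it scores higher on its task. "
            "Return only code.\n\n" + program
        )
        score = utility(candidate)
        if score > best_score:
            best_program, best_score = candidate, score
    return best_program

# Recursive self-improvement enters when `improve` is itself the program being improved,
# e.g. improve(source_code_of_improve, utility).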
-
ArXiv Preprint – A General Theoretical Paradigm to Understand Learning from Human Preferences
In this episode we discuss A General Theoretical Paradigm to Understand Learning from Human Preferences by Mohammad Gheshlaghi Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos. The paper explores reinforcement learning from human feedback (RLHF) and proposes ΨPO, a new theoretical framework that directly utilizes pairwise preferences without relying on…
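For readers who want the shape of the objective, the ΨPO criterion, as we understand it from the paper (notation lightly paraphrased), trains a policy π to maximize a non-decreasing function Ψ of the probability that its samples are preferred over samples from a reference behavior μ, with a KL penalty toward a reference policy:

\[
\max_{\pi}\; \mathbb{E}_{x \sim \rho,\; y \sim \pi(\cdot \mid x),\; y' \sim \mu(\cdot \mid x)}
\big[\, \Psi\big(p^{*}(y \succ y' \mid x)\big) \,\big]
\;-\; \tau\, D_{\mathrm{KL}}\big(\pi \,\|\, \pi_{\mathrm{ref}}\big)
\]

If we recall the paper correctly, choosing Ψ(q) = log(q / (1 − q)) recovers the standard RLHF and DPO objectives, while taking Ψ to be the identity yields the IPO special case that the authors analyze in detail.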
-
ArXiv Preprint – ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
In this episode we discuss ShareGPT4V: Improving Large Multi-Modal Models with Better Captions by Lin Chen, Jinsong Li, Xiaoyi Dong, Pan Zhang, Conghui He, Jiaqi Wang, Feng Zhao, Dahua Lin. The ShareGPT4V dataset, with 1.2 million rich descriptive captions, has been created to enhance modality alignment in large multi-modal models (LMMs), offering greater diversity and…
-
ArXiv Preprint – S-LoRA: Serving Thousands of Concurrent LoRA Adapters
In this episode we discuss S-LoRA: Serving Thousands of Concurrent LoRA Adapters by Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica. The paper introduces S-LoRA, a system for efficiently serving a large number of Low-Rank Adaptation (LoRA) language…
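To see why serving many adapters over one base model is attractive, here is a toy sketch (ours, not S-LoRA's system): every request reuses the same frozen base weight, and only its own small low-rank pair is applied on top.

import torch

d, r = 64, 8                               # hidden size, LoRA rank
W = torch.randn(d, d)                      # shared, frozen base weight
adapters = {aid: (torch.randn(r, d) * 0.01, torch.randn(d, r) * 0.01)
            for aid in ["adapter_a", "adapter_b"]}   # per-user (A, B) pairs

def forward(x: torch.Tensor, adapter_id: str) -> torch.Tensor:
    A, B = adapters[adapter_id]            # only the requested adapter is touched
    return x @ W.T + x @ A.T @ B.T         # base path + low-rank LoRA path

batch = torch.randn(4, d)
print(forward(batch, "adapter_a").shape)   # torch.Size([4, 64])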
-
ArXiv Preprint – Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities
In this episode we discuss Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities by AJ Piergiovanni, Isaac Noble, Dahun Kim, Michael S. Ryoo, Victor Gomes, Anelia Angelova. The paper presents Mirasol3B, a multimodal model that handles the disparate natures of video, audio, and text modalities through separate autoregressive components, dividing the process according…
-
ArXiv Preprint – LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
In this episode we discuss LCM-LoRA: A Universal Stable-Diffusion Acceleration Module by Simian Luo, Yiqin Tan, Suraj Patil, Daniel Gu, Patrick von Platen, Apolinário Passos, Longbo Huang, Jian Li, Hang Zhao. The paper discusses the advancements in Latent Consistency Models (LCMs), which have shown great efficiency in text-to-image generation by being distilled from…
-
ArXiv Preprint – Fine-tuning Language Models for Factuality
In this episode we discuss Fine-tuning Language Models for Factuality by Katherine Tian, Eric Mitchell, Huaxiu Yao, Christopher D. Manning, Chelsea Finn. The paper presents a method to improve the factual accuracy of large pre-trained language models (LLMs) without human fact-checking. By utilizing recent advancements in natural language processing (NLP), such as judging the factuality…
-
ArXiv Preprint – Language Models can be Logical Solvers
In this episode we discuss Language Models can be Logical Solvers by Jiazhan Feng, Ruochen Xu, Junheng Hao, Hiteshi Sharma, Yelong Shen, Dongyan Zhao, Weizhu Chen. The paper presents LOGIPT, a new language model designed to tackle complex logical reasoning by directly mimicking the reasoning process of logical solvers, which avoids errors caused by parsing…
-
ArXiv Preprint – Prompt Engineering a Prompt Engineer
In this episode we discuss Prompt Engineering a Prompt Engineer by Qinyuan Ye, Maxamed Axmed, Reid Pryzant, Fereshte Khani. The paper presents PE2, an advanced method for automatically engineering prompts for large language models (LLMs), enabling them to perform better at complex tasks. By incorporating elements like a step-by-step reasoning template and verbalized optimization concepts…
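As a purely hypothetical illustration of what a meta-prompt with a step-by-step reasoning template might look like (this is not PE2's actual template):

META_PROMPT = """You are optimizing a task prompt.
Current prompt:
{current_prompt}

Examples the current prompt got wrong:
{failed_examples}

First, reason step by step about why the current prompt leads to these errors.
Then propose an improved prompt. Output only the improved prompt at the end."""

def build_meta_prompt(current_prompt: str, failed_examples: list) -> str:
    """Fill the (hypothetical) template that asks an LLM to act as a prompt engineer."""
    return META_PROMPT.format(current_prompt=current_prompt,
                              failed_examples="\n".join(failed_examples))

print(build_meta_prompt("Answer the question.", ["Q: What is 2+2?  Model answer: 5"]))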
-
ArXiv Preprint – CogVLM: Visual Expert for Pretrained Language Models
In this episode we discuss CogVLM: Visual Expert for Pretrained Language Models by Weihan Wang, Qingsong Lv, Wenmeng Yu, Wenyi Hong, Ji Qi, Yan Wang, Junhui Ji, Zhuoyi Yang, Lei Zhao, Xixuan Song, Jiazheng Xu, Bin Xu, Juanzi Li, Yuxiao Dong, Ming Ding, Jie Tang. CogVLM is an open-source visual language foundation model that significantly…
-
ArXiv Preprint – De-Diffusion Makes Text a Strong Cross-Modal Interface
In this episode we discuss De-Diffusion Makes Text a Strong Cross-Modal Interface by Chen Wei, Chenxi Liu, Siyuan Qiao, Zhishuai Zhang, Alan Yuille, Jiahui Yu. The paper introduces De-Diffusion, a new approach that uses text to represent images. An autoencoder is used to transform an image into text, which can be reconstructed back into the…
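Schematically, the pipeline reads as below. This is our paraphrase in code with hypothetical function names, and the pixel-space loss is only a stand-in for the actual training objective used with the frozen diffusion decoder.

def encode_to_text(image):
    """Trainable image-to-text encoder (hypothetical stand-in)."""
    raise NotImplementedError

def diffusion_decode(text):
    """Frozen, pre-trained text-to-image diffusion decoder (hypothetical stand-in)."""
    raise NotImplementedError

def reconstruction_objective(image):
    text = encode_to_text(image)              # image -> descriptive text
    reconstruction = diffusion_decode(text)   # text -> image, via the frozen decoder
    # Placeholder pixel loss; only the encoder is trained against it in this sketch.
    return ((reconstruction - image) ** 2).mean()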
-
ArXiv Preprint – E3 TTS: Easy End-to-End Diffusion-based Text to Speech
In this episode we discuss E3 TTS: Easy End-to-End Diffusion-based Text to Speech by Yuan Gao, Nobuyuki Morioka, Yu Zhang, Nanxin Chen. The paper introduces Easy End-to-End Diffusion-based Text to Speech (E3 TTS), an innovative text-to-speech model that converts text to audio using a diffusion process without the need for intermediate representations or alignment information.…
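For intuition, here is a toy denoising loop in the spirit of that description. It is not the E3 TTS implementation: the denoiser below is a placeholder network and the update rule is deliberately simplified.

import numpy as np

def denoiser(noisy_waveform, text, t):
    """Hypothetical text-conditioned network that predicts the noise at step t."""
    return np.zeros_like(noisy_waveform)      # placeholder

def sample_waveform(text, num_samples=24000, steps=50):
    rng = np.random.default_rng(0)
    x = rng.standard_normal(num_samples)      # start from pure noise, fixed length
    for t in reversed(range(steps)):
        predicted_noise = denoiser(x, text, t)
        x = x - predicted_noise / steps       # crude update; real samplers differ
    return x                                  # waveform directly, no spectrogram in between

audio = sample_waveform("hello world")
print(audio.shape)                            # (24000,)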
-
ArXiv Preprint – Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges
In this episode we discuss Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges by Chenhang Cui, Yiyang Zhou, Xinyu Yang, Shirley Wu, Linjun Zhang, James Zou, Huaxiu Yao. The study introduces the Bingo benchmark to analyze hallucination behavior in GPT-4V(ision), a model processing both visual and textual data. Hallucinations, categorized as either bias…
-
ArXiv Preprint – Learning From Mistakes Makes LLM Better Reasoner
In this episode we discuss Learning From Mistakes Makes LLM Better Reasoner by Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou, Weizhu Chen. The paper introduces LEarning from MistAkes (LEMA), a method that improves large language models’ (LLMs) ability to solve math problems by fine-tuning them using GPT-4-generated mistake-correction data pairs. LEMA involves…
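A minimal sketch of what such a mistake-correction training example could look like (our own formatting, not the paper's exact template):

def make_correction_example(question, wrong_solution, corrected_solution):
    """Pair a model's wrong solution with a correction for fine-tuning (illustrative format)."""
    prompt = (f"Question: {question}\n"
              f"Incorrect solution: {wrong_solution}\n"
              f"Explain the mistake and give a corrected solution.")
    return {"prompt": prompt, "completion": corrected_solution}

example = make_correction_example(
    "What is 17 * 6?",
    "17 * 6 = 96",                                        # the model's mistake
    "17 * 6 = 102, since (10 + 7) * 6 = 60 + 42 = 102.",  # the correction
)
print(example["prompt"])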
-
ArXiv Preprint – The Generative AI Paradox: “What It Can Create, It May Not Understand”
In this episode we discuss The Generative AI Paradox: “What It Can Create, It May Not Understand” by Peter West, Ximing Lu, Nouha Dziri, Faeze Brahman, Linjie Li, Jena D. Hwang, Liwei Jiang, Jillian Fisher, Abhilasha Ravichander, Khyathi Chandu, Benjamin Newman, Pang Wei Koh, Allyson Ettinger, Yejin Choi. The paper examines the paradox in generative…
-
ArXiv Preprint – TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise
In this episode we discuss TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise by Nan He, Hanyu Lai, Chenyang Zhao, Zirui Cheng, Junting Pan, Ruoyu Qin, Ruofan Lu, Rui Lu, Yunchen Zhang, Gangming Zhao, Zhaohui Hou, Zhiyuan Huang, Shaoqing Lu, Ding Liang, Mingjie Zhan. The paper introduces TeacherLM, a series of…
-
ArXiv Preprint – MM-VID: Advancing Video Understanding with GPT-4V(ision)
In this episode we discuss MM-VID: Advancing Video Understanding with GPT-4V(ision) by Kevin Lin, Faisal Ahmed, Linjie Li, Chung-Ching Lin, Ehsan Azarnasab, Zhengyuan Yang, Jianfeng Wang, Lin Liang, Zicheng Liu, Yumao Lu, Ce Liu, Lijuan Wang. The paper introduces MM-VID, a system that incorporates GPT-4V with vision, audio, and speech experts to enhance video understanding.…
-
ArXiv Preprint – Zephyr: Direct Distillation of LM Alignment
In this episode we discuss Zephyr: Direct Distillation of LM Alignment by Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush, Thomas Wolf. The paper introduces ZEPHYR, a language model that focuses on aligning with user…
-
ArXiv Preprint – ControlLLM: Augment Language Models with Tools by Searching on Graphs
In this episode we discuss ControlLLM: Augment Language Models with Tools by Searching on Graphs by Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, Erfei Cui, Xizhou Zhu, Lewei Lu, Qifeng Chen, Yu Qiao, Jifeng Dai, Wenhai Wang. The paper introduces a framework called ControlLLM that enhances large language models (LLMs) by allowing them to use multi-modal…
-
ArXiv Preprint – Talk like a Graph: Encoding Graphs for Large Language Models
In this episode we discuss Talk like a Graph: Encoding Graphs for Large Language Models by Bahare Fatemi, Jonathan Halcrow, Bryan Perozzi. The paper discusses the encoding of graph-structured data for use in large language models (LLMs). It investigates different graph encoding methods, the nature of graph tasks, and the structure of the graph, and…
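To give a feel for what graph encoding means here, below is the same tiny graph serialized in two different ways before being placed in an LLM prompt; the encodings are our own illustrations, not the paper's exact functions.

edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]

def encode_as_edge_list(edges):
    """Plain edge-list encoding."""
    return "The graph has edges: " + ", ".join(f"({u}, {v})" for u, v in edges) + "."

def encode_as_friendships(edges):
    """A more narrative encoding of the same structure."""
    return " ".join(f"{u} and {v} are friends." for u, v in edges)

print(encode_as_edge_list(edges))
print(encode_as_friendships(edges))
# Either string can be prepended to a question such as "Is there a path from A to D?"
# when prompting an LLM; the paper studies how such choices affect accuracy.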
-
ArXiv Preprint – AgentTuning: Enabling Generalized Agent Abilities for LLMs
In this episode we discuss AgentTuning: Enabling Generalized Agent Abilities for LLMs by Aohan Zeng, Mingdao Liu, Rui Lu, Bowen Wang, Xiao Liu, Yuxiao Dong, Jie Tang. AgentTuning is a method that enhances the agent abilities of large language models (LLMs) while maintaining their general capabilities. It introduces AgentInstruct, a lightweight instruction-tuning dataset, and combines…