-
ArXiv Preprint – Simplifying Transformer Blocks
In this episode we discuss Simplifying Transformer Blocks by Bobby He, Thomas Hofmann. The paper investigates whether standard transformer blocks can be simplified, by removing components such as skip connections and normalization layers, without sacrificing training speed. Combining signal propagation theory with empirical study, the authors justify modifications that…
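As a rough illustration of the kind of block the paper experiments with, here is a minimal PyTorch-style sketch with the skip connections and normalization layers removed. This is our own sketch rather than the authors' code, and the module names and sizes are arbitrary.

import torch
import torch.nn as nn

class SimplifiedBlock(nn.Module):
    """Transformer block with no residual additions and no LayerNorm (illustrative only)."""
    def __init__(self, d_model=256, n_heads=4, d_ff=1024):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))

    def forward(self, x):
        # Attention output feeds the MLP directly: no skip connections, no normalization.
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        return self.mlp(attn_out)

x = torch.randn(2, 16, 256)        # (batch, sequence length, features)
print(SimplifiedBlock()(x).shape)  # torch.Size([2, 16, 256])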
-
ArXiv – Visual In-Context Prompting
In this episode, we discuss Visual In-Context Prompting by Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao. This paper introduces universal visual in-context prompting, a new framework for improving zero-shot learning capabilities in vision tasks, which works by…
-
ArXiv Preprint – GAIA: a benchmark for General AI Assistants
In this episode we discuss GAIA: a benchmark for General AI Assistants by Grégoire Mialon, Clémentine Fourrier, Craig Swift, Thomas Wolf, Yann LeCun, Thomas Scialom. The paper introduces GAIA, a benchmark designed to assess the capabilities of General AI Assistants in performing tasks that are simple for humans yet difficult for AIs, such as reasoning,…
-
ArXiv Preprint – DisCo: Disentangled Control for Realistic Human Dance Generation
In this episode we discuss DisCo: Disentangled Control for Realistic Human Dance Generation by Tan Wang, Linjie Li, Kevin Lin, Yuanhao Zhai, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang. The paper discusses the challenges of generative AI in creating realistic human-centric dance content for social media, highlighting the need for models to…
-
ArXiv Preprint – Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation
In this episode we discuss Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation by Eric Zelikman, Eliana Lorch, Lester Mackey, Adam Tauman Kalai. The paper shows how a seed “improver” program, a scaffold built around a language model, recursively improves itself by querying the language model multiple times and selecting the best candidate according to a utility function. The improved…
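To make the loop concrete, here is a minimal sketch of a seed improver; it is our illustration rather than the paper's code, and query_language_model is a hypothetical stand-in for an actual LM call.

def query_language_model(prompt: str) -> str:
    """Hypothetical stand-in for a real language model API call."""
    raise NotImplementedError("plug in an actual LM call here")

def improve(program: str, utility, n_candidates: int = 4) -> str:
    """Ask the LM for improved versions of `program`; keep the one with the highest utility."""
    best_program, best_score = program, utility(program)
    for _ in range(n_candidates):
        candidate = query_language_model(
            "Improve the following program so that it scores higher on its task. "
            "Return only code.\n\n" + program
        )
        score = utility(candidate)
        if score > best_score:
            best_program, best_score = candidate, score
    return best_program

# Recursive self-improvement enters when `improve` is itself the program being improved,
# e.g. improve(source_code_of_improve, utility).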
-
ArXiv Preprint – A General Theoretical Paradigm to Understand Learning from Human Preferences
In this episode we discuss A General Theoretical Paradigm to Understand Learning from Human Preferences by Mohammad Gheshlaghi Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos. The paper explores reinforcement learning from human feedback (RLHF) and proposes ΨPO, a new theoretical framework that directly utilizes pairwise preferences without relying on…
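For readers who want the shape of the objective, the ΨPO criterion, as we understand it from the paper (notation lightly paraphrased), trains a policy π to maximize a non-decreasing function Ψ of the probability that its samples are preferred over samples from a reference behavior μ, with a KL penalty toward a reference policy:

\[
\max_{\pi}\; \mathbb{E}_{x \sim \rho,\; y \sim \pi(\cdot \mid x),\; y' \sim \mu(\cdot \mid x)}
\big[\, \Psi\big(p^{*}(y \succ y' \mid x)\big) \,\big]
\;-\; \tau\, D_{\mathrm{KL}}\big(\pi \,\|\, \pi_{\mathrm{ref}}\big)
\]

If we recall the paper correctly, choosing Ψ(q) = log(q / (1 − q)) recovers the standard RLHF and DPO objectives, while taking Ψ to be the identity yields the IPO special case that the authors analyze in detail.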
-
ArXiv Preprint – ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
In this episode we discuss ShareGPT4V: Improving Large Multi-Modal Models with Better Captions by Lin Chen, Jinsong Li, Xiaoyi Dong, Pan Zhang, Conghui He, Jiaqi Wang, Feng Zhao, Dahua Lin. The ShareGPT4V dataset, with 1.2 million rich descriptive captions, has been created to enhance modality alignment in large multi-modal models (LMMs), offering greater diversity and…
-
ArXiv Preprint – S-LoRA: Serving Thousands of Concurrent LoRA Adapters
In this episode we discuss S-LoRA: Serving Thousands of Concurrent LoRA Adapters by Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica. The paper introduces S-LoRA, a system for efficiently serving a large number of Low-Rank Adaptation (LoRA) language…
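To see why serving many adapters over one base model is attractive, here is a toy sketch (ours, not S-LoRA's system): every request reuses the same frozen base weight, and only its own small low-rank pair is applied on top.

import torch

d, r = 64, 8                               # hidden size, LoRA rank
W = torch.randn(d, d)                      # shared, frozen base weight
adapters = {aid: (torch.randn(r, d) * 0.01, torch.randn(d, r) * 0.01)
            for aid in ["adapter_a", "adapter_b"]}   # per-user (A, B) pairs

def forward(x: torch.Tensor, adapter_id: str) -> torch.Tensor:
    A, B = adapters[adapter_id]            # only the requested adapter is touched
    return x @ W.T + x @ A.T @ B.T         # base path + low-rank LoRA path

batch = torch.randn(4, d)
print(forward(batch, "adapter_a").shape)   # torch.Size([4, 64])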
-
ArXiv Preprint – Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities
In this episode we discuss Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities by AJ Piergiovanni, Isaac Noble, Dahun Kim, Michael S. Ryoo, Victor Gomes, Anelia Angelova. The paper presents Mirasol3B, a multimodal model that handles the disparate natures of video, audio, and text modalities through separate autoregressive components, dividing the process according…
-
ArXiv Preprint – LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
In this episode we discuss LCM-LoRA: A Universal Stable-Diffusion Acceleration Module by Simian Luo, Yiqin Tan, Suraj Patil, Daniel Gu, Patrick von Platen, Apolinário Passos, Longbo Huang, Jian Li, Hang Zhao. The paper discusses the advancements in Latent Consistency Models (LCMs), which have shown great efficiency in text-to-image generation by being distilled from…
-
ArXiv Preprint – Fine-tuning Language Models for Factuality
In this episode we discuss Fine-tuning Language Models for Factuality by Katherine Tian, Eric Mitchell, Huaxiu Yao, Christopher D. Manning, Chelsea Finn. The paper presents a method to improve the factual accuracy of large pre-trained language models (LLMs) without human fact-checking. By utilizing recent advancements in natural language processing (NLP), such as judging the factuality…
-
ArXiv Preprint – Language Models can be Logical Solvers
In this episode we discuss Language Models can be Logical Solvers by Jiazhan Feng, Ruochen Xu, Junheng Hao, Hiteshi Sharma, Yelong Shen, Dongyan Zhao, Weizhu Chen. The paper presents LOGIPT, a new language model designed to tackle complex logical reasoning by directly mimicking the reasoning process of logical solvers, which avoids errors caused by parsing…
-
ArXiv Preprint – Prompt Engineering a Prompt Engineer
In this episode we discuss Prompt Engineering a Prompt Engineer by Qinyuan Ye, Maxamed Axmed, Reid Pryzant, Fereshte Khani. The paper presents PE2, an advanced method for automatically engineering prompts for large language models (LLMs), enabling them to perform better at complex tasks. By incorporating elements like a step-by-step reasoning template and verbalized optimization concepts…
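As a purely hypothetical illustration of what a meta-prompt with a step-by-step reasoning template might look like (this is not PE2's actual template):

META_PROMPT = """You are optimizing a task prompt.
Current prompt:
{current_prompt}

Examples the current prompt got wrong:
{failed_examples}

First, reason step by step about why the current prompt leads to these errors.
Then propose an improved prompt. Output only the improved prompt at the end."""

def build_meta_prompt(current_prompt: str, failed_examples: list) -> str:
    """Fill the (hypothetical) template that asks an LLM to act as a prompt engineer."""
    return META_PROMPT.format(current_prompt=current_prompt,
                              failed_examples="\n".join(failed_examples))

print(build_meta_prompt("Answer the question.", ["Q: What is 2+2?  Model answer: 5"]))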
-
ArXiv Preprint – CogVLM: Visual Expert for Pretrained Language Models
In this episode we discuss CogVLM: Visual Expert for Pretrained Language Models by Weihan Wang, Qingsong Lv, Wenmeng Yu, Wenyi Hong, Ji Qi, Yan Wang, Junhui Ji, Zhuoyi Yang, Lei Zhao, Xixuan Song, Jiazheng Xu, Bin Xu, Juanzi Li, Yuxiao Dong, Ming Ding, Jie Tang. CogVLM is an open-source visual language foundation model that significantly…
-
ArXiv Preprint – De-Diffusion Makes Text a Strong Cross-Modal Interface
In this episode we discuss De-Diffusion Makes Text a Strong Cross-Modal Interface by Chen Wei, Chenxi Liu, Siyuan Qiao, Zhishuai Zhang, Alan Yuille, Jiahui Yu. The paper introduces De-Diffusion, a new approach that uses text to represent images. An autoencoder is used to transform an image into text, which can be reconstructed back into the…
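Schematically, the pipeline reads as below. This is our paraphrase in code with hypothetical function names, and the pixel-space loss is only a stand-in for the actual training objective used with the frozen diffusion decoder.

def encode_to_text(image):
    """Trainable image-to-text encoder (hypothetical stand-in)."""
    raise NotImplementedError

def diffusion_decode(text):
    """Frozen, pre-trained text-to-image diffusion decoder (hypothetical stand-in)."""
    raise NotImplementedError

def reconstruction_objective(image):
    text = encode_to_text(image)              # image -> descriptive text
    reconstruction = diffusion_decode(text)   # text -> image, via the frozen decoder
    # Placeholder pixel loss; only the encoder is trained against it in this sketch.
    return ((reconstruction - image) ** 2).mean()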
-
ArXiv Preprint – E3 TTS: Easy End-to-End Diffusion-based Text to Speech
In this episode we discuss E3 TTS: Easy End-to-End Diffusion-based Text to Speech by Yuan Gao, Nobuyuki Morioka, Yu Zhang, Nanxin Chen. The paper introduces Easy End-to-End Diffusion-based Text to Speech (E3 TTS), an innovative text-to-speech model that converts text to audio using a diffusion process without the need for intermediate representations or alignment information.…
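For intuition, here is a toy denoising loop in the spirit of that description. It is not the E3 TTS implementation: the denoiser below is a placeholder network and the update rule is deliberately simplified.

import numpy as np

def denoiser(noisy_waveform, text, t):
    """Hypothetical text-conditioned network that predicts the noise at step t."""
    return np.zeros_like(noisy_waveform)      # placeholder

def sample_waveform(text, num_samples=24000, steps=50):
    rng = np.random.default_rng(0)
    x = rng.standard_normal(num_samples)      # start from pure noise, fixed length
    for t in reversed(range(steps)):
        predicted_noise = denoiser(x, text, t)
        x = x - predicted_noise / steps       # crude update; real samplers differ
    return x                                  # waveform directly, no spectrogram in between

audio = sample_waveform("hello world")
print(audio.shape)                            # (24000,)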
-
ArXiv Preprint – Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges
In this episode we discuss Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges by Chenhang Cui, Yiyang Zhou, Xinyu Yang, Shirley Wu, Linjun Zhang, James Zou, Huaxiu Yao. The study introduces the Bingo benchmark to analyze hallucination behavior in GPT-4V(ision), a model processing both visual and textual data. Hallucinations, categorized as either bias…
-
ArXiv Preprint – Learning From Mistakes Makes LLM Better Reasoner
In this episode we discuss Learning From Mistakes Makes LLM Better Reasoner by Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou, Weizhu Chen. The paper introduces LEarning from MistAkes (LEMA), a method that improves large language models’ (LLMs) ability to solve math problems by fine-tuning them using GPT-4-generated mistake-correction data pairs. LEMA involves…
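A minimal sketch of what such a mistake-correction training example could look like (our own formatting, not the paper's exact template):

def make_correction_example(question, wrong_solution, corrected_solution):
    """Pair a model's wrong solution with a correction for fine-tuning (illustrative format)."""
    prompt = (f"Question: {question}\n"
              f"Incorrect solution: {wrong_solution}\n"
              f"Explain the mistake and give a corrected solution.")
    return {"prompt": prompt, "completion": corrected_solution}

example = make_correction_example(
    "What is 17 * 6?",
    "17 * 6 = 96",                                        # the model's mistake
    "17 * 6 = 102, since (10 + 7) * 6 = 60 + 42 = 102.",  # the correction
)
print(example["prompt"])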
-
ArXiv Preprint – The Generative AI Paradox: “What It Can Create, It May Not Understand”
In this episode we discuss The Generative AI Paradox: “What It Can Create, It May Not Understand” by Peter West, Ximing Lu, Nouha Dziri, Faeze Brahman, Linjie Li, Jena D. Hwang, Liwei Jiang, Jillian Fisher, Abhilasha Ravichander, Khyathi Chandu, Benjamin Newman, Pang Wei Koh, Allyson Ettinger, Yejin Choi. The paper examines the paradox in generative…
-
ArXiv Preprint – TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise
In this episode we discuss TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise by Nan He, Hanyu Lai, Chenyang Zhao, Zirui Cheng, Junting Pan, Ruoyu Qin, Ruofan Lu, Rui Lu, Yunchen Zhang, Gangming Zhao, Zhaohui Hou, Zhiyuan Huang, Shaoqing Lu, Ding Liang, Mingjie Zhan. The paper introduces TeacherLM, a series of…
-
ArXiv Preprint – MM-VID: Advancing Video Understanding with GPT-4V(ision)
In this episode we discuss MM-VID: Advancing Video Understanding with GPT-4V(ision) by Kevin Lin, Faisal Ahmed, Linjie Li, Chung-Ching Lin, Ehsan Azarnasab, Zhengyuan Yang, Jianfeng Wang, Lin Liang, Zicheng Liu, Yumao Lu, Ce Liu, Lijuan Wang. The paper introduces MM-VID, a system that incorporates GPT-4V with vision, audio, and speech experts to enhance video understanding.…
-
ArXiv Preprint – Zephyr: Direct Distillation of LM Alignment
In this episode we discuss Zephyr: Direct Distillation of LM Alignment by Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush, Thomas Wolf. The paper introduces ZEPHYR, a language model that focuses on aligning with user…
-
ArXiv Preprint – ControlLLM: Augment Language Models with Tools by Searching on Graphs
In this episode we discuss ControlLLM: Augment Language Models with Tools by Searching on Graphs by Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, Erfei Cui, Xizhou Zhu, Lewei Lu, Qifeng Chen, Yu Qiao, Jifeng Dai, Wenhai Wang. The paper introduces a framework called ControlLLM that enhances large language models (LLMs) by allowing them to use multi-modal…
-
ArXiv Preprint – Talk like a Graph: Encoding Graphs for Large Language Models
In this episode we discuss Talk like a Graph: Encoding Graphs for Large Language Models by Bahare Fatemi, Jonathan Halcrow, Bryan Perozzi. The paper discusses the encoding of graph-structured data for use in large language models (LLMs). It investigates different graph encoding methods, the nature of graph tasks, and the structure of the graph, and…
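To give a feel for what graph encoding means here, below is the same tiny graph serialized in two different ways before being placed in an LLM prompt; the encodings are our own illustrations, not the paper's exact functions.

edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]

def encode_as_edge_list(edges):
    """Plain edge-list encoding."""
    return "The graph has edges: " + ", ".join(f"({u}, {v})" for u, v in edges) + "."

def encode_as_friendships(edges):
    """A more narrative encoding of the same structure."""
    return " ".join(f"{u} and {v} are friends." for u, v in edges)

print(encode_as_edge_list(edges))
print(encode_as_friendships(edges))
# Either string can be prepended to a question such as "Is there a path from A to D?"
# when prompting an LLM; the paper studies how such choices affect accuracy.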
-
ArXiv Preprint – AgentTuning: Enabling Generalized Agent Abilities for LLMs
In this episode we discuss AgentTuning: Enabling Generalized Agent Abilities for LLMs by Aohan Zeng, Mingdao Liu, Rui Lu, Bowen Wang, Xiao Liu, Yuxiao Dong, Jie Tang. AgentTuning is a method that enhances the agent abilities of large language models (LLMs) while maintaining their general capabilities. It introduces AgentInstruct, a lightweight instruction-tuning dataset, and combines…