Arxiv Paper – Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation
In this episode, we discuss Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation by Danny Halawi, Alexander Wei, Eric Wallace, Tony T. Wang, Nika Haghtalab, Jacob Steinhardt. The paper highlights security risks in black-box finetuning interfaces for large language models and introduces covert malicious finetuning, a method to compromise a model’s safety undetected. This involves…
-
Arxiv Paper – Video Instruction Tuning With Synthetic Data
In this episode, we discuss Video Instruction Tuning With Synthetic Data by Yuanhan Zhang, Jinming Wu, Wei Li, Bo Li, Zejun Ma, Ziwei Liu, Chunyuan Li. The paper introduces LLaVA-Video-178K, a high-quality synthetic dataset that addresses the challenge of developing large multimodal video models by improving video instruction following through detailed captioning and question answering. Using…
-
Arxiv Paper – Generative Agent Simulations of 1,000 People
In this episode, we discuss Generative Agent Simulations of 1,000 People by Joon Sung Park, Carolyn Q. Zou, Aaron Shaw, Benjamin Mako Hill, Carrie Cai, Meredith Ringel Morris, Robb Willer, Percy Liang, Michael S. Bernstein. The paper introduces a new agent architecture that simulates the behaviors and attitudes of over 1,000 individuals using large language…
-
NeurIPS 2024 – Moving Off-the-Grid: Scene-Grounded Video Representations
In this episode, we discuss Moving Off-the-Grid: Scene-Grounded Video Representations by Sjoerd van Steenkiste, Daniel Zoran, Yi Yang, Yulia Rubanova, Rishabh Kabra, Carl Doersch, Dilara Gokay, Joseph Heyward, Etienne Pot, Klaus Greff, Drew A. Hudson, Thomas Albert Keck, Joao Carreira, Alexey Dosovitskiy, Mehdi S. M. Sajjadi, Thomas Kipf. The paper introduces the Moving Off-the-Grid (MooG)…
-
Arxiv Paper – Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution
In this episode, we discuss Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution by Peng Wang, Shuai Bai, Sinan Tan, Shijie Wang, Zhihao Fan, Jinze Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Yang Fan, Kai Dang, Mengfei Du, Xuancheng Ren, Rui Men, Dayiheng Liu, Chang Zhou, Jingren Zhou, Junyang Lin.…
-
Arxiv Paper – FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
In this episode, we discuss FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality by Zhengyao Lv, Chenyang Si, Junhao Song, Zhenyu Yang, Yu Qiao, Ziwei Liu, Kwan-Yee K. Wong. The paper introduces FasterCache, a training-free approach that accelerates inference in video diffusion models by reusing features more efficiently while maintaining high video quality. The strategy…
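As a minimal sketch of the general idea of training-free feature reuse in a denoising loop: the snippet below recomputes attention features only on some timesteps and otherwise reuses the cached output, trading compute for a small approximation error between adjacent, highly similar timesteps. The `attention_block` argument, the `reuse_every` schedule, and the cache policy are illustrative stand-ins, not FasterCache's actual (more selective) strategy.

```python
import torch

# Cache shared across denoising steps of one video generation.
cache = {}

def cached_attention(x, t, attention_block, reuse_every=2):
    # Recompute only every `reuse_every` steps; otherwise reuse the
    # cached features from a nearby timestep.
    if t % reuse_every == 0 or "attn" not in cache:
        cache["attn"] = attention_block(x)
    return cache["attn"]
```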
-
Arxiv Paper – Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
In this episode, we discuss Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA by Sangmin Bae, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Seungyeon Kim, Tal Schuster. The paper presents methods for converting large language models into smaller, efficient “Recursive Transformers” that share parameters across layers by revisiting “layer tying”, which reduces model size and cost…
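To make layer tying with layer-wise LoRA concrete, here is a hedged PyTorch sketch: one block's weights are shared across recursion depths, while a small per-depth LoRA adapter lets each loop specialize. The module structure and sizes are illustrative stand-ins, not the paper's implementation.

```python
import torch
import torch.nn as nn

class RecursiveBlockWithLoRA(nn.Module):
    """One shared block applied repeatedly ("layer tying"), with a
    separate low-rank adapter per recursion depth ("layer-wise LoRA")."""

    def __init__(self, d_model=512, n_loops=4, rank=8):
        super().__init__()
        self.shared = nn.Linear(d_model, d_model)  # stand-in for a full block
        self.lora_A = nn.ParameterList(
            [nn.Parameter(torch.randn(d_model, rank) * 0.01) for _ in range(n_loops)])
        self.lora_B = nn.ParameterList(
            [nn.Parameter(torch.zeros(rank, d_model)) for _ in range(n_loops)])
        self.n_loops = n_loops

    def forward(self, x):
        for i in range(self.n_loops):
            # Shared weights plus a depth-specific low-rank correction.
            x = self.shared(x) + x @ self.lora_A[i] @ self.lora_B[i]
        return x
```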
-
Arxiv Paper – Long Context RAG Performance of Large Language Models
In this episode, we discuss Long Context RAG Performance of Large Language Models by Quinn Leng, Jacob Portes, Sam Havens, Matei Zaharia, Michael Carbin. The paper examines the effects of long context lengths on Retrieval Augmented Generation (RAG) in large language models, especially for models that support contexts of over 64k tokens, such as Anthropic Claude and GPT-4-turbo…
-
Arxiv Paper – NVLM: Open Frontier-Class Multimodal LLMs
In this episode, we discuss NVLM: Open Frontier-Class Multimodal LLMs by Wenliang Dai, Nayeon Lee, Boxin Wang, Zhuolin Yang, Zihan Liu, Jon Barker, Tuomas Rintamaki, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping. The paper introduces NVLM 1.0, a set of advanced multimodal large language models that achieve state-of-the-art performance on vision-language tasks and improve upon their…
-
Arxiv Paper – ColPali: Efficient Document Retrieval with Vision Language Models
In this episode, we discuss ColPali: Efficient Document Retrieval with Vision Language Models by Manuel Faysse, Hugues Sibille, Tony Wu, Bilel Omrani, Gautier Viaud, Céline Hudelot, Pierre Colombo. The paper discusses the limitations of modern document retrieval systems in effectively utilizing visual elements, prompting the introduction of the Visual Document Retrieval Benchmark (ViDoRe) to evaluate…
-
Arxiv Paper – Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
In this episode, we discuss Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models by Matt Deitke, Christopher Clark, Sangho Lee, Rohun Tripathi, Yue Yang, Jae Sung Park, Mohammadreza Salehi, Niklas Muennighoff, Kyle Lo, Luca Soldaini, Jiasen Lu, Taira Anderson, Erin Bransom, Kiana Ehsani, Huong Ngo, YenSung Chen, Ajay Patel, Mark Yatskar,…
-
Arxiv Paper – Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization
In this episode, we discuss Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization by Mohammad Samragh, Iman Mirzadeh, Keivan Alizadeh Vahid, Fartash Faghri, Minsik Cho, Moin Nabi, Devang Naik, Mehrdad Farajtabar. The paper presents HyperCloning, a technique that initializes large language models from smaller pre-trained ones in order to leverage their predictive power. This…
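As a hedged illustration of the general idea behind initializing a large model from a small one (the paper's exact expansion scheme may differ), the snippet below tiles and rescales a linear layer's weight so the doubled network initially reproduces a duplicated copy of the small model's output, i.e. the expansion is function-preserving at step zero.

```python
import torch

def clone_linear_weight(w_small: torch.Tensor) -> torch.Tensor:
    """Expand a (d_out, d_in) weight to (2*d_out, 2*d_in): tiling the
    weight 2x2 and halving it maps a duplicated input [x; x] to a
    duplicated output [Wx; Wx]."""
    top = torch.cat([w_small, w_small], dim=1)
    return torch.cat([top, top], dim=0) / 2.0

# Sanity check: the expanded layer reproduces the small layer's function.
w = torch.randn(4, 3)
x = torch.randn(3)
y_small = w @ x
y_big = clone_linear_weight(w) @ torch.cat([x, x])
assert torch.allclose(torch.cat([y_small, y_small]), y_big, atol=1e-6)
```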
-
Arxiv Paper – Unbounded: A Generative Infinite Game of Character Life Simulation
In this episode, we discuss Unbounded: A Generative Infinite Game of Character Life Simulation by Jialu Li, Yuanzhen Li, Neal Wadhwa, Yael Pritch, David E. Jacobs, Michael Rubinstein, Mohit Bansal, Nataniel Ruiz. The paper introduces UNBOUNDED, a generative infinite game that uses generative AI models to create an open-ended character life simulation inspired by sandbox…
-
Arxiv Paper – Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can’t Answer?
In this episode, we discuss Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can’t Answer? by Nishant Balepur, Feng Gu, Abhilasha Ravichander, Shi Feng, Jordan Boyd-Graber, Rachel Rudinger. The paper investigates the reverse question answering (RQA) task where a question is generated based on a given answer and…
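A hedged sketch of the round-trip consistency probe this task setup suggests: ask a model to write a question for a given answer, then check whether it can answer its own question. Here `llm` is a hypothetical text-in/text-out completion function, and the string-containment check is a crude stand-in for a real answer-matching metric.

```python
def rqa_consistency(llm, answer: str) -> bool:
    # Reverse QA: generate a question whose answer should be `answer`.
    question = llm(f"Write a question whose answer is: {answer}")
    # Forward QA: answer the generated question.
    roundtrip = llm(f"Answer concisely: {question}")
    # A mismatch suggests the model wrote a question it cannot answer.
    return answer.strip().lower() in roundtrip.strip().lower()
```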
-
Arxiv Paper – LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
In this episode, we discuss LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding by Xiaoqian Shen, Yunyang Xiong, Changsheng Zhao, Lemeng Wu, Jun Chen, Chenchen Zhu, Zechun Liu, Fanyi Xiao, Balakrishnan Varadarajan, Florian Bordes, Zhuang Liu, Hu Xu, Hyunwoo J. Kim, Bilge Soran, Raghuraman Krishnamoorthi, Mohamed Elhoseiny, Vikas Chandra. LongVU presents a spatiotemporal adaptive compression…
-
Arxiv Paper – When Does Perceptual Alignment Benefit Vision Representations?
In this episode, we discuss When Does Perceptual Alignment Benefit Vision Representations? by Shobhita Sundaram, Stephanie Fu, Lukas Muttenthaler, Netanel Y. Tamir, Lucy Chai, Simon Kornblith, Trevor Darrell, Phillip Isola. The paper examines how aligning vision model representations with human perception affects various computer vision tasks by finetuning models on human similarity judgments and testing…
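As a hedged sketch of what finetuning on human similarity judgments can look like, here is a generic two-alternative forced-choice triplet objective: the reference embedding is pushed closer to the image humans judged more similar. The loss form and margin are assumptions, not necessarily the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def two_afc_triplet_loss(emb_ref, emb_chosen, emb_other, margin=0.05):
    """emb_*: (batch, dim) embeddings of the reference image, the image
    humans chose as more similar, and the rejected alternative."""
    d_pos = 1 - F.cosine_similarity(emb_ref, emb_chosen)
    d_neg = 1 - F.cosine_similarity(emb_ref, emb_other)
    # Hinge: the human-preferred image should be closer by at least `margin`.
    return F.relu(d_pos - d_neg + margin).mean()
```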
-
Arxiv Paper – SceneCraft: Layout-Guided 3D Scene Generation
In this episode, we discuss SceneCraft: Layout-Guided 3D Scene Generation by Xiuyu Yang, Yunze Man, Jun-Kun Chen, Yu-Xiong Wang. SceneCraft is a method for generating detailed indoor 3D scenes based on user-provided textual descriptions and spatial preferences, using a rendering-based technique and a semantic and depth-conditioned diffusion model to enhance scene representation. It extends beyond…
-
arxiv preprint – A Tale of Tails: Model Collapse as a Change of Scaling Laws
In this episode, we discuss A Tale of Tails: Model Collapse as a Change of Scaling Laws by Elvis Dohmatob, Yunzhen Feng, Pu Yang, Francois Charton, Julia Kempe. The paper investigates the impact of incorporating synthetic data into training datasets on neural scaling laws and future model performance, questioning whether this integration will lead to…
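For orientation, empirical neural scaling laws are typically written in a power-law form like the one below; the paper asks how training on synthetic data, which can truncate the tails of the data distribution, changes laws of this shape. This is the standard background form only, not the paper's derived result.

```latex
% L: test loss, N: model or data size,
% N_c and \alpha_N: fitted constants, L_\infty: irreducible loss floor.
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N} + L_\infty
```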
-
arxiv preprint – Thinking LLMs: General Instruction Following with Thought Generation
In this episode, we discuss Thinking LLMs: General Instruction Following with Thought Generation by Tianhao Wu, Janice Lan, Weizhe Yuan, Jiantao Jiao, Jason Weston, Sainbayar Sukhbaatar. The paper introduces a novel approach to enhance Large Language Models by incorporating an iterative thought process before response generation, which helps in overcoming limitations of current models that…
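A hedged sketch of thought-before-response generation: the model drafts internal reasoning, then a response conditioned on it, and only the response is surfaced (and judged). The `llm` function and the prompt format are hypothetical stand-ins, not the paper's training recipe.

```python
THOUGHT_PROMPT = (
    "Respond to the instruction below. First write your reasoning after "
    "'Thought:', then your final answer after 'Response:'.\n\n{instruction}"
)

def think_then_respond(llm, instruction: str) -> str:
    output = llm(THOUGHT_PROMPT.format(instruction=instruction))
    # Discard the thought; only the user-visible response is returned.
    return output.split("Response:", 1)[-1].strip()
```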
-
arxiv preprint – Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
In this episode, we discuss Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think by Sihyun Yu, Sangkyung Kwak, Huiwon Jang, Jongheon Jeong, Jonathan Huang, Jinwoo Shin, Saining Xie. The paper presents a novel approach called REPresentation Alignment (REPA) to enhance the training efficiency and quality of generative diffusion models by integrating…
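A hedged sketch of what such a representation-alignment term can look like in training code: a projection of the diffusion model's hidden states is pulled toward features from a pretrained visual encoder. The encoder choice, projection head, and loss weight here are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def repa_style_loss(diffusion_loss, h_diff, h_teacher, proj, lam=0.5):
    """h_diff: hidden states from the diffusion transformer;
    h_teacher: features from a frozen pretrained encoder (e.g. a
    self-supervised ViT); proj: a small trainable projection MLP."""
    aligned = proj(h_diff)
    cos = F.cosine_similarity(aligned, h_teacher, dim=-1)
    # Add an alignment term to the usual denoising objective.
    return diffusion_loss + lam * (1 - cos).mean()
```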
-
arxiv preprint – F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
In this episode, we discuss F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching by Yushen Chen, Zhikang Niu, Ziyang Ma, Keqi Deng, Chunhui Wang, Jian Zhao, Kai Yu, Xie Chen. F5-TTS is a fully non-autoregressive text-to-speech system that utilizes flow matching with Diffusion Transformer (DiT) and addresses limitations of previous systems…
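Since F5-TTS trains with flow matching, here is the standard conditional flow-matching objective in PyTorch for orientation; the real model, inputs, and conditioning (text plus masked speech) are far more involved than these stand-ins.

```python
import torch
import torch.nn.functional as F

def flow_matching_loss(model, x1, cond):
    """x1: a batch of data samples; cond: conditioning inputs;
    model(x_t, t, cond) predicts the velocity field."""
    x0 = torch.randn_like(x1)                           # noise sample
    t = torch.rand(x1.shape[0], *[1] * (x1.dim() - 1))  # per-example time
    xt = (1 - t) * x0 + t * x1                          # straight-line path
    v_target = x1 - x0                                  # constant velocity
    return F.mse_loss(model(xt, t.flatten(), cond), v_target)
```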
-
arxiv preprint – One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation
In this episode, we discuss One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation by Fabian Paischer, Lukas Hauzenberger, Thomas Schmied, Benedikt Alkin, Marc Peter Deisenroth, Sepp Hochreiter. The paper introduces Explained Variance Adaptation (EVA), a method that enhances the fine-tuning of foundation models by using singular value decomposition for a more effective…
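A hedged sketch of the general idea of SVD-based adapter initialization: take the top right-singular vectors of a batch of layer inputs, i.e. the directions of highest explained variance, as the LoRA down-projection instead of a random init. Details of EVA's actual procedure may differ from this illustration.

```python
import torch

def eva_style_lora_init(activations: torch.Tensor, rank: int, d_out: int):
    """activations: (num_tokens, d_in) layer inputs collected from a few
    forward passes on real data. Returns LoRA factors A (rank, d_in) and
    B (d_out, rank) for the update W + B @ A."""
    _, _, vh = torch.linalg.svd(activations, full_matrices=False)
    lora_A = vh[:rank]                  # top-variance input directions
    lora_B = torch.zeros(d_out, rank)   # zero, so the adapter starts as a no-op
    return lora_A, lora_B
```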
-
arxiv preprint – Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models
In this episode, we discuss Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models by Seyedmorteza Sadat, Otmar Hilliges, Romann M. Weber. The paper addresses issues with high guidance scales in classifier-free guidance (CFG) for diffusion models, which can cause oversaturation and artifacts. The authors propose a modified update rule by reducing the…
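Standard CFG can be rewritten around the conditional prediction, which makes room for the kind of modified update rule the paper studies. Below is a hedged, unbatched sketch that splits the guidance term into components parallel and orthogonal to the conditional prediction and down-weights the parallel part, which is linked to oversaturation at high guidance scales; the decomposition axis and weighting are illustrative.

```python
import torch

def cfg(eps_uncond, eps_cond, w):
    # Standard classifier-free guidance, written around the conditional
    # prediction: cond + (w - 1) * (cond - uncond).
    return eps_cond + (w - 1) * (eps_cond - eps_uncond)

def cfg_reduced_parallel(eps_uncond, eps_cond, w, beta=0.0):
    # Split the guidance term relative to the conditional prediction,
    # then scale down the parallel component (beta < 1).
    diff = eps_cond - eps_uncond
    unit = eps_cond / eps_cond.norm()
    parallel = (diff * unit).sum() * unit
    orthogonal = diff - parallel
    return eps_cond + (w - 1) * (orthogonal + beta * parallel)
```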
-
arxiv preprint – Neptune: The Long Orbit to Benchmarking Long Video Understanding
In this episode, we discuss Neptune: The Long Orbit to Benchmarking Long Video Understanding by Arsha Nagrani, Mingda Zhang, Ramin Mehran, Rachel Hornung, Nitesh Bharadwaj Gundavarapu, Nilpa Jha, Austin Myers, Xingyi Zhou…