arxiv preprint – Model-tuning Via Prompts Makes NLP Models Adversarially Robust
In this episode we discuss Model-tuning Via Prompts Makes NLP Models Adversarially Robust by Mrigank Raman, Pratyush Maini, J. Zico Kolter, Zachary C. Lipton, Danish Pruthi. The discussed paper presents a new method called Model-tuning Via Prompts (MVP) that significantly improves the adversarial robustness of pretrained language models compared to standard multilayer perceptron fine-tuning (MLP-FT)…
-
arxiv preprint – Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
In this episode we discuss Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation by Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Metzger, Rodrigo Caye Daudt, Konrad Schindler. The paper presents Marigold, a new method for monocular depth estimation that utilizes the learned priors from generative diffusion models, specifically derived from Stable Diffusion. Marigold is affine-invariant…
-
arxiv preprint – Instruction-tuning Aligns LLMs to the Human Brain
In this episode we discuss Instruction-tuning Aligns LLMs to the Human Brain by Khai Loong Aw, Syrielle Montariol, Badr AlKhamissi, Martin Schrimpf, Antoine Bosselut. The paper examines whether instruction-tuning, a method for fine-tuning large language models (LLMs), makes their processing more human-like through two metrics: brain alignment and behavioral alignment. Results indicate instruction-tuning increases brain…
-
arxiv preprint – WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia
In this episode we discuss WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia by Sina J. Semnani, Violet Z. Yao, Heidi C. Zhang, Monica S. Lam. The paper introduces WikiChat, a chatbot that uses a few-shot Large Language Model (LLM) grounded in Wikipedia to provide accurate, engaging responses with…
-
arxiv preprint – DemoFusion: Democratising High-Resolution Image Generation With No $$$
In this episode we discuss DemoFusion: Democratising High-Resolution Image Generation With No $$$ by Ruoyi Du, Dongliang Chang, Timothy Hospedales, Yi-Zhe Song, Zhanyu Ma. The paper introduces DemoFusion, a framework designed to enhance open-source Latent Diffusion Models (LDMs) for higher-resolution image generation. It incorporates Progressive Upscaling, Skip Residual, and Dilated Sampling to improve image quality…
-
arxiv preprint – Recommender Systems with Generative Retrieval
In this episode we discuss Recommender Systems with Generative Retrieval by Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan H. Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Q. Tran, Jonah Samost, Maciej Kula, Ed H. Chi, Maheswaran Sathiamoorthy. The paper presents a novel generative approach for large-scale retrieval in recommender systems, where a…
-
arxiv preprint – Mamba: Linear-Time Sequence Modeling with Selective State Spaces
In this episode we discuss Mamba: Linear-Time Sequence Modeling with Selective State Spaces by Albert Gu, Tri Dao. The paper presents Mamba, an innovative neural network architecture that outperforms traditional Transformer models, especially in handling very long sequences. Mamba’s design incorporates selective structured state space models (SSMs) whose parameters depend on input tokens, enabling content-based…
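The "selective" idea the summary mentions can be illustrated with a minimal sketch: an SSM recurrence where the step size and the B/C projections vary per token, so the state update is content-dependent. This is a simplified single-channel, diagonal-A version for intuition only, not the paper's hardware-aware parallel scan.

```python
import numpy as np

def selective_scan(x, dt, A, B, C):
    """Minimal selective SSM recurrence (a sketch of the idea behind Mamba).
    Per step t:
        h_t = exp(dt_t * A) * h_{t-1} + dt_t * B_t * x_t   # discretised update
        y_t = C_t . h_t                                     # readout
    dt, B, C depend on the input token -- the "selective" part.
    Shapes: x (T,), dt (T,), A (N,), B (T, N), C (T, N)."""
    T, N = B.shape
    h = np.zeros(N)
    ys = np.empty(T)
    for t in range(T):
        # Diagonal state matrix: exp(dt*A) is an elementwise decay.
        h = np.exp(dt[t] * A) * h + dt[t] * B[t] * x[t]
        ys[t] = C[t] @ h
    return ys
```

Because dt, B, and C are functions of the current token, the model can choose per input whether to retain or overwrite state, which is what lets Mamba filter content over very long sequences while keeping a linear-time recurrence.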
-
arxiv preprint – Block-State Transformers
In this episode we discuss Block-State Transformers by Mahan Fathi, Jonathan Pilault, Orhan Firat, Christopher Pal, Pierre-Luc Bacon, Ross Goroshin. The paper introduces the Block-State Transformer (BST) architecture that merges state space models and block-wise attention to effectively capture long-range dependencies and improve performance on language modeling tasks. The BST incorporates an SSM sublayer for…
-
arxiv preprint – Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns
In this episode we discuss Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns by Brian DuSell, David Chiang. The paper introduces stack attention, a novel attention mechanism that incorporates the concept of stacks to help recognize hierarchical and nested syntactic structures, which traditional scaled dot-product attention fails to handle effectively. Two versions…
-
arxiv preprint – LooseControl: Lifting ControlNet for Generalized Depth Conditioning
In this episode we discuss LooseControl: Lifting ControlNet for Generalized Depth Conditioning by Shariq Farooq Bhat, Niloy J. Mitra, Peter Wonka. LooseControl is introduced as a novel method for depth-conditioned image generation that is less reliant on detailed depth maps, unlike the state-of-the-art ControlNet. It allows for content creation by specifying scene boundaries or 3D…
-
Announcement: AI Breakdown Youtube Channel
Welcome back to AI Breakdown! In this special announcement, your hosts Megan and Ray share exciting news – we’re expanding to YouTube! This new platform will add a visual dimension to our discussions, bringing AI papers to life with figures, tables, and results. While the podcast will continue as usual, the YouTube channel will offer…
-
arxiv preprint – OneLLM: One Framework to Align All Modalities with Language
In this episode we discuss OneLLM: One Framework to Align All Modalities with Language by Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue. The paper introduces OneLLM, a multimodal large language model that unifies the encoding of eight different modalities to language via a single…
-
arxiv preprint – The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
In this episode we discuss The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning by Bill Yuchen Lin, Abhilasha Ravichander, Ximing Lu, Nouha Dziri, Melanie Sclar, Khyathi Chandu, Chandra Bhagavatula, Yejin Choi. The paper discusses the effectiveness of traditional alignment tuning methods for large language models (LLMs) and introduces a new, simple tuning-free…
-
arxiv preprint – MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
In this episode we discuss MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI by Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, Renliang Sun, Ming Yin, Boyuan Zheng, Zhenzhu Yang, Yibo Liu, Wenhao…
-
arxiv preprint – MLP-Mixer: An all-MLP Architecture for Vision
In this episode we discuss MLP-Mixer: An all-MLP Architecture for Vision by Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy. The paper presents MLP-Mixer, an architecture that relies solely on multi-layer perceptrons (MLPs) for image classification tasks, demonstrating that…
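The core of the architecture described above is a pair of MLPs per layer: one mixing information across patches (token mixing) and one across features (channel mixing). Here is a minimal NumPy sketch of one Mixer layer; the shapes and hidden sizes are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalise over the last (channel) axis.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def mlp(x, w1, b1, w2, b2):
    # Two-layer MLP with a tanh-approximated GELU, applied along the last axis.
    h = x @ w1 + b1
    h = 0.5 * h * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (h + 0.044715 * h**3)))
    return h @ w2 + b2

def mixer_layer(x, tok_params, ch_params):
    """One Mixer layer on x of shape (patches, channels):
    a token-mixing MLP across the patch axis (via transpose),
    then a channel-mixing MLP across the feature axis,
    each with a pre-norm residual connection."""
    y = x + mlp(layer_norm(x).T, *tok_params).T   # token mixing
    return y + mlp(layer_norm(y), *ch_params)     # channel mixing
```

Stacking such layers on top of a patch-embedding projection gives the full model; the point of the paper is that this alternation of per-patch and per-channel MLPs, with no attention or convolutions, is already competitive on image classification.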
-
arxiv preprint – Training Chain-of-Thought via Latent-Variable Inference
In this episode we discuss Training Chain-of-Thought via Latent-Variable Inference by Du Phan, Matthew D. Hoffman, David Dohan, Sholto Douglas, Tuan Anh Le, Aaron Parisi, Pavel Sountsov, Charles Sutton, Sharad Vikram, Rif A. Saurous. The paper introduces a fine-tuning strategy for large language models that improves their problem-solving accuracy by focusing on maximizing the probability…
-
arxiv preprint – Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine
In this episode we discuss Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine by Harsha Nori, Yin Tat Lee, Sheng Zhang, Dean Carignan, Richard Edgar, Nicolo Fusi, Nicholas King, Jonathan Larson, Yuanzhi Li, Weishung Liu, Renqian Luo, Scott Mayer McKinney, Robert Osazuwa Ness, Hoifung Poon, Tao Qin, Naoto Usuyama, Chris White, Eric…
-
arxiv preprint – Nash Learning from Human Feedback
In this episode we discuss Nash Learning from Human Feedback by Remi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot from Google DeepMind. The paper introduces Nash Learning…
-
arxiv preprint – Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
In this episode we discuss Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation by Li Hu, Xin Gao, Peng Zhang, Ke Sun, Bang Zhang, Liefeng Bo. The paper presents a novel framework designed for character animation that synthesizes consistent and controllable videos from still images using diffusion models. It introduces a ReferenceNet that…
-
arxiv preprint – Knowledge is a Region in Weight Space for Fine-tuned Language Models
In this episode we discuss Knowledge is a Region in Weight Space for Fine-tuned Language Models by Almog Gueta, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen. The paper investigates the relationships between different neural network models when trained on diverse datasets, focusing on their weight space and loss landscape. The study reveals…
-
arxiv preprint – MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
In this episode we discuss MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training by Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel. The paper introduces MobileCLIP, a new efficient image-text model family optimized for mobile devices with a novel multi-modal reinforced training method that enhances accuracy without increasing on-device computational demands.…
-
arxiv preprint – Simplifying Transformer Blocks
In this episode we discuss Simplifying Transformer Blocks by Bobby He, Thomas Hofmann. The paper studies the possibility of simplifying standard transformer blocks without reducing training speed by experimenting with the removal of certain components such as skip connections and normalization layers. Using signal propagation theory along with empirical research, the authors justify modifications that…
-
arxiv preprint – Visual In-Context Prompting
In this episode we discuss Visual In-Context Prompting by Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao. This paper introduces a new framework for improving zero-shot learning capabilities in vision tasks called universal visual in-context prompting, which works by…
-
arxiv preprint – GAIA: a benchmark for General AI Assistants
In this episode we discuss GAIA: a benchmark for General AI Assistants by Grégoire Mialon, Clémentine Fourrier, Craig Swift, Thomas Wolf, Yann LeCun, Thomas Scialom. The paper introduces GAIA, a benchmark designed to assess the capabilities of General AI Assistants in performing tasks that are simple for humans yet difficult for AIs, such as reasoning,…