arxiv preprint – Model-tuning Via Prompts Makes NLP Models Adversarially Robust
In this episode we discuss Model-tuning Via Prompts Makes NLP Models Adversarially Robust by Mrigank Raman, Pratyush Maini, J. Zico Kolter, Zachary C. Lipton, Danish Pruthi. The discussed paper presents a new method called Model-tuning Via Prompts (MVP) that significantly improves the adversarial robustness of pretrained language models compared to standard multilayer perceptron fine-tuning (MLP-FT)…
-
arxiv preprint – Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
In this episode we discuss Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation by Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Metzger, Rodrigo Caye Daudt, Konrad Schindler. The paper presents Marigold, a new method for monocular depth estimation that utilizes the learned priors from generative diffusion models, specifically derived from Stable Diffusion. Marigold is affine-invariant…
-
arxiv preprint – Instruction-tuning Aligns LLMs to the Human Brain
In this episode we discuss Instruction-tuning Aligns LLMs to the Human Brain by Khai Loong Aw, Syrielle Montariol, Badr AlKhamissi, Martin Schrimpf, Antoine Bosselut. The paper examines whether instruction-tuning, a method for fine-tuning large language models (LLMs), makes their processing more human-like through two metrics: brain alignment and behavioral alignment. Results indicate instruction-tuning increases brain…
-
arxiv preprint – WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia
In this episode we discuss WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia by Sina J. Semnani, Violet Z. Yao, Heidi C. Zhang, Monica S. Lam. The paper introduces WikiChat, a chatbot that uses a few-shot Large Language Model (LLM) grounded in Wikipedia to provide accurate, engaging responses with…
-
arxiv preprint – DemoFusion: Democratising High-Resolution Image Generation With No $$$
In this episode we discuss DemoFusion: Democratising High-Resolution Image Generation With No $$$ by Ruoyi Du, Dongliang Chang, Timothy Hospedales, Yi-Zhe Song, Zhanyu Ma. The paper introduces DemoFusion, a framework designed to enhance open-source Latent Diffusion Models (LDMs) for higher-resolution image generation. It incorporates Progressive Upscaling, Skip Residual, and Dilated Sampling to improve image quality…
-
arxiv preprint – Recommender Systems with Generative Retrieval
In this episode we discuss Recommender Systems with Generative Retrieval by Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan H. Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Q. Tran, Jonah Samost, Maciej Kula, Ed H. Chi, Maheswaran Sathiamoorthy. The paper presents a novel generative approach for large-scale retrieval in recommender systems, where a…
-
arxiv preprint – Mamba: Linear-Time Sequence Modeling with Selective State Spaces
In this episode we discuss Mamba: Linear-Time Sequence Modeling with Selective State Spaces by Albert Gu, Tri Dao. The paper presents Mamba, an innovative neural network architecture that outperforms traditional Transformer models, especially in handling very long sequences. Mamba’s design incorporates selective structured state space models (SSMs) whose parameters depend on input tokens, enabling content-based…
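The "selective" idea the summary mentions can be illustrated with a minimal sketch: an SSM recurrence where the step size and the B/C projections vary per token, so the state update is content-dependent. This is a simplified single-channel, diagonal-A version for intuition only, not the paper's hardware-aware parallel scan.

```python
import numpy as np

def selective_scan(x, dt, A, B, C):
    """Minimal selective SSM recurrence (a sketch of the idea behind Mamba).
    Per step t:
        h_t = exp(dt_t * A) * h_{t-1} + dt_t * B_t * x_t   # discretised update
        y_t = C_t . h_t                                     # readout
    dt, B, C depend on the input token -- the "selective" part.
    Shapes: x (T,), dt (T,), A (N,), B (T, N), C (T, N)."""
    T, N = B.shape
    h = np.zeros(N)
    ys = np.empty(T)
    for t in range(T):
        # Diagonal state matrix: exp(dt*A) is an elementwise decay.
        h = np.exp(dt[t] * A) * h + dt[t] * B[t] * x[t]
        ys[t] = C[t] @ h
    return ys
```

Because dt, B, and C are functions of the current token, the model can choose per input whether to retain or overwrite state, which is what lets Mamba filter content over very long sequences while keeping a linear-time recurrence.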
-
arxiv preprint – Block-State Transformers
In this episode we discuss Block-State Transformers by Mahan Fathi, Jonathan Pilault, Orhan Firat, Christopher Pal, Pierre-Luc Bacon, Ross Goroshin. The paper introduces the Block-State Transformer (BST) architecture that merges state space models and block-wise attention to effectively capture long-range dependencies and improve performance on language modeling tasks. The BST incorporates an SSM sublayer for…
-
arxiv preprint – Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns
In this episode we discuss Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns by Brian DuSell, David Chiang. The paper introduces stack attention, a novel attention mechanism that incorporates the concept of stacks to help recognize hierarchical and nested syntactic structures, which traditional scaled dot-product attention fails to handle effectively. Two versions…
-
arxiv preprint – LooseControl: Lifting ControlNet for Generalized Depth Conditioning
In this episode we discuss LooseControl: Lifting ControlNet for Generalized Depth Conditioning by Shariq Farooq Bhat, Niloy J. Mitra, Peter Wonka. LooseControl is introduced as a novel method for depth-conditioned image generation that is less reliant on detailed depth maps, unlike the state-of-the-art ControlNet. It allows for content creation by specifying scene boundaries or 3D…
-
Announcement: AI Breakdown Youtube Channel
Welcome back to AI Breakdown! In this special announcement, your hosts Megan and Ray share exciting news – we’re expanding to YouTube! This new platform will add a visual dimension to our discussions, bringing AI papers to life with figures, tables, and results. While the podcast will continue as usual, the YouTube channel will offer…
-
arxiv preprint – OneLLM: One Framework to Align All Modalities with Language
In this episode we discuss OneLLM: One Framework to Align All Modalities with Language by Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue. The paper introduces OneLLM, a multimodal large language model that unifies the encoding of eight different modalities to language via a single…
-
arxiv preprint – The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
In this episode we discuss The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning by Bill Yuchen Lin, Abhilasha Ravichander, Ximing Lu, Nouha Dziri, Melanie Sclar, Khyathi Chandu, Chandra Bhagavatula, Yejin Choi. The paper discusses the effectiveness of traditional alignment tuning methods for large language models (LLMs) and introduces a new, simple tuning-free…
-
arxiv preprint – MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
In this episode we discuss MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI by Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, Renliang Sun, Ming Yin, Boyuan Zheng, Zhenzhu Yang, Yibo Liu, Wenhao…
-
arxiv preprint – MLP-Mixer: An all-MLP Architecture for Vision
In this episode we discuss MLP-Mixer: An all-MLP Architecture for Vision by Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy. The paper presents MLP-Mixer, an architecture that relies solely on multi-layer perceptrons (MLPs) for image classification tasks, demonstrating that…
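The core of the architecture described above is a pair of MLPs per layer: one mixing information across patches (token mixing) and one across features (channel mixing). Here is a minimal NumPy sketch of one Mixer layer; the shapes and hidden sizes are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalise over the last (channel) axis.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def mlp(x, w1, b1, w2, b2):
    # Two-layer MLP with a tanh-approximated GELU, applied along the last axis.
    h = x @ w1 + b1
    h = 0.5 * h * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (h + 0.044715 * h**3)))
    return h @ w2 + b2

def mixer_layer(x, tok_params, ch_params):
    """One Mixer layer on x of shape (patches, channels):
    a token-mixing MLP across the patch axis (via transpose),
    then a channel-mixing MLP across the feature axis,
    each with a pre-norm residual connection."""
    y = x + mlp(layer_norm(x).T, *tok_params).T   # token mixing
    return y + mlp(layer_norm(y), *ch_params)     # channel mixing
```

Stacking such layers on top of a patch-embedding projection gives the full model; the point of the paper is that this alternation of per-patch and per-channel MLPs, with no attention or convolutions, is already competitive on image classification.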
-
arxiv preprint – Training Chain-of-Thought via Latent-Variable Inference
In this episode we discuss Training Chain-of-Thought via Latent-Variable Inference by Du Phan, Matthew D. Hoffman, David Dohan, Sholto Douglas, Tuan Anh Le, Aaron Parisi, Pavel Sountsov, Charles Sutton, Sharad Vikram, Rif A. Saurous. The paper introduces a fine-tuning strategy for large language models that improves their problem-solving accuracy by focusing on maximizing the probability…
-
arxiv preprint – Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine
In this episode we discuss Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine by Harsha Nori, Yin Tat Lee, Sheng Zhang, Dean Carignan, Richard Edgar, Nicolo Fusi, Nicholas King, Jonathan Larson, Yuanzhi Li, Weishung Liu, Renqian Luo, Scott Mayer McKinney, Robert Osazuwa Ness, Hoifung Poon, Tao Qin, Naoto Usuyama, Chris White, Eric…
-
arxiv preprint – Nash Learning from Human Feedback
In this episode we discuss Nash Learning from Human Feedback by Remi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot from Google DeepMind. The paper introduces Nash Learning…
-
arxiv preprint – Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
In this episode we discuss Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation by Li Hu, Xin Gao, Peng Zhang, Ke Sun, Bang Zhang, Liefeng Bo. The paper presents a novel framework designed for character animation that synthesizes consistent and controllable videos from still images using diffusion models. It introduces a ReferenceNet that…
-
arxiv preprint – Knowledge is a Region in Weight Space for Fine-tuned Language Models
In this episode we discuss Knowledge is a Region in Weight Space for Fine-tuned Language Models by Almog Gueta, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen. The paper investigates the relationships between different neural network models when trained on diverse datasets, focusing on their weight space and loss landscape. The study reveals…
-
arxiv preprint – MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
In this episode we discuss MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training by Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel. The paper introduces MobileCLIP, a new efficient image-text model family optimized for mobile devices with a novel multi-modal reinforced training method that enhances accuracy without increasing on-device computational demands.…
-
arxiv preprint – Simplifying Transformer Blocks
In this episode we discuss Simplifying Transformer Blocks by Bobby He, Thomas Hofmann. The paper studies the possibility of simplifying standard transformer blocks without reducing training speed by experimenting with the removal of certain components such as skip connections and normalization layers. Using signal propagation theory along with empirical research, the authors justify modifications that…
-
arxiv preprint – Visual In-Context Prompting
In this episode we discuss Visual In-Context Prompting by Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao. This paper introduces a new framework for improving zero-shot learning capabilities in vision tasks called universal visual in-context prompting, which works by…
-
arxiv preprint – GAIA: a benchmark for General AI Assistants
In this episode we discuss GAIA: a benchmark for General AI Assistants by Grégoire Mialon, Clémentine Fourrier, Craig Swift, Thomas Wolf, Yann LeCun, Thomas Scialom. The paper introduces GAIA, a benchmark designed to assess the capabilities of General AI Assistants in performing tasks that are simple for humans yet difficult for AIs, such as reasoning,…