Category: Uncategorized

  • arxiv preprint – The Chosen One: Consistent Characters in Text-to-Image Diffusion Models

    In this episode, we discuss The Chosen One: Consistent Characters in Text-to-Image Diffusion Models by Omri Avrahami, Amir Hertz, Yael Vinker, Moab Arar, Shlomi Fruchter, Ohad Fried, Daniel Cohen-Or, Dani Lischinski. The paper introduces a novel method for creating character images that remain consistent across a variety of settings using text-to-image diffusion models. It details a technique…

  • arxiv preprint – Memory Mosaics

    In this episode, we discuss Memory Mosaics by Jianyu Zhang, Niklas Nolte, Ranajoy Sadhukhan, Beidi Chen, Léon Bottou. Memory Mosaics are networks of associative memories that work in concert to carry out prediction tasks. These networks offer a simpler and more transparent alternative to transformers, maintaining comparable abilities in compositional learning and learning in context.…
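
    As a rough illustration of the associative-memory building block mentioned above, the sketch below implements one memory unit as kernel-weighted recall over stored key/value pairs; the sizes, the softmax kernel, and the variable names are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def associative_recall(keys, values, query, beta=8.0):
    """Hedged sketch of one associative-memory unit: the prediction is a
    softmax-kernel-weighted average of stored values whose keys resemble
    the query."""
    weights = torch.softmax(beta * (keys @ query), dim=0)   # similarity kernel
    return weights @ values

# Store past (key, value) observations and query the memory.
keys = torch.nn.functional.normalize(torch.randn(100, 16), dim=-1)
values = torch.randn(100, 16)
query = torch.nn.functional.normalize(torch.randn(16), dim=-1)
print(associative_recall(keys, values, query).shape)        # torch.Size([16])
```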

  • arxiv preprint – Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?

    In this episode, we discuss Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? by Zorik Gekhman, Gal Yona, Roee Aharoni, Matan Eyal, Amir Feder, Roi Reichart, Jonathan Herzig. The paper explores the effects of integrating new factual information into large language models (LLMs) during the fine-tuning phase, particularly focusing on how this affects their ability…

  • arxiv preprint – LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

    In this episode, we discuss LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models by Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia. The abstract describes “LongLoRA,” a technique designed to efficiently expand the context size of large language models (LLMs) while maintaining computational feasibility. This methodology includes a novel…
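
    One ingredient commonly associated with this approach is attention computed locally within token groups, where half of the heads see a copy of the sequence shifted by half a group so information still crosses group boundaries; the sketch below is a hedged illustration of that shifting-and-grouping step only, with assumed shapes, not the paper's full method.

```python
import torch

def shifted_group_inputs(x, group_size=2048):
    """x: (batch, seq, heads, dim). Shift half the heads by half a group, then
    fold the sequence into groups so attention can run independently per group."""
    b, s, h, d = x.shape
    shifted = x.clone()
    shifted[:, :, h // 2:] = torch.roll(x[:, :, h // 2:], shifts=-group_size // 2, dims=1)
    return shifted.reshape(b * (s // group_size), group_size, h, d)

x = torch.randn(1, 8192, 8, 64)                 # assumed toy sizes
print(shifted_group_inputs(x).shape)            # torch.Size([4, 2048, 8, 64])
```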

  • arxiv preprint – WildChat: 1M ChatGPT Interaction Logs in the Wild

    In this episode, we discuss WildChat: 1M ChatGPT Interaction Logs in the Wild by Wenting Zhao, Xiang Ren, Jack Hessel, Claire Cardie, Yejin Choi, Yuntian Deng. WildChat is a dataset featuring 1 million user-ChatGPT conversations with over 2.5 million interaction turns, created by collecting chat transcripts and request headers from users who consented to participate.…

  • arxiv preprint – NOLA: Compressing LoRA using Linear Combination of Random Basis

    In this episode, we discuss NOLA: Compressing LoRA using Linear Combination of Random Basis by Soroush Abbasi Koohpayegani, KL Navaneet, Parsa Nooralinejad, Soheil Kolouri, Hamed Pirsiavash. The paper introduces a novel technique called NOLA for fine-tuning and deploying large language models (LLMs) like GPT-3 more efficiently by addressing the limitations of existing Low-Rank Adaptation (LoRA)…
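
    The compression idea named in the title can be pictured with a short sketch: instead of training LoRA's low-rank factors directly, the adapter is expressed as a linear combination of fixed, seed-regenerable random basis matrices, so only the small set of mixing coefficients needs to be trained and stored. The shapes, basis count, and names below are illustrative assumptions, not the paper's exact configuration.

```python
import torch

torch.manual_seed(0)

d_out, d_in, rank = 768, 768, 4      # assumed layer/LoRA sizes
k = 32                               # assumed number of random basis matrices

# Fixed random bases: regenerable from the seed, so they need not be stored.
basis_A = torch.randn(k, rank, d_in)
basis_B = torch.randn(k, d_out, rank)

# Only these mixing coefficients are trained and checkpointed.
alpha = torch.nn.Parameter(torch.randn(k) / k**0.5)   # A starts nonzero
beta = torch.nn.Parameter(torch.zeros(k))              # B starts at zero, so delta W = 0

def delta_weight():
    # LoRA-style update B @ A, with A and B expressed as
    # linear combinations of the fixed random bases.
    A = torch.einsum("k,krd->rd", alpha, basis_A)   # (rank, d_in)
    B = torch.einsum("k,kdr->dr", beta, basis_B)    # (d_out, rank)
    return B @ A                                    # (d_out, d_in)

print(delta_weight().shape)   # torch.Size([768, 768])
```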

  • arxiv preprint – StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

    In this episode, we discuss StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation by Yupeng Zhou, Daquan Zhou, Ming-Ming Cheng, Jiashi Feng, Qibin Hou. The paper introduces advanced techniques to improve diffusion-based generative models used for creating consistent and continuous sequences in image and video generation. It presents “Consistent Self-Attention” for maintaining content consistency…
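
    A minimal way to picture a "consistent self-attention" style mechanism is to let each image in a batch attend to its own tokens plus a random sample of tokens from the other images in the batch, nudging the whole batch toward one shared subject appearance. The sketch below is a hedged illustration with assumed shapes and a made-up sampling ratio, not the paper's exact operator.

```python
import torch
import torch.nn.functional as F

def consistent_self_attention(q, k, v, sample_ratio=0.3):
    """Each image attends to its own tokens plus a random subset of tokens
    borrowed from the other images in the batch. q, k, v: (batch, tokens, dim)."""
    batch, tokens, dim = k.shape
    n_shared = int(tokens * sample_ratio)
    outputs = []
    for i in range(batch):
        others_k = torch.cat([k[j] for j in range(batch) if j != i])
        others_v = torch.cat([v[j] for j in range(batch) if j != i])
        idx = torch.randperm(others_k.size(0))[:n_shared]
        k_i = torch.cat([k[i], others_k[idx]], dim=0)
        v_i = torch.cat([v[i], others_v[idx]], dim=0)
        attn = F.softmax(q[i] @ k_i.T / dim**0.5, dim=-1)
        outputs.append(attn @ v_i)
    return torch.stack(outputs)

q = k = v = torch.randn(4, 256, 64)               # 4 images in a "story" batch (assumed sizes)
print(consistent_self_attention(q, k, v).shape)   # torch.Size([4, 256, 64])
```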

  • arxiv preprint – Iterative Reasoning Preference Optimization

    In this episode, we discuss Iterative Reasoning Preference Optimization by Richard Yuanzhe Pang, Weizhe Yuan, Kyunghyun Cho, He He, Sainbayar Sukhbaatar, Jason Weston. This study explores a new iterative method aimed at improving how AI models generate step-by-step logical reasoning, or Chain-of-Thought (CoT), to reach correct answers by optimizing preferences between competing generated reasoning chains. The technique…
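
    One plausible shape for such a preference-optimization objective combines a DPO-style term that prefers the winning reasoning chain over the losing one with a likelihood term on the winning chain; the function below is a hedged sketch along those lines, with made-up argument names and toy numbers, and should not be read as the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def iterative_preference_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose,
                              beta=0.1, nll_weight=1.0, n_tokens_win=1):
    """Hedged sketch: a DPO-style preference term between a correct ("win") and
    an incorrect ("lose") reasoning chain, plus a likelihood term on the
    correct chain. Inputs are summed sequence log-probabilities (assumed)."""
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    preference = -F.logsigmoid(margin)
    nll = -logp_win / n_tokens_win                  # length-normalized NLL of the winner
    return preference + nll_weight * nll

loss = iterative_preference_loss(torch.tensor(-12.0), torch.tensor(-15.0),
                                 torch.tensor(-13.0), torch.tensor(-14.0),
                                 n_tokens_win=40)
print(loss.item())
```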

  • arxiv preprint – Better & Faster Large Language Models via Multi-token Prediction

    In this episode, we discuss Better & Faster Large Language Models via Multi-token Prediction by Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Rozière, David Lopez-Paz, Gabriel Synnaeve. The paper “Better & Faster Large Language Models via Multi-token Prediction” introduces a novel training methodology for large language models (LLMs) by predicting multiple future tokens simultaneously rather than…
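
    The title's idea of predicting multiple future tokens can be sketched as a shared trunk feeding several independent output heads, with head i trained against the token i+1 steps ahead; at inference the extra heads can be discarded or used to draft tokens. Sizes and names below are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiTokenHead(nn.Module):
    """Shared trunk representation -> n_future independent token predictions."""
    def __init__(self, d_model=512, vocab=32000, n_future=4):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, vocab) for _ in range(n_future)]
        )

    def forward(self, hidden):                      # hidden: (batch, seq, d_model)
        # Head i produces logits for the token i+1 steps ahead.
        return torch.stack([h(hidden) for h in self.heads], dim=0)

def loss(logits, targets, n_future=4):
    # targets: (batch, seq); shift by i+1 for head i and average the CE losses.
    total = 0.0
    for i in range(n_future):
        pred = logits[i][:, : -(i + 1)]             # drop positions with no target
        tgt = targets[:, i + 1:]
        total = total + nn.functional.cross_entropy(
            pred.reshape(-1, pred.size(-1)), tgt.reshape(-1)
        )
    return total / n_future

hidden = torch.randn(2, 16, 512)                    # toy batch
targets = torch.randint(0, 32000, (2, 16))
print(loss(MultiTokenHead()(hidden), targets))
```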

  • arxiv preprint – Make Your LLM Fully Utilize the Context

    In this episode, we discuss Make Your LLM Fully Utilize the Context by Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou. The paper “Make Your LLM Fully Utilize the Context” delves into solving the lost-in-the-middle challenge in large language models (LLMs), where these models fail to fully use the contextual information provided in…

  • arxiv preprint – Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation

    In this episode, we discuss Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation by Jaemin Cho, Yushi Hu, Roopal Garg, Peter Anderson, Ranjay Krishna, Jason Baldridge, Mohit Bansal, Jordi Pont-Tuset, Su Wang. The abstract discusses the evaluation of text-to-image models, focusing on ensuring consistency between text prompts and generated images through…

  • arxiv preprint – PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

    In this episode, we discuss PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning by Lin Xu, Yilin Zhao, Daquan Zhou, Zhijie Lin, See Kiong Ng, Jiashi Feng. The paper introduces PLLaVA, a model that expands image captioning techniques to video dense captioning, effectively describing various elements, including motions and attire.…

  • arxiv preprint – Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare

    In this episode, we discuss Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare by Emre Can Acikgoz, Osman Batur İnce, Rayene Bench, Arda Anıl Boz, İlker Kesen, Aykut Erdem, Erkut Erdem. The paper discusses the integration of Large Language Models (LLMs) in healthcare, focusing on their application in diagnostics, research, and patient…

  • arxiv preprint – SnapKV: LLM Knows What You are Looking for Before Generation

    In this episode, we discuss SnapKV: LLM Knows What You are Looking for Before Generation by Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, Deming Chen. The paper introduces SnapKV, a method designed to efficiently reduce the size of Key-Value (KV) caches in Large Language Models (LLMs)…
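
    The "knows what you are looking for" intuition can be sketched as follows: score the prompt's key/value entries by how much attention the last few prompt tokens pay to them, then keep only the top-scoring entries (plus that observation window) before generation starts. Window size, the number of kept entries, and the shapes below are assumptions for illustration, not SnapKV's exact algorithm.

```python
import torch

def compress_kv(q, k, v, window=16, keep=128):
    """q, k, v: (heads, seq, dim). Keep the KV entries most attended to by the
    last `window` query positions, plus the window itself."""
    heads, seq, dim = k.shape
    obs_q = q[:, -window:]                                    # observation window
    scores = torch.softmax(obs_q @ k.transpose(-1, -2) / dim**0.5, dim=-1)
    votes = scores[..., : seq - window].sum(dim=1)            # (heads, prefix_len)
    top = votes.topk(min(keep, seq - window), dim=-1).indices.sort(dim=-1).values
    idx = torch.cat([top, torch.arange(seq - window, seq).expand(heads, -1)], dim=-1)
    k_small = torch.gather(k, 1, idx.unsqueeze(-1).expand(-1, -1, dim))
    v_small = torch.gather(v, 1, idx.unsqueeze(-1).expand(-1, -1, dim))
    return k_small, v_small

q = k = v = torch.randn(8, 1024, 64)
k2, v2 = compress_kv(q, k, v)
print(k2.shape)   # torch.Size([8, 144, 64]): 128 kept entries + 16-token window
```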

  • arxiv preprint – CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models

    In this episode, we discuss CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models by Je-Yong Lee, Donghyun Lee, Genghan Zhang, Mo Tiwari, Azalia Mirhoseini. The paper presents “Contextually-Aware Thresholding for Sparsity (CATS),” a method intended to reduce the operational costs of Large Language Models (LLMs) by increasing activation sparsity while maintaining high performance levels.…
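
    The thresholding idea can be pictured in a gated MLP: compute the gate activations, pick a cutoff so that a target fraction of their magnitudes falls below it, and zero those entries so the corresponding projection work can be skipped. The sketch below picks the cutoff from the current batch purely for illustration; thresholds, shapes, and names are assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def thresholded_mlp(x, w_gate, w_up, w_down, target_sparsity=0.7):
    """Hedged sketch of thresholded-activation sparsity in a gated MLP.
    x: (tokens, d_model); weights are plain matrices with assumed shapes."""
    gate = F.silu(x @ w_gate)                        # (tokens, d_ff)
    # Choose a cutoff so roughly `target_sparsity` of entries fall below it,
    # then zero them out before the elementwise product and down-projection.
    cutoff = torch.quantile(gate.abs(), target_sparsity)
    mask = gate.abs() > cutoff
    hidden = (gate * mask) * (x @ w_up)              # sparse elementwise product
    return hidden @ w_down

x = torch.randn(4, 256)
w_gate, w_up, w_down = (torch.randn(256, 1024), torch.randn(256, 1024),
                        torch.randn(1024, 256))
print(thresholded_mlp(x, w_gate, w_up, w_down).shape)   # torch.Size([4, 256])
```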

  • arxiv preprint – SpaceByte: Towards Deleting Tokenization from Large Language Modeling

    In this episode, we discuss SpaceByte: Towards Deleting Tokenization from Large Language Modeling by Kevin Slagle. Tokenization in large language models, while improving performance, presents challenges such as bias, increased adversarial vulnerability, and complexity. The new byte-level decoder architecture, SpaceByte, significantly diminishes these issues by integrating larger transformer blocks selectively at critical bytes like spaces,…
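
    The routing rule hinted at in the summary ("larger transformer blocks selectively at critical bytes like spaces") can be sketched as a simple mask over byte positions; the boundary test below (non-alphanumeric bytes) is an assumed simplification for illustration, and the cheap per-byte layers and large global blocks it would feed are omitted.

```python
import torch

def spacelike_boundaries(byte_ids: torch.Tensor) -> torch.Tensor:
    """Mark byte positions (spaces and other non-alphanumeric bytes) where the
    expensive global blocks would run; all other positions would be handled
    only by cheap byte-level layers."""
    is_alnum = ((byte_ids >= ord("0")) & (byte_ids <= ord("9"))) | \
               ((byte_ids >= ord("A")) & (byte_ids <= ord("Z"))) | \
               ((byte_ids >= ord("a")) & (byte_ids <= ord("z")))
    return ~is_alnum                                 # True at "word boundary" bytes

text = b"Tokenization-free models read raw bytes."
ids = torch.tensor(list(text))
mask = spacelike_boundaries(ids)
print(mask.sum().item(), "of", len(ids), "positions get the large blocks")
```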

  • arxiv preprint – TextSquare: Scaling up Text-Centric Visual Instruction Tuning

    In this episode, we discuss TextSquare: Scaling up Text-Centric Visual Instruction Tuning by Jingqun Tang, Chunhui Lin, Zhen Zhao, Shu Wei, Binghong Wu, Qi Liu, Hao Feng, Yang Li, Siqi Wang, Lei Liao, Wei Shi, Yuliang Liu, Hao Liu, Yuan Xie, Xiang Bai, Can Huang. The paper describes advancements in text-centric visual question answering using…

  • arxiv preprint – EdgeFusion: On-Device Text-to-Image Generation

    In this episode, we discuss EdgeFusion: On-Device Text-to-Image Generation by Thibault Castells, Hyoung-Kyu Song, Tairen Piao, Shinkook Choi, Bo-Kyeong Kim, Hanyoung Yim, Changgwun Lee, Jae Gon Kim, Tae-Ho Kim. The paper “EdgeFusion: On-Device Text-to-Image Generation” explores the difficulties of using Stable Diffusion models in text-to-image generation due to their intensive computational needs. It proposes a…

  • arxiv preprint – VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time

    In this episode, we discuss VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time by Sicheng Xu, Guojun Chen, Yu-Xiao Guo, Jiaolong Yang, Chong Li, Zhenyu Zang, Yizhong Zhang, Xin Tong, Baining Guo. VASA is a new framework designed to create realistic talking faces from a static image and audio clip, featuring lip synchronization, facial…

  • arxiv preprint – Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

    In this episode, we discuss Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models by Yanwei Li, Yuechen Zhang, Chengyao Wang, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia. The paper introduces Mini-Gemini, a framework aimed at improving Vision Language Models (VLMs) by addressing the performance gap with advanced models like GPT-4. Mini-Gemini…

  • arxiv preprint – High-Dimension Human Value Representation in Large Language Models

    In this episode, we discuss High-Dimension Human Value Representation in Large Language Models by Samuel Cahyawijaya, Delong Chen, Yejin Bang, Leila Khalatbari, Bryan Wilie, Ziwei Ji, Etsuko Ishii, Pascale Fung. The paper addresses the importance of aligning large language models (LLMs) with human values, introducing a new method called UniVaR for representing human value distributions…

  • arxiv preprint – Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck

    In this episode, we discuss Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck by Nathan Godey, Éric de la Clergerie, Benoît Sagot. This paper investigates the phenomenon of performance saturation in small language models, attributing the issue to a mismatch between the model’s hidden layer size and the complexity…
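
    The "softmax bottleneck" being referenced is a rank constraint: the logit matrix is the product of hidden states and the output embedding, so its rank is capped by the hidden dimension even when the vocabulary is far larger. The toy NumPy check below illustrates that constraint with assumed sizes; it is not an experiment from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_contexts, vocab = 500, 4000          # toy sizes, assumed for illustration

# Logits are hidden_states @ output_embedding.T, so their matrix rank is
# capped by the hidden dimension d, no matter how large the vocabulary is.
for d in (32, 256):
    hidden = rng.standard_normal((n_contexts, d))
    w_out = rng.standard_normal((vocab, d))
    logits = hidden @ w_out.T                      # (n_contexts, vocab)
    print(d, np.linalg.matrix_rank(logits))        # rank equals d, far below vocab
```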

  • arxiv preprint – Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

    In this episode, we discuss Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention by Tsendsuren Munkhdalai, Manaal Faruqui, Siddharth Gopal. The paper presents a novel method for enabling Transformer-based Large Language Models to process extremely long inputs while keeping memory and computational requirements fixed. The technique introduced, called Infini-attention, blends a new form…
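
    The fixed-memory behaviour described above can be sketched with a segment-by-segment loop: each segment runs ordinary local attention, reads from a compressive memory summarizing earlier segments (a linear-attention-style matrix plus a normalizer), and then writes its own keys and values into that memory. The gate value, feature map, shapes, and the omission of the paper's refinements are all assumptions of this illustration.

```python
import torch
import torch.nn.functional as F

def infini_attention_segment(q, k, v, memory, norm, beta=0.5):
    """One segment of a hedged Infini-attention-style sketch.
    q, k, v: (seq, dim). `memory` (dim, dim) and `norm` (dim,) carry
    compressed context from previous segments; beta gates the mix."""
    # Local causal attention within the segment.
    scores = (q @ k.T) / q.size(-1) ** 0.5
    scores = scores.masked_fill(torch.triu(torch.ones_like(scores), 1).bool(), float("-inf"))
    local = F.softmax(scores, dim=-1) @ v
    # Retrieval from the compressive memory (linear-attention style).
    sq = F.elu(q) + 1.0
    retrieved = (sq @ memory) / (sq @ norm).clamp(min=1e-6).unsqueeze(-1)
    # Update the memory with this segment's keys/values for later segments.
    sk = F.elu(k) + 1.0
    memory = memory + sk.T @ v
    norm = norm + sk.sum(dim=0)
    out = beta * retrieved + (1.0 - beta) * local
    return out, memory, norm

dim = 64
memory, norm = torch.zeros(dim, dim), torch.zeros(dim)
for segment in torch.randn(4, 128, 3 * dim).split(1):      # 4 segments of length 128
    q, k, v = segment.squeeze(0).split(dim, dim=-1)
    out, memory, norm = infini_attention_segment(q, k, v, memory, norm)
print(out.shape)   # torch.Size([128, 64]); memory stays (64, 64) regardless of length
```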

  • arxiv preprint – Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

    In this episode, we discuss Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs by Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeffrey Nichols, Yinfei Yang, Zhe Gan. The paper presents Ferret-UI, a new multimodal large language model tailored for interpreting and interacting with mobile user interface screens, which overcomes common challenges through…