Beyond Language Modeling: An Exploration of Multimodal Pretraining
In this episode, we discuss Beyond Language Modeling: An Exploration of Multimodal Pretraining by Shengbang Tong, David Fan, John Nguyen, Ellis Brown, Gaoyue Zhou, Shengyi Qian, Boyang Zheng, Théophane Vallaeys, Junlin Han, Rob Fergus, Naila Murray, Marjan Ghazvininejad, Mike Lewis, Nicolas Ballas, Amir Bar, Michael Rabbat, Jakob Verbeek, Luke Zettlemoyer, Koustuv Sinha, Yann LeCun, Saining…
-
Mode Seeking meets Mean Seeking for Fast Long Video Generation
In this episode, we discuss Mode Seeking meets Mean Seeking for Fast Long Video Generation by Shengqu Cai, Weili Nie, Chao Liu, Julius Berner, Lvmin Zhang, Nanye Ma, Hansheng Chen, Maneesh Agrawala, Leonidas Guibas, Gordon Wetzstein, Arash Vahdat. The paper presents a novel training paradigm combining mode seeking and mean seeking to decouple local video…
-
Recursive Language Models
In this episode, we discuss Recursive Language Models by Alex L. Zhang, Tim Kraska, Omar Khattab. The paper introduces Recursive Language Models (RLMs), a novel inference approach that enables large language models to handle extremely long prompts by recursively processing prompt snippets. RLMs significantly extend effective context length by up to 100 times and outperform…
-
PaperBanana: Automating Academic Illustration for AI Scientists
In this episode, we discuss PaperBanana: Automating Academic Illustration for AI Scientists by Dawei Zhu, Rui Meng, Yale Song, Xiyu Wei, Sujian Li, Tomas Pfister, Jinsung Yoon. The paper presents PaperBanana, an autonomous framework that generates publication-ready academic illustrations using advanced vision-language and image generation models. It coordinates specialized agents to retrieve references, plan, render,…
-
World-Gymnast: Training Robots with Reinforcement Learning in a World Model
In this episode, we discuss World-Gymnast: Training Robots with Reinforcement Learning in a World Model by Ansh Kumar Sharma, Yixiang Sun, Ninghao Lu, Yunzhe Zhang, Jiarao Liu, Sherry Yang. The paper introduces World-Gymnast, a method that fine-tunes robot policies using reinforcement learning within a video-based world model conditioned on vision and language. This approach significantly…
-
Memory-V2V: Augmenting Video-to-Video Diffusion Models with Memory
In this episode, we discuss Memory-V2V: Augmenting Video-to-Video Diffusion Models with Memory by Dohun Lee, Chun-Hao Paul Huang, Xuelin Chen, Jong Chul Ye, Duygu Ceylan, Hyeonho Jeong. The paper addresses the challenge of maintaining cross-consistency in multi-turn video editing using video-to-video diffusion models. It introduces Memory-V2V, a framework that enhances existing models by incorporating an…
-
Self-Rewarding Language Models
In this episode, we discuss Self-Rewarding Language Models by Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho, Xian Li, Sainbayar Sukhbaatar, Jing Xu, Jason Weston. The paper proposes training language models to give themselves feedback using a self-rewarding approach, bypassing the limitations of human-labeled reward models. By iteratively fine-tuning Llama 2 70B with this method, the…
-
On the generalization of language models from in-context learning and finetuning: a controlled study
In this episode, we discuss On the generalization of language models from in-context learning and finetuning: a controlled study by Andrew K. Lampinen, Arslan Chaudhry, Stephanie C. Y. Chan, Cody Wild, Diane Wan, Alex Ku, Jörg Bornschein, Razvan Pascanu, Murray Shanahan, James L. McClelland. The paper compares the generalization and deductive reasoning abilities of large…
-
OpenThoughts: Data Recipes for Reasoning Models
In this episode, we discuss OpenThoughts: Data Recipes for Reasoning Models by Etash Guha, Ryan Marten, Sedrick Keh, Negin Raoof, Georgios Smyrnis, Hritik Bansal, Marianna Nezhurina, Jean Mercat, Trung Vu, Zayne Sprague, Ashima Suvarna, Benjamin Feuer, Liangyu Chen, Zaid Khan, Eric Frankel, Sachin Grover, Caroline Choi, Niklas Muennighoff, Shiye Su, Wanjia Zhao, John Yang, Shreyas…
-
Nested Learning: The Illusion of Deep Learning Architecture
In this episode, we discuss Nested Learning: The Illusion of Deep Learning Architecture by Ali Behrouz, Meisam Razaviyayn, Peilin Zhong, Vahab Mirrokni. The paper introduces Nested Learning (NL), a new paradigm framing machine learning as multiple nested…
-
ARC Is a Vision Problem!
In this episode, we discuss ARC Is a Vision Problem! by Keya Hu, Ali Cy, Linlu Qiu, Xiaoman Delores Ding, Runqian Wang, Yeyin Eva Zhu, Jacob Andreas, Kaiming He. The paper reframes the Abstraction and Reasoning Corpus (ARC) tasks as an image-to-image translation problem using a vision-centric approach. It introduces Vision ARC (VARC), a model…
-
Solving a Million-Step LLM Task with Zero Errors
In this episode, we discuss Solving a Million-Step LLM Task with Zero Errors by Elliot Meyerson, Giuseppe Paolo, Roberto Dailey, Hormoz Shahrzad, Olivier Francon, Conor F. Hayes, Xin Qiu, Babak Hodjat, Risto Miikkulainen. The paper presents MAKER, a system that achieves error-free execution of tasks requiring over one million steps by decomposing them into subtasks…
-
DataRater: Meta-Learned Dataset Curation
In this episode, we discuss DataRater: Meta-Learned Dataset Curation by Dan A. Calian, Gregory Farquhar, Iurii Kemaev, Luisa M. Zintgraf, Matteo Hessel, Jeremy Shar, Junhyuk Oh, András György, Tom Schaul, Jeffrey Dean, Hado van Hasselt, David Silver. The paper proposes DataRater, a meta-learning approach that estimates the value of individual training data points to improve…
-
Mathematical exploration and discovery at scale
In this episode, we discuss Mathematical exploration and discovery at scale by Bogdan Georgiev, Javier Gómez-Serrano, Terence Tao, Adam Zsolt Wagner. AlphaEvolve is an evolutionary coding agent that combines large language models with automated evaluation to iteratively generate and refine solutions for complex mathematical problems. It successfully rediscovered and improved known solutions across various math…
-
Kosmos: An AI Scientist for Autonomous Discovery
In this episode, we discuss Kosmos: An AI Scientist for Autonomous Discovery by Ludovico Mitchener, Angela Yiu, Benjamin Chang, Mathieu Bourdenx, Tyler Nadolski, Arvis Sulovari, Eric C. Landsness, Daniel L. Barabasi, Siddharth Narayanan, Nicky Evans, Shriya Reddy, Martha Foiani, Aizad Kamal, Leah P. Shriver, Fang Cao, Asmamaw T. Wassie, Jon M. Laurent, Edwin Melville-Green, Mayk…
-
World Simulation with Video Foundation Models for Physical AI
In this episode, we discuss World Simulation with Video Foundation Models for Physical AI by NVIDIA: Arslan Ali, Junjie Bai, Maciej Bala, Yogesh Balaji, Aaron Blakeman, Tiffany Cai, Jiaxin Cao, Tianshi Cao, Elizabeth Cha, Yu-Wei Chao, Prithvijit Chattopadhyay, Mike Chen, Yongxin Chen, Yu Chen, Shuai Cheng, Yin Cui, Jenna Diamond, Yifan Ding, Jiaojiao Fan,…
-
Towards Robust Mathematical Reasoning
In this episode, we discuss Towards Robust Mathematical Reasoning by Thang Luong, Dawsen Hwang, Hoang H. Nguyen, Golnaz Ghiasi, Yuri Chervonyi, Insuk Seo, Junsu Kim, Garrett Bingham, Jonathan Lee, Swaroop Mishra, Alex Zhai, Clara Huiyi Hu, Henryk Michalewski, Jimin Kim, Jeonghyun Ahn, Junhwi Bae, Xingyou Song, Trieu H. Trinh, Quoc V. Le, Junehyuk Jung. The…
-
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
In this episode, we discuss ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models by Mingjie Liu, Shizhe Diao, Ximing Lu, Jian Hu, Xin Dong, Yejin Choi, Jan Kautz, Yi Dong. This paper introduces ProRL, a new reinforcement learning training method that uncovers novel reasoning strategies beyond those found in base language models.…
-
Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models
In this episode, we discuss Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models by Peter Robicheaux, Matvei Popov, Anish Madan, Isaac Robinson, Joseph Nelson, Deva Ramanan, Neehar Peri. The paper introduces Roboflow100-VL, a large benchmark of 100 diverse multi-modal object detection datasets designed to test vision-language models (VLMs) on out-of-distribution concepts beyond typical pre-training…
-
ImpossibleBench: Measuring LLMs’ Propensity of Exploiting Test Cases
In this episode, we discuss ImpossibleBench: Measuring LLMs’ Propensity of Exploiting Test Cases by Ziqian Zhong, Aditi Raghunathan, Nicholas Carlini. The paper introduces ImpossibleBench, a benchmark framework designed to measure and analyze large language models’ tendency to cheat by exploiting test cases. It creates tasks with conflicting specifications and unit tests to quantify how often…
-
Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
In this episode, we discuss Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset by Qingyan Bai, Qiuyu Wang, Hao Ouyang, Yue Yu, Hanlin Wang, Wen Wang, Ka Leong Cheng, Shuailei Ma, Yanhong Zeng, Zichen Liu, Yinghao Xu, Yujun Shen, Qifeng Chen. The paper presents Ditto, a comprehensive framework that generates large-scale, high-quality training data…
-
Reasoning with Sampling: Your Base Model is Smarter Than You Think
In this episode, we discuss Reasoning with Sampling: Your Base Model is Smarter Than You Think by Aayush Karan, Yilun Du. The paper proposes a novel iterative sampling algorithm based on Markov chain Monte Carlo techniques that enhances reasoning abilities of base large language models at inference time without additional training. This method significantly improves…
-
DeepSeek-OCR: Contexts Optical Compression
In this episode, we discuss DeepSeek-OCR: Contexts Optical Compression by Haoran Wei, Yaofeng Sun, Yukun Li. DeepSeek-OCR introduces a method to compress long text contexts into compact 2D vision tokens using a DeepEncoder and a decoder model, achieving high OCR accuracy even at significant compression ratios. It outperforms existing…
-
The Markovian Thinker
In this episode, we discuss The Markovian Thinker by Milad Aghajohari, Kamran Chitsaz, Amirhossein Kazemnejad, Sarath Chandar, Alessandro Sordoni, Aaron Courville, Siva Reddy. The paper proposes Markovian Thinking, a reinforcement learning paradigm that limits reasoning context to a constant-size state, enabling linear compute with constant memory rather than quadratic overhead. They implement this approach in…