-
ICLR 2023 – Emergence of Maps in the Memories of Blind Navigation Agents
In this episode we discuss Emergence of Maps in the Memories of Blind Navigation Agents by Erik Wijmans, Manolis Savva, Irfan Essa, Stefan Lee, Ari S. Morcos, Dhruv Batra. The paper explores whether blind artificial intelligence agents can develop implicit maps of their environment. The study involves training these agents in navigation tasks and finding…
-
ICLR 2023 – On the duality between contrastive and non-contrastive self-supervised learning
In this episode we discuss On the duality between contrastive and non-contrastive self-supervised learning by Quentin Garrido, Yubei Chen, Adrien Bardes, Laurent Najman, Yann LeCun. This paper examines the duality between contrastive and non-contrastive self-supervised learning methods for image representations. It highlights the theoretical similarities between these approaches and introduces algebraically related contrastive and covariance-based…
-
arXiv Preprint – LISA: Reasoning Segmentation via Large Language Model
In this episode we discuss LISA: Reasoning Segmentation via Large Language Model by Xin Lai, Zhuotao Tian, Yukang Chen, Yanwei Li, Yuhui Yuan, Shu Liu, Jiaya Jia. The paper introduces a new segmentation task called reasoning segmentation and presents a benchmark dataset for evaluating models. The authors propose LISA, a model that combines language generation with…
-
arXiv Preprint – Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
In this episode we discuss Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP by Qihang Yu, Ju He, Xueqing Deng, Xiaohui Shen, Liang-Chieh Chen. The paper proposes a single-stage framework for open-vocabulary segmentation using a shared Frozen Convolutional CLIP (FC-CLIP) backbone. FC-CLIP simplifies the pipeline and achieves a better accuracy-cost trade-off compared to…
-
ICCV 2023 – PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization
In this episode we discuss PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization by Junhyeong Cho, Gilhyun Nam, Sungyeon Kim, Hunmin Yang, Suha Kwak. The paper introduces a method called PromptStyler for domain generalization in a joint vision-language space. It achieves this by synthesizing diverse styles using prompts without using any images. The method learns…
-
arXiv Preprint – Extrapolating Large Language Models to Non-English by Aligning Languages
In this episode we discuss Extrapolating Large Language Models to Non-English by Aligning Languages by Wenhao Zhu, Yunzhe Lv, Qingxiu Dong, Fei Yuan, Jingjing Xu, Shujian Huang, Lingpeng Kong, Jiajun Chen, Lei Li. The paper proposes a method to improve the language abilities of large language models (LLMs) in non-English languages. The authors achieve this by…
-
ICLR 2023 – Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching
In this episode we discuss Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching by Donggyun Kim, Jinwoo Kim, Seongwoong Cho, Chong Luo, Seunghoon Hong. The paper proposes Visual Token Matching (VTM), a few-shot learning solution for arbitrary dense prediction tasks in computer vision. VTM uses non-parametric matching on patch-level embedded tokens of…
-
ICML 2023 – Generalization on the Unseen, Logic Reasoning and Degree Curriculum
In this episode we discuss Generalization on the Unseen, Logic Reasoning and Degree Curriculum by Emmanuel Abbe, Samy Bengio, Aryo Lotfi, Kevin Rizk. This paper examines the performance of different network architectures trained by stochastic gradient descent (SGD) in the generalization on the unseen (GOTU) setting. The authors find that certain network models, such as…
-
arXiv Preprint – Gorilla: Large Language Model Connected with Massive APIs
In this episode we discuss Gorilla: Large Language Model Connected with Massive APIs by Shishir G. Patil, Tianjun Zhang, Xin Wang, Joseph E. Gonzalez. The paper introduces Gorilla, a fine-tuned Large Language Model (LLM) that excels in generating accurate API calls. By combining Gorilla with a document retriever, the model exhibits the ability to adapt…
-
ICML 2023 – Learning-Rate-Free Learning by D-Adaptation
In this episode we discuss Learning-Rate-Free Learning by D-Adaptation by Aaron Defazio, Konstantin Mishchenko. The paper introduces D-Adaptation, a learning-rate-free approach for setting the learning rate in convex minimization problems. It achieves the optimal rate of convergence without additional evaluations per step. The method is shown to match hand-tuned learning rates in diverse machine learning…
-
arXiv Preprint – Shepherd: A Critic for Language Model Generation
In this episode we discuss Shepherd: A Critic for Language Model Generation by Tianlu Wang, Ping Yu, Xiaoqing Ellen Tan, Sean O’Brien, Ramakanth Pasunuru, Jane Dwivedi-Yu, Olga Golovneva, Luke Zettlemoyer, Maryam Fazel-Zarandi, Asli Celikyilmaz. The paper introduces Shepherd, a language model trained to critique responses generated by large language models (LLMs) and offer suggestions for…
-
ICML 2023 – Adapting to game trees in zero-sum imperfect information games
In this episode we discuss Adapting to game trees in zero-sum imperfect information games by Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko. The paper presents two Follow the Regularized Leader (FTRL) algorithms for learning ε-optimal strategies in zero-sum imperfect information games (IIGs). Players have uncertainty about the true game state,…
-
ICLR 2023 – Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning
In this episode we discuss Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning by Zeyuan Allen-Zhu, Yuanzhi Li. The paper explores how ensembles of deep learning models can improve test accuracy and be distilled into a single model using knowledge distillation. It presents a theoretical framework that shows how ensembles can enhance test…
-
arXiv Preprint – Exploring Format Consistency for Instruction Tuning
In this episode we discuss Exploring Format Consistency for Instruction Tuning by Shihao Liang, Kunlun Zhu, Runchu Tian, Yujia Qin, Huadong Wang, Xin Cong, Zhiyuan Liu, Xiaojiang Liu, Maosong Sun. The paper investigates the impact of format inconsistency on the performance of instruction tuning and proposes a framework called “Unified Instruction Tuning” (UIT) that utilizes…
-
ICLR 2023 – DreamFusion: Text-to-3D using 2D Diffusion
In this episode we discuss DreamFusion: Text-to-3D using 2D Diffusion by Ben Poole, Ajay Jain, Jonathan T. Barron, Ben Mildenhall. The paper presents DreamFusion, a method that uses a pretrained 2D text-to-image diffusion model to synthesize 3D objects from text. By optimizing a randomly-initialized 3D model using gradient descent and a loss based on probability…
-
ICML 2023 – A Watermark for Large Language Models
In this episode we discuss A Watermark for Large Language Models by John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, Tom Goldstein. This paper presents a watermarking framework for large language models (LLMs), aiming to embed hidden signals in the generated text while remaining undetectable to humans. The approach involves selecting specific tokens…
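The token-selection idea behind such watermarks can be sketched as a "green list" bias: seed an RNG with the previous token, pseudorandomly split the vocabulary into green and red halves, and nudge the logits of green tokens upward before sampling; detection then counts green tokens and computes a z-score. A minimal toy sketch of this mechanism (the vocabulary size, `GAMMA`, and `DELTA` values here are illustrative choices, not the paper's configuration):

```python
import hashlib
import math
import random

# Toy sketch of green-list ("soft") watermarking for autoregressive text.
# VOCAB, GAMMA, and DELTA are illustrative, not the paper's exact setup.
VOCAB = list(range(1000))   # toy token ids
GAMMA = 0.5                 # fraction of the vocab placed on the green list
DELTA = 2.0                 # logit bias added to green tokens

def green_list(prev_token: int) -> set:
    """Pseudorandom vocab partition keyed on the previous token."""
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16)
    rng = random.Random(seed)
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    return set(shuffled[: int(GAMMA * len(VOCAB))])

def watermark_logits(logits: dict, prev_token: int) -> dict:
    """Add DELTA to the logits of green-listed tokens before sampling."""
    green = green_list(prev_token)
    return {tok: (v + DELTA if tok in green else v) for tok, v in logits.items()}

def detect(tokens: list) -> float:
    """z-score for the green-token count; a large z suggests a watermark."""
    hits = sum(1 for prev, cur in zip(tokens, tokens[1:])
               if cur in green_list(prev))
    n = len(tokens) - 1
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))
```

Because the partition is recomputed from the text itself, detection needs no access to the model, only the hashing scheme.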
-
arXiv Preprint – Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding
In this episode we discuss Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding by Xuefei Ning, Zinan Lin, Zixuan Zhou, Huazhong Yang, Yu Wang. The paper proposes a method called “Skeleton-of-Thought” (SoT) to decrease the generation latency of large language models (LLMs). The sequential decoding approach used in current LLMs contributes to high latency. SoT…
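The two-stage decoding described above can be sketched roughly as follows: first decode only a short outline sequentially, then expand every outline point concurrently. Here `llm` is a toy deterministic stand-in for any completion function, an assumption for illustration rather than an API from the paper:

```python
from concurrent.futures import ThreadPoolExecutor

def llm(prompt: str) -> str:
    # Toy deterministic stand-in for a real LLM call, so the sketch runs.
    if "outline" in prompt:
        return "1. point one\n2. point two\n3. point three"
    return f"Expansion of: {prompt.splitlines()[-1]}"

def skeleton_of_thought(question: str) -> str:
    # Stage 1: sequential, but short -- only the skeleton is decoded.
    skeleton = llm(f"Give a numbered outline answering: {question}\noutline:")
    points = [p for p in skeleton.splitlines() if p.strip()]
    # Stage 2: expand all points concurrently instead of token-by-token.
    with ThreadPoolExecutor() as pool:
        bodies = list(pool.map(lambda p: llm(f"{question}\n{p}"), points))
    return "\n".join(bodies)
```

The latency win comes from stage 2: the expansions no longer wait on one another, so wall-clock time is roughly the longest single expansion plus the short skeleton pass.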
-
ICLR 2023 – Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning
In this episode we discuss Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning by Anton Bakhtin, David J Wu, Adam Lerer, Jonathan Gray, Athul Paul Jacob, Gabriele Farina, Alexander H Miller, Noam Brown. The paper introduces a strategy called DiL-piKL that combines human imitation learning with reinforcement learning and planning to…
-
arXiv Preprint – RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment
In this episode we discuss RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment by Kevin Yang, Dan Klein, Asli Celikyilmaz, Nanyun Peng, Yuandong Tian. The paper presents a method called Reinforcement Learning from Contrast Distillation (RLCD) for aligning language models to natural language principles. RLCD trains a preference model using simulated preference pairs…
-
arXiv Preprint – DoG is SGD’s Best Friend: A Parameter-Free Dynamic Step Size Schedule
In this episode we discuss DoG is SGD’s Best Friend: A Parameter-Free Dynamic Step Size Schedule by Maor Ivgi, Oliver Hinder, Yair Carmon. The paper presents a dynamic SGD step size formula called DoG that does not require manual tuning. The authors analyze the DoG formula and demonstrate its strong convergence guarantees for stochastic convex…
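The distance-over-gradients idea can be illustrated on a toy problem: the step size at step t is the maximum distance traveled from the initial point divided by the root of the accumulated squared gradient norms. A minimal 1-D sketch, assuming a small seed value `r_eps` in place of any tuned learning rate (the toy quadratic and the constants are illustrative, not the paper's exact formulation):

```python
import math

def dog_sgd(grad, x0: float, r_eps: float = 1e-4, steps: int = 200) -> float:
    """Parameter-free SGD sketch: eta_t = rbar_t / sqrt(sum ||g_i||^2)."""
    x = x0
    rbar = r_eps        # running max distance from x0, seeded with r_eps
    g_sq_sum = 0.0      # running sum of squared gradient norms
    for _ in range(steps):
        g = grad(x)
        g_sq_sum += g * g
        rbar = max(rbar, abs(x - x0))
        eta = rbar / math.sqrt(g_sq_sum)
        x -= eta * g
    return x

# Minimize f(x) = x^2 (gradient 2x) starting far from the optimum at 0.
x_star = dog_sgd(lambda x: 2.0 * x, x0=5.0)
```

Early on `rbar` is tiny, so the steps are cautious; as the iterate moves, `rbar` grows and the effective step size self-tunes without any hand-set learning rate.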
-
CVPR 2023 – LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling
In this episode we discuss LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling by Linjie Li, Zhe Gan, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Ce Liu, Lijuan Wang. The paper presents LAVENDER, a unified video-language framework that uses Masked Language Modeling (MLM) as the common interface for pre-training and downstream tasks. LAVENDER simplifies the model…
-
NeurIPS 2022 – Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
In this episode we discuss Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners by Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, Ziyi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji. The paper presents VidIL, a few-shot video-language learner that combines image and language models to…
-
arXiv Preprint – MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
In this episode we discuss MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action by Zhengyuan Yang, Linjie Li, Jianfeng Wang, Kevin Lin, Ehsan Azarnasab, Faisal Ahmed, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang. The paper introduces MM-REACT, a system that combines ChatGPT with expert vision models to tackle challenging visual tasks. MM-REACT utilizes a…
-
arXiv Preprint – 3D-LLM: Injecting the 3D World into Large Language Models
In this episode we discuss 3D-LLM: Injecting the 3D World into Large Language Models by Yining Hong, Haoyu Zhen, Peihao Chen, Shuhong Zheng, Yilun Du, Zhenfang Chen, Chuang Gan. The paper proposes a new family of models, 3D-LLMs, that integrate the 3D physical world into language models, allowing them to perform various 3D-related tasks such as…
-
arXiv Preprint – Meta-Transformer: A Unified Framework for Multimodal Learning
In this episode we discuss Meta-Transformer: A Unified Framework for Multimodal Learning by Yiyuan Zhang, Kaixiong Gong, Kaipeng Zhang, Hongsheng Li, Yu Qiao, Wanli Ouyang, Xiangyu Yue. The paper presents a framework called Meta-Transformer for processing multiple modalities in multimodal learning. It uses a frozen encoder for feature extraction across different modalities, including natural language,…