arXiv preprint – Learning to (Learn at Test Time): RNNs with Expressive Hidden States

In this episode, we discuss Learning to (Learn at Test Time): RNNs with Expressive Hidden States by Yu Sun, Xinhao Li, Karan Dalal, Jiarui Xu, Arjun Vikram, Genghan Zhang, Yann Dubois, Xinlei Chen, Xiaolong Wang, Sanmi Koyejo, Tatsunori Hashimoto, Carlos Guestrin. The paper introduces Test-Time Training (TTT) layers, a new class of sequence modeling layers that aims to combine the linear-time efficiency of RNNs with the long-context performance of self-attention. In a TTT layer, the hidden state is itself a machine learning model, and that model is updated by self-supervised learning as the sequence is processed, even on test sequences. The two proposed instantiations, TTT-Linear and TTT-MLP, match or exceed the performance of a strong Transformer baseline and of a modern RNN, Mamba, and TTT-Linear is already more efficient than the Transformer in some long-context settings.
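To make "a model as the hidden state" concrete, here is a minimal sketch of the naive TTT-Linear idea, written under stated assumptions rather than taken from the paper's code. It assumes the hidden state is a square weight matrix W of a linear model, that each token takes exactly one plain gradient step on the reconstruction loss ||W k - v||^2, and that the projections theta_k, theta_v, theta_q (producing the "training view", target, and "test view" of each token) are given as fixed inputs; in the paper these projections are learned in an outer loop. The function names `matvec` and `ttt_linear_naive` are hypothetical, and the sketch omits the paper's efficiency refinements such as mini-batch updates.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a d x d matrix by a length-d vector (hypothetical helper)."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def ttt_linear_naive(
    xs: List[Vector],
    theta_k: Matrix,
    theta_v: Matrix,
    theta_q: Matrix,
    lr: float = 0.1,
) -> List[Vector]:
    """Hypothetical sketch of a naive TTT-Linear layer.

    The hidden state W is the weight matrix of a linear model. It starts at
    zero and takes one gradient step per token on ||W k - v||^2.
    """
    d = len(theta_k)
    w: Matrix = [[0.0] * d for _ in range(d)]  # hidden state = model weights
    outputs: List[Vector] = []
    for x in xs:
        k = matvec(theta_k, x)  # input view the inner model trains on
        v = matvec(theta_v, x)  # reconstruction target for that view
        q = matvec(theta_q, x)  # view used to produce the layer output
        err = [p - t for p, t in zip(matvec(w, k), v)]  # W k - v
        # Gradient of ||W k - v||^2 with respect to W is 2 (W k - v) k^T.
        for i in range(d):
            for j in range(d):
                w[i][j] -= lr * 2.0 * err[i] * k[j]
        outputs.append(matvec(w, q))  # output uses the just-updated state
    return outputs
```

With identity projections and the same token repeated, the inner model gradually learns to reconstruct it, so the outputs grow toward that token (here 0.2, 0.36, 0.488, ... in the first coordinate). The design point this illustrates is that the state update is a learning step rather than a fixed gating rule, which is what lets the hidden state be as expressive as the model placed inside it.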

