In this episode, we discuss "Language Model Can Listen While Speaking" by Ziyang Ma, Yakun Song, Chenpeng Du, Jian Cong, Zhuo Chen, Yuping Wang, Yuxuan Wang, and Xie Chen. The paper aims to improve real-time interaction in speech-based conversational AI by introducing the listening-while-speaking language model (LSLM), a system designed for full-duplex communication in which the model can listen and speak at the same time. LSLM uses a token-based decoder-only TTS for speech generation and a streaming SSL (self-supervised learning) encoder for real-time audio input. Experimental results show that LSLM is robust to noise and sensitive to diverse instructions, suggesting it could improve interactive speech dialogue systems in real-world applications.
arXiv preprint – Language Model Can Listen While Speaking