In this episode, we discuss WavLLM: Towards Robust and Adaptive Speech Large Language Model by Shujie Hu, Long Zhou, Shujie Liu, Sanyuan Chen, Hongkun Hao, Jing Pan, Xunying Liu, Jinyu Li, Sunit Sivasankaran, Linquan Liu, Furu Wei. The paper introduces WavLLM, a robust speech large language model with a unique dual-encoder system—one for semantic content and another for speaker identity—enhanced by a two-stage curriculum learning approach and a prompt-aware weight adapter for flexible task handling. WavLLM excels at a broad range of speech-processing tasks such as ASR, ST, SV, ER, and SQA, demonstrating state-of-the-art performance and strong generalization across various contexts. Resources related to the model, including codes and evaluation sets, have been made available for further research.
arxiv preprint – WavLLM: Towards Robust and Adaptive Speech Large Language Model
by
Tags: