CB-Whisper: Contextual Biasing Whisper Using Open-Vocabulary Keyword-Spotting

Yuang Li, Yinglu Li, Min Zhang, Chang Su, Jiawei Yu, Mengyao Piao, Xiaosong Qiao, Miaomiao Ma, Yanqing Zhao, Hao Yang


Abstract
End-to-end automatic speech recognition (ASR) systems often struggle to recognize rare named entities, such as personal names, organizations, and terminology that are not frequently encountered in the training data. This paper presents Contextual Biasing Whisper (CB-Whisper), a novel ASR system based on OpenAI’s Whisper model that can recognize user-defined named entities by performing open-vocabulary keyword-spotting (KWS) before the decoder. The KWS module leverages text-to-speech (TTS) techniques and a convolutional neural network (CNN) classifier to match the features between the entities and the utterances. To integrate the recognized entities into the Whisper decoder and avoid hallucinations, we carefully crafted multiple prompts with spoken form hints. Experiments show that the KWS module based on the Whisper encoder’s features can recognize unseen user-defined keywords effectively. More importantly, the proposed CB-Whisper substantially improves the mixed error rate (MER) and entity recall compared to the original Whisper model on three internal datasets and two publicly available datasets, Aishell and ACL, which together cover English-only, Chinese-only, and code-switching scenarios.
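The feature-matching idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that the keyword (synthesized with TTS) and the utterance have each been turned into a sequence of feature frames, builds a cosine-similarity matrix between the two sequences, and replaces the CNN classifier with a hypothetical hand-written score (the best mean similarity along a diagonal). The function names and toy vectors are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(keyword_frames, utterance_frames):
    """Rows index keyword frames, columns index utterance frames."""
    return [[cosine(k, u) for u in utterance_frames] for k in keyword_frames]


def best_diagonal_score(sim):
    """Hypothetical stand-in for the CNN classifier (an assumption).

    Slides the keyword over the utterance and returns the highest mean
    similarity along any full diagonal. Returns -1.0 if the keyword is
    longer than the utterance or the matrix is empty.
    """
    if not sim or not sim[0]:
        return -1.0
    n_key, n_utt = len(sim), len(sim[0])
    best = -1.0
    for offset in range(n_utt - n_key + 1):
        score = sum(sim[i][offset + i] for i in range(n_key)) / n_key
        best = max(best, score)
    return best


# Toy frames: the keyword pattern occurs in the utterance at frames 1-2.
keyword = [[1.0, 0.0], [0.0, 1.0]]
utterance = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
score = best_diagonal_score(similarity_matrix(keyword, utterance))
```

In a real system a learned classifier would replace the diagonal heuristic, and a fixed threshold on `score` would be too brittle; the sketch only shows what kind of pattern a match produces in the similarity matrix.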
Anthology ID:
2024.lrec-main.262
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
2941–2946
URL:
https://aclanthology.org/2024.lrec-main.262
Cite (ACL):
Yuang Li, Yinglu Li, Min Zhang, Chang Su, Jiawei Yu, Mengyao Piao, Xiaosong Qiao, Miaomiao Ma, Yanqing Zhao, and Hao Yang. 2024. CB-Whisper: Contextual Biasing Whisper Using Open-Vocabulary Keyword-Spotting. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 2941–2946, Torino, Italia. ELRA and ICCL.
Cite (Informal):
CB-Whisper: Contextual Biasing Whisper Using Open-Vocabulary Keyword-Spotting (Li et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.262.pdf