Advancing Test-Time Adaptation in Wild Acoustic Test Settings

Hongfu Liu, Hengguan Huang, Ye Wang


Abstract
Acoustic foundation models fine-tuned for Automatic Speech Recognition (ASR) suffer performance degradation when deployed in wild acoustic test settings. Stabilizing online Test-Time Adaptation (TTA) under these conditions remains an open and unexplored question. Existing wild vision TTA methods often fail to handle speech data effectively because they filter out high-entropy speech frames as unreliable, even when those frames contain crucial semantic content. Furthermore, unlike static vision data, speech signals exhibit short-term consistency, which calls for specialized adaptation strategies. In this work, we propose a novel wild acoustic TTA method tailored for ASR fine-tuned acoustic foundation models. Our method, Confidence-Enhanced Adaptation, performs frame-level adaptation using a confidence-aware weight scheme to avoid filtering out essential information in high-entropy frames. Additionally, we apply consistency regularization during test-time optimization to leverage the inherent short-term consistency of speech signals. Experiments on both synthetic and real-world datasets demonstrate that our approach outperforms existing baselines under various wild acoustic test settings, including Gaussian noise, environmental sounds, accent variations, and sung speech.
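The two ingredients named in the abstract, frame-level confidence-aware weighting and short-term consistency regularization, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the formulas, so the sigmoid-of-entropy weight, the squared-difference consistency penalty, the mixing coefficient `lam`, and all function names below are assumptions chosen only to show the general shape of such an objective. The one property taken from the abstract is that high-entropy frames are kept (weighted) rather than filtered out.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one frame's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of one frame's predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def confidence_weight(h):
    """ASSUMED weight: sigmoid of the frame entropy, always in [0.5, 1).

    No frame ever receives weight 0, so high-entropy frames are never
    discarded. The paper's actual weighting scheme may differ.
    """
    return 1.0 / (1.0 + math.exp(-h))


def consistency_penalty(frame_probs):
    """ASSUMED regularizer: mean squared difference between the
    distributions of adjacent frames (short-term consistency)."""
    if len(frame_probs) < 2:
        return 0.0
    total = 0.0
    for prev, cur in zip(frame_probs, frame_probs[1:]):
        total += sum((a - b) ** 2 for a, b in zip(prev, cur))
    return total / (len(frame_probs) - 1)


def tta_objective(frame_logits, lam=0.1):
    """Weighted frame-level entropy plus lam times the consistency penalty.

    frame_logits: non-empty list of per-frame logit lists for one utterance.
    lam: hypothetical trade-off coefficient (not taken from the paper).
    """
    if not frame_logits:
        raise ValueError("need at least one frame")
    frame_probs = [softmax(frame) for frame in frame_logits]
    entropies = [entropy(p) for p in frame_probs]
    weighted = sum(confidence_weight(h) * h for h in entropies) / len(entropies)
    return weighted + lam * consistency_penalty(frame_probs)
```

In an online TTA loop, a quantity of this kind would be minimized with respect to a small set of model parameters for each incoming utterance; which parameters are updated, and how, is left open here because the abstract does not specify it.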
Anthology ID: 2024.emnlp-main.405
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 7138–7155
URL: https://aclanthology.org/2024.emnlp-main.405/
DOI: 10.18653/v1/2024.emnlp-main.405
Cite (ACL): Hongfu Liu, Hengguan Huang, and Ye Wang. 2024. Advancing Test-Time Adaptation in Wild Acoustic Test Settings. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 7138–7155, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Advancing Test-Time Adaptation in Wild Acoustic Test Settings (Liu et al., EMNLP 2024)
PDF: https://aclanthology.org/2024.emnlp-main.405.pdf
Software: 2024.emnlp-main.405.software.zip