Improving Generalization of Norwegian ASR with Limited Linguistic Resources

Per Erik Solberg, Pablo Ortiz, Phoebe Parsons, Torbjørn Svendsen, Giampiero Salvi


Abstract
With large amounts of training data, it is possible to train ASR models that generalize well across speakers and domains. But how do you train robust models when only a limited amount of training data is available? In the experiments reported here, we fine-tuned a pre-trained wav2vec2 ASR model on two transcribed Norwegian speech datasets, one with parliamentary speech and one with radio recordings, as well as on combinations of the two datasets. We subsequently tested these models on different test sets with planned and unplanned speech and with speakers of various dialects. Our results show that models trained on combinations of the two datasets generalize better to new data than the single-dataset models, even when the total amount of training data is the same. Our lexical analysis sheds light on the types of mistakes made by the models and on the importance of consistent standardization when training combined models of this kind.
Anthology ID:
2023.nodalida-1.51
Volume:
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Month:
May
Year:
2023
Address:
Tórshavn, Faroe Islands
Editors:
Tanel Alumäe, Mark Fishel
Venue:
NoDaLiDa
Publisher:
University of Tartu Library
Pages:
508–517
URL:
https://aclanthology.org/2023.nodalida-1.51
Cite (ACL):
Per Erik Solberg, Pablo Ortiz, Phoebe Parsons, Torbjørn Svendsen, and Giampiero Salvi. 2023. Improving Generalization of Norwegian ASR with Limited Linguistic Resources. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pages 508–517, Tórshavn, Faroe Islands. University of Tartu Library.
Cite (Informal):
Improving Generalization of Norwegian ASR with Limited Linguistic Resources (Solberg et al., NoDaLiDa 2023)
PDF:
https://aclanthology.org/2023.nodalida-1.51.pdf