Rediscovering the Slavic Continuum in Representations Emerging from Neural Models of Spoken Language Identification

Badr M. Abdullah, Jacek Kudera, Tania Avgustinova, Bernd Möbius, Dietrich Klakow


Abstract
Deep neural networks have been employed for various spoken language recognition tasks, including tasks that are multilingual by definition, such as spoken language identification (LID). In this paper, we present a neural model for Slavic language identification in speech signals and analyze its emergent representations to investigate whether they reflect objective measures of language relatedness or non-linguists’ perception of language similarity. While our analysis shows that the language representation space indeed captures language relatedness to a great extent, we find perceptual confusability to be the best predictor of similarity in the language representation space.
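The abstract does not spell out how representation similarity is compared with a predictor such as perceptual confusability. One common way to run this kind of comparison is representational similarity analysis. The Python sketch below is an illustration under stated assumptions and is not the authors' procedure. It assumes each language is summarized by the mean (centroid) of its utterance embeddings, that pairwise language similarity is the cosine between centroids, and that agreement with a predictor is measured by Spearman rank correlation over all language pairs. The function names, the two-dimensional vectors, and the confusability scores are all hypothetical toy values.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def centroid(vectors):
    """Mean vector of a list of utterance embeddings (assumed summary of a language)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def rank(values):
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(rank(x), rank(y))


def representation_vs_predictor(embeddings, predictor):
    """Hypothetical helper: correlate centroid cosine similarities with a pairwise predictor.

    embeddings: dict mapping language -> list of utterance embedding vectors
    predictor:  dict mapping frozenset({lang1, lang2}) -> similarity score
    """
    cents = {lang: centroid(vs) for lang, vs in embeddings.items()}
    pairs = list(combinations(sorted(cents), 2))
    rep_sim = [cosine(cents[a], cents[b]) for a, b in pairs]
    pred = [predictor[frozenset((a, b))] for a, b in pairs]
    return spearman(rep_sim, pred)


# Toy, made-up data: three placeholder "languages" with 2-D embeddings.
toy_embeddings = {
    "A": [[1.0, 0.0], [1.0, 0.2]],
    "B": [[0.9, 0.3], [1.0, 0.3]],
    "C": [[0.0, 1.0], [0.2, 1.0]],
}
toy_confusability = {
    frozenset(("A", "B")): 0.6,
    frozenset(("A", "C")): 0.1,
    frozenset(("B", "C")): 0.2,
}
print(representation_vs_predictor(toy_embeddings, toy_confusability))
```

Rank correlation is used here because it only asks whether more confusable language pairs also lie closer together in the representation space, without assuming a linear relation between the two scales. With real data, one would expect more than three languages, and a significance test suited to distance matrices (for example, a permutation or Mantel-style test) would normally accompany the coefficient.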
Anthology ID:
2020.vardial-1.12
Volume:
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Yves Scherrer
Venue:
VarDial
Publisher:
International Committee on Computational Linguistics (ICCL)
Pages:
128–139
URL:
https://aclanthology.org/2020.vardial-1.12
Cite (ACL):
Badr M. Abdullah, Jacek Kudera, Tania Avgustinova, Bernd Möbius, and Dietrich Klakow. 2020. Rediscovering the Slavic Continuum in Representations Emerging from Neural Models of Spoken Language Identification. In Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, pages 128–139, Barcelona, Spain (Online). International Committee on Computational Linguistics (ICCL).
Cite (Informal):
Rediscovering the Slavic Continuum in Representations Emerging from Neural Models of Spoken Language Identification (Abdullah et al., VarDial 2020)
PDF:
https://aclanthology.org/2020.vardial-1.12.pdf