Modular Adaptation of Multilingual Encoders to Written Swiss German Dialect

Jannis Vamvas, Noëmi Aepli, Rico Sennrich


Abstract
Creating neural text encoders for written Swiss German is challenging due to a dearth of training data combined with dialectal variation. In this paper, we build on several existing multilingual encoders and adapt them to Swiss German using continued pre-training. Evaluation on three diverse downstream tasks shows that simply adding a Swiss German adapter to a modular encoder achieves 97.5% of fully monolithic adaptation performance. We further find that for the task of retrieving Swiss German sentences given Standard German queries, adapting a character-level model is more effective than the other adaptation strategies. We release our code and the models trained for our experiments.
Anthology ID:
2024.moomin-1.3
Volume:
Proceedings of the 1st Workshop on Modular and Open Multilingual NLP (MOOMIN 2024)
Month:
March
Year:
2024
Address:
St Julians, Malta
Editors:
Raúl Vázquez, Timothee Mickus, Jörg Tiedemann, Ivan Vulić, Ahmet Üstün
Venues:
MOOMIN | WS
Publisher:
Association for Computational Linguistics
Pages:
16–23
URL:
https://aclanthology.org/2024.moomin-1.3
Cite (ACL):
Jannis Vamvas, Noëmi Aepli, and Rico Sennrich. 2024. Modular Adaptation of Multilingual Encoders to Written Swiss German Dialect. In Proceedings of the 1st Workshop on Modular and Open Multilingual NLP (MOOMIN 2024), pages 16–23, St Julians, Malta. Association for Computational Linguistics.
Cite (Informal):
Modular Adaptation of Multilingual Encoders to Written Swiss German Dialect (Vamvas et al., MOOMIN-WS 2024)
PDF:
https://aclanthology.org/2024.moomin-1.3.pdf