Multiple Sources are Better Than One: Incorporating External Knowledge in Low-Resource Glossing

Changbing Yang, Garrett Nicolai, Miikka Silfverberg


Abstract
In this paper, we address the data scarcity problem in automatic data-driven glossing for low-resource languages by coordinating multiple sources of linguistic expertise. We enhance models by incorporating both token-level and sentence-level translations, utilizing the extensive linguistic capabilities of modern LLMs, and making use of available dictionary resources. Our enhancements lead to an average absolute improvement of 5%-points in word-level accuracy over the previous state of the art on a typologically diverse dataset spanning six low-resource languages. The improvements are particularly noticeable for the lowest-resourced language, Gitksan, where we achieve a 10%-point improvement. Furthermore, in a simulated ultra-low-resource setting for the same six languages, training on fewer than 100 glossed sentences, we establish an average 10%-point improvement in word-level accuracy over the previous state-of-the-art system.
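To make the setup concrete, the minimal Python sketch below shows one plausible way to combine the three knowledge sources named in the abstract (token-level dictionary glosses, a sentence-level free translation, and the source tokens) into a single input for a glossing model. This is an illustrative assumption, not the authors' implementation: the function build_augmented_input, the SRC/DICT/TRANS field tags, and the toy tokens are all hypothetical.

    def build_augmented_input(tokens, sentence_translation, dictionary):
        """Build one augmented model input from a tokenized source
        sentence, its free translation, and a gloss dictionary."""
        # Token-level knowledge: look up each surface token in the
        # dictionary, falling back to an unknown marker.
        token_glosses = [dictionary.get(t.lower(), "<unk>") for t in tokens]
        # Sentence-level knowledge: append the free translation.
        return ("SRC: " + " ".join(tokens)
                + " | DICT: " + " ".join(token_glosses)
                + " | TRANS: " + sentence_translation)

    # Toy, made-up example (not real language data):
    tokens = ["moka", "tepi-ra", "suna"]
    dictionary = {"moka": "dog", "suna": "run"}
    print(build_augmented_input(tokens, "The dog is running.", dictionary))
    # SRC: moka tepi-ra suna | DICT: dog <unk> run | TRANS: The dog is running.

Flattening heterogeneous resources into one input string in this way is a common recipe for feeding external knowledge to a sequence-to-sequence glosser; the paper itself should be consulted for the exact input representation used.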
Anthology ID: 2024.emnlp-main.261
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 4537–4552
URL: https://aclanthology.org/2024.emnlp-main.261/
DOI: 10.18653/v1/2024.emnlp-main.261
Cite (ACL): Changbing Yang, Garrett Nicolai, and Miikka Silfverberg. 2024. Multiple Sources are Better Than One: Incorporating External Knowledge in Low-Resource Glossing. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 4537–4552, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Multiple Sources are Better Than One: Incorporating External Knowledge in Low-Resource Glossing (Yang et al., EMNLP 2024)
PDF: https://aclanthology.org/2024.emnlp-main.261.pdf