Enriching Wayúunaiki-Spanish Neural Machine Translation with Linguistic Information

Nora Graichen, Josef van Genabith, Cristina España-Bonet


Abstract
We present the first neural machine translation system for the low-resource language pair Wayúunaiki–Spanish and explore strategies to inject linguistic knowledge into the model to improve translation quality. We evaluate a wide range of methods and combine complementary approaches. Results indicate that incorporating linguistic information through linguistically motivated subword segmentation, factored models, and pretrained embeddings helps the system generate improved translations, with the segmentation contributing most. To evaluate translation quality in a general domain and go beyond the available religious domain data, we gather and make publicly available a new test set and supplementary material. Although translation quality as measured with automatic metrics is low, we hope these resources will facilitate and support further research on Wayúunaiki.
Anthology ID:
2023.americasnlp-1.9
Volume:
Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Manuel Mager, Abteen Ebrahimi, Arturo Oncevay, Enora Rice, Shruti Rijhwani, Alexis Palmer, Katharina Kann
Venue:
AmericasNLP
Publisher:
Association for Computational Linguistics
Pages:
67–83
URL:
https://aclanthology.org/2023.americasnlp-1.9
DOI:
10.18653/v1/2023.americasnlp-1.9
Cite (ACL):
Nora Graichen, Josef van Genabith, and Cristina España-Bonet. 2023. Enriching Wayúunaiki-Spanish Neural Machine Translation with Linguistic Information. In Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP), pages 67–83, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Enriching Wayúunaiki-Spanish Neural Machine Translation with Linguistic Information (Graichen et al., AmericasNLP 2023)
PDF:
https://aclanthology.org/2023.americasnlp-1.9.pdf