ChatGPT is not a good indigenous translator

David Stap, Ali Araabi


Abstract
This report investigates the persistent challenges that Machine Translation (MT) systems face on indigenous and extremely low-resource language pairs. Despite the notable achievements of Large Language Models (LLMs), which excel in various tasks, their applicability to low-resource languages remains questionable. In this study, we leveraged the AmericasNLP competition to evaluate the translation performance of different systems from Spanish into 11 indigenous languages of South America. Our team, LTLAmsterdam, submitted four systems: GPT-4, a bilingual model, a fine-tuned M2M100 model, and fine-tuned M2M100 combined with $k$NN-MT. We found that even large language models like GPT-4 are not well-suited for extremely low-resource languages. Our results suggest that fine-tuning M2M100 models can offer significantly better performance for extremely low-resource translation.
Anthology ID: 2023.americasnlp-1.17
Volume: Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Manuel Mager, Abteen Ebrahimi, Arturo Oncevay, Enora Rice, Shruti Rijhwani, Alexis Palmer, Katharina Kann
Venue: AmericasNLP
Publisher: Association for Computational Linguistics
Pages: 163–167
URL: https://aclanthology.org/2023.americasnlp-1.17
DOI: 10.18653/v1/2023.americasnlp-1.17
Cite (ACL): David Stap and Ali Araabi. 2023. ChatGPT is not a good indigenous translator. In Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP), pages 163–167, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): ChatGPT is not a good indigenous translator (Stap & Araabi, AmericasNLP 2023)
PDF: https://aclanthology.org/2023.americasnlp-1.17.pdf