A Research-Based Guide for the Creation and Deployment of a Low-Resource Machine Translation System

John E. Ortega, Kenneth Church


Abstract
The machine translation (MT) field seems to focus heavily on English and other high-resource languages, although low-resource MT (LRMT) is receiving more attention than in the past. Successful LRMT systems (LRMTS) should make a compelling business case in terms of demand, cost and quality in order to be viable for end users. When used by communities where low-resource languages are spoken, LRMT quality should be judged not only by traditional metrics like BLEU but also by other factors, so that the system is inclusive and does not risk overall rejection by the community. MT systems based on neural methods tend to perform better with high volumes of training data, but they may be unrealistic and even harmful for LRMT. For research purposes, the development and creation of LRMTS is clearly necessary. However, in this article, we argue that companies considering deployment of LRMTS in the wild could adopt two main workarounds: human-in-the-loop and sub-domains.
Anthology ID:
2023.ranlp-1.88
Volume:
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
Month:
September
Year:
2023
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
813–823
URL:
https://aclanthology.org/2023.ranlp-1.88
Cite (ACL):
John E. Ortega and Kenneth Church. 2023. A Research-Based Guide for the Creation and Deployment of a Low-Resource Machine Translation System. In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, pages 813–823, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
A Research-Based Guide for the Creation and Deployment of a Low-Resource Machine Translation System (Ortega & Church, RANLP 2023)
PDF:
https://aclanthology.org/2023.ranlp-1.88.pdf