Measuring the Impact of Data Augmentation Methods for Extremely Low-Resource NMT

Annie Lamar, Zeyneb Kaya


Abstract
Data augmentation (DA) is a popular strategy for boosting performance on neural machine translation (NMT) tasks. The impact of data augmentation in low-resource environments, particularly for diverse and scarce languages, is understudied. In this paper, we introduce a simple yet novel metric to measure the impact of several different data augmentation strategies. This metric, which we call Data Augmentation Advantage (DAA), quantifies how many true data pairs a synthetic data pair is worth in a particular experimental context. We demonstrate the utility of this metric by training models for several linguistically varied datasets using three data augmentation methods: back-translation, SwitchOut, and sentence concatenation. In lower-resource tasks, DAA is an especially valuable metric for comparing DA performance: it quantifies gains more effectively when BLEU scores are very small and when results across diverse languages diverge widely and are difficult to assess.
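The abstract defines DAA only informally, as the number of true pairs one synthetic pair is worth. The sketch below is one hedged way to operationalize that idea and is not the formula from the paper. It assumes a baseline learning curve of BLEU versus number of true training pairs (from runs without augmentation), linearly interpolates how many true pairs the baseline would need to match an augmented model's BLEU, and divides the gain in "equivalent" true pairs by the number of synthetic pairs added. The function names `true_pair_equivalent` and `data_augmentation_advantage`, the linear interpolation, and the toy numbers are all hypothetical illustrations, not results.

```python
from bisect import bisect_left


def true_pair_equivalent(curve, bleu):
    """Estimate how many true pairs a baseline needs to reach `bleu`.

    Hypothetical helper: `curve` is a list of (num_true_pairs, bleu) points
    from baseline runs trained on authentic data only. BLEU is assumed to be
    non-decreasing in data size; values between points are linearly interpolated.
    """
    curve = sorted(curve)
    sizes = [n for n, _ in curve]
    scores = [b for _, b in curve]
    if any(b1 < b0 for b0, b1 in zip(scores, scores[1:])):
        raise ValueError("baseline BLEU must be non-decreasing in data size")
    if not scores[0] <= bleu <= scores[-1]:
        raise ValueError("BLEU lies outside the range of the baseline curve")
    # First curve point whose BLEU is >= the target score.
    i = bisect_left(scores, bleu)
    if scores[i] == bleu:
        return float(sizes[i])
    # Interpolate between the neighbouring points (i - 1, i).
    n0, b0 = sizes[i - 1], scores[i - 1]
    n1, b1 = sizes[i], scores[i]
    return n0 + (bleu - b0) * (n1 - n0) / (b1 - b0)


def data_augmentation_advantage(curve, num_true, num_synthetic, augmented_bleu):
    """Hypothetical DAA: extra true-pair equivalents gained per synthetic pair."""
    equivalent = true_pair_equivalent(curve, augmented_bleu)
    return (equivalent - num_true) / num_synthetic


# Made-up toy numbers for illustration only (not results from the paper).
toy_curve = [(1000, 2.0), (2000, 4.0), (4000, 6.0)]
print(data_augmentation_advantage(toy_curve, 1000, 2000, 5.0))  # 1.0
```

Under these toy assumptions, a model trained on 1,000 true plus 2,000 synthetic pairs matches a baseline trained on about 3,000 true pairs, so each synthetic pair is "worth" one true pair. Linear interpolation is only a simple choice here; a fitted learning curve (for example, a log-linear fit) would be an equally plausible alternative.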
Anthology ID:
2023.loresmt-1.8
Volume:
Proceedings of the Sixth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2023)
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Atul Kr. Ojha, Chao-hong Liu, Ekaterina Vylomova, Flammie Pirinen, Jade Abbott, Jonathan Washington, Nathaniel Oco, Valentin Malykh, Varvara Logacheva, Xiaobing Zhao
Venue:
LoResMT
Publisher:
Association for Computational Linguistics
Pages:
101–109
URL:
https://aclanthology.org/2023.loresmt-1.8
DOI:
10.18653/v1/2023.loresmt-1.8
Cite (ACL):
Annie Lamar and Zeyneb Kaya. 2023. Measuring the Impact of Data Augmentation Methods for Extremely Low-Resource NMT. In Proceedings of the Sixth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2023), pages 101–109, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Measuring the Impact of Data Augmentation Methods for Extremely Low-Resource NMT (Lamar & Kaya, LoResMT 2023)
PDF:
https://aclanthology.org/2023.loresmt-1.8.pdf
Video:
https://aclanthology.org/2023.loresmt-1.8.mp4