How to Translate Your Samples and Choose Your Shots? Analyzing Translate-train & Few-shot Cross-lingual Transfer

Iman Jundi, Gabriella Lapesa


Abstract
Translate-train or few-shot cross-lingual transfer can be used to improve the zero-shot performance of multilingual pretrained language models. Few-shot utilizes high-quality low-quantity samples (often manually translated from the English corpus ). Translate-train employs a machine translation of the English corpus, resulting in samples with lower quality that could be scaled to high quantity. Given the lower cost and higher availability of machine translation compared to manual professional translation, it is important to systematically compare few-shot and translate-train, understand when each has an advantage, and investigate how to choose the shots to translate in order to increase the few-shot gain. This work aims to fill this gap: we compare and quantify the performance gain of few-shot vs. translate-train using three different base models and a varying number of samples for three tasks/datasets (XNLI, PAWS-X, XQuAD) spanning 17 languages. We show that scaling up the training data using machine translation gives a larger gain compared to using the small-scale (higher-quality) few-shot data. When few-shot is beneficial, we show that there are random sets of samples that perform better across languages and that the performance on English and on the machine-translation of the samples can both be used to choose the shots to manually translate for an increased few-shot gain.
Anthology ID:
2022.findings-naacl.11
Volume:
Findings of the Association for Computational Linguistics: NAACL 2022
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
129–150
Language:
URL:
https://aclanthology.org/2022.findings-naacl.11
DOI:
10.18653/v1/2022.findings-naacl.11
Bibkey:
Cite (ACL):
Iman Jundi and Gabriella Lapesa. 2022. How to Translate Your Samples and Choose Your Shots? Analyzing Translate-train & Few-shot Cross-lingual Transfer. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 129–150, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
How to Translate Your Samples and Choose Your Shots? Analyzing Translate-train & Few-shot Cross-lingual Transfer (Jundi & Lapesa, Findings 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.findings-naacl.11.pdf
Data
PAWS-XXQuAD