cushLEPOR uses LABSE distilled knowledge to improve correlation with human translation evaluations

Gleb Erofeev, Irina Sorokina, Lifeng Han, Serge Gladkoff


Abstract
Automatic MT evaluation metrics are indispensable for MT research. Augmented metrics such as hLEPOR include broader evaluation factors (recall and position difference penalty) in addition to the factors used in BLEU (sentence length, precision), and have demonstrated higher accuracy. However, two obstacles have prevented the wide use of hLEPOR: the lack of an easily portable Python package, and weighting parameters that were tuned empirically by hand. This project addresses both issues by offering a Python implementation of hLEPOR and automatic tuning of its parameters. We use existing translation memories (TM) as the reference set and distillation modeling with LaBSE (Language-Agnostic BERT Sentence Embedding) to calibrate the parameters of a custom hLEPOR (cushLEPOR). cushLEPOR maximizes the correlation between the hLEPOR score and the distilling model's similarity score against the reference. It can be used to evaluate MT output from different engines quickly and precisely, without manual weight tuning. In this session you will learn how to tune hLEPOR to obtain an automatic, custom-tuned cushLEPOR metric that is far more precise than BLEU. The method does not require costly human evaluations: an existing TM serves as the reference translation set, and cushLEPOR is built to select the best MT engine for that reference data set.
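The tuning idea in the abstract (choose metric weights so that metric scores correlate as closely as possible with a reference similarity score) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_metric` is a simplified weighted harmonic mean of precision and recall rather than the full hLEPOR formula, the grid search stands in for whatever optimizer the authors used, and `target_scores` is a hypothetical stand-in for LaBSE similarity scores that would be computed separately.

```python
from collections import Counter
from itertools import product
from math import sqrt


def precision_recall(hyp: str, ref: str) -> tuple[float, float]:
    """Unigram precision and recall of a hypothesis against one reference."""
    h, r = hyp.split(), ref.split()
    overlap = sum((Counter(h) & Counter(r)).values())
    precision = overlap / len(h) if h else 0.0
    recall = overlap / len(r) if r else 0.0
    return precision, recall


def toy_metric(hyp: str, ref: str, alpha: float, beta: float) -> float:
    """Weighted harmonic mean of recall and precision.

    Simplified stand-in only: it omits the length penalty and the
    position difference penalty mentioned in the abstract.
    """
    p, r = precision_recall(hyp, ref)
    if p == 0.0 or r == 0.0:
        return 0.0
    return (alpha + beta) / (alpha / r + beta / p)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0.0 or vy == 0.0:
        return 0.0
    return cov / sqrt(vx * vy)


def tune(pairs, target_scores, grid=(1, 3, 5, 7, 9)):
    """Grid-search (alpha, beta) to maximize correlation with target_scores.

    pairs: list of (hypothesis, reference) strings.
    target_scores: hypothetical similarity scores, one per pair.
    Returns (best_correlation, alpha, beta).
    """
    best = (float("-inf"), None, None)
    for alpha, beta in product(grid, repeat=2):
        scores = [toy_metric(h, r, alpha, beta) for h, r in pairs]
        corr = pearson(scores, target_scores)
        if corr > best[0]:
            best = (corr, alpha, beta)
    return best
```

Calling `tune(pairs, target_scores)` on a list of (MT output, TM reference) pairs returns the weight pair whose scores track the target scores most closely. A real setup would replace the toy metric with a full hLEPOR implementation and the grid with a proper hyperparameter search.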
Anthology ID:
2021.mtsummit-up.28
Volume:
Proceedings of Machine Translation Summit XVIII: Users and Providers Track
Month:
August
Year:
2021
Address:
Virtual
Editors:
Janice Campbell, Ben Huyck, Stephen Larocca, Jay Marciano, Konstantin Savenkov, Alex Yanishevsky
Venue:
MTSummit
Publisher:
Association for Machine Translation in the Americas
Pages:
421–439
URL:
https://aclanthology.org/2021.mtsummit-up.28
Cite (ACL):
Gleb Erofeev, Irina Sorokina, Lifeng Han, and Serge Gladkoff. 2021. cushLEPOR uses LABSE distilled knowledge to improve correlation with human translation evaluations. In Proceedings of Machine Translation Summit XVIII: Users and Providers Track, pages 421–439, Virtual. Association for Machine Translation in the Americas.
Cite (Informal):
cushLEPOR uses LABSE distilled knowledge to improve correlation with human translation evaluations (Erofeev et al., MTSummit 2021)
Presentation:
 2021.mtsummit-up.28.Presentation.pdf