Towards Tailored Recovery of Lexical Diversity in Literary Machine Translation

Esther Ploeger, Huiyuan Lai, Rik Van Noord, Antonio Toral


Abstract
Machine translations are found to be lexically poorer than human translations. The loss of lexical diversity through MT poses an issue in the automatic translation of literature, where it matters not only what is written, but also how it is written. Current methods for increasing lexical diversity in MT are rigid. Yet, as we demonstrate, the degree of lexical diversity can vary considerably across different novels. Thus, rather than aiming for the rigid increase of lexical diversity, we reframe the task as recovering what is lost in the machine translation process. We propose a novel approach that consists of reranking translation candidates with a classifier that distinguishes between original and translated text. We evaluate our approach on 31 English-to-Dutch book translations, and find that, for certain books, our approach retrieves lexical diversity scores that are close to human translation.
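The reranking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, and `toy_originality_score` is only a stand-in (the share of distinct tokens) for the trained classifier that distinguishes original from translated text. The sketch assumes the classifier returns a higher score for text that looks more like original writing.

```python
from typing import Callable, List, Sequence


def rerank_by_originality(
    candidates: Sequence[str],
    originality_score: Callable[[str], float],
) -> List[str]:
    """Return candidates ordered from most to least 'original-like'.

    originality_score is assumed to map a candidate translation to a
    score where higher means 'more like original text'.
    """
    return sorted(candidates, key=originality_score, reverse=True)


def toy_originality_score(text: str) -> float:
    """Hypothetical placeholder for a real classifier.

    Returns the ratio of distinct tokens to total tokens, purely so the
    example runs without a trained model.
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Two hypothetical Dutch candidates for the same source sentence.
candidates = ["de de de kat", "de kat zat daar"]
best = rerank_by_originality(candidates, toy_originality_score)[0]
print(best)
```

In practice the placeholder scorer would be replaced by the classifier's predicted probability for the "original" class, and the top-ranked candidate would be kept as the translation.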
Anthology ID:
2024.eamt-1.24
Volume:
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)
Month:
June
Year:
2024
Address:
Sheffield, UK
Editors:
Carolina Scarton, Charlotte Prescott, Chris Bayliss, Chris Oakley, Joanna Wright, Stuart Wrigley, Xingyi Song, Edward Gow-Smith, Rachel Bawden, Víctor M Sánchez-Cartagena, Patrick Cadwell, Ekaterina Lapshinova-Koltunski, Vera Cabarrão, Konstantinos Chatzitheodorou, Mary Nurminen, Diptesh Kanojia, Helena Moniz
Venue:
EAMT
Publisher:
European Association for Machine Translation (EAMT)
Pages:
286–299
URL:
https://aclanthology.org/2024.eamt-1.24
Cite (ACL):
Esther Ploeger, Huiyuan Lai, Rik Van Noord, and Antonio Toral. 2024. Towards Tailored Recovery of Lexical Diversity in Literary Machine Translation. In Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1), pages 286–299, Sheffield, UK. European Association for Machine Translation (EAMT).
Cite (Informal):
Towards Tailored Recovery of Lexical Diversity in Literary Machine Translation (Ploeger et al., EAMT 2024)
PDF:
https://aclanthology.org/2024.eamt-1.24.pdf