SKIM at WMT 2023 General Translation Task

Keito Kudo, Takumi Ito, Makoto Morishita, Jun Suzuki


Abstract
The SKIM team’s submission used a standard procedure to build ensemble Transformer models: base-model training, back-translation with the base models for data augmentation, and retraining of several final models on the back-translated training data. Each final model had its own architecture and configuration, with up to 10.5B parameters, and replaced the self- and cross-attention sub-layers in the decoder with a combined cross+self-attention sub-layer. We selected the best candidate from a large pool, namely 70 translations per sentence generated by 13 distinct models, using an MBR reranking method based on COMET and COMET-QE. We also applied data augmentation and selection techniques to the training data of the Transformer models.
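
The selection step described in the abstract is Minimum Bayes Risk (MBR) reranking: each candidate is scored by its average metric score against all other pooled candidates, which serve as pseudo-references, and the highest-scoring candidate is kept. The sketch below illustrates this idea under the assumption of a generic COMET-style utility function; the names mbr_rerank and utility are illustrative, not taken from the authors' code.

```python
from typing import Callable, List


def mbr_rerank(
    source: str,
    candidates: List[str],
    utility: Callable[[str, str, str], float],
) -> str:
    """Return the candidate with the highest expected utility, treating
    every other candidate in the pool as a pseudo-reference (standard
    MBR selection). `utility(hyp, ref, src)` stands in for a COMET-style
    metric; it is a placeholder, not the paper's actual scoring code."""
    if len(candidates) == 1:
        return candidates[0]
    best, best_score = candidates[0], float("-inf")
    for i, hyp in enumerate(candidates):
        # Expected utility of `hyp`: average metric score of `hyp` against
        # all other candidates, each treated as a pseudo-reference.
        others = [ref for j, ref in enumerate(candidates) if j != i]
        score = sum(utility(hyp, ref, source) for ref in others) / len(others)
        if score > best_score:
            best, best_score = hyp, score
    return best
```

COMET scores a (source, hypothesis, reference) triple, which is why each pooled candidate doubles as a pseudo-reference above; a reference-free metric such as COMET-QE scores (source, hypothesis) pairs directly and could be folded in as an extra per-candidate term. The paper describes the exact combination of COMET and COMET-QE the team used.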
Anthology ID:
2023.wmt-1.9
Volume:
Proceedings of the Eighth Conference on Machine Translation
Month:
December
Year:
2023
Address:
Singapore
Editors:
Philipp Koehn, Barry Haddow, Tom Kocmi, Christof Monz
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
128–136
URL:
https://aclanthology.org/2023.wmt-1.9
DOI:
10.18653/v1/2023.wmt-1.9
Cite (ACL):
Keito Kudo, Takumi Ito, Makoto Morishita, and Jun Suzuki. 2023. SKIM at WMT 2023 General Translation Task. In Proceedings of the Eighth Conference on Machine Translation, pages 128–136, Singapore. Association for Computational Linguistics.
Cite (Informal):
SKIM at WMT 2023 General Translation Task (Kudo et al., WMT 2023)
PDF:
https://aclanthology.org/2023.wmt-1.9.pdf