NITK-UoH: Tamil-Telugu Machine Translation Systems for the WMT21 Similar Language Translation Task

Richard Saldanha, Ananthanarayana V. S, Anand Kumar M, Parameswari Krishnamurthy


Abstract
This work describes two Neural Machine Translation (NMT) systems developed and evaluated for the bidirectional Tamil-Telugu similar language translation subtask at WMT21. The systems were prototyped with the OpenNMT-py toolkit, trained on the parallel corpus in the training datasets, and evaluated on the dev datasets provided as part of the task. Both systems were trained on a DGX station with 4 V100 GPUs. The first system is a Transformer-based 6-layer encoder-decoder model, trained for 100,000 steps with a configuration similar to the one provided by OpenNMT-py, and is used to create a model for bidirectional translation. The second system consists of two unidirectional translation models with the same configuration as the first, with the addition of Byte Pair Encoding (BPE) subword tokenization through the pre-trained MultiBPEmb model. Based on the dev dataset evaluation metrics for both systems, the first system, i.e. the vanilla Transformer model, was submitted as the primary system. Because the second system with BPE showed no improvement in the metrics during training, it was submitted as a contrastive system.
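To make the BPE step in the second system more concrete, the following is a minimal sketch of how a list of learned merge rules segments a word into subwords. It is only an illustration: the function name `apply_bpe` and the merge list `TOY_MERGES` are hypothetical, the rules are made up for Latin-script toy words, and nothing here reflects the actual vocabulary or API of the pre-trained MultiBPEmb model used in the paper.

```python
from typing import List, Tuple

# Hypothetical toy merge rules, listed in priority order (earliest = learned first).
# These are NOT taken from MultiBPEmb; they exist only to illustrate the mechanism.
TOY_MERGES: List[Tuple[str, str]] = [("l", "o"), ("lo", "w"), ("e", "r")]


def apply_bpe(word: str, merges: List[Tuple[str, str]]) -> List[str]:
    """Segment a word by applying BPE merge rules one at a time, in priority order."""
    symbols = list(word)  # start from single characters
    for left, right in merges:
        merged: List[str] = []
        i = 0
        while i < len(symbols):
            # Merge every adjacent occurrence of (left, right) into one symbol.
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


if __name__ == "__main__":
    print(apply_bpe("lower", TOY_MERGES))
    print(apply_bpe("slow", TOY_MERGES))
```

With these toy rules, a word the rules cover well collapses into a few larger units, while an unseen prefix stays as a separate character. This is the property that lets a fixed subword vocabulary cover rare word forms.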
Anthology ID:
2021.wmt-1.32
Volume:
Proceedings of the Sixth Conference on Machine Translation
Month:
November
Year:
2021
Address:
Online
Editors:
Loic Barrault, Ondrej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussa, Christian Federmann, Mark Fishel, Alexander Fraser, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Tom Kocmi, Andre Martins, Makoto Morishita, Christof Monz
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
299–303
URL:
https://aclanthology.org/2021.wmt-1.32
Cite (ACL):
Richard Saldanha, Ananthanarayana V. S, Anand Kumar M, and Parameswari Krishnamurthy. 2021. NITK-UoH: Tamil-Telugu Machine Translation Systems for the WMT21 Similar Language Translation Task. In Proceedings of the Sixth Conference on Machine Translation, pages 299–303, Online. Association for Computational Linguistics.
Cite (Informal):
NITK-UoH: Tamil-Telugu Machine Translation Systems for the WMT21 Similar Language Translation Task (Saldanha et al., WMT 2021)
PDF:
https://aclanthology.org/2021.wmt-1.32.pdf
Video:
https://aclanthology.org/2021.wmt-1.32.mp4