Hindi to Dravidian Language Neural Machine Translation Systems

Vijay Sundar Ram, Sobha Lalitha Devi


Abstract
Neural machine translation (NMT) has achieved state-of-the-art performance on high-resource language pairs, but its performance drops in low-resource conditions. Morphologically rich languages pose a further challenge for NMT. The common strategy for handling this issue is sub-word segmentation. In this work, we compare morphologically inspired segmentation methods against Byte Pair Encoding (BPE) for processing the input when building NMT systems for Hindi to Malayalam and Hindi to Tamil, where Hindi is an Indo-Aryan language and Malayalam and Tamil are South Dravidian languages. Malayalam and Tamil are low-resource, morphologically rich, and agglutinative; Malayalam is more agglutinative than Tamil. We show that for both language pairs, the morphological segmentation algorithm outperforms BPE. We also present a detailed analysis of the translation outputs from both NMT systems.
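For readers unfamiliar with the BPE baseline named in the abstract, the following is a minimal, illustrative sketch of how BPE learns merge operations from a word list and applies them to a new word. It is not the authors' implementation and does not reflect their data or settings; the function names (`merge_pair`, `learn_bpe`, `apply_bpe`), the `</w>` end-of-word marker, the tie-breaking rule, and the toy English word list are all assumptions made for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge operations from a list of words (toy version)."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken lexicographically (an assumption
        # made here only to keep the sketch deterministic).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Rewrite the vocabulary with the chosen pair merged.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def apply_bpe(word, merges):
    """Segment `word` by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy usage: learn two merges, then segment an unseen word.
toy_words = ["low", "low", "lower", "lowest"]
toy_merges = learn_bpe(toy_words, 2)
toy_segments = apply_bpe("slow", toy_merges)
```

Because BPE merges are driven purely by pair frequency, the resulting units need not align with morpheme boundaries, which is the contrast with morphologically inspired segmentation that the abstract draws.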
Anthology ID:
2023.ranlp-1.121
Volume:
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
Month:
September
Year:
2023
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
1143–1150
URL:
https://aclanthology.org/2023.ranlp-1.121
Cite (ACL):
Vijay Sundar Ram and Sobha Lalitha Devi. 2023. Hindi to Dravidian Language Neural Machine Translation Systems. In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, pages 1143–1150, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Hindi to Dravidian Language Neural Machine Translation Systems (Sundar Ram & Lalitha Devi, RANLP 2023)
PDF:
https://aclanthology.org/2023.ranlp-1.121.pdf