Linguistically Informed Hindi-English Neural Machine Translation

Vikrant Goyal, Pruthwik Mishra, Dipti Misra Sharma


Abstract
Hindi-English machine translation is a challenging problem owing to several factors, including the morphological complexity and relatively free word order of Hindi and the lack of sufficient parallel training data. Neural Machine Translation (NMT) is a rapidly advancing MT paradigm that has shown promising results for many language pairs, especially when large amounts of training data are available. To overcome the data sparsity caused by the lack of large parallel corpora for Hindi-English, we propose a method that employs additional linguistic knowledge encoded in the linguistic phenomena that Hindi exhibits. We generalize the embedding layer of the state-of-the-art Transformer model to incorporate linguistic features such as POS tags, lemmas and morphological features in order to improve translation performance. We compare the results obtained by incorporating this knowledge against baseline systems and demonstrate significant performance improvements. Although Transformer NMT models are highly effective at learning language constructs, we show that the use of specific linguistic features further improves translation performance.
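The idea of an embedding layer that takes linguistic features alongside the word can be illustrated with a minimal sketch. The abstract does not say how the features are combined, so the sketch assumes one common option: each feature gets its own embedding table, and the per-feature vectors are concatenated into a single input vector per token. The feature inventories, the morphological tag strings, the embedding dimensions, and the helper names `make_table` and `embed_token` are all hypothetical and are not taken from the paper.

```python
import random


def make_table(vocab, dim, rng):
    """Build a toy embedding table: one random vector of length `dim` per symbol."""
    return {sym: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for sym in vocab}


def embed_token(features, tables):
    """Concatenate the embeddings of all features of a single token.

    Assumption: features are combined by concatenation; the paper may
    combine them differently.
    """
    vector = []
    for name, table in tables.items():
        vector.extend(table[features[name]])
    return vector


rng = random.Random(0)

# Hypothetical feature vocabularies and dimensions (8 + 4 + 2 + 2 = 16).
tables = {
    "word": make_table(["लड़के", "खेलते", "हैं"], 8, rng),
    "lemma": make_table(["लड़का", "खेल", "है"], 4, rng),
    "pos": make_table(["NN", "VM", "VAUX"], 2, rng),
    "morph": make_table(["masc.pl", "hab.masc.pl", "pres.pl"], 2, rng),
}

# One annotated Hindi sentence; the annotations are illustrative only.
sentence = [
    {"word": "लड़के", "lemma": "लड़का", "pos": "NN", "morph": "masc.pl"},
    {"word": "खेलते", "lemma": "खेल", "pos": "VM", "morph": "hab.masc.pl"},
    {"word": "हैं", "lemma": "है", "pos": "VAUX", "morph": "pres.pl"},
]

# Each token becomes a single 16-dimensional vector fed to the encoder.
embedded = [embed_token(tok, tables) for tok in sentence]
```

In this sketch the word embedding keeps most of the dimensions and the linguistic features share the remainder, so the total input size stays fixed. In a real system the tables would be trained jointly with the rest of the Transformer rather than filled with random values.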
Anthology ID:
2020.lrec-1.456
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
3698–3703
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.456
Cite (ACL):
Vikrant Goyal, Pruthwik Mishra, and Dipti Misra Sharma. 2020. Linguistically Informed Hindi-English Neural Machine Translation. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 3698–3703, Marseille, France. European Language Resources Association.
Cite (Informal):
Linguistically Informed Hindi-English Neural Machine Translation (Goyal et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.456.pdf