Improving word alignment for low resource languages using English monolingual SRL

Meriem Beloucif, Markus Saers, Dekai Wu


Abstract
We introduce a new statistical machine translation (SMT) approach, specifically geared to learning translation from low resource languages, that exploits monolingual English semantic parsing to bias inversion transduction grammar (ITG) induction. We show that, in contrast to conventional SMT training methods, which rely heavily on phrase memorization, our approach focuses on learning bilingual correlations that help translate low resource languages, by using the output language semantic structure to further narrow down ITG constraints. This approach is motivated by previous research showing that injecting a semantic frame based objective function while training SMT models improves translation quality. We show that including a monolingual semantic objective function during the learning of the translation model leads towards a semantically driven alignment, which is more efficient than simply tuning loglinear mixture weights against a semantic frame based evaluation metric in the final stage of SMT training. We test our approach with three different language pairs and demonstrate that our model biases the learning towards more semantically correct alignments. Both GIZA++ and ITG based techniques fail to capture meaningful bilingual constituents, which are required when trying to learn translation models for low resource languages. In contrast, our proposed model not only improves translation by injecting a monolingual objective function to learn bilingual correlations during early training of the translation model, but also helps to learn more meaningful correlations with a relatively small data set, leading to a better alignment compared to either conventional ITG or traditional GIZA++ based approaches.
Anthology ID:
W16-4507
Volume:
Proceedings of the Sixth Workshop on Hybrid Approaches to Translation (HyTra6)
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Patrik Lambert, Bogdan Babych, Kurt Eberle, Rafael E. Banchs, Reinhard Rapp, Marta R. Costa-jussà
Venue:
HyTra
Publisher:
The COLING 2016 Organizing Committee
Pages:
51–60
URL:
https://aclanthology.org/W16-4507
Cite (ACL):
Meriem Beloucif, Markus Saers, and Dekai Wu. 2016. Improving word alignment for low resource languages using English monolingual SRL. In Proceedings of the Sixth Workshop on Hybrid Approaches to Translation (HyTra6), pages 51–60, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Improving word alignment for low resource languages using English monolingual SRL (Beloucif et al., HyTra 2016)
PDF:
https://aclanthology.org/W16-4507.pdf