Unsupervised learning of bilingual categories in inversion transduction grammar induction

Markus Saers, Dekai Wu


Abstract
We present the first known experiments incorporating unsupervised bilingual nonterminal category learning within end-to-end fully unsupervised transduction grammar induction using matched training and testing models. Despite steady recent progress, such induction experiments until now have not allowed for learning differentiated nonterminal categories. We divide the learning into two stages: (1) a bootstrap stage that generates a large set of categorized short transduction rule hypotheses, and (2) a minimum conditional description length stage that simultaneously prunes away less useful short rule hypotheses, while also iteratively segmenting full sentence pairs into useful longer categorized transduction rules. We show that the second stage works better when the rule hypotheses have categories than when they do not, and that the proposed conditional description length approach combines the rules hypothesized by the two stages better than a mixture model does. We also show that the compact model learned during the second stage can be further improved by combining the result of different iterations in a mixture model. In total, we see a jump in BLEU score, from 17.53 for a standalone minimum description length baseline with no category learning, to 20.93 when incorporating category induction on a Chinese–English translation task.
Anthology ID:
2013.iwslt-papers.15
Volume:
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers
Month:
December 5-6
Year:
2013
Address:
Heidelberg, Germany
Editor:
Joy Ying Zhang
Venue:
IWSLT
SIG:
SIGSLT
Publisher:
Note:
Pages:
Language:
URL:
https://aclanthology.org/2013.iwslt-papers.15
DOI:
Bibkey:
Cite (ACL):
Markus Saers and Dekai Wu. 2013. Unsupervised learning of bilingual categories in inversion transduction grammar induction. In Proceedings of the 10th International Workshop on Spoken Language Translation: Papers, Heidelberg, Germany.
Cite (Informal):
Unsupervised learning of bilingual categories in inversion transduction grammar induction (Saers & Wu, IWSLT 2013)
Copy Citation:
PDF:
https://aclanthology.org/2013.iwslt-papers.15.pdf