Designing the Business Conversation Corpus

Matīss Rikters, Ryokan Ri, Tong Li, Toshiaki Nakazawa


Abstract
While machine translation of written text has progressed considerably in the past several years thanks to the increasing availability of parallel corpora and corpus-based training technologies, automatic translation of spoken text and dialogues remains challenging even for modern systems. In this paper, we aim to boost the machine translation quality of conversational texts by introducing a newly constructed Japanese-English business conversation parallel corpus. A detailed analysis of the corpus is provided, along with examples that are challenging for automatic translation. We also experiment with adding the corpus to a machine translation training scenario and show how the resulting system benefits from its use.
Anthology ID:
D19-5204
Volume:
Proceedings of the 6th Workshop on Asian Translation
Month:
November
Year:
2019
Address:
Hong Kong, China
Editors:
Toshiaki Nakazawa, Chenchen Ding, Raj Dabre, Anoop Kunchukuttan, Nobushige Doi, Yusuke Oda, Ondřej Bojar, Shantipriya Parida, Isao Goto, Hidaya Mino
Venue:
WAT
Publisher:
Association for Computational Linguistics
Pages:
54–61
URL:
https://aclanthology.org/D19-5204
DOI:
10.18653/v1/D19-5204
Cite (ACL):
Matīss Rikters, Ryokan Ri, Tong Li, and Toshiaki Nakazawa. 2019. Designing the Business Conversation Corpus. In Proceedings of the 6th Workshop on Asian Translation, pages 54–61, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal):
Designing the Business Conversation Corpus (Rikters et al., WAT 2019)
PDF:
https://aclanthology.org/D19-5204.pdf
Code:
tsuruoka-lab/BSD
Data:
Business Scene Dialogue