An English-Swahili parallel corpus and its use for neural machine translation in the news domain

Felipe Sánchez-Martínez, Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Mikel L. Forcada, Miquel Esplà-Gomis, Andrew Secker, Susie Coleman, Julie Wall


Abstract
This paper describes our approach to creating a neural machine translation system that translates between English and Swahili (in both directions) in the news domain, as well as the process we followed to crawl the necessary parallel corpora from the Internet. We also report the results of a pilot human evaluation performed by the news media organisations participating in the H2020 EU-funded project GoURMET.
Anthology ID:
2020.eamt-1.32
Volume:
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
Month:
November
Year:
2020
Address:
Lisboa, Portugal
Editors:
André Martins, Helena Moniz, Sara Fumega, Bruno Martins, Fernando Batista, Luisa Coheur, Carla Parra, Isabel Trancoso, Marco Turchi, Arianna Bisazza, Joss Moorkens, Ana Guerberof, Mary Nurminen, Lena Marg, Mikel L. Forcada
Venue:
EAMT
Publisher:
European Association for Machine Translation
Pages:
299–308
URL:
https://aclanthology.org/2020.eamt-1.32
Cite (ACL):
Felipe Sánchez-Martínez, Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Mikel L. Forcada, Miquel Esplà-Gomis, Andrew Secker, Susie Coleman, and Julie Wall. 2020. An English-Swahili parallel corpus and its use for neural machine translation in the news domain. In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, pages 299–308, Lisboa, Portugal. European Association for Machine Translation.
Cite (Informal):
An English-Swahili parallel corpus and its use for neural machine translation in the news domain (Sánchez-Martínez et al., EAMT 2020)
PDF:
https://aclanthology.org/2020.eamt-1.32.pdf